Grammalecte  Check-in [edcc777d4b]

Overview
Comment:[graphspell][fr] better suggestion mechanism
SHA3-256: edcc777d4be1db9a1417b6912047d84e2682961609142d23198f73a4b0b24f60
User & Date: olr on 2025-09-14 10:00:59
Context
2025-09-14 12:42  [graphspell] suggestion mechanism update (check-in: 6d2e9dc4cb, user: olr, tags: trunk, graphspell)
2025-09-14 10:00  [graphspell][fr] better suggestion mechanism (check-in: edcc777d4b, user: olr, tags: trunk, fr, graphspell)
2025-09-13 13:25  [graphspell] ad hoc suggestions with full uppercase words (check-in: 7e24b83b14, user: olr, tags: trunk, graphspell)
Changes

Modified gc_lang/fr/modules/tests_modules.py from [20f3337db3] to [10ff0f96cf].

@@ -74,17 +74,17 @@
            ("OEIL", "ŒIL"),
            ("apele", "appel"),
            ("Co2", "CO₂"),
            ("emmppâiiiller", "empailler"),
            ("testt", "test"),
            ("apelaion", "appellation"),
            ("exsepttion", "exception"),
-            ("sintaxik", "syntaxique"),
            ("ebriete", "ébriété"),
-            ("ennormmement", "énormément")
+            ("ennormmement", "énormément"),
+            ("sintaxik", "syntaxique")
        ]:
            #with timeblock(sWord):
            for lSugg in self.oSpellChecker.suggest(sWrong):
                #print(sWord, "->", " ".join(lSugg))
                self.assertIn(sSugg, lSugg)


Modified gc_lang/fr/rules.grx from [1f6fafb05d] to [57080614e2].

@@ -11808,14 +11808,22 @@
    [a|à] [>bât|ba+s|bat+s] [on|ont|>ton|>thon] [>rond|ron+s]  [pu+s|put|pue+s]
        <<- /loc/ ->> à bâtons rompus                   && Confusion. Locution « à bâtons rompus ».|https://fr.wiktionary.org/wiki/%C3%A0_b%C3%A2tons_rompus

TEST: s’organiser, comme on put, {{à bas thon rond pu}}                             ->> à bâtons rompus
TEST: actions {{à bâton rompu}}                                                     ->> à bâtons rompus
TEST: discuter à bâtons rompus

+
+__conf_à_bon_port__
+    !! à bon port ¡¡
+    [à|a] bon [>porc|>pore|>port]
+        <<- /conf/ ->> à bon port                                           && Locution adverbiale “à bon port”.|https://fr.wiktionary.org/wiki/%C3%A0_bon_port
+
+TEST: {{a bon porc}}                                                                ->> à bon port
+

__conf_à_cor_et_à_cri__
    !! à cor et à cri ¡¡
    [a|à] [corps|cor+s] [et|est|es|ait|ais|aies|é|è|ei|ai|aient] [à|a] [cri+s|crie+s|crit]
        <<- /loc/ ->> à cor et à cri                    && Locution adverbiale invariable. (Le cor est un instrument à vent utilisé pour la chasse.)|https://fr.wiktionary.org/wiki/%C3%A0_cor_et_%C3%A0_cri

TEST: Elles hurlèrent {{à corps et à cris}}.                                        ->> à cor et à cri

@@ -15793,49 +15801,47 @@
    [>avaler|>bouffer|>cuire|>élever|>égorger|>frire|>manger|>mâcher|>mastiquer|>rôtir|>tuer] $:D¬:R [>pore|>port]
    >élevage [de|d’] [>pore|>port]
        <<- /conf/ --1>> porc|porcs                                         && Confusion. Pour évoquer l’animal, écrivez “porc”.|https://fr.wiktionary.org/wiki/porc

    >pore [qui|qué] [>pique|>pic]
    >porc [qui|qué] [>pique|>pic]
    >port [qui|qué] [>pique|>pic]
-        <<- morph(<1, ":D|<start>|>[(,]") ->> porc-épic|porcs-épics            && Confusion générale ? Si vous parlez de l’animal, écrivez “porc-épic”.|https://fr.wiktionary.org/wiki/porc-%C3%A9pic
+        <<- /conf/ morph(<1, ":D|<start>|>[(,]") ->> porc-épic|porcs-épics  && Confusion générale ? Si vous parlez de l’animal, écrivez “porc-épic”.|https://fr.wiktionary.org/wiki/porc-%C3%A9pic

    ## port
    [>pore|>porc] [de|d’] [plaisance|pêche]
    [>pore|>porc] [>maritime|>spatial]
        <<- /conf/ -1>> port|ports                                          && Confusion. Pour évoquer un havre côtier où les bateaux accostent, écrivez “port”.|https://fr.wiktionary.org/wiki/port

    [<start>|,|le|au|du]  [pore|porc]  [de|d’]  [Alexandrie|Algésiras|Almirante␣Barroso|Amsterdam|Angra␣dos␣Reis|Anvers|Baltimore|Bandar␣Abbas|Bandar␣Khomeini|Bangkok|Barcelone|Bayonne|Beaumont|Bergen|Bintulu|Bombay|Bordeaux|Botaş|Bremerhaven|Brisbane|Brême|Busan|Bâton-Rouge|Calais|Calcutta|Callao|Cannes|Canton|Cayo␣Arcas|Ceyhan|Chennai|Cherbourg|Chiba|Chittagong|Colombo|Constanța|Corpus␣Christi|Daesan|Dakar|Dalian|Dampier|Dieppe|Djeddah|Dubaï|Dunkerque|Durban|el-Dekheila|Gladstone|Goa|Grimsby|Guangzhou|Gwangyang|Gênes|Göteborg|Hambourg|Hampton␣Roads|Hay␣Point|Honfleur|Hong-Kong|Hong␣Kong|Honshu|Houston|Huntington-Tristate|Hô-Chi-Minh-Ville|Ibiza|Immingham|Inchon|Istanbul|Itaguaí|Itaqui|Izmir|Izmit|Jaffa|Jakarta|Jawaharlal␣Nehru|Jebel␣Ali|Jubail|Kaohsiung|Karachi|Kitakyushu|Kobe|Kota␣Baru|Laem␣Chabang|Lake␣Charles|La␣Nouvelle-Orléans|La␣Rochelle|Lianyungang|Londres|Long␣Beach|Lorient|Los␣Angeles|Madras|Manille|Marseille|Miami|Milford␣Haven|Mobile|Monaco|Montréal|Mormugão|Mumbai|Nagoya|Nantes|Naples|Newcastle|New␣Jersey|New␣York|Nice|Ningbo-Zhoushann|Novorossiysk|Odessa|Oran|Osaka|Ouistreham|Oust-Louga|Paradip|Paranaguá|Philadelphie|Plaquemine|Pohang|Porto-Vecchio|Port␣Hedlandn|Primorsk|Qingdao|Qinhuangdao|Quibéron|Quimper|Richards␣Bay|Rizhao|Rome|Rotterdam|Rouen|Saigon|Saint-Pétersbourg|Saldanha␣Bay|Santos|San␣Lorenzo|Sepetiba|Shanghai|Shenzhen|Singapour|Southampton|São␣Sebastião|Taichung|Tanger|Tangshan|Tanjung␣Pelepas|Tanjung␣Priok|Texas␣City|Tianjin|Tientsin|Tokyo|Toulon|Trieste|Tsingtao|Tubarão|Tunis|Ulsan|Valence|Vancouver|Vannes|Venise|Visakhapatnam|Waigaoqiao|Washington|Wellington|Wuhan|Xiamen|Xingang|Yanbu|Yangshan|Yantian|Yingkou|Yokohama|Youjne|Zeebruges]
    [<start>|,|le|au|du]  [pore|porc]  du       [Havre|Touquet]
        <<- /conf/ -2>> port                                                && Confusion. Pour évoquer un havre côtier où les bateaux accostent, écrivez “port”.|https://fr.wiktionary.org/wiki/port

    frais [de|d’] [>pore|>porc]
        <<- /conf/ --1>> port                                               && Confusion. Locution “frais de port”.|https://fr.wiktionary.org/wiki/frais_de_port

-    [>arriver|>parvenir] ?@:[WX]¿ [à|a] bon [>pore|>porc]
    [>accoster|>amarrer] ?@:[WX]¿ au [>pore|>porc]
    [>accoster|>amarrer] ?@:[WX]¿ à ce ?petit¿ [>pore|>porc]
        <<- /conf/ --1>> port                                               && Confusion. Pour évoquer un havre côtier où les bateaux accostent, écrivez “port”.|https://fr.wiktionary.org/wiki/port

    [>pore|>porc] [usb|RJ45|DVI|HDMI|Ethernet|DisplayPort]
        <<- /conf/ -1>> port|ports                                          && Confusion. Pour évoquer les connecteurs, écrivez “port”.|https://fr.wiktionary.org/wiki/port

    [pore|porc] de l’ [écharpe|étoile|épée]
    [pore|porc] du masque
        <<- /conf/ -1>> port                                                && Confusion. Pour évoquer ce que l’on porte, écrivez “port”.|https://fr.wiktionary.org/wiki/port

TEST: les {{ports}} de la peau                                              ->> pore|pores
TEST: du {{port}} au caramel                                                ->> porc|porcs
TEST: elle prépare une blanquette de {{port}}                               ->> porc
TEST: un filet mignon de {{pore}}                                           ->> porc
TEST: j’en ai marre de bouffer du {{port}}                                  ->> porc|porcs
TEST: un {{porc qui pique}}                                                 ->> porc-épic|porcs-épics
TEST: le {{pore}} de l’étoile jaune                                         ->> port
TEST: un petit {{porc}} de plaisance                                        ->> port|ports
-TEST: nous parvenons enfin à bon {{pore}}                                   ->> port
TEST: les frais de {{pore}}                                                 ->> port
TEST: Accoste au {{porc}}                                                   ->> port
TEST: le {{pore}} de La Rochelle                                            ->> port
TEST: Connecte le {{pore}} USB                                              ->> port|ports
TEST: transpirer par tous les {{porcs}}                                     ->> pores
TEST: je transporte des porcs de Calais à Londres.

Modified graphspell-js/str_transform.js from [3da9c81f36] to [d682ed160b].

@@ -35,19 +35,19 @@
        for (let c of sWord) {
            sNewWord += this._xTransCharsForSpelling.gl_get(c, c);
        }
        return sNewWord.normalize("NFC");
    },

    _xTransCharsForSimplification: new Map([
-        ['à', 'a'],  ['é', 'é'],  ['î', 'i'],  ['ô', 'o'],  ['û', 'u'],  ['ÿ', 'y'],
-        ['â', 'a'],  ['è', 'é'],  ['ï', 'i'],  ['ö', 'o'],  ['ù', 'u'],  ['ŷ', 'y'],
-        ['ä', 'a'],  ['ê', 'é'],  ['í', 'i'],  ['ó', 'o'],  ['ü', 'u'],  ['ý', 'y'],
-        ['á', 'a'],  ['ë', 'é'],  ['ì', 'i'],  ['ò', 'o'],  ['ú', 'u'],  ['ỳ', 'y'],
-        ['ā', 'a'],  ['ē', 'é'],  ['ī', 'i'],  ['ō', 'o'],  ['ū', 'u'],  ['ȳ', 'y'],
+        ['à', 'a'],  ['é', 'e'],  ['î', 'i'],  ['ô', 'o'],  ['û', 'u'],  ['ÿ', 'i'],   ['y', 'i'],
+        ['â', 'a'],  ['è', 'e'],  ['ï', 'i'],  ['ö', 'o'],  ['ù', 'u'],  ['ŷ', 'i'],
+        ['ä', 'a'],  ['ê', 'e'],  ['í', 'i'],  ['ó', 'o'],  ['ü', 'u'],  ['ý', 'i'],
+        ['á', 'a'],  ['ë', 'e'],  ['ì', 'i'],  ['ò', 'o'],  ['ú', 'u'],  ['ỳ', 'i'],
+        ['ā', 'a'],  ['ē', 'e'],  ['ī', 'i'],  ['ō', 'o'],  ['ū', 'u'],  ['ȳ', 'i'],
        ['ç', 'c'],  ['ñ', 'n'],
        ['œ', 'oe'], ['æ', 'ae'],
        ['ſ', 's'],  ['ffi', 'ffi'],  ['ffl', 'ffl'],  ['ff', 'ff'],  ['ſt', 'ft'],  ['fi', 'fi'],  ['fl', 'fl'],  ['st', 'st'],
        ["⁰", "0"], ["¹", "1"], ["²", "2"], ["³", "3"], ["⁴", "4"], ["⁵", "5"], ["⁶", "6"], ["⁷", "7"], ["⁸", "8"], ["⁹", "9"],
        ["₀", "0"], ["₁", "1"], ["₂", "2"], ["₃", "3"], ["₄", "4"], ["₅", "5"], ["₆", "6"], ["₇", "7"], ["₈", "8"], ["₉", "9"],
        ["’", ""], ["'", ""], ["ʼ", ""], ["‘", ""], ["‛", ""], ["´", ""], ["`", ""], ["′", ""], ["‵", ""], ["՚", ""], ["ꞌ", ""], ["Ꞌ", ""],
        ["-", ""]

@@ -61,15 +61,15 @@
        let i = 1;
        for (let c of sWord) {
            if (c != sWord.slice(i, i+1) || (c == 'e' && sWord.slice(i, i+2) != "ee")) {  // exception for <e> to avoid confusion between crée / créai
                sNewWord += c;
            }
            i++;
        }
-        return sNewWord.replace(/eau/g, "o").replace(/au/g, "o").replace(/ai/g, "éi").replace(/ei/g, "é").replace(/ph/g, "f");
+        return sNewWord.replace(/eau/g, "o").replace(/au/g, "o").replace(/ei/g, "e").replace(/ai/g, "e").replace(/ph/g, "f");
    },

    cleanWord: function (sWord) {
        // word clean for the user who make commun and preditive error help suggest
        // remove letters repeated more than 2 times
        return sWord.replace(/(.)\1{2,}/ig,'$1$1');
    },

Modified graphspell/ibdawg.py from [6f7d77073f] to [58e076d75b].

@@ -46,30 +46,30 @@
        self.sSimplifiedWord = st.simplifyWord(sWord)
        self.nDistLimit = nDistLimit  if nDistLimit >= 0  else  (len(sWord) // 3) + 1 # used in suggest()
        self.nMinDist = 1000
        # Temporary sets
        self.aAllSugg = set()   # All suggestions, even the one rejected
        self.dAccSugg = {}      # Accepted suggestions
        # Parameters
-
-        self.nSuggLimit = nSuggLimit
-        self.nTempSuggLimit = nSuggLimit * 6
+        self.nSuggLimit = nSuggLimit            # number of returned suggestions
+        self.nTempSuggLimit = nSuggLimit * 6    # limit of accepted suggestions (ends search over this limit)

    def addSugg (self, sSugg, nDeep=0):
-        "add a suggestion"
+        "add a suggestion to the suggestion list"
        if sSugg in self.aAllSugg:
            return
        self.aAllSugg.add(sSugg)
        nSimDist = st.distanceSift4(self.sSimplifiedWord, st.simplifyWord(sSugg))
-        st.showDistance(self.sSimplifiedWord, st.simplifyWord(sSugg))
+        #st.showDistance(self.sSimplifiedWord, st.simplifyWord(sSugg))
        if nSimDist < self.nMinDist:
            self.nMinDist = nSimDist
        if nSimDist <= (self.nMinDist + 1):
-            nDist = st.distanceJaroWinkler(self.sWord, sSugg)
-            st.showDistance(self.sWord, sSugg)
+            nDist = min(st.distanceDamerauLevenshtein(self.sWord, sSugg), st.distanceDamerauLevenshtein(self.sSimplifiedWord, st.simplifyWord(sSugg)))
+            #print(">", end="")
+            #st.showDistance(self.sWord, sSugg)
            self.dAccSugg[sSugg] = min(nDist, nSimDist+1)
            if len(self.dAccSugg) > self.nTempSuggLimit:
                self.nDistLimit = -1  # suggest() ends searching when this variable = -1
        self.nDistLimit = min(self.nDistLimit, self.nMinDist+1)

    def getSuggestions (self):
        "return a list of suggestions"

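Note on the addSugg change above: the accepted distance is now the smaller of two Damerau-Levenshtein distances, one on the raw strings and one on their simplified forms. The sketch below illustrates that idea only. `osa_distance` (a restricted, optimal-string-alignment variant) and `strip_accents` are hypothetical stand-ins written for this note; they are not graphspell's `st.distanceDamerauLevenshtein` or `st.simplifyWord`, whose exact behavior may differ.

```python
import unicodedata


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Hypothetical stand-in for st.distanceDamerauLevenshtein.
    """
    prev2 = None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            d = min(prev[j] + 1,          # deletion
                    cur[j - 1] + 1,       # insertion
                    prev[j - 1] + cost)   # substitution or match
            # transposition of two adjacent characters
            if i > 1 and j > 1 and ca == b[j - 2] and a[i - 2] == cb:
                d = min(d, prev2[j - 2] + 1)
            cur.append(d)
        prev2, prev = prev, cur
    return prev[-1]


def strip_accents(word):
    """Toy simplification (assumption): lowercase and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accepted_distance(word, sugg, simplify=strip_accents):
    """Smaller of the raw distance and the distance between simplified forms."""
    return min(osa_distance(word, sugg),
               osa_distance(simplify(word), simplify(sugg)))


print(osa_distance("ebriete", "ébriété"))       # raw strings differ on three letters
print(accepted_distance("ebriete", "ébriété"))  # simplified forms are identical
```

With a simplification step in the comparison, a misspelling that only lacks diacritics is not penalized once per accent, which appears to be the intent of the `min(...)` in the new line.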
Modified graphspell/str_transform.py from [853d07d14c] to [bbecc59017].

@@ -28,36 +28,36 @@

def spellingNormalization (sWord):
    "nomalization NFC and removing ligatures"
    return unicodedata.normalize("NFC", sWord.translate(_xTransCharsForSpelling))


_xTransCharsForSimplification = str.maketrans({
-    'à': 'a',  'é': 'é',  'î': 'i',  'ô': 'o',  'û': 'u',  'ÿ': 'y',
-    'â': 'a',  'è': 'é',  'ï': 'i',  'ö': 'o',  'ù': 'u',  'ŷ': 'y',
-    'ä': 'a',  'ê': 'é',  'í': 'i',  'ó': 'o',  'ü': 'u',  'ý': 'y',
-    'á': 'a',  'ë': 'é',  'ì': 'i',  'ò': 'o',  'ú': 'u',  'ỳ': 'y',
-    'ā': 'a',  'ē': 'é',  'ī': 'i',  'ō': 'o',  'ū': 'u',  'ȳ': 'y',
+    'à': 'a',  'é': 'e',  'î': 'i',  'ô': 'o',  'û': 'u',  'ÿ': 'i', 'y': 'i',
+    'â': 'a',  'è': 'e',  'ï': 'i',  'ö': 'o',  'ù': 'u',  'ŷ': 'i',
+    'ä': 'a',  'ê': 'e',  'í': 'i',  'ó': 'o',  'ü': 'u',  'ý': 'i',
+    'á': 'a',  'ë': 'e',  'ì': 'i',  'ò': 'o',  'ú': 'u',  'ỳ': 'i',
+    'ā': 'a',  'ē': 'e',  'ī': 'i',  'ō': 'o',  'ū': 'u',  'ȳ': 'i',
    'ç': 'c',  'ñ': 'n',
    'œ': 'oe',  'æ': 'ae',
    'ſ': 's',  'ffi': 'ffi',  'ffl': 'ffl',  'ff': 'ff',  'ſt': 'ft',  'fi': 'fi',  'fl': 'fl',  'st': 'st',
    "⁰": "0", "¹": "1", "²": "2", "³": "3", "⁴": "4", "⁵": "5", "⁶": "6", "⁷": "7", "⁸": "8", "⁹": "9",
    "₀": "0", "₁": "1", "₂": "2", "₃": "3", "₄": "4", "₅": "5", "₆": "6", "₇": "7", "₈": "8", "₉": "9",
    "’": "", "'": "", "ʼ": "", "‘": "", "‛": "", "´": "", "`": "", "′": "", "‵": "", "՚": "", "ꞌ": "", "Ꞌ": "",
    "-": ""
})

def simplifyWord (sWord):
    "word simplication before calculating distance between words"
    sWord = sWord.lower().translate(_xTransCharsForSimplification)
    sNewWord = ""
    for i, c in enumerate(sWord, 1):
        if c != sWord[i:i+1] or (c == 'e' and sWord[i:i+2] != "ee"):  # exception for <e> to avoid confusion between crée / créai
            sNewWord += c
-    return sNewWord.replace("eau", "o").replace("au", "o").replace("ai", "éi").replace("ei", "é").replace("ph", "f")
+    return sNewWord.replace("eau", "o").replace("au", "o").replace("ei", "e").replace("ai", "ei").replace("ph", "f")


def cleanWord (sWord):
    "remove letters repeated more than 2 times"
    return re.sub("(.)\\1{2,}", '\\1\\1', sWord)
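
Note on the simplifyWord change above: with the new table, accented vowels and `y` collapse to plain `e`/`i`, so a misspelling and its target can simplify to the same string. The sketch below mirrors the new Python version of the function on a reduced translation table. `_SIMPLIFY_TABLE` and `simplify_word` are names chosen for this note, and the table is deliberately a subset of `_xTransCharsForSimplification`, so results for characters outside that subset will differ from the real module.

```python
# Reduced stand-in (assumption) for _xTransCharsForSimplification:
# only a handful of the entries shown in the diff.
_SIMPLIFY_TABLE = str.maketrans({
    'à': 'a', 'â': 'a',
    'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
    'î': 'i', 'ï': 'i', 'ÿ': 'i', 'y': 'i',
    'ô': 'o', 'û': 'u', 'ù': 'u',
    'ç': 'c', 'œ': 'oe', 'æ': 'ae',
    "’": "", "'": "", "-": "",
})


def simplify_word(word):
    """Simplify a word before measuring distances (mirrors the new simplifyWord)."""
    word = word.lower().translate(_SIMPLIFY_TABLE)
    out = ""
    for i, c in enumerate(word, 1):
        # Drop a letter when the next one is identical; <e> is handled
        # separately, as in the original, to keep crée / créai apart.
        if c != word[i:i + 1] or (c == 'e' and word[i:i + 2] != "ee"):
            out += c
    return (out.replace("eau", "o").replace("au", "o")
               .replace("ei", "e").replace("ai", "ei").replace("ph", "f"))


# Pairs taken from the test list in tests_modules.py
print(simplify_word("ennormmement"))  # "enormement"
print(simplify_word("énormément"))    # "enormement"
print(simplify_word("sintaxik"), simplify_word("syntaxique"))
```

The first two words reduce to the same string, so their simplified distance is zero. The last pair stays different but becomes closer once `y` maps to `i`.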