Overview
Comment: [graphspell][fr] better suggestion mechanism
SHA3-256: edcc777d4be1db9a1417b6912047d84e
User & Date: olr on 2025-09-14 10:00:59
Context
2025-09-14
12:42 [graphspell] suggestion mechanism update (check-in: 6d2e9dc4cb; user: olr; tags: trunk, graphspell)
10:00 [graphspell][fr] better suggestion mechanism (check-in: edcc777d4b; user: olr; tags: trunk, fr, graphspell)
2025-09-13
13:25 [graphspell] ad hoc suggestions with full uppercase words (check-in: 7e24b83b14; user: olr; tags: trunk, graphspell)
Changes
Modified gc_lang/fr/modules/tests_modules.py from [20f3337db3] to [10ff0f96cf].
Lines 74–80 → 74–90 (new version):
    ("OEIL", "ŒIL"),
    ("apele", "appel"),
    ("Co2", "CO₂"),
    ("emmppâiiiller", "empailler"),
    ("testt", "test"),
    ("apelaion", "appellation"),
    ("exsepttion", "exception"),
    ("ebriete", "ébriété"),
    ("ennormmement", "énormément"),
    ("sintaxik", "syntaxique")
]:
    #with timeblock(sWord):
    for lSugg in self.oSpellChecker.suggest(sWrong):
        #print(sWord, "->", " ".join(lSugg))
        self.assertIn(sSugg, lSugg)
Modified gc_lang/fr/rules.grx from [1f6fafb05d] to [57080614e2].
Lines 11808–11821 → 11808–11829 (new version):
    [a|à] [>bât|ba+s|bat+s] [on|ont|>ton|>thon] [>rond|ron+s] [pu+s|put|pue+s]
        <<- /loc/ ->> à bâtons rompus && Confusion. Locution « à bâtons rompus ».|https://fr.wiktionary.org/wiki/%C3%A0_b%C3%A2tons_rompus

TEST: s’organiser, comme on put, {{à bas thon rond pu}} ->> à bâtons rompus
TEST: actions {{à bâton rompu}} ->> à bâtons rompus
TEST: discuter à bâtons rompus

__conf_à_bon_port__
    !! à bon port ¡¡
    [à|a] bon [>porc|>pore|>port]
        <<- /conf/ ->> à bon port && Locution adverbiale “à bon port”.|https://fr.wiktionary.org/wiki/%C3%A0_bon_port

TEST: {{a bon porc}} ->> à bon port

__conf_à_cor_et_à_cri__
    !! à cor et à cri ¡¡
    [a|à] [corps|cor+s] [et|est|es|ait|ais|aies|é|è|ei|ai|aient] [à|a] [cri+s|crie+s|crit]
        <<- /loc/ ->> à cor et à cri && Locution adverbiale invariable. (Le cor est un instrument à vent utilisé pour la chasse.)|https://fr.wiktionary.org/wiki/%C3%A0_cor_et_%C3%A0_cri

TEST: Elles hurlèrent {{à corps et à cris}}. ->> à cor et à cri
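The TEST lines in these rules follow a visible convention: the sentence to check, with the expected error wrapped in {{…}}, optionally followed by ->> and the expected suggestions separated by |. A line without {{…}} (such as “discuter à bâtons rompus”) presumably asserts that no error is reported. The sketch below parses that shape in Python; parse_test_line() and RuleTest are hypothetical names, not Grammalecte's own test harness, and any options the real format may support are not modeled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# "TEST: <sentence> [->> <sugg1>|<sugg2>...]", as seen in the rules above.
_TEST_LINE = re.compile(r"^TEST:\s*(?P<body>.*?)(?:\s*->>\s*(?P<sugg>.*))?$")


@dataclass
class RuleTest:
    text: str                        # sentence with the {{ }} markers removed
    span: Optional[Tuple[int, int]]  # expected error span in `text`, or None
    suggestions: List[str] = field(default_factory=list)


def parse_test_line(line: str) -> RuleTest:
    """Hypothetical parser for one TEST line of a rules file."""
    m = _TEST_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a TEST line: {line!r}")
    body = m.group("body").strip()
    sugg = m.group("sugg")
    suggestions = sugg.strip().split("|") if sugg else []
    start = body.find("{{")
    if start == -1:
        # No marked span (read here as: no error expected).
        return RuleTest(body, None, suggestions)
    end = body.find("}}", start)
    if end == -1:
        raise ValueError(f"unbalanced {{{{ }}}} in: {line!r}")
    # Drop the 2-character markers; the span shifts left by 2 at its end.
    text = body[:start] + body[start + 2:end] + body[end + 2:]
    return RuleTest(text, (start, end - 2), suggestions)
```

For example, parse_test_line("TEST: les {{ports}} de la peau ->> pore|pores") yields the text “les ports de la peau”, the span (4, 9) covering “ports”, and the suggestions ["pore", "pores"].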
Lines 15793–15799 → 15801–15847 (new version):
    [>avaler|>bouffer|>cuire|>élever|>égorger|>frire|>manger|>mâcher|>mastiquer|>rôtir|>tuer] $:D¬:R [>pore|>port]
    >élevage [de|d’] [>pore|>port]
        <<- /conf/ --1>> porc|porcs && Confusion. Pour évoquer l’animal, écrivez “porc”.|https://fr.wiktionary.org/wiki/porc

    >pore [qui|qué] [>pique|>pic]
    >porc [qui|qué] [>pique|>pic]
    >port [qui|qué] [>pique|>pic]
        <<- /conf/ morph(<1, ":D|<start>|>[(,]") ->> porc-épic|porcs-épics && Confusion générale ? Si vous parlez de l’animal, écrivez “porc-épic”.|https://fr.wiktionary.org/wiki/porc-%C3%A9pic

## port
    [>pore|>porc] [de|d’] [plaisance|pêche]
    [>pore|>porc] [>maritime|>spatial]
        <<- /conf/ -1>> port|ports && Confusion. Pour évoquer un havre côtier où les bateaux accostent, écrivez “port”.|https://fr.wiktionary.org/wiki/port

    [<start>|,|le|au|du] [pore|porc] [de|d’] [Alexandrie|Algésiras|Almirante␣Barroso|Amsterdam|Angra␣dos␣Reis|Anvers|Baltimore|Bandar␣Abbas|Bandar␣Khomeini|Bangkok|Barcelone|Bayonne|Beaumont|Bergen|Bintulu|Bombay|Bordeaux|Botaş|Bremerhaven|Brisbane|Brême|Busan|Bâton-Rouge|Calais|Calcutta|Callao|Cannes|Canton|Cayo␣Arcas|Ceyhan|Chennai|Cherbourg|Chiba|Chittagong|Colombo|Constanța|Corpus␣Christi|Daesan|Dakar|Dalian|Dampier|Dieppe|Djeddah|Dubaï|Dunkerque|Durban|el-Dekheila|Gladstone|Goa|Grimsby|Guangzhou|Gwangyang|Gênes|Göteborg|Hambourg|Hampton␣Roads|Hay␣Point|Honfleur|Hong-Kong|Hong␣Kong|Honshu|Houston|Huntington-Tristate|Hô-Chi-Minh-Ville|Ibiza|Immingham|Inchon|Istanbul|Itaguaí|Itaqui|Izmir|Izmit|Jaffa|Jakarta|Jawaharlal␣Nehru|Jebel␣Ali|Jubail|Kaohsiung|Karachi|Kitakyushu|Kobe|Kota␣Baru|Laem␣Chabang|Lake␣Charles|La␣Nouvelle-Orléans|La␣Rochelle|Lianyungang|Londres|Long␣Beach|Lorient|Los␣Angeles|Madras|Manille|Marseille|Miami|Milford␣Haven|Mobile|Monaco|Montréal|Mormugão|Mumbai|Nagoya|Nantes|Naples|Newcastle|New␣Jersey|New␣York|Nice|Ningbo-Zhoushann|Novorossiysk|Odessa|Oran|Osaka|Ouistreham|Oust-Louga|Paradip|Paranaguá|Philadelphie|Plaquemine|Pohang|Porto-Vecchio|Port␣Hedlandn|Primorsk|Qingdao|Qinhuangdao|Quibéron|Quimper|Richards␣Bay|Rizhao|Rome|Rotterdam|Rouen|Saigon|Saint-Pétersbourg|Saldanha␣Bay|Santos|San␣Lorenzo|Sepetiba|Shanghai|Shenzhen|Singapour|Southampton|São␣Sebastião|Taichung|Tanger|Tangshan|Tanjung␣Pelepas|Tanjung␣Priok|Texas␣City|Tianjin|Tientsin|Tokyo|Toulon|Trieste|Tsingtao|Tubarão|Tunis|Ulsan|Valence|Vancouver|Vannes|Venise|Visakhapatnam|Waigaoqiao|Washington|Wellington|Wuhan|Xiamen|Xingang|Yanbu|Yangshan|Yantian|Yingkou|Yokohama|Youjne|Zeebruges]
    [<start>|,|le|au|du] [pore|porc] du [Havre|Touquet]
        <<- /conf/ -2>> port && Confusion. Pour évoquer un havre côtier où les bateaux accostent, écrivez “port”.|https://fr.wiktionary.org/wiki/port

    frais [de|d’] [>pore|>porc]
        <<- /conf/ --1>> port && Confusion. Locution “frais de port”.|https://fr.wiktionary.org/wiki/frais_de_port

    [>accoster|>amarrer] ?@:[WX]¿ au [>pore|>porc]
    [>accoster|>amarrer] ?@:[WX]¿ à ce ?petit¿ [>pore|>porc]
        <<- /conf/ --1>> port && Confusion. Pour évoquer un havre côtier où les bateaux accostent, écrivez “port”.|https://fr.wiktionary.org/wiki/port

    [>pore|>porc] [usb|RJ45|DVI|HDMI|Ethernet|DisplayPort]
        <<- /conf/ -1>> port|ports && Confusion. Pour évoquer les connecteurs, écrivez “port”.|https://fr.wiktionary.org/wiki/port

    [pore|porc] de l’ [écharpe|étoile|épée]
    [pore|porc] du masque
        <<- /conf/ -1>> port && Confusion. Pour évoquer ce que l’on porte, écrivez “port”.|https://fr.wiktionary.org/wiki/port

TEST: les {{ports}} de la peau ->> pore|pores
TEST: du {{port}} au caramel ->> porc|porcs
TEST: elle prépare une blanquette de {{port}} ->> porc
TEST: un filet mignon de {{pore}} ->> porc
TEST: j’en ai marre de bouffer du {{port}} ->> porc|porcs
TEST: un {{porc qui pique}} ->> porc-épic|porcs-épics
TEST: le {{pore}} de l’étoile jaune ->> port
TEST: un petit {{porc}} de plaisance ->> port|ports
TEST: les frais de {{pore}} ->> port
TEST: Accoste au {{porc}} ->> port
TEST: le {{pore}} de La Rochelle ->> port
TEST: Connecte le {{pore}} USB ->> port|ports
TEST: transpirer par tous les {{porcs}} ->> pores
TEST: je transporte des porcs de Calais à Londres.
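Most patterns in these rules are sequences of token positions, where a bracketed group such as [pore|porc] lists the alternatives for one position. As a rough illustration of what a pattern like “[pore|porc] du [Havre|Touquet]” covers, the toy function below enumerates every literal token sequence it describes. expand_alternatives() is a hypothetical helper that handles only plain [a|b] groups: the lemma (>), morphology ($:, @:) and optional (?…¿) notations are not interpreted and would be kept as literal text, and the function is not meant to reflect how the grammar checker actually matches rules.

```python
import itertools
import re
from typing import List

# A token is either a bracketed group "[...]" or a run of non-space characters.
_TOKEN = re.compile(r"\[[^\]]*\]|\S+")


def expand_alternatives(pattern: str) -> List[str]:
    """Toy expansion of plain [a|b] alternatives into literal token sequences.

    Lemma (">word"), morphology ("$:", "@:") and optional ("?...¿")
    notations are NOT interpreted; they stay in the output as literal text.
    """
    slots: List[List[str]] = []
    for tok in _TOKEN.findall(pattern):
        if tok.startswith("[") and tok.endswith("]"):
            slots.append(tok[1:-1].split("|"))  # one position, several choices
        else:
            slots.append([tok])                 # a fixed literal token
    # Cartesian product: the first position varies slowest.
    return [" ".join(combo) for combo in itertools.product(*slots)]
```

expand_alternatives("[pore|porc] du [Havre|Touquet]") returns the four sequences “pore du Havre”, “pore du Touquet”, “porc du Havre” and “porc du Touquet”. The count grows multiplicatively: the harbour-city pattern above would expand to 5 × 2 × 2 × (number of listed ports) sequences, which is why testing each position's alternatives while walking the tokens scales better than enumerating them.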
Modified graphspell-js/str_transform.js from [3da9c81f36] to [d682ed160b].
Lines 35–41 → 35–53 (new version):
    for (let c of sWord) {
        sNewWord += this._xTransCharsForSpelling.gl_get(c, c);
    }
    return sNewWord.normalize("NFC");
},

_xTransCharsForSimplification: new Map([
    ['à', 'a'],  ['é', 'e'],  ['î', 'i'],  ['ô', 'o'],  ['û', 'u'],  ['ÿ', 'i'],  ['y', 'i'],
    ['â', 'a'],  ['è', 'e'],  ['ï', 'i'],  ['ö', 'o'],  ['ù', 'u'],  ['ŷ', 'i'],
    ['ä', 'a'],  ['ê', 'e'],  ['í', 'i'],  ['ó', 'o'],  ['ü', 'u'],  ['ý', 'i'],
    ['á', 'a'],  ['ë', 'e'],  ['ì', 'i'],  ['ò', 'o'],  ['ú', 'u'],  ['ỳ', 'i'],
    ['ā', 'a'],  ['ē', 'e'],  ['ī', 'i'],  ['ō', 'o'],  ['ū', 'u'],  ['ȳ', 'i'],
    ['ç', 'c'],  ['ñ', 'n'],
    ['œ', 'oe'],  ['æ', 'ae'],
    ['ſ', 's'],  ['ﬃ', 'ffi'],  ['ﬄ', 'ffl'],  ['ﬀ', 'ff'],  ['ﬅ', 'ft'],  ['ﬁ', 'fi'],  ['ﬂ', 'fl'],  ['ﬆ', 'st'],
    ["⁰", "0"], ["¹", "1"], ["²", "2"], ["³", "3"], ["⁴", "4"], ["⁵", "5"], ["⁶", "6"], ["⁷", "7"], ["⁸", "8"], ["⁹", "9"],
    ["₀", "0"], ["₁", "1"], ["₂", "2"], ["₃", "3"], ["₄", "4"], ["₅", "5"], ["₆", "6"], ["₇", "7"], ["₈", "8"], ["₉", "9"],
    ["’", ""], ["'", ""], ["ʼ", ""], ["‘", ""], ["‛", ""], ["´", ""], ["`", ""], ["′", ""], ["‵", ""], ["՚", ""], ["ꞌ", ""], ["Ꞌ", ""],
    ["-", ""]
Lines 61–67 → 61–75 (new version):
    let i = 1;
    for (let c of sWord) {
        if (c != sWord.slice(i, i+1) || (c == 'e' && sWord.slice(i, i+2) != "ee")) {  // exception for <e> to avoid confusion between crée / créai
            sNewWord += c;
        }
        i++;
    }
    return sNewWord.replace(/eau/g, "o").replace(/au/g, "o").replace(/ei/g, "e").replace(/ai/g, "e").replace(/ph/g, "f");
},

cleanWord: function (sWord) {
    // word clean for the user who make commun and preditive error help suggest
    // remove letters repeated more than 2 times
    return sWord.replace(/(.)\1{2,}/ig,'$1$1');
},
Modified graphspell/ibdawg.py from [6f7d77073f] to [58e076d75b].
Lines 46–52 → 46–75 (new version):
    self.sSimplifiedWord = st.simplifyWord(sWord)
    self.nDistLimit = nDistLimit if nDistLimit >= 0 else (len(sWord) // 3) + 1  # used in suggest()
    self.nMinDist = 1000
    # Temporary sets
    self.aAllSugg = set()   # All suggestions, even the one rejected
    self.dAccSugg = {}      # Accepted suggestions
    # Parameters
    self.nSuggLimit = nSuggLimit            # number of returned suggestions
    self.nTempSuggLimit = nSuggLimit * 6    # limit of accepted suggestions (ends search over this limit)

def addSugg (self, sSugg, nDeep=0):
    "add a suggestion to the suggestion list"
    if sSugg in self.aAllSugg:
        return
    self.aAllSugg.add(sSugg)
    nSimDist = st.distanceSift4(self.sSimplifiedWord, st.simplifyWord(sSugg))
    #st.showDistance(self.sSimplifiedWord, st.simplifyWord(sSugg))
    if nSimDist < self.nMinDist:
        self.nMinDist = nSimDist
    if nSimDist <= (self.nMinDist + 1):
        nDist = min(st.distanceDamerauLevenshtein(self.sWord, sSugg), st.distanceDamerauLevenshtein(self.sSimplifiedWord, st.simplifyWord(sSugg)))
        #print(">", end="")
        #st.showDistance(self.sWord, sSugg)
        self.dAccSugg[sSugg] = min(nDist, nSimDist+1)
        if len(self.dAccSugg) > self.nTempSuggLimit:
            self.nDistLimit = -1  # suggest() ends searching when this variable = -1
        self.nDistLimit = min(self.nDistLimit, self.nMinDist+1)

def getSuggestions (self):
    "return a list of suggestions"
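The new addSugg() accepts a candidate only if its distance on simplified forms (Sift4 in the diff) is within one of the best simplified distance seen so far. It scores an accepted candidate with the smaller of two Damerau-Levenshtein distances (raw forms and simplified forms), capped at the simplified distance plus one, and tells suggest() to stop once more than nSuggLimit × 6 candidates are accepted. The sketch below reproduces that window in self-contained Python under stated substitutions: an optimal-string-alignment (OSA) distance stands in for both distanceSift4() and distanceDamerauLevenshtein(), SuggCollector and its methods are hypothetical names, and ranked() is an assumed ordering because getSuggestions() is not shown.

```python
from typing import Callable, Dict, List, Set


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    prev2: List[int] = []              # row i-2 (only read when i > 1)
    prev = list(range(len(b) + 1))     # row i-1
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # adjacent transposition
        prev2, prev = prev, cur
    return prev[-1]


class SuggCollector:
    """Hypothetical stand-in for the object whose addSugg() appears above."""

    def __init__(self, word: str, simplify: Callable[[str], str] = str.lower,
                 dist_limit: int = -1, sugg_limit: int = 10) -> None:
        self.word = word
        self.simplify = simplify
        self.simplified = simplify(word)
        # Same default search radius as nDistLimit in the diff.
        self.dist_limit = dist_limit if dist_limit >= 0 else len(word) // 3 + 1
        self.min_dist = 1000
        self.seen: Set[str] = set()
        self.accepted: Dict[str, int] = {}
        self.temp_limit = sugg_limit * 6

    def add(self, sugg: str) -> None:
        if sugg in self.seen:
            return
        self.seen.add(sugg)
        # The diff uses Sift4 here; OSA is swapped in to stay self-contained.
        sim_dist = osa_distance(self.simplified, self.simplify(sugg))
        if sim_dist < self.min_dist:
            self.min_dist = sim_dist
        # Window: within one edit of the best simplified distance so far.
        if sim_dist <= self.min_dist + 1:
            # With OSA used for both metrics, the simplified-form distance is sim_dist.
            dist = min(osa_distance(self.word, sugg), sim_dist)
            self.accepted[sugg] = min(dist, sim_dist + 1)
            if len(self.accepted) > self.temp_limit:
                self.dist_limit = -1  # caller should stop searching
            self.dist_limit = min(self.dist_limit, self.min_dist + 1)

    def ranked(self) -> List[str]:
        """Assumed ordering: lowest score first, ties broken alphabetically."""
        return sorted(self.accepted, key=lambda s: (self.accepted[s], s))
```

Feeding “test”, “tests” and “toast” for the input “testt” accepts the first two with a score of 1 and rejects “toast”, whose distance of 3 falls outside the window. Because the window is relative to the best candidate found so far, an early distant candidate can remain in the accepted set after a closer one lowers min_dist; the final ordering by score is what keeps it from the top of the list.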
Modified graphspell/str_transform.py from [853d07d14c] to [bbecc59017].
Lines 28–34 → 28–63 (new version):
def spellingNormalization (sWord):
    "nomalization NFC and removing ligatures"
    return unicodedata.normalize("NFC", sWord.translate(_xTransCharsForSpelling))


_xTransCharsForSimplification = str.maketrans({
    'à': 'a',  'é': 'e',  'î': 'i',  'ô': 'o',  'û': 'u',  'ÿ': 'i',  'y': 'i',
    'â': 'a',  'è': 'e',  'ï': 'i',  'ö': 'o',  'ù': 'u',  'ŷ': 'i',
    'ä': 'a',  'ê': 'e',  'í': 'i',  'ó': 'o',  'ü': 'u',  'ý': 'i',
    'á': 'a',  'ë': 'e',  'ì': 'i',  'ò': 'o',  'ú': 'u',  'ỳ': 'i',
    'ā': 'a',  'ē': 'e',  'ī': 'i',  'ō': 'o',  'ū': 'u',  'ȳ': 'i',
    'ç': 'c',  'ñ': 'n',
    'œ': 'oe',  'æ': 'ae',
    'ſ': 's',  'ﬃ': 'ffi',  'ﬄ': 'ffl',  'ﬀ': 'ff',  'ﬅ': 'ft',  'ﬁ': 'fi',  'ﬂ': 'fl',  'ﬆ': 'st',
    "⁰": "0", "¹": "1", "²": "2", "³": "3", "⁴": "4", "⁵": "5", "⁶": "6", "⁷": "7", "⁸": "8", "⁹": "9",
    "₀": "0", "₁": "1", "₂": "2", "₃": "3", "₄": "4", "₅": "5", "₆": "6", "₇": "7", "₈": "8", "₉": "9",
    "’": "", "'": "", "ʼ": "", "‘": "", "‛": "", "´": "", "`": "", "′": "", "‵": "", "՚": "", "ꞌ": "", "Ꞌ": "",
    "-": ""
})


def simplifyWord (sWord):
    "word simplication before calculating distance between words"
    sWord = sWord.lower().translate(_xTransCharsForSimplification)
    sNewWord = ""
    for i, c in enumerate(sWord, 1):
        if c != sWord[i:i+1] or (c == 'e' and sWord[i:i+2] != "ee"):  # exception for <e> to avoid confusion between crée / créai
            sNewWord += c
    return sNewWord.replace("eau", "o").replace("au", "o").replace("ei", "e").replace("ai", "ei").replace("ph", "f")


def cleanWord (sWord):
    "remove letters repeated more than 2 times"
    return re.sub("(.)\\1{2,}", '\\1\\1', sWord)
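To see what simplifyWord() and cleanWord() do to the misspellings in the test list, here is a self-contained transcription of the Python version above. It uses a reduced, illustrative translation table (a subset of _xTransCharsForSimplification); simplify_word(), clean_word() and _SIMPLIFY are hypothetical names. The function lowercases, strips accents and ligatures, collapses runs of a repeated letter to one (runs of <e> to two), then merges a few spellings (eau and au to o, ei to e, ai to ei, ph to f), so that a misspelling and its intended word often simplify to the same string.

```python
import re

# Reduced, illustrative subset of _xTransCharsForSimplification.
# str.maketrans() requires single-character keys, hence ligature code
# points such as "ﬁ" rather than the two-letter string "fi".
_SIMPLIFY = str.maketrans({
    "à": "a", "â": "a", "é": "e", "è": "e", "ê": "e", "ë": "e",
    "î": "i", "ï": "i", "ô": "o", "û": "u", "ù": "u", "y": "i",
    "ç": "c", "œ": "oe", "æ": "ae", "ﬁ": "fi", "ﬂ": "fl",
    "’": "", "'": "", "-": "",
})


def simplify_word(word: str) -> str:
    """Sketch of simplifyWord(): lowercase, strip accents, collapse repeats,
    then merge a few French spellings."""
    word = word.lower().translate(_SIMPLIFY)
    out = []
    for i, c in enumerate(word, 1):
        # word[i:i+1] is the next character. A letter is dropped when the
        # next one repeats it, so runs shrink to one letter; for <e> a
        # letter is dropped only if the next two are "ee", so runs shrink to "ee".
        if c != word[i:i+1] or (c == "e" and word[i:i+2] != "ee"):
            out.append(c)
    s = "".join(out)
    return (s.replace("eau", "o").replace("au", "o")
             .replace("ei", "e").replace("ai", "ei").replace("ph", "f"))


def clean_word(word: str) -> str:
    """Sketch of cleanWord(): cap any run of one repeated character at two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", word)
```

With this table, “Éléphant” simplifies to “elefant”, “ebriete” and “ébriété” both give “ebriete”, and “emmppâiiiller” and “empailler” converge, while clean_word("emmppâiiiller") only trims the triple i to give “emmppâiiller”. The JavaScript port in this diff replaces ai with e rather than ei, so the two versions can simplify the same word differently (“empailler” gives “empeiler” here but “empeler” in JavaScript).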