History of file graphspell/tokenizer.py at check-in f1955d65d40dbf7d
2020-12-02
07:57 | [graphspell] tokenizer update file: [f44ddea977] check-in: [9678e9208c] user: olr, branch: trunk, size: 4031
2020-11-30
15:15 | [graphspell][fx] update tokenizer and lexicographer: add symbols and emojis file: [1e4c6dee79] check-in: [b3448ac17f] user: olr, branch: trunk, size: 3769
2020-11-25
20:50 | [graphspell][fr][fx] rename tokens file: [84d5574a19] check-in: [6aae160f81] user: olr, branch: trunk, size: 3513
2020-10-02
09:32 | [graphspell] tokenizer: token UNDERSCORE file: [88086e8ef7] check-in: [e2313363fe] user: olr, branch: trunk, size: 3522
2020-10-01
14:50 | [graphspell] tokenizer: exclude underscore from WORD token [fr] adjustments, inclusive writing file: [3a068b8368] check-in: [cfbaf0ad4e] user: olr, branch: trunk, size: 3452
2020-09-02
09:07 | [graphspell] tokenizer: token OTHER as fallback file: [5243432861] check-in: [e201630bf5] user: olr, branch: trunk, size: 3416
2020-05-07
10:35 | [graphspell] tokenizer and suggestion engine: other apostrophes file: [b7228e1a86] check-in: [b68161b398] user: olr, branch: trunk, size: 3256
2020-04-20
18:02 | [graphspell] tokenizer: combining diacritics recognition and NFC normalization file: [81da836011] check-in: [3ef2bdb736] user: olr, branch: trunk, size: 3242
2019-09-01
08:22 | [graphspell] tokenizer: handles all kinds of apostrophes file: [a2c42f5f3e] check-in: [1bdedd3133] user: olr, branch: trunk, size: 3159
2019-08-30
09:45 | [graphspell] tokenizer: consider presqu’ and quelqu’ as separate words file: [18d4ef9f97] check-in: [0f0bc77645] user: olr, branch: trunk, size: 3149
2019-07-30
20:06 | [graphspell][fr] update tokenizer: ordinals file: [af7051a739] check-in: [dcdb32b057] user: olr, branch: trunk, size: 3135
2019-06-09
06:32 | [graphspell] tokenizer: update HOUR file: [62c6ff6f25] check-in: [1bc78ce87f] user: olr, branch: trunk, size: 3105
2019-05-15
11:55 | [graphspell][core][fr] code cleaning (pylint) file: [7d6a173497] check-in: [c65b7e2b8b] user: olr, branch: trunk, size: 3075
2019-05-14
15:19 | [graphspell] tokenizer: update for HOUR tokens file: [08b2581ffe] check-in: [63672ef096] user: olr, branch: trunk, size: 2897
2019-05-02
08:16 | [graphspell] tokenizer: update file: [3330e91775] check-in: [7d30bbec37] user: olr, branch: trunk, size: 2891
07:50 | [graphspell] tokenizer: update file: [07708a4bf1] check-in: [ed3b7acf68] user: olr, branch: trunk, size: 2863
2019-02-22
11:53 | [graphspell][fr] tokenization: +signs €$# (false positive) file: [13303390f7] check-in: [365d3554c7] user: olr, branch: trunk, size: 2845
2018-07-17
06:42 | [graphspell] tokenizer: remove hyphen in number detection (always considered as a separate sign) file: [daca54adb9] check-in: [6950f5898f] user: olr, branch: rg, size: 2835
2018-06-30
06:30 | [graphspell][bug] tokenizer: syntax error file: [b1bcfc3595] check-in: [ec92f6e873] user: olr, branch: rg, size: 2839
2018-06-29
22:46 | [graphspell] tokenizer: add lMorph to <start> and <end> tokens file: [2adea5dc85] check-in: [2dbf497b04] user: olr, branch: rg, size: 2841
2018-06-28
08:26 | [graphspell][core] tokenizer: rename ACRONYM tokens to WORD_ACRONYM file: [a1211301ce] check-in: [ccbbecbd1b] user: olr, branch: rg, size: 2795
08:00 | [graphspell] tokenizer: rename ORDINAL tokens to WORD_ORDINAL file: [026a9c1064] check-in: [20dbc28ded] user: olr, branch: rg, size: 2785
07:53 | [graphspell][core] tokenizer: rename ELPFX tokens to WORD_ELIDED file: [8cf6a6bb2e] check-in: [a1b165e276] user: olr, branch: rg, size: 2780
2018-06-24
11:39 | [graphspell] code cleaning (pylint) file: [7c766445e1] check-in: [814d73b60e] user: olr, branch: rg, size: 2774
2018-06-18
20:12 | [graphspell] tokenizer: new signs file: [044a0c747a] check-in: [da0d308818] user: olr, branch: rg, size: 2649
2018-06-17
13:11 | [graphspell] tokenizer: update ordinals file: [84dbf58ecd] check-in: [4be13a74c3] user: olr, branch: rg, size: 2559
2018-06-12
11:24 | [core] text processor: communication between regex rules and graph rules + [graphspell][bug] tokenizer: set i variable to 0, if sentence is empty file: [30951f1c9c] check-in: [cca3887aad] user: olr, branch: rg, size: 2509
2018-06-02
13:47 | [graphspell] tokenizer: add option for <start> and <end> tokens file: [b723a02695] check-in: [3339da6424] user: olr, branch: rg, size: 2495
2018-05-18
13:11 | [graphspell] tokenizer: add token index and avoid punctuations aggregation file: [b3cbfe75ea] check-in: [be6d99bbdc] user: olr, branch: rg, size: 2201
2017-12-24
18:39 | Renamed gc_core/py/tokenizer.py → graphspell/tokenizer.py. [build][py] move files from gc_core to graphspell file: [17f452887e] check-in: [bb8356bd7d] user: olr, branch: graphspell, size: 2146
2017-11-12
13:22 | [core][fx] tokenizer: +acronyms file: [17f452887e] check-in: [fa1205c098] user: olr, branch: Lexicographe, size: 2146
2017-10-26
05:49 | [core] tokenizer: better regex for URLs and folders file: [5a9c0c9105] check-in: [843c0244bc] user: olr, branch: trunk, size: 2018
2017-10-25
18:34 | [core][bug] fix tokenizer for URL file: [829b056f2c] check-in: [ee7d44a3ee] user: olr, branch: trunk, size: 2010
2017-10-24
11:59 | [core] fix tokenizer: two similar group names in regex file: [353949869b] check-in: [78199c4006] user: olr, branch: trunk, size: 2006
11:00 | [core] tokenization: folders file: [d05a70dbc3] check-in: [35c48d42a8] user: olr, branch: trunk, size: 2002
2017-04-25
11:51 | Added: commit 1 file: [27b6fefad2] check-in: [2fd7dc4dd5] user: olr, branch: trunk, size: 1474