Grammalecte: history of file graphspell/tokenizer.py at check-in ff502ce05ce2c02a

2020-12-02
07:57
[graphspell] tokenizer update file: [f44ddea977] check-in: [9678e9208c] user: olr, branch: trunk, size: 4031 [annotate] [blame] [check-ins using] [diff]
2020-11-30
15:15
[graphspell][fx] update tokenizer and lexicographer: add symbols and emojis file: [1e4c6dee79] check-in: [b3448ac17f] user: olr, branch: trunk, size: 3769 [annotate] [blame] [check-ins using] [diff]
2020-11-25
20:50
[graphspell][fr][fx] rename tokens file: [84d5574a19] check-in: [6aae160f81] user: olr, branch: trunk, size: 3513 [annotate] [blame] [check-ins using] [diff]
2020-10-02
09:32
[graphspell] tokenizer: token UNDERSCORE file: [88086e8ef7] check-in: [e2313363fe] user: olr, branch: trunk, size: 3522 [annotate] [blame] [check-ins using] [diff]
2020-10-01
14:50
[graphspell] tokenizer: exclude underscore from WORD token [fr] adjustments, inclusive writing file: [3a068b8368] check-in: [cfbaf0ad4e] user: olr, branch: trunk, size: 3452 [annotate] [blame] [check-ins using] [diff]
2020-09-02
09:07
[graphspell] tokenizer: token OTHER as fallback file: [5243432861] check-in: [e201630bf5] user: olr, branch: trunk, size: 3416 [annotate] [blame] [check-ins using] [diff]
2020-05-07
10:35
[graphspell] tokenizer and suggestion engine: other apostrophes file: [b7228e1a86] check-in: [b68161b398] user: olr, branch: trunk, size: 3256 [annotate] [blame] [check-ins using] [diff]
2020-04-20
18:02
[graphspell] tokenizer: combining diacritics recognition and NFC normalization file: [81da836011] check-in: [3ef2bdb736] user: olr, branch: trunk, size: 3242 [annotate] [blame] [check-ins using] [diff]
2019-09-01
08:22
[graphspell] tokenizer: handles all kinds of apostrophes file: [a2c42f5f3e] check-in: [1bdedd3133] user: olr, branch: trunk, size: 3159 [annotate] [blame] [check-ins using] [diff]
2019-08-30
09:45
[graphspell] tokenizer: consider presqu’ and quelqu’ as separate words file: [18d4ef9f97] check-in: [0f0bc77645] user: olr, branch: trunk, size: 3149 [annotate] [blame] [check-ins using] [diff]
2019-07-30
20:06
[graphspell][fr] update tokenizer: ordinals file: [af7051a739] check-in: [dcdb32b057] user: olr, branch: trunk, size: 3135 [annotate] [blame] [check-ins using] [diff]
2019-06-09
06:32
[graphspell] tokenizer: update HOUR file: [62c6ff6f25] check-in: [1bc78ce87f] user: olr, branch: trunk, size: 3105 [annotate] [blame] [check-ins using] [diff]
2019-05-15
11:55
[graphspell][core][fr] code cleaning (pylint) file: [7d6a173497] check-in: [c65b7e2b8b] user: olr, branch: trunk, size: 3075 [annotate] [blame] [check-ins using] [diff]
2019-05-14
15:19
[graphspell] tokenizer: update for HOUR tokens file: [08b2581ffe] check-in: [63672ef096] user: olr, branch: trunk, size: 2897 [annotate] [blame] [check-ins using] [diff]
2019-05-02
08:16
[graphspell] tokenizer: update file: [3330e91775] check-in: [7d30bbec37] user: olr, branch: trunk, size: 2891 [annotate] [blame] [check-ins using] [diff]
07:50
[graphspell] tokenizer: update file: [07708a4bf1] check-in: [ed3b7acf68] user: olr, branch: trunk, size: 2863 [annotate] [blame] [check-ins using] [diff]
2019-02-22
11:53
[graphspell][fr] tokenization: + signs €$# (false positive) file: [13303390f7] check-in: [365d3554c7] user: olr, branch: trunk, size: 2845 [annotate] [blame] [check-ins using] [diff]
2018-07-17
06:42
[graphspell] tokenizer: remove hyphen in number detection (always considered as a separate sign) file: [daca54adb9] check-in: [6950f5898f] user: olr, branch: rg, size: 2835 [annotate] [blame] [check-ins using] [diff]
2018-06-30
06:30
[graphspell][bug] tokenizer: syntax error file: [b1bcfc3595] check-in: [ec92f6e873] user: olr, branch: rg, size: 2839 [annotate] [blame] [check-ins using] [diff]
2018-06-29
22:46
[graphspell] tokenizer: add lMorph to <start> and <end> tokens file: [2adea5dc85] check-in: [2dbf497b04] user: olr, branch: rg, size: 2841 [annotate] [blame] [check-ins using] [diff]
2018-06-28
08:26
[graphspell][core] tokenizer: rename ACRONYM tokens to WORD_ACRONYM file: [a1211301ce] check-in: [ccbbecbd1b] user: olr, branch: rg, size: 2795 [annotate] [blame] [check-ins using] [diff]
08:00
[graphspell] tokenizer: rename ORDINAL tokens to WORD_ORDINAL file: [026a9c1064] check-in: [20dbc28ded] user: olr, branch: rg, size: 2785 [annotate] [blame] [check-ins using] [diff]
07:53
[graphspell][core] tokenizer: rename ELPFX tokens to WORD_ELIDED file: [8cf6a6bb2e] check-in: [a1b165e276] user: olr, branch: rg, size: 2780 [annotate] [blame] [check-ins using] [diff]
2018-06-24
11:39
[graphspell] code cleaning (pylint) file: [7c766445e1] check-in: [814d73b60e] user: olr, branch: rg, size: 2774 [annotate] [blame] [check-ins using] [diff]
2018-06-18
20:12
[graphspell] tokenizer: new signs file: [044a0c747a] check-in: [da0d308818] user: olr, branch: rg, size: 2649 [annotate] [blame] [check-ins using] [diff]
2018-06-17
13:11
[graphspell] tokenizer: update ordinals file: [84dbf58ecd] check-in: [4be13a74c3] user: olr, branch: rg, size: 2559 [annotate] [blame] [check-ins using] [diff]
2018-06-12
11:24
[core] text processor: communication between regex rules and graph rules + [graphspell][bug] tokenizer: set i variable to 0, if sentence is empty file: [30951f1c9c] check-in: [cca3887aad] user: olr, branch: rg, size: 2509 [annotate] [blame] [check-ins using] [diff]
2018-06-02
13:47
[graphspell] tokenizer: add option for <start> and <end> tokens file: [b723a02695] check-in: [3339da6424] user: olr, branch: rg, size: 2495 [annotate] [blame] [check-ins using] [diff]
2018-05-18
13:11
[graphspell] tokenizer: add token index and avoid punctuations aggregation file: [b3cbfe75ea] check-in: [be6d99bbdc] user: olr, branch: rg, size: 2201 [annotate] [blame] [check-ins using] [diff]
2017-12-24
18:39
Renamed gc_core/py/tokenizer.py → graphspell/tokenizer.py. [build][py] move files from gc_core to graphspell file: [17f452887e] check-in: [bb8356bd7d] user: olr, branch: graphspell, size: 2146 [annotate] [blame] [check-ins using] [diff]
2017-11-12
13:22
[core][fx] tokenizer: +acronyms file: [17f452887e] check-in: [fa1205c098] user: olr, branch: Lexicographe, size: 2146 [annotate] [blame] [check-ins using] [diff]
2017-10-26
05:49
[core] tokenizer: better regex for URLs and folders file: [5a9c0c9105] check-in: [843c0244bc] user: olr, branch: trunk, size: 2018 [annotate] [blame] [check-ins using] [diff]
2017-10-25
18:34
[core][bug] fix tokenizer for URL file: [829b056f2c] check-in: [ee7d44a3ee] user: olr, branch: trunk, size: 2010 [annotate] [blame] [check-ins using] [diff]
2017-10-24
11:59
[core] fix tokenizer: two similar group names in regex file: [353949869b] check-in: [78199c4006] user: olr, branch: trunk, size: 2006 [annotate] [blame] [check-ins using] [diff]
11:00
[core] tokenization: folders file: [d05a70dbc3] check-in: [35c48d42a8] user: olr, branch: trunk, size: 2002 [annotate] [blame] [check-ins using] [diff]
2017-04-25
11:51
Added: commit 1 file: [27b6fefad2] check-in: [2fd7dc4dd5] user: olr, branch: trunk, size: 1474 [annotate] [blame] [check-ins using]