Overview
| Comment: | [graphspell][core] tokenizer: rename ACRONYM tokens to WORD_ACRONYM |
|---|---|
| SHA3-256: | ccbbecbd1baca0dc13fd24e1aae3632d |
| User & Date: | olr on 2018-06-28 08:26:20 |
Context
2018-06-28

| 08:54 | [core] gc engine: don’t analyse lemmas and morphologies of tokens that don’t have such things | check-in: 032aff09bb | user: olr | tags: core, rg |
| 08:26 | [graphspell][core] tokenizer: rename ACRONYM tokens to WORD_ACRONYM | check-in: ccbbecbd1b | user: olr | tags: core, graphspell, rg |
| 08:13 | [core][fr][js] lexicographer update | check-in: dfe45ca126 | user: olr | tags: fr, core, rg |
Changes
Modified gc_lang/fr/modules-js/lexicographe.js from [f905bdc287] to [8830593e2a].
︙
case 'FOLDERWIN':
return {
sType: oToken.sType,
sValue: oToken.sValue.slice(0, 40) + "…",
aLabel: ["dossier Windows"]
};
break;
|
︙
Modified graphspell-js/tokenizer.js from [a185b00a68] to [2fadfb42f5].
︙
// All regexps must start with ^.
"default":
[
[/^[ \t]+/, 'SPACE'],
[/^\/(?:~|bin|boot|dev|etc|home|lib|mnt|opt|root|sbin|tmp|usr|var|Bureau|Documents|Images|Musique|Public|Téléchargements|Vidéos)(?:\/[a-zA-Zà-öÀ-Ö0-9ø-ÿØ-ßĀ-ʯfi-st_.()-]+)*/, 'FOLDERUNIX'],
[/^[a-zA-Z]:\\(?:Program Files(?: \(x86\)|)|[a-zA-Zà-öÀ-Ö0-9ø-ÿØ-ßĀ-ʯfi-st.()]+)(?:\\[a-zA-Zà-öÀ-Ö0-9ø-ÿØ-ßĀ-ʯfi-st_.()-]+)*/, 'FOLDERWIN'],
[/^[,.;:!?…«»“”‘’"(){}\[\]·–—]/, 'SEPARATOR'],
|
︙
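The diff above shows only the pattern table: every rule is a regexp anchored with `^`, paired with a token type. The loop that applies these rules is elided from this excerpt, so the following is a rough sketch of how such ^-anchored rules are typically driven (simplified patterns and a hypothetical `tokenize` helper, not the project's actual code):

```python
import re

# Simplified rule table mirroring the structure of the excerpt above:
# each entry is a ^-anchored regexp plus the token type it produces.
_RULES = [
    (re.compile(r'^[ \t]+'), 'SPACE'),
    (re.compile(r'^[,.;:!?…«»“”‘’"(){}\[\]·–—]'), 'SEPARATOR'),
    (re.compile(r'^[\w-]+'), 'WORD'),  # stand-in for the full word pattern
]

def tokenize(s):
    """Yield (sType, sValue, start) tuples by trying each rule in order
    at the current position; the ^ anchor keeps matches at the cursor."""
    i = 0
    while i < len(s):
        for rx, stype in _RULES:
            m = rx.match(s[i:])
            if m:
                yield (stype, m.group(0), i)
                i += len(m.group(0))
                break
        else:
            # No rule matched: emit a one-character fallback token.
            yield ('UNKNOWN', s[i], i)
            i += 1
```

Because every pattern starts with `^`, rule order decides ties: longer, more specific patterns (like `FOLDERUNIX`/`FOLDERWIN`) are listed before generic ones.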
Modified graphspell/tokenizer.py from [026a9c1064] to [a1211301ce].
"""
Very simple tokenizer
using regular expressions
"""
import re
_PATTERNS = {
"default":
(
r'(?P<FOLDERUNIX>/(?:bin|boot|dev|etc|home|lib|mnt|opt|root|sbin|tmp|usr|var|Bureau|Documents|Images|Musique|Public|Téléchargements|Vidéos)(?:/[\w.()-]+)*)',
r'(?P<FOLDERWIN>[a-zA-Z]:\\(?:Program Files(?: [(]x86[)]|)|[\w.()]+)(?:\\[\w.()-]+)*)',
r'(?P<PUNC>[][,.;:!?…«»“”‘’"(){}·–—])',
|
︙
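Unlike the JavaScript side, the Python tokenizer uses named groups (`(?P<TYPE>…)`) rather than per-rule anchored regexps. The code that combines and applies `_PATTERNS` is elided from this diff, but the usual idiom is to join the alternatives with `|` and dispatch on `lastgroup`. A minimal sketch under that assumption, with simplified stand-in patterns:

```python
import re

# Stand-in pattern tuple in the same named-group style as the excerpt;
# the real table also has FOLDERUNIX/FOLDERWIN and other rules.
_PATTERNS = (
    r'(?P<PUNC>[][,.;:!?…«»“”‘’"(){}·–—])',
    r'(?P<SPACE>[ \t]+)',
    r'(?P<WORD>[\w-]+)',  # simplified stand-in for the full word pattern
)

# One combined regexp: the first alternative that matches wins,
# and m.lastgroup tells us which named group it was.
_rx = re.compile("|".join(_PATTERNS))

def tokens(s):
    """Yield (sType, sValue, start) tuples from the combined regexp."""
    for m in _rx.finditer(s):
        yield (m.lastgroup, m.group(), m.start())
```

With this scheme, renaming a token type (as this check-in does with ACRONYM → WORD_ACRONYM) only requires renaming the group, since consumers see the type through `m.lastgroup`.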