Overview
| Comment: | [graphspell] tokenizer: token OTHER as fallback |
|---|---|
| SHA3-256: | e201630bf5b2bbfdc57f35473c464277 |
| User & Date: | olr on 2020-09-02 09:07:44 |
Context
2020-09-02

| Time | Comment | Check-in | User | Tags |
|---|---|---|---|---|
| 09:40 | [fx] don’t remove HTML entities | 1097f9a5d9 | olr | trunk, fx |
| 09:07 | [graphspell] tokenizer: token OTHER as fallback | e201630bf5 | olr | trunk, graphspell |
| 08:17 | [lo][ui] lexicon editor: replace space replacement | 971a16410f | olr | trunk, lo |
Changes
Modified graphspell-js/tokenizer.js from [0e7b889227] to [2ede633a83].
︙
[/^[#@][a-zA-Zà-öÀ-Ö0-9ø-ÿØ-ßĀ-ʯﬀ-ﬆ_-]+/, 'TAG'],
[/^<[a-zA-Z]+.*?>|^<\/[a-zA-Z]+ *>/, 'HTML'],
[/^\[\/?[a-zA-Z]+\]/, 'PSEUDOHTML'],
[/^&\w+;(?:\w+;|)/, 'HTMLENTITY'],
[/^\d\d?[h:]\d\d(?:[m:]\d\ds?|)\b/, 'HOUR'],
[/^\d+(?:[.,]\d+|)/, 'NUM'],
[/^[&%‰€$+±=*/<>⩾⩽#|×¥£§¢¬÷@-]/, 'SIGN'],
︙
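The excerpt above is an ordered list of `[pattern, type]` pairs, each anchored at the start of the remaining text. The following Python sketch shows how such a list is typically consumed: the first pattern that matches at the current position wins, and a final catch-all rule keeps unknown characters from being lost. The `OTHER` rule here (`\S`, one non-space character) is an assumption made for illustration, since the excerpt does not show the line the check-in added. The `tokenize` function and `PATTERNS` name are likewise hypothetical.

```python
import re

# Subset of the patterns shown above, ported to Python syntax.
# Pattern.match(text, pos) anchors at pos, so the leading ^ is dropped.
PATTERNS = [
    (re.compile(r"\[/?[a-zA-Z]+\]"), "PSEUDOHTML"),
    (re.compile(r"&\w+;(?:\w+;|)"), "HTMLENTITY"),
    (re.compile(r"\d\d?[h:]\d\d(?:[m:]\d\ds?|)\b"), "HOUR"),
    (re.compile(r"\d+(?:[.,]\d+|)"), "NUM"),
    (re.compile(r"[&%‰€$+±=*/<>⩾⩽#|×¥£§¢¬÷@-]"), "SIGN"),
    # Assumed fallback rule: any single non-space character.
    (re.compile(r"\S"), "OTHER"),
]

def tokenize(text):
    """Yield (type, value, start) tuples; the first matching pattern wins."""
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates tokens and is not emitted
            continue
        for pattern, token_type in PATTERNS:
            m = pattern.match(text, pos)
            if m:
                yield (token_type, m.group(0), pos)
                pos = m.end()
                break
        else:
            pos += 1  # only reached if the fallback rule is removed

for token in tokenize("12:30 & ~"):
    print(token)
```

Because the fallback is last in the list, it only fires when no more specific rule applies, so adding it does not change how already-recognized tokens are classified.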
Modified graphspell/tokenizer.py from [b7228e1a86] to [5243432861].
︙
r'(?P<LINK>(?:https?://|www[.]|\w+[@.]\w\w+[@.])\w[\w./?&!%=+*"\'@$#-]+)',
r'(?P<HASHTAG>[#@][\w-]+)',
r'(?P<HTML><\w+.*?>|</\w+ *>)',
r'(?P<PSEUDOHTML>\[/?\w+\])',
r'(?P<HOUR>\d\d?[h:]\d\d(?:[m:]\d\ds?|)\b)',
r'(?P<NUM>\d+(?:[.,]\d+))',
r'(?P<SIGN>[&%‰€$+±=*/<>⩾⩽#|×¥£§¢¬÷@-])',
r"(?P<WORD>[\w\u0300-\u036f]+(?:[’'`-][\w\u0300-\u036f]+)*)", # with combining diacritics
︙
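The Python patterns above are named groups, which suggests they are joined into a single alternation and matched with `finditer`. A minimal sketch of that approach follows, using a subset of the patterns as shown. The `OTHER` group is an assumed catch-all (`\S`), since its actual definition is not visible in the excerpt, and the `TOKEN_RE` and `tokenize` names are hypothetical.

```python
import re

# Subset of the named-group patterns shown above, joined into one alternation.
# Alternatives are tried left to right, so order sets the priority.
TOKEN_RE = re.compile("|".join([
    r'(?P<HASHTAG>[#@][\w-]+)',
    r'(?P<PSEUDOHTML>\[/?\w+\])',
    r'(?P<HOUR>\d\d?[h:]\d\d(?:[m:]\d\ds?|)\b)',
    r'(?P<NUM>\d+(?:[.,]\d+))',
    r"(?P<WORD>[\w\u0300-\u036f]+(?:[’'`-][\w\u0300-\u036f]+)*)",
    # Assumed fallback group: any single non-space character.
    r'(?P<OTHER>\S)',
]))

def tokenize(text):
    """Yield (type, value) pairs; m.lastgroup names the group that matched."""
    for m in TOKEN_RE.finditer(text):
        yield (m.lastgroup, m.group(0))

print(list(tokenize("#tag 3,5 ~")))
```

Without a catch-all group, `finditer` silently skips any character that no alternative matches, so a character such as `~` would disappear from the token stream. With the fallback in last position, it is reported as an `OTHER` token instead.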