Overview
| Comment: | [core][bug] fix tokenizer for URL |
|---|---|
| SHA3-256: | ee7d44a3ee3c5862205010247a6871c4 |
| User & Date: | olr on 2017-10-25 18:34:20 |
Context
| Date | Time | Check-in | Comment | User | Tags |
|---|---|---|---|---|---|
| 2017-10-26 | 05:49 | 843c0244bc | [core] tokenizer: better regex for URLs and folders | olr | trunk, core |
| 2017-10-25 | 18:34 | ee7d44a3ee | [core][bug] fix tokenizer for URL | olr | trunk, core |
| 2017-10-25 | 17:03 | 54a4aceb2f | [fr] untested rule | olr | trunk, fr |
Changes
Modified gc_core/js/tokenizer.js from [9622b0a610] to [de468b4358].
︙
// All regexps must start with ^.
"default":
[
[/^[ \t]+/, 'SPACE'],
[/^\/(?:bin|boot|dev|etc|home|lib|mnt|opt|root|sbin|tmp|usr|var|Bureau|Documents|Images|Musique|Public|Téléchargements|Vidéos)(?:\/[a-zA-Zà-öÀ-Ö0-9ø-ÿØ-ßĀ-ʯﬁ-ﬆ.()]+)*/, 'FOLDER'],
[/^[a-zA-Z]:\\(?:Program Files(?: \(x86\)|)|[a-zA-Zà-öÀ-Ö0-9ø-ÿØ-ßĀ-ʯﬁ-ﬆ.()]+)(?:\\[a-zA-Zà-öÀ-Ö0-9ø-ÿØ-ßĀ-ʯﬁ-ﬆ.()]+)*/, 'FOLDER'],
[/^[,.;:!?…«»“”‘’"(){}\[\]/·–—]+/, 'SEPARATOR'],
︙
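The comment "All regexps must start with ^" suggests that the scanner tries each rule against the start of the remaining text and consumes whatever matches. The sketch below illustrates that idea in Python; it is not the code from tokenizer.js. The names `_RULES` and `scan` and the `WORD` rule are assumptions made for the example, and only the `SPACE` and `SEPARATOR` patterns are taken from the hunk above.

```python
import re

# Hypothetical rule list mirroring the [regex, type] pairs shown above.
# SPACE and SEPARATOR come from the hunk; WORD is an assumption added
# so that the sketch can consume ordinary text.
_RULES = [
    (re.compile(r'^[ \t]+'), 'SPACE'),
    (re.compile(r'^[,.;:!?…«»“”‘’"(){}\[\]/·–—]+'), 'SEPARATOR'),
    (re.compile(r'^\w+'), 'WORD'),
]


def scan(text):
    """Return (type, value, start) tuples, trying the rules in order."""
    tokens = []
    pos = 0
    while pos < len(text):
        rest = text[pos:]  # each rule is tested against the remaining text
        for regex, kind in _RULES:
            m = regex.match(rest)
            if m:
                tokens.append((kind, m.group(), pos))
                pos += m.end()  # consume the matched characters
                break
        else:
            pos += 1  # no rule matched: skip one character
    return tokens


tokens = scan("Hi, you")
```

Because every rule is anchored with `^`, a rule can only match at the current position, so the first rule in the list that matches decides the token type. That is presumably why the `FOLDER` rules are listed before `SEPARATOR`, whose character class also contains `/`.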
Modified gc_core/py/tokenizer.py from [353949869b] to [829b056f2c].
# Very simple tokenizer
import re
_PATTERNS = {
"default":
(
r'(?P<FOLDER1>/(?:bin|boot|dev|etc|home|lib|mnt|opt|root|sbin|tmp|usr|var|Bureau|Documents|Images|Musique|Public|Téléchargements|Vidéos)(?:/[\w.()]+)*)',
r'(?P<FOLDER2>[a-zA-Z]:\\(?:Program Files(?: [(]x86[)]|)|[\w.()]+)(?:\\[\w.()]+)*)',
r'(?P<PUNC>[.,?!:;…«»“”"()/·]+)',
︙
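Named-group patterns like the ones above are commonly joined with `|` into a single regular expression, with `Match.lastgroup` used to recover the token type. The following is a minimal sketch of that technique, not the rest of tokenizer.py. `FOLDER1` and `PUNC` are copied from the hunk, while `SPACE`, `WORD`, `_PATTERNS_SKETCH`, `_TOKEN_RE` and `tokenize_sketch` are assumptions introduced for the example.

```python
import re

# FOLDER1 and PUNC are copied from the hunk above; SPACE and WORD are
# hypothetical additions so that the sketch can cover a whole sentence.
_PATTERNS_SKETCH = (
    r'(?P<FOLDER1>/(?:bin|boot|dev|etc|home|lib|mnt|opt|root|sbin|tmp|usr|var|Bureau|Documents|Images|Musique|Public|Téléchargements|Vidéos)(?:/[\w.()]+)*)',
    r'(?P<PUNC>[.,?!:;…«»“”"()/·]+)',
    r'(?P<SPACE>[ \t]+)',
    r'(?P<WORD>\w+)',
)

# One alternation; at each position the leftmost alternative that matches wins.
_TOKEN_RE = re.compile("|".join(_PATTERNS_SKETCH))


def tokenize_sketch(text):
    """Yield (type, value, start); characters matching no pattern are skipped."""
    for m in _TOKEN_RE.finditer(text):
        # lastgroup is the name of the named group that produced the match
        yield (m.lastgroup, m.group(), m.start())


tokens = list(tokenize_sketch("cd /usr/lib now."))
```

Two details follow from this design. Group names must be unique within one Python pattern, which would explain the `FOLDER1` and `FOLDER2` suffixes. Order also matters: `PUNC` contains `/`, so the folder patterns have to come first for `/usr/lib` to be kept as a single token.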