Overview
Comment:      [build][fr] build_data: add locutions_vrac.txt for unchecked locutions
Downloads:    Tarball | ZIP archive | SQL archive
Timelines:    family | ancestors | descendants | both | fr | build | Lexicographe
Files:        files | file ages | folders
SHA3-256:     ceaa0c6eaedd327ecd8daa62ac56c8e3
User & Date:  olr on 2017-11-01 08:22:52
Other Links:  branch diff | manifest | tags
Context
2017-11-01
09:20  [fr] data: locutions
       check-in: 144b46f14c  user: olr  tags: fr, Lexicographe
08:22  [build][fr] build_data: add locutions_vrac.txt for unchecked locutions
       check-in: ceaa0c6eae  user: olr  tags: fr, build, Lexicographe
08:16  [build][fr] build_data: readFile as a generator
       check-in: 1a082eec4e  user: olr  tags: fr, build, Lexicographe
Changes
Modified gc_lang/fr/build_data.py from [39bc8d4153] to [732462dec8].
  #!python3
  # FRENCH DATA BUILDER
  #
  # by Olivier R.
  # License: MPL 2

  import json
  import os
+ import itertools

  import grammalecte.ibdawg as ibdawg
  from grammalecte.echo import echo
  from grammalecte.str_transform import defineSuffixCode
  import grammalecte.fr.conj as conj
  import grammalecte.tokenizer as tkz
︙
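The added `import itertools` supports the loop change further down; the `readFile()` helper it will chain was turned into a generator in the preceding check-in (1a082eec4e). A minimal sketch of such a generator-based reader, assuming UTF-8 files and `#`-prefixed comment lines (grammalecte's actual `readFile()` may filter differently):

```python
import os
import tempfile

def readFile(spf):
    "yield stripped, non-empty, non-comment lines from a text file"
    # hypothetical sketch: the real readFile() in build_data.py may differ
    with open(spf, "r", encoding="utf-8") as hSrc:
        for sLine in hSrc:
            sLine = sLine.strip()
            if sLine and not sLine.startswith("#"):
                yield sLine

# small demo with a temporary file
spTest = os.path.join(tempfile.mkdtemp(), "locutions_test.txt")
with open(spTest, "w", encoding="utf-8") as hDst:
    hDst.write("# comment line\nà peu près\t:LW\n\n")
lLines = list(readFile(spTest))
```

Because it is a generator, nothing is read until iteration starts, which is what makes chaining several files cheap.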
  def makeLocutions (sp, bJS=False):
      "compile list of locutions in JSON"
      print("> Locutions ", end="")
      print("(Python et JavaScript)" if bJS else "(Python seulement)")
      dLocGraph = {}
      oTokenizer = tkz.Tokenizer("fr")
-     for sLine in readFile(sp+"/data/locutions.txt"):
+     for sLine in itertools.chain(readFile(sp+"/data/locutions.txt"), readFile(sp+"/data/locutions_vrac.txt")):
          dCur = dLocGraph
          sLoc, sTag = sLine.split("\t")
          for oToken in oTokenizer.genTokens(sLoc.strip()):
              sWord = oToken["sValue"]
              if sWord not in dCur:
                  dCur[sWord] = {}
              dCur = dCur[sWord]
︙
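The pattern of the change, `itertools.chain()` feeding lines from both locution files into one trie-building loop, can be sketched standalone. This is a simplified sketch, not the builder itself: it replaces grammalecte's `Tokenizer` with whitespace splitting, and the `":"` leaf key storing the tag is a hypothetical choice (the real leaf encoding may differ):

```python
import itertools

def readLines(lLines):
    # stand-in for grammalecte's generator-based readFile():
    # yields stripped, non-empty, non-comment lines
    for sLine in lLines:
        sLine = sLine.strip()
        if sLine and not sLine.startswith("#"):
            yield sLine

def buildLocutionGraph(*lSources):
    "chain several line sources and build a nested-dict trie of locutions"
    dLocGraph = {}
    # itertools.chain lets every source feed the same loop, the way
    # build_data.py now combines locutions.txt and locutions_vrac.txt
    for sLine in itertools.chain(*lSources):
        dCur = dLocGraph
        sLoc, sTag = sLine.split("\t")
        # simplified tokenization: split on whitespace
        # (the real code iterates oTokenizer.genTokens())
        for sWord in sLoc.strip().split():
            dCur = dCur.setdefault(sWord, {})
        dCur[":"] = sTag  # hypothetical leaf marker storing the tag
    return dLocGraph

dGraph = buildLocutionGraph(
    readLines(["à peu près\t:LW"]),                          # checked locutions
    readLines(["# brouillon", "à peu de chose près\t:LW"]),  # unchecked ("vrac")
)
```

Sharing one trie means locutions with a common prefix ("à peu près", "à peu de chose près") share nodes, regardless of which source file each came from.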
Added gc_lang/fr/data/locutions_vrac.txt version [a7ffc6f8bf].