Grammalecte Check-in [95ac5ee834]

Overview
Comment: [core][py] don’t ask for morphologies several times uselessly
SHA3-256: 95ac5ee8342cdfefa06027ba01dd394c0a3e18993d1fb10b7e819f0b55e670b6
User & Date: olr on 2018-09-11 20:44:56
Context
2018-09-11
20:48 [build] graph builder: use str() instead of json to store graph data (json can’t store integers as keys) check-in: 8f5e61c348 user: olr tags: build, rg
20:44 [core][py] don’t ask for morphologies several times uselessly check-in: 95ac5ee834 user: olr tags: core, rg
18:55 [graphspell][js] tokenizer: don’t use spaces as tokens, yield information token (start/end) check-in: d12872816f user: olr tags: graphspell, rg
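
As background for the 20:48 entry: JSON object keys are always strings, so a dictionary keyed by integers does not come back unchanged from a JSON round trip. The sketch below illustrates this with a made-up graph fragment. The contents of `dGraph` are hypothetical, and reading the `str()` form back with `ast.literal_eval` is an assumption, since that check-in's comment does not say how the data is loaded.

```python
import ast
import json

# Hypothetical graph fragment keyed by integer node ids (not taken from Grammalecte).
dGraph = {0: {"le": 1}, 1: {"<lemmas>": {"chat": 2}}, 2: {}}

# JSON round trip: the integer keys come back as strings.
dFromJson = json.loads(json.dumps(dGraph))

# str() round trip (assumed loader: ast.literal_eval): integer keys are preserved.
dFromStr = ast.literal_eval(str(dGraph))

print(list(dFromJson))  # keys are now strings
print(list(dFromStr))   # keys are still integers
```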
Changes

Modified gc_core/py/lang_core/gc_engine.py from [126b8f2d42] to [ce1482f7e7].

@@ -411,37 +411,35 @@
                     if sLemma in dNode["<lemmas>"]:
                         if bDebug:
                             echo("  MATCH: >" + sLemma)
                         yield { "iNode1": iNode1, "dNode": dGraph[dNode["<lemmas>"][sLemma]] }
                         bTokenFound = True
             # regex morph arcs
             if "<re_morph>" in dNode:
+                lMorph = dToken.get("lMorph", _oSpellChecker.getMorph(dToken["sValue"]))
                 for sRegex in dNode["<re_morph>"]:
                     if "¬" not in sRegex:
                         # no anti-pattern
-                        lMorph = dToken.get("lMorph", _oSpellChecker.getMorph(dToken["sValue"]))
                         if any(re.search(sRegex, sMorph)  for sMorph in lMorph):
                             if bDebug:
                                 echo("  MATCH: @" + sRegex)
                             yield { "iNode1": iNode1, "dNode": dGraph[dNode["<re_morph>"][sRegex]] }
                             bTokenFound = True
                     else:
                         # there is an anti-pattern
                         sPattern, sNegPattern = sRegex.split("¬", 1)
                         if sNegPattern == "*":
                             # all morphologies must match with <sPattern>
                             if sPattern:
-                                lMorph = dToken.get("lMorph", _oSpellChecker.getMorph(dToken["sValue"]))
                                 if lMorph and all(re.search(sPattern, sMorph)  for sMorph in lMorph):
                                     if bDebug:
                                         echo("  MATCH: @" + sRegex)
                                     yield { "iNode1": iNode1, "dNode": dGraph[dNode["<re_morph>"][sRegex]] }
                                     bTokenFound = True
                         else:
-                            lMorph = dToken.get("lMorph", _oSpellChecker.getMorph(dToken["sValue"]))
                             if sNegPattern and any(re.search(sNegPattern, sMorph)  for sMorph in lMorph):
                                 continue
                             if not sPattern or any(re.search(sPattern, sMorph)  for sMorph in lMorph):
                                 if bDebug:
                                     echo("  MATCH: @" + sRegex)
                                 yield { "iNode1": iNode1, "dNode": dGraph[dNode["<re_morph>"][sRegex]] }
                                 bTokenFound = True
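
The effect of this change can be illustrated with a small, self-contained sketch. It is not Grammalecte code: `FakeSpellChecker`, `matchPerRegex`, `matchHoisted`, and the morphology tag format are hypothetical stand-ins, chosen only to count how often `getMorph` is called when the lookup sits inside the regex loop versus before it.

```python
import re


class FakeSpellChecker:
    """Hypothetical stand-in for _oSpellChecker that counts lookups."""

    def __init__(self):
        self.nCalls = 0

    def getMorph(self, sWord):
        self.nCalls += 1
        return [">" + sWord + "/:N:m:s"]  # made-up morphology tag


def matchPerRegex(oChecker, dToken, lRegex):
    # Shape before the change: morphologies are requested for every regex.
    lMatch = []
    for sRegex in lRegex:
        lMorph = dToken.get("lMorph", oChecker.getMorph(dToken["sValue"]))
        if any(re.search(sRegex, sMorph) for sMorph in lMorph):
            lMatch.append(sRegex)
    return lMatch


def matchHoisted(oChecker, dToken, lRegex):
    # Shape after the change: one request before the loop, reused for each regex.
    lMorph = dToken.get("lMorph", oChecker.getMorph(dToken["sValue"]))
    return [sRegex for sRegex in lRegex
            if any(re.search(sRegex, sMorph) for sMorph in lMorph)]


lRegex = [":N", ":V", ":m"]
dToken = {"sValue": "chat"}

oOld = FakeSpellChecker()
lOld = matchPerRegex(oOld, dToken, lRegex)

oNew = FakeSpellChecker()
lNew = matchHoisted(oNew, dToken, lRegex)

print(oOld.nCalls, oNew.nCalls)  # lookups per variant
print(lOld == lNew)              # both variants match the same regexes
```

One general Python point applies to both shapes: `dict.get` evaluates its default argument before the call, so `getMorph` still runs once even when `"lMorph"` is already stored on the token. A guard such as `if "lMorph" in dToken` would avoid that remaining lookup; whether it matters here depends on how expensive `getMorph` is, which this check-in does not show.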