Grammalecte  Check-in [daccfe7e1c]

Overview
Comment: [build][core] disambiguation: add_morph(), [fr] false positive and adjustments
SHA3-256: daccfe7e1c7cce907e756f21741cb04c8d56ab02c960f14270e70634e01f5706
User & Date: olr on 2019-08-28 12:57:29
Context
2019-08-28
15:21 [fr] typo: minus sign (check-in: 32b653322f, user: olr, tags: trunk, fr)
12:57 [build][core] disambiguation: add_morph(), [fr] false positive and adjustments (check-in: daccfe7e1c, user: olr, tags: trunk, fr, core, build)
2019-08-27
19:43 [fr] false positive and adjustments, +nr: plant/plan confusions (check-in: a51a149425, user: olr, tags: trunk, fr)
Changes

Modified compile_rules_graph.py from [b8468e70e1] to [e6e8d01dc5].

@@ -44,34 +44,34 @@
def prepareFunction (sCode):
    "convert simple rule syntax to a string of Python code"
    if sCode[0:1] == "=":
        sCode = sCode[1:]
    sCode = sCode.replace("__also__", "bCondMemo")
    sCode = sCode.replace("__else__", "not bCondMemo")
    sCode = sCode.replace("sContext", "_sAppContext")
-    sCode = re.sub(r"(morph|morphVC|analyse|value|tag|displayInfo)[(]\\(\d+)", 'g_\\1(lToken[nTokenOffset+\\2]', sCode)
-    sCode = re.sub(r"(morph|morphVC|analyse|value|tag|displayInfo)[(]\\-(\d+)", 'g_\\1(lToken[nLastToken-\\2+1]', sCode)
-    sCode = re.sub(r"(select|exclude|define|define_from|change_meta)[(][\\](\d+)", 'g_\\1(lToken[nTokenOffset+\\2]', sCode)
-    sCode = re.sub(r"(select|exclude|define|define_from|change_meta)[(][\\]-(\d+)", 'g_\\1(lToken[nLastToken-\\2+1]', sCode)
-    sCode = re.sub(r"(tag_before|tag_after)[(][\\](\d+)", 'g_\\1(lToken[nTokenOffset+\\2], dTags', sCode)
-    sCode = re.sub(r"(tag_before|tag_after)[(][\\]-(\d+)", 'g_\\1(lToken[nLastToken-\\2+1], dTags', sCode)
-    sCode = re.sub(r"space_after[(][\\](\d+)", 'g_space_between_tokens(lToken[nTokenOffset+\\1], lToken[nTokenOffset+\\1+1]', sCode)
-    sCode = re.sub(r"space_after[(][\\]-(\d+)", 'g_space_between_tokens(lToken[nLastToken-\\1+1], lToken[nLastToken-\\1+2]', sCode)
-    sCode = re.sub(r"analyse_with_next[(][\\](\d+)", 'g_merged_analyse(lToken[nTokenOffset+\\1], lToken[nTokenOffset+\\1+1]', sCode)
-    sCode = re.sub(r"analyse_with_next[(][\\]-(\d+)", 'g_merged_analyse(lToken[nLastToken-\\1+1], lToken[nLastToken-\\1+2]', sCode)
-    sCode = re.sub(r"(morph|analyse|tag|value)\(>1", 'g_\\1(lToken[nLastToken+1]', sCode)                       # next token
-    sCode = re.sub(r"(morph|analyse|tag|value)\(<1", 'g_\\1(lToken[nTokenOffset]', sCode)                       # previous token
-    sCode = re.sub(r"(morph|analyse|tag|value)\(>(\d+)", 'g_\\1(g_token(lToken, nLastToken+\\2)', sCode)        # next token
-    sCode = re.sub(r"(morph|analyse|tag|value)\(<(\d+)", 'g_\\1(g_token(lToken, nTokenOffset+1-\\2)', sCode)    # previous token
+    sCode = re.sub(r"\b(morph|morphVC|analyse|value|tag|displayInfo)[(]\\(\d+)", 'g_\\1(lToken[nTokenOffset+\\2]', sCode)
+    sCode = re.sub(r"\b(morph|morphVC|analyse|value|tag|displayInfo)[(]\\-(\d+)", 'g_\\1(lToken[nLastToken-\\2+1]', sCode)
+    sCode = re.sub(r"\b(select|exclude|define|define_from|add_morph|change_meta)[(][\\](\d+)", 'g_\\1(lToken[nTokenOffset+\\2]', sCode)
+    sCode = re.sub(r"\b(select|exclude|define|define_from|add_morph|change_meta)[(][\\]-(\d+)", 'g_\\1(lToken[nLastToken-\\2+1]', sCode)
+    sCode = re.sub(r"\b(tag_before|tag_after)[(][\\](\d+)", 'g_\\1(lToken[nTokenOffset+\\2], dTags', sCode)
+    sCode = re.sub(r"\b(tag_before|tag_after)[(][\\]-(\d+)", 'g_\\1(lToken[nLastToken-\\2+1], dTags', sCode)
+    sCode = re.sub(r"\bspace_after[(][\\](\d+)", 'g_space_between_tokens(lToken[nTokenOffset+\\1], lToken[nTokenOffset+\\1+1]', sCode)
+    sCode = re.sub(r"\bspace_after[(][\\]-(\d+)", 'g_space_between_tokens(lToken[nLastToken-\\1+1], lToken[nLastToken-\\1+2]', sCode)
+    sCode = re.sub(r"\banalyse_with_next[(][\\](\d+)", 'g_merged_analyse(lToken[nTokenOffset+\\1], lToken[nTokenOffset+\\1+1]', sCode)
+    sCode = re.sub(r"\banalyse_with_next[(][\\]-(\d+)", 'g_merged_analyse(lToken[nLastToken-\\1+1], lToken[nLastToken-\\1+2]', sCode)
+    sCode = re.sub(r"\b(morph|analyse|tag|value)\(>1", 'g_\\1(lToken[nLastToken+1]', sCode)                       # next token
+    sCode = re.sub(r"\b(morph|analyse|tag|value)\(<1", 'g_\\1(lToken[nTokenOffset]', sCode)                       # previous token
+    sCode = re.sub(r"\b(morph|analyse|tag|value)\(>(\d+)", 'g_\\1(g_token(lToken, nLastToken+\\2)', sCode)        # next token
+    sCode = re.sub(r"\b(morph|analyse|tag|value)\(<(\d+)", 'g_\\1(g_token(lToken, nTokenOffset+1-\\2)', sCode)    # previous token
    sCode = re.sub(r"\bspell *[(]", '_oSpellChecker.isValid(', sCode)
    sCode = re.sub(r"\bbefore\(\s*", 'look(sSentence[:lToken[1+nTokenOffset]["nStart"]], ', sCode)          # before(sCode)
    sCode = re.sub(r"\bafter\(\s*", 'look(sSentence[lToken[nLastToken]["nEnd"]:], ', sCode)                 # after(sCode)
    sCode = re.sub(r"\bbefore0\(\s*", 'look(sSentence0[:lToken[1+nTokenOffset]["nStart"]], ', sCode)        # before0(sCode)
    sCode = re.sub(r"\bafter0\(\s*", 'look(sSentence[lToken[nLastToken]["nEnd"]:], ', sCode)                # after0(sCode)
-    sCode = re.sub(r"analyseWord[(]", 'analyse(', sCode)
+    sCode = re.sub(r"\banalyseWord[(]", 'analyse(', sCode)
    sCode = re.sub(r"[\\](\d+)", 'lToken[nTokenOffset+\\1]["sValue"]', sCode)
    sCode = re.sub(r"[\\]-(\d+)", 'lToken[nLastToken-\\1+1]["sValue"]', sCode)
    sCode = re.sub(r">1", 'lToken[nLastToken+1]["sValue"]', sCode)
    sCode = re.sub(r"<1", 'lToken[nTokenOffset]["sValue"]', sCode)
    return sCode
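
The added `\b` anchors matter once `add_morph` joins the list of disambiguation functions: without a word boundary, the earlier `morph(` substitution would also fire inside `add_morph(`. The sketch below replays the relevant patterns from the diff on the action text used in rules.grx. This is one plausible reading of why the anchors were added, not a statement of the author's intent. The variable names `OLD_MORPH`, `NEW_MORPH`, `NEW_DISAMB` and `REPL` are introduced here for illustration only.

```python
import re

# Patterns copied from the diff; the constant names are hypothetical.
OLD_MORPH = r"(morph|morphVC|analyse|value|tag|displayInfo)[(]\\(\d+)"
NEW_MORPH = r"\b(morph|morphVC|analyse|value|tag|displayInfo)[(]\\(\d+)"
NEW_DISAMB = r"\b(select|exclude|define|define_from|add_morph|change_meta)[(][\\](\d+)"
REPL = 'g_\\1(lToken[nTokenOffset+\\2]'

# Action text as written in the new rule for "nombre".
sAction = r'add_morph(\1, [">nombre/:G:D"])'

# Old pattern: "morph(" is found inside "add_morph(", which mangles the name.
sOld = re.sub(OLD_MORPH, REPL, sAction)

# New patterns: \b blocks the match after "_", so only the add_morph rule applies.
sNew = re.sub(NEW_DISAMB, REPL, re.sub(NEW_MORPH, REPL, sAction))

print(sOld)
print(sNew)
```

With the old pattern the call comes out as `add_g_morph(...)`, a name that does not exist in the engine; with the anchored patterns it becomes `g_add_morph(lToken[nTokenOffset+1], ...)`.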


Modified gc_core/js/lang_core/gc_engine.js from [73b1002d2e] to [18d0645fa7].

@@ -1444,14 +1444,22 @@
            oToken["lMorph"] = lSelect;
        }
    } else if (lDefault) {
        oToken["lMorph"] = lDefault;
    }
    return true;
}

+function g_add_morph (oToken, lNewMorph) {
+    "Disambiguation: add a morphology to a token"
+    let lMorph = (oToken.hasOwnProperty("lMorph")) ? oToken["lMorph"] : _oSpellChecker.getMorph(oToken["sValue"]);
+    lMorph.push(...lNewMorph);
+    oToken["lMorph"] = lMorph;
+    return true;
+}

function g_define (oToken, lMorph) {
    // set morphologies of <oToken>, always return true
    oToken["lMorph"] = lMorph;
    return true;
}

Modified gc_core/py/lang_core/gc_engine.py from [3f4ffa12fb] to [0601d28dd4].

@@ -1192,62 +1192,69 @@
    if nPos not in dTokenPos:
        echo("Error. There should be a token at this position: ", nPos)
        return True
    dTokenPos[nPos]["lMorph"] = lMorph
    return True



#### Disambiguation for graph rules

def g_select (dToken, sPattern, lDefault=None):
-    "select morphologies for <dToken> according to <sPattern>, always return True"
+    "Disambiguation: select morphologies for <dToken> according to <sPattern>, always return True"
    lMorph = dToken["lMorph"]  if "lMorph" in dToken  else _oSpellChecker.getMorph(dToken["sValue"])
    if not lMorph or len(lMorph) == 1:
        if lDefault:
            dToken["lMorph"] = lDefault
            #echo("DA:", dToken["sValue"], dToken["lMorph"])
        return True
    lSelect = [ sMorph  for sMorph in lMorph  if re.search(sPattern, sMorph) ]
    if lSelect:
        if len(lSelect) != len(lMorph):
            dToken["lMorph"] = lSelect
    elif lDefault:
        dToken["lMorph"] = lDefault
    #echo("DA:", dToken["sValue"], dToken["lMorph"])
    return True


def g_exclude (dToken, sPattern, lDefault=None):
-    "select morphologies for <dToken> according to <sPattern>, always return True"
+    "Disambiguation: select morphologies for <dToken> according to <sPattern>, always return True"
    lMorph = dToken["lMorph"]  if "lMorph" in dToken  else _oSpellChecker.getMorph(dToken["sValue"])
    if not lMorph or len(lMorph) == 1:
        if lDefault:
            dToken["lMorph"] = lDefault
            #echo("DA:", dToken["sValue"], dToken["lMorph"])
        return True
    lSelect = [ sMorph  for sMorph in lMorph  if not re.search(sPattern, sMorph) ]
    if lSelect:
        if len(lSelect) != len(lMorph):
            dToken["lMorph"] = lSelect
    elif lDefault:
        dToken["lMorph"] = lDefault
    #echo("DA:", dToken["sValue"], dToken["lMorph"])
    return True


+def g_add_morph (dToken, lNewMorph):
+    "Disambiguation: add a morphology to a token"
+    lMorph = dToken["lMorph"]  if "lMorph" in dToken  else _oSpellChecker.getMorph(dToken["sValue"])
+    lMorph.extend(lNewMorph)
+    dToken["lMorph"] = lMorph
+    return True


def g_define (dToken, lMorph):
-    "set morphologies of <dToken>, always return True"
+    "Disambiguation: set morphologies of <dToken>, always return True"
    dToken["lMorph"] = lMorph
    #echo("DA:", dToken["sValue"], lMorph)
    return True


def g_define_from (dToken, nLeft=None, nRight=None):
-    "set morphologies of <dToken> with slicing its value with <nLeft> and <nRight>"
+    "Disambiguation: set morphologies of <dToken> with slicing its value with <nLeft> and <nRight>"
    if nLeft is not None:
        dToken["lMorph"] = _oSpellChecker.getMorph(dToken["sValue"][slice(nLeft, nRight)])
    else:
        dToken["lMorph"] = _oSpellChecker.getMorph(dToken["sValue"])
    return True


Modified gc_lang/fr/rules.grx from [a2f03bf6e6] to [71c0a7c8ba].

@@ -1997,14 +1997,19 @@
    en peine
        <<- =>> exclude(\2, ":V")

    par  *WORD
        <<- =>> exclude(\2, ":[123][sp]")

+    nombre  [de|d’|des]  [@:[NA]|<end>|,]
+        <<- not morph(<1, ":D") >>>
+        <<- morph(<1, ":A.*:[me]:[si]") =>> add_morph(\1, [">nombre/:G:D"])
+        <<- __else__ =>> define(\1, [">nombre/:G:D"])

    plein  [de|d’]  @:[AN]
        <<- not morph(<1, ">(?:être|(?:re|)devenir|rester|demeurer|sembler|para[iî]tre)/") =>> =define(\1, [":G"])

    source [de|d’]
        <<- morph(<1, ">(?:être|(?:re|)devenir|rester|demeurer|sembler|para[iî]tre)/") =>> define(\1, [":LV"])

    tout feu ?,¿ tout >flamme
@@ -13328,20 +13333,21 @@
    [/VCint|/VCimp]  [le|la|l’|les|leur|leurs]  @:[NA]¬:[YG]
        <<- =>> =select(\-2, ":D") and exclude(\-1, ":[123][sp]")


__da_substantifs__
    [<start>|,]  $:D  *WORD  *WORD  [ne|n’|me|m’|te|t’|se|s’]
-    [<start>|,]  $:D  *WORD  *WORD  [le|la|l’|les|en|nous|vous|lui|leur|y]  @:(?:[123][sp]|P)
+    [<start>|,]  $:D  *WORD  *WORD  [en|nous|vous|lui|y]                    @:(?:[123][sp]|P)
+    [<start>|,]  $:D  *WORD  *WORD  [le|la|l’|les|leur]                     @:(?:[123][sp]|P)¬:[NA]
    [<start>|,]  $:D  *WORD  *WORD  [nous|vous]     [le|la|l’|les|en|y]     @:(?:[123][sp]|P)
    [<start>|,]  $:D  *WORD  *WORD  [le|la|l’|les]  [lui|leur|en|y]         @:(?:[123][sp]|P)
    [<start>|,]  $:D  *WORD  *WORD  [lui|leur|y]    en                      @:(?:[123][sp]|P)
-    [<start>|,]  $:D  *WORD  *WORD  @:(?:3[sp]|P)
-        <<- morph(\3, ":[NA]", ":(?:G|V0)") and morph(\4, ":[NA]", ":[PG]") =>> exclude(\4, ":V")
+    [<start>|,]  $:D  *WORD  *WORD  @:(?:3[sp]|P)¬:[GW]
+        <<- morph(\3, ":[NA]", ":(?:G|V0)") and morph(\4, ":[NA]", ":(?:[PG]|V0)") =>> exclude(\4, ":V")

    [des|ces|mes|tes|ses|nos|vos|quelques|lesdits]  @:A.*:[pi]  @:N.*:[pi]¬(?:3[sp]|G)
        <<- =>> exclude(\3, ":V")

    les  @:A.*:[pi]¬:V  @:N.*:[pi]¬(?:3[sp]|G)
        <<- not before(":O[vs]") =>> exclude(\3, ":V")

@@ -13674,15 +13680,17 @@
TEST: Une robe sans manches plutôt sympathique.
TEST: une émancipation des usagers, refusant de se laisser guider par des « machines à sous » les incitant insidieusement à rester en ligne.
TEST: Les pompes à chaleur sont moins rentables.
TEST: Ce qui rend les pompes à chaleur moins rentables.
TEST: l’accusation de lynchage médiatique proférée par François de Clermont à l’encontre de “Marianne” ne tient pas.
TEST: La poïesis, en grec, est ce qui permet de faire passer n’importe quoi du non-être à l’être
TEST: un moteur nouvelle génération

+TEST: La première est la critique dite artiste
+TEST: la critique conduit nombre de protestataires à se replier sur des modalités de défense efficaces dans le passé mais désormais largement inadaptées
+TEST: Tu crois que Microsoft peut contraindre les projets libres à quoi que ce soit ?


@@@@
@@@@
@@@@
@@@@
@@@@END_GRAPH                                                                                      _
@@ -15845,14 +15853,26 @@
    >point de [vu|vus|vues]
        <<- /sgpl/ -3>> vue                                         # Dans un “point de vue”, “vue” est toujours au féminin singulier.

TEST: c’est son point de {{vu}} qui prime.
TEST: Son point de {{vus}} prévaudra toujours, faites-vous à cette idée ou dégagez.
TEST: de mon point de {{vues}}


+__sgpl_verbe__
+    >faire rires
+        <<- /sgpl/ -2>> rire                                        # Faire rire. Rire est un verbe, il ne prend pas la marque du pluriel.
+
+    [>pouvoir|>vouloir|>falloir] [rires|mangers|êtres|avoirs]
+        <<- /sgpl/ morph(\1, ":V") -2>> =\2[:-1]                    # Si “\2” est censé être un verbe, ne mettez pas la marque du pluriel.
+
+
+TEST: On peut {{rires}}, non ?
+TEST: Faire {{rires}}, c’est compliqué.



!!
!!
!!!! Confusions                                                                                   !!
!!
!!
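
The `=\2[:-1]` suggestion in the `__sgpl_verbe__` rule above is a code suggestion: going by `prepareFunction()` in compile_rules_graph.py, the leading `=` is stripped and `\2` is rewritten into a lookup of the second matched token's `sValue`. The sketch below applies only those two steps and evaluates the result on a hand-built token list. The token list, the value of `nTokenOffset` and the use of `eval()` are assumptions for illustration; the engine's actual evaluation path is not part of this check-in.

```python
import re

sSuggestion = r"=\2[:-1]"

# Steps taken from prepareFunction(): strip the leading "=", then expand \N.
sCode = sSuggestion[1:] if sSuggestion[0:1] == "=" else sSuggestion
sCode = re.sub(r"[\\](\d+)", 'lToken[nTokenOffset+\\1]["sValue"]', sCode)
print(sCode)

# Hypothetical tokens for "On peut rires"; the rule matches "peut rires",
# so the token just before the match ("On") sits at index nTokenOffset.
lToken = [{"sValue": "<start>"}, {"sValue": "On"}, {"sValue": "peut"}, {"sValue": "rires"}]
nTokenOffset = 1

# eval() is used here only to show what the generated expression yields.
sResult = eval(sCode, {"lToken": lToken, "nTokenOffset": nTokenOffset})
print(sResult)
```

The slice `[:-1]` drops the final character of the matched word, which turns "rires" into the suggestion "rire" for the first TEST line.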