Overview
Comment: [fr] false positives
SHA3-256: bd904b6202e6c73391d6e3f4e999ba87
User & Date: olr on 2019-03-01 10:33:14
Context
2019-03-01
16:14 [fr] false positive: tout feu, tout flamme (check-in: d6029dc169, user: olr, tags: trunk, fr)
10:33 [fr] false positives (check-in: bd904b6202, user: olr, tags: trunk, fr)
2019-02-27
14:48 [doc] small documentation update (check-in: ea6194b8b7, user: olr, tags: trunk, doc)
Changes
Modified doc/syntax.txt from [ae208bbf9d] to [f728cc54d9].
WRITING RULES FOR GRAMMALECTE

Note: This documentation is a draft. Information may be obsolete or incomplete.

# Principles #

Grammalecte is a two-pass grammar checker engine. On the first pass, the
engine checks the text paragraph by paragraph. On the second pass, the engine
checks the text sentence by sentence.

The command to switch to the second pass is `[++]`.

In each pass, you can write as many rules as you need.

There are two kinds of rules:

* regex rules (triggered by a regular expression)
* token rules (triggered by a succession of tokens)

A regex rule is defined by:

* [optional] flags “LCR” for the regex word boundaries and case sensitivity
︙ | ︙ | |||
Token rules must be defined within a graph.

Each graph is defined within the second pass with the command:

    @@@@GRAPH: graph_name

A graph ends when another graph is defined or when the following command is
found:

    @@@@END_GRAPH

There is no limit to the number or the type of actions a rule can launch.
Each action has its own condition to be triggered.

There are several kinds of actions:

* Error warning, with a message, and optionally suggestions and a URL
* Text transformation, modifying internally the checked text
* Disambiguation action, setting tags on a position
* Tagging

The rules file for your language must be named `rules.grx`.
The settings file must be named `config.ini`.

All these files are plain UTF-8 text files. UTF-8 is mandatory.

# Comments #
︙ | ︙ | |||
## Whitespaces at the border of patterns or suggestions ##

Example: Recognize two or more spaces and suggest a single space:

    __<s>__ "  +" <<- ->> " " # Extra space(s).

The `"` characters protect spaces in the pattern and in the replacement text.

## Pattern groups and back references ##

It is usually useful to retrieve parts of the matched pattern. We simply use
parentheses in the pattern to get groups with back references.
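As a rough illustration in plain Python `re` (not Grammalecte rule syntax), the sketch below shows how a parenthesized group is captured and reused through a back reference. The pattern, the function name and the sample sentence are made up for this example.

```python
import re

# Hypothetical example: detect a word typed twice in a row.
# Group 1 captures the first word; \1 in the pattern refers back to it.
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")

def collapse_doubled_words(text: str) -> str:
    """Replace each doubled word with a single occurrence of group 1."""
    return DOUBLED_WORD.sub(r"\1", text)

print(collapse_doubled_words("the the cat sat on on the mat"))
```

The same `\1` notation appears on both sides: inside the pattern it means "the text group 1 matched", and inside the replacement it reinserts that text.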
︙ | ︙ | |||
## Pattern matching ##

Repeated matching of a single rule resumes after the previous match, so
instead of general multiword patterns, like

    (\w+) (\w+) <<- some_check(\1, \2) ->> \1, \2 # foo

use

    (\w+) <<- some_check(\1, word(1)) ->> \1, # foo

## Name definitions ##

Grammalecte supports name definitions to simplify the description of
complex rules.
︙ | ︙ | |||
internally the text before checking the text.

The text preprocessor is useful to simplify texts and write simpler checking
rules.

For example, sentences with the same grammar mistake:

    These “cats” are black.
    These cats are “black”.
    These cats are absolutely black.
    These stupid “cats” are all black.
    These unknown cats are as per usual black.

Instead of writing complex rules or several rules to find mistakes for all
possible cases, you can use the text preprocessor to simplify the text.

To remove the chars “”, write:

    [“”] ->> *
︙ | ︙ | |||
You can also remove a group reference:

    these (\w+) (\w+) <<- morph(\1, "adjective") and morph(\2, "noun") -1>> *
    (am|are|is|were|was) (all) <<- -2>> *

With these rules, you get the following sentences:

    These cats are black.
    These cats are black .
    These cats are black.
    These cats are black.
    These cats are black.

These grammar mistakes can be detected with one simple rule:

    these +(\w+) +are +(\w+s)
        <<- morph(\1, "noun") and morph(\2, "plural")
        -2>> _ # Adjectives are invariable.
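The preprocessing idea can be sketched in plain Python. This is only an illustration under two assumptions: that `*` overwrites the removed characters with spaces so that positions in the text stay unchanged, and that the `morph` dictionary checks can be left out. The helper names and the sample sentence (with the plural adjective "blacks") are hypothetical.

```python
import re

def blank_out(pattern: str, text: str, group: int = 0) -> str:
    """Overwrite one group of every match with spaces, keeping the text length."""
    def repl(m: re.Match) -> str:
        start, end = m.span(group)
        offset = m.start()
        whole = m.group(0)
        return whole[: start - offset] + " " * (end - start) + whole[end - offset:]
    return re.sub(pattern, repl, text)

def preprocess(text: str) -> str:
    """Apply two simplifications modelled on the preprocessor rules above."""
    text = blank_out(r"[“”]", text)                                   # [“”] ->> *
    text = blank_out(r"(am|are|is|were|was) (all)", text, group=2)    # ... -2>> *
    return text

# Detection pattern taken from the rule above; morph() conditions are omitted.
MISTAKE = re.compile(r"these +(\w+) +are +(\w+s)", re.IGNORECASE)

def find_mistake(original: str):
    """Return the (start, end) span of the suspicious adjective, or None."""
    m = MISTAKE.search(preprocess(original))
    return m.span(2) if m else None

sentence = "These “cats” are all blacks."
span = find_mistake(sentence)
if span:
    print(sentence[span[0]:span[1]])
```

Because the cleaned text has the same length as the original, the span found in the cleaned text can be used directly to underline the word in the original sentence.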
︙ | ︙ | |||
    Mister <<- ->> Mr
    (Mrs?)[.] <<- ->> \1

# Disambiguation #

When Grammalecte analyses a word with morph, before requesting the POS tags
from the dictionary, it checks whether a marker is stored for the position of
the word. If there is a marker, Grammalecte uses the stored data and does not
query the dictionary.

The disambiguation commands store POS tags at the position of a word.

There are 3 commands for disambiguation.

`select(n, pattern)`

> stores at position n only the POS tags of the word matching the pattern.

`exclude(n, pattern)`

> stores at position n the POS tags of the word, except those matching the
pattern.

`define(n, [definitions])`

> stores at position n the POS tags in definitions (a list of strings).

Examples:

    =>> select(\1, "po:noun is:pl")
    =>> exclude(\1, "po:verb")
    =>> define(\1, ["po:adv"])
    =>> exclude(\1, "po:verb") and define(\2, ["po:adv"]) and select(\3, "po:adv")

Note: select, exclude and define ALWAYS return True.

If select or exclude generates an empty list, no marker is set.

With define, you must set a list of POS tags. Example:

    define(\1, ["po:nom is:plur", "po:adj is:sing", "po:adv"])

# Conditions #

Conditions are Python expressions. They must return a value, which is
evaluated as a boolean. You can use the usual Python syntax and libraries.
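Since conditions are Python expressions and the three disambiguation commands always return True, the commands can be chained with `and`. The sketch below mimics that behaviour with a toy marker store. It is not Grammalecte's implementation: the class, the toy dictionary, and the choice to filter the dictionary's tags (rather than already stored ones) are assumptions made for illustration.

```python
import re

# Toy dictionary: word -> POS tag strings (hypothetical data).
DICTIONARY = {
    "light": ["po:noun is:sg", "po:verb", "po:adj"],
}

class Disambiguator:
    """Stores POS tags per position; stored markers take priority over the dictionary."""

    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.markers = {}  # position -> list of POS tags

    def tags(self, pos, word):
        """Return the stored marker if there is one, else ask the dictionary."""
        if pos in self.markers:
            return self.markers[pos]
        return self.dictionary.get(word, [])

    def select(self, pos, word, pattern):
        """Keep only the tags matching the pattern; no marker if nothing is left."""
        kept = [t for t in self.dictionary.get(word, []) if re.search(pattern, t)]
        if kept:
            self.markers[pos] = kept
        return True

    def exclude(self, pos, word, pattern):
        """Keep the tags not matching the pattern; no marker if nothing is left."""
        kept = [t for t in self.dictionary.get(word, []) if not re.search(pattern, t)]
        if kept:
            self.markers[pos] = kept
        return True

    def define(self, pos, definitions):
        """Store the given tags unconditionally."""
        self.markers[pos] = list(definitions)
        return True

d = Disambiguator(DICTIONARY)
chained = d.exclude(4, "light", "po:verb") and d.define(10, ["po:adv"])
print(chained, d.tags(4, "light"), d.tags(10, "anything"))
```

Returning True from every command is what makes the chain safe: no command can short-circuit the ones that follow it.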
︙ | ︙ |
Modified gc_lang/fr/rules.grx from [e7a4b85ce4] to [c70e301245].
︙ | ︙ | |||
TEST: {{attaquant-ils}} ->> attaquent-ils
TEST: {{prendrons-elles}} un verre avec moi ?


__inte_verbes_composés_interrogatifs_impératifs__
    ~\w-[nN]ous$
        <<- /inte/ morphVC(\1, ":V", ":(?:1p|E:2[sp])") ->> =suggVerb(\1, ":1p", None, True) # Forme interrogative ou impérative incorrecte.
        <<- /inte/ __else__ and morphVC(\1, ":", ":V|>(?:chez|malgré)/") ->> =suggSimil(\1, ":1p", False, True) # Forme interrogative ou impérative incorrecte.
        <<- />> -nous|VCint

    ~\w-[vV]ous$
        <<- /inte/ morphVC(\1, ":V", ":2p") ->> =suggVerb(\1, ":2p", None, True) # Forme interrogative ou impérative. Désaccord avec “vous”. Le verbe n’est pas à la 2ᵉ personne du pluriel.
        <<- /inte/ __else__ and morphVC(\1, ":", ":V|>chez/") ->> =suggSimil(\1, ":2p", False, True) # Forme interrogative ou impérative. Désaccord avec “vous”. Le verbe n’est pas à la 2ᵉ personne du pluriel.
        <<- />> -vous|VCint

TEST: {{Prendront-nous}} ->> Prendrons-nous
TEST: {{Attendront-nous}} le train ->> Attendrons-nous
TEST: {{Attaquait-vous}} ->> Attaquiez-vous
TEST: Elle a de nombreux rendez-vous ce matin.
TEST: êtes-vous là ?
TEST: C’est notre chez-nous.
TEST: Dans votre chez-vous, faites comme bon vous semble.
TEST: Libérée en grande majorité durant l’automne 1945, une partie des « Malgré-nous » passe pourtant plusieurs années supplémentaires en captivité.


__inte_rendez_vous__
    ne [le|la|les] [lui|leur] rendez-vous
    ne me [le|la|les] rendez-vous
    ne [lui|leur] en rendez-vous
    ne [le|la|les|lui|leur] rendez-vous
︙ | ︙ | |||
    notre père
        <<- morph(<1, ":D.*:[mp]") ~>> ␣
        <<- __also__ =>> define(\1, [":N:m:i"])


__immunités__
    il y a
    il n’ y a
        <<- %-1>>

    à l’ arrache
        <<- %3>>

    à ce point
    en tout point
        <<- %3>>
︙ | ︙ | |||
    [de|d’] ?[ne|n’]¿ [me|m’|te|t’|se|s’|nous|vous] [le|la|l’|les|en|y] @:V¬:Y
    [de|d’] ?[ne|n’]¿ [le|la|l’|les] [lui|leur|en|y] @:V¬:Y
    [de|d’] ?[ne|n’]¿ [lui|leur] en @:V¬:Y
        <<- /infi/ --1>> =suggVerbInfi(\-1) # Après “de”, le verbe devrait être à l’infinitif.

    [de|d’] @:V1.*:Q¬:N
        <<- /infi/ not \2[0:1].isupper() and not morph(<1, ">(?:en|passer)/")
            and not before("(?i)\\b(?:quelqu(?:e chose|’une?)|(?:l(es?|a)|nous|vous|me|te|se) trait|personne|points? +$|rien(?: +[a-zéèêâîûù]+|) +$)")
        -2>> =suggVerbInfi(\2) # Le verbe devrait être à l’infinitif.

TEST: d’en {{parlé}} sans cesse
TEST: cela suffit de les {{aimait}}
TEST: de ne leur en {{avancé}} que le nécessaire.
TEST: de l’y {{poussé}}
TEST: arrête d’y {{consacré}} autant de temps.
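To see how the left-context check in the second rule can silence a false positive, here is a small Python sketch. The regular expression is copied from the rule (`\\b` in the rule string is `\b` in a raw string); the `before` helper is a hypothetical stand-in that simply searches the text preceding the matched tokens, which may differ from Grammalecte's actual function. The second sample sentence is made up.

```python
import re

# Regex copied from the rule above.
LEFT_CONTEXT = re.compile(
    r"(?i)\b(?:quelqu(?:e chose|’une?)|(?:l(es?|a)|nous|vous|me|te|se) trait"
    r"|personne|points? +$|rien(?: +[a-zéèêâîûù]+|) +$)"
)

def before(text: str, match_start: int, pattern: re.Pattern) -> bool:
    """Hypothetical stand-in: does the pattern match the text left of the match?"""
    return pattern.search(text[:match_start]) is not None

def rule_is_suppressed(sentence: str, trigger: str) -> bool:
    """True when the left context makes the 'not before(...)' condition fail."""
    start = sentence.index(trigger)
    return before(sentence, start, LEFT_CONTEXT)

# "points" right before "de gagnés": the rule should stay silent.
print(rule_is_suppressed("Quelques points de gagnés avec cette astuce", "de gagnés"))
# No protected word on the left: the rule is free to report "mangé".
print(rule_is_suppressed("Il vient de mangé", "de mangé"))
```

The `points? +$` alternative is anchored with `$`, so it only applies when "point(s)" is the last word before "de"; an earlier "points" in the sentence would not suppress the rule.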
︙ | ︙ | |||
TEST: l’idée consiste de la lui donner sans contrepartie et voir ce qu’il en fera.
TEST: de leur plus grand fils, ils attendent beaucoup.
TEST: de n’importe quelle manière
TEST: un libéralisme trop « individualiste » s’est transformé en de supposées demandes de droits spécifiques
TEST: soit 40 % de plus comparé au quinquennat précédent
TEST: On passe de sophistiqué à classique.
TEST: Les « événements » d’il y a cinquante ans n’ont sans doute « rien à voir » avec le mouvement des « gilets jaunes »
TEST: Quelques points de gagnés avec cette astuce, ne faisons pas la fine bouche.


__infi_y_verbe!3__
    y ~ée?s?$
        <<- /infi/ morph(\2, ":V", ":[123][sp]") -2>> _ # Le verbe ne devrait pas être un participe passé.

TEST: y {{mangée}} était un supplice
︙ | ︙ |