Overview

Comment:      [doc] syntax update
SHA3-256:     cdc40fbad0a1ff9a2e8cb04b88a3f695
User & Date:  olr on 2020-04-13 11:12:39
Context

2020-04-13
17:48   [doc] syntax update   check-in: b26524e478   user: olr   tags: trunk, doc
11:12   [doc] syntax update   check-in: cdc40fbad0   user: olr   tags: trunk, doc
11:11   [fr] adjustments      check-in: 6bf6e60197   user: olr   tags: trunk, fr
Changes
Modified doc/syntax.txt from [97c9a7a212] to [1e3f6930d4].
︙

    tokens:         alpha  (beta)  gamma      (delta)  epsilon
    positive refs:           1                   2
    negative refs:    -5     -4     -3           -2       -1

    tokens:         alpha  (beta)  ?gamma¿    (delta)  epsilon
    positive refs:           1                   2
    negative refs:  (-5/-4) (-4/-3) (-3/none)    -2       -1


## CONDITIONS ##

Conditions are Python expressions; they must return a value, which will be evaluated as a boolean. You can use the usual Python syntax and libraries.

With regex rules, you can call pattern subgroups via `\1`, `\2`… `\0` is the full pattern.

Example:

    these (\w+) <<- \1 == "man" -1>> men        # Man is a singular noun.

You can also apply functions to subgroups like: `\1.startswith("a")` or `\3.islower()` or `re.search("pattern", \2)`.

With token rules, you can also call each token with its reference, like `\1`, `\2`… or `\-1`, `\-2`…

Example:
︙

`value(n, values_string)`

>   Analyses the value of the nth token.
>   The <values_string> contains values separated by the sign `|`.
>   Example: `"|foo|bar|"`

`morph(n, "regex"[, "neg_regex"][, trim_left=0][, trim_right=0])`
`analyse(n, "regex"[, "neg_regex"][, trim_left=0][, trim_right=0])`

>   Same action as `morph()` and `morph0()` for regex rules.
>   Parameters <trim_left> and <trim_right> remove the given number of characters from the left or the right of the token before performing the analysis.

`space_after(n, min_space[, max_space])`

>   Returns True if the next token after token n is separated from it by at least <min_space> and at most <max_space> blank spaces.

`tag(n, tag)`

>   Returns True if <tag> exists on the nth token.

`tag_before(n, tag)`

>   Returns True if <tag> is found on any token before the nth token.

`tag_after(n, tag)`

>   Returns True if <tag> is found on any token after the nth token.


### Functions for regex and token rules

`__also__`

>   Returns True if the previous condition returned True.
>   Example: `<<- __also__ and condition2 ->>`
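As a rough illustration of how such tests can be combined in a condition (the rule below is a sketch invented for this documentation; the morphology tag `po:noun is:pl` is borrowed from the disambiguation examples further down and depends on the dictionary actually used):

    (this) (\w+) are
        <<- morph(\2, "po:noun is:pl") -1>> these       # Plural noun: did you mean “these”?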
︙

>   Checks if the text before the pattern matches the regex.


### Default variables

`sCountry`

>   Contains the current country locale of the checked paragraph.

    colour <<- sCountry == "US" ->> color       # Use American English spelling.


## ACTIONS ##

There are 5 kinds of actions:

1. Suggestions. The grammar checker suggests corrections.
︙

5. Immunity. Prevents suggestions from being triggered.


### Positioning

Positioning is valid for suggestions, text processing, tagging and immunity.

By default, rules apply to the full text matched. You can shorten the effect of a rule by specifying a back reference group of the pattern or token references.

Instead of writing `->>`, write `-n>>`, n being the number of a back reference group. Actually, `->>` is equivalent to `-0>>`.

Example:

    (ying) and yang         <<- -1>> yin       # Did you mean:
    __[s]__ (Mr.) [A-Z]\w+  <<- ~1>> Mr
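Another sketch of the same mechanism (a hypothetical rule, not taken from an existing rule file): with two groups, `-2>>` restricts both the underlined zone and the replacement to the second group.

    (fewer) (then) <<- -2>> than        # Did you mean “than”?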
︙

    your’s <<- ->> yours    # Possessive pronoun:|http://en.wikipedia.org/wiki/Possessive_pronoun


#### Expressions in suggestion or replacement

Suggestions beginning with an equal sign are Python string expressions, extended with possible back references and named definitions.

Example:

    <<- ->> ='"' + \1.upper() + '"'     # With uppercase letters and quotation marks
    <<- ~>> =\1.upper()


### Text rewriting

Example: replacing a string with another.

    Mr. [A-Z]\w+ <<- ~>> Mister

**WARNING**: The replacing text must be shorter than the replaced text or have the same length. Breaking this rule will misplace subsequent error reports. You have to ensure yourself that your rules comply with this constraint; the text processor won’t do it for you.

Specific commands for text rewriting:

`~>> *`

>   Replace by whitespaces
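As a hedged example of this command (a hypothetical rule): blanking out an editorial marker so that later rules no longer see it, while the character positions of the paragraph are preserved.

    \(sic\) <<- ~>> *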
︙

    Mister <<- ~>> Mr
    (Mrs?)[.] <<- ~>> \1


### Disambiguation

When the grammar checker analyses a token with `morph()`, before requesting the POS tags from the dictionary, it checks whether there is a stored marker for the position of that token. If a marker is found, it uses the stored data and doesn’t query the dictionary.

There are 4 commands for disambiguation.

`select(n, pattern)`

>   At reference n, select morphologies that match the pattern.

`exclude(n, pattern)`

>   At reference n, exclude morphologies that match the pattern.

`define(n, [morph_list])`

>   At reference n, set the listed morphologies (a list of strings).

`add_morph(n, [morph_list])`

>   At reference n, add the listed morphologies (a list of strings).

Examples:

    =>> select(\1, "po:noun is:pl")
    =>> exclude(\1, "po:verb")
    =>> exclude(\1, "po:verb") and define(\2, ["po:adv"]) and select(\3, "po:adv")

Note: all these functions ALWAYS return True.

If `select()` and `exclude()` generate an empty list, nothing changes.

With `define()` and `add_morph()`, you must set a list of POS tags. Example:

    =>> define(\1, ["po:nom is:plur", "po:adj is:sing", "po:adv"])
    =>> add_morph(\1, ["po:adv"])


### Tagging

**Only for token rules**


### Immunity

**Only for token rules**

An immunity rule sets a flag on token(s) that are not supposed to be considered as an error. If any other rule finds an error there, it will be ignored. If an error has already been found, it will be removed.

Example: `!2>>` means that no error can be set on the second token.
Example: `!>>` means that all tokens will be considered as correct.

Immunity rules are useful to create simple antipatterns that simplify the writing of other rules.


## OTHER COMMANDS ##

### Comments

Lines beginning with `#` are comments.
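For instance, reusing the `sCountry` example from above, a comment line can document the rule that follows it (the comment text here is arbitrary):

    # American spelling is enforced only for the US locale
    colour <<- sCountry == "US" ->> color       # Use American English spelling.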
︙