Index: doc/syntax.txt
==================================================================
--- doc/syntax.txt
+++ doc/syntax.txt
@@ -1,50 +1,48 @@
 WRITING RULES FOR GRAMMALECTE

+Note: This documentation is obsolete right now.

-= Principles =
+# Principles #

-Grammalecte is a multi-passes grammar checker engine. On the first pass, the
-engine checks the text paragraph by paragraph. On the next passes, the engine
+Grammalecte is a bi-passes grammar checker engine. On the first pass, the
+engine checks the text paragraph by paragraph. On the second passe, the engine
 check the text sentence by sentence.

-The command to add a new pass is:
+The command to switch to the second pass is:

 [++]

-You shoudn’t need more than two passes, but you can create as many passes as
-you wish.
-
 In each pass, you can write as many rules as you need.

 A rule is defined by:
-- a regex pattern trigger
-- a list of actions (can’t be empty)
-- [optional] flags “LCR” for the regex word boundaries and case sensitiveness
-- [optional] user option name for activating/disactivating the rule
+* [optional] flags “LCR” for the regex word boundaries and case sensitiveness
+* a regex pattern trigger
+* a list of actions (can’t be empty)
+* [optional] user option name for activating/disactivating the rule
+* [optional] rule name

 There is no limit to the number of actions and the type of actions a rule can
 launch. Each action has its own condition to be triggered.

 There are three kind of actions:
-- Error warning, with a message and optionaly suggestions and optionally an URL
+- Error warning, with a message, and optionally suggestions, and optionally an URL
 - Text transformation, modifying internally the checked text
 - Disambigation action, setting tags on a position


 The rules file for your language must be named “rules.grx”.
-The options file must be named “option.txt”.
 The settings file must be named “config.ini”.

 All these files are simple utf-8 text file.
 UTF-8 is mandatory.
-= Rule syntax =
+# Rule syntax #

-__LCR__ pattern
+__LCR/option(rulename)__ pattern
     <<- condition ->> error_suggestions # message_error|http://awebsite.net...
     <<- condition ~>> text_rewriting
     <<- condition =>> commands_for_disambigation
     ...

@@ -57,101 +55,115 @@
 Conditions are optional, i.e.:
     <<- ~>> replacement


 LCR flags means:
-- Left boundary for the regex
-- Case sensitiveness
-- Right boundary for the regex
-
-Left boundary: [ word boundary or < no word boundary
-right boundary: ] word boundary or > no word boundary
-Case sensitiveness:
-    i: case insensitive
-    s: case sensitive
-    u: uppercase allowed for lowercased characters
-        i.e.: "Word" becomes "W[oO][rR][dD]"
+* L: Left boundary for the regex
+* C: Case sensitiveness
+* R: Right boundary for the regex
+
+Left boundary (L):
+    `[` word boundary
+    `<` no word boundary
+
+right boundary (R):
+    `]` word boundary
+    `>` no word boundary
+
+Case sensitiveness (C):
+    `i` case insensitive
+    `s` case sensitive
+    `u` uppercase allowed for lowercased characters
+        i.e.: "Word" becomes "W[oO][rR][dD]"

 Examples:
-__[i]__ pattern
-____ pattern
-____ pattern
-...
+    __[i]__ pattern
+    ____ pattern
+    ____ pattern
+    ...
+

 User option activating/disactivating is possible with an option name placed
 just after the LCR flags, i.e.:
-__[i]/useroption1__ pattern
-__[u]/useroption2__ pattern
-__[s>/useroption1__ pattern
-__/useroption3__ pattern
-__/useroption3__ pattern
-...
+    __[i]/option1__ pattern
+    __[u]/option2__ pattern
+    __[s>/option1__ pattern
+    __/option3__ pattern
+    __/option3__ pattern
+    ...
+
+Rules can be named:
+    __[i]/option1(name1)__ pattern
+    __[u]/option2(name2)__ pattern
+    __[s>/option1(name3)__ pattern
+    __/option3(name4)__ pattern
+    __/option3(name5)__ pattern
+    ...
+
+Each rule name must be unique.
+

 The LCR flags are also optional. If you don’t set these flags, the default LCR
 flags will be:
-__[i]__
+    __[i]__

 Example. Report “foo” in the text and suggest "bar":

-foo <<- ->> bar # Use bar instead of foo.
+    foo <<- ->> bar # Use bar instead of foo.

 Example. Recognize and suggest missing hyphen and rewrite internally the text
 with the hyphen:

-__[s]__ foo bar
-    <<- ->> foo-bar # Missing hyphen.
-    <<- ~>> foo-bar
+    __[s]__ foo bar
+        <<- ->> foo-bar # Missing hyphen.
+        <<- ~>> foo-bar


 == Simple-line or multi-line rules ==

 Rules can be break to multiple lines by leading tabulators or spaces.
 You should use 4 spaces.

 Examples:

-____ pattern <<- condition
-        ->> replacement
-        # message
-    <<- condition ->> suggestion # message
-    <<- condition
-        ~>> text_rewriting
-    <<- =>> disambiguation
-
-____ pattern <<- condition ->> replacement # message
+    ____ pattern
+        <<- condition ->> replacement
+            # message
+        <<- condition ->> suggestion # message
+        <<- condition
+            ~>> text_rewriting
+        <<- =>> disambiguation
+
+    ____ pattern <<- condition ->> replacement # message


-== Comments ==
+## Comments ##

 Lines beginning with # are comments.

-Example. No action done.

-# pattern <<- ->> foo bar # message
-
-
-== End of file ==
+## End of file ##

 With the command:

-#END
+`#END`

-the compiler won’t go further. Whatever is written after will be considered
-as comments.
+at the beginning of a line, the compiler won’t go further.
+Whatever is written after will be considered as comments.


-== Whitespaces at the border of patterns or suggestions ==
+## Whitespaces at the border of patterns or suggestions ##

 Example. Recognize double or more spaces and suggests a single space:

-____ " +" <<- ->> " " # Extra space(s).
+    ____ " +" <<- ->> " " # Extra space(s).

 ASCII " characters protect spaces in the pattern and in the replacement text.


-== Pattern groups and back references ==
+## Pattern groups and back references ##

 It is usually useful to retrieve parts of the matched pattern. We simply use
 parenthesis in pattern to get groups with back references.

 Example. Suggest a word with correct quotation marks:

@@ -165,11 +177,11 @@
 Example. Back reference in messages.
     (fooo) bar <<- ->> foo bar # “\1” should be:


-== Name definitions ==
+## Name definitions ##

 Grammalecte supports name definitions to simplify the description of the
 complex rules.

 Example.

@@ -179,30 +191,30 @@
 Usage in the rules:

     ({name}) (\w+) ->> "\1-\2" # Missing hyphen?


-== Multiple suggestions ==
+## Multiple suggestions ##

 Use | in the replacement text to add multiple suggestions:

 Example 7. Foo, FOO, Bar and BAR suggestions for the input word "foo".

     foo <<- ->> Foo|FOO|Bar|BAR # Did you mean:


-== No suggestion ==
+## No suggestion ##

 You can display message without making suggestions. For this purpose,
 use a single character _ in the suggestion field.

 Example. No suggestion.

     foobar <<- ->> _ # Message


-== Positioning ==
+## Positioning ##

 Positioning is valid only for error creation and text rewriting.

 By default, the full pattern will be underlined with blue. You can shorten the
 underlined text area by specifying a back reference group of the pattern.

@@ -212,11 +224,11 @@
 Example.
     (ying) and yang <<- -1>> yin # Did you mean:
     __[s]__ (Mr.) [A-Z]\w+ <<- ~1>> Mr


-=== Comparison ===
+### Comparison ###

 Rule A:
     ying and yang <<- ->> yin and yang # Did you mean:

 Rule B:
@@ -229,21 +241,21 @@
 With the rule B, only the first group is underlined:
     ying and yang
     ^^^^


-== Longer explanations with URLs ==
+## Longer explanations with URLs ##

 Warning messages can contain optional URL for longer explanations separated
 by "|":

     (your|her|our|their)['’]s
         <<- ->> \1s # Possessive pronoun:|http://en.wikipedia.org/wiki/Possessive_pronoun



-= Text rewriting =
+# Text rewriting #

 Example. Replacing a string by another

     Mr. [A-Z]\w+ <<- ~>> Mister

@@ -270,11 +282,11 @@

     __[s]__ Mr. ([a-z]\w+) <<- ~1>> =\1.upper()



-= Disambiguation =
+# Disambiguation #

 When Grammalecte analyses a word with morph or morphex, before requesting the
 POS tags to the dictionary, it checks if there is a stored marker for the
 position where the word is.
 If there is a marker, Grammalecte uses the stored data and don’t make request
 to the dictionary.

@@ -307,11 +319,11 @@
 This will store a list of tags at the position of the first group:
     ["po:nom is:plur", "po:adj is:sing", "po:adv"]



-= Conditions =
+# Conditions #

 Conditions are Python expressions, they must return a value, which will be
 evaluated as boolean. You can use the usual Python syntax and libraries.

 You can call pattern subgroups via \0, \1, \2…

@@ -326,11 +338,11 @@
     \3.islower()
     re.match("pattern", \2)
     …


-== Standard functions ==
+## Standard functions ##

 word(n)
     catches the nth next word after the pattern (separated only by white spaces).
     returns None if no word catched

@@ -361,21 +373,21 @@
     returns True if option_name is activated else False

 Note: the analysis is done on the preprocessed text.


-== Default variables ==
+## Default variables ##

 sCountry
     It contains the current country locale of the checked paragraph.

     colour <<- sCountry == "US" ->> color # Use American English spelling.



-= Expressions in the suggestions =
+# Expressions in the suggestions #

 Suggestions (and warning messages) started by an equal sign are Python string expressions
 extended with possible back references and named definitions:

 Example: