Grammalecte Check-in [f1c63ee223]

Overview
Comment: [graphspell][py] count words occurrence: count elided words
SHA3-256: f1c63ee223c5255895c238d1a91d5510756d11b7201728e0ff8c1e69fb8d4be3
User & Date: olr on 2020-03-02 12:59:38
Context
2020-03-02
15:15
[lo][fr] Word counter (Recenseur de mots): double count for compound words that are interrogative verb forms check-in: f02cd26db1 user: olr tags: trunk, lo
12:59
[graphspell][py] count words occurrence: count elided words check-in: f1c63ee223 user: olr tags: trunk, graphspell
12:07
[fr] adjustments check-in: e72a9109a9 user: olr tags: trunk, fr
Changes

Modified graphspell/spellchecker.py from [9c6b027a4b] to [114a0237c3].

@@ -146,15 +146,15 @@
 
     def countWordsOccurrences (self, sText, bByLemma=False, bOnlyUnknownWords=False, dWord={}):
         """count word occurrences.
            <dWord> can be used to cumulate count from several texts."""
         if not self.oTokenizer:
             self._loadTokenizer()
         for dToken in self.oTokenizer.genTokens(sText):
-            if dToken['sType'] == "WORD":
+            if dToken['sType'].startswith("WORD"):
                 if bOnlyUnknownWords:
                     if not self.isValidToken(dToken['sValue']):
                         dWord[dToken['sValue']] = dWord.get(dToken['sValue'], 0) + 1
                 else:
                     if not bByLemma:
                         dWord[dToken['sValue']] = dWord.get(dToken['sValue'], 0) + 1
                     else:
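The only change in this check-in is the token-type test: an exact match on "WORD" becomes a prefix match, so any token type beginning with "WORD" is now counted. Below is a minimal standalone sketch of that effect. It assumes elided words arrive with a type such as "WORD_ELIDED"; that name, the sample token dicts, and the helper countWordTokens are hypothetical stand-ins for the output of oTokenizer.genTokens() and for the real method (the lemma and unknown-word branches are omitted). The sketch also defaults dWord to None, because a `{}` default in Python is created once and shared by every call that omits the argument.

```python
from typing import Dict, Iterable, Optional


def countWordTokens(lTokens: Iterable[dict], dWord: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Hypothetical sketch: count tokens whose type starts with "WORD".

    <lTokens> stands in for oTokenizer.genTokens(sText).
    <dWord> can be passed back in to cumulate counts from several texts.
    """
    if dWord is None:
        # fresh dict per call, avoiding a shared mutable default argument
        dWord = {}
    for dToken in lTokens:
        # prefix match: also accepts assumed subtypes such as "WORD_ELIDED",
        # which an exact comparison with "WORD" would skip
        if dToken["sType"].startswith("WORD"):
            dWord[dToken["sValue"]] = dWord.get(dToken["sValue"], 0) + 1
    return dWord


if __name__ == "__main__":
    # hypothetical token streams for "l'homme." and "l'arbre"
    lText1 = [
        {"sType": "WORD_ELIDED", "sValue": "l'"},
        {"sType": "WORD", "sValue": "homme"},
        {"sType": "PUNC", "sValue": "."},
    ]
    lText2 = [
        {"sType": "WORD_ELIDED", "sValue": "l'"},
        {"sType": "WORD", "sValue": "arbre"},
    ]
    dCount = countWordTokens(lText1)   # {"l'": 1, "homme": 1}
    countWordTokens(lText2, dCount)    # dCount is now {"l'": 2, "homme": 1, "arbre": 1}
    print(dCount)
```

With the old `== "WORD"` test, the elided "l'" tokens would not have been counted at all; the cumulative call shows how passing the same dict gathers counts across texts.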