Contractions expander for Jargon
This package implements a TokenExpander for use with the jargon lemmatizer, expanding common English contractions into separate words.
Examples:
- don't → do not
- We’ve → We have
- SHE'S → SHE IS
It handles lowercase, Title Case and UPPERCASE tokens, as well as straight (') and smart (’) apostrophes.
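Here is a small sketch of the case and apostrophe handling described above. The names `baseExpansions`, `expand`, `startsUpper` and `upperFirst` are hypothetical. The three-entry table covers only the examples listed here. This sketch normalizes each token at lookup time, which is a swapped-in technique: the package itself code-generates its variants, as described under Implementation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// baseExpansions is a hypothetical sample table keyed by the lowercase,
// straight-apostrophe form of each contraction (not the package's real list).
var baseExpansions = map[string]string{
	"don't": "do not",
	"we've": "we have",
	"she's": "she is",
}

// startsUpper reports whether the first rune of s is uppercase.
func startsUpper(s string) bool {
	r, _ := utf8.DecodeRuneInString(s)
	return unicode.IsUpper(r)
}

// upperFirst uppercases only the first rune of s.
func upperFirst(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	return string(unicode.ToUpper(r)) + s[size:]
}

// expand is a hypothetical helper: it returns the expansion of token and true,
// or "" and false if token is not in the table. The output mirrors the
// token's case style (lower, Title or UPPER).
func expand(token string) (string, bool) {
	// Normalize smart apostrophes (U+2019) to straight ones, then lowercase.
	key := strings.ToLower(strings.ReplaceAll(token, "\u2019", "'"))
	exp, ok := baseExpansions[key]
	if !ok {
		return "", false
	}
	switch {
	case token == strings.ToUpper(token) && token != strings.ToLower(token):
		// UPPER: every cased letter in the token is uppercase.
		return strings.ToUpper(exp), true
	case startsUpper(token):
		// Title: the token starts with an uppercase letter.
		return upperFirst(exp), true
	default:
		// lower: return the table entry unchanged.
		return exp, true
	}
}

func main() {
	for _, t := range []string{"don't", "We\u2019ve", "SHE'S", "hello"} {
		if exp, ok := expand(t); ok {
			fmt.Printf("%s → %s\n", t, exp)
		} else {
			fmt.Printf("%s: not a contraction\n", t)
		}
	}
}
```

Run as-is, this prints the three example expansions above and reports that "hello" is not a contraction.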
Command line
Assuming you have installed the Jargon CLI, use the -cont flag to select this contractions expander.
echo "I would've called but he's away from his phone" | jargon -cont
In your code
package main

import (
	"fmt"
	"strings"

	"github.com/clipperhouse/jargon"
	"github.com/clipperhouse/jargon/filters/contractions"
)

// lem is a lemmatizer that expands contractions
var lem = jargon.NewLemmatizer(contractions.Expander)

func main() {
	text := "I would've called but he's away from his phone"
	r := strings.NewReader(text)
	tokens := jargon.Tokenize(r)

	// Pass the tokens through the lemmatizer, which expands contractions
	lemmas := lem.Lemmatize(tokens)

	// Read the lemmatized tokens (not the raw ones) until they run out
	for {
		lemma := lemmas.Next()
		if lemma == nil {
			break
		}
		fmt.Print(lemma)
	}
}
Implementation
The Lookup method satisfies the jargon.TokenFilter interface.
Here is the base list of contractions. Variations (case, apostrophes) are code-generated.
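Below is a minimal sketch of generating those variations. It assumes the base list is keyed by lowercase contractions with straight apostrophes. The `base` excerpt and the `generate` and `title` helpers are hypothetical. The sketch builds the table at program start instead of emitting Go source, so it is not the package's actual generator.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// base is a hypothetical excerpt of a base list: lowercase contractions
// with straight apostrophes, mapped to their expansions.
var base = map[string]string{
	"don't":    "do not",
	"would've": "would have",
	"he's":     "he is",
}

// title uppercases only the first rune of s.
func title(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	return string(unicode.ToUpper(r)) + s[size:]
}

// generate is a hypothetical helper that builds a lookup table holding every
// case variant (lower, Title, UPPER) of each base entry, each with both
// straight (') and smart (’) apostrophes. Every expansion is cased to match.
func generate(base map[string]string) map[string]string {
	table := make(map[string]string)
	for contraction, expansion := range base {
		cases := [][2]string{
			{contraction, expansion},                                   // lower
			{title(contraction), title(expansion)},                     // Title
			{strings.ToUpper(contraction), strings.ToUpper(expansion)}, // UPPER
		}
		for _, c := range cases {
			table[c[0]] = c[1]                                    // straight apostrophe
			table[strings.ReplaceAll(c[0], "'", "\u2019")] = c[1] // smart apostrophe
		}
	}
	return table
}

func main() {
	table := generate(base)
	fmt.Println(len(table)) // 3 entries × 3 cases × 2 apostrophes = 18
	for _, t := range []string{"Would\u2019ve", "HE'S", "don't"} {
		fmt.Printf("%s → %s\n", t, table[t])
	}
}
```

Precomputing every variant produces a larger table. In exchange, each token needs only one map lookup and no normalization at lookup time.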