Documentation ¶
Overview ¶
Package tokenize provides the means to create a tokenizer chain.
Constants ¶
const BASE_SOUNDEX = "0000"
Variables ¶
This section is empty.
Functions ¶
func EncodeSoundex ¶
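The signature and exact behavior of EncodeSoundex are not shown on this page. As a rough illustration of the technique, the following is a minimal sketch of classic American Soundex. The names baseSoundex, soundexDigit, and encodeSoundexSketch are hypothetical, and returning the "0000" value for input without letters is an assumption based only on the BASE_SOUNDEX constant above.

```go
package main

import (
	"fmt"
	"strings"
)

// baseSoundex mirrors the documented BASE_SOUNDEX value. Using it as the
// result for input that contains no letters is an assumption of this sketch.
const baseSoundex = "0000"

// soundexDigit maps an upper-case letter to its classic Soundex digit,
// or returns 0 for letters that carry no code (vowels, H, W, Y).
func soundexDigit(r rune) byte {
	switch r {
	case 'B', 'F', 'P', 'V':
		return '1'
	case 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z':
		return '2'
	case 'D', 'T':
		return '3'
	case 'L':
		return '4'
	case 'M', 'N':
		return '5'
	case 'R':
		return '6'
	}
	return 0
}

// encodeSoundexSketch is a hypothetical stand-in, not the package's EncodeSoundex.
func encodeSoundexSketch(word string) string {
	code := make([]byte, 0, 4)
	var prev byte
	for _, r := range strings.ToUpper(word) {
		if r < 'A' || r > 'Z' {
			continue // skip anything that is not an ASCII letter
		}
		d := soundexDigit(r)
		if len(code) == 0 {
			code = append(code, byte(r)) // the first letter is kept as-is
			prev = d
			continue
		}
		if r == 'H' || r == 'W' {
			continue // H and W do not separate letters with equal codes
		}
		if d != 0 && d != prev {
			code = append(code, d)
			if len(code) == 4 {
				break
			}
		}
		prev = d // a vowel resets prev, so a repeated code after it is kept
	}
	if len(code) == 0 {
		return baseSoundex
	}
	for len(code) < 4 {
		code = append(code, '0') // pad short codes with zeros
	}
	return string(code)
}

func main() {
	fmt.Println(encodeSoundexSketch("Robert")) // prints R163
}
```

The package's own function may differ in details such as how it treats non-ASCII letters or empty input.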
Types ¶
type CharNgram ¶
type CharNgram struct {
// contains filtered or unexported fields
}
func NewCharNgram ¶
type LeftEdge ¶
type LeftEdge struct {
// contains filtered or unexported fields
}
func NewLeftEdge ¶
type Shingles ¶
type Shingles struct {
// contains filtered or unexported fields
}
Shingles is a tokenizer that produces word-level n-grams (shingles).
type Surround ¶
type Surround struct {
// contains filtered or unexported fields
}
Surround adds a marker before the first token and after the last token. For example, NewSurround("$").Apply([]string{"h", "he", "hel"}) returns []string{"$h", "he", "hel$"}.
func NewSurround ¶
type Whitespace ¶
type Whitespace struct{}
func NewWhitespace ¶
func NewWhitespace() *Whitespace
func (*Whitespace) Apply ¶
func (w *Whitespace) Apply(current []Token) []Token