Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
var IdeographRegexp = regexp.MustCompile(`\p{Han}|\p{Hangul}|\p{Hiragana}|\p{Katakana}`)
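A minimal usage sketch: the check itself is plain regexp matching, and only the import path below is an assumption. IdeographRegexp reports whether text contains Han, Hangul, Hiragana, or Katakana runes:

package tokenizer_test

import (
	"fmt"

	"github.com/blugelabs/bluge/analysis/tokenizer" // assumed import path
)

func ExampleIdeographRegexp() {
	fmt.Println(tokenizer.IdeographRegexp.MatchString("sushi"))
	fmt.Println(tokenizer.IdeographRegexp.MatchString("寿司")) // Han runes
	// Output:
	// false
	// true
}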
Functions ¶
func MakeTokenStream ¶
func MakeTokenStream(input []byte) analysis.TokenStream
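A minimal sketch of calling MakeTokenStream directly. Two assumptions, since this page does not say: the result wraps the whole input in a single token, and analysis.Token exposes a Term field.

package tokenizer_test

import (
	"fmt"

	"github.com/blugelabs/bluge/analysis/tokenizer" // assumed import path
)

func ExampleMakeTokenStream() {
	stream := tokenizer.MakeTokenStream([]byte("hello world"))
	for _, tok := range stream {
		fmt.Printf("%s\n", tok.Term) // Term field name is an assumption
	}
}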
Types ¶
type CharacterTokenizer ¶
type CharacterTokenizer struct {
// contains filtered or unexported fields
}
func NewCharacterTokenizer ¶
func NewCharacterTokenizer(f IsTokenRune) *CharacterTokenizer
func NewLetterTokenizer ¶
func NewLetterTokenizer() *CharacterTokenizer
func NewWhitespaceTokenizer ¶
func NewWhitespaceTokenizer() *CharacterTokenizer
func (*CharacterTokenizer) Tokenize ¶
func (c *CharacterTokenizer) Tokenize(input []byte) analysis.TokenStream
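A sketch of a custom CharacterTokenizer: unicode.IsLetter already has the IsTokenRune shape, so this presumably mirrors what NewLetterTokenizer does. The Term, Start, and End field names on analysis.Token are assumptions, as is the import path.

package tokenizer_test

import (
	"fmt"
	"unicode"

	"github.com/blugelabs/bluge/analysis/tokenizer" // assumed import path
)

func ExampleNewCharacterTokenizer() {
	// Runs of runes for which the predicate returns true become tokens.
	letters := tokenizer.NewCharacterTokenizer(unicode.IsLetter)
	for _, tok := range letters.Tokenize([]byte("beer & wine")) {
		fmt.Printf("%s [%d:%d]\n", tok.Term, tok.Start, tok.End)
	}
}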
type ExceptionsTokenizer ¶
type ExceptionsTokenizer struct {
// contains filtered or unexported fields
}
ExceptionsTokenizer implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then inserts the extracted parts back into the token stream. Use it to preserve sequences that a regular tokenizer would alter or remove; a construction sketch follows the argument list below.
Its constructor takes the following arguments:
"exceptions" ([]string): one or more Go regular expressions matching the sequences to preserve. Multiple expressions are combined with "|".
"tokenizer" (string): the name of the tokenizer that processes the data not matched by "exceptions".
func NewExceptionsTokenizer ¶
func NewExceptionsTokenizer(exception *regexp.Regexp, remaining analysis.Tokenizer) *ExceptionsTokenizer
func NewWebTokenizer ¶
func NewWebTokenizer() *ExceptionsTokenizer
func (*ExceptionsTokenizer) Tokenize ¶
func (t *ExceptionsTokenizer) Tokenize(input []byte) analysis.TokenStream
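NewWebTokenizer is presumably a preconfigured ExceptionsTokenizer for web text; the sketch below assumes it keeps sequences such as URLs and e-mail addresses intact as single tokens (import path and Token fields assumed as before):

package tokenizer_test

import (
	"fmt"

	"github.com/blugelabs/bluge/analysis/tokenizer" // assumed import path
)

func ExampleNewWebTokenizer() {
	t := tokenizer.NewWebTokenizer()
	for _, tok := range t.Tokenize([]byte("read https://example.com/docs first")) {
		fmt.Printf("%s\n", tok.Term)
	}
}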
type IsTokenRune ¶
type IsTokenRune func(r rune) bool
type RegexpTokenizer ¶
type RegexpTokenizer struct {
// contains filtered or unexported fields
}
func NewRegexpTokenizer ¶
func NewRegexpTokenizer(r *regexp.Regexp) *RegexpTokenizer
func (*RegexpTokenizer) Tokenize ¶
func (rt *RegexpTokenizer) Tokenize(input []byte) analysis.TokenStream
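A sketch of RegexpTokenizer under the natural reading that each regexp match becomes one token (the page does not spell this out; the Term field name and the import path are assumptions):

package tokenizer_test

import (
	"fmt"
	"regexp"

	"github.com/blugelabs/bluge/analysis/tokenizer" // assumed import path
)

func ExampleNewRegexpTokenizer() {
	// Note: Go's \w is ASCII-only ([0-9A-Za-z_]).
	t := tokenizer.NewRegexpTokenizer(regexp.MustCompile(`\w+`))
	for _, tok := range t.Tokenize([]byte("one, two; three")) {
		fmt.Printf("%s\n", tok.Term)
	}
}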
type SingleTokenTokenizer ¶
type SingleTokenTokenizer struct{}
func NewSingleTokenTokenizer ¶
func NewSingleTokenTokenizer() *SingleTokenTokenizer
func (*SingleTokenTokenizer) Tokenize ¶
func (t *SingleTokenTokenizer) Tokenize(input []byte) analysis.TokenStream
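SingleTokenTokenizer presumably emits the entire input as one token, which suits keyword-style fields that must never be split; a sketch with the same assumptions as above:

package tokenizer_test

import (
	"fmt"

	"github.com/blugelabs/bluge/analysis/tokenizer" // assumed import path
)

func ExampleNewSingleTokenTokenizer() {
	stream := tokenizer.NewSingleTokenTokenizer().Tokenize([]byte("Product-ID 42/B"))
	if len(stream) > 0 { // assumption: exactly one token covering the input
		fmt.Printf("%s\n", stream[0].Term)
	}
}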
type UnicodeTokenizer ¶
type UnicodeTokenizer struct{}
func NewUnicodeTokenizer ¶
func NewUnicodeTokenizer() *UnicodeTokenizer
func (*UnicodeTokenizer) Tokenize ¶
func (rt *UnicodeTokenizer) Tokenize(input []byte) analysis.TokenStream
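UnicodeTokenizer presumably segments on Unicode word boundaries (UAX #29 style), which handles text without spaces, such as CJK, better than the character-class tokenizers above; a sketch with the same assumptions:

package tokenizer_test

import (
	"fmt"

	"github.com/blugelabs/bluge/analysis/tokenizer" // assumed import path
)

func ExampleNewUnicodeTokenizer() {
	t := tokenizer.NewUnicodeTokenizer()
	for _, tok := range t.Tokenize([]byte("Hello, 世界!")) {
		fmt.Printf("%s\n", tok.Term)
	}
}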