Documentation ¶
Index ¶
- Variables
- func LowercaseFilter(s string) (string, bool)
- func RemoveDiacritics(s string) string
- func RemoveDiacriticsFilter(s string) (string, bool)
- func Tokenize(s string) []string
- func TrackNumFilter(s string) (string, bool)
- func WhitespaceTokenizer(s string) []string
- type FilterFunc
- type Tokenizer
- type TokenizerFunc
Constants ¶
This section is empty.
Variables ¶
var DefaultTokenizer = NewTokenizer(
	WhitespaceTokenizer,
	MinLengthFilter(2),
	TrackNumFilter,
	LowercaseFilter,
	RemoveDiacriticsFilter,
)
Functions ¶
func LowercaseFilter ¶
func LowercaseFilter(s string) (string, bool)
LowercaseFilter converts a token to lower case.
func RemoveDiacritics ¶
func RemoveDiacritics(s string) string
RemoveDiacritics removes diacritical marks from the input string, using Unicode NFKD normalization.
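The mark-removal half of this approach can be sketched with the standard library alone. This is only an illustration, not the package's implementation: stripMarks is a hypothetical helper, and it assumes its input is already in decomposed form. The NFKD step itself is not in the standard library and would typically come from golang.org/x/text/unicode/norm.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks is a hypothetical helper (not part of the documented package).
// It drops nonspacing combining marks (Unicode category Mn) from a string
// that is assumed to be decomposed already.
func stripMarks(decomposed string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, decomposed)
}

func main() {
	// "e" followed by U+0301 COMBINING ACUTE ACCENT is the decomposed form of "é".
	fmt.Println(stripMarks("cafe\u0301")) // prints "cafe"
}
```

A precomposed "é" (U+00E9) would pass through this helper unchanged, which is why the normalization step has to run first.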
func RemoveDiacriticsFilter ¶
func RemoveDiacriticsFilter(s string) (string, bool)
RemoveDiacriticsFilter removes diacritics from the token.
func TrackNumFilter ¶
func TrackNumFilter(s string) (string, bool)
TrackNumFilter drops tokens that look like track numbers (small, zero-padded two-digit values), which often end up in song metadata.
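A filter of this kind might look like the sketch below. The exact rule the package uses is not stated here, so two assumptions are made: a "track number" is taken to be exactly two ASCII digits with a leading zero, and the boolean result is taken to mean "keep this token". The name trackNumFilter is hypothetical.

```go
package main

import "fmt"

// trackNumFilter is a hypothetical stand-in for TrackNumFilter.
// Assumption: a token is treated as a track number when it is exactly two
// ASCII digits and the first one is '0' ("01" through "09", plus "00").
// Assumption: the returned bool reports whether the token should be kept.
func trackNumFilter(tok string) (string, bool) {
	if len(tok) == 2 && tok[0] == '0' && tok[1] >= '0' && tok[1] <= '9' {
		return "", false
	}
	return tok, true
}

func main() {
	for _, tok := range []string{"07", "1999", "yesterday"} {
		out, keep := trackNumFilter(tok)
		fmt.Printf("%q -> %q, keep=%v\n", tok, out, keep)
	}
}
```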
func WhitespaceTokenizer ¶
func WhitespaceTokenizer(s string) []string
WhitespaceTokenizer splits a string into alphanumeric tokens.
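One way to get alphanumeric tokens out of a string is to split on every rune that is neither a letter nor a digit. The sketch below does that with strings.FieldsFunc; splitAlnum is a hypothetical name, and the package's own splitting rule may differ (for example in how it treats punctuation inside words).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitAlnum is a hypothetical tokenizer: it treats any rune that is not a
// letter or a digit as a separator and returns the non-empty runs between them.
func splitAlnum(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	fmt.Printf("%q\n", splitAlnum("Hey Jude (Remastered 2015)"))
	// prints ["Hey" "Jude" "Remastered" "2015"]
}
```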
Types ¶
type FilterFunc ¶
func MinLengthFilter ¶
func MinLengthFilter(minLength int) FilterFunc
MinLengthFilter returns a FilterFunc that keeps only tokens at least minLength long.
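A parameterized filter like this is naturally written as a closure over the threshold. The sketch below is an illustration under stated assumptions: filterFunc mirrors the (string, bool) shape the filters in this package share, the boolean is taken to mean "keep", and length is counted in runes (the package may count bytes instead).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// filterFunc mirrors the signature shared by the filters listed in the index.
// Assumption: the bool reports whether the token should be kept.
type filterFunc func(string) (string, bool)

// minLength is a hypothetical stand-in for MinLengthFilter. It captures n in
// a closure and keeps tokens that have at least n runes.
func minLength(n int) filterFunc {
	return func(tok string) (string, bool) {
		return tok, utf8.RuneCountInString(tok) >= n
	}
}

func main() {
	atLeastTwo := minLength(2)
	for _, tok := range []string{"a", "ab", "é"} {
		_, keep := atLeastTwo(tok)
		fmt.Printf("%q keep=%v\n", tok, keep)
	}
}
```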
type Tokenizer ¶
type Tokenizer struct {
	T       TokenizerFunc
	Filters []FilterFunc
}
func NewTokenizer ¶
func NewTokenizer(tokenizer TokenizerFunc, filters ...FilterFunc) *Tokenizer
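To show how a tokenizer function and a list of filters could fit together, here is a self-contained sketch that mirrors the shapes above with lowercase, hypothetical names. The apply method is an assumption: this page does not list the methods of Tokenizer, and the sketch also assumes that filters run in order and that a false result drops the token.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical mirrors of TokenizerFunc, FilterFunc and Tokenizer.
type tokenizerFunc func(string) []string
type filterFunc func(string) (string, bool)

type tokenizer struct {
	T       tokenizerFunc
	Filters []filterFunc
}

func newTokenizer(t tokenizerFunc, filters ...filterFunc) *tokenizer {
	return &tokenizer{T: t, Filters: filters}
}

// apply is an assumed method: it splits s with T, then passes each token
// through the filters in order, dropping it as soon as one returns false.
func (t *tokenizer) apply(s string) []string {
	var out []string
	for _, tok := range t.T(s) {
		keep := true
		for _, f := range t.Filters {
			if tok, keep = f(tok); !keep {
				break
			}
		}
		if keep {
			out = append(out, tok)
		}
	}
	return out
}

// Two small example filters.
func atLeastTwo(tok string) (string, bool) { return tok, len(tok) >= 2 }
func lower(tok string) (string, bool)      { return strings.ToLower(tok), true }

// demo wires strings.Fields to the two filters above.
var demo = newTokenizer(strings.Fields, atLeastTwo, lower)

func main() {
	fmt.Printf("%q\n", demo.apply("A Day In The Life"))
	// prints ["day" "in" "the" "life"]
}
```

Because each filter can both rewrite a token and veto it, the order of the filters matters: a length check placed before a transforming filter sees the original token, not the transformed one.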
type TokenizerFunc ¶