Documentation ¶
Index ¶
- func BuildTermFromRunes(runes []rune) []byte
- func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte
- func DeleteRune(in []rune, pos int) []rune
- func InsertRune(in []rune, pos int, r rune) []rune
- func RunesEndsWith(input []rune, suffix string) bool
- func TruncateRunes(input []byte, num int) []byte
- type Analyzer
- type CharFilter
- type Token
- type TokenFilter
- type TokenFreq
- type TokenFrequencies
- type TokenLocation
- type TokenMap
- type TokenStream
- type TokenType
- type Tokenizer
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BuildTermFromRunes ¶
func BuildTermFromRunes(runes []rune) []byte
func BuildTermFromRunesOptimistic ¶
func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte
BuildTermFromRunesOptimistic builds a term from the provided runes AND optimistically attempts to encode it into the provided buffer. If at any point it appears the buffer is too small, a new buffer is allocated and used instead. This should be used in cases where the new term is frequently the same length as, or shorter than, the original term (in number of bytes).
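As a hedged sketch of the intended pattern (the import path is an assumption, not shown on this page): a lower-casing helper passes the original term as the scratch buffer, since lower-casing rarely grows the UTF-8 byte length.

package main

import (
	"fmt"
	"unicode"

	"github.com/blevesearch/bleve/v2/analysis" // import path assumed
)

// lowerTerm lower-cases a term, reusing the term's backing buffer when
// the result fits. Lower-casing rarely grows the UTF-8 byte length,
// which is exactly the case BuildTermFromRunesOptimistic targets.
func lowerTerm(term []byte) []byte {
	runes := []rune(string(term))
	for i, r := range runes {
		runes[i] = unicode.ToLower(r)
	}
	return analysis.BuildTermFromRunesOptimistic(term, runes)
}

func main() {
	fmt.Printf("%s\n", lowerTerm([]byte("HELLO"))) // prints: hello
}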
func DeleteRune ¶
func DeleteRune(in []rune, pos int) []rune
func InsertRune ¶
func InsertRune(in []rune, pos int, r rune) []rune
func RunesEndsWith ¶
func RunesEndsWith(input []rune, suffix string) bool
func TruncateRunes ¶
func TruncateRunes(input []byte, num int) []byte
Types ¶
type Analyzer ¶
type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}
func (*Analyzer) Analyze ¶
func (a *Analyzer) Analyze(input []byte) TokenStream
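As an end-to-end sketch, the following wires an Analyzer by hand using this module's unicode tokenizer and lowercase filter subpackages; those import paths and constructor names are assumptions based on this package's directory listing, not shown on this page.

package main

import (
	"fmt"

	"github.com/blevesearch/bleve/v2/analysis"                    // import path assumed
	"github.com/blevesearch/bleve/v2/analysis/token/lowercase"    // import path assumed
	"github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"  // import path assumed
)

func main() {
	// Wire the three stages by hand: no char filters, a word
	// tokenizer, then a lower-casing token filter.
	a := &analysis.Analyzer{
		Tokenizer:    unicode.NewUnicodeTokenizer(),
		TokenFilters: []analysis.TokenFilter{lowercase.NewLowerCaseFilter()},
	}
	for _, tok := range a.Analyze([]byte("Quick BROWN Fox")) {
		fmt.Printf("%s [%d:%d]\n", tok.Term, tok.Start, tok.End)
	}
}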
type CharFilter ¶
type CharFilter interface {
	Filter([]byte) []byte
}
type Token ¶
type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int
	// End specifies the byte offset of the end of the term in the field.
	End  int
	Term []byte
	// PositionIncr specifies the position of this token relative to the previous.
	PositionIncr int
	Type         TokenType
	KeyWord      bool
}
Token represents one occurrence of a term at a particular location in a field.
type TokenFilter ¶
type TokenFilter interface {
Filter(TokenStream) TokenStream
}
A TokenFilter adds, transforms or removes tokens from a token stream.
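As a sketch of implementing this interface, here is a hypothetical minLengthFilter (not part of this package; the import path is an assumption) that drops tokens shorter than a minimum byte length.

package main

import (
	"fmt"

	"github.com/blevesearch/bleve/v2/analysis" // import path assumed
)

// minLengthFilter drops tokens whose term is shorter than min bytes.
// A production filter would also fold the removed tokens' PositionIncr
// into the next surviving token.
type minLengthFilter struct{ min int }

func (f minLengthFilter) Filter(in analysis.TokenStream) analysis.TokenStream {
	out := in[:0] // filter in place, reusing the input slice
	for _, tok := range in {
		if len(tok.Term) >= f.min {
			out = append(out, tok)
		}
	}
	return out
}

// Compile-time check that the interface is satisfied.
var _ analysis.TokenFilter = minLengthFilter{}

func main() {
	in := analysis.TokenStream{
		{Term: []byte("a"), PositionIncr: 1},
		{Term: []byte("fox"), PositionIncr: 1},
	}
	for _, tok := range (minLengthFilter{min: 2}).Filter(in) {
		fmt.Printf("%s\n", tok.Term) // prints: fox
	}
}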
type TokenFreq ¶
type TokenFreq struct {
	TermVal   []byte
	Locations []*TokenLocation
	// contains filtered or unexported fields
}
TokenFreq represents all the occurrences of a term in all fields of a document.
func (*TokenFreq) EachLocation ¶
func (tf *TokenFreq) EachLocation(location segment.VisitLocation)
type TokenFrequencies ¶
type TokenFrequencies map[string]*TokenFreq
TokenFrequencies maps document terms to their combined frequencies from all fields.
func TokenFrequency ¶
func TokenFrequency(tokens TokenStream, includeTermVectors bool, startOffset int) (tokenFreqs TokenFrequencies, position int)
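A short sketch of building frequencies from a hand-assembled stream. The import path is an assumption, and the map iteration relies on TokenFrequencies being keyed by term, as its description says.

package main

import (
	"fmt"

	"github.com/blevesearch/bleve/v2/analysis" // import path assumed
)

func main() {
	// A hand-built stream standing in for a tokenizer's output.
	tokens := analysis.TokenStream{
		{Term: []byte("fox"), Start: 0, End: 3, PositionIncr: 1},
		{Term: []byte("fox"), Start: 4, End: 7, PositionIncr: 1},
	}
	// includeTermVectors=true records per-occurrence locations;
	// startOffset=0 numbers positions from the start of the field.
	freqs, lastPos := analysis.TokenFrequency(tokens, true, 0)
	fmt.Println("last position:", lastPos)
	for term, tf := range freqs {
		fmt.Printf("%s occurs %d time(s)\n", term, len(tf.Locations))
	}
}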
func (TokenFrequencies) MergeAll ¶
func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)
func (TokenFrequencies) MergeOneBytes ¶
func (tfs TokenFrequencies) MergeOneBytes(remoteField string, tfk []byte, tf *TokenFreq)
func (TokenFrequencies) Size ¶
func (tfs TokenFrequencies) Size() int
type TokenLocation ¶
TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.
func (*TokenLocation) End ¶
func (tl *TokenLocation) End() int
func (*TokenLocation) Field ¶
func (tl *TokenLocation) Field() string
func (*TokenLocation) Pos ¶
func (tl *TokenLocation) Pos() int
func (*TokenLocation) Size ¶
func (tl *TokenLocation) Size() int
func (*TokenLocation) Start ¶
func (tl *TokenLocation) Start() int
type TokenMap ¶
type TokenMap map[string]bool
func NewTokenMap ¶
func NewTokenMap() TokenMap
func (TokenMap) LoadBytes ¶
func (t TokenMap) LoadBytes(data []byte) error
LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`.
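A short sketch of building a token map from an in-memory list; the import path is an assumption, and the len() call relies on TokenMap's map declaration above.

package main

import (
	"fmt"

	"github.com/blevesearch/bleve/v2/analysis" // import path assumed
)

func main() {
	tm := analysis.NewTokenMap()
	// One token per line; lines starting with # or | are comments.
	data := []byte("# common stop words\na\nan\nthe\n")
	if err := tm.LoadBytes(data); err != nil {
		panic(err)
	}
	fmt.Println(len(tm), "tokens loaded") // prints: 3 tokens loaded
}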
type TokenStream ¶
type TokenStream []*Token
type Tokenizer ¶
type Tokenizer interface {
Tokenize([]byte) TokenStream
}
A Tokenizer splits an input string into tokens, the usual behavior being to map words to tokens.
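Implementing the interface directly is straightforward. Below is a toy whitespace tokenizer as a sketch (spaceTokenizer is illustrative, not part of the package; the import path is an assumption).

package main

import (
	"bytes"
	"fmt"

	"github.com/blevesearch/bleve/v2/analysis" // import path assumed
)

// spaceTokenizer splits the input on single spaces and records the
// byte offsets of each token.
type spaceTokenizer struct{}

func (spaceTokenizer) Tokenize(input []byte) analysis.TokenStream {
	var stream analysis.TokenStream
	start := 0
	for _, word := range bytes.Split(input, []byte(" ")) {
		if len(word) > 0 {
			stream = append(stream, &analysis.Token{
				Term:         word,
				Start:        start,
				End:          start + len(word),
				PositionIncr: 1,
			})
		}
		start += len(word) + 1
	}
	return stream
}

func main() {
	for _, tok := range (spaceTokenizer{}).Tokenize([]byte("quick brown fox")) {
		fmt.Printf("%s [%d:%d]\n", tok.Term, tok.Start, tok.End)
	}
}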
Directories ¶
Path | Synopsis
---|---
lang |
lang/en | Package en implements an analyzer with reasonable defaults for processing English text.
lowercase | Package lowercase implements a TokenFilter which converts tokens to lower case according to unicode rules.