Documentation ¶
Index ¶
- Variables
- func BuildTermFromRunes(runes []rune) []byte
- func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte
- func DeleteRune(in []rune, pos int) []rune
- func InsertRune(in []rune, pos int, r rune) []rune
- func RunesEndsWith(input []rune, suffix string) bool
- func TruncateRunes(input []byte, num int) []byte
- type Analyzer
- type ByteArrayConverter
- type CharFilter
- type DateTimeParser
- type Token
- type TokenFilter
- type TokenFreq
- type TokenFrequencies
- type TokenLocation
- type TokenMap
- type TokenStream
- type TokenType
- type Tokenizer
Variables ¶
var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")
Functions ¶
func BuildTermFromRunes ¶
func BuildTermFromRunesOptimistic ¶
BuildTermFromRunesOptimistic builds a term from the provided runes and optimistically attempts to encode it into the provided buffer. If at any point the buffer appears to be too small, a new buffer is allocated and used instead. Use it when the new term is frequently the same length as, or shorter than, the original term (in number of bytes).
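The optimistic strategy can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: `buildTermOptimistic` is a hypothetical name, and the fallback allocation policy shown here (a plain string conversion) is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// buildTermOptimistic is a hypothetical stand-in that mirrors the documented
// behaviour: encode into buf while it fits, otherwise allocate a new slice.
func buildTermOptimistic(buf []byte, runes []rune) []byte {
	used := 0
	for _, r := range runes {
		size := utf8.RuneLen(r)
		if size < 0 {
			size = 3 // invalid runes are encoded as U+FFFD, which takes 3 bytes
		}
		if used+size > len(buf) {
			// The buffer is too small: give up on reuse and allocate.
			return []byte(string(runes))
		}
		used += utf8.EncodeRune(buf[used:], r)
	}
	return buf[:used]
}

func main() {
	term := []byte("Hello")
	out := buildTermOptimistic(term, []rune("hello"))
	// The second value reports whether the original backing array was reused.
	fmt.Println(string(out), &out[0] == &term[0])
}
```

When the rewritten term fits, the returned slice shares the caller's backing array, so the caller must not rely on the old contents of `buf` afterwards.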
func DeleteRune ¶
func RunesEndsWith ¶
func TruncateRunes ¶
Types ¶
type Analyzer ¶
type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}
func (*Analyzer) Analyze ¶
func (a *Analyzer) Analyze(input []byte) TokenStream
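The field order of Analyzer suggests a three-stage pipeline. The sketch below uses local lowercase types rather than the package's own, and it assumes two things the listing does not state: that stages run in the order character filters, tokenizer, token filters, and that a CharFilter exposes a `Filter([]byte) []byte` method. The three toy components are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

type token struct{ Term []byte }
type tokenStream []*token

// Local stand-ins for the package interfaces; charFilter's method is assumed.
type charFilter interface{ Filter([]byte) []byte }
type tokenizer interface{ Tokenize([]byte) tokenStream }
type tokenFilter interface{ Filter(tokenStream) tokenStream }

type analyzer struct {
	CharFilters  []charFilter
	Tokenizer    tokenizer
	TokenFilters []tokenFilter
}

// analyze runs the assumed pipeline: char filters, tokenizer, token filters.
func (a *analyzer) analyze(input []byte) tokenStream {
	for _, cf := range a.CharFilters {
		input = cf.Filter(input)
	}
	tokens := a.Tokenizer.Tokenize(input)
	for _, tf := range a.TokenFilters {
		tokens = tf.Filter(tokens)
	}
	return tokens
}

// dashToSpace is a toy char filter that replaces '-' with a space.
type dashToSpace struct{}

func (dashToSpace) Filter(in []byte) []byte {
	return bytes.ReplaceAll(in, []byte("-"), []byte(" "))
}

// fieldsTokenizer is a toy tokenizer that splits on whitespace.
type fieldsTokenizer struct{}

func (fieldsTokenizer) Tokenize(in []byte) tokenStream {
	var ts tokenStream
	for _, f := range bytes.Fields(in) {
		ts = append(ts, &token{Term: f})
	}
	return ts
}

// lowerFilter is a toy token filter that lower-cases every term.
type lowerFilter struct{}

func (lowerFilter) Filter(ts tokenStream) tokenStream {
	for _, t := range ts {
		t.Term = bytes.ToLower(t.Term)
	}
	return ts
}

func main() {
	a := &analyzer{
		CharFilters:  []charFilter{dashToSpace{}},
		Tokenizer:    fieldsTokenizer{},
		TokenFilters: []tokenFilter{lowerFilter{}},
	}
	for _, t := range a.analyze([]byte("Full-Text Search")) {
		fmt.Println(string(t.Term))
	}
}
```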
type ByteArrayConverter ¶
type CharFilter ¶
type Token ¶
type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int `json:"start"`

	// End specifies the byte offset of the end of the term in the field.
	End  int    `json:"end"`
	Term []byte `json:"term"`

	// Position specifies the 1-based index of the token in the sequence of
	// occurrences of its term in the field.
	Position int       `json:"position"`
	Type     TokenType `json:"type"`
	KeyWord  bool      `json:"keyword"`
}
Token represents one occurrence of a term at a particular location in a field.
type TokenFilter ¶
type TokenFilter interface {
Filter(TokenStream) TokenStream
}
A TokenFilter adds, transforms or removes tokens from a token stream.
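As an illustration of the "removes tokens" case, here is a minimal sketch of a filter that drops short terms. It declares local copies of the types so it compiles on its own; `minLengthFilter` is a hypothetical example, not a filter shipped with the package.

```go
package main

import "fmt"

type token struct {
	Term     []byte
	Position int
}

type tokenStream []*token

// tokenFilter mirrors the interface shown above.
type tokenFilter interface {
	Filter(tokenStream) tokenStream
}

// minLengthFilter drops tokens whose term is shorter than min bytes.
type minLengthFilter struct{ min int }

func (f minLengthFilter) Filter(input tokenStream) tokenStream {
	// Reuse the input's backing array; surviving tokens keep their order.
	rv := input[:0]
	for _, t := range input {
		if len(t.Term) >= f.min {
			rv = append(rv, t)
		}
	}
	return rv
}

func main() {
	var f tokenFilter = minLengthFilter{min: 3}
	in := tokenStream{
		&token{Term: []byte("a"), Position: 1},
		&token{Term: []byte("quick"), Position: 2},
		&token{Term: []byte("fox"), Position: 3},
	}
	for _, t := range f.Filter(in) {
		fmt.Println(string(t.Term), t.Position)
	}
}
```

Surviving tokens keep their original Position values in this sketch, so the gap left by a removed token remains visible to later stages.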
type TokenFreq ¶
type TokenFreq struct {
	Term      []byte
	Locations []*TokenLocation
	// contains filtered or unexported fields
}
TokenFreq represents all the occurrences of a term in all fields of a document.
type TokenFrequencies ¶
TokenFrequencies maps document terms to their combined frequencies from all fields.
func TokenFrequency ¶
func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies
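The grouping idea behind a term-frequency map can be sketched as follows. This is a simplified illustration with local types and a hypothetical `tokenFrequency` helper; it ignores `arrayPositions` and `includeTermVectors`, whose exact semantics are not described in this listing, and always records one location per occurrence.

```go
package main

import "fmt"

type token struct {
	Start, End, Position int
	Term                 []byte
}

type location struct {
	Start, End, Position int
}

type termFreq struct {
	Term      []byte
	Locations []*location
}

// tokenFrequency groups tokens by term and records where each one occurred.
func tokenFrequency(tokens []*token) map[string]*termFreq {
	rv := make(map[string]*termFreq)
	for _, t := range tokens {
		key := string(t.Term)
		tf, ok := rv[key]
		if !ok {
			tf = &termFreq{Term: t.Term}
			rv[key] = tf
		}
		tf.Locations = append(tf.Locations, &location{
			Start:    t.Start,
			End:      t.End,
			Position: t.Position,
		})
	}
	return rv
}

func main() {
	tokens := []*token{
		{Start: 0, End: 2, Position: 1, Term: []byte("to")},
		{Start: 3, End: 5, Position: 2, Term: []byte("be")},
		{Start: 6, End: 8, Position: 3, Term: []byte("or")},
		{Start: 9, End: 11, Position: 4, Term: []byte("to")},
	}
	for term, tf := range tokenFrequency(tokens) {
		fmt.Println(term, len(tf.Locations))
	}
}
```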
func (TokenFrequencies) MergeAll ¶
func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)
func (TokenFrequencies) Size ¶
func (tfs TokenFrequencies) Size() int
type TokenLocation ¶
TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.
func (*TokenLocation) Size ¶
func (tl *TokenLocation) Size() int
type TokenMap ¶
func NewTokenMap ¶
func NewTokenMap() TokenMap
func (TokenMap) LoadBytes ¶
LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`
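The line format described above can be parsed roughly like this. The sketch is self-contained and makes several assumptions: the `tokenMap` type is modelled as a `map[string]bool`, comment markers may also appear after a token on the same line, and surrounding whitespace is trimmed. `loadTokens` is a hypothetical helper, not the package's method.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// tokenMap is an assumed representation: a set of tokens keyed by string.
type tokenMap map[string]bool

// loadTokens reads one token per line, skipping blank lines and dropping
// anything from the first '#' or '|' to the end of the line.
func loadTokens(data []byte) (tokenMap, error) {
	tm := tokenMap{}
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexAny(line, "#|"); i >= 0 {
			line = line[:i]
		}
		line = strings.TrimSpace(line)
		if line != "" {
			tm[line] = true
		}
	}
	return tm, sc.Err()
}

func main() {
	data := []byte("# stop words\nthe\nand | conjunction\n\n  of  \n")
	tm, err := loadTokens(data)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println(len(tm), tm["the"], tm["and"], tm["of"])
}
```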
type TokenStream ¶
type TokenStream []*Token
type Tokenizer ¶
type Tokenizer interface {
Tokenize([]byte) TokenStream
}
A Tokenizer splits an input string into tokens, the usual behaviour being to map words to tokens.
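A minimal whitespace tokenizer shows how Start, End, and Position might be filled in. The types here are local copies, `whitespaceTokenizer` is a hypothetical example, and Position is assigned as a 1-based running count of tokens, which is one reading of the field comment on Token.

```go
package main

import "fmt"

type token struct {
	Start, End, Position int
	Term                 []byte
}

type tokenStream []*token

// whitespaceTokenizer splits on ASCII whitespace only.
type whitespaceTokenizer struct{}

func isSpace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\n' || b == '\r'
}

func (whitespaceTokenizer) Tokenize(input []byte) tokenStream {
	var ts tokenStream
	start := -1 // byte offset where the current word began, or -1 if none
	pos := 0
	for i := 0; i <= len(input); i++ {
		if i < len(input) && !isSpace(input[i]) {
			if start < 0 {
				start = i
			}
			continue
		}
		// Reached whitespace or the end of input: close any open word.
		if start >= 0 {
			pos++
			ts = append(ts, &token{
				Start:    start,
				End:      i,
				Term:     input[start:i],
				Position: pos,
			})
			start = -1
		}
	}
	return ts
}

func main() {
	ts := whitespaceTokenizer{}.Tokenize([]byte("hello  wide world"))
	for _, t := range ts {
		fmt.Println(string(t.Term), t.Start, t.End, t.Position)
	}
}
```

The terms are sub-slices of the input rather than copies, so a later filter that edits a term in place would also modify the caller's buffer.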
Directories ¶
Path | Synopsis |
---|---|
analyzer | |
char | |
datetime | |
lang | |
en | Package en implements an analyzer with reasonable defaults for processing English text. |
es | Copyright (c) 2017 Couchbase, Inc. |
token | |
lowercase | Package lowercase implements a TokenFilter which converts tokens to lower case according to Unicode rules. |
stop | Package stop implements a TokenFilter removing tokens found in a TokenMap. |
tokenizer | |
exception | Package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then inserts the extracted parts back into the token stream. |
token_map | Package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens. |