Documentation ¶
Index ¶
- Variables
- func BuildTermFromRunes(runes []rune) []byte
- func DeleteRune(in []rune, pos int) []rune
- func InsertRune(in []rune, pos int, r rune) []rune
- func RunesEndsWith(input []rune, suffix string) bool
- func TruncateRunes(input []byte, num int) []byte
- type Analyzer
- type ByteArrayConverter
- type CharFilter
- type DateTimeParser
- type Token
- type TokenFilter
- type TokenFreq
- type TokenFrequencies
- type TokenLocation
- type TokenMap
- type TokenStream
- type TokenType
- type Tokenizer
Constants ¶
This section is empty.
Variables ¶
var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")
Functions ¶
func BuildTermFromRunes ¶
func DeleteRune ¶
func InsertRune ¶
func RunesEndsWith ¶
func TruncateRunes ¶
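Going by their signatures, the rune helpers can be sketched roughly as follows. This is an illustrative reimplementation based only on the names and signatures above, not the package's actual code:

```go
package main

import "fmt"

// InsertRune returns a copy of in with r inserted at index pos.
func InsertRune(in []rune, pos int, r rune) []rune {
	out := make([]rune, 0, len(in)+1)
	out = append(out, in[:pos]...)
	out = append(out, r)
	out = append(out, in[pos:]...)
	return out
}

// DeleteRune returns a copy of in with the rune at index pos removed.
func DeleteRune(in []rune, pos int) []rune {
	out := make([]rune, 0, len(in)-1)
	out = append(out, in[:pos]...)
	out = append(out, in[pos+1:]...)
	return out
}

// BuildTermFromRunes encodes a slice of runes back into UTF-8 bytes.
func BuildTermFromRunes(runes []rune) []byte {
	return []byte(string(runes))
}

func main() {
	term := []rune("cafe")
	term = InsertRune(term, 3, 'f') // "caffe"
	term = DeleteRune(term, 4)      // "caff"
	fmt.Println(string(BuildTermFromRunes(term)))
}
```

Working on `[]rune` rather than `[]byte` keeps positions valid for multi-byte UTF-8 characters; `BuildTermFromRunes` converts back to bytes once editing is done.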
Types ¶
type Analyzer ¶
type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}
func (*Analyzer) Analyze ¶
func (a *Analyzer) Analyze(input []byte) TokenStream
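Given the Analyzer struct's fields, the pipeline presumably runs char filters over the raw input, tokenizes the result, and then applies token filters in order. The sketch below uses minimal local stand-ins for the package's types to illustrate that flow; the component implementations are hypothetical:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Minimal stand-ins for the package's types, for illustration only.
type Token struct{ Term []byte }
type TokenStream []*Token

type CharFilter interface{ Filter([]byte) []byte }
type Tokenizer interface{ Tokenize([]byte) TokenStream }
type TokenFilter interface{ Filter(TokenStream) TokenStream }

type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

// Analyze runs the pipeline: char filters, then the tokenizer,
// then the token filters, in order.
func (a *Analyzer) Analyze(input []byte) TokenStream {
	for _, cf := range a.CharFilters {
		input = cf.Filter(input)
	}
	ts := a.Tokenizer.Tokenize(input)
	for _, tf := range a.TokenFilters {
		ts = tf.Filter(ts)
	}
	return ts
}

// stripPunct is a toy CharFilter that replaces commas with spaces.
type stripPunct struct{}

func (stripPunct) Filter(b []byte) []byte {
	return bytes.ReplaceAll(b, []byte(","), []byte(" "))
}

// whitespaceTokenizer is a toy Tokenizer splitting on whitespace.
type whitespaceTokenizer struct{}

func (whitespaceTokenizer) Tokenize(b []byte) TokenStream {
	var ts TokenStream
	for _, w := range strings.Fields(string(b)) {
		ts = append(ts, &Token{Term: []byte(w)})
	}
	return ts
}

// lowerCaseFilter is a toy TokenFilter lower-casing each term.
type lowerCaseFilter struct{}

func (lowerCaseFilter) Filter(ts TokenStream) TokenStream {
	for _, t := range ts {
		t.Term = bytes.ToLower(t.Term)
	}
	return ts
}

func main() {
	a := &Analyzer{
		CharFilters:  []CharFilter{stripPunct{}},
		Tokenizer:    whitespaceTokenizer{},
		TokenFilters: []TokenFilter{lowerCaseFilter{}},
	}
	for _, t := range a.Analyze([]byte("Hello, World")) {
		fmt.Println(string(t.Term))
	}
}
```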
type ByteArrayConverter ¶
type CharFilter ¶
type DateTimeParser ¶
type Token ¶
type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int `json:"start"`

	// End specifies the byte offset of the end of the term in the field.
	End int `json:"end"`

	Term []byte `json:"term"`

	// Position specifies the 1-based index of the token in the sequence of
	// occurrences of its term in the field.
	Position int `json:"position"`

	Type    TokenType `json:"type"`
	KeyWord bool      `json:"keyword"`
}
Token represents one occurrence of a term at a particular location in a field.
type TokenFilter ¶
type TokenFilter interface {
Filter(TokenStream) TokenStream
}
A TokenFilter adds, transforms or removes tokens from a token stream.
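Satisfying the interface only requires a `Filter(TokenStream) TokenStream` method. Here is a minimal, hypothetical filter that removes short tokens, using local stand-ins for the package's Token and TokenStream types:

```go
package main

import "fmt"

// Minimal stand-ins for the package's Token and TokenStream types.
type Token struct{ Term []byte }
type TokenStream []*Token

// minLengthFilter is a hypothetical TokenFilter that drops tokens
// whose terms are shorter than min bytes.
type minLengthFilter struct{ min int }

func (f minLengthFilter) Filter(in TokenStream) TokenStream {
	out := in[:0] // reuse the backing array
	for _, t := range in {
		if len(t.Term) >= f.min {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	in := TokenStream{{Term: []byte("a")}, {Term: []byte("quick")}, {Term: []byte("fox")}}
	f := minLengthFilter{min: 3}
	for _, t := range f.Filter(in) {
		fmt.Println(string(t.Term))
	}
}
```

Because a filter returns a TokenStream, filters compose naturally: the output of one becomes the input of the next, which is how an Analyzer chains its TokenFilters.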
type TokenFreq ¶
type TokenFreq struct {
	Term      []byte
	Locations []*TokenLocation
	// contains filtered or unexported fields
}
TokenFreq represents all the occurrences of a term in all fields of a document.
type TokenFrequencies ¶
TokenFrequencies maps document terms to their combined frequencies from all fields.
func TokenFrequency ¶
func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies
func (TokenFrequencies) MergeAll ¶
func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)
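At its core, computing token frequencies means counting occurrences of each term across a stream. The sketch below shows that idea with a plain map and a local stand-in Token type; the real TokenFrequency also tracks locations and array positions, which are omitted here:

```go
package main

import "fmt"

// Minimal stand-ins for the package's types.
type Token struct{ Term []byte }
type TokenStream []*Token

// tokenFrequency is an illustrative sketch of the counting that a
// function like TokenFrequency performs: per-term occurrence counts.
func tokenFrequency(ts TokenStream) map[string]int {
	freqs := make(map[string]int)
	for _, t := range ts {
		freqs[string(t.Term)]++
	}
	return freqs
}

func main() {
	ts := TokenStream{
		{Term: []byte("to")}, {Term: []byte("be")}, {Term: []byte("or")},
		{Term: []byte("not")}, {Term: []byte("to")}, {Term: []byte("be")},
	}
	fmt.Println(tokenFrequency(ts)["to"])
}
```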
type TokenLocation ¶
TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.
type TokenMap ¶
func NewTokenMap ¶
func NewTokenMap() TokenMap
func (TokenMap) LoadBytes ¶
LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`.
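The documented format can be parsed roughly as below. This is an illustrative sketch, and treating only lines that *begin* with `#` or `|` as comments is an assumption about the exact rule:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// TokenMap is a stand-in for the package's type: a set of terms.
type TokenMap map[string]bool

// loadBytes reads one token per line, skipping blank lines and lines
// starting with '#' or '|' (the exact comment rule is an assumption).
func loadBytes(tm TokenMap, data []byte) error {
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, "|") {
			continue
		}
		tm[line] = true
	}
	return sc.Err()
}

func main() {
	tm := TokenMap{}
	loadBytes(tm, []byte("# English stop words\nthe\nand\n| more below\nof\n"))
	fmt.Println(len(tm), tm["the"])
}
```

This is the shape of file that stop_tokens_filter consumes: a TokenMap loaded from a stop-word list, then used to drop matching tokens from the stream.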
type TokenStream ¶
type TokenStream []*Token
type TokenType ¶
type Tokenizer ¶
type Tokenizer interface {
Tokenize([]byte) TokenStream
}
A Tokenizer splits its input into a stream of tokens; typically, each word in the input becomes one token.
Directories ¶
Path | Synopsis
---|---
analyzers |
byte_array_converters |
char_filters |
datetime_parsers |
language |
language/en | Package en implements an analyzer with reasonable defaults for processing English text.
token_filters |
token_filters/lower_case_filter | Package lower_case_filter implements a TokenFilter which converts tokens to lower case according to unicode rules.
token_filters/stop_tokens_filter | Package stop_tokens_filter implements a TokenFilter removing tokens found in a TokenMap.
token_map | Package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens.
tokenizers |
tokenizers/exception | Package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then inserts the extracted parts back into the token stream.