analysis

package

v0.8.1 Published: Sep 20, 2019 License: Apache-2.0 Imports: 10 Imported by: 0

Repository

github.com/nicolai86/bleve

Documentation ¶

Index ¶

Variables
func BuildTermFromRunes(runes []rune) []byte
func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte
func DeleteRune(in []rune, pos int) []rune
func InsertRune(in []rune, pos int, r rune) []rune
func RunesEndsWith(input []rune, suffix string) bool
func TruncateRunes(input []byte, num int) []byte
type Analyzer
- func (a *Analyzer) Analyze(input []byte) TokenStream
type ByteArrayConverter
type CharFilter
type DateTimeParser
type Token
- func (t *Token) String() string
type TokenFilter
type TokenFreq
- func (tf *TokenFreq) Frequency() int
- func (tf *TokenFreq) Size() int
type TokenFrequencies
- func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies
- func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)
- func (tfs TokenFrequencies) Size() int
type TokenLocation
- func (tl *TokenLocation) Size() int
type TokenMap
- func NewTokenMap() TokenMap
- func (t TokenMap) AddToken(token string)
- func (t TokenMap) LoadBytes(data []byte) error
- func (t TokenMap) LoadFile(filename string) error
- func (t TokenMap) LoadLine(line string)
type TokenStream
type TokenType
type Tokenizer

Constants ¶

This section is empty.

Variables ¶

var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")

Functions ¶

func BuildTermFromRunes ¶

func BuildTermFromRunes(runes []rune) []byte

func BuildTermFromRunesOptimistic ¶

func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte

BuildTermFromRunesOptimistic builds a term from the provided runes and optimistically attempts to encode it into the provided buffer. If at any point the buffer appears too small, a new buffer is allocated and used instead. Use this when the new term is frequently the same length as, or shorter than, the original term (in number of bytes).
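The buffer-reuse pattern described above can be sketched as follows. This is a minimal illustration using only the standard library, not the package's source; buildTermOptimistic is a hypothetical name, and the fallback sizing is an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// buildTermOptimistic is a hypothetical stand-in for the pattern described
// in the documentation. It encodes runes into buf in place and falls back
// to a freshly allocated slice as soon as buf turns out to be too small.
func buildTermOptimistic(buf []byte, runes []rune) []byte {
	used := 0
	for i, r := range runes {
		n := utf8.RuneLen(r)
		if n < 0 {
			n = 3 // invalid runes are encoded as U+FFFD, which is 3 bytes
		}
		if used+n > len(buf) {
			// Buffer too small: copy what was written so far into a new
			// slice and encode the remaining runes there.
			fresh := make([]byte, used, used+(len(runes)-i)*utf8.UTFMax)
			copy(fresh, buf[:used])
			var tmp [utf8.UTFMax]byte
			for _, rest := range runes[i:] {
				m := utf8.EncodeRune(tmp[:], rest)
				fresh = append(fresh, tmp[:m]...)
			}
			return fresh
		}
		used += utf8.EncodeRune(buf[used:], r)
	}
	return buf[:used]
}

func main() {
	buf := []byte("Häuser")
	term := buildTermOptimistic(buf, []rune("haus"))
	// The result shares memory with buf because the new term is shorter.
	fmt.Printf("%s reused=%t\n", term, &term[0] == &buf[0])
}
```

Because the returned slice may alias buf, a caller following this pattern should treat the original buffer contents as overwritten.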

func DeleteRune ¶

func DeleteRune(in []rune, pos int) []rune

func InsertRune ¶

func InsertRune(in []rune, pos int, r rune) []rune

func RunesEndsWith ¶

func RunesEndsWith(input []rune, suffix string) bool

func TruncateRunes ¶

func TruncateRunes(input []byte, num int) []byte

Types ¶

type Analyzer ¶

type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

func (*Analyzer) Analyze ¶

func (a *Analyzer) Analyze(input []byte) TokenStream
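The page does not document Analyze. The sketch below assumes the usual pipeline for a struct of this shape: each CharFilter runs in order on the raw bytes, the Tokenizer splits the result, and each TokenFilter runs in order on the stream. All types are local, trimmed-down mirrors of the declarations on this page, and dashToSpace, fieldsTokenizer and lowerCase are hypothetical components.

```go
package main

import (
	"bytes"
	"fmt"
)

// Local mirrors of the package types, reduced to what the sketch needs.
type Token struct{ Term []byte }
type TokenStream []*Token

type CharFilter interface{ Filter([]byte) []byte }
type Tokenizer interface{ Tokenize([]byte) TokenStream }
type TokenFilter interface{ Filter(TokenStream) TokenStream }

type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

// Analyze applies the three stages in order (assumed behaviour).
func (a *Analyzer) Analyze(input []byte) TokenStream {
	for _, cf := range a.CharFilters {
		input = cf.Filter(input)
	}
	tokens := a.Tokenizer.Tokenize(input)
	for _, tf := range a.TokenFilters {
		tokens = tf.Filter(tokens)
	}
	return tokens
}

// dashToSpace is a hypothetical CharFilter replacing '-' with ' '.
type dashToSpace struct{}

func (dashToSpace) Filter(in []byte) []byte {
	return bytes.ReplaceAll(in, []byte("-"), []byte(" "))
}

// fieldsTokenizer is a hypothetical Tokenizer splitting on whitespace.
type fieldsTokenizer struct{}

func (fieldsTokenizer) Tokenize(in []byte) TokenStream {
	var out TokenStream
	for _, f := range bytes.Fields(in) {
		out = append(out, &Token{Term: f})
	}
	return out
}

// lowerCase is a hypothetical TokenFilter lowercasing every term.
type lowerCase struct{}

func (lowerCase) Filter(in TokenStream) TokenStream {
	for _, tok := range in {
		tok.Term = bytes.ToLower(tok.Term)
	}
	return in
}

func main() {
	a := &Analyzer{
		CharFilters:  []CharFilter{dashToSpace{}},
		Tokenizer:    fieldsTokenizer{},
		TokenFilters: []TokenFilter{lowerCase{}},
	}
	for _, tok := range a.Analyze([]byte("Full-Text Search")) {
		fmt.Println(string(tok.Term))
	}
}
```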

type ByteArrayConverter ¶

type ByteArrayConverter interface {
	Convert([]byte) (interface{}, error)
}

type CharFilter ¶

type CharFilter interface {
	Filter([]byte) []byte
}

type DateTimeParser ¶

type DateTimeParser interface {
	ParseDateTime(string) (time.Time, error)
}

type Token ¶

type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int `json:"start"`

	// End specifies the byte offset of the end of the term in the field.
	End  int    `json:"end"`
	Term []byte `json:"term"`

	// Position specifies the 1-based index of the token in the sequence of
	// occurrences of its term in the field.
	Position int       `json:"position"`
	Type     TokenType `json:"type"`
	KeyWord  bool      `json:"keyword"`
}

Token represents one occurrence of a term at a particular location in a field.

func (*Token) String ¶

func (t *Token) String() string

type TokenFilter ¶

type TokenFilter interface {
	Filter(TokenStream) TokenStream
}

A TokenFilter adds, transforms or removes tokens from a token stream.
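As an illustration of the "removes tokens" case, here is a minimal sketch of a filter that drops every token whose term appears in a TokenMap. The types are local mirrors of the declarations on this page, and stopFilter is a hypothetical name; it is not the implementation found in the token/stop directory.

```go
package main

import "fmt"

// Local mirrors of the package types, reduced to what the sketch needs.
type Token struct{ Term []byte }
type TokenStream []*Token
type TokenMap map[string]bool

type TokenFilter interface {
	Filter(TokenStream) TokenStream
}

// stopFilter is a hypothetical TokenFilter that removes tokens whose
// term is present in its TokenMap.
type stopFilter struct{ stop TokenMap }

func (f stopFilter) Filter(in TokenStream) TokenStream {
	out := make(TokenStream, 0, len(in))
	for _, tok := range in {
		if !f.stop[string(tok.Term)] {
			out = append(out, tok)
		}
	}
	return out
}

func main() {
	var f TokenFilter = stopFilter{stop: TokenMap{"the": true, "a": true}}
	in := TokenStream{
		{Term: []byte("the")},
		{Term: []byte("quick")},
		{Term: []byte("fox")},
	}
	for _, tok := range f.Filter(in) {
		fmt.Println(string(tok.Term))
	}
}
```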

type TokenFreq ¶

type TokenFreq struct {
	Term      []byte
	Locations []*TokenLocation
	// contains filtered or unexported fields
}

TokenFreq represents all the occurrences of a term in all fields of a document.

func (*TokenFreq) Frequency ¶

func (tf *TokenFreq) Frequency() int

func (*TokenFreq) Size ¶

func (tf *TokenFreq) Size() int

type TokenFrequencies ¶

type TokenFrequencies map[string]*TokenFreq

TokenFrequencies maps document terms to their combined frequencies from all fields.

func TokenFrequency ¶

func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies
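TokenFrequency is not documented on this page. The sketch below shows only the grouping idea suggested by the type names: tokens are grouped by term, and one location is recorded per occurrence. The types are local mirrors, countTerms is a hypothetical helper, and the arrayPositions and includeTermVectors parameters are left out because their exact effect is not stated here.

```go
package main

import "fmt"

// Local mirrors of the package types, reduced to what the sketch needs.
type Token struct {
	Start, End, Position int
	Term                 []byte
}
type TokenStream []*Token

type TokenLocation struct{ Start, End, Position int }

type TokenFreq struct {
	Term      []byte
	Locations []*TokenLocation
}

type TokenFrequencies map[string]*TokenFreq

// countTerms is a hypothetical helper that groups a token stream by term
// and records one location per occurrence.
func countTerms(tokens TokenStream) TokenFrequencies {
	out := make(TokenFrequencies)
	for _, tok := range tokens {
		key := string(tok.Term)
		tf, ok := out[key]
		if !ok {
			tf = &TokenFreq{Term: tok.Term}
			out[key] = tf
		}
		tf.Locations = append(tf.Locations, &TokenLocation{
			Start:    tok.Start,
			End:      tok.End,
			Position: tok.Position,
		})
	}
	return out
}

func main() {
	// Tokens for the text "to be or not to be".
	tokens := TokenStream{
		{Term: []byte("to"), Start: 0, End: 2, Position: 1},
		{Term: []byte("be"), Start: 3, End: 5, Position: 2},
		{Term: []byte("or"), Start: 6, End: 8, Position: 3},
		{Term: []byte("not"), Start: 9, End: 12, Position: 4},
		{Term: []byte("to"), Start: 13, End: 15, Position: 5},
		{Term: []byte("be"), Start: 16, End: 18, Position: 6},
	}
	freqs := countTerms(tokens)
	fmt.Println(len(freqs), len(freqs["to"].Locations))
}
```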

func (TokenFrequencies) MergeAll ¶

func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)

func (TokenFrequencies) Size ¶

func (tfs TokenFrequencies) Size() int

type TokenLocation ¶

type TokenLocation struct {
	Field          string
	ArrayPositions []uint64
	Start          int
	End            int
	Position       int
}

TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.

func (*TokenLocation) Size ¶

func (tl *TokenLocation) Size() int

type TokenMap ¶

type TokenMap map[string]bool

func NewTokenMap ¶

func NewTokenMap() TokenMap

func (TokenMap) AddToken ¶

func (t TokenMap) AddToken(token string)

func (TokenMap) LoadBytes ¶

func (t TokenMap) LoadBytes(data []byte) error

LoadBytes reads in a list of tokens from memory, one per line. Comments are introduced by `#` or `|`.
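A minimal sketch of this file format, assuming that everything from the first `#` or `|` to the end of the line is a comment and that blank lines are skipped. tokenMap, loadBytes and loadLine are local, hypothetical stand-ins, not the package's methods.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// tokenMap is a local stand-in for the TokenMap type on this page.
type tokenMap map[string]bool

// loadLine strips a trailing comment (assumed to start at the first '#'
// or '|') and adds the remaining text, if any, as a token.
func (t tokenMap) loadLine(line string) {
	if i := strings.IndexAny(line, "#|"); i >= 0 {
		line = line[:i]
	}
	if tok := strings.TrimSpace(line); tok != "" {
		t[tok] = true
	}
}

// loadBytes reads tokens from memory, one per line.
func (t tokenMap) loadBytes(data []byte) error {
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		t.loadLine(sc.Text())
	}
	return sc.Err()
}

func main() {
	data := []byte("# English stop words\nthe\nand | conjunction\n\n  of  \n")
	tm := tokenMap{}
	if err := tm.loadBytes(data); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println(len(tm), tm["the"], tm["and"], tm["of"])
}
```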

func (TokenMap) LoadFile ¶

func (t TokenMap) LoadFile(filename string) error

LoadFile reads in a list of tokens from a text file, one per line. Comments are introduced by `#` or `|`.

func (TokenMap) LoadLine ¶

func (t TokenMap) LoadLine(line string)

type TokenStream ¶

type TokenStream []*Token

type TokenType ¶

type TokenType int

const (
	AlphaNumeric TokenType = iota
	Ideographic
	Numeric
	DateTime
	Shingle
	Single
	Double
	Boolean
)

type Tokenizer ¶

type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

A Tokenizer splits an input string into tokens, the usual behaviour being to map words to tokens.
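A minimal sketch of the word-to-token mapping, using local mirrors of Token, TokenStream and Tokenizer. whitespaceTokenizer is a hypothetical implementation, not the one in the tokenizer/whitespace directory. Start and End are byte offsets as documented for Token; Position is assumed here to be the token's 1-based ordinal in the stream.

```go
package main

import (
	"fmt"
	"unicode"
)

// Local mirrors of the package types, reduced to what the sketch needs.
type Token struct {
	Start, End, Position int
	Term                 []byte
}
type TokenStream []*Token

type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

// whitespaceTokenizer is a hypothetical Tokenizer that emits one token
// per run of non-space characters.
type whitespaceTokenizer struct{}

func (whitespaceTokenizer) Tokenize(input []byte) TokenStream {
	var out TokenStream
	start := -1 // byte offset of the current word, or -1 between words
	flush := func(end int) {
		if start < 0 {
			return
		}
		out = append(out, &Token{
			Start:    start,
			End:      end,
			Term:     input[start:end],
			Position: len(out) + 1,
		})
		start = -1
	}
	// Ranging over a string yields byte offsets, so multi-byte runes
	// keep Start and End aligned with the original input.
	for i, r := range string(input) {
		if unicode.IsSpace(r) {
			flush(i)
		} else if start < 0 {
			start = i
		}
	}
	flush(len(input))
	return out
}

func main() {
	var tk Tokenizer = whitespaceTokenizer{}
	for _, tok := range tk.Tokenize([]byte("héllo  world")) {
		fmt.Printf("%s start=%d end=%d pos=%d\n", tok.Term, tok.Start, tok.End, tok.Position)
	}
}
```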

Directories ¶

Path	Synopsis
analyzer
  custom
  keyword
  simple
  standard
  web
char
  asciifolding
  html
  regexp
  zerowidthnonjoiner
datetime
  flexible
  optional
lang
  ar
  bg
  ca
  cjk
  ckb
  cs
  da
  de
  el
  en	Package en implements an analyzer with reasonable defaults for processing English text.
  es
  eu
  fa
  fi
  fr
  ga
  gl
  hi
  hu
  hy
  id
  in
  it
  nl
  no
  pt
  ro
  ru
  sv
  tr
token
  apostrophe
  camelcase
  compound
  edgengram
  elision
  keyword
  length
  lowercase	Package lowercase implements a TokenFilter which converts tokens to lower case according to unicode rules.
  ngram
  porter
  reverse
  shingle
  snowball
  stop	Package stop implements a TokenFilter removing tokens found in a TokenMap.
  truncate
  unicodenorm
  unique
tokenizer
  character
  exception	Package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then inserts the extracted parts back into the token stream.
  letter
  regexp
  single
  unicode
  web
  whitespace
tokenmap	Package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens.
