analysis

package

v2.4.4 (latest) | Published: Dec 17, 2024 | License: Apache-2.0 | Imports: 9 | Imported by: 227

Repository

github.com/blevesearch/bleve

Documentation ¶

Index ¶

Variables
func BuildTermFromRunes(runes []rune) []byte
func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte
func DeleteRune(in []rune, pos int) []rune
func InsertRune(in []rune, pos int, r rune) []rune
func RunesEndsWith(input []rune, suffix string) bool
func TokenFrequency(tokens TokenStream, arrayPositions []uint64, ...) index.TokenFrequencies
func TruncateRunes(input []byte, num int) []byte
type Analyzer
type ByteArrayConverter
type CharFilter
type DateTimeParser
type DefaultAnalyzer
- func (a *DefaultAnalyzer) Analyze(input []byte) TokenStream
type Token
- func (t *Token) String() string
type TokenFilter
type TokenMap
- func NewTokenMap() TokenMap
- func (t TokenMap) AddToken(token string)
- func (t TokenMap) LoadBytes(data []byte) error
- func (t TokenMap) LoadFile(filename string) error
- func (t TokenMap) LoadLine(line string)
type TokenStream
type TokenType
type Tokenizer

Constants ¶

This section is empty.

Variables ¶

var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")

var ErrInvalidTimestampRange = fmt.Errorf("timestamp out of range")

var ErrInvalidTimestampString = fmt.Errorf("unable to parse timestamp string")

Functions ¶

func BuildTermFromRunes ¶

func BuildTermFromRunes(runes []rune) []byte

func BuildTermFromRunesOptimistic ¶

func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte

BuildTermFromRunesOptimistic builds a term from the provided runes and optimistically attempts to encode it into the provided buffer. If at any point the buffer appears too small, a new buffer is allocated and used instead. Use it when the new term is frequently the same length as, or shorter than, the original term (in number of bytes).
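
The buffer-reuse strategy described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `buildTermOptimistic` is a hypothetical local function, and it relies on Go's `append`, which writes into the provided buffer while its capacity suffices and allocates a new backing array once it does not.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// buildTermOptimistic is a hypothetical stand-in for the documented function.
// It encodes runes as UTF-8 into buf's backing array while they fit; append
// allocates a larger array on its own if buf's capacity runs out.
func buildTermOptimistic(buf []byte, runes []rune) []byte {
	out := buf[:0] // reuse the caller's storage, discarding its old contents
	var tmp [utf8.UTFMax]byte
	for _, r := range runes {
		n := utf8.EncodeRune(tmp[:], r)
		out = append(out, tmp[:n]...)
	}
	return out
}

func main() {
	original := []byte("Running")
	stemmed := []rune("run") // shorter than the original term, so no allocation
	term := buildTermOptimistic(original, stemmed)
	fmt.Printf("%s\n", term)
}
```

Note that the sketch overwrites the contents of the buffer it is given, so it only suits callers that no longer need the original bytes.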

func DeleteRune ¶

func DeleteRune(in []rune, pos int) []rune

func InsertRune ¶

func InsertRune(in []rune, pos int, r rune) []rune
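
The page gives only the signatures of DeleteRune and InsertRune. The sketch below shows one plausible reading of them, inferred from the names alone: `deleteRune` and `insertRune` are hypothetical local functions, and both the out-of-range handling and the decision to return a fresh slice are assumptions, not documented behaviour.

```go
package main

import "fmt"

// deleteRune is a hypothetical sketch: it returns a copy of in without the
// rune at pos, or in itself when pos is out of range.
func deleteRune(in []rune, pos int) []rune {
	if pos < 0 || pos >= len(in) {
		return in
	}
	out := make([]rune, 0, len(in)-1)
	out = append(out, in[:pos]...)
	return append(out, in[pos+1:]...)
}

// insertRune is a hypothetical sketch: it returns a new slice with r placed
// at index pos. It assumes 0 <= pos <= len(in).
func insertRune(in []rune, pos int, r rune) []rune {
	out := make([]rune, 0, len(in)+1)
	out = append(out, in[:pos]...)
	out = append(out, r)
	return append(out, in[pos:]...)
}

func main() {
	word := []rune("cafe")
	fmt.Println(string(insertRune(word, 4, 's')))
	fmt.Println(string(deleteRune(word, 0)))
}
```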

func RunesEndsWith ¶

func RunesEndsWith(input []rune, suffix string) bool

func TokenFrequency ¶

func TokenFrequency(tokens TokenStream, arrayPositions []uint64, options index.FieldIndexingOptions) index.TokenFrequencies

func TruncateRunes ¶

func TruncateRunes(input []byte, num int) []byte

Types ¶

type Analyzer ¶

type Analyzer interface {
	Analyze([]byte) TokenStream
}

type ByteArrayConverter ¶

type ByteArrayConverter interface {
	Convert([]byte) (interface{}, error)
}
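
As an illustration of the interface, here is a minimal sketch of a converter that decodes JSON. The interface is copied locally so the example compiles without the library; `jsonConverter` is a hypothetical name and is not claimed to match any converter shipped with bleve.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ByteArrayConverter is a local copy of the documented interface.
type ByteArrayConverter interface {
	Convert([]byte) (interface{}, error)
}

// jsonConverter is a hypothetical implementation that decodes JSON bytes
// into generic Go values (maps, slices, strings, float64, bool, nil).
type jsonConverter struct{}

func (jsonConverter) Convert(in []byte) (interface{}, error) {
	var v interface{}
	if err := json.Unmarshal(in, &v); err != nil {
		return nil, err
	}
	return v, nil
}

func main() {
	var c ByteArrayConverter = jsonConverter{}
	v, err := c.Convert([]byte(`{"title":"bleve"}`))
	fmt.Println(v, err)
}
```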

type CharFilter ¶

type CharFilter interface {
	Filter([]byte) []byte
}

type DateTimeParser ¶

type DateTimeParser interface {
	ParseDateTime(string) (time.Time, string, error)
}

type DefaultAnalyzer ¶ added in v2.3.5

type DefaultAnalyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

func (*DefaultAnalyzer) Analyze ¶ added in v2.3.5

func (a *DefaultAnalyzer) Analyze(input []byte) TokenStream
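
The three fields of DefaultAnalyzer suggest a pipeline: character filters rewrite the raw bytes, the tokenizer splits them, and token filters post-process the stream. The sketch below illustrates that order under the assumption that each stage runs in slice order. All types are trimmed local copies, and `pipeline`, `dashToSpace`, `spaceTokenizer` and `lowercase` are hypothetical names used only for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// Trimmed local copies of the documented declarations.
type Token struct {
	Term     []byte
	Position int
}
type TokenStream []*Token

type CharFilter interface{ Filter([]byte) []byte }
type Tokenizer interface{ Tokenize([]byte) TokenStream }
type TokenFilter interface{ Filter(TokenStream) TokenStream }

// pipeline has the same fields as the documented DefaultAnalyzer.
type pipeline struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

// Analyze applies char filters, then the tokenizer, then token filters.
func (p *pipeline) Analyze(input []byte) TokenStream {
	for _, cf := range p.CharFilters {
		input = cf.Filter(input)
	}
	tokens := p.Tokenizer.Tokenize(input)
	for _, tf := range p.TokenFilters {
		tokens = tf.Filter(tokens)
	}
	return tokens
}

// dashToSpace is a char filter that replaces '-' with ' '.
type dashToSpace struct{}

func (dashToSpace) Filter(in []byte) []byte {
	return bytes.ReplaceAll(in, []byte("-"), []byte(" "))
}

// spaceTokenizer splits on whitespace and numbers tokens from 1.
type spaceTokenizer struct{}

func (spaceTokenizer) Tokenize(in []byte) TokenStream {
	var out TokenStream
	for i, field := range bytes.Fields(in) {
		out = append(out, &Token{Term: field, Position: i + 1})
	}
	return out
}

// lowercase is a token filter that lower-cases every term.
type lowercase struct{}

func (lowercase) Filter(ts TokenStream) TokenStream {
	for _, t := range ts {
		t.Term = bytes.ToLower(t.Term)
	}
	return ts
}

// terms collects the term of each token as a string.
func terms(ts TokenStream) []string {
	out := make([]string, 0, len(ts))
	for _, t := range ts {
		out = append(out, string(t.Term))
	}
	return out
}

func main() {
	p := &pipeline{
		CharFilters:  []CharFilter{dashToSpace{}},
		Tokenizer:    spaceTokenizer{},
		TokenFilters: []TokenFilter{lowercase{}},
	}
	fmt.Println(terms(p.Analyze([]byte("Full-Text SEARCH"))))
}
```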

type Token ¶

type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int `json:"start"`

	// End specifies the byte offset of the end of the term in the field.
	End  int    `json:"end"`
	Term []byte `json:"term"`

	// Position specifies the 1-based index of the token in the sequence of
	// occurrences of its term in the field.
	Position int       `json:"position"`
	Type     TokenType `json:"type"`
	KeyWord  bool      `json:"keyword"`
}

Token represents one occurrence of a term at a particular location in a field.

func (*Token) String ¶

func (t *Token) String() string

type TokenFilter ¶

type TokenFilter interface {
	Filter(TokenStream) TokenStream
}

A TokenFilter adds, transforms or removes tokens from a token stream.
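
A filter that removes tokens can be sketched as a stop-word filter driven by a TokenMap. The types are trimmed local copies so the example is self-contained, and `stopFilter` is a hypothetical name; it is not claimed to match the `stop` package listed under Directories. Keeping the original Position values of the surviving tokens is a design choice of this sketch.

```go
package main

import "fmt"

// Trimmed local copies of the documented declarations.
type Token struct {
	Term     []byte
	Position int
}
type TokenStream []*Token
type TokenMap map[string]bool

type TokenFilter interface {
	Filter(TokenStream) TokenStream
}

// stopFilter is a hypothetical TokenFilter that drops every token whose
// term is present in stopWords.
type stopFilter struct {
	stopWords TokenMap
}

func (f stopFilter) Filter(in TokenStream) TokenStream {
	out := make(TokenStream, 0, len(in))
	for _, tok := range in {
		if !f.stopWords[string(tok.Term)] {
			out = append(out, tok)
		}
	}
	return out
}

func main() {
	var f TokenFilter = stopFilter{stopWords: TokenMap{"the": true}}
	in := TokenStream{
		{Term: []byte("the"), Position: 1},
		{Term: []byte("quick"), Position: 2},
		{Term: []byte("fox"), Position: 3},
	}
	for _, tok := range f.Filter(in) {
		fmt.Printf("%s@%d\n", tok.Term, tok.Position)
	}
}
```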

type TokenMap ¶

type TokenMap map[string]bool

func NewTokenMap ¶

func NewTokenMap() TokenMap

func (TokenMap) AddToken ¶

func (t TokenMap) AddToken(token string)

func (TokenMap) LoadBytes ¶

func (t TokenMap) LoadBytes(data []byte) error

LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`.
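
The token-list format can be illustrated with a small parser. This is a sketch under stated assumptions, not the package's code: a `#` or `|` is taken to start a comment that runs to the end of the line, surrounding whitespace is ignored, and blank lines are skipped. `loadLine` and `loadBytes` are hypothetical local functions.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// TokenMap is a local copy of the documented type.
type TokenMap map[string]bool

// loadLine strips a trailing comment (assumed to start at '#' or '|') and
// adds whatever tokens remain on the line.
func loadLine(m TokenMap, line string) {
	if i := strings.IndexAny(line, "#|"); i >= 0 {
		line = line[:i]
	}
	for _, tok := range strings.Fields(line) {
		m[tok] = true
	}
}

// loadBytes feeds data to loadLine one line at a time.
func loadBytes(m TokenMap, data []byte) error {
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		loadLine(m, sc.Text())
	}
	return sc.Err()
}

func main() {
	data := []byte("# English stop words\na\nthe | definite article\n\nof\n")
	m := TokenMap{}
	if err := loadBytes(m, data); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println(len(m), m["the"], m["article"])
}
```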

func (TokenMap) LoadFile ¶

func (t TokenMap) LoadFile(filename string) error

LoadFile reads in a list of tokens from a text file, one per line. Comments are supported using `#` or `|`.

func (TokenMap) LoadLine ¶

func (t TokenMap) LoadLine(line string)

type TokenStream ¶

type TokenStream []*Token

type TokenType ¶

type TokenType int

const (
	AlphaNumeric TokenType = iota
	Ideographic
	Numeric
	DateTime
	Shingle
	Single
	Double
	Boolean
	IP
)

type Tokenizer ¶

type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

A Tokenizer splits an input string into tokens, the usual behaviour being to map words to tokens.
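
The word-to-token mapping can be sketched with a whitespace tokenizer that fills in the Start, End, Term and Position fields of Token. The types are trimmed local copies and `spaceTokenizer` is a hypothetical name. The sketch assumes End is an exclusive byte offset, numbers positions from 1 in order of appearance, and splits on ASCII whitespace only.

```go
package main

import "fmt"

// Trimmed local copies of the documented declarations.
type Token struct {
	Start    int
	End      int
	Term     []byte
	Position int
}
type TokenStream []*Token

type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

// spaceTokenizer is a hypothetical Tokenizer that splits on ASCII whitespace.
type spaceTokenizer struct{}

func isSpace(b byte) bool {
	return b == ' ' || b == '\t' || b == '\n' || b == '\r'
}

// Tokenize scans the input byte by byte. Multi-byte UTF-8 sequences never
// contain ASCII whitespace bytes, so byte offsets stay on rune boundaries.
func (spaceTokenizer) Tokenize(input []byte) TokenStream {
	var out TokenStream
	start := -1 // byte offset where the current word began, or -1
	for i := 0; i <= len(input); i++ {
		if i < len(input) && !isSpace(input[i]) {
			if start < 0 {
				start = i
			}
			continue
		}
		if start >= 0 {
			out = append(out, &Token{
				Start:    start,
				End:      i,
				Term:     input[start:i],
				Position: len(out) + 1,
			})
			start = -1
		}
	}
	return out
}

func main() {
	var t Tokenizer = spaceTokenizer{}
	for _, tok := range t.Tokenize([]byte("héllo wörld")) {
		fmt.Printf("%s [%d,%d) pos=%d\n", tok.Term, tok.Start, tok.End, tok.Position)
	}
}
```

Because the offsets count bytes rather than runes, "héllo" spans six bytes here even though it has five characters.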

Directories ¶

Path	Synopsis
analyzer
  custom
  keyword
  simple
  standard
  web
char
  asciifolding
  html
  regexp
  zerowidthnonjoiner
datetime
  flexible
  iso
  optional
  percent
  sanitized
  timestamp/microseconds
  timestamp/milliseconds
  timestamp/nanoseconds
  timestamp/seconds
lang
  ar
  bg
  ca
  cjk
  ckb
  cs
  da
  de
  el
  en	Package en implements an analyzer with reasonable defaults for processing English text.
  es
  eu
  fa
  fi
  fr
  ga
  gl
  hi
  hr
  hu
  hy
  id
  in
  it
  nl
  no
  pl
  pl/stempel
  pl/stempel/javadata
  pt
  ro
  ru
  sv
  tr
token
  apostrophe
  camelcase
  compound
  edgengram
  elision
  hierarchy
  keyword
  length
  lowercase	Package lowercase implements a TokenFilter which converts tokens to lower case according to unicode rules.
  ngram
  porter
  reverse
  shingle
  snowball
  stop	Package stop implements a TokenFilter removing tokens found in a TokenMap.
  truncate
  unicodenorm
  unique
tokenizer
  character
  exception	package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then insert back extracted parts in the token stream.
  letter
  regexp
  single
  unicode
  web
  whitespace
tokenmap	package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens.
