analysis

package

Version: v0.2.2 | Published: Jul 4, 2022 | License: Apache-2.0 | Imports: 9 | Imported by: 92

Repository

github.com/blugelabs/bluge

Documentation ¶

Index ¶

func BuildTermFromRunes(runes []rune) []byte
func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte
func DeleteRune(in []rune, pos int) []rune
func InsertRune(in []rune, pos int, r rune) []rune
func RunesEndsWith(input []rune, suffix string) bool
func TruncateRunes(input []byte, num int) []byte
type Analyzer
- func (a *Analyzer) Analyze(input []byte) TokenStream
type CharFilter
type Token
- func (t *Token) String() string
type TokenFilter
type TokenFreq
- func (tf *TokenFreq) EachLocation(location segment.VisitLocation)
- func (tf *TokenFreq) Frequency() int
- func (tf *TokenFreq) Size() int
- func (tf *TokenFreq) Term() []byte
type TokenFrequencies
- func TokenFrequency(tokens TokenStream, includeTermVectors bool, startOffset int) (tokenFreqs TokenFrequencies, position int)
- func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)
- func (tfs TokenFrequencies) MergeOneBytes(remoteField string, tfk []byte, tf *TokenFreq)
- func (tfs TokenFrequencies) Size() int
type TokenLocation
- func (tl *TokenLocation) End() int
- func (tl *TokenLocation) Field() string
- func (tl *TokenLocation) Pos() int
- func (tl *TokenLocation) Size() int
- func (tl *TokenLocation) Start() int
type TokenMap
- func NewTokenMap() TokenMap
- func (t TokenMap) AddToken(token string)
- func (t TokenMap) LoadBytes(data []byte)
- func (t TokenMap) LoadFile(filename string) error
- func (t TokenMap) LoadLine(line string)
type TokenStream
type TokenType
type Tokenizer

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func BuildTermFromRunes ¶

func BuildTermFromRunes(runes []rune) []byte

func BuildTermFromRunesOptimistic ¶

func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte

BuildTermFromRunesOptimistic builds a term from the provided runes and optimistically attempts to encode it into the provided buffer. If at any point the buffer appears too small, a new buffer is allocated and used instead. Use it when the new term is frequently the same length as or shorter than the original term (in number of bytes).
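The optimistic strategy described above can be sketched with the standard library alone. This is not the package's implementation: buildTermOptimistic is a hypothetical stand-in, and the fallback sizing (worst case of four bytes per remaining rune) is an assumption made for the sketch.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// buildTermOptimistic is a hypothetical stand-in, not the library's code.
// It encodes runes into buf and switches to a freshly allocated slice
// the first time buf turns out to be too small.
func buildTermOptimistic(buf []byte, runes []rune) []byte {
	var tmp [utf8.UTFMax]byte
	used := 0
	for i, r := range runes {
		n := utf8.EncodeRune(tmp[:], r)
		if used+n > len(buf) {
			// Fallback: size for the worst case of the remaining runes
			// and carry over the bytes already encoded.
			grown := make([]byte, used, used+(len(runes)-i)*utf8.UTFMax)
			copy(grown, buf[:used])
			for _, rr := range runes[i:] {
				m := utf8.EncodeRune(tmp[:], rr)
				grown = append(grown, tmp[:m]...)
			}
			return grown
		}
		copy(buf[used:], tmp[:n])
		used += n
	}
	return buf[:used]
}

func main() {
	// A shorter replacement term fits in the original term's storage.
	orig := []byte("Running")
	fmt.Printf("%s\n", buildTermOptimistic(orig, []rune("run")))

	// A two-byte buffer is too small here, so a new slice is returned.
	fmt.Printf("%s\n", buildTermOptimistic(make([]byte, 2), []rune("h\u00e9llo")))
}
```

When the term fits, the returned slice aliases the buffer passed in, so the caller must not rely on the buffer's previous contents afterwards.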

func DeleteRune ¶

func DeleteRune(in []rune, pos int) []rune

func InsertRune ¶

func InsertRune(in []rune, pos int, r rune) []rune

func RunesEndsWith ¶

func RunesEndsWith(input []rune, suffix string) bool

func TruncateRunes ¶

func TruncateRunes(input []byte, num int) []byte

Types ¶

type Analyzer ¶

type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

func (*Analyzer) Analyze ¶

func (a *Analyzer) Analyze(input []byte) TokenStream
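The field names suggest a three-stage pipeline. The sketch below assumes that order (char filters, then the tokenizer, then token filters), which this page does not state explicitly. All types are local stand-ins that mirror the documented shapes so the example compiles without the bluge module; dashToSpace, spaceTokenizer and lowerFilter are hypothetical components.

```go
package main

import (
	"bytes"
	"fmt"
)

// Local stand-ins mirroring the documented shapes, redeclared so the
// sketch compiles without the bluge module.
type Token struct {
	Start, End   int
	Term         []byte
	PositionIncr int
}

type TokenStream []*Token

type CharFilter interface{ Filter([]byte) []byte }
type Tokenizer interface{ Tokenize([]byte) TokenStream }
type TokenFilter interface{ Filter(TokenStream) TokenStream }

type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

// analyze applies the stages in the assumed order:
// char filters, then the tokenizer, then token filters.
func analyze(a *Analyzer, input []byte) TokenStream {
	for _, cf := range a.CharFilters {
		input = cf.Filter(input)
	}
	tokens := a.Tokenizer.Tokenize(input)
	for _, tf := range a.TokenFilters {
		tokens = tf.Filter(tokens)
	}
	return tokens
}

// dashToSpace (hypothetical) replaces '-' with ' ', keeping byte offsets aligned.
type dashToSpace struct{}

func (dashToSpace) Filter(in []byte) []byte {
	return bytes.ReplaceAll(in, []byte("-"), []byte(" "))
}

// spaceTokenizer (hypothetical) splits on ASCII spaces.
type spaceTokenizer struct{}

func (spaceTokenizer) Tokenize(in []byte) TokenStream {
	var out TokenStream
	start := -1 // -1 means "not inside a token"
	for i := 0; i <= len(in); i++ {
		boundary := i == len(in) || in[i] == ' '
		if !boundary && start < 0 {
			start = i
		}
		if boundary && start >= 0 {
			out = append(out, &Token{Start: start, End: i, Term: in[start:i], PositionIncr: 1})
			start = -1
		}
	}
	return out
}

// lowerFilter (hypothetical) lowercases each term.
type lowerFilter struct{}

func (lowerFilter) Filter(in TokenStream) TokenStream {
	for _, t := range in {
		t.Term = bytes.ToLower(t.Term)
	}
	return in
}

func main() {
	a := &Analyzer{
		CharFilters:  []CharFilter{dashToSpace{}},
		Tokenizer:    spaceTokenizer{},
		TokenFilters: []TokenFilter{lowerFilter{}},
	}
	for _, t := range analyze(a, []byte("Full-Text Search")) {
		fmt.Printf("%s [%d,%d)\n", t.Term, t.Start, t.End)
	}
}
```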

type CharFilter ¶

type CharFilter interface {
	Filter([]byte) []byte
}

type Token ¶

type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int

	// End specifies the byte offset of the end of the term in the field.
	End  int
	Term []byte

	// PositionIncr specifies the position of this token relative to the previous.
	PositionIncr int
	Type         TokenType
	KeyWord      bool
}

Token represents one occurrence of a term at a particular location in a field.

func (*Token) String ¶

func (t *Token) String() string

type TokenFilter ¶

type TokenFilter interface {
	Filter(TokenStream) TokenStream
}

A TokenFilter adds, transforms or removes tokens from a token stream.
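As an illustration of a filter that removes tokens, here is a minimal stop-word filter. It uses local stand-in types rather than the bluge module, and stopFilter is hypothetical. Carrying the removed token's PositionIncr forward to the next surviving token is a design choice of this sketch, not confirmed behavior of any filter shipped with the package.

```go
package main

import "fmt"

// Local stand-ins for the documented Token and TokenStream shapes.
type Token struct {
	Term         []byte
	PositionIncr int
}

type TokenStream []*Token

// stopFilter (hypothetical) drops any token whose term is in the stop set.
type stopFilter struct {
	stop map[string]bool
}

func (f stopFilter) Filter(in TokenStream) TokenStream {
	out := in[:0] // reuse the input's backing array
	skipped := 0
	for _, t := range in {
		if f.stop[string(t.Term)] {
			// Remember the gap so later positions stay consistent.
			skipped += t.PositionIncr
			continue
		}
		t.PositionIncr += skipped
		skipped = 0
		out = append(out, t)
	}
	return out
}

func main() {
	in := TokenStream{
		{Term: []byte("the"), PositionIncr: 1},
		{Term: []byte("quick"), PositionIncr: 1},
		{Term: []byte("fox"), PositionIncr: 1},
	}
	f := stopFilter{stop: map[string]bool{"the": true}}
	for _, t := range f.Filter(in) {
		fmt.Printf("%s +%d\n", t.Term, t.PositionIncr)
	}
}
```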

type TokenFreq ¶

type TokenFreq struct {
	TermVal   []byte
	Locations []*TokenLocation
	// contains filtered or unexported fields
}

TokenFreq represents all the occurrences of a term in all fields of a document.

func (*TokenFreq) EachLocation ¶

func (tf *TokenFreq) EachLocation(location segment.VisitLocation)

func (*TokenFreq) Frequency ¶

func (tf *TokenFreq) Frequency() int

func (*TokenFreq) Size ¶

func (tf *TokenFreq) Size() int

func (*TokenFreq) Term ¶

func (tf *TokenFreq) Term() []byte

type TokenFrequencies ¶

type TokenFrequencies map[string]*TokenFreq

TokenFrequencies maps document terms to their combined frequencies from all fields.

func TokenFrequency ¶

func TokenFrequency(tokens TokenStream, includeTermVectors bool, startOffset int) (
	tokenFreqs TokenFrequencies, position int)
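The signature suggests that the function walks a token stream, groups tokens by term, and returns the final position reached. The sketch below shows that idea with local stand-in types. It assumes the third argument seeds a running position that advances by each token's PositionIncr; that reading is inferred from the names and is not stated on this page. countTerms, termFreq and location are hypothetical names.

```go
package main

import "fmt"

// Local stand-in for the documented Token shape.
type Token struct {
	Start, End   int
	Term         []byte
	PositionIncr int
}

// location and termFreq are hypothetical, simplified counterparts of
// TokenLocation and TokenFreq.
type location struct{ Start, End, Position int }

type termFreq struct {
	Count     int
	Locations []location
}

// countTerms groups tokens by term. The running position starts at
// startPosition (an assumption) and advances by each PositionIncr.
func countTerms(tokens []*Token, startPosition int) (map[string]*termFreq, int) {
	freqs := make(map[string]*termFreq, len(tokens))
	position := startPosition
	for _, t := range tokens {
		position += t.PositionIncr
		key := string(t.Term)
		tf := freqs[key]
		if tf == nil {
			tf = &termFreq{}
			freqs[key] = tf
		}
		tf.Count++
		tf.Locations = append(tf.Locations, location{t.Start, t.End, position})
	}
	return freqs, position
}

func main() {
	var tokens []*Token
	offset := 0
	for _, w := range []string{"to", "be", "or", "not", "to", "be"} {
		tokens = append(tokens, &Token{Start: offset, End: offset + len(w), Term: []byte(w), PositionIncr: 1})
		offset += len(w) + 1 // skip the separating space
	}
	freqs, last := countTerms(tokens, 0)
	fmt.Println(freqs["be"].Count, freqs["be"].Locations, last)
}
```

Returning the last position lets a caller pass it back in as the starting point for the next value of the same field, so positions keep increasing across values.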

func (TokenFrequencies) MergeAll ¶

func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)

func (TokenFrequencies) MergeOneBytes ¶

func (tfs TokenFrequencies) MergeOneBytes(remoteField string, tfk []byte, tf *TokenFreq)

func (TokenFrequencies) Size ¶

func (tfs TokenFrequencies) Size() int

type TokenLocation ¶

type TokenLocation struct {
	FieldVal    string
	StartVal    int
	EndVal      int
	PositionVal int
}

TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.

func (*TokenLocation) End ¶

func (tl *TokenLocation) End() int

func (*TokenLocation) Field ¶

func (tl *TokenLocation) Field() string

func (*TokenLocation) Pos ¶

func (tl *TokenLocation) Pos() int

func (*TokenLocation) Size ¶

func (tl *TokenLocation) Size() int

func (*TokenLocation) Start ¶

func (tl *TokenLocation) Start() int

type TokenMap ¶

type TokenMap map[string]bool

func NewTokenMap ¶

func NewTokenMap() TokenMap

func (TokenMap) AddToken ¶

func (t TokenMap) AddToken(token string)

func (TokenMap) LoadBytes ¶

func (t TokenMap) LoadBytes(data []byte)

LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`.
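A token list in this format can be parsed with a few lines of standard-library code. The sketch below is not the package's implementation: tokenSet, loadLine and loadBytes are hypothetical, and it assumes a comment marker may appear anywhere on a line and runs to the end of that line.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// tokenSet is a hypothetical stand-in for a map-based token set.
type tokenSet map[string]bool

// loadLine strips a trailing comment (assumed to start at '#' or '|')
// and adds whatever non-blank text remains as one token.
func (s tokenSet) loadLine(line string) {
	if i := strings.IndexAny(line, "#|"); i >= 0 {
		line = line[:i]
	}
	if tok := strings.TrimSpace(line); tok != "" {
		s[tok] = true
	}
}

// loadBytes feeds each line of data to loadLine.
func (s tokenSet) loadBytes(data []byte) {
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		s.loadLine(sc.Text())
	}
}

func main() {
	data := []byte("# English stop words\nthe\nand | conjunction\n\nof\n")
	set := tokenSet{}
	set.loadBytes(data)
	fmt.Println(len(set), set["the"], set["and"], set["conjunction"])
}
```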

func (TokenMap) LoadFile ¶

func (t TokenMap) LoadFile(filename string) error

LoadFile reads in a list of tokens from a text file, one per line. Comments are supported using `#` or `|`.

func (TokenMap) LoadLine ¶

func (t TokenMap) LoadLine(line string)

type TokenStream ¶

type TokenStream []*Token

type TokenType ¶

type TokenType int

const (
	AlphaNumeric TokenType = iota
	Ideographic
	Numeric
	DateTime
	Shingle
	Single
	Double
	Boolean
)

type Tokenizer ¶

type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

A Tokenizer splits an input string into tokens, the usual behavior being to map words to tokens.
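A word-oriented tokenizer can be sketched as follows. letterTokenizer is hypothetical and the Token and TokenStream types are local stand-ins, so the example compiles without the bluge module. It treats every maximal run of Unicode letters as one token and records byte offsets, which differ from rune counts for non-ASCII text.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Local stand-ins for the documented Token and TokenStream shapes.
type Token struct {
	Start, End   int
	Term         []byte
	PositionIncr int
}

type TokenStream []*Token

// letterTokenizer (hypothetical) emits one token per run of letters.
type letterTokenizer struct{}

func (letterTokenizer) Tokenize(input []byte) TokenStream {
	var out TokenStream
	start := -1 // byte offset where the current token began, or -1
	emit := func(end int) {
		if start >= 0 {
			out = append(out, &Token{Start: start, End: end, Term: input[start:end], PositionIncr: 1})
			start = -1
		}
	}
	for i := 0; i < len(input); {
		r, size := utf8.DecodeRune(input[i:])
		if unicode.IsLetter(r) {
			if start < 0 {
				start = i
			}
		} else {
			emit(i)
		}
		i += size
	}
	emit(len(input)) // flush a token that runs to the end of the input
	return out
}

func main() {
	// Escapes keep the accented letters precomposed (two bytes each in UTF-8).
	input := []byte("na\u00efve caf\u00e9")
	for _, t := range (letterTokenizer{}).Tokenize(input) {
		fmt.Printf("%s [%d,%d)\n", t.Term, t.Start, t.End)
	}
}
```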

Directories ¶

Path	Synopsis
analyzer
char
lang
ar
bg
ca
cjk
ckb
cs
da
de
el
en	Package en implements an analyzer with reasonable defaults for processing English text.
es
eu
fa
fi
fr
ga
gl
hi
hu
hy
id
in
it
nl
no
pt
ro
ru
sv
tr
token	Package lowercase implements a TokenFilter which converts tokens to lower case according to unicode rules.
tokenizer
