analysis

package
v0.0.0-...-8efbd55 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 21, 2015 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

View Source
var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")

Functions

func BuildTermFromRunes

func BuildTermFromRunes(runes []rune) []byte

func DeleteRune

func DeleteRune(in []rune, pos int) []rune

func InsertRune

func InsertRune(in []rune, pos int, r rune) []rune

func RunesEndsWith

func RunesEndsWith(input []rune, suffix string) bool

func TruncateRunes

func TruncateRunes(input []byte, num int) []byte

Types

type Analyzer

type Analyzer struct {
	CharFilters  []CharFilter
	Tokenizer    Tokenizer
	TokenFilters []TokenFilter
}

func (*Analyzer) Analyze

func (a *Analyzer) Analyze(input []byte) TokenStream

type ByteArrayConverter

type ByteArrayConverter interface {
	Convert([]byte) (interface{}, error)
}

type CharFilter

type CharFilter interface {
	Filter([]byte) []byte
}

type DateTimeParser

type DateTimeParser interface {
	ParseDateTime(string) (time.Time, error)
}

type Token

type Token struct {
	// Start specifies the byte offset of the beginning of the term in the
	// field.
	Start int `json:"start"`

	// End specifies the byte offset of the end of the term in the field.
	End  int    `json:"end"`
	Term []byte `json:"term"`

	// Position specifies the 1-based index of the token in the sequence of
	// occurrences of its term in the field.
	Position int       `json:"position"`
	Type     TokenType `json:"type"`
	KeyWord  bool      `json:"keyword"`
}

Token represents one occurrence of a term at a particular location in a field.

func (*Token) String

func (t *Token) String() string

type TokenFilter

type TokenFilter interface {
	Filter(TokenStream) TokenStream
}

A TokenFilter adds, transforms or removes tokens from a token stream.

type TokenFreq

type TokenFreq struct {
	Term      []byte
	Locations []*TokenLocation
}

TokenFreq represents all the occurrences of a term in all fields of a document.

func (*TokenFreq) Frequency

func (tf *TokenFreq) Frequency() int

type TokenFrequencies

type TokenFrequencies map[string]*TokenFreq

TokenFrequencies maps document terms to their combined frequencies from all fields.

func TokenFrequency

func TokenFrequency(tokens TokenStream, arrayPositions []uint64) TokenFrequencies

func (TokenFrequencies) MergeAll

func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)

type TokenLocation

type TokenLocation struct {
	Field          string
	ArrayPositions []uint64
	Start          int
	End            int
	Position       int
}

TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.

type TokenMap

type TokenMap map[string]bool

func NewTokenMap

func NewTokenMap() TokenMap

func (TokenMap) AddToken

func (t TokenMap) AddToken(token string)

func (TokenMap) LoadBytes

func (t TokenMap) LoadBytes(data []byte) error

func (TokenMap) LoadFile

func (t TokenMap) LoadFile(filename string) error

func (TokenMap) LoadLine

func (t TokenMap) LoadLine(line string)

type TokenStream

type TokenStream []*Token

type TokenType

type TokenType int
const (
	AlphaNumeric TokenType = iota
	Ideographic
	Numeric
	DateTime
	Shingle
	Single
	Double
)

type Tokenizer

type Tokenizer interface {
	Tokenize([]byte) TokenStream
}

A Tokenizer splits an input string into tokens, the usual behaviour being to map words to tokens.

Directories

Path Synopsis
analyzers
web
byte_array_converters
char_filters
datetime_parsers
language
ar
bg
ca
cjk
ckb
cs
el
en
eu
fa
fr
ga
gl
hi
hy
id
in
it
ja
pt
token_filters
stop_tokens_filter
package stop_tokens_filter implements a TokenFilter removing tokens found in a TokenMap.
package stop_tokens_filter implements a TokenFilter removing tokens found in a TokenMap.
package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens.
package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens.
tokenizers
exception
package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then insert back extracted parts in the token stream.
package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then insert back extracted parts in the token stream.
web

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL