Documentation ¶
Index ¶
- func Distance(s1, s2 string, c DistanceCalculator) (float64, error)
- func IsFuzzyMatch(s1, s2 string, fuzziness float64, c DistanceCalculator) (bool, error)
- func IsPhraseMatch(pos1, pos2, slop int) (bool, error)
- func LuceneASCIIFolding(str string) string
- func NativeASCIIFolding(text string) string
- func Transform(s1 string, pp PhoneticPreprocessor) (string, error)
- type Analyzer
- type AutoComplete
- type Autos
- type DistanceCalculator
- type Matches
- func (m *Matches) AppendTerm(term string, tv *Vectors)
- func (m *Matches) DocCount() int
- func (m *Matches) HasDocID(docID int) bool
- func (m *Matches) Merge(matches *Matches)
- func (m *Matches) Reset()
- func (m *Matches) TermCount() int
- func (m *Matches) TermFrequency() map[string]int
- func (m *Matches) Total() int
- func (m *Matches) Vectors() *Vectors
- type Operator
- type PhoneticPreprocessor
- type QueryOption
- type QueryParser
- type QueryTerm
- type QueryType
- type SpellCheckOption
- type SpellChecker
- type Token
- type TokenOption
- type TokenStream
- type Tokenizer
- type Vector
- type Vectors
- func (tv *Vectors) Add(doc, pos int)
- func (tv *Vectors) AddPhraseVector(vector Vector)
- func (tv *Vectors) AddVector(vector Vector)
- func (tv *Vectors) DocCount() int
- func (tv *Vectors) HasDoc(doc int) bool
- func (tv *Vectors) HasVector(vector Vector) bool
- func (tv *Vectors) Merge(vectors *Vectors)
- func (tv *Vectors) Size() int
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Distance ¶
func Distance(s1, s2 string, c DistanceCalculator) (float64, error)
Distance calculates the similarity between two strings. The DistanceCalculator determines which algorithm is used.
This function is a wrapper for the matchr library. The documentation of the DistanceCalculator constants and parts of the testdata are adapted from there. For more information about the implementation, see http://github.com/antzucaro/matchr.
There are two groups of algorithms:
Edit distance: Levenshtein, Damerau-Levenshtein, Hamming, Jaro, Jaro-Winkler, Osa, SmithWaterman
Sound similarity: DoubleMetaphone, Nysiis, Phonex, Soundex
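For illustration, a minimal sketch of a distance calculation. It assumes the package is imported as search (the import path is not shown on this page), omits the fmt and log imports, and the printed value follows the classic three-edit Levenshtein example:

func ExampleDistance() {
	d, err := search.Distance("kitten", "sitting", search.Levenshtein)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(d)
	// Assumed output: 3 (three single-character edits)
}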
func IsFuzzyMatch ¶
func IsFuzzyMatch(s1, s2 string, fuzziness float64, c DistanceCalculator) (bool, error)
IsFuzzyMatch determines if two strings are similar enough within the specified fuzziness.
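A hedged sketch, with the same search import assumption as above; the exact interpretation of the fuzziness threshold for each calculator is not documented here, so the values are illustrative only:

func ExampleIsFuzzyMatch() {
	// Compare two spellings with the Jaro-Winkler calculator; 0.9 is an
	// illustrative threshold, not a documented default.
	ok, err := search.IsFuzzyMatch("ambulance", "ambulence", 0.9, search.JaroWinkler)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ok)
}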
func IsPhraseMatch ¶
func IsPhraseMatch(pos1, pos2, slop int) (bool, error)
IsPhraseMatch is a helper to determine whether two positions are close enough to be part of the same phrase. This is used for phrase queries. The default slop is 0.
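A sketch based only on the signature above; it assumes that adjacent positions with slop 0 count as part of the same phrase:

func ExampleIsPhraseMatch() {
	// Positions 3 and 4 are adjacent; with slop 0 they are assumed to
	// qualify as part of the same phrase.
	ok, err := search.IsPhraseMatch(3, 4, 0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ok)
}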
func LuceneASCIIFolding ¶
func LuceneASCIIFolding(str string) string
LuceneASCIIFolding converts Unicode characters to their ASCII equivalents. When no equivalent is found, the original Unicode character is returned.
The native Go solution in NativeASCIIFolding() does not produce the exact same result as the Lucene 'ASCIIFoldingFilter'.
func NativeASCIIFolding ¶
func NativeASCIIFolding(text string) string
NativeASCIIFolding uses the Go native ASCII folding functionality. This is sufficient in most cases. When full compliance with the Lucene ASCIIFoldingFilter is required, use LuceneASCIIFolding().
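A sketch comparing the two folding helpers on a simple accented input; for plain Latin accents both are expected to agree, and the output shown is an assumption:

func ExampleLuceneASCIIFolding() {
	fmt.Println(search.LuceneASCIIFolding("Curaçao"))
	fmt.Println(search.NativeASCIIFolding("Curaçao"))
	// Assumed output for both lines: Curacao
}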
Types ¶
type Analyzer ¶
type Analyzer struct{}
Analyzer is the default analyzer for Search actions. It folds Unicode characters to their ASCII equivalents and lowercases them.
The goal is to have this analyzer behave similarly to the ElasticSearch Analyzer that Ikuzo comes preconfigured with.
func (*Analyzer) TransformPhrase ¶
type AutoComplete ¶ added in v0.1.3
type AutoComplete struct {
	SuggestFn func(a Autos) Autos
	// contains filtered or unexported fields
}
func NewAutoComplete ¶ added in v0.1.3
func NewAutoComplete() *AutoComplete
func (*AutoComplete) FromStrings ¶ added in v0.1.3
func (ac *AutoComplete) FromStrings(words []string)
func (*AutoComplete) FromTokenSteam ¶ added in v0.1.3
func (ac *AutoComplete) FromTokenSteam(stream *TokenStream)
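A minimal, heavily hedged sketch of indexing a word list; how suggestions are retrieved afterwards depends on parts of the API (such as the SuggestFn hook) that are not documented on this page:

func ExampleAutoComplete() {
	ac := search.NewAutoComplete()
	// Index a word list. Retrieving suggestions afterwards depends on parts
	// of the API not shown here.
	ac.FromStrings([]string{"amsterdam", "amstelveen", "amersfoort"})
}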
type DistanceCalculator ¶
type DistanceCalculator int
const (
	// Levenshtein computes the Levenshtein distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, and substitutions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point.
	Levenshtein DistanceCalculator = iota

	// DamerauLevenshtein computes the Damerau-Levenshtein distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, substitutions, and transpositions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point. It is similar to the Optimal String Alignment
	// algorithm, but is more complex because it allows multiple edits on
	// substrings.
	DamerauLevenshtein

	// Hamming computes the Hamming distance between two equal-length strings.
	// This is the number of times the two strings differ between characters at
	// the same index. This implementation is based off of the algorithm
	// description found at http://en.wikipedia.org/wiki/Hamming_distance.
	Hamming

	// Jaro computes the Jaro edit distance between two strings. It represents
	// this with a float64 between 0 and 1 inclusive, with 0 indicating the two
	// strings are not at all similar and 1 indicating the two strings are exact
	// matches.
	//
	// See http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance for a
	// full description.
	Jaro

	// JaroWinkler computes the Jaro-Winkler edit distance between two strings.
	// This is a modification of the Jaro algorithm that gives additional weight
	// to prefix matches.
	JaroWinkler

	// Osa computes the Optimal String Alignment distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, substitutions, and transpositions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point. It is similar to Damerau-Levenshtein, but is simpler
	// because it does not allow multiple edits on any substring.
	Osa

	// SmithWaterman computes the Smith-Waterman local sequence alignment for the
	// two input strings. This was originally designed to find similar regions in
	// strings representing DNA or protein sequences.
	SmithWaterman
)
func (DistanceCalculator) String ¶
func (dc DistanceCalculator) String() string
type Matches ¶ added in v0.1.3
type Matches struct {
// contains filtered or unexported fields
}
func NewMatches ¶ added in v0.1.3
func NewMatches() *Matches
func (*Matches) AppendTerm ¶ added in v0.1.3
func (*Matches) Reset ¶ added in v0.1.7
func (m *Matches) Reset()
Reset clears matches that have already been gathered, for example when ErrSearchNoMatch is returned.
func (*Matches) TermFrequency ¶ added in v0.1.3
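A sketch, based only on the signatures listed above, of collecting term vectors into a Matches set; the printed counts and map are assumptions about how DocCount, Total, and TermFrequency tally occurrences:

func ExampleMatches() {
	tv := search.NewVectors()
	tv.Add(1, 1) // term occurs in document 1 at position 1
	tv.Add(1, 5) // and again at position 5

	m := search.NewMatches()
	m.AppendTerm("fiets", tv)

	fmt.Println(m.DocCount(), m.TermCount(), m.Total())
	fmt.Println(m.TermFrequency())
	// Assumed output: 1 1 2, followed by map[fiets:2]
}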
type PhoneticPreprocessor ¶
type PhoneticPreprocessor int
const (
	// DoubleMetaphone computes the Double-Metaphone value of the input string.
	// This value is a phonetic representation of how the string sounds, with
	// affordances for many different language dialects. It was originally
	// developed by Lawrence Phillips in the 1990s.
	//
	// More information about this algorithm can be found on Wikipedia at
	// http://en.wikipedia.org/wiki/Metaphone.
	DoubleMetaphone PhoneticPreprocessor = iota

	// Nysiis computes the NYSIIS phonetic encoding of the input string. It is a
	// modification of the traditional Soundex algorithm.
	Nysiis

	// Phonex computes the Phonex phonetic encoding of the input string. Phonex is
	// a modification of the venerable Soundex algorithm. It accounts for a few
	// more letter combinations to improve accuracy on some data sets.
	//
	// This implementation is based off of the original C implementation by the
	// creator - A. J. Lait - as found in his research paper entitled "An
	// Assessment of Name Matching Algorithms."
	Phonex

	// Soundex computes the Soundex phonetic representation of the input string. It
	// attempts to encode homophones with the same characters. More information can
	// be found at http://en.wikipedia.org/wiki/Soundex.
	Soundex
)
func (PhoneticPreprocessor) String ¶
func (pp PhoneticPreprocessor) String() string
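Transform (listed in the index) applies one of these preprocessors to a string. A hedged sketch, with no assertion about the encoded form:

func ExampleTransform() {
	encoded, err := search.Transform("Phillips", search.Soundex)
	if err != nil {
		log.Fatal(err)
	}
	// The exact encoded form is not documented here, so no output is asserted.
	fmt.Println(encoded)
}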
type QueryOption ¶
type QueryOption func(*QueryParser) error
func SetDefaultOperator ¶
func SetDefaultOperator(op Operator) QueryOption
SetDefaultOperator sets the default boolean search operator for the query.
func SetFields ¶
func SetFields(field ...string) QueryOption
SetFields sets the default search fields for the query.
type QueryParser ¶
type QueryParser struct {
// contains filtered or unexported fields
}
term1*  -- Searches for the prefix term1
term1\* -- Searches for the term term1*
term*1  -- Searches for the term term*1
term\*1 -- Searches for the term term*1

Note that the above examples consider the terms before text processing.
The specification and documentation are adapted from the Lucene documentation for the SimpleQueryParser: https://lucene.apache.org/core/6_6_1/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html
func NewQueryParser ¶
func NewQueryParser(options ...QueryOption) (*QueryParser, error)
NewQueryParser returns a QueryParser that can be used to parse user queries.
func (*QueryParser) Fields ¶
func (qp *QueryParser) Fields() []string
Fields returns the default search fields for the query.
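A sketch constructing a parser with default fields, using only the options documented on this page; the order of the returned slice is assumed:

func ExampleNewQueryParser() {
	qp, err := search.NewQueryParser(search.SetFields("title", "description"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(qp.Fields())
	// Assumed output: [title description]
}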
type QueryTerm ¶
type QueryTerm struct {
	Field          string
	Value          string
	Prohibited     bool
	Phrase         bool
	SuffixWildcard bool
	PrefixWildcard bool
	Boost          float64
	Fuzzy          int // fuzzy is for words
	Slop           int // slop is for phrases
	// contains filtered or unexported fields
}
func (*QueryTerm) IsBoolQuery ¶
IsBoolQuery returns true if the QueryTerm has a nested QueryTerm in a Boolean clause.
type SpellCheckOption ¶ added in v0.1.3
type SpellCheckOption func(*SpellChecker)
func SetSuggestDepth ¶ added in v0.1.3
func SetSuggestDepth(depth int) SpellCheckOption
func SetThreshold ¶ added in v0.1.3
func SetThreshold(threshold int) SpellCheckOption
type SpellChecker ¶ added in v0.1.3
type SpellChecker struct {
// contains filtered or unexported fields
}
func NewSpellCheck ¶ added in v0.1.3
func NewSpellCheck(options ...SpellCheckOption) *SpellChecker
func (*SpellChecker) SetCount ¶ added in v0.1.3
func (s *SpellChecker) SetCount(term string, count int, suggest bool)
func (*SpellChecker) SpellCheck ¶ added in v0.1.3
func (s *SpellChecker) SpellCheck(input string) string
SpellCheck returns the most likely correction for the input term.
func (*SpellChecker) SpellCheckSuggestions ¶ added in v0.1.3
func (s *SpellChecker) SpellCheckSuggestions(input string, n int) []string
SpellCheckSuggestions returns the most likely corrections for the input, ordered from best to worst.
func (*SpellChecker) Train ¶ added in v0.1.3
func (s *SpellChecker) Train(stream *TokenStream)
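A sketch of training and querying the spell checker; the corpus, the option values, and the corrected output are illustrative assumptions:

func ExampleSpellChecker() {
	// Train on a tiny corpus.
	stream := search.NewTokenizer().ParseString("the quick brown fox jumps over the lazy dog", 1)

	sc := search.NewSpellCheck(search.SetSuggestDepth(2), search.SetThreshold(1))
	sc.Train(stream)

	fmt.Println(sc.SpellCheck("quik"))
	// Assumed output: quick
}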
type Token ¶ added in v0.1.3
type Token struct {
	Vector        int
	TermVector    int
	OffsetStart   int
	OffsetEnd     int
	Ignored       bool
	RawText       string
	Normal        string
	TrailingSpace bool
	Punctuation   bool
	DocID         int
}
func (*Token) GetTermVector ¶ added in v0.1.3
type TokenOption ¶ added in v0.1.3
type TokenOption func(tok *Tokenizer)
func SetPhraseAware ¶ added in v0.1.3
func SetPhraseAware() TokenOption
type TokenStream ¶ added in v0.1.3
type TokenStream struct {
// contains filtered or unexported fields
}
func (*TokenStream) Highlight ¶ added in v0.1.3
func (ts *TokenStream) Highlight(vectors *Vectors, tagLabel, emClass string) string
TODO(kiivihal): refactor to reduce cyclo complexity
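A heavily hedged sketch of highlighting; it assumes Vectors positions correspond to the token positions produced by the Tokenizer and that Highlight wraps matching tokens in the given tag and class:

func ExampleTokenStream_Highlight() {
	ts := search.NewTokenizer().ParseString("the quick brown fox", 1)

	hits := search.NewVectors()
	hits.Add(1, 2) // assumption: document 1, token position 2 ("quick")

	fmt.Println(ts.Highlight(hits, "em", "hl"))
	// Assumed output: the <em class="hl">quick</em> brown fox
}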
func (*TokenStream) String ¶ added in v0.1.3
func (ts *TokenStream) String() string
func (*TokenStream) Tokens ¶ added in v0.1.3
func (ts *TokenStream) Tokens() []Token
type Tokenizer ¶ added in v0.1.3
type Tokenizer struct {
// contains filtered or unexported fields
}
func NewTokenizer ¶ added in v0.1.3
func NewTokenizer(options ...TokenOption) *Tokenizer
func (*Tokenizer) Parse ¶ added in v0.1.3
func (t *Tokenizer) Parse(r io.Reader, docID int) *TokenStream
Parse creates a stream of tokens from an io.Reader. When a document identifier of 0 is given, the document count is auto-incremented on each call to Parse; without this, repeated calls would effectively create the same vectors as previous runs.
func (*Tokenizer) ParseBytes ¶ added in v0.1.3
func (t *Tokenizer) ParseBytes(b []byte, docID int) *TokenStream
func (*Tokenizer) ParseString ¶ added in v0.1.3
func (t *Tokenizer) ParseString(text string, docID int) *TokenStream
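A sketch of tokenizing a short string and inspecting the tokens; whether Normal holds a folded, lowercased form is an assumption based on the Analyzer description above:

func ExampleTokenizer() {
	t := search.NewTokenizer()
	ts := t.ParseString("Fietsen in Amsterdam", 1)

	for _, tok := range ts.Tokens() {
		fmt.Println(tok.Normal)
	}
	// Assumed output: the normalized form of each token,
	// e.g. fietsen, in, amsterdam.
}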
type Vector ¶ added in v0.1.3
func ValidPhrasePosition ¶
ValidPhrasePosition returns a list of valid positions from the source position to determine if the term is part of a phrase.
type Vectors ¶ added in v0.1.3
func NewVectors ¶ added in v0.1.3
func NewVectors() *Vectors