Documentation ¶
Index ¶
- func Distance(s1, s2 string, c DistanceCalculator) (float64, error)
- func IsFuzzyMatch(s1, s2 string, fuzziness float64, c DistanceCalculator) (bool, error)
- func IsPhraseMatch(pos1, pos2, slop int) (bool, error)
- func LuceneASCIIFolding(str string) string
- func NativeASCIIFolding(text string) string
- func Transform(s1 string, pp PhoneticPreprocessor) (string, error)
- type Analyzer
- type AutoComplete
- type Autos
- type DistanceCalculator
- type Matches
- func (m *Matches) AppendTerm(term string, tv *Vectors)
- func (m *Matches) DocCount() int
- func (m *Matches) HasDocID(docID int) bool
- func (m *Matches) Merge(matches *Matches)
- func (m *Matches) Reset()
- func (m *Matches) TermCount() int
- func (m *Matches) TermFrequency() map[string]int
- func (m *Matches) Total() int
- func (m *Matches) Vectors() *Vectors
- type Operator
- type PhoneticPreprocessor
- type QueryOption
- type QueryParser
- type QueryTerm
- type QueryType
- type SpellCheckOption
- type SpellChecker
- type Token
- type TokenOption
- type TokenStream
- type Tokenizer
- type Vector
- type Vectors
- func (tv *Vectors) Add(doc, pos int)
- func (tv *Vectors) AddPhraseVector(vector Vector)
- func (tv *Vectors) AddVector(vector Vector)
- func (tv *Vectors) DocCount() int
- func (tv *Vectors) HasDoc(doc int) bool
- func (tv *Vectors) HasVector(vector Vector) bool
- func (tv *Vectors) Merge(vectors *Vectors)
- func (tv *Vectors) Size() int
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Distance ¶
func Distance(s1, s2 string, c DistanceCalculator) (float64, error)
Distance calculates the similarity between two strings. The DistanceCalculator determines which algorithm is used.
This function is a wrapper for the matchr library. The documentation of the DistanceCalculator constants and parts of the testdata are adapted from there. For more information about the implementation, see http://github.com/antzucaro/matchr.
There are two groups of algorithms:
Edit distance: Levenshtein, Damerau-Levenshtein, Hamming, Jaro, Jaro-Winkler, Osa, SmithWaterman
Sound similarity: DoubleMetaphone, Nysiis, Phonex, Soundex
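For illustration, a minimal sketch of a distance calculation. It assumes the package is imported as search (the import path is not shown on this page), omits the fmt and log imports, and the printed value follows the classic three-edit Levenshtein example:

func ExampleDistance() {
	d, err := search.Distance("kitten", "sitting", search.Levenshtein)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(d)
	// Assumed output: 3 (three single-character edits)
}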
func IsFuzzyMatch ¶
func IsFuzzyMatch(s1, s2 string, fuzziness float64, c DistanceCalculator) (bool, error)
IsFuzzyMatch determines if two strings are similar enough within the specified fuzziness.
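A hedged sketch, with the same search import assumption as above; the exact interpretation of the fuzziness threshold for each calculator is not documented here, so the values are illustrative only:

func ExampleIsFuzzyMatch() {
	// Compare two spellings with the Jaro-Winkler calculator; 0.9 is an
	// illustrative threshold, not a documented default.
	ok, err := search.IsFuzzyMatch("ambulance", "ambulence", 0.9, search.JaroWinkler)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ok)
}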
func IsPhraseMatch ¶
func IsPhraseMatch(pos1, pos2, slop int) (bool, error)
IsPhraseMatch is a helper to determine whether two positions are close enough to be part of the same phrase. This is used for phrase queries. The default slop is 0.
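A sketch based only on the signature above; it assumes that adjacent positions with slop 0 count as part of the same phrase:

func ExampleIsPhraseMatch() {
	// Positions 3 and 4 are adjacent; with slop 0 they are assumed to
	// qualify as part of the same phrase.
	ok, err := search.IsPhraseMatch(3, 4, 0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ok)
}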
func LuceneASCIIFolding ¶
func LuceneASCIIFolding(str string) string
LuceneASCIIFolding converts Unicode characters to their ASCII equivalents. When no equivalent is found, the original Unicode character is returned.
The native Go solution in NativeASCIIFolding() does not produce the exact same result as the Lucene 'ASCIIFoldingFilter'.
func NativeASCIIFolding ¶
func NativeASCIIFolding(text string) string
NativeASCIIFolding uses the Go native ASCII folding functionality. This is sufficient in most cases. When full compliance with the Lucene ASCIIFoldingFilter is required, use LuceneASCIIFolding().
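A sketch comparing the two folding helpers on a simple accented input; for plain Latin accents both are expected to agree, and the output shown is an assumption:

func ExampleLuceneASCIIFolding() {
	fmt.Println(search.LuceneASCIIFolding("Curaçao"))
	fmt.Println(search.NativeASCIIFolding("Curaçao"))
	// Assumed output for both lines: Curacao
}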
Types ¶
type Analyzer ¶
type Analyzer struct{}
Analyzer is the default analyzer for Search actions. It folds Unicode characters to their ASCII equivalents and lowercases them.
The goal is to have this analyzer behave similarly to the ElasticSearch Analyzer that Ikuzo comes preconfigured with.
func (*Analyzer) TransformPhrase ¶
type AutoComplete ¶ added in v0.1.3
type AutoComplete struct {
	SuggestFn func(a Autos) Autos
	// contains filtered or unexported fields
}
func NewAutoComplete ¶ added in v0.1.3
func NewAutoComplete() *AutoComplete
func (*AutoComplete) FromStrings ¶ added in v0.1.3
func (ac *AutoComplete) FromStrings(words []string)
func (*AutoComplete) FromTokenSteam ¶ added in v0.1.3
func (ac *AutoComplete) FromTokenSteam(stream *TokenStream)
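A minimal, heavily hedged sketch of indexing a word list; how suggestions are retrieved afterwards depends on parts of the API (such as the SuggestFn hook) that are not documented on this page:

func ExampleAutoComplete() {
	ac := search.NewAutoComplete()
	// Index a word list. Retrieving suggestions afterwards depends on parts
	// of the API not shown here.
	ac.FromStrings([]string{"amsterdam", "amstelveen", "amersfoort"})
}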
type DistanceCalculator ¶
type DistanceCalculator int
const (
	// Levenshtein computes the Levenshtein distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, and substitutions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point.
	Levenshtein DistanceCalculator = iota

	// DamerauLevenshtein computes the Damerau-Levenshtein distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, substitutions, and transpositions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point. It is similar to the Optimal String Alignment
	// algorithm, but is more complex because it allows multiple edits on
	// substrings.
	DamerauLevenshtein

	// Hamming computes the Hamming distance between two equal-length strings.
	// This is the number of times the two strings differ between characters at
	// the same index. This implementation is based off of the algorithm
	// description found at http://en.wikipedia.org/wiki/Hamming_distance.
	Hamming

	// Jaro computes the Jaro edit distance between two strings. It represents
	// this with a float64 between 0 and 1 inclusive, with 0 indicating the two
	// strings are not at all similar and 1 indicating the two strings are exact
	// matches.
	//
	// See http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance for a
	// full description.
	Jaro

	// JaroWinkler computes the Jaro-Winkler edit distance between two strings.
	// This is a modification of the Jaro algorithm that gives additional weight
	// to prefix matches.
	JaroWinkler

	// Osa computes the Optimal String Alignment distance between two
	// strings. The returned value - distance - is the number of insertions,
	// deletions, substitutions, and transpositions it takes to transform one
	// string (s1) into another (s2). Each step in the transformation "costs"
	// one distance point. It is similar to Damerau-Levenshtein, but is simpler
	// because it does not allow multiple edits on any substring.
	Osa

	// SmithWaterman computes the Smith-Waterman local sequence alignment for the
	// two input strings. This was originally designed to find similar regions in
	// strings representing DNA or protein sequences.
	SmithWaterman
)
func (DistanceCalculator) String ¶
func (dc DistanceCalculator) String() string
type Matches ¶ added in v0.1.3
type Matches struct {
// contains filtered or unexported fields
}
func NewMatches ¶ added in v0.1.3
func NewMatches() *Matches
func (*Matches) AppendTerm ¶ added in v0.1.3
func (*Matches) Reset ¶ added in v0.1.7
func (m *Matches) Reset()
Reset clears matches that have already been gathered, for example when ErrSearchNoMatch is returned.
func (*Matches) TermFrequency ¶ added in v0.1.3
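A sketch, based only on the signatures listed above, of collecting term vectors into a Matches set; the printed counts and map are assumptions about how DocCount, Total, and TermFrequency tally occurrences:

func ExampleMatches() {
	tv := search.NewVectors()
	tv.Add(1, 1) // term occurs in document 1 at position 1
	tv.Add(1, 5) // and again at position 5

	m := search.NewMatches()
	m.AppendTerm("fiets", tv)

	fmt.Println(m.DocCount(), m.TermCount(), m.Total())
	fmt.Println(m.TermFrequency())
	// Assumed output: 1 1 2, followed by map[fiets:2]
}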
type PhoneticPreprocessor ¶
type PhoneticPreprocessor int
const (
	// DoubleMetaphone computes the Double-Metaphone value of the input string.
	// This value is a phonetic representation of how the string sounds, with
	// affordances for many different language dialects. It was originally
	// developed by Lawrence Phillips in the 1990s.
	//
	// More information about this algorithm can be found on Wikipedia at
	// http://en.wikipedia.org/wiki/Metaphone.
	DoubleMetaphone PhoneticPreprocessor = iota

	// Nysiis computes the NYSIIS phonetic encoding of the input string. It is a
	// modification of the traditional Soundex algorithm.
	Nysiis

	// Phonex computes the Phonex phonetic encoding of the input string. Phonex is
	// a modification of the venerable Soundex algorithm. It accounts for a few
	// more letter combinations to improve accuracy on some data sets.
	//
	// This implementation is based off of the original C implementation by the
	// creator - A. J. Lait - as found in his research paper entitled "An
	// Assessment of Name Matching Algorithms."
	Phonex

	// Soundex computes the Soundex phonetic representation of the input string. It
	// attempts to encode homophones with the same characters. More information can
	// be found at http://en.wikipedia.org/wiki/Soundex.
	Soundex
)
func (PhoneticPreprocessor) String ¶
func (pp PhoneticPreprocessor) String() string
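Transform (listed in the index) applies one of these preprocessors to a string. A hedged sketch, with no assertion about the encoded form:

func ExampleTransform() {
	encoded, err := search.Transform("Phillips", search.Soundex)
	if err != nil {
		log.Fatal(err)
	}
	// The exact encoded form is not documented here, so no output is asserted.
	fmt.Println(encoded)
}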
type QueryOption ¶
type QueryOption func(*QueryParser) error
func SetDefaultOperator ¶
func SetDefaultOperator(op Operator) QueryOption
SetDefaultOperator sets the default boolean search operator for the query.
func SetFields ¶
func SetFields(field ...string) QueryOption
SetFields sets the default search fields for the query.
type QueryParser ¶
type QueryParser struct {
// contains filtered or unexported fields
}
term1*  -- Searches for the prefix term1
term1\* -- Searches for the term term1*
term*1  -- Searches for the term term*1
term\*1 -- Searches for the term term*1

Note that the above examples consider the terms before text processing.
The specification and documentation are adapted from the Lucene documentation for the SimpleQueryParser: https://lucene.apache.org/core/6_6_1/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html
func NewQueryParser ¶
func NewQueryParser(options ...QueryOption) (*QueryParser, error)
NewQueryParser returns a QueryParser that can be used to parse user queries.
func (*QueryParser) Fields ¶
func (qp *QueryParser) Fields() []string
Fields returns the default search fields for the query.
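A sketch constructing a parser with default fields, using only the options documented on this page; the order of the returned slice is assumed:

func ExampleNewQueryParser() {
	qp, err := search.NewQueryParser(search.SetFields("title", "description"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(qp.Fields())
	// Assumed output: [title description]
}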
type QueryTerm ¶
type QueryTerm struct {
	Field          string
	Value          string
	Prohibited     bool
	Phrase         bool
	SuffixWildcard bool
	PrefixWildcard bool
	Boost          float64
	Fuzzy          int // fuzzy is for words
	Slop           int // slop is for phrases
	// contains filtered or unexported fields
}
func (*QueryTerm) IsBoolQuery ¶
IsBoolQuery returns true if the QueryTerm has a nested QueryTerm in a Boolean clause.
type SpellCheckOption ¶ added in v0.1.3
type SpellCheckOption func(*SpellChecker)
func SetSuggestDepth ¶ added in v0.1.3
func SetSuggestDepth(depth int) SpellCheckOption
func SetThreshold ¶ added in v0.1.3
func SetThreshold(threshold int) SpellCheckOption
type SpellChecker ¶ added in v0.1.3
type SpellChecker struct {
// contains filtered or unexported fields
}
func NewSpellCheck ¶ added in v0.1.3
func NewSpellCheck(options ...SpellCheckOption) *SpellChecker
func (*SpellChecker) SetCount ¶ added in v0.1.3
func (s *SpellChecker) SetCount(term string, count int, suggest bool)
func (*SpellChecker) SpellCheck ¶ added in v0.1.3
func (s *SpellChecker) SpellCheck(input string) string
SpellCheck returns the most likely correction for the input term.
func (*SpellChecker) SpellCheckSuggestions ¶ added in v0.1.3
func (s *SpellChecker) SpellCheckSuggestions(input string, n int) []string
SpellCheckSuggestions returns the most likely corrections for the input, ordered from best to worst.
func (*SpellChecker) Train ¶ added in v0.1.3
func (s *SpellChecker) Train(stream *TokenStream)
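A sketch of training and querying the spell checker; the corpus, the option values, and the corrected output are illustrative assumptions:

func ExampleSpellChecker() {
	// Train on a tiny corpus.
	stream := search.NewTokenizer().ParseString("the quick brown fox jumps over the lazy dog", 1)

	sc := search.NewSpellCheck(search.SetSuggestDepth(2), search.SetThreshold(1))
	sc.Train(stream)

	fmt.Println(sc.SpellCheck("quik"))
	// Assumed output: quick
}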
type Token ¶ added in v0.1.3
type Token struct {
	Vector        int
	TermVector    int
	OffsetStart   int
	OffsetEnd     int
	Ignored       bool
	RawText       string
	Normal        string
	TrailingSpace bool
	Punctuation   bool
	DocID         int
}
func (*Token) GetTermVector ¶ added in v0.1.3
type TokenOption ¶ added in v0.1.3
type TokenOption func(tok *Tokenizer)
func SetPhraseAware ¶ added in v0.1.3
func SetPhraseAware() TokenOption
type TokenStream ¶ added in v0.1.3
type TokenStream struct {
// contains filtered or unexported fields
}
func (*TokenStream) Highlight ¶ added in v0.1.3
func (ts *TokenStream) Highlight(vectors *Vectors, tagLabel, emClass string) string
TODO(kiivihal): refactor to reduce cyclo complexity
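A heavily hedged sketch of highlighting; it assumes Vectors positions correspond to the token positions produced by the Tokenizer and that Highlight wraps matching tokens in the given tag and class:

func ExampleTokenStream_Highlight() {
	ts := search.NewTokenizer().ParseString("the quick brown fox", 1)

	hits := search.NewVectors()
	hits.Add(1, 2) // assumption: document 1, token position 2 ("quick")

	fmt.Println(ts.Highlight(hits, "em", "hl"))
	// Assumed output: the <em class="hl">quick</em> brown fox
}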
func (*TokenStream) String ¶ added in v0.1.3
func (ts *TokenStream) String() string
func (*TokenStream) Tokens ¶ added in v0.1.3
func (ts *TokenStream) Tokens() []Token
type Tokenizer ¶ added in v0.1.3
type Tokenizer struct {
// contains filtered or unexported fields
}
func NewTokenizer ¶ added in v0.1.3
func NewTokenizer(options ...TokenOption) *Tokenizer
func (*Tokenizer) Parse ¶ added in v0.1.3
func (t *Tokenizer) Parse(r io.Reader, docID int) *TokenStream
Parse creates a stream of tokens from an io.Reader. When a document identifier of 0 is given, the document count is auto-incremented on each call to Parse; without this, repeated calls would effectively create the same vectors as previous runs.
func (*Tokenizer) ParseBytes ¶ added in v0.1.3
func (t *Tokenizer) ParseBytes(b []byte, docID int) *TokenStream
func (*Tokenizer) ParseString ¶ added in v0.1.3
func (t *Tokenizer) ParseString(text string, docID int) *TokenStream
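A sketch of tokenizing a short string and inspecting the tokens; whether Normal holds a folded, lowercased form is an assumption based on the Analyzer description above:

func ExampleTokenizer() {
	t := search.NewTokenizer()
	ts := t.ParseString("Fietsen in Amsterdam", 1)

	for _, tok := range ts.Tokens() {
		fmt.Println(tok.Normal)
	}
	// Assumed output: the normalized form of each token,
	// e.g. fietsen, in, amsterdam.
}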
type Vector ¶ added in v0.1.3
func ValidPhrasePosition ¶
ValidPhrasePosition returns a list of valid positions from the source position to determine if the term is part of a phrase.
type Vectors ¶ added in v0.1.3
func NewVectors ¶ added in v0.1.3
func NewVectors() *Vectors