Documentation ¶
Index ¶
- func Analyze(in string) (tokens []string)
- func AnalyzeBytes(in []byte) (tokens [][]byte)
- func Bigrams(tokens []string) (bigrams sort.StringSlice)
- func MSAnalyze(in string) (tokens []string)
- func MSAnalyzeBytes(in []byte) (tokens [][]byte)
- func NGramSimilarity(a string, b string, ngramLen int) float64
- func Shingles(tokens []string) (result []string)
- func TokenNGrams(in string, ln int) (ngrams []string)
- func URLAnalyze(in string) (tokens []string)
- func URLAnalyzeOrEmpty(in string) (analyzed string)
- func UnigramsAndBigrams(tokens []string) (ngrams []string)
- func VisitAnalyzedShingles(input []byte, tokenizer func(b []byte) [][]byte, ...)
- func VisitShingles(tokens [][]byte, visit func(b []byte) (stop bool))
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func AnalyzeBytes ¶
func AnalyzeBytes(in []byte) (tokens [][]byte)
AnalyzeBytes normalizes and tokenizes a given input stream.
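A minimal usage sketch; the import path and package alias below are placeholders for the real ones, and the exact tokens produced depend on the analyzer's normalization rules:

    package main

    import (
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        // Each token of the normalized input comes back as its own []byte.
        tokens := analyze.AnalyzeBytes([]byte("Hello, World!"))
        for _, tok := range tokens {
            fmt.Printf("%s\n", tok)
        }
    }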
func Bigrams ¶
func Bigrams(tokens []string) (bigrams sort.StringSlice)
Bigrams returns the unique token bigrams for a given ordered list of string tokens.
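A sketch of building bigrams from an already tokenized, ordered input; the import path is a placeholder and the exact bigram encoding (how the two tokens are joined) is not documented here:

    package main

    import (
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        // Bigrams expects an ordered token list and returns its unique token
        // bigrams as a sort.StringSlice, which prints like a plain []string.
        bigrams := analyze.Bigrams([]string{"the", "quick", "brown", "fox"})
        fmt.Println(bigrams)
    }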
func MSAnalyze ¶
func MSAnalyze(in string) (tokens []string)
MSAnalyze normalizes and tokenizes a given input stream according to rules reverse-engineered to match what the MS SQL Server full text indexer does.
func MSAnalyzeBytes ¶
func MSAnalyzeBytes(in []byte) (tokens [][]byte)
MSAnalyzeBytes normalizes and tokenizes a given input according to rules reverse-engineered to match what the MS SQL Server full text indexer does.
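A sketch exercising both the string and []byte variants; the import path is a placeholder, and the resulting tokens are not shown because they depend on the reverse-engineered rules:

    package main

    import (
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        input := "Full-text search in SQL Server"

        // String variant.
        fmt.Println(analyze.MSAnalyze(input))

        // []byte variant; tokens come back as a [][]byte.
        for _, tok := range analyze.MSAnalyzeBytes([]byte(input)) {
            fmt.Printf("%s ", tok)
        }
        fmt.Println()
    }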
func NGramSimilarity ¶
func NGramSimilarity(a string, b string, ngramLen int) float64
NGramSimilarity calculates the Jaccard similarity of the token ngrams of two input strings.
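A sketch comparing two strings; the import path is a placeholder and ngramLen = 3 is just an illustrative choice:

    package main

    import (
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        // Jaccard similarity of the two inputs' n-gram sets: 1.0 means the
        // sets are identical, 0.0 means they share no n-grams.
        sim := analyze.NGramSimilarity("apple inc", "apple incorporated", 3)
        fmt.Printf("%.3f\n", sim)
    }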
func TokenNGrams ¶
func TokenNGrams(in string, ln int) (ngrams []string)
TokenNGrams turns an input like "abcd" into a series of trigrams like ("abc", "bcd"). If the input is empty, the result is empty; if the input is one or two characters, the output is padded with '$'.
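A sketch of the behavior described above; the import path is a placeholder, and the commented outputs paraphrase the description rather than verified results:

    package main

    import (
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        fmt.Println(analyze.TokenNGrams("abcd", 3)) // trigrams such as "abc", "bcd"
        fmt.Println(analyze.TokenNGrams("ab", 3))   // short input: padded with '$'
        fmt.Println(analyze.TokenNGrams("", 3))     // empty input: empty result
    }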
func URLAnalyze ¶
func URLAnalyze(in string) (tokens []string)
URLAnalyze attempts to normalize a URL to a simple host name or returns an empty slice.
func URLAnalyzeOrEmpty ¶
func URLAnalyzeOrEmpty(in string) (analyzed string)
URLAnalyzeOrEmpty attempts to normalize a URL to a simple host name or returns an empty string.
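A sketch covering both URL helpers; the import path is a placeholder and the exact normalized host form (for example whether a "www." prefix survives) is not specified here:

    package main

    import (
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        url := "https://www.example.com/some/path?q=1"

        // Token form: an empty slice if the URL cannot be normalized.
        fmt.Println(analyze.URLAnalyze(url))

        // String form: "" if the URL cannot be normalized.
        fmt.Println(analyze.URLAnalyzeOrEmpty(url))
    }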
func UnigramsAndBigrams ¶
func UnigramsAndBigrams(tokens []string) (ngrams []string)
UnigramsAndBigrams returns the unique token unigrams and bigrams for a given ordered list of string tokens.
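A sketch of combining unigrams and bigrams for an ordered token list; the import path is a placeholder:

    package main

    import (
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        // The result contains the tokens themselves plus their unique bigrams.
        ngrams := analyze.UnigramsAndBigrams([]string{"new", "york", "city"})
        fmt.Println(ngrams)
    }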
func VisitAnalyzedShingles ¶
func VisitAnalyzedShingles(input []byte, tokenizer func(b []byte) [][]byte, visit func(b []byte) (stop bool))
VisitAnalyzedShingles applies the provided tokenizer to the input and then calls the supplied visit function for each shingle of the tokenized input. If the input is an empty byte slice, the function returns immediately.
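A sketch wiring AnalyzeBytes in as the tokenizer, which matches the required func([]byte) [][]byte shape; the import path is a placeholder:

    package main

    import (
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        analyze.VisitAnalyzedShingles(
            []byte("the quick brown fox"),
            analyze.AnalyzeBytes, // any func([]byte) [][]byte tokenizer works
            func(b []byte) (stop bool) {
                fmt.Printf("%s\n", b)
                return false // return true to stop visiting early
            },
        )
    }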
func VisitShingles ¶
func VisitShingles(tokens [][]byte, visit func(b []byte) (stop bool))
VisitShingles calls the supplied visit function once per shingle, stopping if the visit function returns true.
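A sketch using pre-tokenized input and stopping after the first shingle; the import path is a placeholder:

    package main

    import (
        "bytes"
        "fmt"

        analyze "example.com/textanalyze" // hypothetical import path
    )

    func main() {
        // Any [][]byte token list works; bytes.Fields is just a convenient
        // stand-in for a real tokenizer here.
        tokens := bytes.Fields([]byte("the quick brown fox"))

        analyze.VisitShingles(tokens, func(b []byte) (stop bool) {
            fmt.Printf("%s\n", b)
            return true // stop after the first shingle
        })
    }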
Types ¶
This section is empty.