Documentation
Index
- func Compare(s1, s2 string, opts ...Option) float64
- func DamerauLevenshteinDistance(s1, s2 string) int
- func FindBestMatch(s string, targets []string, opts ...Option) *similarity.MatchResult
- func FindBestMatchOne(s string, targets []string, opts ...Option) *similarity.Match
- func JaroDistance(s1, s2 string) float32
- func JaroWinklerDistance(s1, s2 string, p float32) float32
- func Levenshtein(str1, str2 string, costIns, costRep, costDel int) int
- func LevenshteinDistance(s1, s2 string) int
- func SimilarText(first, second string, percent *float64) int
- func TrigramCompare(s1, s2 string) float32
- type Option
- type OptionFunc
- func Cosine() OptionFunc
- func Default() OptionFunc
- func DiceCoefficient(ngram ...int) OptionFunc
- func Hamming() OptionFunc
- func IgnoreCase() OptionFunc
- func IgnoreSpace() OptionFunc
- func Jaro(matchWindow ...int) OptionFunc
- func JaroWinkler(matchWindow ...int) OptionFunc
- func SimHash() OptionFunc
- func UseASCII() OptionFunc
- func UseBase64() OptionFunc
- type StringDiff
- func NewStringDiff(s1, s2 string) *StringDiff
- func (sd *StringDiff) DamerauLevenshteinDistance(deleteCost, insertCost, replaceCost, swapCost int) int
- func (sd *StringDiff) JaroDistance() float32
- func (sd *StringDiff) JaroWinklerDistance(p float32) float32
- func (sd *StringDiff) LevenshteinDistance() int
- func (sd *StringDiff) TrigramCompare() float32
Constants
This section is empty.
Variables
This section is empty.
Functions
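func DamerauLevenshteinDistance
func DamerauLevenshteinDistance(s1, s2 string) int
DamerauLevenshteinDistance implements the Damerau-Levenshtein algorithm. This algorithm extends the Levenshtein algorithm and solves the edit distance problem between a source string and a target string using four operations: character insertion, character deletion, character replacement, and adjacent character swap.
Read https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance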
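The following is a minimal sketch of the unrestricted Damerau-Levenshtein distance with unit costs. It follows the Lowrance-Wagner dynamic program from the Wikipedia page above. It is not the package's code. The names damerauLevenshtein and minOf are hypothetical, and the sketch compares runes rather than bytes. The "CA" to "ABC" case separates this algorithm from the restricted (optimal string alignment) variant, which would return 3.

```go
package main

import "fmt"

// minOf returns the smallest of its arguments (hypothetical helper).
func minOf(vals ...int) int {
	m := vals[0]
	for _, v := range vals[1:] {
		if v < m {
			m = v
		}
	}
	return m
}

// damerauLevenshtein is a hypothetical sketch of the unrestricted
// Damerau-Levenshtein distance with unit costs (Lowrance-Wagner).
func damerauLevenshtein(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	maxDist := len(a) + len(b) // upper bound on the distance, used as a sentinel
	// d is shifted by one in both dimensions; row 0 and column 0 hold the sentinel.
	d := make([][]int, len(a)+2)
	for i := range d {
		d[i] = make([]int, len(b)+2)
	}
	d[0][0] = maxDist
	for i := 0; i <= len(a); i++ {
		d[i+1][0] = maxDist
		d[i+1][1] = i
	}
	for j := 0; j <= len(b); j++ {
		d[0][j+1] = maxDist
		d[1][j+1] = j
	}
	lastRow := map[rune]int{} // last row where each rune of s1 was seen
	for i := 1; i <= len(a); i++ {
		lastMatchCol := 0 // last column in this row where a[i-1] == b[j-1]
		for j := 1; j <= len(b); j++ {
			k, l := lastRow[b[j-1]], lastMatchCol
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
				lastMatchCol = j
			}
			d[i+1][j+1] = minOf(
				d[i][j]+cost, // replacement (or match)
				d[i+1][j]+1,  // insertion
				d[i][j+1]+1,  // deletion
				// swap, with deletions/insertions between the swapped pair
				d[k][l]+(i-k-1)+1+(j-l-1),
			)
		}
		lastRow[a[i-1]] = i
	}
	return d[len(a)+1][len(b)+1]
}

func main() {
	fmt.Println(damerauLevenshtein("CA", "ABC")) // 2: CA -> AC -> ABC
	fmt.Println(damerauLevenshtein("abc", "acb")) // 1: one adjacent swap
}
```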
func FindBestMatch
func FindBestMatch(s string, targets []string, opts ...Option) *similarity.MatchResult
FindBestMatch returns the target string with the highest similarity to s, together with its index in targets.
func FindBestMatchOne
func FindBestMatchOne(s string, targets []string, opts ...Option) *similarity.Match
FindBestMatchOne returns the target string with the highest similarity to s.
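FindBestMatch and FindBestMatchOne come down to scoring every target and keeping the best one. The sketch below assumes a pluggable score function. positionalScore is a hypothetical toy scorer that stands in for Compare. The hypothetical bestMatch returns plain values instead of the package's similarity.MatchResult or similarity.Match, whose fields are not documented here.

```go
package main

import "fmt"

// positionalScore is a hypothetical toy similarity used only for illustration:
// the number of positions holding the same rune, divided by the longer length.
func positionalScore(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	n := len(ra)
	if len(rb) > n {
		n = len(rb)
	}
	if n == 0 {
		return 1
	}
	same := 0
	for i := 0; i < len(ra) && i < len(rb); i++ {
		if ra[i] == rb[i] {
			same++
		}
	}
	return float64(same) / float64(n)
}

// bestMatch (hypothetical) returns the highest-scoring target and its index.
// Ties keep the earliest target; an empty targets slice yields ("", -1).
func bestMatch(s string, targets []string, score func(a, b string) float64) (string, int) {
	bestIdx, bestScore := -1, -1.0
	for i, t := range targets {
		if sc := score(s, t); sc > bestScore {
			bestIdx, bestScore = i, sc
		}
	}
	if bestIdx < 0 {
		return "", -1
	}
	return targets[bestIdx], bestIdx
}

func main() {
	match, idx := bestMatch("apple", []string{"banana", "apply", "grape"}, positionalScore)
	fmt.Println(match, idx) // apply 1
}
```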
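func JaroDistance
func JaroDistance(s1, s2 string) float32
JaroDistance uses the Jaro metric to measure how similar two words are. The metric is computed from two quantities: the number of characters the words have in common within a matching window, and the number of transpositions among those common characters.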
func JaroWinklerDistance
func JaroWinklerDistance(s1, s2 string, p float32) float32
JaroWinklerDistance adjusts the Jaro score with a prefix scale. This gives more favourable ratings to strings that match from the beginning, up to a set prefix length.
The p argument is a constant scaling factor for how much the score is adjusted upwards for having a common prefix. The standard value for this constant in Winkler’s work is p = 0.1.
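The following is a minimal sketch of both metrics, assuming the textbook definitions:

- The Jaro match window is max(len)/2 - 1.
- Transpositions are counted as half the number of matched characters that appear out of order.
- The Winkler score is jaro + prefix * p * (1 - jaro), with the common prefix capped at 4 characters and the boost applied unconditionally.

The names jaro and jaroWinkler are hypothetical. This page does not state whether the package uses the same window, prefix cap, or boost threshold.

```go
package main

import "fmt"

// jaro (hypothetical) returns the Jaro similarity: 1 for identical strings, 0 when nothing matches.
func jaro(s1, s2 string) float64 {
	a, b := []rune(s1), []rune(s2)
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	longer := len(a)
	if len(b) > longer {
		longer = len(b)
	}
	window := longer/2 - 1 // characters only match within this distance
	if window < 0 {
		window = 0
	}
	aMatched := make([]bool, len(a))
	bMatched := make([]bool, len(b))
	matches := 0
	for i := range a {
		lo, hi := i-window, i+window
		if lo < 0 {
			lo = 0
		}
		if hi > len(b)-1 {
			hi = len(b) - 1
		}
		for j := lo; j <= hi; j++ {
			if !bMatched[j] && a[i] == b[j] {
				aMatched[i], bMatched[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Walk both matched sequences in order; each mismatch is a half-transposition.
	half, k := 0, 0
	for i := range a {
		if !aMatched[i] {
			continue
		}
		for !bMatched[k] {
			k++
		}
		if a[i] != b[k] {
			half++
		}
		k++
	}
	m := float64(matches)
	t := float64(half) / 2
	return (m/float64(len(a)) + m/float64(len(b)) + (m-t)/m) / 3
}

// jaroWinkler (hypothetical) boosts the Jaro score by the length of the
// common prefix (capped at 4 runes), scaled by p.
func jaroWinkler(s1, s2 string, p float64) float64 {
	j := jaro(s1, s2)
	a, b := []rune(s1), []rune(s2)
	prefix := 0
	for prefix < len(a) && prefix < len(b) && prefix < 4 && a[prefix] == b[prefix] {
		prefix++
	}
	return j + float64(prefix)*p*(1-j)
}

func main() {
	fmt.Printf("%.4f\n", jaro("MARTHA", "MARHTA"))             // 0.9444
	fmt.Printf("%.4f\n", jaroWinkler("MARTHA", "MARHTA", 0.1)) // 0.9611
}
```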
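func Levenshtein
func Levenshtein(str1, str2 string, costIns, costRep, costDel int) int
Levenshtein computes the Levenshtein distance between str1 and str2 with configurable operation costs, following PHP's levenshtein() function. costIns defines the cost of insertion, costRep the cost of replacement, and costDel the cost of deletion.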
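The following is a two-row dynamic-programming sketch of edit distance with separate operation costs, assuming the usual recurrence. levenshteinWithCosts and minOf are hypothetical names. PHP's levenshtein() works on bytes; this sketch compares runes instead.

```go
package main

import "fmt"

// minOf returns the smallest of its arguments (hypothetical helper).
func minOf(vals ...int) int {
	m := vals[0]
	for _, v := range vals[1:] {
		if v < m {
			m = v
		}
	}
	return m
}

// levenshteinWithCosts (hypothetical) computes edit distance with separate
// costs for insertion, replacement and deletion, keeping only two DP rows.
func levenshteinWithCosts(s1, s2 string, costIns, costRep, costDel int) int {
	a, b := []rune(s1), []rune(s2)
	prev := make([]int, len(b)+1) // distances from a[:i-1] to each prefix of b
	cur := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j * costIns // build b[:j] from the empty string
	}
	for i := 1; i <= len(a); i++ {
		cur[0] = i * costDel // delete all of a[:i]
		for j := 1; j <= len(b); j++ {
			rep := prev[j-1]
			if a[i-1] != b[j-1] {
				rep += costRep
			}
			cur[j] = minOf(rep, prev[j]+costDel, cur[j-1]+costIns)
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

func main() {
	fmt.Println(levenshteinWithCosts("kitten", "sitting", 1, 1, 1)) // 3
	fmt.Println(levenshteinWithCosts("abc", "abd", 1, 5, 1))        // 2: delete 'c', insert 'd'
}
```

With a high replacement cost, one deletion plus one insertion can be cheaper than a single replacement.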
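func LevenshteinDistance
func LevenshteinDistance(s1, s2 string) int
LevenshteinDistance is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. The result is therefore a non-negative integer that grows with string length. This makes it harder to compare results across pairs of strings of different lengths.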
Read https://github.com/mhutter/string-similarity and https://en.wikipedia.org/wiki/Levenshtein_distance
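The sketch below makes the length sensitivity concrete. It computes the unit-cost distance together with one common normalization, 1 - distance / max(len(s1), len(s2)). The names levenshtein and normalizedSimilarity are hypothetical. The normalization is only an illustration; LevenshteinDistance itself is not documented to do it. The pairs ("ab", "cd") and ("abcdefghij", "abcdefghXY") both have distance 2, yet they normalize to 0 and 0.8.

```go
package main

import "fmt"

// levenshtein (hypothetical) returns the unit-cost edit distance.
func levenshtein(s1, s2 string) int {
	a, b := []rune(s1), []rune(s2)
	prev := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		cur := make([]int, len(b)+1)
		cur[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			cur[j] = prev[j-1] + cost // substitution or match
			if v := prev[j] + 1; v < cur[j] {
				cur[j] = v // deletion
			}
			if v := cur[j-1] + 1; v < cur[j] {
				cur[j] = v // insertion
			}
		}
		prev = cur
	}
	return prev[len(b)]
}

// normalizedSimilarity (hypothetical) rescales the distance to [0, 1] by the
// longer length, so that scores from pairs of different lengths are comparable.
func normalizedSimilarity(s1, s2 string) float64 {
	longer := len([]rune(s1))
	if n := len([]rune(s2)); n > longer {
		longer = n
	}
	if longer == 0 {
		return 1
	}
	return 1 - float64(levenshtein(s1, s2))/float64(longer)
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting"))                   // 3
	fmt.Printf("%.4f\n", normalizedSimilarity("kitten", "sitting")) // 0.5714
	fmt.Println(normalizedSimilarity("ab", "cd"))                   // 0
}
```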
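func SimilarText
func SimilarText(first, second string, percent *float64) int
SimilarText implements PHP's similar_text function, which compares the similarity of two texts. As in PHP, it returns the number of matching characters. When percent is non-nil, it also stores the similarity as a percentage in percent.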
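PHP's similar_text finds the longest common substring, recurses on the text to its left and to its right, and sums the lengths. The percentage is sim * 2 * 100 / (len(first) + len(second)). The following sketches that scheme. similarChars and similarText are hypothetical names, and the sketch compares bytes, as PHP does.

```go
package main

import "fmt"

// similarChars (hypothetical) finds the longest common substring (the first
// one found wins ties), then recurses on the parts to its left and right.
func similarChars(first, second string) int {
	if len(first) == 0 || len(second) == 0 {
		return 0
	}
	best, pos1, pos2 := 0, 0, 0
	for i := 0; i < len(first); i++ {
		for j := 0; j < len(second); j++ {
			k := 0
			for i+k < len(first) && j+k < len(second) && first[i+k] == second[j+k] {
				k++
			}
			if k > best {
				best, pos1, pos2 = k, i, j
			}
		}
	}
	if best == 0 {
		return 0
	}
	return best +
		similarChars(first[:pos1], second[:pos2]) +
		similarChars(first[pos1+best:], second[pos2+best:])
}

// similarText (hypothetical) returns the number of matching bytes. When
// percent is non-nil, it stores sim*2*100/(len(first)+len(second)) in percent.
func similarText(first, second string, percent *float64) int {
	sim := similarChars(first, second)
	if percent != nil {
		total := len(first) + len(second)
		if total == 0 {
			*percent = 0
		} else {
			*percent = float64(sim*2) * 100 / float64(total)
		}
	}
	return sim
}

func main() {
	var p float64
	n := similarText("World", "Word", &p)
	fmt.Printf("%d %.2f\n", n, p) // 4 88.89
}
```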
func TrigramCompare
func TrigramCompare(s1, s2 string) float32
TrigramCompare compares two strings by their trigrams. A trigram is a case of n-gram, a contiguous sequence of n (here, three) items from a given sample. Each input string is a sample and each character is an item.
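One common way to turn trigrams into a score is the Jaccard index of the two trigram sets, |shared| / |union|. The sketch below assumes that formula. It does not pad the strings, so inputs shorter than three characters have no trigrams and score 0; some implementations pad with spaces to avoid this. trigrams and trigramSimilarity are hypothetical names, not the package's API.

```go
package main

import "fmt"

// trigrams (hypothetical) returns the set of 3-rune substrings of s.
func trigrams(s string) map[string]struct{} {
	r := []rune(s)
	set := make(map[string]struct{})
	for i := 0; i+3 <= len(r); i++ {
		set[string(r[i:i+3])] = struct{}{}
	}
	return set
}

// trigramSimilarity (hypothetical) returns |shared| / |union| of the two
// trigram sets (a Jaccard index), or 0 when neither string has a trigram.
func trigramSimilarity(s1, s2 string) float64 {
	a, b := trigrams(s1), trigrams(s2)
	shared := 0
	for g := range a {
		if _, ok := b[g]; ok {
			shared++
		}
	}
	union := len(a) + len(b) - shared
	if union == 0 {
		return 0
	}
	return float64(shared) / float64(union)
}

func main() {
	// {abc} is shared out of the union {abc, bcd, bce}.
	fmt.Printf("%.4f\n", trigramSimilarity("abcd", "abce")) // 0.3333
}
```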
Types
type OptionFunc
type OptionFunc func(*option)
OptionFunc is the functional-option type used to configure comparisons.
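OptionFunc has the shape of Go's functional-options pattern. Each option is a function that mutates an unexported option struct, and the caller applies the options in order. The sketch below re-creates that shape. The ignoreCase and ignoreSpace fields and the withIgnoreCase, withIgnoreSpace and normalize functions are hypothetical, because the package's option struct is unexported and its fields are not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

// option holds hypothetical settings; the package's real fields are unexported.
type option struct {
	ignoreCase  bool
	ignoreSpace bool
}

// OptionFunc mirrors the documented shape: a function that mutates *option.
type OptionFunc func(*option)

// withIgnoreCase and withIgnoreSpace are hypothetical option constructors.
func withIgnoreCase() OptionFunc  { return func(o *option) { o.ignoreCase = true } }
func withIgnoreSpace() OptionFunc { return func(o *option) { o.ignoreSpace = true } }

// normalize (hypothetical) applies the options, in order, to a zero-valued
// option and then preprocesses s accordingly.
func normalize(s string, opts ...OptionFunc) string {
	var o option
	for _, apply := range opts {
		apply(&o)
	}
	if o.ignoreCase {
		s = strings.ToLower(s)
	}
	if o.ignoreSpace {
		s = strings.Join(strings.Fields(s), "") // drop all whitespace
	}
	return s
}

func main() {
	fmt.Println(normalize("Hello World", withIgnoreCase(), withIgnoreSpace())) // helloworld
}
```

The zero-valued struct gives defaults for free. Callers pass only the options they need, which keeps signatures such as Compare(s1, s2 string, opts ...Option) stable as options are added.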
func DiceCoefficient
func DiceCoefficient(ngram ...int) OptionFunc
DiceCoefficient returns an option for the Dice coefficient. ngram is the n-gram length the coefficient uses.
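The Dice coefficient compares two sets as 2|A ∩ B| / (|A| + |B|). The following sketch applies it to sets of n-grams, assuming set (not multiset) semantics. dice and ngrams are hypothetical names, and the package's default n-gram length is not documented here.

```go
package main

import "fmt"

// ngrams (hypothetical) returns the set of n-rune substrings of s.
// A non-positive n yields an empty set.
func ngrams(s string, n int) map[string]struct{} {
	set := make(map[string]struct{})
	if n < 1 {
		return set
	}
	r := []rune(s)
	for i := 0; i+n <= len(r); i++ {
		set[string(r[i:i+n])] = struct{}{}
	}
	return set
}

// dice (hypothetical) returns 2*|A∩B| / (|A|+|B|) over the n-gram sets.
func dice(s1, s2 string, n int) float64 {
	a, b := ngrams(s1, n), ngrams(s2, n)
	if len(a)+len(b) == 0 {
		return 0
	}
	shared := 0
	for g := range a {
		if _, ok := b[g]; ok {
			shared++
		}
	}
	return 2 * float64(shared) / float64(len(a)+len(b))
}

func main() {
	// Bigrams {ni, ig, gh, ht} and {na, ac, ch, ht} share only "ht".
	fmt.Println(dice("night", "nacht", 2)) // 0.25
}
```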
func SimHash
func SimHash() OptionFunc
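SimHash has no description here, but the simhash subpackage listed under Directories implements Charikar's simhash algorithm to produce a 64-bit fingerprint. The following is a minimal sketch of that idea. It assumes whitespace-separated tokens with equal weights and uses FNV-1a from hash/fnv as the 64-bit token hash. simhash, hash64 and hammingDistance are hypothetical names. Similar documents tend to get fingerprints with a small Hamming distance.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// hash64 (hypothetical) hashes a token with 64-bit FNV-1a.
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // writes to a hash.Hash never return an error
	return h.Sum64()
}

// simhash (hypothetical) builds a 64-bit fingerprint. Every token votes +1 or
// -1 on each bit according to its hash, and a bit is set in the result when
// its total vote is positive.
func simhash(doc string) uint64 {
	var votes [64]int
	for _, tok := range strings.Fields(doc) {
		h := hash64(tok)
		for i := 0; i < 64; i++ {
			if h&(1<<uint(i)) != 0 {
				votes[i]++
			} else {
				votes[i]--
			}
		}
	}
	var fp uint64
	for i, v := range votes {
		if v > 0 {
			fp |= 1 << uint(i)
		}
	}
	return fp
}

// hammingDistance counts the bits in which two fingerprints differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	a := simhash("the quick brown fox jumps over the lazy dog")
	b := simhash("the quick brown fox jumped over the lazy dog")
	fmt.Println(hammingDistance(a, b))
}
```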
type StringDiff
StringDiff is a utility struct to compare the similarity between two strings.
Read https://medium.com/@appaloosastore/string-similarity-algorithms-compared-3f7b4d12f0ff
func NewStringDiff
func NewStringDiff(s1, s2 string) *StringDiff
NewStringDiff creates a new StringDiff that compares s1 and s2.
func (*StringDiff) DamerauLevenshteinDistance
func (sd *StringDiff) DamerauLevenshteinDistance(deleteCost, insertCost, replaceCost, swapCost int) int
DamerauLevenshteinDistance implements the Damerau-Levenshtein algorithm. This algorithm extends the Levenshtein algorithm and solves the edit distance problem between a source string and a target string with the following operations:
- Character Insertion
- Character Deletion
- Character Replacement
- Adjacent Character Swap
Note that the adjacent character swap operation is an edit that may be applied when two adjacent characters in the source string match two adjacent characters in the target string, but in reverse order, rather than a general allowance for adjacent character swaps.
This implementation lets the client specify the cost of each edit operation, with one restriction: the cost of two swap operations must not be less than the cost of a delete operation followed by an insert operation (2*swapCost >= deleteCost + insertCost). The restriction ensures that no optimal solution needs two swaps involving the same character. This in turn enables a fast dynamic programming solution.
The running time of the Damerau-Levenshtein algorithm is O(n*m) where n is the length of the source string and m is the length of the target string. This implementation consumes O(n*m) space.
This code is an adaptation from https://github.com/KevinStern/software-and-algorithms/blob/master/src/main/java/blogspot/software_and_algorithms/stern_library/string/DamerauLevenshteinAlgorithm.java
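The following sketches the weighted variant. It assumes the Lowrance-Wagner recurrence with per-operation costs and is not a line-by-line port of the Java code linked above. It keeps the full (n+2) x (m+2) table, so it uses O(n*m) time and space. The result is only guaranteed optimal under the restriction 2*swapCost >= deleteCost + insertCost. weightedDamerauLevenshtein and minOf are hypothetical names.

```go
package main

import "fmt"

// minOf returns the smallest of its arguments (hypothetical helper).
func minOf(vals ...int) int {
	m := vals[0]
	for _, v := range vals[1:] {
		if v < m {
			m = v
		}
	}
	return m
}

// weightedDamerauLevenshtein (hypothetical) computes the unrestricted
// Damerau-Levenshtein distance with per-operation costs. It is optimal only
// when 2*swapCost >= insertCost+deleteCost.
func weightedDamerauLevenshtein(s1, s2 string, deleteCost, insertCost, replaceCost, swapCost int) int {
	a, b := []rune(s1), []rune(s2)
	inf := len(a)*deleteCost + len(b)*insertCost + 1 // exceeds any real distance
	d := make([][]int, len(a)+2)
	for i := range d {
		d[i] = make([]int, len(b)+2)
	}
	d[0][0] = inf
	for i := 0; i <= len(a); i++ {
		d[i+1][0] = inf
		d[i+1][1] = i * deleteCost
	}
	for j := 0; j <= len(b); j++ {
		d[0][j+1] = inf
		d[1][j+1] = j * insertCost
	}
	lastRow := map[rune]int{} // last row where each rune of s1 was seen
	for i := 1; i <= len(a); i++ {
		lastMatchCol := 0 // last column in this row where a[i-1] == b[j-1]
		for j := 1; j <= len(b); j++ {
			k, l := lastRow[b[j-1]], lastMatchCol
			sub := replaceCost
			if a[i-1] == b[j-1] {
				sub = 0
				lastMatchCol = j
			}
			d[i+1][j+1] = minOf(
				d[i][j]+sub,          // replacement (or match)
				d[i+1][j]+insertCost, // insertion
				d[i][j+1]+deleteCost, // deletion
				// swap, deleting the i-k-1 runes between the swapped pair in s1
				// and inserting the j-l-1 runes between them in s2
				d[k][l]+(i-k-1)*deleteCost+swapCost+(j-l-1)*insertCost,
			)
		}
		lastRow[a[i-1]] = i
	}
	return d[len(a)+1][len(b)+1]
}

func main() {
	fmt.Println(weightedDamerauLevenshtein("ab", "ba", 1, 1, 1, 1)) // 1: one swap
	fmt.Println(weightedDamerauLevenshtein("ab", "ba", 1, 1, 1, 3)) // 2: two replacements are cheaper
}
```

With swapCost = 3 the swap is no longer the cheapest edit, so the sketch falls back to two replacements.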
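func (*StringDiff) JaroDistance
func (sd *StringDiff) JaroDistance() float32
JaroDistance uses the Jaro metric to measure how similar the two words are. The metric is computed from two quantities: the number of characters the words have in common within a matching window, and the number of transpositions among those common characters.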
func (*StringDiff) JaroWinklerDistance
func (sd *StringDiff) JaroWinklerDistance(p float32) float32
JaroWinklerDistance adjusts the Jaro score with a prefix scale. This gives more favourable ratings to strings that match from the beginning, up to a set prefix length.
The p argument is a constant scaling factor for how much the score is adjusted upwards for having a common prefix. The standard value for this constant in Winkler’s work is p = 0.1.
Read https://github.com/flori/amatch
Read https://fr.wikipedia.org/wiki/Distance_de_Jaro-Winkler
Read https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
func (*StringDiff) LevenshteinDistance
func (sd *StringDiff) LevenshteinDistance() int
LevenshteinDistance is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. The result is therefore a non-negative integer that grows with string length. This makes it harder to compare results across pairs of strings of different lengths.
Read https://github.com/mhutter/string-similarity and https://en.wikipedia.org/wiki/Levenshtein_distance
func (*StringDiff) TrigramCompare
func (sd *StringDiff) TrigramCompare() float32
TrigramCompare compares the two strings by their trigrams. A trigram is a case of n-gram, a contiguous sequence of n (here, three) items from a given sample. Each input string is a sample and each character is an item.
Read https://github.com/milk1000cc/trigram/blob/master/lib/trigram.rb
Read http://search.cpan.org/dist/String-Trigram/Trigram.pm
Read https://en.wikipedia.org/wiki/N-gram
Source Files
Directories
Path | Synopsis |
---|---|
examples | |
| simhash package implements Charikar's simhash algorithm to generate a 64-bit fingerprint of a given document. |