Documentation ¶
Overview ¶
Package strutil provides string metrics for calculating string similarity as well as other string utility functions. Documentation for all the metrics can be found at https://pkg.go.dev/github.com/adrg/strutil/metrics.
Included string metrics:
- Hamming
- Jaro
- Jaro-Winkler
- Levenshtein
- Smith-Waterman-Gotoh
- Sorensen-Dice
- Jaccard
- Overlap coefficient
Index ¶
- func CommonPrefix(a, b string) string
- func NgramCount(term string, size int) int
- func NgramIntersection(a, b string, size int) (map[string]int, int, int, int)
- func NgramMap(term string, size int) (map[string]int, int)
- func Ngrams(term string, size int) []string
- func Similarity(a, b string, metric StringMetric) float64
- func SliceContains(terms []string, q string) bool
- func UniqueSlice(items []string) []string
- type StringMetric
Functions ¶
func CommonPrefix ¶
CommonPrefix returns the common prefix of the specified strings. An empty string is returned if the parameters have no prefix in common.
Example ¶
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	fmt.Println("(answer, anvil):", strutil.CommonPrefix("answer", "anvil"))
}
Output: (answer, anvil): an
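To make the behavior described above concrete, here is a minimal sketch of how a common prefix could be computed using only the standard library. The name `commonPrefix` is a hypothetical stand-in, not the library's implementation, and comparing rune by rune (rather than byte by byte) is an assumption of this sketch.

```go
package main

import "fmt"

// commonPrefix is a hypothetical stand-in for strutil.CommonPrefix.
// Assumption: inputs are compared rune by rune, so a multi-byte
// character is never split in the returned prefix.
func commonPrefix(a, b string) string {
	ra, rb := []rune(a), []rune(b)

	// The prefix cannot be longer than the shorter input.
	n := len(ra)
	if len(rb) < n {
		n = len(rb)
	}

	// Advance while both inputs agree.
	i := 0
	for i < n && ra[i] == rb[i] {
		i++
	}
	return string(ra[:i])
}

func main() {
	fmt.Println(commonPrefix("answer", "anvil"))  // an
	fmt.Println(commonPrefix("go", "rust") == "") // true
}
```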
func NgramCount ¶ added in v0.3.0
NgramCount returns the n-gram count of the specified size for the provided term. An n-gram size of 1 is used if the provided size is less than or equal to 0.
Example ¶
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	fmt.Println("abbcd n-gram count (size 2):", strutil.NgramCount("abbcd", 2))
	fmt.Println("abbcd n-gram count (size 3):", strutil.NgramCount("abbcd", 3))
}
Output:
abbcd n-gram count (size 2): 4
abbcd n-gram count (size 3): 3
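The counts in the example follow from simple arithmetic: a term of length n has n - size + 1 windows of width size. The sketch below reproduces that calculation with a hypothetical `ngramCount` helper. Measuring length in runes and returning 0 for terms shorter than size are assumptions of this sketch; the library may treat short terms differently.

```go
package main

import "fmt"

// ngramCount is a hypothetical stand-in for strutil.NgramCount.
// Assumptions: length is measured in runes, and a term shorter
// than size yields 0 (the library's choice is not stated here).
func ngramCount(term string, size int) int {
	// Documented fallback: sizes <= 0 are treated as 1.
	if size <= 0 {
		size = 1
	}
	n := len([]rune(term)) - size + 1
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	fmt.Println(ngramCount("abbcd", 2)) // 5 - 2 + 1 = 4
	fmt.Println(ngramCount("abbcd", 3)) // 5 - 3 + 1 = 3
	fmt.Println(ngramCount("abbcd", 0)) // size falls back to 1, so 5
}
```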
func NgramIntersection ¶ added in v0.2.0
NgramIntersection returns a map of the n-grams of the specified size found in both terms, along with their frequency. The function also returns the number of common n-grams (the sum of all the values in the output map), the total number of n-grams in the first term and the total number of n-grams in the second term. An n-gram size of 1 is used if the provided size is less than or equal to 0.
Example ¶
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	ngrams, common, totalA, totalB := strutil.NgramIntersection("ababc", "ababd", 2)
	fmt.Printf("(ababc, ababd) n-gram intersection: %v (%d/%d n-grams)\n",
		ngrams, common, totalA+totalB)
}
Output: (ababc, ababd) n-gram intersection: map[ab:2 ba:1] (3/8 n-grams)
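The four return values can be easier to follow with a from-scratch sketch. The helpers `countNgrams` and `ngramIntersection` below are hypothetical and use only the standard library. Taking the smaller of the two frequencies for each shared n-gram is an assumption; it is consistent with the documented output above, but the description does not spell it out.

```go
package main

import "fmt"

// countNgrams returns the frequency of each n-gram in term and the
// total number of n-grams (hypothetical helper, rune-based).
func countNgrams(term string, size int) (map[string]int, int) {
	if size <= 0 {
		size = 1
	}
	runes := []rune(term)
	freq := make(map[string]int)
	total := 0
	for i := 0; i+size <= len(runes); i++ {
		freq[string(runes[i:i+size])]++
		total++
	}
	return freq, total
}

// ngramIntersection is a hypothetical stand-in for
// strutil.NgramIntersection. Assumption: a shared n-gram's
// frequency is the smaller of its counts in the two terms.
func ngramIntersection(a, b string, size int) (map[string]int, int, int, int) {
	freqA, totalA := countNgrams(a, size)
	freqB, totalB := countNgrams(b, size)

	shared := make(map[string]int)
	common := 0
	for gram, countA := range freqA {
		countB, ok := freqB[gram]
		if !ok {
			continue
		}
		n := countA
		if countB < n {
			n = countB
		}
		shared[gram] = n
		common += n
	}
	return shared, common, totalA, totalB
}

func main() {
	shared, common, totalA, totalB := ngramIntersection("ababc", "ababd", 2)
	fmt.Println(shared, common, totalA, totalB) // map[ab:2 ba:1] 3 4 4
}
```

Counts of this shape are what set-based coefficients take as input. For example, the Sorensen-Dice coefficient is defined as 2*common/(totalA+totalB), which for this pair would be 2*3/8 = 0.75.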
func NgramMap ¶ added in v0.2.0
NgramMap returns a map of all n-grams of the specified size for the provided term, along with their frequency. The function also returns the total number of n-grams, which is the sum of all the values in the output map. An n-gram size of 1 is used if the provided size is less than or equal to 0.
Example ¶
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	// 2 character n-gram map.
	ngrams, total := strutil.NgramMap("abbcabb", 2)
	fmt.Printf("abbcabb n-gram map (size 2): %v (%d ngrams)\n", ngrams, total)

	// 3 character n-gram map.
	ngrams, total = strutil.NgramMap("abbcabb", 3)
	fmt.Printf("abbcabb n-gram map (size 3): %v (%d ngrams)\n", ngrams, total)
}
Output:
abbcabb n-gram map (size 2): map[ab:2 bb:2 bc:1 ca:1] (6 ngrams)
abbcabb n-gram map (size 3): map[abb:2 bbc:1 bca:1 cab:1] (5 ngrams)
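The relationship between the map and the returned total (the total is the sum of all map values) can be checked with a small standard-library sketch. `ngramMap` here is a hypothetical stand-in that slides a rune-based window over the term; it is not taken from the library's source.

```go
package main

import "fmt"

// ngramMap is a hypothetical stand-in for strutil.NgramMap.
// Assumption: n-grams are built from runes, not bytes.
func ngramMap(term string, size int) (map[string]int, int) {
	// Documented fallback: sizes <= 0 are treated as 1.
	if size <= 0 {
		size = 1
	}
	runes := []rune(term)
	freq := make(map[string]int)
	total := 0
	for i := 0; i+size <= len(runes); i++ {
		freq[string(runes[i:i+size])]++
		total++
	}
	return freq, total
}

func main() {
	freq, total := ngramMap("abbcabb", 2)

	// Summing the frequencies gives back the total.
	sum := 0
	for _, n := range freq {
		sum += n
	}
	fmt.Println(freq, total, sum == total) // map[ab:2 bb:2 bc:1 ca:1] 6 true
}
```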
func Ngrams ¶ added in v0.2.0
Ngrams returns all the n-grams of the specified size for the provided term. The n-grams in the output slice are in the order in which they occur in the input term. An n-gram size of 1 is used if the provided size is less than or equal to 0.
Example ¶
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	fmt.Println("abbcd n-grams (size 2):", strutil.Ngrams("abbcd", 2))
	fmt.Println("abbcd n-grams (size 3):", strutil.Ngrams("abbcd", 3))
}
Output:
abbcd n-grams (size 2): [ab bb bc cd]
abbcd n-grams (size 3): [abb bbc bcd]
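The ordering guarantee falls out naturally from a sliding window that moves one position at a time from the start of the term. The sketch below illustrates this with a hypothetical `ngrams` helper. Working on runes and returning no n-grams for a term shorter than size are assumptions of the sketch, not documented library behavior.

```go
package main

import "fmt"

// ngrams is a hypothetical stand-in for strutil.Ngrams.
// Assumptions: windows are rune-based, and a term shorter than
// size produces an empty result.
func ngrams(term string, size int) []string {
	// Documented fallback: sizes <= 0 are treated as 1.
	if size <= 0 {
		size = 1
	}
	runes := []rune(term)
	var out []string
	// Each step shifts the window right by one rune, so the output
	// preserves the order of occurrence in the input.
	for i := 0; i+size <= len(runes); i++ {
		out = append(out, string(runes[i:i+size]))
	}
	return out
}

func main() {
	fmt.Println(ngrams("abbcd", 2)) // [ab bb bc cd]
	fmt.Println(ngrams("abbcd", 3)) // [abb bbc bcd]
}
```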
func Similarity ¶
func Similarity(a, b string, metric StringMetric) float64
Similarity returns the similarity of a and b, computed using the specified string metric. The returned similarity is a number between 0 and 1. Larger similarity numbers indicate closer matches.
Example ¶
package main

import (
	"fmt"

	"github.com/adrg/strutil"
	"github.com/adrg/strutil/metrics"
)

func main() {
	sim := strutil.Similarity("riddle", "needle", metrics.NewJaroWinkler())
	fmt.Printf("(riddle, needle) similarity: %.2f\n", sim)
}
Output: (riddle, needle) similarity: 0.56
func SliceContains ¶
SliceContains returns true if terms contains q, or false otherwise.
Example ¶
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	terms := []string{"a", "b", "c"}
	fmt.Println("([a b c], b):", strutil.SliceContains(terms, "b"))
	fmt.Println("([a b c], d):", strutil.SliceContains(terms, "d"))
}
Output:
([a b c], b): true
([a b c], d): false
func UniqueSlice ¶
UniqueSlice returns a slice containing the unique items from the specified string slice. The items in the output slice are in the order in which they occur in the input slice.
Example ¶
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	sample := []string{"a", "b", "a", "b", "b", "c"}
	fmt.Println("[a b a b b c]:", strutil.UniqueSlice(sample))
}
Output: [a b a b b c]: [a b c]
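Order-preserving deduplication like this is commonly done with a set of already-seen items. The following sketch shows that approach with a hypothetical `uniqueSlice` helper built on the standard library only; it is one way to get the documented behavior, not necessarily how the library does it.

```go
package main

import "fmt"

// uniqueSlice is a hypothetical stand-in for strutil.UniqueSlice.
// It keeps the first occurrence of each item and drops later repeats.
func uniqueSlice(items []string) []string {
	seen := make(map[string]struct{}, len(items))
	out := make([]string, 0, len(items))
	for _, item := range items {
		if _, ok := seen[item]; ok {
			continue // already emitted earlier in the input
		}
		seen[item] = struct{}{}
		out = append(out, item)
	}
	return out
}

func main() {
	sample := []string{"a", "b", "a", "b", "b", "c"}
	fmt.Println(uniqueSlice(sample)) // [a b c]
}
```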
Types ¶
type StringMetric ¶
StringMetric represents a metric for measuring the similarity between strings. The metrics package implements the following string metrics:
- Hamming
- Jaro
- Jaro-Winkler
- Levenshtein
- Smith-Waterman-Gotoh
- Sorensen-Dice
- Jaccard
- Overlap coefficient
For more information see https://pkg.go.dev/github.com/adrg/strutil/metrics.
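Because Similarity accepts any StringMetric, it should be possible to plug in a custom metric. The sketch below is self-contained and does not import the package. It assumes the interface consists of a single `Compare(a, b string) float64` method, which is not shown on this page and should be checked against the package source. The `stringMetric`, `prefixMetric`, and `similarity` names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stringMetric mirrors the shape ASSUMED for strutil.StringMetric:
// one Compare method returning a similarity between 0 and 1.
type stringMetric interface {
	Compare(a, b string) float64
}

// prefixMetric is a hypothetical custom metric: the length of the
// shared prefix divided by the length of the longer input.
type prefixMetric struct {
	caseSensitive bool
}

func (m prefixMetric) Compare(a, b string) float64 {
	if !m.caseSensitive {
		a, b = strings.ToLower(a), strings.ToLower(b)
	}
	ra, rb := []rune(a), []rune(b)

	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	// Design choice of this sketch: two empty strings are identical.
	if longest == 0 {
		return 1
	}

	shared := 0
	for shared < len(ra) && shared < len(rb) && ra[shared] == rb[shared] {
		shared++
	}
	return float64(shared) / float64(longest)
}

// similarity is a local stand-in for strutil.Similarity that simply
// delegates to the metric.
func similarity(a, b string, metric stringMetric) float64 {
	return metric.Compare(a, b)
}

func main() {
	fmt.Printf("%.2f\n", similarity("answer", "anvil", prefixMetric{})) // 2/6 = 0.33
	fmt.Printf("%.2f\n", similarity("Go", "go", prefixMetric{}))        // 1.00
}
```

Keeping the result within the 0 to 1 range, with larger values meaning closer matches, matches the contract described for Similarity above.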