strutil

Version: v0.1.0
Published: Dec 11, 2019 License: MIT Imports: 1 Imported by: 67

README

strutil

strutil provides string metrics for calculating string similarity as well as other string utility functions.
Full documentation can be found at: https://godoc.org/github.com/adrg/strutil.

Installation

go get github.com/adrg/strutil

String metrics

The package defines the StringMetric interface, which every string metric implements. The Similarity function takes two strings and a string metric, and uses the metric to calculate how similar the strings are.

type StringMetric interface {
	Compare(a, b string) float64
}

func Similarity(a, b string, metric StringMetric) float64

All defined string metrics can be found in the metrics package.
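
Because StringMetric has a single method, a user-defined type can also serve as a metric. The following is a minimal sketch under stated assumptions: the interface and a Similarity stand-in are redeclared locally so the program runs without the module, the stand-in simply delegates to Compare (the real function may do more), and ExactMatch is a hypothetical metric that is not part of the package.

```go
package main

import "fmt"

// StringMetric mirrors the interface shown above. It is redeclared here
// only so that this sketch compiles without the strutil module.
type StringMetric interface {
	Compare(a, b string) float64
}

// Similarity is a local stand-in that delegates to the metric (assumption).
func Similarity(a, b string, metric StringMetric) float64 {
	return metric.Compare(a, b)
}

// ExactMatch is a hypothetical metric: 1 for identical strings, 0 otherwise.
type ExactMatch struct{}

// Compare implements StringMetric.
func (ExactMatch) Compare(a, b string) float64 {
	if a == b {
		return 1
	}
	return 0
}

func main() {
	fmt.Printf("%.2f\n", Similarity("graph", "graph", ExactMatch{}))   // 1.00
	fmt.Printf("%.2f\n", Similarity("graph", "giraffe", ExactMatch{})) // 0.00
}
```

With the real package, the same ExactMatch value could presumably be passed to strutil.Similarity, since it satisfies the interface.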

Levenshtein

Calculate similarity using default options.

similarity := strutil.Similarity("graph", "giraffe", metrics.NewLevenshtein())
fmt.Printf("%.2f\n", similarity) // Output: 0.43

Configure edit operation costs.

lev := metrics.NewLevenshtein()
lev.CaseSensitive = false
lev.InsertCost = 1
lev.ReplaceCost = 2
lev.DeleteCost = 1

similarity := strutil.Similarity("make", "Cake", lev)
fmt.Printf("%.2f\n", similarity) // Output: 0.50

Calculate distance.

lev := metrics.NewLevenshtein()
fmt.Printf("%d\n", lev.Distance("graph", "giraffe")) // Output: 4

More information and additional examples can be found on GoDoc.
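
To show where the numbers above may come from, here is a textbook weighted Levenshtein sketch. It is not the package's source: levenshtein, similarity and minInt are hypothetical names, and dividing the distance by the length of the longer string is an assumption, chosen because it reproduces the outputs shown above (distance 4, similarities 0.43 and 0.50). Clamping the result at 0 is likewise a choice made for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// minInt returns the smallest of three ints.
func minInt(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the weighted edit distance needed to turn a into b,
// using a two-row dynamic programming table over runes.
func levenshtein(a, b string, ins, del, rep int) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j * ins // build b[:j] from the empty string
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i * del // delete all of a[:i]
		for j := 1; j <= len(rb); j++ {
			sub := prev[j-1]
			if ra[i-1] != rb[j-1] {
				sub += rep
			}
			cur[j] = minInt(sub, prev[j]+del, cur[j-1]+ins)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// similarity normalizes the distance by the longer rune length (assumption)
// and clamps the result to the range [0, 1].
func similarity(a, b string, ins, del, rep int) float64 {
	la, lb := len([]rune(a)), len([]rune(b))
	longest := la
	if lb > longest {
		longest = lb
	}
	if longest == 0 {
		return 1
	}
	s := 1 - float64(levenshtein(a, b, ins, del, rep))/float64(longest)
	if s < 0 {
		return 0
	}
	return s
}

func main() {
	fmt.Println(levenshtein("graph", "giraffe", 1, 1, 1))         // 4
	fmt.Printf("%.2f\n", similarity("graph", "giraffe", 1, 1, 1)) // 0.43

	// Case-insensitive comparison with a replace cost of 2.
	a, b := strings.ToLower("make"), strings.ToLower("Cake")
	fmt.Printf("%.2f\n", similarity(a, b, 1, 1, 2)) // 0.50
}
```

In the second call, replacing "m" with "c" costs 2, the same as one delete plus one insert, so the distance is 2 over a length of 4.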

Jaro

Calculate similarity using default options.

similarity := strutil.Similarity("think", "tank", metrics.NewJaro())
fmt.Printf("%.2f\n", similarity) // Output: 0.78

More information and additional examples can be found on GoDoc.

Jaro-Winkler

Calculate similarity using default options.

similarity := strutil.Similarity("think", "tank", metrics.NewJaroWinkler())
fmt.Printf("%.2f\n", similarity) // Output: 0.80

More information and additional examples can be found on GoDoc.
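
The Jaro and Jaro-Winkler scores for "think" and "tank" shown above can be checked against the textbook definitions. The sketch below is not the package's implementation: jaro and jaroWinkler are hypothetical names, and the Winkler parameters (a prefix scale of 0.1, a prefix capped at 4 characters, no boost threshold) are assumptions.

```go
package main

import "fmt"

// jaro returns the textbook Jaro similarity of a and b, compared by rune.
func jaro(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	la, lb := len(ra), len(rb)
	if la == 0 && lb == 0 {
		return 1
	}
	if la == 0 || lb == 0 {
		return 0
	}
	// Characters match only within this distance of each other.
	window := la
	if lb > window {
		window = lb
	}
	window = window/2 - 1
	if window < 0 {
		window = 0
	}
	matchedA, matchedB := make([]bool, la), make([]bool, lb)
	matches := 0
	for i := 0; i < la; i++ {
		lo, hi := i-window, i+window+1
		if lo < 0 {
			lo = 0
		}
		if hi > lb {
			hi = lb
		}
		for j := lo; j < hi; j++ {
			if !matchedB[j] && ra[i] == rb[j] {
				matchedA[i], matchedB[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that appear in a different order.
	outOfOrder, j := 0, 0
	for i := 0; i < la; i++ {
		if !matchedA[i] {
			continue
		}
		for !matchedB[j] {
			j++
		}
		if ra[i] != rb[j] {
			outOfOrder++
		}
		j++
	}
	m, t := float64(matches), float64(outOfOrder/2)
	return (m/float64(la) + m/float64(lb) + (m-t)/m) / 3
}

// jaroWinkler boosts the Jaro score for a shared prefix of up to 4 runes,
// using a prefix scale of 0.1 (both values are assumptions).
func jaroWinkler(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	prefix := 0
	for prefix < 4 && prefix < len(ra) && prefix < len(rb) && ra[prefix] == rb[prefix] {
		prefix++
	}
	j := jaro(a, b)
	return j + float64(prefix)*0.1*(1-j)
}

func main() {
	fmt.Printf("%.3f\n", jaro("think", "tank"))        // 0.783
	fmt.Printf("%.3f\n", jaroWinkler("think", "tank")) // 0.805
}
```

For this pair, three characters match ("t", "n", "k") with no transpositions, so the Jaro score is (3/5 + 3/4 + 3/3) / 3. The shared one-character prefix "t" then lifts the score to about 0.805.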

Smith-Waterman-Gotoh

Calculate similarity using default options.

swg := metrics.NewSmithWatermanGotoh()
similarity := strutil.Similarity("times roman", "times new roman", swg)
fmt.Printf("%.2f\n", similarity) // Output: 0.82

Customize gap penalty and substitution function.

swg := metrics.NewSmithWatermanGotoh()
swg.CaseSensitive = false
swg.GapPenalty = -0.1
swg.Substitution = metrics.MatchMismatch{
	Match:    1,
	Mismatch: -0.5,
}

similarity := strutil.Similarity("Times Roman", "times new roman", swg)
fmt.Printf("%.2f\n", similarity) // Output: 0.96

More information and additional examples can be found on GoDoc.

Sorensen-Dice

Calculate similarity using default options.

sd := metrics.NewSorensenDice()
similarity := strutil.Similarity("time to make haste", "no time to waste", sd)
fmt.Printf("%.2f\n", similarity) // Output: 0.62

Customize n-gram size.

sd := metrics.NewSorensenDice()
sd.CaseSensitive = false
sd.NgramSize = 3

similarity := strutil.Similarity("Time to make haste", "no time to waste", sd)
fmt.Printf("%.2f\n", similarity) // Output: 0.53

More information and additional examples can be found on GoDoc.
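
As an illustration of how the n-gram size affects the score, here is a minimal Sorensen-Dice sketch. It is not the package's source: ngrams and dice are hypothetical names, and counting repeated n-grams (a multiset intersection, with no padding) is an assumption, chosen because it agrees with the two outputs shown above once they are rounded to two decimals.

```go
package main

import "fmt"

// ngrams counts every n-gram (by rune) in s and returns the counts
// together with the total number of n-grams.
func ngrams(s string, n int) (map[string]int, int) {
	r := []rune(s)
	counts := make(map[string]int)
	total := 0
	for i := 0; i+n <= len(r); i++ {
		counts[string(r[i:i+n])]++
		total++
	}
	return counts, total
}

// dice returns 2*|A∩B| / (|A|+|B|), where A and B are the n-gram
// multisets of a and b. It returns 0 when neither string has an n-gram.
func dice(a, b string, n int) float64 {
	ca, ta := ngrams(a, n)
	cb, tb := ngrams(b, n)
	if ta+tb == 0 {
		return 0
	}
	common := 0
	for gram, x := range ca {
		// An n-gram is shared as many times as it occurs in both strings.
		if y := cb[gram]; y < x {
			common += y
		} else {
			common += x
		}
	}
	return 2 * float64(common) / float64(ta+tb)
}

func main() {
	// Inputs are already lowercase, which stands in for CaseSensitive = false.
	fmt.Printf("%.3f\n", dice("time to make haste", "no time to waste", 2)) // 0.625
	fmt.Printf("%.3f\n", dice("time to make haste", "no time to waste", 3)) // 0.533
}
```

With bigrams the strings share 10 of 17 + 15 n-grams. With trigrams they share 8 of 16 + 14, so the longer n-gram size is the stricter of the two for this pair.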

Contributing

Contributions in the form of pull requests, issues, or just general feedback are always welcome. See CONTRIBUTING.MD.

License

Copyright (c) 2019 Adrian-George Bostan.

This project is licensed under the MIT license. See LICENSE for more details.

Documentation

Overview

Package strutil provides string metrics for calculating string similarity as well as other string utility functions.

Functions

func CommonPrefix

func CommonPrefix(a, b string) string

CommonPrefix returns the common prefix of the specified strings. An empty string is returned if the parameters have no prefix in common.

Example
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	fmt.Println("(answer, anvil):", strutil.CommonPrefix("answer", "anvil"))
}
Output:

(answer, anvil): an

func Similarity

func Similarity(a, b string, metric StringMetric) float64

Similarity returns the similarity of a and b, computed using the specified string metric. The returned similarity is a number between 0 and 1. Larger similarity numbers indicate closer matches.

Example
package main

import (
	"fmt"

	"github.com/adrg/strutil"
	"github.com/adrg/strutil/metrics"
)

func main() {
	sim := strutil.Similarity("riddle", "needle", metrics.NewJaroWinkler())
	fmt.Printf("(riddle, needle) similarity: %.2f\n", sim)
}
Output:

(riddle, needle) similarity: 0.56

func SliceContains

func SliceContains(terms []string, q string) bool

SliceContains returns true if terms contains q, or false otherwise.

Example
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	terms := []string{"a", "b", "c"}
	fmt.Println("([a b c], b):", strutil.SliceContains(terms, "b"))
	fmt.Println("([a b c], d):", strutil.SliceContains(terms, "d"))
}
Output:

([a b c], b): true
([a b c], d): false

func UniqueSlice

func UniqueSlice(items []string) []string

UniqueSlice returns a slice containing the unique items from the specified string slice. The items in the output slice are in the order in which they occur in the input slice.

Example
package main

import (
	"fmt"

	"github.com/adrg/strutil"
)

func main() {
	sample := []string{"a", "b", "a", "b", "b", "c"}
	fmt.Println("[a b a b b c]:", strutil.UniqueSlice(sample))
}
Output:

[a b a b b c]: [a b c]
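
The ordering guarantee described above (items keep the order of their first occurrence) can be sketched with a set of already seen values. This is an illustrative stand-in, not the package's implementation, and uniqueInOrder is a hypothetical name.

```go
package main

import "fmt"

// uniqueInOrder returns the distinct items of the input, keeping the
// position of each item's first occurrence. Hypothetical helper, not
// the strutil implementation.
func uniqueInOrder(items []string) []string {
	seen := make(map[string]struct{}, len(items))
	out := make([]string, 0, len(items))
	for _, item := range items {
		if _, ok := seen[item]; ok {
			continue // already emitted earlier
		}
		seen[item] = struct{}{}
		out = append(out, item)
	}
	return out
}

func main() {
	fmt.Println(uniqueInOrder([]string{"a", "b", "a", "b", "b", "c"})) // [a b c]
}
```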

Types

type StringMetric

type StringMetric interface {
	Compare(a, b string) float64
}

StringMetric represents a metric for measuring the similarity between strings. The metrics package implements the following string metrics:

  • Jaro
  • Jaro-Winkler
  • Levenshtein
  • Smith-Waterman-Gotoh
  • Sorensen-Dice

For more information see https://godoc.org/github.com/adrg/strutil/metrics.

Directories

Path Synopsis
internal