levenshtein

package
v1.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 11, 2024 License: Apache-2.0, Apache-2.0 Imports: 6 Imported by: 13

README

levenshtein

levenshtein automaton

This package makes it fast and simple to build a finite determinic automaton that computes the levenshtein distance from a given string.

Sample usage:

// build a re-usable builder
lb := NewLevenshteinAutomatonBuilder(2, false)

origTerm := "couchbasefts"
dfa := lb.BuildDfa("couchbases", 2)
ed := dfa.eval([]byte(origTerm))
if ed.distance() != 2 {
	log.Errorf("expected distance 2, actual: %d", ed.distance())
}

This implementation is inspired by blog post and is intended to be a port of original rust implementation: https://github.com/tantivy-search/levenshtein-automata

Micro Benchmark Results against the current vellum/levenshtein is as below.

BenchmarkNewEditDistance1-8       	   30000	     52684 ns/op	   89985 B/op	     295 allocs/op
BenchmarkOlderEditDistance1-8     	   10000	    132931 ns/op	  588892 B/op	     363 allocs/op

BenchmarkNewEditDistance2-8       	   10000	    199127 ns/op	  377532 B/op	    1019 allocs/op
BenchmarkOlderEditDistance2-8     	    2000	    988109 ns/op	 4236609 B/op	    1898 allocs/op

Documentation

Index

Constants

View Source
const SinkState = uint32(0)
View Source
const StateLimit = 10000

StateLimit is the maximum number of states allowed

Variables

View Source
var ErrTooManyStates = fmt.Errorf("dfa contains more than %d states",
	StateLimit)

ErrTooManyStates is returned if you attempt to build a Levenshtein automaton which requires too many states.

Functions

This section is empty.

Types

type Alphabet

type Alphabet struct {
	// contains filtered or unexported fields
}

type Atleast

type Atleast struct {
	// contains filtered or unexported fields
}

type DFA

type DFA struct {
	// contains filtered or unexported fields
}

func (*DFA) Accept

func (d *DFA) Accept(state int, b byte) int

func (*DFA) CanMatch

func (d *DFA) CanMatch(state int) bool

func (*DFA) EditDistance added in v1.0.11

func (d *DFA) EditDistance(stateId int) uint8

func (*DFA) IsMatch

func (d *DFA) IsMatch(state int) bool

func (*DFA) MatchAndDistance added in v1.1.0

func (d *DFA) MatchAndDistance(input string) (bool, uint8)

func (*DFA) Start

func (d *DFA) Start() int

func (*DFA) WillAlwaysMatch

func (d *DFA) WillAlwaysMatch(state int) bool

WillAlwaysMatch returns if the specified state will always end in a matching state.

type Distance

type Distance interface {
	// contains filtered or unexported methods
}

/ Levenshtein Distance computed by a Levenshtein Automaton. / / Levenshtein automata can only compute the exact Levenshtein distance / up to a given `max_distance`. / / Over this distance, the automaton will invariably / return `Distance::AtLeast(max_distance + 1)`.

type Exact

type Exact struct {
	// contains filtered or unexported fields
}

type FullCharacteristicVector

type FullCharacteristicVector []uint32

type LevenshteinAutomatonBuilder

type LevenshteinAutomatonBuilder struct {
	// contains filtered or unexported fields
}

LevenshteinAutomatonBuilder wraps a precomputed datastructure that allows to produce small (but not minimal) DFA.

func NewLevenshteinAutomatonBuilder

func NewLevenshteinAutomatonBuilder(maxDistance uint8,
	transposition bool) (*LevenshteinAutomatonBuilder, error)

NewLevenshteinAutomatonBuilder creates a reusable, threadsafe Levenshtein automaton builder. `maxDistance` - maximum distance considered by the automaton. `transposition` - assign a distance of 1 for transposition

Building this automaton builder is computationally intensive. While it takes only a few milliseconds for `d=2`, it grows exponentially with `d`. It is only reasonable to `d <= 5`.

func (*LevenshteinAutomatonBuilder) BuildDfa

func (lab *LevenshteinAutomatonBuilder) BuildDfa(query string,
	fuzziness uint8) (*DFA, error)

BuildDfa builds the levenshtein automaton for serving queries with a given edit distance.

func (*LevenshteinAutomatonBuilder) MaxDistance

func (lab *LevenshteinAutomatonBuilder) MaxDistance() uint8

MaxDistance returns the MaxEdit distance supported by the LevenshteinAutomatonBuilder builder.

type LevenshteinNFA

type LevenshteinNFA struct {
	// contains filtered or unexported fields
}

type MultiState

type MultiState struct {
	// contains filtered or unexported fields
}

func (*MultiState) Clear

func (ms *MultiState) Clear()

func (*MultiState) States

func (ms *MultiState) States() []NFAState

type NFAState

type NFAState struct {
	Offset      uint32
	Distance    uint8
	InTranspose bool
}

type NFAStates

type NFAStates []NFAState

func (NFAStates) Len

func (ns NFAStates) Len() int

func (NFAStates) Less

func (ns NFAStates) Less(i, j int) bool

func (NFAStates) Swap

func (ns NFAStates) Swap(i, j int)

type ParametricDFA

type ParametricDFA struct {
	// contains filtered or unexported fields
}

type ParametricState

type ParametricState struct {
	// contains filtered or unexported fields
}

type ParametricStateIndex

type ParametricStateIndex struct {
	// contains filtered or unexported fields
}

type Transition

type Transition struct {
	// contains filtered or unexported fields
}

type Utf8DFABuilder

type Utf8DFABuilder struct {
	// contains filtered or unexported fields
}

Utf8DFABuilder makes it possible to define a DFA that takes unicode character, and build a `DFA` that operates on utf-8 encoded

type Utf8DFAStateBuilder

type Utf8DFAStateBuilder struct {
	// contains filtered or unexported fields
}

type Utf8StateId

type Utf8StateId uint32

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL