levenshtein2

package
v0.0.0-...-b15db8c Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 13, 2022 License: Apache-2.0, Apache-2.0 Imports: 6 Imported by: 1

README

levenshtein

levenshtein automaton

This package makes it fast and simple to build a finite determinic automaton that computes the levenshtein distance from a given string.

Sample usage:

// build a re-usable builder
lb := NewLevenshteinAutomatonBuilder(2, false)

origTerm := "couchbasefts"
dfa := lb.BuildDfa("couchbases", 2)
ed := dfa.eval([]byte(origTerm))
if ed.distance() != 2 {
	log.Errorf("expected distance 2, actual: %d", ed.distance())
}

This implementation is inspired by blog post and is intended to be a port of original rust implementation: https://github.com/tantivy-search/levenshtein-automata

Micro Benchmark Results against the current vellum/levenshtein is as below.

BenchmarkNewEditDistance1-8       	   30000	     52684 ns/op	   89985 B/op	     295 allocs/op
BenchmarkOlderEditDistance1-8     	   10000	    132931 ns/op	  588892 B/op	     363 allocs/op

BenchmarkNewEditDistance2-8       	   10000	    199127 ns/op	  377532 B/op	    1019 allocs/op
BenchmarkOlderEditDistance2-8     	    2000	    988109 ns/op	 4236609 B/op	    1898 allocs/op

Documentation

Index

Constants

View Source
const SinkState = uint32(0)

Variables

This section is empty.

Functions

func SetStateLimit

func SetStateLimit(v int)

SetStateLimit sets the maximum number of states allowed.

func StateLimit

func StateLimit() int

StateLimit is the maximum number of states allowed.

Types

type Alphabet

type Alphabet struct {
	// contains filtered or unexported fields
}

type Atleast

type Atleast struct {
	// contains filtered or unexported fields
}

type DFA

type DFA struct {
	// contains filtered or unexported fields
}

func (*DFA) Accept

func (d *DFA) Accept(state int, b byte) int

func (*DFA) CanMatch

func (d *DFA) CanMatch(state int) bool

func (*DFA) IsMatch

func (d *DFA) IsMatch(state int) bool

func (*DFA) Start

func (d *DFA) Start() int

func (*DFA) WillAlwaysMatch

func (d *DFA) WillAlwaysMatch(state int) bool

WillAlwaysMatch returns if the specified state will always end in a matching state.

type Distance

type Distance interface {
	// contains filtered or unexported methods
}

/ Levenshtein Distance computed by a Levenshtein Automaton. / / Levenshtein automata can only compute the exact Levenshtein distance / up to a given `max_distance`. / / Over this distance, the automaton will invariably / return `Distance::AtLeast(max_distance + 1)`.

type Exact

type Exact struct {
	// contains filtered or unexported fields
}

type FullCharacteristicVector

type FullCharacteristicVector []uint32

type LevenshteinAutomatonBuilder

type LevenshteinAutomatonBuilder struct {
	// contains filtered or unexported fields
}

LevenshteinAutomatonBuilder wraps a precomputed datastructure that allows to produce small (but not minimal) DFA.

func NewLevenshteinAutomatonBuilder

func NewLevenshteinAutomatonBuilder(maxDistance uint8,
	transposition bool) (*LevenshteinAutomatonBuilder, error)

NewLevenshteinAutomatonBuilder creates a reusable, threadsafe Levenshtein automaton builder. `maxDistance` - maximum distance considered by the automaton. `transposition` - assign a distance of 1 for transposition

Building this automaton builder is computationally intensive. While it takes only a few milliseconds for `d=2`, it grows exponentially with `d`. It is only reasonable to `d <= 5`.

func (*LevenshteinAutomatonBuilder) BuildDfa

func (lab *LevenshteinAutomatonBuilder) BuildDfa(query string,
	fuzziness uint8) (*DFA, error)

BuildDfa builds the levenshtein automaton for serving queries with a given edit distance.

func (*LevenshteinAutomatonBuilder) MaxDistance

func (lab *LevenshteinAutomatonBuilder) MaxDistance() uint8

MaxDistance returns the MaxEdit distance supported by the LevenshteinAutomatonBuilder builder.

type LevenshteinNFA

type LevenshteinNFA struct {
	// contains filtered or unexported fields
}

type MultiState

type MultiState struct {
	// contains filtered or unexported fields
}

func (*MultiState) Clear

func (ms *MultiState) Clear()

func (*MultiState) States

func (ms *MultiState) States() []NFAState

type NFAState

type NFAState struct {
	Offset      uint32
	Distance    uint8
	InTranspose bool
}

type NFAStates

type NFAStates []NFAState

func (NFAStates) Len

func (ns NFAStates) Len() int

func (NFAStates) Less

func (ns NFAStates) Less(i, j int) bool

func (NFAStates) Swap

func (ns NFAStates) Swap(i, j int)

type ParametricDFA

type ParametricDFA struct {
	// contains filtered or unexported fields
}

type ParametricState

type ParametricState struct {
	// contains filtered or unexported fields
}

type ParametricStateIndex

type ParametricStateIndex struct {
	// contains filtered or unexported fields
}

type Transition

type Transition struct {
	// contains filtered or unexported fields
}

type Utf8DFABuilder

type Utf8DFABuilder struct {
	// contains filtered or unexported fields
}

Utf8DFABuilder makes it possible to define a DFA that takes unicode character, and build a `DFA` that operates on utf-8 encoded

type Utf8DFAStateBuilder

type Utf8DFAStateBuilder struct {
	// contains filtered or unexported fields
}

type Utf8StateId

type Utf8StateId uint32

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL