fuzzy

package
v0.0.0-...-717b025 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 12, 2022 License: BSD-3-Clause Imports: 3 Imported by: 4

Documentation

Overview

Package fuzzy implements a fuzzy matching algorithm.

Index

Constants

View Source
const (
	// MaxInputSize is the maximum size of the input scored against the fuzzy matcher. Longer inputs
	// will be truncated to this size.
	MaxInputSize = 127
	// MaxPatternSize is the maximum size of the pattern used to construct the fuzzy matcher. Longer
	// inputs are truncated to this size.
	MaxPatternSize = 63
)

Variables

This section is empty.

Functions

func BestMatch

func BestMatch(pattern string, names []string) string

BestMatch returns the name most similar to the pattern, using fuzzy matching, or the empty string.

func LastSegment

func LastSegment(input string, roles []RuneRole) string

LastSegment returns the substring representing the last segment from the input, where each byte has an associated RuneRole in the roles slice. This makes sense only for inputs of Symbol or Filename type.

func Words

func Words(roles []RuneRole, consume WordConsumer)

Words find word delimiters in an input based on its bytes' mappings to rune roles. The offset delimiters for each word are fed to the provided consumer function.

Types

type Matcher

type Matcher struct {
	// contains filtered or unexported fields
}

Matcher implements a fuzzy matching algorithm for scoring candidates against a pattern. The matcher does not support parallel usage.

func NewMatcher

func NewMatcher(pattern string) *Matcher

NewMatcher returns a new fuzzy matcher for scoring candidates against the provided pattern.

func (*Matcher) MatchedRanges

func (m *Matcher) MatchedRanges() []int

MatchedRanges returns matches ranges for the last scored string as a flattened array of [begin, end) byte offset pairs.

func (*Matcher) Score

func (m *Matcher) Score(candidate string) float32

Score returns the score returned by matching the candidate to the pattern. This is not designed for parallel use. Multiple candidates must be scored sequentially. Returns a score between 0 and 1 (0 - no match, 1 - perfect match).

func (*Matcher) ScoreChunks

func (m *Matcher) ScoreChunks(chunks []string) float32

func (*Matcher) ScoreTable

func (m *Matcher) ScoreTable(candidate string) string

ScoreTable returns the score table computed for the provided candidate. Used only for debugging.

type RuneRole

type RuneRole byte

RuneRole specifies the role of a rune in the context of an input.

const (
	// RNone specifies a rune without any role in the input (i.e., whitespace/non-ASCII).
	RNone RuneRole = iota
	// RSep specifies a rune with the role of segment separator.
	RSep
	// RTail specifies a rune which is a lower-case tail in a word in the input.
	RTail
	// RUCTail specifies a rune which is an upper-case tail in a word in the input.
	RUCTail
	// RHead specifies a rune which is the first character in a word in the input.
	RHead
)

func RuneRoles

func RuneRoles(candidate []byte, reuse []RuneRole) []RuneRole

RuneRoles detects the roles of each byte rune in an input string and stores it in the output slice. The rune role depends on the input type. Stops when it parsed all the runes in the string or when it filled the output. If output is nil, then it gets created.

type SymbolMatcher

type SymbolMatcher struct {
	// contains filtered or unexported fields
}

SymbolMatcher implements a fuzzy matching algorithm optimized for Go symbols of the form:

example.com/path/to/package.object.field

Knowing that we are matching symbols like this allows us to make the following optimizations:

  • We can incorporate right-to-left relevance directly into the score calculation.
  • We can match from right to left, discarding leading bytes if the input is too long.
  • We just take the right-most match without losing too much precision. This allows us to use an O(n) algorithm.
  • We can operate directly on chunked strings; in many cases we will be storing the package path and/or package name separately from the symbol or identifiers, so doing this avoids allocating strings.
  • We can return the index of the right-most match, allowing us to trim irrelevant qualification.

This implementation is experimental, serving as a reference fast algorithm to compare to the fuzzy algorithm implemented by Matcher.

func NewSymbolMatcher

func NewSymbolMatcher(pattern string) *SymbolMatcher

NewSymbolMatcher creates a SymbolMatcher that may be used to match the given search pattern.

Currently this matcher only accepts case-insensitive fuzzy patterns.

An empty pattern matches no input.

func (*SymbolMatcher) Match

func (m *SymbolMatcher) Match(chunks []string) (int, float64)

Match looks for the right-most match of the search pattern within the symbol represented by concatenating the given chunks, returning its offset and score.

If a match is found, the first return value will hold the absolute byte offset within all chunks for the start of the symbol. In other words, the index of the match within strings.Join(chunks, ""). If no match is found, the first return value will be -1.

The second return value will be the score of the match, which is always between 0 and 1, inclusive. A score of 0 indicates no match.

type WordConsumer

type WordConsumer func(start, end int)

WordConsumer defines a consumer for a word delimited by the [start,end) byte offsets in an input (start is inclusive, end is exclusive).

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL