strmatcher

package
v5.14.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 4, 2024 License: MIT Imports: 9 Imported by: 5

Documentation

Index

Constants

View Source
const PrimeRK = 16777619

PrimeRK is the prime base used in Rabin-Karp algorithm.

Variables

This section is empty.

Functions

func AddMatcherToGroup

func AddMatcherToGroup(g MatcherGroup, matcher Matcher, value uint32) error

AddMatcherToGroup is a helper function to try to add a Matcher to any kind of MatcherGroup. It returns error if the MatcherGroup does not accept the provided Matcher's type. This function is provided to help writing code to test a MatcherGroup.

func CompositeMatches added in v5.2.0

func CompositeMatches(matches [][]uint32) []uint32

CompositeMatches flattens the matches slice to produce a single matched indices slice. It is designed to avoid new memory allocation as possible.

func CompositeMatchesReverse added in v5.2.0

func CompositeMatchesReverse(matches [][]uint32) []uint32

CompositeMatches flattens the matches slice to produce a single matched indices slice. It is designed that:

  1. All matchers are concatenated in reverse order, so the matcher that matches further ranks higher.
  2. Indices in the same matcher keeps their original order.
  3. Avoid new memory allocation as possible.

func MemHash

func MemHash(seed uint32, input string) uint32

MemHash is the hash function used by go map, it utilizes available hardware instructions(behaves as aeshash if aes instruction is available). With different seed, each MemHash<seed> performs as distinct hash functions.

func RollingHash

func RollingHash(hash uint32, input string) uint32

RollingHash calculates the rolling murmurHash of given string based on a provided suffix hash.

func ToDomain added in v5.2.0

func ToDomain(pattern string) (string, error)

ToDomain converts input pattern to a domain string, and return error if such a conversion cannot be made.

  1. Conforms to Letter-Digit-Hyphen (LDH) subset (https://tools.ietf.org/html/rfc952): * Letters A to Z (no distinction between uppercase and lowercase, we convert to lowers) * Digits 0 to 9 * Hyphens(-) and Periods(.)
  2. If any non-ASCII characters, domain are converted from Internationalized domain name to Punycode.

Types

type ACAutomatonMatcherGroup

type ACAutomatonMatcherGroup struct {
	// contains filtered or unexported fields
}

ACAutoMationMatcherGroup is an implementation of MatcherGroup. It uses an AC Automata to provide support for Full, Domain and Substr matcher. Trie node is char based.

NOTICE: ACAutomatonMatcherGroup currently uses a restricted charset (LDH Subset), upstream should manually in a way to ensure all patterns and inputs passed to it to be in this charset.

func NewACAutomatonMatcherGroup

func NewACAutomatonMatcherGroup() *ACAutomatonMatcherGroup

func (*ACAutomatonMatcherGroup) AddDomainMatcher

func (ac *ACAutomatonMatcherGroup) AddDomainMatcher(matcher DomainMatcher, value uint32)

AddDomainMatcher implements MatcherGroupForDomain.AddDomainMatcher.

func (*ACAutomatonMatcherGroup) AddFullMatcher

func (ac *ACAutomatonMatcherGroup) AddFullMatcher(matcher FullMatcher, value uint32)

AddFullMatcher implements MatcherGroupForFull.AddFullMatcher.

func (*ACAutomatonMatcherGroup) AddSubstrMatcher

func (ac *ACAutomatonMatcherGroup) AddSubstrMatcher(matcher SubstrMatcher, value uint32)

AddSubstrMatcher implements MatcherGroupForSubstr.AddSubstrMatcher.

func (*ACAutomatonMatcherGroup) Build

func (ac *ACAutomatonMatcherGroup) Build() error

func (*ACAutomatonMatcherGroup) Match

func (ac *ACAutomatonMatcherGroup) Match(input string) []uint32

Match implements MatcherGroup.Match.

func (*ACAutomatonMatcherGroup) MatchAny

func (ac *ACAutomatonMatcherGroup) MatchAny(input string) bool

MatchAny implements MatcherGroup.MatchAny.

type DomainMatcher

type DomainMatcher string

DomainMatcher is an implementation of Matcher.

func (DomainMatcher) Match

func (m DomainMatcher) Match(s string) bool

func (DomainMatcher) Pattern

func (m DomainMatcher) Pattern() string

func (DomainMatcher) String

func (m DomainMatcher) String() string

func (DomainMatcher) Type

func (DomainMatcher) Type() Type

type DomainMatcherGroup

type DomainMatcherGroup struct {
	// contains filtered or unexported fields
}

DomainMatcherGroup is an implementation of MatcherGroup. It uses trie to optimize both memory consumption and lookup speed. Trie node is domain label based.

func NewDomainMatcherGroup added in v5.2.0

func NewDomainMatcherGroup() *DomainMatcherGroup

func (*DomainMatcherGroup) AddDomainMatcher

func (g *DomainMatcherGroup) AddDomainMatcher(matcher DomainMatcher, value uint32)

AddDomainMatcher implements MatcherGroupForDomain.AddDomainMatcher.

func (*DomainMatcherGroup) Match

func (g *DomainMatcherGroup) Match(input string) []uint32

Match implements MatcherGroup.Match.

func (*DomainMatcherGroup) MatchAny

func (g *DomainMatcherGroup) MatchAny(input string) bool

MatchAny implements MatcherGroup.MatchAny.

type FullMatcher

type FullMatcher string

FullMatcher is an implementation of Matcher.

func (FullMatcher) Match

func (m FullMatcher) Match(s string) bool

func (FullMatcher) Pattern

func (m FullMatcher) Pattern() string

func (FullMatcher) String

func (m FullMatcher) String() string

func (FullMatcher) Type

func (FullMatcher) Type() Type

type FullMatcherGroup

type FullMatcherGroup struct {
	// contains filtered or unexported fields
}

FullMatcherGroup is an implementation of MatcherGroup. It uses a hash table to facilitate exact match lookup.

func NewFullMatcherGroup added in v5.2.0

func NewFullMatcherGroup() *FullMatcherGroup

func (*FullMatcherGroup) AddFullMatcher

func (g *FullMatcherGroup) AddFullMatcher(matcher FullMatcher, value uint32)

AddFullMatcher implements MatcherGroupForFull.AddFullMatcher.

func (*FullMatcherGroup) Match

func (g *FullMatcherGroup) Match(input string) []uint32

Match implements MatcherGroup.Match.

func (*FullMatcherGroup) MatchAny

func (g *FullMatcherGroup) MatchAny(input string) bool

MatchAny implements MatcherGroup.Any.

type IndexMatcher

type IndexMatcher interface {
	// Size returns number of matchers added to IndexMatcher.
	Size() uint32

	// Add adds a new Matcher to IndexMatcher, and returns its index. The index will never be 0.
	Add(matcher Matcher) uint32

	// Build builds the IndexMatcher to be ready for matching.
	Build() error

	// Match returns the indices of all matchers that matches the input.
	//   * Empty array is returned if no such matcher exists.
	//   * The order of returned matchers should follow priority specification.
	// Priority specification:
	//   1. Priority between matcher types: full > domain > substr > regex.
	//   2. Priority of same-priority matchers matching at same position: the early added takes precedence.
	//   3. Priority of domain matchers matching at different levels: the further matched domain takes precedence.
	//   4. Priority of substr matchers matching at different positions: the further matched substr takes precedence.
	Match(input string) []uint32

	// MatchAny returns true as soon as one matching matcher is found.
	MatchAny(input string) bool
}

IndexMatcher is a general type of matcher thats accepts all kinds of basic matchers. It should:

  • Accept all Matcher types with no exception.
  • Optimize string matching with a combination of MatcherGroups.
  • Obey certain priority order specification when returning matched Matchers.

type LinearIndexMatcher

type LinearIndexMatcher struct {
	// contains filtered or unexported fields
}

LinearIndexMatcher is an implementation of IndexMatcher.

func NewLinearIndexMatcher

func NewLinearIndexMatcher() *LinearIndexMatcher

func (*LinearIndexMatcher) Add

func (g *LinearIndexMatcher) Add(matcher Matcher) uint32

Add implements IndexMatcher.Add.

func (*LinearIndexMatcher) Build

func (*LinearIndexMatcher) Build() error

Build implements IndexMatcher.Build.

func (*LinearIndexMatcher) Match

func (g *LinearIndexMatcher) Match(input string) []uint32

Match implements IndexMatcher.Match.

func (*LinearIndexMatcher) MatchAny

func (g *LinearIndexMatcher) MatchAny(input string) bool

MatchAny implements IndexMatcher.MatchAny.

func (*LinearIndexMatcher) Size

func (g *LinearIndexMatcher) Size() uint32

Size implements IndexMatcher.Size.

type Matcher

type Matcher interface {
	// Type returns the matcher's type.
	Type() Type

	// Pattern returns the matcher's raw string representation.
	Pattern() string

	// String returns a string representation of the matcher containing its type and pattern.
	String() string

	// Match returns true if the given string matches a predefined pattern.
	//   * This method is seldom used for performance reason
	//     and is generally taken over by their corresponding MatcherGroup.
	Match(input string) bool
}

Matcher is the interface to determine a string matches a pattern.

  • This is a basic matcher to represent a certain kind of match semantic(full, substr, domain or regex).

type MatcherGroup

type MatcherGroup interface {
	// Match returns all matched matchers with their corresponding values.
	Match(input string) []uint32

	// MatchAny returns true as soon as one matching matcher is found.
	MatchAny(input string) bool
}

MatcherGroup is an advanced type of matcher to accept a bunch of basic Matchers (of certain type, not all matcher types). For example:

  • FullMatcherGroup accepts FullMatcher and uses a hash table to facilitate lookup.
  • DomainMatcherGroup accepts DomainMatcher and uses a trie to optimize both memory consumption and lookup speed.

type MatcherGroupForAll

type MatcherGroupForAll interface {
	AddMatcher(matcher Matcher, value uint32)
}

MatcherGroupForAll is an interface indicating a MatcherGroup could accept all types of matchers.

type MatcherGroupForDomain

type MatcherGroupForDomain interface {
	AddDomainMatcher(matcher DomainMatcher, value uint32)
}

MatcherGroupForDomain is an interface indicating a MatcherGroup could accept DomainMatchers.

type MatcherGroupForFull

type MatcherGroupForFull interface {
	AddFullMatcher(matcher FullMatcher, value uint32)
}

MatcherGroupForFull is an interface indicating a MatcherGroup could accept FullMatchers.

type MatcherGroupForRegex

type MatcherGroupForRegex interface {
	AddRegexMatcher(matcher *RegexMatcher, value uint32)
}

MatcherGroupForRegex is an interface indicating a MatcherGroup could accept RegexMatchers.

type MatcherGroupForSubstr

type MatcherGroupForSubstr interface {
	AddSubstrMatcher(matcher SubstrMatcher, value uint32)
}

MatcherGroupForSubstr is an interface indicating a MatcherGroup could accept SubstrMatchers.

type MphIndexMatcher

type MphIndexMatcher struct {
	// contains filtered or unexported fields
}

A MphIndexMatcher is divided into three parts: 1. `full` and `domain` patterns are matched by Rabin-Karp algorithm and minimal perfect hash table; 2. `substr` patterns are matched by ac automaton; 3. `regex` patterns are matched with the regex library.

func NewMphIndexMatcher

func NewMphIndexMatcher() *MphIndexMatcher

func (*MphIndexMatcher) Add

func (g *MphIndexMatcher) Add(matcher Matcher) uint32

Add implements IndexMatcher.Add.

func (*MphIndexMatcher) Build

func (g *MphIndexMatcher) Build() error

Build implements IndexMatcher.Build.

func (*MphIndexMatcher) Match

func (g *MphIndexMatcher) Match(input string) []uint32

Match implements IndexMatcher.Match.

func (*MphIndexMatcher) MatchAny

func (g *MphIndexMatcher) MatchAny(input string) bool

MatchAny implements IndexMatcher.MatchAny.

func (*MphIndexMatcher) Size

func (g *MphIndexMatcher) Size() uint32

Size implements IndexMatcher.Size.

type MphMatcherGroup

type MphMatcherGroup struct {
	// contains filtered or unexported fields
}

MphMatcherGroup is an implementation of MatcherGroup. It implements Rabin-Karp algorithm and minimal perfect hash table for Full and Domain matcher.

func NewMphMatcherGroup

func NewMphMatcherGroup() *MphMatcherGroup

func (*MphMatcherGroup) AddDomainMatcher

func (g *MphMatcherGroup) AddDomainMatcher(matcher DomainMatcher, value uint32)

AddDomainMatcher implements MatcherGroupForDomain.

func (*MphMatcherGroup) AddFullMatcher

func (g *MphMatcherGroup) AddFullMatcher(matcher FullMatcher, value uint32)

AddFullMatcher implements MatcherGroupForFull.

func (*MphMatcherGroup) Build

func (g *MphMatcherGroup) Build() error

Build builds a minimal perfect hash table for insert rules. Algorithm used: Hash, displace, and compress. See http://cmph.sourceforge.net/papers/esa09.pdf

func (*MphMatcherGroup) Lookup

func (g *MphMatcherGroup) Lookup(rollingHash uint32, input string) uint32

Lookup searches for input in minimal perfect hash table and returns its index. 0 indicates not found.

func (*MphMatcherGroup) Match

func (g *MphMatcherGroup) Match(input string) []uint32

Match implements MatcherGroup.Match.

func (*MphMatcherGroup) MatchAny

func (g *MphMatcherGroup) MatchAny(input string) bool

MatchAny implements MatcherGroup.MatchAny.

type RegexMatcher

type RegexMatcher struct {
	// contains filtered or unexported fields
}

RegexMatcher is an implementation of Matcher.

func (*RegexMatcher) Match

func (m *RegexMatcher) Match(s string) bool

func (*RegexMatcher) Pattern

func (m *RegexMatcher) Pattern() string

func (*RegexMatcher) String

func (m *RegexMatcher) String() string

func (*RegexMatcher) Type

func (*RegexMatcher) Type() Type

type SimpleMatcherGroup

type SimpleMatcherGroup struct {
	// contains filtered or unexported fields
}

SimpleMatcherGroup is an implementation of MatcherGroup. It simply stores all matchers in an array and sequentially matches them.

func (*SimpleMatcherGroup) AddMatcher

func (g *SimpleMatcherGroup) AddMatcher(matcher Matcher, value uint32)

AddMatcher implements MatcherGroupForAll.AddMatcher.

func (*SimpleMatcherGroup) Match

func (g *SimpleMatcherGroup) Match(input string) []uint32

Match implements MatcherGroup.Match.

func (*SimpleMatcherGroup) MatchAny

func (g *SimpleMatcherGroup) MatchAny(input string) bool

MatchAny implements MatcherGroup.MatchAny.

type SubstrMatcher

type SubstrMatcher string

SubstrMatcher is an implementation of Matcher.

func (SubstrMatcher) Match

func (m SubstrMatcher) Match(s string) bool

func (SubstrMatcher) Pattern

func (m SubstrMatcher) Pattern() string

func (SubstrMatcher) String

func (m SubstrMatcher) String() string

func (SubstrMatcher) Type

func (SubstrMatcher) Type() Type

type SubstrMatcherGroup

type SubstrMatcherGroup struct {
	// contains filtered or unexported fields
}

SubstrMatcherGroup is implementation of MatcherGroup, It is simply implmeneted to comply with the priority specification of Substr matchers.

func (*SubstrMatcherGroup) AddSubstrMatcher

func (g *SubstrMatcherGroup) AddSubstrMatcher(matcher SubstrMatcher, value uint32)

AddSubstrMatcher implements MatcherGroupForSubstr.AddSubstrMatcher.

func (*SubstrMatcherGroup) Match

func (g *SubstrMatcherGroup) Match(input string) []uint32

Match implements MatcherGroup.Match.

func (*SubstrMatcherGroup) MatchAny

func (g *SubstrMatcherGroup) MatchAny(input string) bool

MatchAny implements MatcherGroup.MatchAny.

type Type

type Type byte

Type is the type of the matcher.

const (
	// Full is the type of matcher that the input string must exactly equal to the pattern.
	Full Type = 0
	// Domain is the type of matcher that the input string must be a sub-domain or itself of the pattern.
	Domain Type = 1
	// Substr is the type of matcher that the input string must contain the pattern as a sub-string.
	Substr Type = 2
	// Regex is the type of matcher that the input string must matches the regular-expression pattern.
	Regex Type = 3
)

func (Type) New

func (t Type) New(pattern string) (Matcher, error)

New creates a new Matcher based on the given pattern.

func (Type) NewDomainPattern added in v5.2.0

func (t Type) NewDomainPattern(pattern string) (Matcher, error)

NewDomainPattern creates a new Matcher based on the given domain pattern. It works like `Type.New`, but will do validation and conversion to ensure it's a valid domain pattern.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL