aho_corasick

package module
v0.1.5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 11, 2024 License: MIT Imports: 2 Imported by: 1

README

aho_corasick

A Go implementation of Aho-Corasick algorithm for efficient multiple pattern matching within a string.

Introduction

This project implements the powerful string searching Aho-Corasick algorithm invented by Alfred V. Aho and Margaret J. Corasick in the Go programming language. The Aho-Corasick algorithm is useful because it efficiently indexes all occurrences of a list of keywords within a text string.

This implementation searches at the letter level instead of the word level. Both failure links and dictionary links are implemented.

Installation

The Go module is installable by running:

go get github.com/gnames/aho_corasick

Usage

ereate a new aho_corasick instance with aho_corasick.New() and setup the automaton with the search patterns with ac.Setup(patterns). Run search with ac.Setup(patterns), which returns an array of matches.

ac := aho_corasick.New()
patterns := []string{"aba", "cla", "ac", "gee", "lan"}
ac.Setup(patterns)
haystack := "abacgeeaba"
matches := ac.Search(haystack)

Development

If you find a bug, please open an issue ticket. Pull requests are welcome.

Tests can be run with go test which will produce a text visual of the trie:

******* Trie *******

haystack: abacgeeaba

root->root ┬─ a->root ┬─ b->root ── a*->a
           │          └─ c*->c
           ├─ c->root ── l->l ── a*->a
           ├─ g->root ── e->root ── e*->root
           └─ l->root ── a->a ── n*->root

********************
PASS

In the trie output, root refers to the root node, * represents word nodes, -> indicates the failure links, | indicates dictionary links.

Trie output can also be produced with the debugger, which can be run with:

ac := aho_corasick.New()
haystack := "geeabaclaba"
patterns := []string{"aba", "cla", "ac", "gee", "lan"}
ac.Setup(patterns)
ac.Debug(haystack)

License

Authors

Documentation

Overview

Package aho_corasick provides implementation of Aho-Corasick algorithm. This algorithm allows to efficiently detect multiple substrings in a one or more strings.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/gnames/aho_corasick"
)

func main() {
	// create AhoCorasick instance
	ac := aho_corasick.New()
	// initialize it with string patterns that need to be matched
	ac.Setup([]string{"aba", "cla", "ac", "gee", "lan"})

	// Search for the patterns, providing information of a start and end
	// positions for every found pattern
	matches := ac.Search("aclant")
	fmt.Printf(
		"Pattern: %s, Pos: %d:%d\n",
		matches[0].Pattern,
		matches[0].Start,
		matches[0].End+1,
	)

	// Search and return a unique list of matched patterns
	matchesUnique := ac.SearchUniq("abacgeeaba")
	patterns := make([]string, len(matchesUnique))
	for i, v := range matchesUnique {
		patterns[i] = v.Pattern
	}
	fmt.Printf("Patterns: %s", strings.Join(patterns, ", "))

}
Output:

Pattern: ac, Pos: 0:2
Patterns: aba, ac, gee

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AhoCorasick

type AhoCorasick interface {
	// Setup method has to run before searches. It generates data structure for
	// subsequent searches. It returns back the number of nodes in the resulting
	// suffix tree (trie).
	Setup(patterns []string) int
	// Search takes a string and finds all occurances of patterns used during
	// initialization via Setup method. The result is a slice or matches. Every
	// match entity provides the following information:
	// - Pattern (string): the matched pattern
	// - PatternIndex (int): index of the pattern in the original slice of
	// patterns.
	// - Start: byte index of the pattern occurance on the haystack.
	// - End: byte index of the end of the pattern occurance on the haystack.
	Search(haystack string) []match.Match
	// SearchUniq is similar to Search method, but it returns unique list of
	// matched patterns and does not provide Start and End information.
	SearchUniq(haystack string) []match.Match
	// Debug helps to visualize the generated suffix tree and also prints the
	// haystack string for convenience. This method is not needed for
	// functionality, but is useful for debugging and development purposes.
	Debug(haystack string)
}

AcoCorasick provides methods needed to initialize and run Aho-Corasick algorithm. The algorithm provides linear O(n) performance and allows to match many substrings (patterns) to many strings.

func New

func New() AhoCorasick

New does not require any input and outputs an instance of AhoCorasick entity.

Directories

Path Synopsis
ent
match
Package match provides a structure for Aho-Corasick output.
Package match provides a structure for Aho-Corasick output.
trie
Package trie provides internal data structure for Aho-Corasick algorithm.
Package trie provides internal data structure for Aho-Corasick algorithm.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL