package chimera

v1.2.3
Warning

This package is not in the latest version of its module.

Published: Dec 1, 2024 License: Apache-2.0, MIT Imports: 11 Imported by: 0

Documentation

Overview

Chimera is a software regular expression matching engine that is a hybrid of Hyperscan and PCRE. The design goals of Chimera are to fully support PCRE syntax as well as to take advantage of the high performance nature of Hyperscan.

Chimera follows the design guidelines of Hyperscan, providing C APIs for compilation and scanning.

The Chimera API itself is composed of two major components:

Compilation

These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by the Chimera scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently.

See Compiling Patterns for more details (https://intel.github.io/hyperscan/dev-reference/chimera.html#chcompile)

Scanning

Once a Chimera database has been created, it can be used to scan data in memory. Chimera supports only block mode, in which a single contiguous block of memory is scanned.

Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match.

For a given database, Chimera provides several guarantees:

1 No memory allocations occur at runtime, with the exception of scratch space allocation, which should be done ahead of time for performance-critical applications.

   Scratch space: temporary memory used for internal data at scan time. Structures in scratch space do not persist beyond the end of a single scan call.

2 The size of the scratch space required for a given database is fixed and determined at database compile time. This means that the memory requirements of the application are known ahead of time, and the scratch space can be pre-allocated if required for performance reasons.

3 Any pattern that has successfully been compiled by the Chimera compiler can be scanned against any input. However, internal resource limits or other limitations imposed by PCRE may cause a scan call to return an error at runtime.

* Note

Chimera is designed to have the same matching behavior as PCRE, including greedy/ungreedy matching, capturing, etc. Like PCRE, Chimera reports both the start offset and the end offset of each match. Unlike Hyperscan, which reports all matches, Chimera reports only non-overlapping matches. For example, the pattern /foofoo/ matches foofoofoofoo at offsets (0, 6) and (6, 12).

* Note

Because Chimera is a hybrid of Hyperscan and PCRE in order to support full PCRE syntax, it incurs extra performance overhead compared with a Hyperscan-only solution. Prefer Hyperscan for better performance unless you need full PCRE syntax support.

See Scanning for Patterns for more details (https://intel.github.io/hyperscan/dev-reference/chimera.html#chruntime)

Constants

This section is empty.

Variables

This section is empty.

Functions

func Match

func Match(pattern string, data []byte) (bool, error)

Match reports whether the byte slice data contains any match of the regular expression pattern.

Example
package main

import (
	"fmt"

	"github.com/flier/gohs/chimera"
)

func main() {
	matched, err := chimera.Match(`foo.*`, []byte(`seafood`))
	fmt.Println(matched, err)
	matched, err = chimera.Match(`bar.*`, []byte(`seafood`))
	fmt.Println(matched, err)
	matched, err = chimera.Match(`a(b`, []byte(`seafood`))
	fmt.Println(matched, err)
}
Output:

true <nil>
false <nil>
false create block database, PCRE compilation failed: missing ).

func MatchString

func MatchString(pattern, s string) (matched bool, err error)

MatchString reports whether the string s contains any match of the regular expression pattern.

func Quote

func Quote(s string) string

Quote returns a quoted string literal representing s.

func Version

func Version() string

Version identifies this release version.

The returned version is a string containing the version number of this release build and the date of the build.

Types

type BlockDatabase

type BlockDatabase interface {
	Database
	BlockScanner
	BlockMatcher
}

BlockDatabase scans target data that is a discrete, contiguous block, which can be scanned in one call and does not require state to be retained.

func NewBlockDatabase

func NewBlockDatabase(patterns ...*Pattern) (bdb BlockDatabase, err error)

NewBlockDatabase compiles patterns into a block-mode pattern database.

func NewManagedBlockDatabase

func NewManagedBlockDatabase(patterns ...*Pattern) (BlockDatabase, error)

NewManagedBlockDatabase is a wrapper for NewBlockDatabase that sets a finalizer on the database instance so that its memory is freed once the object is no longer in use.

type BlockMatcher

type BlockMatcher interface {
	// Find returns a slice holding the text of the leftmost match in data of the regular expression.
	// A return value of nil indicates no match.
	Find(data []byte) []byte

	// FindIndex returns a two-element slice of integers defining
	// the location of the leftmost match in data of the regular expression.
	// The match itself is at data[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindIndex(data []byte) []int

	// FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the Go standard library regexp package documentation.
	// A return value of nil indicates no match.
	FindAll(data []byte, n int) [][]byte

	// FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the Go standard library regexp package documentation.
	// A return value of nil indicates no match.
	FindAllIndex(data []byte, n int) [][]int

	// FindString returns a string holding the text of the leftmost match in s of the regular expression.
	// If there is no match, the return value is an empty string, but it will also be empty
	// if the regular expression successfully matches an empty string.
	// Use FindStringIndex if it is necessary to distinguish these cases.
	FindString(s string) string

	// FindStringIndex returns a two-element slice of integers defining
	// the location of the leftmost match in s of the regular expression.
	// The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindStringIndex(s string) []int

	// FindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the Go standard library regexp package documentation.
	// A return value of nil indicates no match.
	FindAllString(s string, n int) []string

	// FindAllStringIndex is the 'All' version of FindStringIndex;
	// it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the Go standard library regexp package documentation.
	// A return value of nil indicates no match.
	FindAllStringIndex(s string, n int) [][]int

	// Match reports whether the pattern database matches the byte slice b.
	Match(b []byte) bool

	// MatchString reports whether the pattern database matches the string s.
	MatchString(s string) bool
}

BlockMatcher implements regular expression search.

type BlockScanner

type BlockScanner interface {
	// Scan is the function in which the actual pattern matching takes place for block-mode pattern databases.
	Scan(data []byte, scratch *Scratch, handler Handler, context interface{}) error
}

BlockScanner is the block (non-streaming) regular expression scanner.

Example
package main

import (
	"fmt"

	"github.com/flier/gohs/chimera"
)

func main() {
	p, err := chimera.ParsePattern(`foo(bar)+`)
	if err != nil {
		fmt.Println("parse pattern failed,", err)
		return
	}

	// Create new block database with pattern
	db, err := chimera.NewBlockDatabase(p)
	if err != nil {
		fmt.Println("create database failed,", err)
		return
	}
	defer db.Close()

	// Create new scratch for scanning
	s, err := chimera.NewScratch(db)
	if err != nil {
		fmt.Println("create scratch failed,", err)
		return
	}

	defer func() {
		_ = s.Free()
	}()

	// Record matching text
	type Match struct {
		from uint64
		to   uint64
	}

	var matches []Match

	handler := chimera.HandlerFunc(func(id uint, from, to uint64, flags uint,
		captured []*chimera.Capture, ctx interface{},
	) chimera.Callback {
		matches = append(matches, Match{from, to})
		return chimera.Continue
	})

	data := []byte("hello foobarbar!")

	// Scan data block with handler
	if err := db.Scan(data, s, handler, nil); err != nil {
		fmt.Println("database scan failed,", err)
		return
	}

	// Chimera reports all non-overlapping matches
	for _, m := range matches {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}

}
Output:

match [ 6 : 15 ] foobarbar

type Builder

type Builder interface {
	// Build the database with the given mode.
	Build(mode CompileMode) (Database, error)

	// ForPlatform builds the database with the given mode for the given target platform.
	ForPlatform(mode CompileMode, platform hyperscan.Platform) (Database, error)
}

Builder creates a database with the given mode and target platform.

type Callback

type Callback = ch.Callback

Callback is the return value used to tell the Chimera matcher what to do after processing a match.

const (
	Continue    Callback = ch.Continue    // Continue matching.
	Terminate   Callback = ch.Terminate   // Terminate matching.
	SkipPattern Callback = ch.SkipPattern // Skip remaining matches for this ID and continue.
)

type Capture

type Capture = ch.Capture

Capture represents a captured subexpression within a match.

type CompileError

type CompileError = ch.CompileError

A type containing error details that is returned by the compile calls on failure.

The caller may inspect the values returned in this type to determine the cause of failure.

type CompileFlag

type CompileFlag = ch.CompileFlag

CompileFlag represents a pattern flag.

const (
	// Caseless sets case-insensitive matching.
	Caseless CompileFlag = ch.Caseless
	// DotAll makes `.` match newlines as well.
	DotAll CompileFlag = ch.DotAll
	// MultiLine sets multi-line anchoring.
	MultiLine CompileFlag = ch.MultiLine
	// SingleMatch sets single-match only mode.
	SingleMatch CompileFlag = ch.SingleMatch
	// Utf8Mode enables UTF-8 mode for this expression.
	Utf8Mode CompileFlag = ch.Utf8Mode
	// UnicodeProperty enables Unicode property support for this expression.
	UnicodeProperty CompileFlag = ch.UnicodeProperty
)

func ParseCompileFlag

func ParseCompileFlag(s string) (CompileFlag, error)

ParseCompileFlag parses compile pattern flags from a string, one character per flag:

i    Caseless           Case-insensitive matching
s    DotAll             Dot (.) will match newlines
m    MultiLine          Multi-line anchoring
H    SingleMatch        Report match ID at most once (`o` deprecated)
8    Utf8Mode           UTF-8 mode (`u` deprecated)
W    UnicodeProperty    Unicode property support (`p` deprecated)

type CompileMode

type CompileMode = ch.CompileMode

CompileMode flags.

const (
	// Disable capturing groups.
	NoGroups CompileMode = ch.NoGroups

	// Enable capturing groups.
	Groups CompileMode = ch.Groups
)

type Database

type Database interface {
	// Provides information about a database.
	Info() (DbInfo, error)

	// Provides the size of the given database in bytes.
	Size() (int, error)

	// Free a compiled pattern database.
	Close() error
}

Database is an immutable database that can be used by the Chimera scanning API.

func Compile

func Compile(expr string) (Database, error)

Compile compiles a regular expression and, if successful, returns a block-mode pattern database that can be used to match against text.

func MustCompile

func MustCompile(expr string) Database

MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.

type DatabaseBuilder

type DatabaseBuilder struct {
	// Array of patterns to compile.
	Patterns

	// Compiler mode flags that affect the database as a whole. (Default: capturing groups mode)
	Mode CompileMode

	// If not nil, the platform structure is used to determine the target platform for the database.
	// If nil, a database suitable for running on the current host platform is produced.
	hyperscan.Platform

	// A limit from pcre_extra on the number of match function calls in PCRE, which limits the amount of backtracking that can take place.
	MatchLimit uint

	// A limit from pcre_extra on the recursion depth of the match function in PCRE.
	MatchLimitRecursion uint
}

DatabaseBuilder creates a database that will be used to match the patterns.

func (*DatabaseBuilder) AddExpressionWithFlags

func (b *DatabaseBuilder) AddExpressionWithFlags(expr string, flags CompileFlag) *DatabaseBuilder

AddExpressionWithFlags adds an expression with the given flags to the database.

func (*DatabaseBuilder) AddExpressions

func (b *DatabaseBuilder) AddExpressions(exprs ...string) *DatabaseBuilder

AddExpressions adds more expressions to the database.

func (*DatabaseBuilder) Build

func (b *DatabaseBuilder) Build() (Database, error)

Build builds a database based on the expressions and platform.

type DbInfo

type DbInfo string //nolint: stylecheck

DbInfo identifies the version and platform information for the supplied database.

func (DbInfo) Mode

func (i DbInfo) Mode() (hyperscan.ModeFlag, error)

Mode returns the scanning mode of the supplied database.

func (DbInfo) Parse

func (i DbInfo) Parse() (version, features, mode string, err error)

Parse parses the version and platform information into its version, features, and mode parts.

func (DbInfo) String

func (i DbInfo) String() string

func (DbInfo) Version

func (i DbInfo) Version() (string, error)

Version returns the version of the supplied database.

type Error

type Error = ch.Error

Error is the type for errors returned by Chimera functions.

const (
	// ErrSuccess is the error returned if the engine completed normally.
	ErrSuccess Error = ch.ErrSuccess
	// ErrInvalid is the error returned if a parameter passed to this function was invalid.
	ErrInvalid Error = ch.ErrInvalid
	// ErrNoMemory is the error returned if a memory allocation failed.
	ErrNoMemory Error = ch.ErrNoMemory
	// ErrScanTerminated is the error returned if the engine was terminated by callback.
	ErrScanTerminated Error = ch.ErrScanTerminated
	// ErrCompileError is the error returned if the pattern compiler failed.
	ErrCompileError Error = ch.ErrCompileError
	// ErrDatabaseVersionError is the error returned if the given database was built
	// for a different version of the Chimera matcher.
	ErrDatabaseVersionError Error = ch.ErrDatabaseVersionError
	// ErrDatabasePlatformError is the error returned if the given database was built for a different platform.
	ErrDatabasePlatformError Error = ch.ErrDatabasePlatformError
	// ErrDatabaseModeError is the error returned if the given database was built for a different mode of operation.
	ErrDatabaseModeError Error = ch.ErrDatabaseModeError
	// ErrBadAlign is the error returned if a parameter passed to this function was not correctly aligned.
	ErrBadAlign Error = ch.ErrBadAlign
	// ErrBadAlloc is the error returned if the memory allocator did not correctly return memory suitably aligned.
	ErrBadAlloc Error = ch.ErrBadAlloc
	// ErrScratchInUse is the error returned if the scratch region was already in use.
	ErrScratchInUse Error = ch.ErrScratchInUse
)

type ErrorEvent

type ErrorEvent = ch.ErrorEvent //nolint: errname

ErrorEvent is the type used to differentiate the errors raised with the error event callback (Handler.OnError).

const (
	// PCRE hits its match limit and reports PCRE_ERROR_MATCHLIMIT.
	ErrMatchLimit ErrorEvent = ch.ErrMatchLimit
	// PCRE hits its recursion limit and reports PCRE_ERROR_RECURSIONLIMIT.
	ErrRecursionLimit ErrorEvent = ch.ErrRecursionLimit
)

type Handler

type Handler interface {
	// OnMatch will be invoked whenever a match is located in the target data during the execution of a scan.
	OnMatch(id uint, from, to uint64, flags uint, captured []*Capture, context interface{}) Callback

	// OnError will be invoked when an error event occurs during matching;
	// this indicates that some matches for a given expression may not be reported.
	OnError(event ErrorEvent, id uint, info, context interface{}) Callback
}

Handler is the definition of the Chimera event callback handler.

type HandlerFunc

type HandlerFunc func(id uint, from, to uint64, flags uint, captured []*Capture, context interface{}) Callback

HandlerFunc type is an adapter to allow the use of ordinary functions as Chimera handlers. If f is a function with the appropriate signature, HandlerFunc(f) is a Handler that calls f.

func (HandlerFunc) OnError

func (f HandlerFunc) OnError(event ErrorEvent, id uint, info, context interface{}) Callback

OnError will be invoked when an error event occurs during matching; this indicates that some matches for a given expression may not be reported.

func (HandlerFunc) OnMatch

func (f HandlerFunc) OnMatch(id uint, from, to uint64, flags uint, captured []*Capture, ctx interface{}) Callback

OnMatch will be invoked whenever a match is located in the target data during the execution of a scan.

type Pattern

type Pattern ch.Pattern

Pattern is a matching pattern.

Example

This example demonstrates how to construct and match a pattern.

package main

import (
	"fmt"

	"github.com/flier/gohs/chimera"
)

func main() {
	p := chimera.NewPattern(`foo.*bar`, chimera.Caseless)
	fmt.Println(p)

	db, err := chimera.NewBlockDatabase(p)
	fmt.Println(err)
	if err != nil {
		return
	}
	defer db.Close()

	found := db.MatchString("fooxyzbarbar")
	fmt.Println(found)

}
Output:

/foo.*bar/i
<nil>
true

func NewPattern

func NewPattern(expr string, flags CompileFlag) *Pattern

NewPattern returns a new pattern based on an expression and compile flags.

func ParsePattern

func ParsePattern(s string) (*Pattern, error)

ParsePattern parses a pattern from a formatted string:

<integer id>:/<expression>/<flags>

For example, the following pattern will match `test` in caseless and multi-line mode:

/test/im
Example

This example demonstrates parsing a pattern with an ID and flags.

package main

import (
	"fmt"

	"github.com/flier/gohs/chimera"
)

func main() {
	p, err := chimera.ParsePattern("3:/foobar/i8")

	fmt.Println(err)
	fmt.Println(p.ID)
	fmt.Println(p.Expression)
	fmt.Println(p.Flags)

}
Output:

<nil>
3
foobar
8i

func (*Pattern) Build

func (p *Pattern) Build(mode CompileMode) (Database, error)

Build the database with the given mode.

func (*Pattern) ForPlatform

func (p *Pattern) ForPlatform(mode CompileMode, platform hyperscan.Platform) (Database, error)

ForPlatform builds the database with the given mode for the given target platform.

func (*Pattern) String

func (p *Pattern) String() string

type Patterns

type Patterns []*Pattern

Patterns is a set of matching patterns.

func ParsePatterns

func ParsePatterns(r io.Reader) (patterns Patterns, err error)

ParsePatterns parses lines as `Patterns`, skipping empty lines and comments.

Example

This example demonstrates parsing patterns with comments.

package main

import (
	"fmt"
	"strings"

	"github.com/flier/gohs/chimera"
)

func main() {
	patterns, err := chimera.ParsePatterns(strings.NewReader(`
# empty line and comment will be skipped

1:/hatstand.*teakettle/s
2:/(hatstand|teakettle)/iH
3:/^.{10,20}hatstand/m
`))

	fmt.Println(err)

	for _, p := range patterns {
		fmt.Println(p)
	}

}
Output:

<nil>
1:/hatstand.*teakettle/s
2:/(hatstand|teakettle)/Hi
3:/^.{10,20}hatstand/m

func (Patterns) Build

func (p Patterns) Build(mode CompileMode) (Database, error)

Build the database with the given mode.

func (Patterns) ForPlatform

func (p Patterns) ForPlatform(mode CompileMode, platform hyperscan.Platform) (Database, error)

ForPlatform builds the database with the given mode for the given target platform.

func (Patterns) Patterns

func (p Patterns) Patterns() (r []*ch.Pattern)

type Scratch

type Scratch struct {
	// contains filtered or unexported fields
}

Scratch is a Chimera scratch space.

func NewManagedScratch

func NewManagedScratch(db Database) (*Scratch, error)

NewManagedScratch is a wrapper for NewScratch that sets a finalizer on the Scratch instance so that memory is freed once the object is no longer in use.

func NewScratch

func NewScratch(db Database) (*Scratch, error)

NewScratch allocates a "scratch" space for use by Chimera. This is required for runtime use, and one scratch space is required per thread or concurrent caller.

func (*Scratch) Clone

func (s *Scratch) Clone() (*Scratch, error)

Clone allocates a scratch space that is a clone of an existing scratch space.

func (*Scratch) Free

func (s *Scratch) Free() error

Free frees a previously allocated scratch block.

func (*Scratch) Realloc

func (s *Scratch) Realloc(db Database) error

Realloc reallocates the scratch space for use with another database.

func (*Scratch) Size

func (s *Scratch) Size() (int, error)

Size provides the size of the given scratch space.
