Documentation ¶
Overview ¶
Chimera is a software regular expression matching engine that is a hybrid of Hyperscan and PCRE. The design goals of Chimera are to fully support PCRE syntax as well as to take advantage of the high performance nature of Hyperscan.
Chimera inherits the design guideline of Hyperscan with C APIs for compilation and scanning.
The Chimera API itself is composed of two major components:
Compilation ¶
These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by the Chimera scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently.
See Compiling Patterns for more details (https://intel.github.io/hyperscan/dev-reference/chimera.html#chcompile)
Scanning ¶
Once a Chimera database has been created, it can be used to scan data in memory. Chimera only supports block mode in which we scan a single contiguous block in memory.
Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match.
For a given database, Chimera provides several guarantees:
1 No memory allocations occur at runtime, with the exception of scratch space allocation, which should be done ahead of time for performance-critical applications. Scratch space is temporary memory used for internal data at scan time; structures in scratch space do not persist beyond the end of a single scan call.
2 The size of the scratch space required for a given database is fixed and determined at database compile time. This means that the memory requirements of the application are known ahead of time, and the scratch space can be pre-allocated if required for performance reasons.
3 Any pattern that has been successfully compiled by the Chimera compiler can be scanned against any input. However, internal resource limits or other limitations imposed by PCRE at runtime may cause a scan call to return an error.
* Note
Chimera is designed to have the same matching behavior as PCRE, including greedy/ungreedy matching, capturing, etc. Like PCRE, Chimera reports both the start offset and the end offset of each match. Unlike Hyperscan, which reports all matches, Chimera reports only non-overlapping matches. For example, the pattern /foofoo/ will match foofoofoofoo at offsets (0, 6) and (6, 12).
* Note
Because Chimera combines Hyperscan and PCRE in order to support full PCRE syntax, it incurs extra performance overhead compared to a Hyperscan-only solution. Prefer Hyperscan for better performance unless you need full PCRE syntax support.
See Scanning for Patterns for more details (https://intel.github.io/hyperscan/dev-reference/chimera.html#chruntime)
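The non-overlapping behavior described in the first note above can be illustrated without Chimera installed. The following is a minimal sketch that uses Go's standard `regexp` package as a stand-in (it is not Chimera, and `nonOverlapping` and `everyOccurrence` are hypothetical helper names). It contrasts successive non-overlapping matches with a naive enumeration of every occurrence, which is roughly what an engine that reports all matches would see for a literal pattern.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonOverlapping returns the [start, end) offsets of successive
// non-overlapping matches, using the standard regexp package as a stand-in.
func nonOverlapping(expr, s string) [][]int {
	return regexp.MustCompile(expr).FindAllStringIndex(s, -1)
}

// everyOccurrence returns the offsets of every occurrence of a literal,
// including overlapping ones, by testing each start position in turn.
func everyOccurrence(lit, s string) [][]int {
	var out [][]int
	for i := 0; i+len(lit) <= len(s); i++ {
		if strings.HasPrefix(s[i:], lit) {
			out = append(out, []int{i, i + len(lit)})
		}
	}
	return out
}

func main() {
	fmt.Println(nonOverlapping(`foofoo`, "foofoofoofoo"))  // [[0 6] [6 12]]
	fmt.Println(everyOccurrence("foofoo", "foofoofoofoo")) // [[0 6] [3 9] [6 12]]
}
```

The occurrence at (3, 9) overlaps both of its neighbors, so it does not appear in the non-overlapping result.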
Index ¶
- func Match(pattern string, data []byte) (bool, error)
- func MatchString(pattern, s string) (matched bool, err error)
- func Quote(s string) string
- func Version() string
- type BlockDatabase
- type BlockMatcher
- type BlockScanner
- type Builder
- type Callback
- type Capture
- type CompileError
- type CompileFlag
- type CompileMode
- type Database
- type DatabaseBuilder
- type DbInfo
- type Error
- type ErrorEvent
- type Handler
- type HandlerFunc
- type Pattern
- type Patterns
- type Scratch
Functions ¶
func Match ¶
Match reports whether the byte slice data contains any match of the regular expression pattern.
Example ¶
package main

import (
	"fmt"

	"github.com/flier/gohs/chimera"
)

func main() {
	matched, err := chimera.Match(`foo.*`, []byte(`seafood`))
	fmt.Println(matched, err)
	matched, err = chimera.Match(`bar.*`, []byte(`seafood`))
	fmt.Println(matched, err)
	matched, err = chimera.Match(`a(b`, []byte(`seafood`))
	fmt.Println(matched, err)
}

Output:
true <nil>
false <nil>
false create block database, PCRE compilation failed: missing ).
func MatchString ¶
MatchString reports whether the string s contains any match of the regular expression pattern.
Types ¶
type BlockDatabase ¶
type BlockDatabase interface {
	Database
	BlockScanner
	BlockMatcher
}
BlockDatabase scans target data that is a discrete, contiguous block which can be scanned in one call and does not require state to be retained.
func NewBlockDatabase ¶
func NewBlockDatabase(patterns ...*Pattern) (bdb BlockDatabase, err error)
NewBlockDatabase compiles expressions into a pattern database.
func NewManagedBlockDatabase ¶
func NewManagedBlockDatabase(patterns ...*Pattern) (BlockDatabase, error)
NewManagedBlockDatabase is a wrapper for NewBlockDatabase that sets a finalizer on the database instance so that memory is freed once the object is no longer in use.
type BlockMatcher ¶
type BlockMatcher interface {
	// Find returns a slice holding the text of the leftmost match in b of the regular expression.
	// A return value of nil indicates no match.
	Find(data []byte) []byte

	// FindIndex returns a two-element slice of integers defining
	// the location of the leftmost match in b of the regular expression.
	// The match itself is at b[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindIndex(data []byte) []int

	// FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAll(data []byte, n int) [][]byte

	// FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllIndex(data []byte, n int) [][]int

	// FindString returns a string holding the text of the leftmost match in s of the regular expression.
	// If there is no match, the return value is an empty string, but it will also be empty
	// if the regular expression successfully matches an empty string.
	// Use FindStringIndex if it is necessary to distinguish these cases.
	FindString(s string) string

	// FindStringIndex returns a two-element slice of integers defining
	// the location of the leftmost match in s of the regular expression.
	// The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindStringIndex(s string) []int

	// FindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllString(s string, n int) []string

	// FindAllStringIndex is the 'All' version of FindStringIndex;
	// it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllStringIndex(s string, n int) [][]int

	// Match reports whether the pattern database matches the byte slice b.
	Match(b []byte) bool

	// MatchString reports whether the pattern database matches the string s.
	MatchString(s string) bool
}
BlockMatcher implements regular expression search.
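The method names and doc comments above closely follow Go's standard `regexp` package. As a rough illustration of the "leftmost" and "All" conventions, the sketch below calls the analogous `regexp` methods; it does not use Chimera, and the helpers `leftmost` and `firstN` are hypothetical names introduced only for this example. It assumes, as in `regexp`, that a negative `n` means "no limit".

```go
package main

import (
	"fmt"
	"regexp"
)

// leftmost returns the [start, end) location of the leftmost match,
// or nil when there is no match.
func leftmost(expr, s string) []int {
	return regexp.MustCompile(expr).FindStringIndex(s)
}

// firstN returns at most n successive matches; a negative n means "all".
func firstN(expr, s string, n int) []string {
	return regexp.MustCompile(expr).FindAllString(s, n)
}

func main() {
	s := "fo foo fooo"

	fmt.Println(leftmost(`fo+`, s))            // [0 2]
	fmt.Println(leftmost(`fo+`, "bar") == nil) // true
	fmt.Println(firstN(`fo+`, s, 2))           // [fo foo]
	fmt.Println(firstN(`fo+`, s, -1))          // [fo foo fooo]
}
```

Checking the index result against nil, rather than checking for an empty string, distinguishes "no match" from "matched the empty string".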
type BlockScanner ¶
type BlockScanner interface {
	// This is the function call in which the actual pattern matching takes place for block-mode pattern databases.
	Scan(data []byte, scratch *Scratch, handler Handler, context interface{}) error
}
BlockScanner is the block (non-streaming) regular expression scanner.
Example ¶
package main

import (
	"fmt"

	"github.com/flier/gohs/chimera"
)

func main() {
	p, err := chimera.ParsePattern(`foo(bar)+`)
	if err != nil {
		fmt.Println("parse pattern failed,", err)
		return
	}

	// Create new block database with pattern
	db, err := chimera.NewBlockDatabase(p)
	if err != nil {
		fmt.Println("create database failed,", err)
		return
	}
	defer db.Close()

	// Create new scratch for scanning
	s, err := chimera.NewScratch(db)
	if err != nil {
		fmt.Println("create scratch failed,", err)
		return
	}

	defer func() {
		_ = s.Free()
	}()

	// Record matching text
	type Match struct {
		from uint64
		to   uint64
	}

	var matches []Match

	handler := chimera.HandlerFunc(func(id uint, from, to uint64, flags uint,
		captured []*chimera.Capture, ctx interface{},
	) chimera.Callback {
		matches = append(matches, Match{from, to})

		return chimera.Continue
	})

	data := []byte("hello foobarbar!")

	// Scan data block with handler
	if err := db.Scan(data, s, handler, nil); err != nil {
		fmt.Println("database scan failed,", err)
		return
	}

	// chimera will report all matches
	for _, m := range matches {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}
}
Output: match [ 6 : 15 ] foobarbar
type Builder ¶
type Builder interface {
	// Build the database with the given mode.
	Build(mode CompileMode) (Database, error)

	// ForPlatform determine the target platform for the database
	ForPlatform(mode CompileMode, platform hyperscan.Platform) (Database, error)
}
Builder creates a database with the given mode and target platform.
type Callback ¶
Callback return value used to tell the Chimera matcher what to do after processing this match.
type CompileError ¶
type CompileError = ch.CompileError
A type containing error details that is returned by the compile calls on failure.
The caller may inspect the values returned in this type to determine the cause of failure.
type CompileFlag ¶
type CompileFlag = ch.CompileFlag
CompileFlag represents a pattern flag.
const (
	// Caseless represents set case-insensitive matching.
	Caseless CompileFlag = ch.Caseless
	// DotAll represents matching a `.` will not exclude newlines.
	DotAll CompileFlag = ch.DotAll
	// MultiLine set multi-line anchoring.
	MultiLine CompileFlag = ch.MultiLine
	// SingleMatch set single-match only mode.
	SingleMatch CompileFlag = ch.SingleMatch
	// Utf8Mode enable UTF-8 mode for this expression.
	Utf8Mode CompileFlag = ch.Utf8Mode
	// UnicodeProperty enable Unicode property support for this expression.
	UnicodeProperty CompileFlag = ch.UnicodeProperty
)
func ParseCompileFlag ¶
func ParseCompileFlag(s string) (CompileFlag, error)
ParseCompileFlag parses the pattern compile flags from a string.
i    Caseless          Case-insensitive matching
s    DotAll            Dot (.) will match newlines
m    MultiLine         Multi-line anchoring
H    SingleMatch       Report match ID at most once (`o` deprecated)
8    Utf8Mode          UTF-8 mode (`u` deprecated)
W    UnicodeProperty   Unicode property support (`p` deprecated)
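The letter-to-flag table above can be read as a simple lookup. The sketch below is a stand-alone illustration that maps flag letters to the flag names from the table; it does not call `ParseCompileFlag`, and `flagNames` and `describeFlags` are hypothetical names. Treating the deprecated letters (`o`, `u`, `p`) as aliases of the current ones is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// flagNames mirrors the table above. The deprecated letters are treated
// as aliases here, which is an assumption made for illustration.
var flagNames = map[rune]string{
	'i': "Caseless",
	's': "DotAll",
	'm': "MultiLine",
	'H': "SingleMatch", 'o': "SingleMatch",
	'8': "Utf8Mode", 'u': "Utf8Mode",
	'W': "UnicodeProperty", 'p': "UnicodeProperty",
}

// describeFlags translates a flag string such as "im" into flag names,
// returning an error for any letter that is not in the table.
func describeFlags(s string) (string, error) {
	var names []string
	for _, r := range s {
		name, ok := flagNames[r]
		if !ok {
			return "", fmt.Errorf("unknown flag %q", r)
		}
		names = append(names, name)
	}
	return strings.Join(names, "|"), nil
}

func main() {
	fmt.Println(describeFlags("im")) // Caseless|MultiLine <nil>
	fmt.Println(describeFlags("H8")) // SingleMatch|Utf8Mode <nil>
	_, err := describeFlags("x")
	fmt.Println(err != nil) // true
}
```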
type CompileMode ¶
type CompileMode = ch.CompileMode
CompileMode flags.
const (
	// Disable capturing groups.
	NoGroups CompileMode = ch.NoGroups
	// Enable capturing groups.
	Groups CompileMode = ch.Groups
)
type Database ¶
type Database interface {
	// Provides information about a database.
	Info() (DbInfo, error)

	// Provides the size of the given database in bytes.
	Size() (int, error)

	// Free a compiled pattern database.
	Close() error
}
Database is an immutable database that can be used by the Chimera scanning API.
func Compile ¶
Compile parses a regular expression and returns, if successful, a block-mode pattern database that can be used to match against text.
func MustCompile ¶
MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.
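The "panic on bad expression" convention is the same one used by Go's standard `regexp.MustCompile`. Since the signature of this package's `MustCompile` is not shown here, the sketch below uses the standard library as a stand-in to show why the convention suits package-level variables; `wordRE` and `panics` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRE is initialized when the package loads. A malformed expression
// would panic at startup instead of surfacing later as a runtime error.
var wordRE = regexp.MustCompile(`\bfoo\w*`)

// panics reports whether compiling expr with MustCompile panics.
func panics(expr string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_ = regexp.MustCompile(expr)
	return false
}

func main() {
	fmt.Println(wordRE.FindString("seafood foobar")) // foobar
	fmt.Println(panics(`a(b`))                       // true
	fmt.Println(panics(`foo.*`))                     // false
}
```

For expressions that come from user input or configuration, the error-returning variant is the safer choice, because a panic is then not an acceptable failure mode.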
type DatabaseBuilder ¶
type DatabaseBuilder struct {
	// Array of patterns to compile.
	Patterns

	// Compiler mode flags that affect the database as a whole. (Default: capturing groups mode)
	Mode CompileMode

	// If not nil, the platform structure is used to determine the target platform for the database.
	// If nil, a database suitable for running on the current host platform is produced.
	hyperscan.Platform

	// A limit from pcre_extra on the amount of match function called in PCRE to limit backtracking that can take place.
	MatchLimit uint

	// A limit from pcre_extra on the recursion depth of match function in PCRE.
	MatchLimitRecursion uint
}
DatabaseBuilder creates a database that will be used to match the patterns.
func (*DatabaseBuilder) AddExpressionWithFlags ¶
func (b *DatabaseBuilder) AddExpressionWithFlags(expr string, flags CompileFlag) *DatabaseBuilder
AddExpressionWithFlags adds an expression with flags to the database.
func (*DatabaseBuilder) AddExpressions ¶
func (b *DatabaseBuilder) AddExpressions(exprs ...string) *DatabaseBuilder
AddExpressions adds more expressions to the database.
func (*DatabaseBuilder) Build ¶
func (b *DatabaseBuilder) Build() (Database, error)
Build builds a database based on the expressions and platform.
type DbInfo ¶
type DbInfo string //nolint: stylecheck
DbInfo identifies the version and platform information for the supplied database.
type Error ¶
Error is the type for errors returned by Chimera functions.
const (
	// ErrSuccess is the error returned if the engine completed normally.
	ErrSuccess Error = ch.ErrSuccess
	// ErrInvalid is the error returned if a parameter passed to this function was invalid.
	ErrInvalid Error = ch.ErrInvalid
	// ErrNoMemory is the error returned if a memory allocation failed.
	ErrNoMemory Error = ch.ErrNoMemory
	// ErrScanTerminated is the error returned if the engine was terminated by callback.
	ErrScanTerminated Error = ch.ErrScanTerminated
	// ErrCompileError is the error returned if the pattern compiler failed.
	ErrCompileError Error = ch.ErrCompileError
	// ErrDatabaseVersionError is the error returned if the given database was built
	// for a different version of the Chimera matcher.
	ErrDatabaseVersionError Error = ch.ErrDatabaseVersionError
	// ErrDatabasePlatformError is the error returned if the given database was built for a different platform.
	ErrDatabasePlatformError Error = ch.ErrDatabasePlatformError
	// ErrDatabaseModeError is the error returned if the given database was built for a different mode of operation.
	ErrDatabaseModeError Error = ch.ErrDatabaseModeError
	// ErrBadAlign is the error returned if a parameter passed to this function was not correctly aligned.
	ErrBadAlign Error = ch.ErrBadAlign
	// ErrBadAlloc is the error returned if the memory allocator did not correctly return memory suitably aligned.
	ErrBadAlloc Error = ch.ErrBadAlloc
	// ErrScratchInUse is the error returned if the scratch region was already in use.
	ErrScratchInUse Error = ch.ErrScratchInUse
)
type ErrorEvent ¶
type ErrorEvent = ch.ErrorEvent //nolint: errname
Type used to differentiate the errors raised with the `ErrorEventHandler` callback.
const (
	// PCRE hits its match limit and reports PCRE_ERROR_MATCHLIMIT.
	ErrMatchLimit ErrorEvent = ch.ErrMatchLimit
	// PCRE hits its recursion limit and reports PCRE_ERROR_RECURSIONLIMIT.
	ErrRecursionLimit ErrorEvent = ch.ErrRecursionLimit
)
type Handler ¶
type Handler interface {
	// OnMatch will be invoked whenever a match is located in the target data during the execution of a scan.
	OnMatch(id uint, from, to uint64, flags uint, captured []*Capture, context interface{}) Callback

	// OnError will be invoked when an error event occurs during matching;
	// this indicates that some matches for a given expression may not be reported.
	OnError(event ErrorEvent, id uint, info, context interface{}) Callback
}
Definition of the chimera event callback handler.
type HandlerFunc ¶
type HandlerFunc func(id uint, from, to uint64, flags uint, captured []*Capture, context interface{}) Callback
HandlerFunc type is an adapter to allow the use of ordinary functions as Chimera handlers. If f is a function with the appropriate signature, HandlerFunc(f) is a Handler that calls f.
func (HandlerFunc) OnError ¶
func (f HandlerFunc) OnError(event ErrorEvent, id uint, info, context interface{}) Callback
OnError will be invoked when an error event occurs during matching; this indicates that some matches for a given expression may not be reported.
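The adapter idea behind HandlerFunc is the same one used by `http.HandlerFunc` in the standard library: a function type gets methods so that a plain function value satisfies an interface. The sketch below reproduces the pattern with local stand-in types so that it runs without Chimera. The types, the `Capture` fields, the `Terminate` constant, and the choice to stop on errors are all assumptions made for this example, not the package's definitions.

```go
package main

import "fmt"

// Local stand-ins that mirror the shapes documented above; they are not
// the chimera package's own types.
type (
	Callback   int
	ErrorEvent int
	Capture    struct{ From, To uint64 }
)

const (
	Continue Callback = iota
	Terminate
)

type Handler interface {
	OnMatch(id uint, from, to uint64, flags uint, captured []*Capture, context interface{}) Callback
	OnError(event ErrorEvent, id uint, info, context interface{}) Callback
}

// HandlerFunc adapts an ordinary function to the Handler interface.
type HandlerFunc func(id uint, from, to uint64, flags uint, captured []*Capture, context interface{}) Callback

// OnMatch simply calls the wrapped function.
func (f HandlerFunc) OnMatch(id uint, from, to uint64, flags uint, captured []*Capture, context interface{}) Callback {
	return f(id, from, to, flags, captured, context)
}

// OnError stops the scan; this policy is a choice made for the sketch.
func (f HandlerFunc) OnError(event ErrorEvent, id uint, info, context interface{}) Callback {
	return Terminate
}

// collect feeds two fake matches to a function-backed Handler and
// returns the spans that the function recorded.
func collect() [][2]uint64 {
	var spans [][2]uint64
	var h Handler = HandlerFunc(func(id uint, from, to uint64, flags uint, captured []*Capture, context interface{}) Callback {
		spans = append(spans, [2]uint64{from, to})
		return Continue
	})
	h.OnMatch(1, 0, 6, 0, nil, nil)
	h.OnMatch(1, 6, 12, 0, nil, nil)
	return spans
}

func main() {
	fmt.Println(collect()) // [[0 6] [6 12]]
}
```

A closure works well here because the callback can append to a variable in the enclosing scope, which avoids defining a dedicated struct for simple match collection.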
type Pattern ¶
Pattern is a matching pattern.
Example ¶
This example demonstrates constructing and matching a pattern.
package main

import (
	"fmt"

	"github.com/flier/gohs/chimera"
)

func main() {
	p := chimera.NewPattern(`foo.*bar`, chimera.Caseless)
	fmt.Println(p)

	db, err := chimera.NewBlockDatabase(p)
	fmt.Println(err)

	found := db.MatchString("fooxyzbarbar")
	fmt.Println(found)
}

Output:
/foo.*bar/i
<nil>
true
func NewPattern ¶
func NewPattern(expr string, flags CompileFlag) *Pattern
NewPattern returns a new pattern based on the expression and compile flags.
func ParsePattern ¶
ParsePattern parses a pattern from a formatted string.
<integer id>:/<expression>/<flags>
For example, the following pattern will match `test` in caseless and multi-line mode:
/test/im
Example ¶
This example demonstrates parsing a pattern with an id and flags.
package main

import (
	"fmt"

	"github.com/flier/gohs/chimera"
)

func main() {
	p, err := chimera.ParsePattern("3:/foobar/i8")

	fmt.Println(err)
	fmt.Println(p.ID)
	fmt.Println(p.Expression)
	fmt.Println(p.Flags)
}

Output:
<nil>
3
foobar
8i
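To make the `<integer id>:/<expression>/<flags>` layout concrete, the sketch below splits such a string into its three parts using only the standard library. It is not the package's parser: `parsed` and `splitPattern` are hypothetical names, the sketch requires the surrounding slashes, it uses -1 to mean "no id given", and it leaves flag letters uninterpreted.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parsed holds the three parts of a formatted pattern string.
type parsed struct {
	ID    int // -1 when no id prefix is present (a convention of this sketch)
	Expr  string
	Flags string
}

// splitPattern splits "<id>:/<expr>/<flags>" or "/<expr>/<flags>".
// The last slash ends the expression, so the expression may contain slashes.
func splitPattern(s string) (parsed, error) {
	p := parsed{ID: -1}
	if i := strings.Index(s, ":/"); i > 0 {
		// Only treat the prefix as an id when it is a valid integer.
		if id, err := strconv.Atoi(s[:i]); err == nil {
			p.ID = id
			s = s[i+1:]
		}
	}
	if !strings.HasPrefix(s, "/") {
		return parsed{}, errors.New("pattern must start with '/'")
	}
	end := strings.LastIndex(s, "/")
	if end == 0 {
		return parsed{}, errors.New("pattern is missing the closing '/'")
	}
	p.Expr = s[1:end]
	p.Flags = s[end+1:]
	return p, nil
}

func main() {
	fmt.Println(splitPattern("3:/foobar/i8")) // {3 foobar i8} <nil>
	fmt.Println(splitPattern("/test/im"))     // {-1 test im} <nil>
}
```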
func (*Pattern) Build ¶
func (p *Pattern) Build(mode CompileMode) (Database, error)
Build the database with the given mode.
func (*Pattern) ForPlatform ¶
ForPlatform determines the target platform for the database.
type Patterns ¶
type Patterns []*Pattern
Patterns is a set of matching patterns.
func ParsePatterns ¶
ParsePatterns parses lines as `Patterns`.
Example ¶
This example demonstrates parsing patterns with comments.
package main

import (
	"fmt"
	"strings"

	"github.com/flier/gohs/chimera"
)

func main() {
	patterns, err := chimera.ParsePatterns(strings.NewReader(`
# empty line and comment will be skipped

1:/hatstand.*teakettle/s
2:/(hatstand|teakettle)/iH
3:/^.{10,20}hatstand/m
`))

	fmt.Println(err)

	for _, p := range patterns {
		fmt.Println(p)
	}
}

Output:
<nil>
1:/hatstand.*teakettle/s
2:/(hatstand|teakettle)/Hi
3:/^.{10,20}hatstand/m
func (Patterns) Build ¶
func (p Patterns) Build(mode CompileMode) (Database, error)
Build the database with the given mode.
func (Patterns) ForPlatform ¶
ForPlatform determines the target platform for the database.
type Scratch ¶
type Scratch struct {
// contains filtered or unexported fields
}
Scratch is a Chimera scratch space.
func NewManagedScratch ¶
NewManagedScratch is a wrapper for NewScratch that sets a finalizer on the Scratch instance so that memory is freed once the object is no longer in use.
func NewScratch ¶
NewScratch allocates a "scratch" space for use by Chimera. This is required for runtime use, and one scratch space per thread, or concurrent caller, is required.
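The "one scratch per concurrent caller" rule maps naturally onto a worker pool in which every goroutine owns its working space. The sketch below shows only that ownership pattern with a stand-in `scratch` type and the standard library: it does not call Chimera, and `scratch` and `scanAll` are hypothetical names. The byte counting stands in for a real scan.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// scratch is a stand-in for per-caller working memory; it is not chimera.Scratch.
type scratch struct{ buf []byte }

// scanAll counts occurrences of "foo" in each input, using a fixed number
// of workers that each own one scratch for their whole lifetime.
func scanAll(inputs []string, workers int) []int {
	// Allocate every scratch up front, before any scanning starts.
	scratches := make([]*scratch, workers)
	for i := range scratches {
		scratches[i] = &scratch{buf: make([]byte, 0, 64)}
	}

	results := make([]int, len(inputs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for _, s := range scratches {
		wg.Add(1)
		go func(s *scratch) {
			defer wg.Done()
			for i := range jobs {
				// The buffer is reused and its contents are only
				// meaningful for the duration of this one call.
				s.buf = append(s.buf[:0], inputs[i]...)
				results[i] = bytes.Count(s.buf, []byte("foo"))
			}
		}(s)
	}
	for i := range inputs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	fmt.Println(scanAll([]string{"seafood", "foofoo", "bar"}, 2)) // [1 2 0]
}
```

Because no scratch is ever shared between goroutines, the workers need no locking around their working memory; each result is written to a distinct slice index.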
func (*Scratch) Clone ¶
Clone allocates a scratch space that is a clone of an existing scratch space.