Documentation ¶
Overview ¶
Package hyperscan is the Go binding for Intel's Hyperscan regex matching library: [hyperscan.io](https://www.hyperscan.io/)
Hyperscan (https://github.com/intel/hyperscan) is a software regular expression matching engine designed with high performance and flexibility in mind. It is implemented as a library that exposes a straightforward C API.
Hyperscan uses hybrid automata techniques to match large numbers of regular expressions (up to tens of thousands) simultaneously, and to match regular expressions across streams of data. Hyperscan is typically used in a deep packet inspection (DPI) library stack. The Hyperscan API itself is composed of two major components:
Compilation ¶
These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by the Hyperscan scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently. If a pattern cannot be built into a database for any reason (such as the use of an unsupported expression construct, or the overflowing of a resource limit), an error will be returned by the pattern compiler. Compiled databases can be serialized and relocated, so that they can be stored to disk or moved between hosts. They can also be targeted to particular platform features (for example, the use of Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions).
See Compiling Patterns for more detail. (http://intel.github.io/hyperscan/dev-reference/compilation.html)
Scanning ¶
Once a Hyperscan database has been created, it can be used to scan data in memory. Hyperscan provides several scanning modes, depending on whether the data to be scanned is available as a single contiguous block, whether it is distributed amongst several blocks in memory at the same time, or whether it is to be scanned as a sequence of blocks in a stream. Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match. For a given database, Hyperscan provides several guarantees:
1. No memory allocations occur at runtime with the exception of two fixed-size allocations, both of which should be done ahead of time for performance-critical applications:
1.1 Scratch space: temporary memory used for internal data at scan time. Structures in scratch space do not persist beyond the end of a single scan call.
1.2 Stream state: in streaming mode only, some state space is required to store data that persists between scan calls for each stream. This allows Hyperscan to track matches that span multiple blocks of data.
2. The sizes of the scratch space and stream state (in streaming mode) required for a given database are fixed and determined at database compile time. This means that the memory requirements of the application are known ahead of time, and these structures can be pre-allocated if required for performance reasons.
3. Any pattern that has successfully been compiled by the Hyperscan compiler can be scanned against any input. There are no internal resource limits or other limitations at runtime that could cause a scan call to return an error.
See Scanning for Patterns for more detail. (http://intel.github.io/hyperscan/dev-reference/runtime.html)
Building a Database ¶
The Hyperscan compiler API accepts regular expressions and converts them into a compiled pattern database that can then be used to scan data. Compilation allows the Hyperscan library to analyze the given pattern(s) and pre-determine how to scan for these patterns in an optimized fashion that would be far too expensive to compute at run-time. When compiling expressions, a decision needs to be made whether the resulting compiled patterns are to be used in a streaming, block or vectored mode:
1. Streaming mode: the target data to be scanned is a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream. In streaming mode, each stream requires a block of memory to store its state between scan calls.
2. Block mode: the target data is a discrete, contiguous block which can be scanned in one call and does not require state to be retained.
3. Vectored mode: the target data consists of a list of non-contiguous blocks that are available all at once. As for block mode, no retention of state is required.
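The difference between block and streaming mode can be sketched with the standard library alone. The following is a minimal illustration, not Hyperscan's implementation: `literalStream`, `blocks`, `scanBlocksIndependently` and `scanAsStream` are hypothetical names, and the "stream state" here is just the tail of the previous blocks, which is enough for a single literal.

```go
package main

import (
	"bytes"
	"fmt"
)

// literalStream is a hypothetical stand-in for per-stream state. It keeps the
// last len(needle)-1 bytes seen so far, which is enough to detect a literal
// that straddles a block boundary.
type literalStream struct {
	needle []byte
	tail   []byte
	found  bool
}

// scan feeds one block into the stream and updates the retained tail.
func (s *literalStream) scan(block []byte) {
	// window is a fresh slice, so it never aliases block or the old tail.
	window := append(append([]byte{}, s.tail...), block...)
	if bytes.Contains(window, s.needle) {
		s.found = true
	}
	keep := len(s.needle) - 1
	if keep < 0 {
		keep = 0
	}
	if len(window) > keep {
		window = window[len(window)-keep:]
	}
	s.tail = window
}

// blocks splits data into consecutive blocks of at most size bytes (size > 0).
func blocks(data []byte, size int) [][]byte {
	var out [][]byte
	for i := 0; i < len(data); i += size {
		end := i + size
		if end > len(data) {
			end = len(data)
		}
		out = append(out, data[i:end])
	}
	return out
}

// scanBlocksIndependently treats every block as unrelated data (no state).
func scanBlocksIndependently(data, needle []byte, size int) bool {
	for _, b := range blocks(data, size) {
		if bytes.Contains(b, needle) {
			return true
		}
	}
	return false
}

// scanAsStream carries state from one block to the next.
func scanAsStream(data, needle []byte, size int) bool {
	s := &literalStream{needle: needle}
	for _, b := range blocks(data, size) {
		s.scan(b)
	}
	return s.found
}

func main() {
	data := []byte("hello foobarbar!")
	fmt.Println(scanBlocksIndependently(data, []byte("foobar"), 4)) // false
	fmt.Println(scanAsStream(data, []byte("foobar"), 4))            // true
}
```

With 4-byte blocks the literal `foobar` is split across `o fo` and `obar`, so only the variant that retains state between calls finds it. This is the reason each stream needs its own block of memory.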
Index ¶
- Constants
- Variables
- func Match(pattern string, data []byte) (bool, error)
- func MatchReader(pattern string, reader io.Reader) (bool, error)
- func MatchString(pattern, s string) (matched bool, err error)
- func Quote(s string) string
- func SerializedDatabaseSize(data []byte) (int, error)
- func ValidPlatform() error
- func Version() string
- type BlockDatabase
- type BlockMatcher
- type BlockScanner
- type Builder
- type CompileError
- type CompileFlag
- type CpuFeature
- type Database
- type DatabaseBuilder
- type DbInfo
- type Error
- type ExprExt
- type ExprInfo
- type Ext
- type ExtFlag
- type HsError
- type MatchContext
- type MatchEvent
- type MatchHandler
- type ModeFlag
- type Pattern
- func (p *Pattern) Build(mode ModeFlag) (Database, error)
- func (p *Pattern) Ext() (*ExprExt, error)
- func (p *Pattern) ForPlatform(mode ModeFlag, platform Platform) (Database, error)
- func (p *Pattern) Info() (*ExprInfo, error)
- func (p *Pattern) IsValid() bool
- func (p *Pattern) Pattern() *hs.Pattern
- func (p *Pattern) Patterns() []*hs.Pattern
- func (p *Pattern) String() string
- func (p *Pattern) WithExt(exts ...Ext) *Pattern
- type Patterns
- type Platform
- type ScanFlag
- type Scratch
- type Stream
- type StreamCompressor
- type StreamDatabase
- func NewLargeStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)
- func NewManagedStreamDatabase(patterns ...*Pattern) (StreamDatabase, error)
- func NewMediumStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)
- func NewStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)
- func UnmarshalStreamDatabase(data []byte) (StreamDatabase, error)
- type StreamMatcher
- type StreamScanner
- type TuneFlag
- type VectoredDatabase
- type VectoredMatcher
- type VectoredScanner
Constants ¶
const (
	// FloatNumber for matching floating point numbers.
	FloatNumber = `(?:` + `[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?.)`

	// IPv4Address for matching IPv4 address.
	IPv4Address = `(?:` +
		`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?\.){3}` +
		`(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))`

	// EmailAddress for matching email address.
	EmailAddress = `(?:` +
		`^[A-Za-z0-9](([_\.\-]?[a-zA-Z0-9]+)*)@` +
		`([A-Za-z0-9]+)(([\.\-]?[a-zA-Z0-9]+)*)\.([A-Za-z]{2,})$)`

	// CreditCard for matching credit card number.
	CreditCard = `(?:` +
		`4[0-9]{12}(?:[0-9]{3})?|` +
		`5[1-5][0-9]{14}|` +
		`3[47][0-9]{13}|` +
		`3(?:0[0-5]|[68][0-9])[0-9]{11}|` +
		`6(?:011|5[0-9]{2})[0-9]{12}|` +
		`(?:2131|1800|35\d{3})\d{11})` // JCB
)
Variables ¶
var ErrTooManyMatches = errors.New("too many matches")
ErrTooManyMatches is returned when a scan produces too many matches.
Functions ¶
func Match ¶
func Match(pattern string, data []byte) (bool, error)
Match reports whether the byte slice data contains any match of the regular expression pattern.
Example ¶
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	matched, err := hyperscan.Match(`foo.*`, []byte(`seafood`))
	fmt.Println(matched, err)

	matched, err = hyperscan.Match(`bar.*`, []byte(`seafood`))
	fmt.Println(matched, err)

	matched, err = hyperscan.Match(`a(b`, []byte(`seafood`))
	fmt.Println(matched, err)
}

Output:

true <nil>
false <nil>
false parse pattern, pattern `a(b`, Missing close parenthesis for group started at index 1.
func MatchReader ¶
func MatchReader(pattern string, reader io.Reader) (bool, error)
MatchReader reports whether the text returned by the Reader contains any match of the regular expression pattern.
Example ¶
package main

import (
	"fmt"
	"strings"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	s := strings.NewReader(strings.Repeat("a", 4096) + `seafood`)

	matched, err := hyperscan.MatchReader(`foo.*`, s)
	fmt.Println(matched, err)

	matched, err = hyperscan.MatchReader(`bar.*`, s)
	fmt.Println(matched, err)

	matched, err = hyperscan.MatchReader(`a(b`, s)
	fmt.Println(matched, err)
}

Output:

true <nil>
false <nil>
false parse pattern, pattern `a(b`, Missing close parenthesis for group started at index 1.
func MatchString ¶
func MatchString(pattern, s string) (matched bool, err error)
MatchString reports whether the string s contains any match of the regular expression pattern.
func SerializedDatabaseSize ¶
func SerializedDatabaseSize(data []byte) (int, error)
SerializedDatabaseSize reports the size that would be required by a database if it were deserialized.
Types ¶
type BlockDatabase ¶
type BlockDatabase interface {
	Database
	BlockScanner
	BlockMatcher
}
BlockDatabase scans target data that is a discrete, contiguous block, which can be scanned in one call and does not require state to be retained.
func NewBlockDatabase ¶
func NewBlockDatabase(patterns ...*Pattern) (bdb BlockDatabase, err error)
NewBlockDatabase creates a block database based on the patterns.
func NewManagedBlockDatabase ¶ added in v1.1.1
func NewManagedBlockDatabase(patterns ...*Pattern) (BlockDatabase, error)
NewManagedBlockDatabase is a wrapper for NewBlockDatabase that sets a finalizer on the Scratch instance so that memory is freed once the object is no longer in use.
func UnmarshalBlockDatabase ¶
func UnmarshalBlockDatabase(data []byte) (BlockDatabase, error)
UnmarshalBlockDatabase reconstructs a block database from a stream of bytes.
type BlockMatcher ¶
type BlockMatcher interface {
	// Find returns a slice holding the text of the leftmost match in b of the regular expression.
	// A return value of nil indicates no match.
	Find(data []byte) []byte

	// FindIndex returns a two-element slice of integers defining
	// the location of the leftmost match in b of the regular expression.
	// The match itself is at b[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindIndex(data []byte) []int

	// FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAll(data []byte, n int) [][]byte

	// FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllIndex(data []byte, n int) [][]int

	// FindString returns a string holding the text of the leftmost match in s of the regular expression.
	// If there is no match, the return value is an empty string, but it will also be empty
	// if the regular expression successfully matches an empty string.
	// Use FindStringIndex if it is necessary to distinguish these cases.
	FindString(s string) string

	// FindStringIndex returns a two-element slice of integers defining
	// the location of the leftmost match in s of the regular expression.
	// The match itself is at s[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindStringIndex(s string) []int

	// FindAllString is the 'All' version of FindString; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllString(s string, n int) []string

	// FindAllStringIndex is the 'All' version of FindStringIndex;
	// it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllStringIndex(s string, n int) [][]int

	// Match reports whether the pattern database matches the byte slice b.
	Match(b []byte) bool

	// MatchString reports whether the pattern database matches the string s.
	MatchString(s string) bool
}
BlockMatcher implements regular expression search.
type BlockScanner ¶
type BlockScanner interface {
	// This is the function call in which the actual pattern matching takes place for block-mode pattern databases.
	Scan(data []byte, scratch *Scratch, handler MatchHandler, context interface{}) error
}
BlockScanner is the block (non-streaming) regular expression scanner.
Example ¶
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	// The `L` flag enables leftmost start of match reporting.
	p, err := hyperscan.ParsePattern(`/foo(bar)+/L`)
	if err != nil {
		fmt.Println("parse pattern failed,", err)
		return
	}

	// Create a new block database with the pattern.
	db, err := hyperscan.NewBlockDatabase(p)
	if err != nil {
		fmt.Println("create database failed,", err)
		return
	}
	defer db.Close()

	// Create a new scratch space for scanning.
	s, err := hyperscan.NewScratch(db)
	if err != nil {
		fmt.Println("create scratch failed,", err)
		return
	}
	defer func() {
		_ = s.Free()
	}()

	// Record the offsets of each match.
	type Match struct {
		from uint64
		to   uint64
	}

	var matches []Match

	handler := hyperscan.MatchHandler(func(id uint, from, to uint64, flags uint, context interface{}) error {
		matches = append(matches, Match{from, to})
		return nil
	})

	data := []byte("hello foobarbar!")

	// Scan the data block with the handler.
	if err := db.Scan(data, s, handler, nil); err != nil {
		fmt.Println("database scan failed,", err)
		return
	}

	// Hyperscan reports all matches.
	for _, m := range matches {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}
}

Output:

match [ 6 : 12 ] foobar
match [ 6 : 15 ] foobarbar
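The two matches in the output above come from Hyperscan reporting a match at every end offset where the expression can finish, rather than only the longest one. A minimal sketch of that behaviour using Go's standard `regexp` package (not Hyperscan itself) is shown below. `allMatches` is a hypothetical helper; it anchors the expression at the end of each prefix of the data with `\z` and takes the leftmost start, which mirrors the effect of the `L` flag for this input.

```go
package main

import (
	"fmt"
	"regexp"
)

// allMatches returns a [from, to] pair for every end offset at which expr
// matches some suffix of data[:to], using the leftmost possible start.
func allMatches(expr string, data []byte) [][2]int {
	// \z pins the match to the end of the prefix being examined.
	re := regexp.MustCompile(`(?:` + expr + `)\z`)

	var out [][2]int
	for end := 1; end <= len(data); end++ {
		if loc := re.FindIndex(data[:end]); loc != nil {
			out = append(out, [2]int{loc[0], loc[1]})
		}
	}
	return out
}

func main() {
	data := []byte("hello foobarbar!")
	for _, m := range allMatches(`foo(bar)+`, data) {
		fmt.Println("match [", m[0], ":", m[1], "]", string(data[m[0]:m[1]]))
	}
}
```

For `hello foobarbar!` this sketch yields the same two spans, `[6:12]` and `[6:15]`. It rescans every prefix and is only meant to show the reporting semantics, not an efficient way to obtain them.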
type Builder ¶ added in v1.1.1
type Builder interface {
	// Build the database with the given mode.
	Build(mode ModeFlag) (Database, error)

	// ForPlatform determine the target platform for the database
	ForPlatform(mode ModeFlag, platform Platform) (Database, error)
}
Builder creates a database with the given mode and target platform.
type CompileError ¶ added in v1.2.0
type CompileError = hs.CompileError
A type containing error details that is returned by the compile calls on failure.
The caller may inspect the values returned in this type to determine the cause of failure.
type CompileFlag ¶
type CompileFlag = hs.CompileFlag
CompileFlag represents a pattern flag.
const (
	// Caseless represents set case-insensitive matching.
	Caseless CompileFlag = hs.Caseless
	// DotAll represents matching a `.` will not exclude newlines.
	DotAll CompileFlag = hs.DotAll
	// MultiLine set multi-line anchoring.
	MultiLine CompileFlag = hs.MultiLine
	// SingleMatch set single-match only mode.
	SingleMatch CompileFlag = hs.SingleMatch
	// AllowEmpty allow expressions that can match against empty buffers.
	AllowEmpty CompileFlag = hs.AllowEmpty
	// Utf8Mode enable UTF-8 mode for this expression.
	Utf8Mode CompileFlag = hs.Utf8Mode
	// UnicodeProperty enable Unicode property support for this expression.
	UnicodeProperty CompileFlag = hs.UnicodeProperty
	// PrefilterMode enable prefiltering mode for this expression.
	PrefilterMode CompileFlag = hs.PrefilterMode
	// SomLeftMost enable leftmost start of match reporting.
	SomLeftMost CompileFlag = hs.SomLeftMost
)

const (
	// Combination represents logical combination.
	Combination CompileFlag = hs.Combination
	// Quiet represents don't do any match reporting.
	Quiet CompileFlag = hs.Quiet
)
func ParseCompileFlag ¶
func ParseCompileFlag(s string) (CompileFlag, error)
ParseCompileFlag parses the compile flags of a pattern from a string. Each flag is a single character:

    i   Caseless          Case-insensitive matching
    s   DotAll            Dot (.) will match newlines
    m   MultiLine         Multi-line anchoring
    H   SingleMatch       Report match ID at most once (`o` deprecated)
    V   AllowEmpty        Allow patterns that can match against empty buffers (`e` deprecated)
    8   Utf8Mode          UTF-8 mode (`u` deprecated)
    W   UnicodeProperty   Unicode property support (`p` deprecated)
    P   PrefilterMode     Prefiltering mode (`f` deprecated)
    L   SomLeftMost       Leftmost start of match reporting (`l` deprecated)
    C   Combination       Logical combination of patterns (Hyperscan 5.0)
    Q   Quiet             Quiet at matching (Hyperscan 5.0)
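The letter-to-flag mapping above can be read as a simple lookup table. The sketch below is an illustration built only from that table, not the package's own parser: `flagNames` and `describeFlags` are hypothetical names, the deprecated lowercase aliases are deliberately left out, and the result is a list of flag names rather than a `CompileFlag` value.

```go
package main

import "fmt"

// flagNames maps each current flag letter to the constant it stands for.
// Assumption: deprecated aliases (o, e, u, p, f, l) are not accepted here.
var flagNames = map[rune]string{
	'i': "Caseless",
	's': "DotAll",
	'm': "MultiLine",
	'H': "SingleMatch",
	'V': "AllowEmpty",
	'8': "Utf8Mode",
	'W': "UnicodeProperty",
	'P': "PrefilterMode",
	'L': "SomLeftMost",
	'C': "Combination",
	'Q': "Quiet",
}

// describeFlags translates a flag string such as "im" into flag names,
// in the order the letters appear.
func describeFlags(s string) ([]string, error) {
	names := make([]string, 0, len(s))
	for _, r := range s {
		name, ok := flagNames[r]
		if !ok {
			return nil, fmt.Errorf("unknown flag %q", r)
		}
		names = append(names, name)
	}
	return names, nil
}

func main() {
	names, err := describeFlags("im")
	fmt.Println(names, err)

	_, err = describeFlags("z")
	fmt.Println(err)
}
```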
type CpuFeature ¶
type CpuFeature = hs.CpuFeature //nolint: golint,stylecheck,revive
CpuFeature is the CPU feature support flags.
const (
	// AVX2 is a CPU features flag indicates that the target platform supports AVX2 instructions.
	AVX2 CpuFeature = hs.AVX2
	// AVX512 is a CPU features flag indicates that the target platform supports AVX512 instructions,
	// specifically AVX-512BW. Using AVX512 implies the use of AVX2.
	AVX512 CpuFeature = hs.AVX512
)
type Database ¶
type Database interface {
	// Provides information about a database.
	Info() (DbInfo, error)

	// Provides the size of the given database in bytes.
	Size() (int, error)

	// Free a compiled pattern database.
	Close() error

	// Serialize a pattern database to a stream of bytes.
	Marshal() ([]byte, error)

	// Reconstruct a pattern database from a stream of bytes at a given memory location.
	Unmarshal(b []byte) error
}
Database is an immutable database that can be used by the Hyperscan scanning API.
func Compile ¶
Compile parses a regular expression and returns, if successful, a block-mode pattern database that can be used to match against text.
func MustCompile ¶
MustCompile is like Compile but panics if the expression cannot be parsed. It simplifies safe initialization of global variables holding compiled regular expressions.
func UnmarshalDatabase ¶
UnmarshalDatabase reconstructs a pattern database from a stream of bytes.
type DatabaseBuilder ¶
type DatabaseBuilder struct {
	// Array of patterns to compile.
	Patterns

	// Compiler mode flags that affect the database as a whole. (Default: block mode)
	Mode ModeFlag

	// If not nil, the platform structure is used to determine the target platform for the database.
	// If nil, a database suitable for running on the current host platform is produced.
	Platform Platform
}
DatabaseBuilder creates a database that is used to match the patterns.
func (*DatabaseBuilder) AddExpressionWithFlags ¶
func (b *DatabaseBuilder) AddExpressionWithFlags(expr string, flags CompileFlag) *DatabaseBuilder
AddExpressionWithFlags adds an expression with the given flags to the database.
func (*DatabaseBuilder) AddExpressions ¶
func (b *DatabaseBuilder) AddExpressions(exprs ...string) *DatabaseBuilder
AddExpressions adds more expressions to the database.
func (*DatabaseBuilder) Build ¶
func (b *DatabaseBuilder) Build() (Database, error)
Build builds a database based on the expressions and platform.
type DbInfo ¶
type DbInfo string //nolint: stylecheck
DbInfo identifies the version and platform information for the supplied database.
func SerializedDatabaseInfo ¶
SerializedDatabaseInfo provides information about a serialized database.
type Error ¶ added in v1.2.0
Error is the type for errors returned by Hyperscan functions.
const (
	// ErrSuccess is the error returned if the engine completed normally.
	ErrSuccess Error = hs.ErrSuccess
	// ErrInvalid is the error returned if a parameter passed to this function was invalid.
	ErrInvalid Error = hs.ErrInvalid
	// ErrNoMemory is the error returned if a memory allocation failed.
	ErrNoMemory Error = hs.ErrNoMemory
	// ErrScanTerminated is the error returned if the engine was terminated by callback.
	ErrScanTerminated Error = hs.ErrScanTerminated
	// ErrCompileError is the error returned if the pattern compiler failed.
	ErrCompileError Error = hs.ErrCompileError
	// ErrDatabaseVersionError is the error returned if the given database was built for a different version of Hyperscan.
	ErrDatabaseVersionError Error = hs.ErrDatabaseVersionError
	// ErrDatabasePlatformError is the error returned if the given database was built for a different platform.
	ErrDatabasePlatformError Error = hs.ErrDatabasePlatformError
	// ErrDatabaseModeError is the error returned if the given database was built for a different mode of operation.
	ErrDatabaseModeError Error = hs.ErrDatabaseModeError
	// ErrBadAlign is the error returned if a parameter passed to this function was not correctly aligned.
	ErrBadAlign Error = hs.ErrBadAlign
	// ErrBadAlloc is the error returned if the memory allocator did not correctly return memory suitably aligned.
	ErrBadAlloc Error = hs.ErrBadAlloc
	// ErrScratchInUse is the error returned if the scratch region was already in use.
	ErrScratchInUse Error = hs.ErrScratchInUse
	// ErrArchError is the error returned if unsupported CPU architecture.
	ErrArchError Error = hs.ErrArchError
	// ErrInsufficientSpace is the error returned if provided buffer was too small.
	ErrInsufficientSpace Error = hs.ErrInsufficientSpace
)
type ExprExt ¶
ExprExt is a structure containing additional parameters related to an expression.
func NewExprExt ¶ added in v1.2.0
func ParseExprExt ¶ added in v1.1.0
ParseExprExt parses the additional parameters of an expression from a string.
type Ext ¶ added in v1.1.0
type Ext func(ext *ExprExt)
Ext is an option that sets additional parameters related to an expression.
func EditDistance ¶
EditDistance allows patterns to approximately match within this edit distance.
func HammingDistance ¶
HammingDistance allows patterns to approximately match within this Hamming distance.
func MaxOffset ¶
MaxOffset sets the maximum end offset in the data stream at which this expression should match successfully.
type ExtFlag ¶
ExtFlag values are used in ExprExt.Flags to indicate which fields are used.
const (
	// ExtMinOffset is a flag indicating that the ExprExt.MinOffset field is used.
	ExtMinOffset ExtFlag = hs.ExtMinOffset
	// ExtMaxOffset is a flag indicating that the ExprExt.MaxOffset field is used.
	ExtMaxOffset ExtFlag = hs.ExtMaxOffset
	// ExtMinLength is a flag indicating that the ExprExt.MinLength field is used.
	ExtMinLength ExtFlag = hs.ExtMinLength
	// ExtEditDistance is a flag indicating that the ExprExt.EditDistance field is used.
	ExtEditDistance ExtFlag = hs.ExtEditDistance
	// ExtHammingDistance is a flag indicating that the ExprExt.HammingDistance field is used.
	ExtHammingDistance ExtFlag = hs.ExtHammingDistance
)
type HsError ¶
type HsError = Error
HsError is the type for errors returned by Hyperscan functions.
type MatchContext ¶
MatchContext represents a match context.
type MatchEvent ¶
MatchEvent indicates a match event.
type ModeFlag ¶
ModeFlag represents the compile mode flags.
const (
	// BlockMode for the block scan (non-streaming) database.
	BlockMode ModeFlag = hs.BlockMode
	// NoStreamMode is alias for Block.
	NoStreamMode ModeFlag = hs.NoStreamMode
	// StreamMode for the streaming database.
	StreamMode ModeFlag = hs.StreamMode
	// VectoredMode for the vectored scanning database.
	VectoredMode ModeFlag = hs.VectoredMode
	// SomHorizonLargeMode use full precision to track start of match offsets in stream state.
	SomHorizonLargeMode ModeFlag = hs.SomHorizonLargeMode
	// SomHorizonMediumMode use medium precision to track start of match offsets in stream state (within 2^32 bytes).
	SomHorizonMediumMode ModeFlag = hs.SomHorizonMediumMode
	// SomHorizonSmallMode use limited precision to track start of match offsets in stream state (within 2^16 bytes).
	SomHorizonSmallMode ModeFlag = hs.SomHorizonSmallMode
)
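The three start-of-match horizon modes trade stream state size against how far back a start offset can be tracked. As a rough sketch of how a caller might pick one, the hypothetical helper `horizonFor` below maps the largest expected distance between the start and end of a match to the name of the smallest mode that covers it. Treating the 2^16 and 2^32 limits as exclusive upper bounds is an assumption made for this example.

```go
package main

import "fmt"

// horizonFor returns the name of the smallest start-of-match horizon mode
// whose documented range covers maxDistance bytes.
// Assumption: the 2^16 and 2^32 limits are treated as exclusive bounds.
func horizonFor(maxDistance uint64) string {
	switch {
	case maxDistance < 1<<16:
		return "SomHorizonSmallMode"
	case maxDistance < 1<<32:
		return "SomHorizonMediumMode"
	default:
		return "SomHorizonLargeMode"
	}
}

func main() {
	fmt.Println(horizonFor(1000))    // SomHorizonSmallMode
	fmt.Println(horizonFor(1 << 20)) // SomHorizonMediumMode
	fmt.Println(horizonFor(1 << 40)) // SomHorizonLargeMode
}
```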
func ParseModeFlag ¶
ParseModeFlag parses a database mode from a string.
type Pattern ¶
type Pattern struct {
	Expression string      // The expression to parse.
	Flags      CompileFlag // Flags which modify the behaviour of the expression.
	// The ID number to be associated with the corresponding pattern
	Id int //nolint: revive,stylecheck
	// contains filtered or unexported fields
}
Pattern is a matching pattern.
Example ¶
This example demonstrates constructing and matching a pattern.
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	p := hyperscan.NewPattern(`foo.*bar`, hyperscan.Caseless)
	fmt.Println(p)

	db, err := hyperscan.NewBlockDatabase(p)
	fmt.Println(err)

	found := db.MatchString("fooxyzbarbar")
	fmt.Println(found)
}

Output:

/foo.*bar/i
<nil>
true
func NewPattern ¶
func NewPattern(expr string, flags CompileFlag, exts ...Ext) *Pattern
NewPattern returns a new pattern based on the expression and compile flags.
func ParsePattern ¶
ParsePattern parses a pattern from a formatted string:
<integer id>:/<expression>/<flags>
For example, the following pattern matches `test` in caseless and multi-line mode:
/test/im
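To make the `<integer id>:/<expression>/<flags>` layout concrete, here is a minimal sketch that splits such a string into its three parts using only the standard library. `splitPattern` is a hypothetical helper, not the package's parser: it leaves the flags as raw text, and returning an id of 0 when the `<integer id>:` prefix is absent is an assumption of this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitPattern splits "<id>:/<expr>/<flags>" (the id prefix is optional)
// into its id, expression and flag string.
func splitPattern(s string) (id int, expr, flags string, err error) {
	rest := s
	if !strings.HasPrefix(s, "/") {
		// An id prefix is expected before the opening slash.
		i := strings.Index(s, ":/")
		if i < 0 {
			return 0, "", "", fmt.Errorf("pattern %q: missing leading slash", s)
		}
		if id, err = strconv.Atoi(s[:i]); err != nil {
			return 0, "", "", fmt.Errorf("pattern %q: invalid id: %w", s, err)
		}
		rest = s[i+1:]
	}

	// The last slash separates the expression from the flags, so slashes
	// inside the expression are preserved.
	end := strings.LastIndex(rest, "/")
	if end == 0 {
		return 0, "", "", fmt.Errorf("pattern %q: missing closing slash", s)
	}
	return id, rest[1:end], rest[end+1:], nil
}

func main() {
	fmt.Println(splitPattern("3:/foobar/i8"))
	fmt.Println(splitPattern("/test/im"))
}
```

Splitting on the last slash rather than the first means an expression such as `a/b` survives intact.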
Example ¶
This example demonstrates parsing a pattern with an id and flags.
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	p, err := hyperscan.ParsePattern("3:/foobar/i8")
	fmt.Println(err)
	fmt.Println(p.Id)
	fmt.Println(p.Expression)
	fmt.Println(p.Flags)
}

Output:

<nil>
3
foobar
8i
func (*Pattern) ForPlatform ¶ added in v1.1.1
ForPlatform determines the target platform for the database.
type Patterns ¶ added in v1.1.0
type Patterns []*Pattern
Patterns is a set of matching patterns.
func ParsePatterns ¶ added in v1.1.0
ParsePatterns parses lines as `Patterns`.
Example ¶
This example demonstrates parsing patterns with comments.
package main

import (
	"fmt"
	"strings"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	patterns, err := hyperscan.ParsePatterns(strings.NewReader(`
# empty line and comment will be skipped

1:/hatstand.*teakettle/s
2:/(hatstand|teakettle)/iH
3:/^.{10,20}hatstand/m
`))
	fmt.Println(err)

	for _, p := range patterns {
		fmt.Println(p)
	}
}

Output:

<nil>
1:/hatstand.*teakettle/s
2:/(hatstand|teakettle)/Hi
3:/^.{10,20}hatstand/m
func (Patterns) ForPlatform ¶ added in v1.1.1
ForPlatform determines the target platform for the database.
type Platform ¶
type Platform interface {
	// Information about the target platform which may be used to guide the optimisation process of the compile.
	Tune() TuneFlag

	// Relevant CPU features available on the target platform
	CpuFeatures() CpuFeature
}
Platform is a type containing information on the target platform.
func NewPlatform ¶
func NewPlatform(tune TuneFlag, cpu CpuFeature) Platform
NewPlatform creates new platform information for the target platform.
func PopulatePlatform ¶
func PopulatePlatform() Platform
PopulatePlatform populates the platform information based on the current host.
type Scratch ¶
type Scratch struct {
// contains filtered or unexported fields
}
Scratch is a Hyperscan scratch space.
func NewManagedScratch ¶ added in v1.1.1
NewManagedScratch is a wrapper for NewScratch that sets a finalizer on the Scratch instance so that memory is freed once the object is no longer in use.
func NewScratch ¶
NewScratch allocates a "scratch" space for use by Hyperscan. A scratch space is required at runtime, and each thread or concurrent caller needs its own.
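The "one scratch space per concurrent caller" rule can be sketched without Hyperscan by using a stand-in type. In the example below `fakeScratch`, `countFoo` and `scanConcurrently` are hypothetical: the scratch is just a fixed-size work buffer, and each goroutine takes a private clone of a prototype instead of sharing it, which is the same shape a caller would follow with a real scratch and its Clone method.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// fakeScratch is a hypothetical stand-in for a scratch space: a fixed-size
// work buffer that must not be used by two scans at the same time.
type fakeScratch struct{ buf []byte }

// clone returns an independent scratch of the same size.
func (s *fakeScratch) clone() *fakeScratch {
	return &fakeScratch{buf: make([]byte, len(s.buf))}
}

// countFoo lowercases data into the scratch buffer and counts "foo".
// Inputs longer than the buffer are truncated, which is fine for a sketch.
func countFoo(data []byte, s *fakeScratch) int {
	n := copy(s.buf, bytes.ToLower(data))
	return bytes.Count(s.buf[:n], []byte("foo"))
}

// scanConcurrently scans every input in its own goroutine, giving each
// goroutine a private clone of the prototype scratch.
func scanConcurrently(inputs [][]byte, proto *fakeScratch) []int {
	counts := make([]int, len(inputs))

	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in []byte) {
			defer wg.Done()
			// Each goroutine writes only to its own slot and its own scratch.
			counts[i] = countFoo(in, proto.clone())
		}(i, in)
	}
	wg.Wait()

	return counts
}

func main() {
	proto := &fakeScratch{buf: make([]byte, 64)}
	inputs := [][]byte{[]byte("Foo foo"), []byte("bar"), []byte("seafood")}
	fmt.Println(scanConcurrently(inputs, proto)) // [2 0 1]
}
```

Sharing `proto` itself between the goroutines would make them overwrite each other's buffer, which is the kind of conflict the per-caller rule is meant to prevent.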
func (*Scratch) Clone ¶
Clone allocates a scratch space that is a clone of an existing scratch space.
type Stream ¶
type Stream interface {
	Scan(data []byte) error

	Close() error

	Reset() error

	Clone() (Stream, error)
}
Streams exist in the Hyperscan library so that pattern matching state can be maintained across multiple blocks of target data.
type StreamCompressor ¶
type StreamCompressor interface {
	// Creates a compressed representation of the provided stream in the buffer provided.
	Compress(stream Stream) ([]byte, error)

	// Decompresses a compressed representation created by `CompressStream` into a new stream.
	Expand(buf []byte, flags ScanFlag, scratch *Scratch, handler MatchHandler, context interface{}) (Stream, error)

	// Decompresses a compressed representation created by `CompressStream` on top of the 'to' stream.
	ResetAndExpand(stream Stream, buf []byte, flags ScanFlag, scratch *Scratch,
		handler MatchHandler, context interface{}) (Stream, error)
}
StreamCompressor implements stream compressor.
type StreamDatabase ¶
type StreamDatabase interface {
	Database
	StreamScanner
	StreamMatcher
	StreamCompressor

	StreamSize() (int, error)
}
StreamDatabase scans target data that is a continuous stream, not all of which is available at once; blocks of data are scanned in sequence and matches may span multiple blocks in a stream.
func NewLargeStreamDatabase ¶
func NewLargeStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)
NewLargeStreamDatabase creates a large-sized stream database based on the patterns.
func NewManagedStreamDatabase ¶ added in v1.1.1
func NewManagedStreamDatabase(patterns ...*Pattern) (StreamDatabase, error)
NewManagedStreamDatabase is a wrapper for NewStreamDatabase that sets a finalizer on the Scratch instance so that memory is freed once the object is no longer in use.
func NewMediumStreamDatabase ¶
func NewMediumStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)
NewMediumStreamDatabase creates a medium-sized stream database based on the patterns.
func NewStreamDatabase ¶
func NewStreamDatabase(patterns ...*Pattern) (sdb StreamDatabase, err error)
NewStreamDatabase creates a stream database based on the patterns.
func UnmarshalStreamDatabase ¶
func UnmarshalStreamDatabase(data []byte) (StreamDatabase, error)
UnmarshalStreamDatabase reconstructs a stream database from a stream of bytes.
type StreamMatcher ¶
type StreamMatcher interface {
	// Find returns a slice holding the text of the leftmost match in b of the regular expression.
	// A return value of nil indicates no match.
	Find(reader io.ReadSeeker) []byte

	// FindIndex returns a two-element slice of integers defining
	// the location of the leftmost match in b of the regular expression.
	// The match itself is at b[loc[0]:loc[1]]. A return value of nil indicates no match.
	FindIndex(reader io.Reader) []int

	// FindAll is the 'All' version of Find; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAll(reader io.ReadSeeker, n int) [][]byte

	// FindAllIndex is the 'All' version of FindIndex; it returns a slice of all successive matches of the expression,
	// as defined by the 'All' description in the package comment. A return value of nil indicates no match.
	FindAllIndex(reader io.Reader, n int) [][]int

	// Match reports whether the pattern database matches the byte slice b.
	Match(reader io.Reader) bool
}
StreamMatcher implements regular expression search.
type StreamScanner ¶
type StreamScanner interface {
	Open(flags ScanFlag, scratch *Scratch, handler MatchHandler, context interface{}) (Stream, error)

	Scan(reader io.Reader, scratch *Scratch, handler MatchHandler, context interface{}) error
}
StreamScanner is the streaming regular expression scanner.
Example ¶
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() { //nolint:funlen
	// The `L` flag enables leftmost start of match reporting.
	p, err := hyperscan.ParsePattern(`/foo(bar)+/L`)
	if err != nil {
		fmt.Println("parse pattern failed,", err)
		return
	}

	// Create a new stream database with the pattern.
	db, err := hyperscan.NewStreamDatabase(p)
	if err != nil {
		fmt.Println("create database failed,", err)
		return
	}
	defer db.Close()

	// Create a new scratch space for scanning.
	s, err := hyperscan.NewScratch(db)
	if err != nil {
		fmt.Println("create scratch failed,", err)
		return
	}
	defer func() {
		_ = s.Free()
	}()

	// Record the offsets of each match.
	type Match struct {
		from uint64
		to   uint64
	}

	var matches []Match

	handler := hyperscan.MatchHandler(func(id uint, from, to uint64, flags uint, context interface{}) error {
		matches = append(matches, Match{from, to})
		return nil
	})

	data := []byte("hello foobarbar!")

	// Open a stream with the handler.
	st, err := db.Open(0, s, handler, nil)
	if err != nil {
		fmt.Println("open streaming database failed,", err)
		return
	}

	// Scan the data through the stream, four bytes at a time.
	for i := 0; i < len(data); i += 4 {
		start := i
		end := i + 4

		if end > len(data) {
			end = len(data)
		}

		if err = st.Scan(data[start:end]); err != nil {
			fmt.Println("streaming scan failed,", err)
			return
		}
	}

	// Close the stream.
	if err = st.Close(); err != nil {
		fmt.Println("streaming scan failed,", err)
		return
	}

	// Hyperscan reports all matches.
	for _, m := range matches {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}
}

Output:

match [ 6 : 12 ] foobar
match [ 6 : 15 ] foobarbar
type TuneFlag ¶
const (
	// Generic indicates that the compiled database should not be tuned for any particular target platform.
	Generic TuneFlag = hs.Generic
	// SandyBridge indicates that the compiled database should be tuned for the Sandy Bridge microarchitecture.
	SandyBridge TuneFlag = hs.SandyBridge
	// IvyBridge indicates that the compiled database should be tuned for the Ivy Bridge microarchitecture.
	IvyBridge TuneFlag = hs.IvyBridge
	// Haswell indicates that the compiled database should be tuned for the Haswell microarchitecture.
	Haswell TuneFlag = hs.Haswell
	// Silvermont indicates that the compiled database should be tuned for the Silvermont microarchitecture.
	Silvermont TuneFlag = hs.Silvermont
	// Broadwell indicates that the compiled database should be tuned for the Broadwell microarchitecture.
	Broadwell TuneFlag = hs.Broadwell
	// Skylake indicates that the compiled database should be tuned for the Skylake microarchitecture.
	Skylake TuneFlag = hs.Skylake
	// SkylakeServer indicates that the compiled database should be tuned for the Skylake Server microarchitecture.
	SkylakeServer TuneFlag = hs.SkylakeServer
	// Goldmont indicates that the compiled database should be tuned for the Goldmont microarchitecture.
	Goldmont TuneFlag = hs.Goldmont
)
type VectoredDatabase ¶
type VectoredDatabase interface {
	Database
	VectoredScanner
	VectoredMatcher
}
VectoredDatabase scans target data that consists of a list of non-contiguous blocks that are available all at once.
func NewVectoredDatabase ¶
func NewVectoredDatabase(patterns ...*Pattern) (vdb VectoredDatabase, err error)
NewVectoredDatabase creates a vectored database based on the patterns.
func UnmarshalVectoredDatabase ¶
func UnmarshalVectoredDatabase(data []byte) (VectoredDatabase, error)
UnmarshalVectoredDatabase reconstructs a vectored database from a stream of bytes.
type VectoredMatcher ¶
type VectoredMatcher interface{}
VectoredMatcher implements regular expression search.
type VectoredScanner ¶
type VectoredScanner interface {
Scan(data [][]byte, scratch *Scratch, handler MatchHandler, context interface{}) error
}
VectoredScanner is the vectored regular expression scanner.
Example ¶
package main

import (
	"fmt"

	"github.com/flier/gohs/hyperscan"
)

func main() {
	// The `L` flag enables leftmost start of match reporting.
	p, err := hyperscan.ParsePattern(`/foo(bar)+/L`)
	if err != nil {
		fmt.Println("parse pattern failed,", err)
		return
	}

	// Create a new vectored database with the pattern.
	db, err := hyperscan.NewVectoredDatabase(p)
	if err != nil {
		fmt.Println("create database failed,", err)
		return
	}
	defer db.Close()

	// Create a new scratch space for scanning.
	s, err := hyperscan.NewScratch(db)
	if err != nil {
		fmt.Println("create scratch failed,", err)
		return
	}
	defer func() {
		_ = s.Free()
	}()

	// Record the offsets of each match.
	type Match struct {
		from uint64
		to   uint64
	}

	var matches []Match

	handler := hyperscan.MatchHandler(func(id uint, from, to uint64, flags uint, context interface{}) error {
		matches = append(matches, Match{from, to})
		return nil
	})

	data := []byte("hello foobarbar!")

	// Scan the vectored data with the handler.
	if err := db.Scan([][]byte{data[:8], data[8:12], data[12:]}, s, handler, nil); err != nil {
		fmt.Println("database scan failed,", err)
		return
	}

	// Hyperscan reports all matches.
	for _, m := range matches {
		fmt.Println("match [", m.from, ":", m.to, "]", string(data[m.from:m.to]))
	}
}

Output:

match [ 6 : 12 ] foobar
match [ 6 : 15 ] foobarbar