Documentation ¶
Overview ¶
Package fastmatch provides a code generation tool for quickly comparing an input string to a set of possible matches which are known at compile time.
A typical use of this would be a "reverse enum", such as in a parser which needs to compare a string to a list of keywords and return the corresponding lexer symbol.
Normally, the easiest way to do this would be with a switch statement, such as:
switch (input) { case "foo": return foo case "bar": return bar case "baz": return baz }
The compiled code for the above will compare the input to each string in sequence. If input doesn't match "foo", we try to match "bar", then "baz". The matching process starts anew for each case. If we have lots of possible matches, this can be a lot of wasted effort.
Another option would be to use a map, on the (probably valid) assumption that Go's map lookups are faster than executing a bunch of string comparisons in sequence:
match := map[string]int{ "foo": foo, "bar": bar, "baz": baz, } return match[input]
The compiled code for the above will recreate the map at runtime. We thus have to hash each possible match every time the map is initialized, allocate memory, garbage collect it, etc. More wasted effort.
And this is all not to mention the potential complications related to case-insensitive matching, partial matches (e.g. strings.HasPrefix and strings.HasSuffix), Unicode normalization, or situations where we want to treat a class of characters (such as all numeric digits) as equivalent for matching purposes. You could use a regular expression, but now you'd have two problems, as the jwz quote goes.
The code generated by this package is theoretically more efficient than the preceding approaches. It supports partial matches, and can treat groups of characters (e.g. 'a' and 'A') as equivalent.
Under the hood, it works by partitioning the search space by the length of the input string, then updating a state machine based on each rune in the input. If the character at a given position in the input doesn't correspond to any possible match, we bail early. Otherwise, the final state is compared against possible matches using a final switch statement.
Is the code output by this package faster enough to matter? Maybe, maybe not. This is a straight port of a C code generation tool I've used on a couple of projects. In C, the difference was significant, due to strcmp() or strcasecmp() function call overhead, and GCC's ability to convert long switch statements into jump tables or binary searches.
Go (as of 1.7) doesn't yet do any optimization of switch statements. See https://github.com/golang/go/issues/5496 and https://github.com/golang/go/issues/15780. Thus, you may actually be worse off in the short-term for using this method instead of a map lookup. (Certainly in terms of code size.) But as the compiler improves, this code will become more relevant. I've played with having this package output assembler code, but it seems like the effort would be better spent improving the compiler instead.
Index ¶
- Variables
- func Generate(w io.Writer, origCases map[string]string, none string, flags ...*Flag) error
- func GenerateReverse(w io.Writer, cases map[string]string, none string, _ ...*Flag) error
- func GenerateTest(w io.Writer, fn, reverseFn string, cases map[string]string, _ ...*Flag) error
- func Range(args ...rune) []rune
- type ErrAmbiguous
- type ErrBadFlags
- type Flag
Constants ¶
This section is empty.
Variables ¶
var Alphanumeric = Range('0', '9', 'a', 'z', 'A', 'Z')
Alphanumeric is a predefined Range covering ASCII numeric digits and upper- and lower-case letters.
var HasPrefix = new(Flag)
HasPrefix is a flag, which can be passed to Generate, to specify that runes proceeding a match should be ignored.
Matching stops as soon as a match is found, thus "foo" and "f" are ambiguous cases when HasPrefix is specified. Generate returns an error if ambiguity is detected.
var HasSuffix = new(Flag)
HasSuffix is a flag, which can be passed to Generate, to match the end of the input string, in the same manner HasPrefix performs a match of the beginning of the string.
var Insensitive = new(Flag)
Insensitive is a flag, which can be passed to Generate, to specify that matching should be case-insensitive.
var Letters = Range('a', 'z', 'A', 'Z')
Letters is a predefined Range covering upper- and lower-case ASCII letters.
var Lowercase = Range('a', 'z')
Lowercase is a predefined Range covering lower-case ASCII letters.
var Normalize = new(Flag)
Normalize is a flag, which can be passed to Generate, to specify that matching should be done without regard to diacritics, accents, etc.
This is currently just a placeholder, and has no effect yet on the generated code.
var Numbers = Range('0', '9')
Numbers is a predefined Range covering the ASCII digits from 0 through 9.
var Uppercase = Range('A', 'Z')
Uppercase is a predefined Range covering upper-case ASCII letters.
Functions ¶
func Generate ¶
Generate outputs Go code to compare a string to a set of possible matches which are known at compile-time.
Each entry in the map consists of a possible match as the key, and the corresponding expression to return as the value. none is the expression to return if no match is found.
Code to perform the match is written to the supplied io.Writer. Before calling this function, the caller is expected to write the method signature and any input pre-processing logic. The string to examine should be in a variable named "input".
If flags are specified, it's possible to generate ambiguous code, in which the same input string will match multiple entries in the cases map, with different return values. This function attempts to detect this and will return an error if ambiguity is detected.
The output is not buffered, and will be incomplete if an error is returned. If the caller cares about this, they should have a way to discard the written output on error. Errors writing to the supplied io.Writer will be passed back to the caller.
Example usage:
fmt.Fprintln(w, "func matchFoo(input string) int {") fastmatch.Generate(w, map[string]string{ "foo": "1", "bar": "2", "baz": "3", }, "-1", fastmatch.Insensitive)
func GenerateReverse ¶
GenerateReverse outputs Go code that returns the string value for a given match. The result from the generated function will be the reverse of that from a function generated with Generate.
If the supplied io.Writer is not valid, or if more than one string maps to the same value, an error is returned.
This function accepts flags (in order to match Generate's function signature), but they are currently ignored.
func GenerateTest ¶
GenerateTest outputs a simple unit test which exercises the generated code.
An error is returned if the supplied io.Writer is not valid. As with Generate and GenerateReverse, the caller is expected to write the method signature (with a *testing.T argument named t) before calling this function.
fn and reverseFn should be the fmt.Printf-style format strings accepting a single argument, which will be replaced with the test input for the generated function. This is typically something like "Function(%q)" for the matcher and "%s.String()" for the reverse matcher. Passing "" causes the respective function to not be tested.
Flags should match what was passed to Generate, but are currently ignored. Future versions of this routine may output more sophisticated tests which take flags into account.
func Range ¶
Range accepts zero or more pairs of runes, and returns a slice covering all runes between the even and odd arguments, inclusive. It can be used with flags which take a list of runes as arguments, such as Equivalent, StopUpon, Ignore, or IgnoreExcept.
For example:
fastmatch.Generate(w, cases, "nil", fastmatch.IgnoreExcept(Range('0', '9', 'a', 'z', 'A', 'Z')...))
If passed an odd number of arguments, this function will panic.
See also the predefined ranges: Numbers, Lowercase, Uppercase, Letters, and Alphanumeric.
Types ¶
type ErrAmbiguous ¶
type ErrAmbiguous struct {
// contains filtered or unexported fields
}
ErrAmbiguous is returned when Generate or GenerateReverse is passed ambiguous matches: cases where it's possible for the same input string to match different return values.
func (*ErrAmbiguous) Error ¶
func (e *ErrAmbiguous) Error() string
type ErrBadFlags ¶
type ErrBadFlags struct {
// contains filtered or unexported fields
}
ErrBadFlags is returned when nonsensical flags are passed to Generate.
func (*ErrBadFlags) Error ¶
func (e *ErrBadFlags) Error() string
type Flag ¶
type Flag struct {
// contains filtered or unexported fields
}
Flag can be passed to Generate and GenerateReverse to modify the functions' behavior. Users of this package should not instantiate their own Flags. Rather, they should use one of HasPrefix, HasSuffix, Insensitive, Normalize, or the return value from Equivalent(), StopUpon(), Ignore(), or IgnoreExcept(). Unknown Flags are silently discarded.
func Equivalent ¶
Equivalent is a flag, which can be passed to Generate, to specify runes that should be treated identically when matching.
func Ignore ¶
Ignore is a flag, which can be passed to Generate, to specify runes (including equivalents) which should be ignored for matching purposes.
func IgnoreExcept ¶
IgnoreExcept is a flag, which can be passed to Generate, to specify which runes (including equivalents) should be examined when matching. This is similar to Ignore, except that when this flag is present, any runes not listed will be ignored.
Ignore and IgnoreExcept may not be combined.
func StopUpon ¶
StopUpon is a flag, which can be passed to Generate, to specify a set of runes (including equivalents) which get treated like a string boundary, i.e. cause matching to immediately cease.
In normal usage, this results in matches which must be immediately followed by either end-of-string or the stop character. This is similar to HasPrefix, except that it allows for matches that would be ambiguous if we stopped upon first match. Consider a matcher for URI schemes (RFC 7595), where we want to match either the bare scheme name or the beginning of a URL. We might generate a matcher as follows:
fmt.Fprintln(w, "func matchScheme(input string) Scheme {") fastmatch.Generate(w, map[string]string{ "http": "HTTP", "https": "HTTPS", }, "nil", fastmatch.Insensitive, fastmatch.StopUpon(':'))
With the above, matchScheme("http") and matchScheme("http://example.com") both return HTTP, and matchScheme("https") and matchScheme("HTTPS://example.com") both return HTTPS. matchScheme("https+xml://example.com") would return nil.
When StopUpon is combined with HasSuffix, the stop character is treated as the beginning of the string. (This is obvious if one considers we match from right-to-left when HasSuffix is specified.) An example filename extension matcher could be generated as follows:
fmt.Fprintln(w, "func matchExt(input string) Extension {") fastmatch.Generate(w, map[string]string{ "exe": "EXE", "dll": "DLL", }, "nil", fastmatch.StopUpon('.'), fastmatch.HasSuffix)
matchExt("foo.exe") and matchExt("exe") both return EXE, and matchExt("bar.dll") and matchExt("dll") both return DLL.
Runes from StopUpon may not also appear in Ignore. If IgnoreExcept is specified, runes from StopUpon will be treated as stop runes regardless of whether or not they appear in IgnoreExcept.