rxp

Version: v0.10.1
Published: Jun 18, 2024 License: Apache-2.0 Imports: 7 Imported by: 15

README


rxp

rxp is an experiment in doing regexp-like things, without actually using regexp to do any of the work.

For most use cases, the regexp package is likely the correct choice: it is fairly optimized and uses the familiar regular expression byte/string patterns, which are compiled and then used to match and replace text.

rxp, by contrast, doesn't really have a compilation phase. A Pattern is simply declared, and is really just a slice of Matcher functions; to do the neat things one needs to do with regular expressions, simply use the methods on the Pattern.

Notice

This is the v0.10.x series. It works, but likely not exactly as one would expect. For example, the greediness of things is incorrect; however, there are always ways to write the patterns differently such that the greediness issue is irrelevant.

Please do not blindly use this project without at least writing specific unit tests for all Patterns and methods required.

There are no safeguards against footguns and other such pitfalls.

Installation

> go get github.com/go-corelibs/rxp@latest

Examples

Find all words at the start of any line of input

// regexp version:
m := regexp.
    MustCompile(`(?m)^\s*(\w+)\b`).
    FindAllStringSubmatch(input, -1)

// equivalent rxp version
m := rxp.Pattern{}.
    Caret("m").S("*").W("+", "c").B().
    FindAllStringSubmatch(input, -1)

Perform a series of text transformations

For whatever reason, some text needs to be transformed and these transformations must satisfy four requirements: lowercase everything, consecutive spaces become one space, single quotes must be turned into underscores and all non-alphanumeric-underscore-or-spaces be removed.

These requirements can be explored with the traditional Perl substitution syntax, as in the following table:

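| # | Perl Expression          | Description               |
|---|--------------------------|---------------------------|
| 1 | s/([A-Z]+)/\L${1}\E/mg   | lowercase all letters     |
| 2 | s/\s+/ /mg               | collapse all spaces       |
| 3 | s/[']/_/mg               | single quotes to underscores |
| 4 | s/[^\w\s]+//mg           | delete non-word-or-spaces |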

The result of the above should take: "Isn't  this  neat?" and transform it into: "isn_t this neat".

// using regexp:
output := strings.ToLower(`Isn't  this  neat?`)
output = regexp.MustCompile(`\s+`).ReplaceAllString(output, " ")
output = regexp.MustCompile(`[']`).ReplaceAllString(output, "_")
output = regexp.MustCompile(`[^\w ]`).ReplaceAllString(output, "")
// output == "isn_t this neat"

// using rxp:
output := rxp.Pipeline{}.
	Transform(strings.ToLower).
	Literal(rxp.S("+"), " ").
	Literal(rxp.Text("'"), "_").
	Literal(rxp.Not(rxp.W(), rxp.S(), "c"), "").
	Process(`Isn't  this  neat?`)
// output == "isn_t this neat"
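
As a sanity check of the four requirements, the regexp variant can be wrapped into a small standalone program. This is a sketch using only the standard library; the normalize function and the package-level variable names are hypothetical, and the expressions are compiled once rather than on every call.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compile the expressions once, rather than on every call
var (
	spaces   = regexp.MustCompile(`\s+`)
	quotes   = regexp.MustCompile(`[']`)
	nonWords = regexp.MustCompile(`[^\w ]`)
)

// normalize applies the four transformations, in order.
func normalize(input string) string {
	output := strings.ToLower(input)                // 1: lowercase everything
	output = spaces.ReplaceAllString(output, " ")   // 2: collapse spaces
	output = quotes.ReplaceAllString(output, "_")   // 3: quotes to underscores
	output = nonWords.ReplaceAllString(output, "")  // 4: drop everything else
	return output
}

func main() {
	fmt.Println(normalize(`Isn't  this  neat?`)) // isn_t this neat
}
```

The order matters: collapsing whitespace before the final deletion means the last expression only has to allow a plain space.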

Benchmarks

These benchmarks can be regenerated using make benchmark.

Historical (make benchstats-historical)

Given that performance is basically the entire point of the rxp package, here are some benchmark statistics showing the evolution of the rxp package itself from v0.1.0 to the current v0.10.0. Each of these releases is present in a separate pre-release branch so that curious developers can easily study the progression of this initial development cycle.

goos: linux
goarch: arm64
pkg: github.com/go-corelibs/rxp
                     │     v0.1.0      │                 v0.2.0                  │                 v0.4.0                  │                 v0.8.0                  │                 v0.10.0                 │
                     │     sec/op      │     sec/op       vs base                │     sec/op       vs base                │     sec/op       vs base                │     sec/op       vs base                │
_FindAllString_Rxp      0.004292n ± 0%    0.003496n ± 1%  -18.56% (n=50)            0.002005n ± 0%  -53.30% (n=50)            0.001868n ± 1%  -56.48% (n=50)            0.002162n ± 1%  -49.62% (n=50)
_Pipeline_Combo_Rxp    0.0002862n ± 1%   0.0002866n ± 1%        ~ (p=0.920 n=50)   0.0002945n ± 3%   +2.86% (p=0.037 n=50)   0.0002910n ± 2%   +1.66% (p=0.010 n=50)   0.0002858n ± 1%        ~ (p=0.348 n=50)
_Pipeline_Readme_Rxp    0.047985n ± 0%    0.053185n ± 1%  +10.84% (p=0.000 n=50)    0.010055n ± 0%  -79.05% (n=50)            0.006152n ± 0%  -87.18% (n=50)            0.007117n ± 1%  -85.17% (n=50)
_Replace_ToUpper_Rxp     0.07639n ± 1%     0.08496n ± 1%  +11.22% (p=0.000 n=50)     0.03293n ± 0%  -56.90% (n=50)             0.02835n ± 0%  -62.89% (n=50)             0.02339n ± 1%  -69.38% (n=50)
geomean                 0.008192n         0.008203n        +0.13%                   0.003739n       -54.36%                   0.003120n       -61.91%                   0.003185n       -61.12%

Versus Regexp (make benchstats-regexp)

These benchmarks are loosely comparing regexp with rxp in "as similar as can be done" cases. While rxp seems to outperform regexp, note that a poorly crafted rxp Pattern can easily tank performance, as in the pipeline readme case below.

goos: linux
goarch: arm64
pkg: github.com/go-corelibs/rxp
                 │     regexp      │                   rxp                   │
                 │     sec/op      │     sec/op       vs base                │
_FindAllString      0.005376n ± 0%    0.002162n ± 1%  -59.78% (n=50)
_Pipeline_Combo    0.0028000n ± 1%   0.0002858n ± 1%  -89.79% (n=50)
_Pipeline_Readme    0.005760n ± 1%    0.007117n ± 1%  +23.56% (p=0.000 n=50)
_Replace_ToUpper     0.02497n ± 0%     0.02339n ± 1%   -6.29% (p=0.000 n=50)
geomean             0.006821n         0.003185n       -53.31%

Go-CoreLibs

Go-CoreLibs is a repository of shared code between the Go-Curses and Go-Enjin projects.

License

Copyright 2024 The Go-CoreLibs Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Documentation

Overview

Package rxp is an experiment in doing regexp-like things, without actually using regexp to do the work

Index

Constants

const (
	DefaultMinReps = 1
	DefaultMaxReps = 1
)

Variables

Functions

func ParseFlags

func ParseFlags(flags ...string) (Reps, Flags)

ParseFlags parses regexp-like option strings into a Reps value, the low and high range of repetitions, and a Flags instance

|  Flags  | Description                                                                             |
|---------|-----------------------------------------------------------------------------------------|
|    ^    | Invert the meaning of this match group                                                  |
|    m    | Multiline mode Caret and Dollar match begin/end of line in addition to begin/end text   |
|    s    | DotNL allows Dot to match newlines (\n)                                                 |
|    i    | AnyCase is case-insensitive matching of unicode text                                    |
|    c    | Capture allows this Matcher to be included in Pattern substring results                 |
|    *    | zero or more repetitions, prefer more                                                   |
|    +    | one or more repetitions, prefer more                                                    |
|    ?    | zero or one repetition, prefer one                                                      |
|  {l,h}  | range of repetitions, l minimum and up to h maximum, prefer more                        |
|  {l,}   | range of repetitions, l minimum, prefer more                                            |
|  {l}    | range of repetitions, l minimum, prefer more                                            |
|   *?    | zero or more repetitions, prefer less                                                   |
|   +?    | one or more repetitions, prefer less                                                    |
|   ??    | zero or one repetition, prefer zero                                                     |
|  {l,h}? | range of repetitions, l minimum and up to h maximum, prefer less                        |
|  {l,}?  | range of repetitions, l minimum, prefer less                                            |
|  {l}?   | range of repetitions, l minimum, prefer less                                            |

The flags presented above can be combined into a single string argument, or can be individually given to ParseFlags

Any parsing errors will result in a runtime panic
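
To illustrate how the repetition rows of the table map to a minimum and maximum, here is a rough sketch. parseReps is a hypothetical helper and not part of rxp; it handles one repetition flag at a time, uses -1 to mean "no upper bound", follows the table in treating {l} as "l minimum", and returns an error where ParseFlags is documented to panic.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseReps is a hypothetical helper, not part of rxp. It reads a single
// repetition flag and returns the low and high counts (-1 meaning
// unbounded) plus whether fewer repetitions are preferred.
func parseReps(flag string) (lo, hi int, less bool, err error) {
	// a trailing ? after a repetition means "prefer less"
	if len(flag) > 1 && strings.HasSuffix(flag, "?") {
		less = true
		flag = strings.TrimSuffix(flag, "?")
	}
	switch flag {
	case "*":
		return 0, -1, less, nil
	case "+":
		return 1, -1, less, nil
	case "?":
		return 0, 1, less, nil
	}
	if !strings.HasPrefix(flag, "{") || !strings.HasSuffix(flag, "}") {
		return 0, 0, false, fmt.Errorf("not a repetition flag: %q", flag)
	}
	low, high, ranged := strings.Cut(flag[1:len(flag)-1], ",")
	if lo, err = strconv.Atoi(low); err != nil {
		return 0, 0, false, err
	}
	hi = -1 // {l} and {l,} are both read as "l minimum", as in the table
	if ranged && high != "" {
		if hi, err = strconv.Atoi(high); err != nil {
			return 0, 0, false, err
		}
	}
	return lo, hi, less, nil
}

func main() {
	for _, flag := range []string{"*", "+?", "??", "{2,5}", "{3,}?"} {
		lo, hi, less, err := parseReps(flag)
		fmt.Println(flag, lo, hi, less, err)
	}
}
```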

func RuneIsALNUM

func RuneIsALNUM(r rune) bool

RuneIsALNUM returns true for alphanumeric characters [a-zA-Z0-9]

func RuneIsALPHA

func RuneIsALPHA(r rune) bool

RuneIsALPHA returns true for alphabetic characters [a-zA-Z]

func RuneIsASCII

func RuneIsASCII(r rune) bool

RuneIsASCII returns true for valid ASCII characters [\x00-\x7F]

func RuneIsBLANK

func RuneIsBLANK(r rune) bool

RuneIsBLANK returns true for tab and space characters [\t ]

func RuneIsCNTRL

func RuneIsCNTRL(r rune) bool

RuneIsCNTRL returns true for control characters [\x00-\x1F\x7F]

func RuneIsDIGIT

func RuneIsDIGIT(r rune) bool

RuneIsDIGIT returns true for number digits [0-9]

func RuneIsGRAPH

func RuneIsGRAPH(r rune) bool

RuneIsGRAPH returns true for graphical characters [a-zA-Z0-9!"$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]

Note: upon the first use of RuneIsGRAPH, a lookup map is cached in a global variable and used for detecting the specific runes supported by the regexp [:graph:] class

func RuneIsLOWER

func RuneIsLOWER(r rune) bool

RuneIsLOWER returns true for lowercase alphabetic characters [a-z]

func RuneIsPRINT

func RuneIsPRINT(r rune) bool

RuneIsPRINT returns true for space and RuneIsGRAPH characters [ [:graph:]]

Note: uses RuneIsGRAPH

func RuneIsPUNCT

func RuneIsPUNCT(r rune) bool

RuneIsPUNCT returns true for punctuation characters [!-/:-@[-`{-~]

func RuneIsSPACE

func RuneIsSPACE(r rune) bool

RuneIsSPACE returns true for empty space characters [\t\n\v\f\r ]

func RuneIsSpace

func RuneIsSpace(r rune) bool

RuneIsSpace returns true for space characters [\t\n\f\r ]

func RuneIsUPPER

func RuneIsUPPER(r rune) bool

RuneIsUPPER returns true for uppercase alphabetic characters [A-Z]

func RuneIsWord

func RuneIsWord(r rune) bool

RuneIsWord returns true for word characters [_a-zA-Z0-9]

func RuneIsXDIGIT

func RuneIsXDIGIT(r rune) bool

RuneIsXDIGIT returns true for hexadecimal digits [a-fA-F0-9]

Types

type AsciiNames

type AsciiNames string
const (
	ALNUM  AsciiNames = "alnum"
	ALPHA  AsciiNames = "alpha"
	ASCII  AsciiNames = "ascii"
	BLANK  AsciiNames = "blank"
	CNTRL  AsciiNames = "cntrl"
	DIGIT  AsciiNames = "digit"
	GRAPH  AsciiNames = "graph"
	LOWER  AsciiNames = "lower"
	PRINT  AsciiNames = "print"
	PUNCT  AsciiNames = "punct"
	SPACE  AsciiNames = "space"
	UPPER  AsciiNames = "upper"
	WORD   AsciiNames = "word"
	XDIGIT AsciiNames = "xdigit"
)

type Flags added in v0.2.0

type Flags uint16
const (
	DefaultFlags Flags = 0
	NegatedFlag  Flags = 1 << iota
	CaptureFlag
	MatchedFlag
	MultilineFlag
	DotNewlineFlag
	AnyCaseFlag
	ZeroOrMoreFlag
	ZeroOrOneFlag
	OneOrMoreFlag
	LessFlag
)

func (Flags) AnyCase added in v0.2.0

func (f Flags) AnyCase() bool

func (Flags) Capture added in v0.2.0

func (f Flags) Capture() bool

func (Flags) DotNL added in v0.2.0

func (f Flags) DotNL() bool

func (Flags) Has added in v0.10.0

func (f Flags) Has(flag Flags) bool

func (Flags) Less added in v0.2.0

func (f Flags) Less() bool

func (Flags) Matched added in v0.10.0

func (f Flags) Matched() bool

func (Flags) Merge added in v0.2.0

func (f Flags) Merge(other Flags) Flags

func (Flags) Multiline added in v0.2.0

func (f Flags) Multiline() bool

func (Flags) Negated added in v0.2.0

func (f Flags) Negated() bool

func (Flags) OneOrMore added in v0.4.0

func (f Flags) OneOrMore() bool

func (Flags) Set added in v0.10.0

func (f Flags) Set(flag Flags) Flags

func (Flags) String added in v0.2.0

func (f Flags) String() string

func (Flags) Unset added in v0.10.0

func (f Flags) Unset(flag Flags) Flags

func (Flags) ZeroOrMore added in v0.4.0

func (f Flags) ZeroOrMore() bool

func (Flags) ZeroOrOne added in v0.4.0

func (f Flags) ZeroOrOne() bool
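
The Flags methods are listed without descriptions. Assuming they follow the conventional bitmask idioms, a minimal sketch of Has, Set and Unset might look like the following. The lowercase flags type and its methods are hypothetical stand-ins, not the rxp implementation; the constants mirror only the first few entries of the block above.

```go
package main

import "fmt"

// flags is a hypothetical bitmask type mirroring the shape of rxp.Flags.
type flags uint16

const (
	none    flags = 0
	negated flags = 1 << iota // iota is 1 on this line, so negated == 2
	capture                   // 1 << 2 == 4
	matched                   // 1 << 3 == 8
)

// has reports whether every bit in flag is set in f.
func (f flags) has(flag flags) bool { return f&flag == flag }

// set returns f with the bits of flag turned on.
func (f flags) set(flag flags) flags { return f | flag }

// unset returns f with the bits of flag turned off.
func (f flags) unset(flag flags) flags { return f &^ flag }

func main() {
	f := none.set(capture).set(matched)
	fmt.Println(f.has(capture), f.has(negated)) // true false
	f = f.unset(capture)
	fmt.Println(f.has(capture), f.has(matched)) // false true
}
```

Because these methods use value receivers and return a new value, the result has to be assigned back, as in f = f.unset(capture).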

type InputReader added in v0.10.0

type InputReader struct {
	// contains filtered or unexported fields
}

InputReader is an efficient rune based buffer

func NewInputReader added in v0.10.0

func NewInputReader[V []rune | []byte | string](input V) *InputReader

NewInputReader creates a new InputReader instance for the given input, which may be a []rune, []byte or string

func (*InputReader) Bytes added in v0.10.0

func (rb *InputReader) Bytes(index, count int) (slice []byte)

Bytes returns the range of runes from start (inclusive) to end (exclusive) if the entire range is Ready

func (*InputReader) End added in v0.10.0

func (rb *InputReader) End(index int) (end bool)

End returns true if this index is exactly the input length, denoting the Dollar zero-width position

func (*InputReader) Get added in v0.10.0

func (rb *InputReader) Get(index int) (r rune, size int, ok bool)

Get returns the Ready rune at the given index position

func (*InputReader) Invalid added in v0.10.0

func (rb *InputReader) Invalid(index int) (invalid bool)

Invalid returns true if the given index position is less than zero or greater than or equal to the total length of the InputReader

func (*InputReader) Len added in v0.10.0

func (rb *InputReader) Len() int

Len returns the total number of runes in the InputReader

func (*InputReader) Next added in v0.10.0

func (rb *InputReader) Next(index int) (r rune, size int, ok bool)

Next returns the Ready rune after the given index position, or \0 if not Ready

func (*InputReader) Prev added in v0.10.0

func (rb *InputReader) Prev(index int) (r rune, size int, ok bool)

Prev returns the Ready rune before the given index position, or \0 if not Ready

The Prev is necessary because to find the previous rune to the given index, Prev must incrementally scan backwards up to four bytes, trying to read a rune without error with each iteration
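
The backwards scan described above can be sketched with unicode/utf8 alone. prevRune is a hypothetical helper operating on a plain []byte; it shows the technique under that assumption and is not the InputReader implementation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// prevRune is a hypothetical helper showing the backwards scan: it steps
// back one byte at a time, up to utf8.UTFMax (four) bytes, until it finds
// the start of an encoded rune that ends exactly at index.
func prevRune(input []byte, index int) (r rune, size int, ok bool) {
	if index <= 0 || index > len(input) {
		return 0, 0, false
	}
	for back := 1; back <= utf8.UTFMax && index-back >= 0; back++ {
		if !utf8.RuneStart(input[index-back]) {
			continue // still inside a multi-byte sequence
		}
		r, size = utf8.DecodeRune(input[index-back : index])
		// accept only a valid rune that spans exactly the bytes scanned
		if size == back && (r != utf8.RuneError || size > 1) {
			return r, size, true
		}
		return 0, 0, false
	}
	return 0, 0, false
}

func main() {
	input := []byte("héllo")
	r, size, ok := prevRune(input, 3)     // index 3 is just past the two-byte é
	fmt.Printf("%q %d %v\n", r, size, ok) // 'é' 2 true
}
```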

func (*InputReader) Ready added in v0.10.0

func (rb *InputReader) Ready(index int) (ready bool)

Ready returns true if the given index position is greater than or equal to zero and less than the total length of the InputReader

func (*InputReader) Slice added in v0.10.0

func (rb *InputReader) Slice(index, count int) (slice []rune, size int)

Slice returns the range of runes from start (inclusive) to end (exclusive) if the entire range is Ready

func (*InputReader) String added in v0.10.0

func (rb *InputReader) String(index, count int) string

String returns the string of runes from start (inclusive) to end (exclusive) if the entire range is Ready

func (*InputReader) Valid added in v0.10.0

func (rb *InputReader) Valid(index int) (valid bool)

Valid returns true if the given index position is greater than or equal to zero and less than or equal to the total length of the InputReader
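
The four index checks (Ready, Valid, End and Invalid) differ only at the boundary where the index equals the input length. The following sketch restates the documented conditions as plain functions; the lowercase helpers are hypothetical and take the length as an argument instead of reading it from an InputReader.

```go
package main

import "fmt"

// These hypothetical helpers restate the documented index checks for an
// input of the given length; they are not the InputReader implementation.
func ready(index, length int) bool   { return index >= 0 && index < length }
func valid(index, length int) bool   { return index >= 0 && index <= length }
func end(index, length int) bool     { return index == length }
func invalid(index, length int) bool { return index < 0 || index >= length }

func main() {
	const length = 3 // for example, the input "abc"
	for _, index := range []int{-1, 0, 2, 3, 4} {
		fmt.Printf("index=%2d ready=%-5v valid=%-5v end=%-5v invalid=%v\n",
			index, ready(index, length), valid(index, length),
			end(index, length), invalid(index, length))
	}
}
```

An index equal to the length is Valid and is the End position, but it is not Ready because there is no rune to read there.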

type Matcher

type Matcher func(scope Flags, reps Reps, input *InputReader, index int, sm [][2]int) (scoped Flags, consumed int, proceed bool)

Matcher is a single string matching function

| Argument | Description                        |
|----------|------------------------------------|
|  scope   | current Flags for this iteration   |
|  reps    | min and max repetition settings    |
|  input   | input rune slice (do not modify!)  |
|  index   | current input rune index to match  |

| Return   | Description                        |
|----------|------------------------------------|
| scoped   | possibly modified sub-match scope  |
| consumed | number of runes matched from index |
| proceed  | success, keep matching for more    |

func A

func A(flags ...string) Matcher

A creates a Matcher equivalent to the regexp [\A]

func Alnum

func Alnum(flags ...string) Matcher

Alnum creates a Matcher equivalent to [:alnum:]

func Alpha

func Alpha(flags ...string) Matcher

Alpha creates a Matcher equivalent to [:alpha:]

func Ascii

func Ascii(flags ...string) Matcher

Ascii creates a Matcher equivalent to [:ascii:]

func B

func B(flags ...string) Matcher

B creates a Matcher equivalent to the regexp [\b]

func BackRef added in v0.8.0

func BackRef(gid int, flags ...string) Matcher

BackRef is a Matcher equivalent to Perl backreferences where the gid argument is the match group to use

BackRef will panic if the gid argument is less than one

func Blank

func Blank(flags ...string) Matcher

Blank creates a Matcher equivalent to [:blank:]

func Caret

func Caret(flags ...string) Matcher

Caret creates a Matcher equivalent to the regexp caret [^]

func Cntrl

func Cntrl(flags ...string) Matcher

Cntrl creates a Matcher equivalent to [:cntrl:]

func D

func D(flags ...string) Matcher

D creates a Matcher equivalent to the regexp \d

func Digit

func Digit(flags ...string) Matcher

Digit creates a Matcher equivalent to [:digit:]

func Dollar

func Dollar(flags ...string) Matcher

Dollar creates a Matcher equivalent to the regexp [$]

func Dot

func Dot(flags ...string) Matcher

Dot creates a Matcher equivalent to the regexp dot (.)

func Graph

func Graph(flags ...string) Matcher

Graph creates a Matcher equivalent to [:graph:]

func Group

func Group(options ...interface{}) Matcher

Group processes the list of Matcher instances, in the order they were given, and stops at the first one that does not match, discarding any consumed runes. If all Matcher calls succeed, all consumed runes are accepted together as this group (sub-sub-matches are not a thing in rxp)

func IsAtLeastSixDigits added in v0.10.0

func IsAtLeastSixDigits(flags ...string) Matcher

IsAtLeastSixDigits creates a Matcher equivalent to:

(?:\A[0-9]{6,}\z)

func IsFieldKey added in v0.10.0

func IsFieldKey(flags ...string) Matcher

IsFieldKey creates a Matcher equivalent to:

(?:\b[a-zA-Z][-_a-zA-Z0-9]+?[a-zA-Z0-9]\b)

IsFieldKey is intended to validate CSS and HTML attribute key names such as "data-thing" or "some_value"

func IsFieldWord added in v0.10.0

func IsFieldWord(flags ...string) Matcher

IsFieldWord creates a Matcher equivalent to:

(?:\b[a-zA-Z0-9]+?[-_a-zA-Z0-9']*[a-zA-Z0-9]+\b|\b[a-zA-Z0-9]+\b)

func IsHash10 added in v0.10.0

func IsHash10(flags ...string) Matcher

IsHash10 creates a Matcher equivalent to:

(?:[a-fA-F0-9]{10})

func IsKeyword added in v0.10.0

func IsKeyword(flags ...string) Matcher

IsKeyword is intended for Go-Enjin parsing of simple search keywords from user input and creates a Matcher equivalent to:

(?:\b[-+]?[a-zA-Z][-_a-zA-Z0-9']+?[a-zA-Z0-9]\b)

func IsUUID added in v0.10.0

func IsUUID(flags ...string) Matcher

IsUUID creates a Matcher equivalent to:

(?:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})
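
When writing unit tests for IsUUID, it may help to have the quoted expression available as a reference. The sketch below compiles it with the standard regexp package; uuidPattern is a hypothetical variable name, and the sample UUID is an arbitrary value.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidPattern is the regexp quoted above, compiled with the standard library.
var uuidPattern = regexp.MustCompile(
	`(?:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})`)

func main() {
	fmt.Println(uuidPattern.MatchString("123e4567-e89b-12d3-a456-426614174000")) // true
	fmt.Println(uuidPattern.MatchString("123e4567-e89b-12d3-a456"))              // false
	// the expression is unanchored, so a UUID embedded in other text matches
	fmt.Println(uuidPattern.FindString("id=123e4567-e89b-12d3-a456-426614174000;"))
}
```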

func IsUnicodeRange

func IsUnicodeRange(table *unicode.RangeTable, flags ...string) Matcher

IsUnicodeRange creates a Matcher equivalent to the regexp \pN where N is a unicode character class, passed to IsUnicodeRange as a unicode.RangeTable instance

For example, creating a Matcher for a single braille character:

IsUnicodeRange(unicode.Braille)
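
For comparison, the same membership test can be written with the standard library alone, either through unicode.Is or through the regexp \p{Braille} class. isBraille is a hypothetical helper name used only in this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// isBraille reports whether r belongs to the Braille range table.
func isBraille(r rune) bool {
	return unicode.Is(unicode.Braille, r)
}

// brailleClass is the regexp spelling of the same character class.
var brailleClass = regexp.MustCompile(`\p{Braille}`)

func main() {
	r := '\u2813' // BRAILLE PATTERN DOTS-125
	fmt.Println(isBraille(r))                       // true
	fmt.Println(brailleClass.MatchString(string(r))) // true
	fmt.Println(isBraille('h'))                     // false
}
```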

func Lower

func Lower(flags ...string) Matcher

Lower creates a Matcher equivalent to [:lower:]

func MakeMatcher added in v0.4.0

func MakeMatcher(match Matcher, flags ...string) Matcher

MakeMatcher creates a rxp standard Matcher implementation wrapped around a given Matcher

func NamedClass added in v0.8.0

func NamedClass(name AsciiNames, flags ...string) Matcher

NamedClass creates a Matcher equivalent to the regexp [:AsciiNames:], see the AsciiNames constants for the list of supported ASCII class names

NamedClass will panic if given an invalid class name

func Not

func Not(options ...interface{}) Matcher

Not processes all the matchers given, in the order they were given, stopping at the first one that succeeds and inverts the proceed return value

Not is equivalent to a negated character class in traditional regular expressions, for example:

[^xyza-f]

could be implemented as any of the following:

// slower due to four different matchers being present
Not(Text("x"),Text("y"),Text("z"),R("a-f"))
// better but still has two matchers
Not(R("xyza-f"))
// no significant difference from the previous
Or(R("xyza-f"), "^") //< negation (^) flag
// simplified to just one matcher present
R("xyza-f", "^") //< negation (^) flag

Here's the interesting bit about rxp though: if speed is really the goal, then the following would capture single characters matching [^xyza-f] with significantly better performance than MakeMatcher based matchers (use Pattern.Add to include the custom Matcher)

func(scope Flags, reps Reps, input *InputReader, index int, sm [][2]int) (scoped Flags, consumed int, proceed bool) {
    scoped = scope
    if r, size, ok := input.Get(index); ok {
        // test for [xyza-f]
        proceed = (r >= 'x' && r <= 'z') || (r >= 'a' && r <= 'f')
        // and invert the result
        proceed = !proceed
        if proceed { // true means the negation is a match
            // MatchedFlag is required, CaptureFlag optionally if needed
            scoped |= MatchedFlag | CaptureFlag
            // consume this rune's size if a capture group is needed
            // using size instead of just 1 will allow support for
            // accurate []rune, []byte and string processing
            consumed += size
        }
    }
    return
}

func Or

func Or(options ...interface{}) Matcher

Or processes the list of Matcher instances, in the order they were given, and stops at the first one that returns a true proceed value

Or accepts Pattern, Matcher and string types and will panic on all others

func Print

func Print(flags ...string) Matcher

Print creates a Matcher equivalent to [:print:]

func Punct

func Punct(flags ...string) Matcher

Punct creates a Matcher equivalent to [:punct:]

func R added in v0.8.0

func R(characters string, flags ...string) Matcher

R creates a Matcher equivalent to regexp character class ranges such as: [xyza-f] where x, y and z are individual runes to accept and a-f is the inclusive range of letters from lowercase a to lowercase f to accept

Note: do not include the [] brackets unless the intent is to actually accept those characters

func S

func S(flags ...string) Matcher

S creates a Matcher equivalent to the regexp \s

func Space

func Space(flags ...string) Matcher

Space creates a Matcher equivalent to [:space:]

func Text

func Text(text string, flags ...string) Matcher

Text creates a Matcher for the plain text given

func Upper

func Upper(flags ...string) Matcher

Upper creates a Matcher equivalent to [:upper:]

func W

func W(flags ...string) Matcher

W creates a Matcher equivalent to the regexp \w

func Word

func Word(flags ...string) Matcher

Word creates a Matcher equivalent to [:word:]

func WrapMatcher added in v0.4.0

func WrapMatcher(matcher RuneMatcher, flags ...string) Matcher

WrapMatcher creates a Matcher using MakeMatcher and wrapping a RuneMatcher

func Xdigit

func Xdigit(flags ...string) Matcher

Xdigit creates a Matcher equivalent to [:xdigit:]

func Z

func Z(flags ...string) Matcher

Z is a Matcher equivalent to the regexp [\z]

type Pattern

type Pattern []Matcher

Pattern is a list of Matcher functions, all of which must match, in the order present, in order to consider the Pattern to match

func ParseOptions

func ParseOptions(options ...interface{}) (pattern Pattern, flags []string, argv []interface{})

ParseOptions accepts Pattern, Matcher and string options and recasts them into their specific types

ParseOptions will panic with any type other than Pattern, Matcher or string

func (Pattern) A

func (p Pattern) A(flags ...string) Pattern

func (Pattern) Add

func (p Pattern) Add(matcher Matcher) Pattern

func (Pattern) Alnum

func (p Pattern) Alnum(flags ...string) Pattern

func (Pattern) Alpha

func (p Pattern) Alpha(flags ...string) Pattern

func (Pattern) Ascii

func (p Pattern) Ascii(flags ...string) Pattern

func (Pattern) B

func (p Pattern) B(flags ...string) Pattern

func (Pattern) BackRef added in v0.8.0

func (p Pattern) BackRef(idx int, flags ...string) Pattern

func (Pattern) Blank

func (p Pattern) Blank(flags ...string) Pattern

func (Pattern) Caret

func (p Pattern) Caret(flags ...string) Pattern

func (Pattern) Cntrl

func (p Pattern) Cntrl(flags ...string) Pattern

func (Pattern) D

func (p Pattern) D(flags ...string) Pattern

func (Pattern) Digit

func (p Pattern) Digit(flags ...string) Pattern

func (Pattern) Dollar

func (p Pattern) Dollar(flags ...string) Pattern

func (Pattern) Dot

func (p Pattern) Dot(flags ...string) Pattern

func (Pattern) FindAllBytes added in v0.10.0

func (p Pattern) FindAllBytes(input []byte, count int) (found [][]byte)

func (Pattern) FindAllBytesIndex added in v0.10.0

func (p Pattern) FindAllBytesIndex(input []byte, count int) (found [][2]int)

func (Pattern) FindAllBytesSubmatch added in v0.10.0

func (p Pattern) FindAllBytesSubmatch(input []byte, count int) (found [][][]byte)

func (Pattern) FindAllBytesSubmatchIndex added in v0.10.0

func (p Pattern) FindAllBytesSubmatchIndex(input []byte, count int) (found [][][2]int)

func (Pattern) FindAllRunes added in v0.10.0

func (p Pattern) FindAllRunes(input []rune, count int) (found [][]rune)

FindAllRunes returns a slice of rune slices containing all of the Pattern matches present in the input given, in the order the matches are found

func (Pattern) FindAllRunesIndex added in v0.10.0

func (p Pattern) FindAllRunesIndex(input []rune, count int) (found [][2]int)

FindAllRunesIndex returns a slice of starting and ending indices denoting each of the Pattern matches present in the input given

func (Pattern) FindAllRunesSubmatch added in v0.10.0

func (p Pattern) FindAllRunesSubmatch(input []rune, count int) (found [][][]rune)

FindAllRunesSubmatch returns a slice of all Pattern matches (and any sub-matches) present in the input given

func (Pattern) FindAllRunesSubmatchIndex added in v0.10.0

func (p Pattern) FindAllRunesSubmatchIndex(input []rune, count int) (found [][][2]int)

FindAllRunesSubmatchIndex returns a slice of starting and ending points for all Pattern matches (and any sub-matches) present in the input given

func (Pattern) FindAllString

func (p Pattern) FindAllString(input string, count int) (found []string)

FindAllString returns a slice of strings containing all of the Pattern matches present in the input given, in the order the matches are found

func (Pattern) FindAllStringIndex

func (p Pattern) FindAllStringIndex(input string, count int) (found [][2]int)

FindAllStringIndex returns a slice of starting and ending indices denoting each of the Pattern matches present in the input given

func (Pattern) FindAllStringSubmatch

func (p Pattern) FindAllStringSubmatch(input string, count int) (found [][]string)

FindAllStringSubmatch returns a slice of all Pattern matches (and any sub-matches) present in the input given

func (Pattern) FindAllStringSubmatchIndex added in v0.4.0

func (p Pattern) FindAllStringSubmatchIndex(input string, count int) (found [][][2]int)

FindAllStringSubmatchIndex returns a slice of starting and ending points for all Pattern matches (and any sub-matches) present in the input given

func (Pattern) FindBytes added in v0.10.0

func (p Pattern) FindBytes(input []byte) []byte

func (Pattern) FindBytesIndex added in v0.10.0

func (p Pattern) FindBytesIndex(input []byte) (found [2]int)

func (Pattern) FindBytesSubmatch added in v0.10.0

func (p Pattern) FindBytesSubmatch(input []byte) (found [][]byte)

func (Pattern) FindBytesSubmatchIndex added in v0.10.0

func (p Pattern) FindBytesSubmatchIndex(input []byte) (found [][2]int)

func (Pattern) FindRunes added in v0.10.0

func (p Pattern) FindRunes(input []rune) []rune

FindRunes returns the leftmost Pattern match within the input given

func (Pattern) FindRunesIndex added in v0.10.0

func (p Pattern) FindRunesIndex(input []rune) (found [2]int)

FindRunesIndex returns the leftmost Pattern match starting and ending indexes within the input given

func (Pattern) FindRunesSubmatch added in v0.10.0

func (p Pattern) FindRunesSubmatch(input []rune) (found [][]rune)

FindRunesSubmatch returns a slice of rune slices holding the leftmost match of this Pattern and any of its sub-matches. FindRunesSubmatch returns nil if there was no match of this Pattern within the input given

func (Pattern) FindRunesSubmatchIndex added in v0.10.0

func (p Pattern) FindRunesSubmatchIndex(input []rune) (found [][2]int)

FindRunesSubmatchIndex returns a slice of starting and ending indices denoting the leftmost match of this Pattern and any of its sub-matches

func (Pattern) FindString

func (p Pattern) FindString(input string) string

FindString returns the leftmost Pattern match within the input given

func (Pattern) FindStringIndex added in v0.10.0

func (p Pattern) FindStringIndex(input string) (found [2]int)

FindStringIndex returns the leftmost Pattern match starting and ending indexes within the string given

func (Pattern) FindStringSubmatch

func (p Pattern) FindStringSubmatch(input string) (found []string)

FindStringSubmatch returns a slice of strings holding the leftmost match of this Pattern and any of its sub-matches. FindStringSubmatch returns nil if there was no match of this Pattern within the input given

func (Pattern) FindStringSubmatchIndex added in v0.10.0

func (p Pattern) FindStringSubmatchIndex(input string) (found [][2]int)

FindStringSubmatchIndex returns a slice of starting and ending indices denoting the leftmost match of this Pattern and any of its sub-matches

func (Pattern) Graph

func (p Pattern) Graph(flags ...string) Pattern

func (Pattern) Group

func (p Pattern) Group(options ...interface{}) Pattern

func (Pattern) Lower

func (p Pattern) Lower(flags ...string) Pattern

func (Pattern) MatchBytes added in v0.10.0

func (p Pattern) MatchBytes(input []byte) (ok bool)

func (Pattern) MatchRunes added in v0.10.0

func (p Pattern) MatchRunes(input []rune) (ok bool)

MatchRunes returns true if the input contains at least one match of this Pattern

func (Pattern) MatchString

func (p Pattern) MatchString(input string) (ok bool)

MatchString returns true if the input contains at least one match of this Pattern

func (Pattern) NamedClass added in v0.8.0

func (p Pattern) NamedClass(name AsciiNames, flags ...string) Pattern

func (Pattern) Not

func (p Pattern) Not(options ...interface{}) Pattern

func (Pattern) Or

func (p Pattern) Or(options ...interface{}) Pattern

func (Pattern) Print

func (p Pattern) Print(flags ...string) Pattern

func (Pattern) Punct

func (p Pattern) Punct(flags ...string) Pattern

func (Pattern) R added in v0.8.0

func (p Pattern) R(characters string, flags ...string) Pattern

func (Pattern) RangeTable added in v0.8.0

func (p Pattern) RangeTable(table *unicode.RangeTable, flags ...string) Pattern

func (Pattern) ReplaceAllBytes added in v0.10.0

func (p Pattern) ReplaceAllBytes(input []byte, replacements Replace[[]byte]) []byte

func (Pattern) ReplaceAllBytesFunc added in v0.10.0

func (p Pattern) ReplaceAllBytesFunc(input []byte, transform Transform[[]byte]) []byte

func (Pattern) ReplaceAllLiteralBytes added in v0.10.0

func (p Pattern) ReplaceAllLiteralBytes(input []byte, replacement []byte) []byte

func (Pattern) ReplaceAllLiteralRunes added in v0.10.0

func (p Pattern) ReplaceAllLiteralRunes(input []rune, replacement []rune) (replaced []rune)

ReplaceAllLiteralRunes returns a copy of the input []rune with all Pattern matches replaced with the unmodified replacement text

func (Pattern) ReplaceAllLiteralString added in v0.10.0

func (p Pattern) ReplaceAllLiteralString(input string, replacement string) string

ReplaceAllLiteralString returns a copy of the input string with all Pattern matches replaced with the unmodified replacement text

func (Pattern) ReplaceAllRunes added in v0.10.0

func (p Pattern) ReplaceAllRunes(input []rune, replacements Replace[[]rune]) (replaced []rune)

ReplaceAllRunes returns a copy of the input []rune with all Pattern matches replaced with text returned by the given Replace process

func (Pattern) ReplaceAllRunesFunc added in v0.10.0

func (p Pattern) ReplaceAllRunesFunc(input []rune, transform Transform[[]rune]) (replaced []rune)

ReplaceAllRunesFunc returns a copy of the input []rune with all Pattern matches replaced with the text returned by the given Transform function

func (Pattern) ReplaceAllString

func (p Pattern) ReplaceAllString(input string, replacements Replace[string]) string

ReplaceAllString returns a copy of the input string with all Pattern matches replaced with text returned by the given Replace process

func (Pattern) ReplaceAllStringFunc

func (p Pattern) ReplaceAllStringFunc(input string, transform Transform[string]) string

ReplaceAllStringFunc returns a copy of the input string with all Pattern matches replaced with the text returned by the given Transform function

func (Pattern) S

func (p Pattern) S(flags ...string) Pattern

func (Pattern) Space

func (p Pattern) Space(flags ...string) Pattern

func (Pattern) SplitBytes added in v0.10.0

func (p Pattern) SplitBytes(input []byte, count int) (found [][]byte)

func (Pattern) SplitRunes added in v0.10.0

func (p Pattern) SplitRunes(input []rune, count int) (found [][]rune)

func (Pattern) SplitString added in v0.10.0

func (p Pattern) SplitString(input string, count int) (found []string)

SplitString slices the input into substrings separated by the Pattern and returns a slice of the substrings between those Pattern matches

The slice returned by this method consists of all the substrings of input not contained in the slice returned by Pattern.FindAllString

Example:

s := regexp.MustCompile("a*").Split("abaabaccadaaae", 5)
// s: ["", "b", "b", "c", "cadaaae"]

The count determines the number of substrings to return:

| Value Case | Description                                                       |
|------------|-------------------------------------------------------------------|
| count > 0  | at most count substrings; the last will be the un-split remainder |
| count == 0 | the result is nil (zero substrings)                               |
| count < 0  | all substrings                                                    |
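The count rules in the table can be checked against the standard library, whose regexp Split method documents the same three cases. The sketch below uses only the regexp package and assumes that SplitString follows the same semantics; the helper name splitBanana is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitBanana splits "banana" on the literal "a" using the standard
// library's regexp Split, which documents the same count rules.
func splitBanana(count int) []string {
	return regexp.MustCompile(`a`).Split("banana", count)
}

func main() {
	fmt.Printf("%q\n", splitBanana(2))  // ["b" "nana"]: last item is the un-split remainder
	fmt.Printf("%q\n", splitBanana(0))  // []: nil result
	fmt.Printf("%q\n", splitBanana(-1)) // ["b" "n" "n" ""]: all substrings
}
```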

func (Pattern) Text

func (p Pattern) Text(text string, flags ...string) Pattern

func (Pattern) Upper

func (p Pattern) Upper(flags ...string) Pattern

func (Pattern) W

func (p Pattern) W(flags ...string) Pattern

func (Pattern) Word

func (p Pattern) Word(flags ...string) Pattern

func (Pattern) Xdigit

func (p Pattern) Xdigit(flags ...string) Pattern

func (Pattern) Z

func (p Pattern) Z(flags ...string) Pattern

type Pipeline

type Pipeline []Stage

Pipeline is a list of stages for transforming strings in a single procedure

func (Pipeline) Literal added in v0.10.0

func (p Pipeline) Literal(search interface{}, text string) Pipeline

Literal appends a search Pattern and literal string replace operation to the Pipeline

The search argument can be a single Matcher function or a single Pattern

func (Pipeline) Process

func (p Pipeline) Process(input string) (output string)

Process returns the output of a complete Pipeline transformation of the input string; it is not a builder method, because it returns a string instead of an updated Pipeline

func (Pipeline) Replace

func (p Pipeline) Replace(search interface{}, replace Replace[string]) Pipeline

Replace appends a search Pattern and Replace[string] operation to the Pipeline

The search argument can be a single Matcher function or a single Pattern

func (Pipeline) Substitute added in v0.10.0

func (p Pipeline) Substitute(search interface{}, transform Transform[string]) Pipeline

Substitute appends a search Pattern and Transform[string] operation to the Pipeline

The search argument can be a single Matcher function or a single Pattern

func (Pipeline) Transform

func (p Pipeline) Transform(transform Transform[string]) Pipeline

Transform appends a Transform[string] function to the Pipeline
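The idea of a pipeline, where each stage receives the previous stage's output, can be sketched with the standard library alone. This is not the rxp implementation: the stage and pipeline types, the process method and the clean helper are hypothetical stand-ins, and regexp is used in place of rxp Patterns. The four steps are the ones described in the README (lowercase, collapse spaces, single quotes to underscores, drop everything else that is not a word character or space).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stage is a hypothetical stand-in for one step of a pipeline: it takes the
// previous stage's output and returns the input for the next stage.
type stage func(string) string

// pipeline is an ordered list of stages (not the rxp Pipeline type).
type pipeline []stage

// process feeds the input through each stage in order.
func (p pipeline) process(input string) string {
	for _, s := range p {
		input = s(input)
	}
	return input
}

var (
	spaces   = regexp.MustCompile(`\s+`)
	unwanted = regexp.MustCompile(`[^\w\s]+`)
)

// clean applies the four README transformations in sequence.
func clean(input string) string {
	return pipeline{
		strings.ToLower,
		func(s string) string { return spaces.ReplaceAllString(s, " ") },
		func(s string) string { return strings.ReplaceAll(s, "'", "_") },
		func(s string) string { return unwanted.ReplaceAllString(s, "") },
	}.process(input)
}

func main() {
	fmt.Println(clean("Isn't  THIS   neat?!")) // isn_t this neat
}
```

Order matters in such a chain: the quote-to-underscore stage has to run before the stage that strips unwanted characters, otherwise the quotes would already be gone.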

type Replace

type Replace[V []rune | []byte | string] []Replacer[V]

Replace is a Replacer pipeline

func (Replace[V]) ToLower

func (r Replace[V]) ToLower() Replace[V]

ToLower is a convenience method which, based on the Replace type, uses one of the following functions to lower-case the replacement values

| Type   | Function        |
|--------|-----------------|
| []rune | unicode.ToLower |
| []byte | bytes.ToLower   |
| string | strings.ToLower |

func (Replace[V]) ToUpper

func (r Replace[V]) ToUpper() Replace[V]

ToUpper is a convenience method which, based on the Replace type, uses one of the following functions to upper-case the replacement values

| Type   | Function        |
|--------|-----------------|
| []rune | unicode.ToUpper |
| []byte | bytes.ToUpper   |
| string | strings.ToUpper |

func (Replace[V]) WithLiteral added in v0.10.0

func (r Replace[V]) WithLiteral(data V) Replace[V]

WithLiteral replaces matches with the literal value given

func (Replace[V]) WithReplace

func (r Replace[V]) WithReplace(replacer Replacer[V]) Replace[V]

WithReplace replaces matches using a Replacer function

func (Replace[V]) WithTransform

func (r Replace[V]) WithTransform(transform Transform[V]) Replace[V]

WithTransform replaces matches with a Transform function

type Replacer added in v0.10.0

type Replacer[V []rune | []byte | string] func(input *InputReader, captured [][2]int, modified V) (replaced V)

Replacer is a Replace processor function

The captured argument is the result of the Pattern match process and is composed of the entire matched text as the first item in the captured list, and any Pattern capture groups following

The modified string argument is the output of the previous Replacer in the Replace process, or the original matched input text if this is the first Replacer in the process
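The hand-off between Replacers can be illustrated without the library. The sketch below is a simplification under stated assumptions: the local replacer type takes a plain string where the real signature takes an *InputReader, each [2]int is assumed to be a start/end offset pair, and the capture offsets are produced with the standard regexp package rather than an rxp Pattern. The names replacer, applyChain and swapAndShout are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// replacer is a hypothetical, simplified stand-in for the Replacer signature:
// it receives the whole input, the capture offset pairs, and the text
// produced so far for this match.
type replacer func(input string, captured [][2]int, modified string) string

// applyChain runs each replacer in order for one match; the first replacer
// sees the matched text, later ones see the previous replacer's output.
func applyChain(input string, captured [][2]int, chain []replacer) string {
	modified := input[captured[0][0]:captured[0][1]]
	for _, r := range chain {
		modified = r(input, captured, modified)
	}
	return modified
}

// swapAndShout builds the capture pairs for the first key=value match and
// applies a two-step chain to it.
func swapAndShout(input string) string {
	loc := regexp.MustCompile(`(\w+)=(\w+)`).FindStringSubmatchIndex(input)
	if loc == nil {
		return input
	}
	captured := make([][2]int, 0, len(loc)/2)
	for i := 0; i+1 < len(loc); i += 2 {
		captured = append(captured, [2]int{loc[i], loc[i+1]})
	}
	chain := []replacer{
		// step 1: rebuild the match from its capture groups, swapped
		func(in string, c [][2]int, _ string) string {
			return in[c[2][0]:c[2][1]] + "=" + in[c[1][0]:c[1][1]]
		},
		// step 2: works on the output of step 1, not on the original match
		func(_ string, _ [][2]int, modified string) string {
			return strings.ToUpper(modified)
		},
	}
	return input[:captured[0][0]] + applyChain(input, captured, chain) + input[captured[0][1]:]
}

func main() {
	fmt.Println(swapAndShout("set key=value now")) // set VALUE=KEY now
}
```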

type Reps added in v0.2.0

type Reps []int

func (Reps) IsNil added in v0.4.0

func (r Reps) IsNil() bool

IsNil returns true if the Reps int slice has fewer than two elements

func (Reps) Max added in v0.2.0

func (r Reps) Max() int

Max returns the maximum number of repetitions, or the DefaultMaxReps for nil instances

func (Reps) Min added in v0.2.0

func (r Reps) Min() int

Min returns the minimum number of repetitions, or the DefaultMinReps for nil instances

func (Reps) Satisfied added in v0.4.0

func (r Reps) Satisfied(count int) (minHit, maxHit bool)

Satisfied reports whether the given count meets the required repetitions, with minHit for the minimum and maxHit for the maximum

func (Reps) Valid added in v0.4.0

func (r Reps) Valid() bool

Valid returns true if the Reps is not nil and the minimum is not greater than the maximum, unless the maximum is unlimited (zero or negative)
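The IsNil and Valid rules can be written out as a small sketch. The reps type and its methods below are hypothetical local stand-ins that follow the documented wording only; the element order (minimum first, maximum second) is an assumption, and the default values returned by Min and Max for nil instances are not modelled.

```go
package main

import "fmt"

// reps is a hypothetical stand-in for a {minimum, maximum} repetition pair.
type reps []int

// isNil reports whether the slice has fewer than two elements.
func (r reps) isNil() bool { return len(r) < 2 }

// valid follows the documented rule: not nil, and minimum <= maximum unless
// the maximum is unlimited (zero or negative).
func (r reps) valid() bool {
	if r.isNil() {
		return false
	}
	lo, hi := r[0], r[1]
	return hi <= 0 || lo <= hi
}

func main() {
	fmt.Println(reps{1, 3}.valid())  // true
	fmt.Println(reps{2, -1}.valid()) // true: unlimited maximum
	fmt.Println(reps{5, 2}.valid())  // false: minimum exceeds maximum
	fmt.Println(reps{1}.valid())     // false: fewer than two elements
}
```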

type RuneMatcher

type RuneMatcher func(r rune) bool

RuneMatcher is the signature for the basic character matching functions such as RuneIsWord

Implementations are expected to use as few CPU instructions as possible
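A function of this shape can be as small as a few range comparisons. The sketch below declares a local runeMatcher type with the same signature; isASCIIWord and countMatching are hypothetical helpers and are not the library's RuneIsWord.

```go
package main

import "fmt"

// runeMatcher mirrors the documented signature with a local type.
type runeMatcher func(r rune) bool

// isASCIIWord is a hypothetical matcher for ASCII letters, digits and
// underscore, using only cheap range comparisons.
func isASCIIWord(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
		(r >= '0' && r <= '9') || r == '_'
}

// countMatching counts the runes in s accepted by m.
func countMatching(s string, m runeMatcher) (n int) {
	for _, r := range s {
		if m(r) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countMatching("go_1.22 rocks!", isASCIIWord)) // 11
}
```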

type Stage

type Stage struct {
	// Search is a Pattern of text, used with Replace to modify the matching
	// text
	Search    Pattern
	Replace   Replace[string]
	Transform Transform[string]
}

Stage is one phase of a text replacement Pipeline and receives an input string from the previous stage (or the initial input text) and returns the output provided to the next stage (or is finally returned to the caller)

func (Stage) Process

func (s Stage) Process(input string) (output string)

Process is the Pipeline processor method for a Stage

If a Search Pattern is present and at least one Replacer function is present, Process returns the result of calling ReplaceAllString on the Search Pattern

If a Search Pattern is present and no Replacer functions are present, Process returns the result of calling ReplaceAllStringFunc on the Search Pattern

type Transform

type Transform[V []rune | []byte | string] func(input V) (output V)

Transform is the function signature for non-rxp string transformation pipeline stages
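Because the type parameter admits []rune, []byte and string, any plain function of the matching shape can be used as a Transform, including ones from the standard library. A minimal sketch with a local transform type of the same shape (the variable names are hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// transform is a local type with the same shape as the documented signature.
type transform[V []rune | []byte | string] func(input V) (output V)

var (
	// standard library functions already have the required shape
	lowerString transform[string] = strings.ToLower
	upperBytes  transform[[]byte] = bytes.ToUpper

	// reverseRunes returns a reversed copy of its input
	reverseRunes transform[[]rune] = func(in []rune) []rune {
		out := make([]rune, len(in))
		for i, r := range in {
			out[len(in)-1-i] = r
		}
		return out
	}
)

func main() {
	fmt.Println(lowerString("Hello"))                 // hello
	fmt.Println(string(upperBytes([]byte("hello"))))  // HELLO
	fmt.Println(string(reverseRunes([]rune("héllo")))) // olléh
}
```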
