pars

package module
v1.1.6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 13, 2020 License: MIT Imports: 12 Imported by: 4

README

Pars

CircleCI Go Report Card GoDoc

Parser combinator library for Go.

Pars provides parser combinator functionalites for parsing the contents of any object with an io.Reader interface. It is currently focused on usability over performance so if you are looking for a similar, performant library check out goparsify. Any performance patches for the library that do not break the existing interface is welcome.

Documentation

Consult the GoDoc for some detailed documentation of Pars.

Example: Polish Notation Parser

In this section we walk through a simple parser for parsing equation written in Polish notation. First we must write a parser to match numbers. A number can be defined as follows:

  • number
    • int
    • int frac
    • int exp
    • int frac exp
  • int
    • digit
    • non-zero digits
      • digit
      • non-zero digits
      • digit
      • non-zero digits
  • frac
    • .digits
  • exp
    • e int
    • E int
    • e + int
    • E + int
    • e - int
    • E - int
  • digits
    • digit
    • digit digits
  • digit
    • 0
    • non-zero
  • non-zero
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9

All of the parsers mentioned here can be represented as a combined parser of basic single byte matchers. A parser is defined with the function signature func(*pars.State, *pars.Result) error which can interact with the parser's state and result to return an error. The definition above can be tranlated to a Pars parser as follows.

import "github.com/go-pars/pars"

var (
	NonZero = pars.Any('1', '2', '3', '4', '5', '6', '7', '8', '9')
	Digit   = pars.Any('0', NonZero)
	Digits  = pars.Many(Digit)
	Int     = pars.Any(Digit, pars.Seq(pars.Any('+', '-', pars.Epsilon), Digit, Digits))
	Exp     = pars.Seq(pars.Any('e', 'E'), int)
	Frac    = pars.Seq('.', Digits)
	Number  = pars.Seq(Int, pars.Maybe(Frac), pars.Maybe(Exp))
)

Let's break down the code from top to bottom. First, NonZero is defined using a pars.Any combinator. A pars.Any combinator will attempt to find a match within the combinators passed as arguments. The Digit is a pars.Any between a '0' and NonZero so it will match any digit. To form a Digits parser the pars.Many combinator is used, which will attempt to match the given parser as many times as possible. To define Int, the pars.Seq combinator is used to match a sequence of parsers in order. Because pars.Epsilon is an 'empty' matcher which will return no error at the current position and continue on, the first pars.Any in the pars.Seq can be interpreted as a parser that will match either a '+', '-', or nothing. The Exp and Frac parsers also use the pars.Seq combinators to match accordingly. The pars.Maybe combinator is functionally equivalent to pars.Any(parser, pars.Epsilon) which will try to match the parser but will simply ignore it if it does not match.

These parsers work, but there is a catch: these parsers do not have optimal performance. Although these definitions are intuitive, an optimal implementation will run a few tens of nanoseconds faster than its naively defined counterpart. This can add up to great amounts when parsing large quantities of bytes. Pars is shipped with pars.Int and pars.Number which have equivalent matching power but is faster. Another benefit of the built-in parsers are that they will also convert the matching string into the designated value type.

Now that we have our number parser, let's define an operation parser. A single operation consists of three elements: a single operator and two numbers. An operator may be any of '+', '-', '*', or '/'.

var Operator = pars.Byte('+', '-', '*', '/')

var Operation = pars.Seq(
	Operator,
	pars.Spaces,
	pars.Number,
	pars.Spaces,
	pars.Number,
)

The pars.Byte parser will match any one of the bytes in the arguments and return the matching byte as a token. This Operation parser will match our desired pattern but this will only be able to parse a single operation. In reality, we would want to parse a nested set of operations. To do this we need to introduce an expression parser that will match either an operation or a number. The operation parser will then need to be modified to parse an operator followed by two expressions. The expression parser will need to be defined prior to the other parsers and implemented later as it is a nested parser.

var Expression pars.Parser

var Operator = pars.Byte('+', '-', '*', '/')

var Operation = pars.Seq(
	Operator,
	pars.Spaces,
	&Expression,
	pars.Spaces,
	&Expression,
)

func init() {
	Expression = pars.Any(Operation, pars.Number)
}

By passing the reference of the Expression parser to pars.Seq, the parser is wrapped so the parser logic can be defined later. In the init() function we define Expression to be either an Operation or a pars.Number. Now the Expression parser can match any arbitrary nested equation written in Polish notation. So how do we actually perform the calculation?

First we need to understand what these parsers yield. A parser has the type of func(*pars.State, *pars.Result) error and the pars.Result struct will have either the Token, Value or Children field set as a result of executing the parser. The pars.Byte parser will set the Token field to the matching byte, the pars.Seq parser will set the Children field to a pars.Results list where the elements correspond to each of the parsers that it matched, and the pars.Numberparser will set the Value field to the parsed number value.

We need to map the result of Operation to a number by actually calculating the the value that the Operation evaluates to. The result consists of five elements set in the Children field where the second and fourth elements are the spaces in between the Operator and Expressions. The first element is the matching operator, and the remaining elements are the result of matching an Expression.

Because an Expression is recursive, we must be careful about handling its result. Here, an Expression is either an Operation or a pars.Number. The pars.Number parser yields a float64 value, and we are just defining the Operation parser to also yield a float64 value, so we can safely deduce that an Expression will yield a float64 value. With this in mind, we can define an evaluate function which takes the Operator and two Expression results to compute the result of the Operation.

func evaluate(result *pars.Result) error {
	op := result.Children[0].Token[0]
	a := result.Children[2].Value.(float64)
	b := result.Children[4].Value.(float64)
	switch op {
	case '+':
		result.SetValue(a + b)
	case '-':
		result.SetValue(a - b)
	case '*':
		result.SetValue(a * b)
	case '/':
		result.SetValue(a / b)
	default:
		return errors.New("operator matched a wrong byte")
	}
	return nil
}

Now that we have a function to map the result of an Expression to a value, we can associate this function to the Expression using the Map method.

func evaluate(result *pars.Result) error {
	op := result.Children[0].Token[0]
	a := result.Children[2].Value.(float64)
	b := result.Children[4].Value.(float64)
	switch op {
	case '+':
		result.SetValue(a + b)
	case '-':
		result.SetValue(a - b)
	case '*':
		result.SetValue(a * b)
	case '/':
		result.SetValue(a / b)
	default:
		return errors.New("operator matched a wrong byte")
	}
	return nil
}

// Expression is a placeholder.
var Expression pars.Parser

// Operator will match one of the four basic operators.
var Operator = pars.Byte('+', '-', '*', '/')

// Operation will match an operation.
var Operation = pars.Seq(
	Operator,
	pars.Spaces,
	&Expression,
	pars.Spaces,
	&Expression,
).Map(evaluate)

func init() {
	Expression = pars.Any(Operation, pars.Number.Map(pars.ParseFloat(64)))
}

You can run this parser as shown in the test code by using pars.Apply.

func TestPolish(t *testing.T) {
	t.Run("matches number", func(t *testing.T) {
		s := pars.FromString("42")
		result, err := Expression.Parse(s)
		require.NoError(t, err)
		require.Equal(t, 42.0, result)
	})

	t.Run("matches flat operation", func(t *testing.T) {
		s := pars.FromString("+ 2 2")
		result, err := Expression.Parse(s)
		require.NoError(t, err)
		require.Equal(t, 4.0, result)
	})

	t.Run("matches nested operation", func(t *testing.T) {
		s := pars.FromString("* - 5 6 7")
		result, err := Expression.Parse(s)
		require.NoError(t, err)
		require.Equal(t, -7.0, result)
	})

	t.Run("matches nested operation", func(t *testing.T) {
		s := pars.FromString("- 5 * 6 7")
		result, err := Expression.Parse(s)
		require.NoError(t, err)
		require.Equal(t, -37.0, result)
	})
}

By applying these concepts you can now create more complicated parsers like the JSON parser included in the examples directory.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	Null    = Byte(0)
	Graphic = ByteRange(32, 126)
	Control = Any(ByteRange(0, 31), Byte(127))
	Space   = Byte(ascii.Space...)
	Upper   = ByteRange('A', 'Z')
	Lower   = ByteRange('a', 'z')
	Letter  = Any(Upper, Lower)
	Digit   = ByteRange('0', '9')
	Latin   = Any(Letter, Digit)
)

Parsers for matching ASCII character patterns.

View Source
var Void = &Result{}

Void is a global Result for storing unused parsing results.

Functions

func Cat

func Cat(result *Result) error

Cat will concatenate the Token fields from all of the Children.

func Cut

func Cut(state *State, result *Result) error

Cut will disable backtracking beyond the cut position.

func EOL added in v1.0.1

func EOL(state *State, result *Result) error

EOL matches the end of a line. The end of a line is one of the following:

a carriage return (CR, '\r')
a line feed (LF, '\n')
a carriage return + line feed (CRLF, "\r\n")
end of state

func End added in v1.0.1

func End(state *State, result *Result) error

End will match if the state buffer has been exhausted and no more bytes can be read from the io.Reader object.

func Epsilon

func Epsilon(state *State, result *Result) error

Epsilon will do nothing.

func Head(state *State, result *Result) error

Head will match if the state is at the beginning of the buffer.

func Int added in v1.0.1

func Int(state *State, result *Result) error

Int will match an integer string and return its numerical representation. An integer is defined to be as follows in EBNF:

zero     = `0`
non-zero = `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9`
digit    = zero | non-zero
integer  = [ ( `-` | `+` ) ], zero | digit, { digit }

This implementation is optimized so the parser will first scan as far as possible to match a valid integer and then retrieve a block of bytes and convert it to an `int` via strconv.Atoi.

func Line

func Line(state *State, result *Result) error

Line matches up to a newline byte or the end of state.

func NewError added in v1.0.1

func NewError(what string, pos Position) error

NewError creates a new Error.

func NewNestedError added in v1.0.1

func NewNestedError(name string, err error) error

NewNestedError creates a new NestedError.

func Next added in v1.0.1

func Next(state *State) (byte, error)

Next attempts to retrieve the next byte in the given state.

func Number

func Number(state *State, result *Result) error

Number will match a floating point number. A number is defined to be as follows in EBNF:

zero     = `0`
non-zero = `1` | `2` | `3` | `4` | `5` | `6` | `7` | `8` | `9`
digit    = zero | non-zero
integer  = [ ( `-` | `+` ) ], zero | digit, { digit }
fraction = `.`, digit, { digit }
exponent = ( `e` | `E` ), [ ( `-` | `+` ) ], integer
number   = [ ( `-` | `+` ) ],  integer, [ fraction ], [ exponent ]

This implementation is optimized so the parser will first scan as far as possible to match a valid number and then retrieve a block of bytes and convert it to a `float64` via strconv.ParseFloat.

func Skip added in v1.0.1

func Skip(state *State, n int) error

Skip the given state for the given number of bytes.

func Spaces added in v1.0.1

func Spaces(state *State, result *Result) error

Spaces will match as many spaces as possible.

func ToString added in v1.0.1

func ToString(result *Result) error

ToString will convert the Token field to a string Value.

func Trail added in v1.0.1

func Trail(state *State) ([]byte, error)

Trail will return the extent of the state buffer spanning from the most recently pushed state position to the current state position.

Types

type BoundError added in v1.0.1

type BoundError struct {
	// contains filtered or unexported fields
}

BoundError is an error bound to a parser.

func (BoundError) Error added in v1.0.1

func (e BoundError) Error() string

Error satisfies the error interface.

func (BoundError) Unwrap added in v1.0.1

func (e BoundError) Unwrap() error

Unwrap returns the internal error value.

type Error added in v1.0.1

type Error struct {
	// contains filtered or unexported fields
}

Error is a generic parser error.

func (Error) Error added in v1.0.1

func (e Error) Error() string

Error satisfies the error interface.

type Map

type Map func(result *Result) error

Map is the function signature for a result mapper.

func Child

func Child(i int) Map

Child will map to the i'th child of the result.

func Children

func Children(indices ...int) Map

Children will keep the children associated to the given indices.

func Join added in v1.0.1

func Join(sep []byte) Map

Join will join the tokens with the given separator.

func Time

func Time(layout string) Map

Time will attempt to parse the result token as a time.Time object.

type NestedError added in v1.0.1

type NestedError struct {
	// contains filtered or unexported fields
}

NestedError is a nested error type.

func (NestedError) Error added in v1.0.1

func (e NestedError) Error() string

Error satisfies the error interface.

func (NestedError) Unwrap added in v1.0.1

func (e NestedError) Unwrap() error

Unwrap returns the internal error value.

type Parser

type Parser func(state *State, result *Result) error

Parser is the function signature of a parser.

func Any

func Any(qs ...interface{}) Parser

Any creates a Parser which will attempt to match any of the given Parsers. If all of the given Parsers fail to match, the state will attempt to backtrack to the position before any of the given Parsers were applied. An error from the parser will be returned immediately if the state cannot be backtracked. Otherwise, the error from the last Parser will be returned.

func AsParser

func AsParser(q interface{}) Parser

AsParser attempts to create a Parser for a given argument.

func AsParsers

func AsParsers(qs ...interface{}) []Parser

AsParsers applies the AsParser function to each argument.

func Between added in v1.0.1

func Between(l, r byte) Parser

Between creates a Parser which will attempt to match a sequence of bytes between the given bytes. If a backslash appears in the middle of the string, the byte immediately following will be skipped.

func Byte

func Byte(p ...byte) Parser

Byte creates a Parser which will attempt to match the next single byte. If no bytes are given, it will match any byte. Otherwise, the given bytes will be tested for a match.

func ByteRange

func ByteRange(begin, end byte) Parser

ByteRange creates a Parser which will attempt to match a byte between the given range inclusively.

func Bytes

func Bytes(p []byte) Parser

Bytes creates a Parser which will attempt to match the given sequence of bytes.

func Count

func Count(q interface{}, n int) Parser

Count creates a Parser which will attempt to match the given Parser exactly N-times, where N is the given number.

func Delim

func Delim(q, s interface{}) Parser

Delim creates a Parser which will attempt to match the first Parser multiple times like Many, but with the second Parser in between.

func Dry

func Dry(q interface{}) Parser

Dry creates a Parser which will attempt to match the given Parser, but will restore the pre-matching state even if the given Parser matches.

func Exact added in v1.0.1

func Exact(q interface{}) Parser

Exact creates a Parser which will only match if the state is both at the beginning of the state and the given Parser exhausts the entire state after matching. This is equivalent to the following parser:

`pars.Seq(pars.Head, parser, pars.End).Map(Child(1))`

func Filter added in v1.0.1

func Filter(filter ascii.Filter) Parser

Filter creates a Parser which will attempt to match the given ascii.Filter.

func Many

func Many(q interface{}) Parser

Many creates a Parser which will attempt to match the given Parser as many times as possible.

func Maybe added in v1.0.1

func Maybe(q interface{}) Parser

Maybe creates a Parser which will attempt to match the given Parser but will not return an error upon a mismatch unless the state cannot be backtracked.

func Quoted

func Quoted(c byte) Parser

Quoted creates a Parser which will attempt to match a sequence of bytes flanked by the given byte. This Parser is equivalent to the following:

Between(c, c)

func Rune

func Rune(rs ...rune) Parser

Rune creates a Parser which will attempt to match the next single rune. If no runes are given, it will match any rune. Otherwise, the given runes will be tested for a match.

func RuneRange

func RuneRange(begin, end rune) Parser

RuneRange creates a Parser which will attempt to match any rune within the given range inclusively.

func Runes

func Runes(rs []rune) Parser

Runes creates a parser which will attempt to match the given sequence of runes.

func Seq

func Seq(qs ...interface{}) Parser

Seq creates a Parser which will attempt to match all of the given Parsers in the given order. If any of the given Parsers fail to match, the state will attempt to backtrack to the position before any of the given Parsers were applied.

func String

func String(s string) Parser

String creates a Parser which will attempt to match the given string.

func Until

func Until(q interface{}) Parser

Until creates a Parser which will advance the state until the given Parser matches, and return extent of the buffer up to the matching state position.

func Word

func Word(filter ascii.Filter) Parser

Word creates a Parser which will attempt to match a group of bytes which satisfy the given filter.

func (Parser) Bind

func (p Parser) Bind(v interface{}) Parser

Bind will bind the given value as the parser result value.

func (Parser) Child added in v1.0.1

func (p Parser) Child(i int) Parser

Child will map to the i'th child of the result.

func (Parser) Children added in v1.0.1

func (p Parser) Children(indices ...int) Parser

Children will keep the children associated to the given indices.

func (Parser) Error added in v1.0.1

func (p Parser) Error(alt error) Parser

Error will modify the Parser to return the given error if the Parser returns an error.

func (Parser) Map

func (p Parser) Map(f Map) Parser

Map applies the callback if the parser matches.

func (Parser) Parse added in v1.0.1

func (p Parser) Parse(s *State) (Result, error)

Parse the given state using the parser and return the Result.

func (Parser) ToString added in v1.0.1

func (p Parser) ToString() Parser

ToString will convert the Token field to a string Value.

type Position added in v1.0.1

type Position struct {
	Line int
	Byte int
}

Position represents the line and byte numbers.

func (Position) Head added in v1.0.1

func (p Position) Head() bool

Head tests if the position is at the head.

func (Position) Less added in v1.1.0

func (p Position) Less(q Position) bool

Less tests if the position is smaller than the argument.

func (Position) String added in v1.0.1

func (p Position) String() string

String returns a formatted position.

type Reader added in v1.0.1

type Reader struct {
	// contains filtered or unexported fields
}

Reader is a special io.Reader that will skip all whitespaces unless it is a part of a string literal (quoted by a ", ', or `).

func NewReader added in v1.0.1

func NewReader(r io.Reader) *Reader

NewReader creates a new reader.

func (*Reader) Read added in v1.0.1

func (r *Reader) Read(p []byte) (int, error)

Read satisfies the io.Reader interface.

type Result

type Result struct {
	Token    []byte
	Value    interface{}
	Children []Result
}

Result is the output of a parser. One of three fields will be set:

Token: a byte sequence matching for a primitive parser.
Value: any value, useful for constructing complex objects.
Children: results for individual child parsers.

Use one of the Set* methods to mutually set fields.

func AsResult added in v1.0.1

func AsResult(arg interface{}) *Result

AsResult creates a new result based on the given argument type.

func AsResults added in v1.0.1

func AsResults(args ...interface{}) *Result

AsResults transforms a given list of arguments into a result with Children with each argument as is result.

func NewChildrenResult added in v1.0.1

func NewChildrenResult(c []Result) *Result

NewChildrenResult creates a new result with the given children.

func NewTokenResult added in v1.0.1

func NewTokenResult(p []byte) *Result

NewTokenResult creates a new result with the given token.

func NewValueResult added in v1.0.1

func NewValueResult(v interface{}) *Result

NewValueResult creates a new result with the given value.

func (*Result) SetChildren added in v1.0.1

func (r *Result) SetChildren(c []Result)

SetChildren sets the children and clears other fields.

func (*Result) SetToken added in v1.0.1

func (r *Result) SetToken(p []byte)

SetToken sets the token and clears other fields.

func (*Result) SetValue added in v1.0.1

func (r *Result) SetValue(v interface{})

SetValue sets the value and clears other fields.

type State

type State struct {
	// contains filtered or unexported fields
}

State represents a parser state, which is a convenience wrapper for an io.Reader object with buffering and backtracking.

func FromBytes added in v1.0.1

func FromBytes(p []byte) *State

FromBytes creates a new state from the given bytes.

func FromString

func FromString(s string) *State

FromString creates a new state from the given string.

func NewState

func NewState(r io.Reader) *State

NewState creates a new state from the given io.Reader.

func (*State) Advance

func (s *State) Advance()

Advance the state by the amount given in a previous Request call.

func (State) Buffer

func (s State) Buffer() []byte

Buffer returns the range of bytes guaranteed by a Request call.

func (*State) Clear

func (s *State) Clear()

Clear will discard the buffer contents prior to the current state offset and drop all pushed states.

func (*State) Drop added in v1.0.1

func (s *State) Drop()

Drop will discard the most recently pushed state.

func (State) Dump added in v1.0.1

func (s State) Dump() []byte

Dump returns the entire remaining buffer content. Note that the returned byte slice will not always contain the entirety of the bytes that can be read by the io.Reader object.

func (State) Offset added in v1.0.1

func (s State) Offset() int

Offset returns the current state offset.

func (*State) Pop added in v1.0.1

func (s *State) Pop()

Pop will backtrack to the most recently pushed state.

func (State) Position

func (s State) Position() Position

Position returns the current line and byte position of the state.

func (*State) Push added in v1.0.1

func (s *State) Push()

Push the current state position for backtracking.

func (State) Pushed added in v1.0.1

func (s State) Pushed() bool

Pushed tests if the state has been pushed at least once.

func (*State) Read

func (s *State) Read(p []byte) (int, error)

Read satisfies the io.Reader interface.

func (*State) ReadByte added in v1.0.1

func (s *State) ReadByte() (byte, error)

ReadByte satisfies the io.ByteReader interace.

func (*State) Request added in v1.0.1

func (s *State) Request(n int) error

Request checks if the state contains at least the given number of bytes, additionally reading from the io.Reader object as necessary when the internal buffer is exhausted. If the call to Read for the io.Reader object returns an error, Request will return the corresponding error. A subsequent call to Advance will advance the state offset as far as possible.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL