parse

package module
v0.0.0-...-5cc0ed9 Latest Latest
Published: Jan 14, 2024 License: MIT Imports: 5 Imported by: 6

README

Parse

A set of parsing tools for Go inspired by Sprache.

Build up complex parsers from small, simple functions that chomp away at the input.

Input

The input moves along as the parser succeeds.

input := parse.NewInput("ABCD")
item, ok, err := parse.String("A").Parse(input)
// Input is now at index 1.
item, ok, err = parse.String("B").Parse(input)
// Input is now at index 2.
item, ok, err = parse.String("XYZ").Parse(input)
// Input index didn't change and ok=false.
item, ok, err = parse.String("CD").Parse(input)
// Input is now at index 4.
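
The advance-on-success behaviour can be illustrated without the library. The following is a minimal sketch, assuming a hypothetical cursor type that stands in for parse.Input; it is not the package's implementation, only a model of the index movement described above.

```go
package main

import (
	"fmt"
	"strings"
)

// cursor is a hypothetical stand-in for parse.Input: a string plus a read index.
type cursor struct {
	s     string
	index int
}

// matchString consumes want if the unread input starts with it.
// On failure the index is left untouched, so the caller can try something else.
func (c *cursor) matchString(want string) (string, bool) {
	if !strings.HasPrefix(c.s[c.index:], want) {
		return "", false
	}
	c.index += len(want)
	return want, true
}

func main() {
	in := &cursor{s: "ABCD"}
	for _, want := range []string{"A", "B", "XYZ", "CD"} {
		item, ok := in.matchString(want)
		fmt.Printf("%q ok=%v index=%d\n", item, ok, in.index)
	}
}
```

The index only moves when a match succeeds, which is what lets later parsers pick up exactly where the previous one stopped.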

Design

A parser must implement the parse.Parser interface, or be created with the parse.Func helper. These three parsers match the same input.

parse.String("<")
parse.Func(func(in *parse.Input) (item string, ok bool, err error) {
	item, _ = in.Peek(1)
	ok = item == "<"
	return
})
type lessThanParser struct{}

func (ltp lessThanParser) Parse(in *parse.Input) (item string, ok bool, err error) {
	item, _ = in.Peek(1)
	ok = item == "<"
	return
}
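
The relationship between an interface and a function adapter can be sketched in isolation. The example below assumes hypothetical names (input, parser, parserFunc) and follows the same pattern as http.HandlerFunc in the standard library; whether parse.Func is implemented this way is an assumption. Unlike the Peek-based snippets above, this sketch also consumes the matched character.

```go
package main

import "fmt"

// input is a hypothetical stand-in for parse.Input.
type input struct {
	s     string
	index int
}

// parser mirrors the shape of the Parser interface described above.
type parser[T any] interface {
	Parse(in *input) (item T, ok bool, err error)
}

// parserFunc adapts a plain function to the parser interface.
type parserFunc[T any] func(in *input) (item T, ok bool, err error)

// Parse calls the wrapped function.
func (f parserFunc[T]) Parse(in *input) (T, bool, error) { return f(in) }

// matchLessThan consumes a single "<" if it is next in the input.
func matchLessThan(in *input) (item string, ok bool, err error) {
	if in.index < len(in.s) && in.s[in.index] == '<' {
		in.index++
		return "<", true, nil
	}
	return "", false, nil
}

// lessThan is the struct-based equivalent of the function above.
type lessThan struct{}

func (lessThan) Parse(in *input) (string, bool, error) { return matchLessThan(in) }

// run parses s from the start with p and reports the result.
func run(p parser[string], s string) (string, bool) {
	item, ok, _ := p.Parse(&input{s: s})
	return item, ok
}

func main() {
	parsers := []parser[string]{parserFunc[string](matchLessThan), lessThan{}}
	for _, p := range parsers {
		fmt.Println(run(p, "<a>"))
	}
}
```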

Functions

Parser functions match patterns in a given input. They are designed to be composed into more complex parsers.

The examples directory contains several examples of composing the primitive functions.

  • Any
    • Parse any of the provided parse functions, or roll back.
  • AnyRune
    • Parse any rune.
  • AtLeast
    • Parse the provided function at least the number of times specified, or roll back.
  • AtMost
    • Parse the provided function at least once, and at most the number of times specified, or roll back.
  • Letter
    • Parse any letter in the Unicode Letter range or roll back.
  • Many
    • Parse the provided parse function a number of times or roll back.
  • Optional
    • Attempt to parse, but don't roll back if a match isn't found.
  • Or
    • Return the first successful result of the provided parse functions, or roll back.
  • Rune
    • Parse the specified rune (character) or roll back.
  • RuneIn
    • Parse a rune from the input stream if it's in the specified string, or roll back.
  • RuneInRanges
    • Parse a rune from the input stream if it's in the specified Unicode ranges, or roll back.
  • RuneNotIn
    • Parse a rune from the input stream if it's not in the specified string, or roll back.
  • RuneWhere
    • Parse a rune from the input stream if the predicate function passed in succeeds, or roll back.
  • String
    • Parse a string from the input stream if it exactly matches the provided string, or roll back.
  • StringUntil
    • Parse a string from the input stream until the specified until parser is matched.
  • Then
    • Return the results of the first and second parser passed through the combiner function which converts the two results into a single output (a map / reduce operation), or roll back if either doesn't match.
  • Times
    • Parse using the specified function a set number of times or roll back.
  • Until
    • Parse from the input stream until the specified until parser is matched.
  • ZeroToNine
    • Parse a rune from the input stream if it's within the set of 1234567890.
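
Most entries in the list above end with "or roll back". A minimal sketch of what that means for sequencing and choice follows, assuming hypothetical helpers (str, seq, anyOf) and a simplified parser signature; the library's own combinators may differ in detail, for example in what they return on a partial match.

```go
package main

import (
	"fmt"
	"strings"
)

// input is a hypothetical stand-in for parse.Input.
type input struct {
	s     string
	index int
}

// parser is a simplified signature used only in this sketch.
type parser func(in *input) (string, bool)

// str matches a literal and advances past it.
func str(want string) parser {
	return func(in *input) (string, bool) {
		if !strings.HasPrefix(in.s[in.index:], want) {
			return "", false
		}
		in.index += len(want)
		return want, true
	}
}

// seq runs every parser in order. If any step fails, the index is
// restored to where seq started, undoing partial progress.
func seq(ps ...parser) parser {
	return func(in *input) (string, bool) {
		start := in.index
		var sb strings.Builder
		for _, p := range ps {
			item, ok := p(in)
			if !ok {
				in.index = start
				return "", false
			}
			sb.WriteString(item)
		}
		return sb.String(), true
	}
}

// anyOf returns the result of the first alternative that matches.
func anyOf(ps ...parser) parser {
	return func(in *input) (string, bool) {
		for _, p := range ps {
			if item, ok := p(in); ok {
				return item, true
			}
		}
		return "", false
	}
}

func main() {
	keyword := anyOf(seq(str("GO"), str("TO")), seq(str("GO"), str("SUB")))
	for _, s := range []string{"GOTO 10", "GOSUB 20", "GOLF"} {
		in := &input{s: s}
		item, ok := keyword(in)
		fmt.Printf("%q ok=%v index=%d\n", item, ok, in.index)
	}
}
```

For "GOSUB 20" the first alternative consumes "GO" and then fails on "TO"; because seq restores the index, the second alternative starts again from the beginning and succeeds.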

More complex parsers

More complex parsers need to store the start position, and roll back using the input's Seek function if the current parser does not match the input.

func ExampleParser() {
	type GotoStatement struct {
		Line int64
	}
	gotoParser := parse.Func(func(in *parse.Input) (item GotoStatement, ok bool, err error) {
		start := in.Index()

		if _, ok, err = parse.String("GOTO ").Parse(in); err != nil || !ok {
			// Rollback, and return.
			in.Seek(start)
			return
		}

		// Read until the next newline or the EOF.
		until := parse.Any(parse.NewLine, parse.EOF[string]())
		var lineNumber string
		if lineNumber, ok, err = parse.StringUntil(until).Parse(in); err != nil || !ok {
			err = parse.Error("Syntax error: GOTO is missing line number", in.Position())
			return
		}
		// We must have a valid line number now, or there is a syntax error.
		item.Line, err = strconv.ParseInt(lineNumber, 10, 64)
		if err != nil {
			return item, false, parse.Error("Syntax error: GOTO has invalid line number", in.Position())
		}

		// Chomp the newline we read up to.
		until.Parse(in)

		return item, true, nil
	})

	inputs := []string{
		"GOTO 10",
		"GOTO abc",
		"FOR i = 0",
	}
	for _, input := range inputs {
		stmt, ok, err := gotoParser.Parse(parse.NewInput(input))
		fmt.Printf("%+v, ok=%v, err=%v\n", stmt, ok, err)
	}
	// Output:
	// {Line:10}, ok=true, err=<nil>
	// {Line:0}, ok=false, err=Syntax error: GOTO has invalid line number: line 0, col 8
	// {Line:0}, ok=false, err=<nil>
}

Documentation

Variables

var AnyRune = RuneWhere(func(r rune) bool { return true })

AnyRune matches any single rune.

var CR = Rune('\r')

CR is a carriage return.

var CRLF = String("\r\n")

CRLF parses a carriage return followed by a line feed, used by Windows systems as the newline.

var LF = Rune('\n')

LF parses a line feed, used by Unix systems as the newline.

Letter returns a parser which accepts a rune within the Letter Unicode range.

var NewLine = Any(CRLF, LF)

NewLine matches either a Windows or Unix line break character.

var OptionalWhitespace = Func(func(in *Input) (output string, ok bool, err error) {
	output, ok, err = Whitespace.Parse(in)
	if err != nil {
		return
	}
	return output, true, nil
})

OptionalWhitespace parses optional whitespace.

Types

type Input

type Input struct {
	// contains filtered or unexported fields
}

Input is an input used by parsers. It stores the current location and character positions.

func NewInput

func NewInput(s string) *Input

NewInput creates an input from the given string.

func (*Input) Index

func (in *Input) Index() int

Index returns the current character index of the parser input.

func (*Input) Peek

func (in *Input) Peek(n int) (s string, ok bool)

func (*Input) Position

func (in *Input) Position() Position

Position returns the zero-based index, line and column number of the current position within the stream.

func (*Input) PositionAt

func (in *Input) PositionAt(index int) Position

PositionAt returns the zero-based index, line and column number of the given index within the stream.
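
As an illustration of how an index maps to a line and column, here is a minimal sketch that counts the newlines before the index. The position type and positionAt function are hypothetical, and the column is measured in bytes here; how the library computes or caches these values is not stated in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// position mirrors the fields of the Position type in this package.
type position struct {
	Index, Line, Col int
}

// positionAt derives a zero-based line and column for a byte index
// by counting the newlines that precede it.
func positionAt(s string, index int) position {
	if index < 0 {
		index = 0
	}
	if index > len(s) {
		index = len(s)
	}
	before := s[:index]
	line := strings.Count(before, "\n")
	// LastIndex returns -1 when there is no newline, so the column
	// is then simply the index itself.
	col := index - (strings.LastIndex(before, "\n") + 1)
	return position{Index: index, Line: line, Col: col}
}

func main() {
	src := "GOTO 10\nGOTO abc"
	fmt.Printf("%+v\n", positionAt(src, 0))
	fmt.Printf("%+v\n", positionAt(src, 13))
}
```

With the single-line input "GOTO abc" and index 8, this sketch yields line 0 and column 8, which agrees with the error position printed in the GOTO example.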

func (*Input) Seek

func (in *Input) Seek(index int) (ok bool)

Seek to a position in the string.

func (*Input) Take

func (in *Input) Take(n int) (s string, ok bool)

type Match

type Match[T any] struct {
	Value T
	OK    bool
}

type ParseError

type ParseError struct {
	Msg string
	Pos Position
}

func Error

func Error(msg string, pos Position) ParseError

func (ParseError) Error

func (e ParseError) Error() string

type Parser

type Parser[T any] interface {
	Parse(in *Input) (item T, ok bool, err error)
}

Parser is implemented by all parsers.

Example
package main

import (
	"fmt"
	"strconv"

	"github.com/a-h/parse"
)

func main() {
	type GotoStatement struct {
		Line int64
	}
	gotoParser := parse.Func(func(in *parse.Input) (item GotoStatement, ok bool, err error) {
		start := in.Index()

		if _, ok, err = parse.String("GOTO ").Parse(in); err != nil || !ok {
			// Rollback, and return.
			in.Seek(start)
			return
		}

		// Read until the next newline or the EOF.
		until := parse.Any(parse.NewLine, parse.EOF[string]())
		var lineNumber string
		if lineNumber, ok, err = parse.StringUntil(until).Parse(in); err != nil || !ok {
			err = parse.Error("Syntax error: GOTO is missing line number", in.Position())
			return
		}
		// We must have a valid line number now, or there is a syntax error.
		item.Line, err = strconv.ParseInt(lineNumber, 10, 64)
		if err != nil {
			return item, false, parse.Error("Syntax error: GOTO has invalid line number", in.Position())
		}

		// Chomp the newline we read up to.
		_, _, _ = until.Parse(in)

		return item, true, nil
	})

	inputs := []string{
		"GOTO 10",
		"GOTO abc",
		"FOR i = 0",
	}

	for _, input := range inputs {
		stmt, ok, err := gotoParser.Parse(parse.NewInput(input))
		fmt.Printf("%+v, ok=%v, err=%v\n", stmt, ok, err)
	}
}
Output:

{Line:10}, ok=true, err=<nil>
{Line:0}, ok=false, err=Syntax error: GOTO has invalid line number: line 0, col 8
{Line:0}, ok=false, err=<nil>

Whitespace parses whitespace.

var ZeroToNine Parser[string] = RuneIn("0123456789")

ZeroToNine matches a single rune from 0123456789.

func All

func All[T any](parsers ...Parser[T]) Parser[[]T]

All parses all of the parsers in the list in sequence and combines the result.

Example
package main

import (
	"fmt"

	"github.com/a-h/parse"
)

func main() {
	abcParser := parse.All(parse.String("A"), parse.String("B"), parse.String("C"))

	fmt.Println(abcParser.Parse(parse.NewInput("ABC")))
	fmt.Println(abcParser.Parse(parse.NewInput("AB")))
	fmt.Println(abcParser.Parse(parse.NewInput("A")))
}
Output:

[A B C] true <nil>
[A B] false <nil>
[A] false <nil>

func Any

func Any[T any](parsers ...Parser[T]) Parser[T]

Any parses any one of the parsers in the list.

Example
package main

import (
	"fmt"

	"github.com/a-h/parse"
)

func main() {
	abParser := parse.Any(parse.String("A"), parse.String("B"))

	fmt.Println(abParser.Parse(parse.NewInput("A")))
	fmt.Println(abParser.Parse(parse.NewInput("B")))
	fmt.Println(abParser.Parse(parse.NewInput("C")))
}
Output:

A true <nil>
B true <nil>
 false <nil>

func AtLeast

func AtLeast[T any](min int, p Parser[T]) Parser[[]T]

AtLeast matches the given parser at least min times.

func AtMost

func AtMost[T any](max int, p Parser[T]) Parser[[]T]

AtMost matches the given parser at most max times. It behaves like ZeroOrMore with an upper bound of max.

func Convert

func Convert[A, B any](parser Parser[A], converter func(a A) (B, error)) Parser[B]

Convert a parser's output type using the given conversion function.
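
A minimal sketch of the idea, using hypothetical stand-ins (input, parser, digits, convert) rather than the library's types: the wrapped parser runs first, and its result is passed through the conversion function. Treating a conversion error as ok=false with a non-nil err is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// input is a hypothetical stand-in for parse.Input.
type input struct {
	s     string
	index int
}

// parser is a simplified function-based parser used only in this sketch.
type parser[T any] func(in *input) (T, bool, error)

// digits consumes one or more ASCII digits.
func digits(in *input) (string, bool, error) {
	start := in.index
	for in.index < len(in.s) && in.s[in.index] >= '0' && in.s[in.index] <= '9' {
		in.index++
	}
	return in.s[start:in.index], in.index > start, nil
}

// convert wraps p so that a successful result is passed through conv.
func convert[A, B any](p parser[A], conv func(A) (B, error)) parser[B] {
	return func(in *input) (B, bool, error) {
		var zero B
		a, ok, err := p(in)
		if err != nil || !ok {
			return zero, false, err
		}
		b, err := conv(a)
		if err != nil {
			return zero, false, err
		}
		return b, true, nil
	}
}

// number parses digits and converts them to an int.
var number = convert[string, int](digits, strconv.Atoi)

func main() {
	fmt.Println(number(&input{s: "42 apples"}))
	fmt.Println(number(&input{s: "apples"}))
}
```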

func EOF

func EOF[T any]() Parser[T]

EOF matches the end of the input.

func Func

func Func[T any](f func(in *Input) (item T, ok bool, err error)) Parser[T]

Func creates a parser from an input function.

func MustRegexp

func MustRegexp(exp string) (p Parser[string])

MustRegexp creates a parser that parses from the input's current position. Passing in a regular expression that doesn't compile will result in a panic.

func OneOrMore

func OneOrMore[T any](p Parser[T]) Parser[[]T]

OneOrMore matches the given parser at least once.

func Optional

func Optional[T any](parser Parser[T]) Parser[Match[T]]

Optional converts the given parser into an optional parser.

Example
package main

import (
	"fmt"

	"github.com/a-h/parse"
)

func main() {
	abcParser := parse.StringFrom(
		parse.StringFrom(parse.Optional(parse.String("A"))),
		parse.String("B"),
	)

	fmt.Println(abcParser.Parse(parse.NewInput("ABC")))
	fmt.Println(abcParser.Parse(parse.NewInput("B")))
	fmt.Println(abcParser.Parse(parse.NewInput("A")))
}
Output:

AB true <nil>
B true <nil>
 false <nil>

func Or

func Or[A any, B any](a Parser[A], b Parser[B]) Parser[Tuple2[Match[A], Match[B]]]

Or returns a success if either a or b can be parsed. If both a and b match, a takes precedence.

func Regexp

func Regexp(exp string) (p Parser[string], err error)

Regexp creates a parser that parses from the input's current position, or fails.
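
"From the input's current position" implies that a match found later in the input does not count. The sketch below shows one way to enforce that with the standard regexp package; the input type, matchAt helper and lineNumber pattern are hypothetical, and the library may anchor its expressions differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// input is a hypothetical stand-in for parse.Input.
type input struct {
	s     string
	index int
}

// lineNumber is an example pattern used by main below.
var lineNumber = regexp.MustCompile(`[0-9]+`)

// matchAt applies re to the unread input and accepts only a match that
// begins exactly at the current index. The leftmost match is the one
// returned by FindStringIndex, so checking its start offset is enough.
func matchAt(re *regexp.Regexp, in *input) (string, bool) {
	loc := re.FindStringIndex(in.s[in.index:])
	if loc == nil || loc[0] != 0 {
		return "", false
	}
	match := in.s[in.index : in.index+loc[1]]
	in.index += loc[1]
	return match, true
}

func main() {
	in := &input{s: "10 PRINT 20"}
	fmt.Println(matchAt(lineNumber, in)) // matches at the current position
	fmt.Println(matchAt(lineNumber, in)) // "20" exists later, but not at the current position
}
```

The split between Regexp and MustRegexp mirrors regexp.Compile and regexp.MustCompile: one returns the compile error, the other panics.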

func Repeat

func Repeat[T any](min, max int, p Parser[T]) Parser[[]T]

Repeat matches the given parser between min and max times.
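
A bounded repetition can be sketched as a loop with a lower and an upper limit. The names below (input, parser, digit, repeat) are hypothetical, and returning nil on failure is a choice made for this sketch; the library may report partial results instead. Under this model, Times(n, p) corresponds to repeat(n, n, p) and AtMost(max, p) to repeat(0, max, p), which is an assumption about the library rather than something this page states.

```go
package main

import "fmt"

// input is a hypothetical stand-in for parse.Input.
type input struct {
	s     string
	index int
}

// parser is a simplified signature used only in this sketch.
type parser func(in *input) (string, bool)

// digit consumes a single ASCII digit.
func digit(in *input) (string, bool) {
	if in.index < len(in.s) && in.s[in.index] >= '0' && in.s[in.index] <= '9' {
		in.index++
		return in.s[in.index-1 : in.index], true
	}
	return "", false
}

// repeat applies p up to max times. It succeeds only if p matched at
// least min times; otherwise the index is restored.
func repeat(min, max int, p parser) func(in *input) ([]string, bool) {
	return func(in *input) ([]string, bool) {
		start := in.index
		var items []string
		for len(items) < max {
			item, ok := p(in)
			if !ok {
				break
			}
			items = append(items, item)
		}
		if len(items) < min {
			in.index = start
			return nil, false
		}
		return items, true
	}
}

func main() {
	twoToThree := repeat(2, 3, digit)
	for _, s := range []string{"1", "12", "12345"} {
		in := &input{s: s}
		items, ok := twoToThree(in)
		fmt.Println(items, ok, in.index)
	}
}
```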

func Rune

func Rune(r rune) Parser[string]

Rune matches a single rune.

func RuneIn

func RuneIn(s string) Parser[string]

RuneIn matches a single rune when the rune is in the string s.

func RuneInRanges

func RuneInRanges(ranges ...*unicode.RangeTable) Parser[string]

RuneInRanges matches a single rune when the rune is within one of the given Unicode ranges.

func RuneNotIn

func RuneNotIn(s string) Parser[string]

RuneNotIn matches a single rune when the rune is not in the string s.

func RuneWhere

func RuneWhere(predicate func(r rune) bool) Parser[string]

RuneWhere matches a single rune using the given predicate function.

func SequenceOf2

func SequenceOf2[A, B any](a Parser[A], b Parser[B]) Parser[Tuple2[A, B]]

func SequenceOf3

func SequenceOf3[A, B, C any](a Parser[A], b Parser[B], c Parser[C]) Parser[Tuple3[A, B, C]]

func SequenceOf4

func SequenceOf4[A, B, C, D any](a Parser[A], b Parser[B], c Parser[C], d Parser[D]) Parser[Tuple4[A, B, C, D]]

func SequenceOf5

func SequenceOf5[A, B, C, D, E any](a Parser[A], b Parser[B], c Parser[C], d Parser[D], e Parser[E]) Parser[Tuple5[A, B, C, D, E]]

func SequenceOf6

func SequenceOf6[A, B, C, D, E, F any](a Parser[A], b Parser[B], c Parser[C], d Parser[D], e Parser[E], f Parser[F]) Parser[Tuple6[A, B, C, D, E, F]]

func SequenceOf7

func SequenceOf7[A, B, C, D, E, F, G any](a Parser[A], b Parser[B], c Parser[C], d Parser[D], e Parser[E], f Parser[F], g Parser[G]) Parser[Tuple7[A, B, C, D, E, F, G]]

func SequenceOf8

func SequenceOf8[A, B, C, D, E, F, G, H any](a Parser[A], b Parser[B], c Parser[C], d Parser[D], e Parser[E], f Parser[F], g Parser[G], h Parser[H]) Parser[Tuple8[A, B, C, D, E, F, G, H]]

func SequenceOf9

func SequenceOf9[A, B, C, D, E, F, G, H, I any](a Parser[A], b Parser[B], c Parser[C], d Parser[D], e Parser[E], f Parser[F], g Parser[G], h Parser[H], i Parser[I]) Parser[Tuple9[A, B, C, D, E, F, G, H, I]]

func String

func String(s string) Parser[string]

String matches a given string constant.

Example
package main

import (
	"fmt"

	"github.com/a-h/parse"
)

func main() {
	abParser := parse.Any(parse.String("A"))

	fmt.Println(abParser.Parse(parse.NewInput("A")))
	fmt.Println(abParser.Parse(parse.NewInput("B")))
}
Output:

A true <nil>
 false <nil>

func StringFrom

func StringFrom[T any](parsers ...Parser[T]) Parser[string]

StringFrom returns the string range captured by the given parsers.

func StringInsensitive

func StringInsensitive(s string) Parser[string]

StringInsensitive matches a given string constant using Unicode case folding.
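
Case folding can be illustrated with strings.EqualFold from the standard library, which compares under simple Unicode case folding. The hasPrefixFold helper below is hypothetical; it takes as many runes from the input as the constant has and compares them, returning the text as it appears in the input. Whether the library returns the input's casing or the constant's is not stated here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// hasPrefixFold reports whether s begins with prefix under simple
// Unicode case folding, and returns the matched text as written in s.
func hasPrefixFold(s, prefix string) (string, bool) {
	// Walk forward by the number of runes in prefix, since folded
	// runes may have different byte lengths.
	n := utf8.RuneCountInString(prefix)
	end := 0
	for i := 0; i < n; i++ {
		if end >= len(s) {
			return "", false
		}
		_, size := utf8.DecodeRuneInString(s[end:])
		end += size
	}
	if !strings.EqualFold(s[:end], prefix) {
		return "", false
	}
	return s[:end], true
}

func main() {
	fmt.Println(hasPrefixFold("goto 10", "GOTO"))
	fmt.Println(hasPrefixFold("GoSub 20", "GOTO"))
}
```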

func StringUntil

func StringUntil[T any](delimiter Parser[T]) Parser[string]

StringUntil matches until the delimiter is reached.

func StringUntilEOF

func StringUntilEOF[T any](delimiter Parser[T]) Parser[string]

StringUntilEOF matches until the delimiter or the end of the file is reached.
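
The difference between the two variants can be sketched with a plain string delimiter. The stringUntil helper and its allowEOF flag are hypothetical simplifications; the real functions take a delimiter parser. As in the GOTO example, the delimiter itself is left unconsumed.

```go
package main

import (
	"fmt"
	"strings"
)

// input is a hypothetical stand-in for parse.Input.
type input struct {
	s     string
	index int
}

// stringUntil reads up to, but not including, delim.
// With allowEOF=false it fails when delim never appears.
// With allowEOF=true it takes the rest of the input instead.
func stringUntil(in *input, delim string, allowEOF bool) (string, bool) {
	rest := in.s[in.index:]
	i := strings.Index(rest, delim)
	if i < 0 {
		if !allowEOF {
			return "", false
		}
		i = len(rest)
	}
	in.index += i
	return rest[:i], true
}

func main() {
	fmt.Println(stringUntil(&input{s: "10\nNEXT"}, "\n", false))
	fmt.Println(stringUntil(&input{s: "10"}, "\n", false))
	fmt.Println(stringUntil(&input{s: "10"}, "\n", true))
}
```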

func Then

func Then[A any, B any](a Parser[A], b Parser[B]) Parser[Tuple2[A, B]]

Then matches a sequence of two parsers. For multiples of the same type, use Times, Repeat, AtLeast, AtMost, ZeroOrMore, OneOrMore.

func Times

func Times[T any](n int, p Parser[T]) Parser[[]T]

Times matches the given parser n times.

func Until

func Until[T, D any](parser Parser[T], delimiter Parser[D]) Parser[[]T]

Until matches until the delimiter is reached.

func UntilEOF

func UntilEOF[T, D any](parser Parser[T], delimiter Parser[D]) Parser[[]T]

UntilEOF matches until the delimiter or the end of the file is reached.

func ZeroOrMore

func ZeroOrMore[T any](p Parser[T]) Parser[[]T]

ZeroOrMore matches the given parser zero or more times.

type Position

type Position struct {
	Index, Line, Col int
}

func (Position) String

func (pos Position) String() string

type Tuple2

type Tuple2[A, B any] struct {
	A A
	B B
}

type Tuple3

type Tuple3[A, B, C any] struct {
	A A
	B B
	C C
}

type Tuple4

type Tuple4[A, B, C, D any] struct {
	A A
	B B
	C C
	D D
}

type Tuple5

type Tuple5[A, B, C, D, E any] struct {
	A A
	B B
	C C
	D D
	E E
}

type Tuple6

type Tuple6[A, B, C, D, E, F any] struct {
	A A
	B B
	C C
	D D
	E E
	F F
}

type Tuple7

type Tuple7[A, B, C, D, E, F, G any] struct {
	A A
	B B
	C C
	D D
	E E
	F F
	G G
}

type Tuple8

type Tuple8[A, B, C, D, E, F, G, H any] struct {
	A A
	B B
	C C
	D D
	E E
	F F
	G G
	H H
}

type Tuple9

type Tuple9[A, B, C, D, E, F, G, H, I any] struct {
	A A
	B B
	C C
	D D
	E E
	F F
	G G
	H H
	I I
}

Directories

Path Synopsis
examples
csv
