lexer

package

v0.0.0-...-48f3ab3 Latest Latest Go to latest Published: Nov 27, 2017 License: MIT Imports: 12 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/igor-k/participle

Links

Open Source Insights

Documentation ¶

Overview ¶

Package lexer defines interfaces and implementations used by Participle to perform lexing.

The primary interfaces are Definition and Lexer. There are three implementations of these interfaces:

TextScannerLexer is based on text/scanner. This is the fastest, but least flexible, in that tokens are restricted to those supported by that package. It can scan about 5M tokens/second on a late 2013 15" MacBook Pro.

The second lexer is constructed via the Regexp() function, mapping regexp capture groups to tokens. The complete input source is read into memory, so it is unsuitable for large inputs.

The final lexer provided accepts a lexical grammar in EBNF. Each capitalised production is a lexical token supported by the resulting Lexer. This is very flexible, but a bit slower, scanning around 730K tokens/second on the same machine, though it is currently completely unoptimised. This could/should be converted to a table-based lexer.

Lexer implementations must use Panic/Panicf to report errors.

Index ¶

Constants
func MakeSymbolTable(def Definition, types ...string) map[rune]bool
func NameOfReader(r io.Reader) string
func Panic(pos Position, message string)
func Panicf(pos Position, format string, args ...interface{})
type Definition
type Error
- func Errorf(pos Position, format string, args ...interface{}) *Error
- func (e *Error) Error() string
type Lexer
type MapFunc
type Position
- func (p Position) String() string
type Token
- func ConsumeAll(lexer Lexer) (tokens []Token, err error)
- func RuneToken(r rune) Token
- func (t Token) EOF() bool
- func (t Token) String() string

Constants ¶

View Source

const (
	// EOF represents an end of file.
	EOF rune = -(iota + 1)
)

Variables ¶

This section is empty.

Functions ¶

func MakeSymbolTable ¶

func MakeSymbolTable(def Definition, types ...string) map[rune]bool

MakeSymbolTable is a useful helper function for Definition decorator types.

func NameOfReader ¶

func NameOfReader(r io.Reader) string

NameOfReader attempts to retrieve the filename of a reader.

func Panic ¶

func Panic(pos Position, message string)

Panic throws a lexer error. Lexers should use this to report errors.

func Panicf ¶

func Panicf(pos Position, format string, args ...interface{})

Panicf throws an *Error while parsing.

Types ¶

type Definition ¶

type Definition interface {
	// Lex an io.Reader.
	Lex(io.Reader) Lexer
	// Symbols returns a map of symbolic names to the corresponding pseudo-runes for those symbols.
	// This is the same approach as used by text/scanner. For example, "EOF" might have the rune
	// value of -1, "Ident" might be -2, and so on.
	Symbols() map[string]rune
}

Definition provides the parser with metadata for a lexer.

var (
	// EOFToken is a Token representing EOF.
	EOFToken = Token{Type: EOF, Value: "<<EOF>>"}

	// DefaultDefinition defines properties for the default lexer.
	DefaultDefinition Definition = &defaultDefinition{}
)

var TextScannerLexer Definition = &defaultDefinition{}

TextScannerLexer is a lexer that uses the text/scanner module.

func EBNF ¶

func EBNF(grammar string) (Definition, error)

EBNF creates a Lexer from an EBNF grammar.

The EBNF grammar syntax is as defined by "golang.org/x/exp/ebnf". Upper-case productions are exported as symbols. All productions are lexical.

Here's an example grammar for parsing whitespace and identifiers:

Identifier = alpha { alpha | number } .
Whitespace = "\n" | "\r" | "\t" | " " .
alpha = "a"…"z" | "A"…"Z" | "_" .
number = "0"…"9" .

func Elide ¶

func Elide(def Definition, types ...string) Definition

Elide wraps a Lexer, removing tokens matching the given types.

func Map ¶

func Map(def Definition, f MapFunc) Definition

Map is a Lexer that applies a mapping function to a Lexer's tokens.

func Must ¶

func Must(def Definition, err error) Definition

Must takes the result of a Definition constructor call and returns the definition, but panics if it errors

eg.

lex = lexer.Must(lexer.Build(`Symbol = "symbol" .`))

func Regexp ¶

func Regexp(pattern string) (Definition, error)

Regexp creates a lexer definition from a regular expression.

Each named sub-expression in the regular expression matches a token. Anonymous sub-expressions will be matched and discarded.

eg.

def, err := Regexp(`(?P<Ident>[a-z]+)|(\s+)|(?P<Number>\d+)`)

func Unquote ¶

func Unquote(def Definition, types ...string) Definition

Unquote applies strconv.Unquote() to tokens of the given types.

Tokens of type "String" will be unquoted if no other types are provided.

func Upper ¶

func Upper(def Definition, types ...string) Definition

Upper case all tokens of the given type. Useful for case normalisation.

type Error ¶

type Error struct {
	Message string
	Pos     Position
}

Error represents an error while parsing.

func Errorf ¶

func Errorf(pos Position, format string, args ...interface{}) *Error

Errorf creats a new Error at the given position.

func (*Error) Error ¶

func (e *Error) Error() string

Error complies with the error interface and reports the position of an error.

type Lexer ¶

type Lexer interface {
	// Peek at the next token.
	Peek() Token
	// Next consumes and returns the next token.
	Next() Token
}

A Lexer returns tokens from a source.

Errors are reported via panic, with the panic value being an instance of Error.

func Lex ¶

func Lex(r io.Reader) Lexer

Lex an io.Reader with text/scanner.Scanner.

Note that this differs from text/scanner.Scanner in that string tokens will be unquoted.

func LexBytes ¶

func LexBytes(b []byte) Lexer

LexBytes returns a new default lexer over bytes.

func LexString ¶

func LexString(s string) Lexer

LexString returns a new default lexer over a string.

type MapFunc ¶

type MapFunc func(*Token) *Token

MapFunc transforms tokens.

If nil is returned that token will be discarded.

type Position ¶

type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

Position of a token.

func (Position) String ¶

func (p Position) String() string

type Token ¶

type Token struct {
	// Type of token. This is the value keyed by symbol as returned by Definition.Symbols().
	Type  rune
	Value string
	Pos   Position
}

A Token returned by a Lexer.

func ConsumeAll ¶

func ConsumeAll(lexer Lexer) (tokens []Token, err error)

ConsumeAll reads all tokens from a Lexer.

func RuneToken ¶

func RuneToken(r rune) Token

RuneToken represents a rune as a Token.

func (Token) EOF ¶

func (t Token) EOF() bool

EOF returns true if this Token is an EOF token.

func (Token) String ¶

func (t Token) String() string

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL