lexer

package v2.1.1
Published: Nov 30, 2023 License: MIT Imports: 13 Imported by: 270

Documentation

Overview

Package lexer defines interfaces and implementations used by Participle to perform lexing.

The primary interfaces are Definition and Lexer. There are two concrete implementations included: the first is based on Go's text/scanner package, and the second is Participle's default stateful/modal lexer.

The stateful lexer is based heavily on the approach used by Chroma (and Pygments).

It is a state machine defined by a map of rules keyed by state. Each rule is a named regex and optional operation to apply when the rule matches.

As a convenience, any Rule starting with a lowercase letter will be elided from output.

Lexing starts in the "Root" group. Each rule is matched in order, with the first successful match producing a lexeme. If the matching rule has an associated Action it will be executed.

A state change can be introduced with the Action `Push(state)`. `Pop()` will return to the previous state.

To reuse rules from another state, use `Include(state)`.

As a special case, regexes containing backrefs in the form \N (where N is a digit) will match the corresponding capture group from the immediate parent group. This can be used to parse, among other things, heredocs.
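For illustration, a heredoc lexer using this backref mechanism might look roughly like the following. This is a sketch only; the rule names and patterns are illustrative and not part of this package.

var heredocDef = lexer.MustStateful(lexer.Rules{
	"Root": {
		// The (\w+) capture group records the heredoc delimiter.
		{"Heredoc", `<<(\w+)`, lexer.Push("Heredoc")},
		{"Ident", `\w+`, nil},
		{"whitespace", `\s+`, nil},
	},
	"Heredoc": {
		// \1 is replaced by the delimiter captured in the parent group,
		// so the heredoc ends when that exact word is seen again.
		{"End", `\b\1\b`, lexer.Pop()},
		{"Text", `[^\n]+`, nil},
		{"newline", `\n`, nil},
	},
})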

See the README, example and tests in this package for details.

Index

Examples

Constants

This section is empty.

Variables

var ReturnRule = Rule{"returnToParent", "", nil}

ReturnRule signals the lexer to return immediately.

Functions

func BackrefRegex

func BackrefRegex(backrefCache *sync.Map, input string, groups []string) (*regexp.Regexp, error)

BackrefRegex returns a compiled regular expression with backreferences replaced by groups.

func MakeSymbolTable

func MakeSymbolTable(def Definition, types ...string) (map[TokenType]bool, error)

MakeSymbolTable builds a lookup table for checking token ID existence.

For each symbolic name in "types", the returned map will contain the corresponding token ID as a key.
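For example, a hedged sketch of checking whether a token is one of a set of types ("def" is assumed to be an existing Definition, "tok" a Token in scope, and "Ident"/"String" assumed symbol names):

literals, err := lexer.MakeSymbolTable(def, "Ident", "String")
if err != nil {
	return err
}
if literals[tok.Type] {
	// tok is an Ident or String token.
}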

func NameOfReader

func NameOfReader(r interface{}) string

NameOfReader attempts to retrieve the filename of a reader.

func SymbolsByRune

func SymbolsByRune(def Definition) map[TokenType]string

SymbolsByRune returns a map of lexer symbol names keyed by rune.

Types

type Action

type Action interface {
	// contains filtered or unexported methods
}

An Action is applied when a rule matches.

func Pop

func Pop() Action

Pop to the previous state.

func Push

func Push(state string) Action

Push to the given state.

The target state will then be the set of rules used for matching until another Push or Pop is encountered.

type ActionPop

type ActionPop struct{}

ActionPop pops to the previous state when the Rule matches.

type ActionPush

type ActionPush struct {
	State string `json:"state"`
}

ActionPush pushes the current state and switches to "State" when the Rule matches.

type BytesDefinition

type BytesDefinition interface {
	LexBytes(filename string, input []byte) (Lexer, error)
}

BytesDefinition is an optional interface that lexer Definitions can implement to offer a fast path for lexing byte slices.
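A hedged usage sketch: callers can check for this fast path at runtime and fall back to Definition.Lex otherwise ("def", "filename" and "data" are assumed to be in scope):

var lex lexer.Lexer
var err error
if bd, ok := def.(lexer.BytesDefinition); ok {
	// Fast path: lex the byte slice directly.
	lex, err = bd.LexBytes(filename, data)
} else {
	// Fallback: lex via an io.Reader.
	lex, err = def.Lex(filename, bytes.NewReader(data))
}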

type Checkpoint

type Checkpoint struct {
	// contains filtered or unexported fields
}

Checkpoint wraps the mutable state of the PeekingLexer.

Copying and restoring just this state is a bit faster than copying the entire PeekingLexer.

func (Checkpoint) Cursor

func (c Checkpoint) Cursor() int

Cursor position in tokens, excluding elided tokens.

func (Checkpoint) RawCursor

func (c Checkpoint) RawCursor() RawCursor

RawCursor position in tokens, including elided tokens.

type Definition

type Definition interface {
	// Symbols returns a map of symbolic names to the corresponding pseudo-runes for those symbols.
	// This is the same approach as used by text/scanner. For example, "EOF" might have the rune
	// value of -1, "Ident" might be -2, and so on.
	Symbols() map[string]TokenType
	// Lex an io.Reader.
	Lex(filename string, r io.Reader) (Lexer, error)
}

Definition is the main entry point for lexing.

var (
	TextScannerLexer Definition = &textScannerLexerDefinition{}

	// DefaultDefinition defines properties for the default lexer.
	DefaultDefinition = TextScannerLexer
)

TextScannerLexer is a lexer that uses the text/scanner module.

func Must

func Must(def Definition, err error) Definition

Must takes the result of a Definition constructor call and returns the definition, but panics if the constructor returned an error.

e.g.

def := lexer.Must(lexer.New(rules))

func NewTextScannerLexer

func NewTextScannerLexer(configure func(*scanner.Scanner)) Definition

NewTextScannerLexer constructs a Definition that uses an underlying scanner.Scanner

"configure" will be called after the scanner.Scanner.Init(r) is called. If "configure" is nil a default scanner.Scanner will be used.

type Error

type Error struct {
	Msg string
	Pos Position
}

Error represents an error while lexing.

It complies with the participle.Error interface.

func (*Error) Error

func (e *Error) Error() string

Error formats the error with FormatError.

func (*Error) Message

func (e *Error) Message() string

func (*Error) Position

func (e *Error) Position() Position

type Lexer

type Lexer interface {
	// Next consumes and returns the next token.
	Next() (Token, error)
}

A Lexer returns tokens from a source.

func Lex

func Lex(filename string, r io.Reader) Lexer

Lex an io.Reader with text/scanner.Scanner.

This provides very fast lexing of source code compatible with Go tokens.

Note that this differs from text/scanner.Scanner in that string tokens will be unquoted.

func LexBytes

func LexBytes(filename string, b []byte) Lexer

LexBytes returns a new default lexer over bytes.

func LexString

func LexString(filename, s string) Lexer

LexString returns a new default lexer over a string.

func LexWithScanner

func LexWithScanner(filename string, scan *scanner.Scanner) Lexer

LexWithScanner creates a Lexer from a user-provided scanner.Scanner.

Useful if you need to customise the Scanner.

type PeekingLexer

type PeekingLexer struct {
	Checkpoint
	// contains filtered or unexported fields
}

PeekingLexer supports arbitrary lookahead as well as cloning.

func Upgrade

func Upgrade(lex Lexer, elide ...TokenType) (*PeekingLexer, error)

Upgrade a Lexer to a PeekingLexer with arbitrary lookahead.

"elide" is a slice of token types to elide from processing.

func (*PeekingLexer) FastForward

func (p *PeekingLexer) FastForward(rawCursor RawCursor)

FastForward the internal cursors to this RawCursor position.

func (*PeekingLexer) LoadCheckpoint

func (p *PeekingLexer) LoadCheckpoint(checkpoint Checkpoint)

func (*PeekingLexer) MakeCheckpoint

func (p *PeekingLexer) MakeCheckpoint() Checkpoint
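A hedged sketch of backtracking with checkpoints ("tryParse" is a hypothetical helper that consumes tokens from p and reports success):

cp := p.MakeCheckpoint()
if !tryParse(p) {
	// Rewind the PeekingLexer to where it was before the failed attempt.
	p.LoadCheckpoint(cp)
}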

func (*PeekingLexer) Next

func (p *PeekingLexer) Next() *Token

Next consumes and returns the next token.

func (*PeekingLexer) Peek

func (p *PeekingLexer) Peek() *Token

Peek ahead at the next non-elided token.

func (*PeekingLexer) PeekAny

func (p *PeekingLexer) PeekAny(match func(Token) bool) (t Token, rawCursor RawCursor)

PeekAny peeks forward over elided and non-elided tokens.

Elided tokens will be returned if they match, otherwise the next non-elided token will be returned.

The returned RawCursor position is the location of the returned token. Use FastForward to move the internal cursors forward.
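A hedged sketch ("commentType" is an assumed TokenType looked up from the lexer's symbols):

t, raw := p.PeekAny(func(t lexer.Token) bool { return t.Type == commentType })
if t.Type == commentType {
	// Move the internal cursors forward to the matched token's position.
	p.FastForward(raw)
}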

func (*PeekingLexer) Range

func (p *PeekingLexer) Range(rawStart, rawEnd RawCursor) []Token

Range returns the slice of tokens between the two cursor points.

func (*PeekingLexer) RawPeek

func (p *PeekingLexer) RawPeek() *Token

RawPeek peeks ahead at the next raw token.

Unlike Peek, this will include elided tokens.

type Position

type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

Position of a token.

func (Position) Add (added in v2.1.0)

func (p Position) Add(pos Position) Position

Add returns a new Position that is the sum of this position and "pos".

This is useful when parsing values from a parent grammar.

func (*Position) Advance

func (p *Position) Advance(span string)

Advance the Position based on the number of characters and newlines in "span".

func (Position) GoString

func (p Position) GoString() string

func (Position) String

func (p Position) String() string

type RawCursor

type RawCursor int

RawCursor index in the token stream.

type Rule

type Rule struct {
	Name    string `json:"name"`
	Pattern string `json:"pattern"`
	Action  Action `json:"action"`
}

A Rule matching input and possibly changing state.

func Include

func Include(state string) Rule

Include rules from another state in this one.

func Return

func Return() Rule

Return to the parent state.

Useful as the last rule in a sub-state.

func (*Rule) MarshalJSON

func (r *Rule) MarshalJSON() ([]byte, error)

func (*Rule) UnmarshalJSON

func (r *Rule) UnmarshalJSON(data []byte) error

type Rules

type Rules map[string][]Rule

Rules grouped by name.

type RulesAction

type RulesAction interface {
	// contains filtered or unexported methods
}

RulesAction is an optional interface that Actions can implement.

It is applied during rule construction to mutate the rule map.

type SimpleRule

type SimpleRule struct {
	Name    string
	Pattern string
}

SimpleRule is a named regular expression.

type StatefulDefinition

type StatefulDefinition struct {
	// contains filtered or unexported fields
}

StatefulDefinition is the stateful lexer's implementation of Definition.

func MustSimple

func MustSimple(rules []SimpleRule) *StatefulDefinition

MustSimple creates a new stateful lexer with only a single root state.

It panics if there is an error.
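For illustration, a sketch of a single-state lexer (the rule names and patterns here are illustrative only, not part of the package):

var def = lexer.MustSimple([]lexer.SimpleRule{
	{"Ident", `[a-zA-Z_]\w*`},
	{"Int", `\d+`},
	{"Punct", `[-+*/()=]`},
	{"whitespace", `\s+`}, // lowercase name: elided from output
})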

func MustStateful

func MustStateful(rules Rules) *StatefulDefinition

MustStateful creates a new stateful lexer and panics if it is incorrect.

func New

func New(rules Rules) (*StatefulDefinition, error)

New constructs a new stateful lexer from rules.

Example

An example of parsing nested expressions within strings.

type Terminal struct {
	String *String `  @@`
	Ident  string  `| @Ident`
}

type Expr struct {
	Left  *Terminal `@@`
	Op    string    `( @Oper`
	Right *Terminal `  @@)?`
}

type Fragment struct {
	Escaped string `(  @Escaped`
	Expr    *Expr  ` | "${" @@ "}"`
	Text    string ` | @Char)`
}

type String struct {
	Fragments []*Fragment `"\"" @@* "\""`
}
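
// interpolatedRules is not shown on this page; a plausible definition,
// consistent with the grammar above and the output below, might look like
// this (an assumption for illustration, not the package's actual example value):
var interpolatedRules = lexer.Rules{
	"Root": {
		{"String", `"`, lexer.Push("String")},
	},
	"String": {
		{"Escaped", `\\.`, nil},
		{"StringEnd", `"`, lexer.Pop()},
		{"Expr", `\${`, lexer.Push("Expr")},
		{"Char", `[^$"\\]+`, nil},
	},
	"Expr": {
		lexer.Include("Root"),
		{"whitespace", `\s+`, nil},
		{"Oper", `[-+/*%]`, nil},
		{"Ident", `\w+`, nil},
		{"ExprEnd", `}`, lexer.Pop()},
	},
}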

def, err := lexer.New(interpolatedRules)
if err != nil {
	log.Fatal(err)
}
parser, err := participle.Build[String](participle.Lexer(def))
if err != nil {
	log.Fatal(err)
}

actual, err := parser.ParseString("", `"hello ${user + "??"}"`)
if err != nil {
	log.Fatal(err)
}
repr.Println(actual)
Output:

&lexer_test.String{
  Fragments: []*lexer_test.Fragment{
    {
      Text: "hello ",
    },
    {
      Expr: &lexer_test.Expr{
        Left: &lexer_test.Terminal{
          Ident: "user",
        },
        Op: "+",
        Right: &lexer_test.Terminal{
          String: &lexer_test.String{
            Fragments: []*lexer_test.Fragment{
              {
                Text: "??",
              },
            },
          },
        },
      },
    },
  },
}

func NewSimple

func NewSimple(rules []SimpleRule) (*StatefulDefinition, error)

NewSimple creates a new stateful lexer with only a single root state.

func (*StatefulDefinition) Lex

func (d *StatefulDefinition) Lex(filename string, r io.Reader) (Lexer, error)

func (*StatefulDefinition) LexString

func (d *StatefulDefinition) LexString(filename string, s string) (Lexer, error)

LexString is a fast-path implementation for lexing strings.

func (*StatefulDefinition) MarshalJSON

func (d *StatefulDefinition) MarshalJSON() ([]byte, error)

func (*StatefulDefinition) Rules

func (d *StatefulDefinition) Rules() Rules

Rules returns the user-provided Rules used to construct the lexer.

func (*StatefulDefinition) Symbols

func (d *StatefulDefinition) Symbols() map[string]TokenType

type StatefulLexer

type StatefulLexer struct {
	// contains filtered or unexported fields
}

StatefulLexer is the Lexer produced by a StatefulDefinition.

func (*StatefulLexer) Next

func (l *StatefulLexer) Next() (Token, error)

type StringDefinition

type StringDefinition interface {
	LexString(filename string, input string) (Lexer, error)
}

StringDefinition is an optional interface that lexer Definitions can implement to offer a fast path for lexing strings.

type Token

type Token struct {
	// Type of token. This is the value keyed by symbol as returned by Definition.Symbols().
	Type  TokenType
	Value string
	Pos   Position
}

A Token returned by a Lexer.

func ConsumeAll

func ConsumeAll(lexer Lexer) ([]Token, error)

ConsumeAll reads all tokens from a Lexer.
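A minimal usage sketch with the default text/scanner-based lexer:

tokens, err := lexer.ConsumeAll(lexer.LexString("", `x = 1`))
if err != nil {
	return err
}
for _, t := range tokens {
	fmt.Println(t)
}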

func EOFToken

func EOFToken(pos Position) Token

EOFToken creates a new EOF token at the given position.

func (Token) EOF

func (t Token) EOF() bool

EOF returns true if this Token is an EOF token.

func (Token) GoString

func (t Token) GoString() string

func (Token) String

func (t Token) String() string

type TokenType

type TokenType int

const (
	// EOF represents an end of file.
	EOF TokenType = -(iota + 1)
)

Directories

Path Synopsis
Code generated by Participle.
