Documentation ¶
Overview ¶
Package lexer defines interfaces and implementations used by Participle to perform lexing.
The primary interfaces are Definition and Lexer. Two concrete implementations are included: one based on Go's text/scanner package, and Participle's default stateful/modal lexer.
The stateful lexer is based heavily on the approach used by Chroma (and Pygments).
It is a state machine defined by a map of rules keyed by state. Each rule is a named regex and an optional operation to apply when the rule matches.
As a convenience, any Rule starting with a lowercase letter will be elided from output.
Lexing starts in the "Root" group. Each rule is matched in order, with the first successful match producing a lexeme. If the matching rule has an associated Action it will be executed.
A state change can be introduced with the Action `Push(state)`. `Pop()` will return to the previous state.
To reuse rules from another state, use `Include(state)`.
As a special case, regexes containing backrefs in the form \N (where N is a digit) will match the corresponding capture group from the immediate parent group. This can be used to parse, among other things, heredocs.
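For example, a heredoc lexer can be sketched as follows (a minimal sketch, not the canonical version from the README; rule names and patterns are illustrative):

def := lexer.MustStateful(lexer.Rules{
	"Root": {
		// Capture the heredoc delimiter and switch to the Heredoc state.
		{Name: "Heredoc", Pattern: `<<(\w+)`, Action: lexer.Push("Heredoc")},
	},
	"Heredoc": {
		// \1 matches the delimiter captured by the parent group above.
		{Name: "End", Pattern: `\b\1\b`, Action: lexer.Pop()},
		{Name: "whitespace", Pattern: `\s+`}, // lowercase name: elided from output
		{Name: "Ident", Pattern: `\w+`},
	},
})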
See the README, example and tests in this package for details.
Index ¶
- Variables
- func BackrefRegex(backrefCache *sync.Map, input string, groups []string) (*regexp.Regexp, error)
- func MakeSymbolTable(def Definition, types ...string) (map[TokenType]bool, error)
- func NameOfReader(r interface{}) string
- func SymbolsByRune(def Definition) map[TokenType]string
- type Action
- type ActionPop
- type ActionPush
- type BytesDefinition
- type Checkpoint
- type Definition
- type Error
- type Lexer
- type PeekingLexer
- func (p *PeekingLexer) FastForward(rawCursor RawCursor)
- func (p *PeekingLexer) LoadCheckpoint(checkpoint Checkpoint)
- func (p *PeekingLexer) MakeCheckpoint() Checkpoint
- func (p *PeekingLexer) Next() *Token
- func (p *PeekingLexer) Peek() *Token
- func (p *PeekingLexer) PeekAny(match func(Token) bool) (t Token, rawCursor RawCursor)
- func (p *PeekingLexer) Range(rawStart, rawEnd RawCursor) []Token
- func (p *PeekingLexer) RawPeek() *Token
- type Position
- type RawCursor
- type Rule
- type Rules
- type RulesAction
- type SimpleRule
- type StatefulDefinition
- func (d *StatefulDefinition) Lex(filename string, r io.Reader) (Lexer, error)
- func (d *StatefulDefinition) LexString(filename string, s string) (Lexer, error)
- func (d *StatefulDefinition) MarshalJSON() ([]byte, error)
- func (d *StatefulDefinition) Rules() Rules
- func (d *StatefulDefinition) Symbols() map[string]TokenType
- type StatefulLexer
- type StringDefinition
- type Token
- type TokenType
Examples ¶
- New
Constants ¶
This section is empty.
Variables ¶
var ReturnRule = Rule{"returnToParent", "", nil}
ReturnRule signals the lexer to return immediately.
Functions ¶
func BackrefRegex ¶
func BackrefRegex(backrefCache *sync.Map, input string, groups []string) (*regexp.Regexp, error)
BackrefRegex returns a compiled regular expression with backreferences replaced by groups.
func MakeSymbolTable ¶
func MakeSymbolTable(def Definition, types ...string) (map[TokenType]bool, error)
MakeSymbolTable builds a lookup table for checking token ID existence.
For each symbolic name in "types", the returned map will contain the corresponding token ID as a key.
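A short usage sketch (def, tok and the symbolic names are illustrative):

// Build a set of the token types we care about.
operands, err := lexer.MakeSymbolTable(def, "Ident", "Number")
if err != nil {
	log.Fatal(err)
}
if operands[tok.Type] {
	// tok is an Ident or Number token.
}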
func NameOfReader ¶
func NameOfReader(r interface{}) string
NameOfReader attempts to retrieve the filename of a reader.
func SymbolsByRune ¶
func SymbolsByRune(def Definition) map[TokenType]string
SymbolsByRune returns a map of lexer symbol names keyed by rune.
Types ¶
type Action ¶
type Action interface {
// contains filtered or unexported methods
}
An Action is applied when a rule matches.
type ActionPop ¶
type ActionPop struct{}
ActionPop pops to the previous state when the Rule matches.
type ActionPush ¶
type ActionPush struct {
State string `json:"state"`
}
ActionPush pushes the current state and switches to "State" when the Rule matches.
type BytesDefinition ¶
BytesDefinition is an optional interface lexer Definitions can implement to offer a fast path for lexing byte slices.
type Checkpoint ¶
type Checkpoint struct {
// contains filtered or unexported fields
}
Checkpoint wraps the mutable state of the PeekingLexer.
Copying and restoring just this state is a bit faster than copying the entire PeekingLexer.
func (Checkpoint) Cursor ¶
func (c Checkpoint) Cursor() int
Cursor position in tokens, excluding elided tokens.
func (Checkpoint) RawCursor ¶
func (c Checkpoint) RawCursor() RawCursor
RawCursor position in tokens, including elided tokens.
type Definition ¶
type Definition interface {
	// Symbols returns a map of symbolic names to the corresponding pseudo-runes for those symbols.
	// This is the same approach as used by text/scanner. For example, "EOF" might have the rune
	// value of -1, "Ident" might be -2, and so on.
	Symbols() map[string]TokenType

	// Lex an io.Reader.
	Lex(filename string, r io.Reader) (Lexer, error)
}
Definition is the main entry point for lexing.
var (
	TextScannerLexer Definition = &textScannerLexerDefinition{}

	// DefaultDefinition defines properties for the default lexer.
	DefaultDefinition = TextScannerLexer
)
TextScannerLexer is a lexer that uses the text/scanner package.
func Must ¶
func Must(def Definition, err error) Definition
Must takes the result of a Definition constructor call and returns the definition, but panics if it errors.
eg.
lex = lexer.Must(lexer.Build(`Symbol = "symbol" .`))
func NewTextScannerLexer ¶
func NewTextScannerLexer(configure func(*scanner.Scanner)) Definition
NewTextScannerLexer constructs a Definition that uses an underlying scanner.Scanner
"configure" will be called after the scanner.Scanner.Init(r) is called. If "configure" is nil a default scanner.Scanner will be used.
type Error ¶
Error represents an error while lexing.
It complies with the participle.Error interface.
type Lexer ¶
A Lexer returns tokens from a source.
func Lex ¶
Lex an io.Reader with text/scanner.Scanner.
This provides very fast lexing of source code compatible with Go tokens.
Note that this differs from text/scanner.Scanner in that string tokens will be unquoted.
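A minimal usage sketch (the exact signature is omitted above; this assumes Lex(filename, r) returns a Lexer directly and that Token provides an EOF helper):

lex := lexer.Lex("example.go", strings.NewReader("package main"))
for {
	tok, err := lex.Next()
	if err != nil {
		log.Fatal(err)
	}
	if tok.EOF() { // assumption: stop at the end-of-file token
		break
	}
	fmt.Println(tok.Value)
}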
type PeekingLexer ¶
type PeekingLexer struct {
	Checkpoint
	// contains filtered or unexported fields
}
PeekingLexer supports arbitrary lookahead as well as cloning.
func Upgrade ¶
func Upgrade(lex Lexer, elide ...TokenType) (*PeekingLexer, error)
Upgrade a Lexer to a PeekingLexer with arbitrary lookahead.
"elide" is a slice of token types to elide from processing.
func (*PeekingLexer) FastForward ¶
func (p *PeekingLexer) FastForward(rawCursor RawCursor)
FastForward the internal cursors to this RawCursor position.
func (*PeekingLexer) LoadCheckpoint ¶
func (p *PeekingLexer) LoadCheckpoint(checkpoint Checkpoint)
func (*PeekingLexer) MakeCheckpoint ¶
func (p *PeekingLexer) MakeCheckpoint() Checkpoint
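Together, MakeCheckpoint and LoadCheckpoint allow cheap backtracking. A sketch:

cp := plex.MakeCheckpoint() // capture the current cursor state
if tok := plex.Next(); tok.Value != "(" {
	plex.LoadCheckpoint(cp) // no match: rewind to the checkpoint
}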
func (*PeekingLexer) Next ¶
func (p *PeekingLexer) Next() *Token
Next consumes and returns the next token.
func (*PeekingLexer) Peek ¶
func (p *PeekingLexer) Peek() *Token
Peek ahead at the next non-elided token.
func (*PeekingLexer) PeekAny ¶
func (p *PeekingLexer) PeekAny(match func(Token) bool) (t Token, rawCursor RawCursor)
PeekAny peeks forward over elided and non-elided tokens.
Elided tokens will be returned if they match, otherwise the next non-elided token will be returned.
The returned RawCursor position is the location of the returned token. Use FastForward to move the internal cursors forward.
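For example, to detect an otherwise-elided comment token (the symbol lookup and variable names are illustrative):

commentType := def.Symbols()["Comment"]
tok, raw := plex.PeekAny(func(t lexer.Token) bool {
	return t.Type == commentType
})
if tok.Type == commentType {
	plex.FastForward(raw) // advance the internal cursors to the match
}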
func (*PeekingLexer) Range ¶
func (p *PeekingLexer) Range(rawStart, rawEnd RawCursor) []Token
Range returns the slice of tokens between the two cursor points.
func (*PeekingLexer) RawPeek ¶
func (p *PeekingLexer) RawPeek() *Token
RawPeek peeks ahead at the next raw token.
Unlike Peek, this will include elided tokens.
type Position ¶
Position of a token.
func (Position) Add ¶ added in v2.1.0
Add returns a new Position that is the sum of this position and "pos".
This is useful when parsing values from a parent grammar.
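A sketch: offsetting token positions from a re-lexed string literal by the literal's position in the parent file (the field names and argument order are assumptions):

parent := lexer.Position{Filename: "main.tmpl", Line: 10, Column: 3}
inner := lexer.Position{Line: 1, Column: 5}
absolute := inner.Add(parent) // position of the inner token within the parent file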
type Rule ¶
type Rule struct {
	Name    string `json:"name"`
	Pattern string `json:"pattern"`
	Action  Action `json:"action"`
}
A Rule matching input and possibly changing state.
func Return ¶
func Return() Rule
Return to the parent state.
Useful as the last rule in a sub-state.
func (*Rule) MarshalJSON ¶
func (r *Rule) MarshalJSON() ([]byte, error)
func (*Rule) UnmarshalJSON ¶
func (r *Rule) UnmarshalJSON(data []byte) error
type RulesAction ¶
type RulesAction interface {
// contains filtered or unexported methods
}
RulesAction is an optional interface that Actions can implement.
It is applied during rule construction to mutate the rule map.
type SimpleRule ¶
SimpleRule is a named regular expression.
type StatefulDefinition ¶
type StatefulDefinition struct {
// contains filtered or unexported fields
}
StatefulDefinition is the stateful lexer's implementation of the Definition interface.
func MustSimple ¶
func MustSimple(rules []SimpleRule) *StatefulDefinition
MustSimple creates a new stateful lexer with only a single root state.
It panics if there is an error.
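For example (rule names and patterns are illustrative):

var def = lexer.MustSimple([]lexer.SimpleRule{
	{Name: "Comment", Pattern: `//[^\n]*`},
	{Name: "String", Pattern: `"(\\"|[^"])*"`},
	{Name: "Number", Pattern: `[-+]?(\d*\.)?\d+`},
	{Name: "Ident", Pattern: `[a-zA-Z_]\w*`},
	{Name: "Punct", Pattern: `[[:punct:]]`},
	{Name: "whitespace", Pattern: `\s+`}, // lowercase name: elided from output
})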
func MustStateful ¶
func MustStateful(rules Rules) *StatefulDefinition
MustStateful creates a new stateful lexer and panics if it is incorrect.
func New ¶
func New(rules Rules) (*StatefulDefinition, error)
New constructs a new stateful lexer from rules.
Example ¶
An example of parsing nested expressions within strings.
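The rule set interpolatedRules referenced below is defined in the package's example code and not shown here; a reconstruction consistent with the grammar (states String and Expr; tokens Escaped, Char, Oper, Ident) might look like:

var interpolatedRules = lexer.Rules{
	"Root": {
		{Name: "String", Pattern: `"`, Action: lexer.Push("String")},
	},
	"String": {
		{Name: "Escaped", Pattern: `\\.`},
		{Name: "StringEnd", Pattern: `"`, Action: lexer.Pop()},
		{Name: "Expr", Pattern: `\${`, Action: lexer.Push("Expr")},
		{Name: "Char", Pattern: `[^$"\\]+`},
	},
	"Expr": {
		lexer.Include("Root"),
		{Name: "whitespace", Pattern: `\s+`}, // lowercase: elided
		{Name: "Oper", Pattern: `[-+/*%]`},
		{Name: "Ident", Pattern: `\w+`},
		{Name: "ExprEnd", Pattern: `}`, Action: lexer.Pop()},
	},
}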
type Terminal struct {
	String *String ` @@`
	Ident  string  `| @Ident`
}

type Expr struct {
	Left  *Terminal `@@`
	Op    string    `( @Oper`
	Right *Terminal ` @@)?`
}

type Fragment struct {
	Escaped string `( @Escaped`
	Expr    *Expr  ` | "${" @@ "}"`
	Text    string ` | @Char)`
}

type String struct {
	Fragments []*Fragment `"\"" @@* "\""`
}

def, err := lexer.New(interpolatedRules)
if err != nil {
	log.Fatal(err)
}
parser, err := participle.Build[String](participle.Lexer(def))
if err != nil {
	log.Fatal(err)
}
actual, err := parser.ParseString("", `"hello ${user + "??"}"`)
if err != nil {
	log.Fatal(err)
}
repr.Println(actual)
Output:

&lexer_test.String{
  Fragments: []*lexer_test.Fragment{
    {
      Text: "hello ",
    },
    {
      Expr: &lexer_test.Expr{
        Left: &lexer_test.Terminal{
          Ident: "user",
        },
        Op: "+",
        Right: &lexer_test.Terminal{
          String: &lexer_test.String{
            Fragments: []*lexer_test.Fragment{
              {
                Text: "??",
              },
            },
          },
        },
      },
    },
  },
}
func NewSimple ¶
func NewSimple(rules []SimpleRule) (*StatefulDefinition, error)
NewSimple creates a new stateful lexer with only a single root state.
func (*StatefulDefinition) Lex ¶
func (d *StatefulDefinition) Lex(filename string, r io.Reader) (Lexer, error)
func (*StatefulDefinition) LexString ¶
func (d *StatefulDefinition) LexString(filename string, s string) (Lexer, error)
LexString is a fast-path implementation for lexing strings.
func (*StatefulDefinition) MarshalJSON ¶
func (d *StatefulDefinition) MarshalJSON() ([]byte, error)
func (*StatefulDefinition) Rules ¶
func (d *StatefulDefinition) Rules() Rules
Rules returns the user-provided Rules used to construct the lexer.
func (*StatefulDefinition) Symbols ¶
func (d *StatefulDefinition) Symbols() map[string]TokenType
type StatefulLexer ¶
type StatefulLexer struct {
// contains filtered or unexported fields
}
StatefulLexer is the Lexer implementation returned by a StatefulDefinition.
func (*StatefulLexer) Next ¶
func (l *StatefulLexer) Next() (Token, error)
type StringDefinition ¶
StringDefinition is an optional interface lexer Definitions can implement to offer a fast path for lexing strings.
type Token ¶
type Token struct {
	// Type of token. This is the value keyed by symbol as returned by Definition.Symbols().
	Type  TokenType
	Value string
	Pos   Position
}
A Token returned by a Lexer.
func ConsumeAll ¶
ConsumeAll reads all tokens from a Lexer.
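A sketch (assuming the conventional signature returning the token slice and an error, and the usual Position fields):

tokens, err := lexer.ConsumeAll(lex)
if err != nil {
	log.Fatal(err)
}
for _, tok := range tokens {
	fmt.Printf("%d:%d %q\n", tok.Pos.Line, tok.Pos.Column, tok.Value)
}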