lex


Documentation

Overview

Package lex provides lexing functionality for the ictiobus parser generator. It uses Go's built-in regexp engine (RE2) to match patterns in the input and supports multiple states and state swapping, although it does not retain any information about prior states.

All lexers provided by this package support four ways of handling a matched input pattern: lex the input and return a token of some class, change the lexer state to a new one, lex a token *and then* change the lexer state to a new one, or discard the matched text and continue from just after it.

Lexing is invoked by obtaining a Lexer and calling its Lex method. This returns a TokenStream that yields tokens lexed from the input each time its Next method is called. The TokenStream can then be passed on to further stages of input analysis, such as a parser.
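
For illustration, here is a minimal end-to-end sketch. It assumes the module path github.com/dekarrin/ictiobus/lex; the token class, patterns, input, and priority value 0 are made up for the example.

package main

import (
	"fmt"
	"strings"

	"github.com/dekarrin/ictiobus/lex"
)

func main() {
	lxr := lex.NewLexer(true) // lazy: tokens are lexed as Next is called

	tcWord := lex.NewTokenClass("word", "word")
	lxr.RegisterClass(tcWord, "") // "" is the default state

	// Whitespace is dropped; runs of letters become "word" tokens.
	lxr.AddPattern(`\s+`, lex.Discard(), "", 0)
	lxr.AddPattern(`[A-Za-z]+`, lex.LexAs(tcWord.ID()), "", 0)

	stream, err := lxr.Lex(strings.NewReader("hello lexer"))
	if err != nil {
		panic(err)
	}
	for stream.HasNext() {
		fmt.Println(stream.Next())
	}
}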

Index

Constants

This section is empty.

Variables

var (
	TokenUndefined = MakeDefaultClass("<ictiobus_undefined_token>")
	TokenError     = MakeDefaultClass("<ictiobus_error>")
	TokenEndOfText = NewTokenClass("$", "end of input")
)

Functions

func NewSyntaxErrorFromToken added in v0.8.0

func NewSyntaxErrorFromToken(msg string, tok Token) *syntaxerr.Error

NewSyntaxErrorFromToken uses the location information in the provided token to create a SyntaxError with a detailed message on the error and the source code which caused it.
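
A hypothetical use, where tok is a Token previously obtained from a TokenStream and "int" is an assumed token class ID:

// Report an unexpected token using the position info it carries.
if tok.Class().ID() != "int" {
	err := lex.NewSyntaxErrorFromToken("expected an integer literal", tok)
	fmt.Println(err.Error())
}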

Types

type Action

type Action struct {
	Type    ActionType
	ClassID string
	State   string
}

Action is an action for the lexer to take when it matches a defined regex pattern.

func Discard

func Discard() Action

Discard returns a lexer action that indicates that the lexer should take no further action, effectively discarding the text it matched.

func LexAndSwapState

func LexAndSwapState(classID string, newState string) Action

LexAndSwapState returns a lexer action that indicates that the lexer should take the source text that it matched against and lex it as a token of the given token class, and then it should swap to the new state.

func LexAs

func LexAs(classID string) Action

LexAs returns a lexer action that indicates that the lexer should take the source text that it matched against and lex it as a token of the given token class.

func SwapState

func SwapState(toState string) Action

SwapState returns a lexer action that indicates that the lexer should swap to the given state.
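
The four constructors above correspond to the four handlings described in the overview. A hypothetical setup that uses all of them; the state name "string", the patterns, and priority 0 are illustrative assumptions, and error returns from AddPattern are elided for brevity:

lxr := lex.NewLexer(true)

tcInt := lex.NewTokenClass("int", "integer literal")
tcStr := lex.NewTokenClass("str", "string literal")
lxr.RegisterClass(tcInt, "")       // available in the default state
lxr.RegisterClass(tcStr, "string") // available in a custom "string" state

lxr.AddPattern(`\s+`, lex.Discard(), "", 0)         // discard matched text
lxr.AddPattern(`\d+`, lex.LexAs(tcInt.ID()), "", 0) // lex a token
lxr.AddPattern(`"`, lex.SwapState("string"), "", 0) // change state only
lxr.AddPattern(`[^"]*"`, lex.LexAndSwapState(tcStr.ID(), ""), "string", 0) // lex, then swap back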

type ActionType

type ActionType int

ActionType is a type of action that the lexer can take.

const (
	ActionNone ActionType = iota
	ActionScan
	ActionState
	ActionScanAndState
)

type Lexer added in v0.8.0

type Lexer interface {

	// Lex returns a token stream. The tokens may be lexed in a lazy fashion or
	// an immediate fashion; if it is immediate, errors will be returned at that
	// point. If it is lazy, then error token productions will be returned to
	// the callers of the returned TokenStream at the point where the error
	// occurred.
	Lex(input io.Reader) (TokenStream, error)

	// RegisterClass registers a token class for use in some state of the Lexer.
	// Token classes must be registered before they can be used.
	RegisterClass(cl TokenClass, forState string)

	// AddPattern adds a new pattern for the lexer to recognize.
	AddPattern(pat string, action Action, forState string, priority int) error

	// FakeLexemeProducer returns a map of token IDs to functions that will produce
	// a lexable value for that ID. As some token classes may have multiple ways of
	// lexing depending on the state, either state must be selected or combine must
	// be set to true.
	//
	// If combine is true, then state is ignored and all states' regexes for that ID
	// are combined into a single function that will alternate between them. If
	// combine is false, then state must be set and only the regexes for that state
	// are used to produce a lexable value.
	//
	// This can be useful for testing but may not produce useful values for all
	// token classes, especially those that have particularly complicated lexing
	// rules. If a caller finds that one of the functions in the map produced by
	// FakeLexemeProducer does not produce a lexable value, then it can be replaced
	// manually by replacing that entry in the map with a custom function.
	FakeLexemeProducer(combine bool, state string) map[string]func() string

	// SetStartingState sets the initial state of the lexer. If not set, the
	// starting state will be the default state.
	SetStartingState(s string)

	// StartingState returns the initial state of the lexer. If one wasn't set, this
	// will be the default state, "".
	StartingState() string

	// RegisterTraceListener provides a function to call whenever a new token is
	// lexed. It can be used for debug purposes.
	RegisterTraceListener(func(t Token))
}

A Lexer represents a lexing engine, either still being configured or fully built and ready for use. It can be stored as a byte representation and retrieved from bytes as well.
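
As a sketch of using FakeLexemeProducer to generate test input, assuming lxr is an already-configured Lexer and "int" is a registered token class ID:

makers := lxr.FakeLexemeProducer(true, "") // combine all states' patterns
if mk, ok := makers["int"]; ok {
	fmt.Println(mk()) // prints a string that should lex as an "int" token
}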

func NewLexer

func NewLexer(lazy bool) Lexer

NewLexer creates a new Lexer that performs lexing in a lazy or immediate fashion as specified by lazy.
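
For example (the trace listener shown is optional and purely for debugging):

lazyLexer := lex.NewLexer(true)   // errors surface via the TokenStream
eagerLexer := lex.NewLexer(false) // Lex itself reports errors up front

eagerLexer.RegisterTraceListener(func(t lex.Token) {
	fmt.Println("lexed:", t.String())
})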

type Token added in v0.8.0

type Token interface {
	// Class returns the TokenClass of the Token.
	Class() TokenClass

	// Lexeme returns the text that was lexed as the TokenClass of the Token, as
	// it appears in the source text.
	Lexeme() string

	// LinePos returns the 1-indexed character position within its line at
	// which the token appears in the source text.
	LinePos() int

	// Line returns the 1-indexed line number of the line that the token appears
	// on in the source text.
	Line() int

	// FullLine returns the full text of the line in the source that the token
	// appears on, including anything that comes before and after the token on
	// that line.
	FullLine() string

	// String is the string representation.
	String() string
}

Token is a lexeme read from source text, combined with its token class and supplementary information gathered during lexing to aid error reporting.

func NewToken added in v0.3.0

func NewToken(class TokenClass, lexed string, linePos int, lineNum int, line string) Token
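
NewToken creates a Token with the given class, lexeme, and position information. A hypothetical construction, e.g. for use in tests; all values are made up:

tc := lex.NewTokenClass("id", "identifier")
tok := lex.NewToken(tc, "foo", 5, 2, "    foo := bar()")
fmt.Println(tok.Line(), tok.LinePos(), tok.Lexeme()) // 2 5 foo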

type TokenClass added in v0.8.0

type TokenClass interface {
	// ID returns the ID of the token class. The ID must uniquely identify the
	// token within all terminals of a grammar.
	ID() string

	// Human returns a human-readable name for the token class, for use in
	// contexts such as error reporting.
	Human() string

	// Equal returns whether the TokenClass equals another. If two IDs are the
	// same, Equal must return true. TODO: can't we replace all uses with a call
	// to ID() then? Check this once the move is done.
	Equal(o any) bool
}

TokenClass is the class of a token in ictiobus compiler frontends. It is how tokens are represented in a grammar, and it can be considered the 'type' of a lexed token.

func MakeDefaultClass added in v0.8.0

func MakeDefaultClass(s string) TokenClass

MakeDefaultClass takes a string and returns a TokenClass that uses the lower-case version of the string as its ID and the unmodified string as its human-readable name.

func NewTokenClass

func NewTokenClass(id string, human string) TokenClass

NewTokenClass creates a new token class with the given ID and human-readable name.
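
For instance (the expected outputs in the comments follow from the descriptions above):

kw := lex.MakeDefaultClass("IF") // ID "if", Human "IF"
num := lex.NewTokenClass("num", "number literal")
fmt.Println(kw.ID(), kw.Human())   // if IF
fmt.Println(num.ID(), num.Human()) // num number literal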

type TokenStream added in v0.8.0

type TokenStream interface {
	// Next returns the next token in the stream and advances the stream by one
	// token.
	Next() Token

	// Peek returns the next token in the stream without advancing the stream.
	Peek() Token

	// HasNext returns whether the stream has any additional tokens.
	HasNext() bool
}

TokenStream is a stream of tokens read from source text. The stream may be lazily loaded or immediately available.
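
A hypothetical consumption loop, assuming stream was returned by a Lexer's Lex method and that the stream ends with a TokenEndOfText token (an assumption made for this sketch):

for stream.HasNext() {
	// Peek lets us inspect the next token without consuming it.
	if stream.Peek().Class().Equal(lex.TokenEndOfText) {
		break
	}
	tok := stream.Next()
	fmt.Printf("%s: %q\n", tok.Class().ID(), tok.Lexeme())
}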
