lexer

package
v0.0.0-...-9e081f2
Published: Mar 8, 2024 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package lexer defines a lexical analyzer.

Index

Constants

View Source
const (
	// ErrorTokenType is the type for fake tokens capturing broken lexemes (e.g. incorrect string literals).
	// The purpose of these tokens is to generate more informative error messages.
	// Lexer will never return a token of this type, an error with message containing token text will be returned instead.
	ErrorTokenType = LowestTokenType - 1

	// ErrorTokenName is the type name for ErrorTokenType.
	ErrorTokenName = "-error-"
)
View Source
const (
	// WrongCharError indicates that lexer cannot fetch any token at current position.
	// Error message contains the rune at current source position.
	WrongCharError = llx.LexicalErrors + iota

	// BadTokenError indicates that lexer has fetched a token of ErrorTokenType.
	BadTokenError
)

Error codes used by the lexer.

View Source
const (
	// EofTokenType is a fake token indicating the end of source file.
	// Line and column (if present) mark the position right after the last rune of source file.
	EofTokenType = -2

	// EofTokenName is the type name for EofTokenType
	EofTokenName = "-end-of-file-"

	// EoiTokenType is a fake token indicating absence of queued sources (i.e. all sources are processed).
	EoiTokenType = -3

	// EoiTokenName is the type name for EoiTokenType
	EoiTokenName = "-end-of-input-"

	LowestTokenType = -3
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer performs lexical analysis of the current source in a source.Queue using a regexp.Regexp. Lexer itself is immutable, stateless, and safe for concurrent use (i.e. the same Lexer instance may be used with different queues by different goroutines), but it does affect queue state. Each token type that may be returned by the lexer maps to its own regexp capturing group index. A match containing no captured groups is treated as an insignificant lexeme (e.g. whitespace); in this case the lexer tries to fetch a token again at the new position. Every byte of the source file must belong to some lexeme.

func New

func New(re *regexp.Regexp, types []TokenType) *Lexer

New creates a new Lexer. The element of types at index n describes the token type for the (n+1)-th regexp capturing group. A group that has no description is treated as ErrorTokenType.

func (*Lexer) Next

func (l *Lexer) Next(q *source.Queue) (*Token, error)

Next fetches the token starting at the current source position and advances that position. On a lexical error it returns a nil token and an llx.Error and makes no changes. Returns an EoI token if the queue is empty. Returns an EoF token and discards the current source if the current position is beyond the end of the current source.

func (*Lexer) Shrink

func (l *Lexer) Shrink(q *source.Queue, tok *Token) *Token

Shrink tries to fetch a token that starts at the same position as the given one and is at least one byte shorter. On success it adjusts the current position and returns the shrunk token. It makes no changes and returns nil if the given token has no captured source and position information, was fetched from a source other than the current one, or a lexical error occurs.

type Token

type Token struct {
	// contains filtered or unexported fields
}

Token represents a lexeme, either fetched from a source file or an "external" one. It contains the token type, the text, and the source and starting position (if known). Immutable.

func EofToken

func EofToken(s *source.Source) *Token

EofToken creates a token of EofTokenType. s may be nil.

func EoiToken

func EoiToken() *Token

EoiToken creates a token of EoiTokenType.

func NewToken

func NewToken(tokenType int, typeName string, content []byte, sp source.Pos) *Token

NewToken creates a token. Expects the zero value for sp if the token source is not known.

func (*Token) Col

func (t *Token) Col() int

Col returns 1-based column number of the first byte of the token. Returns 0 if source is not known.

func (*Token) Content

func (t *Token) Content() []byte

Content returns token content.

func (*Token) Line

func (t *Token) Line() int

Line returns 1-based line number of the first byte of the token. Returns 0 if source is not known.

func (*Token) Pos

func (t *Token) Pos() source.Pos

Pos returns captured source position.

func (*Token) Source

func (t *Token) Source() *source.Source

Source returns captured source. Returns nil if source is not known.

func (*Token) SourceName

func (t *Token) SourceName() string

SourceName returns source file name. Returns empty string if source is not known.

func (*Token) Text

func (t *Token) Text() string

Text returns the lexeme body converted to a string. The conversion occurs on the first call; the resulting string is stored and reused to minimize the number of allocations.

func (*Token) Type

func (t *Token) Type() int

Type returns token type.

func (*Token) TypeName

func (t *Token) TypeName() string

TypeName returns token type name.

type TokenType

type TokenType struct {
	// Type contains the token type; it may be any value, but ErrorTokenType is treated specially.
	Type int

	// TypeName contains the token type name; it may be any value.
	TypeName string
}

TokenType describes the token type for a specific capturing group of the regular expression.
