Documentation ¶
Overview ¶
Package lexer provides a simple scanner and types for hand-rolling lexers. The implementation is based on Rob Pike's talk.
http://www.youtube.com/watch?v=HxaD_trXwRE
There are some key differences from the code Pike presented. The method he calls next has been renamed Advance, which pairs more naturally with Backup. The name Next is used instead for the method the parser calls to retrieve items from the lexer.
Two APIs ¶
The Lexer type has two APIs: one is used by StateFn types, the other is called by the parser. They are referred to here as the scanner API and the parser API.
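The split between the two APIs can be sketched with a small, self-contained program. Everything in it (token, scanner, stateFn, emit, next, scanByte, collect) is a hypothetical stand-in written for illustration, not the package's actual Lexer, and it assumes a simple queue between the two sides rather than whatever mechanism the package uses.

```go
package main

import "fmt"

// token is a hypothetical stand-in for the package's Item type.
type token struct {
	kind  string
	value string
}

// stateFn mirrors the shape of a StateFn: it scans some input and
// returns the state to run next, or nil when scanning is finished.
type stateFn func(*scanner) stateFn

// scanner is a simplified, hypothetical lexer used only in this sketch.
type scanner struct {
	input      string
	start, pos int
	state      stateFn
	queue      []token
}

// emit is the scanner-side call: it queues the bytes between start
// and pos as a token and begins a new lexeme.
func (s *scanner) emit(kind string) {
	s.queue = append(s.queue, token{kind, s.input[s.start:s.pos]})
	s.start = s.pos
}

// next is the parser-side call: it runs state functions until a
// token is available, and reports EOF once the state becomes nil.
func (s *scanner) next() token {
	for len(s.queue) == 0 {
		if s.state == nil {
			return token{kind: "EOF"}
		}
		s.state = s.state(s)
	}
	t := s.queue[0]
	s.queue = s.queue[1:]
	return t
}

// scanByte is a state function that emits each input byte as its own token.
func scanByte(s *scanner) stateFn {
	if s.pos >= len(s.input) {
		return nil
	}
	s.pos++
	s.emit("BYTE")
	return scanByte
}

// collect plays the role of the parser: it calls next until EOF and
// returns the token values in order.
func collect(input string) []string {
	s := &scanner{input: input, state: scanByte}
	var out []string
	for t := s.next(); t.kind != "EOF"; t = s.next() {
		out = append(out, t.value)
	}
	return out
}

func main() {
	fmt.Println(collect("ab"))
}
```

Only scanByte touches the position and calls emit, and only collect calls next, which is the separation the two APIs are meant to enforce.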
The parser API ¶
The only method the parser calls on the lexer is Next, which retrieves the next token from the input stream. Eventually an item with type ItemEOF is returned, at which point there are no more tokens in the stream.
The scanner API ¶
The lexer uses Emit to construct complete lexemes to return from future/concurrent calls to Next by the parser. The scanner uses a combination of methods to manipulate its position and prepare lexemes to be emitted. Lexer errors are emitted to the parser using the Errorf method, which keeps the scanner-parser interface uniform.
Common lexer methods used in a scanner are the Accept[Run][Range] family of methods. Accept* methods take a set and advance the lexer if incoming runes are in the set. The AcceptRun* subfamily advances the lexer as far as possible.
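The difference between accepting one rune and accepting a run can be sketched as follows. The cursor type and its accept and acceptRun methods are hypothetical illustrations built on the standard library; in particular, returning a count of runes from acceptRun is an assumption, since the meaning of the package's n return value is not stated here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// cursor is a hypothetical stand-in for the lexer's position state.
type cursor struct {
	input string
	pos   int
}

// accept advances past one rune if that rune appears in valid.
func (c *cursor) accept(valid string) bool {
	r, n := utf8.DecodeRuneInString(c.input[c.pos:])
	if n == 0 || !strings.ContainsRune(valid, r) {
		return false
	}
	c.pos += n
	return true
}

// acceptRun advances as far as possible and returns the number of
// runes accepted (an assumption made for this sketch).
func (c *cursor) acceptRun(valid string) int {
	n := 0
	for c.accept(valid) {
		n++
	}
	return n
}

func main() {
	c := &cursor{input: "123abc"}
	fmt.Println(c.acceptRun("0123456789"), c.pos) // prints "3 3"
	fmt.Println(c.accept("0123456789"), c.pos)    // prints "false 3"
}
```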
For scanning known sequences of bytes (e.g. keywords) the AcceptString method avoids a lot of branching that would be incurred using methods that match character classes.
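A keyword match of this kind amounts to a single prefix comparison. The sketch below shows the idea with a hypothetical free function acceptString that takes the input and position explicitly; it is not the package's method, only an illustration of the all-or-nothing behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptString returns the position after s and true if the input at
// pos begins with s. Otherwise it returns pos unchanged and false.
func acceptString(input string, pos int, s string) (int, bool) {
	if !strings.HasPrefix(input[pos:], s) {
		return pos, false
	}
	return pos + len(s), true
}

func main() {
	input := "func main"
	fmt.Println(acceptString(input, 0, "func")) // prints "4 true"
	fmt.Println(acceptString(input, 0, "for"))  // prints "0 false"
}
```

A partial match such as "for" against "func" leaves the position untouched, so no backing up is needed after a failed keyword test.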
The remaining methods provide low-level functionality that can be combined to address corner cases.
Index ¶
- Constants
- func IsEOF(c rune, n int) bool
- func IsInvalid(c rune, n int) bool
- type Error
- type Item
- type ItemType
- type Lexer
- func (l *Lexer) Accept(valid string) (ok bool)
- func (l *Lexer) AcceptFunc(fn func(rune) bool) (ok bool)
- func (l *Lexer) AcceptRange(tab *unicode.RangeTable) (ok bool)
- func (l *Lexer) AcceptRun(valid string) (n int)
- func (l *Lexer) AcceptRunFunc(fn func(rune) bool) int
- func (l *Lexer) AcceptRunRange(tab *unicode.RangeTable) (n int)
- func (l *Lexer) AcceptString(s string) (ok bool)
- func (l *Lexer) Advance() (rune, int)
- func (l *Lexer) Backup()
- func (l *Lexer) Current() string
- func (l *Lexer) Emit(t ItemType)
- func (l *Lexer) Errorf(format string, vs ...interface{}) StateFn
- func (l *Lexer) Ignore()
- func (l *Lexer) Input() string
- func (l *Lexer) Last() (r rune, width int)
- func (l *Lexer) Next() (i *Item)
- func (l *Lexer) Peek() (rune, int)
- func (l *Lexer) Pos() int
- func (l *Lexer) Start() int
- type StateFn
Examples ¶
Constants ¶
const EOF rune = 0x04
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Item ¶
An individual scanned item (a lexeme).
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer contains an input string and the state associated with lexing that input.
Example (Advance) ¶
This example shows a trivial parser using Advance, the lowest level lexer function. The parser decodes a serialization format for test status messages generated by a hypothetical test suite. The rune '.' is translated using the format "%d success", the rune '!' is translated using "%d failure".
// declare token types as constants
const (
	itemOK lexer.ItemType = iota
	itemFail
)

// create a StateFn to parse the language.
var start lexer.StateFn
start = func(lex *lexer.Lexer) lexer.StateFn {
	c, n := lex.Advance()
	if lexer.IsEOF(c, n) {
		return nil
	}
	if lexer.IsInvalid(c, n) {
		return lex.Errorf("invalid utf-8 rune")
	}
	switch c {
	case '.':
		lex.Emit(itemOK)
	case '!':
		lex.Emit(itemFail)
	default:
		// lex.Backup() does not need to be called even though lex.Pos()
		// points at the next rune. The position of the error is the start
		// of the current lexeme (in this case the unexpected rune we just
		// read).
		return lex.Errorf("unexpected rune %q", c)
	}
	return start
}

// create a parser for the language.
parse := func(input string) ([]string, error) {
	lex := lexer.New(start, input)
	var status []string
	for {
		item := lex.Next()
		err := item.Err()
		if err != nil {
			return nil, fmt.Errorf("%v (pos %d)", err, item.Pos)
		}
		switch item.Type {
		case lexer.ItemEOF:
			return status, nil
		case itemOK:
			status = append(status, fmt.Sprintf("%d success", item.Pos))
		case itemFail:
			status = append(status, fmt.Sprintf("%d failure", item.Pos))
		default:
			panic(fmt.Sprintf("unexpected item %0x (pos %d)", item.Type, item.Pos))
		}
	}
}

// parse a valid string and print the status
status, err := parse(".!")
fmt.Printf("%q %v\n", status, err)

// parse an invalid string and print the error
status, err = parse("!.!?.")
fmt.Printf("%q %v\n", status, err)
Output:
["0 success" "1 failure"] <nil>
[] unexpected rune '?' (pos 3)
func (*Lexer) AcceptFunc ¶
AcceptFunc advances the lexer if fn returns true for the next rune.
func (*Lexer) AcceptRange ¶
func (l *Lexer) AcceptRange(tab *unicode.RangeTable) (ok bool)
AcceptRange advances l's position if the current rune is in tab.
func (*Lexer) AcceptRunFunc ¶
AcceptRunFunc advances l's position as long as fn returns true for the next input rune.
func (*Lexer) AcceptRunRange ¶
func (l *Lexer) AcceptRunRange(tab *unicode.RangeTable) (n int)
AcceptRunRange advances l's position as long as the current rune is in tab.
func (*Lexer) AcceptString ¶
AcceptString advances the lexer len(s) bytes if the next len(s) bytes equal s. AcceptString returns true if l advanced.
func (*Lexer) Advance ¶
Advance adds one rune of input to the current lexeme, increments the lexer's position, and returns the input rune with its size in bytes (encoded as UTF-8). Invalid UTF-8 codepoints cause the current call and all subsequent calls to return (utf8.RuneError, 1). If there is no input the returned size is zero.
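The size conventions described above match those of utf8.DecodeRuneInString in the standard library, which returns (utf8.RuneError, 1) for an invalid encoding and (utf8.RuneError, 0) for empty input. The sketch below classifies a decoded rune on that basis. The helpers isEOF, isInvalid, and classify are hypothetical; they only guess at how checks shaped like IsEOF(c, n) and IsInvalid(c, n) could be written, and they do not model the package's EOF constant.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isEOF reports whether a decode result indicates exhausted input.
// Assumption: a size of zero signals that nothing was left to read.
func isEOF(r rune, n int) bool {
	return n == 0
}

// isInvalid reports whether a decode result indicates bad UTF-8.
// A correctly encoded U+FFFD is three bytes wide, so a width of one
// distinguishes a decoding failure from the real replacement character.
func isInvalid(r rune, n int) bool {
	return r == utf8.RuneError && n == 1
}

// classify decodes the first rune of s and names the outcome.
func classify(s string) string {
	r, n := utf8.DecodeRuneInString(s)
	switch {
	case isEOF(r, n):
		return "eof"
	case isInvalid(r, n):
		return "invalid"
	default:
		return "rune"
	}
}

func main() {
	fmt.Println(classify("\u00e9")) // prints "rune"
	fmt.Println(classify("\xff"))   // prints "invalid"
	fmt.Println(classify(""))       // prints "eof"
}
```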
func (*Lexer) Backup ¶
func (l *Lexer) Backup()
Backup removes the last rune from the current lexeme and moves l's position back in the input string accordingly. Backup should only be called after a call to Advance.
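One common way to support a single step of backup is to remember the width of the most recently read rune. The sketch below illustrates that approach with a hypothetical cursor type; it is an assumption about how such a method could work, not a description of the package's internals.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// cursor is a hypothetical stand-in for the lexer's position state.
type cursor struct {
	input string
	pos   int
	width int // size in bytes of the rune read by the last advance
}

// advance reads one rune, moves past it, and records its width.
func (c *cursor) advance() (rune, int) {
	r, n := utf8.DecodeRuneInString(c.input[c.pos:])
	c.width = n
	c.pos += n
	return r, n
}

// backup undoes the last advance. The width is cleared afterwards, so
// a second backup without an intervening advance does nothing.
func (c *cursor) backup() {
	c.pos -= c.width
	c.width = 0
}

func main() {
	c := &cursor{input: "h\u00e9llo"}
	c.advance()        // reads 'h', one byte
	c.advance()        // reads the two-byte second letter
	fmt.Println(c.pos) // prints "3"
	c.backup()
	fmt.Println(c.pos) // prints "1"
}
```

Because only one width is stored, this design can undo exactly one rune, which is consistent with the rule that Backup should follow a call to Advance.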
func (*Lexer) Errorf ¶
Errorf causes an error item to be emitted from l.Next(). The item's value (and its error message) are the result of evaluating format and vs with fmt.Sprintf.
func (*Lexer) Next ¶
Next is the method by which items are extracted from the input. It returns nil if the lexer has entered a nil state.
func (*Lexer) Peek ¶
Peek returns the next rune in the input stream without adding it to the current lexeme.