Documentation ¶
Overview ¶
Package lexer implements a simple lexing toolkit.
Index ¶
- Constants
- type Channel
- type Iterator
- type LexInner
- func (l *LexInner) Accept(valid string) bool
- func (l *LexInner) AcceptRun(valid string) (acceptnum int)
- func (l *LexInner) Back()
- func (l *LexInner) Bytes(number int) bool
- func (l *LexInner) Emit(typ TokenType)
- func (l *LexInner) EmitEof() StateFn
- func (l *LexInner) EmitString(typ TokenType, str string)
- func (l *LexInner) Eof() bool
- func (l *LexInner) Errorf(format string, args ...interface{}) StateFn
- func (l *LexInner) Except(valid string) bool
- func (l *LexInner) ExceptRun(valid string) (acceptnum int)
- func (l *LexInner) Find(valid string) bool
- func (l *LexInner) Get() string
- func (l *LexInner) Ignore()
- func (l *LexInner) Last() rune
- func (l *LexInner) Len() int
- func (l *LexInner) Mark() Mark
- func (l *LexInner) Next() (char rune)
- func (l *LexInner) One(f func(rune) bool) bool
- func (l *LexInner) Peek() rune
- func (l *LexInner) Replace(start Mark, with string)
- func (l *LexInner) ReplaceGet() string
- func (l *LexInner) Retry()
- func (l *LexInner) Run(f func(rune) bool) (acceptnum int)
- func (l *LexInner) Skip(n int) int
- func (l *LexInner) String(valid string) bool
- func (l *LexInner) Unmark(mark Mark)
- func (l *LexInner) Warningf(format string, args ...interface{})
- func (l *LexInner) Whitespace(except string) (acceptnum int)
- type Lexer
- type Mark
- type Replacer
- type StateFn
- type Token
- type TokenType
Examples ¶
Constants ¶
const Eof rune = -1
This is returned by Next when there are no more characters to read.
const Err rune = utf8.RuneError
This is returned when a bad rune is encountered.
const MaxEmitsInFunction = 10
The maximum number of emits in a single state function when using Token. If this number has been reached, Token returns a StateError. If you wish to emit more than this, use the Go method to read tokens off the channel directly.
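A minimal sketch of reading tokens off the channel directly; the input string and the state function stateStart are placeholders supplied by user code, not part of the package:

// Sketch only: stateStart is a user-written StateFn, input is the text to lex.
l := lexer.New("myfile", input, stateStart)
for tok := range l.Go() {
	fmt.Printf("%d %q\n", tok.Typ, tok.Val)
}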
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Iterator ¶
type Iterator struct {
// contains filtered or unexported fields
}
Generates tokens synchronously. See Lexer.Iterate.
type LexInner ¶
type LexInner struct {
// contains filtered or unexported fields
}
LexInner is the inner type which is used within StateFn to do the actual lexing.
func (*LexInner) Accept ¶
Read one character, but only if it is one of the characters in the given string.
func (*LexInner) AcceptRun ¶
Read as many characters as possible, but only characters that exist in the given string.
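As a rough sketch of how Accept and AcceptRun combine, a hypothetical state function (tokenNumber and stateBase are user-defined names, not part of the package) could lex an optionally signed integer:

// Sketch only: tokenNumber and stateBase are assumed to be defined by user code.
func stateNumber(l *lexer.LexInner) lexer.StateFn {
	l.Accept("+-") // optionally accept a single sign character
	if l.AcceptRun("0123456789") == 0 {
		return l.Errorf("expected at least one digit")
	}
	l.Emit(tokenNumber)
	return stateBase
}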
func (*LexInner) Back ¶
func (l *LexInner) Back()
Undo the last Next. This probably won't work after calling any other lexer functions. If you need to undo more, use Mark and Unmark.
func (*LexInner) Bytes ¶ added in v0.2.5
Consume the given number of bytes. Returns true if successful, false if there are not enough bytes.
func (*LexInner) Emit ¶
Emit the gathered token, given its type. Emits the result of ReplaceGet, then calls Ignore.
func (*LexInner) EmitString ¶
Emit a token with the given type and string.
func (*LexInner) Except ¶
Read one character, but only if it is NOT one of the characters in the given string. If Eof or Err is reached, Except fails regardless of what the given string is.
func (*LexInner) ExceptRun ¶
Read as many characters as possible, but only characters that do NOT exist in the given string. If Eof is reached, ExceptRun stops as though it found a successful character. Thus, ExceptRun("") accepts everything until Eof or Err.
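For illustration, a sketch of gathering everything up to a delimiter with ExceptRun (tokenWord and stateBase are hypothetical user-defined names):

// Sketch only: tokenWord and stateBase are assumed to be defined by user code.
func stateWord(l *lexer.LexInner) lexer.StateFn {
	if l.ExceptRun(" \t\n") == 0 { // read up to the next whitespace character
		return l.Errorf("expected a word")
	}
	l.Emit(tokenWord)
	return stateBase
}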
func (*LexInner) Find ¶
Accepts characters until the first occurrence of the given string. The string itself is not accepted.
func (*LexInner) Ignore ¶
func (l *LexInner) Ignore()
Ignore everything gathered about the token so far. Also removes any Replaces.
func (*LexInner) Next ¶
Read a single character. If there are no more characters, it will return Eof. If a non-utf8 character is read, it will return Err.
func (*LexInner) One ¶
Accept a single character and return true if f returns true. Otherwise, do nothing and return false.
func (*LexInner) Replace ¶ added in v0.2.2
Replace the text from the start Mark to the current position with the given string. The replacement string with may be a different length than the text being replaced, but this change will not be reflected by functions like Len and Get. Call ReplaceGet to get the token including its replaces; this is how it will be sent by Emit. The replace is part of the current Mark, so Unmarking to a point before a replace was done will remove the replace.
func (*LexInner) ReplaceGet ¶ added in v0.2.2
Get the current token with all replaces included. This can be expensive if you have many replaces. Without any replaces, it is identical to Get.
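A rough sketch of how Replace might be used to normalize an escape sequence while lexing; stateString is a placeholder state and only "\n" is handled:

// Sketch only: stateString is assumed to be defined by user code.
func stateEscape(l *lexer.LexInner) lexer.StateFn {
	start := l.Mark() // remember where the escape sequence begins
	if l.Accept("\\") && l.Accept("n") {
		l.Replace(start, "\n") // the emitted token will contain a real newline
	}
	return stateString
}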
func (*LexInner) Run ¶
Reads characters and feeds them to the given function, and keeps reading until it returns false.
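For instance, Run pairs naturally with the unicode predicates; in this sketch tokenIdent and stateBase are hypothetical user-defined names:

// Sketch only: tokenIdent and stateBase are assumed to be defined by user code.
func stateIdent(l *lexer.LexInner) lexer.StateFn {
	if l.Run(unicode.IsLetter) == 0 {
		return l.Errorf("expected an identifier")
	}
	l.Emit(tokenIdent)
	return stateBase
}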
func (*LexInner) Skip ¶
Read n characters. Returns the number of characters read. If it returns less than n, it will have reached Eof.
func (*LexInner) String ¶
Attempt to read a string. Only if the entire string is successfully accepted does it return true. If only a part of the string was matched, none of it is consumed.
func (*LexInner) Whitespace ¶
Accepts any whitespace (unicode.IsSpace), except for whitespace in except. For instance, Whitespace("\n") will accept all whitespace except newlines. Returns the number of runes read.
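As a sketch, a line-oriented grammar might skip blanks and tabs but keep newlines significant; tokenNewline and stateBase are hypothetical user-defined names:

// Sketch only: tokenNewline and stateBase are assumed to be defined by user code.
func stateSpace(l *lexer.LexInner) lexer.StateFn {
	l.Whitespace("\n") // accept all whitespace except newlines
	l.Ignore()         // discard the skipped whitespace
	if l.Accept("\n") {
		l.Emit(tokenNewline) // newlines are significant in this hypothetical grammar
	}
	return stateBase
}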
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer is the external type which emits tokens.
Example ¶
package main

import (
	"fmt"
	"unicode"

	"github.com/PieterD/lexer"
)

const (
	tokenComment lexer.TokenType = 1 + iota
	tokenVariable
	tokenAssign
	tokenNumber
	tokenString
)

func main() {
	text := `
/* comment */
pie=314
// comment
string = "Hello world!"
`
	l := lexer.New("filename", text, state_base)
	tokenchan := l.Go()
	for token := range tokenchan {
		fmt.Printf("%s:%d [%d]\"%s\"\n", token.File, token.Line, token.Typ, token.Val)
	}
}

// Start parsing with this.
func state_base(l *lexer.LexInner) lexer.StateFn {
	// Ignore all whitespace.
	l.Run(unicode.IsSpace)
	l.Ignore()
	if l.String("//") {
		// We're remembering the '//' here so it gets included in the Emit
		// contained in state_comment_line.
		return state_comment_line
	}
	if l.String("/*") {
		return state_comment_block(state_base)
	}
	if l.Eof() {
		return l.EmitEof()
	}
	// It's not a comment or Eof, so it must be a variable name.
	return state_variable
}

// Parse a line comment.
func state_comment_line(l *lexer.LexInner) lexer.StateFn {
	// Eat up everything until end of line (or Eof)
	l.ExceptRun("\n")
	l.Emit(tokenComment)
	// Consume the end of line. If we reached Eof, this does nothing.
	l.Accept("\n")
	// Ignore that last newline
	l.Ignore()
	return state_base
}

// Parse a block comment.
// Since block comments may appear in different states,
// instead of defining the usual StateFn we define a function that
// returns a statefn, which in turn will return the parent state
// after its parsing is done.
func state_comment_block(parent lexer.StateFn) lexer.StateFn {
	return func(l *lexer.LexInner) lexer.StateFn {
		if !l.Find("*/") {
			// If closing statement couldn't be found, emit an error.
			// Errorf always returns nil, so parsing is done after this.
			return l.Errorf("Couldn't find end of block comment")
		}
		l.String("*/")
		l.Emit(tokenComment)
		return parent
	}
}

// Parse a variable name
func state_variable(l *lexer.LexInner) lexer.StateFn {
	if l.AcceptRun("abcdefghijklmnopqrstuvwxyz") == 0 {
		return l.Errorf("Invalid variable name")
	}
	l.Emit(tokenVariable)
	return state_operator
}

// Parse an assignment operator
func state_operator(l *lexer.LexInner) lexer.StateFn {
	l.Run(unicode.IsSpace)
	l.Ignore()
	if l.Accept("=") {
		l.Emit(tokenAssign)
		return state_value
	}
	return l.Errorf("Only '=' is a valid operator")
}

// Parse a value
func state_value(l *lexer.LexInner) lexer.StateFn {
	l.Run(unicode.IsSpace)
	l.Ignore()
	if l.AcceptRun("0123456789") > 0 {
		l.Emit(tokenNumber)
		return state_base
	}
	if l.Accept("\"") {
		return state_string
	}
	return l.Errorf("Unidentified value")
}

// Parse a string
func state_string(l *lexer.LexInner) lexer.StateFn {
	for {
		l.ExceptRun("\"\\")
		// Now we're either at a ", a \, or Eof.
		if l.Accept("\"") {
			l.Emit(tokenString)
			return state_base
		}
		if l.Accept("\\") {
			if !l.Accept("nrt\"'\\") {
				return l.Errorf("Invalid escape sequence: \"\\%c\"", l.Last())
			}
		}
		if l.Eof() {
			return l.Errorf("No closing '\"' found")
		}
	}
}
Output:

filename:2 [1]"/* comment */"
filename:3 [2]"pie"
filename:3 [3]"="
filename:3 [4]"314"
filename:4 [1]"// comment"
filename:5 [2]"string"
filename:5 [3]"="
filename:5 [5]""Hello world!""
filename:5 [-3]"EOF"
func (*Lexer) Go ¶
Spawn a goroutine which keeps sending tokens on the returned channel, until TokenEmpty would be encountered. If Go or Iterate has already been called, it will return nil.
type Mark ¶
type Mark struct {
// contains filtered or unexported fields
}
The Mark type (used by Mark and Unmark) can be used to save the current state of the lexer, and restore it later.
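A sketch of backtracking with Mark and Unmark; tokenKeyword, stateBase and stateIdent are made-up names for illustration:

// Sketch only: tokenKeyword, stateBase and stateIdent are assumed to be defined by user code.
func stateMaybeKeyword(l *lexer.LexInner) lexer.StateFn {
	m := l.Mark()
	if l.String("func") && !unicode.IsLetter(l.Peek()) {
		l.Emit(tokenKeyword) // it really was the keyword
		return stateBase
	}
	l.Unmark(m) // just the start of a longer identifier; restore the saved state
	return stateIdent
}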
type Token ¶
Tokens are emitted by the lexer. They contain a (usually) user-defined Typ, the Val of the token, and the File name and Line number where the token was generated.
type TokenType ¶
type TokenType int
TokenType is an integer representing the type of token that has been emitted. Most TokenTypes will be user-defined, and user-defined types must be greater than 0. Other than TokenEmpty, which is read when there is absolutely nothing left to read or when the channel is closed, the package-defined Error, Warning and EOF tokens are only generated by emitting them manually, or by invoking their corresponding Emit* functions.
const (
	// TokenEmpty is the TokenType with value 0.
	// Any zero-valued token will have this as its Typ.
	// It is also returned when the lexer has stopped (by an error, or Eof)
	TokenEmpty TokenType = -iota
	// TokenError is the Typ for errors reported by, for example, Lexer.Errorf.
	TokenError
	// TokenWarning is the Typ for warnings.
	TokenWarning
	// TokenEOF should be returned once per file, when the end of file has been reached.
	// This is not done automatically!
	TokenEOF
)
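A sketch of distinguishing these token types while ranging over the channel returned by Go; the lexer l and its state function are assumed to have been set up by user code as in the example above:

// Sketch only: l is a *lexer.Lexer created with a user-supplied state function.
for tok := range l.Go() {
	switch tok.Typ {
	case lexer.TokenError:
		fmt.Printf("%s:%d: error: %s\n", tok.File, tok.Line, tok.Val)
	case lexer.TokenWarning:
		fmt.Printf("%s:%d: warning: %s\n", tok.File, tok.Line, tok.Val)
	case lexer.TokenEOF:
		fmt.Println("end of input")
	default:
		fmt.Printf("token %d: %q\n", tok.Typ, tok.Val) // a user-defined type (> 0)
	}
}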