Documentation ¶
Overview ¶
Package lexer is an AWK lexer (tokenizer).
The lexer turns a string of AWK source code into a stream of tokens for parsing.
To tokenize some source, create a new lexer with NewLexer(src) and then call Scan() until the token type is EOF or ILLEGAL.
Example ¶
lexer := NewLexer([]byte(`$0 { print $1 }`))
for {
	pos, tok, val := lexer.Scan()
	if tok == EOF {
		break
	}
	fmt.Printf("%d:%d %s %q\n", pos.Line, pos.Column, tok, val)
}
Output:

1:1 $ ""
1:2 number "0"
1:4 { ""
1:6 print ""
1:12 $ ""
1:13 number "1"
1:15 } ""
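The example above stops only at EOF; per the overview, a scanning loop should also stop on ILLEGAL, whose string value is the error message. A minimal sketch (the unterminated-string input and exact error text are illustrative):

lexer := NewLexer([]byte(`$0 { print "unterminated }`))
for {
	pos, tok, val := lexer.Scan()
	if tok == EOF {
		break
	}
	if tok == ILLEGAL {
		// val holds the error message for ILLEGAL tokens.
		fmt.Printf("lex error at %d:%d: %s\n", pos.Line, pos.Column, val)
		break
	}
	fmt.Printf("%s %q\n", tok, val)
}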
Index ¶
type Lexer
func NewLexer(src []byte) *Lexer
func (l *Lexer) HadSpace() bool
func (l *Lexer) PeekByte() byte
func (l *Lexer) Scan() (Position, Token, string)
type Position
type Token
func KeywordToken(name string) Token
Examples ¶
Package
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer tokenizes a byte string of AWK source code. Use NewLexer to actually create a lexer, and Scan() or ScanRegex() to get tokens.
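ScanRegex has no entry of its own below, but going by the description above it is the regex-mode counterpart of Scan. A hedged sketch, assuming it shares Scan's (Position, Token, string) result shape and is called right after Scan has returned DIV for a "/" that actually begins a regex:

lexer := NewLexer([]byte(`/foo+/`))
lexer.Scan()                     // DIV: Scan alone can't tell "/" starts a regex
_, tok, val := lexer.ScanRegex() // re-scan in regex mode instead
fmt.Println(tok, val)            // regex "foo+"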
func NewLexer ¶
func NewLexer(src []byte) *Lexer
NewLexer creates a new lexer that will tokenize the given source code. See the package-level example for a working example.
func (*Lexer) HadSpace ¶
func (l *Lexer) HadSpace() bool
HadSpace returns true if the previously-scanned token had whitespace before it. The parser uses this because, when calling a user-defined function, the grammar doesn't allow a space between the function name and the left parenthesis.
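For example, POSIX AWK treats f(x) as a call to user-defined function f, but f (x) as the value f concatenated with a parenthesized expression. A sketch of the check (the input is illustrative):

lexer := NewLexer([]byte(`myfunc (1)`))
lexer.Scan()                  // NAME "myfunc"
lexer.Scan()                  // LPAREN
fmt.Println(lexer.HadSpace()) // true: space before "(" means this isn't a call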
func (*Lexer) PeekByte ¶ added in v1.12.0
func (l *Lexer) PeekByte() byte
PeekByte returns the next unscanned byte; it is used when parsing "getline lvalue" expressions. Returns 0 at end of input.
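A sketch (that the lookahead byte after a NAME token is the immediately following character is an assumption based on the description):

lexer := NewLexer([]byte(`x=1`))
lexer.Scan()                         // NAME "x"
fmt.Printf("%c\n", lexer.PeekByte()) // '=', the next unscanned byte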
func (*Lexer) Scan ¶
func (l *Lexer) Scan() (Position, Token, string)
Scan scans the next token and returns its position (line/column), token value (one of the uppercase token constants), and the string value of the token. For most tokens, the string value is empty. For NAME, NUMBER, STRING, and REGEX tokens, it's the token's value. For an ILLEGAL token, it's the error message.
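A sketch showing which tokens carry a string value (the printed token names are assumed to match the lowercase style of the overview example's output):

lexer := NewLexer([]byte(`x = "hi"`))
for {
	_, tok, val := lexer.Scan()
	if tok == EOF {
		break
	}
	fmt.Printf("%s %q\n", tok, val)
}
// name "x"
// = ""
// string "hi"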
type Position ¶
type Position struct {
	// Line number of the token (starts at 1).
	Line int

	// Column on the line (starts at 1). Note that this is the byte
	// offset into the line, not the rune offset.
	Column int
}
Position stores the source line and column where a token starts.
type Token ¶
type Token int
Token is the type of a single token.
const (
	ILLEGAL Token = iota
	EOF
	NEWLINE
	CONCAT // Not really a token, but used as an operator

	ADD
	ADD_ASSIGN
	AND
	APPEND
	ASSIGN
	AT
	COLON
	COMMA
	DECR
	DIV
	DIV_ASSIGN
	DOLLAR
	EQUALS
	GTE
	GREATER
	INCR
	LBRACE
	LBRACKET
	LESS
	LPAREN
	LTE
	MATCH
	MOD
	MOD_ASSIGN
	MUL
	MUL_ASSIGN
	NOT_MATCH
	NOT
	NOT_EQUALS
	OR
	PIPE
	POW
	POW_ASSIGN
	QUESTION
	RBRACE
	RBRACKET
	RPAREN
	SEMICOLON
	SUB
	SUB_ASSIGN

	BEGIN
	BREAK
	CONTINUE
	DELETE
	DO
	ELSE
	END
	EXIT
	FOR
	FUNCTION
	GETLINE
	IF
	IN
	NEXT
	NEXTFILE
	PRINT
	PRINTF
	RETURN
	WHILE

	F_ATAN2
	F_CLOSE
	F_COS
	F_EXP
	F_FFLUSH
	F_GSUB
	F_INDEX
	F_INT
	F_LENGTH
	F_LOG
	F_MATCH
	F_RAND
	F_SIN
	F_SPLIT
	F_SPRINTF
	F_SQRT
	F_SRAND
	F_SUB
	F_SUBSTR
	F_SYSTEM
	F_TOLOWER
	F_TOUPPER

	NAME
	NUMBER
	STRING
	REGEX

	LAST       = REGEX
	FIRST_FUNC = F_ATAN2
	LAST_FUNC  = F_TOUPPER
)
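Since Token is an integer type and FIRST_FUNC and LAST_FUNC bracket the built-in function tokens, a simple range check identifies them; for example:

isBuiltinFunc := func(tok Token) bool {
	return tok >= FIRST_FUNC && tok <= LAST_FUNC
}
fmt.Println(isBuiltinFunc(F_SQRT)) // true
fmt.Println(isBuiltinFunc(PRINT))  // false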
func KeywordToken ¶ added in v1.1.0
KeywordToken returns the token associated with the given keyword string, or ILLEGAL if the given name is not a keyword.
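For example, comparing the result against the token constants:

fmt.Println(KeywordToken("print") == PRINT) // true
fmt.Println(KeywordToken("foo") == ILLEGAL) // true: "foo" is not a keyword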