Documentation ¶
Overview ¶
Package lexer is an AWK lexer (tokenizer).
The lexer turns a string of AWK source code into a stream of tokens for parsing.
To tokenize some source, create a new lexer with NewLexer(src) and then call Scan() until the token type is EOF or ILLEGAL.
Example ¶
lexer := NewLexer([]byte(`$0 { print $1 }`))
for {
	pos, tok, val := lexer.Scan()
	if tok == EOF {
		break
	}
	fmt.Printf("%d:%d %s %q\n", pos.Line, pos.Column, tok, val)
}
Output:

1:1 $ ""
1:2 number "0"
1:4 { ""
1:6 print ""
1:12 $ ""
1:13 number "1"
1:15 } ""
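The example above stops only at EOF; per the overview, a scanning loop should also stop on ILLEGAL, whose string value is the error message. A minimal sketch (the unterminated-string input and exact error text are illustrative):

lexer := NewLexer([]byte(`$0 { print "unterminated }`))
for {
	pos, tok, val := lexer.Scan()
	if tok == EOF {
		break
	}
	if tok == ILLEGAL {
		// val holds the error message for ILLEGAL tokens.
		fmt.Printf("lex error at %d:%d: %s\n", pos.Line, pos.Column, val)
		break
	}
	fmt.Printf("%s %q\n", tok, val)
}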
Index ¶
type Lexer
func NewLexer(src []byte) *Lexer
func (l *Lexer) HadSpace() bool
func (l *Lexer) PeekByte() byte
func (l *Lexer) Scan() (Position, Token, string)
type Position
type Token
func KeywordToken(name string) Token
Examples ¶
Package
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer tokenizes a byte string of AWK source code. Use NewLexer to actually create a lexer, and Scan() or ScanRegex() to get tokens.
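ScanRegex has no entry of its own below, but going by the description above it is the regex-mode counterpart of Scan. A hedged sketch, assuming it shares Scan's (Position, Token, string) result shape and is called right after Scan has returned DIV for a "/" that actually begins a regex:

lexer := NewLexer([]byte(`/foo+/`))
lexer.Scan()                     // DIV: Scan alone can't tell "/" starts a regex
_, tok, val := lexer.ScanRegex() // re-scan in regex mode instead
fmt.Println(tok, val)            // regex "foo+"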
func NewLexer ¶
func NewLexer(src []byte) *Lexer
NewLexer creates a new lexer that will tokenize the given source code. See the package-level example for a working example.
func (*Lexer) HadSpace ¶
func (l *Lexer) HadSpace() bool
HadSpace returns true if the previously-scanned token had whitespace before it. The parser uses this because, when calling a user-defined function, the grammar doesn't allow a space between the function name and the left parenthesis.
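For example, POSIX AWK treats f(x) as a call to user-defined function f, but f (x) as the value f concatenated with a parenthesized expression. A sketch of the check (the input is illustrative):

lexer := NewLexer([]byte(`myfunc (1)`))
lexer.Scan()                  // NAME "myfunc"
lexer.Scan()                  // LPAREN
fmt.Println(lexer.HadSpace()) // true: space before "(" means this isn't a call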
func (*Lexer) PeekByte ¶ added in v1.12.0
func (l *Lexer) PeekByte() byte
PeekByte returns the next unscanned byte; it is used when parsing "getline lvalue" expressions. Returns 0 at end of input.
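A sketch (that the lookahead byte after a NAME token is the immediately following character is an assumption based on the description):

lexer := NewLexer([]byte(`x=1`))
lexer.Scan()                         // NAME "x"
fmt.Printf("%c\n", lexer.PeekByte()) // '=', the next unscanned byte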
func (*Lexer) Scan ¶
func (l *Lexer) Scan() (Position, Token, string)
Scan scans the next token and returns its position (line/column), token value (one of the uppercase token constants), and the string value of the token. For most tokens, the string value is empty. For NAME, NUMBER, STRING, and REGEX tokens, it's the token's value. For an ILLEGAL token, it's the error message.
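A sketch showing which tokens carry a string value (the printed token names are assumed to match the lowercase style of the overview example's output):

lexer := NewLexer([]byte(`x = "hi"`))
for {
	_, tok, val := lexer.Scan()
	if tok == EOF {
		break
	}
	fmt.Printf("%s %q\n", tok, val)
}
// name "x"
// = ""
// string "hi"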
type Position ¶
type Position struct {
	// Line number of the token (starts at 1).
	Line int

	// Column on the line (starts at 1). Note that this is the byte
	// offset into the line, not the rune offset.
	Column int
}
Position stores the source line and column where a token starts.
type Token ¶
type Token int
Token is the type of a single token.
const (
	ILLEGAL Token = iota
	EOF
	NEWLINE
	CONCAT // Not really a token, but used as an operator

	ADD
	ADD_ASSIGN
	AND
	APPEND
	ASSIGN
	AT
	COLON
	COMMA
	DECR
	DIV
	DIV_ASSIGN
	DOLLAR
	EQUALS
	GTE
	GREATER
	INCR
	LBRACE
	LBRACKET
	LESS
	LPAREN
	LTE
	MATCH
	MOD
	MOD_ASSIGN
	MUL
	MUL_ASSIGN
	NOT_MATCH
	NOT
	NOT_EQUALS
	OR
	PIPE
	POW
	POW_ASSIGN
	QUESTION
	RBRACE
	RBRACKET
	RPAREN
	SEMICOLON
	SUB
	SUB_ASSIGN

	BEGIN
	BREAK
	CONTINUE
	DELETE
	DO
	ELSE
	END
	EXIT
	FOR
	FUNCTION
	GETLINE
	IF
	IN
	NEXT
	NEXTFILE
	PRINT
	PRINTF
	RETURN
	WHILE

	F_ATAN2
	F_CLOSE
	F_COS
	F_EXP
	F_FFLUSH
	F_GSUB
	F_INDEX
	F_INT
	F_LENGTH
	F_LOG
	F_MATCH
	F_RAND
	F_SIN
	F_SPLIT
	F_SPRINTF
	F_SQRT
	F_SRAND
	F_SUB
	F_SUBSTR
	F_SYSTEM
	F_TOLOWER
	F_TOUPPER

	NAME
	NUMBER
	STRING
	REGEX

	LAST       = REGEX
	FIRST_FUNC = F_ATAN2
	LAST_FUNC  = F_TOUPPER
)
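Since Token is an integer type and FIRST_FUNC and LAST_FUNC bracket the built-in function tokens, a simple range check identifies them; for example:

isBuiltinFunc := func(tok Token) bool {
	return tok >= FIRST_FUNC && tok <= LAST_FUNC
}
fmt.Println(isBuiltinFunc(F_SQRT)) // true
fmt.Println(isBuiltinFunc(PRINT))  // false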
func KeywordToken ¶ added in v1.1.0
KeywordToken returns the token associated with the given keyword string, or ILLEGAL if the given name is not a keyword.
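For example, comparing the result against the token constants:

fmt.Println(KeywordToken("print") == PRINT) // true
fmt.Println(KeywordToken("foo") == ILLEGAL) // true: "foo" is not a keyword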