Documentation ¶
Index ¶
- Constants
- Variables
- func IsComment(pos TokenPos) bool
- func IsMetaBlock(pos TokenPos) bool
- func IsSentence(pos TokenPos) bool
- func ReadAll(reader TokenReader) error
- type Cursor
- type ExprEndToken
- type ExprStartToken
- type KeywordToken
- type LDArrowToken
- type QMarkToken
- type SymbolToken
- type Token
- func ExpressionEnd(pos TokenPos) Token
- func ExpressionStart(pos TokenPos) Token
- func Identifier(name string, pos TokenPos) Token
- func Integer(image string, pos TokenPos) Token
- func KeywordAt(image string, pos TokenPos) Token
- func LeftDoubleArrow(pos TokenPos) Token
- func LineComment(image string, pos TokenPos) Token
- func QuestionMark(pos TokenPos) Token
- func UnexpectedToken(image string, pos TokenPos) Token
- type TokenPos
- func (pos TokenPos) Column() uint
- func (pos TokenPos) InComment() TokenPos
- func (pos TokenPos) InMetaBlock() TokenPos
- func (pos TokenPos) InSentence() TokenPos
- func (pos TokenPos) Line() uint
- func (pos TokenPos) NextAt(lines, cols uint) TokenPos
- func (pos TokenPos) NextCol() TokenPos
- func (pos TokenPos) NextLine() TokenPos
- func (pos TokenPos) ResetFlag() TokenPos
- func (data TokenPos) String() string
- type TokenReader
- type TokenType
Constants ¶
const (
	RUNE_OPEN_PAREN     = '('
	RUNE_CLOSE_PAREN    = ')'
	RUNE_QUESTION_MARK  = '?'
	RUNE_COMMENT_SEMI   = ';'
	RUNE_BEGIN_ARROW_LD = '<'
	IMAGE_ARROW_LD      = "<="
)
const CURSOR_TAB_STOP = 4
Arbitrary size for \t alignment.
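For illustration only, one plausible reading of the tab-stop rule (an assumption about the cursor's column arithmetic, not behavior confirmed by this documentation):

// A sketch, assuming '\t' advances the column to the next multiple of
// CURSOR_TAB_STOP; the package's actual rule may differ.
func nextTabStop(col uint) uint {
	return (col/CURSOR_TAB_STOP + 1) * CURSOR_TAB_STOP
}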
Variables ¶
var EOF = Token{kTOKENPOS_ZERO, &eofToken{}}
EOF token indicates the end of the token stream. As EOF is not in the document, its TokenPos is always zero.
Functions ¶
func IsComment ¶
func IsComment(pos TokenPos) bool
func IsMetaBlock ¶
func IsMetaBlock(pos TokenPos) bool
func IsSentence ¶
func IsSentence(pos TokenPos) bool
func ReadAll ¶
func ReadAll(reader TokenReader) error
Repeatedly calls `NextToken()` until either the end of file (EOF) is reached or an error is returned while attempting to read the next token. Unlike NextToken(), it does not forward the io.EOF error: if `EOF` is reached and no other errors are encountered, this function returns `nil`.
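The described behavior amounts to a loop like the following sketch (illustrative only, assuming the standard io package; the package's actual implementation may differ):

func readAll(reader TokenReader) error {
	for {
		if err := reader.NextToken(); err != nil {
			if err == io.EOF {
				return nil // a full read that ends at EOF is a success
			}
			return err // any other error is forwarded to the caller
		}
	}
}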
Types ¶
type Cursor ¶
type Cursor interface {
	// NextRune is called to extend the cursor by reading the next rune from
	// input. Also updates the pending string except when skipping spaces, and
	// returns the updated cursor and the rune that was read.
	NextRune(input io.RuneReader) (Cursor, rune)

	// Similar to NextRune() but will read from the pending buffer if nonempty,
	// and read from input, populating pending, if pending was empty. Implicitly
	// ignores leading spaces if needing to read from input.
	FirstRune(input io.RuneReader) (Cursor, rune)

	// Consumes all characters in the pending rune list, updating pos to match.
	ConsumeAll() (Cursor, string)

	// Same as ConsumeAll() except the last rune is left in the pending buffer.
	ConsumeExceptFinal() (Cursor, string)

	// Resets the TokenPos for this cursor to (0, 0, UNKNOWN).
	ResetPos() Cursor

	// The current position of the next Token that would be produced by consuming
	// the contents of this Cursor, whether or not anything is in the pending
	// buffer.
	Pos() TokenPos

	// Returns true if there is nothing pending in the cursor.
	IsEmpty() bool

	// Returns true if the last ReadRune call returned an error.
	HasError() bool

	// Returns `true` if the embedded error is io.EOF.
	IsEOF() bool

	// Returns the error (or nil) from the most recent read of input. If an error
	// is encountered, it will persist through update methods and prohibit reads.
	//
	// Intentionally not extending the `error` interface by naming this ErrorValue.
	ErrorValue() error
}
The Cursor represents a few properties of the lexer's state that are invariably coupled to each other -- the token position, the runes ready to be integrated into the next token, and whether there is a pending rune waiting to be processed. The token's next position should always be the current position plus the size of the pending rune, if there is one, but that depends on whether scanning can be done in LL(1) or (in some cases) LL(0), as with `(` and `)`. It also smelled bad to update one part of the lexer state and another part that depends on it without doing so atomically.
This interface, and its backing struct, are a solution to the above problems while also aiding the readability of the token-specific lexer code. The coupled updates are done within the Advance and Consume methods; there is no redundant next-position state and no ambiguity about the contents of the pending image. In addition, the cursor is copy-on-write: all updates are conveyed by the return value of the updating method, and the implementing methods use by-value receivers, so downcast-and-update has limited adverse effect.
However, it assumes that it is the only reader on the provided input and that its scan position is consistent between calls to Advance. If multiple concurrent cursors are needed on the same reader source, use a new reader for each cursor or tee the source RuneReader, rather than further complicating this code with management of byte offsets and seeks at each read, especially when tokenizing a byte stream is inherently single-threaded. Calling code is expected to manage this, typically via lexerState.
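To illustrate the copy-on-write, by-value-receiver style described above, here is a partial sketch of one method on a hypothetical backing struct; the field names and the deferred position update are assumptions, not the package's actual implementation:

type sketchCursor struct {
	pos     TokenPos
	pending string // an immutable string keeps copies independent
	err     error
}

func (c sketchCursor) NextRune(input io.RuneReader) (sketchCursor, rune) {
	if c.err != nil {
		return c, 0 // a persisted read error prohibits further reads
	}
	r, _, err := input.ReadRune()
	if err != nil {
		c.err = err // recorded on the copy, conveyed via the return value
		return c, 0
	}
	c.pending += string(r) // pos catches up later, e.g. in ConsumeAll()
	return c, r // the caller adopts the updated copy
}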
type ExprEndToken ¶
type ExprEndToken struct{ SymbolToken }
Token EXPR_END = ")"
var EXPR_END ExprEndToken
func (ExprEndToken) Image ¶
func (tok ExprEndToken) Image() string
func (ExprEndToken) TypeString ¶
func (tok ExprEndToken) TypeString() string
type ExprStartToken ¶
type ExprStartToken struct{ SymbolToken }
Token EXPR_START = "("
var EXPR_START ExprStartToken
func (ExprStartToken) Image ¶
func (tok ExprStartToken) Image() string
func (ExprStartToken) TypeString ¶
func (tok ExprStartToken) TypeString() string
type KeywordToken ¶
type KeywordToken struct {
// contains filtered or unexported fields
}
All keywords are given the KEYWORD token type.
func (KeywordToken) At ¶
func (tok KeywordToken) At(pos TokenPos) Token
Constructs a Token instance pointing to the singular KeywordToken instance for the specific keyword.
func (KeywordToken) Image ¶
func (tok KeywordToken) Image() string
Satisfies the requirement for TokenType interface.
func (KeywordToken) TypeString ¶
func (tok KeywordToken) TypeString() string
Satisfies the requirement for TokenType interface.
type LDArrowToken ¶
type LDArrowToken struct{ SymbolToken }
Token ARROW_LD = "<="
var ARROW_LD LDArrowToken
func (LDArrowToken) Image ¶
func (tok LDArrowToken) Image() string
func (LDArrowToken) TypeString ¶
func (tok LDArrowToken) TypeString() string
type QMarkToken ¶
type QMarkToken struct{ SymbolToken }
Token QUE_MARK = "?"
var QUE_MARK QMarkToken
func (QMarkToken) Image ¶
func (tok QMarkToken) Image() string
func (QMarkToken) TypeString ¶
func (tok QMarkToken) TypeString() string
type SymbolToken ¶
type SymbolToken struct{}
Symbol tokens always have the same image, so they can share a common instance.
type Token ¶
type Token struct {
	TokenPos
	TokenType
}
Represents a Token instance by its position in the source and its type. The TokenType is an embedded interface (see above) and may be initialized with state/context or reuse a shared instance for the many tokens that are universally identical within their type (e.g. keywords, operator symbols). TokenPos is a 32-bit uint composite value defined in [token_pos.go].
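For illustration, a token built by one of the constructors below exposes its image and position through the embedded interfaces. The argument order of NewTokenPos is an assumption here, as its signature is not shown above:

tok := Identifier("robot", NewTokenPos(3, 7)) // NewTokenPos(line, col) assumed
fmt.Println(tok.Image(), tok.Line(), tok.Column()) // robot 3 7
fmt.Println(tok.TypeString()) // an identifier-specific type string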
func ExpressionEnd ¶
func ExpressionEnd(pos TokenPos) Token
Indicates the end of expressions and sub-expressions within a sentence.
func ExpressionStart ¶
func ExpressionStart(pos TokenPos) Token
Begins all expressions, the main structural denotation in GDL syntax.
func Identifier ¶
func Identifier(name string, pos TokenPos) Token
Identifier is a catch-all token for alpha-num strings that are not keywords.
func Integer ¶
func Integer(image string, pos TokenPos) Token
More complex numeric types can be constructed from sequences of unsigned integers and punctuation. This also keeps the tokenizer state management simpler by defining negatives, floats, etc. in terms of production rule semantics. GDL and GDL-II both only assume integer constants in [0-100].
func KeywordAt ¶
func KeywordAt(image string, pos TokenPos) Token
func LeftDoubleArrow ¶
func LeftDoubleArrow(pos TokenPos) Token
func LineComment ¶
func LineComment(image string, pos TokenPos) Token
Line comments are any sequence of characters beginning with a semicolon and extending until the next newline rune '\n'.
func QuestionMark ¶
func QuestionMark(pos TokenPos) Token
Used in the production rule for Variable terms.
func UnexpectedToken ¶
func UnexpectedToken(image string, pos TokenPos) Token
An unexpected token is used when a parse error is encountered even though no read error occurred (read errors are returned by the NextToken call). Examples include incomplete Unicode bytes or a string without a closing quote. Illegal tokens retain the image of the scan up to and including the bad character.
type TokenPos ¶
type TokenPos uint32
TokenPos encoded as 32-bit uint:
.LLLLLLLLLLLLLLLLLLLLCCCCCCCCCCFF.
:[++++++++++++++++++]            :  20 bits LINE
:                    [++++++++]  :  10 bits COLUMN
:                              []:   2 bits FLAGS
`10987654321098765432109876543210'
Use Line(), Column() and Next*() methods to read and update values.
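Given that layout, the accessors reduce to shift-and-mask arithmetic. A sketch under the assumption that LINE occupies the high 20 bits, COLUMN the next 10, and FLAGS the low 2 (the real constants live in token_pos.go and may be named differently):

const (
	flagBits = 2
	colBits  = 10
)

func packPos(line, col, flags uint32) TokenPos {
	return TokenPos(line<<(colBits+flagBits) | col<<flagBits | flags)
}

func lineOf(pos TokenPos) uint { return uint(pos >> (colBits + flagBits)) }
func colOf(pos TokenPos) uint  { return uint(pos>>flagBits) & (1<<colBits - 1) }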
func NewTokenPos ¶
func (TokenPos) Column ¶
func (pos TokenPos) Column() uint
Returns the 1-indexed column number of the position; zero means unknown. Token embeds this from TokenPos to adopt its Column() method.
func (TokenPos) InComment ¶
func (pos TokenPos) InComment() TokenPos
Produces the same Token position, ensuring its flag is set to COMMENT mode.
func (TokenPos) InMetaBlock ¶
func (pos TokenPos) InMetaBlock() TokenPos
Produces the same Token position, ensuring its flag is set to meta-block mode.
func (TokenPos) InSentence ¶
func (pos TokenPos) InSentence() TokenPos
Produces the same Token position, ensuring its flag is set to SENTENCE mode.
func (TokenPos) Line ¶
func (pos TokenPos) Line() uint
Returns the 1-indexed line number of the position; zero means unknown. Token embeds this from TokenPos to adopt its Line() method.
func (TokenPos) NextAt ¶
func (pos TokenPos) NextAt(lines, cols uint) TokenPos
func (TokenPos) NextCol ¶
func (pos TokenPos) NextCol() TokenPos
func (TokenPos) NextLine ¶
func (pos TokenPos) NextLine() TokenPos
Increments the position to its next line, resetting the column as well. The flag's current value is reset if it is in comment mode, and retained otherwise.
func (TokenPos) ResetFlag ¶
func (pos TokenPos) ResetFlag() TokenPos
func (TokenPos) String ¶
func (data TokenPos) String() string
type TokenReader ¶
type TokenReader interface {
	// Reads the next token, sending it to output, returning error or nil. If an
	// io.EOF error was encountered it is returned here as well.
	NextToken() error

	// Receive-only channel for Token values sent as they are read from the input.
	// Calling NextToken() or ReadAll() will produce tokens on this channel, and one
	// of those methods will close the channel when it encounters EOF. An EOF token
	// is also produced as the last token on the channel, so consumers can listen
	// for it specifically or listen until channel close using `for ... := range`.
	TokenReceiver() <-chan Token
}
Public interface for reading a stream of tokens, sending them to a channel. See also ReadAll(reader) which provides a simpler interface for full reads.
func NewTokenReader ¶
func NewTokenReader(input io.RuneReader, output chan Token) TokenReader
Constructor function for a lexer-based token reader.
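A sketch of typical wiring (the GDL source string and channel capacity are arbitrary choices for illustration):

func ExampleNewTokenReader() {
	output := make(chan Token, 16)
	reader := NewTokenReader(strings.NewReader("(<= (next ?x) (true ?x))"), output)

	go func() {
		if err := ReadAll(reader); err != nil {
			log.Fatal(err) // errors other than io.EOF surface here
		}
	}()

	// The channel is closed at EOF, so the range terminates on its own;
	// the final value received is the EOF token.
	for tok := range output {
		fmt.Printf("%s %q\n", tok.TypeString(), tok.Image())
	}
}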
type TokenType ¶
type TokenType interface {
	// Returns a string representation of the type of this token.
	TypeString() string

	// Returns a string representation of this token, its syntactic image.
	Image() string
}
TokenType intrinsically defines the subtype of a Token and provides identifying methods.