Documentation ¶
Overview ¶
Package common contains common definitions and routines for the Hydra parser. This includes definitions of character classes, errors, locations, standard tokens, and the Profile, which enables dynamic changes to the way the parser functions. The Profile, in particular, allows for relatively easy versioning of the Hydra language.
Character classes are defined in classes.go and common errors in errors.go. The Location type lives in locations.go, and the options, which house the Profile, are in options.go. The Profile itself is defined in profile.go, and basic interfaces, such as the one defining a scanner, are in interfaces.go.
The basic tokens are defined in tokens.go, with identifiers.go, operators.go, and strings.go containing the code for describing those token types. (The identifiers.go file contains code associated with keywords, which are recognized by the identifiers recognizer in the lexer.)
Index ¶
- Constants
- Variables
- func ErrDanglingOpen(tok *Token) error
- func ErrNoOpen(sym *Symbol) error
- func ErrOpMismatch(openTok *Token, close *Symbol) error
- type AugChar
- type FilePos
- type Keywords
- type Lexer
- type Location
- type MockLexer
- type MockScanner
- type Operators
- type Option
- type Options
- type Profile
- type Scanner
- type StrEscape
- type Symbol
- type Token
Constants ¶
const (
    EOF rune = -(iota + 1) // End of file
    Err                    // An error occurred
)
Special character constants.
const (
    CharWS       uint16 = 1 << iota // Whitespace characters
    CharNL                          // Newline character
    CharBinDigit                    // Binary digit
    CharOctDigit                    // Octal digit
    CharDecDigit                    // Decimal digit
    CharHexDigit                    // Hexadecimal digit
    CharIDStart                     // Valid character for ID start
    CharIDCont                      // Valid character for ID continue
    CharStrFlag                     // String flag character
    CharQuote                       // String quote character
    CharComment                     // Comment character
)
Defined character classes.
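Because the classes are bit flags, a single character may belong to several classes at once, and membership tests are simple bitwise ANDs. A minimal sketch, written as if inside the package (the helper is illustrative, not part of the API):

// hasClass reports whether a character class bitmask includes the
// given flag.  For example, the class of '7' would normally include
// CharDecDigit, CharHexDigit, and CharOctDigit, but not CharIDStart.
func hasClass(class uint16, flag uint16) bool {
    return class&flag != 0
}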
const (
    StrRaw uint8 = 1 << iota // Raw strings, ignores escapes
    StrBytes                 // Byte strings
    StrMulti                 // Multi-line (triple-quoted) string
    StrTriple                // Quote allows triples
)
Defined string flags.
Variables ¶
var (
    ErrSplitEntity       = errors.New("entity split across files")
    ErrBadRune           = errors.New("illegal UTF-8 encoding")
    ErrBadIndent         = errors.New("inconsistent indentation")
    ErrBadOp             = errors.New("bad operator character")
    ErrMixedIndent       = errors.New("mixed whitespace types in indent")
    ErrDanglingBackslash = errors.New("dangling backslash")
    ErrBadNumber         = errors.New("bad character for number literal")
    ErrBadEscape         = errors.New("bad escape sequence")
    ErrBadStrChar        = errors.New("invalid character for string")
    ErrUnclosedStr       = errors.New("unclosed string literal")
    ErrBadIdent          = errors.New("bad identifier character")
)
Various errors that may occur during parsing.
var (
    TokError      = &Symbol{Name: "<Error>"}
    TokEOF        = &Symbol{Name: "<EOF>"}
    TokNewline    = &Symbol{Name: "<Newline>"}
    TokIndent     = &Symbol{Name: "<Indent>"}
    TokDedent     = &Symbol{Name: "<Dedent>"}
    TokIdent      = &Symbol{Name: "<Ident>"}
    TokInt        = &Symbol{Name: "<Int>"}
    TokFloat      = &Symbol{Name: "<Float>"}
    TokString     = &Symbol{Name: "<String>"}
    TokBytes      = &Symbol{Name: "<Bytes>"}
    TokDocComment = &Symbol{Name: "<DocComment>"}
)
Standard token symbols.
var CharClasses = utils.FlagSet16{
    CharWS:       "whitespace",
    CharNL:       "newline",
    CharBinDigit: "binary digit",
    CharOctDigit: "octal digit",
    CharDecDigit: "decimal digit",
    CharHexDigit: "hexadecimal digit",
    CharIDStart:  "ID start",
    CharIDCont:   "ID continue",
    CharStrFlag:  "string flag",
    CharQuote:    "quote",
    CharComment:  "comment",
}
CharClasses is a mapping of character class flags to names.
var StrFlags = utils.FlagSet8{
    StrRaw:    "raw",
    StrBytes:  "bytes",
    StrMulti:  "multi-line",
    StrTriple: "triple quote",
}
StrFlags is a mapping of string flags to names.
Functions ¶
func ErrDanglingOpen ¶
func ErrDanglingOpen(tok *Token) error
ErrDanglingOpen generates an error for a dangling open operator with no corresponding close operator.
func ErrNoOpen ¶
func ErrNoOpen(sym *Symbol) error
ErrNoOpen generates an error for a close operator with no corresponding open operator.
func ErrOpMismatch ¶
func ErrOpMismatch(openTok *Token, close *Symbol) error
ErrOpMismatch generates an error for a close operator that doesn't match the open operator.
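The Index above gives the signatures. As a hedged sketch of how a parser might report an unbalanced close operator (the helper and the symbol literal are illustrative only; Symbol is described under Types below):

// reportUnbalancedClose is an illustrative helper showing how the
// error constructors might be used; it is not part of the package.
func reportUnbalancedClose() error {
    // A close parenthesis with no matching open parenthesis.  The
    // Open and Close fields of Symbol name the paired operators.
    closeParen := &Symbol{Name: ")", Open: "(", Close: ")"}
    return ErrNoOpen(closeParen)
}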
Types ¶
type AugChar ¶
type AugChar struct {
    C     rune        // The character
    Class uint16      // The character's class
    Loc   Location    // The character's location
    Val   interface{} // The "value"; an integer for digits
}
AugChar is a struct that packages together a character, its class, its location, and any numeric value it may have. This is the type that the scanner returns.
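For instance, a scanner reading the digit '7' might package it up roughly like this (the class bits, location, and value shown are illustrative; Location and FilePos are described below):

// Illustrative augmented character for the decimal digit '7'.
var digit = AugChar{
    C:     '7',
    Class: CharDecDigit | CharHexDigit | CharOctDigit,
    Loc: Location{
        File: "example.hydra",
        B:    FilePos{L: 3, C: 10},
        E:    FilePos{L: 3, C: 11},
    },
    Val: 7, // numeric value carried for digit characters
}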
type FilePos ¶
type FilePos struct {
    L int // The line number of the position
    C int // The column number of the position
}
FilePos specifies a position within a given file.
type Keywords ¶
Keywords is a map mapping identifier strings to the symbols to use for keyword tokens.
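Assuming the underlying type maps identifier strings to *Symbol values, which is an inference from the description rather than a confirmed definition, a keyword table might be built like this:

// Illustrative keyword table; the Symbol values are made up for this
// sketch and are not the package's own definitions.
var exampleKeywords = Keywords{
    "if":   &Symbol{Name: "if"},
    "else": &Symbol{Name: "else"},
    "def":  &Symbol{Name: "def"},
}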
type Lexer ¶
type Lexer interface {
    // Next retrieves the next token from the scanner. If the end
    // of file is reached, an EOF token is returned; if an error
    // occurs while scanning or lexically analyzing the file, an
    // error token is returned with the error as the token's
    // semantic value. After either an EOF token or an error
    // token, nil will be returned.
    Next() *Token

    // Push pushes a single token back onto the lexer. Any number
    // of tokens may be pushed back.
    Push(tok *Token)
}
Lexer is an interface describing a lexer. A lexer pulls characters from a scanner and converts them to tokens, which may then be used by the parser.
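A typical consumer calls Next in a loop until it sees the EOF or error token. The sketch below is illustrative, not the Hydra parser's actual logic:

// drainTokens is an illustrative consumer of a Lexer; it collects
// tokens until the lexer reports end of file or an error.
func drainTokens(lex Lexer) ([]*Token, error) {
    var toks []*Token
    for {
        tok := lex.Next()
        if tok == nil {
            // Next returns nil only after an EOF or error token
            // has already been delivered.
            return toks, nil
        }
        switch tok.Sym {
        case TokEOF:
            return toks, nil
        case TokError:
            // The error is carried as the token's semantic value.
            return toks, tok.Val.(error)
        default:
            toks = append(toks, tok)
        }
    }
}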
type Location ¶
type Location struct {
    File string  // The name of the file
    B    FilePos // The beginning of the range
    E    FilePos // The end of the range
}
Location specifies the exact range of locations of some entity.
func OctEscape ¶
OctEscape is a StrEscape that consumes octal digits and returns the specified rune.
func (*Location) Advance ¶
Advance advances a location in place. The current range end becomes the range beginning, and the range end is the sum of the new range beginning and the provided offset.
func (*Location) AdvanceTab ¶
AdvanceTab advances a location in place, as if by a tab character. The argument indicates the size of a tab stop.
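To make the in-place update concrete, here is a sketch. The integer offset parameter is an assumption inferred from the description, not a confirmed signature:

// advanceExample is illustrative only.
func advanceExample() {
    loc := Location{
        File: "example.hydra",
        B:    FilePos{L: 1, C: 1},
        E:    FilePos{L: 1, C: 4}, // a three-character entity
    }

    // Assumed signature: Advance(offset int).  The old range end becomes
    // the new beginning, and the new end is that beginning plus the
    // offset, so afterwards B is {1, 4} and E is {1, 5} under that
    // assumption.
    loc.Advance(1)

    // AdvanceTab moves the location as if a tab character had been
    // read; the argument gives the tab stop size.
    loc.AdvanceTab(8)
}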
type MockLexer ¶
MockLexer is a mock object for lexers.
func (*MockLexer) Next ¶
Next retrieves the next token from the scanner. If the end of file is reached, an EOF token is returned; if an error occurs while scanning or lexically analyzing the file, an error token is returned with the error as the token's semantic value. After either an EOF token or an error token, nil will be returned.
type MockScanner ¶
MockScanner is a mock object for scanners.
func (*MockScanner) Next ¶
func (m *MockScanner) Next() AugChar
Next retrieves the next rune from the file. An EOF augmented character is returned on end of file, and an Err augmented character is returned in the event of an error.
func (*MockScanner) Push ¶
func (m *MockScanner) Push(ch AugChar)
Push pushes back a single augmented character onto the scanner. Any number of characters may be pushed back.
type Operators ¶
type Operators struct {
    Sym *Symbol // The operator at this node
    // contains filtered or unexported fields
}
Operators is a structure for describing an operator tree. The lexer uses the operator tree to match operators, while allowing for backtracking; this enables selecting the longest match.
func NewOperators ¶
NewOperators constructs an Operators tree with all the specified operators.
func (*Operators) Children ¶
Children implements the utils.Visitable interface, allowing an operator tree to be visualized using utils.Visualize().
func (*Operators) Copy ¶
Copy constructs a copy of this Operators tree. The copy will contain just the subtree rooted at this node, if this node is not the root.
func (*Operators) Next ¶
Next looks up the next node in the tree, given an operator rune. Returns nil if no corresponding node exists in the tree.
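The longest-match behavior can be sketched as a walk down the tree that remembers the last node carrying a symbol. The helper below is illustrative, not part of the package, and assumes Next has the signature Next(r rune) *Operators:

// longestOperator walks an operator tree rune by rune and returns the
// symbol of the longest operator that prefixes the input, along with
// the number of runes it consumed.
func longestOperator(ops *Operators, input []rune) (*Symbol, int) {
    var best *Symbol
    bestLen := 0
    node := ops
    for i, r := range input {
        node = node.Next(r) // assumed signature: Next(r rune) *Operators
        if node == nil {
            break
        }
        if node.Sym != nil {
            best = node.Sym
            bestLen = i + 1
        }
    }
    return best, bestLen
}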
type Option ¶
type Option func(opts *Options)
Option is the type for option functions. Each function mutates a passed-in Options structure to set the specific option.
func Encoding ¶
Encoding sets the encoding for the file being scanned. If not set, an attempt is made to guess it from the source (depends on source implementing io.Seeker), and a default of "utf-8" is used if that fails.
type Options ¶
type Options struct {
    Source   io.Reader // The source from which to read
    Filename string    // The name of the file being parsed
    Encoding string    // The encoding of the source
    Prof     *Profile  // The profile
    TabStop  int       // The size of a tab stop
}
Options contains the options for the parser.
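Options are applied through Option functions in the usual functional-options style. The helper below is hypothetical, shown only to illustrate how an Option mutates the Options structure:

// WithTabStop is a hypothetical Option that sets the tab stop size;
// it is not necessarily provided by the package.
func WithTabStop(size int) Option {
    return func(opts *Options) {
        opts.TabStop = size
    }
}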
type Profile ¶
type Profile struct {
    IDStart   runes.Set          // Set of valid identifier start chars
    IDCont    runes.Set          // Set of valid identifier continue chars
    StrFlags  map[rune]uint8     // Valid string flags
    Quotes    map[rune]uint8     // Valid quote characters
    Escapes   map[rune]StrEscape // String escapes
    Keywords  Keywords           // Mapping of keywords
    Norm      norm.Form          // Normalization for identifiers
    Operators *Operators         // Recognized operators
}
Profile describes a profile for the parser. A profile is simply the version-specific rules, with desired options applied, and covers such things as the sets of identifier characters, etc.
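A version profile is mostly a collection of tables. The heavily abridged sketch below is illustrative; it assumes the unicode, runes, and norm packages are imported, and the meanings assigned to the Quotes and Keywords values are inferences from the field comments rather than documented facts:

// exampleProfile is an illustrative, heavily abridged profile; the
// real Hydra profiles are defined elsewhere in the parser.
var exampleProfile = &Profile{
    IDStart:  runes.In(unicode.L), // letters start identifiers
    IDCont:   runes.In(unicode.N), // digits may continue them (abridged)
    StrFlags: map[rune]uint8{'r': StrRaw, 'b': StrBytes},
    Quotes:   map[rune]uint8{'"': StrTriple, '\'': 0}, // value bits assumed to be string flags
    Keywords: Keywords{"if": &Symbol{Name: "if"}},     // assumes Keywords is map[string]*Symbol
    Norm:     norm.NFKC,
}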
type Scanner ¶
type Scanner interface {
    // Next retrieves the next rune from the file. An EOF
    // augmented character is returned on end of file, and an Err
    // augmented character is returned in the event of an error.
    Next() AugChar

    // Push pushes back a single augmented character onto the
    // scanner. Any number of characters may be pushed back.
    Push(ch AugChar)
}
Scanner is an interface describing a scanner. A scanner reads a source character rune by character rune, returning augmented characters.
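A consumer reads until it sees the EOF or Err sentinel in the returned character. A minimal sketch (the assumption that an Err character carries its error in Val is an inference, not documented above):

// countRunes is an illustrative consumer of a Scanner; it counts the
// characters in the source, stopping at end of file or on error.
func countRunes(s Scanner) (int, error) {
    count := 0
    for {
        ch := s.Next()
        switch ch.C {
        case EOF:
            return count, nil
        case Err:
            // Assumed: the semantic value carries the underlying error.
            err, _ := ch.Val.(error)
            return count, err
        }
        count++
    }
}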
type StrEscape ¶
StrEscape is a function type for handling string escapes. It is called with the character, the scanner, and the string flags, and should return the rune to add to the buffer together with the location of the escape sequence. If an error is returned, the location of the error should be returned instead. If no character should be written, the function should return EOF.
func HexEscape ¶
HexEscape sets up a StrEscape that consumes the specified number of hexadecimal digits and returns the specified rune.
func SimpleEscape ¶
SimpleEscape sets up a StrEscape that returns a specified character.
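These constructors would typically be used to populate a profile's Escapes table. The sketch below is illustrative; the signatures SimpleEscape(r rune) StrEscape and HexEscape(digits int) StrEscape are assumptions inferred from the descriptions above:

// exampleEscapes is an illustrative escape table.  The constructor
// signatures used here are assumed, not confirmed by the package.
var exampleEscapes = map[rune]StrEscape{
    'n': SimpleEscape('\n'),
    't': SimpleEscape('\t'),
    'x': HexEscape(2), // consume two hexadecimal digits
}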
type Symbol ¶
type Symbol struct {
    Name  string // The name of the symbol, for display purposes
    Open  string // Paired operator that opens
    Close string // Paired operator that closes
}
Symbol represents a defined symbol, or token type. This could indicate something with a fixed value, like an operator, or something that has semantic value, such as a number literal.
type Token ¶
type Token struct {
    Sym *Symbol     // The token type
    Loc Location    // The location range of the token
    Val interface{} // The semantic value of the token
}
Token represents a single token emitted by the lexer.
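Putting Symbol and Token together, a fixed-value operator token might look like this (the symbol and positions are illustrative):

// Illustrative: a left-bracket operator token as the lexer might
// emit it.  Fixed-value tokens typically carry no semantic value.
var lbracket = &Symbol{Name: "[", Open: "[", Close: "]"}

var exampleToken = Token{
    Sym: lbracket,
    Loc: Location{
        File: "example.hydra",
        B:    FilePos{L: 12, C: 5},
        E:    FilePos{L: 12, C: 6},
    },
    Val: nil, // operators carry no semantic value
}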