Documentation ¶
Overview ¶
Command yy processes yacc source code and produces three output files:
- A Go file containing definitions of AST nodes.
- A Go file containing documentation examples[0] of productions defined by the yacc grammar.
- A new yacc file with automatic actions instantiating the AST nodes.
Installation ¶
To install yy
$ go get [-u] github.com/cznic/yy
Online documentation ¶
http://godoc.org/github.com/cznic/yy
Usage ¶
Invocation:
$ yy [options] <input.y>
Options ¶
Flags handled by the yy command:
-ast string
    Output AST nodes definitions. (default "ast.go")
-astExamples string
    Output AST examples. (default "ast_test.go")
-astImport string
    Optional AST file imports.
-exampleAST string
    Function to call to produce example ASTs. (default "exampleAST")
-kind string
    Default node kind (rule case) field name. (default "kind")
-namedCases
    Generate typed and named case numbers.
-node string
    Default non terminal yacc type. (default "node")
-o string
    Output yacc file. (default "parser.y")
-pkg string
    Package name of generated Go files. Extract from input when blank.
-prettyString string
    Function to stringify things nicely. (default "prettyString")
-token string
    Default terminal yacc type. (default "Token")
-tokenSep string
    AST examples token separator string. (default " ")
-v string
    create grammar report (default "y.output")
-yylex string
    Type of yacc's yylex. (default "*lexer")
Changelog ¶
2017-10-23: Added the case directive.
Examples ¶
A partial example: see the testdata directory and files
input:  in.y
output: ast.go
output: ast_test.go
output: out.y
The three output files were generated by
yy -o testdata/out.y -ast testdata/ast.go -astExamples testdata/ast_test.go testdata/in.y
A more complete, working project using yy can be found at http://godoc.org/github.com/cznic/pl0
Concepts ¶
Every rule is turned into a definition of a struct type in ast.go (adjust using the -ast flag). The fields of the type are the union of the symbols appearing in all productions (cases) of the rule.
Rule:
    Foo Bar // Case 0
|   Foo Baz // Case 1
The generated type will be something like
type Rule struct {
    Case int // In [0, 1].
    Bar  *Bar
    Baz  *Baz
    Foo  *Foo
}
In the above, the Foo and Bar fields will be non-nil when Case is 0, and the Foo and Baz fields will be non-nil when Case is 1.
The above holds when Foo, Bar and Baz are all non-terminal symbols. If the productions also contain terminal symbols, those symbols are turned into fields named Token, with a numeric suffix added when more than one terminal appears in any of the productions.
Rule:
    Foo '+' Bar
|   Foo '[' NUMBER ']' Bar
The generated type will be like
type Rule struct {
    Case   int // In [0, 1].
    Bar    *Bar
    Foo    *Foo
    Token  MyTokenType
    Token2 MyTokenType
    Token3 MyTokenType
}
In the above, Token will capture '+' when Case is 0. For Case 1, Token will capture '[', Token2 NUMBER and Token3 ']'.
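Code that consumes such a node would typically switch on Case before reading any field. The following is only a sketch: the Rule, Foo and Bar declarations are hand-written stand-ins for what yy would emit, the Name fields are invented for illustration, and MyTokenType with its Val field is a hypothetical token type.

```go
package main

import "fmt"

// Hand-written stand-ins for generated node types (assumptions, not yy output).
type Foo struct{ Name string }
type Bar struct{ Name string }

// MyTokenType is a hypothetical token type; Val holds the lexeme.
type MyTokenType struct{ Val string }

// Rule mirrors the shape described above for
//
//	Rule: Foo '+' Bar | Foo '[' NUMBER ']' Bar
type Rule struct {
	Case   int // In [0, 1].
	Bar    *Bar
	Foo    *Foo
	Token  MyTokenType
	Token2 MyTokenType
	Token3 MyTokenType
}

// describe renders a node, reading only the fields that are valid for its Case.
func describe(r *Rule) string {
	switch r.Case {
	case 0: // Foo '+' Bar
		return fmt.Sprintf("%s %s %s", r.Foo.Name, r.Token.Val, r.Bar.Name)
	case 1: // Foo '[' NUMBER ']' Bar
		return fmt.Sprintf("%s %s%s%s %s", r.Foo.Name,
			r.Token.Val, r.Token2.Val, r.Token3.Val, r.Bar.Name)
	default:
		return "invalid case"
	}
}

func main() {
	r := &Rule{
		Case:   1,
		Foo:    &Foo{"a"},
		Bar:    &Bar{"b"},
		Token:  MyTokenType{"["},
		Token2: MyTokenType{"42"},
		Token3: MyTokenType{"]"},
	}
	fmt.Println(describe(r))
}
```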
MyTokenType is the type defined in the yacc %union as in
%union {
    node  MyNodeType
    Token MyTokenType
}
It is assumed that the lexer passed as an argument to yyParse fills in the lval.Token field with additional token information, such as the lexeme value, the starting position in the file, etc.
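To illustrate what filling in lval.Token might look like, here is a toy lexer. It is a sketch under several assumptions: yySymType is a hand-written stand-in for the type goyacc generates from %union, Token with its Val and Pos fields is hypothetical, and the NUMBER constant is an invented token code (a real parser takes it from the goyacc output). Only the Lex and Error method signatures follow the yyLexer interface that goyacc expects.

```go
package main

import "fmt"

// Token is a hypothetical token type carrying the lexeme and its offset.
type Token struct {
	Val string
	Pos int
}

// yySymType stands in for the type goyacc generates from %union.
type yySymType struct {
	Token Token
}

// NUMBER is an invented token code used only by this sketch.
const NUMBER = 57346

// lexer is a toy scanner: decimal numbers and single-byte tokens.
type lexer struct {
	src string
	pos int
}

// Lex scans the next token, stores its details in lval.Token and
// returns the token code, or 0 at end of input.
func (lx *lexer) Lex(lval *yySymType) int {
	for lx.pos < len(lx.src) && lx.src[lx.pos] == ' ' {
		lx.pos++ // skip blanks
	}
	if lx.pos == len(lx.src) {
		return 0
	}
	start := lx.pos
	c := lx.src[lx.pos]
	lx.pos++
	if c >= '0' && c <= '9' {
		for lx.pos < len(lx.src) && lx.src[lx.pos] >= '0' && lx.src[lx.pos] <= '9' {
			lx.pos++
		}
		lval.Token = Token{Val: lx.src[start:lx.pos], Pos: start}
		return NUMBER
	}
	lval.Token = Token{Val: string(c), Pos: start}
	return int(c)
}

// Error completes the yyLexer interface.
func (lx *lexer) Error(s string) { fmt.Println("error:", s) }

func main() {
	lx := &lexer{src: "a [ 42 ]"}
	var lval yySymType
	for tok := lx.Lex(&lval); tok != 0; tok = lx.Lex(&lval) {
		fmt.Printf("%d %q @%d\n", tok, lval.Token.Val, lval.Token.Pos)
	}
}
```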
Generated actions ¶
There's a direct mapping, though not in the same order, between the yacc pseudo variables $1, $2, ... and the fields of the generated node types. For every production not disabled by the yy:ignore directive, yy injects code that instantiates the AST node when the production is reduced. For example, this rule from input.y
File: Prologue TopLevelDeclList
having no semantic action is turned into
File:
    Prologue TopLevelDeclList
    {
        $$ = &File{
            Prologue:         $1.(*Prologue),
            TopLevelDeclList: $2.(*TopLevelDeclList).reverse(),
        }
    }
in output.y. The default yacc type of AST nodes is 'node' and can be changed using the -node flag.
Conventions ¶
Option-like rules, for example as in
BlockOpt:
|   Block

are converted into

BlockOpt:
    /* empty */
    {
        $$ = (*BlockOpt)(nil)
    }
|   Block
    {
        $$ = &BlockOpt{
            Block: $1.(*Block),
        }
    }

in output.y, i.e. the empty case does not produce a &RuleOpt{} but nil instead, to conserve space.
Generated examples depend on a user-supplied function, by default named exampleAST, with the signature
exampleAST(rule int, src string) interface{}
This function is called with the production number, as assigned by goyacc, and an example string generated by yy. exampleAST should parse the example string and return the AST created when the production numbered rule is reduced.
When the project's parser is not yet working, a dummy exampleAST function that always returns nil is a workaround.
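Such a placeholder could look like the sketch below. The signature is the one given above; package main and the main function are present only to make the sketch runnable on its own, and in a real project the function would live in the package that contains the generated examples.

```go
package main

import "fmt"

// exampleAST is a placeholder for use while the parser is not yet working.
// It ignores both arguments and returns nil.
func exampleAST(rule int, src string) interface{} {
	return nil
}

func main() {
	// The rule number and source text here are arbitrary.
	fmt.Println(exampleAST(1, "foo + bar") == nil)
}
```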
Magic names ¶
yy inspects rule actions found in the input file. If the action code mentions the identifier lx, yy assumes it refers to the yyLexer passed to yyParse. In that case code like
lx := yylex.(*lexer)
is injected near the beginning of the semantic action. The specific type to which the yylex parameter is type asserted is adjustable using the -yylex flag. Similarly, when the identifier lhs is mentioned, a short variable definition of the variable lhs, like

lhs := &Foo{...}
$$ = lhs

is injected into the output.y action, replacing the default generated action (see "Concepts").
For example, an action in input.y
| IdentifierList Type '=' ExpressionList { lhs.declare(lx.scope) }
produces

{
    lx := yylex.(*lexer)
    lhs := &VarSpec{
        Case:           2,
        IdentifierList: $1.(*IdentifierList).reverse(),
        Type:           $2.(*Type),
        Token:          $3,
        ExpressionList: $4.(*ExpressionList).reverse(),
    }
    $$ = lhs
    lhs.declare(lx.scope)
}
in output.y.
The AST examples generator depends on the presence of the yy:token directive for all non-constant terminal symbols, or on the presence of the constant token value, as in this example

%token /*yy:token "%c" */ IDENTIFIER "identifier"
%token BREAK "break"
Using fe ¶
The AST examples yy generates must be post processed by using the fe command (http://godoc.org/github.com/cznic/fe), for example
$ go test -run ^Example[^_] | fe
One of the reasons why this is not done automatically by yy is that the above command will succeed only after your project has a _working_ scanner/parser combination. That's not the case in the early stages.
Directives ¶
yy recognizes specially formatted comments within the input as directives. All directives have the format

//yy:command argument

or

/*yy:command argument */

Note that the directive must immediately follow the comment opening. There must be no empty lines between the directive and the production it applies to.
Directive example ¶
For example
//yy:example "foo * bar"
Rule:
    Foo '*' Bar
//yy:example "foo / bar"
|   Foo '/' Bar

The argument of the example directive is a double-quoted Go string. The string is used instead of an automatically generated example.
Directive field ¶
For example
//yy:field count int
//yy:field flag bool
Rule:
    Foo Bar
The argument of the field directive is the text up to the end of the comment. The argument is added to the automatically generated fields of the node type of Rule.
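For the two field directives shown in this section, the resulting node type might look roughly like the sketch below. This is a hand-written approximation: the field order, the empty Foo and Bar stand-ins, and the absence of any other generated fields are assumptions, not actual yy output.

```go
package main

import "fmt"

// Empty stand-ins for the node types of the Foo and Bar rules (assumptions).
type Foo struct{}
type Bar struct{}

// Rule approximates the node type for
//
//	//yy:field count int
//	//yy:field flag bool
//	Rule: Foo Bar
type Rule struct {
	count int  // from "//yy:field count int"
	flag  bool // from "//yy:field flag bool"
	Bar   *Bar
	Foo   *Foo
}

func main() {
	r := &Rule{Foo: &Foo{}, Bar: &Bar{}}
	// The extra fields are ordinary struct fields; hand-written code,
	// such as a semantic action, can use them freely.
	r.count++
	r.flag = true
	fmt.Println(r.count, r.flag)
}
```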
Directive ignore ¶
For example
//yy:ignore
Rule:
    Foo Bar

The ignore directive has no arguments. The directive disables generation of the node type of Rule, as well as generation of the code instantiating such a node.
Directive list ¶
For example
//yy:list
Rule:
    Item
|   Rule ',' Item

The list directive has no arguments. By default, yy detects all left-recursive rules. When the name of such a rule has the suffix 'List', yy automatically generates proper reversing of the rule items. The list directive enables the same behavior for a left-recursive rule whose name does not have the suffix 'List'.
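The need for reversing follows from how a left-recursive rule is reduced: each reduction wraps the list parsed so far, so the outermost node holds the last item. The sketch below shows one way such a reversal can be written for a singly linked node chain. The ItemList and Item types, their field names, and this reverse method are illustrative assumptions, not the code yy generates.

```go
package main

import (
	"fmt"
	"strings"
)

// Item is a hypothetical list element.
type Item struct{ Name string }

// ItemList is a hypothetical node for
//
//	ItemList: Item | ItemList ',' Item
type ItemList struct {
	Item     *Item
	ItemList *ItemList // the list parsed before Item, or nil
}

// reverse relinks the chain in place and returns the new head.
func (n *ItemList) reverse() *ItemList {
	var prev *ItemList
	for n != nil {
		next := n.ItemList
		n.ItemList = prev
		prev = n
		n = next
	}
	return prev
}

// names joins the item names in chain order.
func names(n *ItemList) string {
	var s []string
	for ; n != nil; n = n.ItemList {
		s = append(s, n.Item.Name)
	}
	return strings.Join(s, " ")
}

func main() {
	// Build the chain the way successive reductions would for "a, b, c".
	var l *ItemList
	for _, name := range []string{"a", "b", "c"} {
		l = &ItemList{Item: &Item{name}, ItemList: l}
	}
	fmt.Println(names(l)) // chain order before reversing: last item first
	l = l.reverse()
	fmt.Println(names(l)) // source order after reversing
}
```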
Directive token ¶
For example
/*yy:token "%c"*/ IDENT
/*yy:token "%d"*/ NUMBER

The argument of the token directive is a double-quoted Go string. The string is passed to a fmt.Sprintf call with a numeric argument, chosen by yy, that falls within the lowercase ASCII letters. The resulting string is used to generate textual token values in examples.
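The effect of the two format strings above can be checked with a few lines of Go. The tokenText helper is hypothetical and only wraps fmt.Sprintf; the argument 'a' is an arbitrary pick from the lowercase ASCII letters, since the value yy actually chooses is not specified here.

```go
package main

import "fmt"

// tokenText is a hypothetical helper: it formats a single rune argument
// with the format string taken from a yy:token directive.
func tokenText(format string, r rune) string {
	return fmt.Sprintf(format, r)
}

func main() {
	fmt.Println(tokenText("%c", 'a')) // prints "a": the letter itself
	fmt.Println(tokenText("%d", 'a')) // prints "97": the letter's code point
}
```

With "%c" the token text is an identifier-like letter, while "%d" turns the same argument into a decimal number, which suits a NUMBER token.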
Directive case ¶
For example
//yy:case Foo
/*yy:case Bar */ NUMBER
The argument of the case directive is an identifier, which is appended to the rule name to produce a symbolic and typed case number value. The type name is <RuleName>Case.
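For a rule named Rule with the case names Foo and Bar, the generated declarations might resemble the sketch below. The type name RuleCase follows the <RuleName>Case pattern stated above and the constant names append the directive argument to the rule name; the use of iota, the concrete values and the caseName helper are assumptions made for illustration.

```go
package main

import "fmt"

// RuleCase is the typed case number, named after the <RuleName>Case pattern.
type RuleCase int

// Hypothetical named values: the directive argument appended to the rule name.
const (
	RuleFoo RuleCase = iota
	RuleBar
)

// Rule is a stand-in node type whose Case field uses the typed value.
type Rule struct {
	Case RuleCase
}

// caseName returns a readable label for the case of a node.
func caseName(r *Rule) string {
	switch r.Case {
	case RuleFoo:
		return "Foo"
	case RuleBar:
		return "Bar"
	default:
		return fmt.Sprintf("RuleCase(%d)", int(r.Case))
	}
}

func main() {
	fmt.Println(caseName(&Rule{Case: RuleBar}))
}
```

Symbolic names keep a switch over Case readable and let the compiler reject a case value of the wrong type.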