gocgo
gocgo (go-c-go), implemented in Go, includes a lexer, a parser, and
visitors for the C programming language.
This small project started as a demo of ANTLR and the visitor design
pattern. Having a C parser available turned out to be valuable for a
few other cases on our side, so we decided to make it public, as it
might be of interest to others as well.
gocgo is (currently) suitable for analysis of C code. Some
transformation is possible, but a flexible transformation API is not
yet available and could be added later.
A Quick Example
Below is an example of a visitor that collects C code into a buffer
and along the way removes function bodies.
// FuncDeleteVisitor collects C code into Buff and drops function bodies.
type FuncDeleteVisitor struct {
	*BaseVisitorImpl
	in   bool // true while inside a function definition
	Buff bytes.Buffer
}

func NewFuncDeleteVisitor() *FuncDeleteVisitor {
	return &FuncDeleteVisitor{in: false}
}

// VisitTerminal appends the text of each visited token, followed by a space.
func (v *FuncDeleteVisitor) VisitTerminal(n antlr.TerminalNode) error {
	v.Buff.WriteString(n.GetSymbol().GetText())
	v.Buff.WriteString(" ")
	return nil
}

func (v *FuncDeleteVisitor) VisitFunctionDefinition(ctx *parsing.FunctionDefinitionContext) (bool, error) {
	v.in = true
	return true, nil
}

func (v *FuncDeleteVisitor) VisitFunctionDefinitionEnd(ctx *parsing.FunctionDefinitionContext) error {
	v.in = false
	return nil
}

// VisitBlockItemList skips the children of a block item list while inside
// a function definition, which removes the function body from the output.
func (v *FuncDeleteVisitor) VisitBlockItemList(ctx *parsing.BlockItemListContext) (bool, error) {
	return !v.in, nil
}
Usage
We provide a complete example of running the aforementioned visitor as
a command in this repo. You can run the following:
go run ./cmd/gocgo/gocgo.go cprogs/for.c
See the source of the command for details; the code is rather simple.
Architecture and Design
Each visitor should embed a pointer to BaseVisitorImpl, which
provides the default implementation of Visit methods.
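
The embedding idea can be illustrated with a minimal, self-contained sketch. The names below (Node, Visitor, DefaultVisitor, FuncCounter, dispatch) are hypothetical stand-ins chosen for illustration and are not part of the gocgo API; the sketch only assumes that defaults come from an embedded pointer and that calls go through an interface, so an overriding method on the outer struct is the one that runs.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a parse-tree context.
type Node struct {
	Kind string
}

// Visitor is a hypothetical two-method visitor interface.
type Visitor interface {
	VisitFunction(n *Node) (bool, error)
	VisitStatement(n *Node) (bool, error)
}

// DefaultVisitor supplies defaults: visit children, report no error.
type DefaultVisitor struct{}

func (d *DefaultVisitor) VisitFunction(n *Node) (bool, error)  { return true, nil }
func (d *DefaultVisitor) VisitStatement(n *Node) (bool, error) { return true, nil }

// FuncCounter embeds *DefaultVisitor and overrides only VisitFunction.
// VisitStatement is promoted from the embedded default.
type FuncCounter struct {
	*DefaultVisitor
	Count int
}

func (c *FuncCounter) VisitFunction(n *Node) (bool, error) {
	c.Count++
	return true, nil
}

// dispatch picks the method for the node kind through the interface.
func dispatch(v Visitor, n *Node) (bool, error) {
	switch n.Kind {
	case "function":
		return v.VisitFunction(n)
	case "statement":
		return v.VisitStatement(n)
	}
	return false, fmt.Errorf("unknown node kind %q", n.Kind)
}

// countFunctions runs a FuncCounter over a flat list of nodes.
func countFunctions(nodes []*Node) (int, error) {
	c := &FuncCounter{DefaultVisitor: &DefaultVisitor{}}
	for _, n := range nodes {
		if _, err := dispatch(c, n); err != nil {
			return 0, err
		}
	}
	return c.Count, nil
}

func main() {
	nodes := []*Node{{Kind: "function"}, {Kind: "statement"}, {Kind: "function"}}
	n, err := countFunctions(nodes)
	fmt.Println(n, err) // prints: 2 <nil>
}
```

Only the overridden method carries custom logic; everything else falls back to the embedded defaults, which is the property the embedding requirement is meant to provide.
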
There is a VisitX method for each type of AST node (e.g.,
VisitFunctionDefinition for a function definition). Consider this
example:
func (v *FuncDeleteVisitor) VisitFunctionDefinition(ctx *parsing.FunctionDefinitionContext) (bool, error) {
	v.in = true
	return true, nil
}
This Visit method is invoked for every function definition in the
parsed C code. There are two return values. The first value (bool)
says whether children should be visited. The second value (error) is
Go's standard error interface.
Each VisitX method has a corresponding VisitXEnd method invoked
once all children of the node are visited. From our earlier example:
func (v *FuncDeleteVisitor) VisitFunctionDefinitionEnd(ctx *parsing.FunctionDefinitionContext) error {
	v.in = false
	return nil
}
VisitXEnd methods have a single return value (error).
This architecture is common in many tools, e.g., visitors in the
Eclipse implementation.
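
To make the VisitX / VisitXEnd protocol concrete, here is a small sketch of a tree walk that follows the same shape. It uses a hypothetical node type and a single generic Visit / VisitEnd pair instead of ANTLR contexts, so none of these names come from gocgo. One assumption to note: this sketch calls VisitEnd even when the children were skipped; the text above does not say whether gocgo does the same.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical tree node; it is not an ANTLR context.
type node struct {
	kind     string
	children []*node
}

// visitor mirrors the VisitX / VisitXEnd shape with one generic pair.
type visitor interface {
	Visit(n *node) (bool, error) // false means: skip this node's children
	VisitEnd(n *node) error      // called after the children have been handled
}

// walk calls Visit, descends only if asked to, then calls VisitEnd.
// Any error stops the traversal immediately.
func walk(v visitor, n *node) error {
	descend, err := v.Visit(n)
	if err != nil {
		return err
	}
	if descend {
		for _, c := range n.children {
			if err := walk(v, c); err != nil {
				return err
			}
		}
	}
	return v.VisitEnd(n)
}

// tracer records enter/exit events and skips the children of "body" nodes.
type tracer struct {
	events []string
}

func (t *tracer) Visit(n *node) (bool, error) {
	t.events = append(t.events, "enter:"+n.kind)
	return n.kind != "body", nil
}

func (t *tracer) VisitEnd(n *node) error {
	t.events = append(t.events, "exit:"+n.kind)
	return nil
}

// trace walks the tree and returns the recorded events as one string.
func trace(root *node) (string, error) {
	t := &tracer{}
	err := walk(t, root)
	return strings.Join(t.events, " "), err
}

func main() {
	root := &node{kind: "func", children: []*node{
		{kind: "decl"},
		{kind: "body", children: []*node{{kind: "stmt"}}},
	}}
	out, err := trace(root)
	// prints: enter:func enter:decl exit:decl enter:body exit:body exit:func <nil>
	fmt.Println(out, err)
}
```

The stmt node never appears in the trace because Visit returned false for its parent, which is the same mechanism the FuncDeleteVisitor example uses to drop function bodies.
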
Implementation
gocgo saves you a few steps if you were planning to use ANTLR to
obtain a parser for the C programming language.
We used the grammar from the ANTLR repo.
If you wish to reproduce the steps that generate the lexer, parser,
and original visitors, you can run the following command in the root
of this repo:
go generate ./...
We made a few changes to simplify the usage of visitors and address
some issues (e.g., Issue 4398). Below is a summary of the key changes
and additions:
- Addressed Issue 4398 by introducing VisitChildren (locally in this
  repo).
- Removed the generated c_base_visitor.go, because it limits changing
  the behavior of any Visit method.
- Introduced our own visitor interface (BaseVisitor) and its
  implementation (BaseVisitorImpl), which enable proper struct
  embedding. Your visitor can now change the behavior of (a subset of)
  Visit methods.
- Updated the API design so that each Visit method returns bool and
  error (rather than interface{}). We find the new approach more
  idiomatic in Go.
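
The last point can be illustrated with a short contrast. This is a sketch with hypothetical functions (untypedVisit, typedVisit, handleUntyped), not code from gocgo or from the ANTLR runtime; it only shows, in general terms, what a caller has to do with an interface{} result compared with a (bool, error) pair.

```go
package main

import (
	"errors"
	"fmt"
)

// untypedVisit is a hypothetical callback that packs its outcome into
// interface{}; the caller has to discover the dynamic type.
func untypedVisit(depth int) interface{} {
	if depth > 2 {
		return errors.New("nesting too deep")
	}
	return true
}

// typedVisit is the same hypothetical callback with its contract
// stated in the signature.
func typedVisit(depth int) (bool, error) {
	if depth > 2 {
		return false, errors.New("nesting too deep")
	}
	return true, nil
}

// handleUntyped needs a type switch and must decide what an
// unexpected result type means.
func handleUntyped(depth int) (bool, error) {
	switch r := untypedVisit(depth).(type) {
	case error:
		return false, r
	case bool:
		return r, nil
	default:
		return false, fmt.Errorf("unexpected result type %T", r)
	}
}

func main() {
	fmt.Println(handleUntyped(1)) // needs the type switch above
	fmt.Println(typedVisit(3))    // usable directly
}
```

With the typed signature, the compiler checks that every visitor reports both whether to descend and whether something failed, and callers can use the usual if err != nil handling.
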
License
BSD-3-Clause license.
Feel free to get in touch if you have any comments: Milos Gligoric
<milos.gligoric@gmail.com>.