Mir Code Generator
Motivation
Mir requires a lot of boilerplate code to be written, which is further exacerbated when using the Domain Specific Language (DSL). This presents several issues:
- Writing boilerplate is tedious and time-consuming, making Mir less attractive for prototyping, which is, arguably, one of its main applications.
- It makes it difficult to evolve Mir, as a small change to the core or DSL may require manual changes to a large amount of existing boilerplate code.
- It forces the “waterfall” model on the development of modules. Since any change to the event and message descriptions (currently, in .proto files) needs to be propagated to all the related boilerplate, the programmer is discouraged from making such changes and, thus, is incentivized to think through the implementation and decide which events and network messages will be used before they start actually implementing the logic of the module. This approach is not always optimal for prototyping and exploring new ideas.
The code generator aims to address these problems and make Mir a higher-level framework. It may also simplify migration to a different model for representing events and network messages, or even allow multiple such models to coexist by performing an automatic conversion between them when necessary.
Usage
Annotating the .proto definitions
See protos/mir/codegen_extensions.proto and protos/net/codegen_extensions.proto for the list of extensions.
The code generator produces a set of types distinct from those generated by protoc, allowing for greater customization. This is achieved through special annotations in the .proto files, which make the code more explicit and avoid the need for hard-coded rules and conventions, such as inferring the semantics of a proto message or a field from its name.
Mir structs
The most basic annotation is option (mir.struct) = true; in a proto message (not to be confused with network messages). This annotation simply instructs the code generator to process the message and create a Mir-generated type for it. Proto messages that have no Mir annotations are ignored by the Mir code generator.
Example:
message SigVerData {
option (mir.struct) = true;
repeated bytes data = 1;
}
The event hierarchy
Events in Mir are organized into a tree-like hierarchy.
The root of the hierarchy should be annotated with option (mir.event_root) = true;.
Moreover, it must have a oneof field (typically named type or Type) annotated with option (mir.event_type) = true;. This oneof lists all the “children” of the root.
Example (adapted from protos/eventpb/event.proto):
message Event {
option (mir.event_root) = true;
oneof type {
option (mir.event_type) = true;
Init init = 1;
Tick tick = 2;
bcbpb.Event bcb = 28;
}
string dest_module = 200 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.ModuleID"];
}
An internal node of the hierarchy must be annotated with option (mir.event_class) = true; and must also contain a oneof field annotated with option (mir.event_type) = true;. This oneof lists all the “children” of the node.
Example (from protos/bcbpb/bcbpb.proto):
message Event {
option (mir.event_class) = true;
oneof type {
option (mir.event_type) = true;
BroadcastRequest request = 1;
Deliver deliver = 2;
}
}
Finally, a “leaf” in the hierarchy is an actual event, and it must be annotated with option (mir.event) = true;.
Example (from protos/bcbpb/bcbpb.proto):
message BroadcastRequest {
option (mir.event) = true;
bytes data = 1;
}
The network messages hierarchy
The hierarchy of network messages is similar to the hierarchy of events.
The annotations used are net.message_root, net.message_class, and net.message for the proto messages representing the nodes in the network message hierarchy, and option (net.message_type) = true; for the oneof fields that list the children of a node.
Customizing the generated types
Unfortunately, the protobuf data model is not as expressive as the Go data model.
One big difference is that Go allows declaring new named types based on existing ones, such as:
type ModuleID string
This creates a major inconvenience, as the programmer has to manually convert between the protobuf and Go types, which becomes increasingly cumbersome with repeated fields (i.e., slices).
In some cases, event handlers would consist mostly of type conversion boilerplate code.
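To make the inconvenience concrete, here is a minimal sketch of the hand-written conversion that a single repeated string field requires when the Go code uses a named type. NodeID is a local stand-in for types.NodeID, and the helper names are hypothetical.

```go
package main

import "fmt"

// NodeID is a local stand-in for types.NodeID.
type NodeID string

// nodeIDsFromStrings converts the protobuf representation ([]string) into the
// Go representation ([]NodeID). Go does not allow the direct conversion
// []NodeID(ss) because the two slice types have different underlying types,
// so every element has to be converted individually.
func nodeIDsFromStrings(ss []string) []NodeID {
	ids := make([]NodeID, len(ss))
	for i, s := range ss {
		ids[i] = NodeID(s)
	}
	return ids
}

// nodeIDsToStrings performs the reverse conversion.
func nodeIDsToStrings(ids []NodeID) []string {
	ss := make([]string, len(ids))
	for i, id := range ids {
		ss[i] = string(id)
	}
	return ss
}

func main() {
	ids := nodeIDsFromStrings([]string{"node0", "node1"})
	fmt.Println(ids, nodeIDsToStrings(ids))
}
```

Multiplied over every such field and every handler, helpers like these are the boilerplate that the mir.type annotation is meant to remove.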
This issue is addressed by the annotation [(mir.type) = "full/package/path.TypeName"].
It specifies the type that the Mir-generated code uses for the annotated field of a proto message. This type must be convertible to/from the type of the field.
Example:
message NodeSigsVerified {
option (mir.event) = true;
SigVerOrigin origin = 1 [(mir.origin_response) = true];
repeated string node_ids = 2 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.NodeID"];
repeated bool valid = 3;
repeated string errors = 4 [(mir.type) = "error"];
bool all_ok = 5;
}
Here, the node_ids field will be represented as []types.NodeID in the generated code instead of []string.
Similarly, the errors field will be represented as []error instead of []string.
A string field annotated with [(mir.type) = "error"] is an exception to the general rule that the Mir type must be convertible to/from the type of the proto field.
Indeed, in Go, an error cannot be directly converted to a string and vice versa.
The following two functions are used instead (located in codegen/model/types/error.go):
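import "errors"

// StringToError maps the empty string to a nil error.
func StringToError(s string) error {
	if s == "" {
		return nil
	}
	return errors.New(s)
}

// ErrorToString maps a nil error to the empty string. It panics on an error
// with an empty message, which could not be distinguished from nil.
func ErrorToString(err error) string {
	if err == nil {
		return ""
	}
	s := err.Error()
	if s == "" {
		panic("error.Error() must not return an empty string")
	}
	return s
}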
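Since these functions operate on single values, a repeated field such as errors in NodeSigsVerified needs them applied element-wise. The sketch below shows one way this could look; stringsToErrors and errorsToStrings are hypothetical helpers, not part of Mir. It also illustrates why the empty message is rejected: the empty string already encodes nil, so such an error would not survive a round trip.

```go
package main

import (
	"errors"
	"fmt"
)

// stringsToErrors applies the StringToError rule element-wise: "" becomes nil.
func stringsToErrors(ss []string) []error {
	errs := make([]error, len(ss))
	for i, s := range ss {
		if s != "" {
			errs[i] = errors.New(s)
		}
	}
	return errs
}

// errorsToStrings applies the ErrorToString rule element-wise: nil becomes "".
// An error with an empty message is rejected, because "" already encodes nil
// and the error would turn into nil after a round trip.
func errorsToStrings(errs []error) []string {
	ss := make([]string, len(errs))
	for i, err := range errs {
		if err == nil {
			continue // ss[i] stays ""
		}
		s := err.Error()
		if s == "" {
			panic("error.Error() must not return an empty string")
		}
		ss[i] = s
	}
	return ss
}

func main() {
	errs := stringsToErrors([]string{"", "invalid signature"})
	fmt.Println(errs[0] == nil, errs[1])
	fmt.Printf("%q\n", errorsToStrings(errs))
}
```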
Customizing repeated and map types
When annotating repeated fields, as shown in the previous example, the annotated type is applied to the element type of the slice.
For example, the field repeated string node_ids = 2 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.NodeID"]; is represented as a slice of types.NodeID in the generated code.
At the moment of writing, it is not possible to annotate the slice itself. This means that in the above example, if we define a named type in Go, type NodeList []NodeID, we cannot use the DSL to automatically convert a proto field directly into the NodeList type. This is particularly inconvenient when annotating bytes fields, as they are treated as repeated byte proto fields; thus, a byte slice cannot currently be annotated with a custom Go type, only its underlying byte type. This is a known issue that is to be resolved soon.
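Until this is resolved, hand-written code has to bridge the gap itself. The sketch below uses a local stand-in for types.NodeID and hypothetical NodeList and Digest types; it shows that the step is cheap in Go, because a slice of the element type and the named slice type share the same underlying type. The conversion still has to be written out wherever code using NodeList meets code using []NodeID.

```go
package main

import "fmt"

// NodeID mirrors types.NodeID; NodeList and Digest are hypothetical named
// slice types that cannot currently be used in a (mir.type) annotation.
type NodeID string
type NodeList []NodeID
type Digest []byte

func main() {
	// A field annotated with (mir.type) on its element type yields []NodeID.
	ids := []NodeID{"node0", "node1"}

	// NodeList and []NodeID have identical underlying types, so this is a
	// plain conversion that shares the backing array instead of copying.
	list := NodeList(ids)

	// Because []NodeID is not a named type, plain assignment also compiles.
	var assigned NodeList = ids

	// The same applies to a bytes field, which is []byte in Go.
	d := Digest([]byte{0xde, 0xad})

	fmt.Println(len(list), assigned[1], len(d))
}
```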
For maps, the annotations mir.key_type and mir.value_type should be used instead of mir.type. They allow annotating the key type, the value type, or both.
Example:
[...]
map<string, string> membership = 2 [(mir.key_type) = "github.com/filecoin-project/mir/pkg/types.NodeID",
    (mir.value_type) = "github.com/filecoin-project/mir/pkg/types.NodeIP"];
[...]
Here, the map<string, string> membership field will be represented in the generated code as a map with keys of type types.NodeID and values of type types.NodeIP instead of string. If only mir.key_type is provided:
[...]
map<string, string> membership = 2 [(mir.key_type) = "github.com/filecoin-project/mir/pkg/types.NodeID"];
[...]
Then the generated code will represent this proto field as a map with keys of type types.NodeID and values of type string. The same applies if only the value type is provided. Currently, passing mir.type to a map field results in an error. As with slices, the outer map type cannot currently be annotated directly.
The “origin” field for request-response events
Mir currently employs a special model for “request-response” events.
Both the request and the response event contain a special field named origin (or Origin) that serves two purposes: (1) it stores the ID of the module that initiated the request; and (2) it enables the module that sent the request to recover the context in which the request was made.
Since these fields need special treatment, they should be marked with [(mir.origin_request) = true] or [(mir.origin_response) = true], respectively.
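The request-response pattern can be illustrated with a simplified, hypothetical model: the requester stores its context under an ID and puts its module ID and that context ID into the origin of the request; the responder copies the origin unchanged into the response; the requester then uses it to resume. These types are not Mir's actual origin types (which, as the RequestCertOrigin message in the Architecture section shows, can carry a context-store or DSL origin in a oneof), and SHA-256 merely stands in for whatever service the responding module provides.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// ModuleID mirrors types.ModuleID.
type ModuleID string

// Origin is a hypothetical, simplified origin: the ID of the requesting
// module plus an opaque handle to the context of the request.
type Origin struct {
	Module    ModuleID
	ContextID uint64
}

// In proto terms, this Origin field would carry [(mir.origin_request) = true].
type HashRequest struct {
	Data   []byte
	Origin Origin
}

// In proto terms, this Origin field would carry [(mir.origin_response) = true].
type HashResult struct {
	Digest []byte
	Origin Origin
}

// requester keeps the context of each pending request, keyed by context ID.
type requester struct {
	id       ModuleID
	nextID   uint64
	contexts map[uint64]string
}

// request stores ctx and embeds a reference to it in the origin.
func (r *requester) request(data []byte, ctx string) HashRequest {
	id := r.nextID
	r.nextID++
	r.contexts[id] = ctx
	return HashRequest{Data: data, Origin: Origin{Module: r.id, ContextID: id}}
}

// serve plays the responding module: it copies the origin unchanged, so the
// response can be routed back to Origin.Module.
func serve(req HashRequest) HashResult {
	sum := sha256.Sum256(req.Data)
	return HashResult{Digest: sum[:], Origin: req.Origin}
}

// onResult recovers (and forgets) the context of the original request.
func (r *requester) onResult(res HashResult) string {
	ctx := r.contexts[res.Origin.ContextID]
	delete(r.contexts, res.Origin.ContextID)
	return ctx
}

func main() {
	r := &requester{id: "app", contexts: make(map[uint64]string)}
	res := serve(r.request([]byte("hello"), "batch #1"))
	fmt.Println(res.Origin.Module, r.onResult(res), len(res.Digest))
}
```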
Other annotations
The annotation [(mir.omit_in_event_constructors) = true] can be used to omit a field from the constructor of the corresponding event type.
Running the code generator
See protos/generate.go.
The code generator should typically be run using go generate, GNU make, a bash script, or a similar tool.
The following order is important to avoid circular dependencies (the examples are in the go generate format):
- Compile the protobuf extensions:
//go:generate -command protoc-basic protoc --proto_path=. --go_out=../pkg/pb/ --go_opt=paths=source_relative
// Generate the code for codegen extensions.
//go:generate protoc-basic mir/codegen_extensions.proto
//go:generate protoc-basic net/codegen_extensions.proto
- Compile the protoc plugin:
//go:generate go build -o ../codegen/protoc-plugin/protoc-gen-mir ../codegen/protoc-plugin
- Run protoc with this plugin enabled on the .proto files containing the definitions of Mir events:
//go:generate -command protoc-events protoc --proto_path=. --go_out=../pkg/pb/ --go_opt=paths=source_relative --plugin=../codegen/protoc-plugin/protoc-gen-mir --mir_out=../pkg/pb --mir_opt=paths=source_relative
// Generate the protoc-generated code for events and messages.
//go:generate protoc-events trantorpb/trantorpb.proto
//go:generate protoc-events messagepb/messagepb.proto
//go:generate protoc-events eventpb/eventpb.proto
//...
- Build and run the Mir code generator with the import path of the protoc-generated code as its input:
// Build the custom code generators.
//go:generate go build -o ../codegen/generators/mir-std-gen/mir-std-gen.bin ../codegen/generators/mir-std-gen
//go:generate -command std-gen ../codegen/generators/mir-std-gen/mir-std-gen.bin
// Generate the Mir-generated code for events and messages.
//go:generate std-gen "github.com/filecoin-project/mir/pkg/pb/eventpb"
//go:generate std-gen "github.com/filecoin-project/mir/pkg/pb/messagepb"
//go:generate std-gen "github.com/filecoin-project/mir/pkg/pb/bcbpb"
//...
The Mir-generated code will be in sub-folders of the folder containing the protoc-generated code.
Architecture
At the time of writing this README, Mir uses protobufs to represent events and network messages.
For a set of reasons (including flexibility, ease of implementation, and potential future migration to a different representation) the Mir code generator is not implemented as a protoc plugin.
Instead, it uses the Go code generated by protoc as its input.
It inspects this code using reflection.
Hence, the whole package containing the protoc-generated code must be compilable (i.e., contain no compilation errors).
We proceed by describing each of the components one by one.
The protobuf extensions
Located in protos/mir/codegen_extensions.proto and protos/net/codegen_extensions.proto.
These extensions are used to annotate the .proto definitions with Mir-specific information. See the “Annotating the .proto definitions” section for details.
Dependencies
None
Parsing the protobuf extensions
Code located in codegen/annotations.go.
The code in this file simply checks whether a certain message is marked with a certain annotation.
This file also contains an important function called ShouldGenerateMirType, which determines whether a given protobuf message should be processed by the Mir code generator, based on whether it is marked with any of the standard Mir annotations.
Dependencies
The protobuf extensions must be compiled with protoc.
protoc plugin
Located in codegen/protoc-plugin/main.go.
This tiny plugin slightly enriches the reflection information of the protoc-generated code.
Specifically, for each protobuf message marked with one of the standard Mir annotations and for each oneof in this message, the plugin generates a method named Reflect[OneofName]Options that returns the information about all the options of the oneof.
Example
// Protobuf definition.
message RequestCertOrigin {
option (mir.struct) = true;
string module = 1 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.ModuleID"];
oneof type {
contextstorepb.Origin context_store = 2;
dslpb.Origin dsl = 3;
}
}
// Code generated by the plugin.
func (*RequestCertOrigin) ReflectTypeOptions() []reflect.Type {
return []reflect.Type{
reflect.TypeOf((*RequestCertOrigin_ContextStore)(nil)),
reflect.TypeOf((*RequestCertOrigin_Dsl)(nil)),
}
}
Dependencies
github.com/filecoin-project/mir/codegen -- the plugin uses the ShouldGenerateMirType function.
The Generator interface and the RunGenerator function
Located in codegen/generator.go.
Code generation in Mir is organized in a modular way.
There are many small generators.
All of them implement the same interface located in codegen/generator.go:
type Generator interface {
Run(structTypes []reflect.Type) error
}
The Run method of a code generator takes as input a list of all struct types exported by a package and may return an error.
One non-trivial implication of using reflection to parse the input is that the input package must actually be compiled into the same binary as the code generator itself.
This is solved by the RunGenerator function located in codegen/generator.go:
func RunGenerator[GeneratorType Generator](inputPkgPath string) error
This function acts as a meta-code-generator, i.e., it generates, compiles, and runs the code generator itself (see the implementation for details).
Therefore, it accepts the code generator as a type parameter and imposes a few natural restrictions on it:
- It must be a concrete type that can be instantiated (i.e., not an interface).
- The type must be exported (i.e., start with a capital letter).
- It must be in a package that can be imported (i.e., not in a package named main or internal).
This list is likely not exhaustive, and it may still be possible to write a Generator that satisfies all of these conditions but does not compile when passed to RunGenerator.
Hence, it is recommended to stay close to the existing examples in codegen/generators.
The Run method is invoked on a zero value of the generator type.
The file codegen/generator.go contains the template code for the meta-code-generator.
A similar approach is employed by gomock in reflect mode.
Example
See codegen/generators/types-gen/generator/generator.go for an example of a generator and codegen/generators/types-gen/main.go for an example of how to run it.
See codegen/generators/mir-std-gen/generator/generator.go and codegen/generators/mir-std-gen/main.go for an example of how multiple code generators can be combined into one.
Dependencies
None
The model
Code located in codegen/model.
The first step of a code generator is to parse the input and construct a model for it.
The most important part of the model and the parser for it are located in the folder codegen/model/types.
It describes the annotated proto messages and the types of their fields.
Note that the parser is implemented using the singleton pattern and contains a cache for already processed data.
This avoids parsing the same data multiple times from multiple code generators when they are composed (see codegen/generators/mir-std-gen/generator/generator.go for an example of a composed generator).
To avoid unnecessary recursive processing, the fields of a proto message are not parsed when the message itself is parsed. To parse the fields, one must explicitly call the ParseFields(msg) method of the parser.
The remaining two parts of the model are located in codegen/model/events and codegen/model/messages, which describe the event and the network message hierarchies, respectively.
Dependencies
github.com/filecoin-project/mir/codegen -- the parser uses its functions for inspecting annotations.
Generating the code
Code located in codegen/generators.
The code generation is done using jennifer, together with a collection of utility functions located in the folder codegen/util and the file codegen/render.go.
Dependencies
github.com/filecoin-project/mir/codegen/model
Standard code generators
Types generator
Located in codegen/generators/types-gen.
Generates Mir types for the annotated protobuf messages and functions to easily convert them to/from their protoc-generated counterparts.
Example:
See protos/bcbpb/bcbpb.proto and pkg/pb/bcbpb/types/types.mir.go.
Events generator
Located in codegen/generators/events-gen.
Generates constructor functions for Mir events.
Example:
See protos/bcbpb/bcbpb.proto and pkg/pb/bcbpb/events/events.mir.go.
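To show what such a constructor saves, here is a hypothetical sketch for the bcbpb example: given the destination module and the fields of the leaf event, it wraps the leaf into every level of the event hierarchy (the root Event, then the bcb class, then BroadcastRequest). The Go types and the function name are illustrative stand-ins, not the generated code in pkg/pb/bcbpb/events/events.mir.go; a field marked with mir.omit_in_event_constructors would simply not appear among the parameters.

```go
package main

import "fmt"

// ModuleID mirrors types.ModuleID.
type ModuleID string

// Leaf events of the bcbpb class (hypothetical Mir-side types).
type BroadcastRequest struct{ Data []byte }
type Deliver struct{ Data []byte }

// BcbEvent is the class node; exactly one field is set, mimicking the oneof.
type BcbEvent struct {
	Request *BroadcastRequest
	Deliver *Deliver
}

// Event is the root; other branches (Init, Tick, ...) are omitted.
type Event struct {
	DestModule ModuleID
	Bcb        *BcbEvent
}

// NewBroadcastRequest takes the destination module and the leaf's fields and
// wraps the leaf into every level of the hierarchy.
func NewBroadcastRequest(destModule ModuleID, data []byte) *Event {
	return &Event{
		DestModule: destModule,
		Bcb:        &BcbEvent{Request: &BroadcastRequest{Data: data}},
	}
}

func main() {
	ev := NewBroadcastRequest("bcb", []byte("hello"))
	fmt.Println(ev.DestModule, string(ev.Bcb.Request.Data))
}
```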
Net generator
Located in codegen/generators/net-gen.
Generates constructor functions for Mir network messages.
Example:
See protos/bcbpb/bcbpb.proto and pkg/pb/bcbpb/msgs/msgs.mir.go.
DSL generator
Located in codegen/generators/dsl-gen.
Generates the functions for emitting and handling Mir events and for handling Mir network messages.
Example:
See protos/bcbpb/bcbpb.proto and the files in pkg/pb/bcbpb/dsl.
Mir-std generator
Combines all the aforementioned generators into one.
Not only is it more convenient to run a single code generator, but it is also faster as the generators share the parser cache, meaning that the same input data will never be parsed twice.
Creating a third-party code generator
Some low-level modules, such as the net module, may benefit from their own code generators, and third-party modules should be able to do the same, for example, when a third-party module provides a more elaborate communication primitive.
To this end, the code is organized as a collection of small single-task generators rather than a single monolithic generator, so that it is easier to create a new generator by analogy with the existing ones and to compose mir-std-gen with, potentially, multiple third-party generators into a single binary specific to a particular project.