codegen

package
v0.4.1
Published: Jul 10, 2023 License: Apache-2.0 Imports: 24 Imported by: 0

README

Mir Code Generator

Motivation

Mir requires a lot of boilerplate code to be written, which is further exacerbated when using the Domain Specific Language (DSL). This presents several issues:

  • Writing boilerplate is tedious and time-consuming, making Mir less attractive for prototyping, which is, arguably, one of its main applications.
  • It makes it difficult to evolve Mir, as a small change to the core or DSL may require manual changes to a large amount of existing boilerplate code.
  • It forces the “waterfall” model on the development of modules. Since any change to the event and message descriptions (currently, in .proto files) needs to be propagated to all the related boilerplate, the programmer is discouraged from making such changes and, thus, is incentivized to think through the implementation and decide which events and network messages will be used before they start actually implementing the logic of the module. This approach is not always optimal for prototyping and exploring new ideas.

The code generator aims to address these problems and make Mir a higher level framework. It may also simplify migration to a different model for representing events and network messages, or even allow multiple such models to coexist by performing an automatic conversion between them when necessary.

Usage

Annotating the .proto definitions

See protos/mir/codegen_extensions.proto and protos/net/codegen_extensions.proto for the list of extensions.

The code generator produces a set of types distinct from those generated by protoc, allowing for greater customization. This is achieved through special annotations in the .proto files, which make the code more explicit and remove the need for hard-coded rules and conventions, such as inferring the semantics of a proto message or a field from its name.

Mir structs

The most basic annotation is option (mir.struct) = true; in a proto message (not to be confused with network messages). This annotation simply instructs the code generator to process the message and create a Mir-generated type for it. Proto messages that have no Mir annotations are ignored by the Mir code generator.

Example:

message SigVerData {
  option (mir.struct) = true;

  repeated bytes data = 1;
}

The event hierarchy

Events in Mir are organized into a tree-like hierarchy.

The root of the hierarchy should be annotated with option (mir.event_root) = true;. Moreover, it must have a oneof field (typically, named type or Type) annotated with option (mir.event_type) = true;. This oneof lists all the “children” of the root.

Example (adapted from protos/eventpb/event.proto):

message Event {
  option (mir.event_root) = true;

  oneof type {
    option (mir.event_type) = true;

    Init         init = 1;
    Tick         tick = 2;
    bcbpb.Event  bcb  = 28;
  }

  string dest_module = 200 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.ModuleID"];
}

An internal node of the hierarchy must be annotated with option (mir.event_class) = true; and must also contain a oneof field annotated with option (mir.event_type) = true;. This oneof lists all the “children” of the node.

Example (from protos/bcbpb/bcbpb.proto):

message Event {
  option (mir.event_class) = true;

  oneof type {
    option (mir.event_type) = true;

    BroadcastRequest request = 1;
    Deliver          deliver = 2;
  }
}

Finally, a “leaf” in the hierarchy is an actual event and it must be annotated with option (mir.event) = true;.

Example (from protos/bcbpb/bcbpb.proto):

message BroadcastRequest {
  option (mir.event) = true;

  bytes data = 1;
}

The network messages hierarchy

The hierarchy of network messages is similar to the hierarchy of events. The annotations used are net.message_root, net.message_class, and net.message for the proto messages representing the nodes in the net message hierarchy and option (net.message_type) = true; for the oneof fields that list the children of a node.
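
To make the tree shape concrete, the following Go sketch models a two-level event hierarchy and walks it with nested type switches. The protoc-generated Go code represents a oneof as an interface with one wrapper struct per option; the types below are simplified, hand-written stand-ins, and all of their names (EventBcb, BcbDeliver, Describe, and so on) are assumptions made for illustration, not the identifiers that protoc or Mir actually generate.

```go
package main

import "fmt"

// Leaf events (annotated with mir.event in the .proto files).
type Init struct{}
type Tick struct{}
type BroadcastRequest struct{ Data []byte }
type Deliver struct{ Data []byte }

// BcbEventType stands in for the oneof of the bcbpb.Event class.
type BcbEventType interface{ isBcbEventType() }

type BcbRequest struct{ Request *BroadcastRequest }
type BcbDeliver struct{ Deliver *Deliver }

func (*BcbRequest) isBcbEventType() {}
func (*BcbDeliver) isBcbEventType() {}

// BcbEvent is an internal node of the hierarchy (mir.event_class).
type BcbEvent struct{ Type BcbEventType }

// EventType stands in for the oneof of the root Event.
type EventType interface{ isEventType() }

type EventInit struct{ Init *Init }
type EventTick struct{ Tick *Tick }
type EventBcb struct{ Bcb *BcbEvent }

func (*EventInit) isEventType() {}
func (*EventTick) isEventType() {}
func (*EventBcb) isEventType()  {}

// Event is the root of the hierarchy (mir.event_root).
type Event struct {
	Type       EventType
	DestModule string
}

// Describe walks from the root to a leaf and returns the path taken.
func Describe(ev *Event) string {
	switch t := ev.Type.(type) {
	case *EventInit:
		return "init"
	case *EventTick:
		return "tick"
	case *EventBcb:
		if t.Bcb == nil {
			return "unknown"
		}
		switch t.Bcb.Type.(type) {
		case *BcbRequest:
			return "bcb/request"
		case *BcbDeliver:
			return "bcb/deliver"
		}
	}
	return "unknown"
}

func main() {
	ev := &Event{
		Type:       &EventBcb{Bcb: &BcbEvent{Type: &BcbDeliver{Deliver: &Deliver{Data: []byte("hi")}}}},
		DestModule: "app",
	}
	fmt.Println(Describe(ev))
}
```

Each level of the tree adds one level of type switching, which is the kind of repetitive code the annotations allow the generator to produce automatically.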

Customizing the generated types

Unfortunately, the protobuf data model is not as expressive as the Go data model. One big difference is that Go allows defining new named types on top of existing ones, such as:

type ModuleID string

Protobuf has no equivalent, which creates a major inconvenience: the programmer has to manually convert between the protobuf and Go types, and this becomes increasingly cumbersome with repeated fields (i.e., arrays). In some cases, event handlers would consist mostly of type conversion boilerplate code.

This issue is addressed by the annotation [(mir.type) = "full/package/path.TypeName"]. It specifies the type that the Mir-generated code uses for the annotated field of a proto message. The type must be convertible to/from the type of the field.

Example:

message NodeSigsVerified {
  option (mir.event) = true;

  SigVerOrigin    origin   = 1 [(mir.origin_response) = true];
  repeated string node_ids = 2 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.NodeID"];
  repeated bool   valid    = 3;
  repeated string errors   = 4 [(mir.type) = "error"];
  bool            all_ok   = 5;
}

Here, the node_ids field will be represented as a slice of types.NodeID in the generated code instead of string. Similarly, the errors field will be represented as []error instead of []string.
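
As a rough illustration of the conversion code that this annotation saves the programmer from writing, the sketch below converts between a protoc-style struct and a Mir-style struct element by element. Both struct types, the local NodeID type, and the FromPb and Pb names are assumptions made for this example; the actual generated code may be organized differently. The errors field is left out here because it does not follow the plain conversion rule.

```go
package main

import "fmt"

// NodeID stands in for types.NodeID (a defined type based on string).
type NodeID string

// NodeSigsVerifiedPb is a simplified stand-in for the protoc-generated struct.
type NodeSigsVerifiedPb struct {
	NodeIds []string
	Valid   []bool
	AllOk   bool
}

// NodeSigsVerified is a simplified stand-in for the Mir-generated type.
type NodeSigsVerified struct {
	NodeIDs []NodeID
	Valid   []bool
	AllOk   bool
}

// FromPb builds the Mir-style struct, converting node IDs one by one.
func FromPb(pb *NodeSigsVerifiedPb) *NodeSigsVerified {
	ids := make([]NodeID, len(pb.NodeIds))
	for i, s := range pb.NodeIds {
		ids[i] = NodeID(s)
	}
	return &NodeSigsVerified{NodeIDs: ids, Valid: pb.Valid, AllOk: pb.AllOk}
}

// Pb converts back to the protoc-style struct.
func (m *NodeSigsVerified) Pb() *NodeSigsVerifiedPb {
	ids := make([]string, len(m.NodeIDs))
	for i, id := range m.NodeIDs {
		ids[i] = string(id)
	}
	return &NodeSigsVerifiedPb{NodeIds: ids, Valid: m.Valid, AllOk: m.AllOk}
}

func main() {
	pb := &NodeSigsVerifiedPb{NodeIds: []string{"0", "1"}, Valid: []bool{true, true}, AllOk: true}
	m := FromPb(pb)
	fmt.Println(m.NodeIDs, m.Pb().NodeIds)
}
```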

A string field annotated with [(mir.type) = "error"] is an exception to the general rule that the annotated type must be convertible to/from the type of the field. Indeed, in Go, error cannot be directly converted to string and vice versa. The following two functions are used instead (located in codegen/model/types/error.go):

func StringToError(s string) error {
	if s == "" {
		return nil
	}
	return errors.New(s)
}

func ErrorToString(err error) string {
	if err == nil {
		return ""
	}
	s := err.Error()
	if s == "" {
		panic("error.Error() must not return an empty string")
	}
	return s
}
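
The sketch below shows how these two functions could be applied element by element to a repeated field such as errors in the example above. StringToError and ErrorToString are copied from the listing; the slice helpers StringsToErrors and ErrorsToStrings are hypothetical names introduced only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// StringToError maps the empty string to nil and any other string to a new error.
func StringToError(s string) error {
	if s == "" {
		return nil
	}
	return errors.New(s)
}

// ErrorToString maps nil to the empty string and any other error to its message.
func ErrorToString(err error) string {
	if err == nil {
		return ""
	}
	s := err.Error()
	if s == "" {
		panic("error.Error() must not return an empty string")
	}
	return s
}

// StringsToErrors is a hypothetical helper that converts a []string to a []error.
func StringsToErrors(ss []string) []error {
	errs := make([]error, len(ss))
	for i, s := range ss {
		errs[i] = StringToError(s)
	}
	return errs
}

// ErrorsToStrings is a hypothetical helper that converts a []error to a []string.
func ErrorsToStrings(errs []error) []string {
	ss := make([]string, len(errs))
	for i, err := range errs {
		ss[i] = ErrorToString(err)
	}
	return ss
}

func main() {
	errs := StringsToErrors([]string{"", "bad signature"})
	fmt.Println(errs[0] == nil, errs[1])
	fmt.Println(ErrorsToStrings(errs))
}
```

Note that only the message survives the round trip: an error rebuilt with errors.New is a new value, so comparisons against a sentinel error with errors.Is will not match after conversion.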

Customizing repeated and map types

When annotating repeated fields, as shown in the previous example, the annotated type applies to the element type of the slice, not to the slice itself. For example, the field repeated string node_ids = 2 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.NodeID"]; is represented as a slice of types.NodeID in the generated code.

At the time of writing, it is not possible to annotate the slice itself. This means that in the above example, if we have a defined slice type in Go, such as type NodeList []NodeID, we cannot use the DSL to automatically convert the proto field directly into the NodeList type. This is particularly inconvenient when annotating bytes fields: they are treated as repeated byte proto fields, so a slice of bytes cannot currently be annotated with a custom Go type, only the underlying byte type. This is a known issue that is to be resolved soon.
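
In practice, this limitation means that one extra manual conversion remains at the call site. The sketch below illustrates it with local stand-in types; NodeList, its Contains method, and nodeIDsFromPb are hypothetical names used only for this example.

```go
package main

import "fmt"

// NodeID stands in for types.NodeID.
type NodeID string

// NodeList is a defined slice type that cannot be targeted by an annotation.
type NodeList []NodeID

// Contains reports whether id is in the list.
func (l NodeList) Contains(id NodeID) bool {
	for _, x := range l {
		if x == id {
			return true
		}
	}
	return false
}

// nodeIDsFromPb mimics what the element-level annotation yields: a plain []NodeID.
func nodeIDsFromPb(raw []string) []NodeID {
	ids := make([]NodeID, len(raw))
	for i, s := range raw {
		ids[i] = NodeID(s)
	}
	return ids
}

func main() {
	ids := nodeIDsFromPb([]string{"0", "1"})
	// The remaining manual step: wrap the plain slice in the defined type.
	list := NodeList(ids)
	fmt.Println(list.Contains("1"))
}
```

The conversion NodeList(ids) does not copy the elements, so it is cheap, but it still has to be written by hand wherever the methods of the defined type are needed.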

As for maps, the annotations mir.key_type and mir.value_type should be used instead of mir.type, which allows annotating either the key type or the value type. Example:

  [...]
  map<string, string> membership = 2 [(mir.key_type)   = "github.com/filecoin-project/mir/pkg/types.NodeID",
                                      (mir.value_type) = "github.com/filecoin-project/mir/pkg/types.NodeIP"];
  [...]

Here, the map<string, string> membership field will be represented in the generated code as a map with key type types.NodeID and value type types.NodeIP instead of a map from string to string. If only mir.key_type was provided:

  [...]
  map<string, string> membership = 2 [(mir.key_type) = "github.com/filecoin-project/mir/pkg/types.NodeID"];
  [...]

Then the generated code will represent this proto field as a map with key type types.NodeID and value type string. The same applies if only the value type is provided. Currently, passing mir.type to a map field results in an error. As with slices, the outer map type cannot currently be annotated directly.
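
The effect of the two map annotations can be pictured as the following pair of conversions. This is a minimal sketch with local stand-ins for types.NodeID and types.NodeIP; the function names MembershipFromPb and MembershipToPb are assumptions, not names produced by the generator.

```go
package main

import "fmt"

// NodeID and NodeIP stand in for the annotated key and value types.
type NodeID string
type NodeIP string

// MembershipFromPb converts the protobuf-side map into the annotated Go types.
func MembershipFromPb(pb map[string]string) map[NodeID]NodeIP {
	m := make(map[NodeID]NodeIP, len(pb))
	for k, v := range pb {
		m[NodeID(k)] = NodeIP(v)
	}
	return m
}

// MembershipToPb converts back to the protobuf-side map.
func MembershipToPb(m map[NodeID]NodeIP) map[string]string {
	pb := make(map[string]string, len(m))
	for k, v := range m {
		pb[string(k)] = string(v)
	}
	return pb
}

func main() {
	m := MembershipFromPb(map[string]string{"0": "10.0.0.1", "1": "10.0.0.2"})
	fmt.Println(m[NodeID("1")], len(MembershipToPb(m)))
}
```

With only mir.key_type set, the value side of the loop would simply keep the string as is.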

The “origin” field for request-response events

Mir currently employs a special model for “request-response” events. Both events contain a special field named origin (or Origin) that serves two purposes: (1) it stores the ID of the module that initiated the request; and (2) it enables the module that sent the request to recover the context in which it was made.

Since these fields need special treatment, they should be marked with [(mir.origin_request) = true] or [(mir.origin_response) = true] respectively.
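
The two purposes of the origin field can be illustrated with a small, self-contained model. This is only a sketch of the idea, not Mir's implementation: the Origin layout, the Requester type, and every function name below are assumptions made for the example.

```go
package main

import "fmt"

// Origin is a hypothetical, simplified origin carried by both events.
type Origin struct {
	Module    string // ID of the module that initiated the request
	ContextID uint64 // key under which the requester stored its context
}

// SignRequest and SignResult model a request-response pair of events.
type SignRequest struct {
	Origin Origin
	Data   []byte
}

type SignResult struct {
	Origin    Origin
	Signature []byte
}

// Requester remembers the context of every pending request.
type Requester struct {
	ID       string
	nextID   uint64
	contexts map[uint64]string
}

func NewRequester(id string) *Requester {
	return &Requester{ID: id, contexts: make(map[uint64]string)}
}

// NewRequest stores the context and embeds its key in the origin.
func (r *Requester) NewRequest(data []byte, context string) SignRequest {
	id := r.nextID
	r.nextID++
	r.contexts[id] = context
	return SignRequest{Origin: Origin{Module: r.ID, ContextID: id}, Data: data}
}

// HandleRequest plays the responding module: it copies the origin unchanged.
func HandleRequest(req SignRequest) SignResult {
	return SignResult{Origin: req.Origin, Signature: append([]byte("sig:"), req.Data...)}
}

// RecoverContext looks up and removes the context of a completed request.
func (r *Requester) RecoverContext(res SignResult) (string, bool) {
	ctx, ok := r.contexts[res.Origin.ContextID]
	delete(r.contexts, res.Origin.ContextID)
	return ctx, ok
}

func main() {
	r := NewRequester("consensus")
	res := HandleRequest(r.NewRequest([]byte("block"), "proposal for round 7"))
	ctx, ok := r.RecoverContext(res)
	fmt.Println(res.Origin.Module, ctx, ok)
}
```

The responder never interprets the origin; it only routes the response back to Origin.Module, which is why the field can be handled generically once it is marked with the annotations.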

Other annotations

The annotation [(mir.omit_in_event_constructors) = true] can be used to omit a field from the constructor of the corresponding event type.

Running the code generator

See protos/generate.go.

The code generator should typically be run using go generate, GNU make, a bash script, or a similar tool. The steps must be performed in the following order to avoid circular dependencies (the examples are in the go generate format):

  1. Compile the protobuf extensions:
    //go:generate -command protoc-basic protoc --proto_path=. --go_out=../pkg/pb/ --go_opt=paths=source_relative
    
    // Generate the code for codegen extensions.
    //go:generate protoc-basic mir/codegen_extensions.proto
    //go:generate protoc-basic net/codegen_extensions.proto
    
  2. Compile the protoc plugin:
    //go:generate go build -o ../codegen/protoc-plugin/protoc-gen-mir ../codegen/protoc-plugin
    
  3. Run protoc with this plugin enabled on the .proto files containing the definitions of Mir events:
    //go:generate -command protoc-events protoc --proto_path=. --go_out=../pkg/pb/ --go_opt=paths=source_relative --plugin=../codegen/protoc-plugin/protoc-gen-mir --mir_out=../pkg/pb --mir_opt=paths=source_relative
    
    // Generate the protoc-generated code for events and messages.
    //go:generate protoc-events trantorpb/trantorpb.proto
    //go:generate protoc-events messagepb/messagepb.proto
    //go:generate protoc-events eventpb/eventpb.proto
    //...
    
  4. Build and run the Mir code generator with the import path of the protoc-generated code as its input:
    // Build the custom code generators.
    //go:generate go build -o ../codegen/generators/mir-std-gen/mir-std-gen.bin ../codegen/generators/mir-std-gen
    //go:generate -command std-gen ../codegen/generators/mir-std-gen/mir-std-gen.bin
    
    // Generate the Mir-generated code for events and messages.
    //go:generate std-gen "github.com/filecoin-project/mir/pkg/pb/eventpb"
    //go:generate std-gen "github.com/filecoin-project/mir/pkg/pb/messagepb"
    //go:generate std-gen "github.com/filecoin-project/mir/pkg/pb/bcbpb"
    //...
    

The Mir-generated code will be in sub-folders of the folder containing the protoc-generated code.

Architecture

At the time of writing this README, Mir uses protobufs to represent events and network messages. For a set of reasons (including flexibility, ease of implementation, and potential future migration to a different representation) the Mir code generator is not implemented as a protoc plugin. Instead, it uses the Go code generated by protoc as its input. It inspects this code using reflection. Hence, the whole package containing the protoc-generated code must be compilable (i.e., contain no syntax errors).

We proceed by describing each of the components one by one.

The protobuf extensions

Located in protos/mir/codegen_extensions.proto and protos/net/codegen_extensions.proto.

These extensions are used to annotate the .proto definitions with Mir-specific information. See the “Annotating the .proto definitions” section for details.

Dependencies

None

Parsing the protobuf extensions

Code located in codegen/annotations.go.

The code in this file simply checks if a certain message is marked with a certain annotation. This file also contains an important function called ShouldGenerateMirType, which determines whether a given protobuf message should be processed by the Mir code generator, based on whether it is marked with any of the standard Mir annotations.

Dependencies

The protobuf extensions must be compiled with protoc.

protoc plugin

Located in codegen/protoc-plugin/main.go.

This tiny plugin slightly enriches the reflection information of the protoc-generated code. Specifically, for each protobuf message marked with one of the standard Mir annotations, for each oneof in this message, the plugin generates a method named Reflect[OneofName]Options that returns the information about all the options of the oneof.

Example
// Protobuf definition.
message RequestCertOrigin {
  option (mir.struct) = true;

  string module = 1 [(mir.type) = "github.com/filecoin-project/mir/pkg/types.ModuleID"];
  oneof type {
    contextstorepb.Origin context_store = 2;
    dslpb.Origin          dsl           = 3;
  }
}

// Code generated by the plugin.
func (*RequestCertOrigin) ReflectTypeOptions() []reflect.Type {
	return []reflect.Type{
		reflect.TypeOf((*RequestCertOrigin_ContextStore)(nil)),
		reflect.TypeOf((*RequestCertOrigin_Dsl)(nil)),
	}
}
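
A consumer of this reflection information could locate and call such a method by name. The sketch below shows one way to do this with the standard reflect package; the message and wrapper types are empty stand-ins for the protoc-generated ones, and OneofOptionNames is a hypothetical helper, not a function of the Mir code generator.

```go
package main

import (
	"fmt"
	"reflect"
)

// Empty stand-ins for the protoc-generated message and oneof wrapper types.
type RequestCertOrigin struct{}
type RequestCertOrigin_ContextStore struct{}
type RequestCertOrigin_Dsl struct{}

// ReflectTypeOptions mirrors the method generated by the plugin.
func (*RequestCertOrigin) ReflectTypeOptions() []reflect.Type {
	return []reflect.Type{
		reflect.TypeOf((*RequestCertOrigin_ContextStore)(nil)),
		reflect.TypeOf((*RequestCertOrigin_Dsl)(nil)),
	}
}

// OneofOptionNames looks up Reflect<oneofName>Options on msg, calls it,
// and returns the names of the option types.
func OneofOptionNames(msg any, oneofName string) ([]string, bool) {
	msgType := reflect.TypeOf(msg)
	if msgType == nil {
		return nil, false
	}
	m, ok := msgType.MethodByName("Reflect" + oneofName + "Options")
	// The method must take only the receiver and return a single value.
	if !ok || m.Type.NumIn() != 1 || m.Type.NumOut() != 1 {
		return nil, false
	}
	out := m.Func.Call([]reflect.Value{reflect.ValueOf(msg)})
	opts, ok := out[0].Interface().([]reflect.Type)
	if !ok {
		return nil, false
	}
	names := make([]string, len(opts))
	for i, t := range opts {
		if t.Kind() == reflect.Ptr {
			t = t.Elem()
		}
		names[i] = t.Name()
	}
	return names, true
}

func main() {
	// A typed nil pointer is enough: the method does not touch the receiver.
	names, ok := OneofOptionNames((*RequestCertOrigin)(nil), "Type")
	fmt.Println(names, ok)
}
```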

Dependencies

github.com/filecoin-project/mir/codegen -- the plugin uses the ShouldGenerateMirType function.

The Generator interface and the RunGenerator function

Located in codegen/generator.go.

Code generation in Mir is organized in a modular way. There are many small generators. All of them implement the same interface located in codegen/generator.go:

type Generator interface {
	Run(structTypes []reflect.Type) error
}

The Run method of a code generator takes as input a list of all struct types exported by a package and may potentially return an error.

One non-trivial implication of using reflection to parse the input is that the input package must actually be compiled into the same binary as the code generator itself. This is solved by the RunGenerator function located in codegen/generator.go:

func RunGenerator[GeneratorType Generator](inputPkgPath string) error

This function acts as a meta-code-generator, i.e., it generates, compiles, and runs the code generator itself (see the implementation for details). Therefore, it accepts the code generator as a type parameter and imposes a few natural restrictions on it:

  • It must be a concrete type that can be instantiated (i.e., not an interface).
  • The type must be exported (i.e., start with a capital letter).
  • It must be in a package that can be imported (i.e., not in a package named main or internal).

This list is likely not exhaustive, and it is probably still possible to write a Generator that satisfies all of these conditions but does not compile when passed to RunGenerator. Hence, it is recommended to stay close to the existing examples in codegen/generators.

The Run method is invoked on a zero value of the generator type. The file codegen/generator.go contains the template code for the meta-code-generator. A similar approach is employed by gomock in the reflect mode.
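
As an illustration of the interface, here is a minimal generator that only lists the struct types it receives. The Generator interface is copied from above; NameListGen, Summarize, and TypesOf are hypothetical names. The sketch lives in package main only so that it can be run directly, whereas a generator passed to RunGenerator would have to live in an importable package.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"strings"
)

// Generator is the interface from codegen/generator.go.
type Generator interface {
	Run(structTypes []reflect.Type) error
}

// NameListGen is a hypothetical generator. It is exported, concrete, and
// stateless, so its zero value is ready to use.
type NameListGen struct{}

// Summarize validates the input and joins the struct type names.
func Summarize(structTypes []reflect.Type) (string, error) {
	if len(structTypes) == 0 {
		return "", errors.New("no struct types in the input package")
	}
	names := make([]string, len(structTypes))
	for i, t := range structTypes {
		if t.Kind() != reflect.Struct {
			return "", fmt.Errorf("%v is not a struct type", t)
		}
		names[i] = t.Name()
	}
	return strings.Join(names, ", "), nil
}

// Run implements Generator. A real generator would write Go files here.
func (NameListGen) Run(structTypes []reflect.Type) error {
	s, err := Summarize(structTypes)
	if err != nil {
		return err
	}
	fmt.Println("would generate code for:", s)
	return nil
}

// TypesOf is a helper that collects the reflect.Type of each value.
func TypesOf(values ...any) []reflect.Type {
	ts := make([]reflect.Type, len(values))
	for i, v := range values {
		ts[i] = reflect.TypeOf(v)
	}
	return ts
}

// Stand-ins for exported struct types of an input package.
type Init struct{}
type Tick struct{}

func main() {
	var gen NameListGen // the zero value, as used by RunGenerator
	if err := gen.Run(TypesOf(Init{}, Tick{})); err != nil {
		fmt.Println("error:", err)
	}
}
```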

Example

See codegen/generators/types-gen/generator/generator.go for an example of a generator and codegen/generators/types-gen/main.go for an example of how to run it.

See codegen/generators/mir-std-gen/generator/generator.go and codegen/generators/mir-std-gen/main.go for an example of how multiple code generators can be combined into one.

Dependencies

None

Parsing input and building a model

Code located in codegen/model.

The first step of a code generator is to parse the input and construct a model for it. The most important part of the model and the parser for it are located in folder codegen/model/types. It describes the annotated proto messages and the types of their fields.

Note that the parser is implemented using the singleton pattern and contains a cache for already processed data. This is done to avoid parsing the same data multiple times from multiple code generators when they are composed (see codegen/generators/mir-std-gen/generator/generator.go for an example of a composed generator).

To avoid unnecessary recursive processing, fields of a proto message are not parsed when the message itself is parsed. To parse the fields, one must explicitly call the ParseFields(msg) method on the parser.
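
The combination of a singleton, a cache, and lazily parsed fields can be sketched as follows. This is a simplified model under stated assumptions: the Parser, Message, and Field types and the DefaultParser, ParseMessage, and ParseValue names are invented for the example, and only the ParseFields name is taken from the text above.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// Field and Message form a tiny, hypothetical model of a parsed message.
type Field struct{ Name, GoType string }

type Message struct {
	Name   string
	goType reflect.Type
}

// Parser caches parsed messages and, separately, their fields.
type Parser struct {
	mu       sync.Mutex
	messages map[reflect.Type]*Message
	fields   map[reflect.Type][]Field
}

var (
	parserOnce    sync.Once
	defaultParser *Parser
)

// DefaultParser returns the single shared parser instance.
func DefaultParser() *Parser {
	parserOnce.Do(func() {
		defaultParser = &Parser{
			messages: make(map[reflect.Type]*Message),
			fields:   make(map[reflect.Type][]Field),
		}
	})
	return defaultParser
}

// ParseMessage parses a struct type once and serves later calls from the cache.
// It does not look at the fields.
func (p *Parser) ParseMessage(t reflect.Type) (*Message, error) {
	if t == nil || t.Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a struct type, got %v", t)
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	if m, ok := p.messages[t]; ok {
		return m, nil
	}
	m := &Message{Name: t.Name(), goType: t}
	p.messages[t] = m
	return m, nil
}

// ParseValue is a convenience wrapper around ParseMessage.
func (p *Parser) ParseValue(v any) (*Message, error) {
	return p.ParseMessage(reflect.TypeOf(v))
}

// ParseFields parses the fields of msg on demand and caches the result.
func (p *Parser) ParseFields(msg *Message) []Field {
	p.mu.Lock()
	defer p.mu.Unlock()
	if fs, ok := p.fields[msg.goType]; ok {
		return fs
	}
	fs := make([]Field, 0, msg.goType.NumField())
	for i := 0; i < msg.goType.NumField(); i++ {
		f := msg.goType.Field(i)
		fs = append(fs, Field{Name: f.Name, GoType: f.Type.String()})
	}
	p.fields[msg.goType] = fs
	return fs
}

// SigVerData stands in for a protoc-generated struct.
type SigVerData struct{ Data [][]byte }

func main() {
	p := DefaultParser()
	msg, err := p.ParseValue(SigVerData{})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(msg.Name, len(p.ParseFields(msg)))
}
```

Because every generator obtains the same parser instance, a message parsed by one generator is returned from the cache when another generator asks for it.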

The remaining two parts of the model are located in codegen/model/events and codegen/model/messages, which describe the event and the network message hierarchies respectively.

Dependencies

github.com/filecoin-project/mir/codegen -- the parser uses the functions to inspect annotations.

Generating the code

Code located in codegen/generators.

The code generation is done with the help of jennifer with a collection of utility functions located in folder codegen/util and file codegen/render.go.

Dependencies

github.com/filecoin-project/mir/codegen/model

Standard code generators

Types generator

Located in codegen/generators/types-gen.

Generates Mir types for the annotated protobuf messages and functions to easily convert them to/from their protoc-generated counterparts.

Example:

See protos/bcbpb/bcbpb.proto and pkg/pb/bcbpb/types/types.mir.go.

Events generator

Located in codegen/generators/events-gen.

Generates constructor functions for Mir events.

Example:

See protos/bcbpb/bcbpb.proto and pkg/pb/bcbpb/events/events.mir.go.

Net generator

Located in codegen/generators/net-gen.

Generates constructor functions for Mir network messages.

Example:

See protos/bcbpb/bcbpb.proto and pkg/pb/bcbpb/msgs/msgs.mir.go.

Dsl generator

Located in codegen/generators/dsl-gen.

Generates the functions for emitting and handling Mir events and for handling Mir network messages.

Example:

See protos/bcbpb/bcbpb.proto and the files in pkg/pb/bcbpb/dsl.

Mir-std generator

Combines all the aforementioned generators into one. Not only is it more convenient to run a single code generator, but it is also faster as the generators share the parser cache, meaning that the same input data will never be parsed twice.
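
One plausible way to combine generators is a wrapper type whose Run method invokes each sub-generator in turn on the same input. The sketch below illustrates the idea with two stand-in generators; CombinedGen, TypesGen, EventsGen, and the ran variable are hypothetical and do not reproduce the actual code in codegen/generators/mir-std-gen.

```go
package main

import (
	"fmt"
	"reflect"
)

// Generator is the interface from codegen/generator.go.
type Generator interface {
	Run(structTypes []reflect.Type) error
}

// ran records which stand-in generators were executed (demonstration only).
var ran []string

// TypesGen and EventsGen stand in for two of the standard generators.
type TypesGen struct{}

func (TypesGen) Run([]reflect.Type) error {
	ran = append(ran, "types")
	return nil
}

type EventsGen struct{}

func (EventsGen) Run([]reflect.Type) error {
	ran = append(ran, "events")
	return nil
}

// CombinedGen runs all sub-generators on the same input and stops at the
// first error. Its zero value is usable, as RunGenerator requires.
type CombinedGen struct{}

func (CombinedGen) Run(structTypes []reflect.Type) error {
	for _, g := range []Generator{TypesGen{}, EventsGen{}} {
		if err := g.Run(structTypes); err != nil {
			return fmt.Errorf("%T failed: %w", g, err)
		}
	}
	return nil
}

func main() {
	if err := (CombinedGen{}).Run(nil); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ran)
}
```

A project-specific binary could follow the same pattern and list third-party generators next to the standard ones.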

Creating a third-party code generator

Some low-level modules, such as the net module, may benefit from their own code generators, and it should be possible for third-party modules to benefit from them as well, for example when a third-party module provides a more elaborate communication primitive.

To this end, the code is organised as a collection of small single-task generators rather than a single monolithic generator, so that it is easier to create a new generator by analogy with the existing ones and to compose mir-std-gen with, potentially, multiple third-party generators into a single binary specific for a particular project.

Documentation

Constants

const WarningCodegen = "// Code generated by Mir codegen. DO NOT EDIT.\n\n"

Variables

This section is empty.

Functions

func IsMirEvent

func IsMirEvent(protoDesc protoreflect.MessageDescriptor) bool

func IsMirEventClass

func IsMirEventClass(protoDesc protoreflect.MessageDescriptor) bool

func IsMirEventRoot

func IsMirEventRoot(protoDesc protoreflect.MessageDescriptor) bool

func IsMirStruct

func IsMirStruct(protoDesc protoreflect.MessageDescriptor) bool

func IsNetMessage

func IsNetMessage(protoDesc protoreflect.MessageDescriptor) bool

func IsNetMessageClass

func IsNetMessageClass(protoDesc protoreflect.MessageDescriptor) bool

func IsNetMessageRoot

func IsNetMessageRoot(protoDesc protoreflect.MessageDescriptor) bool

func RenderJenFile

func RenderJenFile(jenFile *jen.File, outputDir, outputFileName string) (err error)

func RenderJenFiles

func RenderJenFiles(
	jenFileBySourcePackagePath map[string]*jen.File,
	outputDirBySourceDir func(string) string,
	outputFileName string,
) error

func RunGenerator

func RunGenerator[GeneratorType Generator](inputPkgPath string) error

RunGenerator runs a generator on the exported struct types of the package corresponding to inputPkgPath. The zero value of the provided GeneratorType is used. GeneratorType cannot be an interface, it must be exported (i.e., the name of the type must start with a capital letter), and it cannot be in a "main" or "internal" package.

func ShouldGenerateMirType

func ShouldGenerateMirType(protoDesc protoreflect.MessageDescriptor) bool

ShouldGenerateMirType returns true if the message is marked by one of the standard Mir annotations. Namely, one of the following:

option (mir.struct) = true;
option (mir.event_root) = true;
option (mir.event_class) = true;
option (mir.event) = true;
option (net.message_root) = true;
option (net.message_class) = true;
option (net.message) = true;

Among these, (mir.struct) is the only one that has no special meaning.

Types

type Generator

type Generator interface {
	Run(structTypes []reflect.Type) error
}

Generator receives a list of all struct types exported by a package as input and produces the generated code. It is assumed to use reflection to inspect the input types. All the types are from the same package and, thus, from the same folder. To obtain the source folder, buildutil.GetSourceDirForPackage can be used.
