canoto

v0.10.0
Published: Jan 10, 2025 License: BSD-3-Clause Imports: 8 Imported by: 2

README

Canoto

Canoto is a serialization format designed to be:

  1. Fast
  2. Compact
  3. Canonical
  4. Backwards compatible
  5. Read compatible with Protocol Buffers.

Install

go install github.com/StephenButtolph/canoto/canoto@latest

Define Messages

Canoto messages are defined as ordinary Go structs:

type ExampleStruct0 struct {
	Int32              int32           `canoto:"int,1"`
	Int64              int64           `canoto:"int,2"`
	Uint32             uint32          `canoto:"int,3"`
	Uint64             uint64          `canoto:"int,4"`
	Sint32             int32           `canoto:"sint,5"`
	Sint64             int64           `canoto:"sint,6"`
	Fixed32            uint32          `canoto:"fint32,7"`
	Fixed64            uint64          `canoto:"fint64,8"`
	Sfixed32           int32           `canoto:"fint32,9"`
	Sfixed64           int64           `canoto:"fint64,10"`
	Bool               bool            `canoto:"bool,11"`
	String             string          `canoto:"string,12"`
	Bytes              []byte          `canoto:"bytes,13"`
	OtherStruct        ExampleStruct1  `canoto:"value,14"`
	OtherStructPointer *ExampleStruct1 `canoto:"pointer,15"`
	OtherStructField   *ExampleStruct1 `canoto:"field,16"`

	canotoData canotoData_ExampleStruct0
}

type ExampleStruct1 struct {
	Int32 int32 `canoto:"int,536870911"`
}

All structs must include a field called canotoData that will cache the results of calculating the size of the struct.

The type canotoData_${structName} is automatically generated by Canoto.

For a given Struct, Canoto automatically implements the Message and FieldMaker[*Struct] interfaces:

// Message defines a type that can be a stand-alone Canoto message.
type Message interface {
	Field
	// MarshalCanoto returns the Canoto representation of this message.
	//
	// It is assumed that this message is ValidCanoto.
	MarshalCanoto() []byte
	// UnmarshalCanoto unmarshals a Canoto-encoded byte slice into the message.
	UnmarshalCanoto(bytes []byte) error
}

// Field defines a type that can be included inside of a Canoto message.
type Field interface {
	// MarshalCanotoInto writes the field into a canoto.Writer and returns the
	// resulting canoto.Writer.
	//
	// It is assumed that CalculateCanotoCache has been called since the last
	// modification to this field.
	//
	// It is assumed that this field is ValidCanoto.
	MarshalCanotoInto(w Writer) Writer
	// CalculateCanotoCache populates internal caches based on the current
	// values in the struct.
	CalculateCanotoCache()
	// CachedCanotoSize returns the previously calculated size of the Canoto
	// representation from CalculateCanotoCache.
	//
	// If CalculateCanotoCache has not yet been called, or the field has been
	// modified since the last call to CalculateCanotoCache, the returned size
	// may be incorrect.
	CachedCanotoSize() int
	// UnmarshalCanotoFrom populates the field from a canoto.Reader.
	UnmarshalCanotoFrom(r Reader) error
	// ValidCanoto validates that the field can be correctly marshaled into the
	// Canoto format.
	ValidCanoto() bool
}

// FieldMaker is a Field that can create a new value of type T.
//
// The returned value must be able to be unmarshaled into.
//
// This type can be used when implementing a generic Field. However, if T is an
// interface, it is possible for generated code to compile and panic at runtime.
type FieldMaker[T any] interface {
	Field
	MakeCanoto() T
}

Generate

In order to generate canoto information for all of the structs in a file, simply run the canoto command with one or more files.

canoto example0.go example1.go

The above example will generate example0.canoto.go and example1.canoto.go.

The corresponding proto file for a canoto file can also be generated by adding the --proto flag.

canoto --proto example.go

The above example will generate example.canoto.go and example.proto.

go:generate

To automatically generate the .canoto.go version of a file, it is recommended to use go:generate.

Placing

//go:generate canoto $GOFILE

at the top of a file will update the .canoto.go version of the file every time go generate ./... is run.

Best Practices

canoto only inspects a single golang file at a time, so it is recommended to define nested messages in the same file to be able to generate the most useful proto file.

Additionally, while type aliases and generic types are fully supported in the canoto output, they result in proto files with default types. The generated proto file is still guaranteed to be able to parse canoto data, but its types may be less specific than they could be.

If type aliases are needed, it may make sense to modify the generated proto file to specify the most specific proto type possible.

Generics

There are two ways to utilize generics with canoto.

Value and Pointer Types

To guarantee safe usage of a struct, type constraints can be used to implement a struct with a generic field T. Canoto inspects the generic types, so the struct must include a type parameter of canoto.FieldPointer[T], for example:

type GenericField[T any, _ canoto.FieldPointer[T]] struct {
	Value   T  `canoto:"value,1"`
	Pointer *T `canoto:"pointer,2"`

	canotoData canotoData_GenericField
}

If canoto.FieldPointer is aliased to a different type or is otherwise re-implemented, Canoto will not be able to correctly tie the type constraints together.

Field Types

Because using multiple types to constrain a single type is clunky, there is support for canoto.FieldMakers. canoto.FieldMakers can be used to allocate new messages during parsing. In order for canoto.FieldMakers to work safely, the implementing type must have a useful zero value.

Warning: MakeCanoto, CalculateCanotoCache, CachedCanotoSize, ValidCanoto and MarshalCanotoInto must be able to be called with the zero value of the type implementing canoto.FieldMaker to avoid runtime panics. It is never safe to pass an interface as the canoto.FieldMaker type.

An example of correctly using canoto.FieldMaker:

type GenericField[T canoto.FieldMaker[T]] struct {
	Value T `canoto:"field,1"`

	canotoData canotoData_GenericField
}

var _ canoto.Message = (*GenericField[*ExampleStruct0])(nil)

An example of incorrectly using canoto.FieldMaker:

type GenericField[T canoto.FieldMaker[T]] struct {
	Value T `canoto:"field,1"`

	canotoData canotoData_GenericField
}

type BadUsage interface {
	canoto.Field
	MakeCanoto() BadUsage
}

var _ canoto.Message = (*GenericField[BadUsage])(nil)

Because BadUsage is an interface, it does not have a useful zero value and will panic when GenericField attempts to call its methods.
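The difference between the two usages can be reproduced without Canoto at all. The sketch below is a minimal illustration, not generated code: maker, msg, badUsage and makeFromZero are hypothetical names, and maker is a stand-in for canoto.FieldMaker reduced to a single method. It shows that a method can be called on the zero value of a pointer type, while the zero value of an interface type panics.

```go
package main

import "fmt"

// maker is a hypothetical stand-in for canoto.FieldMaker, reduced to the one
// method needed for this illustration.
type maker[T any] interface {
	Make() T
}

// msg is a hypothetical concrete message type.
type msg struct{}

// Make never dereferences its receiver, so it is safe to call on a nil *msg.
func (*msg) Make() *msg { return new(msg) }

// badUsage mirrors BadUsage above: an interface used as the type argument.
type badUsage interface {
	Make() badUsage
}

// makeFromZero calls Make on the zero value of T and reports whether the call
// panicked.
func makeFromZero[T maker[T]]() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	var zero T
	_ = zero.Make()
	return false
}

func main() {
	fmt.Println(makeFromZero[*msg]())     // false: nil pointer, usable zero value
	fmt.Println(makeFromZero[badUsage]()) // true: nil interface, no method to call
}
```

Both instantiations compile, which is why the mistake is only caught at runtime.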

Standalone Implementations

In some instances, it may be desirable for the generated code to avoid introducing a dependency on this repo into the go.mod file. For example, an application may need to support multiple versions of canoto at the same time.

There are two CLI flags that enable using canoto without impacting the go.mod.

  1. --library when specified generates the canoto library in the provided folder. For example --library="./internal" generates the canoto library in the ./internal/canoto package.
  2. --import specifies the canoto library to depend on in any generated code.

For example:

canoto --library="./internal" --import="github.com/StephenButtolph/canoto/internal/canoto" ./canoto.go

This generates the canoto library in ./internal/canoto and imports "github.com/StephenButtolph/canoto/internal/canoto" rather than the default "github.com/StephenButtolph/canoto" when generating ./canoto.canoto.go.

Supported Types

go type          canoto type                 proto type         wire type
int8             int                         int32              varint
int16            int                         int32              varint
int32            int                         int32              varint
int64            int                         int64              varint
uint8            int                         uint32             varint
uint16           int                         uint32             varint
uint32           int                         uint32             varint
uint64           int                         uint64             varint
int8             sint                        sint32             varint
int16            sint                        sint32             varint
int32            sint                        sint32             varint
int64            sint                        sint64             varint
uint32           fint32                      fixed32            i32
uint64           fint64                      fixed64            i64
int32            fint32                      sfixed32           i32
int64            fint64                      sfixed64           i64
bool             bool                        bool               varint
string           string                      string             len
[]byte           bytes                       bytes              len
[x]byte          fixed bytes                 bytes              len
T Message        value                       message            len
*T Message       pointer                     message            len
T FieldMaker     field                       message            len
[]int8           repeated int                repeated int32     len
[]int16          repeated int                repeated int32     len
[]int32          repeated int                repeated int32     len
[]int64          repeated int                repeated int64     len
[]uint8          repeated int                repeated uint32    len
[]uint16         repeated int                repeated uint32    len
[]uint32         repeated int                repeated uint32    len
[]uint64         repeated int                repeated uint64    len
[]int8           repeated sint               repeated sint32    len
[]int16          repeated sint               repeated sint32    len
[]int32          repeated sint               repeated sint32    len
[]int64          repeated sint               repeated sint64    len
[]uint32         repeated fint32             repeated fixed32   len
[]uint64         repeated fint64             repeated fixed64   len
[]int32          repeated fint32             repeated sfixed32  len
[]int64          repeated fint64             repeated sfixed64  len
[]bool           repeated bool               repeated bool      len
[]string         repeated string             repeated string    len
[][]byte         repeated bytes              repeated bytes     len
[][x]byte        repeated fixed bytes        repeated bytes     len
[]T Message      repeated value              repeated message   len
[]*T Message     repeated pointer            repeated message   len
[]T FieldMaker   repeated field              repeated message   len
[x]int8          fixed repeated int          repeated int32     len
[x]int16         fixed repeated int          repeated int32     len
[x]int32         fixed repeated int          repeated int32     len
[x]int64         fixed repeated int          repeated int64     len
[x]uint8         fixed repeated int          repeated uint32    len
[x]uint16        fixed repeated int          repeated uint32    len
[x]uint32        fixed repeated int          repeated uint32    len
[x]uint64        fixed repeated int          repeated uint64    len
[x]int8          fixed repeated sint         repeated sint32    len
[x]int16         fixed repeated sint         repeated sint32    len
[x]int32         fixed repeated sint         repeated sint32    len
[x]int64         fixed repeated sint         repeated sint64    len
[x]uint32        fixed repeated fint32       repeated fixed32   len
[x]uint64        fixed repeated fint64       repeated fixed64   len
[x]int32         fixed repeated fint32       repeated sfixed32  len
[x]int64         fixed repeated fint64       repeated sfixed64  len
[x]bool          fixed repeated bool         repeated bool      len
[x]string        fixed repeated string       repeated string    len
[x][]byte        fixed repeated bytes        repeated bytes     len
[x][y]byte       fixed repeated fixed bytes  repeated bytes     len
[x]T Message     fixed repeated value        repeated message   len
[x]*T Message    fixed repeated pointer      repeated message   len
[x]T FieldMaker  fixed repeated field        repeated message   len

Non-standard encoding

It is valid to define a Field that implements a non-standard format. However, this format should still be canonical and the corresponding Proto file should report opaque bytes.

Why not Proto?

Proto is a fast, compact encoding format with extensive language support. However, Proto is not canonical.

Proto is designed to be forwards-compatible. Almost by definition, a forwards-compatible serialization format cannot be canonical. The format of a field cannot be validated to be canonical if the expected type of the field is not known during decoding.

Why is being canonical important?

Non-canonical serialization formats can be subtle to work with, for example when the hash of the serialized data is important or when the serialized data is cryptographically signed.

With a non-canonical format, ensuring that the hash of the serialized data does not change requires carefully avoiding re-serializing a message that was previously serialized.

For canonical serialization formats, the hash of the serialized data is guaranteed never to change. Every correct implementation of the format will produce the same hash.
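To make the problem concrete, the sketch below uses only the Go standard library. It takes two byte strings that a lenient varint decoder (here encoding/binary's Uvarint) reads as the same integer, and shows that their hashes differ. The helper names sameValue and sameHash are hypothetical. The ErrPaddedZeroes error listed in this package suggests that Canoto rejects the second form, but that link is an inference rather than something stated here.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// sameValue reports whether x and y decode to the same unsigned varint.
func sameValue(x, y []byte) bool {
	a, n := binary.Uvarint(x)
	b, m := binary.Uvarint(y)
	return n > 0 && m > 0 && a == b
}

// sameHash reports whether x and y have the same SHA-256 digest.
func sameHash(x, y []byte) bool {
	return sha256.Sum256(x) == sha256.Sum256(y)
}

func main() {
	minimal := []byte{0x01}      // shortest varint encoding of 1
	padded := []byte{0x81, 0x00} // also decodes to 1, with a redundant zero byte

	fmt.Println(sameValue(minimal, padded)) // true
	fmt.Println(sameHash(minimal, padded))  // false
}
```

A decoder that accepts both forms and re-encodes the minimal one would silently change the hash of a message it merely passed through.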

Why be read compatible with Proto?

By being read compatible with Proto, users of the Canoto format inherit some of Proto's cross-language support.

If an application only needs to read Canoto messages, but not write them, it can simply treat the Canoto message as a Proto message.

Is Canoto Fast?

Canoto is typically more performant for both serialization and deserialization than Proto. However, Proto does not typically validate that fields are canonical. If a field is expensive to inspect, it's possible Canoto can be slightly slower.

Canoto is optimized to perform no unnecessary memory allocations, so careful management to ensure messages are stack allocated can significantly improve performance over Proto.

Is Canoto Forwards Compatible?

No. Canoto chooses to be a canonical serialization format rather than being forwards compatible.

Documentation

Overview

Canoto provides common functionality required for reading and writing the canoto format.

Index

Constants

const (
	Varint WireType = iota
	I64
	Len

	I32

	// SizeFint32 is the size of a 32-bit fixed size integer in bytes.
	SizeFint32 = 4
	// SizeFint64 is the size of a 64-bit fixed size integer in bytes.
	SizeFint64 = 8
	// SizeBool is the size of a boolean in bytes.
	SizeBool = 1

	// MaxFieldNumber is the maximum field number allowed to be used in a Tag.
	MaxFieldNumber = 1<<29 - 1

	// Version is the current version of the canoto library.
	Version = "v0.10.0"
)

Variables

var (
	// Code is the actual golang code for this library; including this comment.
	//
	// This variable is not used internally, so the compiler is smart enough to
	// omit this value from the binary if the user of this library does not
	// utilize this variable; at least at the time of writing.
	//
	// This can be used during codegen to generate this library.
	//
	//go:embed canoto.go
	Code string

	ErrInvalidFieldOrder  = errors.New("invalid field order")
	ErrUnexpectedWireType = errors.New("unexpected wire type")
	ErrDuplicateOneOf     = errors.New("duplicate oneof field")
	ErrInvalidLength      = errors.New("decoded length is invalid")
	ErrZeroValue          = errors.New("zero value")
	ErrUnknownField       = errors.New("unknown field")
	ErrPaddedZeroes       = errors.New("padded zeroes")

	ErrOverflow        = errors.New("overflow")
	ErrInvalidWireType = errors.New("invalid wire type")
	ErrInvalidBool     = errors.New("decoded bool is neither true nor false")
	ErrStringNotUTF8   = errors.New("decoded string is not UTF-8")
)

Functions

func Append

func Append[T Bytes](w *Writer, v T)

Append writes unprefixed bytes to the writer.

func AppendBool

func AppendBool[T ~bool](w *Writer, b T)

AppendBool writes a boolean to the writer.

func AppendBytes

func AppendBytes[T Bytes](w *Writer, v T)

AppendBytes writes a length-prefixed byte slice to the writer.

func AppendFint32

func AppendFint32[T Int32](w *Writer, v T)

AppendFint32 writes a 32-bit fixed size integer to the writer.

func AppendFint64

func AppendFint64[T Int64](w *Writer, v T)

AppendFint64 writes a 64-bit fixed size integer to the writer.
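As a rough sketch of the fixed-size layout, the example below writes 32-bit and 64-bit integers in little-endian byte order. This is the order the Proto wire format uses for fixed32 and fixed64. That Canoto uses the same order is an assumption based on its stated read compatibility with Proto. appendFixed32 and appendFixed64 are hypothetical helpers, not functions of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendFixed32 is a hypothetical helper: it appends v as 4 little-endian bytes.
func appendFixed32(dst []byte, v uint32) []byte {
	return binary.LittleEndian.AppendUint32(dst, v)
}

// appendFixed64 is a hypothetical helper: it appends v as 8 little-endian bytes.
func appendFixed64(dst []byte, v uint64) []byte {
	return binary.LittleEndian.AppendUint64(dst, v)
}

func main() {
	fmt.Printf("% x\n", appendFixed32(nil, 1))      // 01 00 00 00
	fmt.Printf("% x\n", appendFixed64(nil, 0x0102)) // 02 01 00 00 00 00 00 00
}
```

The output lengths match the SizeFint32 and SizeFint64 constants regardless of the value written.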

func AppendInt

func AppendInt[T Int](w *Writer, v T)

AppendInt writes an integer to the writer as a varint.
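As an illustration of the varint format, the sketch below encodes an unsigned integer seven bits at a time, lowest group first, setting the high bit on every byte except the last. appendVarint is a hypothetical helper rather than this package's implementation, and it assumes the standard Proto varint layout.

```go
package main

import "fmt"

// appendVarint is a hypothetical helper, not part of canoto. It appends v to
// dst as a Proto-style varint: 7 bits per byte, low groups first, with the
// high bit set on every byte except the last.
func appendVarint(dst []byte, v uint64) []byte {
	for v >= 0x80 {
		dst = append(dst, byte(v)|0x80)
		v >>= 7
	}
	return append(dst, byte(v))
}

func main() {
	fmt.Printf("% x\n", appendVarint(nil, 1))   // 01
	fmt.Printf("% x\n", appendVarint(nil, 300)) // ac 02
}
```

This loop always emits the shortest encoding, which is the property a canonical format needs from its integers.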

func AppendSint

func AppendSint[T Sint](w *Writer, v T)

AppendSint writes an integer to the writer as a zigzag encoded varint.
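Zigzag encoding maps signed integers to unsigned ones so that values of small magnitude, negative or positive, become small varints. The sketch below shows the usual 64-bit mapping as defined by Proto. zigzag and unzigzag are hypothetical helpers, not functions exported by this package.

```go
package main

import "fmt"

// zigzag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
func zigzag(v int64) uint64 {
	return uint64(v<<1) ^ uint64(v>>63)
}

// unzigzag is the inverse of zigzag.
func unzigzag(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

func main() {
	for _, v := range []int64{0, -1, 1, -2, 2} {
		u := zigzag(v)
		fmt.Println(v, u, unzigzag(u))
	}
}
```

Without this mapping, a negative value written as a plain varint is sign-extended and takes the maximum number of bytes.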

func CountBytes

func CountBytes(bytes []byte, tag string) (int, error)

CountBytes counts the consecutive number of length-prefixed fields with the given tag.

func CountInts

func CountInts(bytes []byte) int

CountInts counts the number of varints that are encoded in bytes.
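One way such a count can be computed, assuming the input holds only well-formed varints, is to count the bytes whose high bit is clear, because each varint ends with exactly one such byte. The sketch below uses a hypothetical countVarints helper. It is not necessarily how CountInts is implemented.

```go
package main

import "fmt"

// countVarints is a hypothetical helper: it counts terminating varint bytes,
// which are the bytes with the continuation bit (0x80) unset.
func countVarints(b []byte) int {
	n := 0
	for _, x := range b {
		if x < 0x80 {
			n++
		}
	}
	return n
}

func main() {
	// Three varints: 1, 300 (ac 02), and 0.
	fmt.Println(countVarints([]byte{0x01, 0xac, 0x02, 0x00})) // 3
}
```

Knowing the count up front lets a decoder allocate a slice of the right length before reading any values.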

func HasNext

func HasNext(r *Reader) bool

HasNext returns true if there are more bytes to read.

func HasPrefix

func HasPrefix(bytes []byte, prefix string) bool

HasPrefix returns true if the bytes start with the given prefix.

func IsZero

func IsZero[T comparable](v T) bool

IsZero returns true if the value is the zero value for its type.

func MakePointer added in v0.5.0

func MakePointer[T any](_ *T) *T

MakePointer creates a new pointer. It is equivalent to `new(T)`.

This function is useful to use in auto-generated code, when the type of a variable is unknown. For example, if we have a variable `v` which we know to be a pointer, but we do not know the type of the pointer, we can use this function to leverage golang's type inference to create the new pointer.

func MakeSlice

func MakeSlice[T any](_ []T, length int) []T

MakeSlice creates a new slice with the given length. It is equivalent to `make([]T, length)`.

This function is useful to use in auto-generated code, when the type of a variable is unknown. For example, if we have a variable `v` which we know to be a slice, but we do not know the type of the elements, we can use this function to leverage golang's type inference to create the new slice.
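The type-inference trick described for MakePointer and MakeSlice can be shown with local re-implementations that follow the documented equivalences to new(T) and make([]T, length). In the sketch below, makePointer, makeSlice and the example struct are hypothetical names. The point is that the caller never spells out the element type.

```go
package main

import "fmt"

// makePointer follows the documented equivalence with new(T); the argument is
// used only so the compiler can infer T.
func makePointer[T any](_ *T) *T { return new(T) }

// makeSlice follows the documented equivalence with make([]T, length).
func makeSlice[T any](_ []T, length int) []T { return make([]T, length) }

// example is a hypothetical struct whose field types a code generator would
// rather not have to name.
type example struct {
	ptr   *int32
	items []string
}

func main() {
	var e example
	e.ptr = makePointer(e.ptr)      // T inferred as int32
	e.items = makeSlice(e.items, 3) // T inferred as string
	fmt.Println(*e.ptr, len(e.items)) // 0 3
}
```

Generated code can emit the same call for every field, whatever the field's declared type is.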

func ReadBool

func ReadBool[T ~bool](r *Reader, v *T) error

ReadBool reads a boolean from the reader.

func ReadBytes

func ReadBytes[T ~[]byte](r *Reader, v *T) error

ReadBytes reads a byte slice from the reader.

func ReadFint32

func ReadFint32[T Int32](r *Reader, v *T) error

ReadFint32 reads a 32-bit fixed size integer from the reader.

func ReadFint64

func ReadFint64[T Int64](r *Reader, v *T) error

ReadFint64 reads a 64-bit fixed size integer from the reader.

func ReadInt

func ReadInt[T Int](r *Reader, v *T) error

ReadInt reads a varint encoded integer from the reader.

func ReadSint

func ReadSint[T Sint](r *Reader, v *T) error

ReadSint reads a zigzag encoded integer from the reader.

func ReadString

func ReadString[T ~string](r *Reader, v *T) error

ReadString reads a string from the reader. The string is verified to be valid UTF-8.
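The UTF-8 check can be sketched with the standard library's unicode/utf8 package. decodeString and errNotUTF8 below are hypothetical names that mirror the documented behaviour and the ErrStringNotUTF8 error. The real function also reads a length prefix from the Reader, which this sketch leaves out.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errNotUTF8 is a hypothetical stand-in for ErrStringNotUTF8.
var errNotUTF8 = errors.New("decoded string is not UTF-8")

// decodeString is a hypothetical helper: it converts b to a string only if b
// is valid UTF-8.
func decodeString(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", errNotUTF8
	}
	return string(b), nil
}

func main() {
	fmt.Println(decodeString([]byte("ok")))
	fmt.Println(decodeString([]byte{0xff}))
}
```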

func SizeBytes

func SizeBytes[T Bytes](v T) int

SizeBytes calculates the size the length-prefixed bytes would take if written.

func SizeInt

func SizeInt[T Int](v T) int

SizeInt calculates the size of an integer when encoded as a varint.
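The size of a varint can be derived from the position of the highest set bit, since each output byte carries seven bits of payload. The sketch below uses a hypothetical sizeVarint helper built on math/bits. It is one common way to compute this and is not necessarily what SizeInt does internally.

```go
package main

import (
	"fmt"
	"math/bits"
)

// sizeVarint is a hypothetical helper: it returns the number of bytes needed
// to encode v as a varint. v|1 makes zero occupy one byte.
func sizeVarint(v uint64) int {
	return (bits.Len64(v|1) + 6) / 7
}

func main() {
	fmt.Println(sizeVarint(0))         // 1
	fmt.Println(sizeVarint(127))       // 1
	fmt.Println(sizeVarint(128))       // 2
	fmt.Println(sizeVarint(1<<64 - 1)) // 10
}
```

Computing sizes without encoding is what allows a buffer of exactly the right length to be allocated once before marshaling.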

func SizeSint

func SizeSint[T Sint](v T) int

SizeSint calculates the size of an integer when zigzag encoded as a varint.

func Tag

func Tag(fieldNumber uint32, wireType WireType) []byte

Tag calculates the tag for a field number and wire type.

This function should not typically be used during marshaling, as tags can be precomputed.
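In the Proto wire format, a tag is the varint encoding of (fieldNumber << 3) | wireType. The sketch below assumes Canoto follows that layout and that its wire types use Proto's numeric values (varint 0, i64 1, len 2, i32 5). Neither assumption is stated explicitly on this page. tag and appendVarint are hypothetical local helpers.

```go
package main

import "fmt"

// appendVarint is a hypothetical helper that appends v as a Proto-style varint.
func appendVarint(dst []byte, v uint64) []byte {
	for v >= 0x80 {
		dst = append(dst, byte(v)|0x80)
		v >>= 7
	}
	return append(dst, byte(v))
}

// tag is a hypothetical helper that assumes the Proto tag layout: the field
// number in the high bits and the wire type in the low three bits.
func tag(fieldNumber uint32, wireType byte) []byte {
	return appendVarint(nil, uint64(fieldNumber)<<3|uint64(wireType))
}

func main() {
	fmt.Printf("% x\n", tag(1, 0))       // 08
	fmt.Printf("% x\n", tag(12, 2))      // 62
	fmt.Printf("% x\n", tag(1<<29-1, 0)) // f8 ff ff ff 0f
}
```

Under this layout the largest field number, MaxFieldNumber, produces a five-byte tag, so low field numbers are cheaper on the wire.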

func Zero added in v0.7.0

func Zero[T any](_ T) (_ T)

Zero returns the zero value for its type.

Types

type Bytes

type Bytes interface{ ~string | ~[]byte }

type Field

type Field interface {
	// MarshalCanotoInto writes the field into a canoto.Writer and returns
	// the resulting canoto.Writer.
	//
	// It is assumed that CalculateCanotoCache has been called since the
	// last modification to this field.
	//
	// It is assumed that this field is ValidCanoto.
	MarshalCanotoInto(w Writer) Writer
	// CalculateCanotoCache populates internal caches based on the current
	// values in the struct.
	CalculateCanotoCache()
	// CachedCanotoSize returns the previously calculated size of the Canoto
	// representation from CalculateCanotoCache.
	//
	// If CalculateCanotoCache has not yet been called, or the field has
	// been modified since the last call to CalculateCanotoCache, the
	// returned size may be incorrect.
	CachedCanotoSize() int
	// UnmarshalCanotoFrom populates the field from a canoto.Reader.
	UnmarshalCanotoFrom(r Reader) error
	// ValidCanoto validates that the field can be correctly marshaled into
	// the Canoto format.
	ValidCanoto() bool
}

Field defines a type that can be included inside of a Canoto message.

type FieldMaker added in v0.7.0

type FieldMaker[T any] interface {
	Field
	MakeCanoto() T
}

FieldMaker is a Field that can create a new value of type T.

The returned value must be able to be unmarshaled into.

This type can be used when implementing a generic Field. However, if T is an interface, it is possible for generated code to compile and panic at runtime.

type FieldPointer added in v0.4.0

type FieldPointer[T any] interface {
	Field
	*T
}

FieldPointer is a pointer to a concrete Field value T.

This type must be used when implementing a value for a generic Field.

type Int

type Int interface{ Sint | Uint }

type Int32

type Int32 interface{ ~int32 | ~uint32 }

type Int64

type Int64 interface{ ~int64 | ~uint64 }

type Message

type Message interface {
	Field
	// MarshalCanoto returns the Canoto representation of this message.
	//
	// It is assumed that this message is ValidCanoto.
	MarshalCanoto() []byte
	// UnmarshalCanoto unmarshals a Canoto-encoded byte slice into the message.
	UnmarshalCanoto(bytes []byte) error
}

Message defines a type that can be a stand-alone Canoto message.

type Reader

type Reader struct {
	B      []byte
	Unsafe bool
}

Reader contains all the state needed to unmarshal a Canoto type.

The functions in this package are not methods on the Reader type to enable the usage of generics.

type Sint

type Sint interface {
	~int8 | ~int16 | ~int32 | ~int64
}

type Uint

type Uint interface {
	~uint8 | ~uint16 | ~uint32 | ~uint64
}

type WireType

type WireType byte

WireType represents the Proto wire description of a field. Within Proto it is used to provide forwards compatibility. For Canoto, it exists to provide compatibility with Proto.

func ReadTag

func ReadTag(r *Reader) (uint32, WireType, error)

ReadTag reads the next field number and wire type from the reader.

func (WireType) IsValid

func (w WireType) IsValid() bool

func (WireType) String

func (w WireType) String() string

type Writer

type Writer struct {
	B []byte
}

Writer contains all the state needed to marshal a Canoto type.

The functions in this package are not methods on the Writer type to enable the usage of generics.

Directories

Path Synopsis
Canoto is a command to generate code for reading and writing the canoto format.
cli module
Generate exposes functionality to generate code for reading and writing the canoto format.
big
canoto
Canoto provides common functionality required for reading and writing the canoto format.
pb