msgpackpb

package
v0.0.0-...-aa8b1ce
Published: Dec 2, 2024 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Overview

Package msgpackpb implements generic protobuf message serialization to msgpack.

This library exists primarily to allow safe interchange between Lua scripts running inside of a Redis instance and external programs.

It is intended to be fast and compact for lua to decode via cmsgpack (specifically, the version of cmsgpack which ships with Redis 5.1+).

To avoid implementing a brand new versioning or encoding scheme while not totally sacrificing performance and storage capabilities (e.g. by using JSONPB), we lean on the versioning and compatibility features of protobufs, and so choose a scheme which can be derived entirely from the proto schema definition.

The scheme works by encoding a message as a map of `field tag` to `value`.

The value can be a message, a scalar (bool, int, uint, float, string), a list of messages or scalars, or a map of (bool, int, uint, string) to messages or scalars.

message Foo {
  string field = 2;
  Foo recurse = 7;
}

Would encode the instance `field: "hello" recurse {field: "hi"}` as msgpack:

{
  2: "hello",
  7: {
    2: "hi"
  }
}

i.e. `82 02 a5 68 65 6c 6c 6f 07 81 02 a2 68 69`

This would be 14 bytes vs 13 bytes for binary proto or 46 bytes for JSONPB.

This encoding is simple enough that we can make a simple table based encoder/decoder in Lua, but robust enough that as long as everyone follows the backwards compatibility rules for proto, we should be OK when having Go and Lua interact with the same-encoded messages.

Unknown fields

Unknown fields are saved in decoded protobufs as an unknown field tagged with 536870911 (2^29 - 1, the maximum valid proto field number). The value of this field is essentially a filtered version of the Message; a map of `field tag` to `value`, but just for the unknown field tags.
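The filtering idea can be sketched as follows, assuming the decoded message is already available as a Go map of field tag to value. This is only an illustration of the concept, not the package's implementation; `splitUnknown`, `unknownFieldsTag` and the `known` set are hypothetical names.

```go
package main

import "fmt"

// unknownFieldsTag is 2^29 - 1, the largest field number protobuf allows.
const unknownFieldsTag = 1<<29 - 1

// splitUnknown partitions a decoded tag->value map into the entries whose
// tags appear in the schema (known) and the remaining entries, which would
// be kept aside as the "unknown fields" map.
func splitUnknown(decoded map[int32]any, known map[int32]bool) (knownVals, unknownVals map[int32]any) {
	knownVals = map[int32]any{}
	unknownVals = map[int32]any{}
	for tag, v := range decoded {
		if known[tag] {
			knownVals[tag] = v
		} else {
			unknownVals[tag] = v
		}
	}
	return knownVals, unknownVals
}

func main() {
	// Tag 9 is not part of the (hypothetical) schema, which declares 2 and 7.
	decoded := map[int32]any{2: "hello", 9: int64(42)}
	k, u := splitUnknown(decoded, map[int32]bool{2: true, 7: true})
	fmt.Println(len(k), len(u), unknownFieldsTag) // prints "1 1 536870911"
}
```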

Deterministic serialization

This library optionally provides deterministic serialization of messages, which will:

  • Sort all maps
  • Order all message fields by tag
  • Emit all numeric types using the most compact representation in msgpack.
  • Walk all unknown fields, ordering messages there by tag, and interleaving their field numbers with the known fields in sorted order.
  • If any of the above would yield a map with integer keys from 1..N, instead emit this as a list of just the values.

It is intended that this encoding be stable across binary versions, and should be suitable for hashing (see MarshalStream; you can use io.MultiWriter to marshal to e.g. a strings.Builder at the same time that you write to a hash).

Notes

NOTE: It could probably be a better idea to implement native protobuf encoding for Lua (and, in fact, there are Redis extensions for this), but we cannot currently use them with Cloud Memorystore, which is what manages our Redis instance. Writing a 'pure lua' protobuf codec seemed like it would be a time sink, but it could still be an option if this msgpack encoding proves difficult to work with.

NOTE: An alternative to this package would be to create a bespoke encoding for your data objects, and include, e.g. a version identifier, to allow for schema updates. This is actually the initial approach that we took before writing this library, but we deemed this to be unwieldy and were concerned about having to worry both about proto change semantics and mapping them to the bespoke versioning semantics.

NOTE: It would be possible to build a more efficient code generation marshalling implementation, but it presents a problem; Doing so would require generating serialization code for ALL proto messages which could be encoded, which includes references to externally declared proto messages (for example, google.protobuf.Duration). As such, a code generation approach would need to diverge pretty significantly from the 'usual' Go codegen model (i.e. where the generated code lives next to the protos that generate the code). Specifically, the generated code for external protos would need to be generated into a common library or something like that which lives separately from the protos.

The other alternative would be to partially generate the marshalling code for protos which we own, but then fall back to a reflection based approach for messages which don't have the codegen encoding scheme. However this means that we would need a fully working codegen scheme AND ALSO a fully working reflection based scheme.

For 'simplicity', we took the reflection-based approach as the initial and sole approach for now.

NOTE: Unlike binary protobuf, msgpackpb messages cannot be concatenated to produce a single encodeable message. However, concatenated messages can accurately be parsed serially and applied to the same Message using UnmarshalStream.

NOTE: When using this to interact with Lua, keep in mind that Lua (until 5.3) stores all numbers as double precision floating point (note that Redis looks like it's effectively stuck on Lua 5.1 indefinitely, as of late 2022). The upshot of this is that integer types will be exact integers until they hit 2^53 or so (assuming that your Redis is using Lua with 64bit numbers! It is possible to configure Lua to use 32bit numbers...). If your Lua program serializes a number past this threshold, Go will refuse to decode it into a field with an integer type, so at least this won't cause silent corruption.
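The precision limit can be reproduced in Go, since float64 is the same IEEE 754 double that Lua 5.1 uses for numbers by default: every integer is exactly representable only up to 2^53. The helper name `inContiguousRange` is hypothetical and only serves this illustration.

```go
package main

import "fmt"

// maxSafe is 2^53; beyond it a double can no longer represent every integer.
const maxSafe = 1 << 53

// inContiguousRange reports whether n lies in the range where every integer
// has an exact double precision representation.
func inContiguousRange(n int64) bool {
	return n >= -maxSafe && n <= maxSafe
}

func main() {
	a := int64(maxSafe)
	b := a + 1
	// 2^53+1 rounds to 2^53 when converted, so the two values collide.
	fmt.Println(float64(a) == float64(b))                   // prints "true"
	fmt.Println(inContiguousRange(a), inContiguousRange(b)) // prints "true false"
}
```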

NOTE: Lua only has a single type to represent maps and lists; the 'table'. Additionally, the lua cmsgpack library will encode a table as a list if it 'looks like' a list. A table 'looks like' a list if it contains N entries and all the entries are keyed with the numbers 1 through N. This affects the encoding of both messages (which are map of field tag to value) as well as proto fields which are maps.
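The 'looks like' rule described above can be sketched in Go as a check over a map's keys. This mirrors the rule as stated in this note, not the actual cmsgpack source; `looksLikeList` is a hypothetical name, and in this sketch an empty map trivially satisfies the rule.

```go
package main

import "fmt"

// looksLikeList reports whether m has N entries keyed exactly 1 through N.
// Because m has len(m) entries, finding every key in 1..len(m) implies there
// are no other keys.
func looksLikeList(m map[int64]any) bool {
	for i := int64(1); i <= int64(len(m)); i++ {
		if _, ok := m[i]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// A message using field tags 1 and 2 is indistinguishable from a list.
	fmt.Println(looksLikeList(map[int64]any{1: "a", 2: "b"})) // prints "true"
	// A message using field tags 2 and 7 stays a map.
	fmt.Println(looksLikeList(map[int64]any{2: "hello", 7: "x"})) // prints "false"
}
```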

NOTE: Should you need to switch to another encoding, note that because this encoding ALWAYS encodes a message, the msgpack 'type' of the first item in a stream will always be either a map or a list (due to lua table shenanigans). This means that you could insert a number into the stream as the first item, instead, and use this to disambiguate future versions of this encoding.
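A reader could implement this disambiguation by inspecting the first byte of the stream. The sketch below classifies that byte according to the msgpack format families (fixmap/map16/map32, fixarray/array16/array32, and the integer formats); `classifyFirstByte` is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// classifyFirstByte returns the msgpack type family of a leading byte:
// "map", "array", "int", or "other".
func classifyFirstByte(b byte) string {
	switch {
	case b <= 0x7f, b >= 0xe0, b >= 0xcc && b <= 0xd3:
		// positive fixint, negative fixint, uint8..uint64 and int8..int64
		return "int"
	case b >= 0x80 && b <= 0x8f, b == 0xde, b == 0xdf:
		// fixmap, map16, map32
		return "map"
	case b >= 0x90 && b <= 0x9f, b == 0xdc, b == 0xdd:
		// fixarray, array16, array32
		return "array"
	}
	return "other"
}

func main() {
	fmt.Println(classifyFirstByte(0x82)) // prints "map": current encoding
	fmt.Println(classifyFirstByte(0x92)) // prints "array": current encoding via a Lua table
	fmt.Println(classifyFirstByte(0x01)) // prints "int": could mark a future version
}
```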

Constants

This section is empty.

Variables

var File_go_chromium_org_luci_common_proto_msgpackpb_unknown_proto protoreflect.FileDescriptor

Functions

func Deterministic

func Deterministic(o *options)

Deterministic is an Option which affects Marshal.

(Providing this to Unmarshal is an error).

If set, the proto will be encoded with the following additional rules:

  • All fields will be output ordered by their field tag number.
  • Maps will sort keys (lexically or numerically)
  • Any unknown msgpack fields will (if PreserveUnknownFields was given when Unmarshalling) be interleaved with the known fields in order, sorting any messages or maps.

func DisallowUnknownFields

func DisallowUnknownFields(o *options)

DisallowUnknownFields is an Option which affects Marshal + Unmarshal.

Marshal: Return an error when encoding proto messages containing any unknown fields.

Unmarshal: Return an error when decoding messages containing unknown fields.

func IgnoreUnknownFields

func IgnoreUnknownFields(o *options)

IgnoreUnknownFields is an Option which affects Marshal + Unmarshal.

Marshal: Causes unknown msgpack fields on the proto message to be dropped (non-msgpack unknown fields will result in an error).

Unmarshal: Skips unknown fields, and does not store them on the decoded proto message.

func Marshal

func Marshal(msg proto.Message, opts ...Option) (msgpack.RawMessage, error)

Marshal encodes all the known fields in msg to a msgpack string.

By default, this will emit any unknown msgpack fields (generated by the Unmarshal method in this package) back to the serialized message. Pass IgnoreUnknownFields or DisallowUnknownFields to affect this behavior.

This can also produce a deterministic encoding if Deterministic is passed as an option. Otherwise this will do a faster non-deterministic encoding without trying to sort field tags or map keys.

Returns an error if `msg` contains unknown fields that were not produced by Unmarshal in this package (i.e. non-msgpack unknown fields).

func MarshalStream

func MarshalStream(writer io.Writer, msg proto.Message, opts ...Option) error

MarshalStream is like Marshal but outputs to an io.Writer instead of returning a string.

func Unmarshal

func Unmarshal(msg msgpack.RawMessage, to proto.Message, opts ...Option) (err error)

Unmarshal parses the encoded msgpack into the given proto message.

This does NOT reset the Message; if it is partially populated, this will effectively do a proto.Merge on top of it.

By default, this will output unknown fields in the Message, but this will only be usable by the corresponding Marshal function in this package. Pass IgnoreUnknownFields or DisallowUnknownFields to affect this behavior.

func UnmarshalStream

func UnmarshalStream(reader io.Reader, to proto.Message, opts ...Option) (err error)

UnmarshalStream is like Unmarshal but takes an io.Reader instead of accepting a string.

If the reader contains multiple msgpackpb messages, this function will stop exactly where the next message in the stream begins (i.e. you could call this in a loop until the reader is exhausted to merge the messages together).

Types

type Option

type Option func(*options)

Option allows modifying the behavior of Marshal and Unmarshal.

func WithStringInternTable

func WithStringInternTable(table []string) Option

WithStringInternTable lets you set an optional string interning table; any encoded strings which are contained in this table will be replaced with an integer denoting the index in this table where the string was found.

This includes repeated strings, map keys, and string value fields.
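The substitution can be sketched as a lookup from string to table index. This illustrates the idea only and makes no claim about how the package represents or decodes interned values; `buildIndex` and `internOrString` are hypothetical names.

```go
package main

import "fmt"

// buildIndex maps each string in the intern table to its position.
func buildIndex(table []string) map[string]int {
	idx := make(map[string]int, len(table))
	for i, s := range table {
		idx[s] = i
	}
	return idx
}

// internOrString returns the table index for s if s is in the table,
// otherwise s itself, i.e. the value that would be written to the output.
func internOrString(idx map[string]int, s string) any {
	if i, ok := idx[s]; ok {
		return i
	}
	return s
}

func main() {
	idx := buildIndex([]string{"pending", "running", "done"})
	fmt.Println(internOrString(idx, "running")) // prints "1"
	fmt.Println(internOrString(idx, "unknown")) // prints "unknown"
}
```

A short integer index typically encodes to a single msgpack byte, which is where the space saving for frequently repeated strings comes from.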

type UnknownFields

type UnknownFields struct {

	// This contains a valid msgpackpb of all the unknown fields to which this
	// UnknownFields message is attached (i.e. map of number -> value).
	//
	// 536870911 was selected as the tag number to avoid accidentally overlapping
	// with real fields.
	//
	// If you try to Marshal a message which actually populates a field with this
	// number, it will return an error.
	MsgpackpbData []byte `protobuf:"bytes,536870911,opt,name=msgpackpb_data,json=msgpackpbData,proto3" json:"msgpackpb_data,omitempty"` // max valid field number
	// contains filtered or unexported fields
}

UnknownFields is a formal definition of how this package embeds unknown fields in Unmarshalled messages.

func (*UnknownFields) Descriptor deprecated

func (*UnknownFields) Descriptor() ([]byte, []int)

Deprecated: Use UnknownFields.ProtoReflect.Descriptor instead.

func (*UnknownFields) GetMsgpackpbData

func (x *UnknownFields) GetMsgpackpbData() []byte

func (*UnknownFields) ProtoMessage

func (*UnknownFields) ProtoMessage()

func (*UnknownFields) ProtoReflect

func (x *UnknownFields) ProtoReflect() protoreflect.Message

func (*UnknownFields) Reset

func (x *UnknownFields) Reset()

func (*UnknownFields) String

func (x *UnknownFields) String() string

Directories

Path Synopsis
Package luagen implements a lua code generator for proto code.
examplepb
Package examplepb serves as an example protobuf library which is structured similarly to how other protobuf libraries in this repo are structured, with the addition of a generated .lua file output.