codec

v0.3.0
Published: Jan 18, 2021 License: BSD-3-Clause Imports: 13 Imported by: 0

README

Fast Encoding of Go Values

This project is an enhanced version of the package pkgsite/internal/godoc/codec.

The original motivation was fast decoding of parsed Go files, of type go/ast.File. The pkg.go.dev site saves these when processing a module, and decodes them on the serving path to render documentation. So decoding had to be fast, and had to handle the cycles that these structures contain. It also had to work with existing types that we did not control. We couldn't find any existing encoders with these properties, so we wrote our own.

For usage, see the package documentation.

Encoding Scheme

The encoding is a virtual machine in which every encoded value begins with a 1-byte code that describes what (if anything) follows. The encoding does not preserve type information (for instance, the value 1 could be an int or a bool), but it does carry enough information to skip values, since the decoder must be able to do that if it encounters a struct field it doesn't know.

Most of the values of a value's initial byte can be devoted to small unsigned integers. For example, the number 17 is represented by the single byte 17. Only a few byte values have special meaning, as described below.

The nil code indicates that the value is nil. (We don't absolutely need this: we could always represent the nil value for a type as something that couldn't be mistaken for an encoded value of that type. For instance, we could use 0 for nil in the case of slices (which always begin with the nValues code), and for pointers to numbers like *int, we could use something like "nBytes 0". But it is simpler to have a reserved value for nil.)

The nBytes code indicates that an unsigned integer N is encoded next, followed by N bytes of data. There are optimized codes for values of N from 0 to 4. These codes are used to represent strings and byte slices, as well as numbers too big to fit into the initial byte. For example, the string "hello" is represented as: nBytes 5 'h' 'e' 'l' 'l' 'o'.

Unsigned integers that can't fit into the initial byte are encoded as byte sequences of length 1, 2, 4 or 8, holding big-endian values.

The nValues code is for sequences of values whose size is known beforehand, like a Go slice or array. The slice []string{"hi", "bye"} is encoded as

nValues 2 bytes2 'h' 'i' bytes3 'b' 'y' 'e'

The ptr and refPtr codes indicate a pointer to the encoded value. The latter signals to the decoder that it should remember the pointer because it will be referred to later in the stream.

The ref code is used to refer to an earlier encoded pointer. It is followed by a uint denoting the relative offset to the position of the corresponding refPtr code.

The start and end codes delimit a value whose length is unknown beforehand. They are used for structs.

Documentation

Overview

Package codec implements an encoder for Go values. It relies on code generation rather than reflection, so it is significantly faster than reflection-based encoders like gob. It can also preserve sharing among pointers (but not other forms of sharing, like sub-slices).

Encodings with maps are not deterministic, due to the non-deterministic order of map iteration.

Generating Code

The package supports Go built-in types (int, string and so on) out of the box, but for any other type you must generate code by calling GenerateFile. This can be done with a small program in your project's directory:

	// file generate.go
	//+build ignore

	package main

	import (
		"log"

		"mypkg"
		"github.com/jba/codec"
	)

	func main() {
		err := codec.GenerateFile("types.gen.go", "mypkg", nil,
			[]mypkg.Type1{}, &mypkg.Type2{})
		if err != nil {
			log.Fatal(err)
		}
	}

Code will be generated for each type listed and for all types they contain. So this program will generate code for []mypkg.Type1, mypkg.Type1, *mypkg.Type2, and mypkg.Type2.

The "//+build ignore" tag prevents the program from being compiled as part of your package. Instead, invoke it directly with "go run". Use "go generate" to do so if you like:

//go:generate go run generate.go

On subsequent runs, the generator reads the generated file to get the names and order of all struct fields. It uses this information to generate correct code when fields are moved or added. Make sure the old generated files remain available to the generator, or changes to your structs may result in existing encoded data being decoded incorrectly.

Encoding and Decoding

Create an Encoder by passing an io.Writer to NewEncoder:

var buf bytes.Buffer
e := codec.NewEncoder(&buf, nil)

Then use it to encode one or more values:

if err := e.Encode(x); err != nil { ... }

To decode, pass an io.Reader to NewDecoder, and call Decode:

f, err := os.Open(filename)
...
d := codec.NewDecoder(f, nil)
var value interface{}
err = d.Decode(&value)
...

Sharing and Cycles

By default, if two pointers point to the same value, that value will be duplicated upon decoding. If there is a cycle, where a value directly or indirectly points to itself, then the encoder will crash by exceeding available stack space. This is the same behavior as encoding/gob and many other encoders.

Set EncodeOptions.TrackPointers to true to preserve pointer sharing and cycles, at the cost of slower encoding.

Other forms of memory sharing are not preserved. For example, if two slices refer to the same underlying array during encoding, they will refer to separate arrays after decoding.

Struct Tags

Struct tags in the style of encoding/json are supported, under the name "codec". You can generate code for structs designed for the encoding/json package by setting GenerateOptions.FieldTag to "json" when calling GenerateFile.

An example:

type T struct {
    A int `codec:"B"`
    C int `codec:"-"`
}

Here, field A will use the name "B" and field C will be omitted. There is no need for the omitempty option because the encoder always omits zero values.

Since the encoding uses numbers for fields instead of names, renaming a field doesn't actually affect the encoding. It does matter if subsequent changes are made to the struct, however. For example, say that originally T was

type T struct {
    A int
}

but you rename the field to "B":

type T struct {
    B int
}

The generator will treat "B" as a new field. Data encoded with "A" will not be decoded into "B". So you should use a tag to express that it is a renaming:

type T struct {
    B int `codec:"A"`
}
Example

package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/jba/codec"
)

func main() {
	var buf bytes.Buffer
	e := codec.NewEncoder(&buf, nil)
	for _, x := range []interface{}{1, "hello", true} {
		if err := e.Encode(x); err != nil {
			log.Fatal(err)
		}
	}

	d := codec.NewDecoder(bytes.NewReader(buf.Bytes()), nil)
	for i := 0; i < 3; i++ {
		var got interface{}
		err := d.Decode(&got)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(got)
	}
}

Output:

1
hello
true

Functions

func GenerateFile

func GenerateFile(filename, packagePath string, opts *GenerateOptions, values ...interface{}) error

GenerateFile writes encoders and decoders to filename. It generates code for the type of each given value, as well as any types they depend on. packagePath is the output package path.

Example

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/jba/codec"
)

func main() {
	err := codec.GenerateFile("types.gen.go", "mypkg", nil, []int{}, map[string]bool{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(err)
	os.Remove("types.gen.go")
}

Output:

<nil>

Types

type DecodeOptions

type DecodeOptions struct {
	// DisallowUnknownFields configures whether unknown struct fields are skipped
	// (the default) or cause decoding to fail immediately.
	DisallowUnknownFields bool
}

DecodeOptions holds options for decoding.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

A Decoder decodes a Go value encoded by an Encoder. To use a Decoder:
- Pass NewDecoder an io.Reader that yields the bytes written by the Encoder.
- Call the Decode method once for each call to Encoder.Encode.

func NewDecoder

func NewDecoder(r io.Reader, opts *DecodeOptions) *Decoder

NewDecoder creates a Decoder that reads from r.

func (*Decoder) Decode

func (d *Decoder) Decode(p interface{}) error

Decode decodes a value encoded with Encoder.Encode and stores the result in the value pointed to by p. The decoded value must be assignable to the pointee's type; no conversions are performed. Decode returns io.EOF if there are no more values.

type EncodeOptions

type EncodeOptions struct {
	// If TrackPointers is true, the encoder will keep track of pointers so it
	// can preserve the pointer topology of the encoded value. Cyclical and
	// shared values will decode to the same representation. If TrackPointers is
	// false, then shared pointers will decode to distinct values, and cycles
	// will result in stack overflow.
	//
	// Setting this to true will significantly slow down encoding.
	TrackPointers bool

	// If non-nil, Encode will use this buffer instead of creating one. If the
	// encoding is large, providing a buffer of sufficient size can speed up
	// encoding by reducing allocation.
	Buffer []byte
}

EncodeOptions holds options for encoding.

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

An Encoder encodes Go values into a sequence of bytes.

func NewEncoder

func NewEncoder(w io.Writer, opts *EncodeOptions) *Encoder

NewEncoder returns an Encoder that writes to w.

func (*Encoder) Encode

func (e *Encoder) Encode(x interface{}) (err error)

Encode encodes x.

type GenerateOptions

type GenerateOptions struct {
	// FieldTag is the name that GenerateFile will use to look up
	// field tag information. The default is "codec".
	FieldTag string
}

Directories

Path Synopsis
Package codecapi is used by the codec package and by code generated by codec.GenerateFile.
internal
testpkg
This is a package whose name is not the last component of its import path.
benchmarks Module
