Documentation ¶
Overview ¶
Package avro is an AVRO encoder and decoder aimed principally at decoding AVRO output from Google's BigQuery. It encodes directly from Go structs and decodes directly into Go structs, and uses json tags as naming hints.
The primary decoding interface is ReadFile. This reads an AVRO file, combining the schema in the file with type information from the struct passed via the out parameter to decode the records. It then passes a pointer to a struct of the type of out to the callback cb for each record in the file.
Use Encoder to encode a file ¶
You can implement custom decoders for your own types and register them via the Register function. github.com/phil/avro/null provides example custom decoders for the types defined in github.com/unravelin/null.
Index ¶
- Variables
- func ReadFile(r Reader, out interface{}, cb func(val unsafe.Pointer, rb *ResourceBank) error) error
- func Register(typ reflect.Type, f CodecBuildFunc)
- func RegisterSchema(typ reflect.Type, s Schema)
- type BoolCodec
- type BytesCodec
- type Codec
- type CodecBuildFunc
- type Compression
- type DoubleCodec
- type Encoder
- type FileHeader
- type FileWriter
- type Float32DoubleCodec
- type FloatCodec
- type Int16Codec
- type Int32Codec
- type Int64Codec
- type IntCodec
- type MapCodec
- type PointerCodec
- type ReadBuf
- func (d *ReadBuf) Alloc(rtyp reflect.Type) unsafe.Pointer
- func (d *ReadBuf) ExtractResourceBank() *ResourceBank
- func (d *ReadBuf) Len() int
- func (d *ReadBuf) Next(l int) ([]byte, error)
- func (d *ReadBuf) NextAsString(l int) (string, error)
- func (d *ReadBuf) ReadByte() (byte, error)
- func (d *ReadBuf) Reset(data []byte)
- func (d *ReadBuf) Varint() (int64, error)
- type Reader
- type ResourceBank
- type Schema
- type SchemaObject
- type SchemaRecordField
- type StringCodec
- type WriteBuf
Constants ¶
This section is empty.
Variables ¶
var FileMagic = [4]byte{'O', 'b', 'j', 1}
Functions ¶
func ReadFile ¶
func ReadFile(r Reader, out interface{}, cb func(val unsafe.Pointer, rb *ResourceBank) error) error

ReadFile reads from an AVRO file. The records in the file are decoded into structs of the type indicated by out. These are fed back to the application via the cb callback. ReadFile calls cb with a pointer to the struct. The pointer is converted to an unsafe.Pointer. The pointer should not be retained by the application past the return of cb.

var records []myrecord
if err := ReadFile(f, myrecord{}, func(val unsafe.Pointer, rb *ResourceBank) error {
	records = append(records, *(*myrecord)(val))
	return nil
}); err != nil {
	return err
}
func Register ¶
func Register(typ reflect.Type, f CodecBuildFunc)
Register is used to set a custom codec builder for a type
func RegisterSchema ¶ added in v0.0.20
func RegisterSchema(typ reflect.Type, s Schema)

Call RegisterSchema to indicate which schema should be used for a given type. Use this to register the schema for a type for which you have written a custom codec.
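As a sketch of how RegisterSchema might be used (assuming the package is imported as avro; the Celsius type is hypothetical):

```go
package mytypes

import (
	"reflect"

	"github.com/phil/avro" // import path assumed from the null subpackage mentioned above
)

// Celsius is a hypothetical custom type with its own codec.
type Celsius float64

func init() {
	// Tell the encoder to describe Celsius as an AVRO double in generated schemas.
	avro.RegisterSchema(reflect.TypeOf(Celsius(0)), avro.Schema{Type: "double"})
}
```

Registering in an init function ensures the schema is known before any encoding takes place.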
Types ¶
type BytesCodec ¶
type BytesCodec struct {
// contains filtered or unexported fields
}
func (BytesCodec) Skip ¶
func (BytesCodec) Skip(r *ReadBuf) error
type Codec ¶
type Codec interface {
	// Read reads the wire format bytes for the current field from r and sets up
	// the value that p points to. The codec can assume that the memory for an
	// instance of the type for which the codec is registered is present behind
	// p.
	Read(r *ReadBuf, p unsafe.Pointer) error
	// Skip advances the reader over the bytes for the current field.
	Skip(r *ReadBuf) error
	// New creates a pointer to the type for which the codec is registered. It is
	// used if the enclosing record has a field that is a pointer to this type.
	New(r *ReadBuf) unsafe.Pointer
	// Omit returns true if the value that p points to should be omitted from the
	// output. This is used for optional fields in records.
	Omit(p unsafe.Pointer) bool
	// Write writes the wire format bytes for the value that p points to to w.
	Write(w *WriteBuf, p unsafe.Pointer)
}
Codec defines an encoder / decoder for a type. You can write custom Codecs for types. See Register and CodecBuildFunc
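A sketch of what a custom Codec for the decode path might look like, assuming the package is imported as avro. The Timestamp representation (an AVRO long of microseconds since the epoch) is an illustrative assumption, not the package's implementation, and the write side is omitted because WriteBuf's methods are not documented on this page:

```go
package mycodecs

import (
	"reflect"
	"time"
	"unsafe"

	"github.com/phil/avro" // import path assumed from the null subpackage above
)

// timestampCodec decodes an AVRO long holding microseconds since the epoch
// into a time.Time. Purely illustrative.
type timestampCodec struct{}

func (timestampCodec) Read(r *avro.ReadBuf, p unsafe.Pointer) error {
	v, err := r.Varint()
	if err != nil {
		return err
	}
	*(*time.Time)(p) = time.UnixMicro(v)
	return nil
}

func (timestampCodec) Skip(r *avro.ReadBuf) error {
	// A long is a single varint on the wire, so skipping is just reading it.
	_, err := r.Varint()
	return err
}

func (timestampCodec) New(r *avro.ReadBuf) unsafe.Pointer {
	// Allocate via the ReadBuf so the memory lives in its ResourceBank.
	return r.Alloc(reflect.TypeOf(time.Time{}))
}

func (timestampCodec) Omit(p unsafe.Pointer) bool {
	return (*time.Time)(p).IsZero()
}

func (timestampCodec) Write(w *avro.WriteBuf, p unsafe.Pointer) {
	// WriteBuf's methods are not documented on this page, so the encoding
	// side is left out of this sketch.
}
```

Such a codec would be returned from a CodecBuildFunc registered for time.Time via Register.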
type CodecBuildFunc ¶
CodecBuildFunc is the function signature for a codec builder. If you want to customise AVRO decoding for a type, register a CodecBuildFunc via the Register call. schema is the AVRO schema for the type to build. typ should match the type the function was registered under.
type Compression ¶ added in v0.0.18
type Compression string
const (
	CompressionNull    Compression = "null"
	CompressionDeflate Compression = "deflate"
	CompressionSnappy  Compression = "snappy"
)
type DoubleCodec ¶
type DoubleCodec = floatCodec[float64]
type Encoder ¶ added in v0.0.20
type Encoder[T any] struct {
	// contains filtered or unexported fields
}
func NewEncoderFor ¶ added in v0.0.20
func NewEncoderFor[T any](w io.Writer, compression Compression, approxBlockSize int) (*Encoder[T], error)
NewEncoderFor returns a new Encoder. Data will be written to w in AVRO format, including a schema header. The data will be compressed using the specified compression algorithm. Data is written in blocks of at least approxBlockSize bytes. A block is written when it reaches that size, or when Flush is called.
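A usage sketch, assuming the package is imported as avro and a hypothetical myrecord struct. Only NewEncoderFor and Flush appear on this page, so the Encode method name used for appending rows is an assumption:

```go
func writeRecords(w io.Writer, records []myrecord) error {
	enc, err := avro.NewEncoderFor[myrecord](w, avro.CompressionSnappy, 64*1024)
	if err != nil {
		return err
	}
	for i := range records {
		// Encode is an assumed method name for appending one row.
		if err := enc.Encode(&records[i]); err != nil {
			return err
		}
	}
	// Flush writes any buffered partial block to w.
	return enc.Flush()
}
```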
type FileHeader ¶
type FileHeader struct {
	Magic [4]byte           `json:"magic"`
	Meta  map[string][]byte `json:"meta"`
	Sync  [16]byte          `json:"sync"`
}
FileHeader represents an AVRO file header
type FileWriter ¶ added in v0.0.18
type FileWriter struct {
// contains filtered or unexported fields
}
FileWriter provides limited support for writing AVRO files. It allows you to write blocks of already encoded data. Actually encoding data as AVRO is not yet supported.
func NewFileWriter ¶ added in v0.0.18
func NewFileWriter(schema []byte, compression Compression) (*FileWriter, error)
NewFileWriter creates a new FileWriter. The schema is the JSON encoded schema. The compression parameter indicates the compression codec to use.
func (*FileWriter) AppendHeader ¶ added in v0.0.18
func (f *FileWriter) AppendHeader(buf []byte) []byte
AppendHeader appends the AVRO file header to the provided buffer.
func (*FileWriter) WriteBlock ¶ added in v0.0.18
WriteBlock writes a block of data to the writer. The block must be rowCount rows of AVRO encoded data.
func (*FileWriter) WriteHeader ¶ added in v0.0.18
func (f *FileWriter) WriteHeader(w io.Writer) error
WriteHeader writes the AVRO file header to the writer.
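Putting the FileWriter pieces together, a sketch assuming the package is imported as avro. WriteBlock's parameters are not shown on this page, so the call shape below is an assumption:

```go
func writeFile(w io.Writer, schemaJSON []byte, rowCount int, encodedRows []byte) error {
	fw, err := avro.NewFileWriter(schemaJSON, avro.CompressionDeflate)
	if err != nil {
		return err
	}
	// The header (magic, metadata including the schema, and sync marker)
	// must precede any data blocks.
	if err := fw.WriteHeader(w); err != nil {
		return err
	}
	// encodedRows must contain rowCount rows of already AVRO-encoded data.
	return fw.WriteBlock(w, rowCount, encodedRows) // assumed signature
}
```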
type Float32DoubleCodec ¶
type Float32DoubleCodec struct {
DoubleCodec
}
func (Float32DoubleCodec) Omit ¶ added in v0.0.20
func (rc Float32DoubleCodec) Omit(p unsafe.Pointer) bool
type FloatCodec ¶
type FloatCodec = floatCodec[float32]
type Int16Codec ¶
type Int32Codec ¶
type Int64Codec ¶
type IntCodec ¶ added in v0.0.13
IntCodec is an avro codec for int
type MapCodec ¶
type MapCodec struct {
// contains filtered or unexported fields
}
MapCodec is a decoder for map types. Map keys must always be strings
type PointerCodec ¶ added in v0.0.6
type PointerCodec struct {
Codec
}
type ReadBuf ¶ added in v0.0.20
type ReadBuf struct {
// contains filtered or unexported fields
}
ReadBuf is a very simple replacement for bytes.Reader that avoids data copies
func NewReadBuf ¶ added in v0.0.22
NewReadBuf returns a new ReadBuf.
func (*ReadBuf) Alloc ¶ added in v0.0.20
Alloc allocates a pointer to the type rtyp. The data is allocated in a ResourceBank
func (*ReadBuf) ExtractResourceBank ¶ added in v0.0.20
func (d *ReadBuf) ExtractResourceBank() *ResourceBank
ExtractResourceBank extracts the current ResourceBank from the buffer, and replaces it with a fresh one.
func (*ReadBuf) Next ¶ added in v0.0.20
Next returns the next l bytes from the buffer. It does so without copying, so if you hold onto the data you risk holding onto a lot of data. If l exceeds the remaining space Next returns io.EOF
func (*ReadBuf) NextAsString ¶ added in v0.0.20
NextAsString returns the next l bytes from the buffer as a string. The string data is held in a StringBank and will be valid only until someone calls Close on that bank. If l exceeds the remaining space NextAsString returns io.EOF
func (*ReadBuf) ReadByte ¶ added in v0.0.20
ReadByte returns the next byte from the buffer. If no bytes are left it returns io.EOF
type Reader ¶
type Reader interface {
	io.Reader
	io.ByteReader
}
Reader combines io.ByteReader and io.Reader. It's what we need to read an AVRO file
type ResourceBank ¶ added in v0.0.6
type ResourceBank struct {
// contains filtered or unexported fields
}
ResourceBank is used to allocate the memory used to create structs to decode AVRO into. The primary reason for having it is to allow the user to flag that the memory can be re-used, reducing strain on the GC.
We allocate using the required type, so the GC can still inspect within the memory.
func (*ResourceBank) Alloc ¶ added in v0.0.6
func (rb *ResourceBank) Alloc(rtyp reflect.Type) unsafe.Pointer
Alloc reserves some memory in the ResourceBank. Note that this memory may be re-used after Close is called.
func (*ResourceBank) Close ¶ added in v0.0.6
func (rb *ResourceBank) Close()
Close marks the resources in the ResourceBank as available for re-use
func (*ResourceBank) ToString ¶ added in v0.0.6
func (rb *ResourceBank) ToString(in []byte) string
ToString saves string data in the bank and returns a string. The string is valid until someone calls Close
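A sketch of the intended pattern, assuming the package is imported as avro and a hypothetical myrecord type. Whether each callback invocation receives a distinct bank is an assumption here; the point is that a bank must stay alive for as long as the decoded data it backs:

```go
var records []myrecord
var banks []*avro.ResourceBank
err := avro.ReadFile(f, myrecord{}, func(val unsafe.Pointer, rb *avro.ResourceBank) error {
	records = append(records, *(*myrecord)(val))
	// The copied struct may still reference memory owned by rb (e.g. string
	// data), so keep the bank alive while records is in use.
	banks = append(banks, rb)
	return nil
})
if err != nil {
	return err
}
use(records)
// Done with the decoded data: allow the memory to be re-used.
for _, rb := range banks {
	rb.Close()
}
```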
type Schema ¶
type Schema struct {
	Type   string
	Object *SchemaObject
	Union  []Schema
}
Schema is a representation of AVRO schema JSON. Primitive types populate Type only. Union types populate the Type and Union fields. All other types populate Type and a subset of the Object fields.
Note that the jsoniter fuzzy decoders (github.com/json-iterator/go/extra RegisterFuzzyDecoders) break decoding of Schema objects: toleration of arrays as structs breaks decoding of unions.
func FileSchema ¶ added in v0.0.11
FileSchema reads the Schema from an AVRO file.
func SchemaForType ¶ added in v0.0.20
SchemaForType returns a Schema for the given type. It aims to produce a Schema that's compatible with BigQuery.
func SchemaFromString ¶ added in v0.0.14
SchemaFromString decodes a JSON string into a Schema
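A sketch of parsing a schema, assuming the package is imported as avro; the (Schema, error) return shape is an assumption based on the description above:

```go
s, err := avro.SchemaFromString(`{
	"type": "record",
	"name": "weather",
	"fields": [
		{"name": "station", "type": "string"},
		{"name": "temp", "type": ["null", "long"]}
	]
}`)
if err != nil {
	return err
}
// s.Type is "record" and s.Object.Fields describes the two fields; the
// "temp" field's Schema has its Union slice populated with the two branches.
```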
type SchemaObject ¶
type SchemaObject struct {
	Type        string `json:"type"`
	LogicalType string `json:"logicalType,omitempty"`
	Name        string `json:"name,omitempty"`
	Namespace   string `json:"namespace,omitempty"`

	// Fields in a record
	Fields []SchemaRecordField `json:"fields,omitempty"`
	// The type of each item in an array
	Items Schema `json:"items,omitempty"`
	// The value types of a map (keys are strings)
	Values Schema `json:"values,omitempty"`
	// The size of a fixed type
	Size int `json:"size,omitempty"`
	// The values of an enum
	Symbols []string `json:"symbols,omitempty"`
}
SchemaObject contains all the fields of more complex schema types
type SchemaRecordField ¶
type SchemaRecordField struct {
	Name string `json:"name,omitempty"`
	Type Schema `json:"type,omitempty"`
}
SchemaRecordField represents one field of a Record schema
type StringCodec ¶
type StringCodec struct {
// contains filtered or unexported fields
}
StringCodec is a decoder for strings
func (StringCodec) Skip ¶
func (StringCodec) Skip(r *ReadBuf) error
type WriteBuf ¶ added in v0.0.20
type WriteBuf struct {
// contains filtered or unexported fields
}
WriteBuf is a simple, append only, replacement for bytes.Buffer.