Documentation ¶
Overview ¶
Package goavro is a library that encodes and decodes of Avro data. It provides an interface to encode data directly to io.Writer streams, and to decode data from io.Reader streams. Goavro fully adheres to version 1.7.7 of the Avro specification and data encoding.
Index ¶
- Constants
- Variables
- func IsCompressionCodecSupported(someCodec string) bool
- type ByteWriter
- type Codec
- type CodecSetter
- type Datum
- type Decoder
- type Encoder
- type Enum
- type ErrCodecBuild
- type ErrDecoder
- type ErrEncoder
- type ErrInvalidName
- type ErrNoSuchField
- type ErrNotRecord
- type ErrReader
- type ErrReaderBlockCount
- type ErrReaderInit
- type ErrSchemaParse
- type ErrWriterInit
- type Fixed
- type Reader
- type ReaderSetter
- type Record
- func (r Record) Get(fieldName string) (interface{}, error)
- func (r Record) GetFieldSchema(fieldName string) (interface{}, error)
- func (r Record) GetQualified(qualifiedName string) (interface{}, error)
- func (r Record) Set(fieldName string, value interface{}) error
- func (r Record) SetQualified(qualifiedName string, value interface{}) error
- func (r Record) String() string
- type RecordCache
- type RecordSetter
- type StringWriter
- type Writer
- type WriterSetter
- func BlockSize(blockSize int64) WriterSetter
- func BlockTick(blockTick time.Duration) WriterSetter
- func BufferToWriter(w io.Writer) WriterSetter
- func Compression(someCompressionCodec string) WriterSetter
- func Sync(someSync []byte) WriterSetter
- func ToWriter(w io.Writer) WriterSetter
- func UseCodec(codec Codec) WriterSetter
- func WriterSchema(someSchema string) WriterSetter
Constants ¶
const ( CompressionNull = "null" CompressionDeflate = "deflate" CompressionSnappy = "snappy" )
Compression codecs that Reader and Writer instances can process.
const DefaultWriterBlockSize = 10
DefaultWriterBlockSize specifies the default number of datum items in a block when writing.
Variables ¶
var MaxDecodeSize = int64(math.MaxInt32)
MaxDecodeSize defines the maximum length a String or Bytes field. This is here because the way we decode Strings and Bytes fields is entirely stateless. This means we can't follow the example set by other encoders who look at how much data is left to be decoded and return an error if that amount is exceeded.
If you need to decode Avro data which contains strings or bytes fields longer than ~2.2GB, modify this value at your discretion.
On a 32bit platform this value should not exceed math.MaxInt32, as Go's make function is limited to only creating MaxInt number of objects at a time. On a 64bit platform the limitation is primarily your avaialble memory.
Example:
func init() { goavro.MaxDecodeSize = (1 << 40) // 1 TB of runes or bytes }
Functions ¶
func IsCompressionCodecSupported ¶
IsCompressionCodecSupported returns true if and only if the specified codec string is supported by this library.
Types ¶
type ByteWriter ¶
ByteWriter is the interface implemented by any object that bytes can be written to.
type Codec ¶
type Codec interface { Decoder Encoder Schema() string NewWriter(...WriterSetter) (*Writer, error) }
The Codec interface supports both Decode and Encode operations.
func NewCodec ¶
func NewCodec(someJSONSchema string, setters ...CodecSetter) (Codec, error)
NewCodec creates a new object that supports both the Decode and Encode methods. It requires an Avro schema, expressed as a JSON string.
codec, err := goavro.NewCodec(someJSONSchema) if err != nil { return nil, err } // Decoding data uses codec created above, and an io.Reader, // definition not shown: datum, err := codec.Decode(r) if err != nil { return nil, err } // Encoding data uses codec created above, an io.Writer, // definition not shown, and some data: err := codec.Encode(w, datum) if err != nil { return nil, err } // Encoding data using bufio.Writer to buffer the writes // during data encoding: func encodeWithBufferedWriter(c Codec, w io.Writer, datum interface{}) error { bw := bufio.NewWriter(w) err := c.Encode(bw, datum) if err != nil { return err } return bw.Flush() } err := encodeWithBufferedWriter(codec, w, datum) if err != nil { return nil, err }
type CodecSetter ¶
CodecSetter functions are those those which are used to modify a newly instantiated Codec.
type Datum ¶
type Datum struct { Value interface{} Err error }
Datum binds together a piece of data and any error resulting from either reading or writing that datum.
type Enum ¶
type Enum struct {
Name, Value string
}
Enum is an abstract data type used to hold data corresponding to an Avro enum. Whenever an Avro schema specifies an enum, this library's Decode method will return an Enum initialized to the enum's name and value read from the io.Reader. Likewise, when using Encode to convert data to an Avro record, it is necessary to create and send an Enum instance to the Encode method.
type ErrCodecBuild ¶
ErrCodecBuild is returned when the encoder encounters an error.
func (ErrCodecBuild) Error ¶
func (e ErrCodecBuild) Error() string
type ErrDecoder ¶
ErrDecoder is returned when the encoder encounters an error.
func (ErrDecoder) Error ¶
func (e ErrDecoder) Error() string
type ErrEncoder ¶
ErrEncoder is returned when the encoder encounters an error.
func (ErrEncoder) Error ¶
func (e ErrEncoder) Error() string
type ErrInvalidName ¶
type ErrInvalidName struct {
Message string
}
ErrInvalidName is returned when a Codec cannot be created due to invalid name format.
func (ErrInvalidName) Error ¶
func (e ErrInvalidName) Error() string
type ErrNoSuchField ¶
type ErrNoSuchField struct {
// contains filtered or unexported fields
}
ErrNoSuchField is returned when attempt to Get a field that does not exist in a Record.
func (ErrNoSuchField) Error ¶
func (e ErrNoSuchField) Error() string
Error returns the string representation of an ErrNoSuchField error.
type ErrNotRecord ¶
type ErrNotRecord struct {
// contains filtered or unexported fields
}
ErrNotRecord is returned when codepath expecs goavro.Record, but found something else.
func (ErrNotRecord) Error ¶
func (e ErrNotRecord) Error() string
Error returns a string representation of the ErrNotRecord error.
type ErrReaderBlockCount ¶
type ErrReaderBlockCount struct {
Err error
}
ErrReaderBlockCount is returned when a reader detects an error while attempting to read the block count and block size.
func (*ErrReaderBlockCount) Error ¶
func (e *ErrReaderBlockCount) Error() string
type ErrReaderInit ¶
ErrReaderInit is returned when the encoder encounters an error.
func (ErrReaderInit) Error ¶
func (e ErrReaderInit) Error() string
type ErrSchemaParse ¶
ErrSchemaParse is returned when a Codec cannot be created due to an error while reading or parsing the schema.
func (ErrSchemaParse) Error ¶
func (e ErrSchemaParse) Error() string
type ErrWriterInit ¶
ErrWriterInit is returned when an error is created during Writer initialization.
func (*ErrWriterInit) Error ¶
func (e *ErrWriterInit) Error() string
Error converts the error instance to a string.
type Fixed ¶
Fixed is an abstract data type used to hold data corresponding to an Avro 'Fixed' type. Whenever an Avro schema specifies a "Fixed" type, this library's Decode method will return a Fixed value initialized to the Fixed name, and value read from the io.Reader. Likewise, when using Encode to convert data to an Avro record, it is necessary to create and send a Fixed instance to the Encode method.
type Reader ¶
type Reader struct { CompressionCodec string DataSchema string Sync []byte // contains filtered or unexported fields }
Reader structure contains data necessary to read Avro files.
func NewReader ¶
func NewReader(setters ...ReaderSetter) (*Reader, error)
NewReader returns a object to read data from an io.Reader using the Avro Object Container Files format.
func main() { conn, err := net.Dial("tcp", "127.0.0.1:8080") if err != nil { log.Fatal(err) } fr, err := goavro.NewReader(goavro.FromReader(conn)) if err != nil { log.Fatal("cannot create Reader: ", err) } defer func() { if err := fr.Close(); err != nil { log.Fatal(err) } }() for fr.Scan() { datum, err := fr.Read() if err != nil { log.Println("cannot read datum: ", err) continue } fmt.Println("RECORD: ", datum) } }
type ReaderSetter ¶
ReaderSetter functions are those those which are used to instantiate a new Reader.
func BufferFromReader ¶
func BufferFromReader(r io.Reader) ReaderSetter
BufferFromReader wraps the specified `io.Reader` using a `bufio.Reader` to read from a file.
func FromReader ¶
func FromReader(r io.Reader) ReaderSetter
FromReader specifies the `io.Reader` to use when reading a file.
type Record ¶
type Record struct { Name string Fields []*recordField // contains filtered or unexported fields }
Record is an abstract data type used to hold data corresponding to an Avro record. Wherever an Avro schema specifies a record, this library's Decode method will return a Record initialized to the record's values read from the io.Reader. Likewise, when using Encode to convert data to an Avro record, it is necessary to create and send a Record instance to the Encode method.
func NewRecord ¶
func NewRecord(setters ...RecordSetter) (*Record, error)
NewRecord will create a Record instance corresponding to the specified schema.
func recordExample(codec goavro.Codec, w io.Writer, recordSchema string) error { // To encode a Record, you need to instantiate a Record instance // that adheres to the schema the Encoder expect. someRecord, err := goavro.NewRecord(goavro.RecordSchema(recordSchema)) if err != nil { return err } // Once you have a Record, you can set the values of the various fields. someRecord.Set("username", "Aquaman") someRecord.Set("comment", "The Atlantic is oddly cold this morning!") // Feel free to fully qualify the field name if you'd like someRecord.Set("com.example.timestamp", int64(1082196484)) // Once the fields of the Record have the correct data, you can encode it err = codec.Encode(w, someRecord) return err }
func (Record) GetFieldSchema ¶
GetFieldSchema returns the schema of the specified Record field.
func (Record) GetQualified ¶
GetQualified returns the datum of the specified Record field, without attempting to qualify the name
func (Record) SetQualified ¶
SetQualified updates the datum of the specified Record field, without attempting to qualify the name
type RecordCache ¶
type RecordCache struct {
// contains filtered or unexported fields
}
RecordCache provides simplified way of getting a value from a nested field, while memoizing intermediate results in a cache for future lookups.
func NewRecordCache ¶
func NewRecordCache(record *Record, delim byte) (*RecordCache, error)
NewRecordCache returns a new RecordCache structure used to get values from nested fields.
func example(codec Codec, someReader io.Reader) (string, error) { decoded, err := codec.Decode(someReader) record, ok := decoded.(*Record) if !ok { return "", ErrNotRecord{decoded} } rc, err := NewRecordCache(record, '/') if err != nil { return "", err } account, err := rc.Get("com.example.user/com.example.account") if err != nil { return "", err } s, ok := account.(string) if !ok { return "", fmt.Errorf("expected: string; actual: %T", account) } return s, nil }
func (*RecordCache) Get ¶
func (rc *RecordCache) Get(name string) (interface{}, error)
Get splits the specified name by the stored delimiter, and attempts to retrieve the nested value corresponding to the nested fields.
type RecordSetter ¶
RecordSetter functions are those those which are used to instantiate a new Record.
func RecordEnclosingNamespace ¶
func RecordEnclosingNamespace(someNamespace string) RecordSetter
RecordEnclosingNamespace specifies the enclosing namespace of the record to create. For instance, if the enclosing namespace is `com.example`, and the record name is `Foo`, then the full record name will be `com.example.Foo`.
func RecordPedantic ¶
func RecordPedantic() RecordSetter
RecordPedantic specifies pedantic handling, and will cause NewRecord to signal an error if various harmless schema violations occur.
func RecordSchema ¶
func RecordSchema(recordSchemaJSON string) RecordSetter
RecordSchema specifies the schema of the record to create. Schema must be a JSON string.
type StringWriter ¶
StringWriter is the interface implemented by any object that strings can be written to.
type Writer ¶
type Writer struct { CompressionCodec string Sync []byte // contains filtered or unexported fields }
Writer structure contains data necessary to write Avro files.
func NewWriter ¶
func NewWriter(setters ...WriterSetter) (*Writer, error)
NewWriter returns a object to write data to an io.Writer using the Avro Object Container Files format.
func serveClient(conn net.Conn, codec goavro.Codec) { fw, err := goavro.NewWriter( goavro.BlockSize(100), // flush data every 100 items goavro.BlockTick(10 * time.Second), // but at least every 10 seconds goavro.Compression(goavro.CompressionSnappy), goavro.ToWriter(conn), goavro.UseCodec(codec)) if err != nil { log.Fatal("cannot create Writer: ", err) } defer fw.Close() // create a record that matches the schema we want to encode someRecord, err := goavro.NewRecord(goavro.RecordSchema(recordSchema)) if err != nil { log.Fatal(err) } // identify field name to set datum for someRecord.Set("username", "Aquaman") someRecord.Set("comment", "The Atlantic is oddly cold this morning!") // you can fully qualify the field name someRecord.Set("com.example.timestamp", int64(1082196484)) fw.Write(someRecord) // create another record someRecord, err = goavro.NewRecord(goavro.RecordSchema(recordSchema)) if err != nil { log.Fatal(err) } someRecord.Set("username", "Batman") someRecord.Set("comment", "Who are all of these crazies?") someRecord.Set("com.example.timestamp", int64(1427383430)) fw.Write(someRecord) }
type WriterSetter ¶
WriterSetter functions are those those which are used to instantiate a new Writer.
func BlockSize ¶
func BlockSize(blockSize int64) WriterSetter
BlockSize specifies the default number of data items to be grouped in a block, compressed, and written to the stream.
It is a valid use case to set both BlockTick and BlockSize. For example, if BlockTick is set to time.Minute and BlockSize is set to 20, but only 13 items are written to the Writer in a minute, those 13 items will be grouped in a block, compressed, and written to the stream without waiting for the addition 7 items to complete the BlockSize.
By default, BlockSize is set to DefaultWriterBlockSize.
func BlockTick ¶
func BlockTick(blockTick time.Duration) WriterSetter
BlockTick specifies the duration of time between when the Writer will flush the blocks to the stream.
It is a valid use case to set both BlockTick and BlockSize. For example, if BlockTick is set to time.Minute and BlockSize is set to 20, but only 13 items are written to the Writer in a minute, those 13 items will be grouped in a block, compressed, and written to the stream without waiting for the addition 7 items to complete the BlockSize.
By default, BlockTick is set to 0 and is ignored. This causes the blocker to fill up its internal queue of data to BlockSize items before flushing them to the stream.
func BufferToWriter ¶
func BufferToWriter(w io.Writer) WriterSetter
BufferToWriter specifies which io.Writer is the target of the Writer stream, and creates a bufio.Writer around that io.Writer. It is invalid to specify both BufferToWriter and ToWriter. Exactly one of these must be called for a given Writer initialization.
func Compression ¶
func Compression(someCompressionCodec string) WriterSetter
Compression is used to set the compression codec of a new Writer instance.
func Sync ¶
func Sync(someSync []byte) WriterSetter
Sync is used to set the sync marker bytes of a new instance. It checks to ensure the byte slice is 16 bytes long, but does not check that it has been set to something other than the zero value. Usually you can elide the `Sync` call and allow it to create a random byte sequence.
func ToWriter ¶
func ToWriter(w io.Writer) WriterSetter
ToWriter specifies which io.Writer is the target of the Writer stream. It is invalid to specify both BufferToWriter and ToWriter. Exactly one of these must be called for a given Writer initialization.
func UseCodec ¶
func UseCodec(codec Codec) WriterSetter
UseCodec specifies that a Writer should reuse an existing Codec rather than creating a new one, needlessly recompling the same schema.
func WriterSchema ¶
func WriterSchema(someSchema string) WriterSetter
WriterSchema is used to set the Avro schema of a new instance. If a codec has already been compiled for the schema, it is faster to use the UseCodec method instead of WriterSchema.