Documentation ¶
Overview ¶
Package avro reads Avro OCF files and presents the extracted data as records
Index ¶
- Variables
- func ArrowSchemaFromAvro(schema avro.Schema) (s *arrow.Schema, err error)
- type OCFReader
- func (r *OCFReader) AvroSchema() string
- func (r *OCFReader) Close()
- func (r *OCFReader) Err() error
- func (r *OCFReader) Metrics() string
- func (r *OCFReader) Next() bool
- func (r *OCFReader) OCFRecordsReadCount() int64
- func (r *OCFReader) Record() arrow.Record
- func (r *OCFReader) Release()
- func (r *OCFReader) Retain()
- func (rr *OCFReader) Reuse(r io.Reader, opts ...Option) error
- func (r *OCFReader) Schema() *arrow.Schema
- type Option
Constants ¶
This section is empty.
Variables ¶
var ErrMismatchFields = errors.New("arrow/avro: number of records mismatch")
var (
ErrNullStructData = errors.New("null struct data")
)
Functions ¶
Types ¶
type OCFReader ¶
type OCFReader struct {
// contains filtered or unexported fields
}
Reader wraps goavro/OCFReader and creates array.Records from a schema.
func NewOCFReader ¶
NewReader returns a reader that reads from an Avro OCF file and creates arrow.Records from the converted avro data.
func (*OCFReader) AvroSchema ¶
AvroSchema returns the Avro schema of the Avro OCF
func (*OCFReader) Close ¶
func (r *OCFReader) Close()
Close closes the OCFReader's Avro record read cache and converted Arrow record cache. OCFReader must be closed if the Avro OCF's records have not been read to completion.
func (*OCFReader) Err ¶
Err returns the last error encountered during the iteration over the underlying Avro file.
func (*OCFReader) Metrics ¶
Metrics returns the maximum queue depth of the Avro record read cache and of the converted Arrow record cache.
func (*OCFReader) Next ¶
Next returns whether a Record can be received from the converted record queue. The user should check Err() after call to Next that return false to check if an error took place.
func (*OCFReader) OCFRecordsReadCount ¶
OCFRecordsReadCount returns the number of Avro datum that were read from the Avro file.
func (*OCFReader) Record ¶
Record returns the current record that has been extracted from the underlying Avro OCF file. It is valid until the next call to Next.
func (*OCFReader) Release ¶
func (r *OCFReader) Release()
Release decreases the reference count by 1. When the reference count goes to zero, the memory is freed. Release may be called simultaneously from multiple goroutines.
func (*OCFReader) Retain ¶
func (r *OCFReader) Retain()
Retain increases the reference count by 1. Retain may be called simultaneously from multiple goroutines.
type Option ¶
type Option func(config)
Option configures an Avro reader/writer.
func WithAllocator ¶
WithAllocator specifies the Arrow memory allocator used while building records.
func WithChunk ¶
WithChunk specifies the chunk size used while reading Avro OCF files.
If n is zero or 1, no chunking will take place and the reader will create one record per row. If n is greater than 1, chunks of n rows will be read. If n is negative, the reader will load the whole Avro OCF file into memory and create one big record with all the rows.
func WithReadCacheSize ¶
WithReadCacheSize specifies the size of the OCF record decode queue, default value is 500.
func WithRecordCacheSize ¶
WithRecordCacheSize specifies the size of the converted Arrow record queue, default value is 1.
func WithSchemaEdit ¶
WithSchemaEdit specifies modifications to the Avro schema. Supported methods are 'set' and 'delete'. Set sets the value for the specified path. Delete deletes the value for the specified path. A path is in dot syntax, such as "fields.1" or "fields.0.type". The modified Avro schema is validated before conversion to Arrow schema - NewOCFReader will return an error if the modified schema cannot be parsed.