avro

package
v0.0.1 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 10, 2024
License: Apache-2.0, BSD-2-Clause, BSD-3-Clause, + 7 more
Imports: 21
Imported by: 0

Documentation

Overview

Package avro reads Avro OCF files and presents the extracted data as Arrow records.

Constants

This section is empty.

Variables

var ErrMismatchFields = errors.New("arrow/avro: number of records mismatch")
var (
	ErrNullStructData = errors.New("null struct data")
)

Functions

func ArrowSchemaFromAvro

func ArrowSchemaFromAvro(schema avro.Schema) (s *arrow.Schema, err error)

ArrowSchemaFromAvro returns a new Arrow schema converted from an Avro schema.

Types

type OCFReader

type OCFReader struct {
	// contains filtered or unexported fields
}

OCFReader wraps goavro/OCFReader and creates arrow.Records from a schema.

func NewOCFReader

func NewOCFReader(r io.Reader, opts ...Option) (*OCFReader, error)

NewOCFReader returns a reader that reads from an Avro OCF file and creates arrow.Records from the converted Avro data.

func (*OCFReader) AvroSchema

func (r *OCFReader) AvroSchema() string

AvroSchema returns the Avro schema of the Avro OCF.

func (*OCFReader) Close

func (r *OCFReader) Close()

Close closes the OCFReader's Avro record read cache and converted Arrow record cache. OCFReader must be closed if the Avro OCF's records have not been read to completion.

func (*OCFReader) Err

func (r *OCFReader) Err() error

Err returns the last error encountered during the iteration over the underlying Avro file.

func (*OCFReader) Metrics

func (r *OCFReader) Metrics() string

Metrics returns the maximum queue depth of the Avro record read cache and of the converted Arrow record cache.

func (*OCFReader) Next

func (r *OCFReader) Next() bool

Next returns whether a Record can be received from the converted record queue. When Next returns false, check Err() to find out whether iteration stopped because of an error.

func (*OCFReader) OCFRecordsReadCount

func (r *OCFReader) OCFRecordsReadCount() int64

OCFRecordsReadCount returns the number of Avro datums that were read from the Avro file.

func (*OCFReader) Record

func (r *OCFReader) Record() arrow.Record

Record returns the current record that has been extracted from the underlying Avro OCF file. It is valid until the next call to Next.
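
The Next, Record and Err methods follow the usual Go iterator idiom. The sketch below illustrates only the calling pattern. It uses a hypothetical stand-in type (fakeReader) and plain strings in place of arrow.Record so that it runs with the standard library alone; recordIterator, fakeReader, drain and errTruncated are illustration names and are not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// recordIterator is a hypothetical interface capturing the Next/Record/Err
// trio. string stands in for arrow.Record to keep the sketch stdlib-only.
type recordIterator interface {
	Next() bool
	Record() string
	Err() error
}

// errTruncated is a made-up error used to simulate a failed read.
var errTruncated = errors.New("truncated block")

// fakeReader is a stand-in for OCFReader: it yields its items in order and
// reports err (if any) once iteration has stopped.
type fakeReader struct {
	items []string
	cur   string
	err   error
}

func (f *fakeReader) Next() bool {
	if len(f.items) == 0 {
		return false
	}
	f.cur, f.items = f.items[0], f.items[1:]
	return true
}

func (f *fakeReader) Record() string { return f.cur }
func (f *fakeReader) Err() error     { return f.err }

// drain loops until Next returns false and only then consults Err, which is
// the order the documentation above asks for.
func drain(it recordIterator) ([]string, error) {
	var out []string
	for it.Next() {
		// The current record is only valid until the next call to Next,
		// so anything kept longer must be copied (or retained) here.
		out = append(out, it.Record())
	}
	return out, it.Err()
}

func main() {
	recs, err := drain(&fakeReader{items: []string{"r1", "r2"}})
	fmt.Println(recs, err) // [r1 r2] <nil>

	_, err = drain(&fakeReader{items: []string{"r1"}, err: errTruncated})
	fmt.Println(err) // truncated block
}
```

With the real reader, the same loop shape would apply, presumably with Close or Release called once the caller is done.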

func (*OCFReader) Release

func (r *OCFReader) Release()

Release decreases the reference count by 1. When the reference count goes to zero, the memory is freed. Release may be called simultaneously from multiple goroutines.

func (*OCFReader) Retain

func (r *OCFReader) Retain()

Retain increases the reference count by 1. Retain may be called simultaneously from multiple goroutines.
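
Retain and Release describe conventional atomic reference counting. The following is a minimal sketch of that technique, assuming an initial count of 1 and a free callback. The resource type, newResource and demo are hypothetical names, and the package's actual implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// resource is a hypothetical reference-counted object.
type resource struct {
	refCount int64
	onFree   func()
}

// newResource starts with a count of 1, owned by the creator (an assumption).
func newResource(onFree func()) *resource {
	return &resource{refCount: 1, onFree: onFree}
}

// Retain increases the reference count by 1.
func (r *resource) Retain() { atomic.AddInt64(&r.refCount, 1) }

// Release decreases the count by 1 and frees the resource when it hits zero.
// Because the decrement is atomic, exactly one caller observes zero.
func (r *resource) Release() {
	if atomic.AddInt64(&r.refCount, -1) == 0 {
		r.onFree()
	}
}

// demo hands one reference to each worker goroutine and reports whether the
// resource was freed after every holder released it.
func demo(workers int) bool {
	freed := false
	res := newResource(func() { freed = true })

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		res.Retain() // taken on the worker's behalf before it starts
		wg.Add(1)
		go func() {
			defer wg.Done()
			res.Release()
		}()
	}
	res.Release() // drop the creator's reference
	wg.Wait()
	return freed
}

func main() {
	fmt.Println(demo(8)) // true
}
```

Taking the reference before the goroutine starts avoids a window in which the creator's Release could drop the count to zero while a worker still expects the resource to be alive.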

func (*OCFReader) Reuse

func (rr *OCFReader) Reuse(r io.Reader, opts ...Option) error

Reuse allows the OCFReader to be reused to read another Avro file provided the new Avro file has an identical schema.

func (*OCFReader) Schema

func (r *OCFReader) Schema() *arrow.Schema

Schema returns the converted Arrow schema of the Avro OCF.

type Option

type Option func(config)

Option configures an Avro reader/writer.
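
Option follows the functional options pattern. The sketch below shows the pattern in isolation, assuming a pointer-based config. readerConfig, option, withChunk, withReadCacheSize and newConfig are hypothetical names, and the chunk default of 1 is an assumption; only the cache defaults of 500 and 1 come from the option descriptions on this page.

```go
package main

import "fmt"

// readerConfig is a hypothetical stand-in for the package's unexported config.
type readerConfig struct {
	chunk           int
	readCacheSize   int
	recordCacheSize int
}

// option mirrors the shape of Option: a function that mutates the config.
type option func(*readerConfig)

func withChunk(n int) option {
	return func(c *readerConfig) { c.chunk = n }
}

func withReadCacheSize(n int) option {
	return func(c *readerConfig) { c.readCacheSize = n }
}

// newConfig starts from defaults and applies the options in order, so a later
// option overrides an earlier one that touches the same field.
func newConfig(opts ...option) *readerConfig {
	c := &readerConfig{
		chunk:           1,   // assumed default, not documented here
		readCacheSize:   500, // documented default
		recordCacheSize: 1,   // documented default
	}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newConfig(withChunk(1024), withReadCacheSize(100))
	fmt.Println(c.chunk, c.readCacheSize, c.recordCacheSize) // 1024 100 1
}
```

The variadic opts parameter is what lets a constructor such as NewOCFReader gain new settings without changing its signature.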

func WithAllocator

func WithAllocator(mem memory.Allocator) Option

WithAllocator specifies the Arrow memory allocator used while building records.

func WithChunk

func WithChunk(n int) Option

WithChunk specifies the chunk size used while reading Avro OCF files.

If n is zero or 1, no chunking will take place and the reader will create one record per row. If n is greater than 1, chunks of n rows will be read. If n is negative, the reader will load the whole Avro OCF file into memory and create one big record with all the rows.
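
The three cases can be summarised by computing how many rows each emitted record would hold. The helper below, chunkSizes, is hypothetical and not part of the package. It also assumes that a trailing partial chunk is emitted as a smaller final record, which the description above does not state.

```go
package main

import "fmt"

// chunkSizes is a hypothetical helper that returns the row count of each
// record produced for totalRows input rows under the WithChunk rules.
func chunkSizes(totalRows, n int) []int {
	if totalRows <= 0 {
		return nil
	}
	if n < 0 {
		// Negative: the whole file becomes one big record.
		return []int{totalRows}
	}
	if n <= 1 {
		// Zero or one: no chunking, one record per row.
		n = 1
	}
	var sizes []int
	for remaining := totalRows; remaining > 0; remaining -= n {
		size := n
		if remaining < n {
			size = remaining // assumed: the last chunk may be short
		}
		sizes = append(sizes, size)
	}
	return sizes
}

func main() {
	fmt.Println(chunkSizes(5, 0))  // [1 1 1 1 1]
	fmt.Println(chunkSizes(5, 2))  // [2 2 1]
	fmt.Println(chunkSizes(5, -1)) // [5]
}
```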

func WithReadCacheSize

func WithReadCacheSize(n int) Option

WithReadCacheSize specifies the size of the OCF record decode queue. The default value is 500.

func WithRecordCacheSize

func WithRecordCacheSize(n int) Option

WithRecordCacheSize specifies the size of the converted Arrow record queue. The default value is 1.

func WithSchemaEdit

func WithSchemaEdit(method, path string, value any) Option

WithSchemaEdit specifies modifications to the Avro schema. Supported methods are 'set' and 'delete': 'set' sets the value at the specified path, and 'delete' removes the value at the specified path. A path uses dot syntax, such as "fields.1" or "fields.0.type". The modified Avro schema is validated before conversion to an Arrow schema; NewOCFReader returns an error if the modified schema cannot be parsed.
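
To make the dot-path addressing concrete, here is a minimal standard-library sketch of 'set' and 'delete' applied to a JSON schema document. editSchema and edit are hypothetical helpers written for illustration. The package may resolve paths differently (for example for escaping or out-of-range indexes), so treat this as an approximation of the idea and not as its implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// edit walks node along path and applies method at the final segment.
// Object keys address maps; decimal segments address array elements.
func edit(node any, path []string, method string, value any) (any, error) {
	key, last := path[0], len(path) == 1
	switch n := node.(type) {
	case map[string]any:
		if last {
			if method == "delete" {
				delete(n, key)
			} else {
				n[key] = value
			}
			return n, nil
		}
		child, ok := n[key]
		if !ok {
			return nil, fmt.Errorf("path segment %q not found", key)
		}
		updated, err := edit(child, path[1:], method, value)
		if err != nil {
			return nil, err
		}
		n[key] = updated
		return n, nil
	case []any:
		i, err := strconv.Atoi(key)
		if err != nil || i < 0 || i >= len(n) {
			return nil, fmt.Errorf("invalid array index %q", key)
		}
		if last {
			if method == "delete" {
				return append(n[:i], n[i+1:]...), nil
			}
			n[i] = value
			return n, nil
		}
		updated, err := edit(n[i], path[1:], method, value)
		if err != nil {
			return nil, err
		}
		n[i] = updated
		return n, nil
	default:
		return nil, fmt.Errorf("cannot descend into %T at %q", node, key)
	}
}

// editSchema applies one edit to a JSON schema and returns the re-encoded
// document (object keys come back in sorted order).
func editSchema(schemaJSON, method, path string, value any) (string, error) {
	if method != "set" && method != "delete" {
		return "", fmt.Errorf("unsupported method %q", method)
	}
	if path == "" {
		return "", errors.New("empty path")
	}
	var doc any
	if err := json.Unmarshal([]byte(schemaJSON), &doc); err != nil {
		return "", err
	}
	doc, err := edit(doc, strings.Split(path, "."), method, value)
	if err != nil {
		return "", err
	}
	out, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	schema := `{"type":"record","name":"r","fields":[{"name":"a","type":"int"},{"name":"b","type":"string"}]}`
	fmt.Println(editSchema(schema, "set", "fields.0.type", "long"))
	fmt.Println(editSchema(schema, "delete", "fields.1", nil))
}
```

In this sketch "fields.0.type" selects the type of the first field and "fields.1" selects the whole second field, matching the two example paths given above.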
