arrowpb

package module
v1.9.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 3, 2025 License: MIT Imports: 25 Imported by: 0

README

arrowpb

Convert Apache Arrow records to Protocol Buffers

Go Report Card Build Status Go Reference

Examples

For examples, see the arrowpb-example repository.

Author

arrowpb is developed by @TFMV.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Documentation

Overview

arrowpb.go

Package arrowpb provides utilities for converting Apache Arrow schemas and records into Protobuf descriptors and dynamic messages. These conversions are used to serialize Arrow record batches into Protobuf messages for writing to BigQuery via the Storage Write API. The conversions support options such as using well-known timestamp types, wrapper types (e.g. google.protobuf.StringValue), and proto2/proto3 syntax. For BigQuery the generated descriptor must match the Arrow IPC serialization.

NOTE: Some BigQuery–specific value conversions (e.g. converting an Arrow value to a BigQuery Value) are expected to live in a separate package.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ArrowReaderToProtos added in v1.1.0

func ArrowReaderToProtos(ctx context.Context, reader array.RecordReader, msgDesc protoreflect.MessageDescriptor, cfg *ConvertConfig) ([][]byte, error)

ArrowReaderToProtos reads Arrow records from a RecordReader and converts them to serialized protos.

func ArrowSchemaToFileDescriptorProto added in v1.1.0

func ArrowSchemaToFileDescriptorProto(schema *arrow.Schema, packageName, messagePrefix string, cfg *ConvertConfig) (*descriptorpb.FileDescriptorProto, error)

ArrowSchemaToFileDescriptorProto converts an Arrow schema to a FileDescriptorProto. The generated proto descriptor (and top-level message) can be compiled and used to dynamically serialize Arrow record batches for BigQuery.

func CompileFileDescriptorProto added in v1.1.0

func CompileFileDescriptorProto(fdp *descriptorpb.FileDescriptorProto) (protoreflect.FileDescriptor, error)

CompileFileDescriptorProto builds a protoreflect.FileDescriptor from the given FileDescriptorProto. It also attempts to load well-known types if referenced.

func CompileFileDescriptorProtoWithRetry added in v1.1.0

func CompileFileDescriptorProtoWithRetry(fdp *descriptorpb.FileDescriptorProto) (protoreflect.FileDescriptor, error)

CompileFileDescriptorProtoWithRetry attempts to compile the file descriptor proto and retries with exponential backoff for up to 3 seconds.

func ConvertInParallel added in v1.1.0

func ConvertInParallel(ctx context.Context, rec arrow.Record, msgDesc protoreflect.MessageDescriptor, concurrency int, cfg *ConvertConfig) ([][]byte, error)

ConvertInParallel converts rows from an Arrow record in parallel into serialized protos. It uses an error group to cancel all workers upon any error.

func CreateArrowRecord added in v0.2.0

func CreateArrowRecord() (array.RecordReader, error)

CreateArrowRecord is a simple helper to create an Arrow record for testing purposes.

func ExtractArrowValue added in v1.1.0

func ExtractArrowValue(col arrow.Array, rowIndex int) interface{}

ExtractArrowValue extracts the underlying value from an Arrow array at the given index.

func FormatArrowJSON

func FormatArrowJSON(reader array.RecordReader, output io.Writer) error

FormatArrowJSON outputs an Arrow record as formatted JSON for debugging.

func GetDynamicFieldValue added in v1.9.0

func GetDynamicFieldValue(field protoreflect.FieldDescriptor, msg protoreflect.Message) interface{}

GetDynamicFieldValue extracts the underlying scalar value from a dynamic message field. If the field is a wrapper type (google.protobuf.*Value), it returns the inner value.

func GetTopLevelMessageDescriptor added in v1.1.0

func GetTopLevelMessageDescriptor(fd protoreflect.FileDescriptor) (protoreflect.MessageDescriptor, error)

GetTopLevelMessageDescriptor retrieves the first (top-level) message descriptor from a FileDescriptor.

func RecordToDynamicProtos added in v1.1.0

func RecordToDynamicProtos(rec arrow.Record, msgDesc protoreflect.MessageDescriptor, cfg *ConvertConfig) ([][]byte, error)

RecordToDynamicProtos converts an entire Arrow record into a slice of serialized proto messages.

func RowToDynamicProto added in v1.1.0

func RowToDynamicProto(rec arrow.Record, msgDesc protoreflect.MessageDescriptor, rowIndex int, cfg *ConvertConfig) (*dynamicpb.Message, error)

RowToDynamicProto converts a single row from an Arrow record to a dynamic proto message.

Types

type ConvertConfig added in v1.1.0

type ConvertConfig struct {
	// UseWellKnownTimestamps instructs the conversion to map Arrow TIMESTAMP
	// types to the well-known google.protobuf.Timestamp type.
	UseWellKnownTimestamps bool

	// UseProto2Syntax instructs the conversion to generate proto2 instead of proto3.
	UseProto2Syntax bool

	// UseWrapperTypes instructs the conversion to represent basic types as wrapper
	// messages (e.g. google.protobuf.StringValue) rather than primitive fields.
	UseWrapperTypes bool

	// MapDictionariesToEnums, when true, maps Arrow dictionary types to enum fields.
	MapDictionariesToEnums bool

	// ForceFlatSchema, when true, forces nested struct fields to be flattened.
	ForceFlatSchema bool

	// DescriptorCache is used to cache FileDescriptorProto values for a given Arrow schema.
	DescriptorCache sync.Map
}

ConvertConfig holds options that control how an Arrow schema is converted into a Protobuf descriptor.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL