cdata

package

v18.0.0-...-f9949aa Latest Latest Go to latest Published: Dec 16, 2024 License: Apache-2.0, BSD-3-Clause Imports: 24 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/joechenrh/arrow-go

Documentation ¶

Index ¶

func CreateAsyncDeviceStreamHandler(ctx context.Context, queueSize uint64, out *CArrowAsyncDeviceStreamHandler) <-chan AsyncRecordBatchStream
func ExportArrowArray(arr arrow.Array, out *CArrowArray, outSchema *CArrowSchema)
func ExportArrowRecordBatch(rb arrow.Record, out *CArrowArray, outSchema *CArrowSchema)
func ExportArrowSchema(schema *arrow.Schema, out *CArrowSchema)
func ExportAsyncRecordBatchStream(schema *arrow.Schema, stream <-chan RecordMessage, ...) error
func ExportRecordReader(reader array.RecordReader, out *CArrowArrayStream)
func ImportCArray(arr *CArrowArray, schema *CArrowSchema) (arrow.Field, arrow.Array, error)
func ImportCArrayStream(stream *CArrowArrayStream, schema *arrow.Schema) arrio.Readerdeprecated
func ImportCArrayWithType(arr *CArrowArray, dt arrow.DataType) (arrow.Array, error)
func ImportCArrowField(out *CArrowSchema) (arrow.Field, error)
func ImportCArrowSchema(out *CArrowSchema) (*arrow.Schema, error)
func ImportCRecordBatch(arr *CArrowArray, sc *CArrowSchema) (arrow.Record, error)
func ImportCRecordBatchWithSchema(arr *CArrowArray, sc *arrow.Schema) (arrow.Record, error)
func ImportCRecordReader(stream *CArrowArrayStream, schema *arrow.Schema) (arrio.Reader, error)
func ReleaseCArrowArray(arr *CArrowArray)
func ReleaseCArrowSchema(schema *CArrowSchema)
type AsyncRecordBatchStream
type AsyncStreamError
- func (e AsyncStreamError) Error() string
type CArrowArray
- func ArrayFromPtr(ptr uintptr) *CArrowArray
type CArrowArrayStream
type CArrowAsyncDeviceStreamHandler
type CArrowAsyncProducer
type CArrowAsyncTask
type CArrowDeviceArray
type CArrowSchema
- func SchemaFromPtr(ptr uintptr) *CArrowSchema
type RecordMessage

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CreateAsyncDeviceStreamHandler ¶

func CreateAsyncDeviceStreamHandler(ctx context.Context, queueSize uint64, out *CArrowAsyncDeviceStreamHandler) <-chan AsyncRecordBatchStream

CreateAsyncDeviceStreamHandler populates a given ArrowAsyncDeviceStreamHandler's callbacks and waits for the on_schema callback to be called before passing the AsyncRecordBatchStream object across the returned channel.

The provided queueSize is the number of records that will be requested at a time to be passed along the Stream in the returned AsyncRecordBatchStream. See the documentation on https://arrow.apache.org/docs/format/CDeviceDataInterface.html for more information as to the expected semantics of that size.

The populated ArrowAsyncDeviceStreamHandler can then be given to any compatible provider for async record batch streams via the C Device interface.

func ExportArrowArray ¶

func ExportArrowArray(arr arrow.Array, out *CArrowArray, outSchema *CArrowSchema)

ExportArrowArray populates the CArrowArray that is passed in with the pointers to the memory being used by the arrow.Array passed in, in order to share with zero-copy across the C Data Interface. See the documentation for ExportArrowRecordBatch for details on how to ensure you do not leak memory and prevent unwanted, undefined or strange behaviors.

WARNING: the output ArrowArray MUST BE ZERO INITIALIZED, or the Go garbage collector may error at runtime, due to CGO rules ("the current implementation may sometimes cause a runtime error if the contents of the C memory appear to be a Go pointer"). You have been warned!

func ExportArrowRecordBatch ¶

func ExportArrowRecordBatch(rb arrow.Record, out *CArrowArray, outSchema *CArrowSchema)

ExportArrowRecordBatch populates the passed in CArrowArray (and optionally the schema too) by sharing the memory used for the buffers of each column's arrays. It does not copy the data, and will internally increment the reference counters so that releasing the record will not free the memory prematurely.

When using CGO, memory passed to C is pinned so that the Go garbage collector won't move where it is allocated out from under the C pointer locations, ensuring the C pointers stay valid. This is only true until the CGO call returns, at which point the garbage collector is free to move things around again. As a result, if the function you're calling is going to hold onto the pointers or otherwise continue to reference the memory *after* the call returns, you should use the CgoArrowAllocator rather than the GoAllocator (or DefaultAllocator) so that the memory which is allocated for the record batch in the first place is allocated in C, not by the Go runtime and is therefore not subject to the Garbage collection.

The release function on the populated CArrowArray will properly decrease the reference counts, and release the memory if the record has already been released. But since this must be explicitly done, make sure it is released so that you do not create a memory leak.

WARNING: the output ArrowArray MUST BE ZERO INITIALIZED, or the Go garbage collector may error at runtime, due to CGO rules ("the current implementation may sometimes cause a runtime error if the contents of the C memory appear to be a Go pointer"). You have been warned!

func ExportArrowSchema ¶

func ExportArrowSchema(schema *arrow.Schema, out *CArrowSchema)

ExportArrowSchema populates the passed in CArrowSchema with the schema passed in so that it can be passed to some consumer of the C Data Interface. The `release` function is tied to a callback in order to properly release any memory that was allocated during the populating of the struct. Any memory allocated will be allocated using malloc which means that it is invisible to the Go Garbage Collector and must be freed manually using the callback on the CArrowSchema object.

WARNING: the output ArrowSchema MUST BE ZERO INITIALIZED, or the Go garbage collector may error at runtime, due to CGO rules ("the current implementation may sometimes cause a runtime error if the contents of the C memory appear to be a Go pointer"). You have been warned!

func ExportAsyncRecordBatchStream ¶

func ExportAsyncRecordBatchStream(schema *arrow.Schema, stream <-chan RecordMessage, handler *CArrowAsyncDeviceStreamHandler) error

ExportAsyncRecordBatchStream takes in a schema and a channel of RecordMessages along with a ArrowAsyncDeviceStreamHandler to export the records as they come across the channel and call the appropriate callbacks on the handler. This function will block until the stream is closed or a message containing an error comes across the channel.

The returned error will be nil if everything is successful, otherwise it will be the error which is encountered on the stream or an AsyncError if one of the handler callbacks returns an error.

func ExportRecordReader ¶

func ExportRecordReader(reader array.RecordReader, out *CArrowArrayStream)

ExportRecordReader populates the CArrowArrayStream that is passed in with the appropriate callbacks to be a working ArrowArrayStream utilizing the passed in RecordReader. The CArrowArrayStream takes ownership of the RecordReader until the consumer calls the release callback, as such it is unnecessary to call Release on the passed in reader unless it has previously been retained.

WARNING: the output ArrowArrayStream MUST BE ZERO INITIALIZED, or the Go garbage collector may error at runtime, due to CGO rules ("the current implementation may sometimes cause a runtime error if the contents of the C memory appear to be a Go pointer"). You have been warned!

func ImportCArray ¶

func ImportCArray(arr *CArrowArray, schema *CArrowSchema) (arrow.Field, arrow.Array, error)

ImportCArray takes a pointer to both a C Data ArrowArray and C Data ArrowSchema in order to import them into usable Go Objects. If err is not nil, then ArrowArrayRelease must still be called on arr to release the memory. The ArrowSchemaRelease will be called on the passed in schema regardless of whether there is an error or not.

The Schema will be copied with the information used to populate the returned Field, complete with metadata. The array will reference the same memory that is referred to by the ArrowArray object and take ownership of it as per ImportCArrayWithType. The returned arrow.Array will own the C memory and call ArrowArrayRelease when the array.Data object is cleaned up.

NOTE: The array takes ownership of the underlying memory buffers via ArrowArrayMove, it does not take ownership of the actual arr object itself.

func ImportCArrayStream deprecated

func ImportCArrayStream(stream *CArrowArrayStream, schema *arrow.Schema) arrio.Reader

ImportCArrayStream creates an arrio.Reader from an ArrowArrayStream taking ownership of the underlying stream object via ArrowArrayStreamMove.

The records returned by this reader must be released manually after they are returned. The reader itself will release the stream via SetFinalizer when it is garbage collected. It will return (nil, io.EOF) from the Read function when there are no more records to return.

NOTE: The reader takes ownership of the underlying memory buffers via ArrowArrayStreamMove, it does not take ownership of the actual stream object itself.

Deprecated: This will panic if importing the schema fails (which is possible). Prefer ImportCRecordReader instead.

func ImportCArrayWithType ¶

func ImportCArrayWithType(arr *CArrowArray, dt arrow.DataType) (arrow.Array, error)

ImportCArrayWithType takes a pointer to a C Data ArrowArray and interprets the values as an array with the given datatype. If err is not nil, then ArrowArrayRelease must still be called on arr to release the memory.

The underlying buffers will not be copied, but will instead be referenced directly by the resulting array interface object. The passed in ArrowArray will have it's ownership transferred to the resulting arrow.Array via ArrowArrayMove. The underlying array.Data object that is owned by the Array will now be the owner of the memory pointer and will call ArrowArrayRelease when it is released and garbage collected via runtime.SetFinalizer.

NOTE: The array takes ownership of the underlying memory buffers via ArrowArrayMove, it does not take ownership of the actual arr object itself.

func ImportCArrowField ¶

func ImportCArrowField(out *CArrowSchema) (arrow.Field, error)

ImportCArrowField takes in an ArrowSchema from the C Data interface, it will copy the metadata and type definitions rather than keep direct references to them. It is safe to call C.ArrowSchemaRelease after receiving the field from this function.

func ImportCArrowSchema ¶

func ImportCArrowSchema(out *CArrowSchema) (*arrow.Schema, error)

ImportCArrowSchema takes in the ArrowSchema from the C Data Interface, it will copy the metadata and schema definitions over from the C object rather than keep direct references to them. This function will call ArrowSchemaRelease on the passed in schema regardless of whether or not there is an error returned.

This version is intended to take in a schema for a record batch, which means that the top level of the schema should be a struct of the schema fields. If importing a single array's schema, then use ImportCArrowField instead.

func ImportCRecordBatch ¶

func ImportCRecordBatch(arr *CArrowArray, sc *CArrowSchema) (arrow.Record, error)

ImportCRecordBatch imports an ArrowArray from C as a record batch. If err is not nil, then ArrowArrayRelease must still be called to release the memory.

A record batch is represented in the C Data Interface as a Struct Array whose fields are the columns of the record batch. Thus after importing the schema passed in here, if it is not a Struct type, this will return an error. As with ImportCArray, the columns in the record batch will take ownership of the CArrowArray memory if successful. Since ArrowArrayMove is used, it's still safe to call ArrowArrayRelease on the source regardless. But if there is an error, it *MUST* be called to ensure there is no memory leak.

NOTE: The array takes ownership of the underlying memory buffers via ArrowArrayMove, it does not take ownership of the actual arr object itself.

func ImportCRecordBatchWithSchema ¶

func ImportCRecordBatchWithSchema(arr *CArrowArray, sc *arrow.Schema) (arrow.Record, error)

ImportCRecordBatchWithSchema is used for importing a Record Batch array when the schema is already known such as when receiving record batches through a stream.

All of the semantics regarding memory ownership are the same as when calling ImportCRecordBatch directly with a schema.

NOTE: The array takes ownership of the underlying memory buffers via ArrowArrayMove, it does not take ownership of the actual arr object itself.

func ImportCRecordReader ¶

func ImportCRecordReader(stream *CArrowArrayStream, schema *arrow.Schema) (arrio.Reader, error)

ImportCStreamReader creates an arrio.Reader from an ArrowArrayStream taking ownership of the underlying stream object via ArrowArrayStreamMove.

The records returned by this reader must be released manually after they are returned. The reader itself will release the stream via SetFinalizer when it is garbage collected. It will return (nil, io.EOF) from the Read function when there are no more records to return.

NOTE: The reader takes ownership of the underlying memory buffers via ArrowArrayStreamMove, it does not take ownership of the actual stream object itself.

func ReleaseCArrowArray ¶

func ReleaseCArrowArray(arr *CArrowArray)

ReleaseCArrowArray calls ArrowArrayRelease on the passed in cdata array

func ReleaseCArrowSchema ¶

func ReleaseCArrowSchema(schema *CArrowSchema)

ReleaseCArrowSchema calls ArrowSchemaRelease on the passed in cdata schema

Types ¶

type AsyncRecordBatchStream ¶

type AsyncRecordBatchStream struct {
	Schema             *arrow.Schema
	AdditionalMetadata arrow.Metadata
	Err                error
	Stream             <-chan RecordMessage
}

AsyncRecordBatchStream represents a stream of record batches being read in from an ArrowAsyncDeviceStreamHandler's callbacks. If an error was encountered before the call to on_schema, then this will contain the error as Err. Otherwise the Schema will be valid and the Stream is a channel of RecordMessages being propagated via on_next_task and extract_data.

type AsyncStreamError ¶

type AsyncStreamError struct {
	Code     int
	Msg      string
	Metadata string
}

AsyncStreamError represents an error encountered via a call to the on_error callback of an ArrowAsyncDeviceStreamHandler. The Code is the error code that should be errno compatible.

func (AsyncStreamError) Error ¶

func (e AsyncStreamError) Error() string

type CArrowArray ¶

type CArrowArray = C.struct_ArrowArray

CArrowArray is the C Data Interface object for Arrow Arrays as defined in abi.h

func ArrayFromPtr ¶

func ArrayFromPtr(ptr uintptr) *CArrowArray

ArrayFromPtr is a simple helper function to cast a uintptr to a *CArrowArray

type CArrowArrayStream ¶

type CArrowArrayStream = C.struct_ArrowArrayStream

CArrowArrayStream is the C Stream Interface object for handling streams of record batches.

type CArrowAsyncDeviceStreamHandler ¶

type CArrowAsyncDeviceStreamHandler = C.struct_ArrowAsyncDeviceStreamHandler

type CArrowAsyncProducer ¶

type CArrowAsyncProducer = C.struct_ArrowAsyncProducer

type CArrowAsyncTask ¶

type CArrowAsyncTask = C.struct_ArrowAsyncTask

type CArrowDeviceArray ¶

type CArrowDeviceArray = C.struct_ArrowDeviceArray

type CArrowSchema ¶

type CArrowSchema = C.struct_ArrowSchema

CArrowSchema is the C Data Interface for ArrowSchemas defined in abi.h

func SchemaFromPtr ¶

func SchemaFromPtr(ptr uintptr) *CArrowSchema

SchemaFromPtr is a simple helper function to cast a uintptr to a *CArrowSchema

type RecordMessage ¶

type RecordMessage struct {
	Record             arrow.Record
	AdditionalMetadata arrow.Metadata
	Err                error
}

RecordMessage is a simple container for a record batch channel to stream for using the Async C Data Interface via ExportAsyncRecordBatchStream.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL