cdata

package
v0.3.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 19, 2024 License: Apache-2.0, Apache-2.0, BSD-2-Clause, + 9 more Imports: 21 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExportArrowArray

func ExportArrowArray(arr array.Interface, out *CArrowArray, outSchema *CArrowSchema)

ExportArrowArray populates the CArrowArray that is passed in with the pointers to the memory being used by the array.Interface passed in, in order to share with zero-copy across the C Data Interface. See the documentation for ExportArrowRecordBatch for details on how to ensure you do not leak memory and prevent unwanted, undefined or strange behaviors.

func ExportArrowRecordBatch

func ExportArrowRecordBatch(rb array.Record, out *CArrowArray, outSchema *CArrowSchema)

ExportArrowRecordBatch populates the passed in CArrowArray (and optionally the schema too) by sharing the memory used for the buffers of each column's arrays. It does not copy the data, and will internally increment the reference counters so that releasing the record will not free the memory prematurely.

When using CGO, memory passed to C is pinned so that the Go garbage collector won't move where it is allocated out from under the C pointer locations, ensuring the C pointers stay valid. This is only true until the CGO call returns, at which point the garbage collector is free to move things around again. As a result, if the function you're calling is going to hold onto the pointers or otherwise continue to reference the memory *after* the call returns, you should use the CgoArrowAllocator rather than the GoAllocator (or DefaultAllocator) so that the memory which is allocated for the record batch in the first place is allocated in C, not by the Go runtime and is therefore not subject to the Garbage collection.

The release function on the populated CArrowArray will properly decrease the reference counts, and release the memory if the record has already been released. But since this must be explicitly done, make sure it is released so that you do not create a memory leak.

func ExportArrowSchema

func ExportArrowSchema(schema *arrow.Schema, out *CArrowSchema)

ExportArrowSchema populates the passed in CArrowSchema with the schema passed in so that it can be passed to some consumer of the C Data Interface. The `release` function is tied to a callback in order to properly release any memory that was allocated during the populating of the struct. Any memory allocated will be allocated using malloc which means that it is invisible to the Go Garbage Collector and must be freed manually using the callback on the CArrowSchema object.

func ImportCArray

func ImportCArray(arr *CArrowArray, schema *CArrowSchema) (arrow.Field, array.Interface, error)

ImportCArray takes a pointer to both a C Data ArrowArray and C Data ArrowSchema in order to import them into usable Go Objects. If err is not nil, then ArrowArrayRelease must still be called on arr to release the memory. The ArrowSchemaRelease will be called on the passed in schema regardless of whether there is an error or not.

The Schema will be copied with the information used to populate the returned Field, complete with metadata. The array will reference the same memory that is referred to by the ArrowArray object and take ownership of it as per ImportCArrayWithType. The returned array.Interface will own the C memory and call ArrowArrayRelease when the array.Data object is cleaned up.

NOTE: The array takes ownership of the underlying memory buffers via ArrowArrayMove, it does not take ownership of the actual arr object itself.

func ImportCArrayStream

func ImportCArrayStream(stream *CArrowArrayStream, schema *arrow.Schema) arrio.Reader

ImportCArrayStream creates an arrio.Reader from an ArrowArrayStream taking ownership of the underlying stream object via ArrowArrayStreamMove.

The records returned by this reader must be released manually after they are returned. The reader itself will release the stream via SetFinalizer when it is garbage collected. It will return (nil, io.EOF) from the Read function when there are no more records to return.

NOTE: The reader takes ownership of the underlying memory buffers via ArrowArrayStreamMove, it does not take ownership of the actual stream object itself.

func ImportCArrayWithType

func ImportCArrayWithType(arr *CArrowArray, dt arrow.DataType) (array.Interface, error)

ImportCArrayWithType takes a pointer to a C Data ArrowArray and interprets the values as an array with the given datatype. If err is not nil, then ArrowArrayRelease must still be called on arr to release the memory.

The underlying buffers will not be copied, but will instead be referenced directly by the resulting array interface object. The passed in ArrowArray will have it's ownership transferred to the resulting array.Interface via ArrowArrayMove. The underlying array.Data object that is owned by the Array will now be the owner of the memory pointer and will call ArrowArrayRelease when it is released and garbage collected via runtime.SetFinalizer.

NOTE: The array takes ownership of the underlying memory buffers via ArrowArrayMove, it does not take ownership of the actual arr object itself.

func ImportCArrowField

func ImportCArrowField(out *CArrowSchema) (arrow.Field, error)

ImportCArrowField takes in an ArrowSchema from the C Data interface, it will copy the metadata and type definitions rather than keep direct references to them. It is safe to call C.ArrowSchemaRelease after receiving the field from this function.

func ImportCArrowSchema

func ImportCArrowSchema(out *CArrowSchema) (*arrow.Schema, error)

ImportCArrowSchema takes in the ArrowSchema from the C Data Interface, it will copy the metadata and schema definitions over from the C object rather than keep direct references to them. This function will call ArrowSchemaRelease on the passed in schema regardless of whether or not there is an error returned.

This version is intended to take in a schema for a record batch, which means that the top level of the schema should be a struct of the schema fields. If importing a single array's schema, then use ImportCArrowField instead.

func ImportCRecordBatch

func ImportCRecordBatch(arr *CArrowArray, sc *CArrowSchema) (array.Record, error)

ImportCRecordBatch imports an ArrowArray from C as a record batch. If err is not nil, then ArrowArrayRelease must still be called to release the memory.

A record batch is represented in the C Data Interface as a Struct Array whose fields are the columns of the record batch. Thus after importing the schema passed in here, if it is not a Struct type, this will return an error. As with ImportCArray, the columns in the record batch will take ownership of the CArrowArray memory if successful. Since ArrowArrayMove is used, it's still safe to call ArrowArrayRelease on the source regardless. But if there is an error, it *MUST* be called to ensure there is no memory leak.

NOTE: The array takes ownership of the underlying memory buffers via ArrowArrayMove, it does not take ownership of the actual arr object itself.

func ImportCRecordBatchWithSchema

func ImportCRecordBatchWithSchema(arr *CArrowArray, sc *arrow.Schema) (array.Record, error)

ImportCRecordBatchWithSchema is used for importing a Record Batch array when the schema is already known such as when receiving record batches through a stream.

All of the semantics regarding memory ownership are the same as when calling ImportCRecordBatch directly with a schema.

NOTE: The array takes ownership of the underlying memory buffers via ArrowArrayMove, it does not take ownership of the actual arr object itself.

Types

type CArrowArray

type CArrowArray = C.struct_ArrowArray

CArrowArray is the C Data Interface object for Arrow Arrays as defined in abi.h

func ArrayFromPtr

func ArrayFromPtr(ptr uintptr) *CArrowArray

ArrayFromPtr is a simple helper function to cast a uintptr to a *CArrowArray

type CArrowArrayStream

type CArrowArrayStream = C.struct_ArrowArrayStream

CArrowArrayStream is the Experimental API for handling streams of record batches through the C Data interface.

type CArrowSchema

type CArrowSchema = C.struct_ArrowSchema

CArrowSchema is the C Data Interface for ArrowSchemas defined in abi.h

func SchemaFromPtr

func SchemaFromPtr(ptr uintptr) *CArrowSchema

SchemaFromPtr is a simple helper function to cast a uintptr to a *CArrowSchema

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL