Documentation ¶
Index ¶
- func ArrayFromSlice[T arrow.NumericType | bool](mem memory.Allocator, data []T) arrow.Array
- func ArrayFromSliceWithValid[T arrow.NumericType | bool](mem memory.Allocator, data []T, valid []bool) arrow.Array
- func FillZeroLength(dt arrow.DataType, span *ArraySpan)
- func GetAllocator(ctx context.Context) memory.Allocator
- func GetSpanOffsets[T int32 | int64](span *ArraySpan, i int) []T
- func GetSpanValues[T arrow.FixedWidthType](span *ArraySpan, i int) []T
- func HashCombine(seed, value uint64) uint64
- func Max[T constraints.Ordered](a, b T) T
- func Min[T constraints.Ordered](a, b T) T
- func PromoteExecSpanScalars(span ExecSpan)
- func RechunkArraysConsistently(groups [][]arrow.Array) [][]arrow.Array
- func WithAllocator(ctx context.Context, mem memory.Allocator) context.Context
- type ArrayIter
- type ArrayKernelExec
- type ArraySpan
- func (a *ArraySpan) Dictionary() *ArraySpan
- func (a *ArraySpan) FillFromScalar(val scalar.Scalar)
- func (a *ArraySpan) GetBuffer(idx int) *memory.Buffer
- func (a *ArraySpan) MakeArray() arrow.Array
- func (a *ArraySpan) MakeData() arrow.ArrayData
- func (a *ArraySpan) MayHaveNulls() bool
- func (a *ArraySpan) NumBuffers() int
- func (a *ArraySpan) Release()
- func (a *ArraySpan) SetDictionary(span *ArraySpan)
- func (a *ArraySpan) SetMembers(data arrow.ArrayData)
- func (a *ArraySpan) SetSlice(off, length int64)
- func (a *ArraySpan) TakeOwnership(data arrow.ArrayData)
- func (a *ArraySpan) UpdateNullCount() int64
- type BoolIter
- type BufferSpan
- type ChunkResolver
- type ChunkedExec
- type ExecResult
- type ExecSpan
- type ExecValue
- type FSBIter
- type FinalizeFunc
- type InputKind
- type InputType
- type Kernel
- type KernelCtx
- type KernelInitArgs
- type KernelInitFn
- type KernelSignature
- type KernelState
- type MemAlloc
- type NonAggKernel
- type NullHandling
- type OutputType
- type PrimitiveIter
- type ResolveKind
- type ScalarKernel
- func (s ScalarKernel) CanFillSlices() bool
- func (s *ScalarKernel) Exec(ctx *KernelCtx, sp *ExecSpan, out *ExecResult) error
- func (k ScalarKernel) GetInitFn() KernelInitFn
- func (s ScalarKernel) GetMemAlloc() MemAlloc
- func (s ScalarKernel) GetNullHandling() NullHandling
- func (k ScalarKernel) GetSig() *KernelSignature
- type TypeMatcher
- func BinaryLike() TypeMatcher
- func DurationTypeUnit(unit arrow.TimeUnit) TypeMatcher
- func FixedSizeBinaryLike() TypeMatcher
- func Integer() TypeMatcher
- func LargeBinaryLike() TypeMatcher
- func Primitive() TypeMatcher
- func RunEndEncoded(runEndsMatcher, encodedMatcher TypeMatcher) TypeMatcher
- func SameTypeID(id arrow.Type) TypeMatcher
- func Time32TypeUnit(unit arrow.TimeUnit) TypeMatcher
- func Time64TypeUnit(unit arrow.TimeUnit) TypeMatcher
- func TimestampTypeUnit(unit arrow.TimeUnit) TypeMatcher
- type TypeResolver
- type VarBinaryIter
- type VectorKernel
- func (s VectorKernel) CanFillSlices() bool
- func (s *VectorKernel) Exec(ctx *KernelCtx, sp *ExecSpan, out *ExecResult) error
- func (k VectorKernel) GetInitFn() KernelInitFn
- func (s VectorKernel) GetMemAlloc() MemAlloc
- func (s VectorKernel) GetNullHandling() NullHandling
- func (k VectorKernel) GetSig() *KernelSignature
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ArrayFromSlice ¶
func ArrayFromSliceWithValid ¶
func FillZeroLength ¶
FillZeroLength fills an ArraySpan with the appropriate information for a Zero Length Array of the provided type.
func GetAllocator ¶
GetAllocator retrieves the allocator from the context, or returns memory.DefaultAllocator if there was no allocator in the provided context.
func GetSpanOffsets ¶
GetSpanOffsets is like GetSpanValues, except it is only for int32 or int64 and adds the additional 1 expected value for an offset buffer (ie. len(output) == span.Len+1)
func GetSpanValues ¶
func GetSpanValues[T arrow.FixedWidthType](span *ArraySpan, i int) []T
GetSpanValues returns a properly typed slice by reinterpreting the buffer at index i using unsafe.Slice. This will take into account the offset of the given ArraySpan.
func HashCombine ¶
func Max ¶
func Max[T constraints.Ordered](a, b T) T
func Min ¶
func Min[T constraints.Ordered](a, b T) T
func PromoteExecSpanScalars ¶
func PromoteExecSpanScalars(span ExecSpan)
PromoteExecSpanScalars promotes the values of the passed in ExecSpan from scalars to Arrays of length 1 for each value.
Types ¶
type ArrayIter ¶
type ArrayIter[T arrayTypes] interface {
Next() T
}
func NewBoolIter ¶
func NewFSBIter ¶
func NewPrimitiveIter ¶
func NewPrimitiveIter[T arrow.FixedWidthType](arr *ArraySpan) ArrayIter[T]
type ArrayKernelExec ¶
type ArrayKernelExec = func(*KernelCtx, *ExecSpan, *ExecResult) error
ArrayKernelExec is an alias definition for a kernel's execution function.
This is used for both stateless and stateful kernels. If a kernel depends on some execution state, it can be accessed from the KernelCtx object, which also contains the context.Context object which can be used for shortcircuiting by checking context.Done / context.Err. This allows kernels to control handling timeouts or cancellation of computation.
type ArraySpan ¶
type ArraySpan struct { Type arrow.DataType Len int64 Nulls int64 Offset int64 Buffers [3]BufferSpan // Scratch is a holding spot for things such as // offsets or union type codes when converting from scalars Scratch [2]uint64 Children []ArraySpan }
ArraySpan is a light-weight, non-owning version of arrow.ArrayData for more efficient handling with computation and engines. We use explicit go Arrays to define the buffers and some scratch space for easily populating and shifting around pointers to memory without having to worry about and deal with retain/release during calculations.
func (*ArraySpan) Dictionary ¶
Dictionary returns a pointer to the array span for the dictionary which we will always place as the first (and only) child if it exists.
func (*ArraySpan) FillFromScalar ¶
FillFromScalar populates this ArraySpan as if it were a 1 length array with the single value equal to the passed in Scalar.
func (*ArraySpan) GetBuffer ¶
GetBuffer returns the buffer for the requested index. If this buffer is owned by another array/arrayspan the Owning buffer is returned, otherwise if this slice has no owning buffer, we call NewBufferBytes to wrap it as a memory.Buffer. Can also return nil if there is no buffer in this index.
func (*ArraySpan) MakeArray ¶
MakeArray is a convenience function for calling array.MakeFromData(a.MakeData())
func (*ArraySpan) MakeData ¶
MakeData generates an arrow.ArrayData object for this ArraySpan, properly updating the buffer ref count if necessary.
func (*ArraySpan) MayHaveNulls ¶
func (*ArraySpan) NumBuffers ¶
NumBuffers returns the number of expected buffers for this type
func (*ArraySpan) Release ¶
func (a *ArraySpan) Release()
if an error is encountered, call Release on a preallocated span to ensure it releases any self-allocated buffers, it will not call release on buffers it doesn't own (SelfAlloc != true)
func (*ArraySpan) SetDictionary ¶
func (*ArraySpan) SetMembers ¶
SetMembers populates this ArraySpan from the given ArrayData object. As this is a non-owning reference, the ArrayData object must not be fully released while this ArraySpan is in use, otherwise any buffers referenced will be released too
func (*ArraySpan) SetSlice ¶
SetSlice updates the offset and length of this ArraySpan to refer to a specific slice of the underlying buffers.
func (*ArraySpan) TakeOwnership ¶
TakeOwnership is like SetMembers only this takes ownership of the buffers by calling Retain on them so that the passed in ArrayData can be released without negatively affecting this ArraySpan
func (*ArraySpan) UpdateNullCount ¶
UpdateNullCount will count the bits in the null bitmap and update the number of nulls if the current null count is unknown, otherwise it just returns the value of a.Nulls
type BoolIter ¶
type BoolIter struct {
Rdr *bitutil.BitmapReader
}
type BufferSpan ¶
type BufferSpan struct { // Buf should be the byte slice representing this buffer, if this is // nil then this bufferspan should be considered empty. Buf []byte // Owner should point to an underlying parent memory.Buffer if this // memory is owned by a different, existing, buffer. Retain is not // called on this buffer, so it must not be released as long as // this BufferSpan refers to it. Owner *memory.Buffer // SelfAlloc tracks whether or not this bufferspan is the only owner // of the Owning memory.Buffer. This happens when preallocating // memory or if a kernel allocates it's own buffer for a result. // In these cases, we have to know so we can properly maintain the // refcount if this is later turned into an ArrayData object. SelfAlloc bool }
BufferSpan is a lightweight Buffer holder for ArraySpans that does not take ownership of the underlying memory.Buffer at all or could be used to reference raw byte slices instead.
func (*BufferSpan) SetBuffer ¶
func (b *BufferSpan) SetBuffer(buf *memory.Buffer)
SetBuffer sets the given buffer into this BufferSpan and marks SelfAlloc as false. This should be called when setting a buffer that is externally owned/created.
func (*BufferSpan) WrapBuffer ¶
func (b *BufferSpan) WrapBuffer(buf *memory.Buffer)
WrapBuffer wraps this bufferspan around a buffer and marks SelfAlloc as true. This should be called when setting a buffer that was allocated as part of an execution rather than just re-using an existing buffer from an input array.
type ChunkResolver ¶
type ChunkResolver struct {
// contains filtered or unexported fields
}
func NewChunkResolver ¶
func NewChunkResolver(chunks []arrow.Array) *ChunkResolver
func (*ChunkResolver) Resolve ¶
func (c *ChunkResolver) Resolve(idx int64) (chunk, index int64)
type ChunkedExec ¶
type ChunkedExec func(*KernelCtx, []*arrow.Chunked, *ExecResult) ([]*ExecResult, error)
ChunkedExec is the signature for executing a stateful vector kernel against a ChunkedArray input. It is optional
type ExecResult ¶
type ExecResult = ArraySpan
ExecResult is the result of a kernel execution and should be populated by the execution functions and/or a kernel. For now we're just going to alias an ArraySpan.
type ExecSpan ¶
ExecSpan represents a slice of inputs and is used to provide slices of input values to iterate over.
Len is the length of the span (all elements in Values should either be scalar or an array with a length + offset of at least Len).
type ExecValue ¶
ExecValue represents a single input to an execution which could be either an Array (ArraySpan) or a Scalar value
type FinalizeFunc ¶
FinalizeFunc is an optional finalizer function for any postprocessing that may need to be done on data before returning it
type InputKind ¶
type InputKind int8
InputKind is an enum representing the type of Input matching that will be done. Either accepting any type, an exact specific type or using a TypeMatcher.
type InputType ¶
type InputType struct { Kind InputKind Type arrow.DataType Matcher TypeMatcher }
InputType is used for type checking arguments passed to a kernel and stored within a KernelSignature. The type-checking rule can be supplied either with an exact DataType instance or a custom TypeMatcher.
func NewExactInput ¶
func NewIDInput ¶
func NewMatchedInput ¶
func NewMatchedInput(match TypeMatcher) InputType
type Kernel ¶
type Kernel interface { GetInitFn() KernelInitFn GetSig() *KernelSignature }
Kernel defines the minimum interface required for the basic execution kernel. It will grow as the implementation requires.
type KernelCtx ¶
type KernelCtx struct { Ctx context.Context Kernel Kernel State KernelState }
KernelCtx is a small struct holding the context for a kernel execution consisting of a pointer to the kernel, initialized state (if needed) and the context for this execution.
type KernelInitArgs ¶
type KernelInitArgs struct { Kernel Kernel Inputs []arrow.DataType // Options are opaque and specific to the Kernel being initialized, // may be nil if the kernel doesn't require options. Options any }
KernelInitArgs are the arguments required to initialize an Kernel's state using the input types and any options.
type KernelInitFn ¶
type KernelInitFn = func(*KernelCtx, KernelInitArgs) (KernelState, error)
KernelInitFn is any function that receives a KernelCtx and initialization arguments and returns the initialized state or an error.
type KernelSignature ¶
type KernelSignature struct { InputTypes []InputType OutType OutputType IsVarArgs bool // contains filtered or unexported fields }
KernelSignature holds the input and output types for a kernel.
Variable argument functions with a minimum of N arguments should pass up to N input types to be used to validate for invocation. The first N-1 types will be matched against the first N-1 arguments and the last type will be matched against the remaining arguments.
func (KernelSignature) Equals ¶
func (k KernelSignature) Equals(other KernelSignature) bool
func (*KernelSignature) Hash ¶
func (k *KernelSignature) Hash() uint64
func (KernelSignature) MatchesInputs ¶
func (k KernelSignature) MatchesInputs(types []arrow.DataType) bool
func (KernelSignature) String ¶
func (k KernelSignature) String() string
type KernelState ¶
type KernelState any
func OptionsInit ¶
func OptionsInit[T any](_ *KernelCtx, args KernelInitArgs) (KernelState, error)
OptionsInit should be used in the case where a KernelState is simply represented with a specific type by value (instead of pointer). This will initialize the KernelState as a value-copied instance of the passed in function options argument to ensure separation and allow the kernel to manipulate the options if necessary without any negative consequences since it will have its own copy of the options.
type MemAlloc ¶
type MemAlloc int8
MemAlloc is the preference for preallocating memory of fixed-width type outputs during kernel execution.
const ( // For data types that support pre-allocation (fixed-width), the // kernel expects to be provided a pre-allocated buffer to write into. // Non-fixed-width types must always allocate their own buffers. // The allocation is made for the same length as the execution batch, // so vector kernels yielding differently sized outputs should not // use this. // // It is valid for the data to not be preallocated but the validity // bitmap is (or is computed using intersection). // // For variable-size output types like Binary or String, or for nested // types, this option has no effect. MemPrealloc MemAlloc = iota // The kernel is responsible for allocating its own data buffer // for fixed-width output types. MemNoPrealloc )
type NonAggKernel ¶
type NonAggKernel interface { Kernel Exec(*KernelCtx, *ExecSpan, *ExecResult) error GetNullHandling() NullHandling GetMemAlloc() MemAlloc CanFillSlices() bool }
NonAggKernel builds on the base Kernel interface for non aggregate execution kernels. Specifically this will represent Scalar and Vector kernels.
type NullHandling ¶
type NullHandling int8
NullHandling is an enum representing how a particular Kernel wants the executor to handle nulls.
const ( // Compute the output validity bitmap by intersection the validity // bitmaps of the arguments using bitwise-and operations. This means // that values in the output are valid/non-null only if the corresponding // values in all input arguments were valid/non-null. Kernels generally // do not have to touch the bitmap afterwards, but a kernel's exec function // is permitted to alter the bitmap after the null intersection is computed // if necessary. NullIntersection NullHandling = iota // Kernel expects a pre-allocated buffer to write the result bitmap // into. NullComputedPrealloc // Kernel will allocate and set the validity bitmap of the output NullComputedNoPrealloc // kernel output is never null and a validity bitmap doesn't need to // be allocated NullNoOutput )
type OutputType ¶
type OutputType struct { Kind ResolveKind Type arrow.DataType Resolver TypeResolver }
func NewComputedOutputType ¶
func NewComputedOutputType(resolver TypeResolver) OutputType
func NewOutputType ¶
func NewOutputType(dt arrow.DataType) OutputType
func (OutputType) String ¶
func (o OutputType) String() string
type PrimitiveIter ¶
type PrimitiveIter[T arrow.FixedWidthType] struct { Values []T }
func (*PrimitiveIter[T]) Next ¶
func (p *PrimitiveIter[T]) Next() (v T)
type ResolveKind ¶
type ResolveKind int8
ResolveKind defines the way that a particular OutputType resolves its type. Either it has a fixed type to resolve to or it contains a Resolver which will compute the resolved type based on the input types.
const ( ResolveFixed ResolveKind = iota ResolveComputed )
type ScalarKernel ¶
type ScalarKernel struct { ExecFn ArrayKernelExec CanWriteIntoSlices bool NullHandling NullHandling MemAlloc MemAlloc // contains filtered or unexported fields }
A ScalarKernel is the kernel implementation for a Scalar Function. In addition to the members found in the base Kernel, it contains the null handling and memory pre-allocation preferences.
func NewScalarKernel ¶
func NewScalarKernel(in []InputType, out OutputType, exec ArrayKernelExec, init KernelInitFn) ScalarKernel
NewScalarKernel constructs a new kernel for scalar execution, constructing a KernelSignature with the provided input types and output type, and using the passed in execution implementation and initialization function.
func NewScalarKernelWithSig ¶
func NewScalarKernelWithSig(sig *KernelSignature, exec ArrayKernelExec, init KernelInitFn) ScalarKernel
NewScalarKernelWithSig is a convenience when you already have a signature to use for constructing a kernel. It's equivalent to passing the components of the signature (input and output types) to NewScalarKernel.
func (ScalarKernel) CanFillSlices ¶
func (s ScalarKernel) CanFillSlices() bool
func (*ScalarKernel) Exec ¶
func (s *ScalarKernel) Exec(ctx *KernelCtx, sp *ExecSpan, out *ExecResult) error
func (ScalarKernel) GetInitFn ¶
func (k ScalarKernel) GetInitFn() KernelInitFn
func (ScalarKernel) GetMemAlloc ¶
func (s ScalarKernel) GetMemAlloc() MemAlloc
func (ScalarKernel) GetNullHandling ¶
func (s ScalarKernel) GetNullHandling() NullHandling
func (ScalarKernel) GetSig ¶
func (k ScalarKernel) GetSig() *KernelSignature
type TypeMatcher ¶
type TypeMatcher interface { fmt.Stringer Matches(typ arrow.DataType) bool Equals(other TypeMatcher) bool }
TypeMatcher define an interface for matching Input or Output types for execution kernels. There are multiple implementations of this interface provided by this package.
func BinaryLike ¶
func BinaryLike() TypeMatcher
BinaryLike returns a TypeMatcher that will match Binary or String
func DurationTypeUnit ¶
func DurationTypeUnit(unit arrow.TimeUnit) TypeMatcher
DurationTypeUnit returns a TypeMatcher that will match only a Duration datatype with the specified TimeUnit.
func FixedSizeBinaryLike ¶
func FixedSizeBinaryLike() TypeMatcher
FixedSizeBinaryLike returns a TypeMatcher that will match FixedSizeBinary or Decimal128/256
func Integer ¶
func Integer() TypeMatcher
Integer returns a TypeMatcher which will match any integral type like int8 or uint16
func LargeBinaryLike ¶
func LargeBinaryLike() TypeMatcher
LargeBinaryLike returns a TypeMatcher which will match LargeBinary or LargeString
func Primitive ¶
func Primitive() TypeMatcher
Primitive returns a TypeMatcher that will match any type that arrow.IsPrimitive returns true for.
func RunEndEncoded ¶
func RunEndEncoded(runEndsMatcher, encodedMatcher TypeMatcher) TypeMatcher
RunEndEncoded returns a matcher which matches a RunEndEncoded type whose encoded type is matched by the passed in matcher.
func SameTypeID ¶
func SameTypeID(id arrow.Type) TypeMatcher
SameTypeID returns a type matcher which will match any DataType that uses the same arrow.Type ID as the one passed in here.
func Time32TypeUnit ¶
func Time32TypeUnit(unit arrow.TimeUnit) TypeMatcher
Time32TypeUnit returns a TypeMatcher that will match only a Time32 datatype with the specified TimeUnit.
func Time64TypeUnit ¶
func Time64TypeUnit(unit arrow.TimeUnit) TypeMatcher
Time64TypeUnit returns a TypeMatcher that will match only a Time64 datatype with the specified TimeUnit.
func TimestampTypeUnit ¶
func TimestampTypeUnit(unit arrow.TimeUnit) TypeMatcher
TimestampTypeUnit returns a TypeMatcher that will match only a Timestamp datatype with the specified TimeUnit.
type TypeResolver ¶
TypeResolver is simply a function that takes a KernelCtx and a list of input types and returns the resolved type or an error.
type VarBinaryIter ¶
func (*VarBinaryIter[OffsetT]) Next ¶
func (v *VarBinaryIter[OffsetT]) Next() []byte
type VectorKernel ¶
type VectorKernel struct { ExecFn ArrayKernelExec ExecChunked ChunkedExec Finalize FinalizeFunc NullHandling NullHandling MemAlloc MemAlloc CanWriteIntoSlices bool CanExecuteChunkWise bool OutputChunked bool // contains filtered or unexported fields }
VectorKernel is a structure for implementations of vector functions. It can optionally contain a finalizer function, the null handling and memory pre-allocation preferences (different defaults from scalar kernels when using NewVectorKernel), and other execution related options.
func NewVectorKernel ¶
func NewVectorKernel(inTypes []InputType, outType OutputType, exec ArrayKernelExec, init KernelInitFn) VectorKernel
NewVectorKernel constructs a new kernel for execution of vector functions, which take into account more than just the individual scalar values of its input. Output of a vector kernel may be a different length than its inputs.
func NewVectorKernelWithSig ¶
func NewVectorKernelWithSig(sig *KernelSignature, exec ArrayKernelExec, init KernelInitFn) VectorKernel
NewVectorKernelWithSig is a convenience function for creating a kernel when you already have a signature constructed.
func (VectorKernel) CanFillSlices ¶
func (s VectorKernel) CanFillSlices() bool
func (*VectorKernel) Exec ¶
func (s *VectorKernel) Exec(ctx *KernelCtx, sp *ExecSpan, out *ExecResult) error
func (VectorKernel) GetInitFn ¶
func (k VectorKernel) GetInitFn() KernelInitFn
func (VectorKernel) GetMemAlloc ¶
func (s VectorKernel) GetMemAlloc() MemAlloc
func (VectorKernel) GetNullHandling ¶
func (s VectorKernel) GetNullHandling() NullHandling
func (VectorKernel) GetSig ¶
func (k VectorKernel) GetSig() *KernelSignature