exec

package
v0.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 10, 2024 License: Apache-2.0, BSD-2-Clause, BSD-3-Clause, + 7 more Imports: 15 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ArrayFromSlice

func ArrayFromSlice[T arrow.NumericType | bool](mem memory.Allocator, data []T) arrow.Array

func ArrayFromSliceWithValid

func ArrayFromSliceWithValid[T arrow.NumericType | bool](mem memory.Allocator, data []T, valid []bool) arrow.Array

func FillZeroLength

func FillZeroLength(dt arrow.DataType, span *ArraySpan)

FillZeroLength fills an ArraySpan with the appropriate information for a Zero Length Array of the provided type.

func GetAllocator

func GetAllocator(ctx context.Context) memory.Allocator

GetAllocator retrieves the allocator from the context, or returns memory.DefaultAllocator if there was no allocator in the provided context.

func GetSpanOffsets

func GetSpanOffsets[T int32 | int64](span *ArraySpan, i int) []T

GetSpanOffsets is like GetSpanValues, except it is only for int32 or int64 and adds the additional 1 expected value for an offset buffer (ie. len(output) == span.Len+1)

func GetSpanValues

func GetSpanValues[T arrow.FixedWidthType](span *ArraySpan, i int) []T

GetSpanValues returns a properly typed slice by reinterpreting the buffer at index i using unsafe.Slice. This will take into account the offset of the given ArraySpan.

func HashCombine

func HashCombine(seed, value uint64) uint64

func Max

func Max[T constraints.Ordered](a, b T) T

func Min

func Min[T constraints.Ordered](a, b T) T

func PromoteExecSpanScalars

func PromoteExecSpanScalars(span ExecSpan)

PromoteExecSpanScalars promotes the values of the passed in ExecSpan from scalars to Arrays of length 1 for each value.

func RechunkArraysConsistently

func RechunkArraysConsistently(groups [][]arrow.Array) [][]arrow.Array

func WithAllocator

func WithAllocator(ctx context.Context, mem memory.Allocator) context.Context

WithAllocator returns a new context with the provided allocator embedded into the context.

Types

type ArrayIter

type ArrayIter[T arrayTypes] interface {
	Next() T
}

func NewBoolIter

func NewBoolIter(arr *ArraySpan) ArrayIter[bool]

func NewFSBIter

func NewFSBIter(arr *ArraySpan) ArrayIter[[]byte]

func NewPrimitiveIter

func NewPrimitiveIter[T arrow.FixedWidthType](arr *ArraySpan) ArrayIter[T]

func NewVarBinaryIter

func NewVarBinaryIter[OffsetT int32 | int64](arr *ArraySpan) ArrayIter[[]byte]

type ArrayKernelExec

type ArrayKernelExec = func(*KernelCtx, *ExecSpan, *ExecResult) error

ArrayKernelExec is an alias definition for a kernel's execution function.

This is used for both stateless and stateful kernels. If a kernel depends on some execution state, it can be accessed from the KernelCtx object, which also contains the context.Context object which can be used for shortcircuiting by checking context.Done / context.Err. This allows kernels to control handling timeouts or cancellation of computation.

type ArraySpan

type ArraySpan struct {
	Type    arrow.DataType
	Len     int64
	Nulls   int64
	Offset  int64
	Buffers [3]BufferSpan

	// Scratch is a holding spot for things such as
	// offsets or union type codes when converting from scalars
	Scratch [2]uint64

	Children []ArraySpan
}

ArraySpan is a light-weight, non-owning version of arrow.ArrayData for more efficient handling with computation and engines. We use explicit go Arrays to define the buffers and some scratch space for easily populating and shifting around pointers to memory without having to worry about and deal with retain/release during calculations.

func (*ArraySpan) Dictionary

func (a *ArraySpan) Dictionary() *ArraySpan

Dictionary returns a pointer to the array span for the dictionary which we will always place as the first (and only) child if it exists.

func (*ArraySpan) FillFromScalar

func (a *ArraySpan) FillFromScalar(val scalar.Scalar)

FillFromScalar populates this ArraySpan as if it were a 1 length array with the single value equal to the passed in Scalar.

func (*ArraySpan) GetBuffer

func (a *ArraySpan) GetBuffer(idx int) *memory.Buffer

GetBuffer returns the buffer for the requested index. If this buffer is owned by another array/arrayspan the Owning buffer is returned, otherwise if this slice has no owning buffer, we call NewBufferBytes to wrap it as a memory.Buffer. Can also return nil if there is no buffer in this index.

func (*ArraySpan) MakeArray

func (a *ArraySpan) MakeArray() arrow.Array

MakeArray is a convenience function for calling array.MakeFromData(a.MakeData())

func (*ArraySpan) MakeData

func (a *ArraySpan) MakeData() arrow.ArrayData

MakeData generates an arrow.ArrayData object for this ArraySpan, properly updating the buffer ref count if necessary.

func (*ArraySpan) MayHaveNulls

func (a *ArraySpan) MayHaveNulls() bool

func (*ArraySpan) NumBuffers

func (a *ArraySpan) NumBuffers() int

NumBuffers returns the number of expected buffers for this type

func (*ArraySpan) Release

func (a *ArraySpan) Release()

if an error is encountered, call Release on a preallocated span to ensure it releases any self-allocated buffers, it will not call release on buffers it doesn't own (SelfAlloc != true)

func (*ArraySpan) SetDictionary

func (a *ArraySpan) SetDictionary(span *ArraySpan)

func (*ArraySpan) SetMembers

func (a *ArraySpan) SetMembers(data arrow.ArrayData)

SetMembers populates this ArraySpan from the given ArrayData object. As this is a non-owning reference, the ArrayData object must not be fully released while this ArraySpan is in use, otherwise any buffers referenced will be released too

func (*ArraySpan) SetSlice

func (a *ArraySpan) SetSlice(off, length int64)

SetSlice updates the offset and length of this ArraySpan to refer to a specific slice of the underlying buffers.

func (*ArraySpan) TakeOwnership

func (a *ArraySpan) TakeOwnership(data arrow.ArrayData)

TakeOwnership is like SetMembers only this takes ownership of the buffers by calling Retain on them so that the passed in ArrayData can be released without negatively affecting this ArraySpan

func (*ArraySpan) UpdateNullCount

func (a *ArraySpan) UpdateNullCount() int64

UpdateNullCount will count the bits in the null bitmap and update the number of nulls if the current null count is unknown, otherwise it just returns the value of a.Nulls

type BoolIter

type BoolIter struct {
	Rdr *bitutil.BitmapReader
}

func (*BoolIter) Next

func (b *BoolIter) Next() (out bool)

type BufferSpan

type BufferSpan struct {
	// Buf should be the byte slice representing this buffer, if this is
	// nil then this bufferspan should be considered empty.
	Buf []byte
	// Owner should point to an underlying parent memory.Buffer if this
	// memory is owned by a different, existing, buffer. Retain is not
	// called on this buffer, so it must not be released as long as
	// this BufferSpan refers to it.
	Owner *memory.Buffer
	// SelfAlloc tracks whether or not this bufferspan is the only owner
	// of the Owning memory.Buffer. This happens when preallocating
	// memory or if a kernel allocates it's own buffer for a result.
	// In these cases, we have to know so we can properly maintain the
	// refcount if this is later turned into an ArrayData object.
	SelfAlloc bool
}

BufferSpan is a lightweight Buffer holder for ArraySpans that does not take ownership of the underlying memory.Buffer at all or could be used to reference raw byte slices instead.

func (*BufferSpan) SetBuffer

func (b *BufferSpan) SetBuffer(buf *memory.Buffer)

SetBuffer sets the given buffer into this BufferSpan and marks SelfAlloc as false. This should be called when setting a buffer that is externally owned/created.

func (*BufferSpan) WrapBuffer

func (b *BufferSpan) WrapBuffer(buf *memory.Buffer)

WrapBuffer wraps this bufferspan around a buffer and marks SelfAlloc as true. This should be called when setting a buffer that was allocated as part of an execution rather than just re-using an existing buffer from an input array.

type ChunkResolver

type ChunkResolver struct {
	// contains filtered or unexported fields
}

func NewChunkResolver

func NewChunkResolver(chunks []arrow.Array) *ChunkResolver

func (*ChunkResolver) Resolve

func (c *ChunkResolver) Resolve(idx int64) (chunk, index int64)

type ChunkedExec

type ChunkedExec func(*KernelCtx, []*arrow.Chunked, *ExecResult) ([]*ExecResult, error)

ChunkedExec is the signature for executing a stateful vector kernel against a ChunkedArray input. It is optional

type ExecResult

type ExecResult = ArraySpan

ExecResult is the result of a kernel execution and should be populated by the execution functions and/or a kernel. For now we're just going to alias an ArraySpan.

type ExecSpan

type ExecSpan struct {
	Len    int64
	Values []ExecValue
}

ExecSpan represents a slice of inputs and is used to provide slices of input values to iterate over.

Len is the length of the span (all elements in Values should either be scalar or an array with a length + offset of at least Len).

type ExecValue

type ExecValue struct {
	Array  ArraySpan
	Scalar scalar.Scalar
}

ExecValue represents a single input to an execution which could be either an Array (ArraySpan) or a Scalar value

func (*ExecValue) IsArray

func (e *ExecValue) IsArray() bool

func (*ExecValue) IsScalar

func (e *ExecValue) IsScalar() bool

func (*ExecValue) Type

func (e *ExecValue) Type() arrow.DataType

type FSBIter

type FSBIter struct {
	Data  []byte
	Width int
	Pos   int64
}

func (*FSBIter) Next

func (f *FSBIter) Next() []byte

type FinalizeFunc

type FinalizeFunc func(*KernelCtx, []*ArraySpan) ([]*ArraySpan, error)

FinalizeFunc is an optional finalizer function for any postprocessing that may need to be done on data before returning it

type InputKind

type InputKind int8

InputKind is an enum representing the type of Input matching that will be done. Either accepting any type, an exact specific type or using a TypeMatcher.

const (
	InputAny InputKind = iota
	InputExact
	InputUseMatcher
)

type InputType

type InputType struct {
	Kind    InputKind
	Type    arrow.DataType
	Matcher TypeMatcher
}

InputType is used for type checking arguments passed to a kernel and stored within a KernelSignature. The type-checking rule can be supplied either with an exact DataType instance or a custom TypeMatcher.

func NewExactInput

func NewExactInput(dt arrow.DataType) InputType

func NewIDInput

func NewIDInput(id arrow.Type) InputType

func NewMatchedInput

func NewMatchedInput(match TypeMatcher) InputType

func (*InputType) Equals

func (it *InputType) Equals(other *InputType) bool

func (InputType) Hash

func (it InputType) Hash() uint64

func (InputType) MatchID

func (it InputType) MatchID() arrow.Type

func (InputType) Matches

func (it InputType) Matches(dt arrow.DataType) bool

func (InputType) String

func (it InputType) String() string

type Kernel

type Kernel interface {
	GetInitFn() KernelInitFn
	GetSig() *KernelSignature
}

Kernel defines the minimum interface required for the basic execution kernel. It will grow as the implementation requires.

type KernelCtx

type KernelCtx struct {
	Ctx    context.Context
	Kernel Kernel
	State  KernelState
}

KernelCtx is a small struct holding the context for a kernel execution consisting of a pointer to the kernel, initialized state (if needed) and the context for this execution.

func (*KernelCtx) Allocate

func (k *KernelCtx) Allocate(bufsize int) *memory.Buffer

func (*KernelCtx) AllocateBitmap

func (k *KernelCtx) AllocateBitmap(nbits int64) *memory.Buffer

type KernelInitArgs

type KernelInitArgs struct {
	Kernel Kernel
	Inputs []arrow.DataType
	// Options are opaque and specific to the Kernel being initialized,
	// may be nil if the kernel doesn't require options.
	Options any
}

KernelInitArgs are the arguments required to initialize an Kernel's state using the input types and any options.

type KernelInitFn

type KernelInitFn = func(*KernelCtx, KernelInitArgs) (KernelState, error)

KernelInitFn is any function that receives a KernelCtx and initialization arguments and returns the initialized state or an error.

type KernelSignature

type KernelSignature struct {
	InputTypes []InputType
	OutType    OutputType
	IsVarArgs  bool
	// contains filtered or unexported fields
}

KernelSignature holds the input and output types for a kernel.

Variable argument functions with a minimum of N arguments should pass up to N input types to be used to validate for invocation. The first N-1 types will be matched against the first N-1 arguments and the last type will be matched against the remaining arguments.

func (KernelSignature) Equals

func (k KernelSignature) Equals(other KernelSignature) bool

func (*KernelSignature) Hash

func (k *KernelSignature) Hash() uint64

func (KernelSignature) MatchesInputs

func (k KernelSignature) MatchesInputs(types []arrow.DataType) bool

func (KernelSignature) String

func (k KernelSignature) String() string

type KernelState

type KernelState any

func OptionsInit

func OptionsInit[T any](_ *KernelCtx, args KernelInitArgs) (KernelState, error)

OptionsInit should be used in the case where a KernelState is simply represented with a specific type by value (instead of pointer). This will initialize the KernelState as a value-copied instance of the passed in function options argument to ensure separation and allow the kernel to manipulate the options if necessary without any negative consequences since it will have its own copy of the options.

type MemAlloc

type MemAlloc int8

MemAlloc is the preference for preallocating memory of fixed-width type outputs during kernel execution.

const (
	// For data types that support pre-allocation (fixed-width), the
	// kernel expects to be provided a pre-allocated buffer to write into.
	// Non-fixed-width types must always allocate their own buffers.
	// The allocation is made for the same length as the execution batch,
	// so vector kernels yielding differently sized outputs should not
	// use this.
	//
	// It is valid for the data to not be preallocated but the validity
	// bitmap is (or is computed using intersection).
	//
	// For variable-size output types like Binary or String, or for nested
	// types, this option has no effect.
	MemPrealloc MemAlloc = iota
	// The kernel is responsible for allocating its own data buffer
	// for fixed-width output types.
	MemNoPrealloc
)

type NonAggKernel

type NonAggKernel interface {
	Kernel
	Exec(*KernelCtx, *ExecSpan, *ExecResult) error
	GetNullHandling() NullHandling
	GetMemAlloc() MemAlloc
	CanFillSlices() bool
}

NonAggKernel builds on the base Kernel interface for non aggregate execution kernels. Specifically this will represent Scalar and Vector kernels.

type NullHandling

type NullHandling int8

NullHandling is an enum representing how a particular Kernel wants the executor to handle nulls.

const (
	// Compute the output validity bitmap by intersection the validity
	// bitmaps of the arguments using bitwise-and operations. This means
	// that values in the output are valid/non-null only if the corresponding
	// values in all input arguments were valid/non-null. Kernels generally
	// do not have to touch the bitmap afterwards, but a kernel's exec function
	// is permitted to alter the bitmap after the null intersection is computed
	// if necessary.
	NullIntersection NullHandling = iota
	// Kernel expects a pre-allocated buffer to write the result bitmap
	// into.
	NullComputedPrealloc
	// Kernel will allocate and set the validity bitmap of the output
	NullComputedNoPrealloc
	// kernel output is never null and a validity bitmap doesn't need to
	// be allocated
	NullNoOutput
)

type OutputType

type OutputType struct {
	Kind     ResolveKind
	Type     arrow.DataType
	Resolver TypeResolver
}

func NewComputedOutputType

func NewComputedOutputType(resolver TypeResolver) OutputType

func NewOutputType

func NewOutputType(dt arrow.DataType) OutputType

func (OutputType) Resolve

func (o OutputType) Resolve(ctx *KernelCtx, types []arrow.DataType) (arrow.DataType, error)

func (OutputType) String

func (o OutputType) String() string

type PrimitiveIter

type PrimitiveIter[T arrow.FixedWidthType] struct {
	Values []T
}

func (*PrimitiveIter[T]) Next

func (p *PrimitiveIter[T]) Next() (v T)

type ResolveKind

type ResolveKind int8

ResolveKind defines the way that a particular OutputType resolves its type. Either it has a fixed type to resolve to or it contains a Resolver which will compute the resolved type based on the input types.

const (
	ResolveFixed ResolveKind = iota
	ResolveComputed
)

type ScalarKernel

type ScalarKernel struct {
	ExecFn             ArrayKernelExec
	CanWriteIntoSlices bool
	NullHandling       NullHandling
	MemAlloc           MemAlloc
	// contains filtered or unexported fields
}

A ScalarKernel is the kernel implementation for a Scalar Function. In addition to the members found in the base Kernel, it contains the null handling and memory pre-allocation preferences.

func NewScalarKernel

func NewScalarKernel(in []InputType, out OutputType, exec ArrayKernelExec, init KernelInitFn) ScalarKernel

NewScalarKernel constructs a new kernel for scalar execution, constructing a KernelSignature with the provided input types and output type, and using the passed in execution implementation and initialization function.

func NewScalarKernelWithSig

func NewScalarKernelWithSig(sig *KernelSignature, exec ArrayKernelExec, init KernelInitFn) ScalarKernel

NewScalarKernelWithSig is a convenience when you already have a signature to use for constructing a kernel. It's equivalent to passing the components of the signature (input and output types) to NewScalarKernel.

func (ScalarKernel) CanFillSlices

func (s ScalarKernel) CanFillSlices() bool

func (*ScalarKernel) Exec

func (s *ScalarKernel) Exec(ctx *KernelCtx, sp *ExecSpan, out *ExecResult) error

func (ScalarKernel) GetInitFn

func (k ScalarKernel) GetInitFn() KernelInitFn

func (ScalarKernel) GetMemAlloc

func (s ScalarKernel) GetMemAlloc() MemAlloc

func (ScalarKernel) GetNullHandling

func (s ScalarKernel) GetNullHandling() NullHandling

func (ScalarKernel) GetSig

func (k ScalarKernel) GetSig() *KernelSignature

type TypeMatcher

type TypeMatcher interface {
	fmt.Stringer
	Matches(typ arrow.DataType) bool
	Equals(other TypeMatcher) bool
}

TypeMatcher define an interface for matching Input or Output types for execution kernels. There are multiple implementations of this interface provided by this package.

func BinaryLike

func BinaryLike() TypeMatcher

BinaryLike returns a TypeMatcher that will match Binary or String

func DurationTypeUnit

func DurationTypeUnit(unit arrow.TimeUnit) TypeMatcher

DurationTypeUnit returns a TypeMatcher that will match only a Duration datatype with the specified TimeUnit.

func FixedSizeBinaryLike

func FixedSizeBinaryLike() TypeMatcher

FixedSizeBinaryLike returns a TypeMatcher that will match FixedSizeBinary or Decimal128/256

func Integer

func Integer() TypeMatcher

Integer returns a TypeMatcher which will match any integral type like int8 or uint16

func LargeBinaryLike

func LargeBinaryLike() TypeMatcher

LargeBinaryLike returns a TypeMatcher which will match LargeBinary or LargeString

func Primitive

func Primitive() TypeMatcher

Primitive returns a TypeMatcher that will match any type that arrow.IsPrimitive returns true for.

func RunEndEncoded

func RunEndEncoded(runEndsMatcher, encodedMatcher TypeMatcher) TypeMatcher

RunEndEncoded returns a matcher which matches a RunEndEncoded type whose encoded type is matched by the passed in matcher.

func SameTypeID

func SameTypeID(id arrow.Type) TypeMatcher

SameTypeID returns a type matcher which will match any DataType that uses the same arrow.Type ID as the one passed in here.

func Time32TypeUnit

func Time32TypeUnit(unit arrow.TimeUnit) TypeMatcher

Time32TypeUnit returns a TypeMatcher that will match only a Time32 datatype with the specified TimeUnit.

func Time64TypeUnit

func Time64TypeUnit(unit arrow.TimeUnit) TypeMatcher

Time64TypeUnit returns a TypeMatcher that will match only a Time64 datatype with the specified TimeUnit.

func TimestampTypeUnit

func TimestampTypeUnit(unit arrow.TimeUnit) TypeMatcher

TimestampTypeUnit returns a TypeMatcher that will match only a Timestamp datatype with the specified TimeUnit.

type TypeResolver

type TypeResolver = func(*KernelCtx, []arrow.DataType) (arrow.DataType, error)

TypeResolver is simply a function that takes a KernelCtx and a list of input types and returns the resolved type or an error.

type VarBinaryIter

type VarBinaryIter[OffsetT int32 | int64] struct {
	Offsets []OffsetT
	Data    []byte
	Pos     int64
}

func (*VarBinaryIter[OffsetT]) Next

func (v *VarBinaryIter[OffsetT]) Next() []byte

type VectorKernel

type VectorKernel struct {
	ExecFn              ArrayKernelExec
	ExecChunked         ChunkedExec
	Finalize            FinalizeFunc
	NullHandling        NullHandling
	MemAlloc            MemAlloc
	CanWriteIntoSlices  bool
	CanExecuteChunkWise bool
	OutputChunked       bool
	// contains filtered or unexported fields
}

VectorKernel is a structure for implementations of vector functions. It can optionally contain a finalizer function, the null handling and memory pre-allocation preferences (different defaults from scalar kernels when using NewVectorKernel), and other execution related options.

func NewVectorKernel

func NewVectorKernel(inTypes []InputType, outType OutputType, exec ArrayKernelExec, init KernelInitFn) VectorKernel

NewVectorKernel constructs a new kernel for execution of vector functions, which take into account more than just the individual scalar values of its input. Output of a vector kernel may be a different length than its inputs.

func NewVectorKernelWithSig

func NewVectorKernelWithSig(sig *KernelSignature, exec ArrayKernelExec, init KernelInitFn) VectorKernel

NewVectorKernelWithSig is a convenience function for creating a kernel when you already have a signature constructed.

func (VectorKernel) CanFillSlices

func (s VectorKernel) CanFillSlices() bool

func (*VectorKernel) Exec

func (s *VectorKernel) Exec(ctx *KernelCtx, sp *ExecSpan, out *ExecResult) error

func (VectorKernel) GetInitFn

func (k VectorKernel) GetInitFn() KernelInitFn

func (VectorKernel) GetMemAlloc

func (s VectorKernel) GetMemAlloc() MemAlloc

func (VectorKernel) GetNullHandling

func (s VectorKernel) GetNullHandling() NullHandling

func (VectorKernel) GetSig

func (k VectorKernel) GetSig() *KernelSignature

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL