encoding

package
v9.0.2
Published: Sep 28, 2022 License: Apache-2.0, BSD-2-Clause, BSD-3-Clause, + 8 more Imports: 24 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	Int32EncoderTraits             int32EncoderTraits
	Int32DecoderTraits             int32DecoderTraits
	Int64EncoderTraits             int64EncoderTraits
	Int64DecoderTraits             int64DecoderTraits
	Int96EncoderTraits             int96EncoderTraits
	Int96DecoderTraits             int96DecoderTraits
	Float32EncoderTraits           float32EncoderTraits
	Float32DecoderTraits           float32DecoderTraits
	Float64EncoderTraits           float64EncoderTraits
	Float64DecoderTraits           float64DecoderTraits
	BooleanEncoderTraits           boolEncoderTraits
	BooleanDecoderTraits           boolDecoderTraits
	ByteArrayEncoderTraits         byteArrayEncoderTraits
	ByteArrayDecoderTraits         byteArrayDecoderTraits
	FixedLenByteArrayEncoderTraits fixedLenByteArrayEncoderTraits
	FixedLenByteArrayDecoderTraits fixedLenByteArrayDecoderTraits
)

Functions

func LevelEncodingMaxBufferSize

func LevelEncodingMaxBufferSize(encoding parquet.Encoding, maxLvl int16, nbuffered int) int

LevelEncodingMaxBufferSize estimates the max number of bytes needed to encode data with the specified encoding given the max level and number of buffered values provided.
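As a rough illustration of how such an estimate can be formed, the sketch below handles only the simplest case, plain bit packing, where every level takes the minimum number of bits able to represent maxLvl. The helper name bitPackedLevelBytes is hypothetical and this is not the package's implementation; the RLE case, which also has to budget for run headers, is left out.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitPackedLevelBytes is a hypothetical helper: it returns the number of
// bytes needed to bit-pack nbuffered levels whose largest value is maxLvl.
func bitPackedLevelBytes(maxLvl int16, nbuffered int) int {
	// bits.Len16 gives the minimum bit width able to represent maxLvl.
	bitWidth := bits.Len16(uint16(maxLvl))
	totalBits := nbuffered * bitWidth
	// Round up to whole bytes.
	return (totalBits + 7) / 8
}

func main() {
	// maxLvl = 3 needs 2 bits per level; 100 levels need 200 bits = 25 bytes.
	fmt.Println(bitPackedLevelBytes(3, 100)) // 25
}
```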

func NewDictConverter

func NewDictConverter(dict TypedDecoder) utils.DictionaryConverter

NewDictConverter creates a dict converter of the appropriate type, using the passed in decoder as the decoder to decode the dictionary index.

Types

type BinaryMemoTable

type BinaryMemoTable interface {
	MemoTable
	// ValuesSize returns the total number of bytes needed to copy all of the values
	// from this table.
	ValuesSize() int
	// CopyOffsets populates out with the start and end offsets of each value in the
	// table data. Out should be sized to Size()+1 to accommodate all of the offsets.
	CopyOffsets(out []int32)
	// CopyOffsetsSubset is like CopyOffsets but only gets a subset of the offsets
	// starting at the specified index.
	CopyOffsetsSubset(start int, out []int32)
	// CopyFixedWidthValues exists to cope with the fact that the table doesn't track
	// the fixed width when inserting the null value into the data buffer: the null
	// value (if found) is stored as a zero length byte slice.
	CopyFixedWidthValues(start int, width int, out []byte)
	// VisitValues calls visitFn on each value in the table starting with the index specified
	VisitValues(start int, visitFn func([]byte))
	// Retain increases the reference count of the separately stored binary data that is
	// kept alongside the table which contains all of the values in the table. This is
	// safe to call simultaneously across multiple goroutines.
	Retain()
	// Release decreases the reference count by 1 of the separately stored binary data
	// kept alongside the table containing the values. When the reference count goes to
	// 0, the memory is freed. This is safe to call across multiple goroutines simultaneously.
	Release()
}

BinaryMemoTable is an extension of the MemoTable interface adding extra methods for handling byte arrays/strings/fixed length byte arrays.
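The Size()+1 sizing mentioned for CopyOffsets follows from the usual offsets layout for variable-length data: value i occupies the bytes between offsets[i] and offsets[i+1], so n values need n+1 offsets. A minimal sketch of that layout, assuming a hypothetical buildOffsets helper that is not part of this package:

```go
package main

import "fmt"

// buildOffsets is a hypothetical helper showing why an offsets slice needs
// len(values)+1 entries: value i occupies data[offsets[i]:offsets[i+1]].
func buildOffsets(values [][]byte) (data []byte, offsets []int32) {
	offsets = make([]int32, len(values)+1)
	for i, v := range values {
		offsets[i] = int32(len(data))
		data = append(data, v...)
	}
	// The final entry marks the end of the last value.
	offsets[len(values)] = int32(len(data))
	return data, offsets
}

func main() {
	data, offsets := buildOffsets([][]byte{[]byte("ab"), []byte(""), []byte("cde")})
	fmt.Println(string(data), offsets) // abcde [0 2 2 5]
	// Recover the third value from the shared buffer.
	fmt.Println(string(data[offsets[2]:offsets[3]])) // cde
}
```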

func NewBinaryDictionary

func NewBinaryDictionary(mem memory.Allocator) BinaryMemoTable

NewBinaryDictionary returns a memotable interface for use with strings, byte slices, parquet.ByteArray and parquet.FixedLengthByteArray only.

func NewBinaryMemoTable

func NewBinaryMemoTable(mem memory.Allocator) BinaryMemoTable

type BooleanDecoder

type BooleanDecoder interface {
	TypedDecoder
	Decode([]bool) (int, error)
	DecodeSpaced([]bool, int, []byte, int64) (int, error)
}

BooleanDecoder is the interface for all encoding types that implement decoding bool values.

type BooleanEncoder

type BooleanEncoder interface {
	TypedEncoder
	Put([]bool)
	PutSpaced([]bool, []byte, int64)
}

BooleanEncoder is the interface for all encoding types that implement encoding bool values.

type Buffer

type Buffer interface {
	Len() int
	Buf() []byte
	Bytes() []byte
	Resize(int)
	Release()
}

Buffer is an interface used as a general interface for handling buffers regardless of the underlying implementation.

type BufferWriter

type BufferWriter struct {
	// contains filtered or unexported fields
}

BufferWriter is a utility class for building and writing to a memory.Buffer with a given allocator. It fulfills the io.Writer, io.WriterAt and io.Seeker interfaces, while providing the ability to pre-allocate memory.

func NewBufferWriter

func NewBufferWriter(initial int, mem memory.Allocator) *BufferWriter

NewBufferWriter constructs a buffer with initially reserved/allocated memory.

func NewBufferWriterFromBuffer

func NewBufferWriterFromBuffer(b *memory.Buffer, mem memory.Allocator) *BufferWriter

NewBufferWriterFromBuffer wraps the provided buffer to allow it to fulfill these interfaces.

func (*BufferWriter) Bytes

func (b *BufferWriter) Bytes() []byte

Bytes returns the current byte slice, which has length Len.

func (*BufferWriter) Cap

func (b *BufferWriter) Cap() int

Cap returns the current capacity of the underlying buffer

func (*BufferWriter) Finish

func (b *BufferWriter) Finish() *memory.Buffer

Finish returns the current buffer and resets this writer so that it can be reused. The caller becomes responsible for releasing the returned buffer's memory.

func (*BufferWriter) Len

func (b *BufferWriter) Len() int

Len returns the current length of the byte slice.

func (*BufferWriter) Reserve

func (b *BufferWriter) Reserve(nbytes int)

Reserve ensures that there is at least enough capacity to write nbytes without another allocation. It may allocate more than that in order to reduce the number of allocations.

func (*BufferWriter) Reset

func (b *BufferWriter) Reset(initial int)

Reset will release any current memory and initialize it with the new allocated bytes.

func (*BufferWriter) Seek

func (b *BufferWriter) Seek(offset int64, whence int) (int64, error)

Seek fulfills the io.Seeker interface, returning its new position. whence must be io.SeekStart, io.SeekCurrent or io.SeekEnd or it will be ignored.

func (*BufferWriter) SetOffset

func (b *BufferWriter) SetOffset(offset int)

func (*BufferWriter) Tell

func (b *BufferWriter) Tell() int64

func (*BufferWriter) Truncate

func (b *BufferWriter) Truncate()

func (*BufferWriter) UnsafeWrite

func (b *BufferWriter) UnsafeWrite(buf []byte) (int, error)

UnsafeWrite does not check the capacity / length before writing.

func (*BufferWriter) UnsafeWriteCopy

func (b *BufferWriter) UnsafeWriteCopy(ncopies int, pattern []byte) (int, error)

func (*BufferWriter) Write

func (b *BufferWriter) Write(buf []byte) (int, error)

func (*BufferWriter) WriteAt

func (b *BufferWriter) WriteAt(p []byte, offset int64) (n int, err error)

WriteAt writes the bytes from p into this buffer starting at offset.

Does not affect the internal position of the writer.
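The distinction between Write and WriteAt matters when a header has to be patched after the payload is written. The sketch below uses a hypothetical, simplified posWriter type (not the BufferWriter implementation) to show positional writes leaving the write position alone.

```go
package main

import (
	"errors"
	"fmt"
)

// posWriter is a hypothetical, simplified stand-in for a seekable buffer
// writer. It is not the BufferWriter implementation.
type posWriter struct {
	buf []byte
	pos int
}

// Write copies p in at the current position, growing the buffer if needed,
// and advances the position.
func (w *posWriter) Write(p []byte) (int, error) {
	if need := w.pos + len(p); need > len(w.buf) {
		w.buf = append(w.buf, make([]byte, need-len(w.buf))...)
	}
	n := copy(w.buf[w.pos:], p)
	w.pos += n
	return n, nil
}

// WriteAt overwrites bytes at offset and leaves the position untouched.
func (w *posWriter) WriteAt(p []byte, offset int64) (int, error) {
	if offset < 0 || int(offset)+len(p) > len(w.buf) {
		return 0, errors.New("posWriter: write out of range")
	}
	return copy(w.buf[offset:], p), nil
}

func main() {
	w := &posWriter{}
	w.Write([]byte("????payload")) // reserve 4 bytes for a header
	w.WriteAt([]byte("HDR1"), 0)   // patch the header afterwards
	w.Write([]byte("!"))           // continues after "payload"
	fmt.Println(string(w.buf), w.pos) // HDR1payload! 12
}
```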

type ByteArrayDecoder

type ByteArrayDecoder interface {
	TypedDecoder
	Decode([]parquet.ByteArray) (int, error)
	DecodeSpaced([]parquet.ByteArray, int, []byte, int64) (int, error)
}

ByteArrayDecoder is the interface for all encoding types that implement decoding parquet.ByteArray values.

type ByteArrayDictConverter

type ByteArrayDictConverter struct {
	// contains filtered or unexported fields
}

ByteArrayDictConverter is a helper for dictionary handling which is used for converting run length encoded indexes into the actual values that are stored in the dictionary index page.

func (*ByteArrayDictConverter) Copy

func (dc *ByteArrayDictConverter) Copy(out interface{}, vals []utils.IndexType) error

Copy populates the slice provided with the values in the dictionary at the indexes in the vals slice.

func (*ByteArrayDictConverter) Fill

func (dc *ByteArrayDictConverter) Fill(out interface{}, val utils.IndexType) error

Fill populates the slice passed in entirely with the value at dictionary index indicated by val

func (*ByteArrayDictConverter) FillZero

func (dc *ByteArrayDictConverter) FillZero(out interface{})

FillZero populates the entire slice of out with the zero value for parquet.ByteArray

func (*ByteArrayDictConverter) IsValid

func (dc *ByteArrayDictConverter) IsValid(idxes ...utils.IndexType) bool

IsValid verifies that the set of indexes passed in are all valid indexes in the dictionary and if necessary decodes dictionary indexes up to the index requested.
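The Copy, Fill and IsValid operations amount to bounds-checked lookups into the decoded dictionary. A minimal sketch of that idea over a dictionary of strings follows; the dictLookup type and its method names are hypothetical and use plain int32 indexes rather than utils.IndexType.

```go
package main

import "fmt"

// dictLookup is a hypothetical, simplified converter over an already
// decoded dictionary of strings.
type dictLookup struct {
	dict []string
}

// isValid reports whether every index refers to a dictionary entry.
func (d dictLookup) isValid(idxes ...int32) bool {
	for _, i := range idxes {
		if i < 0 || int(i) >= len(d.dict) {
			return false
		}
	}
	return true
}

// copyTo sets out[i] to the dictionary value at idxes[i].
// It assumes len(out) >= len(idxes).
func (d dictLookup) copyTo(out []string, idxes []int32) error {
	if !d.isValid(idxes...) {
		return fmt.Errorf("index out of range for dictionary of size %d", len(d.dict))
	}
	for i, idx := range idxes {
		out[i] = d.dict[idx]
	}
	return nil
}

// fill sets every element of out to the single dictionary value at idx.
func (d dictLookup) fill(out []string, idx int32) error {
	if !d.isValid(idx) {
		return fmt.Errorf("index %d out of range", idx)
	}
	for i := range out {
		out[i] = d.dict[idx]
	}
	return nil
}

func main() {
	d := dictLookup{dict: []string{"red", "green", "blue"}}
	out := make([]string, 4)
	if err := d.copyTo(out, []int32{2, 0, 0, 1}); err != nil {
		panic(err)
	}
	fmt.Println(out)          // [blue red red green]
	fmt.Println(d.isValid(3)) // false
}
```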

type ByteArrayEncoder

type ByteArrayEncoder interface {
	TypedEncoder
	Put([]parquet.ByteArray)
	PutSpaced([]parquet.ByteArray, []byte, int64)
}

ByteArrayEncoder is the interface for all encoding types that implement encoding parquet.ByteArray values.

type DecoderTraits

type DecoderTraits interface {
	Decoder(e parquet.Encoding, descr *schema.Column, useDict bool, mem memory.Allocator) TypedDecoder
	BytesRequired(int) int
}

DecoderTraits provides an interface for more easily interacting with types to generate decoders for specific types.

type DeltaBitPackInt32Decoder

type DeltaBitPackInt32Decoder struct {
	// contains filtered or unexported fields
}

DeltaBitPackInt32Decoder decodes Int32 values which are packed using the Delta BitPacking algorithm.
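For orientation, delta bit packing stores the first value, then for each block the smallest delta plus the remaining deltas relative to that minimum, packed at the smallest bit width that fits. The sketch below computes those quantities for a single block and stops short of the actual bit packing and header layout; encodeDeltaBlock and deltaBlock are hypothetical names, and the input is assumed to be non-empty.

```go
package main

import (
	"fmt"
	"math/bits"
)

// deltaBlock is a hypothetical summary of one delta-encoded block.
type deltaBlock struct {
	first    int32    // first value, stored as-is
	minDelta int32    // smallest delta in the block
	rel      []uint32 // deltas minus minDelta, always non-negative
	bitWidth int      // bits needed to pack the largest relative delta
}

// encodeDeltaBlock computes the pieces of a delta-encoded block.
// It assumes vals is non-empty.
func encodeDeltaBlock(vals []int32) deltaBlock {
	b := deltaBlock{first: vals[0]}
	if len(vals) == 1 {
		return b
	}
	deltas := make([]int32, len(vals)-1)
	b.minDelta = vals[1] - vals[0]
	for i := 1; i < len(vals); i++ {
		deltas[i-1] = vals[i] - vals[i-1]
		if deltas[i-1] < b.minDelta {
			b.minDelta = deltas[i-1]
		}
	}
	var maxRel uint32
	for _, d := range deltas {
		r := uint32(d - b.minDelta)
		b.rel = append(b.rel, r)
		if r > maxRel {
			maxRel = r
		}
	}
	b.bitWidth = bits.Len32(maxRel)
	return b
}

func main() {
	// Deltas are -2 -2 -2 1 1 1 1, so the minimum delta is -2.
	b := encodeDeltaBlock([]int32{7, 5, 3, 1, 2, 3, 4, 5})
	fmt.Println(b.first, b.minDelta, b.rel, b.bitWidth) // 7 -2 [0 0 0 3 3 3 3] 2
}
```

Subtracting the minimum delta keeps every packed value non-negative, which is what allows a narrow fixed bit width per block.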

func (DeltaBitPackInt32Decoder) Allocator

func (d DeltaBitPackInt32Decoder) Allocator() memory.Allocator

func (*DeltaBitPackInt32Decoder) Decode

func (d *DeltaBitPackInt32Decoder) Decode(out []int32) (int, error)

Decode retrieves min(remaining values, len(out)) values from the data and returns the number of values actually decoded and any errors encountered.

func (*DeltaBitPackInt32Decoder) DecodeSpaced

func (d *DeltaBitPackInt32Decoder) DecodeSpaced(out []int32, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode, but the result is spaced out appropriately based on the passed in bitmap
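"Spaced out" here means that decoded values are moved into the slots whose validity bit is set, leaving gaps for nulls. A minimal sketch of that step, assuming least-significant-bit-first bitmaps and a hypothetical spaceOut helper that expects the number of set bits to match the number of decoded values:

```go
package main

import "fmt"

// bitIsSet reports whether bit i is set, using least-significant-bit-first
// ordering within each byte.
func bitIsSet(bitmap []byte, i int64) bool {
	return bitmap[i/8]&(byte(1)<<(uint(i)%8)) != 0
}

// spaceOut is a hypothetical helper. out[:nvalid] holds densely decoded
// values; it moves them so that each lands in a slot whose validity bit is
// set and zeroes the null slots. It works backwards so that no value is
// overwritten before it has been moved.
func spaceOut(out []int32, nvalid int, validBits []byte, validBitsOffset int64) {
	src := nvalid - 1
	for dst := len(out) - 1; dst >= 0; dst-- {
		if bitIsSet(validBits, validBitsOffset+int64(dst)) {
			out[dst] = out[src]
			src--
		} else {
			out[dst] = 0
		}
	}
}

func main() {
	// Three decoded values followed by room for two nulls.
	out := []int32{10, 20, 30, 0, 0}
	// Bits 0, 2 and 4 are set: slots 0, 2 and 4 are valid.
	spaceOut(out, 3, []byte{0b00010101}, 0)
	fmt.Println(out) // [10 0 20 0 30]
}
```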

func (DeltaBitPackInt32Decoder) SetData

func (d DeltaBitPackInt32Decoder) SetData(nvalues int, data []byte) error

SetData sets the bytes and the expected number of values to decode into the decoder, updating the decoder and allowing it to be reused.

func (DeltaBitPackInt32Decoder) Type

Type returns the physical parquet type that this decoder decodes, in this case Int32

type DeltaBitPackInt32Encoder

type DeltaBitPackInt32Encoder struct {
	// contains filtered or unexported fields
}

DeltaBitPackInt32Encoder is an encoder for the delta bitpacking encoding for int32 data.

func (DeltaBitPackInt32Encoder) EstimatedDataEncodedSize

func (enc DeltaBitPackInt32Encoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the current amount of data actually flushed out and written

func (DeltaBitPackInt32Encoder) FlushValues

func (enc DeltaBitPackInt32Encoder) FlushValues() (Buffer, error)

FlushValues flushes any remaining data and returns the finished encoded buffer or returns nil and any error encountered during flushing.

func (DeltaBitPackInt32Encoder) Put

func (enc DeltaBitPackInt32Encoder) Put(in []int32)

Put writes the values from the provided slice of int32 to the encoder

func (DeltaBitPackInt32Encoder) PutSpaced

func (enc DeltaBitPackInt32Encoder) PutSpaced(in []int32, validBits []byte, validBitsOffset int64)

PutSpaced takes a slice of int32 along with a bitmap that describes the nulls and an offset into the bitmap in order to write spaced data to the encoder.

func (DeltaBitPackInt32Encoder) Type

Type returns the underlying physical type this encoder works with, in this case Int32

type DeltaBitPackInt64Decoder

type DeltaBitPackInt64Decoder struct {
	// contains filtered or unexported fields
}

DeltaBitPackInt64Decoder decodes a delta bit packed int64 column of data.

func (DeltaBitPackInt64Decoder) Allocator

func (d DeltaBitPackInt64Decoder) Allocator() memory.Allocator

func (*DeltaBitPackInt64Decoder) Decode

func (d *DeltaBitPackInt64Decoder) Decode(out []int64) (int, error)

Decode retrieves min(remaining values, len(out)) values from the data and returns the number of values actually decoded and any errors encountered.

func (DeltaBitPackInt64Decoder) DecodeSpaced

func (d DeltaBitPackInt64Decoder) DecodeSpaced(out []int64, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode, but the result is spaced out appropriately based on the passed in bitmap

func (DeltaBitPackInt64Decoder) SetData

func (d DeltaBitPackInt64Decoder) SetData(nvalues int, data []byte) error

SetData sets the bytes and the expected number of values to decode into the decoder, updating the decoder and allowing it to be reused.

func (DeltaBitPackInt64Decoder) Type

Type returns the physical parquet type that this decoder decodes, in this case Int64

type DeltaBitPackInt64Encoder

type DeltaBitPackInt64Encoder struct {
	// contains filtered or unexported fields
}

DeltaBitPackInt64Encoder is an encoder for the delta bitpacking encoding for int64 data.

func (DeltaBitPackInt64Encoder) EstimatedDataEncodedSize

func (enc DeltaBitPackInt64Encoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the current amount of data actually flushed out and written

func (DeltaBitPackInt64Encoder) FlushValues

func (enc DeltaBitPackInt64Encoder) FlushValues() (Buffer, error)

FlushValues flushes any remaining data and returns the finished encoded buffer or returns nil and any error encountered during flushing.

func (DeltaBitPackInt64Encoder) Put

func (enc DeltaBitPackInt64Encoder) Put(in []int64)

Put writes the values from the provided slice of int64 to the encoder

func (DeltaBitPackInt64Encoder) PutSpaced

func (enc DeltaBitPackInt64Encoder) PutSpaced(in []int64, validBits []byte, validBitsOffset int64)

PutSpaced takes a slice of int64 along with a bitmap that describes the nulls and an offset into the bitmap in order to write spaced data to the encoder.

func (DeltaBitPackInt64Encoder) Type

Type returns the underlying physical type this encoder works with, in this case Int64

type DeltaByteArrayDecoder

type DeltaByteArrayDecoder struct {
	*DeltaLengthByteArrayDecoder
	// contains filtered or unexported fields
}

DeltaByteArrayDecoder is a decoder for a column of data encoded using incremental or prefix encoding.

func (*DeltaByteArrayDecoder) Allocator

func (d *DeltaByteArrayDecoder) Allocator() memory.Allocator

func (*DeltaByteArrayDecoder) Decode

func (d *DeltaByteArrayDecoder) Decode(out []parquet.ByteArray) (int, error)

Decode decodes byte arrays into the slice provided and returns the number of values actually decoded

func (*DeltaByteArrayDecoder) DecodeSpaced

func (d *DeltaByteArrayDecoder) DecodeSpaced(out []parquet.ByteArray, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode, but the result is spaced out based on the bitmap provided.

func (DeltaByteArrayDecoder) Encoding

func (d DeltaByteArrayDecoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*DeltaByteArrayDecoder) SetData

func (d *DeltaByteArrayDecoder) SetData(nvalues int, data []byte) error

SetData expects the data passed in to be the prefix lengths, followed by the blocks of suffix data in order to initialize the decoder.

func (DeltaByteArrayDecoder) Type

Type returns the underlying physical type this decoder operates on, in this case ByteArrays only

func (DeltaByteArrayDecoder) ValuesLeft

func (d DeltaByteArrayDecoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type DeltaByteArrayEncoder

type DeltaByteArrayEncoder struct {
	// contains filtered or unexported fields
}

DeltaByteArrayEncoder is an encoder for writing byte arrays which are delta encoded; this is also known as incremental encoding or front compression. For each element in a sequence of strings, we store the length of the prefix shared with the previous entry plus the remaining suffix. See https://en.wikipedia.org/wiki/Incremental_encoding for a longer description.

This is stored as a sequence of delta-encoded prefix lengths followed by the suffixes encoded as delta length byte arrays.
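The prefix and suffix split can be sketched in a few lines. The helpers below (commonPrefixLen, frontCompress, frontDecompress) are hypothetical and only show the split itself; they do not delta-encode the prefix lengths or produce the on-disk layout.

```go
package main

import "fmt"

// commonPrefixLen returns the length of the longest shared prefix of a and b.
func commonPrefixLen(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// frontCompress is a hypothetical helper that splits each value into the
// length of the prefix shared with the previous value and the remaining suffix.
func frontCompress(vals [][]byte) (prefixLens []int32, suffixes [][]byte) {
	var prev []byte
	for _, v := range vals {
		p := commonPrefixLen(prev, v)
		prefixLens = append(prefixLens, int32(p))
		suffixes = append(suffixes, v[p:])
		prev = v
	}
	return prefixLens, suffixes
}

// frontDecompress reverses frontCompress by re-attaching each prefix.
func frontDecompress(prefixLens []int32, suffixes [][]byte) [][]byte {
	out := make([][]byte, len(suffixes))
	var prev []byte
	for i, s := range suffixes {
		v := append(append([]byte{}, prev[:prefixLens[i]]...), s...)
		out[i] = v
		prev = v
	}
	return out
}

func main() {
	vals := [][]byte{[]byte("axis"), []byte("axle"), []byte("babble"), []byte("babyhood")}
	prefixLens, suffixes := frontCompress(vals)
	fmt.Println(prefixLens) // [0 2 0 3]
	// The suffixes are "axis", "le", "babble" and "yhood".
	for _, s := range suffixes {
		fmt.Println(string(s))
	}
	// Round trip back to the original values.
	for _, v := range frontDecompress(prefixLens, suffixes) {
		fmt.Println(string(v))
	}
}
```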

func (*DeltaByteArrayEncoder) Allocator

func (e *DeltaByteArrayEncoder) Allocator() memory.Allocator

func (*DeltaByteArrayEncoder) Bytes

func (e *DeltaByteArrayEncoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*DeltaByteArrayEncoder) Encoding

func (e *DeltaByteArrayEncoder) Encoding() parquet.Encoding

func (*DeltaByteArrayEncoder) EstimatedDataEncodedSize

func (enc *DeltaByteArrayEncoder) EstimatedDataEncodedSize() int64

func (*DeltaByteArrayEncoder) FlushValues

func (enc *DeltaByteArrayEncoder) FlushValues() (Buffer, error)

FlushValues flushes any remaining data out and returns the finished encoded buffer, or returns nil and any error encountered during flushing.

func (*DeltaByteArrayEncoder) Put

func (enc *DeltaByteArrayEncoder) Put(in []parquet.ByteArray)

Put writes a slice of ByteArrays to the encoder

func (*DeltaByteArrayEncoder) PutSpaced

func (enc *DeltaByteArrayEncoder) PutSpaced(in []parquet.ByteArray, validBits []byte, validBitsOffset int64)

PutSpaced is like Put, but assumes the data is already spaced for nulls and uses the bitmap provided and offset to compress the data before writing it without the null slots.
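Compressing spaced input before encoding means dropping every slot whose validity bit is 0. A minimal sketch, assuming least-significant-bit-first bitmaps and a hypothetical compactValid helper operating on strings instead of parquet.ByteArray:

```go
package main

import "fmt"

// bitIsSet reports whether bit i is set (least-significant bit first).
func bitIsSet(bitmap []byte, i int64) bool {
	return bitmap[i/8]&(byte(1)<<(uint(i)%8)) != 0
}

// compactValid is a hypothetical helper that keeps only the entries of in
// whose validity bit is set, so null slots never reach the encoder.
func compactValid(in []string, validBits []byte, validBitsOffset int64) []string {
	out := make([]string, 0, len(in))
	for i, v := range in {
		if bitIsSet(validBits, validBitsOffset+int64(i)) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	in := []string{"a", "", "b", "", "c"}
	// Bits 1, 3 and 5 are set; with an offset of 1 they describe slots 0, 2 and 4.
	validBits := []byte{0b00101010}
	fmt.Println(compactValid(in, validBits, 1)) // [a b c]
}
```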

func (*DeltaByteArrayEncoder) ReserveForWrite

func (e *DeltaByteArrayEncoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*DeltaByteArrayEncoder) Reset

func (e *DeltaByteArrayEncoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (DeltaByteArrayEncoder) Type

Type returns the underlying physical type this operates on, in this case ByteArrays only

type DeltaLengthByteArrayDecoder

type DeltaLengthByteArrayDecoder struct {
	// contains filtered or unexported fields
}

DeltaLengthByteArrayDecoder is a decoder for handling data produced by the corresponding encoder which expects delta packed lengths followed by the bytes of data.

func (*DeltaLengthByteArrayDecoder) Allocator

func (*DeltaLengthByteArrayDecoder) Decode

Decode populates the passed in slice with data decoded until it hits the length of out or runs out of values in the column to decode, then returns the number of values actually decoded.

func (*DeltaLengthByteArrayDecoder) DecodeSpaced

func (d *DeltaLengthByteArrayDecoder) DecodeSpaced(out []parquet.ByteArray, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode, but for spaced data using the provided bitmap to determine where the nulls should be inserted.

func (*DeltaLengthByteArrayDecoder) Encoding

func (d *DeltaLengthByteArrayDecoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*DeltaLengthByteArrayDecoder) SetData

func (d *DeltaLengthByteArrayDecoder) SetData(nvalues int, data []byte) error

SetData sets the expected data into the decoder, which should be nvalues delta packed lengths followed immediately by the rest of the byte array data.

func (DeltaLengthByteArrayDecoder) Type

Type returns the underlying type which is handled by this decoder, ByteArrays only.

func (*DeltaLengthByteArrayDecoder) ValuesLeft

func (d *DeltaLengthByteArrayDecoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type DeltaLengthByteArrayEncoder

type DeltaLengthByteArrayEncoder struct {
	// contains filtered or unexported fields
}

DeltaLengthByteArrayEncoder encodes data by taking all of the byte array lengths and encoding them up front using delta encoding, followed by all of the binary data concatenated back to back. The expected savings come from the cheaper encoding of the lengths and possibly better compression of the data, which is no longer interleaved with the lengths.

This encoding is always preferred over PLAIN for byte array columns where possible.

For example, if the data was "Hello", "World", "Foobar", "ABCDEF" the encoded data would be: DeltaEncoding(5, 5, 6, 6) "HelloWorldFoobarABCDEF"
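The same example can be reproduced with a short sketch. The helpers splitLengths and joinLengths are hypothetical; they separate lengths from data and put them back together, and leave the lengths as plain integers rather than delta-encoding them.

```go
package main

import "fmt"

// splitLengths is a hypothetical helper that separates the value lengths
// from the value bytes, which are concatenated back to back.
func splitLengths(vals []string) (lengths []int32, data []byte) {
	for _, v := range vals {
		lengths = append(lengths, int32(len(v)))
		data = append(data, v...)
	}
	return lengths, data
}

// joinLengths reverses splitLengths by slicing data according to lengths.
func joinLengths(lengths []int32, data []byte) []string {
	out := make([]string, 0, len(lengths))
	pos := int32(0)
	for _, n := range lengths {
		out = append(out, string(data[pos:pos+n]))
		pos += n
	}
	return out
}

func main() {
	lengths, data := splitLengths([]string{"Hello", "World", "Foobar", "ABCDEF"})
	fmt.Println(lengths, string(data))      // [5 5 6 6] HelloWorldFoobarABCDEF
	fmt.Println(joinLengths(lengths, data)) // [Hello World Foobar ABCDEF]
}
```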

func (*DeltaLengthByteArrayEncoder) Allocator

func (e *DeltaLengthByteArrayEncoder) Allocator() memory.Allocator

func (*DeltaLengthByteArrayEncoder) Bytes

func (e *DeltaLengthByteArrayEncoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*DeltaLengthByteArrayEncoder) Encoding

func (e *DeltaLengthByteArrayEncoder) Encoding() parquet.Encoding

func (*DeltaLengthByteArrayEncoder) EstimatedDataEncodedSize

func (e *DeltaLengthByteArrayEncoder) EstimatedDataEncodedSize() int64

func (*DeltaLengthByteArrayEncoder) FlushValues

func (enc *DeltaLengthByteArrayEncoder) FlushValues() (Buffer, error)

FlushValues flushes any remaining data and returns the final encoded buffer of data or returns nil and any error encountered.

func (*DeltaLengthByteArrayEncoder) Put

Put writes the provided slice of byte arrays to the encoder

func (*DeltaLengthByteArrayEncoder) PutSpaced

func (enc *DeltaLengthByteArrayEncoder) PutSpaced(in []parquet.ByteArray, validBits []byte, validBitsOffset int64)

PutSpaced is like Put, but the data is spaced out according to the bitmap provided and is compressed accordingly before it is written to drop the null data from the write.

func (*DeltaLengthByteArrayEncoder) ReserveForWrite

func (e *DeltaLengthByteArrayEncoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*DeltaLengthByteArrayEncoder) Reset

func (e *DeltaLengthByteArrayEncoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (DeltaLengthByteArrayEncoder) Type

Type returns the underlying type which is handled by this encoder, ByteArrays only.

type DictByteArrayDecoder

type DictByteArrayDecoder struct {
	// contains filtered or unexported fields
}

DictByteArrayDecoder is a decoder for decoding dictionary encoded data for parquet.ByteArray columns

func (*DictByteArrayDecoder) Decode

func (d *DictByteArrayDecoder) Decode(out []parquet.ByteArray) (int, error)

Decode populates the passed in slice with min(len(out), remaining values) values, decoding using the dictionary to get the actual values. Returns the number of values actually decoded and any error encountered.

func (*DictByteArrayDecoder) DecodeSpaced

func (d *DictByteArrayDecoder) DecodeSpaced(out []parquet.ByteArray, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode but will space out the data leaving slots for null values based on the provided bitmap.

func (*DictByteArrayDecoder) SetData

func (d *DictByteArrayDecoder) SetData(nvals int, data []byte) error

SetData sets the index value data into the decoder.

func (*DictByteArrayDecoder) SetDict

func (d *DictByteArrayDecoder) SetDict(dict TypedDecoder)

SetDict sets a decoder that can be used to decode the dictionary that is used for this column in order to return the proper values.

func (DictByteArrayDecoder) Type

Type returns the underlying physical type that can be decoded with this decoder

type DictByteArrayEncoder

type DictByteArrayEncoder struct {
	// contains filtered or unexported fields
}

DictByteArrayEncoder is an encoder for parquet.ByteArray data using dictionary encoding
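Dictionary encoding replaces each value with the index of its first occurrence in a table of distinct values. The sketch below shows that bookkeeping with a hypothetical dictBuilder type built on a Go map; it is not the memo table used by this package and it skips the encoding of the indexes themselves.

```go
package main

import "fmt"

// dictBuilder is a hypothetical, simplified dictionary encoder for strings.
type dictBuilder struct {
	index   map[string]int32 // value -> position in dict
	dict    []string         // distinct values in first-seen order
	indices []int32          // one dictionary index per value put
}

func newDictBuilder() *dictBuilder {
	return &dictBuilder{index: make(map[string]int32)}
}

// put records v, adding it to the dictionary only if it is a new value.
func (b *dictBuilder) put(v string) {
	idx, ok := b.index[v]
	if !ok {
		idx = int32(len(b.dict))
		b.index[v] = idx
		b.dict = append(b.dict, v)
	}
	b.indices = append(b.indices, idx)
}

func main() {
	b := newDictBuilder()
	for _, v := range []string{"ny", "sf", "ny", "la", "sf"} {
		b.put(v)
	}
	fmt.Println(b.dict)    // [ny sf la]
	fmt.Println(b.indices) // [0 1 0 2 1]
}
```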

func (*DictByteArrayEncoder) BitWidth

func (d *DictByteArrayEncoder) BitWidth() int

BitWidth returns the max bitwidth that would be necessary for encoding the index values currently in the dictionary based on the size of the dictionary index.
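The relationship between dictionary size and index bit width can be sketched as follows: the largest index is one less than the number of entries, and the width is the number of bits needed for that index. The helper indexBitWidth is hypothetical, and its handling of dictionaries with 0 or 1 entries is an assumption rather than something stated on this page.

```go
package main

import (
	"fmt"
	"math/bits"
)

// indexBitWidth is a hypothetical helper returning the number of bits needed
// to represent every index of a dictionary with numEntries entries.
func indexBitWidth(numEntries int) int {
	switch numEntries {
	case 0:
		return 0 // assumption: an empty dictionary needs no bits
	case 1:
		return 1 // assumption: a single entry still uses one bit
	default:
		// The largest index is numEntries-1.
		return bits.Len32(uint32(numEntries - 1))
	}
}

func main() {
	for _, n := range []int{2, 3, 4, 5, 256, 257} {
		fmt.Println(n, indexBitWidth(n))
	}
}
```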

func (*DictByteArrayEncoder) DictEncodedSize

func (d *DictByteArrayEncoder) DictEncodedSize() int

DictEncodedSize returns the current size of the encoded dictionary

func (*DictByteArrayEncoder) EstimatedDataEncodedSize

func (d *DictByteArrayEncoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the maximum number of bytes needed to store the RLE encoded indexes, not including the dictionary index in the computation.

func (*DictByteArrayEncoder) FlushValues

func (d *DictByteArrayEncoder) FlushValues() (Buffer, error)

FlushValues dumps all the currently buffered indexes that would become the data page to a buffer and returns it or returns nil and any error encountered.

func (*DictByteArrayEncoder) NumEntries

func (d *DictByteArrayEncoder) NumEntries() int

NumEntries returns the number of entries in the dictionary index for this encoder.

func (*DictByteArrayEncoder) Put

func (enc *DictByteArrayEncoder) Put(in []parquet.ByteArray)

Put takes a slice of ByteArrays to add and encode.

func (*DictByteArrayEncoder) PutByteArray

func (enc *DictByteArrayEncoder) PutByteArray(in parquet.ByteArray)

PutByteArray adds a single byte array to buffer, updating the dictionary and encoded size if it's a new value

func (*DictByteArrayEncoder) PutSpaced

func (enc *DictByteArrayEncoder) PutSpaced(in []parquet.ByteArray, validBits []byte, validBitsOffset int64)

PutSpaced, like the non-dict encoder, leaves out the values where the validBits bitmap is 0.

func (*DictByteArrayEncoder) Reset

func (d *DictByteArrayEncoder) Reset()

Reset drops all the currently encoded values from the index and indexes from the data to allow restarting the encoding process.

func (*DictByteArrayEncoder) Type

func (enc *DictByteArrayEncoder) Type() parquet.Type

Type returns the underlying physical type that can be encoded with this encoder

func (*DictByteArrayEncoder) WriteDict

func (enc *DictByteArrayEncoder) WriteDict(out []byte)

WriteDict writes the dictionary out to the provided slice, out should be at least DictEncodedSize() bytes

func (*DictByteArrayEncoder) WriteIndices

func (d *DictByteArrayEncoder) WriteIndices(out []byte) (int, error)

WriteIndices performs run length encoding on the indexes and then writes the encoded index value data to the provided byte slice, returning the number of bytes actually written. If any error is encountered, it will return -1 and the error.
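Run length encoding pays off for dictionary indexes because repeated values collapse into a single (value, count) pair. The sketch below shows only that grouping step with a hypothetical groupRuns helper. It is a simplification: it does not produce Parquet's actual RLE byte format, which also bit-packs values.

```go
package main

import "fmt"

// run is a hypothetical (value, count) pair describing consecutive repeats.
type run struct {
	value int32
	count int
}

// groupRuns collapses consecutive equal indexes into runs.
func groupRuns(indices []int32) []run {
	var runs []run
	for _, idx := range indices {
		if n := len(runs); n > 0 && runs[n-1].value == idx {
			runs[n-1].count++
			continue
		}
		runs = append(runs, run{value: idx, count: 1})
	}
	return runs
}

func main() {
	fmt.Println(groupRuns([]int32{0, 0, 0, 1, 1, 2, 0, 0})) // [{0 3} {1 2} {2 1} {0 2}]
}
```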

type DictDecoder

type DictDecoder interface {
	TypedDecoder
	// SetDict takes in a decoder which can decode the dictionary index to be used
	SetDict(TypedDecoder)
}

DictDecoder is a special TypedDecoder which implements dictionary decoding

func NewDictDecoder

func NewDictDecoder(t parquet.Type, descr *schema.Column, mem memory.Allocator) DictDecoder

NewDictDecoder is like NewDecoder but for dictionary encodings, panics if type is bool.

If mem is nil, memory.DefaultAllocator will be used.

type DictEncoder

type DictEncoder interface {
	TypedEncoder
	// WriteIndices populates the byte slice with the final indexes of data and returns
	// the number of bytes written
	WriteIndices(out []byte) (int, error)
	// DictEncodedSize returns the current size of the encoded dictionary index.
	DictEncodedSize() int
	// BitWidth returns the bitwidth needed to encode all of the index values based
	// on the number of values in the dictionary index.
	BitWidth() int
	// WriteDict populates out with the dictionary index values, out should be sized to at least
	// as many bytes as DictEncodedSize
	WriteDict(out []byte)
	// NumEntries returns the number of values currently in the dictionary index.
	NumEntries() int
}

DictEncoder is a special kind of TypedEncoder which implements Dictionary encoding.

type DictFixedLenByteArrayDecoder

type DictFixedLenByteArrayDecoder struct {
	// contains filtered or unexported fields
}

DictFixedLenByteArrayDecoder is a decoder for decoding dictionary encoded data for parquet.FixedLenByteArray columns

func (*DictFixedLenByteArrayDecoder) Decode

Decode populates the passed in slice with min(len(out), remaining values) values, decoding using the dictionary to get the actual values. Returns the number of values actually decoded and any error encountered.

func (*DictFixedLenByteArrayDecoder) DecodeSpaced

func (d *DictFixedLenByteArrayDecoder) DecodeSpaced(out []parquet.FixedLenByteArray, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode but will space out the data leaving slots for null values based on the provided bitmap.

func (*DictFixedLenByteArrayDecoder) SetData

func (d *DictFixedLenByteArrayDecoder) SetData(nvals int, data []byte) error

SetData sets the index value data into the decoder.

func (*DictFixedLenByteArrayDecoder) SetDict

func (d *DictFixedLenByteArrayDecoder) SetDict(dict TypedDecoder)

SetDict sets a decoder that can be used to decode the dictionary that is used for this column in order to return the proper values.

func (DictFixedLenByteArrayDecoder) Type

Type returns the underlying physical type that can be decoded with this decoder

type DictFixedLenByteArrayEncoder

type DictFixedLenByteArrayEncoder struct {
	// contains filtered or unexported fields
}

DictFixedLenByteArrayEncoder is an encoder for parquet.FixedLenByteArray data using dictionary encoding

func (*DictFixedLenByteArrayEncoder) BitWidth

func (d *DictFixedLenByteArrayEncoder) BitWidth() int

BitWidth returns the max bitwidth that would be necessary for encoding the index values currently in the dictionary based on the size of the dictionary index.

func (*DictFixedLenByteArrayEncoder) DictEncodedSize

func (d *DictFixedLenByteArrayEncoder) DictEncodedSize() int

DictEncodedSize returns the current size of the encoded dictionary

func (*DictFixedLenByteArrayEncoder) EstimatedDataEncodedSize

func (d *DictFixedLenByteArrayEncoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the maximum number of bytes needed to store the RLE encoded indexes, not including the dictionary index in the computation.

func (*DictFixedLenByteArrayEncoder) FlushValues

func (d *DictFixedLenByteArrayEncoder) FlushValues() (Buffer, error)

FlushValues dumps all the currently buffered indexes that would become the data page to a buffer and returns it or returns nil and any error encountered.

func (*DictFixedLenByteArrayEncoder) NumEntries

func (d *DictFixedLenByteArrayEncoder) NumEntries() int

NumEntries returns the number of entries in the dictionary index for this encoder.

func (*DictFixedLenByteArrayEncoder) Put

Put writes fixed length values to a dictionary encoded column

func (*DictFixedLenByteArrayEncoder) PutSpaced

func (enc *DictFixedLenByteArrayEncoder) PutSpaced(in []parquet.FixedLenByteArray, validBits []byte, validBitsOffset int64)

PutSpaced is like Put but leaves space for nulls

func (*DictFixedLenByteArrayEncoder) Reset

func (d *DictFixedLenByteArrayEncoder) Reset()

Reset drops all the currently encoded values from the index and indexes from the data to allow restarting the encoding process.

func (*DictFixedLenByteArrayEncoder) Type

Type returns the underlying physical type that can be encoded with this encoder

func (*DictFixedLenByteArrayEncoder) WriteDict

func (enc *DictFixedLenByteArrayEncoder) WriteDict(out []byte)

WriteDict overrides the embedded WriteDict function to call a specialized function for copying out the Fixed length values from the dictionary more efficiently.

func (*DictFixedLenByteArrayEncoder) WriteIndices

func (d *DictFixedLenByteArrayEncoder) WriteIndices(out []byte) (int, error)

WriteIndices performs run length encoding on the indexes and then writes the encoded index value data to the provided byte slice, returning the number of bytes actually written. If any error is encountered, it will return -1 and the error.

type DictFloat32Decoder

type DictFloat32Decoder struct {
	// contains filtered or unexported fields
}

DictFloat32Decoder is a decoder for decoding dictionary encoded data for float32 columns

func (*DictFloat32Decoder) Decode

func (d *DictFloat32Decoder) Decode(out []float32) (int, error)

Decode populates the passed in slice with min(len(out), remaining values) values, decoding using the dictionary to get the actual values. Returns the number of values actually decoded and any error encountered.

func (*DictFloat32Decoder) DecodeSpaced

func (d *DictFloat32Decoder) DecodeSpaced(out []float32, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode but spaces out the data, leaving slots for null values based on the provided bitmap.

func (*DictFloat32Decoder) SetData

func (d *DictFloat32Decoder) SetData(nvals int, data []byte) error

SetData sets the index value data into the decoder.

func (*DictFloat32Decoder) SetDict

func (d *DictFloat32Decoder) SetDict(dict TypedDecoder)

SetDict sets a decoder that can be used to decode the dictionary that is used for this column in order to return the proper values.

func (DictFloat32Decoder) Type

Type returns the underlying physical type that can be decoded with this decoder

type DictFloat32Encoder

type DictFloat32Encoder struct {
	// contains filtered or unexported fields
}

DictFloat32Encoder is an encoder for float32 data using dictionary encoding

func (*DictFloat32Encoder) BitWidth

func (d *DictFloat32Encoder) BitWidth() int

BitWidth returns the max bitwidth that would be necessary for encoding the index values currently in the dictionary based on the size of the dictionary index.
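The relationship between dictionary size and index bit width can be sketched as follows. This is a standalone illustration, not the package's code: the helper name bitWidth and its handling of empty and single-entry dictionaries are assumptions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitWidth returns the number of bits needed to store any index into a
// dictionary holding numEntries values. Hypothetical helper: the choice of
// 0 bits for an empty dictionary and 1 bit for a single entry is an assumption.
func bitWidth(numEntries int) int {
	switch {
	case numEntries <= 0:
		return 0
	case numEntries == 1:
		return 1
	default:
		// The largest index is numEntries-1; bits.Len32 gives its bit length.
		return bits.Len32(uint32(numEntries - 1))
	}
}

func main() {
	for _, n := range []int{0, 1, 2, 3, 256, 257} {
		fmt.Printf("entries=%d width=%d\n", n, bitWidth(n))
	}
}
```

The width grows by one bit each time the entry count passes a power of two, which is why the value is only known once the dictionary contents are known.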

func (*DictFloat32Encoder) DictEncodedSize

func (d *DictFloat32Encoder) DictEncodedSize() int

DictEncodedSize returns the current size of the encoded dictionary

func (*DictFloat32Encoder) EstimatedDataEncodedSize

func (d *DictFloat32Encoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the maximum number of bytes needed to store the RLE encoded indexes, not including the dictionary index in the computation.

func (*DictFloat32Encoder) FlushValues

func (d *DictFloat32Encoder) FlushValues() (Buffer, error)

FlushValues dumps all the currently buffered indexes that would become the data page to a buffer and returns it or returns nil and any error encountered.

func (*DictFloat32Encoder) NumEntries

func (d *DictFloat32Encoder) NumEntries() int

NumEntries returns the number of entries in the dictionary index for this encoder.

func (*DictFloat32Encoder) Put

func (enc *DictFloat32Encoder) Put(in []float32)

Put encodes the values passed in, adding to the index as needed.

func (*DictFloat32Encoder) PutSpaced

func (enc *DictFloat32Encoder) PutSpaced(in []float32, validBits []byte, validBitsOffset int64)

PutSpaced is the same as Put but for when the data being encoded has slots open for null values, using the bitmap provided to skip values as needed.
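A minimal sketch of how a validity bitmap can be used to skip null slots before encoding. It is not the package's implementation: the helpers bitIsSet and compactValid are hypothetical, and least-significant-bit-first ordering within each byte is an assumption.

```go
package main

import "fmt"

// bitIsSet reports whether bit i of bitmap is set.
// Assumption: bits are numbered least-significant-bit first within each byte.
func bitIsSet(bitmap []byte, i int64) bool {
	return bitmap[i/8]&(byte(1)<<uint(i%8)) != 0
}

// compactValid returns only the entries of in whose validity bit is set.
// Bit validBitsOffset+i describes slot i of in.
func compactValid(in []float32, validBits []byte, validBitsOffset int64) []float32 {
	out := make([]float32, 0, len(in))
	for i, v := range in {
		if bitIsSet(validBits, validBitsOffset+int64(i)) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	in := []float32{1.5, 0, 2.5, 0, 3.5}
	valid := []byte{0x15} // binary 00010101: slots 0, 2 and 4 are valid
	fmt.Println(compactValid(in, valid, 0))
}
```

The compacted slice is what a plain Put would then receive; the null slots never reach the dictionary.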

func (*DictFloat32Encoder) Reset

func (d *DictFloat32Encoder) Reset()

Reset drops all the currently encoded values from the index and indexes from the data to allow restarting the encoding process.

func (*DictFloat32Encoder) Type

func (enc *DictFloat32Encoder) Type() parquet.Type

Type returns the underlying physical type that can be encoded with this encoder

func (*DictFloat32Encoder) WriteDict

func (enc *DictFloat32Encoder) WriteDict(out []byte)

WriteDict populates the byte slice with the dictionary index

func (*DictFloat32Encoder) WriteIndices

func (d *DictFloat32Encoder) WriteIndices(out []byte) (int, error)

WriteIndices performs run-length encoding on the indexes and then writes the encoded index data to the provided byte slice, returning the number of bytes actually written. If an error is encountered, it returns -1 and the error.

type DictFloat64Decoder

type DictFloat64Decoder struct {
	// contains filtered or unexported fields
}

DictFloat64Decoder is a decoder for decoding dictionary encoded data for float64 columns

func (*DictFloat64Decoder) Decode

func (d *DictFloat64Decoder) Decode(out []float64) (int, error)

Decode populates the passed-in slice with min(len(out), remaining values) values, using the dictionary to look up the actual values. It returns the number of values actually decoded and any error encountered.
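The min(len(out), remaining values) behavior can be illustrated with a simplified stand-in. The dictDecoder type below is hypothetical: it holds already-unpacked indexes in a slice, whereas the real decoder reads them from encoded page data.

```go
package main

import "fmt"

// dictDecoder is a hypothetical, simplified dictionary decoder.
type dictDecoder struct {
	dict    []float64 // decoded dictionary values
	indices []int     // remaining indexes, assumed already unpacked
}

// Decode fills out with up to len(out) values and reports how many it wrote.
func (d *dictDecoder) Decode(out []float64) (int, error) {
	n := len(out)
	if len(d.indices) < n {
		n = len(d.indices)
	}
	for i := 0; i < n; i++ {
		idx := d.indices[i]
		if idx < 0 || idx >= len(d.dict) {
			d.indices = d.indices[i:]
			return i, fmt.Errorf("dictionary index %d out of range", idx)
		}
		out[i] = d.dict[idx]
	}
	d.indices = d.indices[n:]
	return n, nil
}

func main() {
	d := &dictDecoder{
		dict:    []float64{0.5, 1.5, 2.5},
		indices: []int{2, 0, 0, 1, 2},
	}
	out := make([]float64, 3)
	n, err := d.Decode(out)
	fmt.Println(n, err, out[:n])
	n, err = d.Decode(out) // only two indexes remain
	fmt.Println(n, err, out[:n])
}
```

Callers should therefore use the returned count, not len(out), to know how much of the slice holds fresh values.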

func (*DictFloat64Decoder) DecodeSpaced

func (d *DictFloat64Decoder) DecodeSpaced(out []float64, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode but spaces out the data, leaving slots for null values based on the provided bitmap.

func (*DictFloat64Decoder) SetData

func (d *DictFloat64Decoder) SetData(nvals int, data []byte) error

SetData sets the index value data into the decoder.

func (*DictFloat64Decoder) SetDict

func (d *DictFloat64Decoder) SetDict(dict TypedDecoder)

SetDict sets a decoder that can be used to decode the dictionary that is used for this column in order to return the proper values.

func (DictFloat64Decoder) Type

Type returns the underlying physical type that can be decoded with this decoder

type DictFloat64Encoder

type DictFloat64Encoder struct {
	// contains filtered or unexported fields
}

DictFloat64Encoder is an encoder for float64 data using dictionary encoding

func (*DictFloat64Encoder) BitWidth

func (d *DictFloat64Encoder) BitWidth() int

BitWidth returns the max bitwidth that would be necessary for encoding the index values currently in the dictionary based on the size of the dictionary index.

func (*DictFloat64Encoder) DictEncodedSize

func (d *DictFloat64Encoder) DictEncodedSize() int

DictEncodedSize returns the current size of the encoded dictionary

func (*DictFloat64Encoder) EstimatedDataEncodedSize

func (d *DictFloat64Encoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the maximum number of bytes needed to store the RLE encoded indexes, not including the dictionary index in the computation.

func (*DictFloat64Encoder) FlushValues

func (d *DictFloat64Encoder) FlushValues() (Buffer, error)

FlushValues dumps all the currently buffered indexes that would become the data page to a buffer and returns it or returns nil and any error encountered.

func (*DictFloat64Encoder) NumEntries

func (d *DictFloat64Encoder) NumEntries() int

NumEntries returns the number of entries in the dictionary index for this encoder.

func (*DictFloat64Encoder) Put

func (enc *DictFloat64Encoder) Put(in []float64)

Put encodes the values passed in, adding to the index as needed.

func (*DictFloat64Encoder) PutSpaced

func (enc *DictFloat64Encoder) PutSpaced(in []float64, validBits []byte, validBitsOffset int64)

PutSpaced is the same as Put but for when the data being encoded has slots open for null values, using the bitmap provided to skip values as needed.

func (*DictFloat64Encoder) Reset

func (d *DictFloat64Encoder) Reset()

Reset drops all the currently encoded values from the index and indexes from the data to allow restarting the encoding process.

func (*DictFloat64Encoder) Type

func (enc *DictFloat64Encoder) Type() parquet.Type

Type returns the underlying physical type that can be encoded with this encoder

func (*DictFloat64Encoder) WriteDict

func (enc *DictFloat64Encoder) WriteDict(out []byte)

WriteDict populates the byte slice with the dictionary index

func (*DictFloat64Encoder) WriteIndices

func (d *DictFloat64Encoder) WriteIndices(out []byte) (int, error)

WriteIndices performs run-length encoding on the indexes and then writes the encoded index data to the provided byte slice, returning the number of bytes actually written. If an error is encountered, it returns -1 and the error.

type DictInt32Decoder

type DictInt32Decoder struct {
	// contains filtered or unexported fields
}

DictInt32Decoder is a decoder for decoding dictionary encoded data for int32 columns

func (*DictInt32Decoder) Decode

func (d *DictInt32Decoder) Decode(out []int32) (int, error)

Decode populates the passed-in slice with min(len(out), remaining values) values, using the dictionary to look up the actual values. It returns the number of values actually decoded and any error encountered.

func (*DictInt32Decoder) DecodeSpaced

func (d *DictInt32Decoder) DecodeSpaced(out []int32, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode but spaces out the data, leaving slots for null values based on the provided bitmap.
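A rough sketch of what spacing out means: densely decoded values are placed into the output at positions whose validity bit is set, and null positions get the zero value. The helpers bitIsSet and spaceOut are hypothetical, least-significant-bit-first bitmap ordering is an assumption, and the real decoder may expand values in place rather than use a second slice.

```go
package main

import "fmt"

// bitIsSet reports whether bit i of bitmap is set.
// Assumption: bits are numbered least-significant-bit first within each byte.
func bitIsSet(bitmap []byte, i int64) bool {
	return bitmap[i/8]&(byte(1)<<uint(i%8)) != 0
}

// spaceOut copies dense values into out at the valid positions and zeroes
// the null positions. It returns the number of non-null values placed.
func spaceOut(dense []int32, validBits []byte, validBitsOffset int64, out []int32) int {
	next := 0
	for i := range out {
		if next < len(dense) && bitIsSet(validBits, validBitsOffset+int64(i)) {
			out[i] = dense[next]
			next++
		} else {
			out[i] = 0
		}
	}
	return next
}

func main() {
	dense := []int32{10, 20, 30}
	valid := []byte{0x15} // binary 00010101: slots 0, 2 and 4 are valid
	out := make([]int32, 5)
	placed := spaceOut(dense, valid, 0, out)
	fmt.Println(placed, out)
}
```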

func (*DictInt32Decoder) SetData

func (d *DictInt32Decoder) SetData(nvals int, data []byte) error

SetData sets the index value data into the decoder.

func (*DictInt32Decoder) SetDict

func (d *DictInt32Decoder) SetDict(dict TypedDecoder)

SetDict sets a decoder that can be used to decode the dictionary that is used for this column in order to return the proper values.

func (DictInt32Decoder) Type

func (DictInt32Decoder) Type() parquet.Type

Type returns the underlying physical type that can be decoded with this decoder

type DictInt32Encoder

type DictInt32Encoder struct {
	// contains filtered or unexported fields
}

DictInt32Encoder is an encoder for int32 data using dictionary encoding

func (*DictInt32Encoder) BitWidth

func (d *DictInt32Encoder) BitWidth() int

BitWidth returns the max bitwidth that would be necessary for encoding the index values currently in the dictionary based on the size of the dictionary index.

func (*DictInt32Encoder) DictEncodedSize

func (d *DictInt32Encoder) DictEncodedSize() int

DictEncodedSize returns the current size of the encoded dictionary

func (*DictInt32Encoder) EstimatedDataEncodedSize

func (d *DictInt32Encoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the maximum number of bytes needed to store the RLE encoded indexes, not including the dictionary index in the computation.

func (*DictInt32Encoder) FlushValues

func (d *DictInt32Encoder) FlushValues() (Buffer, error)

FlushValues dumps all the currently buffered indexes that would become the data page to a buffer and returns it or returns nil and any error encountered.

func (*DictInt32Encoder) NumEntries

func (d *DictInt32Encoder) NumEntries() int

NumEntries returns the number of entries in the dictionary index for this encoder.

func (*DictInt32Encoder) Put

func (enc *DictInt32Encoder) Put(in []int32)

Put encodes the values passed in, adding to the index as needed.
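The idea behind "adding to the index as needed" can be sketched with a map and two slices. The dictEncoder type here is a hypothetical stand-in; the real encoder uses a memo table and an allocator, which this sketch does not model.

```go
package main

import "fmt"

// dictEncoder is a hypothetical, simplified dictionary encoder for int32.
type dictEncoder struct {
	lookup  map[int32]int // value -> position in dict
	dict    []int32       // unique values in first-seen order
	indices []int         // one dictionary position per input value
}

func newDictEncoder() *dictEncoder {
	return &dictEncoder{lookup: make(map[int32]int)}
}

// Put records each value's dictionary position, inserting unseen values.
func (e *dictEncoder) Put(in []int32) {
	for _, v := range in {
		idx, ok := e.lookup[v]
		if !ok {
			idx = len(e.dict)
			e.lookup[v] = idx
			e.dict = append(e.dict, v)
		}
		e.indices = append(e.indices, idx)
	}
}

// NumEntries reports how many unique values the dictionary holds.
func (e *dictEncoder) NumEntries() int { return len(e.dict) }

func main() {
	enc := newDictEncoder()
	enc.Put([]int32{7, 7, 9, 7, 3, 9})
	fmt.Println(enc.dict, enc.indices, enc.NumEntries())
}
```

Only the first occurrence of a value grows the dictionary; repeats add just an index, which is where the space saving comes from.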

func (*DictInt32Encoder) PutSpaced

func (enc *DictInt32Encoder) PutSpaced(in []int32, validBits []byte, validBitsOffset int64)

PutSpaced is the same as Put but for when the data being encoded has slots open for null values, using the bitmap provided to skip values as needed.

func (*DictInt32Encoder) Reset

func (d *DictInt32Encoder) Reset()

Reset drops all the currently encoded values from the index and indexes from the data to allow restarting the encoding process.

func (*DictInt32Encoder) Type

func (enc *DictInt32Encoder) Type() parquet.Type

Type returns the underlying physical type that can be encoded with this encoder

func (*DictInt32Encoder) WriteDict

func (enc *DictInt32Encoder) WriteDict(out []byte)

WriteDict populates the byte slice with the dictionary index

func (*DictInt32Encoder) WriteIndices

func (d *DictInt32Encoder) WriteIndices(out []byte) (int, error)

WriteIndices performs run-length encoding on the indexes and then writes the encoded index data to the provided byte slice, returning the number of bytes actually written. If an error is encountered, it returns -1 and the error.

type DictInt64Decoder

type DictInt64Decoder struct {
	// contains filtered or unexported fields
}

DictInt64Decoder is a decoder for decoding dictionary encoded data for int64 columns

func (*DictInt64Decoder) Decode

func (d *DictInt64Decoder) Decode(out []int64) (int, error)

Decode populates the passed-in slice with min(len(out), remaining values) values, using the dictionary to look up the actual values. It returns the number of values actually decoded and any error encountered.

func (*DictInt64Decoder) DecodeSpaced

func (d *DictInt64Decoder) DecodeSpaced(out []int64, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode but spaces out the data, leaving slots for null values based on the provided bitmap.

func (*DictInt64Decoder) SetData

func (d *DictInt64Decoder) SetData(nvals int, data []byte) error

SetData sets the index value data into the decoder.

func (*DictInt64Decoder) SetDict

func (d *DictInt64Decoder) SetDict(dict TypedDecoder)

SetDict sets a decoder that can be used to decode the dictionary that is used for this column in order to return the proper values.

func (DictInt64Decoder) Type

func (DictInt64Decoder) Type() parquet.Type

Type returns the underlying physical type that can be decoded with this decoder

type DictInt64Encoder

type DictInt64Encoder struct {
	// contains filtered or unexported fields
}

DictInt64Encoder is an encoder for int64 data using dictionary encoding

func (*DictInt64Encoder) BitWidth

func (d *DictInt64Encoder) BitWidth() int

BitWidth returns the max bitwidth that would be necessary for encoding the index values currently in the dictionary based on the size of the dictionary index.

func (*DictInt64Encoder) DictEncodedSize

func (d *DictInt64Encoder) DictEncodedSize() int

DictEncodedSize returns the current size of the encoded dictionary

func (*DictInt64Encoder) EstimatedDataEncodedSize

func (d *DictInt64Encoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the maximum number of bytes needed to store the RLE encoded indexes, not including the dictionary index in the computation.

func (*DictInt64Encoder) FlushValues

func (d *DictInt64Encoder) FlushValues() (Buffer, error)

FlushValues dumps all the currently buffered indexes that would become the data page to a buffer and returns it or returns nil and any error encountered.

func (*DictInt64Encoder) NumEntries

func (d *DictInt64Encoder) NumEntries() int

NumEntries returns the number of entries in the dictionary index for this encoder.

func (*DictInt64Encoder) Put

func (enc *DictInt64Encoder) Put(in []int64)

Put encodes the values passed in, adding to the index as needed.

func (*DictInt64Encoder) PutSpaced

func (enc *DictInt64Encoder) PutSpaced(in []int64, validBits []byte, validBitsOffset int64)

PutSpaced is the same as Put but for when the data being encoded has slots open for null values, using the bitmap provided to skip values as needed.

func (*DictInt64Encoder) Reset

func (d *DictInt64Encoder) Reset()

Reset drops all the currently encoded values from the index and indexes from the data to allow restarting the encoding process.

func (*DictInt64Encoder) Type

func (enc *DictInt64Encoder) Type() parquet.Type

Type returns the underlying physical type that can be encoded with this encoder

func (*DictInt64Encoder) WriteDict

func (enc *DictInt64Encoder) WriteDict(out []byte)

WriteDict populates the byte slice with the dictionary index

func (*DictInt64Encoder) WriteIndices

func (d *DictInt64Encoder) WriteIndices(out []byte) (int, error)

WriteIndices performs run-length encoding on the indexes and then writes the encoded index data to the provided byte slice, returning the number of bytes actually written. If an error is encountered, it returns -1 and the error.
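To illustrate the calling convention (bytes written on success, -1 plus an error on failure), here is a deliberately simplified run-length writer. It is an assumption-laden sketch: writeRuns is a hypothetical function that emits (run length, value) pairs as unsigned varints, which is not the RLE/bit-packed hybrid layout that Parquet actually uses.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// writeRuns collapses consecutive equal indexes into (count, value) varint
// pairs and writes them to out. Simplified layout, not Parquet's format.
func writeRuns(indices []uint32, out []byte) (int, error) {
	var tmp [binary.MaxVarintLen64]byte
	written := 0
	put := func(v uint64) error {
		n := binary.PutUvarint(tmp[:], v)
		if written+n > len(out) {
			return errors.New("output buffer too small")
		}
		copy(out[written:], tmp[:n])
		written += n
		return nil
	}
	for i := 0; i < len(indices); {
		j := i
		for j < len(indices) && indices[j] == indices[i] {
			j++
		}
		if err := put(uint64(j - i)); err != nil {
			return -1, err
		}
		if err := put(uint64(indices[i])); err != nil {
			return -1, err
		}
		i = j
	}
	return written, nil
}

func main() {
	out := make([]byte, 16)
	n, err := writeRuns([]uint32{0, 0, 0, 1, 2, 2}, out)
	fmt.Println(n, err, out[:n])
}
```

Sizing the output slice up front matters here, which is the role an estimate such as EstimatedDataEncodedSize plays for the real encoder.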

type DictInt96Decoder

type DictInt96Decoder struct {
	// contains filtered or unexported fields
}

DictInt96Decoder is a decoder for decoding dictionary encoded data for parquet.Int96 columns

func (*DictInt96Decoder) Decode

func (d *DictInt96Decoder) Decode(out []parquet.Int96) (int, error)

Decode populates the passed-in slice with min(len(out), remaining values) values, using the dictionary to look up the actual values. It returns the number of values actually decoded and any error encountered.

func (*DictInt96Decoder) DecodeSpaced

func (d *DictInt96Decoder) DecodeSpaced(out []parquet.Int96, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode but spaces out the data, leaving slots for null values based on the provided bitmap.

func (*DictInt96Decoder) SetData

func (d *DictInt96Decoder) SetData(nvals int, data []byte) error

SetData sets the index value data into the decoder.

func (*DictInt96Decoder) SetDict

func (d *DictInt96Decoder) SetDict(dict TypedDecoder)

SetDict sets a decoder that can be used to decode the dictionary that is used for this column in order to return the proper values.

func (DictInt96Decoder) Type

func (DictInt96Decoder) Type() parquet.Type

Type returns the underlying physical type that can be decoded with this decoder

type DictInt96Encoder

type DictInt96Encoder struct {
	// contains filtered or unexported fields
}

DictInt96Encoder is an encoder for parquet.Int96 data using dictionary encoding

func (*DictInt96Encoder) BitWidth

func (d *DictInt96Encoder) BitWidth() int

BitWidth returns the max bitwidth that would be necessary for encoding the index values currently in the dictionary based on the size of the dictionary index.

func (*DictInt96Encoder) DictEncodedSize

func (d *DictInt96Encoder) DictEncodedSize() int

DictEncodedSize returns the current size of the encoded dictionary

func (*DictInt96Encoder) EstimatedDataEncodedSize

func (d *DictInt96Encoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the maximum number of bytes needed to store the RLE encoded indexes, not including the dictionary index in the computation.

func (*DictInt96Encoder) FlushValues

func (d *DictInt96Encoder) FlushValues() (Buffer, error)

FlushValues dumps all the currently buffered indexes that would become the data page to a buffer and returns it or returns nil and any error encountered.

func (*DictInt96Encoder) NumEntries

func (d *DictInt96Encoder) NumEntries() int

NumEntries returns the number of entries in the dictionary index for this encoder.

func (*DictInt96Encoder) Put

func (enc *DictInt96Encoder) Put(in []parquet.Int96)

Put encodes the values passed in, adding to the index as needed

func (*DictInt96Encoder) PutSpaced

func (enc *DictInt96Encoder) PutSpaced(in []parquet.Int96, validBits []byte, validBitsOffset int64)

PutSpaced is like Put but assumes space for nulls

func (*DictInt96Encoder) Reset

func (d *DictInt96Encoder) Reset()

Reset drops all the currently encoded values from the index and indexes from the data to allow restarting the encoding process.

func (*DictInt96Encoder) Type

func (enc *DictInt96Encoder) Type() parquet.Type

Type returns the underlying physical type that can be encoded with this encoder

func (*DictInt96Encoder) WriteDict

func (enc *DictInt96Encoder) WriteDict(out []byte)

WriteDict populates the byte slice with the dictionary index

func (*DictInt96Encoder) WriteIndices

func (d *DictInt96Encoder) WriteIndices(out []byte) (int, error)

WriteIndices performs run-length encoding on the indexes and then writes the encoded index data to the provided byte slice, returning the number of bytes actually written. If an error is encountered, it returns -1 and the error.

type EncoderTraits

type EncoderTraits interface {
	Encoder(format.Encoding, bool, *schema.Column, memory.Allocator) TypedEncoder
}

EncoderTraits is an interface for the different types to make it more convenient to construct encoders for specific types.

type FixedLenByteArrayDecoder

type FixedLenByteArrayDecoder interface {
	TypedDecoder
	Decode([]parquet.FixedLenByteArray) (int, error)
	DecodeSpaced([]parquet.FixedLenByteArray, int, []byte, int64) (int, error)
}

FixedLenByteArrayDecoder is the interface for all encoding types that implement decoding parquet.FixedLenByteArray values.

type FixedLenByteArrayDictConverter

type FixedLenByteArrayDictConverter struct {
	// contains filtered or unexported fields
}

FixedLenByteArrayDictConverter is a helper for dictionary handling which is used for converting run length encoded indexes into the actual values that are stored in the dictionary index page.

func (*FixedLenByteArrayDictConverter) Copy

func (dc *FixedLenByteArrayDictConverter) Copy(out interface{}, vals []utils.IndexType) error

Copy populates the slice provided with the values in the dictionary at the indexes in the vals slice.

func (*FixedLenByteArrayDictConverter) Fill

func (dc *FixedLenByteArrayDictConverter) Fill(out interface{}, val utils.IndexType) error

Fill populates the slice passed in entirely with the value at dictionary index indicated by val

func (*FixedLenByteArrayDictConverter) FillZero

func (dc *FixedLenByteArrayDictConverter) FillZero(out interface{})

FillZero populates the entire slice of out with the zero value for parquet.FixedLenByteArray

func (*FixedLenByteArrayDictConverter) IsValid

func (dc *FixedLenByteArrayDictConverter) IsValid(idxes ...utils.IndexType) bool

IsValid verifies that the set of indexes passed in are all valid indexes in the dictionary and if necessary decodes dictionary indexes up to the index requested.

type FixedLenByteArrayEncoder

type FixedLenByteArrayEncoder interface {
	TypedEncoder
	Put([]parquet.FixedLenByteArray)
	PutSpaced([]parquet.FixedLenByteArray, []byte, int64)
}

FixedLenByteArrayEncoder is the interface for all encoding types that implement encoding parquet.FixedLenByteArray values.

type Float32Decoder

type Float32Decoder interface {
	TypedDecoder
	Decode([]float32) (int, error)
	DecodeSpaced([]float32, int, []byte, int64) (int, error)
}

Float32Decoder is the interface for all encoding types that implement decoding float32 values.

type Float32DictConverter

type Float32DictConverter struct {
	// contains filtered or unexported fields
}

Float32DictConverter is a helper for dictionary handling which is used for converting run length encoded indexes into the actual values that are stored in the dictionary index page.

func (*Float32DictConverter) Copy

func (dc *Float32DictConverter) Copy(out interface{}, vals []utils.IndexType) error

Copy populates the slice provided with the values in the dictionary at the indexes in the vals slice.

func (*Float32DictConverter) Fill

func (dc *Float32DictConverter) Fill(out interface{}, val utils.IndexType) error

Fill populates the slice passed in entirely with the value at dictionary index indicated by val

func (*Float32DictConverter) FillZero

func (dc *Float32DictConverter) FillZero(out interface{})

FillZero populates the entire slice of out with the zero value for float32

func (*Float32DictConverter) IsValid

func (dc *Float32DictConverter) IsValid(idxes ...utils.IndexType) bool

IsValid verifies that the set of indexes passed in are all valid indexes in the dictionary and if necessary decodes dictionary indexes up to the index requested.
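The converter methods can be pictured with a small typed stand-in. The float32Converter type below is hypothetical: it uses concrete []float32 and int parameters instead of interface{} and utils.IndexType, and it assumes the whole dictionary is already decoded.

```go
package main

import "fmt"

// float32Converter is a hypothetical, simplified dictionary converter.
type float32Converter struct {
	dict []float32 // fully decoded dictionary values (an assumption)
}

// IsValid reports whether every index refers to an existing dictionary entry.
func (c *float32Converter) IsValid(idxes ...int) bool {
	for _, idx := range idxes {
		if idx < 0 || idx >= len(c.dict) {
			return false
		}
	}
	return true
}

// Copy writes dict[vals[i]] into out[i] for each index in vals.
func (c *float32Converter) Copy(out []float32, vals []int) error {
	if len(out) < len(vals) {
		return fmt.Errorf("out has %d slots, need %d", len(out), len(vals))
	}
	if !c.IsValid(vals...) {
		return fmt.Errorf("index out of range for dictionary of size %d", len(c.dict))
	}
	for i, idx := range vals {
		out[i] = c.dict[idx]
	}
	return nil
}

// Fill sets every element of out to the dictionary value at index val.
func (c *float32Converter) Fill(out []float32, val int) error {
	if !c.IsValid(val) {
		return fmt.Errorf("index %d out of range", val)
	}
	for i := range out {
		out[i] = c.dict[val]
	}
	return nil
}

func main() {
	c := &float32Converter{dict: []float32{0.25, 0.5, 0.75}}
	out := make([]float32, 4)
	fmt.Println(c.Copy(out, []int{2, 0, 1, 2}), out)
	fmt.Println(c.Fill(out, 1), out)
}
```

Fill corresponds to a run of one repeated index, while Copy handles a batch of distinct indexes.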

type Float32Encoder

type Float32Encoder interface {
	TypedEncoder
	Put([]float32)
	PutSpaced([]float32, []byte, int64)
}

Float32Encoder is the interface for all encoding types that implement encoding float32 values.

type Float64Decoder

type Float64Decoder interface {
	TypedDecoder
	Decode([]float64) (int, error)
	DecodeSpaced([]float64, int, []byte, int64) (int, error)
}

Float64Decoder is the interface for all encoding types that implement decoding float64 values.

type Float64DictConverter

type Float64DictConverter struct {
	// contains filtered or unexported fields
}

Float64DictConverter is a helper for dictionary handling which is used for converting run length encoded indexes into the actual values that are stored in the dictionary index page.

func (*Float64DictConverter) Copy

func (dc *Float64DictConverter) Copy(out interface{}, vals []utils.IndexType) error

Copy populates the slice provided with the values in the dictionary at the indexes in the vals slice.

func (*Float64DictConverter) Fill

func (dc *Float64DictConverter) Fill(out interface{}, val utils.IndexType) error

Fill populates the slice passed in entirely with the value at dictionary index indicated by val

func (*Float64DictConverter) FillZero

func (dc *Float64DictConverter) FillZero(out interface{})

FillZero populates the entire slice of out with the zero value for float64

func (*Float64DictConverter) IsValid

func (dc *Float64DictConverter) IsValid(idxes ...utils.IndexType) bool

IsValid verifies that the set of indexes passed in are all valid indexes in the dictionary and if necessary decodes dictionary indexes up to the index requested.

type Float64Encoder

type Float64Encoder interface {
	TypedEncoder
	Put([]float64)
	PutSpaced([]float64, []byte, int64)
}

Float64Encoder is the interface for all encoding types that implement encoding float64 values.

type Int32Decoder

type Int32Decoder interface {
	TypedDecoder
	Decode([]int32) (int, error)
	DecodeSpaced([]int32, int, []byte, int64) (int, error)
}

Int32Decoder is the interface for all encoding types that implement decoding int32 values.

type Int32DictConverter

type Int32DictConverter struct {
	// contains filtered or unexported fields
}

Int32DictConverter is a helper for dictionary handling which is used for converting run length encoded indexes into the actual values that are stored in the dictionary index page.

func (*Int32DictConverter) Copy

func (dc *Int32DictConverter) Copy(out interface{}, vals []utils.IndexType) error

Copy populates the slice provided with the values in the dictionary at the indexes in the vals slice.

func (*Int32DictConverter) Fill

func (dc *Int32DictConverter) Fill(out interface{}, val utils.IndexType) error

Fill populates the slice passed in entirely with the value at dictionary index indicated by val

func (*Int32DictConverter) FillZero

func (dc *Int32DictConverter) FillZero(out interface{})

FillZero populates the entire slice of out with the zero value for int32

func (*Int32DictConverter) IsValid

func (dc *Int32DictConverter) IsValid(idxes ...utils.IndexType) bool

IsValid verifies that the set of indexes passed in are all valid indexes in the dictionary and if necessary decodes dictionary indexes up to the index requested.

type Int32Encoder

type Int32Encoder interface {
	TypedEncoder
	Put([]int32)
	PutSpaced([]int32, []byte, int64)
}

Int32Encoder is the interface for all encoding types that implement encoding int32 values.

type Int64Decoder

type Int64Decoder interface {
	TypedDecoder
	Decode([]int64) (int, error)
	DecodeSpaced([]int64, int, []byte, int64) (int, error)
}

Int64Decoder is the interface for all encoding types that implement decoding int64 values.

type Int64DictConverter

type Int64DictConverter struct {
	// contains filtered or unexported fields
}

Int64DictConverter is a helper for dictionary handling which is used for converting run length encoded indexes into the actual values that are stored in the dictionary index page.

func (*Int64DictConverter) Copy

func (dc *Int64DictConverter) Copy(out interface{}, vals []utils.IndexType) error

Copy populates the slice provided with the values in the dictionary at the indexes in the vals slice.

func (*Int64DictConverter) Fill

func (dc *Int64DictConverter) Fill(out interface{}, val utils.IndexType) error

Fill populates the slice passed in entirely with the value at dictionary index indicated by val

func (*Int64DictConverter) FillZero

func (dc *Int64DictConverter) FillZero(out interface{})

FillZero populates the entire slice of out with the zero value for int64

func (*Int64DictConverter) IsValid

func (dc *Int64DictConverter) IsValid(idxes ...utils.IndexType) bool

IsValid verifies that the set of indexes passed in are all valid indexes in the dictionary and if necessary decodes dictionary indexes up to the index requested.

type Int64Encoder

type Int64Encoder interface {
	TypedEncoder
	Put([]int64)
	PutSpaced([]int64, []byte, int64)
}

Int64Encoder is the interface for all encoding types that implement encoding int64 values.

type Int96Decoder

type Int96Decoder interface {
	TypedDecoder
	Decode([]parquet.Int96) (int, error)
	DecodeSpaced([]parquet.Int96, int, []byte, int64) (int, error)
}

Int96Decoder is the interface for all encoding types that implement decoding parquet.Int96 values.

type Int96DictConverter

type Int96DictConverter struct {
	// contains filtered or unexported fields
}

Int96DictConverter is a helper for dictionary handling which is used for converting run length encoded indexes into the actual values that are stored in the dictionary index page.

func (*Int96DictConverter) Copy

func (dc *Int96DictConverter) Copy(out interface{}, vals []utils.IndexType) error

Copy populates the slice provided with the values in the dictionary at the indexes in the vals slice.

func (*Int96DictConverter) Fill

func (dc *Int96DictConverter) Fill(out interface{}, val utils.IndexType) error

Fill populates the slice passed in entirely with the value at dictionary index indicated by val

func (*Int96DictConverter) FillZero

func (dc *Int96DictConverter) FillZero(out interface{})

FillZero populates the entire slice of out with the zero value for parquet.Int96

func (*Int96DictConverter) IsValid

func (dc *Int96DictConverter) IsValid(idxes ...utils.IndexType) bool

IsValid verifies that the set of indexes passed in are all valid indexes in the dictionary and if necessary decodes dictionary indexes up to the index requested.

type Int96Encoder

type Int96Encoder interface {
	TypedEncoder
	Put([]parquet.Int96)
	PutSpaced([]parquet.Int96, []byte, int64)
}

Int96Encoder is the interface for all encoding types that implement encoding parquet.Int96 values.

type LevelDecoder

type LevelDecoder struct {
	// contains filtered or unexported fields
}

LevelDecoder handles the decoding of repetition and definition levels from a parquet file, supporting both bit-packed and run-length encoded values.

func (*LevelDecoder) Decode

func (l *LevelDecoder) Decode(levels []int16) (int, int64)

Decode decodes the bytes that were set with SetData into the slice of levels, returning the total number of levels that were decoded and the number of values which had a level equal to the max level, which indicates how many physical values exist to be read.
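The second return value can be understood as a count of levels that reach the maximum. The sketch below assumes the levels are already decoded into a slice; countValues is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// countValues returns the number of levels and how many of them equal
// maxLvl, that is, how many physical values are present.
func countValues(levels []int16, maxLvl int16) (total int, valuesToRead int64) {
	for _, lvl := range levels {
		if lvl == maxLvl {
			valuesToRead++
		}
	}
	return len(levels), valuesToRead
}

func main() {
	// Definition levels for an optional column: 1 means present, 0 means null.
	defLevels := []int16{1, 0, 1, 1, 0}
	total, toRead := countValues(defLevels, 1)
	fmt.Println(total, toRead)
}
```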

func (*LevelDecoder) SetData

func (l *LevelDecoder) SetData(encoding parquet.Encoding, maxLvl int16, nbuffered int, data []byte) (int, error)

SetData sets the data to be decoded by subsequent calls by specifying the encoding type, the maximum level (which determines the bit width), the number of values expected and the raw bytes to decode. It returns the number of bytes expected to be decoded.
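How the maximum level determines the bit width can be shown in a few lines. The helper levelBitWidth is hypothetical; it computes the smallest number of bits able to represent every level from 0 to maxLvl.

```go
package main

import (
	"fmt"
	"math/bits"
)

// levelBitWidth returns the number of bits needed to represent any level
// in the range [0, maxLvl]. Non-positive inputs yield 0 (an assumption).
func levelBitWidth(maxLvl int16) int {
	if maxLvl <= 0 {
		return 0
	}
	return bits.Len16(uint16(maxLvl))
}

func main() {
	for _, lvl := range []int16{0, 1, 2, 3, 4, 7} {
		fmt.Printf("maxLvl=%d width=%d\n", lvl, levelBitWidth(lvl))
	}
}
```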

func (*LevelDecoder) SetDataV2

func (l *LevelDecoder) SetDataV2(nbytes int32, maxLvl int16, nbuffered int, data []byte) error

SetDataV2 is the same as SetData but only for DataPageV2 pages and only supports run length encoding.

type LevelEncoder

type LevelEncoder struct {
	// contains filtered or unexported fields
}

LevelEncoder is for handling the encoding of Definition and Repetition levels to parquet files.

func (*LevelEncoder) Encode

func (l *LevelEncoder) Encode(lvls []int16) (nencoded int, err error)

Encode encodes the slice of definition or repetition levels based on the currently configured encoding type and returns the number of values that were encoded.

func (*LevelEncoder) EncodeNoFlush

func (l *LevelEncoder) EncodeNoFlush(lvls []int16) (nencoded int, err error)

EncodeNoFlush encodes the provided levels and appends them to the encoder's buffer without flushing or returning the buffer. It returns the number of values encoded and any error encountered. If err is not nil, nencoded is the number of values encoded before the error was encountered.

func (*LevelEncoder) Flush

func (l *LevelEncoder) Flush()

Flush flushes out any encoded data to the underlying writer.

func (*LevelEncoder) Init

func (l *LevelEncoder) Init(encoding parquet.Encoding, maxLvl int16, w io.WriterAt)

Init is called to set up the desired encoding type, max level and underlying writer for a level encoder to control where the resulting encoded buffer will end up.

func (*LevelEncoder) Len

func (l *LevelEncoder) Len() int

Len returns the number of bytes that were written as run-length encoded levels. This is only valid for run-length encoding and will panic if the deprecated bit-packed encoding is in use.

func (*LevelEncoder) Reset

func (l *LevelEncoder) Reset(maxLvl int16)

Reset resets the encoder allowing it to be reused and updating the maxlevel to the new specified value.

type MemoTable

type MemoTable interface {
	// Reset drops everything in the table allowing it to be reused
	Reset()
	// Size returns the current number of unique values stored in the table
	// including whether or not a null value has been passed in using GetOrInsertNull
	Size() int
	// CopyValues populates out with the values currently in the table, out must
	// be a slice of the appropriate type for the table type.
	CopyValues(out interface{})
	// CopyValuesSubset is like CopyValues but only copies a subset of values starting
	// at the indicated index.
	CopyValuesSubset(start int, out interface{})

	WriteOut(out []byte)
	WriteOutSubset(start int, out []byte)
	// Get returns the index of the specified value in the table, and a boolean indicating
	// whether or not the value was found in the table. Will panic if val is not the appropriate
	// type for the underlying table.
	Get(val interface{}) (int, bool)
	// GetOrInsert is the same as Get, except if the value is not currently in the table it will
	// be inserted into the table.
	GetOrInsert(val interface{}) (idx int, existed bool, err error)
	// GetNull returns the index of the null value and whether or not it was found in the table
	GetNull() (int, bool)
	// GetOrInsertNull returns the index of the null value, if it didn't already exist in the table,
	// it is inserted.
	GetOrInsertNull() (idx int, existed bool)
}

MemoTable is an interface that can be used to swap out implementations of the hash table used for handling dictionary encoding. Dictionary encoding is built against this interface to make code generation and changing implementations easy.

Implementations should remember the order in which values are inserted so that a valid dictionary index can be generated.
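
The insertion-order requirement can be sketched with a map plus a slice. This is a simplified, hypothetical stand-in for an int32 memo table (the type int32Memo and the helper dictEncode are not part of this package, and null handling is omitted); it only shows why the index returned for a value must equal its position in the dictionary.

```go
package main

import "fmt"

// int32Memo is a hypothetical, simplified memo table for int32 values.
// values keeps insertion order; index maps a value to its position in values.
type int32Memo struct {
	index  map[int32]int
	values []int32
}

func newInt32Memo() *int32Memo {
	return &int32Memo{index: make(map[int32]int)}
}

// GetOrInsert returns the dictionary index of v, inserting it if needed.
func (m *int32Memo) GetOrInsert(v int32) (int, bool) {
	if i, ok := m.index[v]; ok {
		return i, true
	}
	i := len(m.values)
	m.index[v] = i
	m.values = append(m.values, v)
	return i, false
}

// Size returns the number of unique values stored so far.
func (m *int32Memo) Size() int { return len(m.values) }

// dictEncode replaces every input value with its dictionary index.
func dictEncode(m *int32Memo, in []int32) []int {
	out := make([]int, len(in))
	for i, v := range in {
		out[i], _ = m.GetOrInsert(v)
	}
	return out
}

func main() {
	m := newInt32Memo()
	indices := dictEncode(m, []int32{7, 3, 7, 9, 3})
	fmt.Println("indices:", indices)
	fmt.Println("dictionary:", m.values)
}
```

Because values is append-only, an index handed out earlier stays valid as the dictionary grows.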

func NewFloat32Dictionary

func NewFloat32Dictionary() MemoTable

NewFloat32Dictionary returns a memotable interface for use with Float32 values only

func NewFloat32MemoTable

func NewFloat32MemoTable(memory.Allocator) MemoTable

func NewFloat64Dictionary

func NewFloat64Dictionary() MemoTable

NewFloat64Dictionary returns a memotable interface for use with Float64 values only

func NewFloat64MemoTable

func NewFloat64MemoTable(memory.Allocator) MemoTable

func NewInt32Dictionary

func NewInt32Dictionary() MemoTable

NewInt32Dictionary returns a memotable interface for use with Int32 values only

func NewInt32MemoTable

func NewInt32MemoTable(memory.Allocator) MemoTable

func NewInt64Dictionary

func NewInt64Dictionary() MemoTable

NewInt64Dictionary returns a memotable interface for use with Int64 values only

func NewInt64MemoTable

func NewInt64MemoTable(memory.Allocator) MemoTable

func NewInt96MemoTable

func NewInt96MemoTable(memory.Allocator) MemoTable

type NumericMemoTable

type NumericMemoTable interface {
	MemoTable
	// WriteOutLE writes the contents of the memo table out to the byteslice
	// but ensures the values are little-endian before writing them (converting
	// if on a big endian system).
	WriteOutLE(out []byte)
	// WriteOutSubsetLE writes the contents of the memo table out to the byteslice
	// starting with the index indicated by start, but ensures the values are little
	// endian before writing them (converting if on a big-endian system).
	WriteOutSubsetLE(start int, out []byte)
}
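
A minimal sketch of what writing values as little-endian bytes looks like, using encoding/binary so that the result does not depend on host byte order. The helper writeInt32sLE is hypothetical and is not the package's implementation.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// writeInt32sLE is a hypothetical helper. It writes each value to out as four
// little-endian bytes, regardless of the byte order of the host.
// out must hold at least 4*len(values) bytes.
func writeInt32sLE(values []int32, out []byte) {
	for i, v := range values {
		binary.LittleEndian.PutUint32(out[i*4:], uint32(v))
	}
}

func main() {
	out := make([]byte, 8)
	writeInt32sLE([]int32{1, -2}, out)
	fmt.Printf("% x\n", out)
}
```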

type PlainBooleanDecoder

type PlainBooleanDecoder struct {
	// contains filtered or unexported fields
}

PlainBooleanDecoder is for the Plain Encoding type; there is no dictionary decoding for bools.

func (*PlainBooleanDecoder) Decode

func (dec *PlainBooleanDecoder) Decode(out []bool) (int, error)

Decode fills out with bools decoded from the data, starting at the current point, until out is full or the end of the data is reached.

Returns the number of values decoded

func (*PlainBooleanDecoder) DecodeSpaced

func (dec *PlainBooleanDecoder) DecodeSpaced(out []bool, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode except it expands the values to leave spaces for null as determined by the validBits bitmap.

func (*PlainBooleanDecoder) Encoding

func (d *PlainBooleanDecoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*PlainBooleanDecoder) SetData

func (dec *PlainBooleanDecoder) SetData(nvals int, data []byte) error

func (PlainBooleanDecoder) Type

Type for the PlainBooleanDecoder is parquet.Types.Boolean

func (*PlainBooleanDecoder) ValuesLeft

func (d *PlainBooleanDecoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type PlainBooleanEncoder

type PlainBooleanEncoder struct {
	// contains filtered or unexported fields
}

PlainBooleanEncoder encodes bools as a bitmap as per the Plain Encoding
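
The bitmap layout can be sketched as follows. The Parquet format packs plain-encoded booleans one bit per value, least significant bit first; the helpers packBools and unpackBools are hypothetical illustrations and not this package's code.

```go
package main

import "fmt"

// packBools is a hypothetical helper that packs one bit per value,
// least significant bit first, padding the last byte with zero bits.
func packBools(in []bool) []byte {
	out := make([]byte, (len(in)+7)/8)
	for i, b := range in {
		if b {
			out[i/8] |= 1 << uint(i%8)
		}
	}
	return out
}

// unpackBools fills out from the bitmap and returns how many values it wrote,
// which is the smaller of len(out) and the number of bits available.
func unpackBools(data []byte, out []bool) int {
	n := len(out)
	if avail := len(data) * 8; n > avail {
		n = avail
	}
	for i := 0; i < n; i++ {
		out[i] = data[i/8]&(1<<uint(i%8)) != 0
	}
	return n
}

func main() {
	in := []bool{true, false, true, true, false, false, false, false, true}
	packed := packBools(in)
	fmt.Printf("%08b\n", packed)

	out := make([]bool, len(in))
	n := unpackBools(packed, out)
	fmt.Println(n, out)
}
```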

func (*PlainBooleanEncoder) Allocator

func (e *PlainBooleanEncoder) Allocator() memory.Allocator

func (*PlainBooleanEncoder) Bytes

func (e *PlainBooleanEncoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*PlainBooleanEncoder) Encoding

func (e *PlainBooleanEncoder) Encoding() parquet.Encoding

func (*PlainBooleanEncoder) EstimatedDataEncodedSize

func (enc *PlainBooleanEncoder) EstimatedDataEncodedSize() int64

EstimatedDataEncodedSize returns the current number of bytes that have been buffered so far

func (*PlainBooleanEncoder) FlushValues

func (enc *PlainBooleanEncoder) FlushValues() (Buffer, error)

FlushValues returns the buffered data. The caller is responsible for releasing the buffer memory.

func (*PlainBooleanEncoder) Put

func (enc *PlainBooleanEncoder) Put(in []bool)

Put encodes the contents of in into the underlying data buffer.

func (*PlainBooleanEncoder) PutSpaced

func (enc *PlainBooleanEncoder) PutSpaced(in []bool, validBits []byte, validBitsOffset int64)

PutSpaced uses the validBits bitmap to determine which values in the slice are null and can be left out, then encodes the remaining values without those nulls.

func (*PlainBooleanEncoder) ReserveForWrite

func (e *PlainBooleanEncoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*PlainBooleanEncoder) Reset

func (e *PlainBooleanEncoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (PlainBooleanEncoder) Type

Type for the PlainBooleanEncoder is parquet.Types.Boolean

type PlainByteArrayDecoder

type PlainByteArrayDecoder struct {
	// contains filtered or unexported fields
}

PlainByteArrayDecoder decodes a data chunk for bytearrays according to the plain encoding. The byte arrays will use slices to reference the data rather than copying it.

The parquet spec defines Plain encoding for ByteArrays as a 4 byte little endian integer containing the length of the bytearray followed by that many bytes being the raw data of the byte array.
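
The layout described above can be sketched with the standard library alone. The helpers encodePlainByteArrays and decodePlainByteArrays are hypothetical and use [][]byte in place of parquet.ByteArray; the decoder returns sub-slices of its input rather than copies, mirroring the by-reference behaviour described for this type.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodePlainByteArrays writes each value as a 4-byte little-endian length
// followed by the raw bytes. Hypothetical helper for illustration only.
func encodePlainByteArrays(vals [][]byte) []byte {
	var out []byte
	var lenBuf [4]byte
	for _, v := range vals {
		binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(v)))
		out = append(out, lenBuf[:]...)
		out = append(out, v...)
	}
	return out
}

// decodePlainByteArrays reads up to nvals values. The returned slices
// reference data directly; nothing is copied.
func decodePlainByteArrays(data []byte, nvals int) ([][]byte, error) {
	out := make([][]byte, 0, nvals)
	for len(out) < nvals {
		if len(data) < 4 {
			return out, errors.New("truncated length prefix")
		}
		n := int(int32(binary.LittleEndian.Uint32(data)))
		data = data[4:]
		if n < 0 || n > len(data) {
			return out, errors.New("invalid or truncated value")
		}
		out = append(out, data[:n:n])
		data = data[n:]
	}
	return out, nil
}

func main() {
	enc := encodePlainByteArrays([][]byte{[]byte("ab"), {}, []byte("xyz")})
	fmt.Printf("% x\n", enc)

	vals, err := decodePlainByteArrays(enc, 3)
	fmt.Printf("%q %v\n", vals, err)
}
```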

func (*PlainByteArrayDecoder) Decode

func (pbad *PlainByteArrayDecoder) Decode(out []parquet.ByteArray) (int, error)

Decode will populate the slice of bytearrays in full or until the number of values is consumed.

Returns the number of values that were decoded.

func (*PlainByteArrayDecoder) DecodeSpaced

func (pbad *PlainByteArrayDecoder) DecodeSpaced(out []parquet.ByteArray, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is like Decode, but expands the slice out to leave empty values where the validBits bitmap has 0s

func (*PlainByteArrayDecoder) Encoding

func (d *PlainByteArrayDecoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*PlainByteArrayDecoder) SetData

func (d *PlainByteArrayDecoder) SetData(nvals int, data []byte) error

SetData sets the data for decoding into the decoder to update the available data bytes and number of values available.

func (PlainByteArrayDecoder) Type

Type returns parquet.Types.ByteArray for this decoder

func (*PlainByteArrayDecoder) ValuesLeft

func (d *PlainByteArrayDecoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type PlainByteArrayEncoder

type PlainByteArrayEncoder struct {
	// contains filtered or unexported fields
}

PlainByteArrayEncoder encodes byte arrays according to the spec for Plain encoding by encoding the length as an int32 followed by the bytes of the value.

func (*PlainByteArrayEncoder) Allocator

func (e *PlainByteArrayEncoder) Allocator() memory.Allocator

func (*PlainByteArrayEncoder) Bytes

func (e *PlainByteArrayEncoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*PlainByteArrayEncoder) Encoding

func (e *PlainByteArrayEncoder) Encoding() parquet.Encoding

func (*PlainByteArrayEncoder) EstimatedDataEncodedSize

func (e *PlainByteArrayEncoder) EstimatedDataEncodedSize() int64

func (*PlainByteArrayEncoder) FlushValues

func (e *PlainByteArrayEncoder) FlushValues() (Buffer, error)

FlushValues flushes any unwritten data to the buffer and returns the finished encoded buffer of data. This also clears the encoder. Ownership of the data belongs to whoever called FlushValues; Release should be called on the resulting Buffer when done.

func (*PlainByteArrayEncoder) Put

func (enc *PlainByteArrayEncoder) Put(in []parquet.ByteArray)

Put writes out all of the values in this slice to the encoding sink

func (*PlainByteArrayEncoder) PutByteArray

func (enc *PlainByteArrayEncoder) PutByteArray(val parquet.ByteArray)

PutByteArray writes out the 4 bytes for the length followed by the data

func (*PlainByteArrayEncoder) PutSpaced

func (enc *PlainByteArrayEncoder) PutSpaced(in []parquet.ByteArray, validBits []byte, validBitsOffset int64)

PutSpaced uses the bitmap of validBits to leave out anything that is null according to the bitmap.

If validBits is nil, this is equivalent to calling Put
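
A minimal sketch of the null-skipping step, assuming the validity bitmap is read least significant bit first (as in Arrow validity bitmaps). The helpers bitSet and compactValid are hypothetical, and [][]byte stands in for a slice of parquet.ByteArray.

```go
package main

import "fmt"

// bitSet reports whether bit i of the bitmap is set, counting from the
// least significant bit of the first byte.
func bitSet(bitmap []byte, i int64) bool {
	return bitmap[i/8]&(1<<uint(i%8)) != 0
}

// compactValid is a hypothetical helper that keeps only the entries whose
// validity bit is set. A nil bitmap means every entry is valid.
func compactValid(in [][]byte, validBits []byte, validBitsOffset int64) [][]byte {
	if validBits == nil {
		return in
	}
	out := make([][]byte, 0, len(in))
	for i, v := range in {
		if bitSet(validBits, validBitsOffset+int64(i)) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	in := [][]byte{[]byte("a"), nil, []byte("b")}
	// Bits 0 and 2 are set, so the middle (null) entry is dropped.
	fmt.Printf("%q\n", compactValid(in, []byte{0x05}, 0))
	fmt.Printf("%q\n", compactValid(in, nil, 0))
}
```

The compacted slice is what would then be handed to the ordinary, non-spaced encoding path.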

func (*PlainByteArrayEncoder) ReserveForWrite

func (e *PlainByteArrayEncoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*PlainByteArrayEncoder) Reset

func (e *PlainByteArrayEncoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (PlainByteArrayEncoder) Type

Type returns parquet.Types.ByteArray for the bytearray encoder

type PlainFixedLenByteArrayDecoder

type PlainFixedLenByteArrayDecoder struct {
	// contains filtered or unexported fields
}

PlainFixedLenByteArrayDecoder is a plain encoding decoder for Fixed Length Byte Arrays

func (*PlainFixedLenByteArrayDecoder) Decode

func (pflba *PlainFixedLenByteArrayDecoder) Decode(out []parquet.FixedLenByteArray) (int, error)

Decode populates out with fixed length byte array values until either there are no more values to decode or out has been filled. It then returns the total number of values that were decoded.

func (*PlainFixedLenByteArrayDecoder) DecodeSpaced

func (pflba *PlainFixedLenByteArrayDecoder) DecodeSpaced(out []parquet.FixedLenByteArray, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced does the same as Decode but spaces out the resulting slice according to the bitmap leaving space for null values

func (*PlainFixedLenByteArrayDecoder) Encoding

func (d *PlainFixedLenByteArrayDecoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*PlainFixedLenByteArrayDecoder) SetData

func (d *PlainFixedLenByteArrayDecoder) SetData(nvals int, data []byte) error

SetData sets the data for decoding into the decoder to update the available data bytes and number of values available.

func (PlainFixedLenByteArrayDecoder) Type

Type returns the physical type this decoder operates on, FixedLength Byte Arrays

func (*PlainFixedLenByteArrayDecoder) ValuesLeft

func (d *PlainFixedLenByteArrayDecoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type PlainFixedLenByteArrayEncoder

type PlainFixedLenByteArrayEncoder struct {
	// contains filtered or unexported fields
}

PlainFixedLenByteArrayEncoder writes the raw bytes of the byte array always writing typeLength bytes for each value.
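
Because every value occupies exactly typeLength bytes, no length prefix is needed. The following sketch shows that layout; encodeFixedLen and decodeFixedLen are hypothetical helpers over [][]byte, and returning an error for a value shorter than typeLength is an assumption of this sketch rather than documented behaviour of the package.

```go
package main

import "fmt"

// encodeFixedLen concatenates the first typeLength bytes of every value.
// Hypothetical helper; rejecting short values is an assumption of this sketch.
func encodeFixedLen(vals [][]byte, typeLength int) ([]byte, error) {
	out := make([]byte, 0, len(vals)*typeLength)
	for i, v := range vals {
		if len(v) < typeLength {
			return nil, fmt.Errorf("value %d has %d bytes, want at least %d", i, len(v), typeLength)
		}
		out = append(out, v[:typeLength]...)
	}
	return out, nil
}

// decodeFixedLen splits data into typeLength-sized slices that reference
// the input without copying.
func decodeFixedLen(data []byte, typeLength int) [][]byte {
	if typeLength <= 0 {
		return nil
	}
	out := make([][]byte, 0, len(data)/typeLength)
	for len(data) >= typeLength {
		out = append(out, data[:typeLength:typeLength])
		data = data[typeLength:]
	}
	return out
}

func main() {
	enc, err := encodeFixedLen([][]byte{{1, 2, 3}, {4, 5, 6}}, 3)
	fmt.Println(enc, err)
	fmt.Println(decodeFixedLen(enc, 3))
}
```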

func (*PlainFixedLenByteArrayEncoder) Allocator

func (e *PlainFixedLenByteArrayEncoder) Allocator() memory.Allocator

func (*PlainFixedLenByteArrayEncoder) Bytes

func (e *PlainFixedLenByteArrayEncoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*PlainFixedLenByteArrayEncoder) Encoding

func (e *PlainFixedLenByteArrayEncoder) Encoding() parquet.Encoding

func (*PlainFixedLenByteArrayEncoder) EstimatedDataEncodedSize

func (e *PlainFixedLenByteArrayEncoder) EstimatedDataEncodedSize() int64

func (*PlainFixedLenByteArrayEncoder) FlushValues

func (e *PlainFixedLenByteArrayEncoder) FlushValues() (Buffer, error)

FlushValues flushes any unwritten data to the buffer and returns the finished encoded buffer of data. This also clears the encoder. Ownership of the data belongs to whoever called FlushValues; Release should be called on the resulting Buffer when done.

func (*PlainFixedLenByteArrayEncoder) Put

func (enc *PlainFixedLenByteArrayEncoder) Put(in []parquet.FixedLenByteArray)

Put writes the provided values to the encoder.

func (*PlainFixedLenByteArrayEncoder) PutSpaced

func (enc *PlainFixedLenByteArrayEncoder) PutSpaced(in []parquet.FixedLenByteArray, validBits []byte, validBitsOffset int64)

PutSpaced is like Put but works with data that is spaced out according to the passed in bitmap

func (*PlainFixedLenByteArrayEncoder) ReserveForWrite

func (e *PlainFixedLenByteArrayEncoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*PlainFixedLenByteArrayEncoder) Reset

func (e *PlainFixedLenByteArrayEncoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (PlainFixedLenByteArrayEncoder) Type

Type returns the underlying physical type this encoder works with, Fixed Length byte arrays.

type PlainFloat32Decoder

type PlainFloat32Decoder struct {
	// contains filtered or unexported fields
}

PlainFloat32Decoder is a decoder specifically for decoding Plain Encoding data of float32 type.

func (*PlainFloat32Decoder) Decode

func (dec *PlainFloat32Decoder) Decode(out []float32) (int, error)

Decode populates the given slice with values from the data to be decoded, decoding the min(len(out), remaining values). It returns the number of values actually decoded and any error encountered.

func (*PlainFloat32Decoder) DecodeSpaced

func (dec *PlainFloat32Decoder) DecodeSpaced(out []float32, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is the same as Decode, except it expands the data out to leave spaces for null values as defined by the bitmap provided.

func (*PlainFloat32Decoder) Encoding

func (d *PlainFloat32Decoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*PlainFloat32Decoder) SetData

func (d *PlainFloat32Decoder) SetData(nvals int, data []byte) error

SetData sets the data for decoding into the decoder to update the available data bytes and number of values available.

func (PlainFloat32Decoder) Type

Type returns the physical type this decoder is able to decode for

func (*PlainFloat32Decoder) ValuesLeft

func (d *PlainFloat32Decoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type PlainFloat32Encoder

type PlainFloat32Encoder struct {
	// contains filtered or unexported fields
}

PlainFloat32Encoder is an encoder for float32 values using Plain Encoding, which in general just stores the values as raw bytes of the appropriate size.
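
For float32, "raw bytes of the appropriate size" means four little-endian bytes holding the IEEE 754 bit pattern of each value. A minimal sketch under that assumption, with hypothetical helper names that are not part of this package:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodePlainFloat32 writes the IEEE 754 bit pattern of each value as four
// little-endian bytes. Hypothetical helper for illustration only.
func encodePlainFloat32(vals []float32) []byte {
	out := make([]byte, 4*len(vals))
	for i, v := range vals {
		binary.LittleEndian.PutUint32(out[4*i:], math.Float32bits(v))
	}
	return out
}

// decodePlainFloat32 decodes min(len(out), values available) values and
// returns how many were written to out.
func decodePlainFloat32(data []byte, out []float32) int {
	n := len(data) / 4
	if len(out) < n {
		n = len(out)
	}
	for i := 0; i < n; i++ {
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(data[4*i:]))
	}
	return n
}

func main() {
	enc := encodePlainFloat32([]float32{1, -2.5})
	fmt.Printf("% x\n", enc)

	out := make([]float32, 4)
	n := decodePlainFloat32(enc, out)
	fmt.Println(n, out[:n])
}
```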

func (*PlainFloat32Encoder) Allocator

func (e *PlainFloat32Encoder) Allocator() memory.Allocator

func (*PlainFloat32Encoder) Bytes

func (e *PlainFloat32Encoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*PlainFloat32Encoder) Encoding

func (e *PlainFloat32Encoder) Encoding() parquet.Encoding

func (*PlainFloat32Encoder) EstimatedDataEncodedSize

func (e *PlainFloat32Encoder) EstimatedDataEncodedSize() int64

func (*PlainFloat32Encoder) FlushValues

func (e *PlainFloat32Encoder) FlushValues() (Buffer, error)

FlushValues flushes any unwritten data to the buffer and returns the finished encoded buffer of data. This also clears the encoder. Ownership of the data belongs to whoever called FlushValues; Release should be called on the resulting Buffer when done.

func (*PlainFloat32Encoder) Put

func (enc *PlainFloat32Encoder) Put(in []float32)

Put encodes a slice of values into the underlying buffer

func (*PlainFloat32Encoder) PutSpaced

func (enc *PlainFloat32Encoder) PutSpaced(in []float32, validBits []byte, validBitsOffset int64)

PutSpaced encodes into the underlying buffer a slice of values that is spaced out to include null slots, as defined by the validBits bitmap starting at the given bit offset. The values are first compacted by removing the null slots before being written to the buffer.

func (*PlainFloat32Encoder) ReserveForWrite

func (e *PlainFloat32Encoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*PlainFloat32Encoder) Reset

func (e *PlainFloat32Encoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (PlainFloat32Encoder) Type

Type returns the underlying physical type this encoder is able to encode

type PlainFloat64Decoder

type PlainFloat64Decoder struct {
	// contains filtered or unexported fields
}

PlainFloat64Decoder is a decoder specifically for decoding Plain Encoding data of float64 type.

func (*PlainFloat64Decoder) Decode

func (dec *PlainFloat64Decoder) Decode(out []float64) (int, error)

Decode populates the given slice with values from the data to be decoded, decoding the min(len(out), remaining values). It returns the number of values actually decoded and any error encountered.

func (*PlainFloat64Decoder) DecodeSpaced

func (dec *PlainFloat64Decoder) DecodeSpaced(out []float64, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is the same as Decode, except it expands the data out to leave spaces for null values as defined by the bitmap provided.

func (*PlainFloat64Decoder) Encoding

func (d *PlainFloat64Decoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*PlainFloat64Decoder) SetData

func (d *PlainFloat64Decoder) SetData(nvals int, data []byte) error

SetData sets the data for decoding into the decoder to update the available data bytes and number of values available.

func (PlainFloat64Decoder) Type

Type returns the physical type this decoder is able to decode for

func (*PlainFloat64Decoder) ValuesLeft

func (d *PlainFloat64Decoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type PlainFloat64Encoder

type PlainFloat64Encoder struct {
	// contains filtered or unexported fields
}

PlainFloat64Encoder is an encoder for float64 values using Plain Encoding, which in general just stores the values as raw bytes of the appropriate size.

func (*PlainFloat64Encoder) Allocator

func (e *PlainFloat64Encoder) Allocator() memory.Allocator

func (*PlainFloat64Encoder) Bytes

func (e *PlainFloat64Encoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*PlainFloat64Encoder) Encoding

func (e *PlainFloat64Encoder) Encoding() parquet.Encoding

func (*PlainFloat64Encoder) EstimatedDataEncodedSize

func (e *PlainFloat64Encoder) EstimatedDataEncodedSize() int64

func (*PlainFloat64Encoder) FlushValues

func (e *PlainFloat64Encoder) FlushValues() (Buffer, error)

FlushValues flushes any unwritten data to the buffer and returns the finished encoded buffer of data. This also clears the encoder. Ownership of the data belongs to whoever called FlushValues; Release should be called on the resulting Buffer when done.

func (*PlainFloat64Encoder) Put

func (enc *PlainFloat64Encoder) Put(in []float64)

Put encodes a slice of values into the underlying buffer

func (*PlainFloat64Encoder) PutSpaced

func (enc *PlainFloat64Encoder) PutSpaced(in []float64, validBits []byte, validBitsOffset int64)

PutSpaced encodes into the underlying buffer a slice of values that is spaced out to include null slots, as defined by the validBits bitmap starting at the given bit offset. The values are first compacted by removing the null slots before being written to the buffer.

func (*PlainFloat64Encoder) ReserveForWrite

func (e *PlainFloat64Encoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*PlainFloat64Encoder) Reset

func (e *PlainFloat64Encoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (PlainFloat64Encoder) Type

Type returns the underlying physical type this encoder is able to encode

type PlainInt32Decoder

type PlainInt32Decoder struct {
	// contains filtered or unexported fields
}

PlainInt32Decoder is a decoder specifically for decoding Plain Encoding data of int32 type.

func (*PlainInt32Decoder) Decode

func (dec *PlainInt32Decoder) Decode(out []int32) (int, error)

Decode populates the given slice with values from the data to be decoded, decoding the min(len(out), remaining values). It returns the number of values actually decoded and any error encountered.

func (*PlainInt32Decoder) DecodeSpaced

func (dec *PlainInt32Decoder) DecodeSpaced(out []int32, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is the same as Decode, except it expands the data out to leave spaces for null values as defined by the bitmap provided.
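
The expansion step can be sketched as follows, assuming the validity bitmap is read least significant bit first. The helper spaceOut is hypothetical: it takes already-decoded dense values and returns a new slice, whereas the real method decodes into the caller's out slice and also takes a null count.

```go
package main

import "fmt"

// spaceOut is a hypothetical helper that spreads dense values over total
// slots, leaving the zero value wherever the validity bit is unset.
func spaceOut(dense []int32, validBits []byte, validBitsOffset int64, total int) []int32 {
	out := make([]int32, total)
	next := 0
	for i := 0; i < total && next < len(dense); i++ {
		bit := validBitsOffset + int64(i)
		if validBits[bit/8]&(1<<uint(bit%8)) != 0 {
			out[i] = dense[next]
			next++
		}
	}
	return out
}

func main() {
	// Bits 0, 2 and 4 are set: slots 1 and 3 are null.
	fmt.Println(spaceOut([]int32{10, 20, 30}, []byte{0x15}, 0, 5))
}
```

The null slots hold the zero value here; callers are expected to consult the bitmap rather than the slot contents to tell nulls apart.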

func (*PlainInt32Decoder) Encoding

func (d *PlainInt32Decoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*PlainInt32Decoder) SetData

func (d *PlainInt32Decoder) SetData(nvals int, data []byte) error

SetData sets the data for decoding into the decoder to update the available data bytes and number of values available.

func (PlainInt32Decoder) Type

Type returns the physical type this decoder is able to decode for

func (*PlainInt32Decoder) ValuesLeft

func (d *PlainInt32Decoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type PlainInt32Encoder

type PlainInt32Encoder struct {
	// contains filtered or unexported fields
}

PlainInt32Encoder is an encoder for int32 values using Plain Encoding, which in general just stores the values as raw bytes of the appropriate size.

func (*PlainInt32Encoder) Allocator

func (e *PlainInt32Encoder) Allocator() memory.Allocator

func (*PlainInt32Encoder) Bytes

func (e *PlainInt32Encoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*PlainInt32Encoder) Encoding

func (e *PlainInt32Encoder) Encoding() parquet.Encoding

func (*PlainInt32Encoder) EstimatedDataEncodedSize

func (e *PlainInt32Encoder) EstimatedDataEncodedSize() int64

func (*PlainInt32Encoder) FlushValues

func (e *PlainInt32Encoder) FlushValues() (Buffer, error)

FlushValues flushes any unwritten data to the buffer and returns the finished encoded buffer of data. This also clears the encoder. Ownership of the data belongs to whoever called FlushValues; Release should be called on the resulting Buffer when done.

func (*PlainInt32Encoder) Put

func (enc *PlainInt32Encoder) Put(in []int32)

Put encodes a slice of values into the underlying buffer

func (*PlainInt32Encoder) PutSpaced

func (enc *PlainInt32Encoder) PutSpaced(in []int32, validBits []byte, validBitsOffset int64)

PutSpaced encodes into the underlying buffer a slice of values that is spaced out to include null slots, as defined by the validBits bitmap starting at the given bit offset. The values are first compacted by removing the null slots before being written to the buffer.

func (*PlainInt32Encoder) ReserveForWrite

func (e *PlainInt32Encoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*PlainInt32Encoder) Reset

func (e *PlainInt32Encoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (PlainInt32Encoder) Type

Type returns the underlying physical type this encoder is able to encode

type PlainInt64Decoder

type PlainInt64Decoder struct {
	// contains filtered or unexported fields
}

PlainInt64Decoder is a decoder specifically for decoding Plain Encoding data of int64 type.

func (*PlainInt64Decoder) Decode

func (dec *PlainInt64Decoder) Decode(out []int64) (int, error)

Decode populates the given slice with values from the data to be decoded, decoding the min(len(out), remaining values). It returns the number of values actually decoded and any error encountered.
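
The min(len(out), remaining values) behaviour can be sketched with a small stateful reader. The type plainInt64Reader is hypothetical and only borrows the method names for readability; validating the byte count in SetData is an assumption of this sketch.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// plainInt64Reader is a hypothetical stand-in for a plain int64 decoder.
type plainInt64Reader struct {
	data  []byte
	nvals int
}

// SetData records the bytes and value count for later Decode calls.
func (r *plainInt64Reader) SetData(nvals int, data []byte) error {
	if len(data) < nvals*8 {
		return errors.New("not enough bytes for the declared number of values")
	}
	r.nvals, r.data = nvals, data
	return nil
}

// ValuesLeft returns how many values have not been decoded yet.
func (r *plainInt64Reader) ValuesLeft() int { return r.nvals }

// Decode fills out with min(len(out), ValuesLeft()) values.
func (r *plainInt64Reader) Decode(out []int64) (int, error) {
	n := len(out)
	if r.nvals < n {
		n = r.nvals
	}
	for i := 0; i < n; i++ {
		out[i] = int64(binary.LittleEndian.Uint64(r.data[i*8:]))
	}
	r.data = r.data[n*8:]
	r.nvals -= n
	return n, nil
}

func main() {
	data := make([]byte, 24)
	data[0], data[8], data[16] = 1, 2, 3

	r := &plainInt64Reader{}
	if err := r.SetData(3, data); err != nil {
		panic(err)
	}
	out := make([]int64, 2)
	for r.ValuesLeft() > 0 {
		n, _ := r.Decode(out)
		fmt.Println(out[:n], "left:", r.ValuesLeft())
	}
}
```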

func (*PlainInt64Decoder) DecodeSpaced

func (dec *PlainInt64Decoder) DecodeSpaced(out []int64, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is the same as Decode, except it expands the data out to leave spaces for null values as defined by the bitmap provided.

func (*PlainInt64Decoder) Encoding

func (d *PlainInt64Decoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*PlainInt64Decoder) SetData

func (d *PlainInt64Decoder) SetData(nvals int, data []byte) error

SetData sets the data for decoding into the decoder to update the available data bytes and number of values available.

func (PlainInt64Decoder) Type

Type returns the physical type this decoder is able to decode for

func (*PlainInt64Decoder) ValuesLeft

func (d *PlainInt64Decoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type PlainInt64Encoder

type PlainInt64Encoder struct {
	// contains filtered or unexported fields
}

PlainInt64Encoder is an encoder for int64 values using Plain Encoding, which in general just stores the values as raw bytes of the appropriate size.

func (*PlainInt64Encoder) Allocator

func (e *PlainInt64Encoder) Allocator() memory.Allocator

func (*PlainInt64Encoder) Bytes

func (e *PlainInt64Encoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*PlainInt64Encoder) Encoding

func (e *PlainInt64Encoder) Encoding() parquet.Encoding

func (*PlainInt64Encoder) EstimatedDataEncodedSize

func (e *PlainInt64Encoder) EstimatedDataEncodedSize() int64

func (*PlainInt64Encoder) FlushValues

func (e *PlainInt64Encoder) FlushValues() (Buffer, error)

FlushValues flushes any unwritten data to the buffer and returns the finished encoded buffer of data. This also clears the encoder. Ownership of the data belongs to whoever called FlushValues; Release should be called on the resulting Buffer when done.

func (*PlainInt64Encoder) Put

func (enc *PlainInt64Encoder) Put(in []int64)

Put encodes a slice of values into the underlying buffer

func (*PlainInt64Encoder) PutSpaced

func (enc *PlainInt64Encoder) PutSpaced(in []int64, validBits []byte, validBitsOffset int64)

PutSpaced encodes into the underlying buffer a slice of values that is spaced out to include null slots, as defined by the validBits bitmap starting at the given bit offset. The values are first compacted by removing the null slots before being written to the buffer.

func (*PlainInt64Encoder) ReserveForWrite

func (e *PlainInt64Encoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*PlainInt64Encoder) Reset

func (e *PlainInt64Encoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (PlainInt64Encoder) Type

Type returns the underlying physical type this encoder is able to encode

type PlainInt96Decoder

type PlainInt96Decoder struct {
	// contains filtered or unexported fields
}

PlainInt96Decoder is a decoder specifically for decoding Plain Encoding data of parquet.Int96 type.

func (*PlainInt96Decoder) Decode

func (dec *PlainInt96Decoder) Decode(out []parquet.Int96) (int, error)

Decode populates the given slice with values from the data to be decoded, decoding the min(len(out), remaining values). It returns the number of values actually decoded and any error encountered.

func (*PlainInt96Decoder) DecodeSpaced

func (dec *PlainInt96Decoder) DecodeSpaced(out []parquet.Int96, nullCount int, validBits []byte, validBitsOffset int64) (int, error)

DecodeSpaced is the same as Decode, except it expands the data out to leave spaces for null values as defined by the bitmap provided.

func (*PlainInt96Decoder) Encoding

func (d *PlainInt96Decoder) Encoding() parquet.Encoding

Encoding returns the encoding type used by this decoder to decode the bytes.

func (*PlainInt96Decoder) SetData

func (d *PlainInt96Decoder) SetData(nvals int, data []byte) error

SetData sets the data for decoding into the decoder to update the available data bytes and number of values available.

func (PlainInt96Decoder) Type

Type returns the physical type this decoder is able to decode for

func (*PlainInt96Decoder) ValuesLeft

func (d *PlainInt96Decoder) ValuesLeft() int

ValuesLeft returns the number of remaining values that can be decoded

type PlainInt96Encoder

type PlainInt96Encoder struct {
	// contains filtered or unexported fields
}

PlainInt96Encoder is an encoder for parquet.Int96 values using Plain Encoding, which in general just stores the values as raw bytes of the appropriate size.

func (*PlainInt96Encoder) Allocator

func (e *PlainInt96Encoder) Allocator() memory.Allocator

func (*PlainInt96Encoder) Bytes

func (e *PlainInt96Encoder) Bytes() []byte

Bytes returns the current bytes that have been written to the encoder's buffer but doesn't transfer ownership.

func (*PlainInt96Encoder) Encoding

func (e *PlainInt96Encoder) Encoding() parquet.Encoding

func (*PlainInt96Encoder) EstimatedDataEncodedSize

func (e *PlainInt96Encoder) EstimatedDataEncodedSize() int64

func (*PlainInt96Encoder) FlushValues

func (e *PlainInt96Encoder) FlushValues() (Buffer, error)

FlushValues flushes any unwritten data to the buffer and returns the finished encoded buffer of data. This also clears the encoder. Ownership of the data belongs to whoever called FlushValues; Release should be called on the resulting Buffer when done.

func (*PlainInt96Encoder) Put

func (enc *PlainInt96Encoder) Put(in []parquet.Int96)

Put encodes a slice of values into the underlying buffer

func (*PlainInt96Encoder) PutSpaced

func (enc *PlainInt96Encoder) PutSpaced(in []parquet.Int96, validBits []byte, validBitsOffset int64)

PutSpaced encodes into the underlying buffer a slice of values that is spaced out to include null slots, as defined by the validBits bitmap starting at the given bit offset. The values are first compacted by removing the null slots before being written to the buffer.

func (*PlainInt96Encoder) ReserveForWrite

func (e *PlainInt96Encoder) ReserveForWrite(n int)

ReserveForWrite allocates n bytes so that the next n bytes written do not require new allocations.

func (*PlainInt96Encoder) Reset

func (e *PlainInt96Encoder) Reset()

Reset drops the data currently in the encoder and resets for new use.

func (PlainInt96Encoder) Type

Type returns the underlying physical type this encoder is able to encode

type PooledBufferWriter

type PooledBufferWriter struct {
	// contains filtered or unexported fields
}

PooledBufferWriter uses buffers from the buffer pool to back it while implementing io.Writer and io.WriterAt interfaces

func NewPooledBufferWriter

func NewPooledBufferWriter(initial int) *PooledBufferWriter

NewPooledBufferWriter returns a new buffer with 'initial' bytes reserved and pre-allocated to guarantee that writing that many more bytes will not require another allocation.

func (*PooledBufferWriter) Bytes

func (b *PooledBufferWriter) Bytes() []byte

Bytes returns the current byte slice, of length Len.

func (*PooledBufferWriter) Finish

func (b *PooledBufferWriter) Finish() Buffer

Finish returns the current buffer and resets this writer so it can be reused. The caller is responsible for releasing the returned buffer's memory.

func (*PooledBufferWriter) Len

func (b *PooledBufferWriter) Len() int

Len returns the current length of the byte slice.

func (*PooledBufferWriter) Reserve

func (b *PooledBufferWriter) Reserve(nbytes int)

Reserve pre-allocates nbytes to ensure that the next write of that many bytes will not require another allocation.
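The guarantee that Reserve describes can be shown with plain slices. The reserve function below is a hypothetical stand-in that says nothing about how PooledBufferWriter manages its pooled memory; it only demonstrates that once capacity is set aside, appending that many bytes does not reallocate.

```go
package main

import "fmt"

// reserve is a hypothetical helper: it returns a slice with the same contents
// as buf and room for at least n more bytes beyond its current length.
func reserve(buf []byte, n int) []byte {
	if cap(buf)-len(buf) >= n {
		return buf
	}
	grown := make([]byte, len(buf), len(buf)+n)
	copy(grown, buf)
	return grown
}

func main() {
	buf := reserve([]byte("ab"), 16)
	before := cap(buf)

	// Appending 16 bytes fits in the reserved space, so capacity is unchanged.
	buf = append(buf, make([]byte, 16)...)
	fmt.Println(cap(buf) == before) // prints true
}
```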

func (*PooledBufferWriter) Reset

func (b *PooledBufferWriter) Reset(initial int)

Reset releases any current memory and reinitializes the writer with 'initial' newly allocated bytes.

func (*PooledBufferWriter) SetOffset

func (b *PooledBufferWriter) SetOffset(offset int)

SetOffset sets an offset in the buffer which will ensure that all references to offsets and sizes in the buffer will be offset by this many bytes, allowing the writer to reserve space in the buffer.

func (*PooledBufferWriter) Tell

func (b *PooledBufferWriter) Tell() int64

func (*PooledBufferWriter) UnsafeWrite

func (b *PooledBufferWriter) UnsafeWrite(buf []byte) (n int, err error)

UnsafeWrite does not check the capacity / length before writing.

func (*PooledBufferWriter) UnsafeWriteCopy

func (b *PooledBufferWriter) UnsafeWriteCopy(ncopies int, pattern []byte) (int, error)

func (*PooledBufferWriter) Write

func (b *PooledBufferWriter) Write(buf []byte) (int, error)

func (*PooledBufferWriter) WriteAt

func (b *PooledBufferWriter) WriteAt(p []byte, offset int64) (n int, err error)

WriteAt writes the bytes from p into this buffer starting at offset.

Does not affect the internal position of the writer.
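One common use for a WriteAt that leaves the write position alone is backfilling a header once the payload size is known. The sketch below uses a hypothetical sketchWriter type, not PooledBufferWriter itself, and assumes a 4-byte little-endian length prefix purely for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sketchWriter is a hypothetical stand-in, not the library's writer.
type sketchWriter struct {
	buf []byte
	pos int // position at which the next Write appends
}

// Write appends p at the current position and advances the position.
func (w *sketchWriter) Write(p []byte) (int, error) {
	w.buf = append(w.buf[:w.pos], p...)
	w.pos += len(p)
	return len(p), nil
}

// WriteAt overwrites already-written bytes starting at off and leaves the
// position untouched.
func (w *sketchWriter) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 || int(off)+len(p) > len(w.buf) {
		return 0, fmt.Errorf("write of %d bytes at offset %d is out of range", len(p), off)
	}
	return copy(w.buf[off:], p), nil
}

func main() {
	w := &sketchWriter{}
	w.Write(make([]byte, 4)) // placeholder for the length prefix
	w.Write([]byte("payload"))

	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(w.pos-4))
	w.WriteAt(hdr[:], 0) // backfill the prefix; w.pos stays at 11

	fmt.Println(w.buf[:4], string(w.buf[4:]), w.pos) // prints [7 0 0 0] payload 11
}
```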

type TypedDecoder

type TypedDecoder interface {
	// SetData updates the data in the decoder with the passed in byte slice and the
	// stated number of values as expected to be decoded.
	SetData(buffered int, buf []byte) error
	// Encoding returns the encoding type that this decoder decodes data of
	Encoding() parquet.Encoding
	// ValuesLeft returns the number of remaining values to be decoded
	ValuesLeft() int
	// Type returns the physical type this can decode.
	Type() parquet.Type
}

TypedDecoder is the general interface for all decoder types, which can then be type asserted to a type-specific decoder.
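The type-assertion pattern mentioned here can be sketched with local stand-ins. The decoder and int32Decoder interfaces and the sliceInt32Decoder type below are hypothetical, reduced shapes defined only for this example; they are not the package's own declarations.

```go
package main

import "fmt"

// decoder is a reduced, hypothetical version of a general decoder interface.
type decoder interface {
	ValuesLeft() int
}

// int32Decoder is a hypothetical type-specific interface layered on decoder.
type int32Decoder interface {
	decoder
	Decode(out []int32) (int, error)
}

// sliceInt32Decoder is a toy implementation that serves values from a slice.
type sliceInt32Decoder struct{ vals []int32 }

func (d *sliceInt32Decoder) ValuesLeft() int { return len(d.vals) }

func (d *sliceInt32Decoder) Decode(out []int32) (int, error) {
	n := copy(out, d.vals)
	d.vals = d.vals[n:]
	return n, nil
}

// decodeInt32s asserts the general decoder to the int32-specific interface
// and reads every remaining value.
func decodeInt32s(d decoder) ([]int32, error) {
	typed, ok := d.(int32Decoder)
	if !ok {
		return nil, fmt.Errorf("decoder %T cannot decode int32 values", d)
	}
	out := make([]int32, d.ValuesLeft())
	n, err := typed.Decode(out)
	return out[:n], err
}

func main() {
	var d decoder = &sliceInt32Decoder{vals: []int32{1, 2, 3}}
	vals, err := decodeInt32s(d)
	fmt.Println(vals, err) // prints [1 2 3] <nil>
}
```

Using the two-value form of the assertion avoids a panic when the decoder was constructed for a different physical type.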

func NewDecoder

func NewDecoder(t parquet.Type, e parquet.Encoding, descr *schema.Column, mem memory.Allocator) TypedDecoder

NewDecoder constructs a decoder for a given type and encoding.

type TypedEncoder

type TypedEncoder interface {
	// Bytes returns the current slice of bytes that have been encoded but does not pass ownership
	Bytes() []byte
	// Reset resets the encoder and dumps all the data to let it be reused.
	Reset()
	// ReserveForWrite reserves n bytes in the buffer so that the next n bytes written will not
	// cause a memory allocation.
	ReserveForWrite(n int)
	// EstimatedDataEncodedSize returns the estimated number of bytes in the buffer
	// so far.
	EstimatedDataEncodedSize() int64
	// FlushValues finishes up any unwritten data and returns the buffer of data passing
	// ownership to the caller, Release needs to be called on the Buffer to free the memory
	// if error is nil
	FlushValues() (Buffer, error)
	// Encoding returns the type of encoding that this encoder operates with
	Encoding() parquet.Encoding
	// Allocator returns the allocator that was used when creating this encoder
	Allocator() memory.Allocator
	// Type returns the underlying physical type this encodes.
	Type() parquet.Type
}

TypedEncoder is the general interface for all encoder types, which can then be type asserted to a type-specific encoder.

func NewEncoder

func NewEncoder(t parquet.Type, e parquet.Encoding, useDict bool, descr *schema.Column, mem memory.Allocator) TypedEncoder

NewEncoder will return the appropriately typed encoder for the requested physical type and encoding.

If mem is nil, memory.DefaultAllocator will be used.
