Documentation ¶
Overview ¶
Package utils contains internal utilities for the parquet library that are not intended to be exposed to external consumers, such as interfaces, bitmap readers/writers, and the RLE encoder/decoder.
Index ¶
- Constants
- func BytesToBools(in []byte, out []bool)
- func MaxRLEBufferSize(width, numValues int) int
- func MinRLEBufferSize(bitWidth int) int
- type BitReader
- func (b *BitReader) CurOffset() int64
- func (b *BitReader) GetAligned(nbytes int, v interface{}) bool
- func (b *BitReader) GetBatch(bits uint, out []uint64) (int, error)
- func (b *BitReader) GetBatchBools(out []bool) (int, error)
- func (b *BitReader) GetBatchIndex(bits uint, out []IndexType) (i int, err error)
- func (b *BitReader) GetValue(width int) (uint64, bool)
- func (b *BitReader) GetVlqInt() (uint64, bool)
- func (b *BitReader) GetZigZagVlqInt() (int64, bool)
- func (b *BitReader) ReadByte() (byte, error)
- func (b *BitReader) Reset(r reader)
- type BitWriter
- func (b *BitWriter) Clear()
- func (b *BitWriter) Flush(align bool)
- func (b *BitWriter) SkipBytes(nbytes int) (int, error)
- func (b *BitWriter) WriteAligned(val uint64, nbytes int) bool
- func (b *BitWriter) WriteAt(val []byte, off int64) (int, error)
- func (b *BitWriter) WriteValue(v uint64, nbits uint) error
- func (b *BitWriter) WriteVlqInt(v uint64) bool
- func (b *BitWriter) WriteZigZagVlqInt(v int64) bool
- func (b *BitWriter) Written() int
- type BitmapWriter
- type DictionaryConverter
- type IndexType
- type RleDecoder
- func (r *RleDecoder) GetBatch(values []uint64) int
- func (r *RleDecoder) GetBatchSpaced(vals []uint64, nullcount int, validBits []byte, validBitsOffset int64) (int, error)
- func (r *RleDecoder) GetBatchWithDict(dc DictionaryConverter, vals interface{}) (int, error)
- func (r *RleDecoder) GetBatchWithDictByteArray(dc DictionaryConverter, vals []parquet.ByteArray) (int, error)
- func (r *RleDecoder) GetBatchWithDictFixedLenByteArray(dc DictionaryConverter, vals []parquet.FixedLenByteArray) (int, error)
- func (r *RleDecoder) GetBatchWithDictFloat32(dc DictionaryConverter, vals []float32) (int, error)
- func (r *RleDecoder) GetBatchWithDictFloat64(dc DictionaryConverter, vals []float64) (int, error)
- func (r *RleDecoder) GetBatchWithDictInt32(dc DictionaryConverter, vals []int32) (int, error)
- func (r *RleDecoder) GetBatchWithDictInt64(dc DictionaryConverter, vals []int64) (int, error)
- func (r *RleDecoder) GetBatchWithDictInt96(dc DictionaryConverter, vals []parquet.Int96) (int, error)
- func (r *RleDecoder) GetBatchWithDictSpaced(dc DictionaryConverter, vals interface{}, nullCount int, validBits []byte, ...) (int, error)
- func (r *RleDecoder) GetBatchWithDictSpacedByteArray(dc DictionaryConverter, vals []parquet.ByteArray, nullCount int, ...) (totalProcessed int, err error)
- func (r *RleDecoder) GetBatchWithDictSpacedFixedLenByteArray(dc DictionaryConverter, vals []parquet.FixedLenByteArray, nullCount int, ...) (totalProcessed int, err error)
- func (r *RleDecoder) GetBatchWithDictSpacedFloat32(dc DictionaryConverter, vals []float32, nullCount int, validBits []byte, ...) (totalProcessed int, err error)
- func (r *RleDecoder) GetBatchWithDictSpacedFloat64(dc DictionaryConverter, vals []float64, nullCount int, validBits []byte, ...) (totalProcessed int, err error)
- func (r *RleDecoder) GetBatchWithDictSpacedInt32(dc DictionaryConverter, vals []int32, nullCount int, validBits []byte, ...) (totalProcessed int, err error)
- func (r *RleDecoder) GetBatchWithDictSpacedInt64(dc DictionaryConverter, vals []int64, nullCount int, validBits []byte, ...) (totalProcessed int, err error)
- func (r *RleDecoder) GetBatchWithDictSpacedInt96(dc DictionaryConverter, vals []parquet.Int96, nullCount int, validBits []byte, ...) (totalProcessed int, err error)
- func (r *RleDecoder) GetValue() (uint64, bool)
- func (r *RleDecoder) Next() bool
- func (r *RleDecoder) Reset(data *bytes.Reader, width int)
- type RleEncoder
- type TellWrapper
- type WriteCloserTell
- type WriterAtBuffer
- type WriterAtWithLen
- type WriterTell
Constants ¶
const (
MaxIndexType = math.MaxInt32
MinIndexType = math.MinInt32
)
Max and Min constants for the IndexType
const (
MaxValuesPerLiteralRun = (1 << 6) * 8
)
Variables ¶
This section is empty.
Functions ¶
func BytesToBools ¶
BytesToBools efficiently populates a slice of booleans from an input bitmap
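To illustrate what such a conversion involves, the sketch below expands a bitmap into a bool slice. It assumes least-significant-bit-first ordering within each byte; `bytesToBoolsSketch` is a hypothetical stand-in written for this example, not the package's implementation.

```go
package main

import "fmt"

// bytesToBoolsSketch is a hypothetical stand-in for BytesToBools.
// Assumption: bit i of the bitmap is stored in byte i/8 at bit position i%8
// (least significant bit first).
func bytesToBoolsSketch(in []byte, out []bool) {
	for i := range out {
		if i/8 >= len(in) {
			return // bitmap exhausted; leave the remaining entries untouched
		}
		out[i] = (in[i/8]>>uint(i%8))&1 == 1
	}
}

func main() {
	out := make([]bool, 8)
	bytesToBoolsSketch([]byte{0x05}, out) // 0x05 has bits 0 and 2 set
	fmt.Println(out)
}
```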
func MaxRLEBufferSize ¶
func MinRLEBufferSize ¶
Types ¶
type BitReader ¶
type BitReader struct {
// contains filtered or unexported fields
}
BitReader implements functionality for reading bits or bytes buffering up to a uint64 at a time from the reader in order to improve efficiency. It also provides methods to read multiple bytes in one read such as encoded ints/values.
This BitReader is the basis for the other utility classes like RLE decoding and such, providing the necessary functions for interpreting the values.
func NewBitReader ¶
func NewBitReader(r reader) *BitReader
NewBitReader takes in a reader that implements io.Reader, io.ReaderAt and io.Seeker interfaces and returns a BitReader for use with various bit level manipulations.
func (*BitReader) CurOffset ¶
CurOffset returns the current Byte offset into the data that the reader is at.
func (*BitReader) GetAligned ¶
GetAligned reads nbytes from the underlying stream into the passed interface value, returning false if there aren't enough bytes remaining in the stream or if an invalid type is passed. The bytes are read aligned to byte boundaries.
v must be a pointer to a byte or sized uint type (*byte, *uint16, *uint32, *uint64). Encoded values are assumed to be little endian.
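The contract described here (little-endian bytes decoded into a typed pointer, with false signalling a short buffer or an unsupported type) can be sketched as follows. `getAlignedSketch` is a hypothetical helper that works on a plain byte slice. Rejecting an nbytes larger than the target type is an assumption of this sketch and may not match the real method.

```go
package main

import "fmt"

// getAlignedSketch decodes nbytes little-endian bytes from buf into v.
// It returns false if buf is too short, if v is not one of the supported
// pointer types, or (assumption of this sketch) if nbytes exceeds the
// size of the target type.
func getAlignedSketch(buf []byte, nbytes int, v interface{}) bool {
	if nbytes < 0 || nbytes > len(buf) || nbytes > 8 {
		return false
	}
	var val uint64
	for i := 0; i < nbytes; i++ {
		val |= uint64(buf[i]) << (8 * uint(i)) // byte i is the i-th least significant
	}
	switch p := v.(type) {
	case *byte:
		if nbytes > 1 {
			return false
		}
		*p = byte(val)
	case *uint16:
		if nbytes > 2 {
			return false
		}
		*p = uint16(val)
	case *uint32:
		if nbytes > 4 {
			return false
		}
		*p = uint32(val)
	case *uint64:
		*p = val
	default:
		return false // unsupported target type
	}
	return true
}

func main() {
	var x uint32
	ok := getAlignedSketch([]byte{0x01, 0x02, 0x00, 0x00}, 4, &x)
	fmt.Println(ok, x)
}
```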
func (*BitReader) GetBatch ¶
GetBatch fills out by repeatedly decoding values from the stream, each encoded using bits as the number of bits per value. The values are expected to be bit packed, so they are unpacked to populate out.
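For readers unfamiliar with bit packing, here is a minimal sketch of the unpacking step. It assumes values are packed back to back, least significant bit first, which is the layout of bit-packed runs in Parquet's RLE/bit-packing hybrid encoding. `unpackSketch` is hypothetical and works bit by bit for clarity rather than speed.

```go
package main

import "fmt"

// unpackSketch decodes up to len(out) values of `bits` bits each from data
// and returns how many values were decoded.
// Assumption: values are packed LSB first with no padding between them.
func unpackSketch(data []byte, bits uint, out []uint64) int {
	bitPos := uint(0)
	for i := range out {
		if bitPos+bits > uint(len(data))*8 {
			return i // not enough bits left for another value
		}
		var v uint64
		for b := uint(0); b < bits; b++ {
			byteIdx, bitIdx := (bitPos+b)/8, (bitPos+b)%8
			v |= uint64((data[byteIdx]>>bitIdx)&1) << b
		}
		out[i] = v
		bitPos += bits
	}
	return len(out)
}

func main() {
	// The values 0..7 packed with 3 bits each occupy exactly three bytes.
	out := make([]uint64, 8)
	n := unpackSketch([]byte{0x88, 0xC6, 0xFA}, 3, out)
	fmt.Println(n, out)
}
```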
func (*BitReader) GetBatchBools ¶
GetBatchBools is like GetBatch but optimized for reading bits as boolean values
func (*BitReader) GetBatchIndex ¶
GetBatchIndex is like GetBatch but for IndexType (used for dictionary decoding)
func (*BitReader) GetValue ¶
GetValue returns a single value that is bit packed using width as the number of bits and returns false if there weren't enough bits remaining.
func (*BitReader) GetVlqInt ¶
GetVlqInt reads a VLQ encoded int from the stream. The encoded value must start at the beginning of a byte, and this returns false if there weren't enough bytes in the buffer or reader. This will call `ReadByte`, which in turn retrieves byte-aligned values from the reader.
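The decoding loop can be sketched as below, assuming the little-endian base-128 layout (each byte carries 7 payload bits, least significant group first, with the high bit marking continuation). `readVlqSketch` is a hypothetical helper that reads from a byte slice instead of a BitReader and does not check every overflow case.

```go
package main

import "fmt"

// readVlqSketch decodes an unsigned variable-length integer from buf.
// It returns the value, the number of bytes consumed, and false if buf
// ends before the final byte or the encoding is longer than 10 bytes.
func readVlqSketch(buf []byte) (uint64, int, bool) {
	var v uint64
	for i, b := range buf {
		if i >= 10 {
			return 0, 0, false // too long to fit in a uint64
		}
		v |= uint64(b&0x7f) << (7 * uint(i))
		if b&0x80 == 0 {
			return v, i + 1, true // high bit clear: last byte
		}
	}
	return 0, 0, false // ran out of bytes mid-value
}

func main() {
	v, n, ok := readVlqSketch([]byte{0xAC, 0x02})
	fmt.Println(v, n, ok)
}
```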
func (*BitReader) GetZigZagVlqInt ¶
GetZigZagVlqInt reads a zigzag encoded integer, returning false if there weren't enough bytes remaining.
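Zigzag encoding maps signed integers to unsigned ones so that values of small magnitude, positive or negative, get short variable-length encodings. The hypothetical helpers below show the standard mapping in both directions; they illustrate the transform only and are not taken from this package.

```go
package main

import "fmt"

// zigzagEncode maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
func zigzagEncode(v int64) uint64 {
	return uint64(v<<1) ^ uint64(v>>63)
}

// zigzagDecode is the inverse of zigzagEncode.
func zigzagDecode(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

func main() {
	for _, v := range []int64{0, -1, 1, -2, 2} {
		u := zigzagEncode(v)
		fmt.Println(v, u, zigzagDecode(u))
	}
}
```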
type BitWriter ¶
type BitWriter struct {
// contains filtered or unexported fields
}
BitWriter is a utility for writing values of specific bit widths to a stream using a uint64 as a buffer to build up between flushing for efficiency.
func NewBitWriter ¶
func NewBitWriter(w WriterAtWithLen) *BitWriter
NewBitWriter initializes a new bit writer to write to the passed in interface using WriteAt to write the appropriate offsets and values.
func (*BitWriter) Clear ¶
func (b *BitWriter) Clear()
Clear resets the writer so that subsequent writes will start from offset 0, allowing reuse of the underlying buffer and writer.
func (*BitWriter) Flush ¶
Flush will flush any buffered data to the underlying writer. Pass true if the next write should be byte-aligned after this flush.
func (*BitWriter) SkipBytes ¶
SkipBytes reserves the next aligned nbytes, skipping them and returning the offset to use with WriteAt to write to those reserved bytes. Used for RLE encoding to fill in the indicators after encoding.
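The reserve-then-backfill pattern described here can be illustrated without the BitWriter itself. In the hypothetical sketch below, one byte is reserved for a count, the payload is written, and the count is filled in at the remembered offset once it is known. A plain byte slice stands in for the writer, and the count is assumed to fit in one byte.

```go
package main

import "fmt"

// encodeWithHeader writes the non-zero entries of values after a one-byte
// count. The count is not known up front, so its slot is reserved first
// and backfilled at the end. Assumption: fewer than 256 entries are kept.
func encodeWithHeader(values []byte) []byte {
	buf := make([]byte, 0, len(values)+1)
	headerOff := len(buf) // remember where the reserved byte lives
	buf = append(buf, 0)  // placeholder for the count
	count := 0
	for _, v := range values {
		if v == 0 {
			continue // skipped entries are why the count is unknown up front
		}
		buf = append(buf, v)
		count++
	}
	buf[headerOff] = byte(count) // backfill the reserved byte
	return buf
}

func main() {
	fmt.Println(encodeWithHeader([]byte{7, 0, 9}))
}
```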
func (*BitWriter) WriteAligned ¶
WriteAligned writes the value val as a little endian value in exactly nbytes byte-aligned to the underlying writer, flushing via Flush(true) before writing nbytes without buffering.
func (*BitWriter) WriteAt ¶
WriteAt fulfills the io.WriterAt interface to write len(val) bytes from val to the underlying byte slice starting at offset off. It returns the number of bytes written from val (0 <= n <= len(val)) and any error encountered. This allows writing full bytes directly to the underlying writer.
func (*BitWriter) WriteValue ¶
WriteValue writes the value v using nbits to pack it, returning an error if the write fails.
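As an illustration of packing values at a fixed bit width, the sketch below writes each value least significant bit first into a byte slice. `packSketch` is hypothetical: the BitWriter described above accumulates bits in a uint64 buffer before flushing, whereas this version sets one bit at a time for clarity.

```go
package main

import "fmt"

// packSketch packs each value into `bits` bits, LSB first, with no padding
// between values. Bits of a value above the given width are ignored.
func packSketch(values []uint64, bits uint) []byte {
	out := make([]byte, (uint(len(values))*bits+7)/8)
	bitPos := uint(0)
	for _, v := range values {
		for b := uint(0); b < bits; b++ {
			if (v>>b)&1 == 1 {
				out[(bitPos+b)/8] |= byte(1) << ((bitPos + b) % 8)
			}
		}
		bitPos += bits
	}
	return out
}

func main() {
	packed := packSketch([]uint64{0, 1, 2, 3, 4, 5, 6, 7}, 3)
	fmt.Printf("% x\n", packed)
}
```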
func (*BitWriter) WriteVlqInt ¶
WriteVlqInt writes v as a vlq encoded integer byte-aligned to the underlying writer without buffering.
func (*BitWriter) WriteZigZagVlqInt ¶
WriteZigZagVlqInt writes a zigzag encoded integer byte-aligned to the underlying writer without buffering.
type BitmapWriter ¶
type BitmapWriter interface {
// Set sets the current bit that will be written
Set()
// Clear clears the current bit that will be written
Clear()
// Next advances to the next bit for the writer
Next()
// Finish flushes the current byte out to the bitmap slice
Finish()
// AppendWord takes nbits from word which should be an LSB bitmap and appends them to the bitmap.
AppendWord(word uint64, nbits int64)
// AppendBools appends the bit representation of the bools slice, returning the number
// of bools that were able to fit in the remaining length of the bitmapwriter.
AppendBools(in []bool) int
// Pos is the current position that will be written next
Pos() int
// Reset allows reusing the bitmapwriter by resetting Pos to start with length as
// the number of bits that the writer can write.
Reset(start, length int)
}
BitmapWriter is an interface for bitmap writers so that we can use multiple implementations or swap if necessary.
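To make the Set/Clear/Next cycle concrete, here is a deliberately small, hypothetical writer covering only part of the interface (Set, Clear, Next, Pos). It writes straight into the backing slice, so it needs no Finish step. It assumes LSB-first bit order and performs no bounds checks.

```go
package main

import "fmt"

// simpleBitmapWriter is a hypothetical, partial bitmap writer used only to
// illustrate the Set/Clear/Next cycle. It does not implement the full
// BitmapWriter interface.
type simpleBitmapWriter struct {
	buf []byte
	pos int // bit position that will be written next
}

// Set turns the current bit on.
func (w *simpleBitmapWriter) Set() { w.buf[w.pos/8] |= byte(1) << uint(w.pos%8) }

// Clear turns the current bit off.
func (w *simpleBitmapWriter) Clear() { w.buf[w.pos/8] &^= byte(1) << uint(w.pos%8) }

// Next advances to the following bit.
func (w *simpleBitmapWriter) Next() { w.pos++ }

// Pos reports the bit position that will be written next.
func (w *simpleBitmapWriter) Pos() int { return w.pos }

func main() {
	w := &simpleBitmapWriter{buf: make([]byte, 1)}
	for _, bit := range []bool{true, false, true} {
		if bit {
			w.Set()
		} else {
			w.Clear()
		}
		w.Next()
	}
	fmt.Println(w.buf, w.Pos())
}
```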
func NewBitmapWriter ¶
func NewBitmapWriter(bitmap []byte, start, length int) BitmapWriter
func NewFirstTimeBitmapWriter ¶
func NewFirstTimeBitmapWriter(buf []byte, start, length int64) BitmapWriter
NewFirstTimeBitmapWriter creates a bitmap writer that might clobber any bit values following the bits written to the bitmap. In exchange, it is faster than the bitmap writer created with NewBitmapWriter.
type DictionaryConverter ¶
type DictionaryConverter interface {
// Copy takes an interface{} which must be a slice of the appropriate type, and will be populated
// by the dictionary values at the indexes from the IndexType slice
Copy(interface{}, []IndexType) error
// Fill fills interface{} which must be a slice of the appropriate type, with the value
// specified by the dictionary index passed in.
Fill(interface{}, IndexType) error
// FillZero fills interface{}, which must be a slice of the appropriate type, with the zero value
// for the given type.
FillZero(interface{})
// IsValid validates that all of the indexes passed in are valid indexes for the dictionary
IsValid(...IndexType) bool
}
DictionaryConverter is an interface used for dealing with RLE decoding and encoding when working with dictionaries to get values from indexes.
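A converter for a simple in-memory dictionary might look like the following sketch. `int32Dict` is hypothetical and exists only to show how the four methods relate; the error messages and the choice to validate indexes inside Copy and Fill are assumptions, not behavior taken from the library. IndexType is redeclared locally so the example is self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// IndexType mirrors the package's alias so this sketch compiles on its own.
type IndexType = int32

// int32Dict is a hypothetical converter over an in-memory []int32 dictionary.
type int32Dict struct{ values []int32 }

// IsValid reports whether every index refers to an existing dictionary entry.
func (d *int32Dict) IsValid(idxes ...IndexType) bool {
	for _, i := range idxes {
		if i < 0 || int(i) >= len(d.values) {
			return false
		}
	}
	return true
}

// Copy writes the dictionary value for each index into out, which must be []int32.
func (d *int32Dict) Copy(out interface{}, idxes []IndexType) error {
	o, ok := out.([]int32)
	if !ok || len(o) < len(idxes) {
		return errors.New("out must be a []int32 with room for every index")
	}
	if !d.IsValid(idxes...) {
		return errors.New("dictionary index out of range")
	}
	for i, idx := range idxes {
		o[i] = d.values[idx]
	}
	return nil
}

// Fill sets every element of out to the dictionary value at idx.
func (d *int32Dict) Fill(out interface{}, idx IndexType) error {
	o, ok := out.([]int32)
	if !ok || !d.IsValid(idx) {
		return errors.New("bad output slice or dictionary index")
	}
	for i := range o {
		o[i] = d.values[idx]
	}
	return nil
}

// FillZero sets every element of out to zero.
func (d *int32Dict) FillZero(out interface{}) {
	if o, ok := out.([]int32); ok {
		for i := range o {
			o[i] = 0
		}
	}
}

func main() {
	d := &int32Dict{values: []int32{10, 20, 30}}
	out := make([]int32, 3)
	err := d.Copy(out, []IndexType{2, 0, 1})
	fmt.Println(out, err)
}
```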
type IndexType ¶
type IndexType = int32
IndexType is the type we're going to use for Dictionary indexes, currently an alias to int32
type RleDecoder ¶
type RleDecoder struct {
// contains filtered or unexported fields
}
func NewRleDecoder ¶
func NewRleDecoder(data *bytes.Reader, width int) *RleDecoder
func (*RleDecoder) GetBatch ¶
func (r *RleDecoder) GetBatch(values []uint64) int
func (*RleDecoder) GetBatchSpaced ¶
func (*RleDecoder) GetBatchWithDict ¶
func (r *RleDecoder) GetBatchWithDict(dc DictionaryConverter, vals interface{}) (int, error)
func (*RleDecoder) GetBatchWithDictByteArray ¶
func (r *RleDecoder) GetBatchWithDictByteArray(dc DictionaryConverter, vals []parquet.ByteArray) (int, error)
func (*RleDecoder) GetBatchWithDictFixedLenByteArray ¶
func (r *RleDecoder) GetBatchWithDictFixedLenByteArray(dc DictionaryConverter, vals []parquet.FixedLenByteArray) (int, error)
func (*RleDecoder) GetBatchWithDictFloat32 ¶
func (r *RleDecoder) GetBatchWithDictFloat32(dc DictionaryConverter, vals []float32) (int, error)
func (*RleDecoder) GetBatchWithDictFloat64 ¶
func (r *RleDecoder) GetBatchWithDictFloat64(dc DictionaryConverter, vals []float64) (int, error)
func (*RleDecoder) GetBatchWithDictInt32 ¶
func (r *RleDecoder) GetBatchWithDictInt32(dc DictionaryConverter, vals []int32) (int, error)
func (*RleDecoder) GetBatchWithDictInt64 ¶
func (r *RleDecoder) GetBatchWithDictInt64(dc DictionaryConverter, vals []int64) (int, error)
func (*RleDecoder) GetBatchWithDictInt96 ¶
func (r *RleDecoder) GetBatchWithDictInt96(dc DictionaryConverter, vals []parquet.Int96) (int, error)
func (*RleDecoder) GetBatchWithDictSpaced ¶
func (r *RleDecoder) GetBatchWithDictSpaced(dc DictionaryConverter, vals interface{}, nullCount int, validBits []byte, validBitsOffset int64) (int, error)
func (*RleDecoder) GetBatchWithDictSpacedByteArray ¶
func (r *RleDecoder) GetBatchWithDictSpacedByteArray(dc DictionaryConverter, vals []parquet.ByteArray, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (*RleDecoder) GetBatchWithDictSpacedFixedLenByteArray ¶
func (r *RleDecoder) GetBatchWithDictSpacedFixedLenByteArray(dc DictionaryConverter, vals []parquet.FixedLenByteArray, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (*RleDecoder) GetBatchWithDictSpacedFloat32 ¶
func (r *RleDecoder) GetBatchWithDictSpacedFloat32(dc DictionaryConverter, vals []float32, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (*RleDecoder) GetBatchWithDictSpacedFloat64 ¶
func (r *RleDecoder) GetBatchWithDictSpacedFloat64(dc DictionaryConverter, vals []float64, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (*RleDecoder) GetBatchWithDictSpacedInt32 ¶
func (r *RleDecoder) GetBatchWithDictSpacedInt32(dc DictionaryConverter, vals []int32, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (*RleDecoder) GetBatchWithDictSpacedInt64 ¶
func (r *RleDecoder) GetBatchWithDictSpacedInt64(dc DictionaryConverter, vals []int64, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (*RleDecoder) GetBatchWithDictSpacedInt96 ¶
func (r *RleDecoder) GetBatchWithDictSpacedInt96(dc DictionaryConverter, vals []parquet.Int96, nullCount int, validBits []byte, validBitsOffset int64) (totalProcessed int, err error)
func (*RleDecoder) GetValue ¶
func (r *RleDecoder) GetValue() (uint64, bool)
func (*RleDecoder) Next ¶
func (r *RleDecoder) Next() bool
type RleEncoder ¶
type RleEncoder struct {
BitWidth int
// contains filtered or unexported fields
}
func NewRleEncoder ¶
func NewRleEncoder(w WriterAtWithLen, width int) *RleEncoder
func (*RleEncoder) Clear ¶
func (r *RleEncoder) Clear()
func (*RleEncoder) Flush ¶
func (r *RleEncoder) Flush() int
func (*RleEncoder) Put ¶
func (r *RleEncoder) Put(value uint64) error
Put buffers input values 8 at a time. After seeing all 8 values, it decides whether they should be encoded as a literal or repeated run.
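The group-of-8 decision can be sketched in isolation. The code below is a simplified, hypothetical model: it labels a group as a repeated run only when all 8 values are identical, and otherwise as a literal run. The real encoder's bookkeeping (tracking repeat counts across groups, emitting run headers) is not reproduced here.

```go
package main

import "fmt"

// isRepeatedGroup reports whether all values in the group are identical,
// which this sketch treats as the condition for a repeated run.
func isRepeatedGroup(group []uint64) bool {
	for _, v := range group[1:] {
		if v != group[0] {
			return false
		}
	}
	return true
}

// classify walks values in complete groups of 8 and labels each group.
// Trailing values that do not fill a group are ignored in this sketch.
func classify(values []uint64) []string {
	var kinds []string
	for i := 0; i+8 <= len(values); i += 8 {
		if isRepeatedGroup(values[i : i+8]) {
			kinds = append(kinds, "repeated")
		} else {
			kinds = append(kinds, "literal")
		}
	}
	return kinds
}

func main() {
	values := []uint64{5, 5, 5, 5, 5, 5, 5, 5, 0, 1, 2, 3, 4, 5, 6, 7}
	fmt.Println(classify(values))
}
```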
type TellWrapper ¶
TellWrapper wraps any io.Writer to add a Tell function that tracks the position based on calls to Write. It does not take into account any calls to Seek or any Writes that don't go through the TellWrapper.
func (*TellWrapper) Close ¶
func (w *TellWrapper) Close() error
Close makes TellWrapper an io.Closer so that calling Close will also call Close on the wrapped writer if it has a Close function.
func (*TellWrapper) Tell ¶
func (w *TellWrapper) Tell() int64
type WriteCloserTell ¶
type WriteCloserTell interface {
io.WriteCloser
Tell() int64
}
WriteCloserTell is an interface adding a Tell function to a WriteCloser so if the underlying writer has a Close function, it is exposed and not hidden.
type WriterAtBuffer ¶
type WriterAtBuffer struct {
// contains filtered or unexported fields
}
WriterAtBuffer is a convenience struct for providing a WriteAt function to a byte slice for use with things that want an io.WriterAt
func (*WriterAtBuffer) Len ¶
func (w *WriterAtBuffer) Len() int
Len returns the length of the underlying byte slice.
func (*WriterAtBuffer) Reserve ¶
func (w *WriterAtBuffer) Reserve(nbytes int)
func (*WriterAtBuffer) WriteAt ¶
func (w *WriterAtBuffer) WriteAt(p []byte, off int64) (n int, err error)
WriteAt fulfills the io.WriterAt interface to write len(p) bytes from p to the underlying byte slice starting at offset off. It returns the number of bytes written from p (0 <= n <= len(p)) and any error encountered.
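The idea of giving a byte slice a WriteAt method can be sketched as follows. `sliceWriterAt` is hypothetical; in particular, reporting io.ErrShortWrite when the slice is too small (rather than growing it) is an assumption of this sketch and may differ from WriterAtBuffer.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// sliceWriterAt is a hypothetical fixed-size buffer with WriteAt and Len.
type sliceWriterAt struct{ buf []byte }

// Len returns the length of the underlying byte slice.
func (s *sliceWriterAt) Len() int { return len(s.buf) }

// WriteAt copies p into the slice starting at off. It never grows the
// slice: bytes that do not fit are dropped and io.ErrShortWrite is returned.
func (s *sliceWriterAt) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 || off > int64(len(s.buf)) {
		return 0, errors.New("offset out of range")
	}
	n := copy(s.buf[off:], p)
	if n < len(p) {
		return n, io.ErrShortWrite
	}
	return n, nil
}

func main() {
	w := &sliceWriterAt{buf: make([]byte, 4)}
	n, err := w.WriteAt([]byte{1, 2}, 1)
	fmt.Println(n, err, w.buf, w.Len())
}
```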
type WriterAtWithLen ¶
WriterAtWithLen is an interface for an io.WriterAt with a Len function
func NewWriterAtBuffer ¶
func NewWriterAtBuffer(buf []byte) WriterAtWithLen
NewWriterAtBuffer returns an object which fulfills the io.WriterAt interface by taking ownership of the passed in slice.
type WriterTell ¶
WriterTell is an interface that adds a Tell function to an io.Writer