Documentation ¶
Overview ¶
Package colblk implements a columnar block format.
The columnar block format organizes data into columns. The number of columns and their data types are dependent on the use and are configurable.
Format ¶
Every columnar block begins with a header describing the structure and schema of the block. Then column values are encoded in sequence. The block ends with a single padding byte.
The block header begins with:

- Version number (1 byte)
- The number of columns in the block (2 bytes)
- The number of rows in the block (4 bytes)
Then follows a column-header for each column. Each column header encodes the data type (1 byte) and the offset into the block where the column data begins (4 bytes).
+-----------+
| Vers (1B) |
+-------------------+--------------------------------+
| # columns (2B)    | # of rows (4B)                 |
+-----------+-------+---------------------+----------+
| Type (1B) | Page offset (4B)                |   Col 0
+-----------+---------------------------------+
| Type (1B) | Page offset (4B)                |   Col 1
+-----------+---------------------------------+
| ...       | ...                             |
+-----------+---------------------------------+
| Type (1B) | Page offset (4B)                |   Col n-1
+-----------+----------------------------------
| column 0 data                             ...
+----------------------------------------------
| column 1 data                             ...
+----------------------------------------------
| ...
+----------------------------------------------
| column n-1 data                           ...
+-------------+--------------------------------
| Unused (1B) |
+-------------+
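As a concrete illustration of this layout, the following hedged sketch reads the header fields and per-column offsets from a raw block. It assumes little-endian integer encoding and is not the package's own decoding code; the package exposes Header, DecodeHeader and BlockDecoder for this purpose.

import "encoding/binary"

// decodeHeaderSketch is a hypothetical helper illustrating the layout above.
func decodeHeaderSketch(data []byte) (version uint8, cols uint16, rows uint32, colOffsets []uint32) {
	version = data[0]                            // version number (1 byte)
	cols = binary.LittleEndian.Uint16(data[1:3]) // number of columns (2 bytes)
	rows = binary.LittleEndian.Uint32(data[3:7]) // number of rows (4 bytes)
	colOffsets = make([]uint32, cols)
	for i := 0; i < int(cols); i++ {
		h := data[7+i*5:]
		// h[0] is the column's data type (1 byte); h[1:5] is the offset at
		// which the column's data begins within the block (4 bytes).
		colOffsets[i] = binary.LittleEndian.Uint32(h[1:5])
	}
	return version, cols, rows, colOffsets
}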
The encoding of column data itself is dependent on the data type.
The trailing padding byte exists to allow columns to represent the end of column data using a pointer to the byte after the end of the column. The padding byte ensures that the pointer will always fall within the allocated memory. Without the padding byte, such a pointer would be invalid according to Go's pointer rules.
Alignment ¶
Block buffers are required to be word-aligned during both encoding and decoding. This ensures that if any individual column or piece of data requires word alignment, the writer can pad the offset into the block buffer appropriately so that the data itself is word-aligned.
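For example, a column writer that needs 8-byte alignment could round its starting offset up to the next word boundary before writing, along the lines of this small, illustrative sketch:

// alignUp rounds offset up to the next multiple of align, which must be a
// power of two. A writer requiring word alignment would apply this to the
// offset it is handed before encoding its column data.
func alignUp(offset, align uint32) uint32 {
	return (offset + align - 1) &^ (align - 1)
}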
Keyspan blocks ¶
Blocks encoding key spans (range deletions, range keys) decompose the fields of keyspan.Key into columns. Key spans are always stored fragmented, such that all overlapping keys have identical bounds. When encoding key spans to a columnar block, we take advantage of this fragmentation to store the set of unique user key span boundaries in a separate column. This column does not have the same number of rows as the other columns. Each individual fragment stores the index of its start boundary user key within the user key column.
For example, consider the three unfragmented spans:
a-e:{(#0,RANGEDEL)}
b-d:{(#100,RANGEDEL)}
f-g:{(#20,RANGEDEL)}
Once fragmented, these spans become five keyspan.Keys:
a-b:{(#0,RANGEDEL)}
b-d:{(#100,RANGEDEL), (#0,RANGEDEL)}
d-e:{(#0,RANGEDEL)}
f-g:{(#20,RANGEDEL)}
When encoded within a columnar keyspan block, the boundary columns (user key and start indices) would hold six rows:
+-----------------+-----------------+
| User key        | Start index     |
+-----------------+-----------------+
| a               | 0               |
+-----------------+-----------------+
| b               | 1               |
+-----------------+-----------------+
| d               | 3               |
+-----------------+-----------------+
| e               | 4               |
+-----------------+-----------------+
| f               | 4               |
+-----------------+-----------------+
| g               | 5               |
+-----------------+-----------------+
The remaining keyspan.Key columns would look like:
+-----------------+-----------------+-----------------+
| Trailer         | Suffix          | Value           |
+-----------------+-----------------+-----------------+
| (#0,RANGEDEL)   | -               | -               |  (0)
+-----------------+-----------------+-----------------+
| (#100,RANGEDEL) | -               | -               |  (1)
+-----------------+-----------------+-----------------+
| (#0,RANGEDEL)   | -               | -               |  (2)
+-----------------+-----------------+-----------------+
| (#0,RANGEDEL)   | -               | -               |  (3)
+-----------------+-----------------+-----------------+
| (#20,RANGEDEL)  | -               | -               |  (4)
+-----------------+-----------------+-----------------+
This encoding does not explicitly encode the mapping of keyspan.Key to boundary keys. Rather, each boundary key encodes the index of the first key whose span starts at a boundary >= that boundary key. Readers look up the key start index for the start boundary (s) and the end boundary (e). Any keys within indexes [s,e) have the corresponding bounds.
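To make the lookup concrete, here is a hedged sketch using the example above; keysForBoundary is a hypothetical helper, not part of the package API:

// keysForBoundary returns the half-open range [s, e) of keyspan.Key row
// indexes whose spans start at boundary row b and end at boundary row b+1.
// startIndex is the "Start index" column from the boundary table above.
func keysForBoundary(startIndex []int, b int) (s, e int) {
	return startIndex[b], startIndex[b+1]
}

For boundary row 1 (user key b) in the example, this yields [1, 3): the two keys of the b-d fragment.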
Both range deletions and range keys are encoded with the same schema. Range deletion keyspan.Keys never contain suffixes or values. When one of these columns is encoded, the RawBytes encoding uses uintEncodingAllZero to avoid encoding N offsets. Each of these empty columns is encoded in just 1 byte of column data.
Index ¶
- Constants
- func AssertKeyCompare(comparer *base.Comparer, a, b []byte, kcmp KeyComparison)
- func Clone[V any](a Array[V], n int) []V
- func DecodeColumn[V any](d *BlockDecoder, col int, rows int, dataType DataType, ...) V
- func FinishBlock(rows int, writers []ColumnWriter) []byte
- func InitDataBlockMetadata(schema *KeySchema, md *block.Metadata, data []byte) (err error)
- func InitIndexBlockMetadata(md *block.Metadata, data []byte) (err error)
- func InitKeyspanBlockMetadata(md *block.Metadata, data []byte) (err error)
- type Array
- type Bitmap
- type BitmapBuilder
- func (b *BitmapBuilder) DataType(int) DataType
- func (b *BitmapBuilder) Finish(col, nRows int, offset uint32, buf []byte) uint32
- func (b *BitmapBuilder) Invert(nRows int)
- func (b *BitmapBuilder) InvertedSize(rows int, offset uint32) uint32
- func (b *BitmapBuilder) NumColumns() int
- func (b *BitmapBuilder) Reset()
- func (b *BitmapBuilder) Set(i int)
- func (b *BitmapBuilder) Size(rows int, offset uint32) uint32
- func (b *BitmapBuilder) WriteDebug(w io.Writer, rows int)
- type BlockDecoder
- func (d *BlockDecoder) Bitmap(col int) Bitmap
- func (d *BlockDecoder) DataType(col int) DataType
- func (d *BlockDecoder) FormattedString() string
- func (d *BlockDecoder) Init(data []byte, customHeaderSize uint32)
- func (d *BlockDecoder) PrefixBytes(col int) PrefixBytes
- func (d *BlockDecoder) RawBytes(col int) RawBytes
- func (d *BlockDecoder) Rows() int
- func (d *BlockDecoder) Uints(col int) UnsafeUints
- type ColumnWriter
- type DataBlockDecoder
- func (d *DataBlockDecoder) BlockDecoder() *BlockDecoder
- func (d *DataBlockDecoder) Describe(f *binfmt.Formatter, tp treeprinter.Node)
- func (d *DataBlockDecoder) Init(schema *KeySchema, data []byte)
- func (d *DataBlockDecoder) KeySchemaHeader() []byte
- func (d *DataBlockDecoder) PrefixChanged() Bitmap
- func (d *DataBlockDecoder) Validate(comparer *base.Comparer, keySchema *KeySchema) error
- type DataBlockEncoder
- func (w *DataBlockEncoder) Add(ikey base.InternalKey, value []byte, valuePrefix block.ValuePrefix, ...)
- func (w *DataBlockEncoder) Finish(rows, size int) (finished []byte, lastKey base.InternalKey)
- func (w *DataBlockEncoder) Init(schema *KeySchema)
- func (w *DataBlockEncoder) MaterializeLastUserKey(appendTo []byte) []byte
- func (w *DataBlockEncoder) Reset()
- func (w *DataBlockEncoder) Rows() int
- func (w *DataBlockEncoder) Size() int
- func (w *DataBlockEncoder) String() string
- type DataBlockIter
- func (i *DataBlockIter) Close() error
- func (i *DataBlockIter) DebugTree(tp treeprinter.Node)
- func (i *DataBlockIter) Error() error
- func (i *DataBlockIter) First() *base.InternalKV
- func (i *DataBlockIter) Handle() block.BufferHandle
- func (i *DataBlockIter) Init(d *DataBlockDecoder, transforms block.IterTransforms) error
- func (i *DataBlockIter) InitHandle(comparer *base.Comparer, h block.BufferHandle, transforms block.IterTransforms) error
- func (i *DataBlockIter) InitOnce(keySchema *KeySchema, comparer *base.Comparer, ...)
- func (i *DataBlockIter) Invalidate()
- func (i *DataBlockIter) IsDataInvalidated() bool
- func (i *DataBlockIter) IsLowerBound(k []byte) bool
- func (i *DataBlockIter) KV() *base.InternalKV
- func (i *DataBlockIter) Last() *base.InternalKV
- func (i *DataBlockIter) Next() *base.InternalKV
- func (i *DataBlockIter) NextPrefix(_ []byte) *base.InternalKV
- func (i *DataBlockIter) Prev() *base.InternalKV
- func (i *DataBlockIter) SeekGE(key []byte, flags base.SeekGEFlags) *base.InternalKV
- func (i *DataBlockIter) SeekLT(key []byte, _ base.SeekLTFlags) *base.InternalKV
- func (i *DataBlockIter) SeekPrefixGE(prefix, key []byte, flags base.SeekGEFlags) *base.InternalKV
- func (i *DataBlockIter) SetBounds(lower, upper []byte)
- func (i *DataBlockIter) SetContext(_ context.Context)
- func (i *DataBlockIter) String() string
- func (i *DataBlockIter) Valid() bool
- type DataBlockRewriter
- type DataType
- type DecodeFunc
- type Encoder
- type Header
- type IndexBlockDecoder
- type IndexBlockWriter
- func (w *IndexBlockWriter) AddBlockHandle(separator []byte, handle block.Handle, blockProperties []byte) int
- func (w *IndexBlockWriter) Finish(rows int) []byte
- func (w *IndexBlockWriter) Init()
- func (w *IndexBlockWriter) Reset()
- func (w *IndexBlockWriter) Rows() int
- func (w *IndexBlockWriter) Size() int
- func (w *IndexBlockWriter) UnsafeSeparator(i int) []byte
- type IndexIter
- func (i *IndexIter) BlockHandleWithProperties() (block.HandleWithProperties, error)
- func (i *IndexIter) Close() error
- func (i *IndexIter) First() bool
- func (i *IndexIter) Handle() block.BufferHandle
- func (i *IndexIter) Init(comparer *base.Comparer, blk []byte, transforms block.IterTransforms) error
- func (i *IndexIter) InitHandle(comparer *base.Comparer, blk block.BufferHandle, ...) error
- func (i *IndexIter) InitWithDecoder(comparer *base.Comparer, d *IndexBlockDecoder, transforms block.IterTransforms)
- func (i *IndexIter) Invalidate()
- func (i *IndexIter) IsDataInvalidated() bool
- func (i *IndexIter) Last() bool
- func (i *IndexIter) Next() bool
- func (i *IndexIter) Prev() bool
- func (i *IndexIter) RowIndex() int
- func (i *IndexIter) SeekGE(key []byte) bool
- func (i *IndexIter) Separator() []byte
- func (i *IndexIter) SeparatorGT(key []byte, inclusively bool) bool
- func (i *IndexIter) SeparatorLT(key []byte) bool
- func (i *IndexIter) Valid() bool
- type KeyComparison
- type KeySchema
- type KeySeeker
- type KeySeekerMetadata
- type KeyWriter
- type KeyspanBlockWriter
- func (w *KeyspanBlockWriter) AddSpan(s keyspan.Span)
- func (w *KeyspanBlockWriter) Finish() []byte
- func (w *KeyspanBlockWriter) Init(equal base.Equal)
- func (w *KeyspanBlockWriter) KeyCount() int
- func (w *KeyspanBlockWriter) Reset()
- func (w *KeyspanBlockWriter) Size() int
- func (w *KeyspanBlockWriter) String() string
- func (w *KeyspanBlockWriter) UnsafeBoundaryKeys() (smallest, largest base.InternalKey)
- func (w *KeyspanBlockWriter) UnsafeLastSpan() (start, end []byte, largestTrailer base.InternalKeyTrailer)
- type KeyspanDecoder
- type KeyspanIter
- func (i *KeyspanIter) Close()
- func (i *KeyspanIter) DebugTree(tp treeprinter.Node)
- func (i *KeyspanIter) First() (*keyspan.Span, error)
- func (i *KeyspanIter) Last() (*keyspan.Span, error)
- func (i *KeyspanIter) Next() (*keyspan.Span, error)
- func (i *KeyspanIter) Prev() (*keyspan.Span, error)
- func (i *KeyspanIter) SeekGE(key []byte) (*keyspan.Span, error)
- func (i *KeyspanIter) SeekLT(key []byte) (*keyspan.Span, error)
- func (i *KeyspanIter) SetContext(context.Context)
- func (i *KeyspanIter) WrapChildren(keyspan.WrapFn)
- type PrefixBytes
- func (b PrefixBytes) At(i int) []byte
- func (b *PrefixBytes) BundleCount() int
- func (b *PrefixBytes) BundlePrefix(i int) []byte
- func (b *PrefixBytes) RowBundlePrefix(row int) []byte
- func (b *PrefixBytes) RowSuffix(row int) []byte
- func (b *PrefixBytes) Rows() int
- func (b *PrefixBytes) Search(k []byte) (rowIndex int, isEqual bool)
- func (b *PrefixBytes) SetAt(it *PrefixBytesIter, i int)
- func (b *PrefixBytes) SetNext(it *PrefixBytesIter)
- func (b *PrefixBytes) SharedPrefix() []byte
- func (b *PrefixBytes) UnsafeFirstSlice() []byte
- type PrefixBytesBuilder
- func (b *PrefixBytesBuilder) DataType(int) DataType
- func (b *PrefixBytesBuilder) Finish(col int, rows int, offset uint32, buf []byte) (endOffset uint32)
- func (b *PrefixBytesBuilder) Init(bundleSize int)
- func (b *PrefixBytesBuilder) NumColumns() int
- func (b *PrefixBytesBuilder) Put(key []byte, bytesSharedWithPrev int)
- func (b *PrefixBytesBuilder) Reset()
- func (b *PrefixBytesBuilder) Rows() int
- func (b *PrefixBytesBuilder) Size(rows int, offset uint32) uint32
- func (b *PrefixBytesBuilder) UnsafeGet(i int) []byte
- func (b *PrefixBytesBuilder) WriteDebug(w io.Writer, rows int)
- type PrefixBytesIter
- type RawBytes
- type RawBytesBuilder
- func (b *RawBytesBuilder) DataType(int) DataType
- func (b *RawBytesBuilder) Finish(col, rows int, offset uint32, buf []byte) uint32
- func (b *RawBytesBuilder) Init()
- func (b *RawBytesBuilder) NumColumns() int
- func (b *RawBytesBuilder) Put(s []byte)
- func (b *RawBytesBuilder) PutConcat(s1, s2 []byte)
- func (b *RawBytesBuilder) Reset()
- func (b *RawBytesBuilder) Rows() int
- func (b *RawBytesBuilder) Size(rows int, offset uint32) uint32
- func (b *RawBytesBuilder) UnsafeGet(i int) []byte
- func (b *RawBytesBuilder) WriteDebug(w io.Writer, rows int)
- type Uint
- type UintBuilder
- func (b *UintBuilder) DataType(int) DataType
- func (b *UintBuilder) Finish(col, rows int, offset uint32, buf []byte) uint32
- func (b *UintBuilder) Get(row int) uint64
- func (b *UintBuilder) Init()
- func (b *UintBuilder) InitWithDefault()
- func (b *UintBuilder) NumColumns() int
- func (b *UintBuilder) Reset()
- func (b *UintBuilder) Set(row int, v uint64)
- func (b *UintBuilder) Size(rows int, offset uint32) uint32
- func (b *UintBuilder) WriteDebug(w io.Writer, rows int)
- type UintEncoding
- type UnsafeBuf
- type UnsafeOffsets
- type UnsafeRawSlice
- type UnsafeUints
- type Version
Constants ¶
const KeySeekerMetadataSize = 176
KeySeekerMetadataSize is chosen to fit the CockroachDB key seeker implementation.
const UintEncodingRowThreshold = 8
UintEncodingRowThreshold is the threshold under which the number of rows can affect the best encoding. This happens when the constant 8 bytes for the delta base doesn't make up for saving a byte or two in the per-row encoding.
Variables ¶
This section is empty.
Functions ¶
func AssertKeyCompare ¶
func AssertKeyCompare(comparer *base.Comparer, a, b []byte, kcmp KeyComparison)
AssertKeyCompare compares two keys using the provided comparer, ensuring that the provided KeyComparison accurately describes the result. Panics if the assertion does not hold.
func DecodeColumn ¶
func DecodeColumn[V any](
	d *BlockDecoder, col int, rows int, dataType DataType, decodeFunc DecodeFunc[V],
) V
DecodeColumn decodes the col'th column of the provided reader's block as a column of dataType using decodeFunc.
func FinishBlock ¶
func FinishBlock(rows int, writers []ColumnWriter) []byte
FinishBlock writes the columnar block to a heap-allocated byte slice. FinishBlock assumes all columns have the same number of rows. If that's not the case, the caller should manually construct their own block.
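A minimal encode-side sketch, assuming the builders' zero values (plus Init where provided) are ready for use; the values and row count are illustrative:

var uints UintBuilder
uints.Init()
var bools BitmapBuilder
const rows = 3
for i := 0; i < rows; i++ {
	uints.Set(i, uint64(i)*10) // arbitrary example values
	if i%2 == 0 {
		bools.Set(i) // bits default to false; set even rows to true
	}
}
// Both columns hold the same number of rows, as FinishBlock requires.
blockData := FinishBlock(rows, []ColumnWriter{&uints, &bools})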
func InitDataBlockMetadata ¶
InitDataBlockMetadata initializes the metadata for a data block.
func InitIndexBlockMetadata ¶
InitIndexBlockMetadata initializes the metadata for an index block.
Types ¶
type Bitmap ¶
type Bitmap struct {
// contains filtered or unexported fields
}
Bitmap is a bitmap structure built on a []uint64. A bitmap utilizes ~1 physical bit/logical bit (~0.125 bytes/row). The bitmap is encoded into an 8-byte aligned array of 64-bit words which is (nRows+63)/64 words in length.
A summary bitmap is stored after the primary bitmap in which each bit in the summary bitmap corresponds to 1 word in the primary bitmap. A bit is set in the summary bitmap if the corresponding word in the primary bitmap is non-zero. The summary bitmap accelerates predecessor and successor operations.
func DecodeBitmap ¶
DecodeBitmap decodes the structure of a Bitmap and returns a Bitmap that reads from b supporting bitCount logical bits. No bounds checking is performed, so the caller must guarantee the bitmap is appropriately sized and the provided bitCount correctly identifies the number of bits in the bitmap.
func (Bitmap) SeekSetBitGE ¶
SeekSetBitGE returns the next bit greater than or equal to i set in the bitmap. The i parameter must be ≥ 0. Returns the number of bits represented by the bitmap if no next bit is set (or if i >= bitCount).
func (Bitmap) SeekSetBitLE ¶
SeekSetBitLE returns the previous bit less than or equal to i set in the bitmap. The i parameter must be in [0, bitCount). Returns -1 if no previous bit is set.
func (Bitmap) SeekUnsetBitGE ¶
SeekUnsetBitGE returns the next bit greater than or equal to i that is unset in the bitmap. The i parameter must be in [0, bitCount). Returns the number of bits represented by the bitmap if no next bit is unset.
func (Bitmap) SeekUnsetBitLE ¶
SeekUnsetBitLE returns the previous bit less than or equal to i that is unset in the bitmap. The i parameter must be in [0, bitCount). Returns -1 if no previous bit is unset.
type BitmapBuilder ¶
type BitmapBuilder struct {
// contains filtered or unexported fields
}
BitmapBuilder constructs a Bitmap. Bits default to false.
func (*BitmapBuilder) DataType ¶
func (b *BitmapBuilder) DataType(int) DataType
DataType implements the ColumnWriter interface.
func (*BitmapBuilder) Finish ¶
func (b *BitmapBuilder) Finish(col, nRows int, offset uint32, buf []byte) uint32
Finish finalizes the bitmap, computing the per-word summary bitmap and writing the resulting data to buf at offset.
func (*BitmapBuilder) Invert ¶
func (b *BitmapBuilder) Invert(nRows int)
Invert inverts the bitmap, setting all bits that are not set and clearing all bits that are set. If the bitmap's tail is sparse and is not large enough to represent nRows rows, it's first materialized.
Note that Invert can affect the Size of the bitmap. Use InvertedSize() if you intend to invert the bitmap before finishing.
func (*BitmapBuilder) InvertedSize ¶
func (b *BitmapBuilder) InvertedSize(rows int, offset uint32) uint32
InvertedSize returns the size of the encoded bitmap, assuming Invert will be called.
func (*BitmapBuilder) NumColumns ¶
func (b *BitmapBuilder) NumColumns() int
NumColumns implements the ColumnWriter interface.
func (*BitmapBuilder) Reset ¶
func (b *BitmapBuilder) Reset()
Reset resets the bitmap to the empty state.
func (*BitmapBuilder) Set ¶
func (b *BitmapBuilder) Set(i int)
Set sets the bit at position i to true.
func (*BitmapBuilder) Size ¶
func (b *BitmapBuilder) Size(rows int, offset uint32) uint32
Size implements the ColumnWriter interface.
func (*BitmapBuilder) WriteDebug ¶
func (b *BitmapBuilder) WriteDebug(w io.Writer, rows int)
WriteDebug implements the ColumnWriter interface.
type BlockDecoder ¶
type BlockDecoder struct {
// contains filtered or unexported fields
}
A BlockDecoder holds metadata for accessing the columns of a columnar block.
func DecodeBlock ¶
func DecodeBlock(data []byte, customHeaderSize uint32) BlockDecoder
DecodeBlock decodes the header of the provided columnar block and returns a new BlockDecoder configured to read from the block. The caller must ensure that the data is formatted according to the block layout specification.
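Continuing the FinishBlock sketch from the Functions section, the read side could look like this (illustrative only; blockData is the slice returned by FinishBlock):

d := DecodeBlock(blockData, 0) // this example block has no custom header
n := d.Rows()                  // 3 rows, per the block header
uintCol := d.Uints(0)          // column 0 was written by a UintBuilder
boolCol := d.Bitmap(1)         // column 1 was written by a BitmapBuilder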
func (*BlockDecoder) Bitmap ¶
func (d *BlockDecoder) Bitmap(col int) Bitmap
Bitmap retrieves the col'th column as a bitmap. The column must be of type DataTypeBool.
func (*BlockDecoder) DataType ¶
func (d *BlockDecoder) DataType(col int) DataType
DataType returns the data type of the col'th column. Every column's data type is encoded within the block header.
func (*BlockDecoder) FormattedString ¶
func (d *BlockDecoder) FormattedString() string
FormattedString returns a formatted representation of the block's binary data.
func (*BlockDecoder) Init ¶
func (d *BlockDecoder) Init(data []byte, customHeaderSize uint32)
Init initializes a BlockDecoder with the data contained in the provided block.
func (*BlockDecoder) PrefixBytes ¶
func (d *BlockDecoder) PrefixBytes(col int) PrefixBytes
PrefixBytes retrieves the col'th column as a prefix-compressed byte slice column. The column must be of type DataTypePrefixBytes.
func (*BlockDecoder) RawBytes ¶
func (d *BlockDecoder) RawBytes(col int) RawBytes
RawBytes retrieves the col'th column as a column of byte slices. The column must be of type DataTypeBytes.
func (*BlockDecoder) Rows ¶
func (d *BlockDecoder) Rows() int
Rows returns the number of rows in the block, as indicated by the block header.
func (*BlockDecoder) Uints ¶
func (d *BlockDecoder) Uints(col int) UnsafeUints
Uints retrieves the col'th column as a column of uints. The column must be of type DataTypeUint.
type ColumnWriter ¶
type ColumnWriter interface {
	Encoder
	// NumColumns returns the number of columns the ColumnWriter will encode.
	NumColumns() int
	// DataType returns the data type of the col'th column.
	DataType(col int) DataType
	// Finish serializes the column at the specified index, writing the column's
	// data to buf at offset, and returning the offset at which the next column
	// should be encoded.
	//
	// The supplied buf must have enough space at the provided offset to fit the
	// column. The caller may use Size() to calculate the exact size required.
	// The caller passes the number of rows they want to serialize. All
	// implementations of Finish must support cases where rows is the number of
	// rows the caller has set, or one less. Some implementations may be more
	// permissive.
	//
	// The provided column index must be less than NumColumns(). Finish is
	// called for each index < NumColumns() in order.
	//
	// The provided buf must be word-aligned (at offset 0). If a column writer
	// requires a particular alignment, it's responsible for padding offset
	// appropriately first.
	Finish(col, rows int, offset uint32, buf []byte) (nextOffset uint32)
}
ColumnWriter is an interface implemented by column encoders that accumulate a column's values and then serialize them.
type DataBlockDecoder ¶
type DataBlockDecoder struct {
// contains filtered or unexported fields
}
A DataBlockDecoder holds state for interpreting a columnar data block. It may be shared among multiple DataBlockIters.
func (*DataBlockDecoder) BlockDecoder ¶
func (d *DataBlockDecoder) BlockDecoder() *BlockDecoder
BlockDecoder returns a pointer to the underlying BlockDecoder.
func (*DataBlockDecoder) Describe ¶
func (d *DataBlockDecoder) Describe(f *binfmt.Formatter, tp treeprinter.Node)
Describe describes the binary format of the data block, assuming f.Offset() is positioned at the beginning of the same data block described by d.
func (*DataBlockDecoder) Init ¶
func (d *DataBlockDecoder) Init(schema *KeySchema, data []byte)
Init initializes the data block reader with the given serialized data block.
func (*DataBlockDecoder) KeySchemaHeader ¶
func (d *DataBlockDecoder) KeySchemaHeader() []byte
KeySchemaHeader returns the KeySchema-specific header.
func (*DataBlockDecoder) PrefixChanged ¶
func (d *DataBlockDecoder) PrefixChanged() Bitmap
PrefixChanged returns the prefix-changed bitmap.
type DataBlockEncoder ¶
type DataBlockEncoder struct {
	Schema    *KeySchema
	KeyWriter KeyWriter
	// contains filtered or unexported fields
}
DataBlockEncoder encodes columnar data blocks using a user-defined schema.
func (*DataBlockEncoder) Add ¶
func (w *DataBlockEncoder) Add(
	ikey base.InternalKey,
	value []byte,
	valuePrefix block.ValuePrefix,
	kcmp KeyComparison,
	isObsolete bool,
)
Add adds the provided key to the data block. Keys must be added in order. The caller must supply a KeyComparison containing the comparison of the key to the previously-added key, obtainable through
KeyWriter.ComparePrev(ikey.UserKey)
The caller is required to pass this in because in expected use cases, the caller will also require the same information.
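The expected calling pattern can be sketched as follows (w, ikey, value and valuePrefix are assumed to be in scope; this is illustrative, not prescriptive):

// Compute the comparison against the previously-added key once, then pass
// it to Add alongside the key.
kcmp := w.KeyWriter.ComparePrev(ikey.UserKey)
w.Add(ikey, value, valuePrefix, kcmp, false /* isObsolete */)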
func (*DataBlockEncoder) Finish ¶
func (w *DataBlockEncoder) Finish(rows, size int) (finished []byte, lastKey base.InternalKey)
Finish serializes the pending data block, including the first [rows] rows. The value of [rows] must be Rows() or Rows()-1. The provided size must be the size of the data block with the provided row count (i.e., the return value of [Size] when DataBlockEncoder.Rows() = [rows]).
Finish returns the serialized, uncompressed data block and the InternalKey of the last key contained within the data block. The memory of the lastKey's UserKey is owned by the DataBlockEncoder. The caller must copy it if they require it to outlive a Reset of the writer.
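A hedged sketch of finishing with all pending rows, taking the row count and size from the encoder itself so they satisfy the contract above:

rows, size := w.Rows(), w.Size()
serialized, lastKey := w.Finish(rows, size)
// lastKey.UserKey points into memory owned by the encoder; copy it before
// calling Reset if it must outlive the writer's reuse.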
func (*DataBlockEncoder) Init ¶
func (w *DataBlockEncoder) Init(schema *KeySchema)
Init initializes the data block writer.
func (*DataBlockEncoder) MaterializeLastUserKey ¶
func (w *DataBlockEncoder) MaterializeLastUserKey(appendTo []byte) []byte
MaterializeLastUserKey materializes the last added user key.
func (*DataBlockEncoder) Reset ¶
func (w *DataBlockEncoder) Reset()
Reset resets the data block writer to its initial state, retaining buffers.
func (*DataBlockEncoder) Rows ¶
func (w *DataBlockEncoder) Rows() int
Rows returns the number of rows in the current pending data block.
func (*DataBlockEncoder) Size ¶
func (w *DataBlockEncoder) Size() int
Size returns the size of the current pending data block.
func (*DataBlockEncoder) String ¶
func (w *DataBlockEncoder) String() string
String outputs a human-readable summary of internal DataBlockEncoder state.
type DataBlockIter ¶
type DataBlockIter struct {
// contains filtered or unexported fields
}
DataBlockIter iterates over a columnar data block.
func (*DataBlockIter) Close ¶
func (i *DataBlockIter) Close() error
Close implements the base.InternalIterator interface.
func (*DataBlockIter) DebugTree ¶
func (i *DataBlockIter) DebugTree(tp treeprinter.Node)
DebugTree is part of the InternalIterator interface.
func (*DataBlockIter) Error ¶
func (i *DataBlockIter) Error() error
Error implements the base.InternalIterator interface. A DataBlockIter is infallible and always returns a nil error.
func (*DataBlockIter) First ¶
func (i *DataBlockIter) First() *base.InternalKV
First implements the base.InternalIterator interface.
func (*DataBlockIter) Handle ¶
func (i *DataBlockIter) Handle() block.BufferHandle
Handle returns the handle to the block.
func (*DataBlockIter) Init ¶
func (i *DataBlockIter) Init(d *DataBlockDecoder, transforms block.IterTransforms) error
Init initializes the data block iterator, configuring it to read from the provided decoder.
func (*DataBlockIter) InitHandle ¶
func (i *DataBlockIter) InitHandle(
	comparer *base.Comparer, h block.BufferHandle, transforms block.IterTransforms,
) error
InitHandle initializes the block from the provided buffer handle. InitHandle assumes that the block's metadata was initialized using InitDataBlockMetadata().
func (*DataBlockIter) InitOnce ¶
func (i *DataBlockIter) InitOnce(
	keySchema *KeySchema,
	comparer *base.Comparer,
	getLazyValuer block.GetLazyValueForPrefixAndValueHandler,
)
InitOnce configures the data block iterator's key schema and lazy value handler. The iterator must be initialized with a block before it can be used. It may be reinitialized with new blocks without calling InitOnce again.
func (*DataBlockIter) Invalidate ¶
func (i *DataBlockIter) Invalidate()
Invalidate invalidates the block iterator, removing references to the block it was initialized with. The iterator may continue to be used after a call to Invalidate, but all positioning methods should return false. Valid() must also return false.
func (*DataBlockIter) IsDataInvalidated ¶
func (i *DataBlockIter) IsDataInvalidated() bool
IsDataInvalidated returns true when the iterator has been invalidated using an Invalidate call.
func (*DataBlockIter) IsLowerBound ¶
func (i *DataBlockIter) IsLowerBound(k []byte) bool
IsLowerBound implements the block.DataBlockIterator interface.
func (*DataBlockIter) KV ¶
func (i *DataBlockIter) KV() *base.InternalKV
KV returns the key-value pair at the current iterator position. The iterator must be positioned over a valid KV.
func (*DataBlockIter) Last ¶
func (i *DataBlockIter) Last() *base.InternalKV
Last implements the base.InternalIterator interface.
func (*DataBlockIter) Next ¶
func (i *DataBlockIter) Next() *base.InternalKV
Next advances to the next KV pair in the block.
func (*DataBlockIter) NextPrefix ¶
func (i *DataBlockIter) NextPrefix(_ []byte) *base.InternalKV
NextPrefix moves the iterator to the next row with a different prefix than the key at the current iterator position.
The columnar block implementation uses a newPrefix bitmap to identify the next row with a differing prefix from the current row's key. If newPrefix[i] is set, then row i's key prefix differs from that of row i-1. The bitmap is organized as a slice of 64-bit words. If a 64-bit word in the bitmap is zero, then all of the rows corresponding to the bits in that word have the same prefix and we can skip ahead. If a word is non-zero, a small amount of bit shifting and masking combined with bits.TrailingZeros64 can identify the next bit that is set after the current row. The bitmap uses 1 bit/row (0.125 bytes/row). A 32KB block containing 1300 rows (25 bytes/row) would need a bitmap of 21 64-bit words. Even in the worst case where every word is 0, this bitmap can be scanned in ~20 ns (1 ns/word), leading to a total NextPrefix time of ~30 ns if a row is found and decodeRow is called. In more normal cases, NextPrefix takes ~15% longer than a single Next call.
For comparison, the rowblk nextPrefixV3 optimizations work by setting a bit in the value prefix byte that indicates that the current key has the same prefix as the previous key. Additionally, a bit is stolen from the restart table entries indicating whether a restart table entry has the same key prefix as the previous entry. Checking the value prefix byte bit requires locating that byte which requires decoding 3 varints per key/value pair.
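The word-skipping scan can be sketched as follows. This illustrates the technique on a plain []uint64 bitmap (bit i set when row i starts a new prefix) and is not the package's implementation:

import "math/bits"

// nextSetBit returns the index of the first set bit at a position greater
// than i, or -1 if none exists. Zero words are skipped 64 rows at a time.
func nextSetBit(words []uint64, i int) int {
	j := i + 1
	w := j >> 6 // word containing bit j
	if w >= len(words) {
		return -1
	}
	// Mask off bits at positions <= i within the current word.
	cur := words[w] &^ (uint64(1)<<(uint(j)&63) - 1)
	for cur == 0 {
		w++
		if w >= len(words) {
			return -1
		}
		cur = words[w]
	}
	return w<<6 + bits.TrailingZeros64(cur)
}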
func (*DataBlockIter) Prev ¶
func (i *DataBlockIter) Prev() *base.InternalKV
Prev moves the iterator to the previous KV pair in the block.
func (*DataBlockIter) SeekGE ¶
func (i *DataBlockIter) SeekGE(key []byte, flags base.SeekGEFlags) *base.InternalKV
SeekGE implements the base.InternalIterator interface.
func (*DataBlockIter) SeekLT ¶
func (i *DataBlockIter) SeekLT(key []byte, _ base.SeekLTFlags) *base.InternalKV
SeekLT implements the base.InternalIterator interface.
func (*DataBlockIter) SeekPrefixGE ¶
func (i *DataBlockIter) SeekPrefixGE(prefix, key []byte, flags base.SeekGEFlags) *base.InternalKV
SeekPrefixGE implements the base.InternalIterator interface.
func (*DataBlockIter) SetBounds ¶
func (i *DataBlockIter) SetBounds(lower, upper []byte)
SetBounds implements the base.InternalIterator interface.
func (*DataBlockIter) SetContext ¶
func (i *DataBlockIter) SetContext(_ context.Context)
SetContext implements the base.InternalIterator interface.
func (*DataBlockIter) String ¶
func (i *DataBlockIter) String() string
String implements the base.InternalIterator interface.
func (*DataBlockIter) Valid ¶
func (i *DataBlockIter) Valid() bool
Valid returns true if the iterator is currently positioned at a valid KV.
type DataBlockRewriter ¶
type DataBlockRewriter struct {
	KeySchema *KeySchema
	// contains filtered or unexported fields
}
DataBlockRewriter rewrites data blocks. See RewriteSuffixes.
func NewDataBlockRewriter ¶
func NewDataBlockRewriter(keySchema *KeySchema, comparer *base.Comparer) *DataBlockRewriter
NewDataBlockRewriter creates a block rewriter.
func (*DataBlockRewriter) RewriteSuffixes ¶
func (rw *DataBlockRewriter) RewriteSuffixes(
	input []byte, from []byte, to []byte,
) (start, end base.InternalKey, rewritten []byte, err error)
RewriteSuffixes rewrites the input block. It expects the input block to only contain keys with the suffix `from`. It rewrites the block to contain the same keys with the suffix `to`.
RewriteSuffixes returns the start and end keys of the rewritten block, and the finished rewritten block. The returned start and end keys have indefinite lifetimes. The returned rewritten block is owned by the DataBlockRewriter. If it must be retained beyond the next call to RewriteSuffixes, the caller should make a copy.
Note that the input slice must be 8-byte aligned.
type DataType ¶
type DataType uint8
DataType describes the logical type of a column's values. Some data types have multiple possible physical representations. Encoding a column may choose between possible physical representations depending on the distribution of values and the size of the resulting physical representation.
const (
	// DataTypeInvalid represents an unset or invalid data type.
	DataTypeInvalid DataType = 0
	// DataTypeBool is a data type encoding a bool per row.
	DataTypeBool DataType = 1
	// DataTypeUint is a data type encoding a fixed 8 bits per row.
	DataTypeUint DataType = 2
	// DataTypeBytes is a data type encoding a variable-length byte string per
	// row.
	DataTypeBytes DataType = 3
	// DataTypePrefixBytes is a data type encoding variable-length,
	// lexicographically-sorted byte strings, with prefix compression.
	DataTypePrefixBytes DataType = 4
)
type DecodeFunc ¶
A DecodeFunc decodes a data structure from a byte slice, returning an accessor for the data and the offset of the first byte after the structure. The rows argument must be the number of logical rows encoded within the data structure.
type Encoder ¶
type Encoder interface {
	// Reset clears the ColumnWriter's internal state, preparing it for reuse.
	Reset()
	// Size returns the size required to encode the column's current values.
	//
	// The `rows` argument must be the current number of logical rows in the
	// column. Some implementations support defaults, and these implementations
	// rely on the caller to inform them of the current number of logical rows.
	// The provided `rows` must be greater than or equal to the largest row set
	// + 1. In other words, Size does not support determining what a column's
	// size was at an earlier point, before additional rows were added.
	Size(rows int, offset uint32) uint32
	// WriteDebug writes a human-readable description of the current column
	// state to the provided writer.
	WriteDebug(w io.Writer, rows int)
}
Encoder is an interface implemented by column encoders.
type Header ¶
type Header struct {
	Version Version
	// Columns holds the number of columns encoded within the block.
	Columns uint16
	// Rows holds the number of rows encoded within the block.
	Rows uint32
}
Header holds the metadata extracted from a columnar block's header.
func DecodeHeader ¶
DecodeHeader reads the block header from the provided serialized columnar block.
type IndexBlockDecoder ¶
type IndexBlockDecoder struct {
// contains filtered or unexported fields
}
An IndexBlockDecoder reads columnar index blocks.
func (*IndexBlockDecoder) DebugString ¶
func (r *IndexBlockDecoder) DebugString() string
DebugString prints a human-readable explanation of the keyspan block's binary representation.
func (*IndexBlockDecoder) Describe ¶
func (r *IndexBlockDecoder) Describe(f *binfmt.Formatter, tp treeprinter.Node)
Describe describes the binary format of the index block, assuming f.Offset() is positioned at the beginning of the same index block described by r.
func (*IndexBlockDecoder) Init ¶
func (r *IndexBlockDecoder) Init(data []byte)
Init initializes the index block decoder with the given serialized index block.
type IndexBlockWriter ¶
type IndexBlockWriter struct {
// contains filtered or unexported fields
}
IndexBlockWriter writes columnar index blocks. The writer is used for both first-level and second-level index blocks. The index block schema consists of three primary columns:
- Separators: a user key that is ≥ the largest user key in the corresponding entry, and ≤ the smallest user key in the next entry. Note that this allows consecutive separators to be equal. This is possible when snapshots require that we preserve duplicate user keys at different sequence numbers.
- Offsets: the offset of the end of the corresponding block.
- Lengths: the length in bytes of the corresponding block.
- Block properties: a slice encoding arbitrary user-defined block properties.
TODO(jackson): Consider splitting separators into prefixes and suffixes (even without user-defined columns). This would allow us to use prefix compression for the prefix. Separators should typically be suffixless unless two KVs with the same prefix straddle a block boundary. We would need to use a buffer to materialize the separator key when we need to use it outside the context of seeking within the block.
func (*IndexBlockWriter) AddBlockHandle ¶
func (w *IndexBlockWriter) AddBlockHandle(
	separator []byte, handle block.Handle, blockProperties []byte,
) int
AddBlockHandle adds a new separator and end offset of a data block to the index block. Add returns the index of the row.
AddBlockHandle should only be used for first-level index blocks.
func (*IndexBlockWriter) Finish ¶
func (w *IndexBlockWriter) Finish(rows int) []byte
Finish serializes the pending index block, including the first [rows] rows. The value of [rows] must be Rows() or Rows()-1.
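A hedged usage sketch for a first-level index block; separator, handle and blockProperties are assumed to be in scope (a user key, a block.Handle and an encoded properties slice, respectively):

var w IndexBlockWriter
w.Init()
// One call per data block, in key order.
row := w.AddBlockHandle(separator, handle, blockProperties)
_ = row
// Serialize every pending row.
indexBlock := w.Finish(w.Rows())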
func (*IndexBlockWriter) Init ¶
func (w *IndexBlockWriter) Init()
Init initializes the index block writer.
func (*IndexBlockWriter) Reset ¶
func (w *IndexBlockWriter) Reset()
Reset resets the index block writer to its initial state, retaining buffers.
func (*IndexBlockWriter) Rows ¶
func (w *IndexBlockWriter) Rows() int
Rows returns the number of entries in the index block so far.
func (*IndexBlockWriter) Size ¶
func (w *IndexBlockWriter) Size() int
Size returns the size of the pending index block.
func (*IndexBlockWriter) UnsafeSeparator ¶
func (w *IndexBlockWriter) UnsafeSeparator(i int) []byte
UnsafeSeparator returns the separator of the i'th entry.
type IndexIter ¶
type IndexIter struct {
// contains filtered or unexported fields
}
IndexIter is an iterator over the block entries in an index block.
func (*IndexIter) BlockHandleWithProperties ¶
func (i *IndexIter) BlockHandleWithProperties() (block.HandleWithProperties, error)
BlockHandleWithProperties decodes the block handle with any encoded properties at the iterator's current position.
func (*IndexIter) First ¶
First seeks index iterator to the first block entry. It returns false if the index block is empty.
func (*IndexIter) Handle ¶
func (i *IndexIter) Handle() block.BufferHandle
Handle returns the underlying block buffer handle, if the iterator was initialized with one.
func (*IndexIter) Init ¶
func (i *IndexIter) Init(
	comparer *base.Comparer, blk []byte, transforms block.IterTransforms,
) error
Init initializes an iterator from the provided block data slice.
func (*IndexIter) InitHandle ¶
func (i *IndexIter) InitHandle(
	comparer *base.Comparer, blk block.BufferHandle, transforms block.IterTransforms,
) error
InitHandle initializes an iterator from the provided block handle.
func (*IndexIter) InitWithDecoder ¶
func (i *IndexIter) InitWithDecoder(
	comparer *base.Comparer, d *IndexBlockDecoder, transforms block.IterTransforms,
)
InitWithDecoder initializes an index iterator from the provided decoder.
func (*IndexIter) Invalidate ¶
func (i *IndexIter) Invalidate()
Invalidate invalidates the block iterator, removing references to the block it was initialized with.
func (*IndexIter) IsDataInvalidated ¶
IsDataInvalidated returns true when the iterator has been invalidated using an Invalidate call. NB: this is different from Valid.
func (*IndexIter) Last ¶
Last seeks index iterator to the last block entry. It returns false if the index block is empty.
func (*IndexIter) Next ¶
Next steps the index iterator to the next block entry. It returns false if the index block is exhausted in the forward direction. A call to Next while already exhausted in the forward direction is a no-op.
func (*IndexIter) Prev ¶
Prev steps the index iterator to the previous block entry. It returns false if the index block is exhausted in the reverse direction. A call to Prev while already exhausted in the reverse direction is a no-op.
func (*IndexIter) RowIndex ¶
RowIndex returns the index of the block entry at the iterator's current position.
func (*IndexIter) SeekGE ¶
SeekGE seeks the index iterator to the first block entry with a separator key greater or equal to the given key. It returns false if the seek key is greater than all index block separators.
func (*IndexIter) Separator ¶
Separator returns the separator at the iterator's current position. The iterator must be positioned at a valid row.
func (*IndexIter) SeparatorGT ¶
SeparatorGT returns true if the separator at the iterator's current position is strictly greater than (or equal, if inclusively=true) the provided key.
func (*IndexIter) SeparatorLT ¶
SeparatorLT returns true if the separator at the iterator's current position is strictly less than the provided key.
type KeyComparison ¶
type KeyComparison struct {
	// PrefixLen is the length of the prefix of the key. It's the outcome of
	// calling base.Split on the key.
	PrefixLen int32
	// CommonPrefixLen is the length of the physical (byte-wise) prefix of the
	// logical prefix that is shared with the other key. For example, for
	// "apple@1" and "applied@3" the value is 4 (the length of "appl"). For
	// "apple@1" and "apple@10" the value is 5 (the length of "apple"), because
	// the shared bytes within the suffix are not included.
	CommonPrefixLen int32
	// UserKeyComparison is the comparison of the user keys of the two keys.
	// Should be equivalent to
	//
	//	Compare(key, otherKey)
	UserKeyComparison int32
}
KeyComparison holds information about a key and its comparison to another key.
func (KeyComparison) PrefixEqual ¶
func (kcmp KeyComparison) PrefixEqual() bool
PrefixEqual returns true if the key comparison determined that the keys have equal prefixes.
func (KeyComparison) String ¶
func (kcmp KeyComparison) String() string
String returns a string representation of the KeyComparison.
type KeySchema ¶
type KeySchema struct {
	Name string

	// KeySchema implementations can optionally make use of a fixed-size custom
	// header inside each block.
	HeaderSize  uint32
	ColumnTypes []DataType

	NewKeyWriter func() KeyWriter

	// InitKeySeekerMetadata initializes the provided KeySeekerMetadata. This
	// happens once when a block enters the block cache and can be used to save
	// computation in NewKeySeeker.
	InitKeySeekerMetadata func(meta *KeySeekerMetadata, d *DataBlockDecoder)

	// KeySeeker returns a KeySeeker using metadata that was previously
	// initialized with InitKeySeekerMetadata. The returned key seeker can be an
	// unsafe cast of the metadata itself.
	KeySeeker func(meta *KeySeekerMetadata) KeySeeker
}
KeySchema defines the schema of a user key, as defined by the user's application.
TODO(jackson): Consider making this KVSchema. It feels like there's an opportunity to generalize the ShortAttribute so that when a value is stored out-of-band, the DataBlockEncoder calls user-provided code to store the short attributes inlined within the data block. For inlined-values, the user-defined value columns would be implicitly null.
type KeySeeker ¶
type KeySeeker interface {
	// IsLowerBound returns true if all keys in the data block (after suffix
	// replacement if syntheticSuffix is not empty) are >= the given key. If the
	// data block contains no keys, returns true.
	IsLowerBound(k []byte, syntheticSuffix []byte) bool
	// SeekGE returns the index of the first row with a key greater than or equal
	// to [key], and whether that row has the same prefix as [key].
	//
	// If the caller externally knows a bound on where the key is located, it
	// may indicate it through [boundRow] and [searchDir]. A [searchDir] value
	// of -1 indicates that the sought row must be at an index ≤ [boundRow]. A
	// [searchDir] value of +1 indicates that the sought row must be at an index
	// ≥ [boundRow]. Implementations may use this information to constrain the
	// search. See (base.SeekGEFlags).TrySeekUsingNext for context on when this
	// may be set in practice.
	SeekGE(key []byte, boundRow int, searchDir int8) (row int, equalPrefix bool)
	// MaterializeUserKey materializes the user key of the specified row,
	// returning a slice of the materialized user key.
	//
	// The provided keyIter must have a buffer large enough to hold the key.
	//
	// The prevRow parameter is the row MaterializeUserKey was last invoked with
	// (or a negative number if not applicable). Implementations may take
	// advantage of that knowledge to reduce work.
	MaterializeUserKey(keyIter *PrefixBytesIter, prevRow, row int) []byte
	// MaterializeUserKeyWithSyntheticSuffix is a variant of MaterializeUserKey
	// where the suffix is replaced.
	//
	// The provided keyIter must have a buffer large enough to hold the key after
	// suffix replacement.
	//
	// The prevRow parameter is the row MaterializeUserKeyWithSyntheticSuffix was
	// last invoked with (or a negative number if not applicable). Implementations
	// may take advantage of that knowledge to reduce work.
	MaterializeUserKeyWithSyntheticSuffix(
		keyIter *PrefixBytesIter, syntheticSuffix []byte, prevRow, row int,
	) []byte
}
KeySeeker iterates over the keys in a columnar data block.
Users of Pebble who define their own key schema must implement KeySeeker to seek over their decomposed keys.
KeySeeker implementations must be safe for concurrent use by multiple goroutines. In practice, multiple DataBlockIterators may use the same KeySeeker.
type KeySeekerMetadata ¶
type KeySeekerMetadata [KeySeekerMetadataSize]byte
KeySeekerMetadata is an in-memory buffer that stores metadata for a block. It is allocated together with the buffer storing the block and is initialized once when the block is read from disk. It is always 8-byte aligned.
Portions of this buffer can be cast to the structures we need (through unsafe.Pointer), but note that any pointers in these structures will be invisible to the GC. Pointers to the block's data buffer are ok, since the metadata and the data have the same lifetime (sharing the underlying allocation).
KeySeekerMetadata is stored inside block.Metadata.
type KeyWriter ¶
type KeyWriter interface {
	ColumnWriter
	// ComparePrev compares the provided user key to the previously-written user
	// key. The returned KeyComparison's UserKeyComparison field is equivalent
	// to Compare(key, prevKey) where prevKey is the last key passed to
	// WriteKey.
	//
	// If no key has been written yet, ComparePrev returns a KeyComparison with
	// PrefixLen set and UserKeyComparison=1.
	ComparePrev(key []byte) KeyComparison
	// WriteKey writes a user key into the KeyWriter's columns. The
	// keyPrefixLenSharedWithPrev parameter takes the number of bytes prefixing
	// the key's logical prefix (as defined by (base.Comparer).Split) that the
	// previously-written key's prefix shares.
	//
	// WriteKey is guaranteed to be called sequentially with increasing row
	// indexes, beginning at zero.
	WriteKey(row int, key []byte, keyPrefixLen, keyPrefixLenSharedWithPrev int32)
	// MaterializeKey appends the zero-indexed row'th key written to dst,
	// returning the result.
	MaterializeKey(dst []byte, row int) []byte
	// FinishHeader serializes an internal header of exactly KeySchema.HeaderSize
	// bytes.
	FinishHeader(dst []byte)
}
A KeyWriter maintains ColumnWriters for a data block for writing user keys into the database-specific key schema. Users may define their own key schema and implement KeyWriter to encode keys into custom columns that are aware of the structure of user keys.
type KeyspanBlockWriter ¶
type KeyspanBlockWriter struct {
// contains filtered or unexported fields
}
A KeyspanBlockWriter writes keyspan blocks. See the colblk package documentation for more details on the schema.
func (*KeyspanBlockWriter) AddSpan ¶
func (w *KeyspanBlockWriter) AddSpan(s keyspan.Span)
AddSpan appends a new Span to the pending block. Spans must already be fragmented (non-overlapping) and added in sorted order.
func (*KeyspanBlockWriter) Finish ¶
func (w *KeyspanBlockWriter) Finish() []byte
Finish finalizes the pending block and returns the encoded block.
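A hedged usage sketch; spans is assumed to be a pre-fragmented, sorted []keyspan.Span and comparer a *base.Comparer providing an Equal function:

var w KeyspanBlockWriter
w.Init(comparer.Equal)
for _, s := range spans {
	w.AddSpan(s) // spans must be non-overlapping and added in sorted order
}
keyspanBlock := w.Finish()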
func (*KeyspanBlockWriter) Init ¶
func (w *KeyspanBlockWriter) Init(equal base.Equal)
Init initializes a keyspan block writer.
func (*KeyspanBlockWriter) KeyCount ¶
func (w *KeyspanBlockWriter) KeyCount() int
KeyCount returns the count of keyspan.Keys written to the writer.
func (*KeyspanBlockWriter) Reset ¶
func (w *KeyspanBlockWriter) Reset()
Reset resets the keyspan block writer to an empty state, retaining memory for reuse.
func (*KeyspanBlockWriter) Size ¶
func (w *KeyspanBlockWriter) Size() int
Size returns the size of the pending block.
func (*KeyspanBlockWriter) String ¶
func (w *KeyspanBlockWriter) String() string
String returns a string representation of the pending block's state.
func (*KeyspanBlockWriter) UnsafeBoundaryKeys ¶
func (w *KeyspanBlockWriter) UnsafeBoundaryKeys() (smallest, largest base.InternalKey)
UnsafeBoundaryKeys returns the smallest and largest keys written to the keyspan block so far. The returned internal keys have user keys that point directly into the block writer's memory and must not be mutated.
func (*KeyspanBlockWriter) UnsafeLastSpan ¶
func (w *KeyspanBlockWriter) UnsafeLastSpan() (
	start, end []byte, largestTrailer base.InternalKeyTrailer,
)
UnsafeLastSpan returns the start and end user keys of the last span written to the block and the trailer of its largest key. The returned keys point directly into the block writer's memory and must not be mutated.
type KeyspanDecoder ¶
type KeyspanDecoder struct {
// contains filtered or unexported fields
}
A KeyspanDecoder exposes facilities for decoding a keyspan block. A KeyspanDecoder is safe for concurrent use after initialization.
func (*KeyspanDecoder) DebugString ¶
func (d *KeyspanDecoder) DebugString() string
DebugString prints a human-readable explanation of the keyspan block's binary representation.
func (*KeyspanDecoder) Describe ¶
func (d *KeyspanDecoder) Describe(f *binfmt.Formatter, tp treeprinter.Node)
Describe describes the binary format of the keyspan block, assuming f.Offset() is positioned at the beginning of the same keyspan block described by d.
func (*KeyspanDecoder) Init ¶
func (d *KeyspanDecoder) Init(data []byte)
Init initializes the keyspan decoder with the given block data.
type KeyspanIter ¶
type KeyspanIter struct {
// contains filtered or unexported fields
}
A KeyspanIter is an iterator over a columnar keyspan block. It implements the keyspan.FragmentIterator interface.
func NewKeyspanIter ¶
func NewKeyspanIter(
	cmp base.Compare, h block.BufferHandle, transforms block.FragmentIterTransforms,
) *KeyspanIter
NewKeyspanIter constructs a new iterator over a keyspan columnar block.
func (*KeyspanIter) DebugTree ¶
func (i *KeyspanIter) DebugTree(tp treeprinter.Node)
DebugTree is part of the FragmentIterator interface.
func (*KeyspanIter) SeekGE ¶
SeekGE moves the iterator to the first span covering a key greater than or equal to the given key. This is equivalent to seeking to the first span with an end key greater than the given key.
func (*KeyspanIter) SeekLT ¶
SeekLT moves the iterator to the last span covering a key less than the given key. This is equivalent to seeking to the last span with a start key less than the given key.
func (*KeyspanIter) SetContext ¶
SetContext implements keyspan.FragmentIterator.
func (*KeyspanIter) WrapChildren ¶
WrapChildren implements keyspan.FragmentIterator.
type PrefixBytes ¶
type PrefixBytes struct {
// contains filtered or unexported fields
}
PrefixBytes holds an array of lexicographically ordered byte slices. It provides prefix compression. Prefix compression applies strongly to two cases in CockroachDB: removal of the "[/tenantID]/tableID/indexID" prefix that is present on all table data keys, and multiple versions of a key that are distinguished only by different timestamp suffixes. With columnar blocks enabling the timestamp to be placed in a separate column, the multiple version problem becomes one of efficiently handling exact duplicate keys. PrefixBytes builds off of the RawBytes encoding, introducing additional slices for encoding (n+bundleSize-1)/bundleSize bundle prefixes and 1 block-level shared prefix for the column.
Unlike the original prefix compression performed by rowblk (inherited from LevelDB and RocksDB), PrefixBytes does not perform all prefix compression relative to the previous key. Rather it performs prefix compression relative to the first key of a key's bundle. This can result in less compression, but simplifies reverse iteration and allows iteration to be largely stateless.
To understand the PrefixBytes layout, we'll work through an example using these 15 keys:
     0123456789
 0   aaabbbc
 1   aaabbbcc
 2   aaabbbcde
 3   aaabbbce
 4   aaabbbdee
 5   aaabbbdee
 6   aaabbbdee
 7   aaabbbeff
 8   aaabbe
 9   aaabbeef
10   aaabbeef
11   aaabc
12   aabcceef
13   aabcceef
14   aabcceef
The total length of these keys is 119 bytes. There are 3 keys which occur multiple times (rows 4-6, 9-10, 12-14) which models multiple versions of the same MVCC key in CockroachDB. There is a shared prefix to all of the keys which models the "[/tenantID]/tableID/indexID" present on CockroachDB table data keys. There are other shared prefixes which model identical values in table key columns.
The table below shows the components of the PrefixBytes encoding for these 15 keys when using a bundle size of 4, which results in 4 bundles. The 15 keys are encoded into 20 slices: 1 block prefix, 4 bundle prefixes, and 15 suffixes. The first slice in the table is the block prefix that is shared by all keys in the block. The first slice in each bundle is the bundle prefix which is shared by all keys in the bundle.
   idx  |  row  | end offset | data
--------+-------+------------+----------
     0  |       |          2 | aa
     1  |       |          7 | ..abbbc
     2  |     0 |          7 | .......
     3  |     1 |          8 | .......c
     4  |     2 |         10 | .......de
     5  |     3 |         11 | .......e
     6  |       |         15 | ..abbb
     7  |     4 |         18 | ......dee
     8  |     5 |         18 | .........
     9  |     6 |         18 | .........
    10  |     7 |         21 | ......eff
    11  |       |         23 | ..ab
    12  |     8 |         25 | ....be
    13  |     9 |         29 | ....beef
    14  |    10 |         29 | ........
    15  |    11 |         30 | ....c
    16  |       |         36 | ..bcceef
    17  |    12 |         36 | ........
    18  |    13 |         36 | ........
    19  |    14 |         36 | ........
The 'end offset' column in the table encodes the exclusive offset within the string data section where each of the slices end. Each slice starts at the previous slice's end offset. The first slice (the block prefix)'s start offset is implicitly zero. Note that this differs from the plain RawBytes encoding which always stores a zero offset at the beginning of the offsets array to avoid special-casing the first slice. The block prefix already requires special-casing, so materializing the zero start offset is not needed.
The table above defines 20 slices: the 1 block key prefix, the 4 bundle key prefixes and the 15 key suffixes. Offset[0] is the length of the first slice which is always anchored at data[0]. The data columns display the portion of the data array the slice covers. For row slices, an empty suffix column indicates that the slice is identical to the slice at the previous index which is indicated by the slice's offset being equal to the previous slice's offset. Due to the lexicographic sorting, the key at row i can't be a prefix of the key at row i-1 or it would have sorted before the key at row i-1. And if the key differs then only the differing bytes will be part of the suffix and not contained in the bundle prefix.
The end result of this encoding is that we can store the 119 bytes of the 15 keys plus their start and end offsets (which would naively consume 15*4=60 bytes for at least the key lengths) in 61 bytes (36 bytes of data + 4 bytes of offset constant + 20 bytes of offset delta data + 1 byte of bundle size).
Physical representation ¶
+==================================================================+
| Bundle size (1 byte)                                             |
|                                                                  |
| The bundle size indicates how many keys prefix compression may   |
| apply across. Every bundleSize keys, prefix compression restarts.|
| The bundleSize is required to be a power of two, and this 1-     |
| byte prefix stores log2(bundleSize).                             |
+==================================================================+
| RawBytes                                                         |
|                                                                  |
| A modified RawBytes encoding is used to store the data slices. A |
| PrefixBytes column storing n keys will encode                    |
|                                                                  |
|     1 block prefix                                               |
|       +                                                          |
|     (n + bundleSize-1)/bundleSize bundle prefixes                |
|       +                                                          |
|     n row suffixes                                               |
|                                                                  |
| slices. Unlike the RawBytes encoding, the first offset encoded   |
| is not guaranteed to be zero. In the PrefixBytes encoding, the   |
| first offset encodes the length of the column-wide prefix. The   |
| column-wide prefix is stored in slice(0, offset(0)).             |
|                                                                  |
|  +------------------------------------------------------------+ |
|  | Offset table                                                | |
|  |                                                             | |
|  | A Uint32 column encoding offsets into the string data,      | |
|  | possibly delta8 or delta16 encoded. When a delta encoding   | |
|  | is used, the base constant is always zero.                  | |
|  +------------------------------------------------------------+ |
|  | offsetDelta[0] | offsetDelta[1] | ... | offsetDelta[m]      | |
|  +------------------------------------------------------------+ |
|  | prefix-compressed string data                               | |
|  | ...                                                         | |
|  +------------------------------------------------------------+ |
+==================================================================+
TODO(jackson): Consider stealing the low bit of the offset for a flag indicating that a key is a duplicate and then using the remaining bits to encode the relative index of the duplicated key's end offset. This would avoid the O(bundle size) scan in the case of duplicate keys, but at the cost of complicating logic to look up a bundle prefix (which may need to follow a duplicate key's relative index to uncover the start offset of the bundle prefix).
Reads ¶
This encoding provides O(1) access to any row by calculating the bundle for the row (see bundleOffsetIndexForRow), then the row's suffix within the bundle (see rowSuffixIndex). If the row suffix's end offset equals the previous offset, then the row is a duplicate key and we need to step backward until we find a non-empty slice or the start of the bundle (a variable number of steps, but bounded by the bundle size).
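As a rough sketch of that index arithmetic, assuming the slice layout shown in the example table (slice 0 is the block prefix, followed by each bundle's prefix and then its bundleSize row suffixes); the package's bundleOffsetIndexForRow and rowSuffixIndex additionally exploit the power-of-two bundle size with shifts and masks:

// bundleOffsetIndex returns the offset-table index of the prefix slice for
// the bundle containing row. Illustrative only.
func bundleOffsetIndex(row, bundleSize int) int {
	return 1 + (row/bundleSize)*(bundleSize+1)
}

// rowSuffixIndex returns the offset-table index of row's suffix slice.
func rowSuffixIndex(row, bundleSize int) int {
	return bundleOffsetIndex(row, bundleSize) + 1 + row%bundleSize
}

With bundleSize = 4, row 7 maps to bundle-prefix index 6 and suffix index 10, matching the example table above.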
Forward iteration can easily reuse the previous row's key with a check on whether the row's slice is empty. Reverse iteration within a run of equal keys can reuse the next row's key. When reverse iteration steps backward from a non-empty slice onto an empty slice, it must continue backward until a non-empty slice is found (just as in absolute positioning) to discover the row suffix that is duplicated.
The Seek{GE,LT} routines first binary search on the first key of each bundle, which can be retrieved without data movement because the bundle prefix is immediately adjacent to it in the data array. We can slightly optimize the binary search by skipping over all of the keys in a bundle on a prefix mismatch.
func DecodePrefixBytes ¶
func DecodePrefixBytes(
	b []byte, offset uint32, count int,
) (prefixBytes PrefixBytes, endOffset uint32)
DecodePrefixBytes decodes the structure of a PrefixBytes, constructing an accessor for an array of lexicographically sorted byte slices constructed by PrefixBytesBuilder. Count must be the number of logical slices within the array.
func (PrefixBytes) At ¶
func (b PrefixBytes) At(i int) []byte
At returns the i'th []byte slice in the PrefixBytes. At must allocate, so callers should prefer accessing a slice's constituent components through SharedPrefix, BundlePrefix and RowSuffix.
func (*PrefixBytes) BundleCount ¶
func (b *PrefixBytes) BundleCount() int
BundleCount returns the count of bundles within the PrefixBytes.
func (*PrefixBytes) BundlePrefix ¶
func (b *PrefixBytes) BundlePrefix(i int) []byte
BundlePrefix returns the prefix of the i-th bundle in the column. The provided i must be in the range [0, BundleCount()). The returned slice should not be mutated.
func (*PrefixBytes) RowBundlePrefix ¶
func (b *PrefixBytes) RowBundlePrefix(row int) []byte
RowBundlePrefix takes a row index and returns a []byte of the prefix shared among all the keys in the row's bundle, but without the block-level shared prefix for the column. The returned slice should not be mutated.
func (*PrefixBytes) RowSuffix ¶
func (b *PrefixBytes) RowSuffix(row int) []byte
RowSuffix returns a []byte of the suffix unique to the row. A row's full key is the result of concatenating SharedPrefix(), BundlePrefix() and RowSuffix().
The returned slice should not be mutated.
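For illustration, a caller could materialize a row's key like this (a sketch only, assuming the colblk types are in scope; the package's At, SetAt and PrefixBytesIter exist precisely to avoid this per-call allocation):

// fullKey concatenates the three components of the key at row.
func fullKey(pb *PrefixBytes, row int) []byte {
	k := make([]byte, 0, len(pb.SharedPrefix())+len(pb.RowBundlePrefix(row))+len(pb.RowSuffix(row)))
	k = append(k, pb.SharedPrefix()...)
	k = append(k, pb.RowBundlePrefix(row)...)
	k = append(k, pb.RowSuffix(row)...)
	return k
}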
func (*PrefixBytes) Rows ¶
func (b *PrefixBytes) Rows() int
Rows returns the count of rows whose keys are encoded within the PrefixBytes.
func (*PrefixBytes) Search ¶
func (b *PrefixBytes) Search(k []byte) (rowIndex int, isEqual bool)
Search searches for the first key in the PrefixBytes that is greater than or equal to k, returning the index of the key and whether an equal key was found. If multiple keys are equal, the index of the first such key is returned. If all keys are < k, Search returns Rows() for the row index.
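A brief usage sketch of Search (illustrative, assuming the colblk types are in scope):

// firstGE returns the first key >= k, or nil if every key in pb is < k.
func firstGE(pb *PrefixBytes, k []byte) []byte {
	row, _ := pb.Search(k)
	if row == pb.Rows() {
		return nil
	}
	return pb.At(row) // At allocates; SetAt and PrefixBytesIter provide a cheaper path.
}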
func (*PrefixBytes) SetAt ¶
func (b *PrefixBytes) SetAt(it *PrefixBytesIter, i int)
SetAt updates the provided PrefixBytesIter to hold the i'th []byte slice in the PrefixBytes. The PrefixBytesIter's buffer must be sufficiently large to hold the i'th []byte slice, and the caller is required to statically ensure this.
func (*PrefixBytes) SetNext ¶
func (b *PrefixBytes) SetNext(it *PrefixBytesIter)
SetNext updates the provided PrefixBytesIter to hold the next []byte slice in the PrefixBytes. SetNext requires the provided iter to currently hold a slice and for a subsequent slice to exist within the PrefixBytes. The PrefixBytesIter's buffer must be sufficiently large to hold the next []byte slice, and the caller is required to statically ensure this.
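A hedged sketch of forward iteration with the iterator (assuming the colblk types are in scope, that block.SyntheticPrefix may be nil when no prefix is applied, and that the key materialized by SetAt/SetNext is carried in the iterator's Buf field, per the PrefixBytesIter documentation below):

// forEachKey visits every key in pb in order, calling fn on each materialized key.
func forEachKey(pb *PrefixBytes, maxKeyLength int, fn func(key []byte)) {
	var it PrefixBytesIter
	it.Init(maxKeyLength, nil /* no synthetic prefix */)
	for row := 0; row < pb.Rows(); row++ {
		if row == 0 {
			pb.SetAt(&it, 0)
		} else {
			pb.SetNext(&it)
		}
		fn(it.Buf)
	}
}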
func (*PrefixBytes) SharedPrefix ¶
func (b *PrefixBytes) SharedPrefix() []byte
SharedPrefix returns a []byte of the shared prefix that was extracted from all of the values in the Bytes vector. The returned slice should not be mutated.
func (*PrefixBytes) UnsafeFirstSlice ¶
func (b *PrefixBytes) UnsafeFirstSlice() []byte
UnsafeFirstSlice returns the first slice in the PrefixBytes. The returned slice points directly into the PrefixBytes buffer and must not be mutated.
type PrefixBytesBuilder ¶
type PrefixBytesBuilder struct {
// contains filtered or unexported fields
}
PrefixBytesBuilder encodes a column of lexicographically-sorted byte slices, applying prefix compression to reduce the encoded size.
func (*PrefixBytesBuilder) DataType ¶
func (b *PrefixBytesBuilder) DataType(int) DataType
DataType implements ColumnWriter.
func (*PrefixBytesBuilder) Finish ¶
func (b *PrefixBytesBuilder) Finish(
	col int, rows int, offset uint32, buf []byte,
) (endOffset uint32)
Finish writes the serialized byte slices to buf starting at offset. The buf slice must be sufficiently large to store the serialized output. The caller should use [Size] to size buf appropriately before calling Finish.
Finish only supports values of [rows] equal to the number of keys set on the builder, or one less.
func (*PrefixBytesBuilder) Init ¶
func (b *PrefixBytesBuilder) Init(bundleSize int)
Init initializes the PrefixBytesBuilder with the specified bundle size. The builder will produce a prefix-compressed column of data type DataTypePrefixBytes. The [bundleSize] indicates the number of keys that form a "bundle," across which prefix-compression is applied. All keys in the column will share a column-wide prefix if there is one.
func (*PrefixBytesBuilder) NumColumns ¶
func (b *PrefixBytesBuilder) NumColumns() int
NumColumns implements ColumnWriter.
func (*PrefixBytesBuilder) Put ¶
func (b *PrefixBytesBuilder) Put(key []byte, bytesSharedWithPrev int)
Put adds the provided key to the column. The provided key must be lexicographically greater than or equal to the previous key added to the builder.
The provided bytesSharedWithPrev must be the length of the byte prefix the provided key shares with the previous key. The caller is required to provide this because in the primary expected use, the caller will already need to compute it for the purpose of determining whether successive keys share the same prefix.
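For example, a caller feeding already-sorted keys might compute the shared-prefix length itself and drive the builder like this (a sketch, assuming the colblk types are in scope and that b has already been initialized with Init):

// sharedPrefixLen returns the length of the byte prefix shared by a and b.
func sharedPrefixLen(a, b []byte) int {
	n := min(len(a), len(b))
	i := 0
	for i < n && a[i] == b[i] {
		i++
	}
	return i
}

// buildColumn adds the sorted keys to b and serializes the column at offset 0.
func buildColumn(b *PrefixBytesBuilder, sortedKeys [][]byte) []byte {
	var prev []byte
	for _, k := range sortedKeys {
		b.Put(k, sharedPrefixLen(prev, k))
		prev = k
	}
	buf := make([]byte, b.Size(b.Rows(), 0))
	b.Finish(0 /* col */, b.Rows(), 0 /* offset */, buf)
	return buf
}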
func (*PrefixBytesBuilder) Reset ¶
func (b *PrefixBytesBuilder) Reset()
Reset resets the builder to an empty state, preserving the existing bundle size.
func (*PrefixBytesBuilder) Rows ¶
func (b *PrefixBytesBuilder) Rows() int
Rows returns the number of keys added to the builder.
func (*PrefixBytesBuilder) Size ¶
func (b *PrefixBytesBuilder) Size(rows int, offset uint32) uint32
Size computes the size required to encode the byte slices beginning at the provided offset. The offset is required to ensure proper alignment. The returned uint32 is the offset of the first byte after the end of the encoded data. To compute the size in bytes, subtract the [offset] passed into Size from the returned offset.
func (*PrefixBytesBuilder) UnsafeGet ¶
func (b *PrefixBytesBuilder) UnsafeGet(i int) []byte
UnsafeGet returns the zero-indexed i'th key added to the builder through Put. UnsafeGet may only be used to retrieve the Rows()-1'th or Rows()-2'th keys. If called with a different i value, UnsafeGet panics. The keys returned by UnsafeGet are guaranteed to be stable until Finish or Reset is called. The caller must not mutate the returned slice.
func (*PrefixBytesBuilder) WriteDebug ¶
func (b *PrefixBytesBuilder) WriteDebug(w io.Writer, rows int)
WriteDebug implements the Encoder interface.
type PrefixBytesIter ¶
type PrefixBytesIter struct {
	// Buf is used for materializing a user key. It is preallocated to the maximum
	// key length in the data block.
	Buf []byte
	// contains filtered or unexported fields
}
PrefixBytesIter is an iterator and associated buffers for PrefixBytes. It provides a means for efficiently iterating over the []byte slices contained within a PrefixBytes, avoiding unnecessary copying when portions of slices are shared.
func (*PrefixBytesIter) Init ¶
func (i *PrefixBytesIter) Init(maxKeyLength int, syntheticPrefix block.SyntheticPrefix)
Init initializes the prefix bytes iterator; maxKeyLength must be large enough to fit any key in the block after applying any synthetic prefix and/or suffix.
type RawBytes ¶
type RawBytes struct {
// contains filtered or unexported fields
}
RawBytes holds an array of byte slices, stored as a concatenated data section and a series of offsets for each slice. Byte slices within RawBytes are stored in their entirety without any compression, ensuring stability without copying.
Representation ¶
An array of N byte slices encodes N+1 offsets. The beginning of the data representation holds an offsets table, in the same encoding as a DataTypeUint32 column. The integer offsets may be encoded using smaller width integers to save space if all offsets fit within an 8-bit or 16-bit uint. Each offset is relative to the beginning of the string data section (after the offset table).
The use of UintEncoding conserves space in the common case. In the context of CockroachDB, the vast majority of offsets will fit in 16-bits when using 32 KiB blocks (the size in use by CockroachDB). However, a single value larger than 65535 bytes requires an offset too large to fit within 16 bits, in which case offsets will be encoded as 32-bit integers.
+-------------------------------------------------------------------+
| a uint offsets table, usually encoded with 16-bits,                |
| possibly padded for alignment                                      |
| (see UintEncoding)                                                 |
+-------------------------------------------------------------------+
| String Data                                                        |
| abcabcada....                                                      |
+-------------------------------------------------------------------+
The UintEncoding bits of the ColumnEncoding for a RawBytes column describes the encoding of the offset table.
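As a small worked example of this layout (illustrative values only): three slices "aa", "b" and "ccc" are described by N+1 = 4 offsets over 6 bytes of concatenated string data, and since every offset fits in 16 bits the offset table would use a 2-byte UintEncoding.

package main

import "fmt"

func main() {
	// Logical offsets and concatenated data for the slices "aa", "b", "ccc".
	offsets := []uint32{0, 2, 3, 6}
	data := []byte("aabccc")
	// Slice i spans data[offsets[i]:offsets[i+1]].
	for i := 0; i+1 < len(offsets); i++ {
		fmt.Printf("%q\n", data[offsets[i]:offsets[i+1]]) // "aa", "b", "ccc"
	}
}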
func DecodeRawBytes ¶
DecodeRawBytes decodes the structure of a RawBytes, constructing an accessor for an array of byte slices constructed by RawBytesBuilder. Count must be the number of byte slices within the array.
type RawBytesBuilder ¶
type RawBytesBuilder struct {
// contains filtered or unexported fields
}
RawBytesBuilder encodes a column of byte slices.
func (*RawBytesBuilder) DataType ¶
func (b *RawBytesBuilder) DataType(int) DataType
DataType implements ColumnWriter.
func (*RawBytesBuilder) Finish ¶
func (b *RawBytesBuilder) Finish(col, rows int, offset uint32, buf []byte) uint32
Finish writes the serialized byte slices to buf starting at offset. The buf slice must be sufficiently large to store the serialized output. The caller should use [Size] to size buf appropriately before calling Finish.
func (*RawBytesBuilder) Init ¶
func (b *RawBytesBuilder) Init()
Init initializes the builder for first-time use.
func (*RawBytesBuilder) NumColumns ¶
func (b *RawBytesBuilder) NumColumns() int
NumColumns implements ColumnWriter.
func (*RawBytesBuilder) Put ¶
func (b *RawBytesBuilder) Put(s []byte)
Put appends the provided byte slice to the builder.
func (*RawBytesBuilder) PutConcat ¶
func (b *RawBytesBuilder) PutConcat(s1, s2 []byte)
PutConcat appends a single byte slice formed by the concatenation of the two byte slice arguments.
func (*RawBytesBuilder) Reset ¶
func (b *RawBytesBuilder) Reset()
Reset resets the builder to an empty state.
func (*RawBytesBuilder) Rows ¶
func (b *RawBytesBuilder) Rows() int
Rows returns the count of slices that have been added to the builder.
func (*RawBytesBuilder) Size ¶
func (b *RawBytesBuilder) Size(rows int, offset uint32) uint32
Size computes the size required to encode the byte slices beginning in a buffer at the provided offset. The offset is required to ensure proper alignment. The returned uint32 is the offset of the first byte after the end of the encoded data. To compute the size in bytes, subtract the [offset] passed into Size from the returned offset.
func (*RawBytesBuilder) UnsafeGet ¶
func (b *RawBytesBuilder) UnsafeGet(i int) []byte
UnsafeGet returns the i'th slice added to the builder. The returned slice is owned by the builder and must not be mutated.
func (*RawBytesBuilder) WriteDebug ¶
func (b *RawBytesBuilder) WriteDebug(w io.Writer, rows int)
WriteDebug implements Encoder.
type UintBuilder ¶
type UintBuilder struct {
// contains filtered or unexported fields
}
UintBuilder builds a column of unsigned integers. It uses the smallest possible UintEncoding, depending on the values.
func (*UintBuilder) DataType ¶
func (b *UintBuilder) DataType(int) DataType
DataType implements ColumnWriter.
func (*UintBuilder) Finish ¶
func (b *UintBuilder) Finish(col, rows int, offset uint32, buf []byte) uint32
Finish implements ColumnWriter, serializing the column into offset [offset] of [buf].
func (*UintBuilder) Get ¶
func (b *UintBuilder) Get(row int) uint64
Get gets the value of the provided row index. The provided row must have been Set or the builder must have been initialized with InitWithDefault.
func (*UintBuilder) InitWithDefault ¶
func (b *UintBuilder) InitWithDefault()
InitWithDefault initializes the UintBuilder. Any rows that are not explicitly set are assumed to be zero. For the purpose of determining whether a delta encoding is possible, the column is assumed to contain at least 1 default value.
InitWithDefault may be preferable when a nonzero value is uncommon, and the caller can avoid explicitly Set-ing every zero value.
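A hedged usage sketch (assuming the colblk types are in scope): when most rows are zero, the caller sets only the exceptions and then serializes the column.

// buildSparseUints encodes a column of n rows where only a few rows are nonzero.
func buildSparseUints(n int, nonzero map[int]uint64) []byte {
	var b UintBuilder
	b.InitWithDefault() // rows that are never Set default to zero
	for row, v := range nonzero {
		b.Set(row, v)
	}
	buf := make([]byte, b.Size(n, 0))
	b.Finish(0 /* col */, n, 0 /* offset */, buf)
	return buf
}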
func (*UintBuilder) NumColumns ¶
func (b *UintBuilder) NumColumns() int
NumColumns implements ColumnWriter.
func (*UintBuilder) Reset ¶
func (b *UintBuilder) Reset()
Reset implements ColumnWriter and resets the builder, reusing existing allocated memory.
func (*UintBuilder) Set ¶
func (b *UintBuilder) Set(row int, v uint64)
Set sets the value of the provided row index to v.
func (*UintBuilder) Size ¶
func (b *UintBuilder) Size(rows int, offset uint32) uint32
Size implements ColumnWriter and returns the size of the column if its first [rows] rows were serialized into a buffer beginning at offset [offset].
func (*UintBuilder) WriteDebug ¶
func (b *UintBuilder) WriteDebug(w io.Writer, rows int)
WriteDebug implements Encoder.
type UintEncoding ¶
type UintEncoding uint8
UintEncoding indicates how unsigned integers (of at most 64 bits) are encoded. It has two components:
- the low bits indicate how many bytes per integer are used, with allowed values 0, 1, 2, 4, or 8.
- whether we are using a delta encoding, meaning that a base (64-bit) value is encoded separately and each encoded value is a delta from that base. Delta encoding is never necessary when we use 8 bytes per integer.
Note that 0-byte encodings imply that all values are equal (either to the base value if we are using a delta encoding, otherwise to 0).
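To illustrate the delta component (a conceptual sketch only, not the package's encoder): values that cluster in a narrow range can be stored as one 64-bit base plus small fixed-width deltas.

// deltaEncode8 encodes values as a uint64 base plus 1-byte deltas. It is only
// valid when maxValue-minValue fits in 8 bits; wider spreads need 2-, 4- or
// 8-byte deltas. Conceptual sketch; assumes len(values) > 0.
func deltaEncode8(values []uint64) (base uint64, deltas []uint8) {
	base = values[0]
	for _, v := range values {
		if v < base {
			base = v
		}
	}
	deltas = make([]uint8, len(values))
	for i, v := range values {
		deltas[i] = uint8(v - base) // caller must ensure v-base <= 255
	}
	return base, deltas
}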
The UintEncoding byte is serialized to the uint column before the column data.
func DetermineUintEncoding ¶
func DetermineUintEncoding(minValue, maxValue uint64, numRows int) UintEncoding
DetermineUintEncoding returns the best valid encoding that can be used to represent numRows integers in the range [minValue, maxValue].
DetermineUintEncoding returns the same result for any value of rows >= UintEncodingRowThreshold.
func (UintEncoding) IsDelta ¶
func (e UintEncoding) IsDelta() bool
IsDelta returns true if it is a delta encoding.
func (UintEncoding) IsValid ¶
func (e UintEncoding) IsValid() bool
IsValid returns true if the encoding is valid.
func (UintEncoding) Width ¶
func (e UintEncoding) Width() int
Width returns the number of bytes used per integer. It can be 0, 1, 2, 4, or 8.
type UnsafeBuf ¶
type UnsafeBuf struct {
// contains filtered or unexported fields
}
UnsafeBuf provides a buffer without bounds checking. Every buf has a length and a capacity.
func (*UnsafeBuf) Alloc ¶
Alloc allocates a buffer of size n, without zeroing its contents or copying previous buffer contents.
func (*UnsafeBuf) UnsafeSlice ¶
UnsafeSlice returns the current contents of the buf.
type UnsafeOffsets ¶
type UnsafeOffsets struct {
// contains filtered or unexported fields
}
UnsafeOffsets is a specialization of UnsafeUints (providing the same functionality) which is optimized for the case where the integers are offsets inside a columnar block. It can only be used with the 0-, 1-, 2-, or 4-byte encodings without delta.
func DecodeUnsafeOffsets ¶
func DecodeUnsafeOffsets(b []byte, off uint32, rows int) (_ UnsafeOffsets, endOffset uint32)
DecodeUnsafeOffsets decodes the structure of a slice of offsets from a byte slice.
type UnsafeRawSlice ¶
type UnsafeRawSlice[T constraints.Integer] struct {
	// contains filtered or unexported fields
}
UnsafeRawSlice maintains a pointer to a slice of elements of type T. UnsafeRawSlice provides no bounds checking.
func (UnsafeRawSlice[T]) At ¶
func (s UnsafeRawSlice[T]) At(i int) T
At returns the `i`-th element of the slice.
func (UnsafeRawSlice[T]) Slice ¶
func (s UnsafeRawSlice[T]) Slice(len int) []T
Slice returns a Go []T slice containing the first `len` elements of the unsafe slice.
type UnsafeUints ¶
type UnsafeUints struct {
// contains filtered or unexported fields
}
UnsafeUints exposes a read-only view of integers from a column, transparently decoding data based on the UintEncoding.
See UintEncoding and UintBuilder.
func DecodeUnsafeUints ¶
func DecodeUnsafeUints(b []byte, off uint32, rows int) (_ UnsafeUints, endOffset uint32)
DecodeUnsafeUints decodes the structure of a slice of uints from a byte slice.
type Version ¶
type Version uint8
Version indicates the version of the columnar block format encoded. The version byte is always the first byte within the block. This ensures that readers can switch on the version before reading the rest of the block.
const (
	// Version1 is the first version of the columnar block format.
	Version1 Version = 0x01
)