segmentindex

package
v1.29.0-rc.2
Published: Feb 13, 2025 License: BSD-3-Clause Imports: 16 Imported by: 2

Documentation

Constants

const (
	// HeaderSize is the offset in a segment at which the data starts.
	// It is composed of 2 bytes for the level, 2 bytes for the version,
	// 2 bytes for the secondary index count, 2 bytes for the strategy,
	// and 8 bytes for the pointer to the index part.
	HeaderSize = 16

	// ChecksumSize describes the length of the segment file checksum.
	// This is currently based on the CRC32 hashing algorithm.
	ChecksumSize = 4
)
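
The byte layout described in the HeaderSize comment can be sketched as follows. This is a minimal illustration, not the package's implementation: the local `header` type, the `encode` and `decode` helpers, and the little-endian byte order are all assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerSize mirrors HeaderSize: 2 + 2 + 2 + 2 + 8 bytes.
const headerSize = 16

// header is a local stand-in for the package's Header type.
type header struct {
	level            uint16
	version          uint16
	secondaryIndices uint16
	strategy         uint16
	indexStart       uint64
}

// encode lays the fields out in the order listed in the HeaderSize comment.
// Little-endian byte order is an assumption of this sketch.
func encode(h header) []byte {
	buf := make([]byte, headerSize)
	binary.LittleEndian.PutUint16(buf[0:2], h.level)
	binary.LittleEndian.PutUint16(buf[2:4], h.version)
	binary.LittleEndian.PutUint16(buf[4:6], h.secondaryIndices)
	binary.LittleEndian.PutUint16(buf[6:8], h.strategy)
	binary.LittleEndian.PutUint64(buf[8:16], h.indexStart)
	return buf
}

// decode reverses encode and rejects inputs shorter than the header.
func decode(buf []byte) (header, error) {
	if len(buf) < headerSize {
		return header{}, fmt.Errorf("need %d bytes, got %d", headerSize, len(buf))
	}
	return header{
		level:            binary.LittleEndian.Uint16(buf[0:2]),
		version:          binary.LittleEndian.Uint16(buf[2:4]),
		secondaryIndices: binary.LittleEndian.Uint16(buf[4:6]),
		strategy:         binary.LittleEndian.Uint16(buf[6:8]),
		indexStart:       binary.LittleEndian.Uint64(buf[8:16]),
	}, nil
}

func main() {
	h := header{level: 1, version: 1, strategy: 2, indexStart: 4096}
	buf := encode(h)
	got, err := decode(buf)
	fmt.Println(len(buf), got == h, err) // 16 true <nil>
}
```
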
const (
	// SegmentV1 is the current latest version, and introduced support
	// for integrity checks with checksums added to the segment files.
	SegmentV1 = uint16(1)

	// CurrentSegmentVersion is used to ensure that the parsed header
	// version does not exceed the highest valid version.
	CurrentSegmentVersion = SegmentV1
)
const HeaderInvertedSize = 29 // 27 + 2 bytes for data field count

Variables

var (
	SegmentInvertedDefaultHeaderSize = 27
	SegmentInvertedDefaultBlockSize  = terms.BLOCK_SIZE
	SegmentInvertedDefaultFieldCount = 2
)

Functions

func CheckExpectedStrategy added in v1.26.0

func CheckExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy) error

func ChooseHeaderVersion added in v1.26.14

func ChooseHeaderVersion(checksumsEnabled bool) uint16

func IsExpectedStrategy added in v1.26.0

func IsExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy) bool

func MustBeExpectedStrategy added in v1.26.0

func MustBeExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy)

Types

type DiskTree

type DiskTree struct {
	// contains filtered or unexported fields
}

DiskTree is a read-only wrapper around a marshalled index search tree. It can be used for reading, but cannot change the underlying structure. It is thus perfectly suited as an index for an (immutable) LSM disk segment, but pretty much useless for anything else.

func NewDiskTree

func NewDiskTree(data []byte) *DiskTree

func (*DiskTree) AllKeys

func (t *DiskTree) AllKeys() ([][]byte, error)

AllKeys is a relatively expensive operation, as it essentially does a full disk read of the index. It is meant for one-off operations, such as initializing a segment where we need access to all keys, e.g. to build a bloom filter. This should not run at query time.

The binary tree is traversed in level order, so keys are returned in no meaningful order. Do not use this method if an in-order traversal is required; use it only for use cases that don't require a specific order, such as building a bloom filter.
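
The difference between the two traversal orders can be seen in a small sketch. The `node` type and both helper functions are hypothetical and only illustrate the general technique on an in-memory tree; they are not how DiskTree reads its marshalled data.

```go
package main

import "fmt"

// node is a hypothetical in-memory binary search tree node.
type node struct {
	key         string
	left, right *node
}

// levelOrder visits the tree breadth-first using a queue.
func levelOrder(root *node) []string {
	var keys []string
	if root == nil {
		return keys
	}
	queue := []*node{root}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		keys = append(keys, n.key)
		if n.left != nil {
			queue = append(queue, n.left)
		}
		if n.right != nil {
			queue = append(queue, n.right)
		}
	}
	return keys
}

// inOrder visits left subtree, node, right subtree, which yields
// sorted keys for a binary search tree.
func inOrder(n *node, keys []string) []string {
	if n == nil {
		return keys
	}
	keys = inOrder(n.left, keys)
	keys = append(keys, n.key)
	return inOrder(n.right, keys)
}

func main() {
	root := &node{
		key:   "d",
		left:  &node{key: "b", left: &node{key: "a"}, right: &node{key: "c"}},
		right: &node{key: "f", left: &node{key: "e"}, right: &node{key: "g"}},
	}
	fmt.Println(levelOrder(root))   // [d b f a c e g]
	fmt.Println(inOrder(root, nil)) // [a b c d e f g]
}
```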

func (*DiskTree) Get

func (t *DiskTree) Get(key []byte) (Node, error)

func (*DiskTree) Next added in v1.26.0

func (t *DiskTree) Next(key []byte) (Node, error)

func (*DiskTree) QuantileKeys added in v1.25.11

func (t *DiskTree) QuantileKeys(q int) [][]byte

QuantileKeys returns a list of keys that roughly represent the quantiles of the tree. This can be very useful to bootstrap parallel cursors over the segment that are more or less evenly distributed.

This method uses the natural shape of the tree to determine the distribution of the keys. This is a performance-accuracy trade-off. It does not guarantee perfect distribution, but it is fairly cheap to obtain as most runs will only need to go a few levels deep – even on massive trees.

The number of keys returned is not guaranteed to be exactly q; in most cases the method returns more keys. This is because in a real-life application you would likely aggregate across multiple segments. Similarly, keys are not returned in any specific order, as the assumption is that post-processing will be done when keys are aggregated across multiple segments.

The two guarantees you get are:

  1. If there are at least q keys in the tree, you will get at least q keys, most likely more.
  2. If there are fewer than q keys in the tree, you will get all keys.
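
One way to obtain both guarantees cheaply is to collect whole levels from the top of the tree until at least q keys have been gathered. The sketch below illustrates that idea on a hypothetical in-memory tree; it is not the package's actual algorithm, and the `node`, `build`, and `quantileKeys` names are invented for the example.

```go
package main

import "fmt"

// node is a hypothetical in-memory binary search tree node.
type node struct {
	key         string
	left, right *node
}

// build creates a balanced tree from sorted keys.
func build(sorted []string) *node {
	if len(sorted) == 0 {
		return nil
	}
	mid := len(sorted) / 2
	return &node{
		key:   sorted[mid],
		left:  build(sorted[:mid]),
		right: build(sorted[mid+1:]),
	}
}

// quantileKeys collects complete levels, starting at the root, until
// at least q keys are gathered or the tree is exhausted.
func quantileKeys(root *node, q int) []string {
	var keys []string
	if root == nil || q <= 0 {
		return keys
	}
	level := []*node{root}
	for len(level) > 0 && len(keys) < q {
		var next []*node
		for _, n := range level {
			keys = append(keys, n.key)
			if n.left != nil {
				next = append(next, n.left)
			}
			if n.right != nil {
				next = append(next, n.right)
			}
		}
		level = next
	}
	return keys
}

func main() {
	root := build([]string{"a", "b", "c", "d", "e", "f", "g"})
	fmt.Println(quantileKeys(root, 2))       // [d b f]
	fmt.Println(len(quantileKeys(root, 10))) // 7
}
```

Because a balanced tree's upper levels split the key space roughly evenly, only a few levels need to be read even for large trees, and the result overshoots q instead of falling short.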

func (*DiskTree) Seek

func (t *DiskTree) Seek(key []byte) (Node, error)

func (*DiskTree) Size

func (t *DiskTree) Size() int

type Header

type Header struct {
	Level            uint16
	Version          uint16
	SecondaryIndices uint16
	Strategy         Strategy
	IndexStart       uint64
}

func ParseHeader added in v1.18.0

func ParseHeader(r io.Reader) (*Header, error)

func (*Header) PrimaryIndex added in v1.18.0

func (h *Header) PrimaryIndex(source []byte) ([]byte, error)

func (*Header) SecondaryIndex added in v1.18.0

func (h *Header) SecondaryIndex(source []byte, indexID uint16) ([]byte, error)

func (*Header) WriteTo added in v1.18.0

func (h *Header) WriteTo(w io.Writer) (int64, error)

type HeaderInverted added in v1.28.0

type HeaderInverted struct {
	KeysOffset            uint64
	TombstoneOffset       uint64
	PropertyLengthsOffset uint64
	Version               uint8
	BlockSize             uint8
	DataFieldCount        uint8
	DataFields            []varenc.VarEncDataType
}

func LoadHeaderInverted added in v1.28.0

func LoadHeaderInverted(headerBytes []byte) (*HeaderInverted, error)

func ParseHeaderInverted added in v1.28.0

func ParseHeaderInverted(r io.Reader) (*HeaderInverted, error)

func (*HeaderInverted) WriteTo added in v1.28.0

func (h *HeaderInverted) WriteTo(w io.Writer) (int64, error)

type Indexes added in v1.18.0

type Indexes struct {
	Keys                []Key
	SecondaryIndexCount uint16
	ScratchSpacePath    string
}

func (*Indexes) WriteTo added in v1.18.0

func (s *Indexes) WriteTo(w io.Writer) (int64, error)

type Key added in v1.18.0

type Key struct {
	Key           []byte
	SecondaryKeys [][]byte
	ValueStart    int
	ValueEnd      int
}

Key is a helper struct that can be used to build the key nodes for the segment index. It contains the primary key and an arbitrary number of secondary keys, as well as the ValueStart and ValueEnd offsets, which are used to find the correct payload for each key.
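
As an illustration of how the two offsets locate a payload, here is a minimal sketch. The local `key` type and the `payload` helper are hypothetical, and treating the offsets as a half-open range `[ValueStart, ValueEnd)` into the segment bytes is an assumption of this example.

```go
package main

import "fmt"

// key mirrors the fields of Key that matter for payload lookup.
type key struct {
	Key        []byte
	ValueStart int
	ValueEnd   int
}

// payload returns the bytes that belong to k. The half-open range
// [ValueStart, ValueEnd) is an assumption of this sketch.
func payload(segment []byte, k key) ([]byte, error) {
	if k.ValueStart < 0 || k.ValueEnd < k.ValueStart || k.ValueEnd > len(segment) {
		return nil, fmt.Errorf("offsets [%d, %d) out of range for %d bytes",
			k.ValueStart, k.ValueEnd, len(segment))
	}
	return segment[k.ValueStart:k.ValueEnd], nil
}

func main() {
	segment := []byte("applebanana")
	keys := []key{
		{Key: []byte("k1"), ValueStart: 0, ValueEnd: 5},
		{Key: []byte("k2"), ValueStart: 5, ValueEnd: 11},
	}
	for _, k := range keys {
		p, err := payload(segment, k)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", k.Key, p)
	}
}
```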

type Node

type Node struct {
	Key   []byte
	Start uint64
	End   uint64
}

type SegmentFile added in v1.26.14

type SegmentFile struct {
	// contains filtered or unexported fields
}

SegmentFile facilitates the writing/reading of an LSM bucket segment file.

The file contents include a CRC32 checksum, which is calculated over the following components:

  • segment data
  • segment indexes
  • segment header

The checksum is calculated using those components in that exact ordering. This is because during compactions, the header is not actually known until the compaction process is complete. So to accommodate this, all segment checksum calculations are made using the header last.
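
The ordering can be sketched with the standard `hash/crc32` package. This is an illustration only: the `segmentChecksum` and `concatChecksum` helpers are hypothetical, and the IEEE polynomial is an assumption, since the documentation states only that CRC32 is used.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// segmentChecksum feeds the components into one running CRC32 in the
// documented order: data, then indexes, then header last.
// The IEEE polynomial is an assumption of this sketch.
func segmentChecksum(header, data, indexes []byte) uint32 {
	h := crc32.NewIEEE()
	h.Write(data) // hash.Hash writes never return an error
	h.Write(indexes)
	h.Write(header)
	return h.Sum32()
}

// concatChecksum hashes the given parts as one contiguous byte slice.
func concatChecksum(parts ...[]byte) uint32 {
	var all []byte
	for _, p := range parts {
		all = append(all, p...)
	}
	return crc32.ChecksumIEEE(all)
}

func main() {
	header := []byte("sixteen-byte-hdr")
	data := []byte("segment data")
	indexes := []byte("segment indexes")

	sum := segmentChecksum(header, data, indexes)
	// The running hash equals a one-shot hash over data+indexes+header,
	// even though the header is physically first in the file.
	fmt.Println(sum == concatChecksum(data, indexes, header))
}
```

Hashing the header last means the body and indexes can be streamed through the hash while the final header values are still unknown.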

Usage:

To write a segment file, initialization and API are as follows:
   ```
   sf := NewSegmentFile(WithBufferedWriter(<some buffered writer>))
   sf.WriteHeader(<some *Header>)
   <some segment node>.WriteTo(sf.BodyWriter())
   sf.WriteChecksum()
   ```

To validate a segment file checksum, initialization and API are as follows:
   ```
   sf := NewSegmentFile(WithReader(<segment fd>))
   sf.ValidateChecksum(<segment fd file info>)
   ```

func NewSegmentFile added in v1.26.14

func NewSegmentFile(opts ...SegmentFileOption) *SegmentFile

NewSegmentFile creates a new instance of SegmentFile. Be sure to include a writer or reader option depending on your needs.

func (*SegmentFile) BodyWriter added in v1.26.14

func (f *SegmentFile) BodyWriter() io.Writer

BodyWriter exposes the underlying writer which calculates the hash inline. This method is used when writing the body of the segment, the user data itself.

Because there are many segment node types, and each exposes its own `WriteTo` (or similar) method, it would be cumbersome to support each node type, in the way we support WriteHeader and WriteIndexes. So this method exists to hook into each segment node's `WriteTo` instead.

This method uses the written data to further calculate the checksum.
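
A writer that hashes inline can be built with `io.MultiWriter`, as in the sketch below. It illustrates the general technique rather than the package's internals: `payloadNode`, `writeBody`, and `demo` are hypothetical names, and the IEEE polynomial is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
	"io"
)

// payloadNode is a hypothetical segment node with its own WriteTo method.
type payloadNode struct{ data []byte }

func (n payloadNode) WriteTo(w io.Writer) (int64, error) {
	written, err := w.Write(n.data)
	return int64(written), err
}

// writeBody writes every node to dst and returns the byte count together
// with the CRC32 of everything written. The hash sees the bytes as they
// pass through, so no second read of the data is needed.
func writeBody(dst io.Writer, nodes []payloadNode) (int64, uint32, error) {
	h := crc32.NewIEEE()
	w := io.MultiWriter(dst, h)
	var total int64
	for _, n := range nodes {
		written, err := n.WriteTo(w)
		total += written
		if err != nil {
			return total, 0, err
		}
	}
	return total, h.Sum32(), nil
}

// demo writes two nodes into a buffer and reports whether the inline
// checksum matches a checksum computed over the finished buffer.
func demo() (int64, bool, error) {
	var file bytes.Buffer
	nodes := []payloadNode{{data: []byte("hello ")}, {data: []byte("world")}}
	n, sum, err := writeBody(&file, nodes)
	return n, sum == crc32.ChecksumIEEE(file.Bytes()), err
}

func main() {
	fmt.Println(demo()) // 11 true <nil>
}
```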

func (*SegmentFile) ValidateChecksum added in v1.26.14

func (f *SegmentFile) ValidateChecksum(info os.FileInfo) error

ValidateChecksum determines whether a segment's content matches its checksum.

func (*SegmentFile) WriteChecksum added in v1.26.14

func (f *SegmentFile) WriteChecksum() (int64, error)

WriteChecksum writes the checksum itself to the segment file. As mentioned elsewhere in SegmentFile, the header is added to the checksum last. This method finally adds the header to the hash, and then writes the resulting checksum to the segment file.

func (*SegmentFile) WriteHeader added in v1.26.14

func (f *SegmentFile) WriteHeader(header *Header) (int64, error)

WriteHeader writes the header struct to the underlying writer. This method resets the internal hash, so that the header can be written to the checksum last. For more details see SegmentFile.

func (*SegmentFile) WriteIndexes added in v1.26.14

func (f *SegmentFile) WriteIndexes(indexes *Indexes) (int64, error)

WriteIndexes writes the indexes struct to the underlying writer. This method uses the written data to further calculate the checksum.

type SegmentFileOption added in v1.26.14

type SegmentFileOption func(*SegmentFile)

func WithBufferedWriter added in v1.26.14

func WithBufferedWriter(writer *bufio.Writer) SegmentFileOption

WithBufferedWriter sets the desired segment file writer. This will typically wrap the segment *os.File.

func WithChecksumsDisabled added in v1.26.14

func WithChecksumsDisabled(disable bool) SegmentFileOption

WithChecksumsDisabled configures the segment file to be written without checksums.

func WithReader added in v1.26.14

func WithReader(reader io.Reader) SegmentFileOption

WithReader sets the desired segment file reader. This will typically be the segment *os.File.
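
SegmentFileOption follows the functional options pattern. The sketch below shows how such options are typically applied by a constructor. All lowercase names and the struct fields are hypothetical stand-ins, since the real SegmentFile fields are unexported.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// segmentFile is a hypothetical stand-in for SegmentFile.
type segmentFile struct {
	writer            *bufio.Writer
	reader            io.Reader
	checksumsDisabled bool
}

// option has the same shape as SegmentFileOption.
type option func(*segmentFile)

func withBufferedWriter(w *bufio.Writer) option {
	return func(f *segmentFile) { f.writer = w }
}

func withReader(r io.Reader) option {
	return func(f *segmentFile) { f.reader = r }
}

func withChecksumsDisabled(disable bool) option {
	return func(f *segmentFile) { f.checksumsDisabled = disable }
}

// newSegmentFile starts from zero values and applies each option in order.
func newSegmentFile(opts ...option) *segmentFile {
	f := &segmentFile{}
	for _, opt := range opts {
		opt(f)
	}
	return f
}

func main() {
	w := newSegmentFile(
		withBufferedWriter(bufio.NewWriter(io.Discard)),
		withChecksumsDisabled(true),
	)
	r := newSegmentFile(withReader(strings.NewReader("segment bytes")))
	fmt.Println(w.writer != nil, w.checksumsDisabled, r.reader != nil) // true true true
}
```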

type Strategy added in v1.18.0

type Strategy uint16

const (
	StrategyReplace Strategy = iota
	StrategySetCollection
	StrategyMapCollection
	StrategyRoaringSet
	StrategyRoaringSetRange
	StrategyInverted
)

type Tree

type Tree struct {
	// contains filtered or unexported fields
}

func NewBalanced

func NewBalanced(nodes []Node) Tree

func NewTree

func NewTree(capacity int) Tree

func (*Tree) Get

func (t *Tree) Get(key []byte) ([]byte, uint64, uint64)

func (*Tree) Height

func (t *Tree) Height() int

func (*Tree) Insert

func (t *Tree) Insert(key []byte, start, end uint64)

func (*Tree) MarshalBinary

func (t *Tree) MarshalBinary() ([]byte, error)

func (*Tree) MarshalBinaryInto

func (t *Tree) MarshalBinaryInto(w io.Writer) (int64, error)
