segmentindex

package

Version: v1.29.0-rc.2 | Published: Feb 13, 2025 | License: BSD-3-Clause | Imports: 16 | Imported by: 2

Repository

github.com/weaviate/weaviate

Documentation ¶

Index ¶

Constants
Variables
func CheckExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy) error
func ChooseHeaderVersion(checksumsEnabled bool) uint16
func IsExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy) bool
func MustBeExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy)
type DiskTree
- func NewDiskTree(data []byte) *DiskTree
- func (t *DiskTree) AllKeys() ([][]byte, error)
- func (t *DiskTree) Get(key []byte) (Node, error)
- func (t *DiskTree) Next(key []byte) (Node, error)
- func (t *DiskTree) QuantileKeys(q int) [][]byte
- func (t *DiskTree) Seek(key []byte) (Node, error)
- func (t *DiskTree) Size() int
type Header
- func ParseHeader(r io.Reader) (*Header, error)
- func (h *Header) PrimaryIndex(source []byte) ([]byte, error)
- func (h *Header) SecondaryIndex(source []byte, indexID uint16) ([]byte, error)
- func (h *Header) WriteTo(w io.Writer) (int64, error)
type HeaderInverted
- func LoadHeaderInverted(headerBytes []byte) (*HeaderInverted, error)
- func ParseHeaderInverted(r io.Reader) (*HeaderInverted, error)
- func (h *HeaderInverted) WriteTo(w io.Writer) (int64, error)
type Indexes
- func (s *Indexes) WriteTo(w io.Writer) (int64, error)
type Key
type Node
type SegmentFile
- func NewSegmentFile(opts ...SegmentFileOption) *SegmentFile
- func (f *SegmentFile) BodyWriter() io.Writer
- func (f *SegmentFile) ValidateChecksum(info os.FileInfo) error
- func (f *SegmentFile) WriteChecksum() (int64, error)
- func (f *SegmentFile) WriteHeader(header *Header) (int64, error)
- func (f *SegmentFile) WriteIndexes(indexes *Indexes) (int64, error)
type SegmentFileOption
- func WithBufferedWriter(writer *bufio.Writer) SegmentFileOption
- func WithChecksumsDisabled(disable bool) SegmentFileOption
- func WithReader(reader io.Reader) SegmentFileOption
type Strategy
type Tree
- func NewBalanced(nodes []Node) Tree
- func NewTree(capacity int) Tree
- func (t *Tree) Get(key []byte) ([]byte, uint64, uint64)
- func (t *Tree) Height() int
- func (t *Tree) Insert(key []byte, start, end uint64)
- func (t *Tree) MarshalBinary() ([]byte, error)
- func (t *Tree) MarshalBinaryInto(w io.Writer) (int64, error)

Constants ¶

const (
	// HeaderSize describes the general offset in a segment until the data
	// starts. It is composed of 2 bytes for level, 2 bytes for version,
	// 2 bytes for secondary index count, 2 bytes for strategy, and 8 bytes
	// for the pointer to the index part.
	HeaderSize = 16

	// ChecksumSize describes the length of the segment file checksum.
	// This is currently based on the CRC32 hashing algorithm.
	ChecksumSize = 4
)
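
The 16-byte layout listed in the HeaderSize comment can be illustrated with a small round trip. This is a minimal sketch, not the package's own ParseHeader: the local `header` type and the `encodeHeader`/`decodeHeader` helpers are hypothetical, and little-endian byte order is an assumption of the sketch.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerSize mirrors HeaderSize: 2 + 2 + 2 + 2 + 8 bytes.
const headerSize = 16

// header is a local stand-in for the package's Header type.
type header struct {
	Level            uint16
	Version          uint16
	SecondaryIndices uint16
	Strategy         uint16
	IndexStart       uint64
}

// encodeHeader lays the fields out in the order given in the HeaderSize
// comment. Little-endian byte order is an assumption of this sketch.
func encodeHeader(h header) []byte {
	b := make([]byte, headerSize)
	binary.LittleEndian.PutUint16(b[0:2], h.Level)
	binary.LittleEndian.PutUint16(b[2:4], h.Version)
	binary.LittleEndian.PutUint16(b[4:6], h.SecondaryIndices)
	binary.LittleEndian.PutUint16(b[6:8], h.Strategy)
	binary.LittleEndian.PutUint64(b[8:16], h.IndexStart)
	return b
}

// decodeHeader is the inverse of encodeHeader.
func decodeHeader(b []byte) (header, error) {
	if len(b) < headerSize {
		return header{}, errors.New("header too short")
	}
	return header{
		Level:            binary.LittleEndian.Uint16(b[0:2]),
		Version:          binary.LittleEndian.Uint16(b[2:4]),
		SecondaryIndices: binary.LittleEndian.Uint16(b[4:6]),
		Strategy:         binary.LittleEndian.Uint16(b[6:8]),
		IndexStart:       binary.LittleEndian.Uint64(b[8:16]),
	}, nil
}

func main() {
	in := header{Level: 1, Version: 1, SecondaryIndices: 0, Strategy: 3, IndexStart: 4096}
	out, err := decodeHeader(encodeHeader(in))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", out)
}
```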

const (
	// SegmentV1 is the current latest version, and introduced support
	// for integrity checks with checksums added to the segment files.
	SegmentV1 = uint16(1)

	// CurrentSegmentVersion is used to ensure that the parsed header
	// version does not exceed the highest valid version.
	CurrentSegmentVersion = SegmentV1
)

const HeaderInvertedSize = 29 // 27 + 2 bytes for data field count

Variables ¶

var (
	SegmentInvertedDefaultHeaderSize = 27
	SegmentInvertedDefaultBlockSize  = terms.BLOCK_SIZE
	SegmentInvertedDefaultFieldCount = 2
)

Functions ¶

func CheckExpectedStrategy ¶ added in v1.26.0

func CheckExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy) error

func ChooseHeaderVersion ¶ added in v1.26.14

func ChooseHeaderVersion(checksumsEnabled bool) uint16

func IsExpectedStrategy ¶ added in v1.26.0

func IsExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy) bool

func MustBeExpectedStrategy ¶ added in v1.26.0

func MustBeExpectedStrategy(strategy Strategy, expectedStrategies ...Strategy)

Types ¶

type DiskTree ¶

type DiskTree struct {
	// contains filtered or unexported fields
}

DiskTree is a read-only wrapper around a marshalled index search tree. It can be used for reading, but cannot change the underlying structure. It is thus well suited as an index for an (immutable) LSM disk segment, but of little use for anything else.

func NewDiskTree ¶

func NewDiskTree(data []byte) *DiskTree

func (*DiskTree) AllKeys ¶

func (t *DiskTree) AllKeys() ([][]byte, error)

AllKeys is a relatively expensive operation, as it effectively does a full disk read of the index. It is meant for one-off operations, such as initializing a segment where we need access to all keys, e.g. to build a bloom filter. It should not run at query time.

The binary tree is traversed in level order, so keys are returned in no meaningful order. Do not use this method if an in-order traversal is required. Use it only for cases that do not require a specific order, such as building a bloom filter.

func (*DiskTree) Get ¶

func (t *DiskTree) Get(key []byte) (Node, error)

func (*DiskTree) Next ¶ added in v1.26.0

func (t *DiskTree) Next(key []byte) (Node, error)

func (*DiskTree) QuantileKeys ¶ added in v1.25.11

func (t *DiskTree) QuantileKeys(q int) [][]byte

QuantileKeys returns a list of keys that roughly represent the quantiles of the tree. This can be very useful to bootstrap parallel cursors over the segment that are more or less evenly distributed.

This method uses the natural shape of the tree to determine the distribution of the keys. This is a performance-accuracy trade-off. It does not guarantee perfect distribution, but it is fairly cheap to obtain as most runs will only need to go a few levels deep – even on massive trees.

The number of keys returned is not guaranteed to be exactly q; in most cases the method returns more keys. This is because in a real-life application you would likely aggregate across multiple segments. Similarly, keys are not returned in any specific order, as the assumption is that post-processing will be done when keys are aggregated across multiple segments.

The two guarantees you get are:

If there are at least q keys in the tree, you will get at least q keys, most likely more.
If there are fewer than q keys in the tree, you will get all keys.
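
The post-processing step mentioned above could look roughly like the following. This is a sketch under stated assumptions: `mergeQuantileKeys` is a hypothetical helper that is not part of this package, and picking evenly spaced keys from the sorted, de-duplicated union is just one possible aggregation strategy.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// mergeQuantileKeys (hypothetical) combines the unordered QuantileKeys
// results of several segments, sorts and de-duplicates them, and then picks
// at most q evenly spaced keys.
func mergeQuantileKeys(perSegment [][][]byte, q int) [][]byte {
	if q <= 0 {
		return nil
	}
	var all [][]byte
	for _, keys := range perSegment {
		all = append(all, keys...)
	}
	sort.Slice(all, func(i, j int) bool { return bytes.Compare(all[i], all[j]) < 0 })

	// Drop duplicates; after sorting they are adjacent.
	uniq := make([][]byte, 0, len(all))
	for _, k := range all {
		if len(uniq) == 0 || !bytes.Equal(k, uniq[len(uniq)-1]) {
			uniq = append(uniq, k)
		}
	}
	if len(uniq) <= q {
		return uniq
	}

	// Pick q keys at evenly spaced positions of the sorted union.
	out := make([][]byte, 0, q)
	for i := 0; i < q; i++ {
		out = append(out, uniq[i*len(uniq)/q])
	}
	return out
}

func main() {
	segments := [][][]byte{
		{[]byte("m"), []byte("a"), []byte("z")},
		{[]byte("t"), []byte("c"), []byte("m")},
	}
	for _, k := range mergeQuantileKeys(segments, 2) {
		fmt.Println(string(k))
	}
}
```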

func (*DiskTree) Seek ¶

func (t *DiskTree) Seek(key []byte) (Node, error)

func (*DiskTree) Size ¶

func (t *DiskTree) Size() int

type Header ¶ added in v1.18.0

type Header struct {
	Level            uint16
	Version          uint16
	SecondaryIndices uint16
	Strategy         Strategy
	IndexStart       uint64
}

func ParseHeader ¶ added in v1.18.0

func ParseHeader(r io.Reader) (*Header, error)

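func (*Header) PrimaryIndex ¶ added in v1.18.0

func (h *Header) PrimaryIndex(source []byte) ([]byte, error)

func (*Header) SecondaryIndex ¶ added in v1.18.0

func (h *Header) SecondaryIndex(source []byte, indexID uint16) ([]byte, error)

func (*Header) WriteTo ¶ added in v1.18.0

func (h *Header) WriteTo(w io.Writer) (int64, error)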

type HeaderInverted ¶ added in v1.28.0

type HeaderInverted struct {
	KeysOffset            uint64
	TombstoneOffset       uint64
	PropertyLengthsOffset uint64
	Version               uint8
	BlockSize             uint8
	DataFieldCount        uint8
	DataFields            []varenc.VarEncDataType
}

func LoadHeaderInverted ¶ added in v1.28.0

func LoadHeaderInverted(headerBytes []byte) (*HeaderInverted, error)

func ParseHeaderInverted ¶ added in v1.28.0

func ParseHeaderInverted(r io.Reader) (*HeaderInverted, error)

func (*HeaderInverted) WriteTo ¶ added in v1.28.0

func (h *HeaderInverted) WriteTo(w io.Writer) (int64, error)

type Indexes ¶ added in v1.18.0

type Indexes struct {
	Keys                []Key
	SecondaryIndexCount uint16
	ScratchSpacePath    string
}

func (*Indexes) WriteTo ¶ added in v1.18.0

func (s *Indexes) WriteTo(w io.Writer) (int64, error)

type Key ¶ added in v1.18.0

type Key struct {
	Key           []byte
	SecondaryKeys [][]byte
	ValueStart    int
	ValueEnd      int
}

Key is a helper struct that can be used to build the key nodes for the segment index. It contains the primary key and an arbitrary number of secondary keys, as well as the ValueStart and ValueEnd indicators, which are used to find the correct payload for each key.

type Node ¶

type Node struct {
	Key   []byte
	Start uint64
	End   uint64
}

type SegmentFile ¶ added in v1.26.14

type SegmentFile struct {
	// contains filtered or unexported fields
}

SegmentFile facilitates the writing/reading of an LSM bucket segment file.

The file contents include a CRC32 checksum, which is calculated based on the:

segment data
segment indexes
segment header

The checksum is calculated using those components in that exact order. This is because during compactions, the header is not actually known until the compaction process is complete. To accommodate this, all segment checksum calculations are made using the header last.

Usage:

To write a segment file, initialization and API are as follows:
   ```
   sf := NewSegmentFile(WithBufferedWriter(<some buffered writer>))
   sf.WriteHeader(<some *Header>)
   <some segment node>.WriteTo(sf.BodyWriter())
   sf.WriteChecksum()
   ```

To validate a segment file checksum, initialization and API are as follows:
   ```
   sf := NewSegmentFile(WithReader(<segment fd>))
   sf.ValidateChecksum(<segment fd file info>)
   ```

func NewSegmentFile ¶ added in v1.26.14

func NewSegmentFile(opts ...SegmentFileOption) *SegmentFile

NewSegmentFile creates a new instance of SegmentFile. Be sure to include a writer or reader option depending on your needs.

func (*SegmentFile) BodyWriter ¶ added in v1.26.14

func (f *SegmentFile) BodyWriter() io.Writer

BodyWriter exposes the underlying writer which calculates the hash inline. This method is used when writing the body of the segment, the user data itself.

Because there are many segment node types, and each exposes its own `WriteTo` (or similar) method, it would be cumbersome to support each node type, in the way we support WriteHeader and WriteIndexes. So this method exists to hook into each segment node's `WriteTo` instead.

This method uses the written data to further calculate the checksum.

func (*SegmentFile) ValidateChecksum ¶ added in v1.26.14

func (f *SegmentFile) ValidateChecksum(info os.FileInfo) error

ValidateChecksum determines whether a segment's content matches its checksum.

func (*SegmentFile) WriteChecksum ¶ added in v1.26.14

func (f *SegmentFile) WriteChecksum() (int64, error)

WriteChecksum writes the checksum itself to the segment file. As mentioned elsewhere in SegmentFile, the header is added to the checksum last. This method finally adds the header to the hash, and then writes the resulting checksum to the segment file.

func (*SegmentFile) WriteHeader ¶ added in v1.26.14

func (f *SegmentFile) WriteHeader(header *Header) (int64, error)

WriteHeader writes the header struct to the underlying writer. This method resets the internal hash, so that the header can be written to the checksum last. For more details see SegmentFile.

func (*SegmentFile) WriteIndexes ¶ added in v1.26.14

func (f *SegmentFile) WriteIndexes(indexes *Indexes) (int64, error)

WriteIndexes writes the indexes struct to the underlying writer. This method uses the written data to further calculate the checksum.

type SegmentFileOption ¶ added in v1.26.14

type SegmentFileOption func(*SegmentFile)

func WithBufferedWriter ¶ added in v1.26.14

func WithBufferedWriter(writer *bufio.Writer) SegmentFileOption

WithBufferedWriter sets the desired segment file writer. This will typically wrap the segment *os.File.

func WithChecksumsDisabled ¶ added in v1.26.14

func WithChecksumsDisabled(disable bool) SegmentFileOption

WithChecksumsDisabled configures the segment file to be written without checksums.

func WithReader ¶ added in v1.26.14

func WithReader(reader io.Reader) SegmentFileOption

WithReader sets the desired segment file reader. This will typically be the segment *os.File.

type Strategy ¶ added in v1.18.0

type Strategy uint16

const (
	StrategyReplace Strategy = iota
	StrategySetCollection
	StrategyMapCollection
	StrategyRoaringSet
	StrategyRoaringSetRange
	StrategyInverted
)

type Tree ¶

type Tree struct {
	// contains filtered or unexported fields
}

func NewBalanced ¶

func NewBalanced(nodes []Node) Tree

func NewTree ¶

func NewTree(capacity int) Tree

func (*Tree) Get ¶

func (t *Tree) Get(key []byte) ([]byte, uint64, uint64)

func (*Tree) Height ¶

func (t *Tree) Height() int

func (*Tree) Insert ¶

func (t *Tree) Insert(key []byte, start, end uint64)

func (*Tree) MarshalBinary ¶

func (t *Tree) MarshalBinary() ([]byte, error)

func (*Tree) MarshalBinaryInto ¶

func (t *Tree) MarshalBinaryInto(w io.Writer) (int64, error)
