recsplit

package
v0.0.0-...-d3d9290 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 13, 2023 License: Apache-2.0 Imports: 24 Imported by: 0

Documentation

Index

Constants

View Source
const MaxLeafSize = 24
View Source
const RecSplitLogPrefix = "recsplit"

Variables

View Source
var ErrCollision = fmt.Errorf("duplicate key")

Functions

This section is empty.

Types

type GolombRice

type GolombRice struct {
	// contains filtered or unexported fields
}

GolombRice can build up the golomb-rice encoding of the sequeuce of numbers, as well as read the numbers back from it.

func (*GolombRice) Bits

func (g *GolombRice) Bits() int

Bits returns currrent number of bits in the compact encoding of the hash function representation

func (*GolombRice) Data

func (g *GolombRice) Data() []uint64

Data returns the binary representation of the Golomb-Rice code that is built

func (*GolombRice) Write

func (g *GolombRice) Write(w io.Writer) error

Write outputs the state of golomb rice encoding into a writer, which can be recovered later by Read

type GolombRiceReader

type GolombRiceReader struct {
	// contains filtered or unexported fields
}

func (*GolombRiceReader) ReadNext

func (g *GolombRiceReader) ReadNext(log2golomb int) uint64

func (*GolombRiceReader) ReadReset

func (g *GolombRiceReader) ReadReset(bitPos, unaryOffset int)

func (*GolombRiceReader) SkipSubtree

func (g *GolombRiceReader) SkipSubtree(nodes, fixedLen int)

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index implements index lookup from the file created by the RecSplit

func MustOpen

func MustOpen(indexFile string) *Index

func OpenIndex

func OpenIndex(indexFilePath string) (*Index, error)

func (*Index) BaseDataID

func (idx *Index) BaseDataID() uint64

func (*Index) Close

func (idx *Index) Close()

func (*Index) DataHandle

func (idx *Index) DataHandle() unsafe.Pointer

func (*Index) DisableReadAhead

func (idx *Index) DisableReadAhead()

DisableReadAhead - usage: `defer d.EnableReadAhead().DisableReadAhead()`. Please don't use this funcs without `defer` to avoid leak.

func (*Index) Empty

func (idx *Index) Empty() bool

func (*Index) EnableMadvNormal

func (idx *Index) EnableMadvNormal() *Index

func (*Index) EnableReadAhead

func (idx *Index) EnableReadAhead() *Index

func (*Index) EnableWillNeed

func (idx *Index) EnableWillNeed() *Index

func (*Index) ExtractOffsets

func (idx *Index) ExtractOffsets() map[uint64]uint64

func (*Index) FileName

func (idx *Index) FileName() string

func (*Index) FilePath

func (idx *Index) FilePath() string

func (*Index) GetReaderFromPool

func (idx *Index) GetReaderFromPool() *IndexReader

func (*Index) KeyCount

func (idx *Index) KeyCount() uint64

func (*Index) Lookup

func (idx *Index) Lookup(bucketHash, fingerprint uint64) uint64

Lookup is not thread-safe because it used id.hasher

func (*Index) ModTime

func (idx *Index) ModTime() time.Time

func (*Index) OrdinalLookup

func (idx *Index) OrdinalLookup(i uint64) uint64

OrdinalLookup returns the offset of i-th element in the index Perfect hash table lookup is not performed, only access to the Elias-Fano structure containing all offsets.

func (*Index) RewriteWithOffsets

func (idx *Index) RewriteWithOffsets(w *bufio.Writer, m map[uint64]uint64) error

func (*Index) Size

func (idx *Index) Size() int64

type IndexReader

type IndexReader struct {
	// contains filtered or unexported fields
}

IndexReader encapsulates Hash128 to allow concurrent access to Index

func NewIndexReader

func NewIndexReader(index *Index) *IndexReader

NewIndexReader creates new IndexReader

func (*IndexReader) Close

func (r *IndexReader) Close()

func (*IndexReader) Empty

func (r *IndexReader) Empty() bool

func (*IndexReader) Lookup

func (r *IndexReader) Lookup(key []byte) uint64

Lookup wraps index Lookup

func (*IndexReader) Lookup2

func (r *IndexReader) Lookup2(key1, key2 []byte) uint64

type RecSplit

type RecSplit struct {
	// contains filtered or unexported fields
}

RecSplit is the implementation of Recursive Split algorithm for constructing perfect hash mapping, described in https://arxiv.org/pdf/1910.06416.pdf Emmanuel Esposito, Thomas Mueller Graf, and Sebastiano Vigna. Recsplit: Minimal perfect hashing via recursive splitting. In 2020 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), pages 175−185. SIAM, 2020.

func NewRecSplit

func NewRecSplit(args RecSplitArgs, logger log.Logger) (*RecSplit, error)

NewRecSplit creates a new RecSplit instance with given number of keys and given bucket size Typical bucket size is 100 - 2000, larger bucket sizes result in smaller representations of hash functions, at a cost of slower access salt parameters is used to randomise the hash function construction, to ensure that different Erigon instances (nodes) are likely to use different hash function, to collision attacks are unlikely to slow down any meaningful number of nodes at the same time

func (*RecSplit) AddKey

func (rs *RecSplit) AddKey(key []byte, offset uint64) error

Add key to the RecSplit. There can be many more keys than what fits in RAM, and RecSplit spills data onto disk to accomodate that. The key gets copied by the collector, therefore the slice underlying key is not getting accessed by RecSplit after this invocation.

func (*RecSplit) AddOffset

func (rs *RecSplit) AddOffset(offset uint64) error

func (*RecSplit) Build

func (rs *RecSplit) Build(ctx context.Context) error

Build has to be called after all the keys have been added, and it initiates the process of building the perfect hash function and writing index into a file

func (*RecSplit) Close

func (rs *RecSplit) Close()

func (*RecSplit) Collision

func (rs *RecSplit) Collision() bool

Collision returns true if there was a collision detected during mapping of keys into 64-bit values RecSplit needs to be reset, re-populated with keys, and rebuilt

func (*RecSplit) DisableFsync

func (rs *RecSplit) DisableFsync()

func (*RecSplit) LogLvl

func (rs *RecSplit) LogLvl(lvl log.Lvl)

func (*RecSplit) ResetNextSalt

func (rs *RecSplit) ResetNextSalt()

ResetNextSalt resets the RecSplit and uses the next salt value to try to avoid collisions when mapping keys to 64-bit values

func (*RecSplit) SetTrace

func (rs *RecSplit) SetTrace(trace bool)

func (*RecSplit) Stats

func (rs *RecSplit) Stats() (int, int)

Stats returns the size of golomb rice encoding and ellias fano encoding

type RecSplitArgs

type RecSplitArgs struct {
	// Whether two level index needs to be built, where perfect hash map points to an enumeration, and enumeration points to offsets
	// if Enum=false: can have unsorted and duplicated values
	// if Enum=true:  must have sorted values (can have duplicates) - monotonically growing sequence
	Enums bool

	IndexFile   string // File name where the index and the minimal perfect hash function will be written to
	TmpDir      string
	StartSeed   []uint64 // For each level of recursive split, the hash seed (salt) used for that level - need to be generated randomly and be large enough to accomodate all the levels
	KeyCount    int
	BucketSize  int
	BaseDataID  uint64
	EtlBufLimit datasize.ByteSize
	Salt        uint32 // Hash seed (salt) for the hash function used for allocating the initial buckets - need to be generated randomly
	LeafSize    uint16
}

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL