recsplit

package

v0.0.0-...-d3d9290 Latest Latest Go to latest Published: Dec 13, 2023 License: Apache-2.0 Imports: 24 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/tenderly/erigon

Documentation ¶

Index ¶

Constants
Variables
type GolombRice
type GolombRiceReader
type Index
- func MustOpen(indexFile string) *Index
- func OpenIndex(indexFilePath string) (*Index, error)
type IndexReader
- func NewIndexReader(index *Index) *IndexReader
type RecSplit
- func NewRecSplit(args RecSplitArgs, logger log.Logger) (*RecSplit, error)
type RecSplitArgs

Constants ¶

View Source

const MaxLeafSize = 24

View Source

const RecSplitLogPrefix = "recsplit"

Variables ¶

View Source

var ErrCollision = fmt.Errorf("duplicate key")

Functions ¶

This section is empty.

Types ¶

type GolombRice ¶

type GolombRice struct {
	// contains filtered or unexported fields
}

GolombRice can build up the golomb-rice encoding of the sequeuce of numbers, as well as read the numbers back from it.

func (*GolombRice) Bits ¶

func (g *GolombRice) Bits() int

Bits returns currrent number of bits in the compact encoding of the hash function representation

func (*GolombRice) Data ¶

func (g *GolombRice) Data() []uint64

Data returns the binary representation of the Golomb-Rice code that is built

func (*GolombRice) Write ¶

func (g *GolombRice) Write(w io.Writer) error

Write outputs the state of golomb rice encoding into a writer, which can be recovered later by Read

type GolombRiceReader ¶

type GolombRiceReader struct {
	// contains filtered or unexported fields
}

func (*GolombRiceReader) ReadNext ¶

func (g *GolombRiceReader) ReadNext(log2golomb int) uint64

func (*GolombRiceReader) ReadReset ¶

func (g *GolombRiceReader) ReadReset(bitPos, unaryOffset int)

func (*GolombRiceReader) SkipSubtree ¶

func (g *GolombRiceReader) SkipSubtree(nodes, fixedLen int)

type Index ¶

type Index struct {
	// contains filtered or unexported fields
}

Index implements index lookup from the file created by the RecSplit

func MustOpen ¶

func MustOpen(indexFile string) *Index

func OpenIndex ¶

func OpenIndex(indexFilePath string) (*Index, error)

func (*Index) BaseDataID ¶

func (idx *Index) BaseDataID() uint64

func (*Index) Close ¶

func (idx *Index) Close()

func (*Index) DataHandle ¶

func (idx *Index) DataHandle() unsafe.Pointer

func (*Index) DisableReadAhead ¶

func (idx *Index) DisableReadAhead()

DisableReadAhead - usage: `defer d.EnableReadAhead().DisableReadAhead()`. Please don't use this funcs without `defer` to avoid leak.

func (*Index) Empty ¶

func (idx *Index) Empty() bool

func (*Index) EnableMadvNormal ¶

func (idx *Index) EnableMadvNormal() *Index

func (*Index) EnableReadAhead ¶

func (idx *Index) EnableReadAhead() *Index

func (*Index) EnableWillNeed ¶

func (idx *Index) EnableWillNeed() *Index

func (*Index) ExtractOffsets ¶

func (idx *Index) ExtractOffsets() map[uint64]uint64

func (*Index) FileName ¶

func (idx *Index) FileName() string

func (*Index) FilePath ¶

func (idx *Index) FilePath() string

func (*Index) GetReaderFromPool ¶

func (idx *Index) GetReaderFromPool() *IndexReader

func (*Index) KeyCount ¶

func (idx *Index) KeyCount() uint64

func (*Index) Lookup ¶

func (idx *Index) Lookup(bucketHash, fingerprint uint64) uint64

Lookup is not thread-safe because it used id.hasher

func (*Index) ModTime ¶

func (idx *Index) ModTime() time.Time

func (*Index) OrdinalLookup ¶

func (idx *Index) OrdinalLookup(i uint64) uint64

OrdinalLookup returns the offset of i-th element in the index Perfect hash table lookup is not performed, only access to the Elias-Fano structure containing all offsets.

func (*Index) RewriteWithOffsets ¶

func (idx *Index) RewriteWithOffsets(w *bufio.Writer, m map[uint64]uint64) error

func (*Index) Size ¶

func (idx *Index) Size() int64

type IndexReader ¶

type IndexReader struct {
	// contains filtered or unexported fields
}

IndexReader encapsulates Hash128 to allow concurrent access to Index

func (*IndexReader) Close ¶

func (r *IndexReader) Close()

func (*IndexReader) Empty ¶

func (r *IndexReader) Empty() bool

func (*IndexReader) Lookup ¶

func (r *IndexReader) Lookup(key []byte) uint64

Lookup wraps index Lookup

func (*IndexReader) Lookup2 ¶

func (r *IndexReader) Lookup2(key1, key2 []byte) uint64

type RecSplit ¶

type RecSplit struct {
	// contains filtered or unexported fields
}

RecSplit is the implementation of Recursive Split algorithm for constructing perfect hash mapping, described in https://arxiv.org/pdf/1910.06416.pdf Emmanuel Esposito, Thomas Mueller Graf, and Sebastiano Vigna. Recsplit: Minimal perfect hashing via recursive splitting. In 2020 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), pages 175−185. SIAM, 2020.

func NewRecSplit ¶

func NewRecSplit(args RecSplitArgs, logger log.Logger) (*RecSplit, error)

NewRecSplit creates a new RecSplit instance with given number of keys and given bucket size Typical bucket size is 100 - 2000, larger bucket sizes result in smaller representations of hash functions, at a cost of slower access salt parameters is used to randomise the hash function construction, to ensure that different Erigon instances (nodes) are likely to use different hash function, to collision attacks are unlikely to slow down any meaningful number of nodes at the same time

func (*RecSplit) AddKey ¶

func (rs *RecSplit) AddKey(key []byte, offset uint64) error

Add key to the RecSplit. There can be many more keys than what fits in RAM, and RecSplit spills data onto disk to accomodate that. The key gets copied by the collector, therefore the slice underlying key is not getting accessed by RecSplit after this invocation.

func (*RecSplit) AddOffset ¶

func (rs *RecSplit) AddOffset(offset uint64) error

func (*RecSplit) Build ¶

func (rs *RecSplit) Build(ctx context.Context) error

Build has to be called after all the keys have been added, and it initiates the process of building the perfect hash function and writing index into a file

func (*RecSplit) Close ¶

func (rs *RecSplit) Close()

func (*RecSplit) Collision ¶

func (rs *RecSplit) Collision() bool

Collision returns true if there was a collision detected during mapping of keys into 64-bit values RecSplit needs to be reset, re-populated with keys, and rebuilt

func (*RecSplit) DisableFsync ¶

func (rs *RecSplit) DisableFsync()

func (*RecSplit) LogLvl ¶

func (rs *RecSplit) LogLvl(lvl log.Lvl)

func (*RecSplit) ResetNextSalt ¶

func (rs *RecSplit) ResetNextSalt()

ResetNextSalt resets the RecSplit and uses the next salt value to try to avoid collisions when mapping keys to 64-bit values

func (*RecSplit) SetTrace ¶

func (rs *RecSplit) SetTrace(trace bool)

func (*RecSplit) Stats ¶

func (rs *RecSplit) Stats() (int, int)

Stats returns the size of golomb rice encoding and ellias fano encoding

type RecSplitArgs ¶

type RecSplitArgs struct {
	// Whether two level index needs to be built, where perfect hash map points to an enumeration, and enumeration points to offsets
	// if Enum=false: can have unsorted and duplicated values
	// if Enum=true:  must have sorted values (can have duplicates) - monotonically growing sequence
	Enums bool

	IndexFile   string // File name where the index and the minimal perfect hash function will be written to
	TmpDir      string
	StartSeed   []uint64 // For each level of recursive split, the hash seed (salt) used for that level - need to be generated randomly and be large enough to accomodate all the levels
	KeyCount    int
	BucketSize  int
	BaseDataID  uint64
	EtlBufLimit datasize.ByteSize
	Salt        uint32 // Hash seed (salt) for the hash function used for allocating the initial buckets - need to be generated randomly
	LeafSize    uint16
}

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
eliasfano16
eliasfano32

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL

Documentation ¶

Index ¶

Constants ¶

Variables ¶

Functions ¶

Types ¶

type GolombRice ¶

func (*GolombRice) Bits ¶

func (*GolombRice) Data ¶

func (*GolombRice) Write ¶

type GolombRiceReader ¶

func (*GolombRiceReader) ReadNext ¶

func (*GolombRiceReader) ReadReset ¶

func (*GolombRiceReader) SkipSubtree ¶

type Index ¶

func MustOpen ¶

func OpenIndex ¶

func (*Index) BaseDataID ¶

func (*Index) Close ¶

func (*Index) DataHandle ¶

func (*Index) DisableReadAhead ¶

func (*Index) Empty ¶

func (*Index) EnableMadvNormal ¶

func (*Index) EnableReadAhead ¶

func (*Index) EnableWillNeed ¶

func (*Index) ExtractOffsets ¶

func (*Index) FileName ¶

func (*Index) FilePath ¶

func (*Index) GetReaderFromPool ¶

func (*Index) KeyCount ¶

func (*Index) Lookup ¶

func (*Index) ModTime ¶

func (*Index) OrdinalLookup ¶

func (*Index) RewriteWithOffsets ¶

func (*Index) Size ¶

type IndexReader ¶

func NewIndexReader ¶

func (*IndexReader) Close ¶

func (*IndexReader) Empty ¶

func (*IndexReader) Lookup ¶

func (*IndexReader) Lookup2 ¶

type RecSplit ¶

func NewRecSplit ¶

func (*RecSplit) AddKey ¶

func (*RecSplit) AddOffset ¶

func (*RecSplit) Build ¶

func (*RecSplit) Close ¶

func (*RecSplit) Collision ¶

func (*RecSplit) DisableFsync ¶

func (*RecSplit) LogLvl ¶

func (*RecSplit) ResetNextSalt ¶

func (*RecSplit) SetTrace ¶

func (*RecSplit) Stats ¶

type RecSplitArgs ¶

Source Files ¶

Directories ¶