Documentation ¶
Index ¶
- Constants
- Variables
- type GolombRice
- type GolombRiceReader
- type Index
- func (idx *Index) BaseDataID() uint64
- func (idx *Index) Close()
- func (idx *Index) DataHandle() unsafe.Pointer
- func (idx *Index) DisableReadAhead()
- func (idx *Index) Empty() bool
- func (idx *Index) EnableMadvNormal() *Index
- func (idx *Index) EnableReadAhead() *Index
- func (idx *Index) EnableWillNeed() *Index
- func (idx *Index) ExtractOffsets() map[uint64]uint64
- func (idx *Index) FileName() string
- func (idx *Index) FilePath() string
- func (idx *Index) GetReaderFromPool() *IndexReader
- func (idx *Index) KeyCount() uint64
- func (idx *Index) Lookup(bucketHash, fingerprint uint64) uint64
- func (idx *Index) ModTime() time.Time
- func (idx *Index) OrdinalLookup(i uint64) uint64
- func (idx *Index) RewriteWithOffsets(w *bufio.Writer, m map[uint64]uint64) error
- func (idx *Index) Size() int64
- type IndexReader
- type RecSplit
- func (rs *RecSplit) AddKey(key []byte, offset uint64) error
- func (rs *RecSplit) AddOffset(offset uint64) error
- func (rs *RecSplit) Build(ctx context.Context) error
- func (rs *RecSplit) Close()
- func (rs *RecSplit) Collision() bool
- func (rs *RecSplit) DisableFsync()
- func (rs *RecSplit) LogLvl(lvl log.Lvl)
- func (rs *RecSplit) ResetNextSalt()
- func (rs *RecSplit) SetTrace(trace bool)
- func (rs *RecSplit) Stats() (int, int)
- type RecSplitArgs
Constants ¶
const MaxLeafSize = 24
const RecSplitLogPrefix = "recsplit"
Variables ¶
var ErrCollision = fmt.Errorf("duplicate key")
Functions ¶
This section is empty.
Types ¶
type GolombRice ¶
type GolombRice struct {
// contains filtered or unexported fields
}
GolombRice can build up the Golomb-Rice encoding of a sequence of numbers, as well as read the numbers back from it.
func (*GolombRice) Bits ¶
func (g *GolombRice) Bits() int
Bits returns the current number of bits in the compact encoding of the hash function representation.
func (*GolombRice) Data ¶
func (g *GolombRice) Data() []uint64
Data returns the binary representation of the Golomb-Rice code that is built
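Since the fields of GolombRice are unexported, the coding scheme itself is easiest to see in a standalone sketch. The following example is illustrative only, not the package's actual implementation (the bit order and unary convention are assumptions): each value v is encoded as a unary quotient v >> log2golomb followed by the low log2golomb remainder bits, mirroring the log2golomb parameter taken by GolombRiceReader.ReadNext.

```go
package main

import "fmt"

// appendGolombRice appends the Golomb-Rice code of v with Rice parameter
// log2golomb to a slice of bits (true = 1).
func appendGolombRice(bits []bool, v uint64, log2golomb uint) []bool {
	q := v >> log2golomb
	for i := uint64(0); i < q; i++ { // unary quotient: q zeros...
		bits = append(bits, false)
	}
	bits = append(bits, true) // ...terminated by a one
	for i := int(log2golomb) - 1; i >= 0; i-- { // remainder, MSB first
		bits = append(bits, v&(1<<uint(i)) != 0)
	}
	return bits
}

// readGolombRice decodes one value starting at bit position pos,
// returning the value and the position just past its encoding.
func readGolombRice(bits []bool, pos int, log2golomb uint) (uint64, int) {
	q := uint64(0)
	for !bits[pos] { // count zeros of the unary quotient
		q++
		pos++
	}
	pos++ // skip the terminating one
	r := uint64(0)
	for i := uint(0); i < log2golomb; i++ { // read the fixed-width remainder
		r <<= 1
		if bits[pos] {
			r |= 1
		}
		pos++
	}
	return q<<log2golomb | r, pos
}

func main() {
	var bits []bool
	values := []uint64{3, 17, 100}
	for _, v := range values {
		bits = appendGolombRice(bits, v, 4)
	}
	pos := 0
	for _, want := range values {
		var got uint64
		got, pos = readGolombRice(bits, pos, 4)
		fmt.Println(got == want) // prints true for each value
	}
}
```

Small Rice parameters favour small values; RecSplit picks the parameter per subtree size, which is why ReadNext and SkipSubtree take it as an argument.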
type GolombRiceReader ¶
type GolombRiceReader struct {
// contains filtered or unexported fields
}
func (*GolombRiceReader) ReadNext ¶
func (g *GolombRiceReader) ReadNext(log2golomb int) uint64
func (*GolombRiceReader) ReadReset ¶
func (g *GolombRiceReader) ReadReset(bitPos, unaryOffset int)
func (*GolombRiceReader) SkipSubtree ¶
func (g *GolombRiceReader) SkipSubtree(nodes, fixedLen int)
type Index ¶
type Index struct {
// contains filtered or unexported fields
}
Index implements index lookup from the file created by RecSplit
func (*Index) BaseDataID ¶
func (idx *Index) BaseDataID() uint64
func (*Index) DataHandle ¶
func (idx *Index) DataHandle() unsafe.Pointer
func (*Index) DisableReadAhead ¶
func (idx *Index) DisableReadAhead()
DisableReadAhead - usage: `defer d.EnableReadAhead().DisableReadAhead()`. Please don't use these funcs without `defer`, to avoid leaks.
func (*Index) EnableMadvNormal ¶
func (idx *Index) EnableMadvNormal() *Index
func (*Index) EnableReadAhead ¶
func (idx *Index) EnableReadAhead() *Index
func (*Index) EnableWillNeed ¶
func (idx *Index) EnableWillNeed() *Index
func (*Index) ExtractOffsets ¶
func (idx *Index) ExtractOffsets() map[uint64]uint64
func (*Index) GetReaderFromPool ¶
func (idx *Index) GetReaderFromPool() *IndexReader
func (*Index) OrdinalLookup ¶
func (idx *Index) OrdinalLookup(i uint64) uint64
OrdinalLookup returns the offset of the i-th element in the index. The perfect hash table lookup is not performed; only the Elias-Fano structure containing all offsets is accessed.
func (*Index) RewriteWithOffsets ¶
func (idx *Index) RewriteWithOffsets(w *bufio.Writer, m map[uint64]uint64) error
type IndexReader ¶
type IndexReader struct {
// contains filtered or unexported fields
}
IndexReader encapsulates Hash128 to allow concurrent access to Index
func NewIndexReader ¶
func NewIndexReader(index *Index) *IndexReader
NewIndexReader creates a new IndexReader
func (*IndexReader) Close ¶
func (r *IndexReader) Close()
func (*IndexReader) Empty ¶
func (r *IndexReader) Empty() bool
func (*IndexReader) Lookup ¶
func (r *IndexReader) Lookup(key []byte) uint64
Lookup wraps index Lookup
func (*IndexReader) Lookup2 ¶
func (r *IndexReader) Lookup2(key1, key2 []byte) uint64
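A typical lookup flow is sketched below. This is illustrative only: the MustOpen constructor and the import path are taken as assumptions from the wider Erigon codebase, as neither appears in this excerpt.

```go
package main

import (
	"fmt"

	"github.com/erigontech/erigon-lib/recsplit" // import path is an assumption
)

func main() {
	// MustOpen is assumed; opening an index is not covered by this excerpt.
	idx := recsplit.MustOpen("keys.idx")
	defer idx.Close()

	// Pooled readers allow concurrent lookups against a single Index.
	r := idx.GetReaderFromPool()
	defer r.Close() // returns the reader to the pool

	offset := r.Lookup([]byte("some-key"))
	fmt.Println(offset)

	// For an index built with Enums=true, Lookup yields an ordinal that
	// OrdinalLookup translates into a data-file offset.
	fmt.Println(idx.OrdinalLookup(r.Lookup([]byte("some-key"))))
}
```

Each IndexReader encapsulates its own hash state, which is why concurrent goroutines should each take their own reader from the pool rather than share one.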
type RecSplit ¶
type RecSplit struct {
// contains filtered or unexported fields
}
RecSplit is the implementation of the Recursive Split algorithm for constructing a perfect hash mapping, described in https://arxiv.org/pdf/1910.06416.pdf: Emmanuel Esposito, Thomas Mueller Graf, and Sebastiano Vigna. RecSplit: Minimal perfect hashing via recursive splitting. In 2020 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), pages 175−185. SIAM, 2020.
func NewRecSplit ¶
func NewRecSplit(args RecSplitArgs, logger log.Logger) (*RecSplit, error)
NewRecSplit creates a new RecSplit instance with the given number of keys and the given bucket size. The typical bucket size is 100 - 2000; larger bucket sizes result in smaller representations of the hash function, at a cost of slower access. The salt parameter is used to randomise the hash function construction, to ensure that different Erigon instances (nodes) are likely to use different hash functions, so that collision attacks are unlikely to slow down any meaningful number of nodes at the same time.
func (*RecSplit) AddKey ¶
func (rs *RecSplit) AddKey(key []byte, offset uint64) error
AddKey adds a key to the RecSplit. There can be many more keys than fit in RAM, and RecSplit spills data onto disk to accommodate that. The key gets copied by the collector, therefore the slice underlying the key is not accessed by RecSplit after this invocation.
func (*RecSplit) Build ¶
func (rs *RecSplit) Build(ctx context.Context) error
Build has to be called after all the keys have been added, and it initiates the process of building the perfect hash function and writing the index into a file.
func (*RecSplit) Collision ¶
func (rs *RecSplit) Collision() bool
Collision returns true if a collision was detected during the mapping of keys into 64-bit values. In that case, RecSplit needs to be reset, re-populated with keys, and rebuilt.
func (*RecSplit) DisableFsync ¶
func (rs *RecSplit) DisableFsync()
func (*RecSplit) ResetNextSalt ¶
func (rs *RecSplit) ResetNextSalt()
ResetNextSalt resets the RecSplit and uses the next salt value to try to avoid collisions when mapping keys to 64-bit values
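The methods above combine into a build-retry loop: add all keys, call Build, and on a collision reset with the next salt and repeat. The sketch below is illustrative only; the field values, logger constructor, and import paths are assumptions not confirmed by this excerpt.

```go
package main

import (
	"context"
	"os"

	"github.com/erigontech/erigon-lib/recsplit" // import path is an assumption
	"github.com/ledgerwatch/log/v3"             // logger package is an assumption
)

func main() {
	keys := [][]byte{[]byte("alpha"), []byte("beta"), []byte("gamma")}

	rs, err := recsplit.NewRecSplit(recsplit.RecSplitArgs{
		KeyCount:   len(keys),
		BucketSize: 2000, // typical range is 100 - 2000
		LeafSize:   8,
		IndexFile:  "keys.idx",
		TmpDir:     os.TempDir(),
	}, log.New())
	if err != nil {
		panic(err)
	}
	defer rs.Close()

	for {
		for i, key := range keys {
			if err := rs.AddKey(key, uint64(i)); err != nil {
				panic(err)
			}
		}
		if err := rs.Build(context.Background()); err != nil {
			if rs.Collision() {
				// Collision while mapping keys to 64-bit values:
				// reset with the next salt, re-add all keys, rebuild.
				rs.ResetNextSalt()
				continue
			}
			panic(err)
		}
		break
	}
}
```

Note that after ResetNextSalt every key must be added again before the next Build call, since the reset discards the collected data.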
type RecSplitArgs ¶
type RecSplitArgs struct {
	// Whether a two-level index needs to be built, where the perfect hash map points to an enumeration, and the enumeration points to offsets
	// if Enums=false: values can be unsorted and contain duplicates
	// if Enums=true: values must be sorted (can have duplicates) - a monotonically growing sequence
	Enums bool

	IndexFile   string   // File name where the index and the minimal perfect hash function will be written to
	TmpDir      string
	StartSeed   []uint64 // For each level of the recursive split, the hash seed (salt) used for that level - needs to be generated randomly and be large enough to accommodate all the levels
	KeyCount    int
	BucketSize  int
	BaseDataID  uint64
	EtlBufLimit datasize.ByteSize
	Salt        uint32 // Hash seed (salt) for the hash function used for allocating the initial buckets - needs to be generated randomly
	LeafSize    uint16
}