Documentation ¶
Overview ¶
Package chd implements ChdBuilder, which creates fast, minimal perfect hash functions from a given set of keys. It is an implementation of the CHD algorithm described in http://cmph.sourceforge.net/papers/esa09.pdf.
Additionally, DBWriter enables creating a fast, constant-time DB for read-only workloads. It serializes the key/value pairs and builds a CHD minimal perfect hash function over the given keys. The serialized DB can be read back via DBReader for constant-time lookups of the MPH DB.
Index ¶
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrMPHFail is returned when the gamma value provided to Freeze() is too small to
	// build a minimal perfect hash table.
	ErrMPHFail = errors.New("failed to build MPH")

	// ErrFrozen is returned when attempting to add new records to an already frozen DB.
	// It is also returned when trying to freeze a DB that's already frozen.
	ErrFrozen = errors.New("DB already frozen")

	// ErrValueTooLarge is returned if the value-length is larger than 2^32-1 bytes.
	ErrValueTooLarge = errors.New("value is larger than 2^32-1 bytes")

	// ErrExists is returned if a duplicate key is added to the DB.
	ErrExists = errors.New("key exists in DB")

	// ErrNoKey is returned when a key cannot be found in the DB.
	ErrNoKey = errors.New("No such key")
)
Functions ¶
This section is empty.
Types ¶
type Chd ¶
type Chd struct {
// contains filtered or unexported fields
}
Chd represents a frozen PHF for the given set of keys
func (*Chd) Find ¶
Find returns a unique integer representing the minimal hash for key 'k'. The return value is meaningful ONLY for keys in the original key set (provided at the time the minimal hash was constructed). Callers should verify that the key at the returned index == k.
func (*Chd) MarshalBinary ¶
MarshalBinary encodes the hash into a binary form suitable for durable storage. A subsequent call to UnmarshalBinary() will reconstruct the CHD instance.
func (*Chd) UnmarshalBinaryMmap ¶
UnmarshalBinaryMmap reads a previously marshalled Chd instance and returns a lookup table. It assumes that buf is memory-mapped and aligned at the right boundaries.
type ChdBuilder ¶
type ChdBuilder struct {
// contains filtered or unexported fields
}
ChdBuilder is used to create an MPHF from a given set of uint64 keys
func New ¶
func New() (*ChdBuilder, error)
New enables creation of a minimal perfect hash function via the Compress Hash Displace algorithm. Once created, callers can add keys to it before freezing the MPH and generating a constant-time lookup table. This implementation of CHD uses uint64 keys. Callers can use any good hash function (murmur hash, etc.) to map their data into these keys. Once the construction is frozen, callers can use Find() to find the unique mapping for each key in 'keys'.
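As noted above, callers map their own data to uint64 keys with any good 64-bit hash before calling Add(). A minimal sketch of that preprocessing step using the standard library's FNV-1a (fnvKey is a hypothetical helper name, not part of this package; murmur or any other strong 64-bit hash works equally well):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnvKey maps arbitrary byte data to a uint64 key suitable for passing
// to ChdBuilder.Add(). (Illustrative helper; not part of package chd.)
func fnvKey(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

func main() {
	keys := []string{"apple", "banana", "cherry"}
	for _, k := range keys {
		// Each string becomes a uint64 key; these are the values
		// that would be handed to ChdBuilder.Add() before Freeze().
		fmt.Printf("%s -> %#x\n", k, fnvKey([]byte(k)))
	}
}
```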
func (*ChdBuilder) Add ¶
func (c *ChdBuilder) Add(key uint64) error
Add a new key to the MPH builder
type DBReader ¶
type DBReader struct {
// contains filtered or unexported fields
}
DBReader represents the query interface for a previously constructed constant database (built using NewDBWriter()). The only meaningful operation on such a database is Lookup().
func NewDBReader ¶
NewDBReader reads a previously constructed database in file 'fn' and prepares it for querying. Records are opportunistically cached after reading from disk. We retain up to 'cache' records in memory (default 128).
func (*DBReader) Find ¶
Find looks up 'key' in the table and returns the corresponding value. It returns an error if the key is not found or the disk i/o failed or the record checksum failed.
type DBWriter ¶
type DBWriter struct {
// contains filtered or unexported fields
}
DBWriter represents an abstraction to construct a read-only constant database. This database uses CHD as the underlying mechanism for constant-time lookups of keys; keys and values are represented as arbitrary byte sequences ([]byte). The DB metadata is protected by a strong checksum (SHA512-256), and each stored value is protected by a distinct siphash-2-4 checksum. Once all key/value additions are complete, the DB is written to disk via the Freeze() function.
We don't want to use SHA512-256 over the entire file, because that would mean reading a potentially large file during DBReader initialization. By checksumming each record separately, we increase the overhead a bit, but speed up DBReader initialization for the common case; actual records are verified opportunistically as they are read.
The DB has the following general structure:
64 byte file header; big-endian encoding of all multibyte ints:

    magic   [4]byte   "CHDB"
    flags   uint32    for now, all zeros
    salt    [16]byte  random salt for siphash record integrity
    nkeys   uint64    number of keys in the DB
    offtbl  uint64    file offset of the <offset, hash> table

Contiguous series of records; each record is a key/value pair:

    cksum   uint64    siphash checksum of value, offset (big endian)
    val     []byte    value bytes

Possibly a gap until the next PageSize boundary (4096 bytes)

Offset table: nkeys worth of <offset, hash> pairs. Everything in this table is little-endian encoded so we can mmap() it into memory. Entry 'i' has two 64-bit words:

    offset  uint64    offset in the file where the corresponding value can be found
    hash    uint64    hash key corresponding to the value

Val_len table: nkeys worth of value lengths, one per key.

Marshaled Chd bytes (Chd.MarshalBinary())

32 bytes of strong checksum (SHA512-256); this checksum is computed over the file header, offset table, and marshaled Chd.
func NewDBWriter ¶
NewDBWriter prepares file 'fn' to hold a constant DB built using CHD minimal perfect hash function. Once written, the DB is "frozen" and readers will open it using NewDBReader() to do constant time lookups of key to value.
func (*DBWriter) Abort ¶
func (w *DBWriter) Abort()
Abort stops the construction of the perfect hash db
func (*DBWriter) AddKeyVals ¶
AddKeyVals adds a series of matched key/value pairs to the DB. If the key and value slices are of unequal length, only the smaller of the two lengths is used. Records with duplicate keys are discarded. Returns the number of records added.
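The semantics above (truncating to the shorter slice and discarding duplicate keys) can be sketched as follows; addKeyVals here is a standalone illustration with illustrative parameter types, not the package's actual implementation:

```go
package main

import "fmt"

// addKeyVals pairs keys with vals, using only the smaller of the two
// lengths, and discards records whose key was already seen. It returns
// the number of records added plus the resulting map. (Sketch of the
// documented AddKeyVals behavior; the real method stores into DBWriter.)
func addKeyVals(keys []uint64, vals [][]byte) (int, map[uint64][]byte) {
	n := len(keys)
	if len(vals) < n {
		n = len(vals)
	}
	db := make(map[uint64][]byte, n)
	added := 0
	for i := 0; i < n; i++ {
		if _, dup := db[keys[i]]; dup {
			continue // duplicate key: record discarded
		}
		db[keys[i]] = vals[i]
		added++
	}
	return added, db
}

func main() {
	keys := []uint64{1, 2, 1, 3}
	// vals is shorter than keys, so key 3 is ignored entirely.
	vals := [][]byte{[]byte("a"), []byte("b"), []byte("c")}
	n, _ := addKeyVals(keys, vals)
	fmt.Println("records added:", n) // 2: keys 1 and 2; second 1 is a duplicate
}
```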