compactindex

package
v0.5.0
Published: Nov 14, 2024 License: AGPL-3.0, Apache-2.0 Imports: 13 Imported by: 0

README

a fast flat-file index for constant datasets

This package specifies a file format and Go implementation for indexing constant datasets.

compactindex

  • is an immutable file format;
  • maps arbitrary keys into offsets in an external flat file;
  • consumes a constant amount of space per entry
    • ~6-8 bytes, regardless of key size
    • 3-byte hash per entry
  • answers queries in O(1), with 2 + log2(10000) reads worst- and average-case (binary search);
  • during construction, requires near-constant memory space and O(n) scratch space with regard to entries per file;
  • during construction, features a constant >500k entry/s per-core write rate (2.5 GHz Intel laptop);
  • works on any storage supporting random reads (regular files, HTTP range requests, on-chain, ...);
  • is based on the "FKS method" which uses perfect (collision-free) hash functions in a two-level hashtable; [^1]
  • is inspired by D. J. Bernstein's "constant database"; [^2]
  • uses the xxHash64 non-cryptographic hash-function; [^3]

Refer to the Go documentation for the algorithms used and implementation details.


[^1]: Fredman, M. L., Komlós, J., & Szemerédi, E. (1984). Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM, 31(3), 538–544. https://doi.org/10.1145/828.1884
[^2]: cdb by D. J. Bernstein: https://cr.yp.to/cdb.html
[^3]: Go implementation of xxHash by @cespare: https://github.com/cespare/xxhash/

Interface

In programming terms:

fn lookup(key: &[byte]) -> Option<u64>

Given an arbitrary key, the index

  • states whether the key exists in the index
  • if it exists, maps the key to an integer (usually an offset into a file)

Examples

Here are some example scenarios where compactindex is useful:

  • When working with immutable data structures
    • Example: Indexing IPLD CAR files carrying Merkle-DAGs of content-addressable data
  • When working with archived/constant data
    • Example: Indexing files in .tar archives
  • When dealing with immutable remote storage such as S3-like object storage
    • Example: Storing the index and target file in S3, then using HTTP range requests to efficiently query data

Here are some things compactindex cannot do:

  • Cannot add more entries to an existing index
    • Reason 1: indexes are tightly packed, so there is no space to insert new entries (though fallocate(2) with FALLOC_FL_INSERT_RANGE would technically work)
    • Reason 2: the second-level hashtable uses a perfect hash function ensuring collision-free indexing of a subset of entries; inserting new entries might cause a collision, requiring the per-bucket hash function to be re-mined
    • Reason 3: adding too many entries will eventually create an imbalance in the first-level hashtable; fixing this imbalance effectively requires re-constructing the file from scratch
  • Cannot iterate over keys
    • Reason: compactindex stores hashes, not the entries themselves. This saves space but also allows for efficient random reads used during binary search

File Format (v0)

Encoding

The file format contains binary packed structures with byte alignment.

Integers are encoded as little endian.

File Header

The file begins with a 32 byte file header.

#[repr(packed)]
struct FileHeader {
    magic: [u8; 8],       // 0x00
    max_value: u64,       // 0x08
    num_buckets: u32,     // 0x10
    padding_14: [u8; 12], // 0x14
}
  • magic is set to the UTF-8 string "rdcecidx". The reader should reject files that don't start with this string.
  • max_value is set to the highest value in the index; it determines the integer width of the value field.
  • num_buckets is set to the number of hashtable buckets.
  • padding_14 must be zero. (reserved for future use)

Bucket Header Table

The file header is followed by a vector of bucket headers. The number of bucket headers is set by num_buckets in the file header.

Each bucket header is 16 bytes long.

#[repr(packed)]
struct BucketHeader {
    hash_domain: u32, // 0x00
    num_entries: u32, // 0x04
    hash_len: u8,     // 0x08
    padding_09: u8,   // 0x09
    file_offset: u48, // 0x0A
}
  • hash_domain is a "salt" to the per-bucket hash function.
  • num_entries is set to the number of records in the bucket.
  • hash_len is the size of the per-record hash in bytes and currently hardcoded to 3.
  • padding_09 must be zero.
  • file_offset is an offset from the beginning of the file header to the start of the bucket entries.

Bucket Entry Table

Each bucket has a vector of entries with length num_entries. This structure makes up the vast majority of the index.

#[repr(packed)]
struct Entry {
    hash: u??,
    value: u??,
}

The size of an entry is constant within a bucket. It is determined by its components:

  • The size of hash in bytes equals hash_len
  • The size of value in bytes equals the byte aligned integer width that is minimally required to represent max_value
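The "minimally required byte-aligned integer width" can be computed with a small helper (an illustrative sketch; `intWidth` is not a name from the library):

```go
package main

import "fmt"

// intWidth returns the minimal number of bytes needed to represent n,
// i.e. the byte-aligned integer width used for the value field.
func intWidth(n uint64) uint8 {
	w := uint8(1)
	for n > 0xFF {
		w++
		n >>= 8
	}
	return w
}

func main() {
	// Entry size = hash_len (3) + intWidth(max_value).
	for _, n := range []uint64{255, 256, 1<<24 - 1, 1<<48 - 1} {
		fmt.Println(n, intWidth(n))
	}
}
```

For a max_value below 2^24 this yields 3-byte values, hence the ~6-8 bytes per entry quoted above (3-byte hash plus a 3-5 byte value for typical file sizes).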

Documentation

Overview

Package compactindex is an immutable hashtable index format inspired by djb's constant database (cdb).

Design

Compactindex is used to create secondary indexes over arbitrary flat files. Each index is a single, immutable flat file.

Index files consist of a space-optimized and query-optimized key-value-like table.

Instead of storing actual keys, the format stores FKS dynamic perfect hashes. And instead of storing values, the format contains offsets into some file.

As a result, the database effectively only supports two operations, similarly to cdb. (Note that the actual Go interface is a bit more flexible).

func Create(kv map[[]byte]uint64) *Index
func (*Index) Lookup(key []byte) (value uint64, exist bool)

Buckets

The set of items is split into buckets of approx 10000 records. The number of buckets is unlimited.

The key-to-bucket assignment is determined by xxHash64 using uniform discrete hashing over the key space.

The index file header also mentions the number of buckets and the file offset of each bucket.

Tables

Each bucket contains a table of entries, indexed by a collision-free hash function.

The hash function used in the entry table is xxHash. A 32-bit hash domain is prefixed to mine collision-free sets of hashes (FKS scheme). This hash domain is also recorded at the bucket header.

Each bucket entry is a constant-size record consisting of a 3-byte hash and an offset to the value. The size of the offset integer is the minimal byte-aligned integer width that can represent the target file size.
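The entry layout described above can be sketched as a packing helper (`putEntry` is an illustrative name, not the library's writer):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// putEntry packs a bucket entry: hashLen bytes of truncated hash followed
// by valueWidth bytes of value, both little-endian.
func putEntry(buf []byte, hash uint64, hashLen int, value uint64, valueWidth int) {
	var tmp [8]byte
	binary.LittleEndian.PutUint64(tmp[:], hash)
	copy(buf[:hashLen], tmp[:hashLen])
	binary.LittleEndian.PutUint64(tmp[:], value)
	copy(buf[hashLen:hashLen+valueWidth], tmp[:valueWidth])
}

func main() {
	buf := make([]byte, 3+4) // hash_len=3, 4-byte offsets
	putEntry(buf, 0xAABBCC, 3, 0x01020304, 4)
	fmt.Printf("% x\n", buf)
}
```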

Querying

The query interface (DB) is backend-agnostic, supporting any storage medium that provides random reads. To name a few: Memory buffers, local files, arbitrary embedded buffers, HTTP range requests, plan9, etc...

The DB struct itself performs zero memory allocations and does not cache. To improve performance, it is recommended to provide an io.ReaderAt backed by a cache.

Given a key, the query strategy is simple:

  1. Hash key to bucket using global hash function
  2. Retrieve bucket offset from bucket header table
  3. Hash key to entry using per-bucket hash function
  4. Search for entry in bucket (binary search)
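The four steps above can be modeled with a self-contained in-memory toy (hash mining and 3-byte truncation are omitted, FNV-1a stands in for xxHash64, and all names are illustrative rather than the library's):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"sort"
)

// entry mirrors the (hash, value) records stored in a bucket.
type entry struct {
	hash  uint64
	value uint64
}

// toyIndex stands in for the on-disk structures: a per-bucket hash
// domain table plus sorted per-bucket entry tables.
type toyIndex struct {
	domains []uint32
	buckets [][]entry
}

// hash64 prefixes a 4-byte domain to the key; FNV-1a stands in for xxHash64.
func hash64(domain uint32, key []byte) uint64 {
	h := fnv.New64a()
	var p [4]byte
	binary.LittleEndian.PutUint32(p[:], domain)
	h.Write(p[:])
	h.Write(key)
	return h.Sum64()
}

func newToyIndex(numBuckets int, kv map[string]uint64) *toyIndex {
	ix := &toyIndex{
		domains: make([]uint32, numBuckets),
		buckets: make([][]entry, numBuckets),
	}
	for k, v := range kv {
		b := hash64(0, []byte(k)) % uint64(numBuckets)
		ix.buckets[b] = append(ix.buckets[b], entry{hash64(ix.domains[b], []byte(k)), v})
	}
	for _, es := range ix.buckets {
		sort.Slice(es, func(i, j int) bool { return es[i].hash < es[j].hash })
	}
	return ix
}

func (ix *toyIndex) lookup(key []byte) (uint64, bool) {
	// 1. Hash key to bucket using the global hash function.
	b := hash64(0, key) % uint64(len(ix.buckets))
	// 2. Retrieve the bucket (stands in for reading the bucket header table).
	entries := ix.buckets[b]
	// 3. Hash key to entry using the per-bucket hash function.
	target := hash64(ix.domains[b], key)
	// 4. Binary search for the entry within the sorted bucket.
	i := sort.Search(len(entries), func(i int) bool { return entries[i].hash >= target })
	if i < len(entries) && entries[i].hash == target {
		return entries[i].value, true
	}
	return 0, false
}

func main() {
	ix := newToyIndex(4, map[string]uint64{"alice": 10, "bob": 20})
	fmt.Println(ix.lookup([]byte("alice")))
}
```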

The search strategy for locating entries in buckets can be adjusted to fit the latency/bandwidth profile of the underlying storage medium.

For example, the fastest lookup strategy in memory is a binary search retrieving double cache lines at a time. When doing range requests against high-latency remote storage (e.g. S3 buckets), it is typically faster to retrieve and scan through large parts of a bucket (multiple kilobytes) at once.

Construction

Constructing a compactindex requires upfront knowledge of the number of items and highest possible target offset (read: target file size).

The process requires scratch space of around 16 bytes per entry. During generation, data is offloaded to disk for memory efficiency.

The process works as follows:

  1. Determine number of buckets and offset integer width based on known input params (item count and target file size).
  2. Linear pass over input data, populating temporary files that contain the unsorted entries of each bucket.
  3. For each bucket, brute force a perfect hash function that defines a bijection between hash values and keys in the bucket.
  4. For each bucket, sort by hash values.
  5. Store to index.

An alternative construction approach is available when the number of items or target file size is unknown. In this case, a set of keys is first serialized to a flat file.
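Step 3, brute-forcing the perfect hash function, can be sketched as follows (FNV-1a stands in for xxHash64 and the names are illustrative; the real builder mines over the actual entry hashes):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// hash24 prefixes a 4-byte domain to the key and truncates the result to
// 3 bytes, matching hash_len. FNV-1a stands in for xxHash64 here.
func hash24(domain uint32, key []byte) uint32 {
	h := fnv.New64a()
	var p [4]byte
	binary.LittleEndian.PutUint32(p[:], domain)
	h.Write(p[:])
	h.Write(key)
	return uint32(h.Sum64() & 0xFFFFFF)
}

// mineDomain brute-forces a hash_domain such that the truncated hashes of
// all keys in a bucket are collision-free (FKS-style mining).
func mineDomain(keys [][]byte) (uint32, bool) {
	for domain := uint32(0); domain < 1_000_000; domain++ {
		seen := make(map[uint32]bool, len(keys))
		ok := true
		for _, k := range keys {
			h := hash24(domain, k)
			if seen[h] {
				ok = false
				break
			}
			seen[h] = true
		}
		if ok {
			return domain, true
		}
	}
	return 0, false
}

func main() {
	keys := [][]byte{[]byte("alice"), []byte("bob"), []byte("carol")}
	fmt.Println(mineDomain(keys))
}
```

With ~10000 entries per bucket and a 24-bit hash space, a collision-free domain usually exists but may take many attempts, which is why Seal is described as CPU-intensive.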

Index

Constants

const Version = uint8(1)

Variables

var ErrCollision = errors.New("hash collision")
var ErrNotFound = errors.New("not found")

ErrNotFound marks a missing entry.

var Magic = [8]byte{'r', 'd', 'c', 'e', 'c', 'i', 'd', 'x'}

Magic are the first eight bytes of an index.

Functions

func EntryHash64

func EntryHash64(prefix uint32, key []byte) uint64

EntryHash64 is a xxHash-based hash function using an arbitrary prefix.

Types

type Bucket

type Bucket struct {
	BucketDescriptor
	Entries *io.SectionReader
}

Bucket is a database handle pointing to a subset of the index.

func (*Bucket) Load

func (b *Bucket) Load(batchSize int) ([]Entry, error)

Load retrieves all entries in the hashtable.

func (*Bucket) Lookup

func (b *Bucket) Lookup(key []byte) (uint64, error)

Lookup queries for a key using binary search.

type BucketDescriptor

type BucketDescriptor struct {
	BucketHeader
	Stride      uint8 // size of one entry in bucket
	OffsetWidth uint8 // width of offset field in bucket
}

type BucketHeader

type BucketHeader struct {
	HashDomain uint32
	NumEntries uint32
	HashLen    uint8
	FileOffset uint64
}

BucketHeader occurs at the beginning of each bucket.

func (*BucketHeader) Hash

func (b *BucketHeader) Hash(key []byte) uint64

Hash returns the per-bucket hash of a key.

func (*BucketHeader) Load

func (b *BucketHeader) Load(buf *[bucketHdrLen]byte)

func (*BucketHeader) Store

func (b *BucketHeader) Store(buf *[bucketHdrLen]byte)

type Builder

type Builder struct {
	Header
	// contains filtered or unexported fields
}

Builder creates new compactindex files.

func NewBuilder

func NewBuilder(dir string, numItems uint, targetFileSize uint64) (*Builder, error)

NewBuilder creates a new index builder.

If dir is an empty string, a random temporary directory is used.

numItems refers to the number of items in the index.

targetFileSize is the size of the file that index entries point to. Can be set to zero if unknown, which results in a less efficient (larger) index.

func (*Builder) Close

func (b *Builder) Close() error

func (*Builder) Insert

func (b *Builder) Insert(key []byte, value uint64) error

Insert writes a key-value mapping to the index.

Index generation will fail if the same key is inserted twice. The writer must not pass a value greater than targetFileSize.

func (*Builder) Seal

func (b *Builder) Seal(ctx context.Context, f *os.File) (err error)

Seal writes the final index to the provided file. This process is CPU-intensive, use context to abort prematurely.

The file should be opened with access mode os.O_RDWR. Passing a non-empty file will result in a corrupted index.

type DB

type DB struct {
	Header
	Stream io.ReaderAt
	// contains filtered or unexported fields
}

DB is a compactindex handle.

func Open

func Open(stream io.ReaderAt) (*DB, error)

Open returns a handle to access a compactindex.

The provided stream must start with the Magic byte sequence. Tip: Use io.NewSectionReader to create aligned substreams when dealing with a file that contains multiple indexes.

func (*DB) GetBucket

func (db *DB) GetBucket(i uint) (*Bucket, error)

GetBucket returns a handle to the bucket at the given index.

func (*DB) Lookup

func (db *DB) Lookup(key []byte) (uint64, error)

Lookup queries for a key in the index and returns the value (offset), if any.

Returns ErrNotFound if the key is unknown.

func (*DB) LookupBucket

func (db *DB) LookupBucket(key []byte) (*Bucket, error)

LookupBucket returns a handle to the bucket that might contain the given key.

func (*DB) Prefetch

func (db *DB) Prefetch(yes bool)

type Entry

type Entry struct {
	Hash  uint64
	Value uint64
}

Entry is a single element in a hash table.

func SearchSortedEntries

func SearchSortedEntries(entries []Entry, hash uint64) *Entry

SearchSortedEntries performs an in-memory binary search for a given hash.

type Header struct {
	FileSize   uint64
	NumBuckets uint32
}

Header occurs once at the beginning of the index.

func (*Header) BucketHash

func (h *Header) BucketHash(key []byte) uint

BucketHash returns the bucket index for the given key.

Uses a truncated xxHash64 rotated until the result fits.

func (*Header) Load

func (h *Header) Load(buf *[headerSize]byte) error

Load checks the Magic sequence and loads the header fields.

func (*Header) Store

func (h *Header) Store(buf *[headerSize]byte)
