README
Gonudb
Gonudb is an append-only key/value datastore written in Go.
Overview
Gonudb is a port of NuDB, a C++ key/value store.
A Gonudb datastore comprises a data file holding keys and values stored sequentially and an accompanying key file which forms an on-disk hash table indexing the values stored in the data file.
During commits a log file is created to store bookkeeping information that may be used to repair the datastore in the event of a failure.
The data file and key file are independent and a new key file may be rebuilt from the data file if necessary, potentially with an alternate hashing scheme.
Installation
Execute go get github.com/iand/gonudb within a Go module directory to add it to your module.
Usage
Gonudb is primarily a library. Import package github.com/iand/gonudb to use it. A sample application that demonstrates some simple inserts and fetches is provided in cmd/gonudbsample.
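The following is a minimal sketch of the store lifecycle using the functions documented below. Two assumptions are made: the trailing parameter of CreateStore, which is elided in the index below, is taken to be a target load factor as in NuDB, and passing nil to OpenStore is taken to select default options. Check the package documentation before relying on either.

package main

import (
    "fmt"
    "log"

    "github.com/iand/gonudb"
)

func main() {
    // Create the data, key and log files for a new store. The final
    // argument (0.5) is an assumed target load factor; the actual
    // trailing parameters are elided in the documentation index.
    err := gonudb.CreateStore("sample.dat", "sample.key", "sample.log", 1, gonudb.NewSalt(), 4096, 0.5)
    if err != nil {
        log.Fatal(err)
    }

    // Open the store; nil is assumed to select default StoreOptions.
    s, err := gonudb.OpenStore("sample.dat", "sample.key", "sample.log", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer s.Close()

    // Insert a key/value pair and read it back. Inserted values are
    // discoverable immediately, before any flush to disk.
    if err := s.Insert("greeting", []byte("hello gonudb")); err != nil {
        log.Fatal(err)
    }
    v, err := s.Fetch("greeting")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s\n", v)
}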
An admin tool can be found in cmd/gonudbadmin which provides some commands for inspecting and validating the files that comprise a store. Install it by executing go install github.com/iand/gonudb/cmd/gonudbadmin from the root of the repository.
gonudbadmin info can be used to view characteristic information about any of the three files used by Gonudb (data, key and log files). gonudbadmin verify verifies the consistency of data and key files and shows some statistics on the data they hold.
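For example (hypothetical invocations; the exact argument syntax is not shown here, so consult the tool's own help output):

gonudbadmin info sample.dat sample.key sample.log
gonudbadmin verify sample.dat sample.key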
Design
Gonudb shares the design ideals that motivated NuDB (but see Status below):
- Writes should not block reads.
- Reads should be limited only by the SSD's IOPS limit.
- A read for a non-present key should require one IOP.
- A read for a present key whose data can be read in a single IOP should only require two IOPs, one to figure out where it is and one to read it in.
Keys and values are stored sequentially in an append-only data file. The data file begins with a header that contains characteristic information about the file such as the version of the encoding scheme, a datastore identifier and an application identifier. Data records follow immediately after the header. Each record comprises the size of the value, followed by the size of the key, followed by the key, followed by the value data. The data file is considered to be immutable and there are no delete or mutate operations.
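A sketch of this layout follows, assuming NuDB-style field widths (a 48-bit value size and a 16-bit key size) purely for illustration; gonudb's exact on-disk widths are not specified in this overview.

package record

import "encoding/binary"

// appendRecord serialises one data record in the order described
// above: value size, key size, key bytes, value bytes. The integer
// widths are illustrative assumptions, not gonudb's actual encoding.
func appendRecord(buf []byte, key string, value []byte) []byte {
    var n [8]byte
    binary.BigEndian.PutUint64(n[:], uint64(len(value)))
    buf = append(buf, n[2:]...) // value size (assumed 48-bit)
    binary.BigEndian.PutUint16(n[:2], uint16(len(key)))
    buf = append(buf, n[:2]...) // key size (assumed 16-bit)
    buf = append(buf, key...)   // key bytes
    buf = append(buf, value...) // value bytes
    return buf
}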
Inserts are buffered in memory and periodically committed to disk. Clients are throttled based on the rate at which data is flushed to disk. Values are immediately discoverable via their key and may be read from memory or disk.
Keys are hashed and written to buckets stored in the key file. As with the data file, the key file begins with a header containing characteristic information. The key file's version, datastore identifier and application identifier must match those in the data file header. Additionally the key file header contains the hash salt, the block size of each bucket and the target load factor which determines when a bucket should be split. Buckets are a fixed size and are written sequentially after the header, which enables them to be easily located by index.
Each bucket is assigned a range of hash values and entries within a bucket are ordered by hash. When the number of entries in a bucket exceeds the load factor it undergoes a split and its entries are rehashed across the pair of buckets using the linear hashing algorithm. When a bucket exceeds its capacity it is spilled to the data file and replaced with an empty bucket containing a pointer to the spill record. A spilled bucket may spill multiple times with the resulting spill records forming a linked list in the data file.
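The bucket that owns a given hash can be computed with the standard linear hashing rule. The sketch below illustrates the general algorithm named above; it is not gonudb's internal code.

package linearhash

// bucketIndex returns the bucket owning hash h in a table of
// numBuckets buckets. Linear hashing grows the table one bucket at a
// time, so numBuckets need not be a power of two; buckets below the
// split point have already been rehashed across twice the keyspace.
func bucketIndex(h uint64, numBuckets uint64) uint64 {
    // p is the largest power of two that is <= numBuckets.
    p := uint64(1)
    for p<<1 <= numBuckets {
        p <<= 1
    }
    i := h % p
    if i < numBuckets-p {
        // This bucket has already split, so rehash over 2p buckets.
        i = h % (p << 1)
    }
    return i
}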
In the best case, reading a record from the datastore requires one read from the key file to load the relevant bucket and one read from the data file to access the value. Additional reads from the data file may be required to resolve hash collisions and to load spill records. Read performance is independent of the size of the datastore, and the size of buckets in the key file may be tuned to the block size of the underlying physical media so that loading a bucket may take only a single IOP.
Status
Version 0.1.0 is an alpha quality functional port of the original NuDB suitable for testing with expendable loads. Correctness and safety have been prioritised over performance. Locks are broad in scope and treat reads and writes with equal priority. Future work will tune the locking behaviour to better meet the goal of writes not blocking reads.
High priority tasks include:
- Add recovery from partial writes.
- Add rekey admin function.
- Tune the locking strategy.
Additional features under consideration:
- Allow alternate hashing functions to be specified.
Author
Go port written by Ian Davis.
License
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE or copy at http://www.boost.org/LICENSE_1_0.txt)
Documentation
Index
- Constants
- Variables
- func CreateStore(datPath, keyPath, logPath string, appnum, salt uint64, blockSize int, ...) error
- func NewSalt() uint64
- func Version() string
- type Bucket
- func (b *Bucket) ActualSize() int
- func (b *Bucket) BlockSize() int
- func (b *Bucket) Capacity() int
- func (b *Bucket) Count() int
- func (b *Bucket) Entry(idx int) BucketEntry
- func (b *Bucket) Has(h uint64) bool
- func (b *Bucket) HashRange() (uint64, uint64)
- func (b *Bucket) IsEmpty() bool
- func (b *Bucket) Spill() int64
- type BucketEntry
- type BucketScanner
- type RecordScanner
- func (s *RecordScanner) Close() error
- func (s *RecordScanner) Err() error
- func (s *RecordScanner) IsData() bool
- func (s *RecordScanner) IsSpill() bool
- func (s *RecordScanner) Key() string
- func (s *RecordScanner) Next() bool
- func (s *RecordScanner) Reader() io.Reader
- func (s *RecordScanner) RecordSize() int64
- func (s *RecordScanner) Size() int64
- type Store
- func (s *Store) AppNum() uint64
- func (s *Store) BlockSize() uint16
- func (s *Store) BucketScanner() *BucketScanner
- func (s *Store) Close() error
- func (s *Store) DataSize(key string) (int64, error)
- func (s *Store) Err() error
- func (s *Store) Exists(key string) (bool, error)
- func (s *Store) Fetch(key string) ([]byte, error)
- func (s *Store) FetchReader(key string) (io.Reader, error)
- func (s *Store) Flush() error
- func (s *Store) Insert(key string, value []byte) error
- func (s *Store) Rate() float64
- func (s *Store) RecordCount() int
- func (s *Store) RecordScanner() *RecordScanner
- func (s *Store) UID() uint64
- func (s *Store) Version() uint16
- type StoreOptions
Constants

const (
    LogLevelDiagnostics = 1 // log level increment for diagnostics logging
    LogLevelTrace       = 2 // log level increment for verbose tracing
)
Variables

var (
    ErrAppNumMismatch     = internal.ErrAppNumMismatch
    ErrDataMissing        = internal.ErrDataMissing
    ErrDataTooLarge       = internal.ErrDataTooLarge
    ErrDifferentVersion   = internal.ErrDifferentVersion
    ErrHashMismatch       = internal.ErrHashMismatch
    ErrInvalidBlockSize   = internal.ErrInvalidBlockSize
    ErrInvalidBucketCount = internal.ErrInvalidBucketCount
    ErrInvalidCapacity    = internal.ErrInvalidCapacity
    ErrInvalidDataRecord  = internal.ErrInvalidDataRecord
    ErrInvalidKeySize     = internal.ErrInvalidKeySize
    ErrInvalidLoadFactor  = internal.ErrInvalidLoadFactor
    ErrInvalidRecordSize  = internal.ErrInvalidRecordSize
    ErrInvalidSpill       = internal.ErrInvalidSpill
    ErrKeyExists          = internal.ErrKeyExists
    ErrKeyMismatch        = internal.ErrKeyMismatch
    ErrKeyMissing         = internal.ErrKeyMissing
    ErrKeyNotFound        = internal.ErrKeyNotFound
    ErrKeySizeMismatch    = internal.ErrKeySizeMismatch
    ErrKeyTooLarge        = internal.ErrKeyTooLarge
    ErrKeyWrongSize       = internal.ErrKeyWrongSize // deprecated: use ErrKeyMissing and ErrKeyTooLarge instead
    ErrNotDataFile        = internal.ErrNotDataFile
    ErrNotKeyFile         = internal.ErrNotKeyFile
    ErrNotLogFile         = internal.ErrNotLogFile
    ErrShortKeyFile       = internal.ErrShortKeyFile
    ErrUIDMismatch        = internal.ErrUIDMismatch
)
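Most of these errors surface when opening or verifying a store; ErrKeyNotFound and ErrKeyExists are the ones an application typically handles. A small sketch, assuming Fetch reports a missing key via ErrKeyNotFound (possibly wrapped, hence errors.Is):

package example

import (
    "errors"
    "fmt"

    "github.com/iand/gonudb"
)

// fetchOrDefault returns the value stored for key, or def when no
// record exists for it.
func fetchOrDefault(s *gonudb.Store, key string, def []byte) ([]byte, error) {
    v, err := s.Fetch(key)
    if errors.Is(err, gonudb.ErrKeyNotFound) {
        return def, nil
    }
    if err != nil {
        return nil, fmt.Errorf("fetch %q: %w", key, err)
    }
    return v, nil
}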
Functions

func CreateStore

Types

type Bucket
type Bucket struct {
// contains filtered or unexported fields
}
A Bucket contains a set of key entries that form part of the data store's index.
func (*Bucket) ActualSize
ActualSize returns the serialized bucket size, excluding empty space.
func (*Bucket) Capacity
Capacity returns the maximum number of key entries that can be held in the bucket.
func (*Bucket) Entry
func (b *Bucket) Entry(idx int) BucketEntry
Entry returns the record for a key entry.
func (*Bucket) HashRange
HashRange returns the range of hashed keys that are contained in the bucket.
type BucketEntry
type BucketScanner
type BucketScanner struct {
// contains filtered or unexported fields
}
BucketScanner implements a sequential scan through a key file. Successive calls to the Next method will step through the buckets in the file, including spilled buckets in the data file.
func (*BucketScanner) Bucket
func (s *BucketScanner) Bucket() *Bucket
Bucket returns the current bucket. Should not be called until Next has been called. The bucket is backed by data that may be overwritten with a call to Next so should not be retained.
func (*BucketScanner) Close
func (s *BucketScanner) Close() error
Close closes the underlying reader used by the scanner.
func (*BucketScanner) Err
func (s *BucketScanner) Err() error
Err returns the first non-EOF error that was encountered by the BucketScanner.
func (*BucketScanner) Index
func (s *BucketScanner) Index() int
Index returns the index of the current bucket. Should not be called until Next has been called. Spill buckets share an index with their parent.
func (*BucketScanner) IsSpill
func (s *BucketScanner) IsSpill() bool
IsSpill reports whether the current bucket was read from a data store spill.
func (*BucketScanner) Next
func (s *BucketScanner) Next() bool
Next reads the next bucket in sequence, including spills to the data store. It returns false if it encounters an error or there are no more buckets to read.
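Iteration follows the familiar Go scanner pattern. This sketch uses only the methods documented here:

package example

import (
    "fmt"
    "log"

    "github.com/iand/gonudb"
)

// dumpBuckets walks every bucket in the key file, including buckets
// spilled to the data file. The *Bucket returned by Bucket is backed
// by memory that may be overwritten by the next call to Next, so
// nothing is retained between iterations.
func dumpBuckets(s *gonudb.Store) {
    sc := s.BucketScanner()
    defer sc.Close()
    for sc.Next() {
        b := sc.Bucket()
        fmt.Printf("bucket %d: %d entries, spill=%v\n", sc.Index(), b.Count(), sc.IsSpill())
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
}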
type RecordScanner
type RecordScanner struct {
// contains filtered or unexported fields
}
RecordScanner implements a sequential scan through a store's data file. Successive calls to the Next method will step through the records in the file. Note that the scanner does not include data buffered in memory. Call Flush to ensure all written data is visible to the scanner.
func (*RecordScanner) Close
func (s *RecordScanner) Close() error
func (*RecordScanner) Err
func (s *RecordScanner) Err() error
Err returns the first non-EOF error that was encountered by the RecordScanner.
func (*RecordScanner) IsData
func (s *RecordScanner) IsData() bool
IsData reports whether the current record is a data record.
func (*RecordScanner) IsSpill
func (s *RecordScanner) IsSpill() bool
IsSpill reports whether the current record is a bucket spill.
func (*RecordScanner) Key
func (s *RecordScanner) Key() string
Key returns the key of the current record.
func (*RecordScanner) Next
func (s *RecordScanner) Next() bool
Next reads the next record in sequence, including spill records. It returns false if it encounters an error or there are no more records to read.
func (*RecordScanner) Reader
func (s *RecordScanner) Reader() io.Reader
Reader returns an io.Reader that may be used to read the data from the record. Should not be called until Next has been called. The Reader is only valid for use until the next call to Next().
func (*RecordScanner) RecordSize
func (s *RecordScanner) RecordSize() int64
RecordSize returns the number of bytes occupied by the current record including its header.
func (*RecordScanner) Size
func (s *RecordScanner) Size() int64
Size returns the size of the current record's data in bytes.
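The same pattern applies to record scanning. This sketch flushes first so that buffered inserts are visible to the scanner, then prints each data record:

package example

import (
    "fmt"
    "io"
    "log"
    "os"

    "github.com/iand/gonudb"
)

// dumpRecords prints the key, size and value of every data record in
// the store's data file, skipping bucket spill records.
func dumpRecords(s *gonudb.Store) {
    if err := s.Flush(); err != nil {
        log.Fatal(err)
    }
    sc := s.RecordScanner()
    defer sc.Close()
    for sc.Next() {
        if !sc.IsData() {
            continue // spill records hold index data, not values
        }
        fmt.Printf("%s (%d bytes): ", sc.Key(), sc.Size())
        if _, err := io.Copy(os.Stdout, sc.Reader()); err != nil {
            log.Fatal(err)
        }
        fmt.Println()
    }
    if err := sc.Err(); err != nil {
        log.Fatal(err)
    }
}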
type Store
type Store struct {
// contains filtered or unexported fields
}
func OpenStore
func OpenStore(datPath, keyPath, logPath string, options *StoreOptions) (*Store, error)
func (*Store) BucketScanner
func (s *Store) BucketScanner() *BucketScanner
BucketScanner returns a scanner that may be used to iterate the datastore's index of keys. The caller is responsible for calling Close on the scanner after use.
func (*Store) DataSize (added in v0.2.1)
DataSize returns the size of the data record associated with a key.
func (*Store) Exists (added in v0.2.1)
Exists reports whether a data record is associated with a key.
func (*Store) FetchReader
FetchReader returns a reader that may be used to read the value associated with a key.
func (*Store) Insert
Insert adds a key/value pair to the store. Zero length values are not supported.
func (*Store) RecordCount (added in v0.2.0)
RecordCount returns the number of data records in the store.
func (*Store) RecordScanner
func (s *Store) RecordScanner() *RecordScanner
RecordScanner returns a scanner that may be used to iterate the datastore's values. The caller is responsible for calling Close on the scanner after use.