bitcask

package module
v1.0.2
Warning

This package is not in the latest version of its module.

Published: Nov 1, 2021 License: MIT Imports: 23 Imported by: 61

README

bitcask


A high-performance key/value store written in Go with predictable read/write performance and high throughput. It uses a Bitcask on-disk layout (LSM+WAL) similar to Riak.

For a more feature-complete, distributed key/value store with a Redis-compatible server, have a look at Bitraft, which uses this library as its backend. Use Bitcask as a starting point or if you want to embed a store in your application. Use Bitraft if you need a complete, highly available server/client solution with a Redis-compatible API.

Features

  • Embedded (import "git.mills.io/prologic/bitcask")
  • Built-in CLI (bitcask)
  • Built-in Redis-compatible server (bitcaskd)
  • Predictable read/write performance
  • Low latency
  • High throughput (see Performance)

Is Bitcask right for my project?

NOTE: Please read this carefully to identify whether using Bitcask is suitable for your needs.

bitcask is a great fit for:

  • Storing hundreds of thousands to millions of key/value pairs with the default configuration. With the default (configurable) limits of 64 bytes per key and 64kB per value, 1M keys would consume roughly 600-700MB of memory and 65-70GB of disk storage. All of these limits are configurable when you create a new database with bitcask.Open(...), using the functional-style WithXXX() options.

  • As the backing store for a distributed key/value store. See bitraft for an example of this.

  • High-performance, low-latency read/write workloads where you cannot fit a typical hash map into memory but require the highest level of performance and predictable read latency. Bitcask ensures that only a single I/O operation is ever required to read or write a key/value pair.

  • As a general-purpose embedded key/value store where you would otherwise have used BoltDB, LevelDB, BuntDB or similar.
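The disk figure in the first bullet follows from simple arithmetic. The sketch below multiplies a key count by the default maximum key and value sizes (DefaultMaxKeySize = 64 bytes and DefaultMaxValueSize = 1 << 16 bytes, listed under Constants). It assumes every key and value is at its maximum size and ignores per-entry header bytes and stale entries still waiting for a Merge(). It therefore gives only a rough estimate of the raw payload. It does not attempt the memory estimate, which depends on the in-memory index.

```go
package main

import "fmt"

// Default size limits, mirrored from the Constants section
// (DefaultMaxKeySize and DefaultMaxValueSize).
const (
	defaultMaxKeySize   = 64      // bytes
	defaultMaxValueSize = 1 << 16 // bytes (64KiB)
)

// rawPayloadBytes estimates the raw key+value bytes written for numKeys
// entries, assuming every key and value is at its default maximum size.
// Per-entry headers and not-yet-merged stale entries are NOT counted.
func rawPayloadBytes(numKeys int64) int64 {
	return numKeys * (defaultMaxKeySize + defaultMaxValueSize)
}

func main() {
	b := rawPayloadBytes(1_000_000)
	fmt.Printf("%d bytes, about %.1f GB\n", b, float64(b)/1e9)
}
```

For 1M keys this prints `65600000000 bytes, about 65.6 GB`, which is in line with the ~65-70GB quoted above.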

bitcask is not suited for:

  • Storing billions of records. The key space is held in memory in a highly performant, memory-optimized adaptive radix tree (thanks to go-adaptive-radix-tree), so the more keys you have, the more memory is consumed. Consider a disk-backed store such as BoltDB (a B+tree) or LevelDB (an LSM tree) if you intend to store a very large number of key/value pairs.

Note, however, that storing large amounts of data in the values is fine. In other words, thousands to millions of keys with large values will work just fine.

  • Write-intensive workloads. Because of the Bitcask design, heavy workloads that write lots of key/value pairs will eventually cause problems such as "Too many open files" errors (#193), since each datafile holds an open file descriptor. You can mitigate this by periodically compacting the datafiles with a .Merge() operation. However, if key/value pairs never change and are never deleted (that is, only new key/value pairs are ever written), merging has no effect. Eventually you will run out of file descriptors!

You should consider your read/write workloads carefully and ensure you set appropriate file descriptor limits with ulimit -n that suit your needs.
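To see the limit a process actually runs with (the value behind `ulimit -n`), you can query RLIMIT_NOFILE from Go. This is a Unix-only sketch that uses the standard syscall package. The minimum of 4096 descriptors is an arbitrary, hypothetical threshold chosen for illustration and is not a value that Bitcask recommends. Go 1.19 and later automatically raise the soft limit to the hard limit in programs that import os, so the soft value reported here may be higher than what your shell shows.

```go
package main

import (
	"fmt"
	"syscall"
)

// minOpenFiles is a hypothetical threshold chosen for illustration only.
const minOpenFiles = 4096

// openFileLimit returns the soft and hard RLIMIT_NOFILE limits of the
// current process (the values behind `ulimit -Sn` and `ulimit -Hn`).
// Unix-only: syscall.Getrlimit is not available on Windows.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("open file limit: soft=%d hard=%d\n", soft, hard)
	if soft < minOpenFiles {
		fmt.Printf("warning: soft limit below %d; consider raising it with ulimit -n\n", minOpenFiles)
	}
}
```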

Development

$ git clone https://git.mills.io/prologic/bitcask.git
$ make

Install

$ go get git.mills.io/prologic/bitcask

Usage (library)

Install the package into your project:

$ go get git.mills.io/prologic/bitcask
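
package main

import (
	"log"

	"git.mills.io/prologic/bitcask"
)

func main() {
	// Open (or create) the database directory; this fails if the database
	// is already locked, typically by another process.
	db, err := bitcask.Open("/tmp/db")
	if err != nil {
		log.Fatal(err)
	}
	// Close releases the lock held by the open database.
	defer db.Close()

	if err := db.Put([]byte("Hello"), []byte("World")); err != nil {
		log.Fatal(err)
	}
	val, err := db.Get([]byte("Hello"))
	if err != nil {
		log.Fatal(err)
	}
	log.Println(string(val))
}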

See the GoDoc for further documentation and other examples.

Usage (tool)

$ bitcask -p /tmp/db set Hello World
$ bitcask -p /tmp/db get Hello
World

Usage (server)

There is also a very simple built-in Redis-compatible server called bitcaskd:

$ ./bitcaskd ./tmp
INFO[0000] starting bitcaskd v0.0.7@146f777              bind=":6379" path=./tmp

Example session:

$ telnet localhost 6379
Trying ::1...
Connected to localhost.
Escape character is '^]'.
SET foo bar
+OK
GET foo
$3
bar
DEL foo
:1
GET foo
$-1
PING
+PONG
QUIT
+OK
Connection closed by foreign host.

Docker

You can also use the Bitcask Docker Image:

$ docker pull prologic/bitcask
$ docker run -d -p 6379:6379 prologic/bitcask

Performance

Benchmarks run on an 11" MacBook with a 1.4GHz Intel Core i7:

$ make bench
...
goos: darwin
goarch: amd64
pkg: git.mills.io/prologic/bitcask

BenchmarkGet/128B-4         	  316515	      3263 ns/op	  39.22 MB/s	     160 B/op	       1 allocs/op
BenchmarkGet/256B-4         	  382551	      3204 ns/op	  79.90 MB/s	     288 B/op	       1 allocs/op
BenchmarkGet/512B-4         	  357216	      3835 ns/op	 133.51 MB/s	     576 B/op	       1 allocs/op
BenchmarkGet/1K-4           	  274958	      4429 ns/op	 231.20 MB/s	    1152 B/op	       1 allocs/op
BenchmarkGet/2K-4           	  227764	      5013 ns/op	 408.55 MB/s	    2304 B/op	       1 allocs/op
BenchmarkGet/4K-4           	  187557	      5534 ns/op	 740.15 MB/s	    4864 B/op	       1 allocs/op
BenchmarkGet/8K-4           	  153546	      7652 ns/op	1070.56 MB/s	    9472 B/op	       1 allocs/op
BenchmarkGet/16K-4          	  115549	     10272 ns/op	1594.95 MB/s	   18432 B/op	       1 allocs/op
BenchmarkGet/32K-4          	   69592	     16405 ns/op	1997.39 MB/s	   40960 B/op	       1 allocs/op

BenchmarkPut/128BNoSync-4   	  123519	     11094 ns/op	  11.54 MB/s	      49 B/op	       2 allocs/op
BenchmarkPut/256BNoSync-4   	   84662	     13398 ns/op	  19.11 MB/s	      50 B/op	       2 allocs/op
BenchmarkPut/1KNoSync-4     	   46345	     24855 ns/op	  41.20 MB/s	      58 B/op	       2 allocs/op
BenchmarkPut/2KNoSync-4     	   28820	     43817 ns/op	  46.74 MB/s	      68 B/op	       2 allocs/op
BenchmarkPut/4KNoSync-4     	   13976	     90059 ns/op	  45.48 MB/s	      89 B/op	       2 allocs/op
BenchmarkPut/8KNoSync-4     	    7852	    155101 ns/op	  52.82 MB/s	     130 B/op	       2 allocs/op
BenchmarkPut/16KNoSync-4    	    4848	    238113 ns/op	  68.81 MB/s	     226 B/op	       2 allocs/op
BenchmarkPut/32KNoSync-4    	    2564	    391483 ns/op	  83.70 MB/s	     377 B/op	       3 allocs/op

BenchmarkPut/128BSync-4     	     260	   4611273 ns/op	   0.03 MB/s	      48 B/op	       2 allocs/op
BenchmarkPut/256BSync-4     	     265	   4665506 ns/op	   0.05 MB/s	      48 B/op	       2 allocs/op
BenchmarkPut/1KSync-4       	     256	   4757334 ns/op	   0.22 MB/s	      48 B/op	       2 allocs/op
BenchmarkPut/2KSync-4       	     255	   4996788 ns/op	   0.41 MB/s	      92 B/op	       2 allocs/op
BenchmarkPut/4KSync-4       	     222	   5136481 ns/op	   0.80 MB/s	      98 B/op	       2 allocs/op
BenchmarkPut/8KSync-4       	     223	   5530824 ns/op	   1.48 MB/s	      99 B/op	       2 allocs/op
BenchmarkPut/16KSync-4      	     213	   5717880 ns/op	   2.87 MB/s	     202 B/op	       2 allocs/op
BenchmarkPut/32KSync-4      	     211	   5835948 ns/op	   5.61 MB/s	     355 B/op	       3 allocs/op

BenchmarkScan-4             	  568696	      2036 ns/op	     392 B/op	      33 allocs/op
PASS

For 128B values:

  • ~300,000 reads/sec
  • ~90,000 writes/sec
  • ~490,000 scans/sec

The full benchmark above shows linear performance as you increase key/value sizes.
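Each per-second figure in the list above is the reciprocal of the ns/op column. One second is 10^9 ns, so ops/sec = 1e9 / (ns/op). The small program below is only an illustration and is not part of `make bench`. It reproduces the 128B numbers from the ns/op values printed above and also shows what the Sync rows imply.

```go
package main

import "fmt"

// opsPerSec converts a Go benchmark ns/op figure into operations per second.
func opsPerSec(nsPerOp float64) float64 {
	return 1e9 / nsPerOp
}

func main() {
	// ns/op values copied from the benchmark output above.
	rows := []struct {
		name    string
		nsPerOp float64
	}{
		{"BenchmarkGet/128B", 3263},
		{"BenchmarkPut/128BNoSync", 11094},
		{"BenchmarkPut/128BSync", 4611273},
		{"BenchmarkScan", 2036},
	}
	for _, r := range rows {
		fmt.Printf("%-24s %10.0f ops/sec\n", r.name, opsPerSec(r.nsPerOp))
	}
}
```

This gives about 306,000 reads/sec, 90,000 unsynced writes/sec and 491,000 scans/sec, which matches the list above. With WithSync(true), Sync() is called on every write, so a 128B write drops to roughly 217 writes/sec.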

Support

Support the ongoing development of Bitcask!

Sponsor

Contributors

Thank you to all those who have contributed to this project, battle-tested it, used it in their own projects or products, fixed bugs, improved performance and even fixed tiny typos in the documentation! Thank you, and keep contributing!

You can find an AUTHORS file where we keep a list of contributors to the project. If you contribute a PR please consider adding your name there.

Related Projects

  • bitraft -- A Distributed Key/Value store (using Raft) with a Redis-compatible protocol.
  • bitcaskfs -- A FUSE file system for mounting a Bitcask database.
  • bitcask-bench -- A benchmarking tool comparing Bitcask and several other Go key/value libraries.

License

bitcask is licensed under the terms of the MIT License.

Documentation

Overview

Package bitcask implements a high-performance key-value store based on a WAL and LSM.

Example
_, _ = Open("path/to/db")

Example (WithOptions)
opts := []Option{
	WithMaxKeySize(1024),
	WithMaxValueSize(4096),
}
_, _ = Open("path/to/db", opts...)

Constants

const (
	// DefaultDirFileModeBeforeUmask is the default os.FileMode used when creating directories
	DefaultDirFileModeBeforeUmask = os.FileMode(0700)

	// DefaultFileFileModeBeforeUmask is the default os.FileMode used when creating files
	DefaultFileFileModeBeforeUmask = os.FileMode(0600)

	// DefaultMaxDatafileSize is the default maximum datafile size in bytes
	DefaultMaxDatafileSize = 1 << 20 // 1MB

	// DefaultMaxKeySize is the default maximum key size in bytes
	DefaultMaxKeySize = uint32(64) // 64 bytes

	// DefaultMaxValueSize is the default maximum value size in bytes
	DefaultMaxValueSize = uint64(1 << 16) // 64KiB

	// DefaultSync is the default file synchronization action
	DefaultSync = false

	CurrentDBVersion = uint32(1)
)

Variables

var (
	// ErrKeyNotFound is the error returned when a key is not found
	ErrKeyNotFound = errors.New("error: key not found")

	// ErrKeyTooLarge is the error returned for a key that exceeds the
	// maximum allowed key size (configured with WithMaxKeySize).
	ErrKeyTooLarge = errors.New("error: key too large")

	// ErrKeyExpired is the error returned when a key is queried which has
	// already expired (due to ttl)
	ErrKeyExpired = errors.New("error: key expired")

	// ErrEmptyKey is the error returned for a value with an empty key.
	ErrEmptyKey = errors.New("error: empty key")

	// ErrValueTooLarge is the error returned for a value that exceeds the
	// maximum allowed value size (configured with WithMaxValueSize).
	ErrValueTooLarge = errors.New("error: value too large")

	// ErrChecksumFailed is the error returned if a key/value retrieved does
	// not match its CRC checksum
	ErrChecksumFailed = errors.New("error: checksum failed")

	// ErrDatabaseLocked is the error returned if the database is locked
	// (typically opened by another process)
	ErrDatabaseLocked = errors.New("error: database locked")

	// ErrInvalidRange is the error returned when the range scan is invalid
	ErrInvalidRange = errors.New("error: invalid range")

	// ErrInvalidVersion is the error returned when the database version is invalid
	ErrInvalidVersion = errors.New("error: invalid db version")

	// ErrMergeInProgress is the error returned if merge is called when already a merge
	// is in progress
	ErrMergeInProgress = errors.New("error: merge already in progress")
)

Functions

This section is empty.

Types

type Bitcask

type Bitcask struct {
	// contains filtered or unexported fields
}

Bitcask is a struct that represents an on-disk LSM and WAL data structure and an in-memory hash of key/value pairs, as described in the Bitcask paper and used in the Riak database.

func Open

func Open(path string, options ...Option) (*Bitcask, error)

Open opens the database at the given path with optional options. Options can be provided with the `WithXXX` functions that provide configuration options as functions.

func (*Bitcask) Backup

func (b *Bitcask) Backup(path string) error

Backup copies the database directory to the given path, creating the path if it does not exist.

func (*Bitcask) Close

func (b *Bitcask) Close() error

Close closes the database and removes the lock. It is important to call Close(), as this is the only way to clean up the lock held by the open database.

func (*Bitcask) Delete

func (b *Bitcask) Delete(key []byte) error

Delete deletes the named key.

func (*Bitcask) DeleteAll

func (b *Bitcask) DeleteAll() (err error)

DeleteAll deletes all the keys. If an I/O error occurs the error is returned.

func (*Bitcask) Fold

func (b *Bitcask) Fold(f func(key []byte) error) (err error)

Fold iterates over all keys in the database calling the function `f` for each key. If the function returns an error, no further keys are processed and the error is returned.
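Since returning an error from `f` stops the iteration, a sentinel error is a common way to exit early. The sketch below collects at most n keys. It takes the fold function as a parameter so that it compiles without importing the module. With an open database, you would pass the method value `db.Fold`. The helper name firstKeys and the sentinel errStop are hypothetical. Keys are copied before being retained as a precaution, because the documentation does not say whether the slice passed to `f` may be reused.

```go
package main

import (
	"errors"
	"fmt"
)

// errStop is a hypothetical sentinel used only to end the fold early.
var errStop = errors.New("stop iteration")

// firstKeys returns up to n keys visited by fold. With a *bitcask.Bitcask,
// pass the method value db.Fold.
func firstKeys(fold func(f func(key []byte) error) error, n int) ([][]byte, error) {
	if n <= 0 {
		return nil, nil
	}
	var keys [][]byte
	err := fold(func(key []byte) error {
		// Copy the key in case the underlying buffer is reused.
		keys = append(keys, append([]byte(nil), key...))
		if len(keys) >= n {
			return errStop // no further keys are processed
		}
		return nil
	})
	if errors.Is(err, errStop) {
		err = nil // stopping early is not a failure
	}
	return keys, err
}

func main() {
	// Stand-in for db.Fold over three keys, for demonstration only.
	fake := func(f func(key []byte) error) error {
		for _, k := range []string{"a", "b", "c"} {
			if err := f([]byte(k)); err != nil {
				return err
			}
		}
		return nil
	}
	keys, err := firstKeys(fake, 2)
	fmt.Println(len(keys), err)
}
```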

func (*Bitcask) Get

func (b *Bitcask) Get(key []byte) ([]byte, error)

Get fetches the value for a key.

func (*Bitcask) Has

func (b *Bitcask) Has(key []byte) bool

Has returns true if the key exists in the database, false otherwise.

func (*Bitcask) Keys

func (b *Bitcask) Keys() chan []byte

Keys returns all keys in the database as a channel of keys

func (*Bitcask) Len

func (b *Bitcask) Len() int

Len returns the total number of keys in the database

func (*Bitcask) Merge

func (b *Bitcask) Merge() error

Merge merges all datafiles in the database. Old keys are squashed and deleted keys are removed. Duplicate key/value pairs are also removed. Call this function periodically to reclaim disk space.

func (*Bitcask) Put

func (b *Bitcask) Put(key, value []byte) error

Put stores the key and value in the database.

func (*Bitcask) PutWithTTL

func (b *Bitcask) PutWithTTL(key, value []byte, ttl time.Duration) error

PutWithTTL stores the key and value in the database with the given TTL.

func (*Bitcask) Range added in v0.3.14

func (b *Bitcask) Range(start, end []byte, f func(key []byte) error) (err error)

Range performs a range scan of keys between the start key and the end key, calling the function `f` with each key found. If the function returns an error, no further keys are processed and the first error is returned.

func (*Bitcask) Reclaimable

func (b *Bitcask) Reclaimable() int64

Reclaimable returns the amount of space that can be reclaimed.
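Because Merge rewrites the datafiles, one possible policy is to merge only when enough space is reclaimable. The sketch below takes Reclaimable and Merge as method values (`db.Reclaimable`, `db.Merge`). The helper name mergeIfWorthwhile and the 64 MiB threshold are hypothetical. The sketch also assumes that Reclaimable reports bytes, which the documentation above does not state explicitly.

```go
package main

import "fmt"

// mergeIfWorthwhile calls merge only when reclaimable() reports at least
// threshold units of reclaimable space (assumed here to be bytes).
// It reports whether a merge was attempted.
func mergeIfWorthwhile(reclaimable func() int64, merge func() error, threshold int64) (bool, error) {
	if reclaimable() < threshold {
		return false, nil
	}
	return true, merge()
}

func main() {
	merges := 0
	merge := func() error { merges++; return nil } // stand-in for db.Merge

	// Stand-ins for db.Reclaimable reporting 10 MiB and 200 MiB.
	small := func() int64 { return 10 << 20 }
	large := func() int64 { return 200 << 20 }

	const threshold = 64 << 20 // hypothetical 64 MiB threshold
	ran1, _ := mergeIfWorthwhile(small, merge, threshold)
	ran2, _ := mergeIfWorthwhile(large, merge, threshold)
	fmt.Println(ran1, ran2, merges)
}
```

If a merge is already running, Merge returns ErrMergeInProgress. A caller can detect this with `errors.Is(err, bitcask.ErrMergeInProgress)` and try again later.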

func (*Bitcask) Reopen

func (b *Bitcask) Reopen() error

Reopen closes and reopens the database.

func (*Bitcask) RunGC

func (b *Bitcask) RunGC() error

RunGC deletes all expired keys

func (*Bitcask) Scan

func (b *Bitcask) Scan(prefix []byte, f func(key []byte) error) (err error)

Scan performs a prefix scan of keys matching the given prefix and calling the function `f` with the keys found. If the function returns an error no further keys are processed and the first error is returned.

func (*Bitcask) Sift added in v0.3.14

func (b *Bitcask) Sift(f func(key []byte) (bool, error)) (err error)

Sift iterates over all keys in the database calling the function `f` for each key. If the KV pair is expired or the function returns true, that key is deleted from the database. If the function returns an error on any key, no further keys are processed, no keys are deleted, and the first error is returned.

func (*Bitcask) SiftRange added in v0.3.14

func (b *Bitcask) SiftRange(start, end []byte, f func(key []byte) (bool, error)) (err error)

SiftRange performs a range scan of keys matching a range of keys between the start key and end key and calling the function `f` with the keys found. If the KV pair is expired or the function returns true, that key is deleted from the database. If the function returns an error on any key, no further keys are processed, no keys are deleted, and the first error is returned.

func (*Bitcask) SiftScan added in v0.3.14

func (b *Bitcask) SiftScan(prefix []byte, f func(key []byte) (bool, error)) (err error)

SiftScan iterates over all keys in the database beginning with the given prefix, calling the function `f` for each key. If the KV pair is expired or the function returns true, that key is deleted from the database. If the function returns an error on any key, no further keys are processed, no keys are deleted, and the first error is returned.

func (*Bitcask) Stats

func (b *Bitcask) Stats() (stats Stats, err error)

Stats returns statistics about the database, including the number of datafiles, the number of keys and the overall size of the data on disk.

func (*Bitcask) Sync

func (b *Bitcask) Sync() error

Sync flushes all buffers to disk ensuring all data is written

type ErrBadConfig added in v1.0.1

type ErrBadConfig struct {
	Err error
}

ErrBadConfig is the error returned on failure to load the database config

func (*ErrBadConfig) Error added in v1.0.1

func (e *ErrBadConfig) Error() string

func (*ErrBadConfig) Is added in v1.0.1

func (e *ErrBadConfig) Is(target error) bool

func (*ErrBadConfig) Unwrap added in v1.0.1

func (e *ErrBadConfig) Unwrap() error

type ErrBadMetadata added in v1.0.1

type ErrBadMetadata struct {
	Err error
}

ErrBadMetadata is the error returned on failure to load the database metadata

func (*ErrBadMetadata) Error added in v1.0.1

func (e *ErrBadMetadata) Error() string

func (*ErrBadMetadata) Is added in v1.0.1

func (e *ErrBadMetadata) Is(target error) bool

func (*ErrBadMetadata) Unwrap added in v1.0.1

func (e *ErrBadMetadata) Unwrap() error

type Option

type Option func(*config.Config) error

Option is a function that takes a config struct and modifies it

func WithAutoRecovery

func WithAutoRecovery(enabled bool) Option

WithAutoRecovery enables or disables automatic recovery of the datafiles and recreation of the index file. IMPORTANT: This flag MUST only be used if a proper backup has been made of all the existing datafiles.

func WithDirFileModeBeforeUmask

func WithDirFileModeBeforeUmask(mode os.FileMode) Option

WithDirFileModeBeforeUmask sets the FileMode used for each new directory created.

func WithFileFileModeBeforeUmask

func WithFileFileModeBeforeUmask(mode os.FileMode) Option

WithFileFileModeBeforeUmask sets the FileMode used for each new file created.

func WithMaxDatafileSize

func WithMaxDatafileSize(size int) Option

WithMaxDatafileSize sets the maximum datafile size option

func WithMaxKeySize

func WithMaxKeySize(size uint32) Option

WithMaxKeySize sets the maximum key size option

func WithMaxValueSize

func WithMaxValueSize(size uint64) Option

WithMaxValueSize sets the maximum value size option

func WithSync

func WithSync(sync bool) Option

WithSync causes Sync() to be called on every key/value written, which increases durability and safety at the expense of performance.

type Stats

type Stats struct {
	Datafiles int
	Keys      int
	Size      int64
}

Stats is a struct returned by Stats() on an open Bitcask instance

Directories

Path Synopsis
cmd
scripts
