simpledb

package
v1.3.1
Published: Sep 15, 2021 License: Apache-2.0 Imports: 20 Imported by: 0

README

SimpleDB

SimpleDB is a simple embedded key-value database built using the primitives in this repository. It's an example project to show how the different pieces can fit together and is not meant to be a production-ready database like RocksDB or LevelDB.

Interface

The interface consists of three methods, Get, Put and Delete, to query and modify data in the database. For simplicity's sake the whole interface is based on strings.

Get retrieves the value for the given key. If there is no value for the given key, it will return NotFound as the error and an empty string value. Otherwise, the error can be any of the usual io errors.

Put adds the given value for the given key. If the key already exists, it will overwrite the existing value with the given one (effectively an upsert).

Delete will remove the value for the given key. It will ignore keys that do not exist in the database. Underneath, the key is tombstoned: it is still stored, but no longer retrievable through this interface.

As with any embedded database, SimpleDB operates on a directory you supply. All state is contained in that directory, and opening an existing database in that folder lets you continue where you left off.

Thus, there are two more methods in the interface: Open and Close. Open sets up internal state, kicks off the compaction and memstore-flushing goroutines, and recovers anything that wasn't closed properly beforehand - more is described in the recovery section below. Close makes sure everything is flushed and cleaned up and that all started goroutines are stopped.

Examples

Opening and closing
db, err := simpledb.NewSimpleDB("path")
if err != nil { log.Fatalf("error: %v", err) }

err = db.Open()
if err != nil { log.Fatalf("error: %v", err) }

err = db.Close()
if err != nil { log.Fatalf("error: %v", err) }

Put data
err = db.Put("hello", "world")
if err != nil { log.Fatalf("error: %v", err) }

Get data
value, err := db.Get("hello")
if err != nil {
    if err == simpledb.NotFound {
        log.Printf("no value found!")
    } else {
        log.Fatalf("error: %v", err)
    }
}

log.Printf("value %s", value)

Delete data
err = db.Delete("hello")
if err != nil { log.Fatalf("error: %v", err) }

The full end-to-end example can be found in examples/simpledb.go.

Configuration

The database can be configured using options; here are a few that can be used to tune it:

db, err := NewSimpleDB(
    "some_path", // the non-empty and existing base path of the database - the only mandatory argument
    DisableAsyncWAL(), // enables fsync after every modification -> safe for data consistency, but greatly affects performance
    MemstoreSizeBytes(1024*1024*1024),        // the maximum size a memstore should have in bytes
    CompactionRunInterval(30*time.Second),    // how often the compaction process should run
    CompactionMaxSizeBytes(1024*1024*1024*5), // up to which size in bytes sstables keep being compacted
    CompactionFileThreshold(20),              // how many files at minimum are compacted together
    DisableCompactions(),                     // turn off compaction completely
)

Concurrency

The database itself is thread-safe and can be used from multiple goroutines. It's not advised to open or close it from multiple goroutines, however - which is more a matter of idempotency than concurrency.

How does it work?

Effectively, SimpleDB implements the architecture shown in this diagram:

[image: rocksdb architecture overview]

There is always a writable memstore that stores the writes in memory. When it grows too big (default of 1 GiB), it is rotated to a new memstore, the filled one is flushed to an SSTable, and the WAL is rotated as well. The WAL is deleted once the SSTable is fully written to disk.

In SimpleDB only L0 is implemented, meaning multiple SSTables with overlapping key ranges can exist at the same time. There is a maximum of ten simultaneous SSTables; past that limit, an asynchronous compaction and merge process kicks off and merges them back into a single SSTable.
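
The merge semantics of such an L0 compaction can be illustrated with maps standing in for sorted SSTables, ordered oldest to newest so that newer tables win on overlapping keys. This is a simplification; the real code merges via SSTable iterators:

```go
package main

import (
	"fmt"
	"sort"
)

// compactL0 merges several L0 "sstables" into one. Tables are ordered
// oldest to newest, so later tables overwrite earlier ones on overlapping
// keys - the same rule a merging iterator applies.
func compactL0(tables []map[string]string) map[string]string {
	merged := map[string]string{}
	for _, t := range tables { // oldest first, newest wins
		for k, v := range t {
			merged[k] = v
		}
	}
	return merged
}

// sortedKeys returns the merged keys in order, as an SSTable would store them.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	merged := compactL0([]map[string]string{
		{"a": "1", "b": "old"},
		{"b": "new", "c": "3"}, // newer table: its "b" wins
	})
	fmt.Println(sortedKeys(merged), merged["b"]) // [a b c] new
}
```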

The only difference is that there is no manifest log; its tracking purpose is delegated to the filesystem, mainly using path patterns and conventions.

Recovery Modes

Recovery in SimpleDB is quite simple: when the database is opened, the supplied base path is first scanned for existing sstable paths. They are read in the order of their file names, and the internal state is reset to start from the latest number.

Then the remainder of the WAL is read into the existing memstore, which is immediately flushed synchronously into a new SSTable. The reason is simple: the WAL folder must be completely empty to avoid edge cases where WALs are overwritten in multi-crash scenarios. The trade-off is that the created SSTable is most likely too small, but it will be compacted with others later on.
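
A sketch of that replay step, with walRecord as a hypothetical stand-in for the real WAL record format (the actual format lives in the recordio/wal packages):

```go
package main

import "fmt"

// walRecord is a hypothetical WAL entry: a put or a delete for one key.
type walRecord struct {
	key, value string
	deleted    bool
}

// replayWAL rebuilds the memstore from the WAL tail in write order, so that
// later records shadow earlier ones; the result is then flushed synchronously
// to an SSTable so the WAL folder can be emptied before normal operation.
func replayWAL(records []walRecord) map[string]*string {
	memstore := map[string]*string{}
	for _, r := range records {
		if r.deleted {
			memstore[r.key] = nil // tombstone survives the flush
			continue
		}
		v := r.value
		memstore[r.key] = &v
	}
	return memstore
}

func main() {
	mem := replayWAL([]walRecord{
		{key: "a", value: "1"},
		{key: "a", value: "2"}, // the later record wins
		{key: "b", value: "3"},
		{key: "b", deleted: true},
	})
	fmt.Println(*mem["a"], mem["b"] == nil) // 2 true
}
```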

In case the crash happened during a compaction, there are two places where we can attempt to recover. The first one is while the compaction is ongoing (there is no compaction_successful flag file in the folder yet); in this case we can discard the compaction result. If a compaction has finished successfully, we can try to recover it by completing the steps in the sstable_manager. The flag file contains some metadata for that case and can be used to trigger the remainder of the logic again.

Crash Testing

To exercise the above assumptions there are two kinds of tests: in-process crashes, simulated by mutating some state, and real kill -9 style crash tests, where the database is spawned as an external process.

The latter can be run using make crash-simpledb, which starts a webserver that exposes the database methods via REST. The test code then tries a couple of deterministic and random patterns while killing the database and continuously checking its results.

The test suite is reusable for other databases, as long as they implement the REST interface.

Documentation

Index

Constants

const CompactionFinishedSuccessfulFileName = "compaction_successful"
const DefaultCompactionInterval = 5 * time.Second
const DefaultCompactionMaxSizeBytes uint64 = 5 * 1024 * 1024 * 1024 // 5gb
const MemStoreMaxSizeBytes uint64 = 1024 * 1024 * 1024 // 1gb
const NumSSTablesToTriggerCompaction int = 10
const SSTableCompactionPathPrefix = SSTablePrefix + "_compaction"
const SSTablePattern = SSTablePrefix + "_%015d"
const SSTablePrefix = "sstable"
const WriteAheadFolder = "wal"

Variables

var AlreadyClosed = errors.New("database is already closed")
var EmptyKeyValue = errors.New("neither empty keys nor values are allowed")
var NotFound = errors.New("NotFound")

Functions

This section is empty.

Types

type DB

type DB struct {
	// contains filtered or unexported fields
}

func NewSimpleDB

func NewSimpleDB(basePath string, extraOptions ...ExtraOption) (*DB, error)

NewSimpleDB creates a new db. It requires a directory that exists; the directory may contain an existing database to resume from. The error in case the directory doesn't exist can be checked using the normal os package functions like os.IsNotExist(err).

func (*DB) Close

func (db *DB) Close() error

func (*DB) Delete

func (db *DB) Delete(key string) error

func (*DB) Get

func (db *DB) Get(key string) (string, error)

func (*DB) Open

func (db *DB) Open() error

func (*DB) Put

func (db *DB) Put(key, value string) error

type DatabaseI

type DatabaseI interface {
	recordio.OpenClosableI

	// Get returns the value for the given key. If there is no value for the given
	// key it will return NotFound as the error and an empty string value. Otherwise
	// the error will contain any other usual io error that can be expected.
	Get(key string) (string, error)

	// Put adds the given value for the given key. If this key already exists, it will
	// overwrite the already existing value with the given one.
	// Unfortunately this method does not support empty keys and values, that will immediately return an error.
	Put(key, value string) error

	// Delete will delete the value for the given key. It will ignore when a key does not exist in the database.
	// Underneath it will be tombstoned, which still store it and make it not retrievable through this interface.
	Delete(key string) error
}

type ExtraOption

type ExtraOption func(options *ExtraOptions)

func CompactionFileThreshold

func CompactionFileThreshold(n int) ExtraOption

CompactionFileThreshold defines how many SSTables must accumulate before a compaction runs, measured in the number of SSTables. The default is 10, which are then compacted into a single SSTable.

func CompactionMaxSizeBytes

func CompactionMaxSizeBytes(n uint64) ExtraOption

CompactionMaxSizeBytes controls whether an SSTable is considered for compaction: SSTables over the given threshold will not be compacted any further. The default is 5 GiB in DefaultCompactionMaxSizeBytes.

func CompactionRunInterval

func CompactionRunInterval(interval time.Duration) ExtraOption

CompactionRunInterval configures how often the compaction ticker tries to compact sstables. By default it's every DefaultCompactionInterval.

func DisableAsyncWAL

func DisableAsyncWAL() ExtraOption

DisableAsyncWAL will turn off the asynchronous WAL writes, which should give consistent, but slower results.

func DisableCompactions

func DisableCompactions() ExtraOption

DisableCompactions will disable the compaction process from running. Default is enabled.

func MemstoreSizeBytes

func MemstoreSizeBytes(n uint64) ExtraOption

MemstoreSizeBytes controls the size of the memstore, after this limit is hit the memstore will be written to disk. Default is 1 GiB.

type ExtraOptions

type ExtraOptions struct {
	// contains filtered or unexported fields
}

type RWMemstore

type RWMemstore struct {
	// contains filtered or unexported fields
}

The RW memstore contains two memstores, one for reading and one for writing; the write store takes precedence over the read store (for the same key).

func (*RWMemstore) Add

func (c *RWMemstore) Add(key []byte, value []byte) error

func (*RWMemstore) Contains

func (c *RWMemstore) Contains(key []byte) bool

func (*RWMemstore) Delete

func (c *RWMemstore) Delete(key []byte) error

func (*RWMemstore) DeleteIfExists

func (c *RWMemstore) DeleteIfExists(key []byte) error

func (*RWMemstore) EstimatedSizeInBytes

func (c *RWMemstore) EstimatedSizeInBytes() uint64

func (*RWMemstore) Flush

func (c *RWMemstore) Flush(opts ...sstables.WriterOption) error

func (*RWMemstore) Get

func (c *RWMemstore) Get(key []byte) ([]byte, error)

func (*RWMemstore) SStableIterator

func (c *RWMemstore) SStableIterator() sstables.SSTableIteratorI

func (*RWMemstore) Size

func (c *RWMemstore) Size() int

func (*RWMemstore) Tombstone

func (c *RWMemstore) Tombstone(key []byte) error

func (*RWMemstore) Upsert

func (c *RWMemstore) Upsert(key []byte, value []byte) error

type SSTableManager

type SSTableManager struct {
	// contains filtered or unexported fields
}

func NewSSTableManager

func NewSSTableManager(cmp skiplist.KeyComparator, dbLock *sync.RWMutex, basePath string) *SSTableManager
