rosedb

README

What is ROSEDB

rosedb is a lightweight, fast and reliable key/value storage engine based on the Bitcask storage model.

The design of Bitcask was inspired, in part, by log-structured filesystems and log file merging.

Status

rosedb is well tested and ready for production use. Several projects are using rosedb in production as a storage engine.

Didn't find the feature you want? Feel free to open an issue or a PR; we are in active development.

Design overview

RoseDB's log files use wal (Write Ahead Log) as the backend: append-only files with a block cache.

wal: https://github.com/rosedblabs/wal

Key features

Strengths

Low latency per item read or written. This is due to the write-once, append-only nature of Bitcask database files.
High throughput, especially when writing an incoming stream of random items. Write operations to RoseDB generally saturate I/O and disk bandwidth, which is a good thing from a performance perspective. This saturation occurs for two reasons: (1) data written to RoseDB doesn't need to be ordered on disk, and (2) the log-structured design of Bitcask allows for minimal disk head movement during writes.
Ability to handle datasets larger than RAM without degradation. Access to data in RoseDB involves a direct lookup in an in-memory index data structure. This makes finding data very efficient, even when datasets are very large.
Single seek to retrieve any value. RoseDB's in-memory index of keys points directly to locations on disk where the data lives. RoseDB never uses more than one disk seek to read a value, and sometimes even that isn't necessary due to filesystem caching done by the operating system.
Predictable lookup and insert performance. For the reasons listed above, read operations from RoseDB have fixed, predictable behavior. This is also true of writes, because a write operation requires, at most, one seek to the end of the current open file followed by an append to that file.
Fast, bounded crash recovery. Crash recovery is easy and fast with RoseDB because its files are append-only and write-once. The only items that may be lost are partially written records at the tail of the last file that was opened for writes. Recovery only needs to review those records and verify CRC data to ensure that the data is consistent.
Easy backup. In most systems, backup can be very complicated. RoseDB simplifies this process due to its append-only, write-once disk format. Any utility that archives or copies files in disk-block order will properly back up or copy a RoseDB database.
Batch operations which guarantee atomicity, consistency, and durability. RoseDB supports batch operations which are atomic, consistent, and durable. New writes in a batch are cached in memory before committing. If the batch commits successfully, all writes in the batch are persisted to disk; if it fails, all writes in the batch are discarded.
Support for forward and backward iterators. The iterator is based on the in-memory index of keys, which points directly to locations on disk where the data lives, making iteration very efficient even when datasets are very large.
Support for key watch. You can get a notification if keys change in the db.
Support for key expiry. You can set an expire time for keys.

Weaknesses

Keys must fit in memory. RoseDB keeps all keys in memory at all times, which means that your system must have enough memory to contain your entire keyspace, plus additional space for other operational components and operating-system-resident filesystem buffer space.
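
As a rough, illustrative estimate (the per-key overhead figure is an assumption, not from the RoseDB docs): with 100 million keys averaging 32 bytes each and roughly 64 bytes of in-memory index metadata per key, the keyspace alone would need about 100M × (32 + 64) bytes ≈ 9.6 GB of RAM, before counting filesystem buffer space.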

Getting Started

Basic operations

package main

import "github.com/rosedblabs/rosedb/v2"

func main() {
	// specify the options
	options := rosedb.DefaultOptions
	options.DirPath = "/tmp/rosedb_basic"

	// open a database
	db, err := rosedb.Open(options)
	if err != nil {
		panic(err)
	}
	defer func() {
		_ = db.Close()
	}()

	// set a key
	err = db.Put([]byte("name"), []byte("rosedb"))
	if err != nil {
		panic(err)
	}

	// get a key
	val, err := db.Get([]byte("name"))
	if err != nil {
		panic(err)
	}
	println(string(val))

	// delete a key
	err = db.Delete([]byte("name"))
	if err != nil {
		panic(err)
	}
}

Batch operations

	// create a batch
	batch := db.NewBatch(rosedb.DefaultBatchOptions)

	// set a key
	_ = batch.Put([]byte("name"), []byte("rosedb"))

	// get a key
	val, _ := batch.Get([]byte("name"))
	println(string(val))

	// delete a key
	_ = batch.Delete([]byte("name"))

	// commit the batch
	_ = batch.Commit()

See the examples for more details.

Community

You are welcome to join the Slack channel and Discussions to connect with RoseDB developers and other users.

Contributors

Documentation

Index

Constants

const (
	B  = 1
	KB = 1024 * B
	MB = 1024 * KB
	GB = 1024 * MB
)

Variables

var (
	ErrKeyIsEmpty      = errors.New("the key is empty")
	ErrKeyNotFound     = errors.New("key not found in database")
	ErrDatabaseIsUsing = errors.New("the database directory is used by another process")
	ErrReadOnlyBatch   = errors.New("the batch is read only")
	ErrBatchCommitted  = errors.New("the batch is committed")
	ErrBatchRollbacked = errors.New("the batch is rollbacked")
	ErrDBClosed        = errors.New("the database is closed")
	ErrMergeRunning    = errors.New("the merge operation is running")
	ErrWatchDisabled   = errors.New("the watch is disabled")
)
var DefaultBatchOptions = BatchOptions{
	Sync:     true,
	ReadOnly: false,
}
var DefaultOptions = Options{
	DirPath:           tempDBDir(),
	SegmentSize:       1 * GB,
	BlockCache:        0,
	Sync:              false,
	BytesPerSync:      0,
	WatchQueueSize:    0,
	AutoMergeCronExpr: "",
}

Functions

This section is empty.

Types

type Batch

type Batch struct {
	// contains filtered or unexported fields
}

Batch represents a batch of operations on the database. If readonly is true, you can only read data from the batch with the Get method; an error will be returned if you try to use the Put or Delete method.

If readonly is false, you can use the Put and Delete methods to write data to the batch. The data will be written to the database when you call the Commit method.

Batch is not a transaction: it does not guarantee isolation. But it does guarantee atomicity, consistency, and durability (if the Sync option is true).

You must call the Commit method to commit the batch, otherwise the DB will be locked.
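
A minimal sketch of a read-only batch, based on the Batch API documented on this page:

	// a read-only batch can only serve reads; Put or Delete would
	// return ErrReadOnlyBatch
	batch := db.NewBatch(rosedb.BatchOptions{Sync: true, ReadOnly: true})

	val, err := batch.Get([]byte("name"))
	if err != nil {
		panic(err)
	}
	println(string(val))

	// Commit must still be called so the DB lock is released
	_ = batch.Commit()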

func (*Batch) Commit

func (b *Batch) Commit() error

Commit commits the batch. If the batch is readonly or empty, it will return directly.

It will iterate the pendingWrites and write the data to the database, then write a record to indicate the end of the batch to guarantee atomicity. Finally, it will write the index.

func (*Batch) Delete

func (b *Batch) Delete(key []byte) error

Delete marks a key for deletion in the batch.

func (*Batch) Exist

func (b *Batch) Exist(key []byte) (bool, error)

Exist checks if the key exists in the database.

func (*Batch) Expire added in v2.3.2

func (b *Batch) Expire(key []byte, ttl time.Duration) error

Expire sets the ttl of the key.

func (*Batch) Get

func (b *Batch) Get(key []byte) ([]byte, error)

Get retrieves the value associated with a given key from the batch.

func (*Batch) Persist added in v2.3.3

func (b *Batch) Persist(key []byte) error

Persist removes the ttl of the key.

func (*Batch) Put

func (b *Batch) Put(key []byte, value []byte) error

Put adds a key-value pair to the batch for writing.

func (*Batch) PutWithTTL added in v2.3.1

func (b *Batch) PutWithTTL(key []byte, value []byte, ttl time.Duration) error

PutWithTTL adds a key-value pair with ttl to the batch for writing.

func (*Batch) Rollback added in v2.2.1

func (b *Batch) Rollback() error

Rollback discards an uncommitted batch instance. The discard operation will clear the buffered data and release the lock.
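
A short sketch of the commit-or-rollback pattern, assuming the batch API above (key and value are illustrative):

	batch := db.NewBatch(rosedb.DefaultBatchOptions)
	if err := batch.Put([]byte("key"), []byte("value")); err != nil {
		// discard the buffered writes and release the DB lock
		_ = batch.Rollback()
		panic(err)
	}
	if err := batch.Commit(); err != nil {
		panic(err)
	}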

func (*Batch) TTL added in v2.3.2

func (b *Batch) TTL(key []byte) (time.Duration, error)

TTL returns the ttl of the key.

type BatchOptions

type BatchOptions struct {
	// Sync has the same semantics as Options.Sync.
	Sync bool
	// ReadOnly specifies whether the batch is read only.
	ReadOnly bool
}

BatchOptions specifies the options for creating a batch.

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB represents a ROSEDB database instance. It is built on the bitcask model, which is a log-structured storage. It uses WAL to write data, and uses an in-memory index to store the keys and the positions of the data in the WAL; the index is rebuilt when the database is opened.

The main advantage of ROSEDB is that it is very fast to write, read, and delete data, because it only needs one disk IO to complete a single operation.

But since all keys and their positions (the index) must be stored in memory, the total data size is limited by the memory size.

So if your memory can almost hold all the keys, ROSEDB is the perfect storage engine for you.

func Open

func Open(options Options) (*DB, error)

Open a database with the specified options. If the database directory does not exist, it will be created automatically.

Multiple processes cannot use the same database directory at the same time, otherwise Open will return ErrDatabaseIsUsing.

It will open the wal files in the database directory and load the index from them, returning the DB instance or an error if any.
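
A minimal sketch of guarding against a directory that is already locked by another process, using the documented ErrDatabaseIsUsing variable (errors is the standard library package):

	db, err := rosedb.Open(options)
	if err != nil {
		if errors.Is(err, rosedb.ErrDatabaseIsUsing) {
			// another process holds the lock on this directory
		}
		panic(err)
	}
	defer func() {
		_ = db.Close()
	}()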

func (*DB) Ascend added in v2.3.0

func (db *DB) Ascend(handleFn func(k []byte, v []byte) (bool, error))

Ascend calls handleFn for each key/value pair in the db in ascending order.
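
A short iteration sketch; judging from the signature, returning false from handleFn presumably stops the iteration early (an assumption, since this page does not state it):

	// print every key/value pair in ascending key order
	db.Ascend(func(k []byte, v []byte) (bool, error) {
		println(string(k), string(v))
		return true, nil // return false (or an error) to stop early
	})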

func (*DB) AscendGreaterOrEqual added in v2.3.1

func (db *DB) AscendGreaterOrEqual(key []byte, handleFn func(k []byte, v []byte) (bool, error))

AscendGreaterOrEqual calls handleFn for each key/value pair in the db with keys greater than or equal to the given key.

func (*DB) AscendKeys added in v2.3.2

func (db *DB) AscendKeys(pattern []byte, filterExpired bool, handleFn func(k []byte) (bool, error))

AscendKeys calls handleFn for each key in the db in ascending order. Since the expiry time is stored in the value, set the filterExpired parameter to true if you want to filter out expired keys. Note that this affects performance, because the value of each key must be read to determine whether it has expired.
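
A keys-only scan sketch; passing a nil pattern to match all keys is an assumption, since the pattern syntax is not documented on this page:

	// list all keys, skipping expired ones (slower, since values
	// must be read to check expiry)
	db.AscendKeys(nil, true, func(k []byte) (bool, error) {
		println(string(k))
		return true, nil
	})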

func (*DB) AscendRange added in v2.3.1

func (db *DB) AscendRange(startKey, endKey []byte, handleFn func(k []byte, v []byte) (bool, error))

AscendRange calls handleFn for each key/value pair in the db within the range [startKey, endKey] in ascending order.

func (*DB) Close

func (db *DB) Close() error

Close the database: close all data files and release the file lock. It sets the closed flag to true. The DB instance cannot be used after closing.

func (*DB) Delete

func (db *DB) Delete(key []byte) error

Delete the specified key from the database. Actually, it will open a new batch and commit it. You can think of the batch as having only one Delete operation.

func (*DB) DeleteExpiredKeys added in v2.3.2

func (db *DB) DeleteExpiredKeys(timeout time.Duration) error

DeleteExpiredKeys scans the entire index in ascending order to delete expired keys. It is a time-consuming operation, so a timeout must be specified to prevent the DB from being unavailable for too long.

func (*DB) Descend added in v2.3.0

func (db *DB) Descend(handleFn func(k []byte, v []byte) (bool, error))

Descend calls handleFn for each key/value pair in the db in descending order.

func (*DB) DescendKeys added in v2.3.2

func (db *DB) DescendKeys(pattern []byte, filterExpired bool, handleFn func(k []byte) (bool, error))

DescendKeys calls handleFn for each key in the db in descending order. Since the expiry time is stored in the value, set the filterExpired parameter to true if you want to filter out expired keys. Note that this affects performance, because the value of each key must be read to determine whether it has expired.

func (*DB) DescendLessOrEqual added in v2.3.1

func (db *DB) DescendLessOrEqual(key []byte, handleFn func(k []byte, v []byte) (bool, error))

DescendLessOrEqual calls handleFn for each key/value pair in the db with keys less than or equal to the given key.

func (*DB) DescendRange added in v2.3.1

func (db *DB) DescendRange(startKey, endKey []byte, handleFn func(k []byte, v []byte) (bool, error))

DescendRange calls handleFn for each key/value pair in the db within the range [startKey, endKey] in descending order.

func (*DB) Exist

func (db *DB) Exist(key []byte) (bool, error)

Exist checks if the specified key exists in the database. Actually, it will open a new batch and commit it. You can think of the batch as having only one Exist operation.

func (*DB) Expire added in v2.3.2

func (db *DB) Expire(key []byte, ttl time.Duration) error

Expire sets the ttl of the key.

func (*DB) Get

func (db *DB) Get(key []byte) ([]byte, error)

Get the value of the specified key from the database. Actually, it will open a new batch and commit it. You can think of the batch as having only one Get operation.

func (*DB) Merge added in v2.2.0

func (db *DB) Merge(reopenAfterDone bool) error

Merge merges all the data files in the database. It will iterate all the data files, find the valid data, and rewrite the data to the new data file.

The merge operation may be very time-consuming when the database is large, so it is recommended to perform it when the database is idle.

If reopenAfterDone is true, the original file will be replaced by the merge file, and db's index will be rebuilt after the merge completes.
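
A sketch of triggering a merge manually, reusing the documented ErrMergeRunning variable:

	// reclaim space from deleted and overwritten records; best run
	// while the database is idle
	if err := db.Merge(true); err != nil && err != rosedb.ErrMergeRunning {
		panic(err)
	}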

func (*DB) NewBatch

func (db *DB) NewBatch(options BatchOptions) *Batch

NewBatch creates a new Batch instance.

func (*DB) Persist added in v2.3.3

func (db *DB) Persist(key []byte) error

Persist removes the ttl of the key. If the key does not exist or has expired, it will return ErrKeyNotFound.

func (*DB) Put

func (db *DB) Put(key []byte, value []byte) error

Put a key-value pair into the database. Actually, it will open a new batch and commit it. You can think of the batch as having only one Put operation.

func (*DB) PutWithTTL added in v2.3.1

func (db *DB) PutWithTTL(key []byte, value []byte, ttl time.Duration) error

PutWithTTL puts a key-value pair with a ttl into the database. Actually, it will open a new batch and commit it. You can think of the batch as having only one PutWithTTL operation.
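
A small ttl sketch combining PutWithTTL, TTL, and Persist (the key, value, and duration are illustrative):

	// write a key that expires in one minute
	err := db.PutWithTTL([]byte("session"), []byte("token"), time.Minute)
	if err != nil {
		panic(err)
	}

	// inspect the remaining ttl
	ttl, err := db.TTL([]byte("session"))
	if err != nil {
		panic(err)
	}
	println(ttl.String())

	// remove the ttl so the key no longer expires
	_ = db.Persist([]byte("session"))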

func (*DB) Stat

func (db *DB) Stat() *Stat

Stat returns the statistics of the database.

func (*DB) Sync

func (db *DB) Sync() error

Sync all data files to the underlying storage.

func (*DB) TTL added in v2.3.2

func (db *DB) TTL(key []byte) (time.Duration, error)

TTL gets the ttl of the key.

func (*DB) Watch added in v2.2.2

func (db *DB) Watch() (<-chan *Event, error)

type Event added in v2.2.2

type Event struct {
	Action  WatchActionType
	Key     []byte
	Value   []byte
	BatchId uint64
}

Event is the event that occurs when the database is modified. It is used to synchronize the database's watch.
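
A watch sketch, assuming (per the WatchQueueSize option below) that watch is enabled by setting a queue size greater than 0:

	// enable watch by giving the event queue a nonzero size
	options := rosedb.DefaultOptions
	options.WatchQueueSize = 1000

	db, err := rosedb.Open(options)
	if err != nil {
		panic(err)
	}

	eventCh, err := db.Watch()
	if err != nil {
		panic(err)
	}
	go func() {
		// the channel presumably stops delivering events once the
		// database is closed
		for event := range eventCh {
			switch event.Action {
			case rosedb.WatchActionPut:
				println("put:", string(event.Key))
			case rosedb.WatchActionDelete:
				println("delete:", string(event.Key))
			}
		}
	}()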

type IndexRecord

type IndexRecord struct {
	// contains filtered or unexported fields
}

IndexRecord is the index record of the key. It contains the key, the record type, and the position of the record in the wal. It is only used at startup to rebuild the index.

type LogRecord

type LogRecord struct {
	Key     []byte
	Value   []byte
	Type    LogRecordType
	BatchId uint64
	Expire  int64
}

LogRecord is the log record of a key/value pair. It contains the key, the value, the record type, and the batch id. It will be encoded to a byte slice and written to the wal.

func (*LogRecord) IsExpired added in v2.3.2

func (lr *LogRecord) IsExpired(now int64) bool

IsExpired checks whether the log record is expired.

type LogRecordType

type LogRecordType = byte

LogRecordType is the type of the log record.

const (
	// LogRecordNormal is the normal log record type.
	LogRecordNormal LogRecordType = iota
	// LogRecordDeleted is the deleted log record type.
	LogRecordDeleted
	// LogRecordBatchFinished is the batch finished log record type.
	LogRecordBatchFinished
)

type Options

type Options struct {
	// DirPath specifies the directory path where the WAL segment files will be stored.
	DirPath string

	// SegmentSize specifies the maximum size of each segment file in bytes.
	SegmentSize int64

	// BlockCache specifies the size of the block cache in number of bytes.
	// A block cache is used to store recently accessed data blocks, improving read performance.
	// If BlockCache is set to 0, no block cache will be used.
	BlockCache uint32

	// Sync is whether to synchronize writes through os buffer cache and down onto the actual disk.
	// Setting sync is required for durability of a single write operation, but also results in slower writes.
	//
	// If false, and the machine crashes, then some recent writes may be lost.
	// Note that if it is just the process that crashes (machine does not) then no writes will be lost.
	//
	// In other words, Sync being false has the same semantics as a write
	// system call. Sync being true means write followed by fsync.
	Sync bool

	// BytesPerSync specifies the number of bytes to write before calling fsync.
	BytesPerSync uint32

	// WatchQueueSize specifies the cache length of the watch queue.
	// If the size is greater than 0, watch is enabled.
	WatchQueueSize uint64

	// AutoMergeCronExpr enables automatic merges.
	// An auto merge is triggered when the cron expression is satisfied.
	// The expression follows standard cron syntax,
	// e.g. "0 0 * * *" means merge at 00:00:00 every day.
	// An optional seconds field is also supported;
	// with it, the expression looks like "0/10 * * * * *" (every 10 seconds).
	// When auto merge is enabled, the db will be closed and reopened after the merge is done.
	// Do not set this schedule too frequently, as it will affect performance.
	// Refer to https://en.wikipedia.org/wiki/Cron
	AutoMergeCronExpr string
}

Options specifies the options for opening a database.
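
A sketch of customizing options before Open (the path, sizes, and schedule are illustrative, not recommendations):

	options := rosedb.DefaultOptions
	options.DirPath = "/tmp/rosedb_custom"  // hypothetical directory
	options.SegmentSize = 512 * rosedb.MB   // roll to a new segment file at 512 MB
	options.Sync = true                     // fsync every write for durability
	options.AutoMergeCronExpr = "0 0 * * *" // auto merge daily at midnight

	db, err := rosedb.Open(options)
	if err != nil {
		panic(err)
	}
	defer func() {
		_ = db.Close()
	}()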

type Stat

type Stat struct {
	// Total number of keys
	KeysNum int
	// Total disk size of database directory
	DiskSize int64
}

Stat represents the statistics of the database.
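
A quick usage sketch for Stat:

	// report the key count and the on-disk size of the database directory
	stat := db.Stat()
	println("keys:", stat.KeysNum, "disk bytes:", stat.DiskSize)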

type WatchActionType added in v2.2.2

type WatchActionType = byte
const (
	WatchActionPut WatchActionType = iota
	WatchActionDelete
)

type Watcher added in v2.2.2

type Watcher struct {
	// contains filtered or unexported fields
}

Watcher temporarily stores event information as it is generated, until it is synchronized to the DB's watch.

If the queue overflows, the oldest event will be removed, even if it hasn't been read yet.

func NewWatcher added in v2.2.2

func NewWatcher(capacity uint64) *Watcher

Directories

Path Synopsis
examples
ttl
