k4

package module
v2.1.8 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 12, 2024 License: BSD-3-Clause Imports: 13 Imported by: 0

README

K4 is an open-source, high-performance, transactional, and durable storage engine based on an LSM (Log-Structured Merge) tree architecture. This design optimizes high-speed writes and reads, making it ideal for modern data-intensive applications.

Benchmarks

HDD benchmarks on a WDC WDS500G2B0A-00SM50

More detailed benchmarks on latest version can be found here

Articles, Papers, Discussions

Features

  • High speed writes and reads.
  • Durability
  • Optimized for RAM and flash storage (SSD)
  • Variable length binary keys and values. Keys and their values can be any size that paired don't exceed available memory at time of operation.
  • Write-Ahead Logging (WAL). System writes PUT and DELETE operations to a log file before applying them to K4.
  • Atomic transactions. Multiple PUT and DELETE operations can be grouped together and applied atomically to K4.
  • Multi-threaded parallel paired compaction. SSTables are paired up during compaction and merged into a single SSTable(s). This reduces the number of SSTables and minimizes disk I/O for read operations.
  • Memtable implemented as a skip list.
  • Configurable memtable flush threshold
  • Configurable compaction interval (in seconds)
  • Configurable logging
  • Configurable skip list (max level and probability)
  • Optimized cuckoo filter for faster lookups. SSTable final pages contain a cuckoo filter. The cuckoo filter check returns a page index if key is found.
  • Recovery from WAL
  • Granular page locking (sstables on scan are locked granularly)
  • Thread-safe (multiple readers, single writer)
  • TTL support (time to live). Keys can be set to expire after a certain time duration.
  • Murmur3 inspired hashing on compression and cuckoo filter
  • Optional compression support (Simple lightweight and optimized Lempel-Ziv 1977 inspired compression algorithm)
  • Background flushing and compaction operations for less blocking on read and write operations
  • Easy intuitive API(Get, Put, Delete, Range, NRange, GreaterThan, GreaterThanEq, LessThan, LessThanEq, NGet)
  • Iterator for iterating over key-value pairs in memtable and sstables with Next and Prev methods
  • Periodic pager write syncing to disk. Every 24576 writes or every minute the system will flush the file system's in-memory copy of recently written data to disk.
  • No dependencies

C Library and FFIs

K4 has a C library that can be used with FFIs (Foreign Function Interfaces) in other languages. The C library is built using the Go toolchain and can be found in the c directory.

Shared C Library

FFIs

These are FFI's using the shared K4 C library. They can be used as a building block using K4 as a storage engine in other languages.

Example usage

This is GO code that demonstrates how to use K4. The code is simple and demonstrates PUT, GET, and DELETE operations.

import (
    "github.com/guycipher/k4/v2"
    "log"
)

func main() {
    var err error
    directory := "./data"
    memtableFlushThreshold := (1024 * 1024) * 5 // 5MB
    compactionInterval := 3600 // 1 hour
    logging := true
    compression := false

    db, err := k4.Open(directory, memtableFlushThreshold, compactionInterval, logging, compression)
    if err != nil {
        log.Fatalf("Failed to open K4: %v", err)
    }

    defer db.Close()


    // Put
    // Putting the same key will update the value
    key := []byte("key")
    value := []byte("value")
    err = db.Put(key, value, nil)
    if err != nil {
        log.Fatalf("Failed to put key: %v", err)
    }

    // Get
    value, err = db.Get(key)
    if err != nil {
        log.Fatalf("Failed to get key: %v", err)
    }

    // Delete
    err = db.Delete(key)
    if err != nil {
        log.Fatalf("Failed to get key: %v", err)
    }
}

Iteration

To iterate over key-value pairs you can use an Iterator. Will iterate over key-value pairs in memtable then sstables.


it := k4.NewIterator(db)

for  {
    key, value := it.Next()
    if key == nil {
        break
    }

    // .. You can also go back if you want
    key, value = it.Prev()
    if key == nil {
        break
    }
}

Transactions

Transactions are atomic and can be used to group multiple PUT and DELETE operations together. Transactions are committed atomically to K4. Transactions can be rolled back after their commited but before they are removed. Commits are first come first serve and are applied to K4 in the order they were committed.

txn := db.BeginTransaction()

txn.AddOperation(k4.PUT, key, value)
txn.AddOperation(k4.DELETE, key, nil)

err = txn.Commit()

// Once committed you can rollback before the transaction is removed
// On commit error the transaction is automatically rolled back
txn.Rollback()

// Remove the transaction after commit or rollback
txn.Remove() // txn now no longer usable nor existent

Recovery

If you have a populated WAL file in the data directory but no data files aka sstables you can use RecoverFromWAL() which will replay the WAL file and populate K4.

Example

func main() {
    directory := "./data"
    memtableFlushThreshold := 1024 * 1024 // 1MB
    compactionInterval := 3600 // 1 hour
    logging := true
    compression := false

    db, err := k4.Open(directory, memtableFlushThreshold, compactionInterval, logging, compression)
    if err != nil {
        log.Fatalf("Failed to open K4: %v", err)
    }

    defer db.Close()

    err := db.RecoverFromWAL()
    if err != nil {
        ..
    }

    // Continue as normal
}

TTL

TTL (time to live) when putting a key-value pair you can specify a time duration after which the key-value pair will be deleted. The system will mark the key with a tombstone and delete it during compaction and or flush operations.

key := []byte("key")
value := []byte("value")
ttl :=  6 * time.Second

err = db.Put(key, value, ttl)
if err != nil {
    ..
..

Configuration recommendations

It is advisable to configure the memtable flush threshold and compaction interval in accordance with your application's specific requirements. A minimum memtable flush size of 50-256 MB is suggested.

Regarding compaction, a compaction interval of 1-6 hours is recommended, contingent upon storage usage and activity patterns.

Compression

Compression is optional and can be enabled or disabled when opening the K4 instance. Memtable keys and their values are not compressed. What is compressed is WAL entries and SSTable pages. Compression could save disk space and reduce disk I/O but it could also increase CPU usage and slow down read and write operations.

API

The K4 API is finalized and will not change. The API is simple and intuitive. The API is designed to be easy to use and understand.


// Open a new K4 instance
// you can pass extra arguments to configure memtable such as
// args[0] = max level, must be an int
// args[1] = probability, must be a float64
func Open(directory string, memtableFlushThreshold int, compactionInterval int, logging, compress bool, args ...interface{}) (*K4, error)

// Close the K4 instance gracefully
func (k4 *K4) Close() error

// Put a key-value pair into the db
func (k4 *K4) Put(key, value []byte, ttl *time.Duration) error

// Get a value from the db
func (k4 *K4) Get(key []byte) ([]byte, error)

// Get all key-value pairs not equal to the key
func (k4 *K4) NGet(key []byte) (*KeyValueArray, error)

// Get all key-value pairs greater than the key
func (k4 *K4) GreaterThan(key []byte) (*KeyValueArray, error)

// Get all key-value pairs greater than or equal to the key
func (k4 *K4) GreaterThanEq(key []byte) (*KeyValueArray, error)

// Get all key-value pairs less than the key
func (k4 *K4) LessThan(key []byte) (*KeyValueArray, error)

// Get all key-value pairs less than or equal to the key
func (k4 *K4) LessThanEq(key []byte) (*KeyValueArray, error)

// Get all key-value pairs in the range of startKey and endKey
func (k4 *K4) Range(startKey, endKey []byte) (*KeyValueArray, error)

// Get all key-value pairs not in the range of startKey and endKey
func (k4 *K4) NRange(startKey, endKey []byte) (*KeyValueArray, error)

// Delete a key-value pair from the db
func (k4 *K4) Delete(key []byte) error

// Begin a transaction
func (k4 *K4) BeginTransaction() *Transaction

// Add a key-value pair to the transaction OPR_CODE can be PUT or DELETE
func (txn *Transaction) AddOperation(op OPR_CODE, key, value []byte)

// Remove a transaction after it has been committed
func (txn *Transaction) Remove(k4 *K4)

// Rollback a transaction (only after commit or error)
func (txn *Transaction) Rollback(k4 *K4) error

// Commit a transaction
func (txn *Transaction) Commit(k4 *K4) error

// Recover from WAL file
// WAL file must be placed in the data directory
func (k4 *K4) RecoverFromWAL() error

// EscalateFlush escalates the memtable flush to the background queue
func (k4 *K4) EscalateFlush() error

// EscalateCompaction escalates a compaction operation
func (k4 *K4) EscalateCompaction() error

What can you build with K4?

  • Time series databases
  • Object storage
  • Key-value stores
  • Distributed databases
  • Blockchain
  • Cache
  • Log store
  • Document store
  • Graph databases
  • Search engines
  • Data warehouses
  • Data lakes
  • Relational databases

and more!

If your project is using K4, please let us know! We would love to hear about it and promote it here.

Reporting bugs

If you find a bug with K4 create an issue on this repository but please do not include any sensitive information in the issue. If you have a security concern please follow SECURITY.md.

Contributing

This repository for the K4 project welcomes contributions. Before submitting a pull request (PR), please ensure you have reviewed and adhered to our guidelines outlined in SECURITY.md and CODE_OF_CONDUCT.md. Following these documents is essential for maintaining a safe and respectful environment for all contributors. We encourage you to create well-structured PRs that address specific issues or enhancements, and to include relevant details in your description. Your contributions help improve K4, and we appreciate your commitment to fostering a collaborative and secure development process. Thank you for being part of the K4 project.

License

BSD 3-Clause License (https://opensource.org/licenses/BSD-3-Clause)

Documentation

Overview

Package k4 BSD 3-Clause License

Copyright (c) 2024, Alex Gaetano Padula All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Index

Constants

View Source
const BACKGROUND_OP_SLEEP = 5 * time.Microsecond // The background sleep time for the background operations
View Source
const COMPRESSION_WINDOW_SIZE = 1024 * 32 // The compression window size
View Source
const LOG_EXTENSION = ".log" // The log file extension
View Source
const SSTABLE_EXTENSION = ".sst" // The SSTable file extension
View Source
const TOMBSTONE_VALUE = "$tombstone" // The tombstone value
View Source
const WAL_EXTENSION = ".wal" // The write ahead log file extension

Variables

This section is empty.

Functions

This section is empty.

Types

type Iterator

type Iterator struct {
	// contains filtered or unexported fields
}

Iterator is a structure for an iterator which goes through memtable and sstables. First it goes through the memtable, then once exhausted goes through the sstables from newest to oldest

func NewIterator

func NewIterator(instance *K4) *Iterator

NewIterator creates a new Iterator

func (*Iterator) Next

func (it *Iterator) Next() ([]byte, []byte)

Next moves the iterator to the next key-value pair

func (*Iterator) Prev

func (it *Iterator) Prev() ([]byte, []byte)

Prev moves the iterator to the previous key-value pair

func (*Iterator) Reset

func (it *Iterator) Reset()

Reset resets the sstable iterator

type K4

type K4 struct {
	// contains filtered or unexported fields
}

K4 is the main structure for the k4 database

func Open

func Open(directory string, memtableFlushThreshold int, compactionInterval int, logging, compress bool, args ...interface{}) (*K4, error)

Open opens a new K4 instance at the specified directory. will reopen the database if it already exists directory - the directory where the database files are stored memtableFlushThreshold - the threshold in bytes for flushing the memtable to disk compactionInterval - the interval in seconds for running compactions logging - whether or not to log to the log file

func (*K4) BeginTransaction

func (k4 *K4) BeginTransaction() *Transaction

BeginTransaction begins a new transaction

func (*K4) Close

func (k4 *K4) Close() error

Close closes the K4

func (*K4) Delete

func (k4 *K4) Delete(key []byte) error

Delete deletes a key from K4

func (*K4) EscalateCompaction added in v2.1.5

func (k4 *K4) EscalateCompaction() error

EscalateCompaction is a public method to force compaction

func (*K4) EscalateFlush added in v2.1.5

func (k4 *K4) EscalateFlush() error

EscalateFlush is a public method to force flush memtable to queue

func (*K4) Get

func (k4 *K4) Get(key []byte) ([]byte, error)

Get gets a key from K4

func (*K4) GreaterThan

func (k4 *K4) GreaterThan(key []byte) (*KeyValueArray, error)

GreaterThan gets all keys greater than a key from K4 and returns a map of key-value pairs

func (*K4) GreaterThanEq

func (k4 *K4) GreaterThanEq(key []byte) (*KeyValueArray, error)

GreaterThanEq queries keys greater than or equal to a key from K4

func (*K4) LessThan

func (k4 *K4) LessThan(key []byte) (*KeyValueArray, error)

LessThan gets all keys less than a key from K4 and returns a map of key-value pairs

func (*K4) LessThanEq

func (k4 *K4) LessThanEq(key []byte) (*KeyValueArray, error)

LessThanEq queries keys less than or equal to a key from K4

func (*K4) NGet

func (k4 *K4) NGet(key []byte) (*KeyValueArray, error)

NGet gets a key from K4 and returns a map of key-value pairs

func (*K4) NRange

func (k4 *K4) NRange(startKey, endKey []byte) (*KeyValueArray, error)

NRange returns key value pairs not in provided range

func (*K4) Put

func (k4 *K4) Put(key, value []byte, ttl *time.Duration) error

Put puts a key-value pair into K4

func (*K4) Range

func (k4 *K4) Range(startKey, endKey []byte) (*KeyValueArray, error)

Range queries keys in a range from K4

func (*K4) RecoverFromWAL

func (k4 *K4) RecoverFromWAL() error

RecoverFromWAL recovers K4 from a write ahead log

type KV

type KV struct {
	Key   []byte     // Binary array of key
	Value []byte     // Binary array of keys value
	TTL   *time.Time // Time to live
}

KV mainly used for storage of key value pairs on disk pages we code the KV into a binary format before writing to disk

type KeyValueArray

type KeyValueArray []*KV

KeyValueArray type to hold a slice of KeyValue's

type OPR_CODE

type OPR_CODE int // Operation code
const (
	PUT OPR_CODE = iota
	DELETE
	GET
)

type Operation

type Operation struct {
	Op       OPR_CODE   // Operation code
	Key      []byte     // Key for the operation
	Value    []byte     // Value for the operation
	Rollback *Operation // Pointer to the operation that will undo this operation

} // fields must be exported for gob

Operation Used for transaction operations and WAL

type SSTable

type SSTable struct {
	// contains filtered or unexported fields
}

SSTable is the structure for the SSTable files

type SSTableIterator

type SSTableIterator struct {
	// contains filtered or unexported fields
}

SSTableIterator is the structure for the SSTable iterator

type Transaction

type Transaction struct {
	// contains filtered or unexported fields
}

Transaction is the structure for the transactions

func (*Transaction) AddOperation

func (txn *Transaction) AddOperation(op OPR_CODE, key, value []byte)

AddOperation adds an operation to a transaction

func (*Transaction) Commit

func (txn *Transaction) Commit(k4 *K4) error

Commit commits a transaction

func (*Transaction) Remove

func (txn *Transaction) Remove(k4 *K4)

Remove removes a transaction from the list of transactions in K4

func (*Transaction) Rollback

func (txn *Transaction) Rollback(k4 *K4) error

Rollback rolls back a transaction (after a commit)

type WALIterator

type WALIterator struct {
	// contains filtered or unexported fields
}

WALIterator is the structure for the WAL iterator

Directories

Path Synopsis
Package cuckoofilter implements a custom cuckoo filter data structure that allows for inserting and looking up keys with associated prefixes.
Package cuckoofilter implements a custom cuckoo filter data structure that allows for inserting and looking up keys with associated prefixes.
Package pager BSD 3-Clause License
Package pager BSD 3-Clause License
Package skiplist BSD 3-Clause License
Package skiplist BSD 3-Clause License

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL