badger

package module
v3.2103.5
Published: Dec 15, 2022 License: Apache-2.0 Imports: 39 Imported by: 1,080

README

BadgerDB

BadgerDB is an embeddable, persistent and fast key-value (KV) database written in pure Go. It is the underlying database for Dgraph, a fast, distributed graph database. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.

Project Status

Badger is stable and is being used to serve data sets worth hundreds of terabytes. Badger supports concurrent ACID transactions with serializable snapshot isolation (SSI) guarantees. A Jepsen-style bank test runs nightly for 8 hours with the --race flag to ensure that transactional guarantees are maintained. Badger has also been tested against filesystem-level anomalies to ensure persistence and consistency. Badger is used by a number of projects, including Dgraph, Jaeger Tracing, UsenetExpress, and many more.

The list of projects using Badger can be found here.

Badger v1.0 was released in Nov 2017, and the latest version that is data-compatible with v1.0 is v1.6.0.

Badger v2.0 was released in Nov 2019 with a new storage format that is not compatible with any v1.x release. Badger v2.0 supports compression and encryption, and uses a cache to speed up lookups.

Badger v3.0 was released in January 2021. This release improves compaction performance.

Please consult the Changelog for more detailed information on releases.

For more details on our version naming schema please read Choosing a version.

Getting Started

Installing

To start using Badger, install Go 1.12 or above. Badger v3 needs Go modules. From your project, run the following command:

$ go get github.com/dgraph-io/badger/v3

This will retrieve the library.

Installing Badger Command Line Tool

Badger provides a CLI tool which can perform certain operations like offline backup/restore. To install the Badger CLI, retrieve the repository and check out the desired version. Then run:

$ cd badger
$ go install .

This will install the badger command line utility into your $GOBIN path.

Choosing a version

BadgerDB is unusual in that the most important changes we can make to it are not to its API but to how data is stored on disk.

This is why we follow a version naming schema that differs from Semantic Versioning.

  • New major versions are released when the data format on disk changes in an incompatible way.
  • New minor versions are released whenever the API changes but data compatibility is maintained. Note that the changes on the API could be backward-incompatible - unlike Semantic Versioning.
  • New patch versions are released when there are no changes to either the data format or the API.

Following these rules:

  • v1.5.0 and v1.6.0 can be used on top of the same files without any concerns, as their major version is the same, therefore the data format on disk is compatible.
  • v1.6.0 and v2.0.0 are data incompatible as their major version implies, so files created with v1.6.0 will need to be converted into the new format before they can be used by v2.0.0.
  • v2.x.x and v3.x.x are data incompatible as their major version implies, so files created with v2.x.x will need to be converted into the new format before they can be used by v3.0.0.
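The compatibility rule above can be sketched in a few lines of Go. This is only an illustration of the naming schema, not part of Badger: the helper names majorVersion and dataCompatible are hypothetical, and the sketch assumes version strings of the form "vMAJOR.MINOR.PATCH".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorVersion extracts the major component from a version string such as
// "v1.6.0" or "v3.2103.5".
func majorVersion(v string) (int, error) {
	v = strings.TrimPrefix(v, "v")
	parts := strings.SplitN(v, ".", 2)
	return strconv.Atoi(parts[0])
}

// dataCompatible reports whether two versions share an on-disk format under
// the schema described above: equal major versions mean compatible data.
func dataCompatible(a, b string) (bool, error) {
	ma, err := majorVersion(a)
	if err != nil {
		return false, fmt.Errorf("parse %q: %w", a, err)
	}
	mb, err := majorVersion(b)
	if err != nil {
		return false, fmt.Errorf("parse %q: %w", b, err)
	}
	return ma == mb, nil
}

func main() {
	pairs := [][2]string{
		{"v1.5.0", "v1.6.0"},
		{"v1.6.0", "v2.0.0"},
		{"v2.0.0", "v3.0.0"},
	}
	for _, p := range pairs {
		ok, err := dataCompatible(p[0], p[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s and %s data compatible: %t\n", p[0], p[1], ok)
	}
}
```

Note that the check says nothing about API compatibility: under this schema a minor version bump may still break callers.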

For a longer explanation on the reasons behind using a new versioning naming schema, you can read VERSIONING.

Badger Documentation

Badger Documentation is available at https://dgraph.io/docs/badger

Resources

Blog Posts
  1. Introducing Badger: A fast key-value store written natively in Go
  2. Make Badger crash resilient with ALICE
  3. Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go
  4. Concurrent ACID Transactions in Badger

Design

Badger was written with these design goals in mind:

  • Write a key-value database in pure Go.
  • Use latest research to build the fastest KV database for data sets spanning terabytes.
  • Optimize for SSDs.

Badger’s design is based on a paper titled WiscKey: Separating Keys from Values in SSD-conscious Storage.

Comparisons
| Feature                       | Badger                               | RocksDB              | BoltDB    |
|-------------------------------|--------------------------------------|----------------------|-----------|
| Design                        | LSM tree with value log              | LSM tree only        | B+ tree   |
| High Read throughput          | Yes                                  | No                   | Yes       |
| High Write throughput         | Yes                                  | Yes                  | No        |
| Designed for SSDs             | Yes (with latest research [1])       | Not specifically [2] | No        |
| Embeddable                    | Yes                                  | Yes                  | Yes       |
| Sorted KV access              | Yes                                  | Yes                  | Yes       |
| Pure Go (no Cgo)              | Yes                                  | No                   | Yes       |
| Transactions                  | Yes, ACID, concurrent with SSI [3]   | Yes (but non-ACID)   | Yes, ACID |
| Snapshots                     | Yes                                  | Yes                  | Yes       |
| TTL support                   | Yes                                  | Yes                  | No        |
| 3D access (key-value-version) | Yes [4]                              | No                   | No        |

[1] The WiscKey paper (on which Badger is based) saw big wins from separating values from keys, significantly reducing the write amplification compared to a typical LSM tree.

[2] RocksDB is an SSD-optimized version of LevelDB, which was designed specifically for rotating disks. As such, RocksDB's design isn't aimed at SSDs.

[3] SSI: Serializable Snapshot Isolation. For more details, see the blog post Concurrent ACID Transactions in Badger.

[4] Badger provides direct access to value versions via its Iterator API. Users can also specify how many versions to keep per key via Options.

Benchmarks

We have run comprehensive benchmarks against RocksDB, Bolt and LMDB. The benchmarking code and the detailed logs for the benchmarks can be found in the badger-bench repo. More explanation, including graphs, can be found in the blog posts (linked above).

Projects Using Badger

Below is a list of known projects that use Badger:

  • Dgraph - Distributed graph database.
  • Jaeger - Distributed tracing platform.
  • go-ipfs - Go client for the InterPlanetary File System (IPFS), a new hypermedia distribution protocol.
  • Riot - An open-source, distributed search engine.
  • emitter - Scalable, low latency, distributed pub/sub broker with message storage, uses MQTT, gossip and badger.
  • OctoSQL - Query tool that allows you to join, analyse and transform data from multiple databases using SQL.
  • Dkron - Distributed, fault tolerant job scheduling system.
  • smallstep/certificates - Step-ca is an online certificate authority for secure, automated certificate management.
  • Sandglass - distributed, horizontally scalable, persistent, time sorted message queue.
  • TalariaDB - Grab's Distributed, low latency time-series database.
  • Sloop - Salesforce's Kubernetes History Visualization Project.
  • Immudb - Lightweight, high-speed immutable database for systems and applications.
  • Usenet Express - Serving over 300TB of data with Badger.
  • gorush - A push notification server written in Go.
  • 0-stor - Single device object store.
  • Dispatch Protocol - Blockchain protocol for distributed application data analytics.
  • GarageMQ - AMQP server written in Go.
  • RedixDB - A real-time persistent key-value store with the same redis protocol.
  • BBVA - Raft backend implementation using BadgerDB for Hashicorp raft.
  • Fantom - aBFT Consensus platform for distributed applications.
  • decred - An open, progressive, and self-funding cryptocurrency with a system of community-based governance integrated into its blockchain.
  • OpenNetSys - Create useful dApps in any software language.
  • HoneyTrap - An extensible and opensource system for running, monitoring and managing honeypots.
  • Insolar - Enterprise-ready blockchain platform.
  • IoTeX - The next generation of the decentralized network for IoT powered by scalability- and privacy-centric blockchains.
  • go-sessions - The sessions manager for Go net/http and fasthttp.
  • Babble - BFT Consensus platform for distributed applications.
  • Tormenta - Embedded object-persistence layer / simple JSON database for Go projects.
  • BadgerHold - An embeddable NoSQL store for querying Go types built on Badger
  • Goblero - Pure Go embedded persistent job queue backed by BadgerDB
  • Surfline - Serving global wave and weather forecast data with Badger.
  • Cete - Simple and highly available distributed key-value store built on Badger. Makes it easy to bring up a cluster of Badger nodes with the Raft consensus algorithm, using hashicorp/raft.
  • Volument - A new take on website analytics backed by Badger.
  • KVdb - Hosted key-value store and serverless platform built on top of Badger.
  • Terminotes - Self hosted notes storage and search server - storage powered by BadgerDB
  • Pyroscope - Open source continuous profiling platform built with BadgerDB
  • Veri - A distributed feature store optimized for Search and Recommendation tasks.
  • bIter - A library and Iterator interface for working with the badger.Iterator, simplifying from-to, and prefix mechanics.
  • ld - (Lean Database) A very simple gRPC-only key-value database, exposing BadgerDB with key-range scanning semantics.
  • Souin - An RFC-compliant HTTP cache with a lot of other features, based on Badger for storage. Compatible with all existing reverse-proxies.
  • Xuperchain - A highly flexible blockchain architecture with great transaction performance.
  • m2 - A simple http key/value store based on the raft protocol.
  • chaindb - A blockchain storage layer used by Gossamer, a Go client for the Polkadot Network.
  • vxdb - Simple schema-less Key-Value NoSQL database with simplest API interface.
  • Opacity - Backend implementation for the Opacity storage project
  • Vephar - A minimal key/value store using hashicorp-raft for cluster coordination and Badger for data storage.

If you are using Badger in a project, please send a pull request to add it to the list.

Contributing

If you're interested in contributing to Badger, see CONTRIBUTING.

Documentation

Overview

Package badger implements an embeddable, simple and fast key-value database, written in pure Go. It is designed to be highly performant for both reads and writes simultaneously. Badger uses Multi-Version Concurrency Control (MVCC), and supports transactions. It runs transactions concurrently, with serializable snapshot isolation guarantees.

Badger uses an LSM tree along with a value log to separate keys from values, hence reducing both write amplification and the size of the LSM tree. This allows the LSM tree to be served entirely from RAM, while the values are served from SSD.
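The idea of separating keys from values can be illustrated with a toy store. This is a minimal sketch and not Badger's implementation: a Go map stands in for the LSM tree, a byte slice stands in for the value log, and the names toyStore and valuePointer are made up for the example.

```go
package main

import "fmt"

// valuePointer records where a value lives inside the value log.
type valuePointer struct {
	offset int
	length int
}

// toyStore keeps a small index (standing in for the LSM tree) that maps keys
// to pointers, and an append-only byte slice (standing in for the value log).
type toyStore struct {
	index map[string]valuePointer
	vlog  []byte
}

func newToyStore() *toyStore {
	return &toyStore{index: make(map[string]valuePointer)}
}

// set appends the value to the log and stores only its pointer in the index.
func (s *toyStore) set(key string, value []byte) {
	ptr := valuePointer{offset: len(s.vlog), length: len(value)}
	s.vlog = append(s.vlog, value...)
	s.index[key] = ptr
}

// get looks up the pointer in the index, then reads the value from the log.
func (s *toyStore) get(key string) ([]byte, bool) {
	ptr, ok := s.index[key]
	if !ok {
		return nil, false
	}
	return s.vlog[ptr.offset : ptr.offset+ptr.length], true
}

func main() {
	s := newToyStore()
	s.set("answer", []byte("forty-two"))
	s.set("answer", []byte("42")) // the old value stays in the log as garbage
	v, _ := s.get("answer")
	fmt.Printf("value=%s indexEntries=%d logBytes=%d\n", v, len(s.index), len(s.vlog))
}
```

The index stays small no matter how large the values are, which is why it can live in memory. The overwritten value left behind in the log is the kind of garbage that value log GC exists to reclaim.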

Usage

Badger has the following main types: DB, Txn, Item and Iterator. DB contains keys that are associated with values. It must be opened with the appropriate options before it can be accessed.

All operations happen inside a Txn. Txn represents a transaction, which can be read-only or read-write. Read-only transactions can read values for a given key (which are returned inside an Item), or iterate over a set of key-value pairs using an Iterator (which are returned as Item type values as well). Read-write transactions can also update and delete keys from the DB.

See the examples for more usage details.

Constants

const (
	// KeyRegistryFileName is the file name for the key registry file.
	KeyRegistryFileName = "KEYREGISTRY"
	// KeyRegistryRewriteFileName is the file name for the rewrite key registry file.
	KeyRegistryRewriteFileName = "REWRITE-KEYREGISTRY"
)
const (
	DEBUG loggingLevel = iota
	INFO
	WARNING
	ERROR
)
const (
	// ManifestFilename is the filename for the manifest file.
	ManifestFilename = "MANIFEST"
)
const (
	// ValueThresholdLimit is the maximum permissible value of opt.ValueThreshold.
	ValueThresholdLimit = math.MaxUint16 - 16 + 1
)
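For reference, the ValueThresholdLimit expression can be evaluated directly. The lower-case constant name below is local to this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// valueThresholdLimit mirrors the expression used for ValueThresholdLimit:
// 65535 - 16 + 1.
const valueThresholdLimit = math.MaxUint16 - 16 + 1

func main() {
	fmt.Println(valueThresholdLimit) // prints 65520
}
```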

Variables

var (
	// ErrValueLogSize is returned when opt.ValueLogFileSize option is not within the valid
	// range.
	ErrValueLogSize = errors.New("Invalid ValueLogFileSize, must be in range [1MB, 2GB)")

	// ErrKeyNotFound is returned when key isn't found on a txn.Get.
	ErrKeyNotFound = errors.New("Key not found")

	// ErrTxnTooBig is returned if too many writes are fit into a single transaction.
	ErrTxnTooBig = errors.New("Txn is too big to fit into one request")

	// ErrConflict is returned when a transaction conflicts with another transaction. This can
	// happen if the read rows had been updated concurrently by another transaction.
	ErrConflict = errors.New("Transaction Conflict. Please retry")

	// ErrReadOnlyTxn is returned if an update function is called on a read-only transaction.
	ErrReadOnlyTxn = errors.New("No sets or deletes are allowed in a read-only transaction")

	// ErrDiscardedTxn is returned if a previously discarded transaction is re-used.
	ErrDiscardedTxn = errors.New("This transaction has been discarded. Create a new one")

	// ErrEmptyKey is returned if an empty key is passed on an update function.
	ErrEmptyKey = errors.New("Key cannot be empty")

	// ErrInvalidKey is returned if the key has a special !badger! prefix,
	// reserved for internal usage.
	ErrInvalidKey = errors.New("Key is using a reserved !badger! prefix")

	// ErrBannedKey is returned if the read/write key belongs to any banned namespace.
	ErrBannedKey = errors.New("Key is using the banned prefix")

	// ErrThresholdZero is returned if threshold is set to zero, and value log GC is called.
	// In such a case, GC can't be run.
	ErrThresholdZero = errors.New(
		"Value log GC can't run because threshold is set to zero")

	// ErrNoRewrite is returned if a call for value log GC doesn't result in a log file rewrite.
	ErrNoRewrite = errors.New(
		"Value log GC attempt didn't result in any cleanup")

	// ErrRejected is returned if a value log GC is called either while another GC is running, or
	// after DB::Close has been called.
	ErrRejected = errors.New("Value log GC request rejected")

	// ErrInvalidRequest is returned if the user request is invalid.
	ErrInvalidRequest = errors.New("Invalid request")

	// ErrManagedTxn is returned if the user tries to use an API which isn't
	// allowed due to external management of transactions, when using ManagedDB.
	ErrManagedTxn = errors.New(
		"Invalid API request. Not allowed to perform this action using ManagedDB")

	// ErrNamespaceMode is returned if the user tries to use an API which is allowed only when
	// NamespaceOffset is non-negative.
	ErrNamespaceMode = errors.New(
		"Invalid API request. Not allowed to perform this action when NamespaceMode is not set.")

	// ErrInvalidDump if a data dump made previously cannot be loaded into the database.
	ErrInvalidDump = errors.New("Data dump cannot be read")

	// ErrZeroBandwidth is returned if the user passes in zero bandwidth for sequence.
	ErrZeroBandwidth = errors.New("Bandwidth must be greater than zero")

	// ErrWindowsNotSupported is returned when opt.ReadOnly is used on Windows
	ErrWindowsNotSupported = errors.New("Read-only mode is not supported on Windows")

	// ErrPlan9NotSupported is returned when opt.ReadOnly is used on Plan 9
	ErrPlan9NotSupported = errors.New("Read-only mode is not supported on Plan 9")

	// ErrTruncateNeeded is returned when the value log gets corrupt, and requires truncation of
	// corrupt data to allow Badger to run properly.
	ErrTruncateNeeded = errors.New(
		"Log truncate required to run DB. This might result in data loss")

	// ErrBlockedWrites is returned if the user called DropAll. During the process of dropping all
	// data from Badger, we stop accepting new writes, by returning this error.
	ErrBlockedWrites = errors.New("Writes are blocked, possibly due to DropAll or Close")

	// ErrNilCallback is returned when subscriber's callback is nil.
	ErrNilCallback = errors.New("Callback cannot be nil")

	// ErrEncryptionKeyMismatch is returned when the storage key is not
	// matched with the key previously given.
	ErrEncryptionKeyMismatch = errors.New("Encryption key mismatch")

	// ErrInvalidDataKeyID is returned if the datakey id is invalid.
	ErrInvalidDataKeyID = errors.New("Invalid datakey id")

	// ErrInvalidEncryptionKey is returned if length of encryption keys is invalid.
	ErrInvalidEncryptionKey = errors.New("Encryption key's length should be" +
		"either 16, 24, or 32 bytes")
	// ErrGCInMemoryMode is returned when db.RunValueLogGC is called in in-memory mode.
	ErrGCInMemoryMode = errors.New("Cannot run value log GC when DB is opened in InMemory mode")

	// ErrDBClosed is returned when a get operation is performed after closing the DB.
	ErrDBClosed = errors.New("DB Closed")
)
var DefaultIteratorOptions = IteratorOptions{
	PrefetchValues: true,
	PrefetchSize:   100,
	Reverse:        false,
	AllVersions:    false,
}

DefaultIteratorOptions contains default options when iterating over Badger key-value stores.

Functions

func BufferToKVList

func BufferToKVList(buf *z.Buffer) (*pb.KVList, error)

func InitDiscardStats added in v3.2103.0

func InitDiscardStats(opt Options) (*discardStats, error)

func KVToBuffer

func KVToBuffer(kv *pb.KV, buf *z.Buffer)

func WriteKeyRegistry

func WriteKeyRegistry(reg *KeyRegistry, opt KeyRegistryOptions) error

WriteKeyRegistry rewrites the existing key registry file with a new one. It is okay to pass a closed key registry, since only its data keys are used.

Types

type CacheType

type CacheType int
const (
	BlockCache CacheType = iota
	IndexCache
)

type DB

type DB struct {
	// contains filtered or unexported fields
}

DB provides the various functions required to interact with Badger. DB is thread-safe.

func Open

func Open(opt Options) (*DB, error)

Open returns a new DB object.

Example
dir, err := ioutil.TempDir("", "badger-test")
if err != nil {
	panic(err)
}
defer removeDir(dir)
db, err := Open(DefaultOptions(dir))
if err != nil {
	panic(err)
}
defer db.Close()

err = db.View(func(txn *Txn) error {
	_, err := txn.Get([]byte("key"))
	// We expect ErrKeyNotFound
	fmt.Println(err)
	return nil
})

if err != nil {
	panic(err)
}

txn := db.NewTransaction(true) // Read-write txn
err = txn.SetEntry(NewEntry([]byte("key"), []byte("value")))
if err != nil {
	panic(err)
}
err = txn.Commit()
if err != nil {
	panic(err)
}

err = db.View(func(txn *Txn) error {
	item, err := txn.Get([]byte("key"))
	if err != nil {
		return err
	}
	val, err := item.ValueCopy(nil)
	if err != nil {
		return err
	}
	fmt.Printf("%s\n", string(val))
	return nil
})

if err != nil {
	panic(err)
}
Output:

Key not found
value

func OpenManaged

func OpenManaged(opts Options) (*DB, error)

OpenManaged returns a new DB, which allows more control over setting transaction timestamps, aka managed mode.

This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.

func (*DB) Backup

func (db *DB) Backup(w io.Writer, since uint64) (uint64, error)

Backup dumps a protobuf-encoded list of all entries in the database that are newer than or equal to the specified version into the given writer. It returns a timestamp (version) indicating the version of the last entry that was dumped. Incrementing that value by 1 and passing it to a later invocation generates an incremental backup of the entries added or modified since the last call to DB.Backup(). DB.Backup is a wrapper over Stream.Backup for generating full and incremental backups of the DB. For more control over how many goroutines are used to generate the backup, or to back up only a certain range of keys, use Stream.Backup directly.

func (*DB) BanNamespace added in v3.2103.0

func (db *DB) BanNamespace(ns uint64) error

BanNamespace bans a namespace. Reads and writes to keys belonging to any banned namespace are denied.

func (*DB) BannedNamespaces added in v3.2103.0

func (db *DB) BannedNamespaces() []uint64

BannedNamespaces returns the list of prefixes banned for DB.

func (*DB) BlockCacheMetrics

func (db *DB) BlockCacheMetrics() *ristretto.Metrics

BlockCacheMetrics returns the metrics for the underlying block cache.

func (*DB) CacheMaxCost

func (db *DB) CacheMaxCost(cache CacheType, maxCost int64) (int64, error)

CacheMaxCost updates the max cost of the given cache (either block or index cache). The call will have an effect only if the DB was created with the cache. Otherwise it is a no-op. If you pass a negative value, the function will return the current value without updating it.

func (*DB) Close

func (db *DB) Close() error

Close closes a DB. It's crucial to call it to ensure all the pending updates make their way to disk. Calling DB.Close() multiple times would still only close the DB once.
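One common way to get "close only once" behavior in Go is sync.Once. The sketch below shows that pattern on a hypothetical resource type; it is an illustration of the guarantee described above, not a copy of how DB.Close is written.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical stand-in for a handle such as a DB.
type resource struct {
	closeOnce sync.Once
	closeErr  error
	closes    int // how many times the real cleanup ran
}

// Close runs the cleanup at most once; later calls return the first result.
func (r *resource) Close() error {
	r.closeOnce.Do(func() {
		r.closes++
		r.closeErr = nil // real code would flush pending updates here
	})
	return r.closeErr
}

func main() {
	r := &resource{}
	defer r.Close() // safe even though Close is also called explicitly below

	if err := r.Close(); err != nil {
		fmt.Println("close failed:", err)
	}
	_ = r.Close()
	fmt.Println("cleanup ran", r.closes, "time(s)")
}
```

This is why pairing an early `defer db.Close()` with an explicit Close on a shutdown path is harmless.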

func (*DB) DropAll

func (db *DB) DropAll() error

DropAll would drop all the data stored in Badger. It does this in the following way:

  • Stop accepting new writes.
  • Pause memtable flushes and compactions.
  • Pick all tables from all levels, create a changeset to delete all these tables and apply it to manifest.
  • Pick all log files from value log, and delete all of them. Restart value log files from zero.
  • Resume memtable flushes and compactions.

NOTE: DropAll is resilient to concurrent writes, but not to reads. It is up to the user to not do any reads while DropAll is going on, otherwise they may result in panics. Ideally, both reads and writes are paused before running DropAll, and resumed after it is finished.

func (*DB) DropPrefix

func (db *DB) DropPrefix(prefixes ...[]byte) error

DropPrefix would drop all the keys with the provided prefix. It does this in the following way:

  • Stop accepting new writes.
  • Stop memtable flushes before acquiring the lock. The lock is acquired here, and a memtable flush stalls waiting for it, which would otherwise lead to a deadlock.
  • Flush out all memtables, skipping over keys with the given prefix, Kp.
  • Write out the value log header to memtables when flushing, so we don't accidentally bring Kp back after a restart.
  • Stop compaction.
  • Compact L0->L1, skipping over Kp.
  • Compact rest of the levels, Li->Li, picking tables which have Kp.
  • Resume memtable flushes, compactions and writes.

func (*DB) EstimateSize added in v3.2103.0

func (db *DB) EstimateSize(prefix []byte) (uint64, uint64)

EstimateSize can be used to get rough estimate of data size for a given prefix.

func (*DB) Flatten

func (db *DB) Flatten(workers int) error

Flatten can be used to force compactions on the LSM tree so all the tables fall on the same level. This ensures that all the versions of keys are colocated and not split across multiple levels, which is necessary after a restore from backup. During Flatten, live compactions are stopped. Ideally, no writes are going on during Flatten. Otherwise, it would create competition between flattening the tree and new tables being created at level zero.

func (*DB) GetMergeOperator

func (db *DB) GetMergeOperator(key []byte,
	f MergeFunc, dur time.Duration) *MergeOperator

GetMergeOperator creates a new MergeOperator for a given key and returns a pointer to it. It also fires off a goroutine that performs a compaction using the merge function that runs periodically, as specified by dur.

func (*DB) GetSequence

func (db *DB) GetSequence(key []byte, bandwidth uint64) (*Sequence, error)

GetSequence would initiate a new sequence object, generating it from the stored lease, if available, in the database. Sequence can be used to get a list of monotonically increasing integers. Multiple sequences can be created by providing different keys. Bandwidth sets the size of the lease, determining how many Next() requests can be served from memory.

GetSequence is not supported on ManagedDB. Calling this would result in a panic.
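The lease mechanism behind a sequence can be sketched without a database. Everything below is a toy under stated assumptions: leaseStore stands in for the persisted lease, toySequence is not Badger's Sequence type, and its Next has a simpler signature than the real one. The point is only to show how bandwidth trades store writes against gaps after a restart.

```go
package main

import (
	"errors"
	"fmt"
)

// leaseStore is a hypothetical stand-in for the database: it remembers the
// upper bound of the last lease and counts how often it was written.
type leaseStore struct {
	leasedUpTo uint64
	writes     int
}

// toySequence hands out increasing integers, touching the store only when
// the current lease is exhausted.
type toySequence struct {
	store     *leaseStore
	bandwidth uint64
	next      uint64
	leased    uint64
}

func newToySequence(store *leaseStore, bandwidth uint64) (*toySequence, error) {
	if bandwidth == 0 {
		return nil, errors.New("bandwidth must be greater than zero")
	}
	// Resume from the stored lease so numbers are never reused.
	return &toySequence{
		store:     store,
		bandwidth: bandwidth,
		next:      store.leasedUpTo,
		leased:    store.leasedUpTo,
	}, nil
}

// Next returns the next integer, renewing the lease when it runs out.
func (s *toySequence) Next() uint64 {
	if s.next >= s.leased {
		s.leased = s.next + s.bandwidth
		s.store.leasedUpTo = s.leased
		s.store.writes++
	}
	v := s.next
	s.next++
	return v
}

func main() {
	store := &leaseStore{}
	seq, err := newToySequence(store, 100)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	var last uint64
	for i := 0; i < 250; i++ {
		last = seq.Next()
	}
	fmt.Println("last:", last, "store writes:", store.writes)

	// A fresh sequence, as after a restart, resumes past the last lease,
	// so the unused part of that lease is skipped.
	resumed, _ := newToySequence(store, 100)
	fmt.Println("first after restart:", resumed.Next())
}
```

A larger bandwidth means fewer writes to the store per Next call, at the cost of a larger possible gap in the numbering when a lease is abandoned.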

func (*DB) IndexCacheMetrics

func (db *DB) IndexCacheMetrics() *ristretto.Metrics

IndexCacheMetrics returns the metrics for the underlying index cache.

func (*DB) IsClosed

func (db *DB) IsClosed() bool

IsClosed denotes if the badger DB is closed or not. A DB instance should not be used after closing it.

func (*DB) Levels

func (db *DB) Levels() []LevelInfo

Levels gets the LevelInfo.

func (*DB) LevelsToString

func (db *DB) LevelsToString() string

func (*DB) Load

func (db *DB) Load(r io.Reader, maxPendingWrites int) error

Load reads a protobuf-encoded list of all entries from a reader and writes them to the database. This can be used to restore the database from a backup made by calling DB.Backup(). If more complex logic is needed to restore a badger backup, the KVLoader interface should be used instead.

DB.Load() should be called on a database that is not running any other concurrent transactions while it is running.

func (*DB) MaxBatchCount

func (db *DB) MaxBatchCount() int64

MaxBatchCount returns max possible entries in batch

func (*DB) MaxBatchSize

func (db *DB) MaxBatchSize() int64

MaxBatchSize returns max possible batch size

func (*DB) MaxVersion

func (db *DB) MaxVersion() uint64

func (*DB) NewKVLoader

func (db *DB) NewKVLoader(maxPendingWrites int) *KVLoader

NewKVLoader returns a new instance of KVLoader.

func (*DB) NewManagedWriteBatch

func (db *DB) NewManagedWriteBatch() *WriteBatch

func (*DB) NewStream

func (db *DB) NewStream() *Stream

NewStream creates a new Stream.

func (*DB) NewStreamAt

func (db *DB) NewStreamAt(readTs uint64) *Stream

NewStreamAt creates a new Stream at a particular timestamp. Should only be used with managed DB.

func (*DB) NewStreamWriter

func (db *DB) NewStreamWriter() *StreamWriter

NewStreamWriter creates a StreamWriter. Right after creating StreamWriter, Prepare must be called. The memory usage of a StreamWriter is directly proportional to the number of streams possible. So, efforts must be made to keep the number of streams low. Stream framework would typically use 16 goroutines and hence create 16 streams.

func (*DB) NewTransaction

func (db *DB) NewTransaction(update bool) *Txn

NewTransaction creates a new transaction. Badger supports concurrent execution of transactions, providing serializable snapshot isolation, avoiding write skews. Badger achieves this by tracking the keys read and at Commit time, ensuring that these read keys weren't concurrently modified by another transaction.

For read-only transactions, set update to false. In this mode, we don't track the rows read for any changes. Thus, any long running iterations done in this mode wouldn't pay this overhead.

Running transactions concurrently is OK. However, a transaction itself isn't thread safe, and should only be run serially. It doesn't matter if a transaction is created by one goroutine and passed down to other, as long as the Txn APIs are called serially.

When you create a new transaction, it is absolutely essential to call Discard(). This should be done irrespective of what the update param is set to. Commit API internally runs Discard, but running it twice wouldn't cause any issues.

txn := db.NewTransaction(false)
defer txn.Discard()
// Call various APIs.
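The commit-time conflict check described for NewTransaction can be sketched as a toy. This is a simplified illustration, not Badger's code: toyDB and toyTxn are hypothetical names, the sketch is single-goroutine with no locking, and it keeps only one version per key, so reads see the latest committed value rather than a true snapshot.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("transaction conflict, please retry")

// toyDB tracks, per key, the commit timestamp of its latest write.
type toyDB struct {
	ts         uint64            // last issued commit timestamp
	lastCommit map[string]uint64 // key -> commit timestamp of latest write
	data       map[string]string
}

// toyTxn records its read timestamp, the keys it read, and pending writes.
type toyTxn struct {
	db     *toyDB
	readTs uint64
	reads  map[string]struct{}
	writes map[string]string
}

func newToyDB() *toyDB {
	return &toyDB{lastCommit: map[string]uint64{}, data: map[string]string{}}
}

func (db *toyDB) begin() *toyTxn {
	return &toyTxn{
		db:     db,
		readTs: db.ts,
		reads:  map[string]struct{}{},
		writes: map[string]string{},
	}
}

// get remembers that the key was read so commit can check it later.
func (t *toyTxn) get(key string) string {
	t.reads[key] = struct{}{}
	if v, ok := t.writes[key]; ok {
		return v
	}
	return t.db.data[key]
}

func (t *toyTxn) set(key, value string) { t.writes[key] = value }

// commit fails if any key this transaction read was committed by another
// transaction after this transaction's read timestamp.
func (t *toyTxn) commit() error {
	for key := range t.reads {
		if t.db.lastCommit[key] > t.readTs {
			return errConflict
		}
	}
	if len(t.writes) == 0 {
		return nil
	}
	t.db.ts++
	for key, value := range t.writes {
		t.db.data[key] = value
		t.db.lastCommit[key] = t.db.ts
	}
	return nil
}

func main() {
	db := newToyDB()
	a, b := db.begin(), db.begin()

	// Both transactions read the same key, then write based on what they saw.
	a.get("balance")
	a.set("balance", "90")
	b.get("balance")
	b.set("balance", "80")
	fmt.Println("a:", a.commit()) // the first committer succeeds
	fmt.Println("b:", b.commit()) // the key b read changed after b began

	// A blind write reads nothing, so there is nothing to conflict on.
	c := db.begin()
	c.set("balance", "70")
	fmt.Println("c:", c.commit())
}
```

Read-only transactions skip the read tracking entirely, which is the overhead the text above says long-running iterations avoid.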

func (*DB) NewTransactionAt

func (db *DB) NewTransactionAt(readTs uint64, update bool) *Txn

NewTransactionAt follows the same logic as DB.NewTransaction(), but uses the provided read timestamp.

This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.

func (*DB) NewWriteBatch

func (db *DB) NewWriteBatch() *WriteBatch

NewWriteBatch creates a new WriteBatch. This provides a way to conveniently do a lot of writes, batching them up as tightly as possible in a single transaction and using callbacks to avoid waiting for them to commit, thus achieving good performance. This API hides away the logic of creating and committing transactions. Due to the nature of SSI guarantees provided by Badger, blind writes can never encounter transaction conflicts (ErrConflict).

func (*DB) NewWriteBatchAt

func (db *DB) NewWriteBatchAt(commitTs uint64) *WriteBatch

NewWriteBatchAt is similar to NewWriteBatch but it allows user to set the commit timestamp. NewWriteBatchAt is supposed to be used only in the managed mode.

func (*DB) Opts

func (db *DB) Opts() Options

Opts returns a copy of the DB options.

func (*DB) PrintHistogram

func (db *DB) PrintHistogram(keyPrefix []byte)

PrintHistogram builds and displays the key-value size histogram. When keyPrefix is set, only the keys that have prefix "keyPrefix" are considered for creating the histogram

func (*DB) Ranges added in v3.2103.0

func (db *DB) Ranges(prefix []byte, numRanges int) []*keyRange

Ranges can be used to get rough key ranges to divide up iteration over the DB. The ranges here would consider the prefix, but would not necessarily start or end with the prefix. In fact, the first range would have nil as left key, and the last range would have nil as the right key.
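The shape of the result, with open ends on the first and last range, can be sketched as follows. The keyRange type here is a local stand-in (the real one is unexported), rangesFromSplits is a hypothetical helper, and treating each range as half-open is an assumption of this sketch.

```go
package main

import "fmt"

// keyRange is a local stand-in for a key interval [left, right).
// A nil bound means "unbounded" on that side.
type keyRange struct {
	left, right []byte
}

// rangesFromSplits turns n sorted split keys into n+1 contiguous ranges.
func rangesFromSplits(splits [][]byte) []keyRange {
	ranges := make([]keyRange, 0, len(splits)+1)
	var left []byte // nil: the first range is unbounded on the left
	for _, s := range splits {
		ranges = append(ranges, keyRange{left: left, right: s})
		left = s
	}
	// The last range is unbounded on the right.
	return append(ranges, keyRange{left: left, right: nil})
}

func main() {
	splits := [][]byte{[]byte("g"), []byte("p")}
	for _, r := range rangesFromSplits(splits) {
		fmt.Printf("[%q, %q)\n", r.left, r.right)
	}
}
```

Because the outer bounds are nil, a worker handling the first or last range still has to apply the prefix filter itself.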

func (*DB) RunValueLogGC

func (db *DB) RunValueLogGC(discardRatio float64) error

RunValueLogGC triggers a value log garbage collection.

It picks value log files to perform GC based on statistics that are collected during compactions. If no such statistics are available, then log files are picked in random order. The process stops as soon as the first log file is encountered which does not result in garbage collection.

When a log file is picked, it is first sampled. If the sample shows that we can discard at least discardRatio space of that file, it would be rewritten.

If a call to RunValueLogGC results in no rewrites, then ErrNoRewrite is returned, indicating that the call resulted in no file rewrites.

We recommend setting discardRatio to 0.5, thus indicating that a file be rewritten if half the space can be discarded. This results in a lifetime value log write amplification of 2 (1 from original write + 0.5 rewrite + 0.25 + 0.125 + ... = 2). Setting it to a higher value would result in fewer space reclaims, while setting it to a lower value would result in more space reclaims at the cost of increased activity on the LSM tree. discardRatio must be in the range (0.0, 1.0), both endpoints excluded, otherwise an ErrInvalidRequest is returned.
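The write amplification figure is a geometric series, which a few lines of Go can confirm. The function below is a hypothetical helper, not a Badger API. It assumes every rewrite happens exactly when the discardable share reaches discardRatio, so a fraction 1 - discardRatio of the data is written again each time.

```go
package main

import (
	"errors"
	"fmt"
)

var errInvalidRatio = errors.New("discardRatio must be in (0.0, 1.0)")

// lifetimeWriteAmp sums 1 + s + s^2 + ... + s^rewrites, where
// s = 1 - discardRatio is the share of a file that survives each rewrite.
func lifetimeWriteAmp(discardRatio float64, rewrites int) (float64, error) {
	// Both endpoints are excluded, matching the documented valid range.
	if !(discardRatio > 0 && discardRatio < 1) {
		return 0, errInvalidRatio
	}
	survive := 1 - discardRatio
	total, term := 0.0, 1.0 // term starts at 1 for the original write
	for i := 0; i <= rewrites; i++ {
		total += term
		term *= survive
	}
	return total, nil
}

func main() {
	amp, err := lifetimeWriteAmp(0.5, 50)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("write amplification after 50 rewrites at 0.5: %.4f\n", amp)

	if _, err := lifetimeWriteAmp(1.0, 50); err != nil {
		fmt.Println("ratio 1.0 rejected:", err)
	}
}
```

At 0.5 the sum approaches 2, the figure quoted above.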

Only one GC is allowed at a time. If another value log GC is running, or DB has been closed, this would return an ErrRejected.

Note: Every time GC is run, it would produce a spike of activity on the LSM tree.

func (*DB) SetDiscardTs

func (db *DB) SetDiscardTs(ts uint64)

SetDiscardTs sets a timestamp at or below which, any invalid or deleted versions can be discarded from the LSM tree, and thence from the value log to reclaim disk space. Can only be used with managed transactions.

func (*DB) Size

func (db *DB) Size() (lsm, vlog int64)

Size returns the size of lsm and value log files in bytes. It can be used to decide how often to call RunValueLogGC.

func (*DB) StreamDB

func (db *DB) StreamDB(outOptions Options) error

StreamDB streams the contents of this DB to a new DB, which is created with the options given in outOptions.

func (*DB) Subscribe

func (db *DB) Subscribe(ctx context.Context, cb func(kv *KVList) error, matches []pb.Match) error

Subscribe can be used to watch key changes for the given key prefixes and the ignore string. At least one prefix should be passed, or an error will be returned. You can use an empty prefix to monitor all changes to the DB. Ignore string is the byte ranges for which prefix matching will be ignored. For example: ignore = "2-3", and prefix = "abc" will match for keys "abxxc", "abdfc" etc. This function blocks until the given context is done or an error occurs. The given function will be called with a new KVList containing the modified keys and the corresponding values.

func (*DB) Sync

func (db *DB) Sync() error

Sync syncs database content to disk. This function provides more control to user to sync data whenever required.

func (*DB) Tables

func (db *DB) Tables() []TableInfo

Tables gets the TableInfo objects from the level controller. If withKeysCount is true, TableInfo objects also contain counts of keys for the tables.

func (*DB) Update

func (db *DB) Update(fn func(txn *Txn) error) error

Update executes a function, creating and managing a read-write transaction for the user. Error returned by the function is relayed by the Update method. Update cannot be used with managed transactions.

func (*DB) VerifyChecksum

func (db *DB) VerifyChecksum() error

VerifyChecksum verifies checksum for all tables on all levels. This method can be used to verify checksum, if opt.ChecksumVerificationMode is NoVerification.

func (*DB) View

func (db *DB) View(fn func(txn *Txn) error) error

View executes a function creating and managing a read-only transaction for the user. Error returned by the function is relayed by the View method. If View is used with managed transactions, it would assume a read timestamp of MaxUint64.

type Entry

type Entry struct {
	Key       []byte
	Value     []byte
	ExpiresAt uint64 // time.Unix

	UserMeta byte
	// contains filtered or unexported fields
}

Entry provides Key, Value, UserMeta and ExpiresAt. This struct can be used by the user to set data.

func NewEntry

func NewEntry(key, value []byte) *Entry

NewEntry creates a new entry with the key and value passed in args. The newly created entry can be set in a transaction by calling txn.SetEntry(). All other properties of Entry can be set by calling the WithMeta, WithDiscard and WithTTL methods on it. This function keeps references to key and value rather than copying them, so users must not modify key and value until the end of the transaction.
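The caveat about not modifying key and value follows from how Go slices share their backing arrays. The sketch below uses a hypothetical `entry` type and hypothetical constructors (none of them are Badger's) to show what goes wrong when a caller reuses a buffer that is still referenced, and how an explicit copy avoids it.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a struct that keeps references to
// the caller's slices, as the NewEntry description says Entry does.
type entry struct {
	Key, Value []byte
}

// newEntry stores the slices as-is, without copying them.
func newEntry(key, value []byte) *entry {
	return &entry{Key: key, Value: value}
}

// newEntryCopied copies both slices so later writes by the caller cannot leak in.
func newEntryCopied(key, value []byte) *entry {
	return &entry{
		Key:   append([]byte(nil), key...),
		Value: append([]byte(nil), value...),
	}
}

func main() {
	buf := []byte("v1")
	shared := newEntry([]byte("k"), buf)
	copied := newEntryCopied([]byte("k"), buf)

	buf[1] = '2' // the caller reuses its buffer too early

	fmt.Printf("shared=%s copied=%s\n", shared.Value, copied.Value)
}
```

If a buffer has to be reused before the transaction ends, copying it first (as `newEntryCopied` does) is one way to stay safe.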

func (*Entry) WithDiscard

func (e *Entry) WithDiscard() *Entry

WithDiscard adds a marker to Entry e. This means all the previous versions of the key (of the Entry) will be eligible for garbage collection. This method is only useful if you have set a higher limit for options.NumVersionsToKeep. The default setting is 1, in which case, this function doesn't add any more benefit. If however, you have a higher setting for NumVersionsToKeep (in Dgraph, we set it to infinity), you can use this method to indicate that all the older versions can be discarded and removed during compactions.

func (*Entry) WithMeta

func (e *Entry) WithMeta(meta byte) *Entry

WithMeta adds metadata to Entry e. This byte is stored alongside the key and can be used as an aid to interpret the value or to store other contextual bits corresponding to the key-value pair of the entry.

func (*Entry) WithTTL

func (e *Entry) WithTTL(dur time.Duration) *Entry

WithTTL adds time to live duration to Entry e. Entry stored with a TTL would automatically expire after the time has elapsed, and will be eligible for garbage collection.

type Item

type Item struct {
	// contains filtered or unexported fields
}

Item is returned during iteration. Both the Key() and Value() output is only valid until iterator.Next() is called.

func (*Item) DiscardEarlierVersions

func (item *Item) DiscardEarlierVersions() bool

DiscardEarlierVersions returns whether the item was created with the option to discard earlier versions of a key when multiple are available.

func (*Item) EstimatedSize

func (item *Item) EstimatedSize() int64

EstimatedSize returns the approximate size of the key-value pair.

This can be called while iterating through a store to quickly estimate the size of a range of key-value pairs (without fetching the corresponding values).

func (*Item) ExpiresAt

func (item *Item) ExpiresAt() uint64

ExpiresAt returns a Unix time value indicating when the item will be considered expired. 0 indicates that the item will never expire.

func (*Item) IsDeletedOrExpired

func (item *Item) IsDeletedOrExpired() bool

IsDeletedOrExpired returns true if item contains deleted or expired value.

func (*Item) Key

func (item *Item) Key() []byte

Key returns the key.

Key is only valid as long as item is valid, or transaction is valid. If you need to use it outside its validity, please use KeyCopy.

func (*Item) KeyCopy

func (item *Item) KeyCopy(dst []byte) []byte

KeyCopy returns a copy of the key of the item, writing it to dst slice. If nil is passed, or capacity of dst isn't sufficient, a new slice would be allocated and returned.
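The dst-reuse contract described above matches a common Go idiom. The sketch below is a minimal illustration with a hypothetical `copyInto` helper; it is not taken from Badger's source, but it shows the same observable behavior: the destination's backing array is reused when it is large enough, and a new slice is allocated otherwise.

```go
package main

import "fmt"

// copyInto copies src into dst. append reuses dst's backing array when its
// capacity is sufficient and allocates a new one otherwise.
func copyInto(dst, src []byte) []byte {
	return append(dst[:0], src...)
}

func main() {
	key := []byte("user:42")

	scratch := make([]byte, 0, 16)
	reused := copyInto(scratch, key) // fits: no allocation needed
	fresh := copyInto(nil, key)      // nil dst: a new slice is allocated

	sameArray := &reused[0] == &scratch[:1][0]
	fmt.Println(string(reused), string(fresh), sameArray)
}
```

Passing the returned slice back in as dst on the next call keeps allocations low inside a long iteration loop.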

func (*Item) KeySize

func (item *Item) KeySize() int64

KeySize returns the size of the key. The exact size of the stored key is the key length plus 8 bytes of timestamp.

func (*Item) String

func (item *Item) String() string

String returns a string representation of Item.

func (*Item) UserMeta

func (item *Item) UserMeta() byte

UserMeta returns the userMeta set by the user. Typically, this byte, which is optionally set by the user, is used to interpret the value.

func (*Item) Value

func (item *Item) Value(fn func(val []byte) error) error

Value retrieves the value of the item from the value log.

This method must be called within a transaction. Calling it outside a transaction is considered undefined behavior. If an iterator is being used, then Item.Value() is defined in the current iteration only, because items are reused.

If you need to use a value outside a transaction, please use Item.ValueCopy instead, or copy it yourself. Value might change once discard or commit is called. Use ValueCopy if you want to do a Set after Get.

func (*Item) ValueCopy

func (item *Item) ValueCopy(dst []byte) ([]byte, error)

ValueCopy returns a copy of the value of the item from the value log, writing it to dst slice. If nil is passed, or capacity of dst isn't sufficient, a new slice would be allocated and returned. Tip: It might make sense to reuse the returned slice as dst argument for the next call.

This function is useful in long running iterate/update transactions to avoid a write deadlock. See GitHub issue: https://github.com/dgraph-io/badger/issues/315

func (*Item) ValueSize

func (item *Item) ValueSize() int64

ValueSize returns the approximate size of the value.

This can be called to quickly estimate the size of a value without fetching it.

func (*Item) Version

func (item *Item) Version() uint64

Version returns the commit timestamp of the item.

type Iterator

type Iterator struct {

	// ThreadId is an optional value that can be set to identify which goroutine created
	// the iterator. It can be used, for example, to uniquely identify each of the
	// iterators created by the stream interface
	ThreadId int

	Alloc *z.Allocator
	// contains filtered or unexported fields
}

Iterator helps iterating over the KV pairs in a lexicographically sorted order.

func (*Iterator) Close

func (it *Iterator) Close()

Close would close the iterator. It is important to call this when you're done with iteration.

func (*Iterator) Item

func (it *Iterator) Item() *Item

Item returns pointer to the current key-value pair. This item is only valid until it.Next() gets called.

func (*Iterator) Next

func (it *Iterator) Next()

Next would advance the iterator by one. Always check it.Valid() after a Next() to ensure you have access to a valid it.Item().

func (*Iterator) Rewind

func (it *Iterator) Rewind()

Rewind would rewind the iterator cursor all the way to zero-th position, which would be the smallest key if iterating forward, and largest if iterating backward. It does not keep track of whether the cursor started with a Seek().

func (*Iterator) Seek

func (it *Iterator) Seek(key []byte)

Seek would seek to the provided key if present. If absent, it would seek to the next smallest key greater than the provided key if iterating in the forward direction. Behavior would be reversed if iterating backwards.
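The Seek semantics can be modeled on a plain sorted slice. The following sketch is only an analogy (Badger iterates over LSM tables, not a slice), and `seekForward` and `seekReverse` are hypothetical names. It shows where the cursor would land for an exact match, for a missing key when iterating forward, and for a missing key when iterating backward.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// seekForward returns the index of the first key >= target in a sorted
// slice, or len(keys) if there is none.
func seekForward(keys [][]byte, target []byte) int {
	return sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], target) >= 0
	})
}

// seekReverse returns the index of the last key <= target, or -1 if none.
func seekReverse(keys [][]byte, target []byte) int {
	// Index of the first key > target, minus one.
	return sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], target) > 0
	}) - 1
}

func main() {
	keys := [][]byte{[]byte("a"), []byte("c"), []byte("e")}
	fmt.Println(seekForward(keys, []byte("c"))) // exact match
	fmt.Println(seekForward(keys, []byte("b"))) // lands on the next greater key
	fmt.Println(seekReverse(keys, []byte("d"))) // lands on the next smaller key
	fmt.Println(seekForward(keys, []byte("z"))) // past the end: iteration is done
}
```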

func (*Iterator) Valid

func (it *Iterator) Valid() bool

Valid returns false when iteration is done.

func (*Iterator) ValidForPrefix

func (it *Iterator) ValidForPrefix(prefix []byte) bool

ValidForPrefix returns false when iteration is done or when the current key is not prefixed by the specified prefix.
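A prefix scan typically pairs a Seek to the prefix with a loop guarded by ValidForPrefix. The sketch below imitates that loop over a sorted in-memory slice; `scanPrefix` is a hypothetical helper and the slice is a stand-in for the real iterator.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// scanPrefix imitates a Seek + ValidForPrefix loop over a sorted key slice:
// start at the first key >= prefix and stop at the first key lacking it.
func scanPrefix(keys [][]byte, prefix []byte) []string {
	var out []string
	start := sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], prefix) >= 0
	})
	for i := start; i < len(keys) && bytes.HasPrefix(keys[i], prefix); i++ {
		out = append(out, string(keys[i]))
	}
	return out
}

func main() {
	keys := [][]byte{
		[]byte("order:1"), []byte("user:1"), []byte("user:2"), []byte("zone:1"),
	}
	fmt.Println(scanPrefix(keys, []byte("user:")))
}
```

Because keys are sorted, the loop can stop at the first key without the prefix instead of scanning the rest of the key space.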

type IteratorOptions

type IteratorOptions struct {
	// PrefetchSize is the number of KV pairs to prefetch while iterating.
	// Valid only if PrefetchValues is true.
	PrefetchSize int
	// PrefetchValues Indicates whether we should prefetch values during
	// iteration and store them.
	PrefetchValues bool
	Reverse        bool // Direction of iteration. False is forward, true is backward.
	AllVersions    bool // Fetch all valid versions of the same key.
	InternalAccess bool // Used to allow internal access to badger keys.

	Prefix  []byte // Only iterate over this given prefix.
	SinceTs uint64 // Only read data that has version > SinceTs.
	// contains filtered or unexported fields
}

IteratorOptions is used to set options when iterating over Badger key-value stores.

This package provides DefaultIteratorOptions which contains options that should work for most applications. Consider using that as a starting point before customizing it for your own needs.

type KVList

type KVList = pb.KVList

KVList contains a list of key-value pairs.

type KVLoader

type KVLoader struct {
	// contains filtered or unexported fields
}

KVLoader is used to write KVList objects into badger. It can be used to restore a backup.

func (*KVLoader) Finish

func (l *KVLoader) Finish() error

Finish is meant to be called after all the key-value pairs have been loaded.

func (*KVLoader) Set

func (l *KVLoader) Set(kv *pb.KV) error

Set writes the key-value pair to the database.

type KeyRegistry

type KeyRegistry struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

KeyRegistry is used to maintain all the data keys.

func OpenKeyRegistry

func OpenKeyRegistry(opt KeyRegistryOptions) (*KeyRegistry, error)

OpenKeyRegistry opens the key registry if it exists; otherwise it creates a new key registry. In both cases it returns the key registry.

func (*KeyRegistry) Close

func (kr *KeyRegistry) Close() error

Close closes the key registry.

func (*KeyRegistry) DataKey

func (kr *KeyRegistry) DataKey(id uint64) (*pb.DataKey, error)

DataKey returns the data key for the given key id.

func (*KeyRegistry) LatestDataKey

func (kr *KeyRegistry) LatestDataKey() (*pb.DataKey, error)

LatestDataKey returns the most recently generated data key, subject to the rotation period. If the lifetime of the last generated data key exceeds the rotation period, it creates a new data key.

type KeyRegistryOptions

type KeyRegistryOptions struct {
	Dir                           string
	ReadOnly                      bool
	EncryptionKey                 []byte
	EncryptionKeyRotationDuration time.Duration
	InMemory                      bool
}

type LevelInfo

type LevelInfo struct {
	Level          int
	NumTables      int
	Size           int64
	TargetSize     int64
	TargetFileSize int64
	IsBaseLevel    bool
	Score          float64
	Adjusted       float64
	StaleDatSize   int64
}

type Logger

type Logger interface {
	Errorf(string, ...interface{})
	Warningf(string, ...interface{})
	Infof(string, ...interface{})
	Debugf(string, ...interface{})
}

Logger is implemented by any logging system that is used for standard logs.

type Manifest

type Manifest struct {
	Levels []levelManifest
	Tables map[uint64]TableManifest

	// Contains total number of creation and deletion changes in the manifest -- used to compute
	// whether it'd be useful to rewrite the manifest.
	Creations int
	Deletions int
}

Manifest represents the contents of the MANIFEST file in a Badger store.

The MANIFEST file describes the startup state of the db -- all LSM files and what level they're at.

It consists of a sequence of ManifestChangeSet objects. Each of these is treated atomically, and contains a sequence of ManifestChange's (file creations/deletions) which we use to reconstruct the manifest at startup.
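To make the replay idea concrete, the sketch below rebuilds a table-to-level map from a sequence of change sets. The `change` and `manifest` types are simplified, hypothetical stand-ins and do not reproduce Badger's on-disk format or its ManifestChange type.

```go
package main

import "fmt"

// change is a hypothetical, simplified manifest change: a table file is
// either created at a level or deleted.
type change struct {
	tableID uint64
	level   int
	create  bool
}

// manifest tracks which table lives at which level, plus change counters.
type manifest struct {
	tables               map[uint64]int
	creations, deletions int
}

// replay applies change sets in order; each inner slice is one atomic set.
func replay(sets [][]change) manifest {
	m := manifest{tables: map[uint64]int{}}
	for _, set := range sets {
		for _, c := range set {
			if c.create {
				m.tables[c.tableID] = c.level
				m.creations++
			} else {
				delete(m.tables, c.tableID)
				m.deletions++
			}
		}
	}
	return m
}

func main() {
	m := replay([][]change{
		{{tableID: 1, level: 0, create: true}, {tableID: 2, level: 0, create: true}},
		// A compaction: tables 1 and 2 are replaced by table 3 on level 1.
		{{tableID: 1}, {tableID: 2}, {tableID: 3, level: 1, create: true}},
	})
	fmt.Println(m.tables, m.creations, m.deletions)
}
```

The counters hint at why Creations and Deletions are tracked: once many changes cancel each other out, rewriting the file from the current state becomes cheaper than replaying it.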

func ReplayManifestFile

func ReplayManifestFile(fp *os.File) (Manifest, int64, error)

ReplayManifestFile reads the manifest file and constructs two manifest objects. (We need one immutable copy and one mutable copy of the manifest. Easiest way is to construct two of them.) Also, returns the last offset after a completely read manifest entry -- the file must be truncated at that point before further appends are made (if there is a partial entry after that). In normal conditions, truncOffset is the file size.

type MergeFunc

type MergeFunc func(existingVal, newVal []byte) []byte

MergeFunc accepts two byte slices, one representing an existing value, and another representing a new value that needs to be ‘merged’ into it. MergeFunc contains the logic to perform the ‘merge’ and return an updated value. MergeFunc could perform operations like integer addition, list appends etc. Note that the ordering of the operands is maintained.
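As an example of the integer-addition case mentioned above, the sketch below defines a function with the MergeFunc signature that sums two big-endian uint64 counters. The type is re-declared locally so the code compiles without Badger; the helper names and the 8-byte big-endian encoding are choices made for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// MergeFunc repeats the type shown above so the sketch needs no Badger import.
type MergeFunc func(existingVal, newVal []byte) []byte

// uint64ToBytes encodes i as 8 big-endian bytes.
func uint64ToBytes(i uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], i)
	return buf[:]
}

// bytesToUint64 decodes 8 big-endian bytes.
func bytesToUint64(b []byte) uint64 {
	return binary.BigEndian.Uint64(b)
}

// add merges two encoded counters by summing them.
func add(existingVal, newVal []byte) []byte {
	return uint64ToBytes(bytesToUint64(existingVal) + bytesToUint64(newVal))
}

func main() {
	var merge MergeFunc = add

	// Fold a few values in order, the way repeated merges would.
	acc := uint64ToBytes(1)
	for _, v := range []uint64{2, 3} {
		acc = merge(acc, uint64ToBytes(v))
	}
	fmt.Println(bytesToUint64(acc))
}
```

Addition is commutative, so operand order does not matter here; for list appends it does, which is why the ordering guarantee is worth relying on.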

type MergeOperator

type MergeOperator struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

MergeOperator represents a Badger merge operator.

func (*MergeOperator) Add

func (op *MergeOperator) Add(val []byte) error

Add records a value in Badger which will eventually be merged by a background routine into the values that were recorded by previous invocations to Add().

func (*MergeOperator) Get

func (op *MergeOperator) Get() ([]byte, error)

Get returns the latest value for the merge operator, which is derived by applying the merge function to all the values added so far.

If Add has not been called even once, Get will return ErrKeyNotFound.

func (*MergeOperator) Stop

func (op *MergeOperator) Stop()

Stop waits for any pending merge to complete and then stops the background goroutine.

type Options

type Options struct {
	Dir      string
	ValueDir string

	SyncWrites        bool
	NumVersionsToKeep int
	ReadOnly          bool
	Logger            Logger
	Compression       options.CompressionType
	InMemory          bool
	MetricsEnabled    bool
	// Sets the Stream.numGo field
	NumGoroutines int

	MemTableSize        int64
	BaseTableSize       int64
	BaseLevelSize       int64
	LevelSizeMultiplier int
	TableSizeMultiplier int
	MaxLevels           int

	VLogPercentile float64
	ValueThreshold int64
	NumMemtables   int
	// Changing BlockSize across DB runs will not break badger. The block size is
	// read from the block index stored at the end of the table.
	BlockSize          int
	BloomFalsePositive float64
	BlockCacheSize     int64
	IndexCacheSize     int64

	NumLevelZeroTables      int
	NumLevelZeroTablesStall int

	ValueLogFileSize   int64
	ValueLogMaxEntries uint32

	NumCompactors        int
	CompactL0OnClose     bool
	LmaxCompaction       bool
	ZSTDCompressionLevel int

	// When set, checksum will be validated for each entry read from the value log file.
	VerifyValueChecksum bool

	// Encryption related options.
	EncryptionKey                 []byte        // encryption key
	EncryptionKeyRotationDuration time.Duration // key rotation duration

	// BypassLockGuard will bypass the lock guard on badger. Bypassing lock
	// guard can cause data corruption if multiple badger instances are using
	// the same directory. Use this options with caution.
	BypassLockGuard bool

	// ChecksumVerificationMode decides when db should verify checksums for SSTable blocks.
	ChecksumVerificationMode options.ChecksumVerificationMode

	// DetectConflicts determines whether the transactions would be checked for
	// conflicts. The transactions can be processed at a higher rate when
	// conflict detection is disabled.
	DetectConflicts bool

	// NamespaceOffset specifies the offset from where the next 8 bytes contains the namespace.
	NamespaceOffset int
	// contains filtered or unexported fields
}

Options are the parameters for creating a DB object.

This package provides DefaultOptions which contains options that should work for most applications. Consider using that as a starting point before customizing it for your own needs.

Each option X is documented on the WithX method.

func DefaultOptions

func DefaultOptions(path string) Options

DefaultOptions sets a list of recommended options for good performance. Feel free to modify these to suit your needs with the WithX methods.

func LSMOnlyOptions

func LSMOnlyOptions(path string) Options

LSMOnlyOptions follows from DefaultOptions, but sets a higher ValueThreshold so values would be collocated with the LSM tree, with value log largely acting as a write-ahead log only. These options would reduce the disk usage of value log, and make Badger act more like a typical LSM tree.

func (*Options) Debugf

func (opt *Options) Debugf(format string, v ...interface{})

Debugf logs a DEBUG message to the logger specified in opts.

func (*Options) Errorf

func (opt *Options) Errorf(format string, v ...interface{})

Errorf logs an ERROR log message to the logger specified in opts or to the global logger if no logger is specified in opts.

func (Options) FromSuperFlag added in v3.2103.0

func (opt Options) FromSuperFlag(superflag string) Options

FromSuperFlag fills Options fields for each flag within the superflag. For example, replacing the default Options.NumGoroutines:

options := DefaultOptions("").FromSuperFlag("numgoroutines=4")

It's important to note that if you pass an empty Options struct, FromSuperFlag will not fill it with default values. FromSuperFlag only writes to the fields present within the superflag string (case insensitive).

It specially handles the compression subflag. Valid options are {none,snappy,zstd:<level>}. Example: compression=zstd:3. Unsupported: Options.Logger, Options.EncryptionKey.

func (*Options) Infof

func (opt *Options) Infof(format string, v ...interface{})

Infof logs an INFO message to the logger specified in opts.

func (*Options) Warningf

func (opt *Options) Warningf(format string, v ...interface{})

Warningf logs a WARNING message to the logger specified in opts.

func (Options) WithBaseLevelSize

func (opt Options) WithBaseLevelSize(val int64) Options

WithBaseLevelSize sets the maximum size target for the base level.

The default value is 10MB.

func (Options) WithBaseTableSize

func (opt Options) WithBaseTableSize(val int64) Options

WithBaseTableSize returns a new Options value with BaseTableSize set to the given value.

BaseTableSize sets the maximum size in bytes for LSM table or file in the base level.

The default value of BaseTableSize is 2MB.

func (Options) WithBlockCacheSize

func (opt Options) WithBlockCacheSize(size int64) Options

WithBlockCacheSize returns a new Options value with BlockCacheSize set to the given value.

This value specifies how much data the cache should hold in memory. A smaller cache means lower memory consumption, but lookups and iterations would take longer. It is recommended to use a cache if you're using compression or encryption. If both compression and encryption are disabled, adding a cache will lead to unnecessary overhead which will affect read performance. Setting size to zero disables the cache altogether.

Default value of BlockCacheSize is zero.

func (Options) WithBlockSize

func (opt Options) WithBlockSize(val int) Options

WithBlockSize returns a new Options value with BlockSize set to the given value.

BlockSize sets the size of any block in SSTable. SSTable is divided into multiple blocks internally. Each block is compressed using prefix diff encoding.

The default value of BlockSize is 4KB.

func (Options) WithBloomFalsePositive

func (opt Options) WithBloomFalsePositive(val float64) Options

WithBloomFalsePositive returns a new Options value with BloomFalsePositive set to the given value.

BloomFalsePositive sets the false positive probability of the bloom filter in any SSTable. Before reading a key from table, the bloom filter is checked for key existence. BloomFalsePositive might impact read performance of DB. Lower BloomFalsePositive value might consume more memory.

The default value of BloomFalsePositive is 0.01.

Setting this to 0 disables the bloom filter completely.

func (Options) WithBypassLockGuard

func (opt Options) WithBypassLockGuard(b bool) Options

WithBypassLockGuard returns a new Options value with BypassLockGuard set to the given value.

When BypassLockGuard option is set, badger will not acquire a lock on the directory. This could lead to data corruption if multiple badger instances write to the same data directory. Use this option with caution.

The default value of BypassLockGuard is false.

func (Options) WithChecksumVerificationMode

func (opt Options) WithChecksumVerificationMode(cvMode options.ChecksumVerificationMode) Options

WithChecksumVerificationMode returns a new Options value with ChecksumVerificationMode set to the given value.

ChecksumVerificationMode indicates when the db should verify checksums for SSTable blocks.

The default value of ChecksumVerificationMode is options.NoVerification.

func (Options) WithCompactL0OnClose

func (opt Options) WithCompactL0OnClose(val bool) Options

WithCompactL0OnClose determines whether Level 0 should be compacted before closing the DB. This ensures that both reads and writes are efficient when the DB is opened later.

The default value of CompactL0OnClose is false.

func (Options) WithCompression

func (opt Options) WithCompression(cType options.CompressionType) Options

WithCompression is used to enable or disable compression. When compression is enabled, every block will be compressed using the specified algorithm. This option doesn't affect existing tables. Only the newly created tables will be compressed.

The default compression algorithm used is zstd when built with Cgo. Without Cgo, the default is snappy. Compression is enabled by default.

func (Options) WithDetectConflicts

func (opt Options) WithDetectConflicts(b bool) Options

WithDetectConflicts returns a new Options value with DetectConflicts set to the given value.

The DetectConflicts option determines whether transactions are checked for conflicts before they are committed. When this option is set to false (detectConflicts=false), badger can process transactions at a higher rate. Setting this option to false might be useful when the user application deals with conflict detection and resolution itself.

The default value of DetectConflicts is true.

func (Options) WithDir

func (opt Options) WithDir(val string) Options

WithDir returns a new Options value with Dir set to the given value.

Dir is the path of the directory where key data will be stored in. If it doesn't exist, Badger will try to create it for you. This is set automatically to be the path given to `DefaultOptions`.

func (Options) WithEncryptionKey

func (opt Options) WithEncryptionKey(key []byte) Options

WithEncryptionKey is used to encrypt the data with AES. Type of AES is used based on the key size. For example 16 bytes will use AES-128. 24 bytes will use AES-192. 32 bytes will use AES-256.
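The key-size rule is the same one Go's standard `crypto/aes` package applies. The sketch below uses a hypothetical `aesVariant` helper to report which variant a key length selects and to surface the error returned for an unsupported length. The all-zero keys are for demonstration only and must not be used in practice.

```go
package main

import (
	"crypto/aes"
	"fmt"
)

// aesVariant names the AES variant that crypto/aes selects for the key, or
// returns the error it reports for an unsupported key length.
func aesVariant(key []byte) (string, error) {
	if _, err := aes.NewCipher(key); err != nil {
		return "", err
	}
	return fmt.Sprintf("AES-%d", len(key)*8), nil
}

func main() {
	for _, n := range []int{16, 24, 32, 20} {
		// Demo keys only: real keys must come from a secure random source.
		name, err := aesVariant(make([]byte, n))
		if err != nil {
			fmt.Printf("%d bytes: rejected (%v)\n", n, err)
			continue
		}
		fmt.Printf("%d bytes: %s\n", n, name)
	}
}
```

Validating the key length up front, before opening the DB, gives a clearer failure than discovering a bad key at open time.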

func (Options) WithEncryptionKeyRotationDuration

func (opt Options) WithEncryptionKeyRotationDuration(d time.Duration) Options

WithEncryptionKeyRotationDuration returns new Options value with the duration set to the given value.

The key registry uses this duration to decide when to create new keys. If the age of the previously generated key exceeds the given duration, the key registry creates a new key.

func (Options) WithInMemory

func (opt Options) WithInMemory(b bool) Options

WithInMemory returns a new Options value with InMemory mode set to the given value.

When badger is running in InMemory mode, everything is stored in memory. No value/sst files are created. In case of a crash all data will be lost.

func (Options) WithIndexCacheSize

func (opt Options) WithIndexCacheSize(size int64) Options

WithIndexCacheSize returns a new Options value with IndexCacheSize set to the given value.

This value specifies how much memory should be used by table indices. These indices include the block offsets and the bloom filters. Badger uses bloom filters to speed up lookups. Each table has its own bloom filter and each bloom filter is approximately 5 MB in size.

Zero value for IndexCacheSize means all the indices will be kept in memory and the cache is disabled.

The default value of IndexCacheSize is 0 which means all indices are kept in memory.

func (Options) WithLevelSizeMultiplier

func (opt Options) WithLevelSizeMultiplier(val int) Options

WithLevelSizeMultiplier returns a new Options value with LevelSizeMultiplier set to the given value.

LevelSizeMultiplier sets the ratio between the maximum sizes of contiguous levels in the LSM. Once a level grows to be larger than this ratio allows, the compaction process will be triggered.

The default value of LevelSizeMultiplier is 10.

func (Options) WithLogger

func (opt Options) WithLogger(val Logger) Options

WithLogger returns a new Options value with Logger set to the given value.

Logger provides a way to configure what logger each value of badger.DB uses.

The default value of Logger writes to stderr using the log package from the Go standard library.

func (Options) WithLoggingLevel

func (opt Options) WithLoggingLevel(val loggingLevel) Options

WithLoggingLevel returns a new Options value with logging level of the default logger set to the given value. LoggingLevel sets the level of logging. It should be one of DEBUG, INFO, WARNING or ERROR levels.

The default value of LoggingLevel is INFO.

func (Options) WithMaxLevels

func (opt Options) WithMaxLevels(val int) Options

WithMaxLevels returns a new Options value with MaxLevels set to the given value.

Maximum number of levels of compaction allowed in the LSM.

The default value of MaxLevels is 7.

func (Options) WithMemTableSize

func (opt Options) WithMemTableSize(val int64) Options

WithMemTableSize returns a new Options value with MemTableSize set to the given value.

MemTableSize sets the maximum size in bytes for a memtable.

The default value of MemTableSize is 64MB.

func (Options) WithMetricsEnabled added in v3.2103.0

func (opt Options) WithMetricsEnabled(val bool) Options

WithMetricsEnabled returns a new Options value with MetricsEnabled set to the given value.

When MetricsEnabled is set to false, then the DB will be opened and no badger metrics will be logged. Metrics are defined in metric.go file.

This flag is useful for use cases like in Dgraph where we open temporary badger instances to index data. In those cases we don't want badger metrics to be polluted with the noise from those temporary instances.

The default value of MetricsEnabled is true.

func (Options) WithNamespaceOffset added in v3.2103.0

func (opt Options) WithNamespaceOffset(offset int) Options

WithNamespaceOffset returns a new Options value with NamespaceOffset set to the given value. DB will expect the namespace in each key at the 8 bytes starting from NamespaceOffset. A negative value means that namespace is not stored in the key.

The default value for NamespaceOffset is -1.

func (Options) WithNumCompactors

func (opt Options) WithNumCompactors(val int) Options

WithNumCompactors sets the number of compaction workers to run concurrently. Setting this to zero stops compactions, which could eventually cause writes to block forever.

The default value of NumCompactors is 2. One is dedicated just for L0 and L1.

func (Options) WithNumGoroutines added in v3.2103.0

func (opt Options) WithNumGoroutines(val int) Options

WithNumGoroutines sets the number of goroutines to be used in Stream.

The default value of NumGoroutines is 8.

func (Options) WithNumLevelZeroTables

func (opt Options) WithNumLevelZeroTables(val int) Options

WithNumLevelZeroTables sets the maximum number of Level 0 tables before compaction starts.

The default value of NumLevelZeroTables is 5.

func (Options) WithNumLevelZeroTablesStall

func (opt Options) WithNumLevelZeroTablesStall(val int) Options

WithNumLevelZeroTablesStall sets the number of Level 0 tables that once reached causes the DB to stall until compaction succeeds.

The default value of NumLevelZeroTablesStall is 10.

func (Options) WithNumMemtables

func (opt Options) WithNumMemtables(val int) Options

WithNumMemtables returns a new Options value with NumMemtables set to the given value.

NumMemtables sets the maximum number of tables to keep in memory before stalling.

The default value of NumMemtables is 5.

func (Options) WithNumVersionsToKeep

func (opt Options) WithNumVersionsToKeep(val int) Options

WithNumVersionsToKeep returns a new Options value with NumVersionsToKeep set to the given value.

NumVersionsToKeep sets how many versions to keep per key at most.

The default value of NumVersionsToKeep is 1.

func (Options) WithReadOnly

func (opt Options) WithReadOnly(val bool) Options

WithReadOnly returns a new Options value with ReadOnly set to the given value.

When ReadOnly is true, the DB will be opened in read-only mode. Multiple processes can open the same Badger DB. Note: if the DB being opened had crashed before and has vlog data to be replayed, ReadOnly will cause Open to fail with an appropriate message.

The default value of ReadOnly is false.

func (Options) WithSyncWrites

func (opt Options) WithSyncWrites(val bool) Options

WithSyncWrites returns a new Options value with SyncWrites set to the given value.

Badger does all writes via mmap. So, all writes can survive process crashes or k8s environments with SyncWrites set to false.

When set to true, Badger would call an additional msync after writes to flush mmap buffer over to disk to survive hard reboots. Most users of Badger should not need to do this.

The default value of SyncWrites is false.

func (Options) WithVLogPercentile added in v3.2103.0

func (opt Options) WithVLogPercentile(t float64) Options

WithVLogPercentile returns a new Options value with VLogPercentile set to the given value.

VLogPercentile with 0.0 means no dynamic thresholding is enabled. MinThreshold value will always act as the value threshold.

VLogPercentile with value 0.99 means 99 percentile of value will be put in LSM tree and only 1 percent in vlog. The value threshold will be dynamically updated within the range of [ValueThreshold, Options.maxValueThreshold]

VLogPercentile with value 1.0 means the threshold will eventually be set to Options.maxValueThreshold.

The default value of VLogPercentile is 0.0.

func (Options) WithValueDir

func (opt Options) WithValueDir(val string) Options

WithValueDir returns a new Options value with ValueDir set to the given value.

ValueDir is the path of the directory where value data will be stored in. If it doesn't exist, Badger will try to create it for you. This is set automatically to be the path given to `DefaultOptions`.

func (Options) WithValueLogFileSize

func (opt Options) WithValueLogFileSize(val int64) Options

WithValueLogFileSize sets the maximum size of a single value log file.

The default value of ValueLogFileSize is 1GB.

func (Options) WithValueLogMaxEntries

func (opt Options) WithValueLogMaxEntries(val uint32) Options

WithValueLogMaxEntries sets the approximate maximum number of entries a value log file can hold. The actual limit of a value log file is whichever of ValueLogFileSize and ValueLogMaxEntries is reached first.

The default value of ValueLogMaxEntries is one million (1000000).
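The interaction of the two limits can be expressed as a single predicate. The sketch below is an assumption-laden illustration: `shouldRotate` is a hypothetical helper, and the use of strict comparisons is a guess rather than a documented guarantee.

```go
package main

import "fmt"

// shouldRotate reports whether a value log file has passed either limit and
// should be closed in favor of a new file. Hypothetical helper.
func shouldRotate(sizeBytes, maxSize int64, entries, maxEntries uint32) bool {
	return sizeBytes > maxSize || entries > maxEntries
}

func main() {
	const maxSize = int64(1) << 30     // the documented 1GB default
	const maxEntries = uint32(1000000) // the documented default

	// Many tiny entries: the entry limit is hit long before the size limit.
	fmt.Println(shouldRotate(64<<20, maxSize, 1000001, maxEntries))
	// Few large entries: the size limit is hit first.
	fmt.Println(shouldRotate(maxSize+1, maxSize, 5000, maxEntries))
	// Neither limit reached.
	fmt.Println(shouldRotate(64<<20, maxSize, 5000, maxEntries))
}
```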

func (Options) WithValueThreshold

func (opt Options) WithValueThreshold(val int64) Options

WithValueThreshold returns a new Options value with ValueThreshold set to the given value.

ValueThreshold sets the threshold used to decide whether a value is stored directly in the LSM tree or separately in the value log files.

The default value of ValueThreshold is 1 MB, but LSMOnlyOptions sets it to maxValueThreshold.

func (Options) WithVerifyValueChecksum

func (opt Options) WithVerifyValueChecksum(val bool) Options

WithVerifyValueChecksum is used to set VerifyValueChecksum. When VerifyValueChecksum is set to true, checksum will be verified for every entry read from the value log. If the value is stored in SST (value size less than value threshold) then the checksum validation will not be done.

The default value of VerifyValueChecksum is False.

func (Options) WithZSTDCompressionLevel

func (opt Options) WithZSTDCompressionLevel(cLevel int) Options

WithZSTDCompressionLevel returns a new Options value with ZSTDCompressionLevel set to the given value.

The ZSTD compression algorithm supports 20 compression levels. Lower levels have better performance and higher levels have better compression ratios. We recommend using ZSTD compression level 1. Any level higher than 1 seems to deteriorate badger's performance. The following benchmarks were done on a 4 KB block size (the default block size). The compression ratio is supposed to increase with increasing compression level, but since the input for the compression algorithm is small (4 KB), we don't get a significant benefit at level 3. It is advised to write your own benchmarks before choosing a compression algorithm or level.

NOTE: The benchmarks are with DataDog ZSTD that requires CGO. Hence, no longer valid.

no_compression-16              10    502848865 ns/op   165.46 MB/s   -
zstd_compression/level_1-16     7    739037966 ns/op   112.58 MB/s   2.93
zstd_compression/level_3-16     7    756950250 ns/op   109.91 MB/s   2.72
zstd_compression/level_15-16    1  11135686219 ns/op     7.47 MB/s   4.38

Benchmark code can be found in the table/builder_test.go file.

type Sequence

type Sequence struct {
	// contains filtered or unexported fields
}

Sequence represents a Badger sequence.

func (*Sequence) Next

func (seq *Sequence) Next() (uint64, error)

Next returns the next integer in the sequence, updating the lease by running a transaction if needed.

func (*Sequence) Release

func (seq *Sequence) Release() error

Release releases the leased sequence to avoid wasting integers. This should be done right before closing the associated DB. However, it is valid to use the sequence after it was released; doing so causes a new lease with full bandwidth.
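The lease behaviour of Next and Release can be modelled in memory. The sketch below is a simplified assumption of how such a lease could work, not Badger's implementation: leasedSequence and all of its fields are hypothetical, and the stored field stands in for the value a real Sequence would persist through a transaction.

```go
package main

import "fmt"

// leasedSequence is a hypothetical in-memory model, not Badger's Sequence.
type leasedSequence struct {
	stored    uint64 // stands in for the lease end persisted in the DB
	next      uint64 // next integer to hand out
	leased    uint64 // exclusive end of the current lease
	bandwidth uint64 // how many integers each lease reserves
	renewals  int    // number of simulated lease transactions
}

// renew reserves a fresh block of integers starting at the stored value.
func (s *leasedSequence) renew() {
	s.next = s.stored
	s.stored += s.bandwidth
	s.leased = s.stored
	s.renewals++
}

// Next hands out the next integer, renewing the lease only when it is used up.
func (s *leasedSequence) Next() uint64 {
	if s.next >= s.leased {
		s.renew()
	}
	v := s.next
	s.next++
	return v
}

// Release gives back the unused part of the lease so no integers are wasted.
func (s *leasedSequence) Release() {
	s.stored = s.next
	s.leased = s.next
}

func main() {
	seq := &leasedSequence{bandwidth: 10}
	for i := 0; i < 3; i++ {
		fmt.Println("next:", seq.Next())
	}
	seq.Release()
	// Using the sequence after Release takes a new lease with full bandwidth.
	fmt.Println("next after release:", seq.Next())
	fmt.Println("lease transactions:", seq.renewals)
}
```

A larger bandwidth means fewer lease transactions, at the cost of more integers being skipped if the process exits without calling Release.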

type Stream

type Stream struct {
	// Prefix to only iterate over certain range of keys. If set to nil (default), Stream would
	// iterate over the entire DB.
	Prefix []byte

	// Number of goroutines to use for iterating over key ranges. Defaults to 8.
	NumGo int

	// Badger would produce log entries in Infof to indicate the progress of Stream. LogPrefix can
	// be used to help differentiate them from other activities. Default is "Badger.Stream".
	LogPrefix string

	// ChooseKey is invoked each time a new key is encountered. Note that this is not called
	// on every version of the value, only the first encountered version (i.e. the highest version
	// of the value a key has). ChooseKey can be left nil to select all keys.
	//
	// Note: Calls to ChooseKey are concurrent.
	ChooseKey func(item *Item) bool

	// KeyToList, similar to ChooseKey, is only invoked on the highest version of the value. It
	// is up to the caller to iterate over the versions and generate zero, one or more KVs. It
	// is expected that the user would advance the iterator to go through the versions of the
	// values. However, the user MUST immediately return from this function on the first encounter
	// with a mismatching key. See example usage in ToList function. Can be left nil to use ToList
	// function by default.
	//
	// KeyToList has access to z.Allocator accessible via stream.Allocator(itr.ThreadId). This
	// allocator can be used to allocate KVs, to decrease the memory pressure on Go GC. Stream
	// framework takes care of releasing those resources after calling Send. AllocRef does
	// NOT need to be set in the returned KVList, as Stream framework would ignore that field,
	// instead using the allocator assigned to that thread id.
	//
	// Note: Calls to KeyToList are concurrent.
	KeyToList func(key []byte, itr *Iterator) (*pb.KVList, error)

	// This is the method where Stream sends the final output. All calls to Send are done by a
	// single goroutine, i.e. logic within Send method can expect single threaded execution.
	Send func(buf *z.Buffer) error

	// Read data above the sinceTs. All keys with version <= sinceTs will be ignored.
	SinceTs uint64
	// contains filtered or unexported fields
}

Stream provides a framework to concurrently iterate over a snapshot of Badger, pick up key-values, batch them up and call Send. Stream does concurrent iteration over many smaller key ranges. It does NOT send keys in lexicographical sorted order. To get keys in sorted order, use Iterator.

func (*Stream) Backup

func (stream *Stream) Backup(w io.Writer, since uint64) (uint64, error)

Backup dumps a protobuf-encoded list of all entries in the database that are newer than or equal to the specified version into the given writer. It returns a timestamp (version) indicating the version of the last entry that was dumped. After incrementing it by 1, this value can be passed into a later invocation to generate an incremental dump of entries that have been added or modified since the last invocation of Stream.Backup().

This can be used to back up the data in a database at a given point in time.
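The incremental pattern (pass the returned version plus one to the next call) can be illustrated with a toy model. The entry type and backup function below are hypothetical and operate on an in-memory slice; they only mirror the "newer than or equal to since" filter and the returned maximum version, not the protobuf encoding or the real Stream.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a versioned key in the database.
type entry struct {
	key     string
	version uint64
}

// backup models the described contract: it selects entries whose version is
// >= since and reports the highest version it dumped (0 if nothing matched,
// which is a choice made for this sketch).
func backup(entries []entry, since uint64) (dumped []entry, maxVersion uint64) {
	for _, e := range entries {
		if e.version >= since {
			dumped = append(dumped, e)
			if e.version > maxVersion {
				maxVersion = e.version
			}
		}
	}
	return dumped, maxVersion
}

func main() {
	db := []entry{{"a", 1}, {"b", 2}, {"c", 5}}

	// Full backup: since = 0 selects everything.
	full, last := backup(db, 0)
	fmt.Println("full backup entries:", len(full), "last version:", last)

	// New writes arrive after the full backup.
	db = append(db, entry{"d", 7})

	// Incremental backup: start from last+1 to skip what was already dumped.
	incr, last2 := backup(db, last+1)
	fmt.Println("incremental entries:", len(incr), "last version:", last2)
}
```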

func (*Stream) Orchestrate

func (st *Stream) Orchestrate(ctx context.Context) error

Orchestrate runs Stream. It picks up ranges from the SSTables, then runs NumGo goroutines to iterate over these ranges and batch up KVs in lists. It concurrently runs a single goroutine to pick up these lists, batch them up further and pass them to Send. Orchestrate also writes logs to Infof, using the provided LogPrefix. Note that all calls to Send are serial. If any of these steps encounters an error, Orchestrate stops execution and returns that error. Orchestrate can be called multiple times, but in serial order.
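The shape of this pipeline (several producers, one serial sender) can be sketched with plain channels. This is a hypothetical model using only the standard library, not Badger's implementation: orchestrate, collect and the string-slice "ranges" are invented for illustration, and on error the sketch drains remaining batches instead of cancelling the producers.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// orchestrate is a hypothetical model of the pipeline described above.
// Each element of ranges stands for the keys of one key range.
func orchestrate(ranges [][]string, numGo int, send func(batch []string) error) error {
	if numGo <= 0 {
		numGo = 8 // mirrors the documented NumGo default
	}
	rangeCh := make(chan []string)
	batchCh := make(chan []string)

	// Producers: numGo goroutines pick up ranges and batch their keys.
	var wg sync.WaitGroup
	for i := 0; i < numGo; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range rangeCh {
				batchCh <- append([]string(nil), r...)
			}
		}()
	}

	// Feed the ranges, then close batchCh once every producer has finished.
	go func() {
		for _, r := range ranges {
			rangeCh <- r
		}
		close(rangeCh)
		wg.Wait()
		close(batchCh)
	}()

	// Single consumer: every call to send happens serially on this goroutine.
	var firstErr error
	for batch := range batchCh {
		if firstErr != nil {
			continue // keep draining so the producers can exit
		}
		if err := send(batch); err != nil {
			firstErr = err
		}
	}
	return firstErr
}

// collect gathers everything that was sent. The append needs no lock because
// send is never called concurrently. Batches arrive in no particular order,
// so the result is sorted afterwards.
func collect(ranges [][]string, numGo int) ([]string, error) {
	var got []string
	err := orchestrate(ranges, numGo, func(batch []string) error {
		got = append(got, batch...)
		return nil
	})
	sort.Strings(got)
	return got, err
}

func main() {
	got, err := collect([][]string{{"a", "b"}, {"c"}, {"d", "e"}}, 2)
	fmt.Println(got, err)
}
```

Because batches from different ranges reach the sender in arrival order, the sketch also shows why the output is not lexicographically sorted unless the caller sorts it.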

func (*Stream) SendDoneMarkers

func (st *Stream) SendDoneMarkers(done bool)

SendDoneMarkers, when true, sends out done markers on the stream. False by default.

func (*Stream) ToList

func (st *Stream) ToList(key []byte, itr *Iterator) (*pb.KVList, error)

ToList is a default implementation of KeyToList. It picks up all valid versions of the key, skipping over deleted or expired keys.

type StreamWriter

type StreamWriter struct {
	// contains filtered or unexported fields
}

StreamWriter is used to write data coming from multiple streams. The streams must not have any overlapping key ranges. Within each stream, the keys must be sorted. Badger Stream framework is capable of generating such an output. So, this StreamWriter can be used at the other end to build BadgerDB at a much faster pace by writing SSTables (and value logs) directly to LSM tree levels without causing any compactions at all. This is way faster than using batched writer or using transactions, but only applicable in situations where the keys are pre-sorted and the DB is being bootstrapped. Existing data would get deleted when using this writer. So, this is only useful when restoring from backup or replicating DB across servers.

StreamWriter should not be called on in-use DB instances. It is designed only to bootstrap new DBs.

func (*StreamWriter) Cancel

func (sw *StreamWriter) Cancel()

Cancel signals all goroutines to exit. Calling defer sw.Cancel() immediately after creating a new StreamWriter ensures that writes are unblocked even upon early return. Note that dropAll() is not called here, so any partially written data will not be erased until a new StreamWriter is initialized.

func (*StreamWriter) Flush

func (sw *StreamWriter) Flush() error

Flush is called once we are done writing all the entries. It syncs DB directories. It also updates Oracle with maxVersion found in all entries (if DB is not managed).

func (*StreamWriter) Prepare

func (sw *StreamWriter) Prepare() error

Prepare should be called before writing any entry to StreamWriter. It deletes all data present in existing DB, stops compactions and any writes being done by other means. Be very careful when calling Prepare, because it could result in permanent data loss. Not calling Prepare would result in a corrupt Badger instance.

func (*StreamWriter) Write

func (sw *StreamWriter) Write(buf *z.Buffer) error

Write writes KVList to DB. Each KV within the list contains the stream id which StreamWriter would use to demux the writes. Write is thread safe and can be called concurrently by multiple goroutines.

type TableInfo

type TableInfo struct {
	ID               uint64
	Level            int
	Left             []byte
	Right            []byte
	KeyCount         uint32 // Number of keys in the table
	OnDiskSize       uint32
	StaleDataSize    uint32
	UncompressedSize uint32
	MaxVersion       uint64
	IndexSz          int
	BloomFilterSize  int
}

TableInfo represents the information about a table.

type TableManifest

type TableManifest struct {
	Level       uint8
	KeyID       uint64
	Compression options.CompressionType
}

TableManifest contains information about a specific table in the LSM tree.

type Txn

type Txn struct {
	// contains filtered or unexported fields
}

Txn represents a Badger transaction.

func (*Txn) Commit

func (txn *Txn) Commit() error

Commit commits the transaction, following these steps:

1. If there are no writes, return immediately.

2. Check if read rows were updated since txn started. If so, return ErrConflict.

3. If no conflict, generate a commit timestamp and update written rows' commit ts.

4. Batch up all writes, write them to value log and LSM tree.

5. If callback is provided, Badger will return immediately after checking for conflicts. Writes to the database will happen in the background. If there is a conflict, an error will be returned and the callback will not run. If there are no conflicts, the callback will be called in the background upon successful completion of writes or any error during write.

If error is nil, the transaction is successfully committed. In case of a non-nil error, the LSM tree won't be updated, so there's no need for any rollback.
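Steps 1 to 4 can be pictured with a toy, single-threaded model. Everything in it (store, txn, errConflict) is hypothetical and greatly simplified: there is no value log, no multi-version read path and no callback handling (step 5), and errConflict merely stands in for ErrConflict.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for badger.ErrConflict in this model.
var errConflict = errors.New("conflict")

// store is a hypothetical, single-threaded model of the commit bookkeeping.
type store struct {
	lastTs   uint64            // last commit timestamp handed out
	commitTs map[string]uint64 // key -> commit timestamp of its latest write
	data     map[string]string
}

type txn struct {
	s      *store
	readTs uint64
	reads  []string
	writes map[string]string
}

func newStore() *store {
	return &store{commitTs: map[string]uint64{}, data: map[string]string{}}
}

func (s *store) begin() *txn {
	return &txn{s: s, readTs: s.lastTs, writes: map[string]string{}}
}

// get records the key as read so that commit can check it for conflicts.
func (t *txn) get(key string) string {
	t.reads = append(t.reads, key)
	if v, ok := t.writes[key]; ok {
		return v
	}
	return t.s.data[key]
}

func (t *txn) set(key, val string) { t.writes[key] = val }

func (t *txn) commit() error {
	// Step 1: nothing to write, return immediately.
	if len(t.writes) == 0 {
		return nil
	}
	// Step 2: a key read by this txn was committed after the txn started.
	for _, k := range t.reads {
		if t.s.commitTs[k] > t.readTs {
			return errConflict
		}
	}
	// Step 3: generate a commit timestamp.
	t.s.lastTs++
	ts := t.s.lastTs
	// Step 4: apply all writes at that timestamp.
	for k, v := range t.writes {
		t.s.data[k] = v
		t.s.commitTs[k] = ts
	}
	return nil
}

// demo runs two overlapping read-modify-write transactions on the same key.
func demo() (first, second error) {
	s := newStore()
	t1, t2 := s.begin(), s.begin()
	t1.get("balance")
	t2.get("balance")
	t1.set("balance", "90")
	t2.set("balance", "80")
	first = t1.commit()
	second = t2.commit()
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println("first commit:", first)
	fmt.Println("second commit:", second)
}
```

In this model the second transaction fails because the key it read was committed by the first one after both had started. A caller would typically retry the whole transaction in that case.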

func (*Txn) CommitAt

func (txn *Txn) CommitAt(commitTs uint64, callback func(error)) error

CommitAt commits the transaction, following the same logic as Commit(), but at the given commit timestamp. This will panic if not used with managed transactions.

This is only useful for databases built on top of Badger (like Dgraph), and can be ignored by most users.

func (*Txn) CommitWith

func (txn *Txn) CommitWith(cb func(error))

CommitWith acts like Commit, but takes a callback, which gets run via a goroutine to avoid blocking this function. The callback is guaranteed to run, so it is safe to increment a sync.WaitGroup before calling CommitWith and decrement it in the callback, in order to block until all callbacks have run.
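The WaitGroup pattern can be sketched without a database. In the snippet below, commitWith is a hypothetical stand-in for Txn.CommitWith that simply runs the callback once from another goroutine with a nil error; only the Add-before, Done-inside structure is the point.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// commitWith is a hypothetical stand-in for Txn.CommitWith: it returns
// immediately and runs cb exactly once from another goroutine.
func commitWith(cb func(error)) {
	go cb(nil)
}

// commitAll issues n asynchronous commits and blocks until every callback
// has run. It returns how many callbacks reported success.
func commitAll(n int) int64 {
	var wg sync.WaitGroup
	var succeeded int64
	for i := 0; i < n; i++ {
		wg.Add(1) // increment before handing over the callback
		commitWith(func(err error) {
			defer wg.Done() // decrement inside the callback
			if err == nil {
				atomic.AddInt64(&succeeded, 1)
			}
		})
	}
	wg.Wait() // safe because every callback is guaranteed to run
	return atomic.LoadInt64(&succeeded)
}

func main() {
	fmt.Println("successful commits:", commitAll(5))
}
```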

func (*Txn) Delete

func (txn *Txn) Delete(key []byte) error

Delete deletes a key.

This is done by adding a delete marker for the key at commit timestamp. Any reads happening before this timestamp would be unaffected. Any reads after this commit would see the deletion.

The current transaction keeps a reference to the key byte slice argument. Users must not modify the key until the end of the transaction.
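How a delete marker interacts with read timestamps can be shown with a small versioned-key model. The version and versionedKey types are hypothetical; they only illustrate that a read sees the newest version at or below its timestamp, and that a marker hides the key from later reads.

```go
package main

import "fmt"

// version is a hypothetical record of one write to a key.
type version struct {
	ts      uint64 // commit timestamp
	value   string
	deleted bool // true for a delete marker
}

// versionedKey holds the versions of one key in ascending timestamp order.
type versionedKey []version

// readAt returns the value visible to a reader at readTs: the newest version
// with ts <= readTs, unless that version is a delete marker.
func (vk versionedKey) readAt(readTs uint64) (string, bool) {
	for i := len(vk) - 1; i >= 0; i-- {
		if vk[i].ts <= readTs {
			if vk[i].deleted {
				return "", false
			}
			return vk[i].value, true
		}
	}
	return "", false
}

func main() {
	key := versionedKey{
		{ts: 1, value: "v1"},
		{ts: 3, deleted: true}, // delete marker added at commit timestamp 3
	}

	v, ok := key.readAt(2)
	fmt.Println("read at ts 2:", v, ok) // reader started before the delete

	v, ok = key.readAt(3)
	fmt.Println("read at ts 3:", v, ok) // reader sees the deletion
}
```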

func (*Txn) Discard

func (txn *Txn) Discard()

Discard discards a created transaction. This method is very important and must be called. The Commit method calls it internally, and calling it multiple times doesn't cause any issues, so it can safely be called via a defer right after the transaction is created.

NOTE: If any operations are run on a discarded transaction, ErrDiscardedTxn is returned.

func (*Txn) Get

func (txn *Txn) Get(key []byte) (item *Item, rerr error)

Get looks for key and returns corresponding Item. If key is not found, ErrKeyNotFound is returned.

func (*Txn) NewIterator

func (txn *Txn) NewIterator(opt IteratorOptions) *Iterator

NewIterator returns a new iterator. Depending upon the options, either only keys or both keys and values are fetched. The keys are returned in lexicographically sorted order. Using prefetch is recommended for performance if you're doing a long-running iteration.

Multiple Iterators: For a read-only txn, multiple iterators can be running simultaneously. However, for a read-write txn, iterators have the nuance of being a snapshot of the writes for the transaction at the time iterator was created. If writes are performed after an iterator is created, then that iterator will not be able to see those writes. Only writes performed before an iterator was created can be viewed.

Example
dir, err := ioutil.TempDir("", "badger-test")
if err != nil {
	panic(err)
}
defer removeDir(dir)

db, err := Open(DefaultOptions(dir))
if err != nil {
	panic(err)
}
defer db.Close()

bkey := func(i int) []byte {
	return []byte(fmt.Sprintf("%09d", i))
}
bval := func(i int) []byte {
	return []byte(fmt.Sprintf("%025d", i))
}

txn := db.NewTransaction(true)

// Fill in 1000 items
n := 1000
for i := 0; i < n; i++ {
	err := txn.SetEntry(NewEntry(bkey(i), bval(i)))
	if err != nil {
		panic(err)
	}
}

err = txn.Commit()
if err != nil {
	panic(err)
}

opt := DefaultIteratorOptions
opt.PrefetchSize = 10

// Iterate over 1000 items
var count int
err = db.View(func(txn *Txn) error {
	it := txn.NewIterator(opt)
	defer it.Close()
	for it.Rewind(); it.Valid(); it.Next() {
		count++
	}
	return nil
})
if err != nil {
	panic(err)
}
fmt.Printf("Counted %d elements", count)
Output:

Counted 1000 elements

func (*Txn) NewKeyIterator

func (txn *Txn) NewKeyIterator(key []byte, opt IteratorOptions) *Iterator

NewKeyIterator is just like NewIterator, but allows the user to iterate over all versions of a single key. Internally, it sets the Prefix option in provided opt, and uses that prefix to additionally run bloom filter lookups before picking tables from the LSM tree.

func (*Txn) ReadTs

func (txn *Txn) ReadTs() uint64

ReadTs returns the read timestamp of the transaction.

func (*Txn) Set

func (txn *Txn) Set(key, val []byte) error

Set adds a key-value pair to the database. It will return ErrReadOnlyTxn if update flag was set to false when creating the transaction.

The current transaction keeps a reference to the key and val byte slice arguments. Users must not modify key and val until the end of the transaction.

func (*Txn) SetEntry

func (txn *Txn) SetEntry(e *Entry) error

SetEntry takes an Entry struct and adds the key-value pair in the struct, along with other metadata to the database.

The current transaction keeps a reference to the entry passed in argument. Users must not modify the entry until the end of the transaction.

type WriteBatch

type WriteBatch struct {
	sync.Mutex
	// contains filtered or unexported fields
}

WriteBatch holds the necessary info to perform batched writes.

func (*WriteBatch) Cancel

func (wb *WriteBatch) Cancel()

Cancel must be called if there's a chance that Flush might not get called. If neither Flush nor Cancel is called, the transaction oracle would never get a chance to clear out the row commit timestamp map, thus causing unbounded memory consumption. Typically, you can call Cancel as a defer statement right after NewWriteBatch is called.

Note that any committed writes would still go through despite calling Cancel.

func (*WriteBatch) Delete

func (wb *WriteBatch) Delete(k []byte) error

Delete is the equivalent of Txn.Delete.

func (*WriteBatch) DeleteAt

func (wb *WriteBatch) DeleteAt(k []byte, ts uint64) error

DeleteAt is the equivalent of Txn.Delete but accepts a delete timestamp.

func (*WriteBatch) Error

func (wb *WriteBatch) Error() error

Error returns any errors encountered so far. No commits would be run once an error is detected.

func (*WriteBatch) Flush

func (wb *WriteBatch) Flush() error

Flush must be called at the end to ensure that any pending writes get committed to Badger. Flush returns any error stored by WriteBatch.

func (*WriteBatch) Set

func (wb *WriteBatch) Set(k, v []byte) error

Set is the equivalent of Txn.Set.

func (*WriteBatch) SetEntry

func (wb *WriteBatch) SetEntry(e *Entry) error

SetEntry is the equivalent of Txn.SetEntry.

func (*WriteBatch) SetEntryAt

func (wb *WriteBatch) SetEntryAt(e *Entry, ts uint64) error

SetEntryAt is the equivalent of Txn.SetEntry but it also allows setting version for the entry. SetEntryAt can be used only in managed mode.

func (*WriteBatch) SetMaxPendingTxns

func (wb *WriteBatch) SetMaxPendingTxns(max int)

SetMaxPendingTxns sets a limit on maximum number of pending transactions while writing batches. This function should be called before using WriteBatch. Default value of MaxPendingTxns is 16 to minimise memory usage.

func (*WriteBatch) Write

func (wb *WriteBatch) Write(buf *z.Buffer) error

func (*WriteBatch) WriteList

func (wb *WriteBatch) WriteList(kvList *pb.KVList) error

Directories

Path Synopsis
cmd
integration
y
