Documentation ¶
Overview ¶
Package engine provides low-level storage. It interacts with storage backends (e.g. LevelDB, RocksDB) via the Engine interface. One level higher, MVCC provides multi-version concurrency control on top of an Engine instance.
The Engine interface provides an API for key-value stores. InMem implements an in-memory engine by wrapping RocksDB configured for in-memory only storage. RocksDB implements an engine for data stored to local disk using RocksDB, a variant of LevelDB.
MVCC provides a multi-version concurrency control system on top of an engine. MVCC is the basis for Cockroach's support for distributed transactions. It is intended for direct use from storage.Range objects.
Notes on MVCC architecture ¶
Each MVCC value contains a metadata key/value pair and one or more version key/value pairs. The MVCC metadata key is the actual key for the value, binary encoded using the escape-based encoding described below. The MVCC metadata value is of type MVCCMetadata and contains the most recent version timestamp and an optional proto.Transaction message. If the transaction is set, the most recent version of the MVCC value is a transactional "intent". The metadata also records the sizes of the most recent version's key and value, for efficient stat counter computations.
Each MVCC version key/value pair has a key which is also binary-encoded, but is suffixed with a decreasing, big-endian encoding of the timestamp (8 bytes for the nanosecond wall time, followed by 4 bytes for the logical time). The MVCC version value is a message of type MVCCValue, which indicates whether the version is a deletion tombstone and, if not, contains a proto.Value object holding the actual value. The decreasing encoding on the timestamp sorts the most recent version directly after the metadata key. This increases the likelihood that an Engine.Get() of the MVCC metadata will fetch the same block that contains the most recent version, even if there are many versions. We rely on getting the MVCC metadata key/value and then using it to directly get the MVCC version using the metadata's most recent version timestamp. This avoids using an expensive merge iterator to scan the most recent version. It also allows us to leverage RocksDB's bloom filters.
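To make the ordering concrete, here is a minimal sketch of a decreasing, big-endian timestamp suffix. The helper encodeVersionSuffix is hypothetical, and storing the bitwise complement of each field is an assumed way to obtain the decreasing order, not necessarily what the package does internally.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeVersionSuffix is a hypothetical helper, not part of this package. It
// appends a 12-byte suffix to an already-encoded key: 8 bytes of wall time
// followed by 4 bytes of logical time, both big-endian. Storing the bitwise
// complement of each field makes larger timestamps produce smaller bytes, so
// newer versions sort first. Assumes wallNanos and logical are non-negative.
func encodeVersionSuffix(encodedKey []byte, wallNanos int64, logical int32) []byte {
	var suffix [12]byte
	binary.BigEndian.PutUint64(suffix[0:8], ^uint64(wallNanos))
	binary.BigEndian.PutUint32(suffix[8:12], ^uint32(logical))
	out := make([]byte, 0, len(encodedKey)+len(suffix))
	out = append(out, encodedKey...)
	return append(out, suffix[:]...)
}

func main() {
	meta := []byte("keyA\x00\x01") // stand-in for an encoded metadata key
	older := encodeVersionSuffix(meta, 100, 0)
	newer := encodeVersionSuffix(meta, 200, 0)

	// The metadata key sorts first, then the newest version, then older ones.
	fmt.Println(bytes.Compare(meta, newer) < 0, bytes.Compare(newer, older) < 0) // true true
}
```

Because the metadata key is a strict prefix of every version key, it always sorts immediately before them.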
The binary encoding used on the MVCC keys allows arbitrary keys to be stored in the map (no restrictions on intermediate nil-bytes, for example), while still sorting lexicographically and guaranteeing that all timestamp-suffixed MVCC version keys sort consecutively with the metadata key. We use an escape-based encoding which transforms all nul ("\x00") characters in the key and is terminated with the sequence "\x00\x01", which is guaranteed to not occur elsewhere in the encoded value. See util/encoding/encoding.go for more details.
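A minimal sketch of such an escape-based encoding follows. It assumes each 0x00 byte is escaped as 0x00 0xff; the paragraph above only fixes the 0x00 0x01 terminator, so util/encoding/encoding.go remains the authoritative reference. The function name encodeKey is hypothetical.

```go
package main

import "fmt"

// encodeKey is a hypothetical sketch of an escape-based key encoding. Every
// 0x00 byte in the key is written as 0x00 0xff (an assumed escape pair), and
// the result is terminated with 0x00 0x01.
func encodeKey(key []byte) []byte {
	out := make([]byte, 0, len(key)+2)
	for _, b := range key {
		if b == 0x00 {
			out = append(out, 0x00, 0xff)
		} else {
			out = append(out, b)
		}
	}
	return append(out, 0x00, 0x01)
}

func main() {
	// A key with an embedded nul byte is written in an unambiguous form.
	fmt.Printf("%q\n", encodeKey([]byte("a\x00b")))

	// "a" sorts before "a\x00" raw, and still does once encoded, because the
	// terminator 0x00 0x01 sorts below the escape pair 0x00 0xff.
	fmt.Println(string(encodeKey([]byte("a"))) < string(encodeKey([]byte("a\x00"))))
}
```

Because a literal 0x00 in the key is always followed by 0xff under this scheme, the first 0x00 0x01 pair unambiguously marks the end of the key, and a timestamp suffix appended after it keeps the version keys grouped directly after their metadata key.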
We considered inlining the most recent MVCC version in the MVCCMetadata. This would reduce the storage overhead of storing the same key twice (which is small due to block compression), and the runtime overhead of two separate DB lookups. On the other hand, all writes that create a new version of an existing key would incur a double write as the previous value is moved out of the MVCCMetadata into its versioned key. Preliminary benchmarks have not shown enough performance improvement to justify this change, although we may revisit this decision if it turns out that multiple versions of the same key are rare in practice.
However, we do allow inlining in order to use the MVCC interface to store non-versioned values. It turns out that not everything Cockroach needs to store would be efficient or possible using MVCC. Examples include transaction records, response cache entries, stats counters, time series data, and system-local config values. Supporting a mix of encodings, though, would add considerable complexity. So Cockroach treats an MVCC timestamp of zero as meaning an inlined, non-versioned value. Such values are replaced by a Put if they already exist and are cleared from the engine on a delete. Importantly, zero-timestamped MVCC values may be merged, as is necessary for stats counters and time series data.
Package engine is a generated protocol buffer package.
It is generated from these files:
- cockroach/storage/engine/mvcc.proto
It has these top-level messages:
- MVCCValue
- MVCCMetadata
- MVCCStats
Index ¶
- Variables
- func ClearRange(engine Engine, start, end proto.EncodedKey) (int, error)
- func Increment(engine Engine, key proto.EncodedKey, inc int64) (int64, error)
- func IsValidSplitKey(key proto.Key) bool
- func MVCCComputeGCBytesAge(bytes, ageSeconds int64) int64
- func MVCCConditionalPut(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, ...) error
- func MVCCDecodeKey(encodedKey proto.EncodedKey) (proto.Key, proto.Timestamp, bool)
- func MVCCDelete(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, ...) error
- func MVCCDeleteRange(engine Engine, ms *MVCCStats, key, endKey proto.Key, max int64, ...) (int64, error)
- func MVCCEncodeKey(key proto.Key) proto.EncodedKey
- func MVCCEncodeVersionKey(key proto.Key, timestamp proto.Timestamp) proto.EncodedKey
- func MVCCFindSplitKey(engine Engine, rangeID proto.RangeID, key, endKey proto.Key) (proto.Key, error)
- func MVCCGarbageCollect(engine Engine, ms *MVCCStats, keys []proto.GCRequest_GCKey, ...) error
- func MVCCGet(engine Engine, key proto.Key, timestamp proto.Timestamp, consistent bool, ...) (*proto.Value, []proto.Intent, error)
- func MVCCGetProto(engine Engine, key proto.Key, timestamp proto.Timestamp, consistent bool, ...) (bool, error)
- func MVCCGetRangeStats(engine Engine, rangeID proto.RangeID, ms *MVCCStats) error
- func MVCCIncrement(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, ...) (int64, error)
- func MVCCIterate(engine Engine, startKey, endKey proto.Key, timestamp proto.Timestamp, ...) ([]proto.Intent, error)
- func MVCCMerge(engine Engine, ms *MVCCStats, key proto.Key, value proto.Value) error
- func MVCCPut(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, ...) error
- func MVCCPutProto(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, ...) error
- func MVCCResolveWriteIntent(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, ...) error
- func MVCCResolveWriteIntentRange(engine Engine, ms *MVCCStats, key, endKey proto.Key, max int64, ...) (int64, error)
- func MVCCReverseScan(engine Engine, key, endKey proto.Key, max int64, timestamp proto.Timestamp, ...) ([]proto.KeyValue, []proto.Intent, error)
- func MVCCScan(engine Engine, key, endKey proto.Key, max int64, timestamp proto.Timestamp, ...) ([]proto.KeyValue, []proto.Intent, error)
- func MVCCSetRangeStats(engine Engine, rangeID proto.RangeID, ms *MVCCStats) error
- func MergeInternalTimeSeriesData(sources ...*proto.InternalTimeSeriesData) (*proto.InternalTimeSeriesData, error)
- func PutProto(engine Engine, key proto.EncodedKey, msg gogoproto.Message) (keyBytes, valBytes int64, err error)
- func Scan(engine Engine, start, end proto.EncodedKey, max int64) ([]proto.RawKeyValue, error)
- type Engine
- type GarbageCollector
- type InMem
- type Iterator
- type MVCCMetadata
- func (m *MVCCMetadata) GetDeleted() bool
- func (m *MVCCMetadata) GetKeyBytes() int64
- func (m *MVCCMetadata) GetTimestamp() cockroach_proto1.Timestamp
- func (m *MVCCMetadata) GetTxn() *cockroach_proto1.Transaction
- func (m *MVCCMetadata) GetValBytes() int64
- func (m *MVCCMetadata) GetValue() *cockroach_proto1.Value
- func (meta MVCCMetadata) HasWriteIntentError(txn *proto.Transaction) bool
- func (meta MVCCMetadata) IsInline() bool
- func (meta MVCCMetadata) IsIntentOf(txn *proto.Transaction) bool
- func (m *MVCCMetadata) Marshal() (data []byte, err error)
- func (m *MVCCMetadata) MarshalTo(data []byte) (int, error)
- func (*MVCCMetadata) ProtoMessage()
- func (m *MVCCMetadata) Reset()
- func (m *MVCCMetadata) Size() (n int)
- func (m *MVCCMetadata) String() string
- func (m *MVCCMetadata) Unmarshal(data []byte) error
- type MVCCStats
- func (ms *MVCCStats) Add(oms *MVCCStats)
- func (ms *MVCCStats) Delta(oms *MVCCStats) MVCCStats
- func (m *MVCCStats) GetGCBytesAge() int64
- func (m *MVCCStats) GetIntentAge() int64
- func (m *MVCCStats) GetIntentBytes() int64
- func (m *MVCCStats) GetIntentCount() int64
- func (m *MVCCStats) GetKeyBytes() int64
- func (m *MVCCStats) GetKeyCount() int64
- func (m *MVCCStats) GetLastUpdateNanos() int64
- func (m *MVCCStats) GetLiveBytes() int64
- func (m *MVCCStats) GetLiveCount() int64
- func (m *MVCCStats) GetSysBytes() int64
- func (m *MVCCStats) GetSysCount() int64
- func (m *MVCCStats) GetValBytes() int64
- func (m *MVCCStats) GetValCount() int64
- func (m *MVCCStats) Marshal() (data []byte, err error)
- func (m *MVCCStats) MarshalTo(data []byte) (int, error)
- func (*MVCCStats) ProtoMessage()
- func (m *MVCCStats) Reset()
- func (m *MVCCStats) Size() (n int)
- func (m *MVCCStats) String() string
- func (ms *MVCCStats) Subtract(oms *MVCCStats)
- func (m *MVCCStats) Unmarshal(data []byte) error
- type MVCCValue
- func (m *MVCCValue) GetDeleted() bool
- func (m *MVCCValue) GetValue() *cockroach_proto1.Value
- func (m *MVCCValue) Marshal() (data []byte, err error)
- func (m *MVCCValue) MarshalTo(data []byte) (int, error)
- func (*MVCCValue) ProtoMessage()
- func (m *MVCCValue) Reset()
- func (m *MVCCValue) Size() (n int)
- func (m *MVCCValue) String() string
- func (m *MVCCValue) Unmarshal(data []byte) error
- type RocksDB
- func (r *RocksDB) ApproximateSize(start, end proto.EncodedKey) (uint64, error)
- func (r *RocksDB) Attrs() proto.Attributes
- func (r *RocksDB) Capacity() (proto.StoreCapacity, error)
- func (r *RocksDB) Clear(key proto.EncodedKey) error
- func (r *RocksDB) Close()
- func (r *RocksDB) Commit() error
- func (r *RocksDB) CompactRange(start, end proto.EncodedKey)
- func (r *RocksDB) Defer(func())
- func (r *RocksDB) Destroy() error
- func (r *RocksDB) Flush() error
- func (r *RocksDB) Get(key proto.EncodedKey) ([]byte, error)
- func (r *RocksDB) GetProto(key proto.EncodedKey, msg gogoproto.Message) (ok bool, keyBytes, valBytes int64, err error)
- func (r *RocksDB) Iterate(start, end proto.EncodedKey, f func(proto.RawKeyValue) (bool, error)) error
- func (r *RocksDB) Merge(key proto.EncodedKey, value []byte) error
- func (r *RocksDB) NewBatch() Engine
- func (r *RocksDB) NewIterator() Iterator
- func (r *RocksDB) NewSnapshot() Engine
- func (r *RocksDB) Open() error
- func (r *RocksDB) Put(key proto.EncodedKey, value []byte) error
- func (r *RocksDB) SetGCTimeouts(minTxnTS, minRCacheTS int64)
- func (r *RocksDB) String() string
Constants ¶
This section is empty.
Variables ¶
var (
	ErrInvalidLengthMvcc = fmt.Errorf("proto: negative length found during unmarshaling")
)
var (
	// MVCCKeyMax is a maximum mvcc-encoded key value which sorts after
	// all other keys.
	MVCCKeyMax = MVCCEncodeKey(proto.KeyMax)
)
Functions ¶
func ClearRange ¶
func ClearRange(engine Engine, start, end proto.EncodedKey) (int, error)
ClearRange removes a set of entries, from start (inclusive) to end (exclusive), and returns the number of entries removed. The operation is all-or-nothing: either all entries within the range are deleted, or none are and an error is returned. Note that this function actually removes entries from the storage engine, rather than inserting tombstones as deletion through MVCC does.
func Increment ¶
Increment fetches the varint-encoded int64 value specified by key, adds inc to it, and re-encodes the result as a varint. The newly incremented value is returned.
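The read-modify-write cycle can be sketched with Go's encoding/binary varint helpers. This is an illustration only: a plain map stands in for the Engine, the function name increment is hypothetical, and treating a missing key as zero is an assumption.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// increment models Increment over a map: decode the stored varint (a
// missing key is assumed to count as zero), add inc, re-encode, and return
// the new value.
func increment(store map[string][]byte, key string, inc int64) (int64, error) {
	var cur int64
	if raw, ok := store[key]; ok {
		v, n := binary.Varint(raw)
		if n <= 0 {
			return 0, errors.New("stored value is not a valid varint")
		}
		cur = v
	}
	cur += inc
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, cur)
	store[key] = buf[:n]
	return cur, nil
}

func main() {
	store := map[string][]byte{}
	if _, err := increment(store, "counter", 5); err != nil {
		fmt.Println(err)
		return
	}
	v, err := increment(store, "counter", -2)
	fmt.Println(v, err) // 3 <nil>
}
```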
func IsValidSplitKey ¶
IsValidSplitKey returns whether the key is a valid split key. Certain key ranges cannot be split; split keys chosen within any of these ranges are considered invalid.
- \x00\x00meta1 < SplitKey < \x00\x00meta2
- \x00zone < SplitKey < \x00zonf
A split key equal to Meta2KeyMax (\x00\x00meta2\xff\xff) is also considered invalid.
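The rule reduces to exclusive-interval checks plus one equality check. The sketch below hard-codes the byte strings listed above rather than using the package's own key constants; keySpan, illegalSpans, meta2KeyMax and isValidSplitKey are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
)

// keySpan is an exclusive (start, end) interval that must not contain a
// split key.
type keySpan struct{ start, end []byte }

// illegalSpans and meta2KeyMax hard-code the byte strings listed above.
var illegalSpans = []keySpan{
	{[]byte("\x00\x00meta1"), []byte("\x00\x00meta2")},
	{[]byte("\x00zone"), []byte("\x00zonf")},
}

var meta2KeyMax = []byte("\x00\x00meta2\xff\xff")

// isValidSplitKey rejects keys equal to Meta2KeyMax and keys strictly
// inside any illegal span.
func isValidSplitKey(key []byte) bool {
	if bytes.Equal(key, meta2KeyMax) {
		return false
	}
	for _, s := range illegalSpans {
		if bytes.Compare(key, s.start) > 0 && bytes.Compare(key, s.end) < 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isValidSplitKey([]byte("\x00\x00meta1a")))   // false: inside the meta1 span
	fmt.Println(isValidSplitKey([]byte("\x00zone\x00user"))) // false: inside the zone span
	fmt.Println(isValidSplitKey([]byte("user-key")))         // true
}
```

Meta2KeyMax needs its own check because it sorts after \x00\x00meta2 and therefore falls outside the meta1 span.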
func MVCCComputeGCBytesAge ¶
MVCCComputeGCBytesAge computes the value to assign to the specified number of bytes, at the given age (in seconds).
func MVCCConditionalPut ¶
func MVCCConditionalPut(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, value proto.Value, expValue *proto.Value, txn *proto.Transaction) error
MVCCConditionalPut sets the value for a specified key only if the expected value matches. If not, it returns a ConditionFailedError containing the actual value.
The condition check reads a value from the key using the same operational timestamp as we use to write a value.
func MVCCDecodeKey ¶
MVCCDecodeKey decodes encodedKey by binary decoding the leading bytes of encodedKey. If there are no remaining bytes, returns the decoded key, an empty timestamp, and false, to indicate the key is for an MVCC metadata or a raw value. Otherwise, there must be exactly 12 trailing bytes and they're decoded into a timestamp. The decoded key, timestamp and true are returned to indicate the key is for an MVCC versioned value.
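This contract can be sketched as follows, assuming an escape scheme in which 0x00 is written as 0x00 0xff and the key ends with 0x00 0x01, and a 12-byte suffix that stores the complemented wall and logical times. decodeKey is a hypothetical name, and unlike the real signature this sketch returns an error for malformed input.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeKey splits an encoded key into the raw key and, if present, the
// 12-byte timestamp suffix (8 bytes wall time, 4 bytes logical time, each
// stored complemented so that newer versions sort first).
func decodeKey(enc []byte) (key []byte, wall int64, logical int32, versioned bool, err error) {
	i := 0
	for {
		if i+1 >= len(enc) {
			return nil, 0, 0, false, errors.New("missing 0x00 0x01 terminator")
		}
		if enc[i] != 0x00 {
			key = append(key, enc[i])
			i++
			continue
		}
		switch enc[i+1] {
		case 0xff: // escaped nul byte inside the key
			key = append(key, 0x00)
			i += 2
		case 0x01: // terminator: whatever follows is the timestamp suffix
			rest := enc[i+2:]
			switch len(rest) {
			case 0: // metadata key or raw value
				return key, 0, 0, false, nil
			case 12: // versioned value
				wall = int64(^binary.BigEndian.Uint64(rest[0:8]))
				logical = int32(^binary.BigEndian.Uint32(rest[8:12]))
				return key, wall, logical, true, nil
			default:
				return nil, 0, 0, false, errors.New("timestamp suffix must be exactly 12 bytes")
			}
		default:
			return nil, 0, 0, false, errors.New("invalid escape sequence")
		}
	}
}

func main() {
	// Encoded "a\x00b", followed by a suffix for wall time 7, logical 1.
	enc := []byte("a\x00\xffb\x00\x01")
	var ts [12]byte
	binary.BigEndian.PutUint64(ts[0:8], ^uint64(7))
	binary.BigEndian.PutUint32(ts[8:12], ^uint32(1))

	key, wall, logical, versioned, err := decodeKey(append(enc, ts[:]...))
	fmt.Printf("%q %d %d %v %v\n", key, wall, logical, versioned, err)
}
```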
func MVCCDelete ¶
func MVCCDelete(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, txn *proto.Transaction) error
MVCCDelete marks the key deleted so that it will not be returned in future get responses.
func MVCCDeleteRange ¶
func MVCCDeleteRange(engine Engine, ms *MVCCStats, key, endKey proto.Key, max int64, timestamp proto.Timestamp, txn *proto.Transaction) (int64, error)
MVCCDeleteRange deletes the range of key/value pairs specified by start and end keys. Specify max=0 for unbounded deletes.
func MVCCEncodeKey ¶
func MVCCEncodeKey(key proto.Key) proto.EncodedKey
MVCCEncodeKey makes an MVCC key for storing MVCC metadata or for storing raw values directly. Use MVCCEncodeVersionKey for storing timestamped version values.
func MVCCEncodeVersionKey ¶
MVCCEncodeVersionKey makes an MVCC version key, which consists of a binary-encoding of key, followed by a decreasing encoding of the timestamp, so that more recent versions sort first.
func MVCCFindSplitKey ¶
func MVCCFindSplitKey(engine Engine, rangeID proto.RangeID, key, endKey proto.Key) (proto.Key, error)
MVCCFindSplitKey suggests a split key from the given user-space key range that aims to divide the total number of bytes used (in raw key and value byte strings) roughly in half between the two subranges. Specify a snapshot engine to safely invoke this method in a goroutine.
The split key will never be chosen from the key ranges listed in illegalSplitKeySpans.
func MVCCGarbageCollect ¶
func MVCCGarbageCollect(engine Engine, ms *MVCCStats, keys []proto.GCRequest_GCKey, timestamp proto.Timestamp) error
MVCCGarbageCollect creates an iterator on the engine. In parallel, it iterates through the keys listed for garbage collection in the keys slice. The engine iterator is seeked in turn to each listed key, clearing all values with timestamps less than or equal to the expiration.
func MVCCGet ¶
func MVCCGet(engine Engine, key proto.Key, timestamp proto.Timestamp, consistent bool, txn *proto.Transaction) (*proto.Value, []proto.Intent, error)
MVCCGet returns the value for the key specified in the request, while satisfying the given timestamp condition. The key may contain arbitrary bytes. If no value for the key exists, or it has been deleted, returns nil for value.
The values of multiple versions for the given key should be organized as follows:
	...
	keyA : MVCCMetadata of keyA
	keyA_Timestamp_n : value of version_n
	keyA_Timestamp_n-1 : value of version_n-1
	...
	keyA_Timestamp_0 : value of version_0
	keyB : MVCCMetadata of keyB
	...
The consistent parameter indicates that intents should cause WriteIntentErrors. If set to false, a possible intent on the key will be ignored for reading the value (but returned via the proto.Intent slice); the previous value (if any) is read instead.
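The read rules can be illustrated with a simplified, in-memory model. Everything here is hypothetical: versions carry only a wall-time timestamp, a key has at most one intent (always its newest version), and the model does not attempt to reproduce every timestamp rule of the real implementation. It shows three outcomes: a consistent read that hits another transaction's intent fails, an inconsistent read skips that intent and reports it alongside the previous value, and a transaction can read its own intent.

```go
package main

import (
	"errors"
	"fmt"
)

// version is one versioned value (wall time only, for brevity).
type version struct {
	ts      int64
	value   string
	deleted bool
}

// mvccKey is a key's metadata plus its versions, newest first.
type mvccKey struct {
	intentTxn string // non-empty if the newest version is an intent
	versions  []version
}

var errWriteIntent = errors.New("conflicting write intent")

// get returns the value visible at readTS, whether one was found, and the
// txn of any intent that an inconsistent read skipped.
func get(k mvccKey, readTS int64, consistent bool, txn string) (value string, found bool, skippedIntent string, err error) {
	vs := k.versions
	if k.intentTxn != "" && k.intentTxn != txn && len(vs) > 0 {
		if consistent {
			return "", false, "", errWriteIntent
		}
		skippedIntent = k.intentTxn
		vs = vs[1:] // ignore the intent and read the previous version
	}
	for _, v := range vs {
		if v.ts > readTS {
			continue // too new to be visible at readTS
		}
		if v.deleted {
			return "", false, skippedIntent, nil
		}
		return v.value, true, skippedIntent, nil
	}
	return "", false, skippedIntent, nil
}

func main() {
	// Txn "t2" holds an intent (ts 20) on top of a committed value (ts 10).
	k := mvccKey{intentTxn: "t2", versions: []version{{ts: 20, value: "new"}, {ts: 10, value: "old"}}}

	_, _, _, err := get(k, 30, true, "")
	fmt.Println(err) // conflicting write intent

	v, found, intent, _ := get(k, 30, false, "")
	fmt.Println(v, found, intent) // old true t2
}
```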
func MVCCGetProto ¶
func MVCCGetProto(engine Engine, key proto.Key, timestamp proto.Timestamp, consistent bool, txn *proto.Transaction, msg gogoproto.Message) (bool, error)
MVCCGetProto fetches the value at the specified key and unmarshals it using a protobuf decoder. Returns true on success or false if the key was not found. In the event of a WriteIntentError when consistent=false, we return the error and the decoded result; for all other errors (or when consistent=true) the decoded value is invalid.
func MVCCGetRangeStats ¶
MVCCGetRangeStats reads stat counters for the specified range and sets the values in the supplied MVCCStats struct.
func MVCCIncrement ¶
func MVCCIncrement(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, txn *proto.Transaction, inc int64) (int64, error)
MVCCIncrement fetches the value for key, and assuming the value is an "integer" type, increments it by inc and stores the new value. The newly incremented value is returned.
An initial value is read from the key using the same operational timestamp as we use to write a value.
func MVCCIterate ¶
func MVCCIterate(engine Engine, startKey, endKey proto.Key, timestamp proto.Timestamp, consistent bool, txn *proto.Transaction, reverse bool, f func(proto.KeyValue) (bool, error)) ([]proto.Intent, error)
MVCCIterate iterates over the key range [startKey, endKey). At each step of the iteration, f() is invoked with the current key/value pair. If f returns true (done) or an error, the iteration stops and the error is propagated. If the reverse flag is set, the range is iterated in reverse order.
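The callback contract can be sketched over a plain Go map. The names iterate, kv and errCallback are hypothetical; the sketch sorts keys in memory instead of using an engine iterator, but it follows the same rules: the range is half-open, returning true stops early, and an error from f is returned to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// kv is a hypothetical key/value pair passed to the callback.
type kv struct{ key, value string }

// errCallback is a hypothetical error a callback might return.
var errCallback = errors.New("callback failed")

// iterate visits keys in [start, end) of data, in ascending order or, if
// reverse is set, descending order. It stops when f returns true or an error,
// and returns that error to the caller.
func iterate(data map[string]string, start, end string, reverse bool, f func(kv) (bool, error)) error {
	keys := make([]string, 0, len(data))
	for k := range data {
		if k >= start && k < end {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if reverse {
		sort.Sort(sort.Reverse(sort.StringSlice(keys)))
	}
	for _, k := range keys {
		done, err := f(kv{k, data[k]})
		if err != nil {
			return err
		}
		if done {
			return nil
		}
	}
	return nil
}

func main() {
	data := map[string]string{"a": "1", "b": "2", "c": "3", "d": "4"}
	var seen []string
	err := iterate(data, "a", "d", true, func(e kv) (bool, error) {
		seen = append(seen, e.key)
		return len(seen) == 2, nil // stop after two entries
	})
	fmt.Println(seen, err) // [c b] <nil>
}
```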
func MVCCMerge ¶
MVCCMerge implements a merge operation. Merge adds integer values, concatenates undifferentiated byte slice values, and efficiently combines time series observations if the proto.Value tag value indicates the value byte slice is of type _CR_TS (the internal cockroach time series data tag).
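Two of the merge rules (integer summing and byte concatenation) can be sketched as below. The mergeValue type stands in for proto.Value and is hypothetical, as is rejecting operands of mismatched types; time series merging, which the package implements in C++, is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// mergeValue is a hypothetical, simplified stand-in for proto.Value holding
// either an integer or a byte slice.
type mergeValue struct {
	isInt bool
	i     int64
	b     []byte
}

// merge folds operand into existing (nil means the key has no value yet):
// integers are summed and byte slices are concatenated in merge order.
func merge(existing *mergeValue, operand mergeValue) (mergeValue, error) {
	if existing == nil {
		return operand, nil
	}
	if existing.isInt != operand.isInt {
		return mergeValue{}, errors.New("cannot merge values of different types")
	}
	if operand.isInt {
		return mergeValue{isInt: true, i: existing.i + operand.i}, nil
	}
	// Copy so the result does not alias existing.b's backing array.
	b := append(append([]byte(nil), existing.b...), operand.b...)
	return mergeValue{b: b}, nil
}

func main() {
	var acc *mergeValue
	for _, inc := range []int64{3, 4, 5} {
		m, err := merge(acc, mergeValue{isInt: true, i: inc})
		if err != nil {
			fmt.Println(err)
			return
		}
		acc = &m
	}
	fmt.Println(acc.i) // 12
}
```

Summing makes merge a cheap accumulator: callers never need to read the current value before adding to it.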
func MVCCPut ¶
func MVCCPut(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, value proto.Value, txn *proto.Transaction) error
MVCCPut sets the value for a specified key. The value is saved as a new version according to its timestamp, and the key metadata is updated.
If the timestamp is specified as proto.ZeroTimestamp, the value is inlined instead of being written as a timestamp-versioned value. A zero timestamp write to a key precludes a subsequent write using a non-zero timestamp and vice versa. Inlined values require only a single row and never accumulate more than a single value. Successive zero timestamp writes to a key replace the value, and deletes clear the value. In addition, zero timestamp values may be merged.
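A small in-memory model of these rules follows. The record type, the put function, and the choice to report a mixed zero/non-zero write as an error are assumptions made for illustration; the paragraph above says only that such writes are precluded.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical model of one key: either a single inlined value
// (written at timestamp zero) or a set of timestamped versions.
type record struct {
	inline   *string
	versions map[int64]string
}

var errMixedTimestamps = errors.New("cannot mix zero and non-zero timestamp writes on one key")

// put models the rules above: ts == 0 replaces the inline value in place,
// ts != 0 adds a version, and mixing the two on one key is rejected.
func put(store map[string]*record, key string, ts int64, value string) error {
	r, ok := store[key]
	if !ok {
		r = &record{}
		store[key] = r
	}
	if ts == 0 {
		if len(r.versions) > 0 {
			return errMixedTimestamps
		}
		r.inline = &value
		return nil
	}
	if r.inline != nil {
		return errMixedTimestamps
	}
	if r.versions == nil {
		r.versions = map[int64]string{}
	}
	r.versions[ts] = value
	return nil
}

func main() {
	store := map[string]*record{}
	_ = put(store, "stats", 0, "v1")
	_ = put(store, "stats", 0, "v2")          // replaces the inline value
	fmt.Println(*store["stats"].inline)       // v2
	fmt.Println(put(store, "stats", 5, "v3")) // mixing is rejected
}
```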
func MVCCPutProto ¶
func MVCCPutProto(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, txn *proto.Transaction, msg gogoproto.Message) error
MVCCPutProto sets the given key, at the provided timestamp, to the protobuf-serialized byte string of msg.
func MVCCResolveWriteIntent ¶
func MVCCResolveWriteIntent(engine Engine, ms *MVCCStats, key proto.Key, timestamp proto.Timestamp, txn *proto.Transaction) error
MVCCResolveWriteIntent either commits or aborts (rolls back) an extant write intent for the given txn, as determined by that txn. ResolveWriteIntent will skip write intents of other txns.
Transaction epochs deserve a bit of explanation. The epoch for a transaction is incremented on transaction retry. Transaction retry is different from abort. Retries occur in SSI transactions when the commit timestamp is not equal to the proposed transaction timestamp. This might be because writes to different keys had to use higher timestamps than expected because of existing, committed values, or because reads pushed the transaction's commit timestamp forward. Retries also occur in the event that the txn tries to push another txn in order to write an intent but fails (i.e. it has lower priority).
Because successive retries of a transaction may end up writing to different keys, the epochs serve to classify which intents get committed in the event the transaction succeeds (all those with epoch matching the commit epoch), and which intents get aborted, even if the transaction succeeds.
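The epoch rule reduces to a simple partition of the transaction's intents. This sketch is hypothetical (intent and resolveOnCommit are not package names) and models only the case where the transaction commits.

```go
package main

import "fmt"

// intent is a hypothetical provisional write tagged with the transaction
// epoch that wrote it.
type intent struct {
	key   string
	epoch int32
}

// resolveOnCommit partitions intents when the transaction commits at
// commitEpoch: intents from that epoch are committed, and intents left over
// from earlier epochs (earlier retries) are aborted.
func resolveOnCommit(intents []intent, commitEpoch int32) (committed, aborted []string) {
	for _, in := range intents {
		if in.epoch == commitEpoch {
			committed = append(committed, in.key)
		} else {
			aborted = append(aborted, in.key)
		}
	}
	return committed, aborted
}

func main() {
	// Epoch 0 wrote a; after a retry, epoch 1 wrote b and c.
	c, a := resolveOnCommit([]intent{{"a", 0}, {"b", 1}, {"c", 1}}, 1)
	fmt.Println(c, a) // [b c] [a]
}
```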
func MVCCResolveWriteIntentRange ¶
func MVCCResolveWriteIntentRange(engine Engine, ms *MVCCStats, key, endKey proto.Key, max int64, timestamp proto.Timestamp, txn *proto.Transaction) (int64, error)
MVCCResolveWriteIntentRange commits or aborts (rolls back) the range of write intents specified by start and end keys for a given txn. ResolveWriteIntentRange will skip write intents of other txns. Specify max=0 for unbounded resolves.
func MVCCReverseScan ¶
func MVCCReverseScan(engine Engine, key, endKey proto.Key, max int64, timestamp proto.Timestamp, consistent bool, txn *proto.Transaction) ([]proto.KeyValue, []proto.Intent, error)
MVCCReverseScan scans the key range [key, endKey) and returns up to max results in descending order. Specify max=0 for unbounded scans.
func MVCCScan ¶
func MVCCScan(engine Engine, key, endKey proto.Key, max int64, timestamp proto.Timestamp, consistent bool, txn *proto.Transaction) ([]proto.KeyValue, []proto.Intent, error)
MVCCScan scans the key range [key, endKey) and returns up to max results in ascending order. Specify max=0 for unbounded scans.
func MVCCSetRangeStats ¶
MVCCSetRangeStats sets stat counters for the specified range.
func MergeInternalTimeSeriesData ¶
func MergeInternalTimeSeriesData(sources ...*proto.InternalTimeSeriesData) (*proto.InternalTimeSeriesData, error)
MergeInternalTimeSeriesData exports the engine's C++ merge logic for InternalTimeSeriesData to higher level packages. This is intended primarily for consumption by high level testing of time series functionality.
func PutProto ¶
func PutProto(engine Engine, key proto.EncodedKey, msg gogoproto.Message) (keyBytes, valBytes int64, err error)
PutProto sets the given key to the protobuf-serialized byte string of msg. It returns the lengths in bytes of the key and the value.
func Scan ¶
func Scan(engine Engine, start, end proto.EncodedKey, max int64) ([]proto.RawKeyValue, error)
Scan returns up to max key/value objects starting from start (inclusive) and ending at end (non-inclusive). Specify max=0 for unbounded scans.
Types ¶
type Engine ¶
type Engine interface {
	// Open initializes the engine.
	Open() error
	// Close closes the engine, freeing up any outstanding resources.
	Close()
	// Attrs returns the engine/store attributes.
	Attrs() proto.Attributes
	// Put sets the given key to the value provided.
	Put(key proto.EncodedKey, value []byte) error
	// Get returns the value for the given key, nil otherwise.
	Get(key proto.EncodedKey) ([]byte, error)
	// GetProto fetches the value at the specified key and unmarshals it
	// using a protobuf decoder. Returns true on success or false if the
	// key was not found. On success, returns the length in bytes of the
	// key and the value.
	GetProto(key proto.EncodedKey, msg gogoproto.Message) (ok bool, keyBytes, valBytes int64, err error)
	// Iterate scans from start to end keys. On each key value pair, the
	// function f is invoked. If f returns an error or if the scan itself
	// encounters an error, the iteration will stop and return the error.
	// If the first result of f is true, the iteration stops.
	Iterate(start, end proto.EncodedKey, f func(proto.RawKeyValue) (bool, error)) error
	// Clear removes the item from the db with the given key.
	// Note that clear actually removes entries from the storage
	// engine, rather than inserting tombstones.
	Clear(key proto.EncodedKey) error
	// Merge is a high-performance write operation used for values which are
	// accumulated over several writes. Multiple values can be merged
	// sequentially into a single key; a subsequent read will return a "merged"
	// value which is computed from the original merged values.
	//
	// Merge currently provides specialized behavior for three data types:
	// integers, byte slices, and time series observations. Merged integers are
	// summed, acting as a high-performance accumulator. Byte slices are simply
	// concatenated in the order they are merged. Time series observations
	// (stored as byte slices with a special tag on the proto.Value) are
	// combined with specialized logic beyond that of simple byte slices.
	//
	// The logic for merges is written in db.cc in order to be compatible with RocksDB.
	Merge(key proto.EncodedKey, value []byte) error
	// Capacity returns capacity details for the engine's available storage.
	Capacity() (proto.StoreCapacity, error)
	// SetGCTimeouts sets timeout values for GC of transaction and
	// response cache entries. The values are specified in unix
	// time in nanoseconds for the minimum transaction row timestamp and
	// the minimum response cache row timestamp respectively. Rows
	// with timestamps less than the associated value will be GC'd
	// during compaction.
	SetGCTimeouts(minTxnTS, minRCacheTS int64)
	// ApproximateSize returns the approximate number of bytes the engine is
	// using to store data for the given range of keys.
	ApproximateSize(start, end proto.EncodedKey) (uint64, error)
	// Flush causes the engine to write all in-memory data to disk
	// immediately.
	Flush() error
	// NewIterator returns a new instance of an Iterator over this
	// engine. The caller must invoke Iterator.Close() when finished with
	// the iterator to free resources.
	NewIterator() Iterator
	// NewSnapshot returns a new instance of a read-only snapshot
	// engine. Snapshots are instantaneous and, as long as they're
	// released relatively quickly, inexpensive. Snapshots are released
	// by invoking Close(). Note that snapshots must not be used after the
	// original engine has been stopped.
	NewSnapshot() Engine
	// NewBatch returns a new instance of a batched engine which wraps
	// this engine. Batched engines accumulate all mutations and apply
	// them atomically on a call to Commit().
	NewBatch() Engine
	// Commit atomically applies any batched updates to the underlying
	// engine. This is a noop unless the engine was created via NewBatch().
	Commit() error
	// Defer adds a callback to be run after the batch commits
	// successfully. If Commit() fails (or if this engine was not
	// created via NewBatch()), deferred callbacks are not called. As
	// with the defer statement, the last callback to be deferred is the
	// first to be executed.
	Defer(fn func())
}
Engine is the interface that wraps the core operations of a key/value store.
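The batch contract documented on NewBatch, Commit, and Defer can be modeled in a few lines. The batch type and its fail flag are hypothetical stand-ins for an engine returned by NewBatch(); the sketch only shows the ordering rules: writes reach the underlying store on Commit, deferred callbacks run after a successful Commit in last-in, first-out order, and a failed Commit runs none of them.

```go
package main

import (
	"errors"
	"fmt"
)

// batch is a hypothetical stand-in for an engine returned by NewBatch().
type batch struct {
	target   map[string]string // the underlying "engine"
	pending  map[string]string // buffered writes
	deferred []func()          // callbacks registered via Defer
	fail     bool              // simulates a failing Commit, for illustration
}

// Put buffers a write; nothing reaches target until Commit.
func (b *batch) Put(key, value string) { b.pending[key] = value }

// Defer registers fn to run after a successful Commit.
func (b *batch) Defer(fn func()) { b.deferred = append(b.deferred, fn) }

// Commit applies the buffered writes and then runs deferred callbacks in
// last-in, first-out order. A failed Commit applies nothing and runs no
// callbacks.
func (b *batch) Commit() error {
	if b.fail {
		return errors.New("commit failed")
	}
	for k, v := range b.pending {
		b.target[k] = v
	}
	for i := len(b.deferred) - 1; i >= 0; i-- {
		b.deferred[i]()
	}
	return nil
}

func main() {
	db := map[string]string{}
	b := &batch{target: db, pending: map[string]string{}}
	b.Put("k", "v")
	b.Defer(func() { fmt.Println("registered first, runs second") })
	b.Defer(func() { fmt.Println("registered second, runs first") })
	if err := b.Commit(); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(db["k"]) // v
}
```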
type GarbageCollector ¶
type GarbageCollector struct {
// contains filtered or unexported fields
}
GarbageCollector GCs MVCC key/values using a zone-specific GC policy, which allows either the union or intersection of a maximum number of versions and a maximum age.
func NewGarbageCollector ¶
func NewGarbageCollector(now proto.Timestamp, policy config.GCPolicy) *GarbageCollector
NewGarbageCollector allocates and returns a new GC, with expiration computed based on current time and policy.TTLSeconds.
func (*GarbageCollector) Filter ¶
func (gc *GarbageCollector) Filter(keys []proto.EncodedKey, values [][]byte) proto.Timestamp
Filter makes decisions about garbage collection based on the garbage collection policy for batches of values for the same key. It returns a timestamp such that the value at that timestamp, and all values following it in the batch (that is, older versions), should be garbage collected. If no values should be GC'd, it returns proto.ZeroTimestamp.
type InMem ¶
type InMem struct {
*RocksDB
}
InMem wraps RocksDB and configures it for in-memory only storage.
type Iterator ¶
type Iterator interface {
	// Close frees up resources held by the iterator.
	Close()
	// Seek advances the iterator to the first key in the engine which
	// is >= the provided key.
	Seek(key []byte)
	// SeekReverse advances the iterator to the first key in the engine which
	// is <= the provided key.
	SeekReverse(key []byte)
	// Valid returns true if the iterator is currently valid. An
	// iterator which hasn't been seeked or has gone past the end of the
	// key range is invalid.
	Valid() bool
	// Next advances the iterator to the next key/value in the
	// iteration. After this call, Valid() will be true if the
	// iterator was not positioned at the last key.
	Next()
	// Prev moves the iterator backward to the previous key/value
	// in the iteration. After this call, Valid() will be true if the
	// iterator was not positioned at the first key.
	Prev()
	// Key returns the current key as a byte slice.
	Key() proto.EncodedKey
	// Value returns the current value as a byte slice.
	Value() []byte
	// ValueProto unmarshals the value the iterator is currently
	// pointing to using a protobuf decoder.
	ValueProto(msg gogoproto.Message) error
	// Error returns the error, if any, which the iterator encountered.
	Error() error
}
Iterator is an interface for iterating over key/value pairs in an engine. Iterator implementations are thread safe unless otherwise noted.
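The usual way to drive an Iterator is to Seek, loop on Valid and Next, check Error after the loop, and always call Close. The sketch below uses a hypothetical slice-backed type that implements only the subset of methods the loop needs (its Key returns []byte rather than proto.EncodedKey); collectRange is likewise a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// iterator mirrors the subset of the Iterator interface used below.
type iterator interface {
	Seek(key []byte)
	Valid() bool
	Next()
	Key() []byte
	Error() error
	Close()
}

// sliceIter is a hypothetical in-memory iterator over sorted keys.
type sliceIter struct {
	keys [][]byte
	pos  int // -1 until the first Seek, so an unseeked iterator is invalid
}

func newSliceIter(keys []string) *sliceIter {
	it := &sliceIter{pos: -1}
	for _, k := range keys {
		it.keys = append(it.keys, []byte(k))
	}
	sort.Slice(it.keys, func(i, j int) bool { return bytes.Compare(it.keys[i], it.keys[j]) < 0 })
	return it
}

// Seek positions the iterator at the first key >= key.
func (it *sliceIter) Seek(key []byte) {
	it.pos = sort.Search(len(it.keys), func(i int) bool { return bytes.Compare(it.keys[i], key) >= 0 })
}
func (it *sliceIter) Valid() bool  { return it.pos >= 0 && it.pos < len(it.keys) }
func (it *sliceIter) Next()        { it.pos++ }
func (it *sliceIter) Key() []byte  { return it.keys[it.pos] }
func (it *sliceIter) Error() error { return nil }
func (it *sliceIter) Close()       {}

// collectRange is the typical loop: Seek to start, stop at end or when the
// iterator becomes invalid, report Error, and always Close.
func collectRange(it iterator, start, end []byte) ([]string, error) {
	defer it.Close()
	var out []string
	for it.Seek(start); it.Valid(); it.Next() {
		if bytes.Compare(it.Key(), end) >= 0 {
			break
		}
		out = append(out, string(it.Key()))
	}
	return out, it.Error()
}

func main() {
	it := newSliceIter([]string{"d", "a", "c", "b"})
	keys, err := collectRange(it, []byte("b"), []byte("d"))
	fmt.Println(keys, err) // [b c] <nil>
}
```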
type MVCCMetadata ¶
type MVCCMetadata struct {
	Txn *cockroach_proto1.Transaction `protobuf:"bytes,1,opt,name=txn" json:"txn,omitempty"`
	// The timestamp of the most recent versioned value.
	Timestamp cockroach_proto1.Timestamp `protobuf:"bytes,2,opt,name=timestamp" json:"timestamp"`
	// Is the most recent value a deletion tombstone?
	Deleted bool `protobuf:"varint,3,opt,name=deleted" json:"deleted"`
	// The size in bytes of the most recent encoded key.
	KeyBytes int64 `protobuf:"varint,4,opt,name=key_bytes" json:"key_bytes"`
	// The size in bytes of the most recent versioned value.
	ValBytes int64 `protobuf:"varint,5,opt,name=val_bytes" json:"val_bytes"`
	// Inline value, used for values with zero timestamp. This provides
	// an efficient short circuit of the normal MVCC metadata sentinel
	// and subsequent version rows. If timestamp == (0, 0), then there
	// is only a single MVCC metadata row with value inlined, and with
	// empty timestamp, key_bytes, and val_bytes.
	Value *cockroach_proto1.Value `protobuf:"bytes,6,opt,name=value" json:"value,omitempty"`
}
MVCCMetadata holds MVCC metadata for a key. Used by storage/engine/mvcc.go.
func (*MVCCMetadata) GetDeleted ¶
func (m *MVCCMetadata) GetDeleted() bool
func (*MVCCMetadata) GetKeyBytes ¶
func (m *MVCCMetadata) GetKeyBytes() int64
func (*MVCCMetadata) GetTimestamp ¶
func (m *MVCCMetadata) GetTimestamp() cockroach_proto1.Timestamp
func (*MVCCMetadata) GetTxn ¶
func (m *MVCCMetadata) GetTxn() *cockroach_proto1.Transaction
func (*MVCCMetadata) GetValBytes ¶
func (m *MVCCMetadata) GetValBytes() int64
func (*MVCCMetadata) GetValue ¶
func (m *MVCCMetadata) GetValue() *cockroach_proto1.Value
func (MVCCMetadata) HasWriteIntentError ¶
func (meta MVCCMetadata) HasWriteIntentError(txn *proto.Transaction) bool
HasWriteIntentError returns whether the metadata has an open intent which has not been laid down by the given transaction (which may be nil).
func (MVCCMetadata) IsInline ¶
func (meta MVCCMetadata) IsInline() bool
IsInline returns true if the value is inlined in the metadata.
func (MVCCMetadata) IsIntentOf ¶
func (meta MVCCMetadata) IsIntentOf(txn *proto.Transaction) bool
IsIntentOf returns true if the meta record is an intent of the supplied transaction.
func (*MVCCMetadata) Marshal ¶
func (m *MVCCMetadata) Marshal() (data []byte, err error)
func (*MVCCMetadata) ProtoMessage ¶
func (*MVCCMetadata) ProtoMessage()
func (*MVCCMetadata) Reset ¶
func (m *MVCCMetadata) Reset()
func (*MVCCMetadata) Size ¶
func (m *MVCCMetadata) Size() (n int)
func (*MVCCMetadata) String ¶
func (m *MVCCMetadata) String() string
func (*MVCCMetadata) Unmarshal ¶
func (m *MVCCMetadata) Unmarshal(data []byte) error
type MVCCStats ¶
type MVCCStats struct {
	LiveBytes       int64 `protobuf:"varint,1,opt,name=live_bytes" json:"live_bytes"`
	KeyBytes        int64 `protobuf:"varint,2,opt,name=key_bytes" json:"key_bytes"`
	ValBytes        int64 `protobuf:"varint,3,opt,name=val_bytes" json:"val_bytes"`
	IntentBytes     int64 `protobuf:"varint,4,opt,name=intent_bytes" json:"intent_bytes"`
	LiveCount       int64 `protobuf:"varint,5,opt,name=live_count" json:"live_count"`
	KeyCount        int64 `protobuf:"varint,6,opt,name=key_count" json:"key_count"`
	ValCount        int64 `protobuf:"varint,7,opt,name=val_count" json:"val_count"`
	IntentCount     int64 `protobuf:"varint,8,opt,name=intent_count" json:"intent_count"`
	IntentAge       int64 `protobuf:"varint,9,opt,name=intent_age" json:"intent_age"`
	GCBytesAge      int64 `protobuf:"varint,10,opt,name=gc_bytes_age" json:"gc_bytes_age"`
	SysBytes        int64 `protobuf:"varint,12,opt,name=sys_bytes" json:"sys_bytes"`
	SysCount        int64 `protobuf:"varint,13,opt,name=sys_count" json:"sys_count"`
	LastUpdateNanos int64 `protobuf:"varint,30,opt,name=last_update_nanos" json:"last_update_nanos"`
}
MVCCStats tracks byte and instance counts for:
- Live key/values (i.e. what a scan at the current time will reveal; note that this includes intent keys and values, but not keys and values whose most recent version is deleted)
- Key bytes (includes all keys, even those whose most recent version is deleted)
- Value bytes (includes all versions)
- Key count (count of all keys, including keys whose most recent version is a deletion tombstone)
- Value count (all versions, including deletion tombstones)
- Intents (provisional values written during txns)
- System-local key counts and byte totals
func MVCCComputeStats ¶
MVCCComputeStats scans the underlying engine from start to end keys and computes stats counters based on the values. This method is used after a range is split to recompute stats for each subrange. The start key is always adjusted to avoid counting local keys in the event stats are being recomputed for the first range (i.e. the one with start key == KeyMin). The nowNanos arg specifies the wall time in nanoseconds since the epoch and is used to compute the total age of all intents.
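The counter definitions are easier to follow with a toy stats computation over an in-memory list of keys and versions. The `Version`, `Entry`, and `Stats` types, the `Intent` flag, and measuring intent age in nanoseconds are all assumptions of this sketch. It also counts each logical key once and ignores the per-version key suffixes and metadata bytes that the real accounting includes.

```go
package main

import "fmt"

// Version is a simplified, hypothetical MVCC version: a wall-clock
// timestamp, the value bytes, and a deletion-tombstone flag.
type Version struct {
	WallNanos int64
	Value     []byte
	Deleted   bool
}

// Entry is one logical key with its versions ordered newest first.
// Intent marks the newest version as an uncommitted intent.
type Entry struct {
	Key      []byte
	Versions []Version
	Intent   bool
}

// Stats holds a subset of the MVCCStats counters.
type Stats struct {
	LiveBytes, KeyBytes, ValBytes, IntentBytes int64
	LiveCount, KeyCount, ValCount, IntentCount int64
	IntentAge                                  int64 // nanoseconds in this sketch
}

// computeStats walks entries (assumed already restricted to the key range)
// and accumulates counters following the MVCCStats field descriptions.
func computeStats(entries []Entry, nowNanos int64) Stats {
	var s Stats
	for _, e := range entries {
		if len(e.Versions) == 0 {
			continue
		}
		keyLen := int64(len(e.Key))
		s.KeyBytes += keyLen // all keys, even if the newest version is deleted
		s.KeyCount++
		for _, v := range e.Versions {
			s.ValBytes += int64(len(v.Value)) // every version, tombstones included
			s.ValCount++
		}
		newest := e.Versions[0]
		if !newest.Deleted { // live: visible to a scan at the current time
			s.LiveBytes += keyLen + int64(len(newest.Value))
			s.LiveCount++
		}
		if e.Intent {
			s.IntentBytes += keyLen + int64(len(newest.Value))
			s.IntentCount++
			s.IntentAge += nowNanos - newest.WallNanos
		}
	}
	return s
}

// sampleEntries returns a live key with two versions, a deleted key, and a
// key whose newest version is an intent.
func sampleEntries() []Entry {
	return []Entry{
		{Key: []byte("a"), Versions: []Version{{20, []byte("v2"), false}, {10, []byte("v1"), false}}},
		{Key: []byte("b"), Versions: []Version{{15, nil, true}, {5, []byte("x"), false}}},
		{Key: []byte("c"), Versions: []Version{{30, []byte("new"), false}}, Intent: true},
	}
}

func main() {
	fmt.Printf("%+v\n", computeStats(sampleEntries(), 100))
}
```

In this fixture the intent on key `c` also counts as live, matching the note that live totals include intent keys and values.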
func (*MVCCStats) GetGCBytesAge ¶
func (*MVCCStats) GetIntentAge ¶
func (*MVCCStats) GetIntentBytes ¶
func (*MVCCStats) GetIntentCount ¶
func (*MVCCStats) GetKeyBytes ¶
func (*MVCCStats) GetKeyCount ¶
func (*MVCCStats) GetLastUpdateNanos ¶
func (*MVCCStats) GetLiveBytes ¶
func (*MVCCStats) GetLiveCount ¶
func (*MVCCStats) GetSysBytes ¶
func (*MVCCStats) GetSysCount ¶
func (*MVCCStats) GetValBytes ¶
func (*MVCCStats) GetValCount ¶
func (*MVCCStats) ProtoMessage ¶
func (*MVCCStats) ProtoMessage()
type MVCCValue ¶
type MVCCValue struct {
	// True to indicate a deletion tombstone. If false, value should not
	// be nil.
	Deleted bool `protobuf:"varint,1,opt,name=deleted" json:"deleted"`
	// The value. Nil if deleted is true; not nil otherwise.
	Value *cockroach_proto1.Value `protobuf:"bytes,2,opt,name=value" json:"value,omitempty"`
}
MVCCValue differentiates between normal versioned values and deletion tombstones.
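The invariant stated in the `MVCCValue` field comments can be checked mechanically. In this sketch, `Value` is a hypothetical stand-in for `cockroach_proto1.Value`, and `validate` is an illustrative helper that is not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical stand-in for cockroach_proto1.Value.
type Value struct {
	Bytes []byte
}

// MVCCValue mirrors the two documented fields.
type MVCCValue struct {
	Deleted bool
	Value   *Value
}

// validate enforces the documented invariant: a tombstone carries no value,
// and a non-deleted version must carry one.
func validate(v MVCCValue) error {
	switch {
	case v.Deleted && v.Value != nil:
		return errors.New("tombstone must not carry a value")
	case !v.Deleted && v.Value == nil:
		return errors.New("non-deleted version must carry a value")
	}
	return nil
}

func main() {
	fmt.Println(validate(MVCCValue{Value: &Value{Bytes: []byte("x")}})) // <nil>
	fmt.Println(validate(MVCCValue{Deleted: true}))                    // <nil>
	fmt.Println(validate(MVCCValue{}))                                 // non-deleted version must carry a value
}
```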
func (*MVCCValue) GetDeleted ¶
func (*MVCCValue) GetValue ¶
func (m *MVCCValue) GetValue() *cockroach_proto1.Value
func (*MVCCValue) ProtoMessage ¶
func (*MVCCValue) ProtoMessage()
type RocksDB ¶
type RocksDB struct {
// contains filtered or unexported fields
}
RocksDB is a wrapper around a RocksDB database instance.
func NewRocksDB ¶
func NewRocksDB(attrs proto.Attributes, dir string, cacheSize int64) *RocksDB
NewRocksDB allocates and returns a new RocksDB object.
func (*RocksDB) ApproximateSize ¶
func (r *RocksDB) ApproximateSize(start, end proto.EncodedKey) (uint64, error)
ApproximateSize returns the approximate number of bytes on disk that RocksDB is using to store data for the given range of keys.
func (*RocksDB) Attrs ¶
func (r *RocksDB) Attrs() proto.Attributes
Attrs returns the list of attributes describing this engine. This may include a specification of disk type (e.g. hdd, ssd, fio, etc.) and potentially other labels to identify important attributes of the engine.
func (*RocksDB) Capacity ¶
func (r *RocksDB) Capacity() (proto.StoreCapacity, error)
Capacity queries the underlying file system for disk capacity information.
func (*RocksDB) Clear ¶
func (r *RocksDB) Clear(key proto.EncodedKey) error
Clear removes the item with the given key from the db.
func (*RocksDB) Close ¶
func (r *RocksDB) Close()
Close closes the database by deallocating the underlying handle.
func (*RocksDB) CompactRange ¶
func (r *RocksDB) CompactRange(start, end proto.EncodedKey)
CompactRange compacts the specified key range. Specifying nil for the start key starts the compaction from the start of the database. Similarly, specifying nil for the end key will compact through the last key. Note that the use of the word "Range" here does not refer to Cockroach ranges, just to a generalized key range.
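The nil-bound convention can be expressed as a key predicate. This sketch assumes the wrapper follows RocksDB's own `CompactRange` contract, where the range is inclusive at both ends and a nil bound means "before the first key" or "after the last key". `inCompactionRange` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// inCompactionRange reports whether key falls in [start, end]. A nil start
// means "from the first key"; a nil end means "through the last key".
func inCompactionRange(key, start, end []byte) bool {
	if start != nil && bytes.Compare(key, start) < 0 {
		return false
	}
	if end != nil && bytes.Compare(key, end) > 0 {
		return false
	}
	return true
}

func main() {
	for _, k := range [][]byte{[]byte("a"), []byte("m"), []byte("z")} {
		fmt.Printf("%s: [nil, m]=%v [m, nil]=%v\n", k,
			inCompactionRange(k, nil, []byte("m")),
			inCompactionRange(k, []byte("m"), nil))
	}
}
```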
func (*RocksDB) Defer ¶
func (r *RocksDB) Defer(func())
Defer is not implemented for RocksDB engine.
func (*RocksDB) Destroy ¶
Destroy destroys the underlying filesystem data associated with the database.
func (*RocksDB) Get ¶
func (r *RocksDB) Get(key proto.EncodedKey) ([]byte, error)
Get returns the value for the given key.
func (*RocksDB) GetProto ¶
func (r *RocksDB) GetProto(key proto.EncodedKey, msg gogoproto.Message) (ok bool, keyBytes, valBytes int64, err error)
GetProto fetches the value at the specified key and unmarshals it.
func (*RocksDB) Iterate ¶
func (r *RocksDB) Iterate(start, end proto.EncodedKey, f func(proto.RawKeyValue) (bool, error)) error
Iterate iterates from start to end keys, invoking f on each key/value pair. See engine.Iterate for details.
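Here is a minimal in-memory analogue of the iteration contract. It assumes the range is half-open, [start, end). It also assumes that returning `true` from `f` stops the scan, and that returning an error stops it and propagates the error. `RawKeyValue` here is a local stand-in for `proto.RawKeyValue`.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"sort"
)

// RawKeyValue is a hypothetical stand-in for proto.RawKeyValue.
type RawKeyValue struct {
	Key   []byte
	Value []byte
}

// iterate visits kvs (sorted by key) in [start, end), calling f on each
// pair. It stops early when f reports done or returns an error.
func iterate(kvs []RawKeyValue, start, end []byte, f func(RawKeyValue) (bool, error)) error {
	i := sort.Search(len(kvs), func(i int) bool { return bytes.Compare(kvs[i].Key, start) >= 0 })
	for ; i < len(kvs) && bytes.Compare(kvs[i].Key, end) < 0; i++ {
		done, err := f(kvs[i])
		if err != nil {
			return err
		}
		if done {
			return nil
		}
	}
	return nil
}

func main() {
	kvs := []RawKeyValue{
		{Key: []byte("a"), Value: []byte("1")},
		{Key: []byte("b"), Value: []byte("2")},
		{Key: []byte("c"), Value: []byte("3")},
		{Key: []byte("d"), Value: []byte("4")},
	}
	// Visit [b, d): prints b=2 and c=3.
	_ = iterate(kvs, []byte("b"), []byte("d"), func(kv RawKeyValue) (bool, error) {
		fmt.Printf("%s=%s\n", kv.Key, kv.Value)
		return false, nil
	})
	// An error from f stops the scan and is returned to the caller.
	errBoom := errors.New("boom")
	fmt.Println(iterate(kvs, []byte("a"), []byte("z"), func(RawKeyValue) (bool, error) {
		return false, errBoom
	}))
}
```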
func (*RocksDB) Merge ¶
func (r *RocksDB) Merge(key proto.EncodedKey, value []byte) error
Merge implements the RocksDB merge operator using the function goMergeInit to initialize missing values and goMerge to merge the old and the given value into a new value, which is then stored under key. Currently 64-bit counter logic is implemented. See the documentation of goMerge and goMergeInit for details.
The key and value byte slices may be reused safely. Merge takes a copy of them before returning.
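The 64-bit counter behavior can be sketched with a plain map. The big-endian 8-byte encoding, the `store` type, and the helper names are assumptions made for illustration. The real operator works on encoded values through `goMergeInit` and `goMerge`. Here, the branch for a missing key loosely mirrors the initialization step, and the other branch mirrors the addition step.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// store is a toy key/value map standing in for the engine.
type store map[string][]byte

// encodeCounter and decodeCounter use an assumed 8-byte big-endian int64.
func encodeCounter(n int64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(n))
	return buf
}

func decodeCounter(b []byte) int64 {
	return int64(binary.BigEndian.Uint64(b))
}

// merge initializes a missing value from the operand, otherwise adds the
// operand to the existing counter. Key and operand are copied, so callers
// may reuse both slices.
func (s store) merge(key, operand []byte) {
	old, ok := s[string(key)]
	if !ok {
		s[string(key)] = append([]byte(nil), operand...)
		return
	}
	s[string(key)] = encodeCounter(decodeCounter(old) + decodeCounter(operand))
}

func main() {
	s := store{}
	key := []byte("hits")
	operand := encodeCounter(5)
	s.merge(key, operand)
	// Reusing the operand buffer is safe because merge copied it.
	binary.BigEndian.PutUint64(operand, 3)
	s.merge(key, operand)
	s.merge(key, encodeCounter(-2))
	fmt.Println(decodeCounter(s["hits"])) // 6
}
```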
func (*RocksDB) NewIterator ¶
NewIterator returns an iterator over this RocksDB engine.
func (*RocksDB) NewSnapshot ¶
NewSnapshot creates a snapshot handle from engine and returns a read-only rocksDBSnapshot engine.
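A copied map gives one way to picture the snapshot's read-only, point-in-time view. This is a simplification. The `memEngine` and `snapshot` types are hypothetical, and a real RocksDB snapshot pins a sequence number instead of copying data.

```go
package main

import (
	"errors"
	"fmt"
)

// errReadOnly is returned by writes against a snapshot.
var errReadOnly = errors.New("snapshot is read-only")

// memEngine is a hypothetical mutable engine.
type memEngine struct {
	data map[string]string
}

// snapshot is a hypothetical read-only view of an engine.
type snapshot struct {
	data map[string]string
}

// NewSnapshot captures the engine's current contents by copying them.
func (e *memEngine) NewSnapshot() *snapshot {
	cp := make(map[string]string, len(e.data))
	for k, v := range e.data {
		cp[k] = v
	}
	return &snapshot{data: cp}
}

func (e *memEngine) Put(k, v string) {
	e.data[k] = v
}

// Get reads from the captured state; later engine writes are invisible.
func (s *snapshot) Get(k string) (string, bool) {
	v, ok := s.data[k]
	return v, ok
}

// Put always fails because snapshots are read-only.
func (s *snapshot) Put(k, v string) error {
	return errReadOnly
}

func main() {
	e := &memEngine{data: map[string]string{"k": "v1"}}
	snap := e.NewSnapshot()
	e.Put("k", "v2") // a later write to the engine
	v, _ := snap.Get("k")
	fmt.Println(v)                  // v1
	fmt.Println(snap.Put("k", "x")) // snapshot is read-only
}
```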
func (*RocksDB) Open ¶
Open creates options and opens the database. If the database doesn't yet exist at the specified directory, one is initialized from scratch. The RocksDB Open and Close methods are reference counted such that subsequent Open calls to an already opened RocksDB instance only bump the reference count. The RocksDB is only closed when a sufficient number of Close calls are performed to bring the reference count down to 0.
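The reference-counting rule for `Open` and `Close` can be sketched as follows. The `refCountedDB` type and its `opens`/`closes` counters are hypothetical stand-ins for allocating and deallocating the underlying RocksDB handle.

```go
package main

import (
	"fmt"
	"sync"
)

// refCountedDB tracks how many Open calls are outstanding.
type refCountedDB struct {
	mu     sync.Mutex
	refs   int
	opens  int // times the underlying handle was actually opened
	closes int // times the underlying handle was actually closed
}

// Open opens the handle on the first call and only bumps the count after that.
func (db *refCountedDB) Open() {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.refs == 0 {
		db.opens++ // first Open: create options and open the database
	}
	db.refs++
}

// Close drops one reference and closes the handle when the count reaches 0.
func (db *refCountedDB) Close() {
	db.mu.Lock()
	defer db.mu.Unlock()
	if db.refs == 0 {
		return // already fully closed
	}
	db.refs--
	if db.refs == 0 {
		db.closes++ // last Close: deallocate the handle
	}
}

func main() {
	db := &refCountedDB{}
	db.Open()
	db.Open() // already open: only bumps the count
	db.Close()
	fmt.Println(db.refs, db.closes) // 1 0
	db.Close()
	fmt.Println(db.refs, db.closes) // 0 1
}
```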
func (*RocksDB) Put ¶
func (r *RocksDB) Put(key proto.EncodedKey, value []byte) error
Put sets the given key to the value provided.
The key and value byte slices may be reused safely. Put takes a copy of them before returning.
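Copying on write is what makes it safe to reuse the caller's slices. This sketch uses a hypothetical map-backed `memDB` to show the effect. Converting the key to a string and appending the value to a nil slice each produce an independent copy.

```go
package main

import "fmt"

// memDB is a hypothetical map-backed store.
type memDB map[string][]byte

// Put stores copies of key and value so the caller may reuse both slices.
func (db memDB) Put(key, value []byte) {
	// string(key) allocates a new string; the value is copied explicitly.
	db[string(key)] = append([]byte(nil), value...)
}

func main() {
	db := memDB{}
	buf := []byte("first")
	db.Put([]byte("k"), buf)
	copy(buf, "XXXXX")           // reuse the caller's buffer
	fmt.Println(string(db["k"])) // first
}
```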
func (*RocksDB) SetGCTimeouts ¶
SetGCTimeouts calls through to the DBEngine's SetGCTimeouts method.