Documentation ¶
Overview ¶
Package storage implements the Cockroach storage node. A storage node exports the "Node" Go RPC service. Each node handles one or more stores, identified by a device name. Each store corresponds to a single physical device. A store multiplexes to one or more ranges, identified by start key. Ranges are contiguous regions of the keyspace. Each range implements an instance of the raft consensus algorithm to synchronize range replicas.
The Engine interface provides an API for key-value stores. InMem implements an in-memory engine using a sorted map. RocksDB implements an engine for data stored to local disk using RocksDB, a variant of LevelDB.
Index ¶
- Variables
- type AcctConfig
- type AccumulateTSRequest
- type AccumulateTSResponse
- type Attributes
- type ContainsRequest
- type ContainsResponse
- type DeleteRangeRequest
- type DeleteRangeResponse
- type DeleteRequest
- type DeleteResponse
- type EndTransactionRequest
- type EndTransactionResponse
- type Engine
- type EnqueueMessageRequest
- type EnqueueMessageResponse
- type EnqueueUpdateRequest
- type EnqueueUpdateResponse
- type GetRequest
- type GetResponse
- type InMem
- type IncrementRequest
- type IncrementResponse
- type InternalRangeLookupRequest
- type InternalRangeLookupResponse
- type Key
- type KeyValue
- type LogEntry
- type NodeDescriptor
- type PermConfig
- type Permission
- type PutRequest
- type PutResponse
- type Range
- func (r *Range) AccumulateTS(args *AccumulateTSRequest, reply *AccumulateTSResponse)
- func (r *Range) Contains(args *ContainsRequest, reply *ContainsResponse)
- func (r *Range) Delete(args *DeleteRequest, reply *DeleteResponse)
- func (r *Range) DeleteRange(args *DeleteRangeRequest, reply *DeleteRangeResponse)
- func (r *Range) EndTransaction(args *EndTransactionRequest, reply *EndTransactionResponse)
- func (r *Range) EnqueueMessage(args *EnqueueMessageRequest, reply *EnqueueMessageResponse)
- func (r *Range) EnqueueUpdate(args *EnqueueUpdateRequest, reply *EnqueueUpdateResponse)
- func (r *Range) Get(args *GetRequest, reply *GetResponse)
- func (r *Range) Increment(args *IncrementRequest, reply *IncrementResponse)
- func (r *Range) InternalRangeLookup(args *InternalRangeLookupRequest, reply *InternalRangeLookupResponse)
- func (r *Range) IsFirstRange() bool
- func (r *Range) IsLeader() bool
- func (r *Range) Put(args *PutRequest, reply *PutResponse)
- func (r *Range) ReadOnlyCmd(method string, args, reply interface{}) error
- func (r *Range) ReadWriteCmd(method string, args, reply interface{}) <-chan error
- func (r *Range) ReapQueue(args *ReapQueueRequest, reply *ReapQueueResponse)
- func (r *Range) Scan(args *ScanRequest, reply *ScanResponse)
- func (r *Range) Start()
- func (r *Range) Stop()
- type RangeDescriptor
- type RangeMetadata
- type ReapQueueRequest
- type ReapQueueResponse
- type Replica
- type RequestHeader
- type ResponseHeader
- type RocksDB
- type ScanRequest
- type ScanResponse
- type Store
- func (s *Store) Attrs() Attributes
- func (s *Store) Bootstrap(ident StoreIdent) error
- func (s *Store) Capacity() (StoreCapacity, error)
- func (s *Store) Close()
- func (s *Store) CreateRange(startKey, endKey Key, replicas []Replica) (*Range, error)
- func (s *Store) Descriptor(nodeDesc *NodeDescriptor) (*StoreDescriptor, error)
- func (s *Store) GetRange(rangeID int64) (*Range, error)
- func (s *Store) Init() error
- func (s *Store) IsBootstrapped() bool
- func (s *Store) String() string
- type StoreCapacity
- type StoreDescriptor
- type StoreFinder
- type StoreIdent
- type Value
- type ZoneConfig
Constants ¶
This section is empty.
Variables ¶
var (
	// KeyMin is a minimum key value which sorts before all other keys.
	KeyMin = Key("")
	// KeyMax is a maximum key value which sorts after all other
	// keys. Because keys are stored using an ordered encoding (see
	// storage/encoding.go), they will never start with \xff.
	KeyMax = Key("\xff")

	// KeyConfigAccountingPrefix specifies the key prefix for accounting
	// configurations. The suffix is the affected key prefix.
	KeyConfigAccountingPrefix = Key("\x00acct")
	// KeyConfigPermissionPrefix specifies the key prefix for permission
	// configurations. The suffix is the affected key prefix.
	KeyConfigPermissionPrefix = Key("\x00perm")
	// KeyConfigZonePrefix specifies the key prefix for zone
	// configurations. The suffix is the affected key prefix.
	KeyConfigZonePrefix = Key("\x00zone")
	// KeyMetaPrefix is the prefix for range metadata keys.
	KeyMetaPrefix = Key("\x00\x00meta")
	// KeyMeta1Prefix is the first level of key addressing. The value is a
	// RangeDescriptor struct.
	KeyMeta1Prefix = MakeKey(KeyMetaPrefix, Key("1"))
	// KeyMeta2Prefix is the second level of key addressing. The value is a
	// RangeDescriptor struct.
	KeyMeta2Prefix = MakeKey(KeyMetaPrefix, Key("2"))
	// KeyNodeIDGenerator contains a sequence generator for node IDs.
	KeyNodeIDGenerator = Key("\x00node-id-generator")
	// KeyStoreIDGeneratorPrefix specifies key prefixes for sequence
	// generators, one per node, for store IDs.
	KeyStoreIDGeneratorPrefix = Key("\x00store-id-generator-")
)
Constants for system-reserved keys in the KV map.
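The meta prefixes above are built with MakeKey, whose definition does not appear on this page. The following is a minimal sketch, assuming MakeKey simply concatenates its arguments; makeKey is a hypothetical stand-in, not the package's implementation.

```go
package main

import "fmt"

// Key mirrors the package's Key type ([]byte).
type Key []byte

// makeKey is a hypothetical stand-in for MakeKey. It is assumed here to
// concatenate its arguments into a newly allocated key.
func makeKey(keys ...Key) Key {
	var out Key
	for _, k := range keys {
		out = append(out, k...)
	}
	return out
}

func main() {
	metaPrefix := Key("\x00\x00meta")
	meta1 := makeKey(metaPrefix, Key("1"))
	meta2 := makeKey(metaPrefix, Key("2"))
	// Print both prefixes in quoted form so the \x00 bytes are visible.
	fmt.Printf("%q %q\n", meta1, meta2)
}
```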
Functions ¶
This section is empty.
Types ¶
type AccumulateTSRequest ¶
type AccumulateTSRequest struct {
	RequestHeader
	Key    Key
	Counts []int64 // One per discrete subtime period (e.g. one/minute or one/second)
}
An AccumulateTSRequest is arguments to the AccumulateTS() method. It specifies the key at which to accumulate TS values, and the time series counts for this discrete time interval.
type AccumulateTSResponse ¶
type AccumulateTSResponse struct {
ResponseHeader
}
An AccumulateTSResponse is the return value from the AccumulateTS() method.
type Attributes ¶
type Attributes []string
Attributes specifies a list of arbitrary strings describing node topology, store type, and machine capabilities.
func (Attributes) IsSubset ¶
func (a Attributes) IsSubset(b Attributes) bool
IsSubset returns whether attributes list b is a subset of attributes list a.
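A minimal sketch of the subset check described above, written as a free function over a local Attributes type. The name isSubset and the set-based approach are illustrative assumptions, not the package's code.

```go
package main

import "fmt"

// Attributes mirrors the package's Attributes type.
type Attributes []string

// isSubset reports whether every attribute in b also appears in a,
// matching the documented meaning of a.IsSubset(b).
func isSubset(a, b Attributes) bool {
	have := make(map[string]struct{}, len(a))
	for _, attr := range a {
		have[attr] = struct{}{}
	}
	for _, attr := range b {
		if _, ok := have[attr]; !ok {
			return false
		}
	}
	return true
}

func main() {
	store := Attributes{"dc1", "ssd", "rack7"}
	fmt.Println(isSubset(store, Attributes{"ssd"}))
	fmt.Println(isSubset(store, Attributes{"hdd"}))
}
```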
func (Attributes) SortedString ¶
func (a Attributes) SortedString() string
SortedString returns a sorted, de-duplicated, comma-separated list of the attributes.
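One way to produce such a string is sketched below. sortedString is a hypothetical helper that follows the description (sort, de-duplicate, join with commas); the real method may differ in detail.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Attributes mirrors the package's Attributes type.
type Attributes []string

// sortedString returns the attributes sorted, de-duplicated, and joined
// with commas. The input slice is left unmodified.
func sortedString(a Attributes) string {
	seen := make(map[string]struct{}, len(a))
	uniq := make([]string, 0, len(a))
	for _, attr := range a {
		if _, dup := seen[attr]; dup {
			continue
		}
		seen[attr] = struct{}{}
		uniq = append(uniq, attr)
	}
	sort.Strings(uniq)
	return strings.Join(uniq, ",")
}

func main() {
	fmt.Println(sortedString(Attributes{"ssd", "dc1", "ssd"}))
}
```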
type ContainsRequest ¶
type ContainsRequest struct {
	RequestHeader
	Key Key
}
A ContainsRequest is arguments to the Contains() method.
type ContainsResponse ¶
type ContainsResponse struct {
	ResponseHeader
	Exists bool
}
A ContainsResponse is the return value of the Contains() method.
type DeleteRangeRequest ¶
type DeleteRangeRequest struct {
	RequestHeader
	StartKey Key // Empty to start at first key
	EndKey   Key // Non-inclusive; if empty, deletes all
}
A DeleteRangeRequest is arguments to the DeleteRange method. It specifies the range of keys to delete.
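The boundary rules in the field comments can be sketched against a plain map. This is an illustration only: deleteRange is a hypothetical helper, and it assumes that an empty EndKey means "delete everything from StartKey onward".

```go
package main

import "fmt"

// deleteRange removes every key k with start <= k < end and returns the
// number of keys removed. An empty start matches from the first key; an
// empty end is assumed to mean "no upper bound".
func deleteRange(store map[string][]byte, start, end string) uint64 {
	var deleted uint64
	for k := range store {
		if k < start {
			continue
		}
		if end != "" && k >= end {
			continue // end key is non-inclusive
		}
		delete(store, k) // deleting during range iteration is safe in Go
		deleted++
	}
	return deleted
}

func main() {
	store := map[string][]byte{"a": nil, "b": nil, "c": nil, "d": nil}
	fmt.Println(deleteRange(store, "b", "d"), len(store))
}
```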
type DeleteRangeResponse ¶
type DeleteRangeResponse struct {
	ResponseHeader
	NumDeleted uint64
}
A DeleteRangeResponse is the return value from the DeleteRange() method.
type DeleteRequest ¶
type DeleteRequest struct {
	RequestHeader
	Key Key
}
A DeleteRequest is arguments to the Delete() method.
type DeleteResponse ¶
type DeleteResponse struct {
ResponseHeader
}
A DeleteResponse is the return value from the Delete() method.
type EndTransactionRequest ¶
type EndTransactionRequest struct {
	RequestHeader
	Commit bool  // False to abort and rollback
	Keys   []Key // Write-intent keys to commit or abort
}
An EndTransactionRequest is arguments to the EndTransaction() method. It specifies whether to commit or roll back an extant transaction. It also lists the keys involved in the transaction so their write intents may be aborted or committed.
type EndTransactionResponse ¶
type EndTransactionResponse struct {
	ResponseHeader
	CommitTimestamp int64 // Unix nanos (us)
	CommitWait      int64 // Remaining wait (us)
}
An EndTransactionResponse is the return value from the EndTransaction() method. It specifies the commit timestamp for the final transaction (all writes will have this timestamp). It further specifies the commit wait, which is the remaining time the client MUST wait before signalling completion of the transaction to another distributed node to maintain consistency.
type Engine ¶
type Engine interface {
	// The engine/store attributes.
	Attrs() Attributes
	// contains filtered or unexported methods
}
Engine is the interface that wraps the core operations of a key/value store.
type EnqueueMessageRequest ¶
type EnqueueMessageRequest struct {
	RequestHeader
	Inbox   Key   // Recipient key
	Message Value // Message value to deliver to inbox
}
An EnqueueMessageRequest is arguments to the EnqueueMessage() method. It specifies the recipient inbox key and the message (an arbitrary byte slice value).
type EnqueueMessageResponse ¶
type EnqueueMessageResponse struct {
ResponseHeader
}
An EnqueueMessageResponse is the return value from the EnqueueMessage() method.
type EnqueueUpdateRequest ¶
type EnqueueUpdateRequest struct {
	RequestHeader
	Update interface{}
}
An EnqueueUpdateRequest is arguments to the EnqueueUpdate() method. It specifies the update to enqueue for asynchronous execution. Update is an instance of one of the following messages: PutRequest, IncrementRequest, DeleteRequest, DeleteRangeRequest, or AccountingRequest.
type EnqueueUpdateResponse ¶
type EnqueueUpdateResponse struct {
ResponseHeader
}
An EnqueueUpdateResponse is the return value from the EnqueueUpdate() method.
type GetRequest ¶
type GetRequest struct {
	RequestHeader
	Key Key
}
A GetRequest is arguments to the Get() method.
type GetResponse ¶
type GetResponse struct {
	ResponseHeader
	Value Value
}
A GetResponse is the return value from the Get() method. If the key doesn't exist, returns nil for Value.Bytes.
type InMem ¶
InMem is a simple, in-memory key-value store.
func NewInMem ¶
func NewInMem(attrs Attributes, maxBytes int64) *InMem
NewInMem allocates and returns a new InMem object.
func (*InMem) Attrs ¶
func (in *InMem) Attrs() Attributes
Attrs returns the list of attributes describing this engine. This includes the disk type (always "mem") and potentially other labels to identify important attributes of the engine.
type IncrementRequest ¶
type IncrementRequest struct {
	RequestHeader
	Key       Key
	Increment int64
}
An IncrementRequest is arguments to the Increment() method. It increments the value for key, interpreting the existing value as a varint64.
type IncrementResponse ¶
type IncrementResponse struct {
	ResponseHeader
	NewValue int64
}
An IncrementResponse is the return value from the Increment method. The new value after increment is specified in NewValue. If the value could not be decoded as specified, Error will be set.
type InternalRangeLookupRequest ¶
type InternalRangeLookupRequest struct {
	RequestHeader
	Key Key
}
An InternalRangeLookupRequest is arguments to the InternalRangeLookup() method. It specifies the key for range lookup, which is a system key formed by prefixing the user key with KeyMeta1Prefix or KeyMeta2Prefix.
type InternalRangeLookupResponse ¶
type InternalRangeLookupResponse struct {
	ResponseHeader
	EndKey Key // The key in datastore whose value is the Range object.
	Range  RangeDescriptor
}
An InternalRangeLookupResponse is the return value from the InternalRangeLookup() method. It returns the metadata for the range where the key resides. When looking up 1-level metadata, it returns the info for the range containing the 2-level metadata for the key. And when looking up 2-level metadata, it returns the info for the range possibly containing the actual key and its value.
type Key ¶
type Key []byte
Key defines the key in the key-value datastore.
func PrefixEndKey ¶
PrefixEndKey determines the end key given a start key as a prefix. This adds "1" to the final byte and propagates the carry. The special case of KeyMin ("") always returns KeyMax ("\xff").
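The carry rule can be sketched as follows. prefixEndKey is a hypothetical reimplementation of the description above, not the package's code. Returning KeyMax when every byte overflows is an assumption, since the page only specifies the KeyMin case.

```go
package main

import "fmt"

// Key mirrors the package's Key type ([]byte).
type Key []byte

// prefixEndKey adds one to the final byte of prefix and propagates the
// carry leftward. The empty prefix (KeyMin) maps to KeyMax ("\xff").
func prefixEndKey(prefix Key) Key {
	if len(prefix) == 0 {
		return Key("\xff")
	}
	end := append(Key(nil), prefix...) // copy so the caller's key is untouched
	for i := len(end) - 1; i >= 0; i-- {
		end[i]++
		if end[i] != 0 {
			return end
		}
		// The byte wrapped around to 0x00; carry into the byte to its left.
	}
	// Every byte overflowed. Assumption: fall back to KeyMax.
	return Key("\xff")
}

func main() {
	fmt.Printf("%q\n", prefixEndKey(Key("\x00zone")))
	fmt.Printf("%q\n", prefixEndKey(Key("")))
}
```

With this literal carry, a prefix ending in \xff such as "a\xff" yields "b\x00", which still sorts after every key that has the prefix.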
type KeyValue ¶
KeyValue is a pair of Key and Value for returned Key/Value pairs from ScanRequest/ScanResponse. It embeds a Key and a Value.
type LogEntry ¶
type LogEntry struct {
	Method string
	Args   interface{}
	Reply  interface{}
	// contains filtered or unexported fields
}
A LogEntry provides serialization of a read/write command. Once committed to the log, the command is executed and the result returned via the done channel.
type NodeDescriptor ¶
type NodeDescriptor struct {
	NodeID  int32
	Address net.Addr
	Attrs   Attributes // node specific attributes (e.g. datacenter, machine info)
}
NodeDescriptor holds details on node physical/network topology.
type PermConfig ¶
type PermConfig struct {
Perms []Permission `yaml:"permissions,omitempty"`
}
PermConfig holds permission configuration.
type Permission ¶
type Permission struct {
	Users    []string `yaml:"users,omitempty"`    // Empty to specify default permission
	Read     bool     `yaml:"read,omitempty"`     // Default means reads are restricted
	Write    bool     `yaml:"write,omitempty"`    // Default means writes are restricted
	Priority float32  `yaml:"priority,omitempty"` // 0.0 means default priority
}
Permission specifies read/write access and associated priority.
type PutRequest ¶
type PutRequest struct {
	RequestHeader
	Key      Key    // must be non-empty
	Value    Value  // The value to put
	ExpValue *Value // ExpValue.Bytes empty to test for non-existence
}
A PutRequest is arguments to the Put() method. Conditional puts are supported if ExpValue is set.
- Returns true and sets value if ExpValue equals existing value.
- If key doesn't exist and ExpValue is empty, sets value.
- Otherwise, returns error.
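The conditional put rules can be sketched against an in-memory map. conditionalPut, errConditionFailed, and the trimmed-down Value type are hypothetical names introduced for illustration. The returned *Value plays the role that ActualValue plays in PutResponse.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Value is a trimmed-down stand-in for the package's Value type.
type Value struct {
	Bytes []byte
}

// errConditionFailed is a hypothetical error for a failed expectation.
var errConditionFailed = errors.New("conditional put failed")

// conditionalPut writes val at key. If exp is non-nil the write only
// happens when exp matches the existing value, or when the key is absent
// and exp.Bytes is empty. On failure it returns the actual value, if any.
func conditionalPut(store map[string]Value, key string, val Value, exp *Value) (*Value, error) {
	existing, found := store[key]
	if exp != nil {
		matchesMissing := !found && len(exp.Bytes) == 0
		matchesExisting := found && bytes.Equal(exp.Bytes, existing.Bytes)
		if !matchesMissing && !matchesExisting {
			if found {
				return &existing, errConditionFailed
			}
			return nil, errConditionFailed
		}
	}
	store[key] = val
	return nil, nil
}

func main() {
	store := map[string]Value{}
	_, err := conditionalPut(store, "a", Value{Bytes: []byte("v1")}, &Value{})
	fmt.Println("create if absent:", err)
	actual, err := conditionalPut(store, "a", Value{Bytes: []byte("v3")}, &Value{Bytes: []byte("v2")})
	fmt.Println("stale expectation:", err, string(actual.Bytes))
}
```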
type PutResponse ¶
type PutResponse struct {
	ResponseHeader
	ActualValue *Value // ActualValue.Bytes set if conditional put failed
}
A PutResponse is the return value from the Put() method.
type Range ¶
type Range struct {
	Meta RangeMetadata
	// contains filtered or unexported fields
}
A Range is a contiguous keyspace with writes managed via an instance of the Raft consensus algorithm. Many ranges may exist in a store and they are unlikely to be contiguous. Ranges are independent units and are responsible for maintaining their own integrity by replacing failed replicas, splitting and merging as appropriate.
func NewRange ¶
func NewRange(meta RangeMetadata, engine Engine, allocator *allocator, gossip *gossip.Gossip) *Range
NewRange initializes the range starting at key.
func (*Range) AccumulateTS ¶
func (r *Range) AccumulateTS(args *AccumulateTSRequest, reply *AccumulateTSResponse)
AccumulateTS is used internally to aggregate statistics over key ranges throughout the distributed cluster.
func (*Range) Contains ¶
func (r *Range) Contains(args *ContainsRequest, reply *ContainsResponse)
Contains verifies the existence of a key in the key value store.
func (*Range) Delete ¶
func (r *Range) Delete(args *DeleteRequest, reply *DeleteResponse)
Delete deletes the key and value specified by key.
func (*Range) DeleteRange ¶
func (r *Range) DeleteRange(args *DeleteRangeRequest, reply *DeleteRangeResponse)
DeleteRange deletes the range of key/value pairs specified by start and end keys.
func (*Range) EndTransaction ¶
func (r *Range) EndTransaction(args *EndTransactionRequest, reply *EndTransactionResponse)
EndTransaction either commits or aborts (rolls back) an extant transaction according to the args.Commit parameter.
func (*Range) EnqueueMessage ¶
func (r *Range) EnqueueMessage(args *EnqueueMessageRequest, reply *EnqueueMessageResponse)
EnqueueMessage enqueues a message (Value) for delivery to a recipient inbox.
func (*Range) EnqueueUpdate ¶
func (r *Range) EnqueueUpdate(args *EnqueueUpdateRequest, reply *EnqueueUpdateResponse)
EnqueueUpdate sidelines an update for asynchronous execution. AccumulateTS updates are sent this way. Eventually-consistent indexes are also built using update queues. Crucially, the enqueue happens as part of the caller's transaction, so is guaranteed to be executed if the transaction succeeded.
func (*Range) Get ¶
func (r *Range) Get(args *GetRequest, reply *GetResponse)
Get returns the value for a specified key.
func (*Range) Increment ¶
func (r *Range) Increment(args *IncrementRequest, reply *IncrementResponse)
Increment increments the value (interpreted as varint64 encoded) and returns the newly incremented value (encoded as varint64). If no value exists for the key, zero is incremented.
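A sketch of the increment semantics, assuming the stored value uses the signed varint encoding from Go's encoding/binary package. The package's own encoding (see storage/encoding.go) may well differ, and increment is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// increment adds delta to the varint64 stored at key and writes the result
// back. A missing key is treated as zero, as the description above states.
func increment(store map[string][]byte, key string, delta int64) (int64, error) {
	var cur int64
	if raw, ok := store[key]; ok {
		v, n := binary.Varint(raw)
		if n <= 0 || n != len(raw) {
			return 0, errors.New("existing value is not a valid varint64")
		}
		cur = v
	}
	cur += delta
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, cur)
	store[key] = buf[:n]
	return cur, nil
}

func main() {
	store := map[string][]byte{}
	v, err := increment(store, "counter", 5)
	fmt.Println(v, err)
	v, err = increment(store, "counter", -2)
	fmt.Println(v, err)
}
```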
func (*Range) InternalRangeLookup ¶
func (r *Range) InternalRangeLookup(args *InternalRangeLookupRequest, reply *InternalRangeLookupResponse)
InternalRangeLookup looks up the metadata info for the given args.Key. args.Key should be a metadata key, which is of the form "\0\0meta[12]<encoded_key>".
func (*Range) IsFirstRange ¶
IsFirstRange returns true if this is the first range.
func (*Range) IsLeader ¶
IsLeader returns true if this range replica is the raft leader. TODO(spencer): this is always true for now.
func (*Range) Put ¶
func (r *Range) Put(args *PutRequest, reply *PutResponse)
Put sets the value for a specified key. Conditional puts are supported.
func (*Range) ReadOnlyCmd ¶
ReadOnlyCmd executes a read-only command against the store. If this server is the raft leader, we can satisfy the read locally. Otherwise, if this server has executed a raft command or heartbeat at a timestamp greater than the read timestamp, we can also satisfy the read locally. Otherwise, we must ping the leader to determine with certainty whether our local data is up to date.
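The three cases above reduce to a small decision function. The sketch below is illustrative only: chooseReadPath, readPath, and the timestamp parameters are hypothetical names, not part of the package.

```go
package main

import "fmt"

// readPath says how a read-only command can be satisfied.
type readPath int

const (
	readLocal           readPath = iota // serve from local data
	readAfterLeaderPing                 // confirm freshness with the leader first
)

// chooseReadPath applies the rules from the description: the leader reads
// locally, and so does a follower that has executed a raft command or
// heartbeat at a timestamp greater than the read timestamp.
func chooseReadPath(isLeader bool, lastExecutedTS, readTS int64) readPath {
	if isLeader || lastExecutedTS > readTS {
		return readLocal
	}
	return readAfterLeaderPing
}

func main() {
	fmt.Println(chooseReadPath(true, 0, 100) == readLocal)
	fmt.Println(chooseReadPath(false, 90, 100) == readAfterLeaderPing)
}
```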
func (*Range) ReadWriteCmd ¶
ReadWriteCmd executes a read-write command against the store. If this node is the raft leader, it proposes the write to the other raft participants. Otherwise, the write is forwarded via a FollowerPropose RPC to the leader and this replica waits for an ACK to execute the command locally and return the result to the requesting client.
Commands which mutate the store must be proposed as part of the raft consensus write protocol. Only after a command is committed can it be executed. To facilitate this, ReadWriteCmd returns a channel which is signaled upon completion.
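The "commit first, then execute, then signal" pattern can be sketched with a buffered error channel. proposeAndExecute and its callback parameters are hypothetical. The raft proposal is replaced here by a plain function call.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotCommitted is a hypothetical error for a failed proposal.
var errNotCommitted = errors.New("command was not committed")

// proposeAndExecute runs commit and, only if it succeeds, execute. The
// returned channel receives exactly one value when the command completes.
func proposeAndExecute(commit, execute func() error) <-chan error {
	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() {
		if err := commit(); err != nil {
			done <- err
			return
		}
		done <- execute()
	}()
	return done
}

func main() {
	execute := func() error { fmt.Println("command executed"); return nil }
	fmt.Println(<-proposeAndExecute(func() error { return nil }, execute))
	fmt.Println(<-proposeAndExecute(func() error { return errNotCommitted }, execute))
}
```

A caller that does not need the result immediately can hold on to the channel and receive from it later.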
func (*Range) ReapQueue ¶
func (r *Range) ReapQueue(args *ReapQueueRequest, reply *ReapQueueResponse)
ReapQueue destructively queries messages from a delivery inbox queue. This method must be called from within a transaction.
func (*Range) Scan ¶
func (r *Range) Scan(args *ScanRequest, reply *ScanResponse)
Scan scans the key range specified by start key through end key up to some maximum number of results. The last key of the iteration is returned with the reply.
type RangeDescriptor ¶
type RangeDescriptor struct {
	// The start key of the range represented by this struct, along with the
	// meta1 or meta2 key prefix.
	StartKey Key
	Replicas []Replica
}
RangeDescriptor is the metadata value stored for a metadata key. The metadata key has meta1 or meta2 key prefix and the suffix encodes the end key of the range this struct represents.
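Because each metadata key encodes a range's end key, a lookup amounts to finding the first record whose end key sorts after the target key. The sketch below illustrates that idea over a sorted slice. rangeRecord and lookup are hypothetical, and treating end keys as exclusive is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// rangeRecord is a hypothetical, simplified metadata record.
type rangeRecord struct {
	endKey  string // end key of the range, assumed exclusive
	rangeID int64
}

// lookup returns the record of the range containing key. The records must
// be sorted by endKey in ascending order.
func lookup(records []rangeRecord, key string) (rangeRecord, bool) {
	i := sort.Search(len(records), func(i int) bool {
		return records[i].endKey > key
	})
	if i == len(records) {
		return rangeRecord{}, false
	}
	return records[i], true
}

func main() {
	meta2 := []rangeRecord{{"g", 1}, {"p", 2}, {"\xff", 3}}
	r, ok := lookup(meta2, "k")
	fmt.Println(r.rangeID, ok)
}
```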
type RangeMetadata ¶
type RangeMetadata struct {
	ClusterID string
	RangeID   int64
	StartKey  Key
	EndKey    Key
	Replicas  RangeDescriptor
}
A RangeMetadata holds information about the range, including the range ID, the start and end keys, and the replicas.
type ReapQueueRequest ¶
type ReapQueueRequest struct {
	RequestHeader
	Inbox      Key   // Recipient inbox key
	MaxResults int64 // Maximum results to return; must be > 0
}
A ReapQueueRequest is arguments to the ReapQueue() method. It specifies the recipient inbox key whose messages are waiting to be reaped, and the maximum number of results to return.
type ReapQueueResponse ¶
type ReapQueueResponse struct {
	ResponseHeader
	Messages []Value
}
A ReapQueueResponse is the return value from the ReapQueue() method.
type Replica ¶
type Replica struct {
	NodeID  int32
	StoreID int32
	RangeID int64
	Attrs   Attributes // combination of node & store attributes
}
Replica describes a replica location by node ID (corresponds to a host:port via lookup on gossip network), store ID (corresponds to a physical device, unique per node) and range ID. Datacenter and DiskType are provided to optimize reads. Replicas are stored in Range lookup records (meta1, meta2).
func ChooseRandomReplica ¶
ChooseRandomReplica returns a replica selected at random or nil if none exist.
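A minimal sketch of such a selection using math/rand. The function name, its slice parameter, and the trimmed-down Replica type are assumptions, since the signature is not shown on this page.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Replica is a trimmed-down stand-in for the package's Replica type.
type Replica struct {
	NodeID  int32
	StoreID int32
	RangeID int64
}

// chooseRandomReplica returns a pointer to a uniformly chosen element of
// replicas, or nil if the slice is empty.
func chooseRandomReplica(replicas []Replica) *Replica {
	if len(replicas) == 0 {
		return nil
	}
	return &replicas[rand.Intn(len(replicas))]
}

func main() {
	replicas := []Replica{{NodeID: 1, StoreID: 1}, {NodeID: 2, StoreID: 1}}
	fmt.Println(chooseRandomReplica(replicas) != nil)
	fmt.Println(chooseRandomReplica(nil) == nil)
}
```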
type RequestHeader ¶
type RequestHeader struct {
	// Timestamp specifies time at which reads or writes should be
	// performed. In nanoseconds since the epoch. Defaults to current
	// wall time.
	Timestamp int64
	// Replica specifies the destination for the request. See config.go.
	Replica Replica
	// MaxTimestamp is the maximum wall time seen by the client to
	// date. This should be supplied with successive transactions for
	// linearizability for this client. In nanoseconds since the
	// epoch.
	MaxTimestamp int64
	// TxID is set non-empty if a transaction is underway. Empty string
	// to start a new transaction.
	TxID string
}
RequestHeader is supplied with every storage node request.
type ResponseHeader ¶
type ResponseHeader struct {
	// Error is non-nil if an error occurred.
	Error error
	// TxID is non-empty if a transaction is underway.
	TxID string
}
ResponseHeader is returned with every storage node response.
type RocksDB ¶
type RocksDB struct {
// contains filtered or unexported fields
}
RocksDB is a wrapper around a RocksDB database instance.
func NewRocksDB ¶
func NewRocksDB(attrs Attributes, dir string) (*RocksDB, error)
NewRocksDB allocates and returns a new RocksDB object.
func (*RocksDB) Attrs ¶
func (r *RocksDB) Attrs() Attributes
Attrs returns the list of attributes describing this engine. This may include a specification of disk type (e.g. hdd, ssd, fio, etc.) and potentially other labels to identify important attributes of the engine.
type ScanRequest ¶
type ScanRequest struct {
	RequestHeader
	StartKey   Key   // Empty to start at first key
	EndKey     Key   // Optional max key; empty to ignore
	MaxResults int64 // Must be > 0
}
A ScanRequest is arguments to the Scan() method. It specifies the start and end keys for the scan and the maximum number of results.
type ScanResponse ¶
type ScanResponse struct {
	ResponseHeader
	Rows []KeyValue // Empty if no rows were scanned
}
A ScanResponse is the return value from the Scan() method.
type Store ¶
type Store struct {
	Ident StoreIdent
	// contains filtered or unexported fields
}
A Store maintains a map of ranges by start key. A Store corresponds to one physical device.
func (*Store) Attrs ¶
func (s *Store) Attrs() Attributes
Attrs returns the attributes of the underlying store.
func (*Store) Bootstrap ¶
func (s *Store) Bootstrap(ident StoreIdent) error
Bootstrap writes a new store ident to the underlying engine. To ensure that no crufty data already exists in the engine, it scans the engine contents before writing the new store ident. The engine should be completely empty. It returns an error if called on a non-empty engine.
func (*Store) Capacity ¶
func (s *Store) Capacity() (StoreCapacity, error)
Capacity returns the capacity of the underlying storage engine.
func (*Store) CreateRange ¶
CreateRange allocates a new range ID and stores range metadata. On success, returns the new range.
func (*Store) Descriptor ¶
func (s *Store) Descriptor(nodeDesc *NodeDescriptor) (*StoreDescriptor, error)
Descriptor returns a StoreDescriptor including current store capacity information.
func (*Store) IsBootstrapped ¶
IsBootstrapped returns true if the store has already been bootstrapped. If the store ident is corrupt, IsBootstrapped will return true; the exact error can be retrieved via a call to Init().
type StoreCapacity ¶
StoreCapacity contains capacity information for a storage device.
func (StoreCapacity) PercentAvail ¶
func (sc StoreCapacity) PercentAvail() float64
PercentAvail computes the percentage of disk space that is available.
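The fields of StoreCapacity are not listed on this page, so the following is only a sketch under stated assumptions: a Capacity and an Available field, both in bytes, and a result on a 0 to 100 scale. The real method may use different fields or return a fraction instead.

```go
package main

import "fmt"

// StoreCapacity is a hypothetical stand-in; both field names are assumed.
type StoreCapacity struct {
	Capacity  int64 // total bytes on the device (assumed field)
	Available int64 // bytes still free (assumed field)
}

// percentAvail returns the available space as a percentage of capacity.
// A non-positive capacity yields 0 to avoid dividing by zero.
func percentAvail(sc StoreCapacity) float64 {
	if sc.Capacity <= 0 {
		return 0
	}
	return 100 * float64(sc.Available) / float64(sc.Capacity)
}

func main() {
	fmt.Println(percentAvail(StoreCapacity{Capacity: 200, Available: 50}))
}
```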
type StoreDescriptor ¶
type StoreDescriptor struct {
	StoreID  int32
	Attrs    Attributes // store specific attributes (e.g. ssd, hdd, mem)
	Node     NodeDescriptor
	Capacity StoreCapacity
}
StoreDescriptor holds store information including store attributes, node descriptor and store capacity.
func (*StoreDescriptor) CombinedAttrs ¶
func (s *StoreDescriptor) CombinedAttrs() Attributes
CombinedAttrs returns the full list of attributes for the store, including both the node and store attributes.
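Combining the two lists can be sketched as a simple concatenation into a fresh slice. combinedAttrs is a hypothetical helper. Placing node attributes first is an assumption, as is leaving duplicates in place.

```go
package main

import "fmt"

// Attributes mirrors the package's Attributes type.
type Attributes []string

// combinedAttrs returns the node attributes followed by the store
// attributes in a new slice, so neither input is modified.
func combinedAttrs(nodeAttrs, storeAttrs Attributes) Attributes {
	combined := make(Attributes, 0, len(nodeAttrs)+len(storeAttrs))
	combined = append(combined, nodeAttrs...)
	combined = append(combined, storeAttrs...)
	return combined
}

func main() {
	fmt.Println(combinedAttrs(Attributes{"dc1", "rack7"}, Attributes{"ssd"}))
}
```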
type StoreFinder ¶
type StoreFinder func(Attributes) ([]*StoreDescriptor, error)
StoreFinder finds the disks in a datacenter with the most available capacity.
type StoreIdent ¶
A StoreIdent uniquely identifies a store in the cluster. The StoreIdent is written to the underlying storage engine at a store-reserved system key (keyStoreIdent).
type Value ¶
type Value struct {
	// Bytes is the byte string value.
	Bytes []byte
	// Timestamp of value in nanoseconds since epoch.
	Timestamp int64
	// Expiration in nanoseconds.
	Expiration int64
}
Value specifies the value at a key. Multiple values at the same key are supported based on timestamp. Values which have been overwritten have an associated expiration, after which they will be permanently deleted.
type ZoneConfig ¶
type ZoneConfig struct {
	// Replicas is a slice of Attributes, each describing required
	// capabilities of each replica in the zone.
	Replicas      []Attributes `yaml:"replicas,omitempty,flow"`
	RangeMinBytes int64        `yaml:"range_min_bytes,omitempty"`
	RangeMaxBytes int64        `yaml:"range_max_bytes,omitempty"`
}
ZoneConfig holds configuration that is needed for a range of KV pairs.
func ParseZoneConfig ¶
func ParseZoneConfig(in []byte) (*ZoneConfig, error)
ParseZoneConfig parses a YAML serialized ZoneConfig.
func (*ZoneConfig) ToYAML ¶
func (z *ZoneConfig) ToYAML() ([]byte, error)
ToYAML serializes a ZoneConfig as YAML.