Documentation ¶
Overview ¶
Supplies API to append/fetch key/value/docid from kv-file. kv-file is opened and managed by the WStore structure. entry format is,
| 4-byte size | size-byte value |
Maximum size of each entry is int32, that is 2^31.
translates btree blocks from persistant storage to in-memory data structure, called btree-node. A btree node can be a knode (also called leaf node) or it can be a inode. `block` structure is fundamental to both type of nodes.
Btree indexing algorithm for json {key,docid,value} triplets. `keys` and `values` are expected to be in json, while `docid` is the primary key of json document which contains the key fragment. `value` can optionally be used to store fragment of a document. since keys generated for seconday indexes may not be unique, indexing a.k.a sorting is done on {key,docid}.
This module provides a lock free interface to cache data structure. The general idea is to create an array of pointers, each pointing to the head of DCacheItem linked list.
Since nodes are cached based on their file-position (fpos), fpos is hashed to index into the array. Refer indexFor() to know how.
Subsequenly we walk the linked list for a match in fpos and corresponding cached node.
For deleting we simply remove the node from the list.
For inserting, we prepend the new DCacheItem as the head of the list, and walk the remaining list to delete the item, if it is already present.
(fpos & hashmask) >> rshift
| | *--------* *-->| head | -->|DcacheItem|-->|DcacheItem|-->|DcacheItem| *--------* | *--------* *-->| head | -->|DcacheItem|-->|DcacheItem|-->|DcacheItem| *--------* ... | *--------* *-->| head | -->|DcacheItem|-->|DcacheItem|-->|DcacheItem| *--------*
defer goroutine that handles mvcc snapshots. Functionalites of this process augments the functionalit of MVCC process.
Manages list of free blocks in btree index-file.
Manages head sector of btree index-file. Head sector contains the following items,
rootFileposition int64 timestamp int64 sectorsize int64 flistsize int64 blocksize int64 maxkeys int64 pick int64 crc uint32
Index mutation due to {key,docid,value} insert.
MVCC controller process.
Handles all btree traversals except, insert() and remove()
The following is the general idea on cache structure.
| *------------* WRITE | READ *------------* | inode | ^ | ^ | inode | | ping-cache | | | | | pong-cache | | | *<-----*-----------* | | | knode | | | | *------->| knode | | ping-cache | | | | ncache() | pong-cache | *------------* | commitQ | *------------* ^ V ^ | (Locked access using sync.Mutex) | *------* | |
commits*-----------| MVCC |<-* | recyles *------* | reclaims | |
*----->ping2Pong() (atomic, no locking) | |
The cycle of ping-pong,
reads will always refer to the pong-cache.
reads will populate the cache from disk, when ever cache lookup fails.
writes will refer to the commitQ maintained by MVCC, if node is not in commitQ it will refer to pong-cache.
ping-cache is operated only by the MVCC controller.
MVCC controller will _populate_ the ping-cache when new nodes are generated due to index mutations.
MVCC controller will _evict_ the pong-cache as and when nodes become stale due to index mutations.
ping2Pong() happens when snapshot is flushed to disk.
pong becomes ping, and MVCC controller will _populate_ and _evict_ the newly flipped ping-cache based on commited, recycled and reclaimed node, before allowing further mutations.
Data store for btree, organised in two files, index-file and kv-file.
index-file,
contains head block, list of free blocks within index file, and btree-blocks. head, 512 byte sector written at the head of the file. contains reference to the root bock, head-sector-size, free-list-size and block-size. freelist, contains a list of 8-byte offset into the index file that contains free blocks.
kv-file,
contains key, value, docid bytes. They are always added in append only mode, and a separate read-fd fetches them in random-access. Refer to appendkv.go for more information.
Common functions used across test cases.
Contains necessary functions to do index writing.
Index ¶
- Constants
- func RandomKey(rnd *rand.Rand) string
- func RandomValue(rnd *rand.Rand) string
- func TestData(count int, seed int64) ([]*TestKey, []*TestValue)
- type BTree
- func (bt *BTree) Check()
- func (bt *BTree) Close()
- func (bt *BTree) Contains(key Key) bool
- func (bt *BTree) Count() int64
- func (bt *BTree) DocidSet() <-chan []byte
- func (bt *BTree) Drain()
- func (bt *BTree) Equals(key Key) bool
- func (bt *BTree) Front() ([]byte, []byte, []byte)
- func (bt *BTree) FullSet() <-chan []byte
- func (bt *BTree) Insert(key Key, v Value) bool
- func (bt *BTree) KeySet() <-chan []byte
- func (bt *BTree) LevelCount() ([]int64, int64, int64)
- func (bt *BTree) Lookup(key Key) chan []byte
- func (bt *BTree) Remove(key Key) bool
- func (bt *BTree) Show()
- func (bt *BTree) ShowKeys()
- func (bt *BTree) Stats(check bool)
- func (bt *BTree) ValueSet() <-chan []byte
- type CheckContext
- type Config
- type DCache
- type DCacheItem
- type DEFER
- type Emitter
- type FreeList
- type Head
- type IO
- type IndexConfig
- type Indexer
- type Key
- type MV
- type MVCC
- type Node
- type ReclaimData
- type RecycleData
- type Store
- func (store *Store) Close()
- func (store *Store) Destroy()
- func (store *Store) FetchMVCache(fpos int64) Node
- func (store *Store) FetchNCache(fpos int64) Node
- func (store *Store) FetchNode(fpos int64) Node
- func (store *Store) OpEnd(transaction bool, mv *MV, ts int64)
- func (store *Store) OpStart(transaction bool) (Node, *MV, int64)
- type TestKey
- type TestValue
- type Value
- type WStore
- type WStoreStats
Constants ¶
const ( // FIXME : Is there a better way to learn sizeof a struct. BLK_KEY_SIZE = 20 // bytes BLK_VALUE_SIZE = 8 // bytes BLK_OVERHEAD = 16 // bytes, leaf+size field TRUE = 1 FALSE = 0 )
const ( DEFER_ADD byte = iota DEFER_DELETE )
const ( WS_SAYHI byte = iota WS_CLOSE // {WS_CLOSE} // messages to mvcc goroutine WS_ACCESS // {WS_ACCESS} -> timestamp int64 WS_RELEASE // {WS_RELEASE, timestamp} -> minAccess int64 WS_SETSNAPSHOT // {WS_SETSNAPSHOT, offsets []int64, root int64, timestamp int64} // messages to defer routine WS_PINGCACHE // {WS_PINGCACHE, what byte, fpos int64, node Node} WS_PINGKD // {WS_PINGKD, fpos int64, key []byte} WS_MV // {WS_MV, mv *MV} WS_SYNCSNAPSHOT // {WS_MV, minAccess int64} )
const ( IO_FLUSH byte = iota IO_APPEND IO_CLOSE )
const ( OFFSET_SIZE = 8 // 64 bit offset SECTOR_SIZE = 512 // Disk drive sector size in bytes. FLIST_SIZE = 1024 * OFFSET_SIZE // default free list size in bytes. BLOCK_SIZE = 1024 * 64 // default block size in bytes. )
constants that are relevant for index-file and kv-file
Variables ¶
This section is empty.
Functions ¶
func RandomValue ¶
Types ¶
type BTree ¶
type BTree struct { Config // contains filtered or unexported fields }
btree instance. Typical usage, where `conf` is Config structure.
bt = btree.NewBTree( btree.NewStore( conf ))
any number of BTree instances can be created.
func NewBTree ¶
Create a new instance of btree. `store` will be used to persist btree blocks, key-value data and associated meta-information.
type CheckContext ¶
type CheckContext struct {
// contains filtered or unexported fields
}
type Config ¶
type Config struct { //-- file store Idxfile string Kvfile string IndexConfig // maximum number of levels btree can grow, this information is used as a // cue in calculating couple of limits within the algorithm. Maxlevel int // if number of entries within a node goes below this threshold, then a // rebalance will be triggered on its parent node. RebalanceThrs int // when free nodes are not available to record btree mutations, then a new // set of btree blocks will be appended to the index file. // count of appended blocks = freelist-size * AppendRatio AppendRatio float32 // MVCC snapshots are flushed in batches. DrainRate defines the maximum // number of snapshots to accumulate in-memory, after which they are // flushed to disk. DrainRate int // all intermediate nodes are cached in memory, there are no upper limit // to that. But number of leaf nodes can be really large and // `MaxLeafCache` limits the number of leaf nodes to be cached. MaxLeafCache int // MVCC throttle rate in milliseconds MVCCThrottleRate time.Duration // enables O_SYNC flag for indexfile and kvfile. Sync bool // enables O_DIRECT flag for indexfile and kvfile. Nocache bool // Debug Debug bool }
BTree configuration parameters, these parameters cannot change once the index-file and kv-file are created, for intance, when indexing server restarts on existing index files.
type DCacheItem ¶
type DCacheItem struct {
// contains filtered or unexported fields
}
Singley linked list.
type FreeList ¶
type FreeList struct {
// contains filtered or unexported fields
}
Structure to manage the free list
type Head ¶
type Head struct {
// contains filtered or unexported fields
}
Structure to manage the head sector
type IndexConfig ¶
type IndexConfig struct { Sectorsize int64 // head sector-size in bytes. Flistsize int64 // free-list size in bytes. Blocksize int64 // btree block size in bytes. }
Sub-structure to `Config` structure.
type Indexer ¶
type Indexer interface { // Insert {key,value} pairs into the index. key type is expected to // implement `Key` interface and value type is expected to implement // `Value` interface. If the key is successfuly inserted it returns true. Insert(Key, Value) bool // Count number of key,value pairs in this index. Count() int64 // Return key-bytes, docid-bytes, and value bytes of the first // element in the list. Front() ([]byte, []byte, []byte) // Check whether `key` is present in the index. Contains(Key) bool // Check whether `key` and `docid` is present in the index. Equals(Key) bool // Return a channel on which the caller can receive key bytes, docid- // bytes and value-bytes for each entry in the index. // ch := bt.FullSet() // keybytes := <-ch // valbytes := <-ch // docidbytes := <-ch FullSet() <-chan []byte // Return a channel on which the caller can receive key-bytes. KeySet() <-chan []byte // Return a channel on which the caller can receive docid-bytes DocidSet() <-chan []byte // Return a channel on which the caller can receive value-bytes ValueSet() <-chan []byte // Return a channel that will transmit all values associated with `key`, // make sure the `docid` is set to minimum value to lookup all values // greater that `key` && `docid` Lookup(Key) (chan []byte, error) // Remove an entry identified by {key,docid} Remove(Key) bool //-- Meant for debugging. Drain() // flush the MVCC snapshots into disk. Check() // check the btree data structure for anamolies. Show() // displays in-memory btree structure on stdout. ShowKeys() // list keys and docids inside the tree. Stats(bool) // display statistics so far. LevelCount() // count number of inodes, knodes and number of entries. }
interface made available to btree user.
type Key ¶
type Key interface { // transform actual key content into byte slice, that can be persisted in // file. Bytes() []byte // every key carries the document-id that emitted this {key,value} tupele, // transform the document-id into byte slice, that can be persisted in file. Docid() []byte // this is the call-back hook that `Key` types can use to sort themself. // kfpos : file-position inside kv-file that contains key-content. // dfpos : file-position inside kv-file that contains docid-content. // isD : boolean that says whether comparision needs to be done on // document-id as well // // Example: // // otherkey = s.fetchKey(kfpos) // if cmp = bytes.Compare(thiskey, otherkey); cmp == 0 && isD { // otherdocid = s.fetchKey(dfpos) // cmp = bytes.Compare(thisdocid, otherdocid) // if cmp == 0 { // return cmp, kfpos, dfpos // } else { // return cmp, kfpos, -1 // } // } else if cmp == 0 { // return cmp, kfpos, -1 // } else { // return cmp, -1, -1 // } // // Returns: // - cmp, result of comparision, either -1, 0, 1. // - kfpos, if > -1, it means the keys are equal and specifies the // offset in kv-file that contains the key. // - dfpos, if > -1, it means the docids are equal and specifies the // offset in kv-file that contains the docid. CompareLess(s *Store, kfpos int64, dfpos int64, isD bool) (int, int64, int64) // check whether both key and document-id compares equal. Equal([]byte, []byte) (bool, bool) }
interfaces to be supported by key,value types.
type Node ¶
type Node interface {
// contains filtered or unexported methods
}
Node interface that is implemented by both `knode` and `inode` structure.
type ReclaimData ¶
type ReclaimData struct {
// contains filtered or unexported fields
}
type RecycleData ¶
type RecycleData ReclaimData
type Store ¶
type Store struct { Config // contains filtered or unexported fields }
func (*Store) Close ¶
func (store *Store) Close()
Close will release all resources maintained by store.
func (*Store) Destroy ¶
func (store *Store) Destroy()
Destroy is opposite of Create, it cleans up the datafiles. Data files will be deleted only when all references to WStore is removed.
func (*Store) FetchMVCache ¶
Fetch a node, identified by its file-position, from commitQ or from memory cache. If it is not available from memory fetch from disk. NOTE: multi-version fetches are only used from index mutations and they eventually end up in commitQ under a new file-position. They are not cached into memory until the snapshot is flushed into the disk.
func (*Store) FetchNCache ¶
Fetch a node, identified by its file-position, from cache. If it is not available from cache, fetch from disk and cache them in memory. To learn how nodes are cached, refer to cache.go
type TestKey ¶
func (*TestKey) CompareLess ¶
type Value ¶
type Value interface { // transform actual value content into byte slice, that can be persisted in // file. Bytes() []byte }
type WStore ¶
type WStore struct { Config MVCC // MVCC concurrency control go-routine IO // IO flusher DEFER // kv-cache WStoreStats // contains filtered or unexported fields }
structure that handles write.
func OpenWStore ¶
Main API to get or instantiate a write-store. If write-store for this index file is already created, it will bre returned after incrementing the reference count.
func (*WStore) DestroyWStore ¶
func (wstore *WStore) DestroyWStore()
Destroy is opposite of Create, it cleans up the datafiles.
type WStoreStats ¶
type WStoreStats struct { MVloadCounts int64 // contains filtered or unexported fields }
Statistical counts