Documentation ¶
Overview ¶
Package lldb (WIP) implements a low level database engine. The database model used could be considered a specific implementation of some small(est) intersection of models listed in [1]. As a settled term is lacking, it'll be called here a 'Virtual memory model' (VMM).
Experimental release notes ¶
This is an experimental release. Don't open a DB from two applications or two instances of an application - it will get corrupted (no file locking is implemented and this task is delegated to lldb's clients).
WARNING: THE LLDB API IS SUBJECT TO CHANGE.
Filers ¶
A Filer is an abstraction of storage. A Filer may be a part of some process' virtual address space, an OS file, a networked, remote file etc. Persistence of the storage is optional, opaque to VMM and it is specific to a concrete Filer implementation.
Space management ¶
Mechanism to allocate, reallocate (resize), deallocate (and later reclaim the unused) contiguous parts of a Filer, called blocks. Blocks are identified and referred to by a handle, an int64.
BTrees ¶
In addition to the VMM like services, lldb provides volatile and non-volatile BTrees. Keys and values of a BTree are limited in size to 64kB each (a bit more actually). Support for larger keys/values, if desired, can be built atop a BTree to certain limits.
Handles vs pointers ¶
A handle is the abstracted storage counterpart of a memory address. There is one fundamental difference, though. Resizing a block never results in a change to the handle which refers to the resized block, so a handle is more akin to an unique numeric id/key. Yet it shares one property of pointers - handles can be associated again with blocks after the original handle block was deallocated. In other words, a handle uniqueness domain is the state of the database and is not something comparable to e.g. an ever growing numbering sequence.
Also, as with memory pointers, dangling handles can be created and blocks overwritten when such handles are used. Using a zero handle to refer to a block will not panic; however, the resulting error is effectively the same exceptional situation as dereferencing a nil pointer.
Blocks ¶
Allocated/used blocks, are limited in size to only a little bit more than 64kB. Bigger semantic entities/structures must be built in lldb's client code. The content of a block has no semantics attached, it's only a fully opaque `[]byte`.
Scalars ¶
Use of "scalars" applies to EncodeScalars, DecodeScalars and Collate. Those first two "to bytes" and "from bytes" functions are suggested for handling multi-valued Allocator content items and/or keys/values of BTrees (using Collate for keys). Types called "scalar" are:
nil (the typeless one) bool all integral types: [u]int8, [u]int16, [u]int32, [u]int, [u]int64 all floating point types: float32, float64 all complex types: complex64, complex128 []byte (64kB max) string (64kb max)
Specific implementations ¶
Included are concrete implementations of some of the VMM interfaces included to ease serving simple client code or for testing and possibly as an example. More details in the documentation of such implementations.
[1]: http://en.wikipedia.org/wiki/Database_model
Index ¶
- func Collate(x, y []interface{}, strCollate func(string, string) int) (r int, err error)
- func DecodeScalars(b []byte) (scalars []interface{}, err error)
- func EncodeScalars(scalars ...interface{}) (b []byte, err error)
- func RemoveBTree(store *Allocator, handle int64) (err error)
- type ACIDFiler0
- type AllocStats
- type Allocator
- func (a *Allocator) Alloc(b []byte) (handle int64, err error)
- func (a *Allocator) CacheStats() (buffersUsed, buffersTotal int, bytesUsed, bytesTotal, hits, misses int64)
- func (a *Allocator) Free(handle int64) (err error)
- func (a *Allocator) Get(buf []byte, handle int64) (b []byte, err error)
- func (a *Allocator) Realloc(handle int64, b []byte) (err error)
- func (a *Allocator) Verify(bitmap Filer, log func(error) bool, stats *AllocStats) (err error)
- type BTree
- func (t *BTree) Clear() (err error)
- func (t *BTree) Delete(key []byte) (err error)
- func (t *BTree) DeleteAny() (empty bool, err error)
- func (t *BTree) Dump(w io.Writer) (err error)
- func (t *BTree) Extract(buf, key []byte) (value []byte, err error)
- func (t *BTree) First() (key, value []byte, err error)
- func (t *BTree) Get(buf, key []byte) (value []byte, err error)
- func (t *BTree) Handle() int64
- func (t *BTree) IndexSeek(key []byte, c func(a, b []byte) int) (enum *BTreeEnumerator, hit bool, err error)
- func (t *BTree) IsMem() (r bool)
- func (t *BTree) Last() (key, value []byte, err error)
- func (t *BTree) Put(buf, key []byte, upd func(key, old []byte) (new []byte, write bool, err error)) (old []byte, written bool, err error)
- func (t *BTree) Seek(key []byte) (enum *BTreeEnumerator, hit bool, err error)
- func (t *BTree) SeekFirst() (enum *BTreeEnumerator, err error)
- func (t *BTree) SeekLast() (enum *BTreeEnumerator, err error)
- func (t *BTree) Set(key, value []byte) (err error)
- type BTreeEnumerator
- type ErrDecodeScalars
- type ErrILSEQ
- type ErrINVAL
- type ErrPERM
- type ErrType
- type Filer
- type InnerFiler
- func (f *InnerFiler) BeginUpdate() error
- func (f *InnerFiler) Close() (err error)
- func (f *InnerFiler) EndUpdate() error
- func (f *InnerFiler) Name() string
- func (f *InnerFiler) PunchHole(off, size int64) error
- func (f *InnerFiler) ReadAt(b []byte, off int64) (n int, err error)
- func (f *InnerFiler) Rollback() error
- func (f *InnerFiler) Size() (int64, error)
- func (f *InnerFiler) Sync() (err error)
- func (f *InnerFiler) Truncate(size int64) error
- func (f *InnerFiler) WriteAt(b []byte, off int64) (n int, err error)
- type MemFiler
- func (f *MemFiler) BeginUpdate() error
- func (f *MemFiler) Close() (err error)
- func (f *MemFiler) EndUpdate() (err error)
- func (f *MemFiler) Name() string
- func (f *MemFiler) PunchHole(off, size int64) (err error)
- func (f *MemFiler) ReadAt(b []byte, off int64) (n int, err error)
- func (f *MemFiler) ReadFrom(r io.Reader) (n int64, err error)
- func (f *MemFiler) Rollback() (err error)
- func (f *MemFiler) Size() (int64, error)
- func (f *MemFiler) Sync() error
- func (f *MemFiler) Truncate(size int64) (err error)
- func (f *MemFiler) WriteAt(b []byte, off int64) (n int, err error)
- func (f *MemFiler) WriteTo(w io.Writer) (n int64, err error)
- type OSFile
- type OSFiler
- func (f *OSFiler) BeginUpdate() (err error)
- func (f *OSFiler) Close() (err error)
- func (f *OSFiler) EndUpdate() (err error)
- func (f *OSFiler) Name() string
- func (f *OSFiler) PunchHole(off, size int64) (err error)
- func (f *OSFiler) ReadAt(b []byte, off int64) (n int, err error)
- func (f *OSFiler) Rollback() (err error)
- func (f *OSFiler) Size() (n int64, err error)
- func (f *OSFiler) Sync() (err error)
- func (f *OSFiler) Truncate(size int64) (err error)
- func (f *OSFiler) WriteAt(b []byte, off int64) (n int, err error)
- type Options
- type RollbackFiler
- func (r *RollbackFiler) BeginUpdate() (err error)
- func (r *RollbackFiler) Close() (err error)
- func (r *RollbackFiler) EndUpdate() (err error)
- func (r *RollbackFiler) Name() string
- func (r *RollbackFiler) PunchHole(off, size int64) error
- func (r *RollbackFiler) ReadAt(b []byte, off int64) (n int, err error)
- func (r *RollbackFiler) Rollback() (err error)
- func (r *RollbackFiler) Size() (sz int64, err error)
- func (r *RollbackFiler) Sync() error
- func (r *RollbackFiler) Truncate(size int64) error
- func (r *RollbackFiler) WriteAt(b []byte, off int64) (n int, err error)
- type SimpleFileFiler
- func (f *SimpleFileFiler) BeginUpdate() error
- func (f *SimpleFileFiler) Close() (err error)
- func (f *SimpleFileFiler) EndUpdate() (err error)
- func (f *SimpleFileFiler) Name() string
- func (f *SimpleFileFiler) PunchHole(off, size int64) (err error)
- func (f *SimpleFileFiler) ReadAt(b []byte, off int64) (n int, err error)
- func (f *SimpleFileFiler) Rollback() (err error)
- func (f *SimpleFileFiler) Size() (int64, error)
- func (f *SimpleFileFiler) Sync() error
- func (f *SimpleFileFiler) Truncate(size int64) (err error)
- func (f *SimpleFileFiler) WriteAt(b []byte, off int64) (n int, err error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Collate ¶
Collate collates two arrays of Go predeclared scalar types (and the typeless nil or []byte). If any other type appears in x or y, Collate will return a non nil error. String items are collated using strCollate or lexically byte-wise (as when using Go comparison operators) when strCollate is nil. []byte items are collated using bytes.Compare.
Collate returns:
-1 if x < y 0 if x == y +1 if x > y
The same value as defined above must be returned from strCollate.
The "outer" ordering is: nil, bool, number, []byte, string. IOW, nil is "smaller" than anything else except other nil, numbers collate before []byte, []byte collate before strings, etc.
Integers and real numbers collate as expected in math. However, complex numbers are not ordered in Go. Here the ordering is defined: Complex numbers are in comparison considered first only by their real part. Iff the result is equality then the imaginary part is used to determine the ordering. In this "second order" comparing, integers and real numbers are considered as complex numbers with a zero imaginary part.
func DecodeScalars ¶
DecodeScalars decodes a []byte produced by EncodeScalars.
func EncodeScalars ¶
EncodeScalars encodes a vector of predeclared scalar type values to a []byte, making it suitable to store it as a "record" in a DB or to use it as a key of a BTree.
func RemoveBTree ¶
RemoveBTree removes tree, represented by handle from store. Empty trees are cheap, each uses only few bytes of the store. If there's a chance that a tree will eventually get reused (non empty again), it's recommended to not/never remove it. One advantage of such approach is a stable handle of such tree.
Types ¶
type ACIDFiler0 ¶
type ACIDFiler0 struct { *RollbackFiler // contains filtered or unexported fields }
ACIDFiler0 is a very simple, synchronous implementation of 2PC. It uses a single write ahead log file to provide the structural atomicity (BeginUpdate/EndUpdate/Rollback) and durability (DB can be recovered from WAL if a crash occurred).
ACIDFiler0 is a Filer.
NOTE: Durable synchronous 2PC involves three fsyncs in this implementation (WAL, DB, zero truncated WAL). Where possible, it's recommended to collect transactions for, say one second before performing the two phase commit as the typical performance for rotational hard disks is about few tens of fsyncs per second atmost. For an example of such collective transaction approach please see the colecting FSM STT in Dbm's documentation[1].
[1]: http://godoc.org/github.com/cznic/exp/dbm
func NewACIDFiler ¶
func NewACIDFiler(db Filer, wal *os.File) (r *ACIDFiler0, err error)
NewACIDFiler0 returns a newly created ACIDFiler0 with WAL in wal.
If the WAL is zero sized then a previous clean shutdown of db is taken for granted and no recovery procedure is taken.
If the WAL is of non zero size then it is checked for having a commited/fully finished transaction not yet been reflected in db. If such transaction exists it's committed to db. If the recovery process finishes successfully, the WAL is truncated to zero size and fsync'ed prior to return from NewACIDFiler0.
func (ACIDFiler0) PeakWALSize ¶
func (a ACIDFiler0) PeakWALSize() int64
PeakWALSize reports the maximum size WAL has ever used.
type AllocStats ¶
type AllocStats struct { Handles int64 // total valid handles in use Compression int64 // number of compressed blocks TotalAtoms int64 // total number of atoms == AllocAtoms + FreeAtoms AllocBytes int64 // bytes allocated (after decompression, if/where used) AllocAtoms int64 // atoms allocated/used, including relocation atoms Relocations int64 // number of relocated used blocks FreeAtoms int64 // atoms unused AllocMap map[int64]int64 // allocated block size in atoms -> count of such blocks FreeMap map[int64]int64 // free block size in atoms -> count of such blocks }
AllocStats record statistics about a Filer. It can be optionally filled by Allocator.Verify, if successful.
type Allocator ¶
type Allocator struct { Compress bool // enables content compression // contains filtered or unexported fields }
Allocator implements "raw" storage space management (allocation and deallocation) for a low level of a DB engine. The storage is an abstraction provided by a Filer.
The terms MUST or MUST NOT, if/where used in the documentation of Allocator, written in all caps as seen here, are a requirement for any possible alternative implementations aiming for compatibility with this one.
Filer file ¶
A Filer file, or simply 'file', is a linear, contiguous sequence of blocks. Blocks may be either free (currently unused) or allocated (currently used). Some blocks may eventually become virtual in a sense as they may not be realized in the storage (sparse files).
Free Lists Table ¶
File starts with a FLT. This table records heads of 14 doubly linked free lists. The zero based index (I) vs minimal size of free blocks in that list, except the last one which registers free blocks of size 4112+ atoms:
MinSize == 2^I For example 0 -> 1, 1 -> 2, ... 12 -> 4096.
Each entry in the FLT is 8 bytes in netwtork order, MSB MUST be zero, ie. the slot value is effectively only 7 bytes. The value is the handle of the head of the respective doubly linked free list. The FLT size is 14*8 == 112(0x70) bytes. If the free blocks list for any particular size is empty, the respective FLT slot is zero. Sizes of free blocks in one list MUST NOT overlap with sizes of free lists in other list. For example, even though a free block of size 2 technically is of minimal size >= 1, it MUST NOT be put to the list for slot 0 (minimal size 1), but in slot 1( minimal size 2).
slot 0: sizes [1, 2) slot 1: sizes [2, 4) slot 2: sizes [4, 8) ... slot 11: sizes [2048, 4096) slot 12: sizes [4096, 4112) slot 13: sizes [4112, inf)
The last FLT slot collects all free blocks bigger than its minimal size. That still respects the 'no overlap' invariant.
File blocks ¶
A block is a linear, contiguous sequence of atoms. The first and last atoms of a block provide information about, for example, whether the block is free or used, what is the size of the block, etc. Details are discussed elsewhere. The first block of a file starts immediately after FLT, ie. at file offset 112(0x70).
Block atoms ¶
An atom is a fixed size piece of a block (and thus of a file too); it is 16 bytes long. A consequence is that for a valid file:
filesize == 0 (mod 16)
The first atom of the first block is considered to be atom #1.
Block handles ¶
A handle is an integer referring to a block. The reference is the number of the atom the block starts with. Put in other way:
handle == offset/16 - 6 offset == 16 * (handle + 6)
`offset` is the offset of the first byte of the block, measured in bytes - as in fseek(3). Handle has type `int64`, but only the lower 7 bytes may be nonzero while referring to a block, both in code as well as when persisted in the the file's internal bookkeeping structures - see 'Block types' bellow. So a handle is effectively only `uint56`. This also means that the maximum usable size of a file is 2^56 atoms. That is 2^60 bytes == 1 exabyte (10^18 bytes).
Nil handles ¶
A handle with numeric value of '0' refers to no block.
Zero padding ¶
A padding is used to round-up a block size to be a whole number of atoms. Any padding, if present, MUST be all zero bytes. Note that the size of padding is in [0, 15].
Content wiping ¶
When a block is deallocated, its data content is not wiped as the added overhead may be substantial while not necessarily needed. Client code should however overwrite the content of any block having sensitive data with eg. zeros (good compression) - before deallocating the block.
Block tags ¶
Every block is tagged in its first byte (a head tag) and last byte (tail tag). Block types are:
- Short content used block (head tags 0x00-0xFB)
- Long content used block (head tag 0xFC)
- Relocated used block (head tag 0xFD)
- Short, single atom, free block (head tag 0xFE)
- Long free block (head tag 0xFF)
Note: Relocated used block, 3. above (head tag 0xFD) MUST NOT refer to blocks other then 1. or 2. above (head tags 0x00-0xFC).
Content blocks ¶
Used blocks (head tags 0x00-0xFC) tail tag distinguish used/unused block and if content is compressed or not.
Content compression ¶
The tail flag of an used block is one of
CC == 0 // Content is not compressed. CC == 1 // Content is in zappy compression format.
If compression of written content is enabled, there are two cases: If compressed size < original size then the compressed content should be written if it will save at least one atom of the block. If compressed size >= original size then the compressed content should not be used.
It's recommended to use compression. For example the BTrees implementation assumes compression is used. Using compression may cause a slowdown in some cases while it may as well cause a speedup.
Short content block ¶
Short content block carries content of length between N == 0(0x00) and N == 251(0xFB) bytes.
|<-first atom start ... last atom end->| +---++-- ... --+-- ... --++------+ | 0 || 1... | 0x*...0x*E || 0x*F | +---++-- ... --+-- ... --++------+ | N || content | padding || CC | +---++-- ... --+-- ... --++------+ A == (N+1)/16 + 1 // The number of atoms in the block [1, 16] padding == 15 - (N+1)%16 // Length of the zero padding
Long content block ¶
Long content block carries content of length between N == 252(0xFC) and N == 65787(0x100FB) bytes.
|<-first atom start ... last atom end->| +------++------+-- ... --+-- ... --++------+ | 0 || 1..2 | 3... | 0x*...0x*E || 0x*F | +------++------+-- ... --+-- ... --++------+ | 0xFC || M | content | padding || CC | +------++------+-- ... --+-- ... --++------+ A == (N+3)/16 + 1 // The number of atoms in the block [16, 4112] M == N % 0x10000 // Stored as 2 bytes in network byte order padding == 15 - (N+3)%16 // Length of the zero padding
Relocated used block ¶
Relocated block allows to permanently assign a handle to some content and resize the content anytime afterwards without having to update all the possible existing references; the handle can be constant while the content size may be dynamic. When relocating a block, any space left by the original block content, above this single atom block, MUST be reclaimed.
Relocations MUST point only to a used short or long block == blocks with tags 0x00...0xFC.
+------++------+---------++----+ | 0 || 1..7 | 8...14 || 15 | +------++------+---------++----+ | 0xFD || H | padding || 0 | +------++------+---------++----+
H is the handle of the relocated block in network byte order.
Free blocks ¶
Free blocks are the result of space deallocation. Free blocks are organized in one or more doubly linked lists, abstracted by the FLT interface. Free blocks MUST be "registered" by putting them in such list. Allocator MUST reuse a big enough free block, if such exists, before growing the file size. When a free block is created by deallocation or reallocation it MUST be joined with any adjacently existing free blocks before "registering". If the resulting free block is now a last block of a file, the free block MUST be discarded and the file size MUST be truncated accordingly instead. Put differently, there MUST NOT ever be a free block at the file end.
A single free atom ¶
Is an unused block of size 1 atom.
+------++------+--------++------+ | 0 || 1..7 | 8...14 || 15 | +------++------+--------++------+ | 0xFE || P | N || 0xFE | +------++------+--------++------+
P and N, stored in network byte order, are the previous and next free block handles in the doubly linked list to which this free block belongs.
A long unused block ¶
Is an unused block of size > 1 atom.
+------++------+-------+---------+- ... -+----------++------+ | 0 || 1..7 | 8..14 | 15...21 | | Z-7..Z-1 || Z | +------++------+-------+---------+- ... -+----------++------+ | 0xFF || S | P | N | Leak | S || 0xFF | +------++------+-------+---------+- ... -+----------++------+ Z == 16 * S - 1
S is the size of this unused block in atoms. P and N are the previous and next free block handles in the doubly linked list to which this free block belongs. Leak contains any data the block had before deallocating this block. See also the subtitle 'Content wiping' above. S, P and N are stored in network byte order. Large free blocks may trigger a consideration of file hole punching of the Leak field - for some value of 'large'.
Note: Allocator methods vs CRUD[1]:
Alloc [C]reate Get [R]ead Realloc [U]pdate Free [D]elete
Note: No Allocator method returns io.EOF.
[1]: http://en.wikipedia.org/wiki/Create,_read,_update_and_delete
func NewAllocator ¶
NewAllocator returns a new Allocator. To open an existing file, pass its Filer. To create a "new" file, pass a Filer which file is of zero size.
func (*Allocator) Alloc ¶
Alloc allocates storage space for b and returns the handle of the new block with content set to b or an error, if any. The returned handle is valid only while the block is used - until the block is deallocated. No two valid handles share the same value within the same Filer, but any value of a handle not referring to any used block may become valid any time as a result of Alloc.
Invoking Alloc on an empty Allocator is guaranteed to return handle with value 1. The intended use of content of handle 1 is a root "directory" of other data held by an Allocator.
Passing handles not obtained initially from Alloc or not anymore valid to any other Allocator methods can result in an irreparably corrupted database.
func (*Allocator) CacheStats ¶
func (a *Allocator) CacheStats() (buffersUsed, buffersTotal int, bytesUsed, bytesTotal, hits, misses int64)
CacheStats reports cache statistics.
TODO return a struct perhaps.
func (*Allocator) Free ¶
Free deallocates the block referred to by handle or returns an error, if any.
After Free succeeds, handle is invalid and must not be used.
Handle must have been obtained initially from Alloc and must be still valid, otherwise a database may get irreparably corrupted.
func (*Allocator) Get ¶
Get returns the data content of a block referred to by handle or an error if any. The returned slice may be a sub-slice of buf if buf was large enough to hold the entire content. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil buf.
If the content was stored using compression then it is transparently returned decompressed.
Handle must have been obtained initially from Alloc and must be still valid, otherwise invalid data may be returned without detecting the error.
Get is safe for concurrent access by multiple goroutines iff no other goroutine mutates the DB.
func (*Allocator) Realloc ¶
Realloc sets the content of a block referred to by handle or returns an error, if any.
Handle must have been obtained initially from Alloc and must be still valid, otherwise a database may get irreparably corrupted.
func (*Allocator) Verify ¶
Verify attempts to find any structural errors in a Filer wrt the organization of it as defined by Allocator. 'bitmap' is a scratch pad for necessary bookkeeping and will grow to at most to Allocator's Filer.Size()/128 (0,78%). Any problems found are reported to 'log' except non verify related errors like disk read fails etc. If 'log' returns false or the error doesn't allow to (reliably) continue, the verification process is stopped and an error is returned from the Verify function. Passing a nil log works like providing a log function always returning false. Any non-structural errors, like for instance Filer read errors, are NOT reported to 'log', but returned as the Verify's return value, because Verify cannot proceed in such cases. Verify returns nil only if it fully completed verifying Allocator's Filer without detecting any error.
It is recommended to limit the number reported problems by returning false from 'log' after reaching some limit. Huge and corrupted DB can produce an overwhelming error report dataset.
The verifying process will scan the whole DB at least 3 times (a trade between processing space and time consumed). It doesn't read the content of free blocks above the head/tail info bytes. If the 3rd phase detects lost free space, then a 4th scan (a faster one) is performed to precisely report all of them.
If the DB/Filer to be verified is reasonably small, respective if its size/128 can comfortably fit within process's free memory, then it is recommended to consider using a MemFiler for the bit map.
Statistics are returned via 'stats' if non nil. The statistics are valid only if Verify succeeded, ie. it didn't reported anything to log and it returned a nil error.
type BTree ¶
type BTree struct {
// contains filtered or unexported fields
}
BTree is a B+tree[1][2], i.e. a variant which speeds up enumeration/iteration of the BTree. According to its origin it can be volatile (backed only by memory) or non-volatile (backed by a non-volatile Allocator).
The specific implementation of BTrees in this package are B+trees with delayed split/concatenation (discussed in e.g. [3]).
Note: No BTree methods returns io.EOF for physical Filer reads/writes. The io.EOF is returned only by bTreeEnumerator methods to indicate "no more K-V pair".
[1]: http://en.wikipedia.org/wiki/B+tree [2]: http://zgking.com:8080/home/donghui/publications/books/dshandbook_BTree.pdf [3]: http://people.cs.aau.dk/~simas/aalg06/UbiquitBtree.pdf
func CreateBTree ¶
func CreateBTree(store *Allocator, collate func(a, b []byte) int) (bt *BTree, handle int64, err error)
CreateBTree creates a new BTree in store. It returns the tree, its (freshly assigned) handle (for OpenBTree or RemoveBTree) or an error, if any.
func OpenBTree ¶
func OpenBTree(store *Allocator, collate func(a, b []byte) int, handle int64) (bt *BTree, err error)
OpenBTree opens a store's BTree using handle. It returns the tree or an error, if any. The same tree may be opened more than once, but operations on the separate instances should not ever overlap or void the other instances. However, the intended API usage is to open the same tree handle only once (handled by some upper layer "dispatcher").
func (*BTree) DeleteAny ¶
DeleteAny deletes one key and its associated value from the tree. If the tree is empty on return then empty is true.
func (*BTree) Dump ¶
Dump outputs a human readable dump of t to w. It is usable iff t keys and values are encoded scalars (see EncodeScalars). Intended use is only for examples or debugging. Some type information is lost in the rendering, for example a float value '17.' and an integer value '17' may both output as '17'.
func (*BTree) Extract ¶
Extract is a combination of Get and Delete. If the key exists in the tree, it is returned (like Get) and also deleted from a tree in a more efficient way which doesn't walk it twice. The returned slice may be a sub-slice of buf if buf was large enough to hold the entire content. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil buf.
func (*BTree) First ¶
First returns the first KV pair of the tree, if it exists. Otherwise key == nil and value == nil.
func (*BTree) Get ¶
Get returns the value associated with key, or nil if no such value exists. The returned slice may be a sub-slice of buf if buf was large enough to hold the entire content. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil buf.
Get is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.
func (*BTree) IndexSeek ¶
func (t *BTree) IndexSeek(key []byte, c func(a, b []byte) int) (enum *BTreeEnumerator, hit bool, err error)
IndexSeek returns an Enumerator with "position" or an error of any. Normally the position is on a KV pair such that key >= KV.key. Then hit is key == KV.key. The position is possibly "after" the last KV pair, but that is not an error. The collate function originally passed to CreateBTree is used for enumerating the tree but a custom collate function c is used for IndexSeek.
IndexSeek is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.
func (*BTree) Last ¶
Last returns the last KV pair of the tree, if it exists. Otherwise key == nil and value == nil.
func (*BTree) Put ¶
func (t *BTree) Put(buf, key []byte, upd func(key, old []byte) (new []byte, write bool, err error)) (old []byte, written bool, err error)
Put combines Get and Set in a more efficient way where the tree is walked only once. The upd(ater) receives the current (key, old-value), if that exists or (key, nil) otherwise. It can then return a (new-value, true, nil) to create or overwrite the existing value in the KV pair, or (whatever, false, nil) if it decides not to create or not to update the value of the KV pair.
tree.Set(k, v)
conceptually equals
tree.Put(k, func(k, v []byte){ return v, true }([]byte, bool))
modulo the differing return values.
The returned slice may be a sub-slice of buf if buf was large enough to hold the entire content. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil buf.
func (*BTree) Seek ¶
func (t *BTree) Seek(key []byte) (enum *BTreeEnumerator, hit bool, err error)
Seek returns an Enumerator with "position" or an error of any. Normally the position is on a KV pair such that key >= KV.key. Then hit is key == KV.key. The position is possibly "after" the last KV pair, but that is not an error.
Seek is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.
func (*BTree) SeekFirst ¶
func (t *BTree) SeekFirst() (enum *BTreeEnumerator, err error)
seekFirst returns an enumerator positioned on the first KV pair in the tree, if any. For an empty tree, err == io.EOF is returend.
SeekFirst is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.
func (*BTree) SeekLast ¶
func (t *BTree) SeekLast() (enum *BTreeEnumerator, err error)
seekLast returns an enumerator positioned on the last KV pair in the tree, if any. For an empty tree, err == io.EOF is returend.
SeekLast is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.
type BTreeEnumerator ¶
type BTreeEnumerator struct {
// contains filtered or unexported fields
}
BTreeEnumerator captures the state of enumerating a tree. It is returned from the Seek* methods. The enumerator is aware of any mutations made to the tree in the process of enumerating it and automatically resumes the enumeration.
func (*BTreeEnumerator) Next ¶
func (e *BTreeEnumerator) Next() (key, value []byte, err error)
Next returns the currently enumerated KV pair, if it exists and moves to the next KV in the key collation order. If there is no KV pair to return, err == io.EOF is returned.
Next is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.
func (*BTreeEnumerator) Prev ¶
func (e *BTreeEnumerator) Prev() (key, value []byte, err error)
Prev returns the currently enumerated KV pair, if it exists and moves to the previous KV in the key collation order. If there is no KV pair to return, err == io.EOF is returned.
Prev is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.
type ErrDecodeScalars ¶
ErrDecodeScalars is possibly returned from DecodeScalars
func (*ErrDecodeScalars) Error ¶
func (e *ErrDecodeScalars) Error() string
Error implements the built in error type.
type ErrILSEQ ¶
type ErrILSEQ struct { Type ErrType Off int64 Arg int64 Arg2 int64 Arg3 int64 Name string More interface{} }
ErrILSEQ reports a corrupted file format. Details in fields according to Type.
type ErrINVAL ¶
type ErrINVAL struct { Src string Val interface{} }
ErrINVAL reports invalid values passed as parameters, for example negative offsets where only non-negative ones are allowed or read from the DB.
type ErrPERM ¶
type ErrPERM struct {
Src string
}
ErrPERM is for example reported when a Filer is closed while BeginUpdate(s) are not balanced with EndUpdate(s)/Rollback(s) or when EndUpdate or Rollback is invoked which is not paired with a BeginUpdate.
type ErrType ¶
type ErrType int
ErrTag represents an ErrILSEQ kind.
const ( ErrOther ErrType = iota ErrAdjacentFree // Adjacent free blocks (.Off and .Arg) ErrDecompress // Used compressed block: corrupted compression ErrExpFreeTag // Expected a free block tag, got .Arg ErrExpUsedTag // Expected a used block tag, got .Arg ErrFLT // Free block is invalid or referenced multiple times ErrFLTLoad // FLT truncated to .Off, need size >= .Arg ErrFLTSize // Free block size (.Arg) doesn't belong to its list min size: .Arg2 ErrFileSize // File .Name size (.Arg) != 0 (mod 16) ErrFreeChaining // Free block, .prev.next doesn't point back to this block ErrFreeTailBlock // Last block is free ErrHead // Head of a free block list has non zero Prev (.Arg) ErrInvalidRelocTarget // Reloc doesn't target (.Arg) a short or long used block ErrInvalidWAL // Corrupted write ahead log. .Name: file name, .More: more ErrLongFreeBlkTooLong // Long free block spans beyond EOF, size .Arg ErrLongFreeBlkTooShort // Long free block must have at least 2 atoms, got only .Arg ErrLongFreeNextBeyondEOF // Long free block .Next (.Arg) spans beyond EOF ErrLongFreePrevBeyondEOF // Long free block .Prev (.Arg) spans beyond EOF ErrLongFreeTailTag // Expected a long free block tail tag, got .Arg ErrLostFreeBlock // Free block is not in any FLT list ErrNullReloc // Used reloc block with nil target ErrRelocBeyondEOF // Used reloc points (.Arg) beyond EOF ErrShortFreeTailTag // Expected a short free block tail tag, got .Arg ErrSmall // Request for a free block (.Arg) returned a too small one (.Arg2) at .Off ErrTailTag // Block at .Off has invalid tail CC (compression code) tag, got .Arg ErrUnexpReloc // Unexpected reloc block referred to from reloc block .Arg ErrVerifyPadding // Used block has nonzero padding ErrVerifyTailSize // Long free block size .Arg but tail size .Arg2 ErrVerifyUsedSpan // Used block size (.Arg) spans beyond EOF )
ErrILSEQ types
type Filer ¶
type Filer interface { // BeginUpdate increments the "nesting" counter (initially zero). Every // call to BeginUpdate must be eventually "balanced" by exactly one of // EndUpdate or Rollback. Calls to BeginUpdate may nest. BeginUpdate() error // Analogous to os.File.Close(). Close() error // EndUpdate decrements the "nesting" counter. If it's zero after that // then assume the "storage" has reached structural integrity (after a // batch of partial updates). If a Filer implements some support for // that (write ahead log, journal, etc.) then the appropriate actions // are to be taken for nesting == 0. Invocation of an unbalanced // EndUpdate is an error. EndUpdate() error // Analogous to os.File.Name(). Name() string // PunchHole deallocates space inside a "file" in the byte range // starting at off and continuing for size bytes. The actual hole // created by PunchHole may be smaller than requested. The Filer size // (as reported by `Size()` does not change when hole punching, even // when punching the end of a file off. In contrast to the Linux // implementation of FALLOC_FL_PUNCH_HOLE in `fallocate`(2); a Filer is // free not only to ignore `PunchHole()` (implement it as a nop), but // additionally no guarantees about the content of the hole, when // eventually read back, are required, i.e. any data, not only zeros, // can be read from the "hole", including just anything what was left // there - with all of the possible security problems. PunchHole(off, size int64) error // As os.File.ReadAt. Note: `off` is an absolute "file pointer" // address and cannot be negative even when a Filer is a InnerFiler. ReadAt(b []byte, off int64) (n int, err error) // Rollback cancels and undoes the innermost pending update level. // Rollback decrements the "nesting" counter. If a Filer implements // some support for keeping structural integrity (write ahead log, // journal, etc.) then the appropriate actions are to be taken. // Invocation of an unbalanced Rollback is an error. Rollback() error // Analogous to os.File.FileInfo().Size(). Size() (int64, error) // Analogous to os.Sync(). Sync() (err error) // Analogous to os.File.Truncate(). Truncate(size int64) error // Analogous to os.File.WriteAt(). Note: `off` is an absolute "file // pointer" address and cannot be negative even when a Filer is a // InnerFiler. WriteAt(b []byte, off int64) (n int, err error) }
A Filer is a []byte-like model of a file or similar entity. It may optionally implement support for structural transaction safety. In contrast to a file stream, a Filer is not sequentially accessible. ReadAt and WriteAt are always "addressed" by an offset and are assumed to perform atomically. A Filer is not safe for concurrent access, it's designed for consumption by the other objects in package, which should use a Filer from one goroutine only or via a mutex. BeginUpdate, EndUpdate and Rollback must be either all implemented by a Filer for structural integrity - or they should be all no-ops; where/if that requirement is relaxed.
If a Filer wraps another Filer implementation, it usually invokes the same methods on the "inner" one, after some possible argument translations etc. If a Filer implements the structural transactions handling methods (BeginUpdate, EndUpdate and Rollback) as no-ops _and_ wraps another Filer: it then still MUST invoke those methods on the inner Filer. This is important for the case where a RollbackFiler exists somewhere down the chain. It's also important for an Allocator - to know when it must invalidate its FLT cache.
type InnerFiler ¶
type InnerFiler struct {
// contains filtered or unexported fields
}
A InnerFiler is a Filer with added addressing/size translation.
func NewInnerFiler ¶
func NewInnerFiler(outer Filer, off int64) *InnerFiler
NewInnerFiler returns a new InnerFiler wrapped by `outer` in a way which adds `off` to every access.
For example, considering:
inner := NewInnerFiler(outer, 10)
then
inner.WriteAt([]byte{42}, 4)
translates to
outer.WriteAt([]byte{42}, 14)
But an attempt to emulate
outer.WriteAt([]byte{17}, 9)
by
inner.WriteAt([]byte{17}, -1)
will fail as the `off` parameter can never be < 0. Also note that
inner.Size() == outer.Size() - off,
i.e. `inner` pretends no `outer` exists. Finally, after e.g.
inner.Truncate(7) outer.Size() == 17
will be true.
func (*InnerFiler) BeginUpdate ¶
func (f *InnerFiler) BeginUpdate() error
BeginUpdate implements Filer.
func (*InnerFiler) PunchHole ¶
func (f *InnerFiler) PunchHole(off, size int64) error
PunchHole implements Filer. `off`, `size` must be >= 0.
func (*InnerFiler) ReadAt ¶
func (f *InnerFiler) ReadAt(b []byte, off int64) (n int, err error)
ReadAt implements Filer. `off` must be >= 0.
func (*InnerFiler) Truncate ¶
func (f *InnerFiler) Truncate(size int64) error
Truncate implements Filer.
type MemFiler ¶
type MemFiler struct {
// contains filtered or unexported fields
}
MemFiler is a memory backed Filer. It implements BeginUpdate, EndUpdate and Rollback as no-ops. MemFiler is not automatically persistent, but it has ReadFrom and WriteTo methods.
func (*MemFiler) ReadFrom ¶
ReadFrom is a helper to populate MemFiler's content from r. 'n' reports the number of bytes read from 'r'.
func (*MemFiler) WriteTo ¶
WriteTo is a helper to copy/persist MemFiler's content to w. If w is also an io.WriterAt then WriteTo may attempt to _not_ write any big, for some value of big, runs of zeros, i.e. it will attempt to punch holes, where possible, in `w` if that happens to be a freshly created or to zero length truncated OS file. 'n' reports the number of bytes written to 'w'.
type OSFile ¶
type OSFile interface { Name() string Stat() (fi os.FileInfo, err error) Sync() (err error) Truncate(size int64) (err error) io.Closer io.Reader io.ReaderAt io.Seeker io.Writer io.WriterAt }
OSFile is an os.File like minimal set of methods allowing to construct a Filer.
type OSFiler ¶
type OSFiler struct {
// contains filtered or unexported fields
}
OSFiler is like a SimpleFileFiler but based on an OSFile.
func NewOSFiler ¶
NewOSFiler returns a Filer from an OSFile. This Filer is like the SimpleFileFiler, it does not implement the transaction related methods.
func (*OSFiler) BeginUpdate ¶
BeginUpdate implements Filer.
type Options ¶
type Options struct{}
Options are passed to the NewAllocator to amend some configuration. The compatibility promise is the same as of struct types in the Go standard library - introducing changes can be made only by adding new exported fields, which is backward compatible as long as client code uses field names to assign values of imported struct types literals.
NOTE: No options are currently defined.
type RollbackFiler ¶
type RollbackFiler struct {
// contains filtered or unexported fields
}
RollbackFiler is a Filer implementing structural transaction handling. Structural transactions should be small and short lived because all non committed data are held in memory until committed or discarded by a Rollback.
While using RollbackFiler, every intended update of the wrapped Filler, by WriteAt, Truncate or PunchHole, _must_ be made within a transaction. Attempts to do it outside of a transaction will return ErrPERM. OTOH, invoking ReadAt outside of a transaction is not a problem.
No nested transactions: All updates within a transaction are held in memory. On a matching EndUpdate the updates held in memory are actually written to the wrapped Filer.
Nested transactions: Correct data will be seen from RollbackFiler when any level of a nested transaction is rollbacked. The actual writing to the wrapped Filer happens only when the outer most transaction nesting level is closed.
Invoking Rollback is an alternative to EndUpdate. It discards all changes made at the current transaction level and returns the "state" (possibly not yet persisted) of the Filer to what it was before the corresponding BeginUpdate.
During an open transaction, all reads (using ReadAt) are "dirty" reads, seeing the uncommitted changes made to the Filer's data.
Lldb databases should be based upon a RollbackFiler.
With a wrapped MemFiler one gets transactional memory. With, for example a wrapped disk based SimpleFileFiler it protects against at least some HW errors - if Rollback is properly invoked on such failures and/or if there's some WAL or 2PC or whatever other safe mechanism based recovery procedure used by the client.
The "real" writes to the wrapped Filer (or WAL instead) go through the writerAt supplied to NewRollbackFiler.
List of functions/methods which are recommended to be wrapped in a BeginUpdate/EndUpdate structural transaction:
Allocator.Alloc Allocator.Free Allocator.Realloc CreateBTree RemoveBTree BTree.Clear BTree.Delete BTree.DeleteAny BTree.Clear BTree.Extract BTree.Get (it can mutate the DB) BTree.Put BTree.Set
NOTE: RollbackFiler is a generic solution intended to wrap Filers provided by this package which do not implement any of the transactional methods. RollbackFiler thus _does not_ invoke any of the transactional methods of its wrapped Filer.
RollbackFiler is safe for concurrent use by multiple goroutines.
func NewRollbackFiler ¶
func NewRollbackFiler(f Filer, checkpoint func(sz int64) error, writerAt io.WriterAt) (r *RollbackFiler, err error)
NewRollbackFiler returns a RollbackFiler wrapping f.
The checkpoint parameter ¶
The checkpoint function is called after closing (by EndUpdate) the upper most level open transaction if all calls of writerAt were successful and the DB (or eg. a WAL) is thus now in a consistent state (virtually, in the ideal world with no write caches, no HW failures, no process crashes, ...).
NOTE: In, for example, a 2PC it is necessary to reflect also the sz parameter as the new file size (as in the parameter to Truncate). All changes were successfully written already by writerAt before invoking checkpoint.
The writerAt parameter ¶
The writerAt interface is used to commit the updates of the wrapped Filer. If any invocation of writerAt fails then a non nil error will be returned from EndUpdate and checkpoint will _not_ ne called. Neither is necessary to call Rollback. The rule of thumb: The [structural] transaction [level] is closed by invoking exactly once one of EndUpdate _or_ Rollback.
It is presumed that writerAt uses WAL or 2PC or whatever other safe mechanism to physically commit the updates.
Updates performed by invocations of writerAt are byte-precise, but not necessarily maximum possible length precise. IOW, for example an update crossing page boundaries may be performed by more than one writerAt invocation. No offset sorting is performed. This may change if it proves to be a problem. Such change would be considered backward compatible.
NOTE: Using RollbackFiler, but failing to ever invoke a matching "closing" EndUpdate after an "opening" BeginUpdate means neither writerAt or checkpoint will ever get called - with all the possible data loss consequences.
func (*RollbackFiler) BeginUpdate ¶
func (r *RollbackFiler) BeginUpdate() (err error)
Implements Filer.
func (*RollbackFiler) Close ¶
func (r *RollbackFiler) Close() (err error)
Implements Filer.
Close will return an error if not invoked at nesting level 0. However, to allow emergency closing from eg. a signal handler; if Close is invoked within an open transaction(s), it rollbacks any non committed open transactions and performs the Close operation.
IOW: Regardless of the transaction nesting level the Close is always performed but any uncommitted transaction data are lost.
func (*RollbackFiler) PunchHole ¶
func (r *RollbackFiler) PunchHole(off, size int64) error
Implements Filer.
func (*RollbackFiler) ReadAt ¶
func (r *RollbackFiler) ReadAt(b []byte, off int64) (n int, err error)
Implements Filer.
func (*RollbackFiler) Truncate ¶
func (r *RollbackFiler) Truncate(size int64) error
Implements Filer.
type SimpleFileFiler ¶
type SimpleFileFiler struct {
// contains filtered or unexported fields
}
SimpleFileFiler is an os.File backed Filer intended for use where structural consistency can be reached by other means (SimpleFileFiler is for example wrapped in eg. an RollbackFiler or ACIDFiler0) or where persistence is not required (temporary/working data sets).
SimpleFileFiler is the most simple os.File backed Filer implementation as it does not really implement BeginUpdate and EndUpdate/Rollback in any way which would protect the structural integrity of data. If misused e.g. as a real database storage w/o other measures, it can easily cause data loss when, for example, a power outage occurs or the updating process terminates abruptly.
func NewSimpleFileFiler ¶
func NewSimpleFileFiler(f *os.File) *SimpleFileFiler
NewSimpleFileFiler returns a new SimpleFileFiler.
func (*SimpleFileFiler) BeginUpdate ¶
func (f *SimpleFileFiler) BeginUpdate() error
BeginUpdate implements Filer.
func (*SimpleFileFiler) Close ¶
func (f *SimpleFileFiler) Close() (err error)
Close implements Filer.
func (*SimpleFileFiler) EndUpdate ¶
func (f *SimpleFileFiler) EndUpdate() (err error)
EndUpdate implements Filer.
func (*SimpleFileFiler) PunchHole ¶
func (f *SimpleFileFiler) PunchHole(off, size int64) (err error)
PunchHole implements Filer.
func (*SimpleFileFiler) ReadAt ¶
func (f *SimpleFileFiler) ReadAt(b []byte, off int64) (n int, err error)
ReadAt implements Filer.
func (*SimpleFileFiler) Rollback ¶
func (f *SimpleFileFiler) Rollback() (err error)
Rollback implements Filer.
func (*SimpleFileFiler) Size ¶
func (f *SimpleFileFiler) Size() (int64, error)
Size implements Filer.
func (*SimpleFileFiler) Truncate ¶
func (f *SimpleFileFiler) Truncate(size int64) (err error)
Truncate implements Filer.