lldb

package
v0.15.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jan 8, 2016 License: BSD-3-Clause, Apache-2.0 Imports: 16 Imported by: 0

README

lldb

Package lldb (WIP) implements a low level database engine.

Installation: $ go get github.com/cznic/exp/lldb

Documentation: godoc.org/github.com/cznic/exp/lldb

Documentation

Overview

Package lldb (WIP) implements a low level database engine. The database model used could be considered a specific implementation of some small(est) intersection of models listed in [1]. As a settled term is lacking, it'll be called here a 'Virtual memory model' (VMM).

Experimental release notes

This is an experimental release. Don't open a DB from two applications or two instances of an application - it will get corrupted (no file locking is implemented and this task is delegated to lldb's clients).

WARNING: THE LLDB API IS SUBJECT TO CHANGE.

Filers

A Filer is an abstraction of storage. A Filer may be a part of some process' virtual address space, an OS file, a networked, remote file etc. Persistence of the storage is optional, opaque to VMM and it is specific to a concrete Filer implementation.

Space management

Mechanism to allocate, reallocate (resize), deallocate (and later reclaim the unused) contiguous parts of a Filer, called blocks. Blocks are identified and referred to by a handle, an int64.

BTrees

In addition to the VMM like services, lldb provides volatile and non-volatile BTrees. Keys and values of a BTree are limited in size to 64kB each (a bit more actually). Support for larger keys/values, if desired, can be built atop a BTree to certain limits.

Handles vs pointers

A handle is the abstracted storage counterpart of a memory address. There is one fundamental difference, though. Resizing a block never results in a change to the handle which refers to the resized block, so a handle is more akin to an unique numeric id/key. Yet it shares one property of pointers - handles can be associated again with blocks after the original handle block was deallocated. In other words, a handle uniqueness domain is the state of the database and is not something comparable to e.g. an ever growing numbering sequence.

Also, as with memory pointers, dangling handles can be created and blocks overwritten when such handles are used. Using a zero handle to refer to a block will not panic; however, the resulting error is effectively the same exceptional situation as dereferencing a nil pointer.

Blocks

Allocated/used blocks, are limited in size to only a little bit more than 64kB. Bigger semantic entities/structures must be built in lldb's client code. The content of a block has no semantics attached, it's only a fully opaque `[]byte`.

Scalars

Use of "scalars" applies to EncodeScalars, DecodeScalars and Collate. Those first two "to bytes" and "from bytes" functions are suggested for handling multi-valued Allocator content items and/or keys/values of BTrees (using Collate for keys). Types called "scalar" are:

nil (the typeless one)
bool
all integral types: [u]int8, [u]int16, [u]int32, [u]int, [u]int64
all floating point types: float32, float64
all complex types: complex64, complex128
[]byte (64kB max)
string (64kb max)

Specific implementations

Included are concrete implementations of some of the VMM interfaces included to ease serving simple client code or for testing and possibly as an example. More details in the documentation of such implementations.

[1]: http://en.wikipedia.org/wiki/Database_model

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Collate

func Collate(x, y []interface{}, strCollate func(string, string) int) (r int, err error)

Collate collates two arrays of Go predeclared scalar types (and the typeless nil or []byte). If any other type appears in x or y, Collate will return a non nil error. String items are collated using strCollate or lexically byte-wise (as when using Go comparison operators) when strCollate is nil. []byte items are collated using bytes.Compare.

Collate returns:

-1 if x <  y
 0 if x == y
+1 if x >  y

The same value as defined above must be returned from strCollate.

The "outer" ordering is: nil, bool, number, []byte, string. IOW, nil is "smaller" than anything else except other nil, numbers collate before []byte, []byte collate before strings, etc.

Integers and real numbers collate as expected in math. However, complex numbers are not ordered in Go. Here the ordering is defined: Complex numbers are in comparison considered first only by their real part. Iff the result is equality then the imaginary part is used to determine the ordering. In this "second order" comparing, integers and real numbers are considered as complex numbers with a zero imaginary part.

func DecodeScalars

func DecodeScalars(b []byte) (scalars []interface{}, err error)

DecodeScalars decodes a []byte produced by EncodeScalars.

func EncodeScalars

func EncodeScalars(scalars ...interface{}) (b []byte, err error)

EncodeScalars encodes a vector of predeclared scalar type values to a []byte, making it suitable to store it as a "record" in a DB or to use it as a key of a BTree.

func RemoveBTree

func RemoveBTree(store *Allocator, handle int64) (err error)

RemoveBTree removes tree, represented by handle from store. Empty trees are cheap, each uses only few bytes of the store. If there's a chance that a tree will eventually get reused (non empty again), it's recommended to not/never remove it. One advantage of such approach is a stable handle of such tree.

Types

type ACIDFiler0

type ACIDFiler0 struct {
	*RollbackFiler
	// contains filtered or unexported fields
}

ACIDFiler0 is a very simple, synchronous implementation of 2PC. It uses a single write ahead log file to provide the structural atomicity (BeginUpdate/EndUpdate/Rollback) and durability (DB can be recovered from WAL if a crash occurred).

ACIDFiler0 is a Filer.

NOTE: Durable synchronous 2PC involves three fsyncs in this implementation (WAL, DB, zero truncated WAL). Where possible, it's recommended to collect transactions for, say one second before performing the two phase commit as the typical performance for rotational hard disks is about few tens of fsyncs per second atmost. For an example of such collective transaction approach please see the colecting FSM STT in Dbm's documentation[1].

[1]: http://godoc.org/github.com/cznic/exp/dbm

func NewACIDFiler

func NewACIDFiler(db Filer, wal *os.File) (r *ACIDFiler0, err error)

NewACIDFiler0 returns a newly created ACIDFiler0 with WAL in wal.

If the WAL is zero sized then a previous clean shutdown of db is taken for granted and no recovery procedure is taken.

If the WAL is of non zero size then it is checked for having a commited/fully finished transaction not yet been reflected in db. If such transaction exists it's committed to db. If the recovery process finishes successfully, the WAL is truncated to zero size and fsync'ed prior to return from NewACIDFiler0.

func (ACIDFiler0) PeakWALSize

func (a ACIDFiler0) PeakWALSize() int64

PeakWALSize reports the maximum size WAL has ever used.

type AllocStats

type AllocStats struct {
	Handles     int64           // total valid handles in use
	Compression int64           // number of compressed blocks
	TotalAtoms  int64           // total number of atoms == AllocAtoms + FreeAtoms
	AllocBytes  int64           // bytes allocated (after decompression, if/where used)
	AllocAtoms  int64           // atoms allocated/used, including relocation atoms
	Relocations int64           // number of relocated used blocks
	FreeAtoms   int64           // atoms unused
	AllocMap    map[int64]int64 // allocated block size in atoms -> count of such blocks
	FreeMap     map[int64]int64 // free block size in atoms -> count of such blocks
}

AllocStats record statistics about a Filer. It can be optionally filled by Allocator.Verify, if successful.

type Allocator

type Allocator struct {
	Compress bool // enables content compression
	// contains filtered or unexported fields
}

Allocator implements "raw" storage space management (allocation and deallocation) for a low level of a DB engine. The storage is an abstraction provided by a Filer.

The terms MUST or MUST NOT, if/where used in the documentation of Allocator, written in all caps as seen here, are a requirement for any possible alternative implementations aiming for compatibility with this one.

Filer file

A Filer file, or simply 'file', is a linear, contiguous sequence of blocks. Blocks may be either free (currently unused) or allocated (currently used). Some blocks may eventually become virtual in a sense as they may not be realized in the storage (sparse files).

Free Lists Table

File starts with a FLT. This table records heads of 14 doubly linked free lists. The zero based index (I) vs minimal size of free blocks in that list, except the last one which registers free blocks of size 4112+ atoms:

MinSize == 2^I

For example 0 -> 1, 1 -> 2, ... 12 -> 4096.

Each entry in the FLT is 8 bytes in netwtork order, MSB MUST be zero, ie. the slot value is effectively only 7 bytes. The value is the handle of the head of the respective doubly linked free list. The FLT size is 14*8 == 112(0x70) bytes. If the free blocks list for any particular size is empty, the respective FLT slot is zero. Sizes of free blocks in one list MUST NOT overlap with sizes of free lists in other list. For example, even though a free block of size 2 technically is of minimal size >= 1, it MUST NOT be put to the list for slot 0 (minimal size 1), but in slot 1( minimal size 2).

slot 0:		sizes [1, 2)
slot 1:		sizes [2, 4)
slot 2:		sizes [4, 8)
...
slot 11:	sizes [2048, 4096)
slot 12:	sizes [4096, 4112)
slot 13:	sizes [4112, inf)

The last FLT slot collects all free blocks bigger than its minimal size. That still respects the 'no overlap' invariant.

File blocks

A block is a linear, contiguous sequence of atoms. The first and last atoms of a block provide information about, for example, whether the block is free or used, what is the size of the block, etc. Details are discussed elsewhere. The first block of a file starts immediately after FLT, ie. at file offset 112(0x70).

Block atoms

An atom is a fixed size piece of a block (and thus of a file too); it is 16 bytes long. A consequence is that for a valid file:

filesize == 0 (mod 16)

The first atom of the first block is considered to be atom #1.

Block handles

A handle is an integer referring to a block. The reference is the number of the atom the block starts with. Put in other way:

handle == offset/16 - 6
offset == 16 * (handle + 6)

`offset` is the offset of the first byte of the block, measured in bytes - as in fseek(3). Handle has type `int64`, but only the lower 7 bytes may be nonzero while referring to a block, both in code as well as when persisted in the the file's internal bookkeeping structures - see 'Block types' bellow. So a handle is effectively only `uint56`. This also means that the maximum usable size of a file is 2^56 atoms. That is 2^60 bytes == 1 exabyte (10^18 bytes).

Nil handles

A handle with numeric value of '0' refers to no block.

Zero padding

A padding is used to round-up a block size to be a whole number of atoms. Any padding, if present, MUST be all zero bytes. Note that the size of padding is in [0, 15].

Content wiping

When a block is deallocated, its data content is not wiped as the added overhead may be substantial while not necessarily needed. Client code should however overwrite the content of any block having sensitive data with eg. zeros (good compression) - before deallocating the block.

Block tags

Every block is tagged in its first byte (a head tag) and last byte (tail tag). Block types are:

  1. Short content used block (head tags 0x00-0xFB)
  2. Long content used block (head tag 0xFC)
  3. Relocated used block (head tag 0xFD)
  4. Short, single atom, free block (head tag 0xFE)
  5. Long free block (head tag 0xFF)

Note: Relocated used block, 3. above (head tag 0xFD) MUST NOT refer to blocks other then 1. or 2. above (head tags 0x00-0xFC).

Content blocks

Used blocks (head tags 0x00-0xFC) tail tag distinguish used/unused block and if content is compressed or not.

Content compression

The tail flag of an used block is one of

CC == 0 // Content is not compressed.
CC == 1 // Content is in zappy compression format.

If compression of written content is enabled, there are two cases: If compressed size < original size then the compressed content should be written if it will save at least one atom of the block. If compressed size >= original size then the compressed content should not be used.

It's recommended to use compression. For example the BTrees implementation assumes compression is used. Using compression may cause a slowdown in some cases while it may as well cause a speedup.

Short content block

Short content block carries content of length between N == 0(0x00) and N == 251(0xFB) bytes.

|<-first atom start  ...  last atom end->|
+---++--   ...   --+--   ...   --++------+
| 0 ||    1...     |  0x*...0x*E || 0x*F |
+---++--   ...   --+--   ...   --++------+
| N ||   content   |   padding   ||  CC  |
+---++--   ...   --+--   ...   --++------+

A == (N+1)/16 + 1        // The number of atoms in the block [1, 16]
padding == 15 - (N+1)%16 // Length of the zero padding

Long content block

Long content block carries content of length between N == 252(0xFC) and N == 65787(0x100FB) bytes.

|<-first atom start    ...     last atom end->|
+------++------+-- ... --+--  ...   --++------+
|  0   || 1..2 |   3...  | 0x*...0x*E || 0x*F |
+------++------+-- ... --+--  ...   --++------+
| 0xFC ||  M   | content |  padding   ||  CC  |
+------++------+-- ... --+--  ...   --++------+

A == (N+3)/16 + 1        // The number of atoms in the block [16, 4112]
M == N % 0x10000         // Stored as 2 bytes in network byte order
padding == 15 - (N+3)%16 // Length of the zero padding

Relocated used block

Relocated block allows to permanently assign a handle to some content and resize the content anytime afterwards without having to update all the possible existing references; the handle can be constant while the content size may be dynamic. When relocating a block, any space left by the original block content, above this single atom block, MUST be reclaimed.

Relocations MUST point only to a used short or long block == blocks with tags 0x00...0xFC.

+------++------+---------++----+
|  0   || 1..7 | 8...14  || 15 |
+------++------+---------++----+
| 0xFD ||  H   | padding || 0  |
+------++------+---------++----+

H is the handle of the relocated block in network byte order.

Free blocks

Free blocks are the result of space deallocation. Free blocks are organized in one or more doubly linked lists, abstracted by the FLT interface. Free blocks MUST be "registered" by putting them in such list. Allocator MUST reuse a big enough free block, if such exists, before growing the file size. When a free block is created by deallocation or reallocation it MUST be joined with any adjacently existing free blocks before "registering". If the resulting free block is now a last block of a file, the free block MUST be discarded and the file size MUST be truncated accordingly instead. Put differently, there MUST NOT ever be a free block at the file end.

A single free atom

Is an unused block of size 1 atom.

+------++------+--------++------+
|  0   || 1..7 | 8...14 ||  15  |
+------++------+--------++------+
| 0xFE ||  P   |   N    || 0xFE |
+------++------+--------++------+

P and N, stored in network byte order, are the previous and next free block handles in the doubly linked list to which this free block belongs.

A long unused block

Is an unused block of size > 1 atom.

+------++------+-------+---------+- ... -+----------++------+
|  0   || 1..7 | 8..14 | 15...21 |       | Z-7..Z-1 ||  Z   |
+------++------+-------+---------+- ... -+----------++------+
| 0xFF ||  S   |   P   |    N    | Leak  |    S     || 0xFF |
+------++------+-------+---------+- ... -+----------++------+

Z == 16 * S - 1

S is the size of this unused block in atoms. P and N are the previous and next free block handles in the doubly linked list to which this free block belongs. Leak contains any data the block had before deallocating this block. See also the subtitle 'Content wiping' above. S, P and N are stored in network byte order. Large free blocks may trigger a consideration of file hole punching of the Leak field - for some value of 'large'.

Note: Allocator methods vs CRUD[1]:

Alloc	[C]reate
Get	[R]ead
Realloc	[U]pdate
Free	[D]elete

Note: No Allocator method returns io.EOF.

[1]: http://en.wikipedia.org/wiki/Create,_read,_update_and_delete

func NewAllocator

func NewAllocator(f Filer, opts *Options) (a *Allocator, err error)

NewAllocator returns a new Allocator. To open an existing file, pass its Filer. To create a "new" file, pass a Filer which file is of zero size.

func (*Allocator) Alloc

func (a *Allocator) Alloc(b []byte) (handle int64, err error)

Alloc allocates storage space for b and returns the handle of the new block with content set to b or an error, if any. The returned handle is valid only while the block is used - until the block is deallocated. No two valid handles share the same value within the same Filer, but any value of a handle not referring to any used block may become valid any time as a result of Alloc.

Invoking Alloc on an empty Allocator is guaranteed to return handle with value 1. The intended use of content of handle 1 is a root "directory" of other data held by an Allocator.

Passing handles not obtained initially from Alloc or not anymore valid to any other Allocator methods can result in an irreparably corrupted database.

func (*Allocator) CacheStats

func (a *Allocator) CacheStats() (buffersUsed, buffersTotal int, bytesUsed, bytesTotal, hits, misses int64)

CacheStats reports cache statistics.

TODO return a struct perhaps.

func (*Allocator) Free

func (a *Allocator) Free(handle int64) (err error)

Free deallocates the block referred to by handle or returns an error, if any.

After Free succeeds, handle is invalid and must not be used.

Handle must have been obtained initially from Alloc and must be still valid, otherwise a database may get irreparably corrupted.

func (*Allocator) Get

func (a *Allocator) Get(buf []byte, handle int64) (b []byte, err error)

Get returns the data content of a block referred to by handle or an error if any. The returned slice may be a sub-slice of buf if buf was large enough to hold the entire content. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil buf.

If the content was stored using compression then it is transparently returned decompressed.

Handle must have been obtained initially from Alloc and must be still valid, otherwise invalid data may be returned without detecting the error.

Get is safe for concurrent access by multiple goroutines iff no other goroutine mutates the DB.

func (*Allocator) Realloc

func (a *Allocator) Realloc(handle int64, b []byte) (err error)

Realloc sets the content of a block referred to by handle or returns an error, if any.

Handle must have been obtained initially from Alloc and must be still valid, otherwise a database may get irreparably corrupted.

func (*Allocator) Verify

func (a *Allocator) Verify(bitmap Filer, log func(error) bool, stats *AllocStats) (err error)

Verify attempts to find any structural errors in a Filer wrt the organization of it as defined by Allocator. 'bitmap' is a scratch pad for necessary bookkeeping and will grow to at most to Allocator's Filer.Size()/128 (0,78%). Any problems found are reported to 'log' except non verify related errors like disk read fails etc. If 'log' returns false or the error doesn't allow to (reliably) continue, the verification process is stopped and an error is returned from the Verify function. Passing a nil log works like providing a log function always returning false. Any non-structural errors, like for instance Filer read errors, are NOT reported to 'log', but returned as the Verify's return value, because Verify cannot proceed in such cases. Verify returns nil only if it fully completed verifying Allocator's Filer without detecting any error.

It is recommended to limit the number reported problems by returning false from 'log' after reaching some limit. Huge and corrupted DB can produce an overwhelming error report dataset.

The verifying process will scan the whole DB at least 3 times (a trade between processing space and time consumed). It doesn't read the content of free blocks above the head/tail info bytes. If the 3rd phase detects lost free space, then a 4th scan (a faster one) is performed to precisely report all of them.

If the DB/Filer to be verified is reasonably small, respective if its size/128 can comfortably fit within process's free memory, then it is recommended to consider using a MemFiler for the bit map.

Statistics are returned via 'stats' if non nil. The statistics are valid only if Verify succeeded, ie. it didn't reported anything to log and it returned a nil error.

type BTree

type BTree struct {
	// contains filtered or unexported fields
}

BTree is a B+tree[1][2], i.e. a variant which speeds up enumeration/iteration of the BTree. According to its origin it can be volatile (backed only by memory) or non-volatile (backed by a non-volatile Allocator).

The specific implementation of BTrees in this package are B+trees with delayed split/concatenation (discussed in e.g. [3]).

Note: No BTree methods returns io.EOF for physical Filer reads/writes. The io.EOF is returned only by bTreeEnumerator methods to indicate "no more K-V pair".

[1]: http://en.wikipedia.org/wiki/B+tree
[2]: http://zgking.com:8080/home/donghui/publications/books/dshandbook_BTree.pdf
[3]: http://people.cs.aau.dk/~simas/aalg06/UbiquitBtree.pdf

func CreateBTree

func CreateBTree(store *Allocator, collate func(a, b []byte) int) (bt *BTree, handle int64, err error)

CreateBTree creates a new BTree in store. It returns the tree, its (freshly assigned) handle (for OpenBTree or RemoveBTree) or an error, if any.

func NewBTree

func NewBTree(collate func(a, b []byte) int) *BTree

NewBTree returns a new, memory-only BTree.

func OpenBTree

func OpenBTree(store *Allocator, collate func(a, b []byte) int, handle int64) (bt *BTree, err error)

OpenBTree opens a store's BTree using handle. It returns the tree or an error, if any. The same tree may be opened more than once, but operations on the separate instances should not ever overlap or void the other instances. However, the intended API usage is to open the same tree handle only once (handled by some upper layer "dispatcher").

func (*BTree) Clear

func (t *BTree) Clear() (err error)

Clear empties the tree.

func (*BTree) Delete

func (t *BTree) Delete(key []byte) (err error)

Delete deletes key and its associated value from the tree.

func (*BTree) DeleteAny

func (t *BTree) DeleteAny() (empty bool, err error)

DeleteAny deletes one key and its associated value from the tree. If the tree is empty on return then empty is true.

func (*BTree) Dump

func (t *BTree) Dump(w io.Writer) (err error)

Dump outputs a human readable dump of t to w. It is usable iff t keys and values are encoded scalars (see EncodeScalars). Intended use is only for examples or debugging. Some type information is lost in the rendering, for example a float value '17.' and an integer value '17' may both output as '17'.

func (*BTree) Extract

func (t *BTree) Extract(buf, key []byte) (value []byte, err error)

Extract is a combination of Get and Delete. If the key exists in the tree, it is returned (like Get) and also deleted from a tree in a more efficient way which doesn't walk it twice. The returned slice may be a sub-slice of buf if buf was large enough to hold the entire content. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil buf.

func (*BTree) First

func (t *BTree) First() (key, value []byte, err error)

First returns the first KV pair of the tree, if it exists. Otherwise key == nil and value == nil.

func (*BTree) Get

func (t *BTree) Get(buf, key []byte) (value []byte, err error)

Get returns the value associated with key, or nil if no such value exists. The returned slice may be a sub-slice of buf if buf was large enough to hold the entire content. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil buf.

Get is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.

func (*BTree) Handle

func (t *BTree) Handle() int64

Handle reports t's handle.

func (*BTree) IndexSeek

func (t *BTree) IndexSeek(key []byte, c func(a, b []byte) int) (enum *BTreeEnumerator, hit bool, err error)

IndexSeek returns an Enumerator with "position" or an error of any. Normally the position is on a KV pair such that key >= KV.key. Then hit is key == KV.key. The position is possibly "after" the last KV pair, but that is not an error. The collate function originally passed to CreateBTree is used for enumerating the tree but a custom collate function c is used for IndexSeek.

IndexSeek is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.

func (*BTree) IsMem

func (t *BTree) IsMem() (r bool)

IsMem reports if t is a memory only BTree.

func (*BTree) Last

func (t *BTree) Last() (key, value []byte, err error)

Last returns the last KV pair of the tree, if it exists. Otherwise key == nil and value == nil.

func (*BTree) Put

func (t *BTree) Put(buf, key []byte, upd func(key, old []byte) (new []byte, write bool, err error)) (old []byte, written bool, err error)

Put combines Get and Set in a more efficient way where the tree is walked only once. The upd(ater) receives the current (key, old-value), if that exists or (key, nil) otherwise. It can then return a (new-value, true, nil) to create or overwrite the existing value in the KV pair, or (whatever, false, nil) if it decides not to create or not to update the value of the KV pair.

tree.Set(k, v)

conceptually equals

tree.Put(k, func(k, v []byte){ return v, true }([]byte, bool))

modulo the differing return values.

The returned slice may be a sub-slice of buf if buf was large enough to hold the entire content. Otherwise, a newly allocated slice will be returned. It is valid to pass a nil buf.

func (*BTree) Seek

func (t *BTree) Seek(key []byte) (enum *BTreeEnumerator, hit bool, err error)

Seek returns an Enumerator with "position" or an error of any. Normally the position is on a KV pair such that key >= KV.key. Then hit is key == KV.key. The position is possibly "after" the last KV pair, but that is not an error.

Seek is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.

func (*BTree) SeekFirst

func (t *BTree) SeekFirst() (enum *BTreeEnumerator, err error)

seekFirst returns an enumerator positioned on the first KV pair in the tree, if any. For an empty tree, err == io.EOF is returend.

SeekFirst is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.

func (*BTree) SeekLast

func (t *BTree) SeekLast() (enum *BTreeEnumerator, err error)

seekLast returns an enumerator positioned on the last KV pair in the tree, if any. For an empty tree, err == io.EOF is returend.

SeekLast is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.

func (*BTree) Set

func (t *BTree) Set(key, value []byte) (err error)

Set sets the value associated with key. Any previous value, if existed, is overwritten by the new one.

type BTreeEnumerator

type BTreeEnumerator struct {
	// contains filtered or unexported fields
}

BTreeEnumerator captures the state of enumerating a tree. It is returned from the Seek* methods. The enumerator is aware of any mutations made to the tree in the process of enumerating it and automatically resumes the enumeration.

func (*BTreeEnumerator) Next

func (e *BTreeEnumerator) Next() (key, value []byte, err error)

Next returns the currently enumerated KV pair, if it exists and moves to the next KV in the key collation order. If there is no KV pair to return, err == io.EOF is returned.

Next is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.

func (*BTreeEnumerator) Prev

func (e *BTreeEnumerator) Prev() (key, value []byte, err error)

Prev returns the currently enumerated KV pair, if it exists and moves to the previous KV in the key collation order. If there is no KV pair to return, err == io.EOF is returned.

Prev is safe for concurrent access by multiple goroutines iff no other goroutine mutates the tree.

type ErrDecodeScalars

type ErrDecodeScalars struct {
	B []byte // Data being decoded
	I int    // offending offset
}

ErrDecodeScalars is possibly returned from DecodeScalars

func (*ErrDecodeScalars) Error

func (e *ErrDecodeScalars) Error() string

Error implements the built in error type.

type ErrILSEQ

type ErrILSEQ struct {
	Type ErrType
	Off  int64
	Arg  int64
	Arg2 int64
	Arg3 int64
	Name string
	More interface{}
}

ErrILSEQ reports a corrupted file format. Details in fields according to Type.

func (*ErrILSEQ) Error

func (e *ErrILSEQ) Error() string

Error implements the built in error type.

type ErrINVAL

type ErrINVAL struct {
	Src string
	Val interface{}
}

ErrINVAL reports invalid values passed as parameters, for example negative offsets where only non-negative ones are allowed or read from the DB.

func (*ErrINVAL) Error

func (e *ErrINVAL) Error() string

Error implements the built in error type.

type ErrPERM

type ErrPERM struct {
	Src string
}

ErrPERM is for example reported when a Filer is closed while BeginUpdate(s) are not balanced with EndUpdate(s)/Rollback(s) or when EndUpdate or Rollback is invoked which is not paired with a BeginUpdate.

func (*ErrPERM) Error

func (e *ErrPERM) Error() string

Error implements the built in error type.

type ErrType

type ErrType int

ErrTag represents an ErrILSEQ kind.

const (
	ErrOther ErrType = iota

	ErrAdjacentFree          // Adjacent free blocks (.Off and .Arg)
	ErrDecompress            // Used compressed block: corrupted compression
	ErrExpFreeTag            // Expected a free block tag, got .Arg
	ErrExpUsedTag            // Expected a used block tag, got .Arg
	ErrFLT                   // Free block is invalid or referenced multiple times
	ErrFLTLoad               // FLT truncated to .Off, need size >= .Arg
	ErrFLTSize               // Free block size (.Arg) doesn't belong to its list min size: .Arg2
	ErrFileSize              // File .Name size (.Arg) != 0 (mod 16)
	ErrFreeChaining          // Free block, .prev.next doesn't point back to this block
	ErrFreeTailBlock         // Last block is free
	ErrHead                  // Head of a free block list has non zero Prev (.Arg)
	ErrInvalidRelocTarget    // Reloc doesn't target (.Arg) a short or long used block
	ErrInvalidWAL            // Corrupted write ahead log. .Name: file name, .More: more
	ErrLongFreeBlkTooLong    // Long free block spans beyond EOF, size .Arg
	ErrLongFreeBlkTooShort   // Long free block must have at least 2 atoms, got only .Arg
	ErrLongFreeNextBeyondEOF // Long free block .Next (.Arg) spans beyond EOF
	ErrLongFreePrevBeyondEOF // Long free block .Prev (.Arg) spans beyond EOF
	ErrLongFreeTailTag       // Expected a long free block tail tag, got .Arg
	ErrLostFreeBlock         // Free block is not in any FLT list
	ErrNullReloc             // Used reloc block with nil target
	ErrRelocBeyondEOF        // Used reloc points (.Arg) beyond EOF
	ErrShortFreeTailTag      // Expected a short free block tail tag, got .Arg
	ErrSmall                 // Request for a free block (.Arg) returned a too small one (.Arg2) at .Off
	ErrTailTag               // Block at .Off has invalid tail CC (compression code) tag, got .Arg
	ErrUnexpReloc            // Unexpected reloc block referred to from reloc block .Arg
	ErrVerifyPadding         // Used block has nonzero padding
	ErrVerifyTailSize        // Long free block size .Arg but tail size .Arg2
	ErrVerifyUsedSpan        // Used block size (.Arg) spans beyond EOF
)

ErrILSEQ types

type Filer

type Filer interface {
	// BeginUpdate increments the "nesting" counter (initially zero). Every
	// call to BeginUpdate must be eventually "balanced" by exactly one of
	// EndUpdate or Rollback. Calls to BeginUpdate may nest.
	BeginUpdate() error

	// Analogous to os.File.Close().
	Close() error

	// EndUpdate decrements the "nesting" counter. If it's zero after that
	// then assume the "storage" has reached structural integrity (after a
	// batch of partial updates). If a Filer implements some support for
	// that (write ahead log, journal, etc.) then the appropriate actions
	// are to be taken for nesting == 0. Invocation of an unbalanced
	// EndUpdate is an error.
	EndUpdate() error

	// Analogous to os.File.Name().
	Name() string

	// PunchHole deallocates space inside a "file" in the byte range
	// starting at off and continuing for size bytes. The actual hole
	// created by PunchHole may be smaller than requested. The Filer size
	// (as reported by `Size()` does not change when hole punching, even
	// when punching the end of a file off.  In contrast to the Linux
	// implementation of FALLOC_FL_PUNCH_HOLE in `fallocate`(2); a Filer is
	// free not only to ignore `PunchHole()` (implement it as a nop), but
	// additionally no guarantees about the content of the hole, when
	// eventually read back, are required, i.e.  any data, not only zeros,
	// can be read from the "hole", including just anything what was left
	// there - with all of the possible security problems.
	PunchHole(off, size int64) error

	// As os.File.ReadAt. Note: `off` is an absolute "file pointer"
	// address and cannot be negative even when a Filer is a InnerFiler.
	ReadAt(b []byte, off int64) (n int, err error)

	// Rollback cancels and undoes the innermost pending update level.
	// Rollback decrements the "nesting" counter.  If a Filer implements
	// some support for keeping structural integrity (write ahead log,
	// journal, etc.) then the appropriate actions are to be taken.
	// Invocation of an unbalanced Rollback is an error.
	Rollback() error

	// Analogous to os.File.FileInfo().Size().
	Size() (int64, error)

	// Analogous to os.Sync().
	Sync() (err error)

	// Analogous to os.File.Truncate().
	Truncate(size int64) error

	// Analogous to os.File.WriteAt(). Note: `off` is an absolute "file
	// pointer" address and cannot be negative even when a Filer is a
	// InnerFiler.
	WriteAt(b []byte, off int64) (n int, err error)
}

A Filer is a []byte-like model of a file or similar entity. It may optionally implement support for structural transaction safety. In contrast to a file stream, a Filer is not sequentially accessible. ReadAt and WriteAt are always "addressed" by an offset and are assumed to perform atomically. A Filer is not safe for concurrent access, it's designed for consumption by the other objects in package, which should use a Filer from one goroutine only or via a mutex. BeginUpdate, EndUpdate and Rollback must be either all implemented by a Filer for structural integrity - or they should be all no-ops; where/if that requirement is relaxed.

If a Filer wraps another Filer implementation, it usually invokes the same methods on the "inner" one, after some possible argument translations etc. If a Filer implements the structural transactions handling methods (BeginUpdate, EndUpdate and Rollback) as no-ops _and_ wraps another Filer: it then still MUST invoke those methods on the inner Filer. This is important for the case where a RollbackFiler exists somewhere down the chain. It's also important for an Allocator - to know when it must invalidate its FLT cache.

type InnerFiler

type InnerFiler struct {
	// contains filtered or unexported fields
}

A InnerFiler is a Filer with added addressing/size translation.

func NewInnerFiler

func NewInnerFiler(outer Filer, off int64) *InnerFiler

NewInnerFiler returns a new InnerFiler wrapped by `outer` in a way which adds `off` to every access.

For example, considering:

inner := NewInnerFiler(outer, 10)

then

inner.WriteAt([]byte{42}, 4)

translates to

outer.WriteAt([]byte{42}, 14)

But an attempt to emulate

outer.WriteAt([]byte{17}, 9)

by

inner.WriteAt([]byte{17}, -1)

will fail as the `off` parameter can never be < 0. Also note that

inner.Size() == outer.Size() - off,

i.e. `inner` pretends no `outer` exists. Finally, after e.g.

inner.Truncate(7)
outer.Size() == 17

will be true.

func (*InnerFiler) BeginUpdate

func (f *InnerFiler) BeginUpdate() error

BeginUpdate implements Filer.

func (*InnerFiler) Close

func (f *InnerFiler) Close() (err error)

Close implements Filer.

func (*InnerFiler) EndUpdate

func (f *InnerFiler) EndUpdate() error

EndUpdate implements Filer.

func (*InnerFiler) Name

func (f *InnerFiler) Name() string

Name implements Filer.

func (*InnerFiler) PunchHole

func (f *InnerFiler) PunchHole(off, size int64) error

PunchHole implements Filer. `off`, `size` must be >= 0.

func (*InnerFiler) ReadAt

func (f *InnerFiler) ReadAt(b []byte, off int64) (n int, err error)

ReadAt implements Filer. `off` must be >= 0.

func (*InnerFiler) Rollback

func (f *InnerFiler) Rollback() error

Rollback implements Filer.

func (*InnerFiler) Size

func (f *InnerFiler) Size() (int64, error)

Size implements Filer.

func (*InnerFiler) Sync

func (f *InnerFiler) Sync() (err error)

Sync() implements Filer.

func (*InnerFiler) Truncate

func (f *InnerFiler) Truncate(size int64) error

Truncate implements Filer.

func (*InnerFiler) WriteAt

func (f *InnerFiler) WriteAt(b []byte, off int64) (n int, err error)

WriteAt implements Filer. `off` must be >= 0.

type MemFiler

type MemFiler struct {
	// contains filtered or unexported fields
}

MemFiler is a memory backed Filer. It implements BeginUpdate, EndUpdate and Rollback as no-ops. MemFiler is not automatically persistent, but it has ReadFrom and WriteTo methods.

func NewMemFiler

func NewMemFiler() *MemFiler

NewMemFiler returns a new MemFiler.

func (*MemFiler) BeginUpdate

func (f *MemFiler) BeginUpdate() error

BeginUpdate implements Filer.

func (*MemFiler) Close

func (f *MemFiler) Close() (err error)

Close implements Filer.

func (*MemFiler) EndUpdate

func (f *MemFiler) EndUpdate() (err error)

EndUpdate implements Filer.

func (*MemFiler) Name

func (f *MemFiler) Name() string

Name implements Filer.

func (*MemFiler) PunchHole

func (f *MemFiler) PunchHole(off, size int64) (err error)

PunchHole implements Filer.

func (*MemFiler) ReadAt

func (f *MemFiler) ReadAt(b []byte, off int64) (n int, err error)

ReadAt implements Filer.

func (*MemFiler) ReadFrom

func (f *MemFiler) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom is a helper to populate MemFiler's content from r. 'n' reports the number of bytes read from 'r'.

func (*MemFiler) Rollback

func (f *MemFiler) Rollback() (err error)

Rollback implements Filer.

func (*MemFiler) Size

func (f *MemFiler) Size() (int64, error)

Size implements Filer.

func (*MemFiler) Sync

func (f *MemFiler) Sync() error

Sync implements Filer.

func (*MemFiler) Truncate

func (f *MemFiler) Truncate(size int64) (err error)

Truncate implements Filer.

func (*MemFiler) WriteAt

func (f *MemFiler) WriteAt(b []byte, off int64) (n int, err error)

WriteAt implements Filer.

func (*MemFiler) WriteTo

func (f *MemFiler) WriteTo(w io.Writer) (n int64, err error)

WriteTo is a helper to copy/persist MemFiler's content to w. If w is also an io.WriterAt then WriteTo may attempt to _not_ write any big, for some value of big, runs of zeros, i.e. it will attempt to punch holes, where possible, in `w` if that happens to be a freshly created or to zero length truncated OS file. 'n' reports the number of bytes written to 'w'.

type OSFile

type OSFile interface {
	Name() string
	Stat() (fi os.FileInfo, err error)
	Sync() (err error)
	Truncate(size int64) (err error)
	io.Closer
	io.Reader
	io.ReaderAt
	io.Seeker
	io.Writer
	io.WriterAt
}

OSFile is an os.File like minimal set of methods allowing to construct a Filer.

type OSFiler

type OSFiler struct {
	// contains filtered or unexported fields
}

OSFiler is like a SimpleFileFiler but based on an OSFile.

func NewOSFiler

func NewOSFiler(f OSFile) (r *OSFiler)

NewOSFiler returns a Filer from an OSFile. This Filer is like the SimpleFileFiler, it does not implement the transaction related methods.

func (*OSFiler) BeginUpdate

func (f *OSFiler) BeginUpdate() (err error)

BeginUpdate implements Filer.

func (*OSFiler) Close

func (f *OSFiler) Close() (err error)

Close implements Filer.

func (*OSFiler) EndUpdate

func (f *OSFiler) EndUpdate() (err error)

EndUpdate implements Filer.

func (*OSFiler) Name

func (f *OSFiler) Name() string

Name implements Filer.

func (*OSFiler) PunchHole

func (f *OSFiler) PunchHole(off, size int64) (err error)

PunchHole implements Filer.

func (*OSFiler) ReadAt

func (f *OSFiler) ReadAt(b []byte, off int64) (n int, err error)

ReadAt implements Filer.

func (*OSFiler) Rollback

func (f *OSFiler) Rollback() (err error)

Rollback implements Filer.

func (*OSFiler) Size

func (f *OSFiler) Size() (n int64, err error)

Size implements Filer.

func (*OSFiler) Sync

func (f *OSFiler) Sync() (err error)

Sync implements Filer.

func (*OSFiler) Truncate

func (f *OSFiler) Truncate(size int64) (err error)

Truncate implements Filer.

func (*OSFiler) WriteAt

func (f *OSFiler) WriteAt(b []byte, off int64) (n int, err error)

WriteAt implements Filer.

type Options

type Options struct{}

Options are passed to the NewAllocator to amend some configuration. The compatibility promise is the same as of struct types in the Go standard library - introducing changes can be made only by adding new exported fields, which is backward compatible as long as client code uses field names to assign values of imported struct types literals.

NOTE: No options are currently defined.

type RollbackFiler

type RollbackFiler struct {
	// contains filtered or unexported fields
}

RollbackFiler is a Filer implementing structural transaction handling. Structural transactions should be small and short lived because all non committed data are held in memory until committed or discarded by a Rollback.

While using RollbackFiler, every intended update of the wrapped Filler, by WriteAt, Truncate or PunchHole, _must_ be made within a transaction. Attempts to do it outside of a transaction will return ErrPERM. OTOH, invoking ReadAt outside of a transaction is not a problem.

No nested transactions: All updates within a transaction are held in memory. On a matching EndUpdate the updates held in memory are actually written to the wrapped Filer.

Nested transactions: Correct data will be seen from RollbackFiler when any level of a nested transaction is rollbacked. The actual writing to the wrapped Filer happens only when the outer most transaction nesting level is closed.

Invoking Rollback is an alternative to EndUpdate. It discards all changes made at the current transaction level and returns the "state" (possibly not yet persisted) of the Filer to what it was before the corresponding BeginUpdate.

During an open transaction, all reads (using ReadAt) are "dirty" reads, seeing the uncommitted changes made to the Filer's data.

Lldb databases should be based upon a RollbackFiler.

With a wrapped MemFiler one gets transactional memory. With, for example a wrapped disk based SimpleFileFiler it protects against at least some HW errors - if Rollback is properly invoked on such failures and/or if there's some WAL or 2PC or whatever other safe mechanism based recovery procedure used by the client.

The "real" writes to the wrapped Filer (or WAL instead) go through the writerAt supplied to NewRollbackFiler.

List of functions/methods which are recommended to be wrapped in a BeginUpdate/EndUpdate structural transaction:

Allocator.Alloc
Allocator.Free
Allocator.Realloc

CreateBTree
RemoveBTree
BTree.Clear
BTree.Delete
BTree.DeleteAny
BTree.Clear
BTree.Extract
BTree.Get (it can mutate the DB)
BTree.Put
BTree.Set

NOTE: RollbackFiler is a generic solution intended to wrap Filers provided by this package which do not implement any of the transactional methods. RollbackFiler thus _does not_ invoke any of the transactional methods of its wrapped Filer.

RollbackFiler is safe for concurrent use by multiple goroutines.

func NewRollbackFiler

func NewRollbackFiler(f Filer, checkpoint func(sz int64) error, writerAt io.WriterAt) (r *RollbackFiler, err error)

NewRollbackFiler returns a RollbackFiler wrapping f.

The checkpoint parameter

The checkpoint function is called after closing (by EndUpdate) the upper most level open transaction if all calls of writerAt were successful and the DB (or eg. a WAL) is thus now in a consistent state (virtually, in the ideal world with no write caches, no HW failures, no process crashes, ...).

NOTE: In, for example, a 2PC it is necessary to reflect also the sz parameter as the new file size (as in the parameter to Truncate). All changes were successfully written already by writerAt before invoking checkpoint.

The writerAt parameter

The writerAt interface is used to commit the updates of the wrapped Filer. If any invocation of writerAt fails then a non nil error will be returned from EndUpdate and checkpoint will _not_ ne called. Neither is necessary to call Rollback. The rule of thumb: The [structural] transaction [level] is closed by invoking exactly once one of EndUpdate _or_ Rollback.

It is presumed that writerAt uses WAL or 2PC or whatever other safe mechanism to physically commit the updates.

Updates performed by invocations of writerAt are byte-precise, but not necessarily maximum possible length precise. IOW, for example an update crossing page boundaries may be performed by more than one writerAt invocation. No offset sorting is performed. This may change if it proves to be a problem. Such change would be considered backward compatible.

NOTE: Using RollbackFiler, but failing to ever invoke a matching "closing" EndUpdate after an "opening" BeginUpdate means neither writerAt or checkpoint will ever get called - with all the possible data loss consequences.

func (*RollbackFiler) BeginUpdate

func (r *RollbackFiler) BeginUpdate() (err error)

Implements Filer.

func (*RollbackFiler) Close

func (r *RollbackFiler) Close() (err error)

Implements Filer.

Close will return an error if not invoked at nesting level 0. However, to allow emergency closing from eg. a signal handler; if Close is invoked within an open transaction(s), it rollbacks any non committed open transactions and performs the Close operation.

IOW: Regardless of the transaction nesting level the Close is always performed but any uncommitted transaction data are lost.

func (*RollbackFiler) EndUpdate

func (r *RollbackFiler) EndUpdate() (err error)

Implements Filer.

func (*RollbackFiler) Name

func (r *RollbackFiler) Name() string

Implements Filer.

func (*RollbackFiler) PunchHole

func (r *RollbackFiler) PunchHole(off, size int64) error

Implements Filer.

func (*RollbackFiler) ReadAt

func (r *RollbackFiler) ReadAt(b []byte, off int64) (n int, err error)

Implements Filer.

func (*RollbackFiler) Rollback

func (r *RollbackFiler) Rollback() (err error)

Implements Filer.

func (*RollbackFiler) Size

func (r *RollbackFiler) Size() (sz int64, err error)

Implements Filer.

func (*RollbackFiler) Sync

func (r *RollbackFiler) Sync() error

Implements Filer.

func (*RollbackFiler) Truncate

func (r *RollbackFiler) Truncate(size int64) error

Implements Filer.

func (*RollbackFiler) WriteAt

func (r *RollbackFiler) WriteAt(b []byte, off int64) (n int, err error)

Implements Filer.

type SimpleFileFiler

type SimpleFileFiler struct {
	// contains filtered or unexported fields
}

SimpleFileFiler is an os.File backed Filer intended for use where structural consistency can be reached by other means (SimpleFileFiler is for example wrapped in eg. an RollbackFiler or ACIDFiler0) or where persistence is not required (temporary/working data sets).

SimpleFileFiler is the most simple os.File backed Filer implementation as it does not really implement BeginUpdate and EndUpdate/Rollback in any way which would protect the structural integrity of data. If misused e.g. as a real database storage w/o other measures, it can easily cause data loss when, for example, a power outage occurs or the updating process terminates abruptly.

func NewSimpleFileFiler

func NewSimpleFileFiler(f *os.File) *SimpleFileFiler

NewSimpleFileFiler returns a new SimpleFileFiler.

func (*SimpleFileFiler) BeginUpdate

func (f *SimpleFileFiler) BeginUpdate() error

BeginUpdate implements Filer.

func (*SimpleFileFiler) Close

func (f *SimpleFileFiler) Close() (err error)

Close implements Filer.

func (*SimpleFileFiler) EndUpdate

func (f *SimpleFileFiler) EndUpdate() (err error)

EndUpdate implements Filer.

func (*SimpleFileFiler) Name

func (f *SimpleFileFiler) Name() string

Name implements Filer.

func (*SimpleFileFiler) PunchHole

func (f *SimpleFileFiler) PunchHole(off, size int64) (err error)

PunchHole implements Filer.

func (*SimpleFileFiler) ReadAt

func (f *SimpleFileFiler) ReadAt(b []byte, off int64) (n int, err error)

ReadAt implements Filer.

func (*SimpleFileFiler) Rollback

func (f *SimpleFileFiler) Rollback() (err error)

Rollback implements Filer.

func (*SimpleFileFiler) Size

func (f *SimpleFileFiler) Size() (int64, error)

Size implements Filer.

func (*SimpleFileFiler) Sync

func (f *SimpleFileFiler) Sync() error

Sync implements Filer.

func (*SimpleFileFiler) Truncate

func (f *SimpleFileFiler) Truncate(size int64) (err error)

Truncate implements Filer.

func (*SimpleFileFiler) WriteAt

func (f *SimpleFileFiler) WriteAt(b []byte, off int64) (n int, err error)

WriteAt implements Filer.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL