mtrie

package
v0.13.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 10, 2020 License: AGPL-3.0 Imports: 10 Imported by: 3

README

Memory-Trie: MTrie

At its heart, an MTrie is an in-memory key-value store, with the ability to generate cryptographic proofs for the states of the stored registers. MTrie combines features of Merkle trees (for generating cryptographic proofs for the stored register) and Radix Trees (for optimized memory consumption).

By construction, MTries are immutable data structures. Essentially, they represent a snapshot of the key-value store for one specific point in time. Updating register values is implemented through copy-on-write, which creates a new MTrie, i.e. a new snapshot of the updated key-value store. For minimal memory consumption, all sub-tries that where not affected by the write operation are shared between the original MTrie (before the register updates) and the updated MTrie (after the register writes).

Storage Model

Formally, an MTrie represents a perfect, full, binary Merkle tree with uniform height. We follow the established graph-theoretic terminology. In our terminology, we explicitly differentiate between:

  • tree: full binary Merkle tree with uniform height. The storage model is defined for the tree.
  • MTrie: is an optimized in-memory structure representing the tree
Underling Graph-Theoretical Storage Model

The storage model is defined for the tree. At its heart, it is a key-value store. In the store, there are a fixed number of storage slots, which we refer to as registers. By convention, each register has a key (identifying the storage slot) and a value (binary blob) stored in that memory slot. While all register keys have the same fixed length (measured in bits), the values are variable-length byte slices. We define an unallocated register as holding no data, i.e. an empty byte slice. By default, each register is unallocated. In contrast, an allocated_ register holds a value with positive storage size, i.e. a byte slice with length larger than zero. Note that do not introduce the concept of registers values being nil.

The theoretical storage model is a perfect, full, binary Merkle tree, which spans all registers (even if they are unallocated).
Therefore, we have two different node types in the tree:

  • A LEAF node represents a register:
    • holding a key and a value;
    • following establish graph-theoretic conventions, the height of a leaf is zero.
    • the hash value is defined as:
      • For and unallocated register, the hash is just the hash of a global constant. Therefore, the leafs for all unallocated registers have the same hash. We refer to the hash of an unallocated register as default hash at height 0.
      • For allocated registers, the hash value is H(key, value) for H the hash function.
  • An INTERIM node is a vertex in the tree with
    • has exactly two children, called LeftChild and RightChild, which are both of the same height; the children can either be leaf or interim nodes;
    • the height of an interim node n is n.height = LeftChild.height + 1 = RightChild.height + 1; (Hence, an interim node n can only have a n.height > 0, as only leafs have height zero.)
    • the hash value is defined as H(LeftChild, RightChild)
convention for mapping a register key to a path in the tree

Conventions:

  • let key[i] be the bit with index i (we use zero-based indexing)
  • a key can be converted into its integer representation though big-endian ordering
  • given a key and an index i, we define:
    • the prefix as key[:i] (excluding the bit up to, but not including, the bit with index i)
  • the tree's root node partitions the register set into to sub-sets depending on value key[0] :
    • all registers key[0] == 0 fall into the LeftChild subtree
    • all registers key[0] == 1 fall into the RightChild subtree
  • All registers in LeftChild's subtree, now have the prefix key[0:1] = [0]. LeftChild's respective two children partition the register set further into all registers with the common key prefix 0,0 vs 0,1.
  • Let n be interim node with a path length to the root node of d [edges]. Then, all registers that fall in n's subtree share the same prefix key[0:d]. Furthermore, partition this register set further into
    • all registers key[d] == 0 fall into the n.LeftChild subtree
    • all registers key[d] == 1 fall into the n.RightChild subtree

Therefore, we have the following relation between tree height and key length:

  • Let the tree hold registers with key length len(key) = K [bits]. Therefore, the tree has interim nodes with height values: K (tree root), K-1 (root's children), ..., 1. The interim nodes with height = 1 partition the resisters according to their last bit. Their children a leaf nodes (which have zero height).
  • Let n be an interim node with height n.height. Then, we can associate n with the key index i := K - n.height.
    • n's prefix is then the defined as p = key[:i], which is shared by all registers that fall into n's subtree.
    • n partitions its register set further: all registers with prefix p,0 fall into n.LeftChild's subtree;
      all registers with p,1 fall into n.LeftChild's subtree.

Note that our definition of height follows established graph-theoretic conventions:

The HEIGHT of a NODE v in a tree is the number of edges on the longest downward path between v and a tree leaf.
The HEIGHT of a TREE is the height of its root node.

Our storage model generates the following property, which is very beneficial for optimizing the implementation:

  • A sub-tree holding only unallocated registers hashes to a value that only depends on the height of the subtree (but not on which specific registers are included in the tree). Specifically, we define the defaultHash, which is recursively defined. The defaultHash[0] of an unallocated leaf node is a global constant. Furthermore, defaultHash[h] is the subtree-root hash of a subtree with height h that holds only unallocated registers.
MTrie as an Optimized Storage implementation

Storing the perfect, full, binary Merkle tree with uniform height in its raw form is very memory intensive. Therefore, the MTrie data structure employs a variety of optimizations to reduce its memory and CPU footprint. Nevertheless, from an MTrie, the full tree can be constructed.

On a high level, MTrie has the following optimizations:

  • sparse: all subtrees holding only unallocated register are pruned:
    • Consider an interim node with height h. Let c be one of its children, i.e. either LeftChild or RightChild.
    • If c == nil, the subtree-root hash for c is defaultHash[h-1] (correctness follows directly from the storage model)
  • Compactification: Consider a register with its repsective path from the root to the leaf in the full binary tree. When traversing the tree from the root down towards the leaf, there will come a node Ω, which only contains a single allocated register. Hence, MTrie pre-computes the root hash of such trees and store them as a compactified leaf. Formally, a compactified leaf stores
    • a key and a value;
    • a height value h, which can zero or larger
    • its hash is: subtree-root hash for a tree that only holds key and value. To compute this hash, we essentially start with H(key, value) and hash our way upwards the tree until we hit height h. While klimbing the tree upwards, we use the respective defaultHash[..] for the other branch which we are merger with.

Furthermore, an MTrie

  • uses SHA-3-256 as hash function H
  • the registers have keys with len(key) = 8*l [bits], for l the key size in bytes
  • the height of MTrie (per definition, the height of the root node) is also 8*l, for l the key size in bytes

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Forest

type Forest struct {
	// contains filtered or unexported fields
}

Forest holds several in-memory tries. As Forest is a storage-abstraction layer, we assume that all registers are addressed via paths of pre-defined uniform length.

Forest has a limit, the forestCapacity, on the number of tries it is able to store. If more tries are added than the capacity, the Least Recently Used trie is removed (evicted) from the Forest. THIS IS A ROUGH HEURISTIC as it might evict tries that are still needed. In fully matured Flow, we will have an explicit eviction policy.

TODO: Storage Eviction Policy for Forest

For the execution node: we only evict on sealing a result.

func NewForest

func NewForest(pathByteSize int, trieStorageDir string, forestCapacity int, metrics module.LedgerMetrics, onTreeEvicted func(tree *trie.MTrie) error) (*Forest, error)

NewForest returns a new instance of memory forest.

CAUTION on forestCapacity: the specified capacity MUST be SUFFICIENT to store all needed MTries in the forest. If more tries are added than the capacity, the Least Recently Used trie is removed (evicted) from the Forest. THIS IS A ROUGH HEURISTIC as it might evict tries that are still needed. Make sure you chose a sufficiently large forestCapacity, such that, when reaching the capacity, the Least Recently Used trie will never be needed again.

func (*Forest) AddTrie

func (f *Forest) AddTrie(newTrie *trie.MTrie) error

AddTrie adds a trie to the forest

func (*Forest) AddTries

func (f *Forest) AddTries(newTries []*trie.MTrie) error

AddTries adds a trie to the forest

func (*Forest) DiskSize

func (f *Forest) DiskSize() (uint64, error)

DiskSize returns the disk size of the directory used by the forest (in bytes)

func (*Forest) GetEmptyRootHash

func (f *Forest) GetEmptyRootHash() []byte

GetEmptyRootHash returns the rootHash of empty Trie

func (*Forest) GetTrie

func (f *Forest) GetTrie(rootHash ledger.RootHash) (*trie.MTrie, error)

GetTrie returns trie at specific rootHash warning, use this function for read-only operation

func (*Forest) GetTries

func (f *Forest) GetTries() ([]*trie.MTrie, error)

GetTries returns list of currently cached tree root hashes

func (*Forest) PathLength

func (f *Forest) PathLength() int

PathLength return the length [in bytes] the trie operates with. Concurrency safe (as Tries are immutable structures by convention)

func (*Forest) Proofs

func (f *Forest) Proofs(r *ledger.TrieRead) (*ledger.TrieBatchProof, error)

Proofs returns a batch proof for the given paths

func (*Forest) Read

func (f *Forest) Read(r *ledger.TrieRead) ([]*ledger.Payload, error)

Read reads values for an slice of paths and returns values and error (if any)

func (*Forest) RemoveTrie

func (f *Forest) RemoveTrie(rootHash []byte)

RemoveTrie removes a trie to the forest

func (*Forest) Size

func (f *Forest) Size() int

Size returns the number of active tries in this store

func (*Forest) Update

func (f *Forest) Update(u *ledger.TrieUpdate) (ledger.RootHash, error)

Update updates the Values for the registers and returns rootHash and error (if any). In case there are multiple updates to the same register, Update will persist the latest written value.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL