Documentation
¶
Overview ¶
Package mst contains a Merkle Search Tree (MST) implementation for atproto.
This implementation is a port of the Typescript implementation in the `atproto` git repo.
## Notes copied from TS repo
This is an implementation of a Merkle Search Tree (MST) The data structure is described here: https://hal.inria.fr/hal-02303490/document The MST is an ordered, insert-order-independent, deterministic tree. Keys are laid out in alphabetic order. The key insight of an MST is that each key is hashed and starting 0s are counted to determine which layer it falls on (5 zeros for ~32 fanout). This is a merkle tree, so each subtree is referred to by it's hash (CID). When a leaf is changed, ever tree on the path to that leaf is changed as well, thereby updating the root hash.
For atproto, we use SHA-256 as the key hashing algorithm, and ~16 fanout (4-bits of zero per layer).
NOTE: currently keys are strings, not bytes. Because UTF-8 strings can't be safely split at arbitrary byte boundaries (the results are not necessarily valid UTF-8 strings), this means that "wide" characters not really supported in keys, particularly across programming language implementations. We recommend sticking with simple alphanumeric (ASCII) strings.
A couple notes on CBOR encoding:
There are never two neighboring subtrees. Therefore, we can represent a node as an array of leaves & pointers to their right neighbor (possibly null), along with a pointer to the left-most subtree (also possibly null).
Most keys in a subtree will have overlap. We do compression on prefixes by describing keys as: - the length of the prefix that it shares in common with the preceding key - the rest of the string
For example: If the first leaf in a tree is `bsky/posts/abcdefg` and the second is `bsky/posts/abcdehi` Then the first will be described as `prefix: 0, key: 'bsky/posts/abcdefg'`, and the second will be described as `prefix: 16, key: 'hi'.`
Index ¶
- Variables
- func CBORTypes() []reflect.Type
- type DiffOp
- type MerkleSearchTree
- func (mst *MerkleSearchTree) Add(ctx context.Context, key string, val cid.Cid, knownZeros int) (*MerkleSearchTree, error)
- func (mst *MerkleSearchTree) Delete(ctx context.Context, k string) (*MerkleSearchTree, error)
- func (mst *MerkleSearchTree) Get(ctx context.Context, k string) (cid.Cid, error)
- func (mst *MerkleSearchTree) GetPointer(ctx context.Context) (cid.Cid, error)
- func (mst *MerkleSearchTree) Update(ctx context.Context, k string, val cid.Cid) (*MerkleSearchTree, error)
- func (mst *MerkleSearchTree) WalkLeavesFrom(ctx context.Context, from string, cb func(key string, val cid.Cid) error) error
- type NodeData
- type TreeEntry
Constants ¶
This section is empty.
Variables ¶
var ErrNotFound = fmt.Errorf("mst: not found")
Functions ¶
Types ¶
type DiffOp ¶
func DiffTrees ¶
func DiffTrees(ctx context.Context, bs blockstore.Blockstore, from, to cid.Cid) ([]*DiffOp, error)
TODO: this code isn't great, should be rewritten on top of the baseline datastructures once functional and correct
type MerkleSearchTree ¶
type MerkleSearchTree struct {
// contains filtered or unexported fields
}
MerkleSearchTree represents an MST tree node (NodeData type). It can be in several levels of hydration: fully hydrated (entries and "pointer" (CID) computed); dirty (entries correct, but pointer (CID) not valid); virtual (pointer is defined, no entries have been pulled from blockstore)
MerkleSearchTree values are immutable. Methods return copies with changes.
func LoadMST ¶
func LoadMST(cst cbor.IpldStore, root cid.Cid) *MerkleSearchTree
This is poorly named in both implementations, because it is lazy Typescript: MST.load(storage, cid, layer=null, fanout) -> MST
func NewEmptyMST ¶
func NewEmptyMST(cst cbor.IpldStore) *MerkleSearchTree
NewEmptyMST reports a new empty MST using cst as its storage.
func (*MerkleSearchTree) Add ¶
func (mst *MerkleSearchTree) Add(ctx context.Context, key string, val cid.Cid, knownZeros int) (*MerkleSearchTree, error)
"Adds a new leaf for the given key/value pair. Throws if a leaf with that key already exists" Typescript: MST.add(key, value, knownZeros?) -> MST
func (*MerkleSearchTree) Delete ¶
func (mst *MerkleSearchTree) Delete(ctx context.Context, k string) (*MerkleSearchTree, error)
"Deletes the value at the given key" Typescript: MST.delete(key) -> MST
func (*MerkleSearchTree) Get ¶
func (mst *MerkleSearchTree) Get(ctx context.Context, k string) (cid.Cid, error)
"Gets the value at the given key" Typescript: MST.get(key) -> (CID|null)
func (*MerkleSearchTree) GetPointer ¶
func (mst *MerkleSearchTree) GetPointer(ctx context.Context) (cid.Cid, error)
"We don't hash the node on every mutation for performance reasons. Instead we keep track of whether the pointer is outdated and only (recursively) calculate when needed" Typescript: MST.getPointer() -> CID
func (*MerkleSearchTree) Update ¶
func (mst *MerkleSearchTree) Update(ctx context.Context, k string, val cid.Cid) (*MerkleSearchTree, error)
"Edits the value at the given key. Throws if the given key does not exist" Typescript: MST.update(key, value) -> MST
func (*MerkleSearchTree) WalkLeavesFrom ¶
func (mst *MerkleSearchTree) WalkLeavesFrom(ctx context.Context, from string, cb func(key string, val cid.Cid) error) error
WalkLeavesFrom walks the leaves of the tree, calling the cb callback on each key that's greater than or equal to the provided from key. If cb returns an error, the walk is aborted and the error is returned.
type NodeData ¶
type NodeData struct { Left *cid.Cid `cborgen:"l"` // [nullable] pointer to lower-level subtree to the "left" of this path/key Entries []TreeEntry `cborgen:"e"` // ordered list of entries at this node }
MST tree node as gets serialized to CBOR. Note that the CBOR fields are all single-character.
type TreeEntry ¶
type TreeEntry struct { PrefixLen int64 `cborgen:"p"` // count of characters shared with previous path/key in tree KeySuffix []byte `cborgen:"k"` // remaining part of path/key (appended to "previous key") Val cid.Cid `cborgen:"v"` // CID pointer at this path/key Tree *cid.Cid `cborgen:"t"` // [nullable] pointer to lower-level subtree to the "right" of this path/key entry }
TreeEntry are elements of NodeData's Entries.