mst

package
v0.0.0-...-2503553 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 22, 2025 License: Apache-2.0, MIT Imports: 16 Imported by: 0

Documentation

Overview

Implementation of the Merkle Search Tree (MST) data structure for atproto.

## Terminology

node: any node in the tree. nodes can contain multiple entries. they should never be entirely "empty", unless the entire tree is a single empty node, but they might only contain "child" pointers

entry: nodes contain multiple entries. these can include both key/CID pairs, or pointers to child nodes. entries are always lexically sorted, with "child" entries pointing to nodes containing (recursively) keys in the appropriate lexical range. there should never be multiple "child" entries adjacent in a single node (they should be merged instead)

tree: an overall tree of nodes

## Tricky Bits

When inserting:

- the inserted key might be on a "higher" layer than the current top of the tree, in which case new parent tree nodes need to be created - "parent" or "child" insertions might be multiple layers away from the starting node, with intermediate nodes created - inserting a "value" entry in a node might require "splitting" a child node, if the key on the current layer would have fallen within the lexical range of the child

When removing:

- deleting an entry from a node might result in a "merge" of two child nodes which are no longer "split" - removing a "value" entry from the top of the tree might make it a simple pointer down to a child. in this case the top of the tree should be "trimmed" (this might involve multiple layers of trimming)

When inverting operations:

- need additional "proof" blocks to invert deletions. basically need the proof blocks for any keys (at any layer) directly adjacent to the deleted block - if an entry is removed from the top of a partial tree and results in "trimming", and the child node is not available, the overall tree root CID might still be known

## Hacking

Be careful with go slices. Need to avoid creating multiple references (slices) of the same underlying array, which can lead to "mutation at a distance" in ways that are hard to debug.

Index

Constants

View Source
const (
	// Maximum length, in bytes, of a key in the tree. Note that the atproto specifications imply a repo path maximum length, but don't say anything directly about MST key, other than they can not be empty (zero-length).
	MAX_KEY_BYTES = 1024
)

Variables

View Source
var ErrInvalidKey = errors.New("bytestring not a valid MST key")
View Source
var ErrInvalidTree = errors.New("invalid MST structure")
View Source
var ErrPartialTree = errors.New("MST is not complete")

Functions

func CountPrefixLen

func CountPrefixLen(a, b []byte) int

Computes the common prefix length between two bytestrings.

Used when compacting node entry lists for encoding.

func DebugPrintTree

func DebugPrintTree(n *Node, depth int)

This function is not very well implemented or correct. Should probably switch to Devin's `goat repo mst` code.

func HeightForKey

func HeightForKey(key []byte) (height int)

Computes the MST "height" for a key (bytestring). Layers are counted from the "bottom" of the tree, starting with zero.

For atproto repository v3, uses SHA-256 as the hashing function and counts two bits at a time, for an MST "fanout" value of 16.

func IsValidKey

func IsValidKey(key []byte) bool

Types

type EntryData

type EntryData struct {
	PrefixLen int64    `cborgen:"p"` // count of characters shared with previous path/key in tree
	KeySuffix []byte   `cborgen:"k"` // remaining part of path/key (appended to "previous key")
	Value     cid.Cid  `cborgen:"v"` // CID pointer at this path/key
	Right     *cid.Cid `cborgen:"t"` // [nullable] pointer to lower-level subtree to the "right" of this path/key entry
}

CBOR serialization struct for a single entry within a `NodeData` entry list.

func (*EntryData) MarshalCBOR

func (t *EntryData) MarshalCBOR(w io.Writer) error

func (*EntryData) UnmarshalCBOR

func (t *EntryData) UnmarshalCBOR(r io.Reader) (err error)

type Node

type Node struct {
	// array of key/value pairs and pointers to child nodes. entry arrays must always be in correct/valid order at any point in time: sorted by 'key', and at most one 'pointer' entry between 'value' entries.
	Entries []NodeEntry
	// "height" or "layer" of MST tree this node is at (with zero at the "bottom" and root/top of tree the "highest")
	Height int
	// if true, the cached CID of this node is out of date
	Dirty bool
	// optionally, the last computed CID of this Node (when expressed as NodeData)
	CID *cid.Cid
	// if true, this is an empty/incomplete node which just represents the CID of the tree. only used as part of MST inversion
	Stub bool
}

Represents a node in a Merkle Search Tree (MST). If this is the "root" or "top" of the tree, it effectively is the tree itself.

Trees may be "partial" if they contain references to child nodes by CID, but not pointers to `Node` representations.

func (*Node) IsEmpty

func (n *Node) IsEmpty() bool

func (*Node) IsPartial

func (n *Node) IsPartial() bool

Checks if the sub-tree (this node, or any children, recursively) contains any CID references to nodes which are not present.

func (*Node) NodeData

func (n *Node) NodeData() NodeData

Transforms `Node` struct to `NodeData`, which is the format used for encoding to CBOR.

Will panic if any entries are missing a CID (must compute those first)

type NodeData

type NodeData struct {
	Left    *cid.Cid    `cborgen:"l"` // [nullable] pointer to lower-level subtree to the "left" of this path/key
	Entries []EntryData `cborgen:"e"` // ordered list of entries at this node
}

CBOR serialization struct for a MST tree node. MST tree node as gets serialized to CBOR. Note that the CBOR fields are all single-character.

func NodeDataFromCBOR

func NodeDataFromCBOR(r io.Reader) (*NodeData, error)

Parses CBOR bytes in to `NodeData` struct

func (*NodeData) Bytes

func (d *NodeData) Bytes() ([]byte, *cid.Cid, error)

Encodes a single `NodeData` struct as CBOR bytes. Does not recursively encode or update children.

func (*NodeData) MarshalCBOR

func (t *NodeData) MarshalCBOR(w io.Writer) error

func (*NodeData) Node

func (d *NodeData) Node(c *cid.Cid) Node

Tansforms an encoded `NodeData` to `Node` data structure format.

c: optional CID argument for the CID of the CBOR representation of the NodeData

func (*NodeData) UnmarshalCBOR

func (t *NodeData) UnmarshalCBOR(r io.Reader) (err error)

type NodeEntry

type NodeEntry struct {
	Key      []byte
	Value    *cid.Cid
	ChildCID *cid.Cid
	Child    *Node

	// tracks whether anything about this entry has changed since `Node` CID was computed
	Dirty bool
}

Represents an entry in an MST `Node`, which could either be a direct path/value entry, or a pointer do a child tree node. Note that these are *not* one-to-one with `EntryData`.

Either the Key and Value fields should be non-zero; or the Child and/or ChildCID field should be non-zero. If ChildCID is present, but Child is not, then this is part of a "partial" tree.

func (*NodeEntry) IsChild

func (e *NodeEntry) IsChild() bool

Returns true if this entry points to a node on a lower level

func (*NodeEntry) IsValue

func (e *NodeEntry) IsValue() bool

Returns true if this entry is a key/value at the current node

type Tree

type Tree struct {
	Root *Node
}

High-level API for an MST, as a decoded in-memory data structure.

This might be an entire tree (all child nodes in-memory), or might be a partial tree with some nodes as CID links. Operations on the tree do not persist to any backing storage automatically.

Errors when operating on the tree may leave the tree in a partially modified or invalid/corrupt state.

func LoadTreeFromMap

func LoadTreeFromMap(m map[string]cid.Cid) (*Tree, error)

Creates a new Tree by loading key/value pairs from a map.

func LoadTreeFromStore

func LoadTreeFromStore(ctx context.Context, bs blockstore.Blockstore, root cid.Cid) (*Tree, error)

func NewEmptyTree

func NewEmptyTree() Tree

func (*Tree) Copy

func (t *Tree) Copy() Tree

Creates a deep copy of MST

func (*Tree) Get

func (t *Tree) Get(key []byte) (*cid.Cid, error)

Reads the value (CID) corresponding to the key.

If key is not in the tree, returns nil, not an error.

key: key or path being inserted. must not be empty/nil

func (*Tree) Insert

func (t *Tree) Insert(key []byte, val cid.Cid) (*cid.Cid, error)

Adds a key/value to the tree, and returns any previously existing value (CID).

Caller can inspect the previous value to determine if the behavior was a "creation" (key didn't exist), an "update" (key existed with a different value), or no-op (key existed with current value).

key: key or path being inserted. must not be empty/nil val: CID value being inserted

func (*Tree) IsEmpty

func (t *Tree) IsEmpty() bool

If the tree contains no key/value pairs, returns true.

func (*Tree) IsPartial

func (t *Tree) IsPartial() bool

Returns false if all nodes in the tree are available in-memory in decoded format; otherwise returns true. Does not consider record data, only MST nodes.

func (*Tree) Remove

func (t *Tree) Remove(key []byte) (*cid.Cid, error)

Removes key/value from the sub-tree provided. Return the previous CID value, if any. If key was not found, returns nil (which is not an error).

key: key or path being inserted. must not be empty/nil

func (*Tree) RootCID

func (t *Tree) RootCID() (*cid.Cid, error)

Returns the overall root-node CID for the MST.

If possible, lazily returned a known value. If necessary, recursively encodes tree nodes to compute CIDs.

NOTE: will mark the tree "clean" (clear any dirty flags).

func (*Tree) Verify

func (t *Tree) Verify() error

func (*Tree) WriteDiffBlocks

func (t *Tree) WriteDiffBlocks(ctx context.Context, bs blockstore.Blockstore) (*cid.Cid, error)

Walks the tree, encodes any "dirty" nodes as CBOR data, and writes that data as blocks to the provided blockstore. Returns root CID.

func (*Tree) WriteToMap

func (t *Tree) WriteToMap(m map[string]cid.Cid) error

Recursively walks the tree and writes key/value pairs to map `m`

The map (`m`) is mutated in place (by reference); the map must be initialized before calling.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL