mst

package

v0.0.0-...-adf259e Latest Latest Go to latest Published: Jan 26, 2025 License: Apache-2.0, MIT Imports: 15 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/lperson/indigo

Documentation ¶

Overview ¶

Package mst contains a Merkle Search Tree (MST) implementation for atproto.

This implementation is a port of the Typescript implementation in the `atproto` git repo.

## Notes copied from TS repo

This is an implementation of a Merkle Search Tree (MST) The data structure is described here: https://hal.inria.fr/hal-02303490/document The MST is an ordered, insert-order-independent, deterministic tree. Keys are laid out in alphabetic order. The key insight of an MST is that each key is hashed and starting 0s are counted to determine which layer it falls on (5 zeros for ~32 fanout). This is a merkle tree, so each subtree is referred to by it's hash (CID). When a leaf is changed, ever tree on the path to that leaf is changed as well, thereby updating the root hash.

For atproto, we use SHA-256 as the key hashing algorithm, and ~16 fanout (4-bits of zero per layer).

NOTE: currently keys are strings, not bytes. Because UTF-8 strings can't be safely split at arbitrary byte boundaries (the results are not necessarily valid UTF-8 strings), this means that "wide" characters not really supported in keys, particularly across programming language implementations. We recommend sticking with simple alphanumeric (ASCII) strings.

A couple notes on CBOR encoding:

There are never two neighboring subtrees. Therefore, we can represent a node as an array of leaves & pointers to their right neighbor (possibly null), along with a pointer to the left-most subtree (also possibly null).

Most keys in a subtree will have overlap. We do compression on prefixes by describing keys as: - the length of the prefix that it shares in common with the preceding key - the rest of the string

For example: If the first leaf in a tree is `bsky/posts/abcdefg` and the second is `bsky/posts/abcdehi` Then the first will be described as `prefix: 0, key: 'bsky/posts/abcdefg'`, and the second will be described as `prefix: 16, key: 'hi'.`

Index ¶

Variables
func CBORTypes() []reflect.Type
type DiffOp
- func DiffTrees(ctx context.Context, bs blockstore.Blockstore, from, to cid.Cid) ([]*DiffOp, error)
type MerkleSearchTree
- func LoadMST(cst cbor.IpldStore, root cid.Cid) *MerkleSearchTree
- func NewEmptyMST(cst cbor.IpldStore) *MerkleSearchTree
type NodeData
- func (t *NodeData) MarshalCBOR(w io.Writer) error
- func (t *NodeData) UnmarshalCBOR(r io.Reader) (err error)
type TreeEntry
- func (t *TreeEntry) MarshalCBOR(w io.Writer) error
- func (t *TreeEntry) UnmarshalCBOR(r io.Reader) (err error)

Constants ¶

This section is empty.

Variables ¶

View Source

var ErrNotFound = fmt.Errorf("mst: not found")

Functions ¶

func CBORTypes ¶

func CBORTypes() []reflect.Type

CBORTypes returns the types in this package that need to be registered with the CBOR codec.

Types ¶

type DiffOp ¶

type DiffOp struct {
	Depth  int
	Op     string
	Rpath  string
	OldCid cid.Cid
	NewCid cid.Cid
}

func DiffTrees ¶

func DiffTrees(ctx context.Context, bs blockstore.Blockstore, from, to cid.Cid) ([]*DiffOp, error)

TODO: this code isn't great, should be rewritten on top of the baseline datastructures once functional and correct

type MerkleSearchTree ¶

type MerkleSearchTree struct {
	// contains filtered or unexported fields
}

MerkleSearchTree represents an MST tree node (NodeData type). It can be in several levels of hydration: fully hydrated (entries and "pointer" (CID) computed); dirty (entries correct, but pointer (CID) not valid); virtual (pointer is defined, no entries have been pulled from blockstore)

MerkleSearchTree values are immutable. Methods return copies with changes.

func LoadMST ¶

func LoadMST(cst cbor.IpldStore, root cid.Cid) *MerkleSearchTree

This is poorly named in both implementations, because it is lazy Typescript: MST.load(storage, cid, layer=null, fanout) -> MST

func NewEmptyMST ¶

func NewEmptyMST(cst cbor.IpldStore) *MerkleSearchTree

NewEmptyMST reports a new empty MST using cst as its storage.

func (*MerkleSearchTree) Add ¶

func (mst *MerkleSearchTree) Add(ctx context.Context, key string, val cid.Cid, knownZeros int) (*MerkleSearchTree, error)

"Adds a new leaf for the given key/value pair. Throws if a leaf with that key already exists" Typescript: MST.add(key, value, knownZeros?) -> MST

func (*MerkleSearchTree) Delete ¶

func (mst *MerkleSearchTree) Delete(ctx context.Context, k string) (*MerkleSearchTree, error)

"Deletes the value at the given key" Typescript: MST.delete(key) -> MST

func (*MerkleSearchTree) Get ¶

func (mst *MerkleSearchTree) Get(ctx context.Context, k string) (cid.Cid, error)

"Gets the value at the given key" Typescript: MST.get(key) -> (CID|null)

func (*MerkleSearchTree) GetPointer ¶

func (mst *MerkleSearchTree) GetPointer(ctx context.Context) (cid.Cid, error)

"We don't hash the node on every mutation for performance reasons. Instead we keep track of whether the pointer is outdated and only (recursively) calculate when needed" Typescript: MST.getPointer() -> CID

func (*MerkleSearchTree) Update ¶

func (mst *MerkleSearchTree) Update(ctx context.Context, k string, val cid.Cid) (*MerkleSearchTree, error)

"Edits the value at the given key. Throws if the given key does not exist" Typescript: MST.update(key, value) -> MST

func (*MerkleSearchTree) WalkLeavesFrom ¶

func (mst *MerkleSearchTree) WalkLeavesFrom(ctx context.Context, from string, cb func(key string, val cid.Cid) error) error

WalkLeavesFrom walks the leaves of the tree, calling the cb callback on each key that's greater than or equal to the provided from key. If cb returns an error, the walk is aborted and the error is returned.

type NodeData ¶

type NodeData struct {
	Left    *cid.Cid    `cborgen:"l"` // [nullable] pointer to lower-level subtree to the "left" of this path/key
	Entries []TreeEntry `cborgen:"e"` // ordered list of entries at this node
}

MST tree node as gets serialized to CBOR. Note that the CBOR fields are all single-character.

func (*NodeData) MarshalCBOR ¶

func (t *NodeData) MarshalCBOR(w io.Writer) error

func (*NodeData) UnmarshalCBOR ¶

func (t *NodeData) UnmarshalCBOR(r io.Reader) (err error)

type TreeEntry ¶

type TreeEntry struct {
	PrefixLen int64    `cborgen:"p"` // count of characters shared with previous path/key in tree
	KeySuffix []byte   `cborgen:"k"` // remaining part of path/key (appended to "previous key")
	Val       cid.Cid  `cborgen:"v"` // CID pointer at this path/key
	Tree      *cid.Cid `cborgen:"t"` // [nullable] pointer to lower-level subtree to the "right" of this path/key entry
}

TreeEntry are elements of NodeData's Entries.

func (*TreeEntry) MarshalCBOR ¶

func (t *TreeEntry) MarshalCBOR(w io.Writer) error

func (*TreeEntry) UnmarshalCBOR ¶

func (t *TreeEntry) UnmarshalCBOR(r io.Reader) (err error)

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL