hamt

package module
v3.4.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 2, 2024 License: MIT Imports: 15 Imported by: 47

README

go-hamt-ipld

Travis CI

This package is a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin.

HAMT is a "hash array mapped trie". This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics, the root node should always produce the same content identifier (CID).

See https://godoc.org/github.com/filecoin-project/go-hamt-ipld for more information and API details.

License

MIT © Whyrusleeping

Documentation

Overview

Package hamt provides a reference implementation of the IPLD HAMT used in the Filecoin blockchain. It includes some optional flexibility such that it may be used for other purposes outside of Filecoin.

HAMT is a "hash array mapped trie" https://en.wikipedia.org/wiki/Hash_array_mapped_trie. This implementation extends the standard form by including buckets for the key/value pairs at storage leaves and CHAMP mutation semantics https://michael.steindorfer.name/publications/oopsla15.pdf. The CHAMP invariant and mutation rules provide us with the ability to maintain canonical forms given any set of keys and their values, regardless of insertion order and intermediate data insertion and deletion. Therefore, for any given set of keys and their values, a HAMT using the same parameters and CHAMP semantics, the root node should always produce the same content identifier (CID).

Algorithm Overview

The HAMT algorithm hashes incoming keys and uses incrementing subsections of that hash digest at each level of its tree structure to determine the placement of either the entry or a link to a child node of the tree. A `bitWidth` determines the number of bits of the hash to use for index calculation at each level of the tree such that the root node takes the first `bitWidth` bits of the hash to calculate an index and as we move lower in the tree, we move along the hash by `depth x bitWidth` bits. In this way, a sufficiently randomizing hash function will generate a hash that provides a new index at each level of the data structure. An index comprising `bitWidth` bits will generate index values of `[ 0, 2^bitWidth )`. So a `bitWidth` of 8 will generate indexes of 0 to 255 inclusive.

Each node in the tree can therefore hold up to `2^bitWidth` elements of data, which we store in an array. In the this HAMT and the IPLD HashMap we store entries in buckets. A `Set(key, value)` mutation where the index generated at the root node for the hash of key denotes an array index that does not yet contain an entry, we create a new bucket and insert the key / value pair entry. In this way, a single node can theoretically hold up to `2^bitWidth x bucketSize` entries, where `bucketSize` is the maximum number of elements a bucket is allowed to contain ("collisions"). In practice, indexes do not distribute with perfect randomness so this maximum is theoretical. Entries stored in the node's buckets are stored in key-sorted order.

Parameters

This HAMT implementation:

• Fixes the `bucketSize` to 3.

• Defaults the `bitWidth` to 8, however within Filecoin it uses 5

• Defaults the hash algorithm to the 64-bit variant of Murmur3-x64

Further Reading

The algorithm used here is identical to that of the IPLD HashMap algorithm specified at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md. The specific parameters used by Filecoin and the DAG-CBOR block layout differ from the specification and are defined at https://github.com/ipld/specs/blob/master/data-structures/hashmap.md#Appendix-Filecoin-hamt-variant.

Index

Constants

View Source
const (
	// use OVERWRITE for modifyValue operations that overwrite existing values
	OVERWRITE = overwrite(true)
	// use NOVERWRITE for modifyValue operations that cannot overwrite existing values
	NOVERWRITE = overwrite(false)
)
View Source
const (
	// return MODIFIED when a key value mapping is overwritten
	MODIFIED = modified(true)
	// return UNMODIFIED when a no key value mappings are overwritten
	UNMODIFIED = modified(false)
)

Variables

View Source
var ErrMalformedHamt = fmt.Errorf("HAMT node was malformed")

ErrMalformedHamt is returned whenever a block intended as a HAMT node does not conform to the expected form that a block may take. This can occur during block-load where initial validation takes place or during traversal where certain conditions are expected to be met.

View Source
var ErrMaxDepth = fmt.Errorf("attempted to traverse HAMT beyond max-depth")

ErrMaxDepth is returned when the HAMT spans further than the hash function is capable of representing. This can occur when sufficient hash collisions (e.g. from a weak hash function and attacker-provided keys) extend leaf nodes beyond the number of bits that a hash can represent. Or this can occur on extremely large (likely impractical) HAMTs that are unable to be represented with the hash function used. Hash functions with larger byte output increase the maximum theoretical depth of a HAMT.

Functions

This section is empty.

Types

type Change added in v3.1.0

type Change struct {
	Type   ChangeType
	Key    string
	Before *cbg.Deferred
	After  *cbg.Deferred
}

Change represents a change to a DAG and contains a reference to the old and new CIDs.

func Diff added in v3.1.0

func Diff(ctx context.Context, prevBs, curBs cbor.IpldStore, prev, cur cid.Cid, opts ...Option) ([]*Change, error)

Diff returns a set of changes that transform node 'prev' into node 'cur'. opts are applied to both prev and cur.

func ParallelDiff added in v3.2.0

func ParallelDiff(ctx context.Context, prevBs, curBs cbor.IpldStore, prev, cur cid.Cid, workers int64, opts ...Option) ([]*Change, error)

ParallelDiff returns a set of changes that transform node 'prev' into node 'cur'. opts are applied to both prev and cur.

func (Change) String added in v3.1.0

func (ch Change) String() string

type ChangeType added in v3.1.0

type ChangeType int

ChangeType denotes type of change in Change

const (
	Add ChangeType = iota
	Remove
	Modify
)

These constants define the changes that can be applied to a DAG.

type HashFunc

type HashFunc func([]byte) []byte

HashFunc is a hashing function for values.

type KV

type KV struct {
	Key   []byte
	Value *cbg.Deferred
}

KV represents leaf storage within a HAMT node. A Pointer may hold up to `bucketSize` KV elements, where each KV contains a key and value pair stored by the user.

Keys are represented as bytes.

The IPLD Schema representation of this data structure is as follows:

type KV struct {
	key Bytes
	value Any
} representation tuple

func (*KV) MarshalCBOR

func (t *KV) MarshalCBOR(w io.Writer) error

func (*KV) UnmarshalCBOR

func (t *KV) UnmarshalCBOR(r io.Reader) error

type Node

type Node struct {
	Bitfield *big.Int
	Pointers []*Pointer
	// contains filtered or unexported fields
}

Node is a single point in the HAMT, encoded as an IPLD tuple in DAG-CBOR of shape:

[bytes, [Pointer...]]

where 'bytes' is the big.Int#Bytes() and the Pointers array is between 1 and `2^bitWidth`.

The Bitfield provides us with a mechanism to store a compacted array of Pointers. Each bit in the Bitfield represents an element in a sparse array where `1` indicates the element is present in the Pointers array and `0` indicates it is omitted. To look-up a specific index in the Pointers array you must first make a count of the number of `1`s (popcount) up to the element you are looking for. e.g. a Bitfield of `10010110000` shows that we have a 4 element Pointers array. Indexes `[1]` and `[2]` are not present, but index `[3]` is at the second position of our Pointers array.

The IPLD Schema representation of this data structure is as follows:

type Node struct {
	bitfield Bytes
	pointers [Pointer]
} representation tuple

func LoadNode

func LoadNode(ctx context.Context, cs cbor.IpldStore, c cid.Cid, options ...Option) (*Node, error)

LoadNode loads a HAMT Node from the IpldStore and configures it according to any specified Option parameters. Where the parameters of this HAMT vary from the defaults (hash function and bitWidth), those variations _must_ be supplied here via Options otherwise the HAMT will not be readable.

Users should consider how their HAMT parameters are stored or specified along with their HAMT where the data is expected to have a long shelf-life as future users will need to know the parameters of a HAMT being loaded in order to decode it. Users should also NOT rely on the default parameters of this library to remain the defaults long-term and have strategies in place to manage variations.

func NewNode

func NewNode(cs cbor.IpldStore, options ...Option) (*Node, error)

NewNode creates a new IPLD HAMT Node with the given IPLD store and any additional options (bitWidth and hash function).

This function creates a new HAMT that you can use directly and is also used internally to create child nodes.

func (*Node) Copy

func (n *Node) Copy() *Node

Copy a HAMT node and all of its contents. May be useful for mutation operations where the original needs to be preserved in memory.

This operation will also recursively clone any child nodes that are attached as cached nodes.

func (*Node) Delete

func (n *Node) Delete(ctx context.Context, k string) (bool, error)

Delete removes an entry from the HAMT structure.

Returns true if the key was found and deleted, false if the key was absent.

This operation will result in the modification of _at least_ one IPLD block via the IpldStore. Depending on the contents of the leaf node, this operation may result in a node collapse to shrink the HAMT into its canonical form for the remaining data. For an insufficiently random collection of keys at the relevant leaf nodes such a collapse may cascade to further nodes.

func (*Node) Find

func (n *Node) Find(ctx context.Context, k string, out cbg.CBORUnmarshaler) (bool, error)

Find navigates through the HAMT structure to where key `k` should exist. If the key is not found, returns false. If the key is found, returns true, and if the `out` parameter has an UnmarshalCBOR(Reader) method, the value is decoded into it. The `out` parameter may be nil to test for existence without decoding.

Depending on the size of the HAMT, this method may load a large number of child nodes via the HAMT's IpldStore.

func (*Node) FindRaw

func (n *Node) FindRaw(ctx context.Context, k string) (bool, []byte, error)

FindRaw performs the same function as Find, but returns the raw bytes found at the key's location (which may or may not be DAG-CBOR, see also SetRaw).

func (*Node) Flush

func (n *Node) Flush(ctx context.Context) error

Flush has two effects, it (partially!) persists data and resets dirty flag

Flush operates recursively, telling each "cache" child node to flush; Put'ing that "cache" node to the store; updating this node's Link to the CID resulting from the store Put; clearing the dirty flag of that pointer to flase and then returning. Flush doesn't operate unless there's a "cache" node.

"cache" nodes were previously storing either updated values, or, simply storing previously loaded data; these are disambiguated by the dirty flag which is true when the cache node's data has not been persisted

Notice that Flush _does not_ Put _this node_. To fully persist changes, the caller still needs to Put this node to the store themselves, and store the new resulting Link wherever they expect the updated HAMT to be seen.

func (*Node) ForEach

func (n *Node) ForEach(ctx context.Context, f func(k string, val *cbg.Deferred) error) error

ForEach recursively calls function f on each k / val pair found in the HAMT. This performs a full traversal of the graph and for large HAMTs can cause a large number of loads from the underlying store. The values are returned as raw bytes, not decoded.

func (*Node) MarshalCBOR

func (t *Node) MarshalCBOR(w io.Writer) error

func (*Node) Set

func (n *Node) Set(ctx context.Context, k string, v cbg.CBORMarshaler) error

Set key k to value v, where v is has a MarshalCBOR(bytes.Buffer) method to encode it.

To fully commit the change, it is necessary to Flush the root Node, and then additionally Put the root node to the store itself, and save the resulting CID wherever you expect the HAMT root to persist.

func (*Node) SetIfAbsent

func (n *Node) SetIfAbsent(ctx context.Context, k string, v cbg.CBORMarshaler) (bool, error)

SetIfAbsent sets key k to value v only if k is not already set to some value. Returns true if the value mapped to k is changed by this operation false otherwise.

func (*Node) SetRaw

func (n *Node) SetRaw(ctx context.Context, k string, raw []byte) error

SetRaw is similar to Set but sets key k in the HAMT to raw bytes without performing a DAG-CBOR marshal. The bytes may or may not be encoded DAG-CBOR (see also FindRaw for fetching raw form).

func (*Node) UnmarshalCBOR

func (t *Node) UnmarshalCBOR(r io.Reader) error

func (*Node) Write added in v3.2.0

func (n *Node) Write(ctx context.Context) (cid.Cid, error)

Write is a convenience method that calls flush and writes the node to it's internal store, returning the CID of the stored node. It is equivelant to:

n.Flush
store.Put(ctx, n)

where store is equal to the store provided to the node when constructed.

write should only be called on the root node of a HAMT

type Option

type Option func(*config) error

Option is a function that configures a HAMT.

func UseHashFunction

func UseHashFunction(hash HashFunc) Option

UseHashFunction allows you to set the hash function used for internal indexing by the HAMT.

Passing in the returned Option to NewNode will generate a new HAMT that uses the specified hash function.

The default hash function is murmur3-x64 but you should use a cryptographically secure function such as SHA2-256 if an attacker may be able to pick the keys in order to avoid potential hash collision (tree explosion) attacks.

func UseTreeBitWidth

func UseTreeBitWidth(bitWidth int) Option

UseTreeBitWidth allows you to set a custom bitWidth of the HAMT in bits (from 1-8).

Passing in the returned Option to NewNode will generate a new HAMT that uses the specified bitWidth.

The default bitWidth is 8.

type Pointer

type Pointer struct {
	KVs  []*KV
	Link cid.Cid
	// contains filtered or unexported fields
}

Pointer is an element in a HAMT node's Pointers array, encoded as an IPLD tuple in DAG-CBOR of shape:

CID or [KV...]

i.e. it is represented as a "kinded union" where a Link is a pointer to a child node, while an array is a bucket of elements local to this node. A Pointer must represent exactly one of of these two states and cannot be both (or neither).

There are between 1 and 2^bitWidth of these Pointers in any HAMT node.

A Pointer contains either a KV bucket of up to `bucketSize` (3) values or a link (CID) to a child node. When a KV bucket overflows beyond `bucketSize`, the bucket is replaced with a link to a newly created HAMT node which will contain the `bucketSize+1` elements in its own Pointers array.

The IPLD Schema representation of this data structure is as follows:

type Pointer union {
	&Node link
	Bucket list
} representation kinded

type Bucket [KV]

func (*Pointer) MarshalCBOR

func (t *Pointer) MarshalCBOR(w io.Writer) error

func (*Pointer) UnmarshalCBOR

func (t *Pointer) UnmarshalCBOR(r io.Reader) error

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL