hamt

package module
v0.0.0-...-049d484 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 5, 2017 License: Apache-2.0 Imports: 2 Imported by: 0

README

SYNOPSIS

This library is a Go language implementation of a functional HAMT, Hash Array Mapped Trie.

There is a lot of Jargon in there to be unpacked. Functional mean immutable and persistent. The term "immutable" means that the datastructure is is never changed after construction. Where "persistent" means that when that immutable datastructure is modified, is is based on the previous datastructure. In other words the new datastructure is not a copy with the modification applied. Instead, that new datastructure shares all unmodified parts of the previouse datastructure and only the changed parts are copied and modified; unchanged parts of the datastructure are shared between the old and new version.

Imagine a hypothetical tree structure with four leaves, two interior nodes and a root node. If you change the fourth leaf node, then a new fourth leaf node is created, as well as it's parent interior node, and a new root node.

        root tree node   root tree node'
            /    \         /   \
           /  +---\----- +      \
          /  /     \             \
     tree node 1   tree node 2  tree node 2'
        /  \          /  \        /   \
       /    \        / +--\------+     \
      /      \      / /    \            \
  Leaf 1   Leaf 2 Leaf 3  Leaf 4      Leaf 4'

Given this approach to changing a tree, a tree with a wide branching factor would be relatively shallow. So the path from root to leaf would be short and the amount of shared content would be substantial.

A Hash Array Mapped Trie is Trie where each node is represented by an Array. The pointer to the next node down in the Trie is selected by and index drived from Hash of the Key. For node arrays 32 entries wide we choose to use a 30 bit hash value. The hash value is chopped into 6 groups of 5 bits each. That works out nicely because 5 bits codifies an unsigned integer from 0 to 31. So each of the six 5 bit values index into each of the tree node's 32 entry array.

Lets calculate the 30bit hash value for the key "a".

mask30 := uint(1<<30) - 1
h := fnv.New32()
h.Write([]byte("a"))
h32 := h.Sum32() //uint32
h30 := (h32 >> 30) ^ (h32 & mask30)
fmt.Printf("%d 0x%08x 0b%032b\n", h30, h30, h30)

84696446 0x050c5d7e 0b00000101000011000101110101111110

We can take the lower 30bits of "00000101000011000101110101111110" and split it into six 5 bit values betweeen 0-31.

00010 10000 11000 10111 01011 11110
2, 16, 24, 23, 11, 30

If we format it from lowest bit to highest and zero-pad the numbers and join those numbers with a "/" and precede it with the same "/" we get a string:

/30/11/23/24/16/02

That is what this library calls a hash path. This allows you to address a node in a tree with 32 entry branches.

However we are working with a Trie. That means in order to look up a key we only have to walk down enough elements of the hash path to find a node that is not a branch but rather a leaf.

In order to insert into a Trie, we walk down the Trie to find a leaf or an empty slot. If it is an empty slot we simply insert a new leaf. If we find a preexisting leaf, it gets a more complicated. We insert a new node array into that slot and we try to insert the new leaf and the old leaf into that new array using the hash path of each leaf one level deeper. The complication occurs when the next entry in the hash paths of both leafs are the same. That results in another collision. So we have to create another array and try to insert both leaves at the next deeper index in their hash paths. We continue that till the maximum depth is achieved. In the end we just start stacking all the leaves that collide at the maximum depth into a fat ""collision leaf". But truthfully we won't have many leaves colliding with identical 30 bit hash paths.

Deleting from the HAMT is simpler that insertion. We find the leaf, if it exists, along the leaf's hash path and remove it. The only complicated part is when there is only one leaf left in an array level. When that happends we replace the array level's position in it's parent with the remaining leaf. This allows us to collapse the HAMT upon deletion.

The last part of this HAMT implementation to deal with is the "immutable" and "persistent" properties. When we change a node array at the by adding or deleting a leaf, we first have to copy that array and change the copy. Then we have to copy and update the parent to point to the changed node array. We continue this copy and update recursivly down to the root node array. When the roo node array is copied and updated, a new HAMT structure is created to represent the modified HAMT value. However, you should take note that most of the previous HAMT is preserved as only the node arrays of the hash path to the modification were copied and updated.

USAGE

It is simplest to import the version of the HAMT library you want to use. The 32 way branching HAMT (hamt32) or the 64 way branching HAMT (hamt64). I strongly recommend the hamt32 it is faster and wastes the least memory on unused table entries.

To import the hamt32 library:

import "github.com/lleo/go-hamt-functional/hamt32"

And you can refer to it as "hamt32" from then on.

Since this is an "immutable" implementation of HAMT you do not have to constructed a new HAMT structure via 'new()'. Just use hamt32.Hamt as a value.

import (
    "github.com/lleo/go-hamt-functional/hamt32"
    "github.com/lleo/go-hamt-key/stringkey"
)

var h hamt32.Hamt
for i, s := range []string{"foo", "bar", "baz"} {
    var k = stringkey.New(s)
    var added bool
    h, added = h.Put(k, i)
        if !added {
            log.Panicf("replaced, didn't add, value %d, for key %s", i, k)
        }
}

Notice that we are keeping the current value of the Hamt in the 'h' variable and forgetting the previous value. That allows some of the previous Hamt's internal datastructures to be garbage collected as needed.

Documentation

Overview

Package hamt is the front-end for a 32bit and 64bit implementations of a functional Hash Array Mapped Trie (HAMT) datastructure.

In this case, functional means immutable and persistent. The term "immutable" means that the datastructure is is never changed after construction. Where "persistent" means that when that immutable datastructure is modified, is is based on the previous datastructure. In other words the new datastructure is not a copy with the modification applied. In stead, that new datastructure shares all un-modified parts of the previouse datastructure and only the changed parts are copied and modified; unchanged parts of the datastructure are shared between the old and new version.

A HAMT structure is a tree with a fixed & wide branching factor. Trees make and excellent datastructure to be immutable and persistent. Our HAMT starts with a root branch. Branches are called tables, because they are represented as tables with the "branching factor" number of entries. These entries may be one of three types of nodes: further branches (aka tables) or key/val entries (aka leafs) or emtpy (aka nil).

A HAMT is a key/val indexing datastructure. Rather than indexing the datastructure on the key directly, which could result in a rather deep tree datastructure. We generate a hash value of the key, and split the hash value into a fixed number of indexes into fixed size arrays. This results in a tree with a maximum depth and a wide branching factor.

For example, we can use a key type of a string. Hash that string into a 32 bit hash value. Coerce that 32 bit value into a 30 bit value. Then split that 30 bit hash value into six 5 bit values. Those 5 bit values will index perfectly into tree nodes with 32 wide branching factor. Now we have tree with a string for the key that is AT MOST six levels deep, in other words O(1) lookup and modification operations.

Lets call the number of hash bits H (for hash value). The number of parts the hash value can be split into we'll call D (for depth). The width of each table (aka branching factor) is 2^B; I think of B as "bits per level". The relationship of H:D:B is given by H/B = D. I've implemented in hamt32 (H=30, D=6, B=5) and hamt64 is (H=60, D=10, B=6). We could call the branching factor W for "width" of each tree node. However W is superfluous, because it can be derived from B (aka W=2^B).

The number in hamt32 is the branching factor W=2^B=32; from H=30,D=6,B=5 . The number in hamt64 is the branching factor W=2^B=64; from H=60,D=10,B=6 .

HAMTs are [Tries](https://en.wikipedia.org/wiki/Trie), because when we are trying to find a location to Get, Put, or Delete a key/value pair we mearly have to walk the "hash path" till we find a non-branching node. The HashPath is the H bit hash value, split into a ordered sequence of B bit integer values that is, at most, D entry tries long.

Lets start with a concrete example of a hamt32 (aka H=30,D=6,B=5). Given the string "ewyx" the Hash30() HashVal30 is 0x11a01c5e. Converted into six descreet 5 bit values (from lowest bit to highest) you get 30, 2, 7, 0 26, and 8. This library prints them out from HashVal30.String() as "/30/02/07/00/26/08"; The hash path from lowest to highest bit. That string, "/30/02/07/00/26/08", is the "hash path". Looking up where to find this entry we look up the 30th index of the root of the tree, if that entry is another branch we look up the 2nd index of that next branch. We continue (7th, 0th et al) until we find a non-branch entry. The non-branch entry can be a leaf or empty.

Just to be pedantic the go-hamt-key API calculates the indexes by depth as follows:

my k = stringkey.New("ewyx")
var h30 = key.Hash30() //=> 295705694 or 0x11a01c5e
h30.BitString()        //=> "00 01000 11010 00000 00111 00010 11110"
h30.Index(0)           //=> 11110 or 30
h30.Index(1)           //=> 00010 or 2
h30.Index(2)           //=> 00111 or 7
h30.Index(3)           //=> 00000 or 0
h30.Index(4)           //=> 11010 or 26
h30.Index(5)           //=> 01000 or 8
h30.String()           //=> "/30/02/07/00/26/08"

Now we know how to find the candidate location or entry for our operation. That operation can be either a straight lookup, called with h.Get(k); or it can be an insertion of a key/value pair, called with h.Put(k,v); or lastly it can be a deletion operation, called with h.Del(k).

For either hamt32.Hamt or hamt64.Hamt value we have three primary operations: h.Get(), h.Put(), and h.Del().

Only h.Put() and h.Del() modify the HAMT. When they modify a table, first the table is copied, then the modification is made to the copy. Next the parent table must be copied so that the new table's entry in the copied parent may be modified. This is continued to the root table and the HAMT structure itself is copied. This is the h.persist() call. Hence, h.Put() and h.Del() return the new HAMT structure as well as any other return values specific to h.Put() or h.Del().

Given that Get() makes no modification of the HAMT structure, it only returns a boolean indicating the key was found in the HAMT and the key's value.

Put() returns a copy of the HAMT and a boolean indicating whether a new entry was added (true) or a current entry was updated (false).

Del() returns a boolean value indicating if the key was found, and if true what the value of the deleted key was, and the new HAMT structure. If the Del() didn't find the key (a false return value) key's value data is nil and the HAMT value is the current HAMT.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewHamt32

func NewHamt32() hamt32.Hamt

NewHamt32 returns an empty hamt32.Hamt value. Given that Hamt structs are immutable we return the Hamt structure by value.

func NewHamt64

func NewHamt64() hamt64.Hamt

NewHamt64 returns an empty hamt64 value. Given that Hamt structs are immutable we return the Hamt structure by value.

Types

This section is empty.

Directories

Path Synopsis
Package hamt32 implements a functional Hash Array Mapped Trie (HAMT).
Package hamt32 implements a functional Hash Array Mapped Trie (HAMT).
Package hamt64 implements a functional Hash Array Mapped Trie (HAMT).
Package hamt64 implements a functional Hash Array Mapped Trie (HAMT).

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL