pmmap

package module
v0.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 2, 2023 License: MIT Imports: 2 Imported by: 0

README

Persistent Mergeable Hash Map (pmmap)

Go Reference

This package provides a Go implementation of a persistent key-value hash map with an efficient merge operation.

The maps are immutable, so modifying operations (inserts and removals) return a copy of the map with the operation applied.

The backing data structure is a patricia trie on key hashes.

Usage

The map uses generics to make the API ergonomic. Therefore Go 1.18+ is required.

Install: go get github.com/BarrensZeppelin/pmmap

hasher := pmmap.NumericHasher[int]()
map0 := pmmap.New[string](hasher)
map1 := map0.Insert(42, "Hello World")
fmt.Println(map0.Lookup(42)) // "", false
fmt.Println(map1.Lookup(42)) // "Hello World", true

To create a map with key type K you must supply an implementation of Hasher[K]:

type Hasher[K any] interface {
	Equal(a, b K) bool
	Hash(K) uint64
}

Default hasher implementations are included for numeric types and strings.

Merges

The hash maps support a merge operation that will join the key-value pairs in two maps into a single map. This operation is made efficient by:

  • Re-using substructures from the two merged maps when possible.

    For instance, merging a map with an empty map returns the first map directly without traversing any of its elements.

  • Skipping processing of shared substructures.

    For instance, merging a map with itself always takes constant time.

Re-using shared substructures in merged maps drastically reduces memory usage and execution time of the merge operation. Generally, merging a map with a version of the map with $r$ new insertions will take linear time in $r$ (indepedent of the size of the map, compared to looping over one of the maps).

When the merged maps both contain a mapping for a key, the mapped values are merged with a user-provided value merging binary operator $f$. This operator must be commutative and idempotent:

$$ \forall a, b. f(a, b) = f(b, a) \textrm{ and } f(a, a) = a $$

map2 := map0.Insert(42, "Hello Wzrld").Insert(43, "Goodbye")
fmt.Println(
	map1.Merge(map2, func(a, b string) (string, bool) {
		// Return the lexicographically smallest string
		if a > b {
			a, b = b, a
		}
		return a, a == b
	})
) // [42 ↦ Hello World 43 ↦ Goodbye]

The returned flag should be true iff. the two values are equal. This allows the implementation to re-use more substructures.

Benchmarks

The project includes some performance benchmarks that compare the speed of insert and lookup operations to that of Go's builtin map implementation. Inserts are roughly 8-10 times slower than the builtin map and lookups are roughly 6 times slower. This map implementation is not a good general-purpose replacement for hash maps. It is most useful when the merge operation is used to speed up merges of large maps and when large maps must be copied (which is essentially free due to immutability).

Documentation

Overview

Example
hasher := NumericHasher[int]()
tree0 := New[int](hasher)
tree1 := tree0.Insert(5, 6)
fmt.Println(tree0)
fmt.Println(tree1)
tree2 := tree0.Insert(5, 10)
fmt.Println(tree1.Equal(tree2, hasher.Equal))
fmt.Println(tree1.Merge(tree2, func(a, b int) (int, bool) {
	// Return the max of a and b
	if a < b {
		a, b = b, a
	}
	return a, a == b
}))
Output:

tree[]
tree[5 ↦ 6]
false
tree[5 ↦ 10]

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Hasher

type Hasher[K any] interface {
	Equal(a, b K) bool
	Hash(K) uint64
}

The Hasher type provides a way to hash values as well as compare them for equality.

The implementation guarantees that the hash function is called exactly once for lookup, insertion and removal.

func NumericHasher

func NumericHasher[T Numeric]() Hasher[T]

func StringHasher

func StringHasher[T ~string]() Hasher[T]

type MergeFunc

type MergeFunc[V any] func(a, b V) (V, bool)

MergeFunc describes a binary operator, f, that merges two values. The operator must be commutative and idempotent. I.e.:

f(a, b) = f(b, a)
f(a, a) = a

The second return value informs the caller whether a == b. This flag allows some optimizations in the implementation.

type Numeric

type Numeric interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

type Tree

type Tree[K, V any] struct {
	// contains filtered or unexported fields
}

Tree represents a persistent hash map.

Hash collisions are resolved by putting key-value pairs into buckets that are scanned at lookups.

Hash values of keys must not change while they are stored in the map.

func New

func New[V, K any](hasher Hasher[K]) Tree[K, V]

Construct a new persistent key-value map with the specified hasher.

func (Tree[K, V]) Equal

func (tree Tree[K, V]) Equal(other Tree[K, V], f func(V, V) bool) bool

Equal checks whether two maps are equal. Values are compared with the provided function. This operation also skips processing of shared subtrees.

func (Tree[K, V]) ForEach

func (tree Tree[K, V]) ForEach(f eachFunc[K, V])

Call the given function once for each key-value pair in the map.

func (Tree[K, V]) Insert

func (tree Tree[K, V]) Insert(key K, value V) Tree[K, V]

Insert the given key-value pair into the map. Replaces previous value with the same key if it exists.

func (Tree[K, V]) InsertOrMerge

func (tree Tree[K, V]) InsertOrMerge(key K, value V, f MergeFunc[V]) Tree[K, V]

Inserts the given key-value pair into the map. If a previous mapping (prevValue) exists for the key, the inserted value will be `f(value, prevValue)`.

func (Tree[K, V]) Lookup

func (tree Tree[K, V]) Lookup(key K) (ret V, found bool)

Lookup returns the value mapped to the provided key in the map. The semantics are equivalent to those of 2-valued lookup in regular Go maps.

func (Tree[K, V]) Merge

func (tree Tree[K, V]) Merge(other Tree[K, V], f MergeFunc[V]) Tree[K, V]

Merges two maps. If both maps contain a value for a key, the resulting map will map the key to the result of `f` on the two values.

See the documentation for MergeFunc for conditions that `f` must satisfy. No guarantees are made about the order of arguments provided to `f`.

This operation is made fast by skipping processing of shared subtrees. Merging a tree with itself after r updates takes linear time in r.

func (Tree[K, V]) Remove

func (tree Tree[K, V]) Remove(key K) Tree[K, V]

Remove a mapping for the given key if it exists.

func (Tree[K, V]) Size

func (tree Tree[K, V]) Size() (res int)

Size returns the number of key-value pairs in the map. NOTE: Runs in linear time in the size of the map.

func (Tree[K, V]) String

func (tree Tree[K, V]) String() string

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL