muhash

package module
v0.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 23, 2024 License: ISC Imports: 9 Imported by: 2

README

go-muhash

Warning: This is pre-alpha software. The code has not been audited and/or reviewed by anyone other than the author.

ISC License

go-muhash implements a rolling hash function using a multiplicative hash. It is based on a multiplicative group over Z=2^3072-1103717 using multiplication and division for adding/removing elements from the hash function. The current code is heavily based on: Bitcoin MuHash (written by Pieter Wuille, MIT licensed) but uses BLAKE2B as the hash function and Go's standard library big.Int for fast modular inversions (GCD).

MuHash is the public interface implementing Add/Remove elements functions, and a Finalize function to return a final hash.

  • uint3072.go is a go implementation of the multiplicative group.
  • num3072.c/h is a C implementation of the multiplicative group.
  • num3072.go is go bindings for the C imlementation.

Ideally we will add Go Assembly implementations using SSE2/SSE4.1/AVX and will choose the correct one in runtime, this should also remove the cgo overhead.

Tests

  • ./build_and_test.sh will run all the tests and checks in this library.

  • ./fuzz.sh will run the fuzzer and put new corpus in the corpus directory. By default, it will use go-fuzz. But if you run with LIBFUZZER=1 ./fuzz.sh it will run it with libfuzzer. All the current corpus are checked in the unit test in fuzz_corpuses_test.go (requires -tags=gofuzz).

Documentation

Overview

Package muhash provides an implementation of a Multiplicative Hash, a cryptographic data structure that allows you to have a rolling hash function that you can add and remove elements from, without the need to re-serialize and re-hash the whole data set.

Index

Constants

View Source
const (
	// HashSize of array used to store hashes. See Hash.
	HashSize = 32
	// SerializedMuHashSize defines the length in bytes of SerializedMuHash
	SerializedMuHashSize = elementByteSize
)

Variables

View Source
var (

	// EmptyMuHashHash is the hash of `NewMuHash().Finalize()`
	EmptyMuHashHash = Hash{0x54, 0x4e, 0xb3, 0x14, 0x2c, 0x0, 0xf, 0xa, 0xd2, 0xc7, 0x6a, 0xc4, 0x1f, 0x42, 0x22, 0xab, 0xba, 0xba, 0xbe, 0xd8, 0x30, 0xee, 0xaf, 0xee, 0x4b, 0x6d, 0xc5, 0x6b, 0x52, 0xd5, 0xca, 0xc0}
)

Functions

This section is empty.

Types

type Hash

type Hash [HashSize]byte

Hash is a type encapsulating the result of hashing some unknown sized data. it typically represents Blake2b.

func (*Hash) AsArray

func (hash *Hash) AsArray() *[32]byte

AsArray is a helper function to returns a pointer to the underlying byte array.

func (Hash) IsEqual

func (hash Hash) IsEqual(target *Hash) bool

IsEqual returns true if target is the same as hash.

func (*Hash) SetBytes

func (hash *Hash) SetBytes(newHash []byte) error

SetBytes sets the bytes which represent the hash. An error is returned if the number of bytes passed in is not HashSize.

func (Hash) String

func (hash Hash) String() string

String returns the Hash as the hexadecimal string

type MuHash

type MuHash struct {
	// contains filtered or unexported fields
}

MuHash is a type used to create a Multiplicative Hash which is a rolling(homomorphic) hash that you can add and remove elements from and receive the same resulting hash as-if you never hashed them. Because of that the order of adding and removing elements doesn't matter. Use NewMuHash to initialize a MuHash, or DeserializeMuHash to parse a MuHash.

func DeserializeMuHash

func DeserializeMuHash(serialized *SerializedMuHash) (*MuHash, error)

DeserializeMuHash will deserialize the MuHash that `Serialize()` serialized.

func NewMuHash

func NewMuHash() *MuHash

NewMuHash return an empty initialized set. when finalized it should be equal to a finalized set with all elements removed.

func (*MuHash) Add

func (mu *MuHash) Add(data []byte)

Add hashes the data and adds it to the muhash. Supports arbitrary length data (subject to the underlying hash function(Blake2b) limits)

func (MuHash) Clone

func (mu MuHash) Clone() *MuHash

Clone the muhash to create a new one

func (*MuHash) Combine

func (mu *MuHash) Combine(other *MuHash)

Combine will add the MuHash together. Equivalent to manually adding all the data elements from one set to the other.

func (*MuHash) Finalize

func (mu *MuHash) Finalize() Hash

Finalize will return a hash(blake2b) of the multiset. Because the returned value is a hash of a multiset you cannot "Un-Finalize" it. If this is meant for storage then Serialize should be used instead.

func (*MuHash) Remove

func (mu *MuHash) Remove(data []byte)

Remove hashes the data and removes it from the multiset. Supports arbitrary length data (subject to the underlying hash function(Blake2b) limits)

func (*MuHash) Reset

func (mu *MuHash) Reset()

Reset clears the muhash from all data. Equivalent to creating a new empty set

func (*MuHash) Serialize

func (mu *MuHash) Serialize() *SerializedMuHash

Serialize returns a serialized version of the MuHash. This is the only right way to serialize a multiset for storage. This MuHash is not finalized, this is meant for storage.

func (MuHash) String

func (mu MuHash) String() string

String returns the MultiSet as the hexadecimal string

type SerializedMuHash

type SerializedMuHash [SerializedMuHashSize]byte

SerializedMuHash is a is a byte array representing the storage representation of a MuHash

func (SerializedMuHash) String

func (serialized SerializedMuHash) String() string

String returns the SerializedMultiSet as the hexadecimal string

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL