murmur3

package module
v1.1.8 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 1, 2023 License: BSD-3-Clause Imports: 2 Imported by: 219

README

murmur3

Native Go implementation of Austin Appleby's third MurmurHash revision (aka MurmurHash3).

Includes assembly for amd64 for 64/128 bit hashes, seeding functions, and string functions to avoid string to slice conversions.

Hand rolled 32 bit assembly was removed during 1.11, but may be reintroduced if the compiler slows down any more. As is, the compiler generates marginally slower code (by one instruction in the hot loop).

The reference algorithm has been slightly hacked as to support the streaming mode required by Go's standard Hash interface.

Endianness

Unlike the canonical source, this library always reads bytes as little endian numbers. This makes the hashes portable across architectures, although does mean that hashing is a bit slower on big endian architectures.

Safety

This library used to use unsafe to convert four bytes to a uint32 and eight bytes to a uint64, but Go 1.14 introduced checks around those types of conversions that flagged that code as erroneous when hashing on unaligned input. While the code would not be problematic on amd64, it could be problematic on some architectures.

As of Go 1.14, those conversions were removed at the expense of a very minor performance hit. This hit affects all cpu architectures on for Sum32, and non-amd64 architectures for Sum64 and Sum128. For 64 and 128, custom assembly exists for amd64 that preserves performance.

Testing

Build Status

Testing includes comparing random inputs against the canonical implementation, and testing length 0 through 17 inputs to force all branches.

Because this code always reads input as little endian, testing against the canonical source is skipped for big endian architectures. The canonical source just converts bytes to numbers, meaning on big endian architectures, it will use different numbers for its hashing.

Documentation

GoDoc

Full documentation can be found on godoc.

Benchmarks

Benchmarks below were run on an amd64 machine with and without the custom assembly. The following numbers are for Go 1.14.1 and are comparing against spaolacci/murmur3.

You will notice that at small sizes, the other library is better. This is due to this library converting to safe code for Go 1.14. At large sizes, this library is nearly identical to the other. On amd64, the 64 bit and 128 bit sums come out to ~9% faster.

32 bit sums:

32Sizes/32-12     3.00GB/s ± 1%  2.12GB/s ±11%  -29.24%  (p=0.000 n=9+10)
32Sizes/64-12     3.61GB/s ± 3%  2.79GB/s ± 8%  -22.62%  (p=0.000 n=10+10)
32Sizes/128-12    3.47GB/s ± 8%  2.79GB/s ± 4%  -19.47%  (p=0.000 n=10+10)
32Sizes/256-12    3.66GB/s ± 4%  3.25GB/s ± 6%  -11.09%  (p=0.000 n=10+10)
32Sizes/512-12    3.78GB/s ± 3%  3.54GB/s ± 4%   -6.30%  (p=0.000 n=9+9)
32Sizes/1024-12   3.86GB/s ± 3%  3.69GB/s ± 5%   -4.46%  (p=0.000 n=10+10)
32Sizes/2048-12   3.85GB/s ± 3%  3.81GB/s ± 3%     ~     (p=0.079 n=10+9)
32Sizes/4096-12   3.90GB/s ± 3%  3.82GB/s ± 2%   -2.14%  (p=0.029 n=10+10)
32Sizes/8192-12   3.82GB/s ± 3%  3.78GB/s ± 7%     ~     (p=0.529 n=10+10)

64/128 bit sums, non-amd64:

64Sizes/32-12     2.34GB/s ± 5%  2.64GB/s ± 9%  +12.87%  (p=0.000 n=10+10)
64Sizes/64-12     3.62GB/s ± 5%  3.96GB/s ± 4%   +9.41%  (p=0.000 n=10+10)
64Sizes/128-12    5.12GB/s ± 3%  5.44GB/s ± 4%   +6.09%  (p=0.000 n=10+9)
64Sizes/256-12    6.35GB/s ± 2%  6.27GB/s ± 9%     ~     (p=0.796 n=10+10)
64Sizes/512-12    6.58GB/s ± 7%  6.79GB/s ± 3%     ~     (p=0.075 n=10+10)
64Sizes/1024-12   7.49GB/s ± 3%  7.55GB/s ± 9%     ~     (p=0.393 n=10+10)
64Sizes/2048-12   8.06GB/s ± 2%  7.90GB/s ± 6%     ~     (p=0.156 n=9+10)
64Sizes/4096-12   8.27GB/s ± 6%  8.22GB/s ± 5%     ~     (p=0.631 n=10+10)
64Sizes/8192-12   8.35GB/s ± 4%  8.38GB/s ± 6%     ~     (p=0.631 n=10+10)
128Sizes/32-12    2.27GB/s ± 2%  2.68GB/s ± 5%  +18.00%  (p=0.000 n=10+10)
128Sizes/64-12    3.55GB/s ± 2%  4.00GB/s ± 3%  +12.47%  (p=0.000 n=8+9)
128Sizes/128-12   5.09GB/s ± 1%  5.43GB/s ± 3%   +6.65%  (p=0.000 n=9+9)
128Sizes/256-12   6.33GB/s ± 3%  5.65GB/s ± 4%  -10.79%  (p=0.000 n=9+10)
128Sizes/512-12   6.78GB/s ± 3%  6.74GB/s ± 6%     ~     (p=0.968 n=9+10)
128Sizes/1024-12  7.46GB/s ± 4%  7.56GB/s ± 4%     ~     (p=0.222 n=9+9)
128Sizes/2048-12  7.99GB/s ± 4%  7.96GB/s ± 3%     ~     (p=0.666 n=9+9)
128Sizes/4096-12  8.20GB/s ± 2%  8.25GB/s ± 4%     ~     (p=0.631 n=10+10)
128Sizes/8192-12  8.24GB/s ± 2%  8.26GB/s ± 5%     ~     (p=0.673 n=8+9)

64/128 bit sums, amd64:

64Sizes/32-12     2.34GB/s ± 5%  4.36GB/s ± 3%  +85.86%  (p=0.000 n=10+10)
64Sizes/64-12     3.62GB/s ± 5%  6.27GB/s ± 3%  +73.37%  (p=0.000 n=10+9)
64Sizes/128-12    5.12GB/s ± 3%  7.70GB/s ± 6%  +50.27%  (p=0.000 n=10+10)
64Sizes/256-12    6.35GB/s ± 2%  8.61GB/s ± 3%  +35.50%  (p=0.000 n=10+10)
64Sizes/512-12    6.58GB/s ± 7%  8.59GB/s ± 4%  +30.48%  (p=0.000 n=10+9)
64Sizes/1024-12   7.49GB/s ± 3%  8.81GB/s ± 2%  +17.66%  (p=0.000 n=10+10)
64Sizes/2048-12   8.06GB/s ± 2%  8.90GB/s ± 4%  +10.49%  (p=0.000 n=9+10)
64Sizes/4096-12   8.27GB/s ± 6%  8.90GB/s ± 4%   +7.54%  (p=0.000 n=10+10)
64Sizes/8192-12   8.35GB/s ± 4%  9.00GB/s ± 3%   +7.80%  (p=0.000 n=10+9)
128Sizes/32-12    2.27GB/s ± 2%  4.29GB/s ± 9%  +88.75%  (p=0.000 n=10+10)
128Sizes/64-12    3.55GB/s ± 2%  6.10GB/s ± 8%  +71.78%  (p=0.000 n=8+10)
128Sizes/128-12   5.09GB/s ± 1%  7.62GB/s ± 9%  +49.63%  (p=0.000 n=9+10)
128Sizes/256-12   6.33GB/s ± 3%  8.65GB/s ± 3%  +36.71%  (p=0.000 n=9+10)
128Sizes/512-12   6.78GB/s ± 3%  8.39GB/s ± 6%  +23.77%  (p=0.000 n=9+10)
128Sizes/1024-12  7.46GB/s ± 4%  8.70GB/s ± 4%  +16.70%  (p=0.000 n=9+10)
128Sizes/2048-12  7.99GB/s ± 4%  8.73GB/s ± 8%   +9.26%  (p=0.003 n=9+10)
128Sizes/4096-12  8.20GB/s ± 2%  8.86GB/s ± 6%   +8.00%  (p=0.000 n=10+10)
128Sizes/8192-12  8.24GB/s ± 2%  9.01GB/s ± 3%   +9.30%  (p=0.000 n=8+10)

Documentation

Overview

Package murmur3 provides an amd64 native (Go generic fallback) implementation of the murmur3 hash algorithm for strings and slices.

Assembly is provided for amd64 go1.5+; pull requests are welcome for other architectures.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func New32

func New32() hash.Hash32

New32 returns a hash.Hash32 for streaming 32 bit sums.

func New64

func New64() hash.Hash64

New64 returns a hash.Hash64 for streaming 64 bit sums.

func SeedNew32

func SeedNew32(seed uint32) hash.Hash32

SeedNew32 returns a hash.Hash32 for streaming 32 bit sums with its internal digest initialized to seed.

This reads and processes the data in chunks of little endian uint32s; thus, the returned hash is portable across architectures.

func SeedNew64

func SeedNew64(seed uint64) hash.Hash64

SeedNew64 returns a hash.Hash64 for streaming 64 bit sums. As the canonical implementation does not support Sum64, this uses SeedNew128(seed, seed)

func SeedStringSum128

func SeedStringSum128(seed1, seed2 uint64, data string) (h1 uint64, h2 uint64)

SeedStringSum128 is the string version of SeedSum128.

func SeedStringSum32

func SeedStringSum32(seed uint32, data string) (h1 uint32)

SeedStringSum32 is the string version of SeedSum32.

func SeedStringSum64

func SeedStringSum64(seed uint64, data string) uint64

SeedStringSum64 is the string version of SeedSum64.

func SeedSum128

func SeedSum128(seed1, seed2 uint64, data []byte) (h1 uint64, h2 uint64)

SeedSum128 returns the murmur3 sum of data with digests initialized to seed1 and seed2.

The canonical implementation allows only one uint32 seed; to imitate that behavior, use the same, uint32-max seed for seed1 and seed2.

This reads and processes the data in chunks of little endian uint64s; thus, the returned hashes are portable across architectures.

func SeedSum32

func SeedSum32(seed uint32, data []byte) (h1 uint32)

SeedSum32 returns the murmur3 sum of data with the digest initialized to seed.

This reads and processes the data in chunks of little endian uint32s; thus, the returned hash is portable across architectures.

func SeedSum64

func SeedSum64(seed uint64, data []byte) uint64

SeedSum64 returns the murmur3 sum of data with the digest initialized to seed.

Because the canonical implementation does not support SeedSum64, this uses SeedSum128(seed, seed, data).

func StringSum128

func StringSum128(data string) (h1 uint64, h2 uint64)

StringSum128 is the string version of Sum128.

func StringSum32

func StringSum32(data string) uint32

StringSum32 is the string version of Sum32.

func StringSum64

func StringSum64(data string) uint64

StringSum64 is the string version of Sum64.

func Sum128

func Sum128(data []byte) (h1 uint64, h2 uint64)

Sum128 returns the murmur3 sum of data. It is equivalent to the following sequence (without the extra burden and the extra allocation):

hasher := New128()
hasher.Write(data)
return hasher.Sum128()

func Sum32

func Sum32(data []byte) uint32

Sum32 returns the murmur3 sum of data. It is equivalent to the following sequence (without the extra burden and the extra allocation):

hasher := New32()
hasher.Write(data)
return hasher.Sum32()

func Sum64

func Sum64(data []byte) uint64

Sum64 returns the murmur3 sum of data. It is equivalent to the following sequence (without the extra burden and the extra allocation):

hasher := New64()
hasher.Write(data)
return hasher.Sum64()

Types

type Hash128

type Hash128 interface {
	hash.Hash
	Sum128() (uint64, uint64)
}

Hash128 provides an interface for a streaming 128 bit hash.

func New128

func New128() Hash128

New128 returns a Hash128 for streaming 128 bit sums.

func SeedNew128

func SeedNew128(seed1, seed2 uint64) Hash128

SeedNew128 returns a Hash128 for streaming 128 bit sums with its internal digests initialized to seed1 and seed2.

The canonical implementation allows one only uint32 seed; to imitate that behavior, use the same, uint32-max seed for seed1 and seed2.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL