sketch

module
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 15, 2021 License: MIT

README

sketch

sketch is a Go package that provides implementations for probabilistic data structures.

The implementations are based largely on literature and are a bit hacky, I don't recommend using them for any reason other than figuring out how PDS works or for demo's or mini-projects.

While the implementations are fast enough, they were not formally verified for correctness tests offer 1%~10% accuracy.

Better suitable implementations can be found in Github.

Trade Offs

For LinearCounter the size-accuracy trade-offs for a few bitvector size values have been computed and provided by Ilya Katsov.

  • Trade-off between accuracy (δ) and bit array size (m)
n m (δ = 1%) m (δ = 10%)
1000 5329 268
10000 7960 1709
100000 26729 12744
1000000 154171 100880
10000000 1096582 831809
100000000 8571013 7061760

For HyperLogLog the accuracy-register count trade-offs are based on the estimated cardinality of input. The recommended precision value (denoted m) should take small values for < 10e8 entries.

n m (δ = 1%) m (δ = 10%)
1000 4 7
10000 4 8
100000 5 10
1000000 6 10
10000000 7 12
100000000 8 14

Benchmarks

LinearCounter

BenchmarkLinearCounter
BenchmarkLinearCounter/BenchmarkLinearCounter-5329
BenchmarkLinearCounter/BenchmarkLinearCounter-5329-8         	 2298578	       629.2 ns/op	      64 B/op	       2 allocs/op
BenchmarkLinearCounter/BenchmarkLinearCounter-5329#01
BenchmarkLinearCounter/BenchmarkLinearCounter-5329#01-8      	 1894555	       599.8 ns/op	      64 B/op	       2 allocs/op

HyperLogLog

BenchmarkHyperLogLog
BenchmarkHyperLogLog/BenchmarkHyperLogLog-5
BenchmarkHyperLogLog/BenchmarkHyperLogLog-5-8         	30878364	        38.76 ns/op	       0 B/op	       0 allocs/op
BenchmarkHyperLogLog/BenchmarkHyperLogLog-8
BenchmarkHyperLogLog/BenchmarkHyperLogLog-8-8         	32440388	        40.03 ns/op	       0 B/op	       0 allocs/op
BenchmarkHyperLogLog/BenchmarkHyperLogLog-5#01
BenchmarkHyperLogLog/BenchmarkHyperLogLog-5#01-8      	25808560	        38.98 ns/op	       0 B/op	       0 allocs/op

References

Directories

Path Synopsis
Package bitset provides a fast implementation of Bitsets also called Bitarray, Bitvector...
Package bitset provides a fast implementation of Bitsets also called Bitarray, Bitvector...
Package cardinal implements efficient cardinality estimation datastructures.
Package cardinal implements efficient cardinality estimation datastructures.
Package member implements efficient probabilistic datastructures for membership queries.
Package member implements efficient probabilistic datastructures for membership queries.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL