bloomfilter

package module
v1.0.6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 20, 2020 License: MIT Imports: 18 Imported by: 0

README

GoDoc travis

History

This bloom filter implementation is a fork from steakknife/bloomfilter by Barry Allard. The upstream project is now archived, so this fork exists to fix some bugs and also make a few improvements. Below is the original description.

All recent changes are copyright © 2019 Martin Holst Swende.

Installation

$ go get github.com/holiman/bloomfilter

Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

Copyright © 2014-2016,2018 Barry Allard MIT license

WTF is a bloom filter

**TL;DR: **Probabilistic, extra lookup table to track a set of elements kept elsewhere to reduce expensive, unnecessary set element retrieval and/or iterator operations when an element is not present in the set. It's a classic time-storage tradeoff algoritm.

Properties
See wikipedia for algorithm details
Impact What Description
Good No false negatives know for certain if a given element is definitely NOT in the set
Bad False positives uncertain if a given element is in the set
Bad Theoretical potential for hash collisions in very large systems and/or badly hash.Hash64-conforming implementations
Bad Add only Cannot remove an element, it would destroy information about other elements
Good Constant storage uses only a fixed amount of memory

Naming conventions

(Similar to algorithm)

Variable/function Description Range
m/M() number of bits in the bloom filter (memory representation is about m/8 bytes in size) >=2
n/N() number of elements present >=0
k/K() number of keys to use (keys are kept private to user code but are de/serialized to Marshal and file I/O) >=0
maxN maximum capacity of intended structure >0
p maximum allowed probability of collision (for computing m and k for optimal sizing) >0..<1
  • Memory representation should be exactly 24 + 8*(k + (m+63)/64) + unsafe.Sizeof(RWMutex) bytes.
  • Serialized (BinaryMarshaler) representation should be exactly 72 + 8*(k + (m+63)/64) bytes. (Disk format is less due to compression.)

Binary serialization format

All values in Little-endian format

Offset Offset (Hex) Length (bytes) Name Type
0 00 8 k uint64
8 08 8 n uint64
16 10 8 m uint64
24 18 k (keys) [k]uint64
24+8*k ... (m+63)/64 (bloom filter) [(m+63)/64]uint64
24+8*k+8*((m+63)/64) ... 48 (SHA384 of all previous fields, hashed in order) [48]byte
  • bloomfilter.Filter conforms to encoding.BinaryMarshaler and `encoding.BinaryUnmarshaler'

Usage


import "github.com/steakknife/bloomfilter"

const (
  maxElements = 100000
  probCollide = 0.0000001
)

bf, err := bloomfilter.NewOptimal(maxElements, probCollide)
if err != nil {
  panic(err)
}

someValue := ... // must conform to hash.Hash64

bf.Add(someValue)
if bf.Contains(someValue) { // probably true, could be false
  // whatever
}

anotherValue := ... // must also conform to hash.Hash64

if bf.Contains(anotherValue) {
  panic("This should never happen")
}

err := bf.WriteFile("1.bf.gz")  // saves this BF to a file
if err != nil {
  panic(err)
}

bf2, err := bloomfilter.ReadFile("1.bf.gz") // read the BF to another var
if err != nil {
  panic(err)
}

Design

Where possible, branch-free operations are used to avoid deep pipeline / execution unit stalls on branch-misses.

Contact

License

MIT license

Copyright © 2014-2016 Barry Allard

Documentation

Overview

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Package bloomfilter is face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go

https://github.com/steakknife/bloomfilter

MIT license

Index

Constants

View Source
const (
	// MMin is the minimum Bloom filter bits count
	MMin = 2
	// KMin is the minimum number of keys
	KMin = 1
	// Uint64Bytes is the number of bytes in type uint64
	Uint64Bytes = 8
)

Variables

This section is empty.

Functions

func EnableDebugging

func EnableDebugging()

EnableDebugging permits debug() logging of details to stderr

func OptimalK

func OptimalK(m, maxN uint64) uint64

OptimalK calculates the optimal k value for creating a new Bloom filter maxn is the maximum anticipated number of elements

func OptimalM

func OptimalM(maxN uint64, p float64) uint64

OptimalM calculates the optimal m value for creating a new Bloom filter p is the desired false positive probability optimal m = ceiling( - n * ln(p) / ln(2)**2 )

func UniqueKeys

func UniqueKeys(keys []uint64) bool

UniqueKeys is true if all keys are unique

Types

type Filter

type Filter struct {
	// contains filtered or unexported fields
}

Filter is an opaque Bloom filter type

func New

func New(m, k uint64) (*Filter, error)

New Filter with CSPRNG keys

m is the size of the Bloom filter, in bits, >= 2

k is the number of random keys, >= 1

func NewOptimal

func NewOptimal(maxN uint64, p float64) (*Filter, error)

NewOptimal Bloom filter with random CSPRNG keys

func NewWithKeys

func NewWithKeys(m uint64, origKeys []uint64) (f *Filter, err error)

NewWithKeys creates a new Filter from user-supplied origKeys

func ReadFile

func ReadFile(filename string) (f *Filter, n int64, err error)

ReadFile from filename into a lossless-compressed Bloom Filter f Suggested file extension: .bf.gz

func ReadFrom

func ReadFrom(r io.Reader) (f *Filter, n int64, err error)

ReadFrom Reader r into a lossless-compressed Bloom filter f

func UnmarshalText

func UnmarshalText(text []byte) (f *Filter, err error)

UnmarshalText conforms to TextUnmarshaler

func (*Filter) Add

func (f *Filter) Add(v hash.Hash64)

Add a hashable item, v, to the filter

func (*Filter) AddHash

func (f *Filter) AddHash(hash uint64)

Adds an already hashes item to the filter. Identical to Add (but slightly faster)

func (*Filter) Contains

func (f *Filter) Contains(v hash.Hash64) bool

Contains tests if f contains v false: f definitely does not contain value v true: f maybe contains value v

func (*Filter) ContainsHash

func (f *Filter) ContainsHash(hash uint64) bool

ContainsHash tests if f contains the (already hashed) key Identical to Contains but slightly faster

func (*Filter) Copy

func (f *Filter) Copy() (*Filter, error)

Copy f to a new Bloom filter

func (*Filter) FalsePosititveProbability

func (f *Filter) FalsePosititveProbability() float64

FalsePosititveProbability is the upper-bound probability of false positives

(1 - exp(-k*(n+0.5)/(m-1))) ** k

func (*Filter) GobDecode

func (f *Filter) GobDecode(data []byte) error

GobDecode conforms to interface gob.GobDecoder

func (*Filter) GobEncode

func (f *Filter) GobEncode() ([]byte, error)

GobEncode conforms to interface gob.GobEncoder

func (*Filter) IsCompatible

func (f *Filter) IsCompatible(f2 *Filter) bool

IsCompatible is true if f and f2 can be Union()ed together

func (*Filter) K

func (f *Filter) K() uint64

K is the count of keys

func (*Filter) M

func (f *Filter) M() uint64

M is the size of Bloom filter, in bits

func (*Filter) MarshalBinary

func (f *Filter) MarshalBinary() (data []byte, err error)

MarshalBinary converts a Filter into []bytes

func (*Filter) MarshalText

func (f *Filter) MarshalText() (text []byte, err error)

MarshalText conforms to encoding.TextMarshaler

func (*Filter) MarshallToWriter

func (f *Filter) MarshallToWriter(out io.Writer) (int, [sha512.Size384]byte, error)

MarshallToWriter marshalls the filter into the given io.Writer Binary layout (Little Endian):

k	1 uint64
n	1 uint64
m	1 uint64
keys	[k]uint64
bits	[(m+63)/64]uint64
hash	sha384 (384 bits == 48 bytes)

size = (3 + k + (m+63)/64) * 8 bytes

func (*Filter) N

func (f *Filter) N() uint64

N is how many elements have been inserted (actually, how many Add()s have been performed?)

func (*Filter) NewCompatible

func (f *Filter) NewCompatible() (*Filter, error)

NewCompatible Filter compatible with f

func (*Filter) PreciseFilledRatio

func (f *Filter) PreciseFilledRatio() float64

PreciseFilledRatio is an exhaustive count # of 1's

func (*Filter) ReadFrom

func (f *Filter) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom r and overwrite f with new Bloom filter data

func (*Filter) Union

func (f *Filter) Union(f2 *Filter) (out *Filter, err error)

Union merges f2 and f2 into a new Filter out

func (*Filter) UnionInPlace

func (f *Filter) UnionInPlace(f2 *Filter) error

UnionInPlace merges Bloom filter f2 into f

func (*Filter) UnmarshalBinary

func (f *Filter) UnmarshalBinary(data []byte) (err error)

UnmarshalBinary converts []bytes into a Filter conforms to encoding.BinaryUnmarshaler

func (*Filter) UnmarshalText

func (f *Filter) UnmarshalText(text []byte) error

UnmarshalText method overwrites f with data decoded from text

func (*Filter) WriteFile

func (f *Filter) WriteFile(filename string) (n int64, err error)

WriteFile filename from a a lossless-compressed Bloom Filter f Suggested file extension: .bf.gz

func (*Filter) WriteTo

func (f *Filter) WriteTo(w io.Writer) (n int64, err error)

WriteTo a Writer w from lossless-compressed Bloom Filter f

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL