bloom

package module

v1.0.0 Latest Latest Go to latest Published: Feb 15, 2014 License: BSD-3-Clause Imports: 5 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/archs/bloom

Links

Open Source Insights

README ¶

Bloom filters

A Bloom filter is a representation of a set of n items, where the main requirement is to make membership queries; i.e., whether an item is a member of a set.

A Bloom filter has two parameters: m, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and k, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo m). Set membership is done by testing whether the bits at each value of the hashing functions (again, modulo m) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose k and m correctly.

In this implementation, the hashing function used is FNV, a non-cryptographic hashing function which is part of the Go package (hash/fnv). For a item, the 64-bit FNV hash is computed, and upper and lower 32 bit numbers, call them h1 and h2, are used. Then, the _i_th hashing function is:

h1 + h2*i

Thus, the underlying hash function, FNV, is only called once per key.

This implementation accepts keys for setting as testing as []byte. Thus, to add a string item, "Love":

uint n = 1000
filter := bloom.New(20*n, 5) // load of 20, 5 keys
filter.Add([]byte("Love"))

Similarly, to test if "Love" is in bloom:

if filter.Test([]byte("Love"))

For numeric data, I recommend that you look into the binary/encoding library. But, for example, to add a uint32 to the filter:

i := uint32(100)
n1 := make([]byte,4)
binary.BigEndian.PutUint32(n1,i)
f.Add(n1)

Finally, there is a method to estimate the false positive rate of a particular bloom filter for a set of size n:

if filter.EstimateFalsePositiveRate(1000) > 0.001

Given the particular hashing scheme, it's best to be empirical about this. Note that estimating the FP rate will clear the Bloom filter.

Discussion here: Bloom filter

Documentation ¶

Overview ¶

A Bloom filter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set.

A Bloom filter has two parameters: _m_, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and _k_, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo _m_). Set membership is done by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly.

In this implementation, the hashing function used is FNV, a non-cryptographic hashing function which is part of the Go package (hash/fnv). For a item, the 64-bit FNV hash is computed, and upper and lower 32 bit numbers, call them h1 and h2, are used. Then, the _i_th hashing function is:

h1 + h2*i

Thus, the underlying hash function, FNV, is only called once per key.

This implementation accepts keys for setting as testing as []byte. Thus, to add a string item, "Love":

uint n = 1000
filter := bloom.New(20*n, 5) // load of 20, 5 keys
filter.Add([]byte("Love"))

Similarly, to test if "Love" is in bloom:

if filter.Test([]byte("Love"))

For numeric data, I recommend that you look into the binary/encoding library. But, for example, to add a uint32 to the filter:

i := uint32(100)
n1 := make([]byte,4)
binary.BigEndian.PutUint32(n1,i)
f.Add(n1)

Finally, there is a method to estimate the false positive rate of a particular bloom filter for a set of size _n_:

if filter.EstimateFalsePositiveRate(1000) > 0.001

Given the particular hashing scheme, it's best to be empirical about this. Note that estimating the FP rate will clear the Bloom filter.

Index ¶

type BloomFilter
- func New(m uint, k uint) *BloomFilter
- func NewWithEstimates(n uint, fp float64) *BloomFilter

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type BloomFilter ¶

type BloomFilter struct {
	// contains filtered or unexported fields
}

func New ¶

func New(m uint, k uint) *BloomFilter

Create a new Bloom filter with _m_ bits and _k_ hashing functions

func NewWithEstimates ¶

func NewWithEstimates(n uint, fp float64) *BloomFilter

Create a new Bloom filter for about n items with fp false positive rate

func (*BloomFilter) Add ¶

func (f *BloomFilter) Add(data []byte) *BloomFilter

Add data to the Bloom Filter. Returns the filter (allows chaining)

func (*BloomFilter) Cap ¶

func (f *BloomFilter) Cap() uint

Return the capacity, _m_, of a Bloom filter

func (*BloomFilter) ClearAll ¶

func (f *BloomFilter) ClearAll() *BloomFilter

Clear all the data in a Bloom filter, removing all keys

func (*BloomFilter) EstimateFalsePositiveRate ¶

func (f *BloomFilter) EstimateFalsePositiveRate(n uint) (fp_rate float64)

Estimate, for a BloomFilter with a limit of m bytes and k hash functions, what the false positive rate will be whilst storing n entries; runs 10k tests

func (*BloomFilter) K ¶

func (f *BloomFilter) K() uint

Return the number of hash functions used

func (*BloomFilter) Test ¶

func (f *BloomFilter) Test(data []byte) bool

Tests for the presence of data in the Bloom filter

func (*BloomFilter) TestAndAdd ¶

func (f *BloomFilter) TestAndAdd(data []byte) bool

Equivalent to calling Test(data) then Add(data). Returns the result of Test.

Source Files ¶

View all Source files

bloom.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL