Documentation ¶
Overview ¶
A Bloom filter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set.
A Bloom filter has two parameters: _m_, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and _k_, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo _m_). Set membership is done by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly.
In this implementation, the hashing function used is FNV, a non-cryptographic hashing function which is part of the Go package (hash/fnv). For a item, the 64-bit FNV hash is computed, and upper and lower 32 bit numbers, call them h1 and h2, are used. Then, the _i_th hashing function is:
h1 + h2*i
Thus, the underlying hash function, FNV, is only called once per key.
This implementation accepts keys for setting as testing as []byte. Thus, to add a string item, "Love":
uint n = 1000 filter := bloom.New(20*n, 5) // load of 20, 5 keys filter.Add([]byte("Love"))
Similarly, to test if "Love" is in bloom:
if filter.Test([]byte("Love"))
For numeric data, I recommend that you look into the binary/encoding library. But, for example, to add a uint32 to the filter:
i := uint32(100) n1 := make([]byte,4) binary.BigEndian.PutUint32(n1,i) f.Add(n1)
Finally, there is a method to estimate the false positive rate of a particular bloom filter for a set of size _n_:
if filter.EstimateFalsePositiveRate(1000) > 0.001
Given the particular hashing scheme, it's best to be empirical about this. Note that estimating the FP rate will clear the Bloom filter.
Index ¶
- type BloomFilter
- func (f *BloomFilter) Add(data []byte) *BloomFilter
- func (f *BloomFilter) Cap() uint
- func (f *BloomFilter) ClearAll() *BloomFilter
- func (f *BloomFilter) EstimateFalsePositiveRate(n uint) (fp_rate float64)
- func (f *BloomFilter) K() uint
- func (f *BloomFilter) Test(data []byte) bool
- func (f *BloomFilter) TestAndAdd(data []byte) bool
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BloomFilter ¶
type BloomFilter struct {
// contains filtered or unexported fields
}
func New ¶
func New(m uint, k uint) *BloomFilter
Create a new Bloom filter with _m_ bits and _k_ hashing functions
func NewWithEstimates ¶
func NewWithEstimates(n uint, fp float64) *BloomFilter
Create a new Bloom filter for about n items with fp false positive rate
func (*BloomFilter) Add ¶
func (f *BloomFilter) Add(data []byte) *BloomFilter
Add data to the Bloom Filter. Returns the filter (allows chaining)
func (*BloomFilter) Cap ¶
func (f *BloomFilter) Cap() uint
Return the capacity, _m_, of a Bloom filter
func (*BloomFilter) ClearAll ¶
func (f *BloomFilter) ClearAll() *BloomFilter
Clear all the data in a Bloom filter, removing all keys
func (*BloomFilter) EstimateFalsePositiveRate ¶
func (f *BloomFilter) EstimateFalsePositiveRate(n uint) (fp_rate float64)
Estimate, for a BloomFilter with a limit of m bytes and k hash functions, what the false positive rate will be whilst storing n entries; runs 10k tests
func (*BloomFilter) Test ¶
func (f *BloomFilter) Test(data []byte) bool
Tests for the presence of data in the Bloom filter
func (*BloomFilter) TestAndAdd ¶
func (f *BloomFilter) TestAndAdd(data []byte) bool
Equivalent to calling Test(data) then Add(data). Returns the result of Test.