Documentation
¶
Overview ¶
Package bloom provides data structures and methods for creating Bloom filters. A Bloom filter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set. A Bloom filter has two parameters: _m_, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and _k_, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo _m_). Set membership is done by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly. In this implementation, the hashing functions used is murmurhash, a non-cryptographic hashing function. This implementation accepts keys for setting as testing as []byte. Thus, to add a string item, "Love":
uint n = 1000 filter := bloom.New(20*n, 5) // load of 20, 5 keys filter.Add([]byte("Love"))
Similarly, to test if "Love" is in bloom:
if filter.Test([]byte("Love"))
For numeric data, I recommend that you look into the binary/encoding library. But, for example, to add a uint32 to the filter:
i := uint32(100) n1 := make([]byte,4) binary.BigEndian.PutUint32(n1,i) f.Add(n1)
Finally, there is a method to estimate the false positive rate of a Bloom filter with _m_ bits and _k_ hashing functions for a set of size _n_:
if bloom.EstimateFalsePositiveRate(20*n, 5, n) > 0.001 ...
You can use it to validate the computed m, k parameters:
m, k := bloom.EstimateParameters(n, fp) ActualfpRate := bloom.EstimateFalsePositiveRate(m, k, n)
or
f := bloom.NewWithEstimates(n, fp) ActualfpRate := bloom.EstimateFalsePositiveRate(f.m, f.k, n)
You would expect ActualfpRate to be close to the desired fp in these cases. The EstimateFalsePositiveRate function creates a temporary Bloom filter. It is also relatively expensive and only meant for validation.
Index ¶
Constants ¶
const ( N = 7 MUT = 0 DEL = 1 ADD = 2 SWP = 3 FLP = 4 REV = 5 ROT = 6 )
const (
EXP = `` /* 298-byte string literal not displayed */
)
const MAX = 5_000
Variables ¶
This section is empty.
Functions ¶
func EstimateParameters ¶
EstimateParameters estimates requirements for m and k. Based on https://bitbucket.org/ww/bloom/src/829aa19d01d9/bloom.go used with permission.
Types ¶
type BloomFilter ¶
type BloomFilter struct {
// contains filtered or unexported fields
}
func New ¶
func New(m uint, k uint) *BloomFilter
New creates a new Bloom filter with _m_ bits and _k_ hashing functions We force _m_ and _k_ to be at least one to avoid panics.
func NewWithEstimates ¶
func NewWithEstimates(n uint, fp float64) *BloomFilter
NewWithEstimates creates a new Bloom filter for about n items with fp false positive rate
func (*BloomFilter) Add ¶
func (f *BloomFilter) Add(data []byte)
Add data to the Bloom Filter. Returns the filter (allows chaining)
func (*BloomFilter) AddHash ¶
func (f *BloomFilter) AddHash(h c.BFHash)
func (*BloomFilter) Contains ¶
func (f *BloomFilter) Contains(data []byte) bool
Contains returns true if the data is in the BloomFilter, false otherwise. If true, the result might be a false positive. If false, the data is definitely not in the set.
func (*BloomFilter) ContainsHash ¶
func (f *BloomFilter) ContainsHash(h c.BFHash) bool
type Hopper ¶
type Hopper struct {
// contains filtered or unexported fields
}
func InitHopper ¶
func (*Hopper) UpdateFTask ¶
func (h *Hopper) UpdateFTask(update *c.UpdateFTask, reply *c.UpdateReply) error
type PQItem ¶
type PQItem struct { Seed []byte Energy float64 Id c.FTaskID // contains filtered or unexported fields }
An Item is something we manage in a priority queue.
type PriorityQueue ¶
type PriorityQueue []*PQItem
A PriorityQueue implements heap.Interface and holds Items.
func (PriorityQueue) Len ¶
func (pq PriorityQueue) Len() int
func (*PriorityQueue) Pop ¶
func (pq *PriorityQueue) Pop() any
func (PriorityQueue) Swap ¶
func (pq PriorityQueue) Swap(i, j int)