Documentation ¶
Overview ¶
Package bloom provides data structures and methods for creating Bloom filters.
A Bloom filter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set.
A Bloom filter has two parameters: _m_, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and _k_, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo _m_). Set membership is done by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly.
In this implementation, the hashing functions used is murmurhash, a non-cryptographic hashing function.
This implementation accepts keys for setting as testing as []byte. Thus, to add a string item, "Love":
uint n = 1000 filter := bloom.New(20*n, 5) // load of 20, 5 keys filter.Add([]byte("Love"))
Similarly, to test if "Love" is in bloom:
if filter.Test([]byte("Love"))
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func EstimateFalsePositiveRate ¶
EstimateFalsePositiveRate estimates m and k, based on: https://stackoverflow.com/a/22467497
Types ¶
type BloomFilter ¶
type BloomFilter struct {
// contains filtered or unexported fields
}
BloomFilter is a bloom filter set membership. It cannot be concurrently read or written to. Multiple concurrent readers is also unsafe so a sync.Mutex must be used to guard read/write access if desired. This is due to a cached hash digest being used to avoid allocation each read/write.
func NewBloomFilter ¶
func NewBloomFilter(m uint, k uint) *BloomFilter
NewBloomFilter creates a new bloom filter that can represent m elements with k hashes. It is not concurrent read or write safe.
func (*BloomFilter) BitSet ¶
func (b *BloomFilter) BitSet() *bitset.BitSet
BitSet returns the bitset used.
func (*BloomFilter) Test ¶
func (b *BloomFilter) Test(value []byte) bool
Test if value is in the set.
type ConcurrentReadOnlyBloomFilter ¶
type ConcurrentReadOnlyBloomFilter struct {
// contains filtered or unexported fields
}
ConcurrentReadOnlyBloomFilter is a concurrent read only bloom filter set membership. It can be concurrently read from by any number of readers.
func NewConcurrentReadOnlyBloomFilter ¶
func NewConcurrentReadOnlyBloomFilter( m, k uint, data []byte, ) *ConcurrentReadOnlyBloomFilter
NewConcurrentReadOnlyBloomFilter returns a new concurrent read only bloom filter backed by a byte slice, this means it can be used with a mmap'd bytes ref. It can be concurrently read from by any number of readers.
func (*ConcurrentReadOnlyBloomFilter) BitSet ¶
func (b *ConcurrentReadOnlyBloomFilter) BitSet() *bitset.ReadOnlyBitSet
BitSet returns the bitset used.
func (*ConcurrentReadOnlyBloomFilter) K ¶
func (b *ConcurrentReadOnlyBloomFilter) K() uint
K returns the k hashes used.
func (*ConcurrentReadOnlyBloomFilter) M ¶
func (b *ConcurrentReadOnlyBloomFilter) M() uint
M returns the m elements represented.
func (*ConcurrentReadOnlyBloomFilter) Test ¶
func (b *ConcurrentReadOnlyBloomFilter) Test(value []byte) bool
Test if value is in the set.
type ReadOnlyBloomFilter ¶
type ReadOnlyBloomFilter struct {
// contains filtered or unexported fields
}
ReadOnlyBloomFilter is a read only bloom filter set membership. It cannot be concurrently read or written to. Multiple concurrent readers is also unsafe so a sync.Mutex must be used to guard read/write access if desired. This is due to a cached hash digest being used to avoid allocation each read/write.
func NewReadOnlyBloomFilter ¶
func NewReadOnlyBloomFilter(m, k uint, data []byte) *ReadOnlyBloomFilter
NewReadOnlyBloomFilter returns a new read only bloom filter backed by a byte slice, this means it can be used with a mmap'd bytes ref. It is not concurrent read or write safe.
func (*ReadOnlyBloomFilter) BitSet ¶
func (b *ReadOnlyBloomFilter) BitSet() *bitset.ReadOnlyBitSet
BitSet returns the bitset used.
func (*ReadOnlyBloomFilter) M ¶
func (b *ReadOnlyBloomFilter) M() uint
M returns the m elements represented.
func (*ReadOnlyBloomFilter) Test ¶
func (b *ReadOnlyBloomFilter) Test(value []byte) bool
Test if value is in the set.