Documentation ¶
Overview ¶
Package bloom provides data structures and methods for creating Bloom filters.
A Bloom filter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set.
A Bloom filter has two parameters: _m_, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and _k_, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo _m_). Set membership is done by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly.
In this implementation, the hashing functions used is murmurhash, a non-cryptographic hashing function.
This implementation accepts keys for setting as testing as []byte. Thus, to add a string item, "Love":
uint n = 1000 filter := bloom.New(20*n, 5) // load of 20, 5 keys filter.Add([]byte("Love"))
Similarly, to test if "Love" is in bloom:
if filter.Test([]byte("Love"))
For numeric data, I recommend that you look into the binary/encoding library. But, for example, to add a uint32 to the filter:
i := uint32(100) n1 := make([]byte,4) binary.BigEndian.PutUint32(n1,i) f.Add(n1)
Finally, there is a method to estimate the false positive rate of a particular Bloom filter for a set of size _n_:
if filter.EstimateFalsePositiveRate(1000) > 0.001
Given the particular hashing scheme, it's best to be empirical about this. Note that estimating the FP rate will clear the Bloom filter.
Index ¶
- func EstimateParameters(n uint, p float64) (m uint, k uint)
- func Locations(data []byte, k uint) []uint64
- type BloomFilter
- func (f *BloomFilter) Add(data []byte) *BloomFilter
- func (f *BloomFilter) AddString(data string) *BloomFilter
- func (f *BloomFilter) Cap() uint
- func (f *BloomFilter) ClearAll() *BloomFilter
- func (f *BloomFilter) Copy() *BloomFilter
- func (f *BloomFilter) Equal(g *BloomFilter) bool
- func (f *BloomFilter) EstimateFalsePositiveRate(n uint) (fpRate float64)
- func (f *BloomFilter) GobDecode(data []byte) error
- func (f *BloomFilter) GobEncode() ([]byte, error)
- func (f *BloomFilter) K() uint
- func (f *BloomFilter) MarshalJSON() ([]byte, error)
- func (f *BloomFilter) Merge(g *BloomFilter) error
- func (f *BloomFilter) ReadFrom(stream io.Reader) (int64, error)
- func (f *BloomFilter) Test(data []byte) bool
- func (f *BloomFilter) TestAndAdd(data []byte) bool
- func (f *BloomFilter) TestAndAddString(data string) bool
- func (f *BloomFilter) TestLocations(locs []uint64) bool
- func (f *BloomFilter) TestString(data string) bool
- func (f *BloomFilter) UnmarshalJSON(data []byte) error
- func (f *BloomFilter) WriteTo(stream io.Writer) (int64, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func EstimateParameters ¶
EstimateParameters estimates requirements for m and k. Based on https://bitbucket.org/ww/bloom/src/829aa19d01d9/bloom.go used with permission.
Types ¶
type BloomFilter ¶
type BloomFilter struct {
// contains filtered or unexported fields
}
A BloomFilter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set.
func From ¶
func From(data []uint64, k uint) *BloomFilter
From creates a new Bloom filter with len(_data_) * 64 bits and _k_ hashing functions. The data slice is not going to be reset.
func New ¶
func New(m uint, k uint) *BloomFilter
New creates a new Bloom filter with _m_ bits and _k_ hashing functions We force _m_ and _k_ to be at least one to avoid panics.
func NewWithEstimates ¶
func NewWithEstimates(n uint, fp float64) *BloomFilter
NewWithEstimates creates a new Bloom filter for about n items with fp false positive rate
func (*BloomFilter) Add ¶
func (f *BloomFilter) Add(data []byte) *BloomFilter
Add data to the Bloom Filter. Returns the filter (allows chaining)
func (*BloomFilter) AddString ¶
func (f *BloomFilter) AddString(data string) *BloomFilter
AddString to the Bloom Filter. Returns the filter (allows chaining)
func (*BloomFilter) Cap ¶
func (f *BloomFilter) Cap() uint
Cap returns the capacity, _m_, of a Bloom filter
func (*BloomFilter) ClearAll ¶
func (f *BloomFilter) ClearAll() *BloomFilter
ClearAll clears all the data in a Bloom filter, removing all keys
func (*BloomFilter) Copy ¶
func (f *BloomFilter) Copy() *BloomFilter
Copy creates a copy of a Bloom filter.
func (*BloomFilter) Equal ¶
func (f *BloomFilter) Equal(g *BloomFilter) bool
Equal tests for the equality of two Bloom filters
func (*BloomFilter) EstimateFalsePositiveRate ¶
func (f *BloomFilter) EstimateFalsePositiveRate(n uint) (fpRate float64)
EstimateFalsePositiveRate returns, for a BloomFilter with a estimate of m bits and k hash functions, what the false positive rate will be while storing n entries; runs 100,000 tests. This is an empirical test using integers as keys. As a side-effect, it clears the BloomFilter.
func (*BloomFilter) GobDecode ¶
func (f *BloomFilter) GobDecode(data []byte) error
GobDecode implements gob.GobDecoder interface.
func (*BloomFilter) GobEncode ¶
func (f *BloomFilter) GobEncode() ([]byte, error)
GobEncode implements gob.GobEncoder interface.
func (*BloomFilter) K ¶
func (f *BloomFilter) K() uint
K returns the number of hash functions used in the BloomFilter
func (*BloomFilter) MarshalJSON ¶
func (f *BloomFilter) MarshalJSON() ([]byte, error)
MarshalJSON implements json.Marshaler interface.
func (*BloomFilter) Merge ¶
func (f *BloomFilter) Merge(g *BloomFilter) error
Merge the data from two Bloom Filters.
func (*BloomFilter) ReadFrom ¶
func (f *BloomFilter) ReadFrom(stream io.Reader) (int64, error)
ReadFrom reads a binary representation of the BloomFilter (such as might have been written by WriteTo()) from an i/o stream. It returns the number of bytes read.
func (*BloomFilter) Test ¶
func (f *BloomFilter) Test(data []byte) bool
Test returns true if the data is in the BloomFilter, false otherwise. If true, the result might be a false positive. If false, the data is definitely not in the set.
func (*BloomFilter) TestAndAdd ¶
func (f *BloomFilter) TestAndAdd(data []byte) bool
TestAndAdd is the equivalent to calling Test(data) then Add(data). Returns the result of Test.
func (*BloomFilter) TestAndAddString ¶
func (f *BloomFilter) TestAndAddString(data string) bool
TestAndAddString is the equivalent to calling Test(string) then Add(string). Returns the result of Test.
func (*BloomFilter) TestLocations ¶
func (f *BloomFilter) TestLocations(locs []uint64) bool
TestLocations returns true if all locations are set in the BloomFilter, false otherwise.
func (*BloomFilter) TestString ¶
func (f *BloomFilter) TestString(data string) bool
TestString returns true if the string is in the BloomFilter, false otherwise. If true, the result might be a false positive. If false, the data is definitely not in the set.
func (*BloomFilter) UnmarshalJSON ¶
func (f *BloomFilter) UnmarshalJSON(data []byte) error
UnmarshalJSON implements json.Unmarshaler interface.