Documentation
¶
Overview ¶
Package blobloom implements blocked Bloom filters.
Blocked Bloom filters are an approximate set data structure: if a key has been added to a filter, a lookup of that key returns true, but if the key has not been added, there is a non-zero probability that the lookup still returns true (a false positive). It follows that, if the lookup for a key returns false, that key has not been added to the filter.
In this package, keys are represented exclusively as hashes. Client code is responsible for supplying two 32-bit hash values for a key. No hash function is provided, since the "right" hash function for an application depends on the data the application processes.
Compared to standard Bloom filters, blocked Bloom filters use the CPU cache more efficiently. A blocked Bloom filter is an array of ordinary Bloom filters of fixed size BlockBits (the blocks). The first hash of a key selects the block to use.
To achieve the same false positive rate (FPR) as a standard Bloom filter, a blocked Bloom filter requires more memory. For an FPR of at most 2e-6 (two in a million), it uses ~20% more memory. At 1e-10, the space required is double that of standard Bloom filter.
For more details, see the 2010 paper by Putze, Sanders and Singler, https://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf.
Example (Fnv) ¶
package main import ( "fmt" "hash/fnv" "io" "github.com/greatroar/blobloom" ) func main() { f := blobloom.New(10000, 5) h := fnv.New64() messages := []string{ "Hello!", "Welcome!", "Mind your step!", "Have fun!", "Goodbye!", } for _, msg := range messages { h.Reset() io.WriteString(h, msg) f.Add(h.Sum64()) } for _, msg := range messages { h.Reset() io.WriteString(h, msg) if f.Has(h.Sum64()) { fmt.Println(msg) } else { panic("Bloom filter didn't get the message") } } }
Output: Hello! Welcome! Mind your step! Have fun! Goodbye!
Index ¶
- Constants
- func FPRate(nkeys, nbits uint64, nhashes int) float64
- func Optimize(cfg Config) (nbits uint64, nhashes int)
- type Config
- type Filter
- func (f *Filter) Add(h uint64)
- func (f *Filter) Add2(h1, h2 uint32)
- func (f *Filter) AddAtomic(h uint64)
- func (f *Filter) AddAtomic2(h1, h2 uint32)
- func (f *Filter) Clear()
- func (f *Filter) FPRate(nkeys uint64) float64
- func (f *Filter) Has(h uint64) bool
- func (f *Filter) Has2(h1, h2 uint32) bool
- func (f *Filter) NumBits() uint64
- func (f *Filter) Union(g *Filter)
Examples ¶
Constants ¶
const BlockBits = 512
BlockBits is the number of bits per block and the minimum number of bits in a Filter.
The value of this constant is chosen to match the L1 cache line size of popular architectures (386, amd64, arm64).
const MaxBits = BlockBits << 32 // 256GiB.
MaxBits is the maximum number of bits supported by a Filter.
Variables ¶
This section is empty.
Functions ¶
func FPRate ¶
FPRate computes an estimate of the false positive rate of a Bloom filter after nkeys distinct keys have been added.
func Optimize ¶
Optimize returns numbers of keys and hash functions that achieve the desired false positive described by cfg.
The estimated number of bits is imprecise for false positives rates below ca. 1e-15.
Example ¶
package main import ( "fmt" "github.com/greatroar/blobloom" ) func main() { cfg := blobloom.Config{ // We want to insert a billion keys and get a false positive rate of // one in a million, but we only have 2GiB (= 2^31 bytes) to spare. Capacity: 1e9, FPRate: 1e-6, MaxBits: 8 * 1 << 31, } nbits, nhashes := blobloom.Optimize(cfg) fpr := blobloom.FPRate(cfg.Capacity, nbits, nhashes) // How big will the filter be and what FP rate will we achieve? fmt.Printf("size = %dMiB\nfpr = %.3f\n", nbits/(8<<20), fpr) }
Output: size = 2048MiB fpr = 0.001
Types ¶
type Config ¶
type Config struct { // Capacity is the expected number of distinct keys to be added. // More keys can always be added, but the false positive rate can be // expected to drop below FPRate is the number exceeds the Capacity. Capacity uint64 // Desired lower bound on the false positive rate when the Bloom filter // has been filled to capacity. FPRate float64 // Maximum size of the Bloom filter in bits. Zero means no limit. MaxBits uint64 // contains filtered or unexported fields }
A Config holds parameters for Optimize or NewOptimized.
type Filter ¶
type Filter struct {
// contains filtered or unexported fields
}
A Filter is a blocked Bloom filter.
func New ¶
New constructs a Bloom filter with given numbers of bits and hash functions.
The number of bits should be at least BlockBits; smaller values are silently increased.
The number of hash functions uses is silently increased to two. The client passes the first two hashes for every key to Add and Has, which synthesize all following hashes from the two values passed in.
func NewOptimized ¶
NewOptimized is shorthand for New(Optimize(cfg)).
func (*Filter) Add ¶
Add insert a key with hash value h into f.
The upper and lower half of h are treated as two independent hashes. These are used to derive further values using the enhanced double hashing construction of Dillinger and Manolios, https://www.ccs.neu.edu/home/pete/pub/bloom-filters-verification.pdf.
func (*Filter) Add2 ¶
Add2 inserts a key with hash values h1 and h2 into f.
Add2 is equivalent to Add(h1<<32 | h2).
func (*Filter) AddAtomic ¶
AddAtomic atomically inserts a key with hash value h into f.
This is a synchronized version of Add. Multiple goroutines may call AddAtomic and AddAtomic2 concurrently, though no goroutines should call any other methods on f concurrently with these methods.
func (*Filter) AddAtomic2 ¶
AddAtomic2 atomically inserts a key with hash values h1 and h2 into f.
AddAtomic2 is equivalent to AddAtomic(h1<<32 | h2).
func (*Filter) FPRate ¶
FPRate computes an estimate of f's false positive rate after nkeys distinct keys have been added.
func (*Filter) Has ¶
Has reports whether a key with hash value h has been added. It may return a false positive.
func (*Filter) Has2 ¶
Has2 reports whether a key with hash values h1 and h2 has been added. It may return a false positive.
Has2 is equivalent to Has(h1<<32 | h2).
func (*Filter) Union ¶
Union sets f to the union of f and g.
Union panics when f and g do not have the same number of bits and hash functions. Both Filters must be using the same hash function(s), but Union cannot check this.
Example ¶
package main import ( "hash/fnv" "io" "github.com/greatroar/blobloom" ) const nworkers = 4 func getKeys(keys chan<- string) { keys <- "hello" keys <- "goodbye" close(keys) } func hash(key string) uint64 { h := fnv.New64() io.WriteString(h, key) return h.Sum64() } func main() { // Union can be used to fill a Bloom filter using multiple goroutines. // // Each goroutine allocates a filter, so the memory use increases by // a factor nworkers-1 compared to a sequential version or a version // using AddAtomic. keys := make(chan string, nworkers) filters := make(chan *blobloom.Filter, nworkers) go getKeys(keys) for i := 0; i < nworkers; i++ { go func() { f := blobloom.New(1<<20, 6) for key := range keys { f.Add(hash(key)) } filters <- f }() } f := <-filters for i := 1; i < nworkers; i++ { f.Union(<-filters) } }
Output: