Documentation ¶
Overview ¶
Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries.
While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space.
Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).
"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)
Example ¶
package main import ( "fmt" cuckoo "github.com/panmari/cuckoofilter" ) func main() { cf := cuckoo.NewFilter(1000) cf.Insert([]byte("pizza")) cf.Insert([]byte("tacos")) cf.Insert([]byte("tacos")) // Re-insertion is possible. fmt.Println(cf.Lookup([]byte("pizza"))) fmt.Println(cf.Lookup([]byte("missing"))) cf.Reset() fmt.Println(cf.Lookup([]byte("pizza"))) }
Output: true false false
Example (ThreadSafe) ¶
package main import ( "fmt" "sync" cuckoo "github.com/panmari/cuckoofilter" ) // Small wrapper around cuckoo filter making it thread safe. type threadSafeFilter struct { cf *cuckoo.Filter mu sync.RWMutex } func (f *threadSafeFilter) insert(item []byte) { // Concurrent inserts need a Write lock. f.mu.Lock() defer f.mu.Unlock() f.cf.Insert(item) } func (f *threadSafeFilter) lookup(item []byte) bool { // Concurrent lookups need a read lock. f.mu.RLock() defer f.mu.RUnlock() return f.cf.Lookup(item) } func main() { cf := &threadSafeFilter{ cf: cuckoo.NewFilter(1000), } var wg sync.WaitGroup // Insert items concurrently... for i := byte(0); i < 50; i++ { wg.Add(1) go func(item byte) { defer wg.Done() cf.insert([]byte{item}) }(i) } // ...while also doing lookups concurrently. for i := byte(0); i < 100; i++ { wg.Add(1) go func(item byte) { defer wg.Done() // State is not well-defined here, so we can't define expectations. cf.lookup([]byte{item}) }(i) } wg.Wait() // Simple lookups to verify initialization. fmt.Println(cf.lookup([]byte{1})) fmt.Println(cf.lookup([]byte{99})) }
Output: true false
Index ¶
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Filter ¶
type Filter struct {
// contains filtered or unexported fields
}
Filter is a probabilistic counter.
func NewFilter ¶
NewFilter returns a new cuckoofilter suitable for the given number of elements. When inserting more elements, insertion speed will drop significantly and insertions might fail altogether. A capacity of 1000000 is a normal default, which allocates about ~2MB on 64-bit machines.
func (*Filter) Delete ¶
Delete data from the filter. Returns true if the data was found and deleted.
Example ¶
package main import ( "fmt" cuckoo "github.com/panmari/cuckoofilter" ) func main() { cf := cuckoo.NewFilter(1000) cf.Insert([]byte("pizza")) cf.Insert([]byte("tacos")) fmt.Println(cf.Lookup([]byte("pizza"))) cf.Delete([]byte("pizza")) fmt.Println(cf.Lookup([]byte("pizza"))) }
Output: true false
func (*Filter) Insert ¶
Insert data into the filter. Returns false if insertion failed. In the resulting state, the filter * Might return false negatives * Deletes are not guaranteed to work To increase success rate of inserts, create a larger filter.
func (*Filter) LoadFactor ¶ added in v0.0.6
LoadFactor returns the fraction slots that are occupied.
func (*Filter) Lookup ¶
Lookup returns true if data is in the filter.
Example ¶
package main import ( "fmt" cuckoo "github.com/panmari/cuckoofilter" ) func main() { cf := cuckoo.NewFilter(1000) cf.Insert([]byte("pizza")) cf.Insert([]byte("tacos")) fmt.Println(cf.Lookup([]byte("pizza"))) fmt.Println(cf.Lookup([]byte("missing"))) }
Output: true false