README
Cuckoo Filter
A cuckoo filter is a Bloom filter replacement for approximate set-membership queries. While Bloom filters are well-known space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Variants that enable deletion (like counting Bloom filters) usually require much more space.
Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than a conventional Bloom filter for applications that require low false positive rates (< 3%).
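To make the idea of "a cuckoo hash table storing each key's fingerprint" concrete, here is a minimal sketch of such a table. It is not this package's implementation: the names toyFilter, newToyFilter, indexAndFingerprint and altIndex, the FNV hash, and the 500-kick limit are all assumptions chosen for illustration. The bucket count must be a power of two so that the XOR-based alternate index maps back to the original bucket.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const (
	bucketSize = 4   // fingerprints per bucket
	maxKicks   = 500 // relocation attempts before an insert gives up (assumed value)
)

// bucket holds 8-bit fingerprints; 0 marks an empty slot.
type bucket [bucketSize]byte

// slotOf returns the index of the first slot holding fp, or -1.
func (b bucket) slotOf(fp byte) int {
	for s, v := range b {
		if v == fp {
			return s
		}
	}
	return -1
}

// toyFilter is an illustrative cuckoo filter, not the library's Filter type.
type toyFilter struct {
	buckets []bucket
	count   int
}

// newToyFilter expects numBuckets to be a power of two.
func newToyFilter(numBuckets uint32) *toyFilter {
	return &toyFilter{buckets: make([]bucket, numBuckets)}
}

func hash32(data []byte) uint32 {
	h := fnv.New32a()
	h.Write(data)
	return h.Sum32()
}

// indexAndFingerprint derives the first bucket index and a non-zero fingerprint.
func (f *toyFilter) indexAndFingerprint(data []byte) (uint32, byte) {
	h := hash32(data)
	fp := byte(h >> 24)
	if fp == 0 {
		fp = 1 // 0 is reserved for empty slots
	}
	return h & (uint32(len(f.buckets)) - 1), fp
}

// altIndex returns the other candidate bucket; applying it twice yields i again.
func (f *toyFilter) altIndex(i uint32, fp byte) uint32 {
	return (i ^ hash32([]byte{fp})) & (uint32(len(f.buckets)) - 1)
}

// store puts fp into a free slot of bucket i, if there is one.
func (f *toyFilter) store(i uint32, fp byte) bool {
	if s := f.buckets[i].slotOf(0); s >= 0 {
		f.buckets[i][s] = fp
		f.count++
		return true
	}
	return false
}

func (f *toyFilter) Insert(data []byte) bool {
	i, fp := f.indexAndFingerprint(data)
	if f.store(i, fp) || f.store(f.altIndex(i, fp), fp) {
		return true
	}
	// Both buckets are full: evict a random resident and move it to its alternate bucket.
	for k := 0; k < maxKicks; k++ {
		s := rand.Intn(bucketSize)
		fp, f.buckets[i][s] = f.buckets[i][s], fp
		i = f.altIndex(i, fp)
		if f.store(i, fp) {
			return true
		}
	}
	return false // this sketch simply drops the last displaced fingerprint
}

func (f *toyFilter) Lookup(data []byte) bool {
	i, fp := f.indexAndFingerprint(data)
	return f.buckets[i].slotOf(fp) >= 0 || f.buckets[f.altIndex(i, fp)].slotOf(fp) >= 0
}

func (f *toyFilter) Delete(data []byte) bool {
	i, fp := f.indexAndFingerprint(data)
	for _, idx := range []uint32{i, f.altIndex(i, fp)} {
		if s := f.buckets[idx].slotOf(fp); s >= 0 {
			f.buckets[idx][s] = 0
			f.count--
			return true
		}
	}
	return false
}

func main() {
	f := newToyFilter(256)
	f.Insert([]byte("geeky ogre"))
	fmt.Println(f.Lookup([]byte("geeky ogre")), f.count)
	f.Delete([]byte("geeky ogre"))
	fmt.Println(f.Lookup([]byte("geeky ogre")), f.count)
}
```

Because only the fingerprint is stored, the alternate bucket has to be computable from the current index and the fingerprint alone. That is why the sketch XORs the index with a hash of the fingerprint instead of rehashing the original key.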
For details about the algorithm and citations, please see this paper:
"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky
Implementation details
The paper cited above leaves several parameters open. In this implementation:
- Every element has 2 possible bucket indices
- Buckets have a static size of 4 fingerprints
- Fingerprints have a static size of 8 bits
The first two choices are the values the authors suggest as optimal. The fingerprint size comes down to the desired false positive rate. Given a target false positive rate of r and a bucket size b, they suggest choosing the fingerprint size f using
f >= log2(2b/r) bits
With the 8 bit fingerprint size in this repository, you can expect r ~= 0.03. Other implementations use 16 bit fingerprints, which correspond to a false positive rate of r ~= 0.0001.
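The two rates quoted above follow directly from the bound. The sketch below evaluates it in both directions; the helper names minFingerprintBits and expectedRate are made up for this example and are not part of the package.

```go
package main

import (
	"fmt"
	"math"
)

// minFingerprintBits returns the smallest whole number of bits f that
// satisfies f >= log2(2b/r) for bucket size b and target false positive rate r.
func minFingerprintBits(bucketSize int, targetRate float64) int {
	return int(math.Ceil(math.Log2(2 * float64(bucketSize) / targetRate)))
}

// expectedRate inverts the bound: r ~= 2b / 2^f.
func expectedRate(bucketSize, fingerprintBits int) float64 {
	return 2 * float64(bucketSize) / math.Exp2(float64(fingerprintBits))
}

func main() {
	// With b = 4: 8 bits gives 8/256, 16 bits gives 8/65536.
	fmt.Printf("8 bit:  r ~= %g\n", expectedRate(4, 8))
	fmt.Printf("16 bit: r ~= %g\n", expectedRate(4, 16))
	fmt.Println("bits needed for r = 0.001:", minFingerprintBits(4, 0.001))
}
```

With b = 4 the 8 bit case works out to 8/256 = 0.03125, which is the r ~= 0.03 figure, and the 16 bit case to 8/65536, roughly 0.0001.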
Example usage:
package main

import (
	"fmt"

	cuckoo "github.com/seiflotfy/cuckoofilter"
)

func main() {
	cf := cuckoo.NewFilter(1000)
	cf.InsertUnique([]byte("geeky ogre"))

	// Look up a string that was never inserted (a miss)
	cf.Lookup([]byte("hello"))
	count := cf.Count()
	fmt.Println(count) // count == 1

	// Delete a string that was never inserted (a miss)
	cf.Delete([]byte("hello"))
	count = cf.Count()
	fmt.Println(count) // count == 1

	// Delete a string that was inserted (a hit)
	cf.Delete([]byte("geeky ogre"))
	count = cf.Count()
	fmt.Println(count) // count == 0

	cf.Reset() // remove all items from the filter
}
Documentation
Overview ¶
Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximate set-membership queries.
While Bloom filters are well-known space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Variants that enable deletion (like counting Bloom filters) usually require much more space.
Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than a conventional Bloom filter for applications that require low false positive rates (< 3%).
For details about the algorithm and citations, please see this paper:
"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)
Note: This implementation uses a static bucket size of 4 fingerprints and a fingerprint size of 1 byte, based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.
Index ¶
- Constants
- func SetDefaultHasher(hasher Hasher)
- type Filter
- type Hasher
- type ScalableCuckooFilter
- func (sf *ScalableCuckooFilter) Count() uint
- func (sf *ScalableCuckooFilter) DecodeWithParam(fBytes []byte, opts ...option) (*ScalableCuckooFilter, error)
- func (sf *ScalableCuckooFilter) Delete(data []byte) bool
- func (sf *ScalableCuckooFilter) Encode() []byte
- func (sf *ScalableCuckooFilter) Insert(data []byte) bool
- func (sf *ScalableCuckooFilter) InsertUnique(data []byte) bool
- func (sf *ScalableCuckooFilter) Lookup(data []byte) bool
- func (sf *ScalableCuckooFilter) Reset()
- type Store
Constants ¶
const (
	DefaultLoadFactor = 0.9
	DefaultCapacity   = 10000
)
Functions ¶
func SetDefaultHasher ¶
func SetDefaultHasher(hasher Hasher)
Types ¶
type Filter ¶
type Filter struct {
// contains filtered or unexported fields
}
Filter is a probabilistic counter
func NewFilter ¶
NewFilter returns a new cuckoofilter with a given capacity. A capacity of 1000000 is a normal default, which allocates about 1 MB on 64-bit machines.
func (*Filter) InsertUnique ¶
InsertUnique inserts data into the counter only if it does not already exist, and returns true on success.
type ScalableCuckooFilter ¶
type ScalableCuckooFilter struct {
// contains filtered or unexported fields
}
func DecodeScalableFilter ¶
func DecodeScalableFilter(fBytes []byte) (*ScalableCuckooFilter, error)
func NewScalableCuckooFilter ¶
func NewScalableCuckooFilter(opts ...option) *ScalableCuckooFilter
With the default options, the filter grows as follows:
capacity   total
4096       4096
8192       12288
16384      28672
32768      61440
65536      126976
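The numbers above are consistent with each newly added filter having twice the capacity of the previous one, starting at 4096. The sketch below reproduces the schedule under that assumption. growthSchedule is a hypothetical helper written for this example and says nothing about how the package computes its sizes internally.

```go
package main

import "fmt"

// growthSchedule lists, for each of the first n filters, its capacity and the
// running total, assuming every new filter doubles the previous capacity.
func growthSchedule(initial uint, n int) (capacities, totals []uint) {
	capacity, total := initial, uint(0)
	for i := 0; i < n; i++ {
		total += capacity
		capacities = append(capacities, capacity)
		totals = append(totals, total)
		capacity *= 2
	}
	return capacities, totals
}

func main() {
	capacities, totals := growthSchedule(4096, 5)
	fmt.Println("capacity   total")
	for i := range capacities {
		fmt.Printf("%-10d %d\n", capacities[i], totals[i])
	}
}
```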
func (*ScalableCuckooFilter) Count ¶
func (sf *ScalableCuckooFilter) Count() uint
func (*ScalableCuckooFilter) DecodeWithParam ¶
func (sf *ScalableCuckooFilter) DecodeWithParam(fBytes []byte, opts ...option) (*ScalableCuckooFilter, error)
func (*ScalableCuckooFilter) Delete ¶
func (sf *ScalableCuckooFilter) Delete(data []byte) bool
func (*ScalableCuckooFilter) Encode ¶
func (sf *ScalableCuckooFilter) Encode() []byte
func (*ScalableCuckooFilter) Insert ¶
func (sf *ScalableCuckooFilter) Insert(data []byte) bool
func (*ScalableCuckooFilter) InsertUnique ¶
func (sf *ScalableCuckooFilter) InsertUnique(data []byte) bool
func (*ScalableCuckooFilter) Lookup ¶
func (sf *ScalableCuckooFilter) Lookup(data []byte) bool
func (*ScalableCuckooFilter) Reset ¶
func (sf *ScalableCuckooFilter) Reset()