cuckoo

Version: v0.0.0-...-a2f2c23
Published: Jul 15, 2024 License: MIT Imports: 6 Imported by: 169

README

Cuckoo Filter

A cuckoo filter is a Bloom filter replacement for approximate set-membership queries. Bloom filters are well-known, space-efficient data structures for answering queries like "is item x in the set?", but they do not support deletion. Variants that do enable deletion (such as counting Bloom filters) usually require much more space.

Cuckoo filters let you add and remove items dynamically. A cuckoo filter is based on cuckoo hashing, hence the name. It is essentially a cuckoo hash table that stores a fingerprint of each key instead of the key itself. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than a conventional Bloom filter in applications that require low false positive rates (< 3%).
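The idea of "a cuckoo hash table storing fingerprints" can be sketched in a few dozen lines. This is a toy written for illustration, not this package's implementation: the toyFilter type, the FNV-1a hash, the fixed 1024 buckets, and the deterministic eviction order are all assumptions made to keep the example short.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	bucketSize = 4    // fingerprints per bucket
	numBuckets = 1024 // power of two, so masking keeps indices in range
	maxKicks   = 500  // relocation attempts before Insert gives up
)

type bucket [bucketSize]byte // a zero byte marks an empty slot

// toyFilter is a hypothetical, fixed-size cuckoo filter for illustration.
type toyFilter struct {
	buckets [numBuckets]bucket
	count   uint
}

func hash64(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

// altIndex maps a bucket index to the fingerprint's other candidate bucket.
// Applying it twice returns the original index.
func altIndex(i uint, fp byte) uint {
	return (i ^ uint(hash64([]byte{fp}))) & (numBuckets - 1)
}

// indices derives a non-zero 8-bit fingerprint and both candidate buckets.
func indices(data []byte) (i1, i2 uint, fp byte) {
	h := hash64(data)
	fp = byte(h>>56)%255 + 1
	i1 = uint(h) & (numBuckets - 1)
	return i1, altIndex(i1, fp), fp
}

// swap replaces the first slot holding old with repl and reports success.
// swap(0, fp) stores, swap(fp, 0) removes, swap(fp, fp) only tests presence.
func (b *bucket) swap(old, repl byte) bool {
	for i := range b {
		if b[i] == old {
			b[i] = repl
			return true
		}
	}
	return false
}

// Insert stores the fingerprint in either candidate bucket. If both are
// full it evicts a resident fingerprint and moves it to its alternate bucket.
func (f *toyFilter) Insert(data []byte) bool {
	i1, i2, fp := indices(data)
	if f.buckets[i1].swap(0, fp) || f.buckets[i2].swap(0, fp) {
		f.count++
		return true
	}
	i := i1
	for k := 0; k < maxKicks; k++ {
		slot := k % bucketSize // real filters usually pick the victim at random
		fp, f.buckets[i][slot] = f.buckets[i][slot], fp
		i = altIndex(i, fp)
		if f.buckets[i].swap(0, fp) {
			f.count++
			return true
		}
	}
	return false // table too full; the last evicted fingerprint is dropped
}

// Lookup reports whether the fingerprint is in either candidate bucket.
func (f *toyFilter) Lookup(data []byte) bool {
	i1, i2, fp := indices(data)
	return f.buckets[i1].swap(fp, fp) || f.buckets[i2].swap(fp, fp)
}

// Delete removes one copy of the fingerprint if it is present.
func (f *toyFilter) Delete(data []byte) bool {
	i1, i2, fp := indices(data)
	if f.buckets[i1].swap(fp, 0) || f.buckets[i2].swap(fp, 0) {
		f.count--
		return true
	}
	return false
}

func main() {
	f := &toyFilter{}
	f.Insert([]byte("geeky ogre"))
	fmt.Println(f.Lookup([]byte("geeky ogre")), f.count) // true 1
	f.Delete([]byte("geeky ogre"))
	fmt.Println(f.Lookup([]byte("geeky ogre")), f.count) // false 0
}
```

Because the alternate bucket is computed from the current index and the fingerprint alone, a stored fingerprint can be relocated without access to the original key. That property is what makes deletion and eviction possible.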

For details about the algorithm and for citations, see the following paper:

"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky

Implementation details

The paper cited above leaves several parameters open. This implementation makes the following choices:

  1. Every element has 2 possible bucket indices
  2. Buckets have a static size of 4 fingerprints
  3. Fingerprints have a static size of 8 bits

The authors suggest that choices 1 and 2 are optimal. Choice 3 comes down to the desired false positive rate. Given a target false positive rate r and a bucket size b, they suggest choosing the fingerprint size f such that

f >= log2(2b/r) bits

With the 8-bit fingerprint size in this repository, you can expect r ~= 0.03. Other implementations use 16-bit fingerprints, which correspond to a false positive rate of r ~= 0.0001.
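As a quick check of the arithmetic, the relation can be evaluated in both directions. The function names FingerprintBits and FalsePositiveBound are made up for this sketch and are not part of the package.

```go
package main

import (
	"fmt"
	"math"
)

// FingerprintBits returns the smallest whole number of bits f that
// satisfies f >= log2(2b/r) for bucket size b and target rate r.
func FingerprintBits(b int, r float64) int {
	return int(math.Ceil(math.Log2(2 * float64(b) / r)))
}

// FalsePositiveBound inverts the relation: r = 2b / 2^f.
func FalsePositiveBound(b, f int) float64 {
	return 2 * float64(b) / math.Exp2(float64(f))
}

func main() {
	fmt.Printf("%.5f\n", FalsePositiveBound(4, 8))  // 0.03125
	fmt.Printf("%.5f\n", FalsePositiveBound(4, 16)) // 0.00012
	fmt.Println(FingerprintBits(4, 0.03))           // 9
}
```

Note that 8 bits gives a bound of 0.03125, slightly above 0.03, so a strict 3% target would call for 9 bits under this formula.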

Example usage:

package main

import (
	"fmt"

	cuckoo "github.com/seiflotfy/cuckoofilter"
)

func main() {
	cf := cuckoo.NewFilter(1000)
	cf.InsertUnique([]byte("geeky ogre"))

	// Look up a string that was never inserted (a miss)
	cf.Lookup([]byte("hello"))

	count := cf.Count()
	fmt.Println(count) // count == 1

	// Delete a string that was never inserted (a miss)
	cf.Delete([]byte("hello"))

	count = cf.Count()
	fmt.Println(count) // count == 1

	// Delete a string that was inserted (a hit)
	cf.Delete([]byte("geeky ogre"))

	count = cf.Count()
	fmt.Println(count) // count == 0

	cf.Reset() // reset
}

Documentation:

"Cuckoo Filter on GoDoc"

Documentation

Overview

Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries.

For details about the algorithm and for citations, see the following paper:

"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

Note: This implementation uses a static bucket size of 4 fingerprints and a fingerprint size of 1 byte, based on my understanding of an optimal bucket/fingerprint/size ratio from the aforementioned paper.

Constants

const (
	DefaultLoadFactor = 0.9
	DefaultCapacity   = 10000
)

Functions

func SetDefaultHasher

func SetDefaultHasher(hasher Hasher)

Types

type Filter

type Filter struct {
	// contains filtered or unexported fields
}

Filter is a cuckoo filter, a probabilistic set-membership data structure.

func Decode

func Decode(bytes []byte) (*Filter, error)

Decode returns a Cuckoofilter from a byte slice

func NewFilter

func NewFilter(capacity uint) *Filter

NewFilter returns a new cuckoo filter with the given capacity. A capacity of 1000000 is a reasonable default and allocates about 1 MB on 64-bit machines.
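The 1 MB figure is consistent with one byte per fingerprint slot. A back-of-the-envelope sketch follows; it assumes the requested capacity is rounded up to the next power of two before being split into 4-slot buckets, which is an assumption of this example and not something stated above. EstimateBytes and nextPow2 are hypothetical helpers, not package functions.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	bucketSize       = 4 // fingerprints per bucket
	fingerprintBytes = 1 // 8-bit fingerprints
)

// nextPow2 rounds n up to the nearest power of two.
func nextPow2(n uint64) uint64 {
	if n <= 1 {
		return 1
	}
	return uint64(1) << uint(bits.Len64(n-1))
}

// EstimateBytes approximates the fingerprint storage for a requested
// capacity, ignoring struct and slice overhead.
func EstimateBytes(capacity uint64) uint64 {
	buckets := nextPow2(capacity) / bucketSize
	if buckets == 0 {
		buckets = 1
	}
	return buckets * bucketSize * fingerprintBytes
}

func main() {
	fmt.Println(EstimateBytes(1000000)) // 1048576
}
```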

func (*Filter) Count

func (cf *Filter) Count() uint

Count returns the number of items in the filter.

func (*Filter) Delete

func (cf *Filter) Delete(data []byte) bool

Delete removes data from the filter if it is present and reports whether it was deleted.

func (*Filter) Encode

func (cf *Filter) Encode() []byte

Encode returns a byte slice representing a Cuckoofilter

func (*Filter) Insert

func (cf *Filter) Insert(data []byte) bool

Insert inserts data into the filter and returns true upon success.

func (*Filter) InsertUnique

func (cf *Filter) InsertUnique(data []byte) bool

InsertUnique inserts data into the filter if it is not already present and returns true upon success.

func (*Filter) Lookup

func (cf *Filter) Lookup(data []byte) bool

Lookup returns true if data is in the filter (subject to the false positive rate).

func (*Filter) Reset

func (cf *Filter) Reset()

Reset ...

type Hasher

type Hasher interface {
	Hash64([]byte) uint64
}
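
A type satisfies Hasher by providing a single Hash64 method. The sketch below uses FNV-1a from the standard library. The fnvHasher type is hypothetical, and the interface is redeclared locally only so that the example compiles on its own.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Hasher mirrors the interface shown above so the sketch is standalone.
type Hasher interface {
	Hash64([]byte) uint64
}

// fnvHasher is a hypothetical Hasher backed by 64-bit FNV-1a.
type fnvHasher struct{}

// Hash64 returns the FNV-1a hash of data.
func (fnvHasher) Hash64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

func main() {
	var h Hasher = fnvHasher{}
	a := h.Hash64([]byte("geeky ogre"))
	b := h.Hash64([]byte("geeky ogre"))
	fmt.Println(a == b) // true: the same input always hashes the same
}
```

With the real package imported as cuckoo, such a value would presumably be installed with cuckoo.SetDefaultHasher(fnvHasher{}) before any filters are created.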

type ScalableCuckooFilter

type ScalableCuckooFilter struct {
	// contains filtered or unexported fields
}

func DecodeScalableFilter

func DecodeScalableFilter(fBytes []byte) (*ScalableCuckooFilter, error)

func NewScalableCuckooFilter

func NewScalableCuckooFilter(opts ...option) *ScalableCuckooFilter

With the default options, capacity grows as follows:

capacity  total
4096      4096
8192      12288
16384     28672
32768     61440
65536     126976
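
The numbers in the table above follow a doubling pattern: each new capacity is twice the previous one, and the total is the running sum. The sketch below reproduces them under that assumption. GrowthSchedule is a hypothetical helper, not part of the package, and the doubling rule is inferred from the table, not from the source code.

```go
package main

import "fmt"

// GrowthSchedule returns per-step capacities and running totals,
// assuming each step doubles the previous capacity.
func GrowthSchedule(initial uint, steps int) (caps, totals []uint) {
	c, t := initial, uint(0)
	for i := 0; i < steps; i++ {
		t += c
		caps = append(caps, c)
		totals = append(totals, t)
		c *= 2
	}
	return caps, totals
}

func main() {
	caps, totals := GrowthSchedule(4096, 5)
	for i := range caps {
		fmt.Printf("%-8d %d\n", caps[i], totals[i])
	}
}
```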

func (*ScalableCuckooFilter) Count

func (sf *ScalableCuckooFilter) Count() uint

func (*ScalableCuckooFilter) DecodeWithParam

func (sf *ScalableCuckooFilter) DecodeWithParam(fBytes []byte, opts ...option) (*ScalableCuckooFilter, error)

func (*ScalableCuckooFilter) Delete

func (sf *ScalableCuckooFilter) Delete(data []byte) bool

func (*ScalableCuckooFilter) Encode

func (sf *ScalableCuckooFilter) Encode() []byte

func (*ScalableCuckooFilter) Insert

func (sf *ScalableCuckooFilter) Insert(data []byte) bool

func (*ScalableCuckooFilter) InsertUnique

func (sf *ScalableCuckooFilter) InsertUnique(data []byte) bool

func (*ScalableCuckooFilter) Lookup

func (sf *ScalableCuckooFilter) Lookup(data []byte) bool

func (*ScalableCuckooFilter) Reset

func (sf *ScalableCuckooFilter) Reset()

type Store

type Store struct {
	Bytes      [][]byte
	LoadFactor float32
}
