cuckoo

package module
v0.0.0-...-9fc3065 Latest
Published: Nov 29, 2020 License: MIT Imports: 8 Imported by: 0

README

Cuckoo Filter

A cuckoo filter is a Bloom filter replacement for approximate set-membership queries. While Bloom filters are well-known space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Their variants that enable deletion (like counting Bloom filters) usually require much more space.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint. Because cuckoo hash tables can be highly compact, a cuckoo filter can use less space than a conventional Bloom filter in applications that require low false positive rates (< 3%).

For details about the algorithm and citations, see this article:

"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky
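
To make the "hash table of fingerprints" idea concrete, here is a minimal sketch of a cuckoo filter using only the standard library. It is not this package's implementation: the names toyFilter, newToyFilter, fingerprintAndIndex and altIndex are hypothetical, FNV-1a is an arbitrary choice of hash, 0 is assumed to mark an empty slot, and the limit of 500 relocations follows the value used in the paper.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const bucketSize = 4 // fingerprints per bucket
const maxKicks = 500 // relocation attempts before giving up

// toyFilter is a hypothetical, simplified cuckoo filter (not this package's Filter).
type toyFilter struct {
	buckets [][bucketSize]uint8 // 0 marks an empty slot; length is a power of two
	count   int
}

func newToyFilter(numBuckets int) *toyFilter {
	n := 1
	for n < numBuckets {
		n <<= 1 // round up so that XOR-ed indices can be masked into range
	}
	return &toyFilter{buckets: make([][bucketSize]uint8, n)}
}

func hash(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// fingerprintAndIndex derives a non-zero 8-bit fingerprint and the first bucket index.
func (f *toyFilter) fingerprintAndIndex(data []byte) (uint8, uint64) {
	h := hash(data)
	fp := uint8(h>>56)%255 + 1 // in 1..255, so it never collides with "empty"
	return fp, h & uint64(len(f.buckets)-1)
}

// altIndex computes the other candidate bucket: i XOR hash(fp).
// Applying it twice returns the original index.
func (f *toyFilter) altIndex(fp uint8, i uint64) uint64 {
	return (i ^ hash([]byte{fp})) & uint64(len(f.buckets)-1)
}

// put stores fp in the first free slot of bucket i, if there is one.
func (f *toyFilter) put(i uint64, fp uint8) bool {
	for s := range f.buckets[i] {
		if f.buckets[i][s] == 0 {
			f.buckets[i][s] = fp
			return true
		}
	}
	return false
}

func (f *toyFilter) Insert(data []byte) bool {
	fp, i := f.fingerprintAndIndex(data)
	if f.put(i, fp) || f.put(f.altIndex(fp, i), fp) {
		f.count++
		return true
	}
	// Both buckets are full: evict a random resident and move it to its alternate bucket.
	for kick := 0; kick < maxKicks; kick++ {
		s := rand.Intn(bucketSize)
		fp, f.buckets[i][s] = f.buckets[i][s], fp
		i = f.altIndex(fp, i)
		if f.put(i, fp) {
			f.count++
			return true
		}
	}
	return false // considered full; this sketch drops the last evicted fingerprint
}

// find returns the bucket and slot holding data's fingerprint, if present.
func (f *toyFilter) find(data []byte) (uint64, int, bool) {
	fp, i1 := f.fingerprintAndIndex(data)
	for _, i := range []uint64{i1, f.altIndex(fp, i1)} {
		for s, v := range f.buckets[i] {
			if v == fp {
				return i, s, true
			}
		}
	}
	return 0, 0, false
}

func (f *toyFilter) Lookup(data []byte) bool {
	_, _, ok := f.find(data)
	return ok
}

func (f *toyFilter) Delete(data []byte) bool {
	i, s, ok := f.find(data)
	if ok {
		f.buckets[i][s] = 0
		f.count--
	}
	return ok
}

func main() {
	f := newToyFilter(256)
	f.Insert([]byte("geeky ogre"))
	fmt.Println(f.Lookup([]byte("geeky ogre"))) // true: inserted items are always found
	fmt.Println(f.Delete([]byte("geeky ogre"))) // true
	fmt.Println(f.count)                        // 0
}
```

Because only the fingerprint is stored, the alternate bucket has to be computable from the current index and the fingerprint alone, which is why the sketch uses XOR with a hash of the fingerprint. Deleting an item that was never inserted can remove another item's matching fingerprint, so Delete should only be called for items known to be present.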

Implementation details

The paper cited above leaves several parameters open. This implementation makes the following choices:

  1. Every element has 2 possible bucket indices
  2. Buckets have a static size of 4 fingerprints
  3. Fingerprints have a static size of 8 bits

Choices 1 and 2 are the values the authors suggest as optimal. Choice 3 comes down to the desired false positive rate. Given a target false positive rate r and a bucket size b, they suggest choosing the fingerprint size f using

f >= log2(2b/r) bits

With the 8-bit fingerprint size in this repository, you can expect r ~= 0.03. Other implementations use 16-bit fingerprints, which correspond to a false positive rate of r ~= 0.0001.

Example usage:

package main

import (
  "fmt"

  cuckoo "github.com/seiflotfy/cuckoofilter"
)

func main() {
  cf := cuckoo.NewFilter(1000)
  cf.InsertUnique([]byte("geeky ogre"))

  // Look up a string that is not in the filter (a miss)
  cf.Lookup([]byte("hello"))

  count := cf.Count()
  fmt.Println(count) // count == 1

  // Delete a string that is not in the filter (a miss)
  cf.Delete([]byte("hello"))

  count = cf.Count()
  fmt.Println(count) // count == 1

  // Delete a string (a hit)
  cf.Delete([]byte("geeky ogre"))

  count = cf.Count()
  fmt.Println(count) // count == 0
  
  cf.Reset()    // reset
}

Documentation:

"Cuckoo Filter on GoDoc"

Documentation

Overview

Package cuckoo provides a Cuckoo Filter, a Bloom filter replacement for approximated set-membership queries.

While Bloom filters are well-known space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Their variants that enable deletion (like counting Bloom filters) usually require much more space.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint. Because cuckoo hash tables can be highly compact, a cuckoo filter can use less space than a conventional Bloom filter in applications that require low false positive rates (< 3%).

For details about the algorithm and citations, see this article:

"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)

Note: This implementation uses a static bucket size of 4 fingerprints and a fingerprint size of 1 byte, based on my understanding of the optimal bucket-size/fingerprint-size ratio from the aforementioned paper.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func UintIn

func UintIn(n uint) []byte

func UintOut

func UintOut(bye []byte) uint

Types

type Filter

type Filter struct {
	FilePath string
	// contains filtered or unexported fields
}

Filter is a probabilistic set-membership filter

func Decode

func Decode(bytes []byte) (*Filter, error)

Decode returns a Cuckoofilter from a byte slice

func NewFilter

func NewFilter(capacity uint, path string) *Filter

NewFilter returns a new cuckoofilter with a given capacity. A capacity of 1000000 is a normal default, which allocates about 1MB on 64-bit machines.

func ReadFile

func ReadFile(path string) (*Filter, error)

func (*Filter) Count

func (cf *Filter) Count() uint

Count returns the number of items in the filter

func (*Filter) Delete

func (cf *Filter) Delete(data []byte) bool

Delete removes data from the filter if it exists and reports whether it was deleted

func (*Filter) Encode

func (cf *Filter) Encode() []byte

Encode returns a byte slice representing a Cuckoofilter

func (*Filter) Expand

func (cf *Filter) Expand()

Expand expands the buckets when the filter is full

func (*Filter) Insert

func (cf *Filter) Insert(data []byte) bool

Insert inserts data into the filter and returns true upon success

func (*Filter) InsertUnique

func (cf *Filter) InsertUnique(data []byte) bool

InsertUnique inserts data into the filter if it is not already present and returns true upon success

func (*Filter) Lookup

func (cf *Filter) Lookup(data []byte) bool

Lookup returns true if data is in the filter

func (*Filter) ReadFile

func (cf *Filter) ReadFile() error

func (*Filter) Reset

func (cf *Filter) Reset()

func (*Filter) SaveFile

func (cf *Filter) SaveFile() error

type LCG

type LCG struct {
	// contains filtered or unexported fields
}

LCG is a linear congruential generator. See https://link.springer.com/chapter/10.1007/978-1-4615-2317-8_3

func (*LCG) Intn

func (l *LCG) Intn(n int) int
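
The listing gives no details on how LCG works internally, so the following is only a generic sketch of a linear congruential generator with an Intn method of the same shape. The type toyLCG and its seed field are hypothetical, and the multiplier and increment are Knuth's MMIX constants, which are not necessarily the ones this package uses.

```go
package main

import "fmt"

// toyLCG is a hypothetical linear congruential generator, not this package's LCG.
type toyLCG struct{ state uint64 }

// next advances the state with x' = (a*x + c) mod 2^64.
// The modulus comes for free from uint64 wrap-around.
func (l *toyLCG) next() uint64 {
	l.state = l.state*6364136223846793005 + 1442695040888963407
	return l.state
}

// Intn returns a value in [0, n); n must be positive.
// It uses the high bits because the low bits of a power-of-two-modulus
// LCG have short periods. The final modulo has a slight bias.
func (l *toyLCG) Intn(n int) int {
	return int((l.next() >> 33) % uint64(n))
}

func main() {
	g := &toyLCG{state: 42}
	for i := 0; i < 5; i++ {
		fmt.Println(g.Intn(4)) // e.g. choosing one of 4 slots in a bucket
	}
}
```

A generator like this is deterministic for a given seed and much cheaper than a cryptographic source, which is usually sufficient for choosing which fingerprint to evict during insertion.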
