cuckoofilter

package module
v0.0.0-...-25a093b Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 5, 2015 License: MIT Imports: 3 Imported by: 0

README

godoc Build Status

Package cuckoofilter implements the Cuckoo Filter algorithm for approximated set-membership queries. See https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf for the theory.

Six types are defined depending on the use cases:

Filter Type Name Input Type memory usage (per item) error rate (%)
Filter64S uint64 1 byte ~11
Filter64M uint64 2 bytes ~1
Filter64L uint64 8 bytes ~0.003
FilterS []byte 1 byte ~11
FilterM []byte 2 bytes ~1
FilterL []byte 8 bytes ~0.003

Filter64* types are much faster then their siblings, see the benchmarks below.

The following benchmarks were performed on randomly generated items of sizes up to 16 bytes.

Benchmarks for FilterL:

Benchmark Name # tests time/op bits/op allocations/op
Benchmark64InsertL-4 2000000 1283 ns/op 4 B/op 0 allocs/op
Benchmark64HasL-4 30000000 39.4 ns/op 0 B/op 0 allocs/op
BenchmarkInsertL-4 1000000 1060 ns/op 8 B/op 0 allocs/op
BenchmarkHasL-4 20000000 101 ns/op 0 B/op 0 allocs/op

Benchmarks for FilterM:

Benchmark Name # tests time/op bits/op allocations/op
Benchmark64InsertM-4 2000000 2152 ns/op 1 B/op 0 allocs/op
Benchmark64HasM-4 50000000 22.6 ns/op 0 B/op 0 allocs/op
BenchmarkInsertM-4 2000000 1691 ns/op 1 B/op 0 allocs/op
BenchmarkHasM-4 20000000 79.0 ns/op 0 B/op 0 allocs/op

Benchmarks for FilterS:

Benchmark Name # tests time/op bits/op allocations/op
Benchmark64InsertS-4 1000000 1077 ns/op 1 B/op 0 allocs/op
Benchmark64HasS-4 100000000 16.4 ns/op 0 B/op 0 allocs/op
BenchmarkInsertS-4 1000000 1004 ns/op 1 B/op 0 allocs/op
BenchmarkHasS-4 20000000 65.6 ns/op 0 B/op 0 allocs/op

Benchmarks for github.com/seiflotfy/cuckoofilter:

Benchmark Name # tests time/op bits/op allocations/op
BenchmarkExt1Insert-4 1000000 4950 ns/op 17 B/op 1 allocs/op
BenchmarkExt1Has-4 10000000 175 ns/op 16 B/op 1 allocs/op

Benchmarks for github.com/tylertreat/BoomFilters:

Benchmark Name # tests time/op bits/op allocations/op
BenchmarkExt2Insert-4 1 1656973329 ns/op 1006633088 B/op 8388613 allocs/op
BenchmarkExt2Has-4 3000000 455 ns/op 32 B/op 2 allocs/op

Documentation

Overview

Package cuckoofilter implements the Cuckoo Filter algorithm for approximated set-membership queries. This means that one can check for data being in a set and either get a definitive answer if it is not in the set, or a "maybe" answer but with a low probability of being wrong. Cuckoo filters are similar in functionality to Bloom filters, but they support removing items from the filter without altering future results.

Six Cuckoo filter types are defined to be used in different cases:

Filter64S: for uint64 items, with low memory footprint (1 byte per item) and with an error rate of ~11%
Filter64M: for uint64 items, with medium memory footprint (2 bytes per item) and with an error rate of ~1%
Filter64L: for uint64 items, with high memory footprint (8 bytes per item) and with an error rate of ~0.005%
FilterS: low memory footprint (1 byte per item) and with an error rate of ~11%
FilterM: medium memory footprint (2 bytes per item) and with an error rate of ~1%
FilterL: high memory footprint (8 bytes per item) and with an error rate of ~0.005%

Filter64* types are much faster then their siblings.

See https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf for the theory behind Cuckoo filters.

Example (RowIds)
// some row ids
rowIds := []uint64{24, 98, 58, 345, 111, 156800, 123, 961}

// create a filter to hold the row ids and populate it
f := cuckoofilter.New64S(uint(128))
for _, id := range rowIds {
	if !f.Insert(id) {
		fmt.Printf("could not insert row id %d\n", id)
	}
}

// perform some checks
fmt.Printf("number of items in the filter: %d\n", f.Len())
fmt.Printf("item 123 is in the filter: %v\n", f.Has(123))
fmt.Printf("item 456 is in the filter: %v\n", f.Has(456))

// remove one item
if !f.Delete(123) {
	fmt.Printf("could not remove row id %d\n", 123)
}
fmt.Printf("item 123 is in the filter: %v\n", f.Has(123))
Output:

number of items in the filter: 8
item 123 is in the filter: true
item 456 is in the filter: false
item 123 is in the filter: false
Example (Words)
// some words
words := strings.Split(
	strings.Replace(
		"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua",
		",",
		"",
		-1,
	),
	" ",
)

// create a filter to hold the words and populate it
f := cuckoofilter.NewM(uint(len(words)))
for _, w := range words {
	if !f.Insert([]byte(w)) {
		fmt.Printf("could not insert word %s\n", w)
	}
}

// perform some checks
fmt.Printf("number of words in the filter: %d\n", f.Len())
w := "elit"
fmt.Printf("word '%s' is in the filter: %v\n", w, f.Has([]byte(w)))
w = "e_lit"
fmt.Printf("word '%s' is in the filter: %v\n", w, f.Has([]byte(w)))

// remove one item
w = "elit"
if !f.Delete([]byte(w)) {
	fmt.Printf("could not remove word %s\n", w)
}
fmt.Printf("word '%s' is in the filter: %v\n", w, f.Has([]byte(w)))
Output:

number of words in the filter: 19
word 'elit' is in the filter: true
word 'e_lit' is in the filter: false
word 'elit' is in the filter: false

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Filter64L

type Filter64L struct {
	// contains filtered or unexported fields
}

Filter64L represents a Cuckoo Filter that only stores uint64 items and as such is much faster than FilterL.

func New64L

func New64L(n uint) *Filter64L

New64L creates a Filter64L containing up to n uint64 items. A Filter64L contains a minimum of 2 items.

func (*Filter64L) Cap

func (cf *Filter64L) Cap() int

Cap returns the filter capacity (maximum number of items it may contain).

func (*Filter64L) Delete

func (cf *Filter64L) Delete(x uint64) bool

Delete removes an item from the Filter and returns whether or not it was present. To delete an item safely it must have been previously inserted.

func (*Filter64L) Has

func (cf *Filter64L) Has(x uint64) bool

Has checks if the item is in the Filter. If it returns false, then the item is definitely not in the filter. If it returns true, then the item *may* be in the filter, although with a low probability of not being in it.

func (*Filter64L) Insert

func (cf *Filter64L) Insert(x uint64) bool

Insert add an item to the Filter and returns whether it was inserted or not.

NB. The same item cannot be inserted more than 2 times.

func (*Filter64L) Len

func (cf *Filter64L) Len() (n int)

Len returns the number of items in the filter.

type Filter64M

type Filter64M struct {
	// contains filtered or unexported fields
}

Filter64M represents a Cuckoo Filter that only stores uint64 items and as such is much faster than FilterM.

func New64M

func New64M(n uint) *Filter64M

New64M creates a Filter64M containing up to n uint64 items. A Filter64M contains a minimum of 2 items.

func (*Filter64M) Cap

func (cf *Filter64M) Cap() int

Cap returns the filter capacity (maximum number of items it may contain).

func (*Filter64M) Delete

func (cf *Filter64M) Delete(x uint64) bool

Delete removes an item from the Filter and returns whether or not it was present. To delete an item safely it must have been previously inserted.

func (*Filter64M) Has

func (cf *Filter64M) Has(x uint64) bool

Has checks if the item is in the Filter. If it returns false, then the item is definitely not in the filter. If it returns true, then the item *may* be in the filter, although with a low probability of not being in it.

func (*Filter64M) Insert

func (cf *Filter64M) Insert(x uint64) bool

Insert add an item to the Filter and returns whether it was inserted or not.

NB. The same item cannot be inserted more than 2 times.

func (*Filter64M) Len

func (cf *Filter64M) Len() (n int)

Len returns the number of items in the filter.

type Filter64S

type Filter64S struct {
	// contains filtered or unexported fields
}

Filter64S represents a Cuckoo Filter that only stores uint64 items and as such is much faster than FilterS.

func New64S

func New64S(n uint) *Filter64S

New64S creates a Filter64S containing up to n uint64 items. A Filter64S contains a minimum of 2 items.

func (*Filter64S) Cap

func (cf *Filter64S) Cap() int

Cap returns the filter capacity (maximum number of items it may contain).

func (*Filter64S) Delete

func (cf *Filter64S) Delete(x uint64) bool

Delete removes an item from the Filter and returns whether or not it was present. To delete an item safely it must have been previously inserted.

func (*Filter64S) Has

func (cf *Filter64S) Has(x uint64) bool

Has checks if the item is in the Filter. If it returns false, then the item is definitely not in the filter. If it returns true, then the item *may* be in the filter, although with a low probability of not being in it.

func (*Filter64S) Insert

func (cf *Filter64S) Insert(x uint64) bool

Insert add an item to the Filter and returns whether it was inserted or not.

NB. The same item cannot be inserted more than 2 times.

func (*Filter64S) Len

func (cf *Filter64S) Len() (n int)

Len returns the number of items in the filter.

type FilterL

type FilterL struct {
	// contains filtered or unexported fields
}

FilterL represents a Cuckoo Filter with 4 bytes per item and an error rate of 0.005.

func NewL

func NewL(n uint) *FilterL

NewL creates a Filter containing up to n items with 4 bytes per item and an error rate of 0.005.

func (*FilterL) Cap

func (cf *FilterL) Cap() int

Cap returns the filter capacity.

func (*FilterL) Delete

func (cf *FilterL) Delete(b []byte) bool

Delete removes an item from the Filter and returns whether or not it was present. To delete an item safely it must have been previously inserted.

func (*FilterL) Has

func (cf *FilterL) Has(b []byte) bool

Has checks if the item is in the Filter.

func (*FilterL) Insert

func (cf *FilterL) Insert(b []byte) bool

Insert add an item to the Filter and returns whether it was inserted or not.

NB. The same item cannot be inserted more than 2 times.

func (*FilterL) Len

func (cf *FilterL) Len() int

Len returns the number of items in the filter.

type FilterM

type FilterM struct {
	// contains filtered or unexported fields
}

FilterM represents a Cuckoo Filter with 2 bytes per item and an error rate of 1.

func NewM

func NewM(n uint) *FilterM

NewM creates a Filter containing up to n items with 2 bytes per item and an error rate of 1.

func (*FilterM) Cap

func (cf *FilterM) Cap() int

Cap returns the filter capacity.

func (*FilterM) Delete

func (cf *FilterM) Delete(b []byte) bool

Delete removes an item from the Filter and returns whether or not it was present. To delete an item safely it must have been previously inserted.

func (*FilterM) Has

func (cf *FilterM) Has(b []byte) bool

Has checks if the item is in the Filter.

func (*FilterM) Insert

func (cf *FilterM) Insert(b []byte) bool

Insert add an item to the Filter and returns whether it was inserted or not.

NB. The same item cannot be inserted more than 2 times.

func (*FilterM) Len

func (cf *FilterM) Len() int

Len returns the number of items in the filter.

type FilterS

type FilterS struct {
	// contains filtered or unexported fields
}

FilterS represents a Cuckoo Filter with 1 byte per item and an error rate of 11.

func NewS

func NewS(n uint) *FilterS

NewS creates a Filter containing up to n items with 1 byte per item and an error rate of 11.

func (*FilterS) Cap

func (cf *FilterS) Cap() int

Cap returns the filter capacity.

func (*FilterS) Delete

func (cf *FilterS) Delete(b []byte) bool

Delete removes an item from the Filter and returns whether or not it was present. To delete an item safely it must have been previously inserted.

func (*FilterS) Has

func (cf *FilterS) Has(b []byte) bool

Has checks if the item is in the Filter.

func (*FilterS) Insert

func (cf *FilterS) Insert(b []byte) bool

Insert add an item to the Filter and returns whether it was inserted or not.

NB. The same item cannot be inserted more than 2 times.

func (*FilterS) Len

func (cf *FilterS) Len() int

Len returns the number of items in the filter.

Directories

Path Synopsis
This command is used to generate the Cuckoo filter types.
This command is used to generate the Cuckoo filter types.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL