bloom

package module
v3.0.1+incompatible Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 31, 2020 License: BSD-2-Clause Imports: 3 Imported by: 2

README

Bloom filters

Note: This is a fork of github.com/willf/bloom that provides a read only bloom filter and a concurrent read only bloom filter that can both be instantiated from a mmap'd bytes ref and also limits the scope of the API.

Master Build Status Coverage Status Go Report Card GoDoc

A Bloom filter is a representation of a set of n items, where the main requirement is to make membership queries; i.e., whether an item is a member of a set.

A Bloom filter has two parameters: m, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and k, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo m). Set membership is done by testing whether the bits at each value of the hashing functions (again, modulo m) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose k and m correctly.

In this implementation, the hashing functions used is murmurhash, a non-cryptographic hashing function.

This implementation accepts keys for setting and testing as []byte. Thus, to add a string item, "Love":

n := uint(1000)
filter := bloom.NewBloomFilter(20*n, 5) // load of 20, 5 keys
filter.Add([]byte("Love"))

Similarly, to test if "Love" is in bloom:

if filter.Test([]byte("Love"))

Godoc documentation: https://godoc.org/github.com/m3db/bloom

Performance

  • macOS 10.13.6
  • go version go1.10 darwin/amd64
$ GOMAXPROCS=1 go test github.com/m3db/bloom -bench=.
goos: darwin
goarch: amd64
pkg: github.com/m3db/bloom
BenchmarkAddX10kX5              10000000               174 ns/op
BenchmarkContains1kX10kX5       10000000               139 ns/op
BenchmarkContains100kX10BX20     5000000               271 ns/op

Installation

go get -u github.com/m3db/bloom

Running all tests

Before committing the code, please check if it passes all tests using (note: this will install some dependencies):

make deps
make qa

Documentation

Overview

Package bloom provides data structures and methods for creating Bloom filters.

A Bloom filter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set.

A Bloom filter has two parameters: _m_, a maximum size (typically a reasonably large multiple of the cardinality of the set to represent) and _k_, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo _m_). Set membership is done by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly.

In this implementation, the hashing functions used is murmurhash, a non-cryptographic hashing function.

This implementation accepts keys for setting as testing as []byte. Thus, to add a string item, "Love":

uint n = 1000
filter := bloom.New(20*n, 5) // load of 20, 5 keys
filter.Add([]byte("Love"))

Similarly, to test if "Love" is in bloom:

if filter.Test([]byte("Love"))

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EstimateFalsePositiveRate

func EstimateFalsePositiveRate(n uint, p float64) (m uint, k uint)

EstimateFalsePositiveRate estimates m and k, based on: https://stackoverflow.com/a/22467497

Types

type BloomFilter

type BloomFilter struct {
	// contains filtered or unexported fields
}

BloomFilter is a bloom filter set membership. It cannot be concurrently read or written to. Multiple concurrent readers is also unsafe so a sync.Mutex must be used to guard read/write access if desired. This is due to a cached hash digest being used to avoid allocation each read/write.

func NewBloomFilter

func NewBloomFilter(m uint, k uint) *BloomFilter

NewBloomFilter creates a new bloom filter that can represent m elements with k hashes. It is not concurrent read or write safe.

func (*BloomFilter) Add

func (b *BloomFilter) Add(value []byte)

Add value to the set.

func (*BloomFilter) BitSet

func (b *BloomFilter) BitSet() *bitset.BitSet

BitSet returns the bitset used.

func (*BloomFilter) K

func (b *BloomFilter) K() uint

K returns the k hashes used.

func (*BloomFilter) M

func (b *BloomFilter) M() uint

M returns the m elements represented.

func (*BloomFilter) Test

func (b *BloomFilter) Test(value []byte) bool

Test if value is in the set.

type ConcurrentReadOnlyBloomFilter

type ConcurrentReadOnlyBloomFilter struct {
	// contains filtered or unexported fields
}

ConcurrentReadOnlyBloomFilter is a concurrent read only bloom filter set membership. It can be concurrently read from by any number of readers.

func NewConcurrentReadOnlyBloomFilter

func NewConcurrentReadOnlyBloomFilter(
	m, k uint,
	data []byte,
) *ConcurrentReadOnlyBloomFilter

NewConcurrentReadOnlyBloomFilter returns a new concurrent read only bloom filter backed by a byte slice, this means it can be used with a mmap'd bytes ref. It can be concurrently read from by any number of readers.

func (*ConcurrentReadOnlyBloomFilter) BitSet

BitSet returns the bitset used.

func (*ConcurrentReadOnlyBloomFilter) K

K returns the k hashes used.

func (*ConcurrentReadOnlyBloomFilter) M

M returns the m elements represented.

func (*ConcurrentReadOnlyBloomFilter) Test

func (b *ConcurrentReadOnlyBloomFilter) Test(value []byte) bool

Test if value is in the set.

type ReadOnlyBloomFilter

type ReadOnlyBloomFilter struct {
	// contains filtered or unexported fields
}

ReadOnlyBloomFilter is a read only bloom filter set membership. It cannot be concurrently read or written to. Multiple concurrent readers is also unsafe so a sync.Mutex must be used to guard read/write access if desired. This is due to a cached hash digest being used to avoid allocation each read/write.

func NewReadOnlyBloomFilter

func NewReadOnlyBloomFilter(m, k uint, data []byte) *ReadOnlyBloomFilter

NewReadOnlyBloomFilter returns a new read only bloom filter backed by a byte slice, this means it can be used with a mmap'd bytes ref. It is not concurrent read or write safe.

func (*ReadOnlyBloomFilter) BitSet

BitSet returns the bitset used.

func (*ReadOnlyBloomFilter) K

func (b *ReadOnlyBloomFilter) K() uint

K returns the k hashes used.

func (*ReadOnlyBloomFilter) M

func (b *ReadOnlyBloomFilter) M() uint

M returns the m elements represented.

func (*ReadOnlyBloomFilter) Test

func (b *ReadOnlyBloomFilter) Test(value []byte) bool

Test if value is in the set.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL