bloomfilter

package module
v0.0.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 23, 2024 License: Apache-2.0 Imports: 8 Imported by: 0

README

A Bloom filter is a space-efficient probabilistic data structure used to determine whether an element belongs to a set or not. The Bloom filter allows false positives (maybe in the set) but never false negatives (definitely not in set) . If you are new to bloomfilters, give Bloom Filters by Example a read.

The bloomfilter package is suitable for caching filtering, decentralized aggregation, search large chemical structure databases and many other applications. More specifically, we use this package in production with KrakenD to distributedly reject JWT tokens as it allows us to perform massive rejections with very little memory consumption. For instance, 100 million tokens of any size consume around 0.5GB RAM (with a rate of false positives of 1 in 999,925,224 tokens), and lookups are completed in constant time (k number of hashes). These numbers are impossible to get with a key-value or a relational database.

Implementations

This repository contains several bloomfilter implementations that you can use to solve different distributed computing problems. The solution starts from an optimized local implementation that adds rotation, RPC coordination, and generic rejecters. The packages are:

  • bitset: Implementations of bitsets for basic sets.
  • bloomfilter: Optimized implementation of the bloomfilter.
  • rotable: Implementation over the BF with 3 rotating buckets.
  • rpc: Implementation of an RPC layer over rotable.
  • krakend: Integration of the rpc package as a rejecter for KrakenD

Documentation

Overview

Package bloomfilter contains common data and interfaces needed to implement bloomfilters.

It is based on the theory explained in: http://llimllib.github.io/bloomfilter-tutorial/ In the repo, there are created the following types of bloomfilter: derived from bitset, sliding bloomfilters and rpc bloomfilter implementation.

Index

Constants

View Source
const (
	HASHER_DEFAULT = "default"
	HASHER_OPTIMAL = "optimal"
)

Variables

View Source
var (
	HashFactoryNames = map[string]HashFactory{
		HASHER_DEFAULT: DefaultHashFactory,
		HASHER_OPTIMAL: OptimalHashFactory,
	}

	ErrImpossibleToTreat = fmt.Errorf("unable to union")

	MD5    = HashWrapper(md5.New())  // skipcq: GO-S1023, GSC-G401
	SHA1   = HashWrapper(sha1.New()) // skipcq: GO-S1025, GSC-G401
	CRC64  = HashWrapper(crc64.New(crc64.MakeTable(crc64.ECMA)))
	FNV64  = HashWrapper(fnv.New64())
	FNV128 = HashWrapper(fnv.New128())
)
View Source
var EmptyConfig = Config{
	N: 2,
	P: .5,
}

EmptyConfig configuration used for first empty `previous` bloomfilter in the sliding three bloomfilters

Functions

func K

func K(m, n uint) uint

K function computes the number of hashfunctions of the bloomfilter as function of n and p

func M

func M(n uint, p float64) uint

M function computes the length of the bit array of the bloomfilter as function of n and p

Types

type Bloomfilter

type Bloomfilter interface {
	Add([]byte)
	Check([]byte) bool
	Union(interface{}) (float64, error)
}

Bloomfilter interface implemented in the different packages

type Config

type Config struct {
	N        uint    `json:"n"`
	P        float64 `json:"p"`
	HashName string  `json:"hash_name"`
}

Config for bloomfilter defining the parameters: P - desired false positive probability, N - number of elements to be stored in the filter and HashName - the name of the particular hashfunction

type EmptySet

type EmptySet int

EmptySet type is a synonym of int

func (EmptySet) Add

func (EmptySet) Add(_ []byte)

Add implementation for EmptySet

func (EmptySet) Check

func (EmptySet) Check(_ []byte) bool

Check implementation for EmptySet

func (EmptySet) Union

func (EmptySet) Union(interface{}) (float64, error)

Union implementation for EmptySet

type Hash

type Hash func([]byte) []uint

func DefaultHashFactory

func DefaultHashFactory(k uint) []Hash

func HashWrapper

func HashWrapper(h hash.Hash) Hash

func OptimalHashFactory

func OptimalHashFactory(k uint) []Hash

type HashFactory

type HashFactory func(uint) []Hash

Directories

Path Synopsis
Package bitset implements a bitset based on the bitset library.
Package bitset implements a bitset based on the bitset library.
Package bbloomfilter implements a bloomfilter based on an m-bit bit array, k hashfilters and configuration.
Package bbloomfilter implements a bloomfilter based on an m-bit bit array, k hashfilters and configuration.
cmd
Package krakend registers a bloomfilter given a config and registers the service with consul.
Package krakend registers a bloomfilter given a config and registers the service with consul.
Package rotate implemennts a sliding set of three bloomfilters: `previous`, `current` and `next` and the bloomfilter interface.
Package rotate implemennts a sliding set of three bloomfilters: `previous`, `current` and `next` and the bloomfilter interface.
rpc
Package rpc implements the rpc layer for the bloomfilter, following the principles from https://golang.org/pkg/net/rpc
Package rpc implements the rpc layer for the bloomfilter, following the principles from https://golang.org/pkg/net/rpc
Package testutils contains utils for the tests.
Package testutils contains utils for the tests.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL