splitblockbloom

package module
v0.0.0-...-fb52550 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 29, 2024 License: MIT Imports: 6 Imported by: 0

README

Split Block Bloom Filters

This repository contains an implementation of the Split Block Bloom Filters as described in the paper "Split Block Bloom Filters" by Jim Apple.

Overview

The Split Block Bloom Filter is an advanced implementation of the classic Bloom Filter, a probabilistic data structure used for set membership testing. It is designed to offer high efficiency and speed for modern computing architectures. This implementation is based on the research presented in the paper "Split Block Bloom Filters" by Jim Apple.

Features

  • Enhanced Cache Efficiency: Reduces cache line accesses to just one, significantly improving cache performance.
  • Optimized Hash Functions: Utilizes a split approach and eight hash functions for efficient bit setting and checking.
  • Balanced Speed and Accuracy: Offers a practical trade-off between a slightly higher false positive rate and significantly faster operations. This trade-off is favorable in scenarios where speed is more critical than having the absolute lowest false positive rate, especially within the practical range of ε (0.40% to 19%).

Usage

package main

import (
    "fmt"
    "github.com/axiomhq/splitblockbloom" // Ensure this is the correct path to your package
)

func main() {
    // Creating a new Split Block Bloom Filter
    // Parameters: Number of distinct values (ndv), False positive probability (fpp)
    // Example: Creating a filter for 1000 distinct values with a 1% false positive probability
    filter := splitblockbloom.NewFilter(1000, 0.01)

    // Adding elements to the filter
    // The AddHash method takes a hash of the item you want to add
    // Example: Adding elements with hash values 123456 and 789012
    filter.AddHash(123456)
    filter.AddHash(789012)

    // Checking if elements are in the filter
    // The Contains method returns a boolean indicating whether the element
    // (by its hash) is possibly in the filter
    // Remember: Bloom Filters can have false positives
    fmt.Println("123456 in filter:", filter.Contains(123456)) // true (element is added)
    fmt.Println("111111 in filter:", filter.Contains(111111)) // false (probably, as it wasn't added)
}

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ContainsFromStream

func ContainsFromStream(stream io.ReadSeeker, blockCount int, hash uint64) (bool, error)

func RecommendedBitsPerValue

func RecommendedBitsPerValue(ndv uint64, desiredFalsePositiveRatio float64) float64

Types

type Block

type Block [wordsPerBlock]uint32

func (*Block) AddHash

func (blk *Block) AddHash(hash uint64)

func (*Block) AddHashIfNotContains

func (blk *Block) AddHashIfNotContains(hash uint64) bool

func (*Block) AppendTo

func (blk *Block) AppendTo(bs []byte) []byte

func (*Block) Contains

func (blk *Block) Contains(hash uint64) bool

func (*Block) EstimateCardinality

func (blk *Block) EstimateCardinality() int

func (*Block) Merge

func (blk *Block) Merge(other *Block)

func (*Block) ReadFrom

func (blk *Block) ReadFrom(r io.Reader) (int64, error)

func (*Block) UnmarshalBinary

func (blk *Block) UnmarshalBinary(bs []byte) error

func (*Block) WriteTo

func (blk *Block) WriteTo(w io.Writer) (int64, error)

type Filter

type Filter []Block

func NewFilter

func NewFilter(ndv, bpv uint64) Filter

NewFilter creates a new blocked bloom filter.

func (Filter) AddHash

func (f Filter) AddHash(hash uint64)

func (Filter) AddHashIfNotContains

func (f Filter) AddHashIfNotContains(hash uint64) bool

func (Filter) Contains

func (f Filter) Contains(hash uint64) bool

func (Filter) EstimateCardinality

func (f Filter) EstimateCardinality() int

func (Filter) Merge

func (f Filter) Merge(other Filter)

func (Filter) NumBlocks

func (f Filter) NumBlocks() int

func (*Filter) ReadFrom

func (f *Filter) ReadFrom(r io.Reader) (int64, error)

func (Filter) SizeInBytes

func (f Filter) SizeInBytes() int

func (Filter) WriteTo

func (f Filter) WriteTo(w io.Writer) (int64, error)

Directories

Path Synopsis
internal

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL