maybe

package module

v0.0.0-...-22165a2 Latest Latest Go to latest Published: Feb 7, 2021 License: MIT Imports: 5 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/chermehdi/maybe

Links

Open Source Insights

README ¶

Maybe

A library containing a set of probabilstic data structures that I found interesting while researching the subject.

package main

import (
	"fmt"
	"github.com/chermehdi/maybe"
)

type String string

func (i String) Bytes() []byte {
	return []byte(i)
}

func main() {
	hll, _ := maybe.NewHyperLogLog(10)
	hll.Add(String("a"))
	hll.Add(String("b"))
	hll.Add(String("c"))
	hll.Add(String("d"))
	fmt.Println(4 == hll.Cardinality())
}

Resources

Documentation ¶

Index ¶

func MurMurHash(value AsBytes) uint64
type AsBytes
type BloomFilter
- func NewBloomFilter(size uint, numFunc uint) (*BloomFilter, error)
- func NewBloomFilterWithHashes(size uint, hashes []HashFunc) (*BloomFilter, error)
- func (bf *BloomFilter) Add(value AsBytes)
- func (bf *BloomFilter) Has(value AsBytes) bool
type CountMinSketch
- func NewCountMinSketch(width, depth uint32) *CountMinSketch
type HashFunc
- func MurmurFrom(value uint64) HashFunc
type HyperLogLog
- func NewHyperLogLog(b uint) (*HyperLogLog, error)
- func (hll *HyperLogLog) Add(value AsBytes)
- func (hll *HyperLogLog) Cardinality() uint64

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func MurMurHash ¶

func MurMurHash(value AsBytes) uint64

Types ¶

type AsBytes ¶

type AsBytes interface {
	Bytes() []byte
}

AsBytes is an abstraction needed by the maybe library component to indicate anytype that can be transformed to a series of bytes, so that we can apply a hash function to it.

type BloomFilter ¶

type BloomFilter struct {
	// contains filtered or unexported fields
}

BloomFilter is a probabilistic data structure to test weather a given value is a member of a given set (represented by this bloom filter instance).

The main advantage of this data structure is the low memory footprint, it can be used to do set membership on a very large multiset with very low memory footprint (in the order of `Kb`) as opposed to storing this in a set which can have a memory footprint in the order of `Mg` or `Gb`. Of course this comes with the additional tradeoff of certainty, if a bloom filter `Has` method returns `true` this means that the element is already a member of this set with a certainty of `x%` as opposed to the `100%` given by a set-like data structure. however if a `Has` call on a bloom filter returns `false` then the element is not in the set `100%`.

The certainty factor is determined by the size of the bitset in the internal implementation, and the number of hash functions you apply to the value, the corollary is this: the bigger the bitset and the more hash functions you have the more accurate the result of the bloom filter (or the less false positives you will get).

You can read the original paper for the math behind it.

func NewBloomFilter ¶

func NewBloomFilter(size uint, numFunc uint) (*BloomFilter, error)

NewBloomFilter creates a new BloomFilter with a given size and given a number of hash functions. The numFunc will indicate the number of times the default hash function is going to be used to create additional hash functions. The default hash base hash function is the murmur3, any other hash function would've done the job however.

func NewBloomFilterWithHashes ¶

func NewBloomFilterWithHashes(size uint, hashes []HashFunc) (*BloomFilter, error)

NewBloomFilterWithHashes creates a bloom filter with user defined hash functions

func (*BloomFilter) Add ¶

func (bf *BloomFilter) Add(value AsBytes)

Add adds the value to the bloom filter.

The value should implement the AsBytes interface for the hash functions to work.

func (*BloomFilter) Has ¶

func (bf *BloomFilter) Has(value AsBytes) bool

Has returns weather the value element exists in the bloom filter.

If the return value is true, than the value might have been added to the bloom filter otherwise the value has **definitely** never been added

type CountMinSketch ¶

type CountMinSketch struct {
	// contains filtered or unexported fields
}

CountMinSketch is a probabilistic data structure that is used to count the number of occurrences of a given value in a stream of events

The data structure's bias is determined by the width and depth of the table, the bigger they are the better precision you get in point count queries. The we use a linear hash function based on the murmur3 algorithm in this implementation, the algorithm used is not highly important in the functioning of the data structure, that being said of course, using sha1 or some cryptographic hashing algorithm (usually slow) will impact the performance of adding and querying. A count min sketch is biased estimator data structure, it can over estimate (report more elements then the ones added) but it will never under estimate.

func NewCountMinSketch ¶

func NewCountMinSketch(width, depth uint32) *CountMinSketch

NewCountMinSketch will create a count min sketch based of the width and depth given.

The values will determine the size of the underlying table, and the number of hash functions generated, bigger values will give you better estimates, but it will also impact your memory / cpu footprint.

func (*CountMinSketch) Add ¶

func (cm *CountMinSketch) Add(value AsBytes, count uint64)

Add will increment the buckets corresponding to the value by count.

func (*CountMinSketch) Count ¶

func (cm *CountMinSketch) Count(value AsBytes) uint64

Count will return an estimate of the number of values of type value that has been added to the sketch.

The count estimate is always greater than or equal the actual count.

func (*CountMinSketch) Increment ¶

func (cm *CountMinSketch) Increment(value AsBytes)

Increment is equivalent to adding one element of type represented by value.

type HashFunc ¶

type HashFunc func(AsBytes) uint64

HashFunc is any type that can transform an `AsBytes` value into a uint64

func MurmurFrom ¶

func MurmurFrom(value uint64) HashFunc

MurmurFrom constructs a HashFunc from a murmur3 hash function and a multiplication factor This is a trick that can be used to create multiple hash function from the same one.

type HyperLogLog ¶

type HyperLogLog struct {
	// contains filtered or unexported fields
}

HyperLogLog is data structure for estimating cardinality with a accuracy of ` 1 - 1.04 / sqrt(m)` with m defined as `2^b`, the reference is from this paper: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf This being said the accuracy of the data structure is proportional to the value of b.

func NewHyperLogLog ¶

func NewHyperLogLog(b uint) (*HyperLogLog, error)

NewHyperLogLog creates a new HLL based on the the number of counters

func (*HyperLogLog) Add ¶

func (hll *HyperLogLog) Add(value AsBytes)

Add adds the given observable (value) to the HLL instance

func (*HyperLogLog) Cardinality ¶

func (hll *HyperLogLog) Cardinality() uint64

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL