Documentation ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func MurMurHash ¶
Types ¶
type AsBytes ¶
type AsBytes interface {
    Bytes() []byte
}
AsBytes is an abstraction needed by the library to indicate any type that can be transformed into a series of bytes, so that we can apply a hash function to it.
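As a quick illustration, here is a minimal sketch of a type that satisfies AsBytes (the `stringValue` type is hypothetical, not part of this package):

// stringValue is a hypothetical helper type that satisfies AsBytes.
type stringValue string

// Bytes exposes the string's raw bytes so the package's hash functions can consume it.
func (s stringValue) Bytes() []byte {
    return []byte(s)
}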
type BloomFilter ¶
type BloomFilter struct {
    // contains filtered or unexported fields
}
BloomFilter is a probabilistic data structure used to test whether a given value is a member of a given set (represented by this bloom filter instance).
The main advantage of this data structure is its low memory footprint: it can answer set-membership queries on a very large multiset using memory in the order of `Kb`, as opposed to storing the elements in a set, which can have a memory footprint in the order of `Mb` or `Gb`. This comes with a tradeoff in certainty: if a bloom filter's `Has` method returns `true`, the element is a member of the set only with a certainty of `x%`, as opposed to the `100%` given by a set-like data structure. However, if a `Has` call on a bloom filter returns `false`, then the element is `100%` not in the set.
The certainty factor is determined by the size of the bitset in the internal implementation and the number of hash functions applied to the value. The corollary is this: the bigger the bitset and the more hash functions you have, the more accurate the bloom filter's results (or the fewer false positives you will get).
You can read the original paper for the math behind it.
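As a rough rule of thumb from the standard bloom filter analysis (not specific to this implementation): with a bitset of `m` bits, `k` hash functions and `n` inserted elements, the false positive rate is approximately `(1 - e^(-k*n/m))^k`, where `e` is Euler's number.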
func NewBloomFilter ¶
func NewBloomFilter(size uint, numFunc uint) (*BloomFilter, error)
NewBloomFilter creates a new BloomFilter with a given size and a given number of hash functions. numFunc indicates how many times the default hash function is used to derive additional hash functions. The default base hash function is murmur3; any other hash function would have done the job, however.
func NewBloomFilterWithHashes ¶
func NewBloomFilterWithHashes(size uint, hashes []HashFunc) (*BloomFilter, error)
NewBloomFilterWithHashes creates a bloom filter with user-defined hash functions.
func (*BloomFilter) Add ¶
func (bf *BloomFilter) Add(value AsBytes)
Add adds the value to the bloom filter.
The value should implement the AsBytes interface for the hash functions to work.
func (*BloomFilter) Has ¶
func (bf *BloomFilter) Has(value AsBytes) bool
Has returns whether the value exists in the bloom filter.
If the return value is true, then the value might have been added to the bloom filter; otherwise the value has **definitely** never been added.
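A minimal usage sketch, written as a Go example function assumed to live alongside the package, reusing the hypothetical `stringValue` type from the AsBytes section and importing `fmt` for printing:

func ExampleBloomFilter() {
    // 1024-bit bitset, 3 derived hash functions.
    bf, err := NewBloomFilter(1024, 3)
    if err != nil {
        panic(err)
    }
    bf.Add(stringValue("alice"))

    fmt.Println(bf.Has(stringValue("alice"))) // true: probably in the set
    fmt.Println(bf.Has(stringValue("bob")))   // false, unless a rare false positive occurs
}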
type CountMinSketch ¶
type CountMinSketch struct {
    // contains filtered or unexported fields
}
CountMinSketch is a probabilistic data structure that is used to count the number of occurrences of a given value in a stream of events.
The data structure's bias is determined by the width and depth of the table: the bigger they are, the better the precision of point count queries. This implementation uses a linear hash function based on the murmur3 algorithm; the choice of algorithm is not critical to the functioning of the data structure, although using sha1 or some other (usually slow) cryptographic hashing algorithm will impact the performance of adding and querying. A count-min sketch is a biased estimator: it can overestimate (report more elements than the ones added) but it will never underestimate.
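For intuition, the standard count-min sketch analysis (not specific to this implementation) says that with width `w` and depth `d`, a point query overestimates the true count by at most `(e/w) * N` with probability at least `1 - e^(-d)`, where `N` is the total count added to the sketch and `e` is Euler's number.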
func NewCountMinSketch ¶
func NewCountMinSketch(width, depth uint32) *CountMinSketch
NewCountMinSketch will create a count-min sketch based on the given width and depth.
These values determine the size of the underlying table and the number of hash functions generated; bigger values give better estimates, but they also increase the memory / CPU footprint.
func (*CountMinSketch) Add ¶
func (cm *CountMinSketch) Add(value AsBytes, count uint64)
Add will increment the buckets corresponding to the value by count.
func (*CountMinSketch) Count ¶
func (cm *CountMinSketch) Count(value AsBytes) uint64
Count will return an estimate of the number of times value has been added to the sketch.
The count estimate is always greater than or equal to the actual count.
func (*CountMinSketch) Increment ¶
func (cm *CountMinSketch) Increment(value AsBytes)
Increment is equivalent to adding a single occurrence of the element represented by value.
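A minimal usage sketch, again as an example function assumed to live alongside the package and reusing the hypothetical `stringValue` type:

func ExampleCountMinSketch() {
    // 2048 columns, 5 rows: wider and deeper tables give tighter estimates.
    cm := NewCountMinSketch(2048, 5)

    cm.Increment(stringValue("page:/home"))
    cm.Add(stringValue("page:/home"), 4)

    // Prints an estimate that is at least the true count of 5.
    fmt.Println(cm.Count(stringValue("page:/home")))
}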
type HashFunc ¶
HashFunc is any type that can transform an `AsBytes` value into a uint64.
func MurmurFrom ¶
MurmurFrom constructs a HashFunc from a murmur3 hash function and a multiplication factor. This is a trick that can be used to create multiple hash functions from the same base one.
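A sketch of supplying user-defined hash functions to NewBloomFilterWithHashes. It assumes HashFunc is declared as `func(AsBytes) uint64` (the exact declaration is not shown above), reuses the hypothetical `stringValue` type, imports the standard library's `hash/fnv` in place of murmur3, and mixes in an odd multiplier in the spirit of MurmurFrom:

// fnvHashFunc builds a hash function from FNV-1a and an odd multiplier,
// assuming HashFunc has the shape func(AsBytes) uint64.
func fnvHashFunc(factor uint64) HashFunc {
    return func(value AsBytes) uint64 {
        h := fnv.New64a()
        h.Write(value.Bytes())
        return h.Sum64() * factor
    }
}

func ExampleNewBloomFilterWithHashes() {
    bf, err := NewBloomFilterWithHashes(1024, []HashFunc{fnvHashFunc(1), fnvHashFunc(3)})
    if err != nil {
        panic(err)
    }
    bf.Add(stringValue("alice"))
    fmt.Println(bf.Has(stringValue("alice"))) // true
}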
type HyperLogLog ¶
type HyperLogLog struct {
    // contains filtered or unexported fields
}
HyperLogLog is a data structure for estimating cardinality with an accuracy of `1 - 1.04 / sqrt(m)`, with m defined as `2^b`. The reference is this paper: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf That being said, the accuracy of the data structure improves as the value of b grows.
func NewHyperLogLog ¶
func NewHyperLogLog(b uint) (*HyperLogLog, error)
NewHyperLogLog creates a new HLL based on b, the parameter that determines the number of counters (`2^b`).
func (*HyperLogLog) Add ¶
func (hll *HyperLogLog) Add(value AsBytes)
Add adds the given observable (value) to the HLL instance.
func (*HyperLogLog) Cardinality ¶
func (hll *HyperLogLog) Cardinality() uint64
Cardinality returns the current estimate of the number of distinct values added to the HLL.
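A minimal usage sketch, as an example function assumed to live alongside the package and reusing the hypothetical `stringValue` type:

func ExampleHyperLogLog() {
    // b = 10 gives m = 2^10 = 1024 counters,
    // i.e. an error of roughly 1.04/sqrt(1024), about 3.25%.
    hll, err := NewHyperLogLog(10)
    if err != nil {
        panic(err)
    }
    for i := 0; i < 100000; i++ {
        hll.Add(stringValue(fmt.Sprintf("user-%d", i)))
    }
    // Prints an estimate close to 100000.
    fmt.Println(hll.Cardinality())
}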