Documentation ¶
Overview ¶
Package topk defines a count-min sketch and a cheap probabilistic top-K data structure that uses the count-min sketch to track the top K items in constant memory and O(log(k)) time.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func PickParams ¶
PickParams provides good parameters for 'hashes' and 'buckets' when constructing a CountMinSketch, given an estimated total number of counts (i.e. the sum of all counts ever stored), the error factor ϵ as a float (e.g. 1% is 0.001), and the probability factor δ.
Parameters are chosen such that with a probability of 1−δ, the error is at most ϵ∗totalCount. Or, in other words: if N is the true count of an event, E is the estimate given by a sketch and T the total count of items in the sketch, E ≤ N + T*ϵ with probability (1 - δ).
Types ¶
type CountMinSketch ¶
type CountMinSketch struct {
// contains filtered or unexported fields
}
CountMinSketch implements a count-min sketch, a probabilistic data structure that tracks the frequency of events in a stream of data.
See: https://en.wikipedia.org/wiki/Count%E2%80%93min_sketch
func NewCountMinSketch ¶
func NewCountMinSketch(hashes, buckets int) *CountMinSketch
NewCountMinSketch creates a new CountMinSketch with the provided number of hashes and buckets. Hashes and buckets are often called "depth" and "width", or "d" and "w", respectively.
func (*CountMinSketch) Add ¶
func (cms *CountMinSketch) Add(val []byte) uint64
Add calls AddN(val, 1).
func (*CountMinSketch) AddN ¶
func (cms *CountMinSketch) AddN(val []byte, count uint64) uint64
AddN increments the count for the given value by the provided count, returning the new count.
func (*CountMinSketch) Get ¶
func (cms *CountMinSketch) Get(val []byte) uint64
Get returns the count for the provided value.
type SerializeFunc ¶
HashFunc is responsible for providing a []byte serialization of a value, appended to the provided byte slice. This is used for hashing the value when adding to a CountMinSketch.
type TopK ¶
type TopK[T any] struct { // contains filtered or unexported fields }
TopK is a probabilistic counter of the top K items, using a count-min sketch to keep track of item counts and a heap to track the top K of them.
func New ¶
func New[T any](k int, sf SerializeFunc[T]) *TopK[T]
New creates a new TopK that stores k values. Parameters for the underlying count-min sketch are chosen for a 0.1% error rate and a 0.1% probability of error.
func NewWithParams ¶
func NewWithParams[T any](k int, sf SerializeFunc[T], numHashes, numCols int) *TopK[T]
NewWithParams creates a new TopK that stores k values, and additionally allows customizing the parameters for the underlying count-min sketch.
func (*TopK[T]) AddN ¶
AddN adds the given item to the set with the provided count, returning the new estimated count.