Documentation ¶
Overview ¶
Package sketchy provides count- and rate-tracking sketches. These are probabilistic data structures that can track the occurrences of a very large number of distinct keys using a relatively small amount of space. The resulting counts/rates are estimates with a guaranteed level of accuracy.
Index ¶
Constants ¶
This section is empty.
Variables ¶
var ( DefaultEpsilon = 0.999 DefaultDelta = 0.99 )
Functions ¶
This section is empty.
Types ¶
type CountSketch ¶
type CountSketch interface { // Count adds delta to the count of occurrences of the given key. // Returns the updated estimated count. Count(key []byte, delta int) uint64 // Query returns the estimated count of the given key. Query(key []byte) uint64 }
A Sketch counts occurrences of keys and returns approximate total counts.
func NewSketch ¶
func NewSketch(epsilon, delta float64) CountSketch
NewSketch returns a new, empty count-min sketch with the given parameters. The values of epsilon and delta determine the desired accuracy of the sketch. Queries for the count of observations by a particular key by this sketch will be within a factor of epsilon of the true count, with probability delta.
The closer these parameters are to 1, the greater the storage and computation cost. The value of delta determines how many hashes we must compute for each key (and how many counters we must inspect to answer a count query). The bucket uses ceil(log(1 / (1-delta))) hashes. The value of epsilon determines the size of the domain we map these hashes to. The size of the domain is e / (1-epsilon). So, for epsilon=0.999, delta=0.99, we would store 2719 counters for each of five hash values.
type RateSketch ¶
type RateSketch interface { // Count records delta occurrences of key, returning the updated observed // rate over the given interval. If interval is smaller than time.Second, // or the available data covers less than a second, then 0 is returned. Count(key []byte, delta int, interval time.Duration) float64 // Query returns the observed rate of the given key over the given interval. // If interval is smaller than time.Second, or the available data covers // less than a second, then 0 is returned. Query(key []byte, interval time.Duration) float64 }
Counter provides an interface for tracking the rate at which keys are observed.
func RollingCounter ¶
func RollingCounter(epsilon, delta float64, interval time.Duration, num int) RateSketch
RollingCounter maintains a series of count-min sketches to count events in time-based buckets. Counts are always applied to the "current" bucket (which is reinitialized as needed). Rate queries can use multiple buckets to account for the interval of the query.
The interval given to this constructor specifies the maximum duration of each bucket. If an event comes in beyond the current bucket's duration, then a new bucket is created. If the maximum number of buckets (given by num) is exceeded, then the oldest bucket is forgotten.
func RollupCounter ¶
func RollupCounter(epsilon, delta float64, durations ...time.Duration) RateSketch