quantile

package
v0.0.0-...-65ad924 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 8, 2017 License: Apache-2.0, BSD-2-Clause Imports: 2 Imported by: 0

README

Streaming Quantile Estimator

Implements ideas found in Effective Computation of Biased Quantiles over Data Streams (Cormode, Korn, Muthukrishnan, Srivastava) to provide a space and time efficient estimator for streaming quantile estimation.

Build Status

Documentation

Overview

Package quantile implements a streaming quantile estimator. The implementation is based on "Effective Computation of Biased Quantiles over Data Streams" (Cormode, Korn, Muthukrishnan, Srivastava) to provide a space and time efficient estimator for online quantile estimation.

For the normal distribution of 10^9 elements, a tolerance for 0.99th percentile at 0.001 uses under 1000 bins at 32 bytes per bin.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Estimate

type Estimate interface {
	// Delta calculates the acceptable difference in ranks between two values.
	// It is used to remove redundant values during compression.
	Delta(rank, observations float64) float64
}

func Known

func Known(quantile, tolerance float64) Estimate

Known produces a optimal space usage for estimations at the given quantile and error tolerance.

Quantiles not known ahead of time can also be queried, but at a lower accuracy.

func Unknown

func Unknown(tolerance float64) Estimate

Unknown produces estimations for all possible quantiles at this error tolerance. It uses significantly more space and time than when you know the quantiles you wish to estimate.

The Known estimation should be used when you know which quantiles you will be querying.

type Estimator

type Estimator struct {
	// contains filtered or unexported fields
}

func New

func New(invariants ...Estimate) *Estimator

New allocates a new estimator tolerating the minimum of the invariants provided.

When you know how much error you can tolerate in the quantiles you will query, use a Known estimation for each quantile you will query. For example:

quantile.New(quantile.Known(0.50, 0.01), quantile.Known(0.95, 0.001), quantile.Known(0.99, 0.0005))

When you will query for multiple different quantiles, and know the error tolerance, use the Bias invariant. For example:

quantile.New(quantile.Unknown(0.01))

Targeted estimators consume significantly less resources than Biased estimators.

Passing no parameters will create an estimator that has a tolerance of 0.1, equivalent to:

quantile.New(quantile.Unknown(0.1))

Estimators are not safe to use from multiple goroutines.

func (*Estimator) Add

func (est *Estimator) Add(value float64)

Add buffers a new sample, committing and compressing the data structure when the buffer is full.

func (*Estimator) Get

func (est *Estimator) Get(quantile float64) float64

Get finds a value within (quantile - tolerance) * n <= value <= (quantile + tolerance) * n or 0 if no values have been observed.

func (*Estimator) Samples

func (est *Estimator) Samples() int

Samples returns the number of values this estimator has sampled.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL