tdigest

package module
v0.0.0-...-9165f38 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 22, 2019 License: Apache-2.0 Imports: 6 Imported by: 0

README

tdigest

Documentation

This is a concurrent go implementation of the t-digest data structure for streaming quantile estimation. The code implements the zero-allocation, merging algorithm from Ted Dunning's paper (here).

It utilizes the later iteration of the scale functions, currently just exposing scale function k_2 from the paper.

The implementation strives to make concurrent writes cheap. In the common case a write needs only increment an atomic and write two floats to a buffer. Occasionlly, when the buffer fills, a caller will have to perform the merge operation.

TODOs

  • Provide encoding functionality
  • Benchmark against HDR Histogram
  • Evaluate accuracy against HDR Histogram at reasonable settings
  • Implement more scale functions
    • k_3
    • k_0 (uniform weight)
  • Describe use cases, comparisons, trade-offs
  • Implement trimmed mean

Documentation

Overview

Package tdigest provides a concurrent, streaming quantiles estimation data structure for float64 data.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AddToer

type AddToer interface {
	AddTo(Recorder)
}

AddToer allows a sketch to be added to another.

type CompressionSizer

type CompressionSizer interface {
	// CompressionSize is the maximum number of centroids which a TDigest will
	// store when fully compressed. If the TDigest is using the weightLimit
	// heuristic then this is a target, not an upper bound.
	CompressionSize() int
}

CompressionSizer is an interface to expose the Compression factor for a tdigest.

type Concurrent

type Concurrent struct {
	// contains filtered or unexported fields
}

Concurrent approximates a distribution of floating point numbers. All methods are safe to be called concurrently.

Design

The data structure is designed to maintain most of its state in a single slice The total in-memory size of a Concurrent is

(1+BufferFactor)*(int(Compression)+1)

The data structure does not allocates memory after its construction.

func NewConcurrent

func NewConcurrent(options ...Option) *Concurrent

NewConcurrent creates a new Concurrent.

func (*Concurrent) Add

func (td *Concurrent) Add(mean, count float64)

Add adds the provided data to the Concurrent.

func (*Concurrent) AddTo

func (td *Concurrent) AddTo(into Recorder)

AddTo adds the currently recorded data into the provided Recorder.

func (*Concurrent) Clear

func (td *Concurrent) Clear()

Clear resets the data structure, clearing all recorded data.

func (*Concurrent) CompressionSize

func (td *Concurrent) CompressionSize() int

CompressionSize is the maximum number of centroids which a TDigest will store when fully compressed. If the TDigest is using the weightLimit heuristic then this is a target, not an upper bound.

func (*Concurrent) Decay

func (td *Concurrent) Decay(factor float64)

Decay decreases the weight of all tracked centroids by factor.

func (*Concurrent) InnerMean

func (td *Concurrent) InnerMean(inner float64) (c float64)

InnerMean returns the mean of the inner quantile range. It requires flushing the buffer then is an O(n) operation on the number of centroids.

func (*Concurrent) QuantileOf

func (td *Concurrent) QuantileOf(v float64) (q float64)

QuantileOf returns the estimated quantile at which this value falls in the distribution. If the v is smaller than any recorded value 0.0 will be returned and if v is larger than any recorded value 1.0 will be returned. An empty Concurrent will return 0.0 for all values.

func (*Concurrent) Read

func (td *Concurrent) Read(f func(d Reader))

Read enables clients to perform a number of read operations on a snapshot of the data.

func (*Concurrent) ReadStale

func (td *Concurrent) ReadStale(f func(d Reader))

ReadStale enables clients to perform a number of read operations on a snapshot of the data without forcing a compression of buffered data.

func (*Concurrent) Record

func (td *Concurrent) Record(mean float64)

Record records adds a value with a count of 1.

func (*Concurrent) String

func (td *Concurrent) String() (s string)

func (*Concurrent) TotalCount

func (td *Concurrent) TotalCount() (c float64)

TotalCount returns the total count that has been added to the Concurrent.

func (*Concurrent) TotalSum

func (td *Concurrent) TotalSum() (s float64)

TotalSum returns the approximation of the weighted sum of all values recorded in to the sketch.

func (*Concurrent) TrimmedMean

func (td *Concurrent) TrimmedMean(lo, hi float64) (c float64)

TrimmedMean returns the mean of the inner quantile range from lo to hi. It requires flushing the buffer then is an O(n) operation.

func (*Concurrent) ValueAt

func (td *Concurrent) ValueAt(q float64) (v float64)

ValueAt returns the value of the quantile q. If q is not in [0, 1], ValueAt will panic. An empty Concurrent will return 0.

type Option

type Option interface {
	// contains filtered or unexported methods
}

Option configures a TDigest.

func BufferFactor

func BufferFactor(factor int) Option

BufferFactor configures the size of the buffer for uncompressed data. The default value is 5.

func Compression

func Compression(compression float64) Option

Compression controls the maximum number of centroids.

func UseWeightLimit

func UseWeightLimit(useWeightLimit bool) Option

UseWeightLimit enables the weightLimit compression heuristic. It is cheaper to compute but does not provide a strict upper bound on the total compressed size of the TDigest.

type Reader

type Reader interface {
	InnerMean(innerQ float64) float64
	TrimmedMean(lo, hi float64) float64
	TotalCount() float64
	TotalSum() float64
	ValueAt(q float64) (v float64)
	QuantileOf(v float64) (q float64)
}

Reader provides read access to a float64 valued distribution by quantile or by value.

type Recorder

type Recorder interface {
	Add(mean, count float64)
}

Recorder is the write interface to a Sketch.

type Sketch

type Sketch interface {
	Reader
	Recorder
}

Sketch is an

type TDigest

type TDigest struct {
	// contains filtered or unexported fields
}

func New

func New(options ...Option) *TDigest

New creates a new Concurrent.

func (*TDigest) Add

func (td *TDigest) Add(mean, count float64)

Add adds data to the TDigest with the provided mean and count.

func (*TDigest) AddTo

func (td *TDigest) AddTo(into Recorder)

AddTo adds the data from td into the provided Recorder.

func (*TDigest) Clear

func (td *TDigest) Clear()

func (*TDigest) CompressionSize

func (td *TDigest) CompressionSize() int

CompressionSize is the maximum number of centroids which a TDigest will store when fully compressed.

func (*TDigest) InnerMean

func (td *TDigest) InnerMean(inner float64) (c float64)

InnerMean returns the mean of the inner quantile range. It requires flushing the buffer then is an O(n) operation.

func (*TDigest) QuantileOf

func (td *TDigest) QuantileOf(v float64) (q float64)

QuantileOf returns the estimated quantile at which this value falls in the distribution. If the v is smaller than any recorded value 0.0 will be returned and if v is larger than any recorded value 1.0 will be returned. An empty Concurrent will return 0.0 for all values.

func (*TDigest) Record

func (td *TDigest) Record(mean float64)

Record is a shorthand for td.Add(mean, 1).

func (*TDigest) String

func (td *TDigest) String() string

func (*TDigest) TotalCount

func (td *TDigest) TotalCount() (c float64)

TotalCount returns the total amount of count which has been added to td. It requires flushing the buffer then is an O(1) operation.

func (*TDigest) TotalSum

func (td *TDigest) TotalSum() float64

TotalSum returns the total amount of data added to the TDigest weighted by its associated count.

func (*TDigest) TrimmedMean

func (td *TDigest) TrimmedMean(lo, hi float64) (c float64)

TrimmedMean returns the mean of values which lie in the quantile range between lo and hi. It requires flushing the buffer then is an O(n) operation.

func (*TDigest) ValueAt

func (td *TDigest) ValueAt(q float64) (v float64)

Directories

Path Synopsis
internal
scale
package scale provides functionality to control the scaling of tdigests
package scale provides functionality to control the scaling of tdigests
tdigest
Package tdigest contains the low-level algorithms to compress and query lists of centroids.
Package tdigest contains the low-level algorithms to compress and query lists of centroids.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL