tdigest

package
v4.0.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 4, 2021 License: Apache-2.0, MIT Imports: 7 Imported by: 0

README

T-Digest

A map-reduce and parallel streaming friendly data-structure for accurate quantile approximation.

This package provides a very crude implementation of Ted Dunning's t-digest data structure in Go.

Build Status GoDoc Coverage Go Report Card

Installation

go get github.com/caio/go-tdigest

Usage

package main

import (
        "fmt"
        "math/rand"

        "github.com/caio/go-tdigest"
)

func main() {
        var t = tdigest.New(100)

        for i := 0; i < 10000; i++ {
                t.Add(rand.Float64(), 1)
        }

        fmt.Printf("p(.5) = %.6f\n", t.Quantile(0.5))
}

Disclaimer

I've written this solely with the purpose of understanding how the data-structure works, it hasn't been throughly verified nor battle tested in a production environment.

References

This is a very simple port of the reference implementation with some ideas borrowed from the python version. If you wanna get a quick grasp of how it works and why it's useful, this video and companion article is pretty helpful.

Documentation

Overview

Package tdigest provides a highly accurate mergeable data-structure for quantile estimation.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type TDigest

type TDigest struct {
	// contains filtered or unexported fields
}

TDigest is a quantile approximation data structure. Typical T-Digest use cases involve accumulating metrics on several distinct nodes of a cluster and then merging them together to get a system-wide quantile overview. Things such as: sensory data from IoT devices, quantiles over enormous document datasets (think ElasticSearch), performance metrics for distributed systems, etc.

func FromBytes

func FromBytes(buf *bytes.Reader) (*TDigest, error)

FromBytes reads a byte buffer with a serialized digest (from AsBytes) and deserializes it.

func New

func New(compression float64) *TDigest

New creates a new digest. The compression parameter rules the threshold in which samples are merged together - the more often distinct samples are merged the more precision is lost. Compression should be tuned according to your data distribution, but a value of 100 is often good enough. A higher compression value means holding more centroids in memory (thus: better precision), which means a bigger serialization payload and higher memory footprint. Compression must be a value greater of equal to 1, will panic otherwise.

func (*TDigest) Add

func (t *TDigest) Add(value float64, count uint32) error

Add registers a new sample in the digest. It's the main entry point for the digest and very likely the only method to be used for collecting samples. The count parameter is for when you are registering a sample that occurred multiple times - the most common value for this is 1.

func (TDigest) AsBytes

func (t TDigest) AsBytes() ([]byte, error)

AsBytes serializes the digest into a byte array so it can be saved to disk or sent over the wire.

func (*TDigest) Compress

func (t *TDigest) Compress()

Compress tries to reduce the number of individual centroids stored in the digest. Compression trades off accuracy for performance and happens automatically after a certain amount of distinct samples have been stored.

func (*TDigest) ForEachCentroid

func (t *TDigest) ForEachCentroid(f func(mean float64, count uint32) bool)

ForEachCentroid calls the specified function for each centroid. Iteration stops when the supplied function returns false, or when all centroids have been iterated.

func (*TDigest) Len

func (t *TDigest) Len() int

Len returns the number of centroids in the TDigest.

func (*TDigest) Merge

func (t *TDigest) Merge(other *TDigest)

Merge joins a given digest into itself. Merging is useful when you have multiple TDigest instances running in separate threads and you want to compute quantiles over all the samples. This is particularly important on a scatter-gather/map-reduce scenario.

func (*TDigest) Quantile

func (t *TDigest) Quantile(q float64) float64

Quantile returns the desired percentile estimation. Values of p must be between 0 and 1 (inclusive), will panic otherwise.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL