quantile

package
v0.4.0-changelog Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 25, 2016 License: MIT Imports: 2 Imported by: 0

Documentation

Overview

Package quantile computes approximate quantiles over an unbounded data stream within low memory and CPU bounds.

A small amount of accuracy is traded to achieve the above properties.

Multiple streams can be merged before calling Query to generate a single set of results. This is meaningful when the streams represent the same type of data. See Merge and Samples.

For more detailed information about the algorithm used, see:

Effective Computation of Biased Quantiles over Data Streams

http://www.cs.rutgers.edu/~muthu/bquant.pdf

Example (MergeMultipleStreams)
package main

import (
	"fmt"

	"github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/beorn7/perks/quantile"
)

func main() {
	// Scenario:
	// We have multiple database shards. On each shard, there is a process
	// collecting query response times from the database logs and inserting
	// them into a Stream (created via NewTargeted(0.90)), much like the
	// Simple example. These processes expose a network interface for us to
	// ask them to serialize and send us the results of their
	// Stream.Samples so we may Merge and Query them.
	//
	// NOTES:
	// * These sample sets are small, allowing us to get them
	// across the network much faster than sending the entire list of data
	// points.
	//
	// * For this to work correctly, we must supply the same quantiles
	// a priori the process collecting the samples supplied to NewTargeted,
	// even if we do not plan to query them all here.
	ch := make(chan quantile.Samples)
	getDBQuerySamples(ch)
	q := quantile.NewTargeted(map[float64]float64{0.90: 0.001})
	for samples := range ch {
		q.Merge(samples)
	}
	fmt.Println("perc90:", q.Query(0.90))
}

// This is a stub for the above example. In reality this would hit the remote
// servers via http or something like it.
func getDBQuerySamples(ch chan quantile.Samples) {}
Output:

Example (Simple)
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"

	"github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/beorn7/perks/quantile"
)

func main() {
	ch := make(chan float64)
	go sendFloats(ch)

	// Compute the 50th, 90th, and 99th percentile.
	q := quantile.NewTargeted(map[float64]float64{
		0.50: 0.005,
		0.90: 0.001,
		0.99: 0.0001,
	})
	for v := range ch {
		q.Insert(v)
	}

	fmt.Println("perc50:", q.Query(0.50))
	fmt.Println("perc90:", q.Query(0.90))
	fmt.Println("perc99:", q.Query(0.99))
	fmt.Println("count:", q.Count())
}

func sendFloats(ch chan<- float64) {
	f, err := os.Open("exampledata.txt")
	if err != nil {
		log.Fatal(err)
	}
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		b := sc.Bytes()
		v, err := strconv.ParseFloat(string(b), 64)
		if err != nil {
			log.Fatal(err)
		}
		ch <- v
	}
	if sc.Err() != nil {
		log.Fatal(sc.Err())
	}
	close(ch)
}
Output:

perc50: 5
perc90: 16
perc99: 223
count: 2388
Example (Window)
package main

import (
	"time"

	"github.com/ipfs/go-ipfs/Godeps/_workspace/src/github.com/beorn7/perks/quantile"
)

func main() {
	// Scenario: We want the 90th, 95th, and 99th percentiles for each
	// minute.

	ch := make(chan float64)
	go sendStreamValues(ch)

	tick := time.NewTicker(1 * time.Minute)
	q := quantile.NewTargeted(map[float64]float64{
		0.90: 0.001,
		0.95: 0.0005,
		0.99: 0.0001,
	})
	for {
		select {
		case t := <-tick.C:
			flushToDB(t, q.Samples())
			q.Reset()
		case v := <-ch:
			q.Insert(v)
		}
	}
}

func sendStreamValues(ch chan float64) {

}

func flushToDB(t time.Time, samples quantile.Samples) {

}
Output:

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Sample

type Sample struct {
	Value float64 `json:",string"`
	Width float64 `json:",string"`
	Delta float64 `json:",string"`
}

Sample holds an observed value and meta information for compression. JSON tags have been added for convenience.

type Samples

type Samples []Sample

Samples represents a slice of samples. It implements sort.Interface.

func (Samples) Len

func (a Samples) Len() int

func (Samples) Less

func (a Samples) Less(i, j int) bool

func (Samples) Swap

func (a Samples) Swap(i, j int)

type Stream

type Stream struct {
	// contains filtered or unexported fields
}

Stream computes quantiles for a stream of float64s. It is not thread-safe by design. Take care when using across multiple goroutines.

func NewHighBiased

func NewHighBiased(epsilon float64) *Stream

NewHighBiased returns an initialized Stream for high-biased quantiles (e.g. 0.01, 0.1, 0.5) where the needed quantiles are not known a priori, but error guarantees can still be given even for the higher ranks of the data distribution.

The provided epsilon is a relative error, i.e. the true quantile of a value returned by a query is guaranteed to be within 1-(1±Epsilon)*(1-Quantile).

See http://www.cs.rutgers.edu/~muthu/bquant.pdf for time, space, and error properties.

func NewLowBiased

func NewLowBiased(epsilon float64) *Stream

NewLowBiased returns an initialized Stream for low-biased quantiles (e.g. 0.01, 0.1, 0.5) where the needed quantiles are not known a priori, but error guarantees can still be given even for the lower ranks of the data distribution.

The provided epsilon is a relative error, i.e. the true quantile of a value returned by a query is guaranteed to be within (1±Epsilon)*Quantile.

See http://www.cs.rutgers.edu/~muthu/bquant.pdf for time, space, and error properties.

func NewTargeted

func NewTargeted(targets map[float64]float64) *Stream

NewTargeted returns an initialized Stream concerned with a particular set of quantile values that are supplied a priori. Knowing these a priori reduces space and computation time. The targets map maps the desired quantiles to their absolute errors, i.e. the true quantile of a value returned by a query is guaranteed to be within (Quantile±Epsilon).

See http://www.cs.rutgers.edu/~muthu/bquant.pdf for time, space, and error properties.

func (*Stream) Count

func (s *Stream) Count() int

Count returns the total number of samples observed in the stream since initialization.

func (*Stream) Insert

func (s *Stream) Insert(v float64)

Insert inserts v into the stream.

func (*Stream) Merge

func (s *Stream) Merge(samples Samples)

Merge merges samples into the underlying streams samples. This is handy when merging multiple streams from separate threads, database shards, etc.

ATTENTION: This method is broken and does not yield correct results. The underlying algorithm is not capable of merging streams correctly.

func (*Stream) Query

func (s *Stream) Query(q float64) float64

Query returns the computed qth percentiles value. If s was created with NewTargeted, and q is not in the set of quantiles provided a priori, Query will return an unspecified result.

func (*Stream) Reset

func (s *Stream) Reset()

Reset reinitializes and clears the list reusing the samples buffer memory.

func (*Stream) Samples

func (s *Stream) Samples() Samples

Samples returns stream samples held by s.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL