countish

package
v0.1.10 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 14, 2022 License: AGPL-3.0 Imports: 2 Imported by: 0

README

* Approximate frequency counts over data streams for Go

Countish implements two approximate counting algorithms  outlined in "Approximate Frequency Counts over Data Streams".

http://www.vldb.org/conf/2002/S10P03.pdf


** Use cases

Have you ever needed to do something like calculate the top
URLs or top ips from an infinite stream? This package provides probabalistic
frequency counters, with accuracy guarantees and low memory usage.

countish provides an extremely simple interface consisting of an "Observe" method and
a "ItemsAboveThreshold" method.



Example:

#+BEGIN_SRC bash :exports both
cat urls.txt | go run ./cmd/countish/main.go -threshold .3
#+END_SRC

#+RESULTS:
: 0.428671 /

3 counting implementations are provided.

1) Naive: exact counts are held in a map
2) Lossy: corresponding to "lossy counting"
3) Sticky: corresponding to "sticky sampling"

** Example:

#+BEGIN_SRC go :imports '("github.com/shanemhansen/countish" "fmt") :exports both
  counter := countish.NewLossyCounter(.01, .01)
  for i:=0;i<100;i++ {
      counter.Observe("value")
  }
  counter.Observe("another value")
  // print all items which *might* occur more than 90% of the time,
  // along with their estimated frequency
  entries := counter.ItemsAboveThreshold(.9)
  fmt.Println(entries)
#+END_SRC

#+RESULTS:
: [{value 1.00009900990099}]

** TODO examples showing memory usage comparisons


Documentation

Index

Constants

This section is empty.

Variables

View Source
var Rand = rand.Float64
View Source
var RandCoin = rand.Int31n

Functions

This section is empty.

Types

type Counter

type Counter interface {
	Observe(string)
	ItemsAboveThreshold(float64) []Entry
}

type Entry

type Entry struct {
	Key       string
	Frequency float64
}

type FDeltaPair

type FDeltaPair struct {
	F     float64
	Delta float64
}

type LossyCounter

type LossyCounter struct {
	Support        float64
	ErrorTolerance float64
	D              map[string]FDeltaPair
	N              uint64
	BucketWidth    uint64
}

func NewLossyCounter

func NewLossyCounter(support, errorTolerance float64) *LossyCounter

func (*LossyCounter) ItemsAboveThreshold

func (lc *LossyCounter) ItemsAboveThreshold(threshold float64) []Entry

ItemsAboveThreshold returns a list of items that occur more than threshold, along with their frequencies. threshold is in the range [0,1]

func (*LossyCounter) Observe

func (lc *LossyCounter) Observe(key string)

Observe records a new sample

type NaiveSampler

type NaiveSampler struct {
	N uint64
	// contains filtered or unexported fields
}

func NewNaiveSampler

func NewNaiveSampler() *NaiveSampler

func (*NaiveSampler) ItemsAboveThreshold

func (ns *NaiveSampler) ItemsAboveThreshold(val float64) []Entry

func (*NaiveSampler) Observe

func (ns *NaiveSampler) Observe(key string)

type StickySampler

type StickySampler struct {
	ErrorTolerance  float64
	Support         float64
	S               map[string]float64
	R               float64
	FailureProb     float64
	N               float64
	T               float64
	RequiredSamples int
}

func NewSampler

func NewSampler(Support, ErrorTolerance, FailureProb float64) *StickySampler

func (*StickySampler) ItemsAboveThreshold

func (s *StickySampler) ItemsAboveThreshold(threshold float64) []Entry

ItemsAboveThreshold returns a list of items that occur more than threshold, along with their frequencies. threshold is in the range [0,1]

func (*StickySampler) Observe

func (s *StickySampler) Observe(key string)

Observe records a new sample

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL