cml

package module
v0.0.0-...-2e7f2fb Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 27, 2015 License: MIT Imports: 6 Imported by: 0

README

Count-Min-Log

GoDoc

Count-Min-Log sketch: Approximately counting with approximate counters - Guillaume Pitel & Geoffroy Fouquier

TL;DR: Count-Min-Log Sketch for improved Average Relative Error on low frequency events

Count-Min Sketch is a widely adopted algorithm for approximate event counting in large scale processing. However, the original version of the Count-Min-Sketch (CMS) suffers of some deficiences, especially if one is interested in the low-frequency items, such as in text- mining related tasks. Several variants of CMS have been proposed to compensate for the high relative error for low-frequency events, but the proposed solutions tend to correct the errors instead of preventing them. In this paper, we propose the Count-Min-Log sketch, which uses logarithm-based, approximate counters instead of linear counters to improve the average relative error of CMS at constant memory footprint.

Example Usage

8-bit version

import cml

...

sk, err := cml.NewDefaultSketch8()
sk.IncreaseCount([]byte("scott pilgrim"))
...

sk.GetCount("scott pilgrim")

// >> 1

16-bit version

import cml

...

sk, err := cml.NewDefaultSketch16()
sk.IncreaseCount([]byte("scott pilgrim"))
...

sk.GetCount("scott pilgrim")

// >> 1

Documentation

Overview

Package cml implments the Count-Min-Log datastructure.

It is based on Count-Min-Log sketch: Approximately counting with approximate counters - Guillaume Pitel & Geoffroy Fouquier

http://iswag-symposium.org/2015/pdfs/shortpaper1.pdf

TL;DR: Count-Min-Log Sketch for improved Average Relative Error on low frequency events

Count-Min Sketch is a widely adopted algorithm for approximate event counting in large scale processing. However, the original version of the Count-Min-Sketch (CMS) suffers of some deficiences, especially if one is interested in the low-frequency items, such as in text- mining related tasks. Several variants of CMS have been proposed to compensate for the high relative error for low-frequency events, but the proposed solutions tend to correct the errors instead of preventing them. In this paper, we propose the Count-Min-Log sketch, which uses logarithm-based, approximate counters instead of linear counters to improve the average relative error of CMS at constant memory footprint.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Sketch

type Sketch struct {
	// contains filtered or unexported fields
}

Sketch is a Count-Min-Log sketch 16-bit registers

func NewDefaultSketch

func NewDefaultSketch() (*Sketch, error)

NewDefaultSketch returns a new Count-Min-Log sketch with 16-bit registers and default settings

func NewForCapacity16

func NewForCapacity16(capacity uint64, e float64) (*Sketch, error)

NewForCapacity16 returns a new Count-Min-Log sketch with 16-bit registers optimized for a given max capacity and expected error rate

func NewSketch

func NewSketch(w uint, k uint, conservative bool, exp float64,
	maxSample bool, progressive bool, nBits uint) (*Sketch, error)

NewSketch returns a new Count-Min-Log sketch with 16-bit registers

func NewSketchForEpsilonDelta

func NewSketchForEpsilonDelta(epsilon, delta float64) (*Sketch, error)

NewSketchForEpsilonDelta ...

func Unmarshal16

func Unmarshal16(b []byte) (*Sketch, error)

Unmarshal16 returns a Sketch from an serialized byte array

func (*Sketch) Frequency

func (sk *Sketch) Frequency(s []byte) float64

Frequency returns the count of `s`

func (*Sketch) GetFillRate

func (sk *Sketch) GetFillRate() float64

GetFillRate ...

func (*Sketch) IncreaseCount

func (sk *Sketch) IncreaseCount(s []byte) bool

IncreaseCount increases the count of `s` by one, return true if added and the current count of `s`

func (*Sketch) Marshal

func (sk *Sketch) Marshal() ([]byte, error)

Marshal returns a serialized byte array representing the structure

func (*Sketch) Probability

func (sk *Sketch) Probability(s []byte) float64

Probability returns the error probability of `s`

func (*Sketch) Reset

func (sk *Sketch) Reset()

Reset the Sketch to a fresh state (all counters set to 0)

func (*Sketch) TotalCount

func (sk *Sketch) TotalCount() uint

TotalCount returns total count of samples

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL