sbitmap

package module
v0.0.0-...-ed102ce Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 1, 2015 License: MIT Imports: 3 Imported by: 0

README

s-bitmap

S-Bitmap: Distinct Counting with a Self-Learning Bitmap

The S-bitmap is a bitmap obtained via a novel adaptive sampling process, where the bits corresponding to the sampled items are set to 1, and the sampling rates are learned from the number of distinct items already passed and reduced sequentially as more bits are set to 1. A unique property of S-bitmap is that its relative estimation error is truly stabilized, i.e. invariant to unknown cardinalities in a prescribed range. This paper demonstrates through both theoretical and empirical studies that with a given memory requirement, S-bitmap is not only uniformly reliable but more accurate than state-of-the-art algorithms such as the multiresolution bitmap and HyperLogLog algorithms (not HyperLogLog++) under common practice settings.

For details about the algorithm and citations please use this article for now: "Distinct Counting with a Self-Learning Bitmap" by Aiyou Chen and Jin Cao

Note:

A portion of this code has been translated from the following implementation

##Example usage:


import "github.com/seiflotfy/s-bitmap"

sb := sbitmap.NewDefault()

// Add 1000 different strings
for i:=0; i<1000; i++ {
	sb.Update([]byte{"foobar" + strconv.Itoa(i)})
}


count := sb.Estimate()
// count ~= 1000 (+/- 0.8%)

##Todo:

  • Add Merge functionality
  • Add Serialization

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Sbitmap

type Sbitmap struct {
	V *bitset.BitSet
	B uint64
	L uint64
	// contains filtered or unexported fields
}

Sbitmap represents a Self learning Bitmap structure

func New

func New(nmax uint64, err float64) *Sbitmap

New returns a Sbitmap with for the max cardinality nmax with an errorRate err

func NewDefault

func NewDefault() *Sbitmap

NewDefault returns a Sbitmap with for the max cardinality nmax math.MaxUint64 with an errorRate of 0.008

func (*Sbitmap) Contains

func (s *Sbitmap) Contains(item []byte) bool

Contains returns membership of item in Sbitmap (can result in false positive)

func (*Sbitmap) Estimate

func (s *Sbitmap) Estimate() float64

Estimate returns the estimated cardinality of the Sbitmap

func (*Sbitmap) Update

func (s *Sbitmap) Update(item []byte)

Update adds item to the Sbitmap

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL