approx

package module
v0.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 27, 2024 License: MIT Imports: 8 Imported by: 0

README

��

kelindar/approx
Go Version PkgGoDev Go Report Card License Coverage

Probabilistic Data Structures

This Go package provides several data structures for approximate counting and frequency estimation, leveraging Morris's algorithm and other techniques for efficient, probabilistic computations. It is suitable for applications where exact counts are unnecessary or where memory efficiency is paramount. The package includes implementations for 8-bit and 16-bit counters, Count-Min Sketch, and a Top-K frequent elements tracker.

  • 8-bit and 16-bit Approximate Counters: Implements Morris's algorithm to provide approximate counts with low memory usage. Tuned to count up to ~100k (8-bit) and ~2 billion (16-bit) with acceptable error rates.
  • Count-Min Sketch: A probabilistic data structure for estimating the frequency of events in a stream of data. It allows users to specify desired accuracy and confidence levels.
  • Top-K Frequent Elements Tracking: Maintains a list of the top-K frequent elements in a stream, using a combination of Count-Min Sketch for frequency estimation and a min-heap for maintaining the top elements.

Advantages

  • Memory Efficiency: The package uses probabilistic data structures which offer significant memory savings compared to exact counting methods, especially beneficial for large datasets or streams.
  • Performance: Incorporates fast, thread-local random number generation and efficient hash functions, optimizing performance for high-throughput applications.
  • Scalability: Suitable for scaling to large datasets with minimal computational and memory footprint increases.
  • Thread-Safety: Features such as atomic operations ensure thread safety, making the package suitable for concurrent applications.

Disadvantages

  • Probabilistic Nature: As the package relies on probabilistic algorithms, there is an inherent trade-off between memory usage and accuracy. Exact counts are not guaranteed, and there is a specified error margin.

Usage

8-bit and 16-bit Counters

Instantiate a counter and use the Increment method to increase its value probabilistically. The Estimate method returns the current approximate count.

var counter approx.Count8 // or approx.Count16 for 16-bit
counter.Increment()
fmt.Println(counter.Estimate())
Count-Min Sketch

Create a new Count-Min Sketch with default or custom parameters. Update the sketch with observed items and query their estimated frequencies.

cms, err := approx.NewCountMin()
if err != nil {
    log.Fatal(err)
}
cms.UpdateString("example_item")
fmt.Println(cms.CountString("example_item"))
Top-K Frequent Elements

Track the top-K frequent elements in a stream by creating a TopK instance and updating it with observed items.

topK, err := approx.NewTopK(10)
if err != nil {
    log.Fatal(err)
}
topK.UpdateString("example_item")
for _, v := range topK.Values() {
    fmt.Printf("Value: %s, Count: %d\n", v.Value, v.Count)
}

Dependencies

This package depends on external libraries for hashing and HyperLogLog implementations:

  • github.com/zeebo/xxh3: For fast hashing.
  • github.com/axiomhq/hyperloglog: For cardinality estimation in the Top-K tracker.

License

Please review the license agreement for this package before using it in your projects.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Count16

type Count16 uint16

Count16 is a 16-bit counter that uses Morris's algorithm to estimate the count. The counter was tuned to count up to ~2 billion with relatively low mean error rate of around ~0.50%.

func (Count16) Estimate

func (c Count16) Estimate() uint

Estimate returns the estimated count

func (*Count16) Increment

func (c *Count16) Increment() uint

Increment increments the counter

type Count16x4

type Count16x4 struct {
	// contains filtered or unexported fields
}

Count16x4 is a represents 4 16-bit approximate counters, using atomic operations to increment the counter.

func (*Count16x4) Estimate

func (c *Count16x4) Estimate() [4]uint

Estimate returns the estimated count for all counters.

func (*Count16x4) EstimateAt

func (c *Count16x4) EstimateAt(i int) uint

EstimateAt returns the estimated count for the counter at the given index.

func (*Count16x4) IncrementAt

func (c *Count16x4) IncrementAt(i int) bool

IncrementAt increments the counter at the given index. It returns true if the counter estimate was updated.

func (*Count16x4) Reset

func (c *Count16x4) Reset() [4]uint

Reset resets the counter to zero. It returns the estimated count for all counters.

type Count8

type Count8 uint8

Count8 is a 8-bit counter that uses Morris's algorithm to estimate the count. The counter was tuned to count up to ~100k with relatively mean error rate of around ~10%.

func (Count8) Estimate

func (c Count8) Estimate() uint

Estimate returns the estimated count

func (*Count8) Increment

func (c *Count8) Increment() uint

Increment increments the counter

type CountMin

type CountMin struct {
	// contains filtered or unexported fields
}

CountMin is a sketch data structure for estimating the frequency of items in a stream

func NewCountMin

func NewCountMin() (*CountMin, error)

NewCountMin creates a new CountMin sketch with default epsilon and confidence

func NewCountMinWithEstimates

func NewCountMinWithEstimates(epsilon, confidence float64) (*CountMin, error)

NewCountMinWithEpsilon creates a new CountMin sketch with the given epsilon and delta. The epsilon parameter controls the accuracy of the estimates, and the confidence parameter controls the probability that the estimates are within the specified error bounds.

func NewCountMinWithSize

func NewCountMinWithSize(depth, width uint) (*CountMin, error)

NewCountMinWithSize creates a new CountMin sketch with the given depth and width

func (*CountMin) Count

func (c *CountMin) Count(item []byte) uint

Count returns the estimated frequency of the given item

func (*CountMin) CountHash

func (c *CountMin) CountHash(hash uint64) uint

CountHash returns the estimated frequency of the given item

func (*CountMin) CountString

func (c *CountMin) CountString(item string) uint

CountString returns the estimated frequency of the given item

func (*CountMin) Reset

func (c *CountMin) Reset()

Reset sets all counters to zero

func (*CountMin) Update

func (c *CountMin) Update(item []byte) bool

Update increments the counter for the given item

func (*CountMin) UpdateHash

func (c *CountMin) UpdateHash(hash uint64) (updated bool)

UpdateHash increments the counter for the given item

func (*CountMin) UpdateString

func (c *CountMin) UpdateString(item string) bool

UpdateString increments the counter for the given item

type TopK

type TopK struct {
	// contains filtered or unexported fields
}

TopK uses a Count-Min Sketch to calculate the top-K frequent elements in a stream.

func NewTopK

func NewTopK(k uint) (*TopK, error)

NewTopK creates a new structure to track the top-k elements in a stream. The k parameter specifies the number of elements to track.

func (*TopK) Cardinality

func (t *TopK) Cardinality() uint

Cardinality returns the estimated cardinality of the stream.

func (*TopK) Reset

func (t *TopK) Reset(k int) ([]TopValue, uint)

Reset restores the TopK to its original state. The function returns the top-k elements and their counts as well as the estimated cardinality of the stream.

func (*TopK) Update

func (t *TopK) Update(value string)

Update adds the binary value to Count-Min Sketch and updates the top-k elements.

func (*TopK) Values

func (t *TopK) Values() []TopValue

Values returns the top-k elements from lowest to highest frequency.

type TopValue

type TopValue struct {
	Value string `json:"value"` // The associated value
	Count uint32 `json:"count"` // The count of the value
	// contains filtered or unexported fields
}

TopValue represents a value and its associated count.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL