frequencies

package
v0.0.0-...-3e17171 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 22, 2024 License: Apache-2.0 Imports: 11 Imported by: 0

Documentation

Overview

Package frequencies is dedicated to streaming algorithms that enable estimation of the frequency of occurrence of items in a weighted multiset stream of items. If the frequency distribution of items is sufficiently skewed, these algorithms are very useful in identifying the "Heavy Hitters" that occurred most frequently in the stream. The accuracy of the estimation of the frequency of an item has well understood error bounds that can be returned by the sketch.

These algorithms are sometimes referred to as "TopN" algorithms.

Index

Constants

This section is empty.

Variables

View Source
var ErrorTypeEnum = &errorTypes{
	NoFalsePositives: errorType{
		id:   1,
		Name: "NO_FALSE_POSITIVES",
	},
	NoFalseNegatives: errorType{
		id:   2,
		Name: "NO_FALSE_NEGATIVES",
	},
}

Functions

func GetAprioriErrorFrequencyItemsSketch

func GetAprioriErrorFrequencyItemsSketch(maxMapSize int, estimatedTotalStreamWeight int64) (float64, error)

GetAprioriErrorFrequencyItemsSketch returns the estimated a priori error given the maxMapSize for the sketch and the estimatedTotalStreamWeight.

maxMapSize is the planned map size to be used when constructing this sketch. estimatedTotalStreamWeight is the estimated total stream weight.

func GetAprioriErrorLongsSketch

func GetAprioriErrorLongsSketch(maxMapSize int, estimatedTotalStreamWeight int64) (float64, error)

GetAprioriErrorLongsSketch returns the estimated a priori error given the maxMapSize for the sketch and the estimatedTotalStreamWeight.

maxMapSize is the planned map size to be used when constructing this sketch. estimatedTotalStreamWeight is the estimated total stream weight.

func GetEpsilonFrequencyItemsSketch

func GetEpsilonFrequencyItemsSketch(maxMapSize int) (float64, error)

GetEpsilonFrequencyItemsSketch returns epsilon used to compute a priori error. This is just the value 3.5 / maxMapSize.

maxMapSize is the planned map size to be used when constructing this sketch.

func GetEpsilonLongsSketch

func GetEpsilonLongsSketch(maxMapSize int) (float64, error)

GetEpsilonLongsSketch returns epsilon used to compute a priori error. This is just the value 3.5 / maxMapSize.

maxMapSize is the planned map size to be used when constructing this sketch.

Types

type ItemsSketch

type ItemsSketch[C comparable] struct {
	// contains filtered or unexported fields
}

func NewFrequencyItemsSketch

func NewFrequencyItemsSketch[C comparable](lgMaxMapSize int, lgCurMapSize int, hasher common.ItemSketchHasher[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)

NewFrequencyItemsSketch constructs a new ItemsSketch with the given parameters. this internal constructor is used when deserializing the sketch.

  • lgMaxMapSize, log2 of the physical size of the internal hash map managed by this sketch. The maximum capacity of this internal hash map is 0.75 times 2^lgMaxMapSize. Both the ultimate accuracy and size of this sketch are functions of lgMaxMapSize.
  • lgCurMapSize, log2 of the starting (current) physical size of the internal hashFn map managed by this sketch.

func NewFrequencyItemsSketchFromSlice

func NewFrequencyItemsSketchFromSlice[C comparable](slc []byte, hasher common.ItemSketchHasher[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)

NewFrequencyItemsSketchFromSlice constructs a new ItemsSketch with the given maxMapSize and the default initialMapSize (8).

maxMapSize determines the physical size of the internal hashmap managed by this sketch and must be a power of 2. The maximum capacity of this internal hash map is 0.75 times * maxMapSize. Both the ultimate accuracy and size of this sketch are a function of maxMapSize.

func NewFrequencyItemsSketchWithMaxMapSize

func NewFrequencyItemsSketchWithMaxMapSize[C comparable](maxMapSize int, hasher common.ItemSketchHasher[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)

NewFrequencyItemsSketchWithMaxMapSize constructs a new ItemsSketch with the given maxMapSize and the default initialMapSize (8).

  • maxMapSize, Determines the physical size of the internal hash map managed by this sketch and must be a power of 2. The maximum capacity of this internal hash map is 0.75 times * maxMapSize. Both the ultimate accuracy and size of this sketch are functions of maxMapSize.

func (*ItemsSketch[C]) GetCurrentMapCapacity

func (i *ItemsSketch[C]) GetCurrentMapCapacity() int

GetCurrentMapCapacity returns the current number of counters the sketch is configured to support.

func (*ItemsSketch[C]) GetEstimate

func (i *ItemsSketch[C]) GetEstimate(item C) (int64, error)

GetEstimate gets the estimate of the frequency of the given item. Note: The true frequency of an item would be the sum of the counts as a result of the two update functions.

item is the given item

return the estimate of the frequency of the given item

func (*ItemsSketch[C]) GetFrequentItems

func (i *ItemsSketch[C]) GetFrequentItems(errorType errorType) ([]*RowItem[C], error)

GetFrequentItems returns an array of Row that include frequent items, estimates, upper and lower bounds given an ErrorCondition and the default threshold. This is the same as GetFrequentItemsWithThreshold(getMaximumError(), errorType)

errorType determines whether no false positives or no false negatives are desired.

func (*ItemsSketch[C]) GetFrequentItemsWithThreshold

func (i *ItemsSketch[C]) GetFrequentItemsWithThreshold(threshold int64, errorType errorType) ([]*RowItem[C], error)

GetFrequentItemsWithThreshold returns an array of RowItem that include frequent items, estimates, upper and lower bounds given a threshold and an ErrorCondition. If the threshold is lower than getMaximumError(), then getMaximumError() will be used instead.

The method first examines all active items in the sketch (items that have a counter).

If errorType = NO_FALSE_NEGATIVES, this will include an item in the result list if GetUpperBound(item) > threshold. There will be no false negatives, i.e., no Type II error. There may be items in the set with true frequencies less than the threshold (false positives).

If errorType = NO_FALSE_POSITIVES, this will include an item in the result list if GetLowerBound(item) > threshold. There will be no false positives, i.e., no Type I error. There may be items omitted from the set with true frequencies greater than the threshold (false negatives). This is a subset of the NO_FALSE_NEGATIVES case.

threshold to include items in the result list errorType determines whether no false positives or no false negatives are desired. an array of frequent items

func (*ItemsSketch[C]) GetLowerBound

func (i *ItemsSketch[C]) GetLowerBound(item C) (int64, error)

GetLowerBound gets the guaranteed lower bound frequency of the given item, which can never be negative.

  • item, the given item.

func (*ItemsSketch[C]) GetMaximumError

func (i *ItemsSketch[C]) GetMaximumError() int64

GetMaximumError return an upper bound on the maximum error of GetEstimate(item) for any item. This is equivalent to the maximum distance between the upper bound and the lower bound for any item.

func (*ItemsSketch[C]) GetMaximumMapCapacity

func (i *ItemsSketch[C]) GetMaximumMapCapacity() int

GetMaximumMapCapacity returns the maximum number of counters the sketch is configured to support.

func (*ItemsSketch[C]) GetNumActiveItems

func (i *ItemsSketch[C]) GetNumActiveItems() int

GetNumActiveItems returns the number of active items in the sketch.

func (*ItemsSketch[C]) GetStreamLength

func (i *ItemsSketch[C]) GetStreamLength() int64

GetStreamLength returns the sum of the frequencies in the stream seen so far by the sketch.

func (*ItemsSketch[C]) GetUpperBound

func (i *ItemsSketch[C]) GetUpperBound(item C) (int64, error)

GetUpperBound gets the guaranteed upper bound frequency of the given item.

  • item, the given item.

func (*ItemsSketch[C]) IsEmpty

func (i *ItemsSketch[C]) IsEmpty() bool

IsEmpty returns true if this sketch is empty.

func (*ItemsSketch[C]) Merge

func (i *ItemsSketch[C]) Merge(other *ItemsSketch[C]) (*ItemsSketch[C], error)

Merge merges the other sketch into this one. The other sketch may be of a different size.

other sketch of this class

return a sketch whose estimates are within the guarantees of the largest error tolerance of the two merged sketches.

func (*ItemsSketch[C]) Reset

func (i *ItemsSketch[C]) Reset() error

Reset resets this sketch to a virgin state.

func (*ItemsSketch[C]) String

func (i *ItemsSketch[C]) String() string

func (*ItemsSketch[C]) ToSlice

func (i *ItemsSketch[C]) ToSlice() ([]byte, error)

ToSlice returns a slice representation of this sketch

func (*ItemsSketch[C]) ToString

func (i *ItemsSketch[C]) ToString() (string, error)

ToString returns a String representation of this sketch

func (*ItemsSketch[C]) Update

func (i *ItemsSketch[C]) Update(item C) error

Update this sketch with an item and a frequency count of one.

item for which the frequency should be increased.

func (*ItemsSketch[C]) UpdateMany

func (i *ItemsSketch[C]) UpdateMany(item C, count int64) error

UpdateMany update this sketch with an item and a positive frequency count (or weight).

Item for which the frequency should be increased. The item can be any long value and is only used by the sketch to determine uniqueness. count the amount by which the frequency of the item should be increased. A count of zero is a no-op, and a negative count will throw an exception.

type LongsSketch

type LongsSketch struct {
	// contains filtered or unexported fields
}

func NewLongsSketch

func NewLongsSketch(lgMaxMapSize int, lgCurMapSize int) (*LongsSketch, error)

NewLongsSketch returns a new LongsSketch with the given lgMaxMapSize and lgCurMapSize.

lgMaxMapSize is the log2 of the physical size of the internal hash map managed by this sketch. The maximum capacity of this internal hash map is 0.75 times 2^lgMaxMapSize. Both the ultimate accuracy and size of this sketch are a function of lgMaxMapSize.

lgCurMapSize is the log2 of the starting (current) physical size of the internal hashFn map managed by this sketch.

func NewLongsSketchFromSlice

func NewLongsSketchFromSlice(slc []byte) (*LongsSketch, error)

NewLongsSketchFromSlice returns a sketch instance of this class from the given slice, which must be a byte slice representation of this sketch class.

slc is a byte slice representation of a sketch of this class.

func NewLongsSketchFromString

func NewLongsSketchFromString(str string) (*LongsSketch, error)

NewLongsSketchFromString returns a sketch instance of this class from the given string, which must be a string representation of this sketch class.

str is a string representation of a sketch of this class.

func NewLongsSketchWithMaxMapSize

func NewLongsSketchWithMaxMapSize(maxMapSize int) (*LongsSketch, error)

NewLongsSketchWithMaxMapSize constructs a new LongsSketch with the given maxMapSize and the default initialMapSize (8).

maxMapSize determines the physical size of the internal hash map managed by this sketch and must be a power of 2. The maximum capacity of this internal hash map is 0.75 times * maxMapSize. Both the ultimate accuracy and size of this sketch are a function of maxMapSize.

func (*LongsSketch) GetCurrentMapCapacity

func (s *LongsSketch) GetCurrentMapCapacity() int

GetCurrentMapCapacity returns the current number of counters the sketch is configured to support.

func (*LongsSketch) GetEstimate

func (s *LongsSketch) GetEstimate(item int64) (int64, error)

GetEstimate gets the estimate of the frequency of the given item. Note: The true frequency of an item would be the sum of the counts as a result of the two update functions.

item is the given item

return the estimate of the frequency of the given item

func (*LongsSketch) GetFrequentItems

func (s *LongsSketch) GetFrequentItems(errorType errorType) ([]*Row, error)

GetFrequentItems returns an array of Row that include frequent items, estimates, upper and lower bounds given an ErrorCondition and the default threshold. This is the same as GetFrequentItemsWithThreshold(getMaximumError(), errorType)

errorType determines whether no false positives or no false negatives are desired.

func (*LongsSketch) GetFrequentItemsWithThreshold

func (s *LongsSketch) GetFrequentItemsWithThreshold(threshold int64, errorType errorType) ([]*Row, error)

GetFrequentItemsWithThreshold returns an array of Row that include frequent items, estimates, upper and lower bounds given a threshold and an ErrorCondition. If the threshold is lower than getMaximumError(), then getMaximumError() will be used instead.

The method first examines all active items in the sketch (items that have a counter).

If errorType = NO_FALSE_NEGATIVES, this will include an item in the result list if GetUpperBound(item) > threshold. There will be no false negatives, i.e., no Type II error. There may be items in the set with true frequencies less than the threshold (false positives).

If errorType = NO_FALSE_POSITIVES, this will include an item in the result list if GetLowerBound(item) > threshold. There will be no false positives, i.e., no Type I error. There may be items omitted from the set with true frequencies greater than the threshold (false negatives). This is a subset of the NO_FALSE_NEGATIVES case.

threshold to include items in the result list errorType determines whether no false positives or no false negatives are desired. an array of frequent items

func (*LongsSketch) GetLowerBound

func (s *LongsSketch) GetLowerBound(item int64) (int64, error)

GetLowerBound gets the guaranteed lower bound frequency of the given item, which can never be negative.

item is the given item.

return the guaranteed lower bound frequency of the given item. That is, a number which is guaranteed to be no larger than the real frequency.

func (*LongsSketch) GetMaximumError

func (s *LongsSketch) GetMaximumError() int64

GetMaximumError return an upper bound on the maximum error of GetEstimate(item) for any item. This is equivalent to the maximum distance between the upper bound and the lower bound for any item.

func (*LongsSketch) GetMaximumMapCapacity

func (s *LongsSketch) GetMaximumMapCapacity() int

GetMaximumMapCapacity returns the maximum number of counters the sketch is configured to support.

func (*LongsSketch) GetNumActiveItems

func (s *LongsSketch) GetNumActiveItems() int

GetNumActiveItems returns the number of active items in the sketch.

func (*LongsSketch) GetStorageBytes

func (s *LongsSketch) GetStorageBytes() int

GetStorageBytes returns the number of bytes required to store this sketch as slice

func (*LongsSketch) GetStreamLength

func (s *LongsSketch) GetStreamLength() int64

GetStreamLength returns the sum of the frequencies (weights or counts) in the stream seen so far by the sketch

func (*LongsSketch) GetUpperBound

func (s *LongsSketch) GetUpperBound(item int64) (int64, error)

GetUpperBound gets the guaranteed upper bound frequency of the given item.

item is the given item.

return the guaranteed upper bound frequency of the given item. That is, a number which is guaranteed to be no smaller than the real frequency.

func (*LongsSketch) IsEmpty

func (s *LongsSketch) IsEmpty() bool

IsEmpty returns true if this sketch is empty

func (*LongsSketch) Merge

func (s *LongsSketch) Merge(other *LongsSketch) (*LongsSketch, error)

Merge merges the other sketch into this one. The other sketch may be of a different size.

other sketch of this class

return a sketch whose estimates are within the guarantees of the largest error tolerance of the two merged sketches.

func (*LongsSketch) Reset

func (s *LongsSketch) Reset()

Reset resets this sketch to a virgin state.

func (*LongsSketch) String

func (s *LongsSketch) String() string

func (*LongsSketch) ToSlice

func (s *LongsSketch) ToSlice() []byte

ToSlice returns a slice representation of this sketch

func (*LongsSketch) ToString

func (s *LongsSketch) ToString() (string, error)

ToString returns a String representation of this sketch

func (*LongsSketch) Update

func (s *LongsSketch) Update(item int64) error

Update this sketch with an item and a frequency count of one.

item for which the frequency should be increased.

func (*LongsSketch) UpdateMany

func (s *LongsSketch) UpdateMany(item int64, count int64) error

UpdateMany this sketch with an item and a positive frequency count (or weight).

Item for which the frequency should be increased. The item can be any long value and is only used by the sketch to determine uniqueness. count the amount by which the frequency of the item should be increased. A count of zero is a no-op, and a negative count will throw an exception.

type Row

type Row struct {
	// contains filtered or unexported fields
}

func (*Row) GetEstimate

func (r *Row) GetEstimate() int64

func (*Row) GetItem

func (r *Row) GetItem() int64

func (*Row) GetLowerBound

func (r *Row) GetLowerBound() int64

func (*Row) GetUpperBound

func (r *Row) GetUpperBound() int64

func (*Row) String

func (r *Row) String() string

type RowItem

type RowItem[C comparable] struct {
	// contains filtered or unexported fields
}

func (*RowItem[C]) GetEstimate

func (r *RowItem[C]) GetEstimate() int64

func (*RowItem[C]) GetItem

func (r *RowItem[C]) GetItem() C

func (*RowItem[C]) GetLowerBound

func (r *RowItem[C]) GetLowerBound() int64

func (*RowItem[C]) GetUpperBound

func (r *RowItem[C]) GetUpperBound() int64

func (*RowItem[C]) String

func (r *RowItem[C]) String() string

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL