Documentation ¶
Overview ¶
Package frequencies is dedicated to streaming algorithms that enable estimation of the frequency of occurrence of items in a weighted multiset stream of items. If the frequency distribution of items is sufficiently skewed, these algorithms are very useful in identifying the "Heavy Hitters" that occurred most frequently in the stream. The accuracy of the estimation of the frequency of an item has well understood error bounds that can be returned by the sketch.
These algorithms are sometimes referred to as "TopN" algorithms.
Index ¶
- Variables
- func GetAprioriErrorFrequencyItemsSketch(maxMapSize int, estimatedTotalStreamWeight int64) (float64, error)
- func GetAprioriErrorLongsSketch(maxMapSize int, estimatedTotalStreamWeight int64) (float64, error)
- func GetEpsilonFrequencyItemsSketch(maxMapSize int) (float64, error)
- func GetEpsilonLongsSketch(maxMapSize int) (float64, error)
- type ItemsSketch
- func NewFrequencyItemsSketch[C comparable](lgMaxMapSize int, lgCurMapSize int, hasher common.ItemSketchHasher[C], ...) (*ItemsSketch[C], error)
- func NewFrequencyItemsSketchFromSlice[C comparable](slc []byte, hasher common.ItemSketchHasher[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)
- func NewFrequencyItemsSketchWithMaxMapSize[C comparable](maxMapSize int, hasher common.ItemSketchHasher[C], ...) (*ItemsSketch[C], error)
- func (i *ItemsSketch[C]) GetCurrentMapCapacity() int
- func (i *ItemsSketch[C]) GetEstimate(item C) (int64, error)
- func (i *ItemsSketch[C]) GetFrequentItems(errorType errorType) ([]*RowItem[C], error)
- func (i *ItemsSketch[C]) GetFrequentItemsWithThreshold(threshold int64, errorType errorType) ([]*RowItem[C], error)
- func (i *ItemsSketch[C]) GetLowerBound(item C) (int64, error)
- func (i *ItemsSketch[C]) GetMaximumError() int64
- func (i *ItemsSketch[C]) GetMaximumMapCapacity() int
- func (i *ItemsSketch[C]) GetNumActiveItems() int
- func (i *ItemsSketch[C]) GetStreamLength() int64
- func (i *ItemsSketch[C]) GetUpperBound(item C) (int64, error)
- func (i *ItemsSketch[C]) IsEmpty() bool
- func (i *ItemsSketch[C]) Merge(other *ItemsSketch[C]) (*ItemsSketch[C], error)
- func (i *ItemsSketch[C]) Reset() error
- func (i *ItemsSketch[C]) String() string
- func (i *ItemsSketch[C]) ToSlice() ([]byte, error)
- func (i *ItemsSketch[C]) ToString() (string, error)
- func (i *ItemsSketch[C]) Update(item C) error
- func (i *ItemsSketch[C]) UpdateMany(item C, count int64) error
- type LongsSketch
- func (s *LongsSketch) GetCurrentMapCapacity() int
- func (s *LongsSketch) GetEstimate(item int64) (int64, error)
- func (s *LongsSketch) GetFrequentItems(errorType errorType) ([]*Row, error)
- func (s *LongsSketch) GetFrequentItemsWithThreshold(threshold int64, errorType errorType) ([]*Row, error)
- func (s *LongsSketch) GetLowerBound(item int64) (int64, error)
- func (s *LongsSketch) GetMaximumError() int64
- func (s *LongsSketch) GetMaximumMapCapacity() int
- func (s *LongsSketch) GetNumActiveItems() int
- func (s *LongsSketch) GetStorageBytes() int
- func (s *LongsSketch) GetStreamLength() int64
- func (s *LongsSketch) GetUpperBound(item int64) (int64, error)
- func (s *LongsSketch) IsEmpty() bool
- func (s *LongsSketch) Merge(other *LongsSketch) (*LongsSketch, error)
- func (s *LongsSketch) Reset()
- func (s *LongsSketch) String() string
- func (s *LongsSketch) ToSlice() []byte
- func (s *LongsSketch) ToString() (string, error)
- func (s *LongsSketch) Update(item int64) error
- func (s *LongsSketch) UpdateMany(item int64, count int64) error
- type Row
- type RowItem
Constants ¶
This section is empty.
Variables ¶
var ErrorTypeEnum = &errorTypes{
NoFalsePositives: errorType{
id: 1,
Name: "NO_FALSE_POSITIVES",
},
NoFalseNegatives: errorType{
id: 2,
Name: "NO_FALSE_NEGATIVES",
},
}
Functions ¶
func GetAprioriErrorFrequencyItemsSketch ¶
func GetAprioriErrorFrequencyItemsSketch(maxMapSize int, estimatedTotalStreamWeight int64) (float64, error)
GetAprioriErrorFrequencyItemsSketch returns the estimated a priori error given the maxMapSize for the sketch and the estimatedTotalStreamWeight.
maxMapSize is the planned map size to be used when constructing this sketch. estimatedTotalStreamWeight is the estimated total stream weight.
func GetAprioriErrorLongsSketch ¶
GetAprioriErrorLongsSketch returns the estimated a priori error given the maxMapSize for the sketch and the estimatedTotalStreamWeight.
maxMapSize is the planned map size to be used when constructing this sketch. estimatedTotalStreamWeight is the estimated total stream weight.
func GetEpsilonFrequencyItemsSketch ¶
GetEpsilonFrequencyItemsSketch returns epsilon used to compute a priori error. This is just the value 3.5 / maxMapSize.
maxMapSize is the planned map size to be used when constructing this sketch.
func GetEpsilonLongsSketch ¶
GetEpsilonLongsSketch returns epsilon used to compute a priori error. This is just the value 3.5 / maxMapSize.
maxMapSize is the planned map size to be used when constructing this sketch.
Types ¶
type ItemsSketch ¶
type ItemsSketch[C comparable] struct { // contains filtered or unexported fields }
func NewFrequencyItemsSketch ¶
func NewFrequencyItemsSketch[C comparable](lgMaxMapSize int, lgCurMapSize int, hasher common.ItemSketchHasher[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)
NewFrequencyItemsSketch constructs a new ItemsSketch with the given parameters. this internal constructor is used when deserializing the sketch.
- lgMaxMapSize, log2 of the physical size of the internal hash map managed by this sketch. The maximum capacity of this internal hash map is 0.75 times 2^lgMaxMapSize. Both the ultimate accuracy and size of this sketch are functions of lgMaxMapSize.
- lgCurMapSize, log2 of the starting (current) physical size of the internal hashFn map managed by this sketch.
func NewFrequencyItemsSketchFromSlice ¶
func NewFrequencyItemsSketchFromSlice[C comparable](slc []byte, hasher common.ItemSketchHasher[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)
NewFrequencyItemsSketchFromSlice constructs a new ItemsSketch with the given maxMapSize and the default initialMapSize (8).
maxMapSize determines the physical size of the internal hashmap managed by this sketch and must be a power of 2. The maximum capacity of this internal hash map is 0.75 times * maxMapSize. Both the ultimate accuracy and size of this sketch are a function of maxMapSize.
func NewFrequencyItemsSketchWithMaxMapSize ¶
func NewFrequencyItemsSketchWithMaxMapSize[C comparable](maxMapSize int, hasher common.ItemSketchHasher[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)
NewFrequencyItemsSketchWithMaxMapSize constructs a new ItemsSketch with the given maxMapSize and the default initialMapSize (8).
- maxMapSize, Determines the physical size of the internal hash map managed by this sketch and must be a power of 2. The maximum capacity of this internal hash map is 0.75 times * maxMapSize. Both the ultimate accuracy and size of this sketch are functions of maxMapSize.
func (*ItemsSketch[C]) GetCurrentMapCapacity ¶
func (i *ItemsSketch[C]) GetCurrentMapCapacity() int
GetCurrentMapCapacity returns the current number of counters the sketch is configured to support.
func (*ItemsSketch[C]) GetEstimate ¶
func (i *ItemsSketch[C]) GetEstimate(item C) (int64, error)
GetEstimate gets the estimate of the frequency of the given item. Note: The true frequency of an item would be the sum of the counts as a result of the two update functions.
item is the given item
return the estimate of the frequency of the given item
func (*ItemsSketch[C]) GetFrequentItems ¶
func (i *ItemsSketch[C]) GetFrequentItems(errorType errorType) ([]*RowItem[C], error)
GetFrequentItems returns an array of Row that include frequent items, estimates, upper and lower bounds given an ErrorCondition and the default threshold. This is the same as GetFrequentItemsWithThreshold(getMaximumError(), errorType)
errorType determines whether no false positives or no false negatives are desired.
func (*ItemsSketch[C]) GetFrequentItemsWithThreshold ¶
func (i *ItemsSketch[C]) GetFrequentItemsWithThreshold(threshold int64, errorType errorType) ([]*RowItem[C], error)
GetFrequentItemsWithThreshold returns an array of RowItem that include frequent items, estimates, upper and lower bounds given a threshold and an ErrorCondition. If the threshold is lower than getMaximumError(), then getMaximumError() will be used instead.
The method first examines all active items in the sketch (items that have a counter).
If errorType = NO_FALSE_NEGATIVES, this will include an item in the result list if GetUpperBound(item) > threshold. There will be no false negatives, i.e., no Type II error. There may be items in the set with true frequencies less than the threshold (false positives).
If errorType = NO_FALSE_POSITIVES, this will include an item in the result list if GetLowerBound(item) > threshold. There will be no false positives, i.e., no Type I error. There may be items omitted from the set with true frequencies greater than the threshold (false negatives). This is a subset of the NO_FALSE_NEGATIVES case.
threshold to include items in the result list errorType determines whether no false positives or no false negatives are desired. an array of frequent items
func (*ItemsSketch[C]) GetLowerBound ¶
func (i *ItemsSketch[C]) GetLowerBound(item C) (int64, error)
GetLowerBound gets the guaranteed lower bound frequency of the given item, which can never be negative.
- item, the given item.
func (*ItemsSketch[C]) GetMaximumError ¶
func (i *ItemsSketch[C]) GetMaximumError() int64
GetMaximumError return an upper bound on the maximum error of GetEstimate(item) for any item. This is equivalent to the maximum distance between the upper bound and the lower bound for any item.
func (*ItemsSketch[C]) GetMaximumMapCapacity ¶
func (i *ItemsSketch[C]) GetMaximumMapCapacity() int
GetMaximumMapCapacity returns the maximum number of counters the sketch is configured to support.
func (*ItemsSketch[C]) GetNumActiveItems ¶
func (i *ItemsSketch[C]) GetNumActiveItems() int
GetNumActiveItems returns the number of active items in the sketch.
func (*ItemsSketch[C]) GetStreamLength ¶
func (i *ItemsSketch[C]) GetStreamLength() int64
GetStreamLength returns the sum of the frequencies in the stream seen so far by the sketch.
func (*ItemsSketch[C]) GetUpperBound ¶
func (i *ItemsSketch[C]) GetUpperBound(item C) (int64, error)
GetUpperBound gets the guaranteed upper bound frequency of the given item.
- item, the given item.
func (*ItemsSketch[C]) IsEmpty ¶
func (i *ItemsSketch[C]) IsEmpty() bool
IsEmpty returns true if this sketch is empty.
func (*ItemsSketch[C]) Merge ¶
func (i *ItemsSketch[C]) Merge(other *ItemsSketch[C]) (*ItemsSketch[C], error)
Merge merges the other sketch into this one. The other sketch may be of a different size.
other sketch of this class
return a sketch whose estimates are within the guarantees of the largest error tolerance of the two merged sketches.
func (*ItemsSketch[C]) Reset ¶
func (i *ItemsSketch[C]) Reset() error
Reset resets this sketch to a virgin state.
func (*ItemsSketch[C]) String ¶
func (i *ItemsSketch[C]) String() string
func (*ItemsSketch[C]) ToSlice ¶
func (i *ItemsSketch[C]) ToSlice() ([]byte, error)
ToSlice returns a slice representation of this sketch
func (*ItemsSketch[C]) ToString ¶
func (i *ItemsSketch[C]) ToString() (string, error)
ToString returns a String representation of this sketch
func (*ItemsSketch[C]) Update ¶
func (i *ItemsSketch[C]) Update(item C) error
Update this sketch with an item and a frequency count of one.
item for which the frequency should be increased.
func (*ItemsSketch[C]) UpdateMany ¶
func (i *ItemsSketch[C]) UpdateMany(item C, count int64) error
UpdateMany update this sketch with an item and a positive frequency count (or weight).
Item for which the frequency should be increased. The item can be any long value and is only used by the sketch to determine uniqueness. count the amount by which the frequency of the item should be increased. A count of zero is a no-op, and a negative count will throw an exception.
type LongsSketch ¶
type LongsSketch struct {
// contains filtered or unexported fields
}
func NewLongsSketch ¶
func NewLongsSketch(lgMaxMapSize int, lgCurMapSize int) (*LongsSketch, error)
NewLongsSketch returns a new LongsSketch with the given lgMaxMapSize and lgCurMapSize.
lgMaxMapSize is the log2 of the physical size of the internal hash map managed by this sketch. The maximum capacity of this internal hash map is 0.75 times 2^lgMaxMapSize. Both the ultimate accuracy and size of this sketch are a function of lgMaxMapSize.
lgCurMapSize is the log2 of the starting (current) physical size of the internal hashFn map managed by this sketch.
func NewLongsSketchFromSlice ¶
func NewLongsSketchFromSlice(slc []byte) (*LongsSketch, error)
NewLongsSketchFromSlice returns a sketch instance of this class from the given slice, which must be a byte slice representation of this sketch class.
slc is a byte slice representation of a sketch of this class.
func NewLongsSketchFromString ¶
func NewLongsSketchFromString(str string) (*LongsSketch, error)
NewLongsSketchFromString returns a sketch instance of this class from the given string, which must be a string representation of this sketch class.
str is a string representation of a sketch of this class.
func NewLongsSketchWithMaxMapSize ¶
func NewLongsSketchWithMaxMapSize(maxMapSize int) (*LongsSketch, error)
NewLongsSketchWithMaxMapSize constructs a new LongsSketch with the given maxMapSize and the default initialMapSize (8).
maxMapSize determines the physical size of the internal hash map managed by this sketch and must be a power of 2. The maximum capacity of this internal hash map is 0.75 times * maxMapSize. Both the ultimate accuracy and size of this sketch are a function of maxMapSize.
func (*LongsSketch) GetCurrentMapCapacity ¶
func (s *LongsSketch) GetCurrentMapCapacity() int
GetCurrentMapCapacity returns the current number of counters the sketch is configured to support.
func (*LongsSketch) GetEstimate ¶
func (s *LongsSketch) GetEstimate(item int64) (int64, error)
GetEstimate gets the estimate of the frequency of the given item. Note: The true frequency of an item would be the sum of the counts as a result of the two update functions.
item is the given item
return the estimate of the frequency of the given item
func (*LongsSketch) GetFrequentItems ¶
func (s *LongsSketch) GetFrequentItems(errorType errorType) ([]*Row, error)
GetFrequentItems returns an array of Row that include frequent items, estimates, upper and lower bounds given an ErrorCondition and the default threshold. This is the same as GetFrequentItemsWithThreshold(getMaximumError(), errorType)
errorType determines whether no false positives or no false negatives are desired.
func (*LongsSketch) GetFrequentItemsWithThreshold ¶
func (s *LongsSketch) GetFrequentItemsWithThreshold(threshold int64, errorType errorType) ([]*Row, error)
GetFrequentItemsWithThreshold returns an array of Row that include frequent items, estimates, upper and lower bounds given a threshold and an ErrorCondition. If the threshold is lower than getMaximumError(), then getMaximumError() will be used instead.
The method first examines all active items in the sketch (items that have a counter).
If errorType = NO_FALSE_NEGATIVES, this will include an item in the result list if GetUpperBound(item) > threshold. There will be no false negatives, i.e., no Type II error. There may be items in the set with true frequencies less than the threshold (false positives).
If errorType = NO_FALSE_POSITIVES, this will include an item in the result list if GetLowerBound(item) > threshold. There will be no false positives, i.e., no Type I error. There may be items omitted from the set with true frequencies greater than the threshold (false negatives). This is a subset of the NO_FALSE_NEGATIVES case.
threshold to include items in the result list errorType determines whether no false positives or no false negatives are desired. an array of frequent items
func (*LongsSketch) GetLowerBound ¶
func (s *LongsSketch) GetLowerBound(item int64) (int64, error)
GetLowerBound gets the guaranteed lower bound frequency of the given item, which can never be negative.
item is the given item.
return the guaranteed lower bound frequency of the given item. That is, a number which is guaranteed to be no larger than the real frequency.
func (*LongsSketch) GetMaximumError ¶
func (s *LongsSketch) GetMaximumError() int64
GetMaximumError return an upper bound on the maximum error of GetEstimate(item) for any item. This is equivalent to the maximum distance between the upper bound and the lower bound for any item.
func (*LongsSketch) GetMaximumMapCapacity ¶
func (s *LongsSketch) GetMaximumMapCapacity() int
GetMaximumMapCapacity returns the maximum number of counters the sketch is configured to support.
func (*LongsSketch) GetNumActiveItems ¶
func (s *LongsSketch) GetNumActiveItems() int
GetNumActiveItems returns the number of active items in the sketch.
func (*LongsSketch) GetStorageBytes ¶
func (s *LongsSketch) GetStorageBytes() int
GetStorageBytes returns the number of bytes required to store this sketch as slice
func (*LongsSketch) GetStreamLength ¶
func (s *LongsSketch) GetStreamLength() int64
GetStreamLength returns the sum of the frequencies (weights or counts) in the stream seen so far by the sketch
func (*LongsSketch) GetUpperBound ¶
func (s *LongsSketch) GetUpperBound(item int64) (int64, error)
GetUpperBound gets the guaranteed upper bound frequency of the given item.
item is the given item.
return the guaranteed upper bound frequency of the given item. That is, a number which is guaranteed to be no smaller than the real frequency.
func (*LongsSketch) IsEmpty ¶
func (s *LongsSketch) IsEmpty() bool
IsEmpty returns true if this sketch is empty
func (*LongsSketch) Merge ¶
func (s *LongsSketch) Merge(other *LongsSketch) (*LongsSketch, error)
Merge merges the other sketch into this one. The other sketch may be of a different size.
other sketch of this class
return a sketch whose estimates are within the guarantees of the largest error tolerance of the two merged sketches.
func (*LongsSketch) Reset ¶
func (s *LongsSketch) Reset()
Reset resets this sketch to a virgin state.
func (*LongsSketch) String ¶
func (s *LongsSketch) String() string
func (*LongsSketch) ToSlice ¶
func (s *LongsSketch) ToSlice() []byte
ToSlice returns a slice representation of this sketch
func (*LongsSketch) ToString ¶
func (s *LongsSketch) ToString() (string, error)
ToString returns a String representation of this sketch
func (*LongsSketch) Update ¶
func (s *LongsSketch) Update(item int64) error
Update this sketch with an item and a frequency count of one.
item for which the frequency should be increased.
func (*LongsSketch) UpdateMany ¶
func (s *LongsSketch) UpdateMany(item int64, count int64) error
UpdateMany this sketch with an item and a positive frequency count (or weight).
Item for which the frequency should be increased. The item can be any long value and is only used by the sketch to determine uniqueness. count the amount by which the frequency of the item should be increased. A count of zero is a no-op, and a negative count will throw an exception.
type Row ¶
type Row struct {
// contains filtered or unexported fields
}
func (*Row) GetEstimate ¶
func (*Row) GetLowerBound ¶
func (*Row) GetUpperBound ¶
type RowItem ¶
type RowItem[C comparable] struct { // contains filtered or unexported fields }