Documentation ¶
Index ¶
- Constants
- type CountMinSketch
- func (s *CountMinSketch) Add(event string, count int)
- func (s *CountMinSketch) ConservativeAdd(event string, count uint32) (uint32, uint32, uint32)
- func (s *CountMinSketch) ConservativeIncrement(event string) (uint32, uint32, uint32)
- func (s *CountMinSketch) Count(event string) uint32
- func (s *CountMinSketch) Increment(event string)
- func (s *CountMinSketch) Merge(from *CountMinSketch) error
- type DDSketchQuantile
- type MinHeap
- type QuantileSketch
- type QuantileSketchFactory
- type TDigestQuantile
- type TopKMatrix
- type TopKResult
- type TopKVector
- type Topk
Constants ¶
const ValueTypeTopKMatrix = "topk_matrix"
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CountMinSketch ¶
type CountMinSketch struct {
// contains filtered or unexported fields
}
func NewCountMinSketch ¶
func NewCountMinSketch(w, d uint32) (*CountMinSketch, error)
NewCountMinSketch creates a new CMS for a given width and depth.
func (*CountMinSketch) Add ¶
func (s *CountMinSketch) Add(event string, count int)
Add 'count' occurrences of the given input.
func (*CountMinSketch) ConservativeAdd ¶
ConservativeAdd adds the count (conservatively) for the given input. Conservative counting is described in https://dl.acm.org/doi/pdf/10.1145/633025.633056 and https://theory.stanford.edu/~matias/papers/sbf-sigmod-03.pdf. For more details you can read https://arxiv.org/pdf/2203.14549.pdf as well. The tl; dr, we only update the counters with a value that's less than Count(h) + count rather than all counters that h hashed to. Returns the new estimate for the event as well as the both hashes which can be used to identify the event for other things that need a hash.
func (*CountMinSketch) ConservativeIncrement ¶
func (s *CountMinSketch) ConservativeIncrement(event string) (uint32, uint32, uint32)
func (*CountMinSketch) Count ¶
func (s *CountMinSketch) Count(event string) uint32
Count returns the approximate min count for the given input.
func (*CountMinSketch) Increment ¶
func (s *CountMinSketch) Increment(event string)
func (*CountMinSketch) Merge ¶
func (s *CountMinSketch) Merge(from *CountMinSketch) error
Merge the given sketch into this one. The sketches must have the same dimensions.
type DDSketchQuantile ¶
DDSketchQuantile is a QuantileSketch implementation based on DataDog's "DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees." paper.
func DDSketchQuantileFromProto ¶
func DDSketchQuantileFromProto(buf []byte) (*DDSketchQuantile, error)
func NewDDSketch ¶
func NewDDSketch() *DDSketchQuantile
func (*DDSketchQuantile) Merge ¶
func (d *DDSketchQuantile) Merge(other QuantileSketch) (QuantileSketch, error)
func (*DDSketchQuantile) Quantile ¶
func (d *DDSketchQuantile) Quantile(quantile float64) (float64, error)
func (*DDSketchQuantile) Release ¶
func (d *DDSketchQuantile) Release()
func (*DDSketchQuantile) ToProto ¶
func (d *DDSketchQuantile) ToProto() *logproto.QuantileSketch
type MinHeap ¶
type MinHeap []*node
type QuantileSketch ¶
type QuantileSketch interface { Add(float64) error Quantile(float64) (float64, error) Merge(QuantileSketch) (QuantileSketch, error) ToProto() *logproto.QuantileSketch Release() }
QuantileSketch estimates quantiles over time.
func NewTDigestSketch ¶
func NewTDigestSketch() QuantileSketch
func QuantileSketchFromProto ¶
func QuantileSketchFromProto(proto *logproto.QuantileSketch) (QuantileSketch, error)
type QuantileSketchFactory ¶
type QuantileSketchFactory func() QuantileSketch
type TDigestQuantile ¶
func TDigestQuantileFromProto ¶
func TDigestQuantileFromProto(proto *logproto.TDigest) *TDigestQuantile
func (*TDigestQuantile) Add ¶
func (d *TDigestQuantile) Add(count float64) error
func (*TDigestQuantile) Merge ¶
func (d *TDigestQuantile) Merge(other QuantileSketch) (QuantileSketch, error)
func (*TDigestQuantile) Quantile ¶
func (d *TDigestQuantile) Quantile(quantile float64) (float64, error)
func (*TDigestQuantile) Release ¶
func (d *TDigestQuantile) Release()
func (*TDigestQuantile) ToProto ¶
func (d *TDigestQuantile) ToProto() *logproto.QuantileSketch
type TopKMatrix ¶
type TopKMatrix []TopKVector
TopkMatrix is `promql.Value` and `parser.Value`
func TopKMatrixFromProto ¶
func TopKMatrixFromProto(proto *logproto.TopKMatrix) (TopKMatrix, error)
func (TopKMatrix) String ¶
func (TopKMatrix) String() string
String implements `promql.Value` and `parser.Value`
func (TopKMatrix) ToProto ¶
func (s TopKMatrix) ToProto() (*logproto.TopKMatrix, error)
func (TopKMatrix) Type ¶
func (TopKMatrix) Type() parser.ValueType
Type implements `promql.Value` and `parser.Value`
type TopKResult ¶
type TopKResult []element
func (TopKResult) Len ¶
func (t TopKResult) Len() int
func (TopKResult) Less ¶
func (t TopKResult) Less(i, j int) bool
for topk we actually want the largest item first
func (TopKResult) Swap ¶
func (t TopKResult) Swap(i, j int)
type TopKVector ¶
type TopKVector struct {
// contains filtered or unexported fields
}
type Topk ¶
type Topk struct {
// contains filtered or unexported fields
}
Topk is a structure that uses a Count Min Sketch and a Min-Heap to track the top k events by frequency. We also use the sketch-bf (https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/ell2.12482) notion of a bloomfilter per count min sketch row to avoid having to iterate though the heap each time we want to check for existence of a given event (by identifier) in the heap.
func NewCMSTopkForCardinality ¶
NewCMSTopkForCardinality creates a new topk sketch where k is the amount of topk we want, and c is the expected total cardinality of the dataset the sketch should be able to handle, including other sketches that we may merge in.
func (*Topk) Cardinality ¶
Cardinality returns the estimated cardinality of the input plus whether the size of t's count min sketch was big enough for that estimated cardinality.
func (*Topk) Merge ¶
Merge the given sketch into this one. The sketches must have the same dimensions. Note that our merge operation currently also replaces the heap by taking the combined topk list from both t and from and then deduplicating the union of the two, and finally pushing that list of things to a new heap
func (*Topk) Observe ¶
Observe is our sketch event observation function, which is a bit more complex than the original count min sketch + heap TopK literature outlines. We're using some optimizations from the sketch-bf paper (here: http://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf) in order to reduce the # of heap operations required over time. As an example, with a cardinality of 100k we saw nearly 3x improvement in CPU usage by using these optimizations.
By default when we observe an event, if it's already in the current topk we would update it's value in the heap structure with the new count min sketch estimate and then rebalance the heap. This is potentially a lot of heap balancing operations that, at the end of the day, aren't really important. What information do we care about from the heap when we're actually still observing events and tracking the topk? The minimum value that's stored in the heap. If we observe an event and it's new count is greater than the minimum value in the heap, that event should go into the heap and the event with the minimum value should come out. So the optimization is as follows:
We only need to update the count for each event in the heap when we observe an event that's not in the heap, and it's new estimate is greater than the thing that's the current minimum value heap element. At that point, we update the values for each node in the heap and rebalance the heap, and then if the event we're observing has an estimate that is still greater than the minimum heap element count, we should put this event into the heap and remove the other one.
func (*Topk) Topk ¶
func (t *Topk) Topk() TopKResult