sketch

package
v3.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 9, 2024 License: AGPL-3.0 Imports: 18 Imported by: 0

Documentation

Index

Constants

View Source
const ValueTypeTopKMatrix = "topk_matrix"

Variables

This section is empty.

Functions

This section is empty.

Types

type CountMinSketch

type CountMinSketch struct {
	// contains filtered or unexported fields
}

func NewCountMinSketch

func NewCountMinSketch(w, d uint32) (*CountMinSketch, error)

NewCountMinSketch creates a new CMS for a given width and depth.

func (*CountMinSketch) Add

func (s *CountMinSketch) Add(event string, count int)

Add 'count' occurrences of the given input.

func (*CountMinSketch) ConservativeAdd

func (s *CountMinSketch) ConservativeAdd(event string, count uint32) (uint32, uint32, uint32)

ConservativeAdd adds the count (conservatively) for the given input. Conservative counting is described in https://dl.acm.org/doi/pdf/10.1145/633025.633056 and https://theory.stanford.edu/~matias/papers/sbf-sigmod-03.pdf. For more details you can read https://arxiv.org/pdf/2203.14549.pdf as well. The tl; dr, we only update the counters with a value that's less than Count(h) + count rather than all counters that h hashed to. Returns the new estimate for the event as well as the both hashes which can be used to identify the event for other things that need a hash.

func (*CountMinSketch) ConservativeIncrement

func (s *CountMinSketch) ConservativeIncrement(event string) (uint32, uint32, uint32)

func (*CountMinSketch) Count

func (s *CountMinSketch) Count(event string) uint32

Count returns the approximate min count for the given input.

func (*CountMinSketch) Increment

func (s *CountMinSketch) Increment(event string)

func (*CountMinSketch) Merge

func (s *CountMinSketch) Merge(from *CountMinSketch) error

Merge the given sketch into this one. The sketches must have the same dimensions.

type DDSketchQuantile

type DDSketchQuantile struct {
	*ddsketch.DDSketch
}

DDSketchQuantile is a QuantileSketch implementation based on DataDog's "DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees." paper.

func DDSketchQuantileFromProto

func DDSketchQuantileFromProto(buf []byte) (*DDSketchQuantile, error)

func NewDDSketch

func NewDDSketch() *DDSketchQuantile

func (*DDSketchQuantile) Merge

func (*DDSketchQuantile) Quantile

func (d *DDSketchQuantile) Quantile(quantile float64) (float64, error)

func (*DDSketchQuantile) Release

func (d *DDSketchQuantile) Release()

func (*DDSketchQuantile) ToProto

type MinHeap

type MinHeap []*node

func (*MinHeap) Find

func (h *MinHeap) Find(e string) (int, bool)

func (MinHeap) Len

func (h MinHeap) Len() int

func (MinHeap) Less

func (h MinHeap) Less(i, j int) bool

less is only used in the underlying pop implementation

func (*MinHeap) Peek

func (h *MinHeap) Peek() interface{}

func (*MinHeap) Pop

func (h *MinHeap) Pop() interface{}

func (*MinHeap) Push

func (h *MinHeap) Push(x interface{})

func (MinHeap) Swap

func (h MinHeap) Swap(i, j int)

type QuantileSketch

type QuantileSketch interface {
	Add(float64) error
	Quantile(float64) (float64, error)
	Merge(QuantileSketch) (QuantileSketch, error)
	ToProto() *logproto.QuantileSketch
	Release()
}

QuantileSketch estimates quantiles over time.

func NewTDigestSketch

func NewTDigestSketch() QuantileSketch

func QuantileSketchFromProto

func QuantileSketchFromProto(proto *logproto.QuantileSketch) (QuantileSketch, error)

type QuantileSketchFactory

type QuantileSketchFactory func() QuantileSketch

type TDigestQuantile

type TDigestQuantile struct {
	*tdigest.TDigest
}

func TDigestQuantileFromProto

func TDigestQuantileFromProto(proto *logproto.TDigest) *TDigestQuantile

func (*TDigestQuantile) Add

func (d *TDigestQuantile) Add(count float64) error

func (*TDigestQuantile) Merge

func (*TDigestQuantile) Quantile

func (d *TDigestQuantile) Quantile(quantile float64) (float64, error)

func (*TDigestQuantile) Release

func (d *TDigestQuantile) Release()

func (*TDigestQuantile) ToProto

func (d *TDigestQuantile) ToProto() *logproto.QuantileSketch

type TopKMatrix

type TopKMatrix []TopKVector

TopkMatrix is `promql.Value` and `parser.Value`

func TopKMatrixFromProto

func TopKMatrixFromProto(proto *logproto.TopKMatrix) (TopKMatrix, error)

func (TopKMatrix) String

func (TopKMatrix) String() string

String implements `promql.Value` and `parser.Value`

func (TopKMatrix) ToProto

func (s TopKMatrix) ToProto() (*logproto.TopKMatrix, error)

func (TopKMatrix) Type

func (TopKMatrix) Type() parser.ValueType

Type implements `promql.Value` and `parser.Value`

type TopKResult

type TopKResult []element

func (TopKResult) Len

func (t TopKResult) Len() int

func (TopKResult) Less

func (t TopKResult) Less(i, j int) bool

for topk we actually want the largest item first

func (TopKResult) Swap

func (t TopKResult) Swap(i, j int)

type TopKVector

type TopKVector struct {
	// contains filtered or unexported fields
}

type Topk

type Topk struct {
	// contains filtered or unexported fields
}

Topk is a structure that uses a Count Min Sketch and a Min-Heap to track the top k events by frequency. We also use the sketch-bf (https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/ell2.12482) notion of a bloomfilter per count min sketch row to avoid having to iterate though the heap each time we want to check for existence of a given event (by identifier) in the heap.

func NewCMSTopkForCardinality

func NewCMSTopkForCardinality(l log.Logger, k, c int) (*Topk, error)

NewCMSTopkForCardinality creates a new topk sketch where k is the amount of topk we want, and c is the expected total cardinality of the dataset the sketch should be able to handle, including other sketches that we may merge in.

func TopkFromProto

func TopkFromProto(t *logproto.TopK) (*Topk, error)

func (*Topk) Cardinality

func (t *Topk) Cardinality() (uint64, bool)

Cardinality returns the estimated cardinality of the input plus whether the size of t's count min sketch was big enough for that estimated cardinality.

func (*Topk) InTopk

func (t *Topk) InTopk(h1, h2 uint32) bool

InTopk checks to see if an event is currently in the topk for this sketch

func (*Topk) Merge

func (t *Topk) Merge(from *Topk) error

Merge the given sketch into this one. The sketches must have the same dimensions. Note that our merge operation currently also replaces the heap by taking the combined topk list from both t and from and then deduplicating the union of the two, and finally pushing that list of things to a new heap

func (*Topk) Observe

func (t *Topk) Observe(event string)

Observe is our sketch event observation function, which is a bit more complex than the original count min sketch + heap TopK literature outlines. We're using some optimizations from the sketch-bf paper (here: http://www.eecs.harvard.edu/~michaelm/postscripts/tr-02-05.pdf) in order to reduce the # of heap operations required over time. As an example, with a cardinality of 100k we saw nearly 3x improvement in CPU usage by using these optimizations.

By default when we observe an event, if it's already in the current topk we would update it's value in the heap structure with the new count min sketch estimate and then rebalance the heap. This is potentially a lot of heap balancing operations that, at the end of the day, aren't really important. What information do we care about from the heap when we're actually still observing events and tracking the topk? The minimum value that's stored in the heap. If we observe an event and it's new count is greater than the minimum value in the heap, that event should go into the heap and the event with the minimum value should come out. So the optimization is as follows:

We only need to update the count for each event in the heap when we observe an event that's not in the heap, and it's new estimate is greater than the thing that's the current minimum value heap element. At that point, we update the values for each node in the heap and rebalance the heap, and then if the event we're observing has an estimate that is still greater than the minimum heap element count, we should put this event into the heap and remove the other one.

func (*Topk) ToProto

func (t *Topk) ToProto() (*logproto.TopK, error)

func (*Topk) Topk

func (t *Topk) Topk() TopKResult

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL