kll

package
v0.0.0-...-57d8af6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 23, 2024 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package kll is an implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per retained quantile.</p>

Reference: https://arxiv.org/abs/1603.05346v2" Optimal Quantile Approximation in Streams

The default k of 200 yields a "single-sided" epsilon of about 1.33% and a "double-sided" (PMF) epsilon of about 1.65%, with a confidence of 99%.

See "https://datasketches.apache.org/docs/KLL/KLLSketch.html" KLL Sketch

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ItemsSketch

type ItemsSketch[C comparable] struct {
	// contains filtered or unexported fields
}

func NewKllItemsSketch

func NewKllItemsSketch[C comparable](k uint16, m uint8, compareFn common.CompareFn[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)

NewKllItemsSketch create a new ItemsSketch with the given k and m. The default k = 200 results in a normalized rank error of about 1.65%. Larger K will have smaller error but the sketch will be larger (and slower).

func NewKllItemsSketchFromSlice

func NewKllItemsSketchFromSlice[C comparable](sl []byte, compareFn common.CompareFn[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)

NewKllItemsSketchFromSlice create a new ItemsSketch from the given byte slice (serialized sketch).

func NewKllItemsSketchWithDefault

func NewKllItemsSketchWithDefault[C comparable](compareFn common.CompareFn[C], serde common.ItemSketchSerde[C]) (*ItemsSketch[C], error)

NewKllItemsSketchWithDefault create a new ItemsSketch with default k and m. The default k = 200 results in a normalized rank error of about 1.65%.

func (*ItemsSketch[C]) GetCDF

func (s *ItemsSketch[C]) GetCDF(splitPoints []C, inclusive bool) ([]float64, error)

GetCDF returns an approximation to the Cumulative Distribution Function (CDF) of the input stream as a monotonically increasing array of double ranks (or cumulative probabilities) on the interval [0.0, 1.0], given a set of splitPoints.

The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(false) function.

- splitPoints an array of <i>m</i> unique, monotonically increasing items (of the same type as the input items) that divide the item input domain into <i>m+1</i> overlapping intervals.

The start of each interval is below the lowest item retained by the sketch corresponding to a zero rank or zero probability, and the end of the interval is the rank or cumulative probability corresponding to the split point.

The (m+1)th interval represents 100% of the distribution represented by the sketch and consistent with the definition of a cumulative probability distribution, thus the (m+1)th rank or probability in the returned array is always 1.0.

If a split point exactly equals a retained item of the sketch and the search criterion is:

- INCLUSIVE, the resulting cumulative probability will include that item. - EXCLUSIVE, the resulting cumulative probability will not include the weight of that split point.

It is not recommended to include either the minimum or maximum items of the input stream.

func (*ItemsSketch[C]) GetIterator

func (s *ItemsSketch[C]) GetIterator() *ItemsSketchIterator[C]

GetIterator returns the iterator for this sketch, which is not sorted.

func (*ItemsSketch[C]) GetK

func (s *ItemsSketch[C]) GetK() uint16

GetK returns the value of k (which controls the accuracy of the sketch and its memory space usage)

func (*ItemsSketch[C]) GetMaxItem

func (s *ItemsSketch[C]) GetMaxItem() (C, error)

GetMaxItem returns the maximum item of the stream. This may be distinct from the largest item retained by the sketch algorithm.

func (*ItemsSketch[C]) GetMinItem

func (s *ItemsSketch[C]) GetMinItem() (C, error)

GetMinItem returns the minimum item of the stream. This may be distinct from the smallest item retained by the sketch algorithm.

func (*ItemsSketch[C]) GetN

func (s *ItemsSketch[C]) GetN() uint64

GetN returns the value of n (the length of the input stream offered to the sketch)

func (*ItemsSketch[C]) GetNormalizedRankError

func (s *ItemsSketch[C]) GetNormalizedRankError(pmf bool) float64

GetNormalizedRankError return the approximate rank error of this sketch normalized as a fraction between zero and one. The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials. = pmf if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries. @return if pmf is true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.

func (*ItemsSketch[C]) GetNumRetained

func (s *ItemsSketch[C]) GetNumRetained() uint32

GetNumRetained returns the number of quantiles retained by the sketch.

func (*ItemsSketch[C]) GetPMF

func (s *ItemsSketch[C]) GetPMF(splitPoints []C, inclusive bool) ([]float64, error)

GetPMF returns an approximation to the Probability Mass Function (PMF) of the input stream as an array of probability masses as doubles on the interval [0.0, 1.0], given a set of splitPoints.

The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(true) function.</p>

  • splitPoints an array of m unique, monotonically increasing items (of the same type as the input items) that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.

Each interval except for the end intervals starts with a split point and ends with the next split point in sequence.

The first interval starts below the lowest item retained by the sketch corresponding to a zero rank or zero probability, and ends with the first split point</p>

The last (m+1)th interval starts with the last split point and ends after the last item retained by the sketch corresponding to a rank or probability of 1.0.

The sum of the probability masses of all (m+1) intervals is 1.0.

If the search criterion is:

  • INCLUSIVE, and the upper split point of an interval equals an item retained by the sketch, the interval will include that item. If the lower split point equals an item retained by the sketch, the interval will exclude that item.

  • EXCLUSIVE, and the upper split point of an interval equals an item retained by the sketch, the interval will exclude that item. If the lower split point equals an item retained by the sketch, the interval will include that item.

It is not recommended to include either the minimum or maximum items of the input stream.

func (*ItemsSketch[C]) GetPartitionBoundaries

func (s *ItemsSketch[C]) GetPartitionBoundaries(numEquallySized int, inclusive bool) (*ItemsSketchPartitionBoundaries[C], error)

GetPartitionBoundaries returns an instance of ItemsSketchPartitionBoundaries which provides sufficient information for the user to create the given number of equally sized partitions, where "equally sized" refers to an approximately equal number of items per partition.

- numEquallySized an integer that specifies the number of equally sized partitions between getMinItem() and getMaxItem(). This must be a positive integer greater than zero.

A 1 will return: minItem, maxItem. A 2 will return: minItem, median quantile, maxItem. Etc.

- searchCrit If INCLUSIVE, all the returned quantiles are the upper boundaries of the equally sized partitions except for the lowest returned quantile, which is the lowest boundary of the lowest ranked partition. If EXCLUSIVE, all the returned quantiles are the lower boundaries of the equally sized partitions except for the highest returned quantile, which is the upper boundary of the highest ranked partition.

func (*ItemsSketch[C]) GetQuantile

func (s *ItemsSketch[C]) GetQuantile(rank float64, inclusive bool) (C, error)

GetQuantile return the approximate quantile of the given normalized rank and the given search criterion. If INCLUSIVE, the given rank includes all quantiles <= the quantile directly corresponding to the given rank. If EXCLUSIVE, the given rank includes all quantiles < the quantile directly corresponding to the given rank.

func (*ItemsSketch[C]) GetQuantiles

func (s *ItemsSketch[C]) GetQuantiles(ranks []float64, inclusive bool) ([]C, error)

GetQuantiles return an array of quantiles from the given array of normalized ranks. if INCLUSIVE, the given ranks include all quantiles <= the quantile directly corresponding to each rank.

func (*ItemsSketch[C]) GetRank

func (s *ItemsSketch[C]) GetRank(item C, inclusive bool) (float64, error)

GetRank return the normalized rank corresponding to the given a quantile. if INCLUSIVE the given quantile is included into the rank.

func (*ItemsSketch[C]) GetRanks

func (s *ItemsSketch[C]) GetRanks(item []C, inclusive bool) ([]float64, error)

GetRanks return an array of normalized ranks corresponding to the given array of quantiles and the given search criterion. if INCLUSIVE, the given quantiles include the rank directly corresponding to each quantile.

func (*ItemsSketch[C]) GetSerializedSizeBytes

func (s *ItemsSketch[C]) GetSerializedSizeBytes() (int, error)

GetSerializedSizeBytes Returns the current number of bytes this Sketch would require if serialized in compact form.

func (*ItemsSketch[C]) GetSortedView

func (s *ItemsSketch[C]) GetSortedView() (*ItemsSketchSortedView[C], error)

GetSortedView return the sorted view of this sketch.

func (*ItemsSketch[C]) GetTotalItemsArray

func (s *ItemsSketch[C]) GetTotalItemsArray() []C

GetTotalItemsArray return the serialized byte array of the entire internal items hypothetical structure. It does not include the preamble, the levels array, or minimum or maximum items. It may include empty or garbage items.

func (*ItemsSketch[C]) IsEmpty

func (s *ItemsSketch[C]) IsEmpty() bool

IsEmpty returns true if the sketch is empty, otherwise false.

func (*ItemsSketch[C]) IsEstimationMode

func (s *ItemsSketch[C]) IsEstimationMode() bool

IsEstimationMode returns true if the sketch is in estimation mode, otherwise false.

func (*ItemsSketch[C]) Merge

func (s *ItemsSketch[C]) Merge(other *ItemsSketch[C])

Merge the given sketch into this sketch.

func (*ItemsSketch[C]) Reset

func (s *ItemsSketch[C]) Reset()

Reset this sketch to the empty state.

func (*ItemsSketch[C]) ToSlice

func (s *ItemsSketch[C]) ToSlice() ([]byte, error)

ToSlice returns the serialized byte array of this sketch.

func (*ItemsSketch[C]) Update

func (s *ItemsSketch[C]) Update(item C)

Update this sketch with the given item.

type ItemsSketchIterator

type ItemsSketchIterator[C comparable] struct {
	// contains filtered or unexported fields
}

func (*ItemsSketchIterator[C]) GetQuantile

func (s *ItemsSketchIterator[C]) GetQuantile() C

GetQuantile return the generic quantile at the current index.

Don't call this before calling next() for the first time or after getting false from next().

func (*ItemsSketchIterator[C]) GetWeight

func (s *ItemsSketchIterator[C]) GetWeight() int64

func (*ItemsSketchIterator[C]) Next

func (s *ItemsSketchIterator[C]) Next() bool

type ItemsSketchPartitionBoundaries

type ItemsSketchPartitionBoundaries[C comparable] struct {
	// contains filtered or unexported fields
}

func (*ItemsSketchPartitionBoundaries[C]) GetBoundaries

func (b *ItemsSketchPartitionBoundaries[C]) GetBoundaries() []C

type ItemsSketchSortedView

type ItemsSketchSortedView[C comparable] struct {
	// contains filtered or unexported fields
}

func (*ItemsSketchSortedView[C]) GetCDF

func (s *ItemsSketchSortedView[C]) GetCDF(splitPoints []C, inclusive bool) ([]float64, error)

func (*ItemsSketchSortedView[C]) GetNumRetained

func (s *ItemsSketchSortedView[C]) GetNumRetained() int

func (*ItemsSketchSortedView[C]) GetPMF

func (s *ItemsSketchSortedView[C]) GetPMF(splitPoints []C, inclusive bool) ([]float64, error)

func (*ItemsSketchSortedView[C]) GetPartitionBoundaries

func (s *ItemsSketchSortedView[C]) GetPartitionBoundaries(numEquallySized int, inclusive bool) (*ItemsSketchPartitionBoundaries[C], error)

func (*ItemsSketchSortedView[C]) GetQuantile

func (s *ItemsSketchSortedView[C]) GetQuantile(rank float64, inclusive bool) (C, error)

func (*ItemsSketchSortedView[C]) GetRank

func (s *ItemsSketchSortedView[C]) GetRank(item C, inclusive bool) (float64, error)

func (*ItemsSketchSortedView[C]) Iterator

type ItemsSketchSortedViewIterator

type ItemsSketchSortedViewIterator[C comparable] struct {
	// contains filtered or unexported fields
}

func (*ItemsSketchSortedViewIterator[C]) GetNaturalRank

func (i *ItemsSketchSortedViewIterator[C]) GetNaturalRank(inclusive bool) int64

func (*ItemsSketchSortedViewIterator[C]) GetNormalizedRank

func (i *ItemsSketchSortedViewIterator[C]) GetNormalizedRank(inclusive bool) float64

func (*ItemsSketchSortedViewIterator[C]) GetQuantile

func (i *ItemsSketchSortedViewIterator[C]) GetQuantile() C

GetQuantile returns the quantile at the current index

Don't call this before calling next() for the first time or after getting false from next().

func (*ItemsSketchSortedViewIterator[C]) GetWeight

func (i *ItemsSketchSortedViewIterator[C]) GetWeight() int64

func (*ItemsSketchSortedViewIterator[C]) Next

func (i *ItemsSketchSortedViewIterator[C]) Next() bool

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL