stats

package
v3.0.0-...-7ba4d6b
Warning

This package is not in the latest version of its module.

Published: Jul 17, 2024 | License: Apache-2.0, BSD-3-Clause, MIT | Imports: 17 | Imported by: 16

Documentation

Overview

Package stats contains transforms for statistical processing.

Constants

This section is empty.

Variables

This section is empty.

Functions

func ApproximateQuantiles

func ApproximateQuantiles(s beam.Scope, pc beam.PCollection, less any, opts Opts) beam.PCollection

ApproximateQuantiles computes approximate quantiles for the input PCollection<T>.

The output PCollection contains a single element: a list of NumQuantiles - 1 elements that approximately split the input collection into NumQuantiles groups of equal size, where NumQuantiles is the value set in opts. For example, if NumQuantiles = 2, the returned list contains a single element such that approximately half of the input is less than that element and half is greater.
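To make the shape of the output concrete, here is a minimal in-memory sketch that computes exact split points with a caller-supplied less function. It is not Beam's algorithm, which is approximate and distributed; splitPoints and its func(a, b T) bool comparator signature are hypothetical names and shapes chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// splitPoints returns numQuantiles-1 elements that divide data, ordered by
// less, into numQuantiles groups of roughly equal size. It sorts a full copy
// in memory, so the result is exact rather than approximate.
// Hypothetical helper: not part of the stats package.
func splitPoints[T any](data []T, less func(a, b T) bool, numQuantiles int) []T {
	if numQuantiles < 2 || len(data) == 0 {
		return nil
	}
	sorted := append([]T(nil), data...)
	sort.Slice(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })

	points := make([]T, 0, numQuantiles-1)
	for i := 1; i < numQuantiles; i++ {
		// The i-th split point is the first element of group i.
		points = append(points, sorted[i*len(sorted)/numQuantiles])
	}
	return points
}

func main() {
	less := func(a, b int) bool { return a < b }
	fmt.Println(splitPoints([]int{1, 11, 7, 5, 10}, less, 2))        // [7]
	fmt.Println(splitPoints([]int{8, 7, 6, 5, 4, 3, 2, 1}, less, 4)) // [3 5 7]
}
```

With two quantiles the single split point is the median; with four, three split points come back, matching the NumQuantiles - 1 rule.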

func ApproximateWeightedQuantiles

func ApproximateWeightedQuantiles(s beam.Scope, pc beam.PCollection, less any, opts Opts) beam.PCollection

ApproximateWeightedQuantiles computes approximate quantiles for the input PCollection<(weight int, T)>.

The output PCollection contains a single element: a list of NumQuantiles - 1 elements that approximately split the input collection, by weight, into NumQuantiles groups of equal size, where NumQuantiles is the value set in opts. For example, if NumQuantiles = 2, the returned list contains a single element such that approximately half of the input is less than that element and half is greater or equal.
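The effect of weights can be illustrated with a small in-memory sketch that treats an element of weight w as if it appeared w times. This is an exact computation for illustration only, not Beam's approximate algorithm; the weighted type and weightedSplitPoints function are hypothetical, and weights are assumed to be non-negative.

```go
package main

import (
	"fmt"
	"sort"
)

// weighted pairs a value with its weight. Hypothetical type for illustration.
type weighted struct {
	Weight int
	Value  int
}

// weightedSplitPoints returns numQuantiles-1 values that split the total
// weight of items into numQuantiles roughly equal parts. An item of weight w
// counts as w copies of its value. Weights are assumed to be non-negative.
func weightedSplitPoints(items []weighted, numQuantiles int) []int {
	if numQuantiles < 2 || len(items) == 0 {
		return nil
	}
	sorted := append([]weighted(nil), items...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Value < sorted[j].Value })

	total := 0
	for _, it := range sorted {
		total += it.Weight
	}

	points := make([]int, 0, numQuantiles-1)
	cumulative, next := 0, 1
	for _, it := range sorted {
		cumulative += it.Weight
		// Emit a split point each time the running weight passes
		// next/numQuantiles of the total weight.
		for next < numQuantiles && cumulative*numQuantiles > next*total {
			points = append(points, it.Value)
			next++
		}
	}
	return points
}

func main() {
	items := []weighted{{1, 1}, {3, 5}, {1, 10}}
	fmt.Println(weightedSplitPoints(items, 2)) // [5]
}
```

Here the value 5 carries three fifths of the total weight, so it is the weighted median even though it is only one of three distinct values.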

func Count

func Count(s beam.Scope, col beam.PCollection) beam.PCollection

Count counts the number of appearances of each element in a collection. It expects a PCollection<T> as input and returns a PCollection<KV<T,int>>. T's encoding must be deterministic so it is valid as a key.

func CountElms

func CountElms(s beam.Scope, col beam.PCollection) beam.PCollection

CountElms counts the number of elements in a collection. It expects a PCollection<T> as input and returns a PCollection<int> of one element containing the count.
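The difference between Count and CountElms is easiest to see side by side. The sketch below mirrors their semantics on an in-memory slice; countPerElement and countElements are hypothetical stand-ins, not Beam functions, and the comparable constraint plays a role loosely analogous to the deterministic-encoding requirement on T.

```go
package main

import "fmt"

// countPerElement mirrors the semantics described for Count: one
// (element, occurrences) entry per distinct element.
// Hypothetical helper: not part of the stats package.
func countPerElement[T comparable](elems []T) map[T]int {
	counts := make(map[T]int)
	for _, e := range elems {
		counts[e]++
	}
	return counts
}

// countElements mirrors the semantics described for CountElms: a single
// number, the size of the whole collection.
func countElements[T any](elems []T) int {
	return len(elems)
}

func main() {
	words := []string{"a", "b", "a", "c", "a"}
	fmt.Println(countPerElement(words)) // map[a:3 b:1 c:1]
	fmt.Println(countElements(words))   // 5
}
```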

func Max

func Max(s beam.Scope, col beam.PCollection) beam.PCollection

Max returns the maximal element in a PCollection<A> as a singleton PCollection<A>. It can only be used for numbers, such as int, uint16, float32, etc.

For example:

col := beam.Create(s, 1, 11, 7, 5, 10)
max := stats.Max(s, col)   // PCollection<int> with 11 as the only element.

func MaxPerKey

func MaxPerKey(s beam.Scope, col beam.PCollection) beam.PCollection

MaxPerKey returns the maximal element per key in a PCollection<KV<A,B>> as a PCollection<KV<A,B>>. It can only be used for numbers, such as int, uint16, float32, etc.

func Mean

func Mean(s beam.Scope, col beam.PCollection) beam.PCollection

Mean returns the arithmetic mean (or average) of the elements in a collection. It expects a PCollection<A> as input and returns a singleton PCollection<float64>. It can only be used for numbers, such as int, uint16, float32, etc.

For example:

col := beam.Create(s, 1, 11, 7, 5, 10)
mean := stats.Mean(s, col)   // PCollection<float64> with 6.8 as the only element.

func MeanPerKey

func MeanPerKey(s beam.Scope, col beam.PCollection) beam.PCollection

MeanPerKey returns the arithmetic mean (or average) for each key of the elements in a collection. It expects a PCollection<KV<A,B>> as input and returns a PCollection<KV<A,float64>>. It can only be used for numbers, such as int, uint16, float32, etc.
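Note that the value type changes from B to float64 in the output. A minimal in-memory sketch of that behavior, assuming string keys and int values, could look as follows; pair and meanPerKey are hypothetical names, not Beam APIs.

```go
package main

import "fmt"

// pair stands in for a KV<string,int> element. Hypothetical type.
type pair struct {
	Key   string
	Value int
}

// meanPerKey returns the arithmetic mean of the values seen for each key.
// The result is float64 regardless of the input value type.
func meanPerKey(pairs []pair) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, p := range pairs {
		sums[p.Key] += float64(p.Value)
		counts[p.Key]++
	}
	means := make(map[string]float64, len(sums))
	for k, s := range sums {
		means[k] = s / float64(counts[k])
	}
	return means
}

func main() {
	pairs := []pair{{"a", 1}, {"a", 2}, {"b", 4}}
	fmt.Println(meanPerKey(pairs)) // map[a:1.5 b:4]
}
```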

func Min

func Min(s beam.Scope, col beam.PCollection) beam.PCollection

Min returns the minimal element in a PCollection<A> as a singleton PCollection<A>. It can only be used for numbers, such as int, uint16, float32, etc.

For example:

col := beam.Create(s, 1, 11, 7, 5, 10)
min := stats.Min(s, col)   // PCollection<int> with 1 as the only element.

func MinPerKey

func MinPerKey(s beam.Scope, col beam.PCollection) beam.PCollection

MinPerKey returns the minimal element per key in a PCollection<KV<A,B>> as a PCollection<KV<A,B>>. It can only be used for numbers, such as int, uint16, float32, etc.

func Sum

func Sum(s beam.Scope, col beam.PCollection) beam.PCollection

Sum returns the sum of the elements in a PCollection<A> as a singleton PCollection<A>. It can only be used for numbers, such as int, uint16, float32, etc.

For example:

col := beam.Create(s, 1, 11, 7, 5, 10)
sum := stats.Sum(s, col)   // PCollection<int> with 34 as the only element.

func SumPerKey

func SumPerKey(s beam.Scope, col beam.PCollection) beam.PCollection

SumPerKey returns the sum of the values per key in a PCollection<KV<A,B>> as a PCollection<KV<A,B>>. It can only be used for numeric values, such as int, uint16, float32, etc.
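SumPerKey, MinPerKey, and MaxPerKey share one shape: values with the same key are folded together pairwise, and the value type B is preserved. The sketch below shows that shape on an in-memory slice. The number constraint, the kv type, and combinePerKey are all hypothetical and only approximate the "numbers only" restriction stated above.

```go
package main

import "fmt"

// number approximates the numeric types the per-key transforms accept.
// Hypothetical constraint for illustration.
type number interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// kv stands in for a KV<K,V> element.
type kv[K comparable, V number] struct {
	Key   K
	Value V
}

// combinePerKey folds all values that share a key with combine. The output
// value type is the same as the input value type.
func combinePerKey[K comparable, V number](pairs []kv[K, V], combine func(a, b V) V) map[K]V {
	out := make(map[K]V)
	for _, p := range pairs {
		if cur, ok := out[p.Key]; ok {
			out[p.Key] = combine(cur, p.Value)
		} else {
			out[p.Key] = p.Value
		}
	}
	return out
}

func main() {
	pairs := []kv[string, int]{{"a", 1}, {"a", 11}, {"b", 7}, {"b", 5}, {"a", 10}}

	sum := func(a, b int) int { return a + b }
	larger := func(a, b int) int {
		if a > b {
			return a
		}
		return b
	}

	fmt.Println(combinePerKey(pairs, sum))    // map[a:22 b:12]
	fmt.Println(combinePerKey(pairs, larger)) // map[a:11 b:7]
}
```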

Types

type Opts

type Opts struct {
	// Controls the memory used and the approximation error (the difference between the quantile returned and the true quantile).
	K int
	// Number of quantile groups to split the input into. The algorithm returns NumQuantiles - 1 split points.
	NumQuantiles int
	// For extremely large datasets, runners may have issues with out of memory errors or taking too long to finish.
	// If ApproximateQuantiles is failing, you can use this option to tune how the data is sharded internally.
	// This parameter is optional. If unspecified, Beam will compact all elements into a single compactor at once using a single machine.
	// For example, if this is set to [8, 4, 2]: First, elements will be assigned to 8 shards which will run in parallel. Then the intermediate results from those 8 shards will be reassigned to 4 shards and merged in parallel. Then once again to 2 shards. Finally the intermediate results of those two shards will be merged on one machine before returning the final result.
	InternalSharding []int
}

Opts contains settings used to configure how approximate quantiles are computed.
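The staged reduction described for InternalSharding can be pictured with a small in-memory sketch. It only models the shard schedule: sorting stands in for compaction, whereas real compactors keep bounded summaries rather than every element. The functions regroup and mergeInStages are hypothetical and say nothing about how Beam assigns elements to shards.

```go
package main

import (
	"fmt"
	"sort"
)

// regroup redistributes the given shards round-robin into n new shards and
// sorts each one. Sorting here is a stand-in for compaction.
func regroup(shards [][]int, n int) [][]int {
	out := make([][]int, n)
	for i, sh := range shards {
		out[i%n] = append(out[i%n], sh...)
	}
	for _, sh := range out {
		sort.Ints(sh)
	}
	return out
}

// mergeInStages applies each shard count in plan in order, then a final merge
// into one shard. It returns the merged result and the number of shards used
// at each stage. Hypothetical helper: not part of the stats package.
func mergeInStages(elems []int, plan []int) ([]int, []int) {
	// Treat each input element as its own one-element shard to begin with.
	shards := make([][]int, len(elems))
	for i, e := range elems {
		shards[i] = []int{e}
	}

	var widths []int
	// The trailing 1 models the final merge on a single machine.
	for _, n := range append(append([]int(nil), plan...), 1) {
		if n < 1 {
			continue // ignore invalid shard counts in this sketch
		}
		shards = regroup(shards, n)
		widths = append(widths, len(shards))
	}
	return shards[0], widths
}

func main() {
	result, widths := mergeInStages([]int{5, 3, 9, 1, 7, 2, 8, 4, 6, 0}, []int{8, 4, 2})
	fmt.Println(widths) // [8 4 2 1]
	fmt.Println(result) // [0 1 2 3 4 5 6 7 8 9]
}
```

Passing an empty plan corresponds to leaving InternalSharding unspecified: everything goes through a single stage of width 1.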
