Documentation ¶
Overview ¶
Package stats contains transforms for statistical processing.
Index ¶
- func ApproximateQuantiles(s beam.Scope, pc beam.PCollection, less interface{}, opts Opts) beam.PCollection
- func ApproximateWeightedQuantiles(s beam.Scope, pc beam.PCollection, less interface{}, opts Opts) beam.PCollection
- func Count(s beam.Scope, col beam.PCollection) beam.PCollection
- func CountElms(s beam.Scope, col beam.PCollection) beam.PCollection
- func Max(s beam.Scope, col beam.PCollection) beam.PCollection
- func MaxPerKey(s beam.Scope, col beam.PCollection) beam.PCollection
- func Mean(s beam.Scope, col beam.PCollection) beam.PCollection
- func MeanPerKey(s beam.Scope, col beam.PCollection) beam.PCollection
- func Min(s beam.Scope, col beam.PCollection) beam.PCollection
- func MinPerKey(s beam.Scope, col beam.PCollection) beam.PCollection
- func Sum(s beam.Scope, col beam.PCollection) beam.PCollection
- func SumPerKey(s beam.Scope, col beam.PCollection) beam.PCollection
- type Opts
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ApproximateQuantiles ¶
func ApproximateQuantiles(s beam.Scope, pc beam.PCollection, less interface{}, opts Opts) beam.PCollection
ApproximateQuantiles computes approximate quantiles for the input PCollection<T>.
The output PCollection contains a single element: a list of NumQuantiles - 1 elements that approximately split the input collection into NumQuantiles separate quantiles, where NumQuantiles is taken from opts. For example, if NumQuantiles = 2, the returned list contains a single element such that approximately half of the input is less than that element and half is greater.
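To make the shape of the result concrete, the following is a minimal plain-Go sketch that computes exact split points on an in-memory slice. It is not the Beam implementation and is not approximate; the helper name exactQuantileBoundaries and the index rule it uses are assumptions chosen for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// exactQuantileBoundaries is a hypothetical helper, not part of the stats
// package. It sorts a copy of the input and returns the numQuantiles - 1
// elements that split it into numQuantiles roughly equal parts.
func exactQuantileBoundaries(values []int, numQuantiles int) []int {
	if numQuantiles < 2 || len(values) == 0 {
		return nil
	}
	sorted := append([]int(nil), values...)
	sort.Ints(sorted)

	boundaries := make([]int, 0, numQuantiles-1)
	for i := 1; i < numQuantiles; i++ {
		// Element at the i-th split point of the sorted input.
		boundaries = append(boundaries, sorted[i*len(sorted)/numQuantiles])
	}
	return boundaries
}

func main() {
	input := []int{8, 3, 5, 1, 7, 2, 6, 4}
	fmt.Println(exactQuantileBoundaries(input, 2)) // [5]
	fmt.Println(exactQuantileBoundaries(input, 4)) // [3 5 7]
}
```

With two quantiles the single boundary is 5: four of the eight inputs are less than it and four are greater or equal. The approximate transform trades this exactness for bounded memory.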
func ApproximateWeightedQuantiles ¶
func ApproximateWeightedQuantiles(s beam.Scope, pc beam.PCollection, less interface{}, opts Opts) beam.PCollection
ApproximateWeightedQuantiles computes approximate quantiles for the input PCollection<(weight int, T)>.
The output PCollection contains a single element: a list of NumQuantiles - 1 elements that approximately split the input collection into NumQuantiles separate quantiles, where NumQuantiles is taken from opts. For example, if NumQuantiles = 2, the returned list contains a single element such that approximately half of the input is less than that element and half is greater or equal.
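One way to read the weighted variant is that an element with weight w counts as w copies of its value. The plain-Go sketch below follows that reading on an in-memory slice and computes exact split points. The weighted type, the helper name, and the interpretation of weights as repetition counts are assumptions for illustration, not a description of Beam's algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// weighted pairs a value with an integer weight, mirroring the
// (weight int, T) element shape described above. The type is hypothetical.
type weighted struct {
	Weight int
	Value  int
}

// exactWeightedBoundaries is a hypothetical helper. It behaves as if every
// value were repeated Weight times, then returns the numQuantiles - 1 values
// at the split points of that expanded, sorted input.
func exactWeightedBoundaries(elems []weighted, numQuantiles int) []int {
	if numQuantiles < 2 || len(elems) == 0 {
		return nil
	}
	sorted := append([]weighted(nil), elems...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Value < sorted[j].Value })

	total := 0
	for _, e := range sorted {
		total += e.Weight
	}

	boundaries := make([]int, 0, numQuantiles-1)
	cumulative, next := 0, 1
	for _, e := range sorted {
		cumulative += e.Weight
		// Emit this value for every split point that falls inside its weight range.
		for next < numQuantiles && cumulative > next*total/numQuantiles {
			boundaries = append(boundaries, e.Value)
			next++
		}
	}
	return boundaries
}

func main() {
	elems := []weighted{{Weight: 1, Value: 10}, {Weight: 3, Value: 20}, {Weight: 1, Value: 30}}
	fmt.Println(exactWeightedBoundaries(elems, 2)) // [20]
}
```

Here the value 20 carries three of the five units of weight, so it is the single boundary when splitting into two quantiles.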
func Count ¶
func Count(s beam.Scope, col beam.PCollection) beam.PCollection
Count counts the number of appearances of each element in a collection. It expects a PCollection<T> as input and returns a PCollection<KV<T,int>>. T's encoding must be deterministic so it is valid as a key.
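The result shape of Count can be pictured with a plain-Go model in which each distinct element becomes a key and its number of appearances becomes the value. This is only a local sketch on a slice, not Beam code; the helper name countPerElement is hypothetical.

```go
package main

import "fmt"

// countPerElement is a hypothetical local model of the result shape: each
// distinct element becomes a key, paired with its number of appearances.
func countPerElement(elems []string) map[string]int {
	counts := make(map[string]int)
	for _, e := range elems {
		counts[e]++
	}
	return counts
}

func main() {
	counts := countPerElement([]string{"a", "b", "a", "c", "a"})
	fmt.Println(counts["a"], counts["b"], counts["c"]) // 3 1 1
}
```

Just as a Go map needs comparable keys, the transform needs elements that encode deterministically, because equal elements must be grouped under the same key.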
func CountElms ¶
func CountElms(s beam.Scope, col beam.PCollection) beam.PCollection
CountElms counts the number of elements in a collection. It expects a PCollection<T> as input and returns a PCollection<int> of one element containing the count.
func Max ¶
func Max(s beam.Scope, col beam.PCollection) beam.PCollection
Max returns the maximal element in a PCollection<A> as a singleton PCollection<A>. It can only be used for numbers, such as int, uint16, float32, etc.
For example:
	col := beam.Create(s, 1, 11, 7, 5, 10)
	max := stats.Max(s, col) // PCollection<int> with 11 as the only element.
func MaxPerKey ¶
func MaxPerKey(s beam.Scope, col beam.PCollection) beam.PCollection
MaxPerKey returns the maximal element per key in a PCollection<KV<A,B>> as a PCollection<KV<A,B>>. It can only be used for numbers, such as int, uint16, float32, etc.
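As a rough illustration of the per-key semantics, the plain-Go sketch below keeps the largest value seen for each key. It operates on an in-memory slice rather than a PCollection, and the kv type and maxPerKey helper are hypothetical names used only for this example.

```go
package main

import "fmt"

// kv is a hypothetical stand-in for a KV<A,B> element with a string key
// and an int value.
type kv struct {
	Key   string
	Value int
}

// maxPerKey is a hypothetical local model: it keeps the largest value seen
// for each key, so the output has one entry per distinct key.
func maxPerKey(pairs []kv) map[string]int {
	result := make(map[string]int)
	for _, p := range pairs {
		if cur, ok := result[p.Key]; !ok || p.Value > cur {
			result[p.Key] = p.Value
		}
	}
	return result
}

func main() {
	pairs := []kv{{"a", 1}, {"b", 7}, {"a", 11}, {"b", 5}}
	result := maxPerKey(pairs)
	fmt.Println(result["a"], result["b"]) // 11 7
}
```

The value type is unchanged between input and output, which matches the PCollection<KV<A,B>> to PCollection<KV<A,B>> signature described above.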
func Mean ¶
func Mean(s beam.Scope, col beam.PCollection) beam.PCollection
Mean returns the arithmetic mean (or average) of the elements in a collection. It expects a PCollection<A> as input and returns a singleton PCollection<float64>. It can only be used for numbers, such as int, uint16, float32, etc.
For example:
	col := beam.Create(s, 1, 11, 7, 5, 10)
	mean := stats.Mean(s, col) // PCollection<float64> with 6.8 as the only element.
func MeanPerKey ¶
func MeanPerKey(s beam.Scope, col beam.PCollection) beam.PCollection
MeanPerKey returns the arithmetic mean (or average) for each key of the elements in a collection. It expects a PCollection<KV<A,B>> as input and returns a PCollection<KV<A,float64>>. It can only be used for numbers, such as int, uint16, float32, etc.
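Unlike the other per-key transforms, the value type changes here: whatever numeric type B is, the output values are float64. The plain-Go sketch below models that on an in-memory slice; the kv type and meanPerKey helper are hypothetical and are not part of the stats package.

```go
package main

import "fmt"

// kv is a hypothetical stand-in for a KV<A,B> element with a string key
// and an int value.
type kv struct {
	Key   string
	Value int
}

// meanPerKey is a hypothetical local model: it accumulates a sum and a count
// per key, then divides, so int inputs yield float64 means.
func meanPerKey(pairs []kv) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, p := range pairs {
		sums[p.Key] += float64(p.Value)
		counts[p.Key]++
	}
	means := make(map[string]float64, len(sums))
	for key, sum := range sums {
		means[key] = sum / float64(counts[key])
	}
	return means
}

func main() {
	means := meanPerKey([]kv{{"a", 1}, {"a", 2}, {"b", 10}})
	fmt.Println(means["a"], means["b"]) // 1.5 10
}
```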
func Min ¶
func Min(s beam.Scope, col beam.PCollection) beam.PCollection
Min returns the minimal element in a PCollection<A> as a singleton PCollection<A>. It can only be used for numbers, such as int, uint16, float32, etc.
For example:
	col := beam.Create(s, 1, 11, 7, 5, 10)
	min := stats.Min(s, col) // PCollection<int> with 1 as the only element.
func MinPerKey ¶
func MinPerKey(s beam.Scope, col beam.PCollection) beam.PCollection
MinPerKey returns the minimal element per key in a PCollection<KV<A,B>> as a PCollection<KV<A,B>>. It can only be used for numbers, such as int, uint16, float32, etc.
func Sum ¶
func Sum(s beam.Scope, col beam.PCollection) beam.PCollection
Sum returns the sum of the elements in a PCollection<A> as a singleton PCollection<A>. It can only be used for numbers, such as int, uint16, float32, etc.
For example:
	col := beam.Create(s, 1, 11, 7, 5, 10)
	sum := stats.Sum(s, col) // PCollection<int> with 34 as the only element.
func SumPerKey ¶
func SumPerKey(s beam.Scope, col beam.PCollection) beam.PCollection
SumPerKey returns the sum of the values per key in a PCollection<KV<A,B>> as a PCollection<KV<A,B>>. It can only be used when the values are numbers, such as int, uint16, float32, etc.
Types ¶
type Opts ¶
type Opts struct {
	// Controls the memory used and approximation error (difference between
	// the quantile returned and the true quantile).
	K int

	// Number of quantiles to return. The algorithm will return
	// NumQuantiles - 1 numbers.
	NumQuantiles int

	// For extremely large datasets, runners may have issues with out of
	// memory errors or taking too long to finish. If ApproximateQuantiles is
	// failing, you can use this option to tune how the data is sharded
	// internally.
	//
	// This parameter is optional. If unspecified, Beam will compact all
	// elements into a single compactor at once using a single machine.
	//
	// For example, if this is set to [8, 4, 2]: First, elements will be
	// assigned to 8 shards which will run in parallel. Then the intermediate
	// results from those 8 shards will be reassigned to 4 shards and merged
	// in parallel. Then once again to 2 shards. Finally the intermediate
	// results of those two shards will be merged on one machine before
	// returning the final result.
	InternalSharding []int
}
Opts contains settings used to configure how approximate quantiles are computed.
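The staged fan-in described for InternalSharding can be pictured with a small plain-Go model. The sketch below assigns values to shards, merges them into progressively fewer shards, and finishes with a single merge. It is an assumption-laden illustration: the helper names reshard and stagedMerge are hypothetical, the round-robin assignment is an arbitrary choice, and each shard simply keeps a sorted slice of all its values, whereas a real compactor would discard elements to bound memory.

```go
package main

import (
	"fmt"
	"sort"
)

// reshard redistributes the given shards into n new shards (round-robin by
// shard index, an arbitrary choice for this sketch) and sorts each result.
func reshard(shards [][]int, n int) [][]int {
	out := make([][]int, n)
	for i, shard := range shards {
		out[i%n] = append(out[i%n], shard...)
	}
	for _, shard := range out {
		sort.Ints(shard)
	}
	return out
}

// stagedMerge is a hypothetical model of the fan-in described above: values
// go to sharding[0] shards, each later stage merges into the next shard
// count, and a final step merges everything into one sorted slice.
func stagedMerge(values []int, sharding []int) []int {
	// Start with one single-element shard per value.
	shards := make([][]int, len(values))
	for i, v := range values {
		shards[i] = []int{v}
	}
	for _, n := range sharding {
		if n < 1 {
			continue // Ignore invalid shard counts in this sketch.
		}
		shards = reshard(shards, n)
	}
	// Final merge on "one machine".
	return reshard(shards, 1)[0]
}

func main() {
	values := []int{9, 4, 7, 1, 8, 2, 6, 3, 5, 0}
	fmt.Println(stagedMerge(values, []int{8, 4, 2})) // [0 1 2 3 4 5 6 7 8 9]
	fmt.Println(stagedMerge(values, nil))            // [0 1 2 3 4 5 6 7 8 9]
}
```

Both calls produce the same final result. The intermediate stages only change how much work each merge step handles at once, which is the tuning knob the option exposes.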