Documentation ¶
Overview ¶
Package transform provides operations that transform DataFrame rows.
Index ¶
- func AddColumn(accessor sif.ColumnAccessor) *sif.DataFrameOperation
- func Filter(fn sif.FilterOperation) *sif.DataFrameOperation
- func FlatMap(fn sif.FlatMapOperation) *sif.DataFrameOperation
- func Group(kfn sif.KeyingOperation) *sif.DataFrameOperation
- func KeyColumns(accessors ...sif.ColumnAccessor) sif.KeyingOperation
- func Map(fn sif.MapOperation) *sif.DataFrameOperation
- func Reduce(kfn sif.KeyingOperation, fn sif.ReductionOperation) *sif.DataFrameOperation
- func RemoveColumn(accessors ...sif.ColumnAccessor) *sif.DataFrameOperation
- func RenameColumn(accessor sif.ColumnAccessor, newAccessor sif.ColumnAccessor) *sif.DataFrameOperation
- func Repartition(targetPartitionSize int, kfn sif.KeyingOperation) *sif.DataFrameOperation
- func RepartitionReduce(targetPartitionSize int, kfn sif.KeyingOperation, fn sif.ReductionOperation) *sif.DataFrameOperation
Functions ¶
func AddColumn ¶
func AddColumn(accessor sif.ColumnAccessor) *sif.DataFrameOperation
AddColumn declares that a new (empty) column with a specific type and name should be available to the next Task of the DataFrame pipeline.
func Filter ¶
func Filter(fn sif.FilterOperation) *sif.DataFrameOperation
Filter removes Rows from a Partition, producing a new Partition. A Row is retained if and only if the FilterOperation returns true for it.
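The retention rule can be sketched in plain Go, independent of the sif API. `Row` and `filterRows` below are hypothetical names used only for illustration; they are not sif types, and the real FilterOperation signature may differ.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for a DataFrame row; it is not sif.Row.
type Row map[string]int

// filterRows returns a new slice holding only the rows for which keep
// returns true. The input slice is left untouched.
func filterRows(rows []Row, keep func(Row) bool) []Row {
	out := make([]Row, 0, len(rows))
	for _, r := range rows {
		if keep(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []Row{{"age": 17}, {"age": 42}, {"age": 30}}
	adults := filterRows(rows, func(r Row) bool { return r["age"] >= 18 })
	fmt.Println(len(adults), "of", len(rows), "rows retained")
}
```

The predicate decides only whether a row stays; it does not modify the row.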
func FlatMap ¶
func FlatMap(fn sif.FlatMapOperation) *sif.DataFrameOperation
FlatMap transforms a Row, potentially producing multiple new Rows.
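As a rough illustration of the one-to-many shape of a flat map, the sketch below uses plain strings in place of rows. `flatMap` and `splitWords` are hypothetical helpers, not part of sif, and the real FlatMapOperation signature is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords turns one input line into zero or more output values.
func splitWords(line string) []string {
	return strings.Fields(line)
}

// flatMap applies fn to every input and concatenates all results in order.
// An input may contribute zero, one, or many outputs.
func flatMap(rows []string, fn func(string) []string) []string {
	var out []string
	for _, r := range rows {
		out = append(out, fn(r)...)
	}
	return out
}

func main() {
	lines := []string{"a b", "", "c"}
	fmt.Println(flatMap(lines, splitWords))
}
```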
func Group ¶
func Group(kfn sif.KeyingOperation) *sif.DataFrameOperation
Group shuffles rows across workers using a key. This is useful for gathering rows that share a key onto a single worker.
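One common way to implement a keyed shuffle is to hash each key and take the result modulo the number of workers, so that equal keys always land on the same worker. The sketch below assumes that scheme purely for illustration; the source does not say how sif assigns keys to workers, and `Row`, `workerFor`, and `shuffle` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Row is a hypothetical keyed row; it is not a sif type.
type Row struct {
	Key   string
	Value int
}

// workerFor maps a key to one of n workers using an FNV-1a hash.
// Hash-modulo assignment is an assumption made for this sketch.
func workerFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// shuffle distributes rows into n buckets, one per worker.
// Rows with equal keys always end up in the same bucket.
func shuffle(rows []Row, n int) [][]Row {
	buckets := make([][]Row, n)
	for _, r := range rows {
		w := workerFor(r.Key, n)
		buckets[w] = append(buckets[w], r)
	}
	return buckets
}

func main() {
	rows := []Row{{"a", 1}, {"b", 2}, {"a", 3}, {"c", 4}, {"b", 5}}
	for w, b := range shuffle(rows, 3) {
		fmt.Println("worker", w, "holds", len(b), "rows")
	}
}
```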
func KeyColumns ¶
func KeyColumns(accessors ...sif.ColumnAccessor) sif.KeyingOperation
KeyColumns is a shortcut for defining a KeyingOperation which uses multiple source column values to produce a compound key.
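To make the idea of a compound key concrete, the sketch below builds a single key string from several column values. The length-prefix encoding and the names `compoundKey`, `country`, and `city` are assumptions for illustration only; the source does not describe how KeyColumns encodes its key.

```go
package main

import (
	"fmt"
	"strings"
)

// compoundKey builds one key from the values of the named columns.
// Each value is length-prefixed so that ("ab", "c") and ("a", "bc")
// produce different keys.
func compoundKey(row map[string]string, cols ...string) string {
	var b strings.Builder
	for _, c := range cols {
		v := row[c]
		fmt.Fprintf(&b, "%d:%s", len(v), v)
	}
	return b.String()
}

func main() {
	row := map[string]string{"country": "ab", "city": "c"}
	fmt.Println(compoundKey(row, "country", "city"))
}
```

Plain concatenation would be shorter, but it lets distinct value combinations collide on the same key, which would silently merge unrelated groups.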
func Reduce ¶
func Reduce(kfn sif.KeyingOperation, fn sif.ReductionOperation) *sif.DataFrameOperation
Reduce combines rows across workers, merging rows that share a key.
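A keyed reduction can be sketched locally as folding every row with the same key into one accumulated row. The code below is plain Go and runs on a single machine; `Row`, `reduceByKey`, and the word-count example are hypothetical, and the actual KeyingOperation and ReductionOperation signatures may differ.

```go
package main

import "fmt"

// Row is a hypothetical row holding a word and a count; it is not a sif type.
type Row struct {
	Word  string
	Count int
}

// reduceByKey folds all rows that share a key into a single row.
// The first row seen for a key seeds the accumulator.
func reduceByKey(rows []Row, key func(Row) string, combine func(acc, next Row) Row) map[string]Row {
	out := make(map[string]Row)
	for _, r := range rows {
		k := key(r)
		if acc, ok := out[k]; ok {
			out[k] = combine(acc, r)
		} else {
			out[k] = r
		}
	}
	return out
}

func main() {
	rows := []Row{{"a", 1}, {"b", 1}, {"a", 1}}
	totals := reduceByKey(rows,
		func(r Row) string { return r.Word },
		func(acc, next Row) Row { acc.Count += next.Count; return acc },
	)
	fmt.Println(totals["a"].Count, totals["b"].Count)
}
```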
func RemoveColumn ¶
func RemoveColumn(accessors ...sif.ColumnAccessor) *sif.DataFrameOperation
RemoveColumn marks existing columns for removal at the end of the current stage.
func RenameColumn ¶
func RenameColumn(accessor sif.ColumnAccessor, newAccessor sif.ColumnAccessor) *sif.DataFrameOperation
RenameColumn renames an existing column.
func Repartition ¶
func Repartition(targetPartitionSize int, kfn sif.KeyingOperation) *sif.DataFrameOperation
Repartition is identical to Group, with the added ability to change the number of rows per partition during the shuffle.
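The effect of a target partition size can be sketched as re-chunking a sequence of rows so that each partition holds at most that many rows. This sketch ignores keying and uses integers in place of rows; `repartition` is a hypothetical helper, and the way sif treats a non-positive size is not stated in the source.

```go
package main

import "fmt"

// repartition splits rows into consecutive partitions of at most
// targetPartitionSize rows each. A non-positive size leaves all rows
// in one partition (an assumption made for this sketch).
func repartition(rows []int, targetPartitionSize int) [][]int {
	if targetPartitionSize <= 0 {
		return [][]int{rows}
	}
	var parts [][]int
	for start := 0; start < len(rows); start += targetPartitionSize {
		end := start + targetPartitionSize
		if end > len(rows) {
			end = len(rows)
		}
		parts = append(parts, rows[start:end])
	}
	return parts
}

func main() {
	rows := []int{1, 2, 3, 4, 5, 6, 7}
	for i, p := range repartition(rows, 3) {
		fmt.Println("partition", i, "has", len(p), "rows")
	}
}
```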
func RepartitionReduce ¶
func RepartitionReduce(targetPartitionSize int, kfn sif.KeyingOperation, fn sif.ReductionOperation) *sif.DataFrameOperation
RepartitionReduce is identical to Reduce, with the added ability to change the number of rows per partition during the reduction.