Documentation ¶
Overview ¶
Package data provides primitives for representing and organizing data sets. In addition to the traditional sharded data set, in which every node holds a replica of the data, it supports a partitioned data set in which the data is split into partitions distributed across the nodes of the cluster.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Dataset ¶
type Dataset interface {
	// Getitem retrieves a data sample with the given arguments. This must provide
	// an index identifying the scheduled data sample and its size.
	Getitem(rank, size int) (int, int)

	// Rand retrieves an arbitrary data sample from the data set.
	Rand(rank int) (int, int)

	// OnEpochEnd is called at the end of an epoch during training.
	OnEpochEnd(epoch int64)

	// OnTrainEnd terminates the training environment.
	OnTrainEnd()
}
Dataset represents the given data set. All implementations must embed DatasetBase for forward compatibility.
type DatasetBase ¶
type DatasetBase struct { }
DatasetBase must be embedded to have forward compatible implementations.
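The embedding requirement can be illustrated with a minimal sketch. Because the import path of this package is not shown here, the example declares local stand-ins for Dataset and DatasetBase that mirror the documented signatures. The no-op defaults on the stand-in DatasetBase are an assumption, and fixedDataset with its rank-based lookup is a hypothetical implementation, not part of the package.

```go
package main

import "fmt"

// Dataset is a local stand-in mirroring the documented interface.
type Dataset interface {
	Getitem(rank, size int) (int, int)
	Rand(rank int) (int, int)
	OnEpochEnd(epoch int64)
	OnTrainEnd()
}

// DatasetBase is a local stand-in. Assumption: the default methods are
// no-ops that return zero values; the real defaults are not documented.
type DatasetBase struct{}

func (DatasetBase) Getitem(rank, size int) (_, _ int) { return 0, 0 }
func (DatasetBase) Rand(rank int) (_, _ int)          { return 0, 0 }
func (DatasetBase) OnEpochEnd(epoch int64)            {}
func (DatasetBase) OnTrainEnd()                       {}

// fixedDataset is a hypothetical implementation. It embeds DatasetBase and
// overrides only Getitem; the other methods are promoted from DatasetBase.
type fixedDataset struct {
	DatasetBase
	sizes []int
}

// Getitem returns the sample at position rank modulo len(sizes) and its size.
// It assumes rank >= 0 and a non-empty sizes slice.
func (d fixedDataset) Getitem(rank, size int) (int, int) {
	i := rank % len(d.sizes)
	return i, d.sizes[i]
}

func main() {
	var d Dataset = fixedDataset{sizes: []int{8, 16, 32}}
	idx, size := d.Getitem(1, 0) // overridden method
	fmt.Println(idx, size)       // prints "1 16"
	d.OnEpochEnd(0)              // promoted from DatasetBase
	d.OnTrainEnd()               // promoted from DatasetBase
}
```

If a method were later added to the interface together with a default on DatasetBase, a type written this way would keep compiling, which is presumably what forward compatibility refers to here.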
func (DatasetBase) Getitem ¶
func (DatasetBase) Getitem(rank, size int) (_, _ int)
func (DatasetBase) OnEpochEnd ¶
func (DatasetBase) OnEpochEnd(epoch int64)
func (DatasetBase) OnTrainEnd ¶
func (DatasetBase) OnTrainEnd()
func (DatasetBase) Rand ¶
func (DatasetBase) Rand(rank int) (_, _ int)
type PartitionedDataset ¶
type PartitionedDataset struct {
	DatasetBase
	// contains filtered or unexported fields
}
PartitionedDataset represents a partitioned data set where each of the nodes in the cluster holds only a portion of the given data set.
func NewPartitionedDataset ¶
func NewPartitionedDataset(sizes, groups []int) *PartitionedDataset
NewPartitionedDataset creates a new partitioned data set with the given arguments.
func (*PartitionedDataset) Getitem ¶
func (d *PartitionedDataset) Getitem(rank, size int) (_, _ int)
Getitem looks for the data sample with the size nearest to the given size in the partition with the given rank.
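A rank-scoped nearest-size lookup of the kind described above might look like the following sketch. The helper names nearest and getitem, the [][]int layout of the partitions, and the (-1, 0) result for a missing rank or empty partition are all assumptions made for illustration; the actual fields and error behavior of PartitionedDataset are unexported and not documented here.

```go
package main

import "fmt"

// nearest returns the position in sizes whose value is closest to target,
// or -1 if sizes is empty. Ties resolve to the earlier position.
func nearest(sizes []int, target int) int {
	best, bestDiff := -1, 0
	for i, s := range sizes {
		diff := s - target
		if diff < 0 {
			diff = -diff
		}
		if best == -1 || diff < bestDiff {
			best, bestDiff = i, diff
		}
	}
	return best
}

// getitem searches only partitions[rank] and returns the partition-local
// index and the size of the nearest sample. It returns (-1, 0) when the
// rank is out of range or the partition is empty (a choice of this sketch).
func getitem(partitions [][]int, rank, size int) (int, int) {
	if rank < 0 || rank >= len(partitions) {
		return -1, 0
	}
	i := nearest(partitions[rank], size)
	if i < 0 {
		return -1, 0
	}
	return i, partitions[rank][i]
}

func main() {
	// Hypothetical layout: one slice of sample sizes per rank.
	partitions := [][]int{{4, 64, 128}, {16, 32}}
	fmt.Println(getitem(partitions, 0, 30)) // prints "0 4"
	fmt.Println(getitem(partitions, 1, 30)) // prints "1 32"
}
```

The same target size yields different samples on different ranks because each rank sees only its own partition.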
func (*PartitionedDataset) OnEpochEnd ¶
func (d *PartitionedDataset) OnEpochEnd(epoch int64)
OnEpochEnd restores the data partitions.
func (*PartitionedDataset) OnTrainEnd ¶
func (d *PartitionedDataset) OnTrainEnd()
OnTrainEnd terminates the training environment.
func (*PartitionedDataset) Rand ¶
func (d *PartitionedDataset) Rand(rank int) (_, _ int)
Rand selects a random data sample from the data set.
type Sample ¶
Sample represents a single data sample in the data set.
type ShardedDataset ¶
type ShardedDataset struct {
	DatasetBase
	// contains filtered or unexported fields
}
ShardedDataset represents a sharded data set where every node in the cluster has a replica of the given data set; hence it ignores rank when looking for the data sample.
func NewShardedDataset ¶
func NewShardedDataset(sizes []int) *ShardedDataset
NewShardedDataset creates a new sharded data set with the given argument.
func (*ShardedDataset) Getitem ¶
func (d *ShardedDataset) Getitem(rank, size int) (_, _ int)
Getitem looks for the data sample with the size nearest to the given size.
func (*ShardedDataset) OnEpochEnd ¶
func (d *ShardedDataset) OnEpochEnd(epoch int64)
OnEpochEnd restores the data samples.
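The word "restores" suggests that samples handed out during an epoch become unavailable until the epoch ends. The documentation does not confirm this, so the following is only a sketch under that assumption. The epochPool type and its take and restore methods are hypothetical names, not part of this package.

```go
package main

import "fmt"

// epochPool is a hypothetical pool of sample sizes. used marks the samples
// already handed out during the current epoch.
type epochPool struct {
	sizes []int
	used  []bool
}

func newEpochPool(sizes []int) *epochPool {
	return &epochPool{sizes: sizes, used: make([]bool, len(sizes))}
}

// take returns the index and size of the unused sample whose size is closest
// to target and marks it as used. It returns (-1, 0) when none remain.
func (p *epochPool) take(target int) (int, int) {
	best, bestDiff := -1, 0
	for i, s := range p.sizes {
		if p.used[i] {
			continue
		}
		diff := s - target
		if diff < 0 {
			diff = -diff
		}
		if best == -1 || diff < bestDiff {
			best, bestDiff = i, diff
		}
	}
	if best == -1 {
		return -1, 0
	}
	p.used[best] = true
	return best, p.sizes[best]
}

// restore makes every sample available again, as an epoch-end hook might.
func (p *epochPool) restore() {
	for i := range p.used {
		p.used[i] = false
	}
}

func main() {
	p := newEpochPool([]int{10, 20, 30})
	fmt.Println(p.take(19)) // prints "1 20"
	fmt.Println(p.take(19)) // prints "0 10" because sample 1 is already used
	p.restore()             // end of epoch
	fmt.Println(p.take(19)) // prints "1 20" again
}
```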
func (*ShardedDataset) OnTrainEnd ¶
func (d *ShardedDataset) OnTrainEnd()
OnTrainEnd terminates the training environment.
func (*ShardedDataset) Rand ¶
func (d *ShardedDataset) Rand(rank int) (_, _ int)
Rand selects a random data sample from the data set.