Documentation ¶
Overview ¶
Package data provides primitives for representing and organizing data sets. In addition to the traditional sharded data set, it supports a partitioned data set, in which the data is split into partitions distributed across the nodes in the cluster.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Dataset ¶
type Dataset interface {
	// Getitem retrieves a data sample with the given arguments. This must provide
	// an index identifying the scheduled data sample and its size.
	Getitem(rank int, size int64) (int64, int64)

	// Rand retrieves an arbitrary data sample from the data set.
	Rand(rank int) (int64, int64)

	// OnEpochEnd is called at the end of an epoch during training.
	OnEpochEnd(epoch int64)

	// OnTrainEnd terminates the training environment.
	OnTrainEnd()
}
Dataset represents a data set. In addition to Getitem and Rand, which retrieve data samples, an implementation must provide the callbacks OnEpochEnd and OnTrainEnd, which are called at the end of each epoch and at the end of training, respectively.
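To illustrate the contract described above, here is a minimal sketch of a custom implementation and a loop that drives it. The interface is redeclared locally so the example compiles on its own. The names fixedDataset and train are hypothetical and not part of this package, and the interpretation of the two return values as (index, size) follows the Getitem comment.

```go
package main

import "fmt"

// Dataset mirrors the interface documented above so the sketch compiles
// on its own; real code would import it from this package instead.
type Dataset interface {
	Getitem(rank int, size int64) (int64, int64)
	Rand(rank int) (int64, int64)
	OnEpochEnd(epoch int64)
	OnTrainEnd()
}

// fixedDataset is a hypothetical implementation in which every sample has
// the same size and samples are handed out in order.
type fixedDataset struct {
	n, size int64 // number of samples and the size shared by all of them
	next    int64 // index of the next sample to hand out
	epochs  int64 // number of completed epochs
	closed  bool  // set once OnTrainEnd has been called
}

// Getitem ignores rank and the requested size and returns the next sample
// in order as an (index, size) pair, wrapping around at the end.
func (d *fixedDataset) Getitem(rank int, size int64) (int64, int64) {
	i := d.next
	d.next = (d.next + 1) % d.n
	return i, d.size
}

// Rand is deterministic in this toy: it always returns the first sample.
func (d *fixedDataset) Rand(rank int) (int64, int64) { return 0, d.size }

// OnEpochEnd rewinds the cursor and counts the completed epoch.
func (d *fixedDataset) OnEpochEnd(epoch int64) { d.next = 0; d.epochs++ }

// OnTrainEnd records that training has finished.
func (d *fixedDataset) OnTrainEnd() { d.closed = true }

// train drives any Dataset through the callback sequence described above
// and returns the total size of the samples it retrieved.
func train(ds Dataset, epochs, steps int64) int64 {
	var total int64
	for e := int64(0); e < epochs; e++ {
		for s := int64(0); s < steps; s++ {
			_, size := ds.Getitem(0, 128)
			total += size
		}
		ds.OnEpochEnd(e)
	}
	ds.OnTrainEnd()
	return total
}

func main() {
	ds := &fixedDataset{n: 4, size: 100}
	fmt.Println(train(ds, 2, 3), ds.epochs, ds.closed)
}
```

Because train depends only on the interface, the same loop would accept a *ShardedDataset or *PartitionedDataset in place of the toy type.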
type PartitionedDataset ¶
type PartitionedDataset struct {
// contains filtered or unexported fields
}
PartitionedDataset represents a partitioned data set where each of the nodes in the cluster holds only a portion of the given data set.
func NewPartitionedDataset ¶
func NewPartitionedDataset(sizes, groups []int64, seed int64) *PartitionedDataset
NewPartitionedDataset creates a new partitioned data set with the given arguments.
func (*PartitionedDataset) Getitem ¶
func (d *PartitionedDataset) Getitem(rank int, size int64) (_, _ int64)
Getitem looks for the data sample with the size nearest to the given size in the partition with the given rank.
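The lookup described above can be sketched as a linear scan restricted to one partition. This is not the package's implementation. In particular, the sketch assumes that groups[i] names the rank that owns sample i, which the documentation does not state, and nearestInPartition is a hypothetical helper.

```go
package main

import "fmt"

// nearestInPartition returns the index and size of the sample whose size is
// closest to want among the samples assigned to rank. It returns (-1, 0)
// when the partition is empty. Ties go to the sample with the lower index.
// Assumption: groups[i] is the rank that owns sample i.
func nearestInPartition(sizes, groups []int64, rank int, want int64) (int64, int64) {
	best := int64(-1)
	var bestDiff int64
	for i, s := range sizes {
		if groups[i] != int64(rank) {
			continue // sample belongs to another partition
		}
		diff := s - want
		if diff < 0 {
			diff = -diff
		}
		if best < 0 || diff < bestDiff {
			best, bestDiff = int64(i), diff
		}
	}
	if best < 0 {
		return -1, 0
	}
	return best, sizes[best]
}

func main() {
	sizes := []int64{10, 50, 90, 40}
	groups := []int64{0, 1, 0, 1}
	fmt.Println(nearestInPartition(sizes, groups, 0, 45)) // considers sizes 10 and 90 only
	fmt.Println(nearestInPartition(sizes, groups, 1, 45)) // considers sizes 50 and 40 only
}
```

Note that rank 0 cannot return the globally closest sample (size 40 or 50) because those samples live in the other partition.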
func (*PartitionedDataset) OnEpochEnd ¶
func (d *PartitionedDataset) OnEpochEnd(epoch int64)
OnEpochEnd restores the data partitions.
func (*PartitionedDataset) OnTrainEnd ¶
func (d *PartitionedDataset) OnTrainEnd()
OnTrainEnd terminates the training environment.
func (*PartitionedDataset) Rand ¶
func (d *PartitionedDataset) Rand(rank int) (_, _ int64)
Rand selects a random data sample from the data set.
type Sample ¶
Sample represents a single data sample in the data set.
type ShardedDataset ¶
type ShardedDataset struct {
// contains filtered or unexported fields
}
ShardedDataset represents a sharded data set where every node in the cluster has a replica of the given data set; hence it ignores rank when looking for the data sample.
func NewShardedDataset ¶
func NewShardedDataset(sizes []int64, seed int64) *ShardedDataset
NewShardedDataset creates a new sharded data set with the given arguments.
func (*ShardedDataset) Getitem ¶
func (d *ShardedDataset) Getitem(rank int, size int64) (_, _ int64)
Getitem looks for the data sample with the size nearest to the given size.
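One common way to find the nearest size is a binary search over sizes kept in ascending order. The sketch below shows that technique using sort.Search from the standard library. Whether ShardedDataset stores its sizes sorted is an assumption, and nearest is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"sort"
)

// nearest returns the position and value of the element of sorted that is
// closest to want. sorted must be non-empty and in ascending order.
// Ties go to the smaller element.
func nearest(sorted []int64, want int64) (int, int64) {
	// i is the first position whose size is >= want (len(sorted) if none).
	i := sort.Search(len(sorted), func(k int) bool { return sorted[k] >= want })
	switch {
	case i == 0:
		return 0, sorted[0] // want is at or below the smallest size
	case i == len(sorted):
		return i - 1, sorted[i-1] // want is above the largest size
	case want-sorted[i-1] <= sorted[i]-want:
		return i - 1, sorted[i-1] // left neighbor is at least as close
	default:
		return i, sorted[i]
	}
}

func main() {
	sizes := []int64{10, 40, 50, 90}
	fmt.Println(nearest(sizes, 45))
	fmt.Println(nearest(sizes, 80))
}
```

The search takes O(log n) comparisons per lookup, at the cost of keeping the sizes sorted.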
func (*ShardedDataset) OnEpochEnd ¶
func (d *ShardedDataset) OnEpochEnd(epoch int64)
OnEpochEnd restores the data samples.
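The word "restores" suggests that samples are consumed as they are retrieved and made available again when the epoch ends. The following sketch illustrates that pattern only; it is an assumption about the behavior, and the pool type with its take and restore methods is hypothetical.

```go
package main

import "fmt"

// pool hands out each sample index at most once per epoch.
type pool struct {
	all       []int64 // every sample index, never modified
	remaining []int64 // indices not yet handed out in the current epoch
}

// newPool creates a pool holding the indices 0 through n-1.
func newPool(n int) *pool {
	all := make([]int64, n)
	for i := range all {
		all[i] = int64(i)
	}
	p := &pool{all: all}
	p.restore()
	return p
}

// take removes and returns the last remaining index; ok is false when the
// epoch's samples are exhausted.
func (p *pool) take() (idx int64, ok bool) {
	if len(p.remaining) == 0 {
		return 0, false
	}
	last := len(p.remaining) - 1
	idx = p.remaining[last]
	p.remaining = p.remaining[:last]
	return idx, true
}

// restore refills remaining from all, as an OnEpochEnd callback might.
func (p *pool) restore() {
	p.remaining = append(p.remaining[:0], p.all...)
}

func main() {
	p := newPool(2)
	p.take()
	p.take()
	_, ok := p.take()
	fmt.Println(ok) // the epoch's samples are exhausted
	p.restore()
	_, ok = p.take()
	fmt.Println(ok) // samples are available again after restore
}
```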
func (*ShardedDataset) OnTrainEnd ¶
func (d *ShardedDataset) OnTrainEnd()
OnTrainEnd terminates the training environment.
func (*ShardedDataset) Rand ¶
func (d *ShardedDataset) Rand(rank int) (_, _ int64)
Rand selects a random data sample from the data set.