data

package

v0.2.0 (latest) · Published: Aug 20, 2023 · License: Apache-2.0 · Imports: 3 · Imported by: 0

Documentation ¶

Overview ¶

Package data provides primitives for representing and organizing the given data sets. In addition to the traditional sharded data set, it supports a partitioned data set where the data is split into multiple data partitions across nodes in the cluster.

Index ¶

type Dataset
- func New(sizes, groups []int, partition bool) Dataset
type DatasetBase
type PartitionedDataset
- func NewPartitionedDataset(sizes, groups []int) *PartitionedDataset
type Sample
- func NewSample(index, size int) Sample
- func (s Sample) Less(than btree.Item) bool
type ShardedDataset
- func NewShardedDataset(sizes []int) *ShardedDataset

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Dataset ¶

type Dataset interface {
	// Getitem retrieves a data sample for the given rank and requested size.
	// It must return the index identifying the scheduled data sample and that
	// sample's size.
	Getitem(rank, size int) (int, int)

	// Rand retrieves an arbitrary data sample from the data set.
	Rand(rank int) (int, int)

	// OnEpochEnd is called at the end of an epoch during training.
	OnEpochEnd(epoch int64)

	// OnTrainEnd terminates the training environment.
	OnTrainEnd()
}

Dataset represents the given data set. All implementations must embed DatasetBase for forward compatibility.
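The embedding requirement can be illustrated with a self-contained sketch. The `Dataset` and `DatasetBase` declarations below are local stand-ins that mirror the documented signatures, and `fixedDataset` is a hypothetical implementation. Returning zero values from the base methods is an assumption, since the package does not document what its defaults return.

```go
package main

import "fmt"

// Dataset mirrors the documented interface (local stand-in).
type Dataset interface {
	Getitem(rank, size int) (int, int)
	Rand(rank int) (int, int)
	OnEpochEnd(epoch int64)
	OnTrainEnd()
}

// DatasetBase is a stand-in for the package's base type. Returning zero
// values from the defaults is an assumption made for this sketch.
type DatasetBase struct{}

func (DatasetBase) Getitem(rank, size int) (_, _ int) { return 0, 0 }
func (DatasetBase) Rand(rank int) (_, _ int)          { return 0, 0 }
func (DatasetBase) OnEpochEnd(epoch int64)            {}
func (DatasetBase) OnTrainEnd()                       {}

// fixedDataset is a hypothetical implementation that overrides only Getitem
// and inherits every other method from the embedded base.
type fixedDataset struct {
	DatasetBase
	sizes []int
}

// Getitem returns the sample at position rank (wrapped around the data set)
// together with its size; the requested size is ignored in this sketch.
func (d *fixedDataset) Getitem(rank, size int) (int, int) {
	i := rank % len(d.sizes)
	return i, d.sizes[i]
}

func main() {
	var ds Dataset = &fixedDataset{sizes: []int{8, 16, 32}}
	index, size := ds.Getitem(1, 0) // overridden method
	fmt.Println(index, size)
	ds.OnEpochEnd(0) // no-op promoted from the embedded base
}
```

Because the remaining methods are promoted from the embedded base, a type written this way would keep satisfying the interface if methods were later added to both the interface and the base.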

func New ¶

func New(sizes, groups []int, partition bool) Dataset

New creates a new data set with the given arguments.

type DatasetBase ¶

type DatasetBase struct {
}

DatasetBase must be embedded to have forward compatible implementations.

func (DatasetBase) Getitem ¶

func (DatasetBase) Getitem(rank, size int) (_, _ int)

func (DatasetBase) OnEpochEnd ¶

func (DatasetBase) OnEpochEnd(epoch int64)

func (DatasetBase) OnTrainEnd ¶

func (DatasetBase) OnTrainEnd()

func (DatasetBase) Rand ¶

func (DatasetBase) Rand(rank int) (_, _ int)

type PartitionedDataset ¶

type PartitionedDataset struct {
	DatasetBase
	// contains filtered or unexported fields
}

PartitionedDataset represents a partitioned data set where each of the nodes in the cluster holds only a portion of the given data set.

func NewPartitionedDataset ¶

func NewPartitionedDataset(sizes, groups []int) *PartitionedDataset

NewPartitionedDataset creates a new partitioned data set with the given arguments.

func (*PartitionedDataset) Getitem ¶

func (d *PartitionedDataset) Getitem(rank, size int) (_, _ int)

Getitem looks for the data sample with the size nearest to the given size in the partition with the given rank.

func (*PartitionedDataset) OnEpochEnd ¶

func (d *PartitionedDataset) OnEpochEnd(epoch int64)

OnEpochEnd restores the data partitions.

func (*PartitionedDataset) OnTrainEnd ¶

func (d *PartitionedDataset) OnTrainEnd()

OnTrainEnd terminates the training environment.

func (*PartitionedDataset) Rand ¶

func (d *PartitionedDataset) Rand(rank int) (_, _ int)

Rand selects a random data sample from the data set.

type Sample ¶

type Sample struct {
	btree.ItemBase
}

Sample represents a single data sample in the dataset.

func NewSample ¶

func NewSample(index, size int) Sample

NewSample creates a new data sample with the given arguments.

func (Sample) Less ¶

func (s Sample) Less(than btree.Item) bool

Less tests whether the current data sample is less than the given argument. This allows the underlying container to non-deterministically return items for a given key while keeping the sorting order.
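One ordering consistent with this description compares samples by size only, so that samples of equal size are equivalent to the container and any of them may be returned for a given size. Whether the package compares exactly this way is an assumption; the `sample` type and helpers below are hypothetical stand-ins, and a plain slice sort replaces the B-tree.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical stand-in for Sample with its fields exposed.
type sample struct{ index, size int }

// less orders samples by size only; the index takes no part in the comparison.
func less(a, b sample) bool { return a.size < b.size }

// equivalent reports whether neither sample orders before the other.
func equivalent(a, b sample) bool { return !less(a, b) && !less(b, a) }

func main() {
	samples := []sample{{0, 32}, {1, 8}, {2, 32}, {3, 16}}
	// sort.Slice is not stable, so the relative order of samples 0 and 2
	// is unspecified, while the sizes still come out in ascending order.
	sort.Slice(samples, func(i, j int) bool { return less(samples[i], samples[j]) })
	for _, s := range samples {
		fmt.Println(s.size)
	}
	fmt.Println(equivalent(sample{0, 32}, sample{2, 32}))
}
```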

type ShardedDataset ¶

type ShardedDataset struct {
	DatasetBase
	// contains filtered or unexported fields
}

ShardedDataset represents a sharded data set where every node in the cluster has a replica of the given data set; hence it ignores rank when looking for the data sample.

func NewShardedDataset ¶

func NewShardedDataset(sizes []int) *ShardedDataset

NewShardedDataset creates a new sharded data set with the given argument.

func (*ShardedDataset) Getitem ¶

func (d *ShardedDataset) Getitem(rank, size int) (_, _ int)

Getitem looks for the data sample with the size nearest to the given size.

func (*ShardedDataset) OnEpochEnd ¶

func (d *ShardedDataset) OnEpochEnd(epoch int64)

OnEpochEnd restores the data samples.
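The interplay between handing out samples and restoring them at the end of an epoch could be sketched as follows. The `pool` type and its methods are hypothetical, and the sketch assumes that a sample is removed once it has been scheduled, which is what would make a restore step necessary. For brevity it tracks sizes only, not sample indices.

```go
package main

import (
	"fmt"
	"sort"
)

// pool is a hypothetical container: all keeps every sample size in sorted
// order, and remaining holds the sizes not yet handed out this epoch.
type pool struct {
	all       []int
	remaining []int
}

// newPool copies and sorts sizes, then fills remaining for the first epoch.
func newPool(sizes []int) *pool {
	all := append([]int(nil), sizes...)
	sort.Ints(all)
	p := &pool{all: all}
	p.restore()
	return p
}

// take removes and returns the remaining size closest to the requested size.
// The second result is false when no samples remain.
func (p *pool) take(size int) (int, bool) {
	if len(p.remaining) == 0 {
		return 0, false
	}
	i := sort.SearchInts(p.remaining, size)
	if i == len(p.remaining) || (i > 0 && size-p.remaining[i-1] <= p.remaining[i]-size) {
		i--
	}
	got := p.remaining[i]
	p.remaining = append(p.remaining[:i], p.remaining[i+1:]...)
	return got, true
}

// restore refills remaining from all, as an end-of-epoch hook might.
func (p *pool) restore() {
	p.remaining = append(p.remaining[:0], p.all...)
}

func main() {
	p := newPool([]int{30, 10, 20})
	a, _ := p.take(18)
	b, _ := p.take(18)
	p.restore() // end of epoch: every sample becomes available again
	c, _ := p.take(18)
	fmt.Println(a, b, c)
}
```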

func (*ShardedDataset) OnTrainEnd ¶

func (d *ShardedDataset) OnTrainEnd()

OnTrainEnd terminates the training environment.

func (*ShardedDataset) Rand ¶

func (d *ShardedDataset) Rand(rank int) (_, _ int)

Rand selects a random data sample from the data set.
