package data

v0.2.6 | Published: Feb 16, 2024 | License: Apache-2.0 | Imports: 2 | Imported by: 0

Documentation

Overview

Package data provides primitives for representing and organizing data sets. In addition to the traditional sharded data set, in which every node in the cluster holds a full replica of the data, it supports a partitioned data set, in which the data is split into multiple partitions distributed across the nodes in the cluster.

Index

type Dataset
- func New(sizes, groups []int64, seed int64, partition bool) Dataset
type PartitionedDataset
- func NewPartitionedDataset(sizes, groups []int64, seed int64) *PartitionedDataset
type Sample
- func NewSample(index, size int64) Sample
- func (s Sample) Less(than btree.Item) bool
type ShardedDataset
- func NewShardedDataset(sizes []int64, seed int64) *ShardedDataset

Types

type Dataset

type Dataset interface {
	// Getitem retrieves a data sample for the given rank and requested size.
	// Implementations must return the index identifying the scheduled data
	// sample and that sample's size.
	Getitem(rank int, size int64) (int64, int64)

	// Rand retrieves an arbitrary data sample from the data set.
	Rand(rank int) (int64, int64)

	// OnEpochEnd is called at the end of an epoch during training.
	OnEpochEnd(epoch int64)

	// OnTrainEnd terminates the training environment.
	OnTrainEnd()
}

Dataset represents a data set. In addition to Getitem and Rand, which retrieve data samples, implementations must provide the callbacks OnEpochEnd and OnTrainEnd, which are called at the end of each epoch and at the end of training, respectively.

func New

func New(sizes, groups []int64, seed int64, partition bool) Dataset

New creates a new data set from the given sample sizes, groups, random seed, and partition flag.

type PartitionedDataset

type PartitionedDataset struct {
	// contains filtered or unexported fields
}

PartitionedDataset represents a partitioned data set where each of the nodes in the cluster holds only a portion of the given data set.
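The documentation does not specify how samples are assigned to partitions or how the groups argument is used. As an illustration only, this sketch swaps in round-robin assignment, where partition r receives sample indices r, r+n, r+2n, and so on for n partitions; `partitionRoundRobin` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// partitionRoundRobin assigns sample indices 0..n-1 to nparts partitions in
// round-robin order. This strategy is an illustrative assumption, not
// necessarily the one PartitionedDataset uses. nparts must be positive.
func partitionRoundRobin(n, nparts int) [][]int64 {
	parts := make([][]int64, nparts)
	for i := 0; i < n; i++ {
		r := i % nparts
		parts[r] = append(parts[r], int64(i))
	}
	return parts
}

func main() {
	fmt.Println(partitionRoundRobin(7, 3)) // prints "[[0 3 6] [1 4] [2 5]]"
}
```

Each rank would then search only its own slice, which is what distinguishes a partitioned layout from a replicated one.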

func NewPartitionedDataset

func NewPartitionedDataset(sizes, groups []int64, seed int64) *PartitionedDataset

NewPartitionedDataset creates a new partitioned data set from the given sample sizes, groups, and random seed.

func (*PartitionedDataset) Getitem

func (d *PartitionedDataset) Getitem(rank int, size int64) (_, _ int64)

Getitem looks for the data sample with the size nearest to the given size in the partition with the given rank.
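A nearest-size lookup like the one described for Getitem can be done with a binary search over samples sorted by size. This sketch assumes a partition is held as a sorted slice and uses `sort.Search`; the `sample` type and `nearest` function are hypothetical, and since the package appears to keep samples in a B-tree (see Sample), its actual search and tie-breaking rules may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// sample pairs a sample index with its size (a hypothetical stand-in for Sample).
type sample struct {
	index, size int64
}

// nearest returns the sample whose size is closest to target. samples must be
// non-empty and sorted by ascending size. On a tie, the smaller sample wins.
func nearest(samples []sample, target int64) sample {
	// i is the first position whose size is >= target.
	i := sort.Search(len(samples), func(j int) bool { return samples[j].size >= target })
	if i == len(samples) {
		return samples[i-1] // every sample is smaller than target
	}
	if i == 0 {
		return samples[0] // every sample is at least target
	}
	below, above := samples[i-1], samples[i]
	if target-below.size <= above.size-target {
		return below
	}
	return above
}

func main() {
	partition := []sample{{index: 4, size: 10}, {index: 1, size: 40}, {index: 7, size: 100}}
	fmt.Println(nearest(partition, 30)) // prints "{1 40}"
}
```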

func (*PartitionedDataset) OnEpochEnd

func (d *PartitionedDataset) OnEpochEnd(epoch int64)

OnEpochEnd restores the data partitions.
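The documentation does not say what "restores" means. One plausible reading, assumed here, is that Getitem consumes samples during an epoch and OnEpochEnd returns them, so each sample is served at most once per epoch. The hypothetical `epochPool` type sketches that behavior with a plain slice.

```go
package main

import "fmt"

// epochPool is a hypothetical pool that hands out each sample index at most
// once per epoch. It assumes "restoring" means returning every consumed
// sample to the pool at the end of the epoch.
type epochPool struct {
	all       []int64 // every sample index, kept so the pool can be refilled
	available []int64 // indices not yet handed out in the current epoch
}

func newEpochPool(indices []int64) *epochPool {
	all := append([]int64(nil), indices...)
	return &epochPool{all: all, available: append([]int64(nil), all...)}
}

// take removes and returns the last available index; ok is false once the
// pool is exhausted for the current epoch.
func (p *epochPool) take() (idx int64, ok bool) {
	n := len(p.available)
	if n == 0 {
		return 0, false
	}
	idx = p.available[n-1]
	p.available = p.available[:n-1]
	return idx, true
}

// OnEpochEnd refills the pool so the next epoch sees every sample again.
func (p *epochPool) OnEpochEnd(epoch int64) {
	p.available = append(p.available[:0], p.all...)
}

func main() {
	p := newEpochPool([]int64{0, 1, 2})
	for {
		idx, ok := p.take()
		if !ok {
			break
		}
		fmt.Println("took", idx)
	}
	p.OnEpochEnd(0)
	fmt.Println("restored", len(p.available)) // prints "restored 3"
}
```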

func (*PartitionedDataset) OnTrainEnd

func (d *PartitionedDataset) OnTrainEnd()

OnTrainEnd terminates the training environment.

func (*PartitionedDataset) Rand

func (d *PartitionedDataset) Rand(rank int) (_, _ int64)

Rand selects a random data sample from the data set.
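Because the constructors take a seed, random selection can presumably be reproduced across runs. The sketch below assumes a `math/rand` generator seeded once at construction; `randPicker` is a hypothetical name, and it omits the rank argument that a partitioned data set would use to choose the partition.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randPicker is a hypothetical helper that draws uniformly random samples
// from a fixed pool using a seeded generator, so runs with the same seed
// produce the same sequence.
type randPicker struct {
	sizes []int64
	rng   *rand.Rand
}

func newRandPicker(sizes []int64, seed int64) *randPicker {
	return &randPicker{sizes: sizes, rng: rand.New(rand.NewSource(seed))}
}

// Rand returns a uniformly random (index, size) pair; sizes must be non-empty.
func (p *randPicker) Rand() (int64, int64) {
	i := p.rng.Intn(len(p.sizes))
	return int64(i), p.sizes[i]
}

func main() {
	p := newRandPicker([]int64{8, 32, 128}, 42)
	idx, size := p.Rand()
	fmt.Println(idx, size)
}
```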

type Sample

type Sample struct {
	btree.ItemBase
}

Sample represents a single data sample in the data set.

func NewSample

func NewSample(index, size int64) Sample

NewSample creates a new data sample with the given index and size.

func (Sample) Less

func (s Sample) Less(than btree.Item) bool

Less reports whether the current data sample orders before the given item. This ordering allows the underlying container to return items for a given key non-deterministically while preserving the sort order.

type ShardedDataset

type ShardedDataset struct {
	// contains filtered or unexported fields
}

ShardedDataset represents a sharded data set in which every node in the cluster holds a replica of the entire data set; hence, it ignores the rank when looking for a data sample.
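The difference between the two layouts comes down to which samples a rank may search. The sketch below models this with two hypothetical types, `shardedPool` and `partitionedPool`, whose `visible` method returns the sample sizes a given rank can draw from; the package's unexported fields are not documented, so this structure is an assumption.

```go
package main

import "fmt"

// shardedPool models a data set replicated on every node: each rank sees the
// full set of sample sizes, so the rank argument is ignored.
type shardedPool struct{ sizes []int64 }

func (p shardedPool) visible(rank int) []int64 { return p.sizes }

// partitionedPool models a data set split across nodes, with one partition
// per rank; rank must be a valid partition index.
type partitionedPool struct{ parts [][]int64 }

func (p partitionedPool) visible(rank int) []int64 { return p.parts[rank] }

func main() {
	sharded := shardedPool{sizes: []int64{8, 32, 128, 512}}
	partitioned := partitionedPool{parts: [][]int64{{8, 128}, {32, 512}}}
	for rank := 0; rank < 2; rank++ {
		fmt.Println(rank, sharded.visible(rank), partitioned.visible(rank))
	}
}
```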

func NewShardedDataset

func NewShardedDataset(sizes []int64, seed int64) *ShardedDataset

NewShardedDataset creates a new sharded data set from the given sample sizes and random seed.

func (*ShardedDataset) Getitem

func (d *ShardedDataset) Getitem(rank int, size int64) (_, _ int64)

Getitem looks for the data sample with the size nearest to the given size.

func (*ShardedDataset) OnEpochEnd

func (d *ShardedDataset) OnEpochEnd(epoch int64)

OnEpochEnd restores the data samples.

func (*ShardedDataset) OnTrainEnd

func (d *ShardedDataset) OnTrainEnd()

OnTrainEnd terminates the training environment.

func (*ShardedDataset) Rand

func (d *ShardedDataset) Rand(rank int) (_, _ int64)

Rand selects a random data sample from the data set.
