data

package
v0.1.0
Warning

This package is not in the latest version of its module.

Published: Jul 24, 2023 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Package data provides primitives for representing and organizing data sets. In addition to the traditional sharded data set, in which every node holds a full replica, it supports a partitioned data set, in which the data is split into multiple partitions across the nodes in the cluster.


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Dataset

type Dataset interface {
	// Getitem retrieves a data sample for the given rank and requested size.
	// It must return the index identifying the scheduled data sample and the
	// size of that sample.
	Getitem(rank, size int) (_, _ int)

	// Rand retrieves an arbitrary data sample from the data set.
	Rand(rank int) (_, _ int)

	// OnEpochEnd is called at the end of an epoch during training.
	OnEpochEnd(epoch int64)

	// OnTrainEnd terminates the training environment.
	OnTrainEnd()
}

Dataset represents the given data set. All implementations must embed DatasetBase for forward compatibility.
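The embedding requirement can be illustrated with a minimal, self-contained sketch. The `Dataset` and `DatasetBase` declarations below are local stand-ins that mirror the documented signatures; the no-op method bodies are an assumption, since the package does not document what the base methods do. `FixedDataset` is a hypothetical type introduced only for illustration.

```go
package main

import "fmt"

// Dataset mirrors the interface documented above.
type Dataset interface {
	Getitem(rank, size int) (int, int)
	Rand(rank int) (int, int)
	OnEpochEnd(epoch int64)
	OnTrainEnd()
}

// DatasetBase is a local stand-in for the package's DatasetBase.
// Assumption: the base methods are no-ops that return zero values.
type DatasetBase struct{}

func (DatasetBase) Getitem(rank, size int) (int, int) { return 0, 0 }
func (DatasetBase) Rand(rank int) (int, int)          { return 0, 0 }
func (DatasetBase) OnEpochEnd(epoch int64)            {}
func (DatasetBase) OnTrainEnd()                       {}

// FixedDataset is a hypothetical implementation that always yields the
// same sample. Embedding DatasetBase supplies every method it does not
// define itself, so it keeps satisfying Dataset if methods are added to
// both the interface and the base type.
type FixedDataset struct {
	DatasetBase
	index, size int
}

// Getitem overrides the method promoted from the embedded DatasetBase.
func (d FixedDataset) Getitem(rank, size int) (int, int) {
	return d.index, d.size
}

func main() {
	var d Dataset = FixedDataset{index: 7, size: 128}
	index, size := d.Getitem(0, 100)
	fmt.Println(index, size)
	d.OnEpochEnd(1) // resolved by the embedded DatasetBase
}
```

Only `Getitem` is overridden here; `Rand`, `OnEpochEnd`, and `OnTrainEnd` are promoted from the embedded struct.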

func New

func New(sizes, groups []int, partition bool) Dataset

New creates a new data set with the given arguments.

type DatasetBase

type DatasetBase struct {
}

DatasetBase must be embedded to have forward compatible implementations.

func (DatasetBase) Getitem

func (DatasetBase) Getitem(rank, size int) (_, _ int)

func (DatasetBase) OnEpochEnd

func (DatasetBase) OnEpochEnd(epoch int64)

func (DatasetBase) OnTrainEnd

func (DatasetBase) OnTrainEnd()

func (DatasetBase) Rand

func (DatasetBase) Rand(rank int) (_, _ int)

type PartitionedDataset

type PartitionedDataset struct {
	DatasetBase
	// contains filtered or unexported fields
}

PartitionedDataset represents a partitioned data set where each of the nodes in the cluster holds only a portion of the given data set.

func NewPartitionedDataset

func NewPartitionedDataset(sizes, groups []int) *PartitionedDataset

NewPartitionedDataset creates a new partitioned data set with the given arguments.

func (*PartitionedDataset) Getitem

func (d *PartitionedDataset) Getitem(rank, size int) (_, _ int)

Getitem looks for the data sample whose size is nearest to the given size in the partition with the given rank.
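The nearest-size lookup within one partition can be sketched as follows. This is not the package's implementation: it assumes that `groups[i]` names the rank holding sample `i`, it uses a sorted slice with a binary search instead of the package's tree container, and it does not remove a sample once it has been returned. The names `sample`, `partitionBy`, and `nearest` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sample pairs a sample index with its size.
type sample struct{ index, size int }

// partitionBy groups samples by rank and sorts each partition by size.
// Assumption: groups[i] is the rank that holds sample i.
func partitionBy(sizes, groups []int) map[int][]sample {
	parts := make(map[int][]sample)
	for i, size := range sizes {
		parts[groups[i]] = append(parts[groups[i]], sample{i, size})
	}
	for _, p := range parts {
		sort.SliceStable(p, func(a, b int) bool { return p[a].size < p[b].size })
	}
	return parts
}

// nearest returns the sample in the size-sorted partition p whose size is
// closest to size. ok is false when the partition is empty. On a tie the
// smaller sample wins.
func nearest(p []sample, size int) (s sample, ok bool) {
	if len(p) == 0 {
		return sample{}, false
	}
	// i is the first position whose size is >= the requested size.
	i := sort.Search(len(p), func(k int) bool { return p[k].size >= size })
	switch {
	case i == 0:
		return p[0], true
	case i == len(p):
		return p[len(p)-1], true
	case size-p[i-1].size <= p[i].size-size:
		return p[i-1], true
	default:
		return p[i], true
	}
}

func main() {
	sizes := []int{10, 40, 25, 90}
	groups := []int{0, 1, 0, 1}
	parts := partitionBy(sizes, groups)

	s, _ := nearest(parts[0], 30)
	fmt.Println(s.index, s.size) // prints "2 25"
}
```

Rank 0 holds the samples of sizes 10 and 25, so a request for size 30 resolves to sample 2 even though sample 1 (size 40, held by rank 1) is equally close.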

func (*PartitionedDataset) OnEpochEnd

func (d *PartitionedDataset) OnEpochEnd(epoch int64)

OnEpochEnd restores the data partitions.

func (*PartitionedDataset) OnTrainEnd

func (d *PartitionedDataset) OnTrainEnd()

OnTrainEnd terminates the training environment.

func (*PartitionedDataset) Rand

func (d *PartitionedDataset) Rand(rank int) (_, _ int)

Rand selects a random data sample from the data set.

type Sample

type Sample struct {
	btree.ItemBase
}

Sample represents a single data sample in the data set.

func NewSample

func NewSample(index, size int) Sample

NewSample creates a new data sample with the given arguments.

func (Sample) Less

func (s Sample) Less(than btree.Item) bool

Less tests whether the current data sample is less than the given argument. This allows the underlying container to non-deterministically return items for a given key while keeping the sorting order.

type ShardedDataset

type ShardedDataset struct {
	DatasetBase
	// contains filtered or unexported fields
}

ShardedDataset represents a sharded data set where every node in the cluster has a replica of the given data set; hence it ignores rank when looking for the data sample.

func NewShardedDataset

func NewShardedDataset(sizes []int) *ShardedDataset

NewShardedDataset creates a new sharded data set with the given argument.

func (*ShardedDataset) Getitem

func (d *ShardedDataset) Getitem(rank, size int) (_, _ int)

Getitem looks for the data sample whose size is nearest to the given size, regardless of rank.

func (*ShardedDataset) OnEpochEnd

func (d *ShardedDataset) OnEpochEnd(epoch int64)

OnEpochEnd restores the data samples.
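The interplay between `Getitem` and `OnEpochEnd` can be sketched under one assumption that the documentation only implies: a sample returned by `Getitem` is taken out of the pool until the epoch ends. The type `sharded` and its constructor are hypothetical, the linear scan replaces the package's tree container for brevity, and returning `(-1, 0)` for an exhausted pool is a convention of this sketch only.

```go
package main

import "fmt"

// sharded is a hypothetical stand-in for ShardedDataset.
type sharded struct {
	sizes     []int        // size of every sample, indexed by sample index
	remaining map[int]bool // samples not yet scheduled in the current epoch
}

func newSharded(sizes []int) *sharded {
	d := &sharded{sizes: sizes}
	d.OnEpochEnd(0) // fill the pool for the first epoch
	return d
}

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

// Getitem ignores rank: every node sees the same pool. It returns the
// remaining sample whose size is closest to size and removes it from the
// pool, or (-1, 0) when the pool is exhausted.
func (d *sharded) Getitem(rank, size int) (int, int) {
	best := -1
	for i := range d.sizes { // index order keeps tie-breaking deterministic
		if !d.remaining[i] {
			continue
		}
		if best < 0 || abs(d.sizes[i]-size) < abs(d.sizes[best]-size) {
			best = i
		}
	}
	if best < 0 {
		return -1, 0
	}
	delete(d.remaining, best)
	return best, d.sizes[best]
}

// OnEpochEnd puts every sample back into the pool.
func (d *sharded) OnEpochEnd(epoch int64) {
	d.remaining = make(map[int]bool, len(d.sizes))
	for i := range d.sizes {
		d.remaining[i] = true
	}
}

func main() {
	d := newSharded([]int{10, 20, 30})
	fmt.Println(d.Getitem(0, 22)) // prints "1 20"
	fmt.Println(d.Getitem(1, 22)) // prints "2 30": sample 1 is already taken
	d.OnEpochEnd(1)
	fmt.Println(d.Getitem(1, 22)) // prints "1 20": the pool was restored
}
```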

func (*ShardedDataset) OnTrainEnd

func (d *ShardedDataset) OnTrainEnd()

OnTrainEnd terminates the training environment.

func (*ShardedDataset) Rand

func (d *ShardedDataset) Rand(rank int) (_, _ int)

Rand selects a random data sample from the data set.
