allgather

package
v0.38.0-rc4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 31, 2024 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

View Source
var DefaultTimeout = 10 * time.Minute

DefaultTimeout is the default timeout for all gather.

View Source
var ErrAllGatherTimeoutExceeded = fmt.Errorf(
	"some ranks are taking a long time to connect to master " +
		"during all gather; when running on kubernetes this may happen " +
		"because only some of the pods have been scheduled; it is possible " +
		"that some pods will never be scheduled without adding compute " +
		"resources or pausing / killing other experiments in the cluster",
)

ErrAllGatherTimeoutExceeded indicates that we not halt within the expected deadline.

View Source
var ErrClosed = fmt.Errorf("left or closed")

ErrClosed is returned from a closed and incomplete allgather.

View Source
var ErrReconnected = fmt.Errorf("another watcher with the same ID connected")

ErrReconnected indicates another watcher connected with the same ID. Only one watcher should connect per ID. Anyone attempted to synchronize more things should use more `numPeers` and different IDs.

Functions

func Leave

func Leave(groupID string, id uuid.UUID)

Leave removes from member with `id` from the allgather group `groupID`. The last member out closes the group.

Types

type Result

type Result struct {
	Data []any
	Err  error
}

Result contains the information from a completed all gather.

type Watcher

type Watcher struct {
	C <-chan Result
}

Watcher signals all gather completion via a channel which is closed upon said completion.

func Join

func Join(
	groupID string,
	id uuid.UUID,
	numPeers int,
	data any,
	ready func(),
	timeout func(error),
) Watcher

Join adds the member with `id` to the allgather group `groupID`. The allgather waits until `numPeers` members are waiting then gives all members all the submitted `data` and fires the `ready` callback. If the `DefaultTimeout` is exceeded before `ready`, the `timeout` callback fires. Note, the data is not a copy, it should not be mutated.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL