Documentation
Overview ¶
training is a package for managing MXNet training jobs.
Index ¶
Constants ¶
const (
NAMESPACE string = "default"
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type KubernetesLabels ¶
KubernetesLabels represents a set of labels to apply to a Kubernetes resource.
func (KubernetesLabels) ToSelector ¶
func (l KubernetesLabels) ToSelector() (string, error)
ToSelector converts the labels to a selector matching the labels.
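As a rough illustration of what converting labels to a selector can look like, the sketch below joins each label into a `key=value` pair separated by commas, which is the equality-based selector syntax Kubernetes accepts. The underlying `map[string]string` type, the sorted key order, and the label names used in `main` are assumptions made for this example; the package's actual implementation is not shown on this page and may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KubernetesLabels is a local stand-in for the package type.
// Assumption: the underlying type is map[string]string.
type KubernetesLabels map[string]string

// ToSelector joins the labels into comma-separated "key=value" pairs.
// Keys are sorted so the result is deterministic (an assumption of this
// sketch, not something the package documentation promises).
func (l KubernetesLabels) ToSelector() (string, error) {
	keys := make([]string, 0, len(l))
	for k := range l {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+l[k])
	}
	return strings.Join(pairs, ","), nil
}

func main() {
	// Hypothetical label names, chosen only for illustration.
	labels := KubernetesLabels{"job_type": "worker", "runtime_id": "abcd"}
	selector, err := labels.ToSelector()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(selector) // job_type=worker,runtime_id=abcd
}
```

A selector string of this form is the kind of value typically passed as a label selector when listing resources.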
type MXReplicaSet ¶
type MXReplicaSet struct {
	ClientSet kubernetes.Interface
	// Job is a pointer to the TrainingJob to which this replica belongs.
	Job  *TrainingJob
	Spec spec.MxReplicaSpec
}
MXReplicaSet is a set of MX processes all acting as the same role (e.g. worker).
func NewMXReplicaSet ¶
func NewMXReplicaSet(clientSet kubernetes.Interface, mxReplicaSpec spec.MxReplicaSpec, job *TrainingJob) (*MXReplicaSet, error)
func (*MXReplicaSet) Create ¶
func (s *MXReplicaSet) Create() error
func (*MXReplicaSet) GetStatus ¶
func (s *MXReplicaSet) GetStatus() (spec.MxReplicaStatus, error)
GetStatus returns the status of the replica set.
func (*MXReplicaSet) Labels ¶
func (s *MXReplicaSet) Labels() KubernetesLabels
Labels returns the labels for this replica set.
type MXReplicaSetInterface ¶
type MXReplicaSetInterface interface {
	Create() error
	Delete() error
	GetStatus() (spec.MxReplicaStatus, error)
}
MXReplicaSetInterface is an interface for managing a set of replicas.
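One reason to expose replica management through an interface is that callers can be exercised against an in-memory fake instead of a real cluster. The sketch below mirrors the three methods of MXReplicaSetInterface using a hypothetical `ReplicaStatus` type in place of `spec.MxReplicaStatus`; the `fakeReplicaSet` type, the `runLifecycle` helper, and the "Running" state string are all invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ReplicaStatus is a hypothetical stand-in for spec.MxReplicaStatus.
type ReplicaStatus struct {
	State string
}

// ReplicaSet mirrors the method set of MXReplicaSetInterface,
// using the stand-in status type so the example is self-contained.
type ReplicaSet interface {
	Create() error
	Delete() error
	GetStatus() (ReplicaStatus, error)
}

// fakeReplicaSet is an in-memory implementation intended for tests.
type fakeReplicaSet struct {
	created bool
}

func (f *fakeReplicaSet) Create() error {
	if f.created {
		return errors.New("replica set already exists")
	}
	f.created = true
	return nil
}

func (f *fakeReplicaSet) Delete() error {
	if !f.created {
		return errors.New("replica set does not exist")
	}
	f.created = false
	return nil
}

func (f *fakeReplicaSet) GetStatus() (ReplicaStatus, error) {
	if !f.created {
		return ReplicaStatus{}, errors.New("replica set does not exist")
	}
	return ReplicaStatus{State: "Running"}, nil
}

// runLifecycle drives any ReplicaSet through create, status check, and delete,
// returning the state observed while the replica set existed.
func runLifecycle(rs ReplicaSet) (string, error) {
	if err := rs.Create(); err != nil {
		return "", err
	}
	status, err := rs.GetStatus()
	if err != nil {
		return "", err
	}
	return status.State, rs.Delete()
}

func main() {
	state, err := runLifecycle(&fakeReplicaSet{})
	fmt.Println(state, err)
}
```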
type MxConfig ¶
type MxConfig struct {
Task map[string]interface{} `json:"task"`
}
MxConfig is a struct representing the MXNet config. This struct is turned into an environment that MXNet processes use to configure themselves.
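To make the struct tag concrete, the sketch below serializes an MxConfig value to JSON with the standard library. The task keys (`type`, `index`) and the `MX_CONFIG` variable name are hypothetical and chosen only for illustration; the page does not say which keys the operator sets or how the resulting value is exposed to MXNet processes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MxConfig is copied from the type definition above.
type MxConfig struct {
	Task map[string]interface{} `json:"task"`
}

// encodeConfig serializes the config to a JSON string, a form that could be
// stored in an environment variable value.
func encodeConfig(c MxConfig) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Hypothetical task fields; the real keys are not documented here.
	cfg := MxConfig{Task: map[string]interface{}{"type": "worker", "index": 0}}

	value, err := encodeConfig(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// MX_CONFIG is an assumed variable name used only in this sketch.
	fmt.Println("MX_CONFIG=" + value) // MX_CONFIG={"task":{"index":0,"type":"worker"}}
}
```

Because `encoding/json` writes map keys in sorted order, the encoded string is stable across runs, which makes it easy to compare in tests.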
type TrainingJob ¶
type TrainingJob struct {
	KubeCli  kubernetes.Interface
	Replicas []*MXReplicaSet
	// contains filtered or unexported fields
}
TODO(jlewi): We should switch to a New pattern and make trainingJob private so we can ensure correctness on creation.
func NewJob ¶
func NewJob(kubeCli kubernetes.Interface, mxJobClient k8sutil.MxJobClient, mxjob *spec.MxJob, stopC <-chan struct{}, wg *sync.WaitGroup, config *spec.ControllerConfig) (*TrainingJob, error)
func (*TrainingJob) Delete ¶
func (j *TrainingJob) Delete()
func (*TrainingJob) GetStatus ¶
func (j *TrainingJob) GetStatus() (spec.State, []*spec.MxReplicaStatus, error)
func (*TrainingJob) Update ¶
func (j *TrainingJob) Update(newJob *spec.MxJob)
Update sends an update event for the job.
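The signatures of NewJob and Update suggest a common Go pattern: a per-job goroutine that receives events on a channel and exits when a shared stop channel is closed, with a `sync.WaitGroup` letting the caller wait for shutdown. The sketch below shows that pattern in isolation. The `job` and `jobSpec` types, the `events` channel, and the `applyUpdates` helper are all hypothetical; this is not the package's implementation, only one way such an event loop could be structured.

```go
package main

import (
	"fmt"
	"sync"
)

// jobSpec is a hypothetical stand-in for spec.MxJob.
type jobSpec struct {
	Name string
}

// job is a simplified stand-in for TrainingJob.
type job struct {
	events chan jobSpec
	stopC  <-chan struct{}
	seen   []string // written only by the event loop goroutine
}

// startJob launches the event loop, which runs until stopC is closed.
func startJob(stopC <-chan struct{}, wg *sync.WaitGroup) *job {
	j := &job{events: make(chan jobSpec), stopC: stopC}
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case s := <-j.events:
				j.seen = append(j.seen, s.Name)
			case <-stopC:
				return
			}
		}
	}()
	return j
}

// Update hands the new spec to the event loop. It returns without delivering
// the event if the stop channel has already been closed.
func (j *job) Update(s jobSpec) {
	select {
	case j.events <- s:
	case <-j.stopC:
	}
}

// applyUpdates sends each name as an update, stops the job, waits for the
// event loop to exit, and returns the names the loop observed.
func applyUpdates(names ...string) []string {
	stopC := make(chan struct{})
	var wg sync.WaitGroup
	j := startJob(stopC, &wg)
	for _, n := range names {
		j.Update(jobSpec{Name: n})
	}
	close(stopC)
	wg.Wait() // after this, reading j.seen is safe
	return j.seen
}

func main() {
	fmt.Println(applyUpdates("v1", "v2")) // [v1 v2]
}
```

The unbuffered channel means each Update call returns only once the loop has received the event, so events are observed in the order they were sent.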