Documentation
Overview
Package trainer manages Caffe2 training jobs.
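Index
- Constants
- type Caffe2Config
- type Caffe2ReplicaSet
- func NewCaffe2ReplicaSet(clientSet kubernetes.Interface, recorder record.EventRecorder, caffe2ReplicaSpec api.Caffe2ReplicaSpec, job *TrainingJob) (*Caffe2ReplicaSet, error)
- func (s *Caffe2ReplicaSet) Create(config *api.ControllerConfig) error
- func (s *Caffe2ReplicaSet) Delete() error
- func (s *Caffe2ReplicaSet) GetSingleReplicaStatus(index int32) api.ReplicaState
- func (s *Caffe2ReplicaSet) GetStatus() (api.Caffe2ReplicaStatus, error)
- func (s *Caffe2ReplicaSet) Labels() KubernetesLabels
- type Caffe2ReplicaSetInterface
- type ClusterSpec
- type KubernetesLabels
- func (l KubernetesLabels) ToSelector() (string, error)
- type TaskSpec
- type TrainingJob
- func NewJob(kubeCli kubernetes.Interface, jobClient jobclient.Interface, recorder record.EventRecorder, job *api.Caffe2Job, config *api.ControllerConfig) (*TrainingJob, error)
- func (j *TrainingJob) ClusterSpec() ClusterSpec
- func (j *TrainingJob) Delete()
- func (j *TrainingJob) GetStatus() (api.State, []*api.Caffe2ReplicaStatus, error)
- func (j *TrainingJob) Reconcile(config *api.ControllerConfig) error
- func (j *TrainingJob) UID() types.UID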
Constants
const (
	SuccessfulCreateReason = "SuccessfulCreate"
	FailedCreateReason     = "FailedCreate"
)
Variables
This section is empty.
Functions
This section is empty.
Types
type Caffe2Config
type Caffe2Config struct {
	// Cluster represents a Caffe2 ClusterSpec.
	// See: https://www.tensorflow.org/api_docs/python/tf/train/ClusterSpec
	Cluster     ClusterSpec `json:"cluster"`
	Task        TaskSpec    `json:"task"`
	Environment string      `json:"environment"`
}
Caffe2Config is a struct representing the cluster configuration of a Caffe2 job; its layout follows TensorFlow's cluster configuration, as the link above indicates. This struct is turned into environment settings that the Caffe2 processes read to configure themselves.
type Caffe2ReplicaSet
type Caffe2ReplicaSet struct {
	ClientSet kubernetes.Interface
	// Job is a pointer to the TrainingJob to which this replica belongs.
	Job       *TrainingJob
	Spec      api.Caffe2ReplicaSpec
	// contains filtered or unexported fields
}
Caffe2ReplicaSet is a set of Caffe2 processes that all act in the same role (e.g. worker).
func NewCaffe2ReplicaSet
func NewCaffe2ReplicaSet(clientSet kubernetes.Interface, recorder record.EventRecorder, caffe2ReplicaSpec api.Caffe2ReplicaSpec, job *TrainingJob) (*Caffe2ReplicaSet, error)
func (*Caffe2ReplicaSet) Create
func (s *Caffe2ReplicaSet) Create(config *api.ControllerConfig) error
func (*Caffe2ReplicaSet) Delete
func (s *Caffe2ReplicaSet) Delete() error
Delete deletes the replicas in this set.
func (*Caffe2ReplicaSet) GetSingleReplicaStatus
func (s *Caffe2ReplicaSet) GetSingleReplicaStatus(index int32) api.ReplicaState
func (*Caffe2ReplicaSet) GetStatus
func (s *Caffe2ReplicaSet) GetStatus() (api.Caffe2ReplicaStatus, error)
GetStatus returns the status of the replica set.
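GetSingleReplicaStatus reports the state of one replica by index, while GetStatus reports on the whole set. One plausible way to combine the two is sketched below. Everything here is a hypothetical stand-in: the `ReplicaState` values, the `ReplicaSetStatus` shape (standing in for api.Caffe2ReplicaStatus), and the rule that any failed replica fails the set are assumptions, not the package's documented behavior.

```go
package main

import "fmt"

// ReplicaState is a hypothetical stand-in for api.ReplicaState.
type ReplicaState string

const (
	ReplicaStateUnknown   ReplicaState = "Unknown"
	ReplicaStateRunning   ReplicaState = "Running"
	ReplicaStateSucceeded ReplicaState = "Succeeded"
	ReplicaStateFailed    ReplicaState = "Failed"
)

// ReplicaSetStatus is a hypothetical stand-in for api.Caffe2ReplicaStatus.
type ReplicaSetStatus struct {
	State          ReplicaState
	ReplicasStates map[ReplicaState]int // how many replicas are in each state
}

// aggregate combines per-index states (as GetSingleReplicaStatus would
// return them) into one status for the whole replica set.
func aggregate(states []ReplicaState) ReplicaSetStatus {
	counts := make(map[ReplicaState]int)
	for _, s := range states {
		counts[s]++
	}
	status := ReplicaSetStatus{State: ReplicaStateUnknown, ReplicasStates: counts}
	switch {
	case counts[ReplicaStateFailed] > 0:
		// Assumed rule: any failed replica fails the whole set.
		status.State = ReplicaStateFailed
	case len(states) > 0 && counts[ReplicaStateSucceeded] == len(states):
		status.State = ReplicaStateSucceeded
	case counts[ReplicaStateRunning] > 0:
		status.State = ReplicaStateRunning
	}
	return status
}

func main() {
	st := aggregate([]ReplicaState{ReplicaStateRunning, ReplicaStateSucceeded})
	fmt.Println(st.State, st.ReplicasStates[ReplicaStateRunning]) // Running 1
}
```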
func (*Caffe2ReplicaSet) Labels
func (s *Caffe2ReplicaSet) Labels() KubernetesLabels
Labels returns the labels for this replica set.
type Caffe2ReplicaSetInterface
type Caffe2ReplicaSetInterface interface {
	Create() error
	Delete() error
	GetStatus() (api.Caffe2ReplicaStatus, error)
}
Caffe2ReplicaSetInterface is an interface for managing a set of replicas.
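An interface like this lets controller code be exercised against a fake instead of a live cluster. Note that, as declared, its Create takes no arguments, whereas (*Caffe2ReplicaSet).Create takes a *api.ControllerConfig, so the concrete type does not satisfy the interface as written. The sketch below uses a hypothetical in-memory `fakeReplicaSet` and a local stand-in for api.Caffe2ReplicaStatus; the state strings are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Caffe2ReplicaStatus is a hypothetical stand-in for api.Caffe2ReplicaStatus.
type Caffe2ReplicaStatus struct {
	State string
}

// Caffe2ReplicaSetInterface copies the documented method set.
type Caffe2ReplicaSetInterface interface {
	Create() error
	Delete() error
	GetStatus() (Caffe2ReplicaStatus, error)
}

// fakeReplicaSet is a hypothetical in-memory implementation for tests.
type fakeReplicaSet struct {
	created bool
}

func (f *fakeReplicaSet) Create() error {
	f.created = true
	return nil
}

func (f *fakeReplicaSet) Delete() error {
	if !f.created {
		return errors.New("replica set was never created")
	}
	f.created = false
	return nil
}

func (f *fakeReplicaSet) GetStatus() (Caffe2ReplicaStatus, error) {
	if f.created {
		return Caffe2ReplicaStatus{State: "Running"}, nil
	}
	return Caffe2ReplicaStatus{State: "Unknown"}, nil
}

// Compile-time check that the fake satisfies the interface.
var _ Caffe2ReplicaSetInterface = (*fakeReplicaSet)(nil)

func main() {
	var rs Caffe2ReplicaSetInterface = &fakeReplicaSet{}
	if err := rs.Create(); err != nil {
		panic(err)
	}
	st, _ := rs.GetStatus()
	fmt.Println(st.State) // Running
}
```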
type ClusterSpec
ClusterSpec represents a Caffe2 cluster specification. It is a map from job names to network addresses. See https://www.tensorflow.org/deploy/distributed#create_a_tftrainclusterspec_to_describe_the_cluster
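To make "a map from job names to network addresses" concrete, the sketch below builds one address per replica for each job name (replica type). The underlying `map[string][]string` type, the `buildClusterSpec` helper, and the `<jobName>-<replicaType>-<index>:<port>` naming scheme are all hypothetical; TrainingJob.ClusterSpec may derive addresses differently, for example from Kubernetes service names.

```go
package main

import "fmt"

// ClusterSpec is assumed to be map[string][]string: job name -> addresses.
type ClusterSpec map[string][]string

// buildClusterSpec (hypothetical) creates one address per replica using the
// assumed naming scheme "<jobName>-<replicaType>-<index>:<port>".
func buildClusterSpec(jobName string, replicas map[string]int, port int) ClusterSpec {
	spec := make(ClusterSpec, len(replicas))
	for replicaType, n := range replicas {
		addrs := make([]string, 0, n)
		for i := 0; i < n; i++ {
			addrs = append(addrs, fmt.Sprintf("%s-%s-%d:%d", jobName, replicaType, i, port))
		}
		spec[replicaType] = addrs
	}
	return spec
}

func main() {
	spec := buildClusterSpec("mnist", map[string]int{"worker": 2}, 2222)
	fmt.Println(spec["worker"]) // [mnist-worker-0:2222 mnist-worker-1:2222]
}
```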
type KubernetesLabels
KubernetesLabels represents a set of labels to apply to a Kubernetes resource.
func (KubernetesLabels) ToSelector
func (l KubernetesLabels) ToSelector() (string, error)
ToSelector converts the labels to a selector matching the labels.
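A minimal sketch of such a conversion, assuming KubernetesLabels is a `map[string]string` (its definition is not shown on this page) and that the selector uses the equality-based `key=value,key=value` syntax. The label keys in `main` are hypothetical. Because Go map iteration order is random, the pieces are sorted so the same labels always yield the same string.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KubernetesLabels is assumed to be map[string]string.
type KubernetesLabels map[string]string

// ToSelector renders the labels as an equality-based selector string.
// The "k=v" pieces are sorted so the output is deterministic.
func (l KubernetesLabels) ToSelector() (string, error) {
	pieces := make([]string, 0, len(l))
	for k, v := range l {
		pieces = append(pieces, fmt.Sprintf("%s=%s", k, v))
	}
	sort.Strings(pieces)
	return strings.Join(pieces, ","), nil
}

func main() {
	l := KubernetesLabels{"job_type": "WORKER", "caffe2_job": "mnist"}
	sel, err := l.ToSelector()
	if err != nil {
		panic(err)
	}
	fmt.Println(sel) // caffe2_job=mnist,job_type=WORKER
}
```

Where the Kubernetes apimachinery module is already a dependency, `labels.SelectorFromSet(labels.Set(l)).String()` should produce an equivalent selector without a hand-written loop.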
type TrainingJob
type TrainingJob struct {
	KubeCli  kubernetes.Interface
	Replicas []*Caffe2ReplicaSet
	// contains filtered or unexported fields
}
TODO: We should switch to a New pattern and make trainingJob private so we can ensure correctness on creation.
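The TODO refers to a common Go idiom: keep the struct unexported so code outside the package cannot build it with a struct literal, and route all construction through a constructor that validates its inputs once. The sketch below shows the idiom in isolation; the `trainingJob` fields, `newTrainingJob`, and the validation rules are hypothetical. In the real package the constructor would be exported (NewJob already plays that role).

```go
package main

import (
	"errors"
	"fmt"
)

// trainingJob is unexported, so other packages cannot build one with a
// struct literal and bypass validation. Field names are hypothetical.
type trainingJob struct {
	name     string
	replicas int
}

// newTrainingJob is the only way to obtain a *trainingJob; it validates the
// inputs once, at creation time.
func newTrainingJob(name string, replicas int) (*trainingJob, error) {
	if name == "" {
		return nil, errors.New("job name must not be empty")
	}
	if replicas < 1 {
		return nil, fmt.Errorf("job %q: replicas must be >= 1, got %d", name, replicas)
	}
	return &trainingJob{name: name, replicas: replicas}, nil
}

// Name gives read-only access to the validated field.
func (j *trainingJob) Name() string { return j.name }

func main() {
	if _, err := newTrainingJob("mnist", 0); err != nil {
		fmt.Println("rejected:", err)
	}
	j, err := newTrainingJob("mnist", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(j.Name()) // mnist
}
```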
func NewJob
func NewJob(kubeCli kubernetes.Interface, jobClient jobclient.Interface, recorder record.EventRecorder, job *api.Caffe2Job, config *api.ControllerConfig) (*TrainingJob, error)
func (*TrainingJob) ClusterSpec
func (j *TrainingJob) ClusterSpec() ClusterSpec
func (*TrainingJob) Delete
func (j *TrainingJob) Delete()
func (*TrainingJob) GetStatus
func (j *TrainingJob) GetStatus() (api.State, []*api.Caffe2ReplicaStatus, error)
func (*TrainingJob) Reconcile
func (j *TrainingJob) Reconcile(config *api.ControllerConfig) error
Reconcile tries to bring the job into the desired state.
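The core of a reconcile step is a diff between desired and observed state. The sketch below reduces that to replica names and returns what would have to be created and deleted; `reconcile` and the name format are hypothetical, and the real method presumably works on Kubernetes pods and services through its clients. Because such a pass is typically re-run on every event or resync, a step that fails partway can simply be retried.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile (hypothetical) compares desired replica names with those that
// exist and returns what must be created and deleted to converge. It is a
// pure function here; a real controller would issue API calls instead.
func reconcile(desired, actual []string) (toCreate, toDelete []string) {
	have := make(map[string]bool, len(actual))
	for _, name := range actual {
		have[name] = true
	}
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
		if !have[name] {
			toCreate = append(toCreate, name)
		}
	}
	for _, name := range actual {
		if !want[name] {
			toDelete = append(toDelete, name)
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	desired := []string{"job-worker-0", "job-worker-1", "job-worker-2"}
	actual := []string{"job-worker-0", "job-worker-5"}
	create, del := reconcile(desired, actual)
	fmt.Println("create:", create) // create: [job-worker-1 job-worker-2]
	fmt.Println("delete:", del)    // delete: [job-worker-5]
}
```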
func (*TrainingJob) UID
func (j *TrainingJob) UID() types.UID