Documentation ¶
Index ¶
- func AddResourceList(a v1.ResourceList, b v1.ResourceList)
- type Autoscaler
- type Cluster
- func (c *Cluster) CreateJob(j *batchv1.Job) (*batchv1.Job, error)
- func (c *Cluster) CreateReplicaSet(r *v1beta1.ReplicaSet) (*v1beta1.ReplicaSet, error)
- func (c *Cluster) DeleteReplicaSet(namespace, name string) error
- func (c *Cluster) DeleteTrainerJob(namespace, name string) error
- func (c *Cluster) GetReplicaSet(namespace, name string) (*v1beta1.ReplicaSet, error)
- func (c Cluster) GetTrainerJob(job *edlresource.TrainingJob) (*batchv1.Job, error)
- func (c Cluster) GetTrainerJobByName(namespace, name string) (*batchv1.Job, error)
- func (c *Cluster) InquiryResource() (res ClusterResource, err error)
- func (c Cluster) JobPods(job *edlresource.TrainingJob) (total, running, pending int, err error)
- func (c Cluster) UpdateTrainerJob(job *batchv1.Job) error
- type ClusterResource
- type Controller
- type DefaultJobParser
- func (p *DefaultJobParser) ParseToMaster(job *edlresource.TrainingJob) *v1beta1.ReplicaSet
- func (p *DefaultJobParser) ParseToPserver(job *edlresource.TrainingJob) *v1beta1.ReplicaSet
- func (p *DefaultJobParser) ParseToTrainer(job *edlresource.TrainingJob) *batchv1.Job
- func (p *DefaultJobParser) Validate(job *edlresource.TrainingJob) error
- type EtcdClient
- type JobParser
- type Nodes
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func AddResourceList ¶
func AddResourceList(a v1.ResourceList, b v1.ResourceList)
AddResourceList adds the quantities in b to the matching quantities in a, modifying a in place. v1.ResourceList is equivalent to map[string]Quantity.
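As a rough illustration of the in-place addition described above, here is a minimal sketch using stand-in types. The names quantity, resourceList and addResourceList are hypothetical; the real function operates on v1.ResourceList, and how it treats names that are missing from a is an assumption here.

```go
package main

import "fmt"

// quantity is a stand-in for the Kubernetes Quantity type; a plain int64
// keeps this sketch free of external dependencies.
type quantity int64

// resourceList is a stand-in for v1.ResourceList (resource name -> quantity).
type resourceList map[string]quantity

// addResourceList adds every quantity in b to the matching entry of a,
// modifying a in place. Assumption: names missing from a start from zero.
func addResourceList(a, b resourceList) {
	for name, q := range b {
		a[name] += q
	}
}

func main() {
	a := resourceList{"cpu": 500, "memory": 1024}
	b := resourceList{"cpu": 250, "gpu": 1}
	addResourceList(a, b)
	fmt.Println(a["cpu"], a["memory"], a["gpu"]) // 750 1024 1
}
```

Because the function has no return value, the caller reads the sum back from the first argument.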
Types ¶
type Autoscaler ¶
type Autoscaler struct {
// contains filtered or unexported fields
}
Autoscaler launches and scales the training jobs.
func (*Autoscaler) OnAdd ¶
func (a *Autoscaler) OnAdd(trainingjob *edlresource.TrainingJob)
OnAdd notifies the autoscaler that a job has been added.
func (*Autoscaler) OnDel ¶
func (a *Autoscaler) OnDel(trainingjob *edlresource.TrainingJob)
OnDel notifies the autoscaler that a job has been deleted.
func (*Autoscaler) OnUpdate ¶
func (a *Autoscaler) OnUpdate(trainingjob *edlresource.TrainingJob)
OnUpdate notifies the autoscaler that a job has been updated.
func (*Autoscaler) Run ¶
func (a *Autoscaler) Run()
Run monitors the cluster resources and training jobs in a loop and scales the training jobs according to the available cluster resources.
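The Autoscaler's fields are unexported, so how it handles these notifications is not visible here. One common pattern, sketched below under that caveat, is to queue each notification on a channel and apply the queue from a single loop. Every name in the sketch (sketchAutoscaler, event, drain) is hypothetical, and the job is reduced to a name and a replica count instead of a real *edlresource.TrainingJob.

```go
package main

import "fmt"

// eventKind distinguishes the three notifications (hypothetical type).
type eventKind int

const (
	added eventKind = iota
	deleted
	updated
)

// event carries one notification; jobName and replicas stand in for the
// fields of a real TrainingJob.
type event struct {
	kind     eventKind
	jobName  string
	replicas int
}

// sketchAutoscaler queues notifications and applies them in one place.
type sketchAutoscaler struct {
	events chan event
	jobs   map[string]int // job name -> desired replicas
}

func newSketchAutoscaler() *sketchAutoscaler {
	return &sketchAutoscaler{events: make(chan event, 16), jobs: map[string]int{}}
}

func (a *sketchAutoscaler) OnAdd(name string, replicas int) {
	a.events <- event{kind: added, jobName: name, replicas: replicas}
}

func (a *sketchAutoscaler) OnUpdate(name string, replicas int) {
	a.events <- event{kind: updated, jobName: name, replicas: replicas}
}

func (a *sketchAutoscaler) OnDel(name string) {
	a.events <- event{kind: deleted, jobName: name}
}

// drain applies every queued event without blocking. A monitoring loop
// could call it on each iteration before making scaling decisions.
func (a *sketchAutoscaler) drain() {
	for {
		select {
		case e := <-a.events:
			switch e.kind {
			case added, updated:
				a.jobs[e.jobName] = e.replicas
			case deleted:
				delete(a.jobs, e.jobName)
			}
		default:
			return
		}
	}
}

func main() {
	a := newSketchAutoscaler()
	a.OnAdd("mnist", 2)
	a.OnAdd("resnet", 4)
	a.OnUpdate("mnist", 3)
	a.OnDel("resnet")
	a.drain()
	fmt.Println(a.jobs) // map[mnist:3]
}
```

Funnelling all three callbacks through one channel keeps the job map owned by a single goroutine, which avoids locking in the scaling loop.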
type Cluster ¶
type Cluster struct {
// contains filtered or unexported fields
}
Cluster is our interface to the Kubernetes cluster. It can query the cluster's overall status and the status of a specific PaddlePaddle training job. It can also create training jobs and ReplicaSets.
TODO(yi): The above functionalities are NOT logically related with each other. I am not sure if it is a good idea to group them in this source file.
func (*Cluster) CreateReplicaSet ¶
func (c *Cluster) CreateReplicaSet(r *v1beta1.ReplicaSet) (*v1beta1.ReplicaSet, error)
CreateReplicaSet creates a ReplicaSet.
func (*Cluster) DeleteReplicaSet ¶
func (c *Cluster) DeleteReplicaSet(namespace, name string) error
DeleteReplicaSet deletes a ReplicaSet and its pods.
func (*Cluster) DeleteTrainerJob ¶
func (c *Cluster) DeleteTrainerJob(namespace, name string) error
DeleteTrainerJob deletes a trainer job and its pods. See: https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/
func (*Cluster) GetReplicaSet ¶
func (c *Cluster) GetReplicaSet(namespace, name string) (*v1beta1.ReplicaSet, error)
GetReplicaSet gets a ReplicaSet.
func (Cluster) GetTrainerJob ¶
func (c Cluster) GetTrainerJob(job *edlresource.TrainingJob) (*batchv1.Job, error)
GetTrainerJob gets the trainer job spec.
func (Cluster) GetTrainerJobByName ¶
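func (c Cluster) GetTrainerJobByName(namespace, name string) (*batchv1.Job, error)
GetTrainerJobByName gets the trainer job spec by namespace and name.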
func (*Cluster) InquiryResource ¶
func (c *Cluster) InquiryResource() (res ClusterResource, err error)
InquiryResource returns the idle and total resources of the Kubernetes cluster.
func (Cluster) JobPods ¶
func (c Cluster) JobPods(job *edlresource.TrainingJob) (total, running, pending int, err error)
JobPods returns the total number of desired pods of a job, together with the number of its running and pending pods.
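The three return values can be pictured as a tally over pod phases. The sketch below is an assumption about the shape of that tally, not the package's implementation: countPods and podPhase are hypothetical names, the desired total is passed in directly, and the phase strings mirror the standard Kubernetes pod phases.

```go
package main

import "fmt"

// podPhase mirrors the Kubernetes pod phase strings.
type podPhase string

const (
	phasePending   podPhase = "Pending"
	phaseRunning   podPhase = "Running"
	phaseSucceeded podPhase = "Succeeded"
	phaseFailed    podPhase = "Failed"
)

// countPods returns the desired total unchanged, plus how many of the
// observed pods are running and how many are pending. Pods in any other
// phase are counted in neither bucket.
func countPods(desired int, phases []podPhase) (total, running, pending int) {
	total = desired
	for _, p := range phases {
		switch p {
		case phaseRunning:
			running++
		case phasePending:
			pending++
		}
	}
	return total, running, pending
}

func main() {
	phases := []podPhase{phaseRunning, phaseRunning, phasePending, phaseFailed}
	total, running, pending := countPods(4, phases)
	fmt.Println(total, running, pending) // 4 2 1
}
```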
type ClusterResource ¶
type ClusterResource struct {
	NodeCount int // The total number of nodes in the cluster.
	// Each Kubernetes job could require some number of GPUs in
	// the range of [request, limit].
	GPURequest int // \sum_job num_gpu_request(job)
	GPULimit   int // \sum_job num_gpu_limit(job)
	GPUTotal   int // The total number of GPUs in the cluster.
	// Each Kubernetes job could require some CPU timeslices in
	// the unit of *milli*.
	CPURequestMilli int64 // \sum_job cpu_request_in_milli(job)
	CPULimitMilli   int64 // \sum_job cpu_limit_in_milli(job)
	CPUTotalMilli   int64 // The total amount of CPUs in the cluster in milli.
	// Each Kubernetes job could require some amount of memory in
	// the unit of *mega*.
	MemoryRequestMega int64 // \sum_job memory_request_in_mega(job)
	MemoryLimitMega   int64 // \sum_job memory_limit_in_mega(job)
	MemoryTotalMega   int64 // The total amount of memory in the cluster in mega.
	Nodes Nodes
}
ClusterResource is the resource of a cluster
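Since the struct reports both cluster totals and per-job sums, idle capacity can be derived by subtraction. The sketch below copies a subset of the fields into a local clusterResource type so that it compiles on its own. Which sum to subtract (limit for GPUs, request for CPU and memory) is an assumption made for illustration, as are the idle and clamp helpers.

```go
package main

import "fmt"

// clusterResource copies a subset of ClusterResource's fields so the
// sketch needs no external packages.
type clusterResource struct {
	GPULimit, GPUTotal                 int
	CPURequestMilli, CPUTotalMilli     int64
	MemoryRequestMega, MemoryTotalMega int64
}

// clamp keeps an over-committed resource from reporting negative idle capacity.
func clamp(v int64) int64 {
	if v < 0 {
		return 0
	}
	return v
}

// idle returns the unclaimed GPUs, CPU (milli) and memory (mega).
// Assumption: GPUs are accounted by limit, CPU and memory by request.
func (r clusterResource) idle() (gpu int, cpuMilli, memMega int64) {
	gpu = int(clamp(int64(r.GPUTotal - r.GPULimit)))
	cpuMilli = clamp(r.CPUTotalMilli - r.CPURequestMilli)
	memMega = clamp(r.MemoryTotalMega - r.MemoryRequestMega)
	return gpu, cpuMilli, memMega
}

func main() {
	r := clusterResource{
		GPULimit: 6, GPUTotal: 8,
		CPURequestMilli: 12000, CPUTotalMilli: 32000,
		MemoryRequestMega: 40000, MemoryTotalMega: 64000,
	}
	fmt.Println(r.idle()) // 2 20000 24000
}
```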
type Controller ¶
type Controller struct {
// contains filtered or unexported fields
}
Controller dispatches TrainingJob resources.
func New ¶
func New(c *rest.RESTClient, cs *kubernetes.Clientset, maxLoadDesired float64) (*Controller, error)
New constructs a new Controller.
func (*Controller) Run ¶
func (c *Controller) Run()
Run starts watching Kubernetes events and invokes the corresponding handlers.
func (*Controller) WatchTrainingJobs ¶
func (c *Controller) WatchTrainingJobs()
WatchTrainingJobs monitors TrainingJob resources.
type DefaultJobParser ¶
type DefaultJobParser int
DefaultJobParser implements a basic JobParser.
func (*DefaultJobParser) ParseToMaster ¶
func (p *DefaultJobParser) ParseToMaster(job *edlresource.TrainingJob) *v1beta1.ReplicaSet
ParseToMaster converts a TrainingJob into a Kubernetes ReplicaSet resource for the master.
func (*DefaultJobParser) ParseToPserver ¶
func (p *DefaultJobParser) ParseToPserver(job *edlresource.TrainingJob) *v1beta1.ReplicaSet
ParseToPserver generates a pserver ReplicaSet resource from the "TrainingJob" resource spec.
func (*DefaultJobParser) ParseToTrainer ¶
func (p *DefaultJobParser) ParseToTrainer(job *edlresource.TrainingJob) *batchv1.Job
ParseToTrainer converts a TrainingJob into a Kubernetes Job resource.
func (*DefaultJobParser) Validate ¶
func (p *DefaultJobParser) Validate(job *edlresource.TrainingJob) error
Validate updates default values for the added job and validates the fields.
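A "fill defaults, then check" validator of the kind described above might look like the following sketch. It is not the package's Validate: trainerSpec, its fields, defaultImage and the specific rules are all hypothetical stand-ins chosen to show the two phases.

```go
package main

import (
	"errors"
	"fmt"
)

// trainerSpec is a hypothetical, simplified stand-in for the trainer part
// of a TrainingJob spec.
type trainerSpec struct {
	Image       string
	MinInstance int
	MaxInstance int
}

// defaultImage is a placeholder value used only by this sketch.
const defaultImage = "example/trainer:latest"

// validate first fills unset fields with defaults, then checks that the
// resulting spec is consistent. It mutates s, like a defaulting validator.
func validate(s *trainerSpec) error {
	if s.Image == "" {
		s.Image = defaultImage
	}
	if s.MinInstance == 0 {
		s.MinInstance = 1
	}
	if s.MaxInstance == 0 {
		s.MaxInstance = s.MinInstance
	}
	if s.MinInstance < 1 {
		return errors.New("MinInstance must be at least 1")
	}
	if s.MaxInstance < s.MinInstance {
		return errors.New("MaxInstance must not be less than MinInstance")
	}
	return nil
}

func main() {
	ok := &trainerSpec{MinInstance: 2}
	fmt.Println(validate(ok), *ok)

	bad := &trainerSpec{MinInstance: 3, MaxInstance: 1}
	fmt.Println(validate(bad))
}
```

Running the defaults before the checks means a caller only has to set the fields it cares about, at the cost of Validate mutating its argument.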
type EtcdClient ¶ added in v0.2.0
type EtcdClient struct {
// contains filtered or unexported fields
}
EtcdClient is the etcd client that the pserver uses for fault tolerance, service registry and coordination.
func NewEtcdClient ¶ added in v0.2.0
func NewEtcdClient(endpoints string, numPservers int, dialtimeout time.Duration, ttlSec int) *EtcdClient
NewEtcdClient creates an EtcdClient
func (*EtcdClient) Register ¶ added in v0.2.0
func (e *EtcdClient) Register(port int) error
Register returns the index of the current pserver.
func (*EtcdClient) Shutdown ¶ added in v0.2.0
func (e *EtcdClient) Shutdown() error
Shutdown shuts down the etcd client gracefully.
type JobParser ¶
type JobParser interface {
	Validate(job *edlresource.TrainingJob) error
	ParseToTrainer(job *edlresource.TrainingJob) *batchv1.Job
	ParseToPserver(job *edlresource.TrainingJob) *v1beta1.ReplicaSet
	ParseToMaster(job *edlresource.TrainingJob) *v1beta1.ReplicaSet
}
JobParser is an interface that validates a "TrainingJob" and converts it into Kubernetes ReplicaSet and Job resources.
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
apis | |
paddlepaddle/v1 | Package v1 is the v1 version of the API. |
client | |
clientset/versioned | This package has the automatically generated clientset. |
clientset/versioned/fake | This package has the automatically generated fake clientset. |
clientset/versioned/scheme | This package contains the scheme of the automatically generated clientset. |
clientset/versioned/typed/paddlepaddle/v1 | This package has the automatically generated typed clients. |
clientset/versioned/typed/paddlepaddle/v1/fake | Package fake has the automatically generated clients. |