Documentation ¶
Overview ¶
Package controller provides a Kubernetes controller for a TensorFlow job resource.
Index ¶
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrVersionOutdated is an exported var to capture the error in apiserver
	ErrVersionOutdated = errors.New("requested version is outdated in apiserver")

	// DefaultJobBackOff is the default backoff period, exported for the e2e test
	DefaultJobBackOff = 10 * time.Second

	// MaxJobBackOff is the max backoff period, exported for the e2e test
	MaxJobBackOff = 360 * time.Second
)
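To illustrate how a 10-second starting backoff and a 360-second maximum could interact, here is a minimal sketch of a doubling delay that is capped at the maximum. It uses only the standard library. The helper backoffFor and the lowercase constants are hypothetical names introduced for this example, and the assumption that the controller doubles the delay per failure between exactly these two bounds is not confirmed by this page.

```go
package main

import (
	"fmt"
	"time"
)

// Values copied from the exported variables documented above.
const (
	defaultJobBackOff = 10 * time.Second
	maxJobBackOff     = 360 * time.Second
)

// backoffFor is a hypothetical helper. It returns the delay to wait after the
// given number of previous consecutive failures, doubling from base each time
// and never exceeding max.
func backoffFor(failures int, base, max time.Duration) time.Duration {
	delay := base
	if delay >= max {
		return max
	}
	for i := 0; i < failures; i++ {
		delay *= 2
		if delay >= max {
			// Returning early also keeps the value from overflowing.
			return max
		}
	}
	return delay
}

func main() {
	for failures := 0; failures < 8; failures++ {
		fmt.Printf("after %d failures: wait %v\n",
			failures, backoffFor(failures, defaultJobBackOff, maxJobBackOff))
	}
}
```

Under these assumptions the delay grows as 10s, 20s, 40s, 80s, 160s, 320s and then stays at 6m0s.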
Functions ¶
This section is empty.
Types ¶
type Controller ¶
type Controller struct {
	KubeClient  kubernetes.Interface
	TFJobClient tfjobclient.Interface

	TFJobLister listers.TFJobLister
	TFJobSynced cache.InformerSynced

	// WorkQueue is a rate limited work queue. This is used to queue work to be
	// processed instead of performing it as soon as a change happens. This
	// means we can ensure we only process a fixed amount of resources at a
	// time, and makes it easy to ensure we are never processing the same item
	// simultaneously in two different workers.
	//
	// Items in the work queue correspond to the name of the job.
	// In response to various events (e.g. Add, Update, Delete), the informer
	// is configured to add events to the queue. Since the item in the queue
	// represents a job and not a particular event, we end up aggregating events for
	// a job and ensure that a particular job isn't being processed by multiple
	// workers simultaneously.
	//
	// We rely on the informer to periodically generate Update events. This ensures
	// we regularly check on each TFJob and take any action needed.
	//
	// If there is a problem processing a job, processNextWorkItem just requeues
	// the work item. This ensures that we end up retrying it. In this case
	// we rely on the rateLimiter in the worker queue to retry with exponential
	// backoff.
	WorkQueue workqueue.RateLimitingInterface
	// contains filtered or unexported fields
}
Controller is a structure that manages the various service clients.
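The WorkQueue comment explains that queue items are job names rather than individual events, so several events for one job collapse into a single work item. The sketch below shows that aggregation with a simplified, hypothetical dedupQueue type built on the standard library. It is not the client-go workqueue: it is not safe for concurrent use, and it omits rate limiting and the tracking of items that are currently being processed. The job keys are made up for the example.

```go
package main

import "fmt"

// dedupQueue is a hypothetical, simplified stand-in for the work queue:
// a FIFO of job keys in which each key appears at most once while pending.
type dedupQueue struct {
	order   []string
	pending map[string]bool
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: make(map[string]bool)}
}

// Add enqueues key unless it is already waiting to be processed.
func (q *dedupQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get removes and returns the oldest pending key; ok is false when the
// queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len reports how many distinct keys are pending.
func (q *dedupQueue) Len() int { return len(q.order) }

func main() {
	q := newDedupQueue()
	// Three events for the same job collapse into one work item.
	q.Add("default/mnist")  // Add event
	q.Add("default/mnist")  // Update event
	q.Add("default/resnet") // Add event for another job
	q.Add("default/mnist")  // periodic resync Update

	fmt.Println("pending items:", q.Len()) // prints "pending items: 2"
	for {
		key, ok := q.Get()
		if !ok {
			break
		}
		fmt.Println("sync", key)
	}
}
```

Because the key identifies the job and not the event, a worker that picks up a key is expected to look up the job's current state and reconcile it, whatever sequence of events led to the key being queued.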
func New ¶
func New(kubeClient kubernetes.Interface, tfJobClient tfjobclient.Interface, config tfv1alpha1.ControllerConfig, tfJobInformerFactory informers.SharedInformerFactory, enableGangScheduling bool) (*Controller, error)
New sets up the service client handles and returns a Controller object.
func (*Controller) Run ¶
func (c *Controller) Run(threadiness int, stopCh <-chan struct{}) error
Run sets up the event handlers for the types we are interested in, syncs the informer caches, and starts the workers. It blocks until stopCh is closed, at which point it shuts down the work queue and waits for the workers to finish processing their current work items.
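The lifecycle described for Run (start a fixed number of workers, block until stopCh is closed, then wait for in-flight items to finish) can be sketched with the standard library alone. The function runWorkers, the channel-based queue, and the job keys are hypothetical and exist only for this illustration. The sketch leaves out event handler registration, informer cache syncing, and the rate-limited work queue that the real method relies on.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers is a hypothetical sketch of the Run lifecycle. It starts
// threadiness workers that read job keys from queue, blocks until stopCh is
// closed, waits for every worker to finish its current item, and returns the
// number of items processed.
func runWorkers(threadiness int, queue <-chan string, stopCh <-chan struct{}, process func(string)) int {
	var (
		wg        sync.WaitGroup
		processed int64
	)
	for i := 0; i < threadiness; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-stopCh:
					return
				case key := <-queue:
					process(key)
					atomic.AddInt64(&processed, 1)
				}
			}
		}()
	}

	<-stopCh  // block until the caller asks us to stop
	wg.Wait() // let workers finish the item they are holding
	return int(atomic.LoadInt64(&processed))
}

func main() {
	queue := make(chan string, 3)
	for _, key := range []string{"default/job-a", "default/job-b", "default/job-c"} {
		queue <- key
	}

	stopCh := make(chan struct{})
	seen := make(chan string, 3)
	result := make(chan int)
	go func() {
		result <- runWorkers(2, queue, stopCh, func(key string) { seen <- key })
	}()

	// Wait for all three keys to be handled, then request shutdown.
	for i := 0; i < 3; i++ {
		fmt.Println("synced", <-seen)
	}
	close(stopCh)
	fmt.Println("processed:", <-result)
}
```

Closing stopCh, as opposed to sending a value on it, signals every worker at once, which is why the parameter is a receive-only channel of empty structs.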