Documentation ¶
Overview ¶
Package controller provides a Kubernetes controller for a TFJob resource.
Constants ¶
This section is empty.
Variables ¶
var (
	// KeyFunc is the short name to DeletionHandlingMetaNamespaceKeyFunc.
	// IndexerInformer uses a delta queue, therefore for deletes we have to use this
	// key function but it should be just fine for non delete events.
	KeyFunc = cache.DeletionHandlingMetaNamespaceKeyFunc

	// DefaultTFJobControllerConfiguration is the suggested tf-operator configuration for production.
	DefaultTFJobControllerConfiguration = TFJobControllerConfiguration{
		ReconcilerSyncLoopPeriod: metav1.Duration{Duration: 15 * time.Second},
	}
)
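The comment on KeyFunc explains why deletes need a special key function: a delta queue may hand the delete handler a tombstone instead of the object itself. The following is a minimal, standard-library-only sketch of that idea. The types object and tombstone and the function deletionHandlingKey are hypothetical stand-ins for illustration, not the client-go API.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stand-in for a Kubernetes object with metadata.
type object struct {
	Namespace, Name string
}

// tombstone is a hypothetical stand-in for the placeholder a delta queue
// delivers when an object was deleted but its final state was not observed.
type tombstone struct {
	Key string
	Obj interface{}
}

// deletionHandlingKey returns "namespace/name" for live objects (or just
// "name" when the namespace is empty) and the recorded key for tombstones.
func deletionHandlingKey(obj interface{}) (string, error) {
	switch o := obj.(type) {
	case tombstone:
		return o.Key, nil
	case object:
		if o.Namespace == "" {
			return o.Name, nil
		}
		return o.Namespace + "/" + o.Name, nil
	default:
		return "", errors.New("object has no metadata")
	}
}

func main() {
	live, _ := deletionHandlingKey(object{Namespace: "default", Name: "mnist"})
	gone, _ := deletionHandlingKey(tombstone{Key: "default/mnist"})
	fmt.Println(live, gone)
}
```

Because both paths yield the same key, add, update, and delete events for one TFJob land on the same work item.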
Functions ¶
func NewUnstructuredTFJobInformer ¶
func NewUnstructuredTFJobInformer(restConfig *restclientset.Config) tfjobinformersv1alpha2.TFJobInformer
func RecheckDeletionTimestamp ¶
RecheckDeletionTimestamp returns a CanAdopt() function to recheck deletion.
The CanAdopt() function calls getObject() to fetch the latest value, and denies adoption attempts if that object has a non-nil DeletionTimestamp.
Types ¶
type ClusterSpec ¶
ClusterSpec represents a TensorFlow cluster specification. It is a map from job names to network addresses.
See: https://www.tensorflow.org/deploy/distributed#create_a_tftrainclusterspec_to_describe_the_cluster
type TFConfig ¶
type TFConfig struct {
	// Cluster represents a TensorFlow ClusterSpec.
	// See: https://www.tensorflow.org/api_docs/python/tf/train/ClusterSpec
	Cluster ClusterSpec `json:"cluster"`
	Task    TaskSpec    `json:"task"`
}
TFConfig is a struct representing the distributed TensorFlow config. This struct is turned into an environment variable TF_CONFIG which is used by TensorFlow processes to configure themselves.
https://www.tensorflow.org/api_docs/python/tf/estimator/RunConfig#methods
https://cloud.google.com/ml-engine/docs/tensorflow/distributed-training-details
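As a rough illustration of how such a struct could become the TF_CONFIG value, the sketch below marshals it to JSON with the standard library. The types are local copies for the example. The value type of ClusterSpec ([]string) and the fields of TaskSpec (Type, Index) are assumptions, since their definitions are not shown here, and tfConfigEnv is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterSpec maps job names to network addresses.
// Assumption: each job name maps to a list of "host:port" strings.
type ClusterSpec map[string][]string

// TaskSpec identifies the current process within the cluster.
// Assumption: these fields are illustrative; the real TaskSpec is not shown.
type TaskSpec struct {
	Type  string `json:"type"`
	Index int    `json:"index"`
}

// TFConfig mirrors the documented struct and its JSON tags.
type TFConfig struct {
	Cluster ClusterSpec `json:"cluster"`
	Task    TaskSpec    `json:"task"`
}

// tfConfigEnv renders the config as a "TF_CONFIG=<json>" environment entry.
func tfConfigEnv(c TFConfig) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return "TF_CONFIG=" + string(b), nil
}

func main() {
	cfg := TFConfig{
		Cluster: ClusterSpec{
			"ps":     {"ps-0:2222"},
			"worker": {"worker-0:2222", "worker-1:2222"},
		},
		Task: TaskSpec{Type: "worker", Index: 1},
	}
	env, err := tfConfigEnv(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(env)
}
```

Each replica would receive the same cluster section and a task section that differs per process.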
type TFJobController ¶
type TFJobController struct {
// contains filtered or unexported fields
}
TFJobController is the type for TFJob Controller, which manages the lifecycle of TFJobs.
func NewTFJobController ¶
func NewTFJobController(
	tfJobInformer tfjobinformersv1alpha2.TFJobInformer,
	kubeClientSet kubeclientset.Interface,
	tfJobClientSet tfjobclientset.Interface,
	kubeInformerFactory kubeinformers.SharedInformerFactory,
	tfJobInformerFactory tfjobinformers.SharedInformerFactory) *TFJobController
NewTFJobController returns a new TFJob controller.
func (*TFJobController) NewTFJobInformer ¶
func (tc *TFJobController) NewTFJobInformer(tfJobInformerFactory tfjobinformers.SharedInformerFactory) tfjobinformersv1alpha2.TFJobInformer
NewTFJobInformer returns a TFJobInformer from the given factory.
func (*TFJobController) Run ¶
func (tc *TFJobController) Run(threadiness int, stopCh <-chan struct{}) error
Run sets up the event handlers for the types we are interested in, syncs the informer caches, and starts the workers. It blocks until stopCh is closed, at which point it shuts down the workqueue and waits for the workers to finish processing their current work items.
type TFJobControllerConfiguration ¶
type TFJobControllerConfiguration struct {
	// ReconcilerSyncLoopPeriod is the amount of time the reconciler sync loop
	// waits between two reconciler syncs.
	// It is set to 15 sec by default.
	// TODO(cph): maybe we can let it grow by multiples in the future,
	// up to 5 minutes, to reduce idle looping.
	// e.g. 15s, 30s, 60s, 120s...
	ReconcilerSyncLoopPeriod metav1.Duration
}
TFJobControllerConfiguration contains the configuration of tf-operator. DefaultTFJobControllerConfiguration is the suggested tf-operator configuration for production.