controller

package
v0.2.0-rc1
Warning

This package is not in the latest version of its module.

Published: Jun 21, 2018 License: Apache-2.0 Imports: 36 Imported by: 0

Documentation

Overview

Package controller provides a Kubernetes controller for a TFJob resource.

Constants

This section is empty.

Variables

var (

	// KeyFunc is the short name to DeletionHandlingMetaNamespaceKeyFunc.
	// IndexerInformer uses a delta queue, therefore for deletes we have to use this
	// key function but it should be just fine for non delete events.
	KeyFunc = cache.DeletionHandlingMetaNamespaceKeyFunc

	// DefaultTFJobControllerConfiguration is the suggested tf-operator configuration for production.
	DefaultTFJobControllerConfiguration = TFJobControllerConfiguration{
		ReconcilerSyncLoopPeriod: metav1.Duration{Duration: 15 * time.Second},
	}
)
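The key function's behaviour can be illustrated with a minimal, standard-library-only sketch. The types `meta` and `deletedFinalStateUnknown` below are hypothetical stand-ins for the real Kubernetes object metadata and the client-go tombstone type; only the shape of the logic is meant to carry over: a tombstone left by a missed delete already carries its key, and every other object is keyed as `namespace/name`.

```go
package main

import "fmt"

// meta is a hypothetical stand-in for an object's namespace and name.
type meta struct {
	Namespace string
	Name      string
}

// deletedFinalStateUnknown is a hypothetical stand-in for the tombstone a
// delta queue emits when a delete event was missed; it already carries the key.
type deletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// keyFunc sketches a deletion-aware key function: tombstones return their
// stored key, live objects are keyed as "namespace/name" (or just "name"
// for cluster-scoped objects).
func keyFunc(obj interface{}) (string, error) {
	if d, ok := obj.(deletedFinalStateUnknown); ok {
		return d.Key, nil
	}
	m, ok := obj.(meta)
	if !ok {
		return "", fmt.Errorf("object of type %T has no metadata", obj)
	}
	if m.Namespace == "" {
		return m.Name, nil
	}
	return m.Namespace + "/" + m.Name, nil
}

func main() {
	live, _ := keyFunc(meta{Namespace: "default", Name: "mnist"})
	gone, _ := keyFunc(deletedFinalStateUnknown{Key: "default/old-job"})
	fmt.Println(live)
	fmt.Println(gone)
}
```

Because the same function handles both cases, one handler can enqueue keys for add, update, and delete events alike.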

Functions

func NewUnstructuredTFJobInformer

func NewUnstructuredTFJobInformer(restConfig *restclientset.Config) tfjobinformersv1alpha2.TFJobInformer

func RecheckDeletionTimestamp

func RecheckDeletionTimestamp(getObject func() (metav1.Object, error)) func() error

RecheckDeletionTimestamp returns a CanAdopt() function to recheck deletion.

The CanAdopt() function calls getObject() to fetch the latest value, and denies adoption attempts if that object has a non-nil DeletionTimestamp.

Types

type ClusterSpec

type ClusterSpec map[string][]string

ClusterSpec represents a TensorFlow cluster specification. It is a map from job names to network addresses. See https://www.tensorflow.org/deploy/distributed#create_a_tftrainclusterspec_to_describe_the_cluster
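As an illustration of the map's shape, the sketch below builds a ClusterSpec from replica counts. The helper `buildClusterSpec` and the `<job>-<type>-<index>:<port>` address format are assumptions made for this example, not something this package documents.

```go
package main

import "fmt"

// ClusterSpec maps job names (e.g. "ps", "worker") to network addresses.
type ClusterSpec map[string][]string

// buildClusterSpec is a hypothetical helper that generates one address per
// replica. The address naming scheme is an assumption for illustration only.
func buildClusterSpec(jobName string, replicas map[string]int, port int) ClusterSpec {
	spec := ClusterSpec{}
	for rtype, n := range replicas {
		addrs := make([]string, 0, n)
		for i := 0; i < n; i++ {
			addrs = append(addrs, fmt.Sprintf("%s-%s-%d:%d", jobName, rtype, i, port))
		}
		spec[rtype] = addrs
	}
	return spec
}

func main() {
	spec := buildClusterSpec("mnist", map[string]int{"ps": 1, "worker": 2}, 2222)
	fmt.Println(spec["ps"])
	fmt.Println(spec["worker"])
}
```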

type TFConfig

type TFConfig struct {
	// Cluster represents a TensorFlow ClusterSpec.
	// See: https://www.tensorflow.org/api_docs/python/tf/train/ClusterSpec
	Cluster ClusterSpec `json:"cluster"`
	Task    TaskSpec    `json:"task"`
}

TFConfig is a struct representing the distributed TensorFlow configuration. It is serialized into the TF_CONFIG environment variable, which TensorFlow processes read to configure themselves. See https://www.tensorflow.org/api_docs/python/tf/estimator/RunConfig#methods and https://cloud.google.com/ml-engine/docs/tensorflow/distributed-training-details
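The serialization step can be sketched with `encoding/json` alone. The types are re-declared locally so the example is self-contained; `tfConfigEnv` is a hypothetical helper name, and the addresses are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local copies of the documented types, so the sketch compiles on its own.
type ClusterSpec map[string][]string

type TaskSpec struct {
	Type  string `json:"type"`
	Index int    `json:"index"`
}

type TFConfig struct {
	Cluster ClusterSpec `json:"cluster"`
	Task    TaskSpec    `json:"task"`
}

// tfConfigEnv is a hypothetical helper that renders a TFConfig as the JSON
// string that would be placed in the TF_CONFIG environment variable.
func tfConfigEnv(cfg TFConfig) (string, error) {
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	cfg := TFConfig{
		Cluster: ClusterSpec{
			"ps":     {"ps-0:2222"},
			"worker": {"worker-0:2222", "worker-1:2222"},
		},
		Task: TaskSpec{Type: "worker", Index: 1},
	}
	env, err := tfConfigEnv(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("TF_CONFIG=" + env)
}
```

Each replica would receive the same `cluster` value but its own `task` entry, which is how a process learns its role and index.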

type TFJobController

type TFJobController struct {
	// contains filtered or unexported fields
}

TFJobController is the type for TFJob Controller, which manages the lifecycle of TFJobs.

func NewTFJobController

func NewTFJobController(
	tfJobInformer tfjobinformersv1alpha2.TFJobInformer,
	kubeClientSet kubeclientset.Interface,
	tfJobClientSet tfjobclientset.Interface,
	kubeInformerFactory kubeinformers.SharedInformerFactory,
	tfJobInformerFactory tfjobinformers.SharedInformerFactory) *TFJobController

NewTFJobController returns a new TFJob controller.

func (*TFJobController) NewTFJobInformer

NewTFJobInformer returns TFJobInformer from the given factory.

func (*TFJobController) Run

func (tc *TFJobController) Run(threadiness int, stopCh <-chan struct{}) error

Run sets up the event handlers for the types the controller is interested in, syncs the informer caches, and starts the workers. It blocks until stopCh is closed, at which point it shuts down the workqueue and waits for the workers to finish processing their current work items.

type TFJobControllerConfiguration

type TFJobControllerConfiguration struct {
	// ReconcilerSyncLoopPeriod is the amount of time the reconciler sync loop
	// waits between two consecutive syncs.
	// It is set to 15 seconds by default.
	// TODO(cph): consider letting it grow by a constant factor in the future,
	// up to 5 minutes, to reduce idle looping,
	// e.g. 15s, 30s, 60s, 120s...
	ReconcilerSyncLoopPeriod metav1.Duration
}

TFJobControllerConfiguration contains the configuration of tf-operator. DefaultTFJobControllerConfiguration is the suggested tf-operator configuration for production.

type TaskSpec

type TaskSpec struct {
	Type  string `json:"type"`
	Index int    `json:"index"`
}

TaskSpec is the specification for a task (PS or worker) of the TFJob.
