Documentation ¶
Overview ¶
Package controller provides a Kubernetes controller for a Caffe2Job resource.
Package controller provides a Kubernetes controller for a TensorFlow job resource.
Package controller provides a Kubernetes controller for a Caffe2Job resource.
Package controller provides a Kubernetes controller for a Caffe2Job resource.
Index ¶
- Constants
- Variables
- func GetPodFromTemplate(template *v1.PodTemplateSpec, parentObject runtime.Object, ...) (*v1.Pod, error)
- func RecheckDeletionTimestamp(getObject func() (metav1.Object, error)) func() error
- type BaseControllerRefManager
- type Caffe2Config
- type ClusterSpec
- type Controller
- type ControllerConfiguration
- type FakeServiceControl
- func (f *FakeServiceControl) CreateServices(namespace string, service *v1.Service, object runtime.Object) error
- func (f *FakeServiceControl) CreateServicesWithControllerRef(namespace string, service *v1.Service, object runtime.Object, ...) error
- func (f *FakeServiceControl) PatchService(namespace, name string, data []byte) error
- type PodControlInterface
- type PodControllerRefManager
- type RealPodControl
- func (r RealPodControl) CreatePods(namespace string, template *v1.PodTemplateSpec, object runtime.Object) error
- func (r RealPodControl) CreatePodsOnNode(nodeName, namespace string, template *v1.PodTemplateSpec, ...) error
- func (r RealPodControl) CreatePodsWithControllerRef(namespace string, template *v1.PodTemplateSpec, ...) error
- func (r RealPodControl) DeletePod(namespace string, podID string, object runtime.Object) error
- func (r RealPodControl) PatchPod(namespace, name string, data []byte) error
- type RealServiceControl
- func (r RealServiceControl) CreateServices(namespace string, service *v1.Service, object runtime.Object) error
- func (r RealServiceControl) CreateServicesWithControllerRef(namespace string, service *v1.Service, controllerObject runtime.Object, ...) error
- func (r RealServiceControl) PatchService(namespace, name string, data []byte) error
- type ServiceControlInterface
- type ServiceControllerRefManager
- type TaskSpec
Constants ¶
const ( // FailedCreatePodReason is added in an event and in a replica set condition // when a pod for a replica set is failed to be created. FailedCreatePodReason = "FailedCreate" // SuccessfulCreatePodReason is added in an event when a pod for a replica set // is successfully created. SuccessfulCreatePodReason = "SuccessfulCreate" // FailedDeletePodReason is added in an event and in a replica set condition // when a pod for a replica set is failed to be deleted. FailedDeletePodReason = "FailedDelete" // SuccessfulDeletePodReason is added in an event when a pod for a replica set // is successfully deleted. SuccessfulDeletePodReason = "SuccessfulDelete" FailedCreateServiceReason = "FailedCreateService" SuccessfulCreateServiceReason = "SuccessfulCreateService" )
Reasons for pod events
Variables ¶
var ( ErrVersionOutdated = errors.New("requested version is outdated in apiserver") // DefaultJobBackOff is the max backoff period, exported for the e2e test DefaultJobBackOff = 10 * time.Second // MaxJobBackOff is the max backoff period, exported for the e2e test MaxJobBackOff = 360 * time.Second )
Functions ¶
func GetPodFromTemplate ¶
func GetPodFromTemplate(template *v1.PodTemplateSpec, parentObject runtime.Object, controllerRef *metav1.OwnerReference) (*v1.Pod, error)
func RecheckDeletionTimestamp ¶
RecheckDeletionTimestamp returns a CanAdopt() function to recheck deletion.
The CanAdopt() function calls getObject() to fetch the latest value, and denies adoption attempts if that object has a non-nil DeletionTimestamp.
Types ¶
type BaseControllerRefManager ¶
type BaseControllerRefManager struct { Controller metav1.Object Selector labels.Selector CanAdoptFunc func() error // contains filtered or unexported fields }
func (*BaseControllerRefManager) CanAdopt ¶
func (m *BaseControllerRefManager) CanAdopt() error
func (*BaseControllerRefManager) ClaimObject ¶
func (m *BaseControllerRefManager) ClaimObject(obj metav1.Object, match func(metav1.Object) bool, adopt, release func(metav1.Object) error) (bool, error)
ClaimObject tries to take ownership of an object for this controller.
It will reconcile the following:
- Adopt orphans if the match function returns true.
- Release owned objects if the match function returns false.
A non-nil error is returned if some form of reconciliation was attempted and failed. Usually, controllers should try again later in case reconciliation is still needed.
If the error is nil, either the reconciliation succeeded, or no reconciliation was necessary. The returned boolean indicates whether you now own the object.
No reconciliation will be attempted if the controller is being deleted.
type Caffe2Config ¶
type Caffe2Config struct { // Cluster represents a TensorFlow ClusterSpec. // See: https://www.tensorflow.org/api_docs/python/tf/train/ClusterSpec Cluster ClusterSpec `json:"cluster"` Task TaskSpec `json:"task"` }
Caffe2Config is a struct representing the distributed TensorFlow config. This struct is turned into an environment variable CAFFE2_CONFIG which is used by TensorFlow processes to configure themselves. https://cloud.google.com/ml-engine/docs/trainer-considerations#use_tf_config
type ClusterSpec ¶
ClusterSpec represents a cluster Caffe2 specification. https://www.tensorflow.org/deploy/distributed#create_a_tftrainclusterspec_to_describe_the_cluster It is a map from job names to network addresses.
type Controller ¶
type Controller struct {
// contains filtered or unexported fields
}
func New ¶
func New(kubeClient kubernetes.Interface, caffe2JobClient jobclient.Interface) (*Controller, error)
func (*Controller) Run ¶
func (c *Controller) Run(threadiness int, stopCh <-chan struct{}) error
Run will set up the event handlers for types we are interested in, as well as syncing informer caches and starting workers. It will block until stopCh is closed, at which point it will shutdown the workqueue and wait for workers to finish processing their current work items.
type ControllerConfiguration ¶
var DefaultCaffe2JobControllerConfiguration ControllerConfiguration = ControllerConfiguration{ ReconcilerSyncLoopPeriod: metav1.Duration{Duration: 15 * time.Second}, }
DefaultCaffe2JobControllerConfiguration is the suggested caffe2-operator configuration for production.
type FakeServiceControl ¶
type FakeServiceControl struct { sync.Mutex Templates []v1.Service ControllerRefs []metav1.OwnerReference DeletePodName []string Patches [][]byte Err error CreateLimit int CreateCallCount int }
func (*FakeServiceControl) CreateServices ¶
func (*FakeServiceControl) CreateServicesWithControllerRef ¶
func (f *FakeServiceControl) CreateServicesWithControllerRef(namespace string, service *v1.Service, object runtime.Object, controllerRef *metav1.OwnerReference) error
func (*FakeServiceControl) PatchService ¶
func (f *FakeServiceControl) PatchService(namespace, name string, data []byte) error
type PodControlInterface ¶
type PodControlInterface interface { // CreatePods creates new pods according to the spec. CreatePods(namespace string, template *v1.PodTemplateSpec, object runtime.Object) error // CreatePodsOnNode creates a new pod according to the spec on the specified node, // and sets the ControllerRef. CreatePodsOnNode(nodeName, namespace string, template *v1.PodTemplateSpec, object runtime.Object, controllerRef *metav1.OwnerReference) error // CreatePodsWithControllerRef creates new pods according to the spec, and sets object as the pod's controller. CreatePodsWithControllerRef(namespace string, template *v1.PodTemplateSpec, object runtime.Object, controllerRef *metav1.OwnerReference) error // DeletePod deletes the pod identified by podID. DeletePod(namespace string, podID string, object runtime.Object) error // PatchPod patches the pod. PatchPod(namespace, name string, data []byte) error }
PodControlInterface is an interface that knows how to add or delete pods created as an interface to allow testing.
type PodControllerRefManager ¶
type PodControllerRefManager struct { BaseControllerRefManager // contains filtered or unexported fields }
func NewPodControllerRefManager ¶
func NewPodControllerRefManager( podControl PodControlInterface, controller metav1.Object, selector labels.Selector, controllerKind schema.GroupVersionKind, canAdopt func() error, ) *PodControllerRefManager
NewPodControllerRefManager returns a PodControllerRefManager that exposes methods to manage the controllerRef of pods.
The CanAdopt() function can be used to perform a potentially expensive check (such as a live GET from the API server) prior to the first adoption. It will only be called (at most once) if an adoption is actually attempted. If CanAdopt() returns a non-nil error, all adoptions will fail.
NOTE: Once CanAdopt() is called, it will not be called again by the same
PodControllerRefManager instance. Create a new instance if it makes sense to check CanAdopt() again (e.g. in a different sync pass).
func (*PodControllerRefManager) AdoptPod ¶
func (m *PodControllerRefManager) AdoptPod(pod *v1.Pod) error
AdoptPod sends a patch to take control of the pod. It returns the error if the patching fails.
func (*PodControllerRefManager) ClaimPods ¶
func (m *PodControllerRefManager) ClaimPods(pods []*v1.Pod, filters ...func(*v1.Pod) bool) ([]*v1.Pod, error)
ClaimPods tries to take ownership of a list of Pods.
It will reconcile the following:
- Adopt orphans if the selector matches.
- Release owned objects if the selector no longer matches.
Optional: If one or more filters are specified, a Pod will only be claimed if all filters return true.
A non-nil error is returned if some form of reconciliation was attempted and failed. Usually, controllers should try again later in case reconciliation is still needed.
If the error is nil, either the reconciliation succeeded, or no reconciliation was necessary. The list of Pods that you now own is returned.
func (*PodControllerRefManager) ReleasePod ¶
func (m *PodControllerRefManager) ReleasePod(pod *v1.Pod) error
ReleasePod sends a patch to free the pod from the control of the controller. It returns the error if the patching fails. 404 and 422 errors are ignored.
type RealPodControl ¶
type RealPodControl struct { KubeClient clientset.Interface Recorder record.EventRecorder }
RealPodControl is the default implementation of PodControlInterface.
func (RealPodControl) CreatePods ¶
func (r RealPodControl) CreatePods(namespace string, template *v1.PodTemplateSpec, object runtime.Object) error
func (RealPodControl) CreatePodsOnNode ¶
func (r RealPodControl) CreatePodsOnNode(nodeName, namespace string, template *v1.PodTemplateSpec, object runtime.Object, controllerRef *metav1.OwnerReference) error
func (RealPodControl) CreatePodsWithControllerRef ¶
func (r RealPodControl) CreatePodsWithControllerRef(namespace string, template *v1.PodTemplateSpec, controllerObject runtime.Object, controllerRef *metav1.OwnerReference) error
type RealServiceControl ¶
type RealServiceControl struct { KubeClient clientset.Interface Recorder record.EventRecorder }
RealServiceControl is the default implementation of ServiceControlInterface.
func (RealServiceControl) CreateServices ¶
func (RealServiceControl) CreateServicesWithControllerRef ¶
func (r RealServiceControl) CreateServicesWithControllerRef(namespace string, service *v1.Service, controllerObject runtime.Object, controllerRef *metav1.OwnerReference) error
func (RealServiceControl) PatchService ¶
func (r RealServiceControl) PatchService(namespace, name string, data []byte) error
type ServiceControlInterface ¶
type ServiceControlInterface interface { // CreateServices creates new Services according to the spec. CreateServices(namespace string, service *v1.Service, object runtime.Object) error // CreateServicesWithControllerRef creates new services according to the spec, and sets object as the service's controller. CreateServicesWithControllerRef(namespace string, service *v1.Service, object runtime.Object, controllerRef *metav1.OwnerReference) error // PatchService patches the service. PatchService(namespace, name string, data []byte) error }
ServiceControlInterface is an interface that knows how to add or delete Services created as an interface to allow testing.
type ServiceControllerRefManager ¶
type ServiceControllerRefManager struct { BaseControllerRefManager // contains filtered or unexported fields }
func NewServiceControllerRefManager ¶
func NewServiceControllerRefManager( serviceControl ServiceControlInterface, controller metav1.Object, selector labels.Selector, controllerKind schema.GroupVersionKind, canAdopt func() error, ) *ServiceControllerRefManager
NewServiceControllerRefManager returns a ServiceControllerRefManager that exposes methods to manage the controllerRef of services.
The canAdopt() function can be used to perform a potentially expensive check (such as a live GET from the API server) prior to the first adoption. It will only be called (at most once) if an adoption is actually attempted. If canAdopt() returns a non-nil error, all adoptions will fail.
NOTE: Once canAdopt() is called, it will not be called again by the same
ServiceControllerRefManager instance. Create a new instance if it makes sense to check canAdopt() again (e.g. in a different sync pass).
func (*ServiceControllerRefManager) AdoptService ¶
func (m *ServiceControllerRefManager) AdoptService(service *v1.Service) error
AdoptService sends a patch to take control of the service. It returns the error if the patching fails.
func (*ServiceControllerRefManager) ClaimServices ¶
func (m *ServiceControllerRefManager) ClaimServices(services []*v1.Service, filters ...func(*v1.Service) bool) ([]*v1.Service, error)
ClaimServices tries to take ownership of a list of Services.
It will reconcile the following:
- Adopt orphans if the selector matches.
- Release owned objects if the selector no longer matches.
Optional: If one or more filters are specified, a Service will only be claimed if all filters return true.
A non-nil error is returned if some form of reconciliation was attempted and failed. Usually, controllers should try again later in case reconciliation is still needed.
If the error is nil, either the reconciliation succeeded, or no reconciliation was necessary. The list of Services that you now own is returned.
func (*ServiceControllerRefManager) ReleaseService ¶
func (m *ServiceControllerRefManager) ReleaseService(service *v1.Service) error
ReleaseService sends a patch to free the service from the control of the controller. It returns the error if the patching fails. 404 and 422 errors are ignored.