controller

package
v0.0.0-...-3773f5b Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 1, 2021 License: Apache-2.0 Imports: 35 Imported by: 1

Documentation

Overview

Package controller provides a Kubernetes controller for a Caffe2Job resource.

Package controller provides a Kubernetes controller for a Caffe2 job resource.

Package controller provides a Kubernetes controller for a Caffe2Job resource.

Package controller provides a Kubernetes controller for a Caffe2 job resource.

Package controller provides a Kubernetes controller for a Caffe2Job resource.

Index

Constants

View Source
const (
	// FailedCreatePodReason is added in an event and in a replica set condition
	// when a pod for a replica set is failed to be created.
	FailedCreatePodReason = "FailedCreate"
	// SuccessfulCreatePodReason is added in an event when a pod for a replica set
	// is successfully created.
	SuccessfulCreatePodReason = "SuccessfulCreate"
	// FailedDeletePodReason is added in an event and in a replica set condition
	// when a pod for a replica set is failed to be deleted.
	FailedDeletePodReason = "FailedDelete"
	// SuccessfulDeletePodReason is added in an event when a pod for a replica set
	// is successfully deleted.
	SuccessfulDeletePodReason = "SuccessfulDelete"

	FailedCreateServiceReason     = "FailedCreateService"
	SuccessfulCreateServiceReason = "SuccessfulCreateService"
)

Reasons for pod events

Variables

View Source
var (
	ErrVersionOutdated = errors.New("requested version is outdated in apiserver")

	// DefaultJobBackOff is the max backoff period, exported for the e2e test
	DefaultJobBackOff = 10 * time.Second
	// MaxJobBackOff is the max backoff period, exported for the e2e test
	MaxJobBackOff = 360 * time.Second
)

Functions

func GetPodFromTemplate

func GetPodFromTemplate(template *v1.PodTemplateSpec, parentObject runtime.Object, controllerRef *metav1.OwnerReference) (*v1.Pod, error)

func RecheckDeletionTimestamp

func RecheckDeletionTimestamp(getObject func() (metav1.Object, error)) func() error

RecheckDeletionTimestamp returns a CanAdopt() function to recheck deletion.

The CanAdopt() function calls getObject() to fetch the latest value, and denies adoption attempts if that object has a non-nil DeletionTimestamp.

Types

type BaseControllerRefManager

type BaseControllerRefManager struct {
	Controller metav1.Object
	Selector   labels.Selector

	CanAdoptFunc func() error
	// contains filtered or unexported fields
}

func (*BaseControllerRefManager) CanAdopt

func (m *BaseControllerRefManager) CanAdopt() error

func (*BaseControllerRefManager) ClaimObject

func (m *BaseControllerRefManager) ClaimObject(obj metav1.Object, match func(metav1.Object) bool, adopt, release func(metav1.Object) error) (bool, error)

ClaimObject tries to take ownership of an object for this controller.

It will reconcile the following:

  • Adopt orphans if the match function returns true.
  • Release owned objects if the match function returns false.

A non-nil error is returned if some form of reconciliation was attempted and failed. Usually, controllers should try again later in case reconciliation is still needed.

If the error is nil, either the reconciliation succeeded, or no reconciliation was necessary. The returned boolean indicates whether you now own the object.

No reconciliation will be attempted if the controller is being deleted.

type Caffe2Config

type Caffe2Config struct {
	// Cluster represents a TensorFlow ClusterSpec.
	// See: https://www.tensorflow.org/api_docs/python/tf/train/ClusterSpec
	Cluster ClusterSpec `json:"cluster"`
	Task    TaskSpec    `json:"task"`
}

Caffe2Config is a struct representing the distributed TensorFlow config. This struct is turned into an environment variable CAFFE2_CONFIG which is used by TensorFlow processes to configure themselves. https://cloud.google.com/ml-engine/docs/trainer-considerations#use_tf_config

type ClusterSpec

type ClusterSpec map[string][]string

ClusterSpec represents a cluster Caffe2 specification. https://www.tensorflow.org/deploy/distributed#create_a_tftrainclusterspec_to_describe_the_cluster It is a map from job names to network addresses.

type Controller

type Controller struct {
	// contains filtered or unexported fields
}

func New

func New(kubeClient kubernetes.Interface, caffe2JobClient jobclient.Interface) (*Controller, error)

func (*Controller) Run

func (c *Controller) Run(threadiness int, stopCh <-chan struct{}) error

Run will set up the event handlers for types we are interested in, as well as syncing informer caches and starting workers. It will block until stopCh is closed, at which point it will shutdown the workqueue and wait for workers to finish processing their current work items.

type ControllerConfiguration

type ControllerConfiguration struct {
	ReconcilerSyncLoopPeriod metav1.Duration
}
var DefaultCaffe2JobControllerConfiguration ControllerConfiguration = ControllerConfiguration{
	ReconcilerSyncLoopPeriod: metav1.Duration{Duration: 15 * time.Second},
}

DefaultCaffe2JobControllerConfiguration is the suggested caffe2-operator configuration for production.

type FakeServiceControl

type FakeServiceControl struct {
	sync.Mutex
	Templates       []v1.Service
	ControllerRefs  []metav1.OwnerReference
	DeletePodName   []string
	Patches         [][]byte
	Err             error
	CreateLimit     int
	CreateCallCount int
}

func (*FakeServiceControl) CreateServices

func (f *FakeServiceControl) CreateServices(namespace string, service *v1.Service, object runtime.Object) error

func (*FakeServiceControl) CreateServicesWithControllerRef

func (f *FakeServiceControl) CreateServicesWithControllerRef(namespace string, service *v1.Service, object runtime.Object, controllerRef *metav1.OwnerReference) error

func (*FakeServiceControl) PatchService

func (f *FakeServiceControl) PatchService(namespace, name string, data []byte) error

type PodControlInterface

type PodControlInterface interface {
	// CreatePods creates new pods according to the spec.
	CreatePods(namespace string, template *v1.PodTemplateSpec, object runtime.Object) error
	// CreatePodsOnNode creates a new pod according to the spec on the specified node,
	// and sets the ControllerRef.
	CreatePodsOnNode(nodeName, namespace string, template *v1.PodTemplateSpec, object runtime.Object, controllerRef *metav1.OwnerReference) error
	// CreatePodsWithControllerRef creates new pods according to the spec, and sets object as the pod's controller.
	CreatePodsWithControllerRef(namespace string, template *v1.PodTemplateSpec, object runtime.Object, controllerRef *metav1.OwnerReference) error
	// DeletePod deletes the pod identified by podID.
	DeletePod(namespace string, podID string, object runtime.Object) error
	// PatchPod patches the pod.
	PatchPod(namespace, name string, data []byte) error
}

PodControlInterface is an interface that knows how to add or delete pods created as an interface to allow testing.

type PodControllerRefManager

type PodControllerRefManager struct {
	BaseControllerRefManager
	// contains filtered or unexported fields
}

func NewPodControllerRefManager

func NewPodControllerRefManager(
	podControl PodControlInterface,
	controller metav1.Object,
	selector labels.Selector,
	controllerKind schema.GroupVersionKind,
	canAdopt func() error,
) *PodControllerRefManager

NewPodControllerRefManager returns a PodControllerRefManager that exposes methods to manage the controllerRef of pods.

The CanAdopt() function can be used to perform a potentially expensive check (such as a live GET from the API server) prior to the first adoption. It will only be called (at most once) if an adoption is actually attempted. If CanAdopt() returns a non-nil error, all adoptions will fail.

NOTE: Once CanAdopt() is called, it will not be called again by the same

PodControllerRefManager instance. Create a new instance if it makes
sense to check CanAdopt() again (e.g. in a different sync pass).

func (*PodControllerRefManager) AdoptPod

func (m *PodControllerRefManager) AdoptPod(pod *v1.Pod) error

AdoptPod sends a patch to take control of the pod. It returns the error if the patching fails.

func (*PodControllerRefManager) ClaimPods

func (m *PodControllerRefManager) ClaimPods(pods []*v1.Pod, filters ...func(*v1.Pod) bool) ([]*v1.Pod, error)

ClaimPods tries to take ownership of a list of Pods.

It will reconcile the following:

  • Adopt orphans if the selector matches.
  • Release owned objects if the selector no longer matches.

Optional: If one or more filters are specified, a Pod will only be claimed if all filters return true.

A non-nil error is returned if some form of reconciliation was attempted and failed. Usually, controllers should try again later in case reconciliation is still needed.

If the error is nil, either the reconciliation succeeded, or no reconciliation was necessary. The list of Pods that you now own is returned.

func (*PodControllerRefManager) ReleasePod

func (m *PodControllerRefManager) ReleasePod(pod *v1.Pod) error

ReleasePod sends a patch to free the pod from the control of the controller. It returns the error if the patching fails. 404 and 422 errors are ignored.

type RealPodControl

type RealPodControl struct {
	KubeClient clientset.Interface
	Recorder   record.EventRecorder
}

RealPodControl is the default implementation of PodControlInterface.

func (RealPodControl) CreatePods

func (r RealPodControl) CreatePods(namespace string, template *v1.PodTemplateSpec, object runtime.Object) error

func (RealPodControl) CreatePodsOnNode

func (r RealPodControl) CreatePodsOnNode(nodeName, namespace string, template *v1.PodTemplateSpec, object runtime.Object, controllerRef *metav1.OwnerReference) error

func (RealPodControl) CreatePodsWithControllerRef

func (r RealPodControl) CreatePodsWithControllerRef(namespace string, template *v1.PodTemplateSpec, controllerObject runtime.Object, controllerRef *metav1.OwnerReference) error

func (RealPodControl) DeletePod

func (r RealPodControl) DeletePod(namespace string, podID string, object runtime.Object) error

func (RealPodControl) PatchPod

func (r RealPodControl) PatchPod(namespace, name string, data []byte) error

type RealServiceControl

type RealServiceControl struct {
	KubeClient clientset.Interface
	Recorder   record.EventRecorder
}

RealServiceControl is the default implementation of ServiceControlInterface.

func (RealServiceControl) CreateServices

func (r RealServiceControl) CreateServices(namespace string, service *v1.Service, object runtime.Object) error

func (RealServiceControl) CreateServicesWithControllerRef

func (r RealServiceControl) CreateServicesWithControllerRef(namespace string, service *v1.Service, controllerObject runtime.Object, controllerRef *metav1.OwnerReference) error

func (RealServiceControl) PatchService

func (r RealServiceControl) PatchService(namespace, name string, data []byte) error

type ServiceControlInterface

type ServiceControlInterface interface {
	// CreateServices creates new Services according to the spec.
	CreateServices(namespace string, service *v1.Service, object runtime.Object) error
	// CreateServicesWithControllerRef creates new services according to the spec, and sets object as the service's controller.
	CreateServicesWithControllerRef(namespace string, service *v1.Service, object runtime.Object, controllerRef *metav1.OwnerReference) error
	// PatchService patches the service.
	PatchService(namespace, name string, data []byte) error
}

ServiceControlInterface is an interface that knows how to add or delete Services created as an interface to allow testing.

type ServiceControllerRefManager

type ServiceControllerRefManager struct {
	BaseControllerRefManager
	// contains filtered or unexported fields
}

func NewServiceControllerRefManager

func NewServiceControllerRefManager(
	serviceControl ServiceControlInterface,
	controller metav1.Object,
	selector labels.Selector,
	controllerKind schema.GroupVersionKind,
	canAdopt func() error,
) *ServiceControllerRefManager

NewServiceControllerRefManager returns a ServiceControllerRefManager that exposes methods to manage the controllerRef of services.

The canAdopt() function can be used to perform a potentially expensive check (such as a live GET from the API server) prior to the first adoption. It will only be called (at most once) if an adoption is actually attempted. If canAdopt() returns a non-nil error, all adoptions will fail.

NOTE: Once canAdopt() is called, it will not be called again by the same

ServiceControllerRefManager instance. Create a new instance if it makes
sense to check canAdopt() again (e.g. in a different sync pass).

func (*ServiceControllerRefManager) AdoptService

func (m *ServiceControllerRefManager) AdoptService(service *v1.Service) error

AdoptService sends a patch to take control of the service. It returns the error if the patching fails.

func (*ServiceControllerRefManager) ClaimServices

func (m *ServiceControllerRefManager) ClaimServices(services []*v1.Service, filters ...func(*v1.Service) bool) ([]*v1.Service, error)

ClaimServices tries to take ownership of a list of Services.

It will reconcile the following:

  • Adopt orphans if the selector matches.
  • Release owned objects if the selector no longer matches.

Optional: If one or more filters are specified, a Service will only be claimed if all filters return true.

A non-nil error is returned if some form of reconciliation was attempted and failed. Usually, controllers should try again later in case reconciliation is still needed.

If the error is nil, either the reconciliation succeeded, or no reconciliation was necessary. The list of Services that you now own is returned.

func (*ServiceControllerRefManager) ReleaseService

func (m *ServiceControllerRefManager) ReleaseService(service *v1.Service) error

ReleaseService sends a patch to free the service from the control of the controller. It returns the error if the patching fails. 404 and 422 errors are ignored.

type TaskSpec

type TaskSpec struct {
	Type  string `json:"type"`
	Index int    `json:"index"`
}

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL