csimigration

package
v1.44.3
Published: Apr 12, 2022 License: Apache-2.0, BSD-2-Clause, MIT, + 1 more Imports: 26 Imported by: 0

README

Generic CSI Migration Controller

This package contains a generic CSI migration controller that helps provider extensions that are still using the legacy in-tree volume provisioner plugins to migrate to CSI.

How does CSI (migration) work in general?

Classically, the kube-controller-manager was responsible for provisioning/deprovisioning and attaching/detaching persistent volumes. The kubelet running on the worker nodes was mounting/unmounting the disks. As there are so many vendor-specific implementations that were all maintained in the main Kubernetes source tree, the community decided to move them out into dedicated drivers, and let the core Kubernetes components interact with them using a standardized interface (CSI).

In the Gardener context, we usually have two parts of the CSI components: One part contains the controllers that are to be deployed in the seed cluster (as part of shoot control plane). They comprise the vendor-specific driver (e.g., AWS EBS driver), plus a generic provisioner, attacher, snapshotter, and resizer controller. The other part is deployed as DaemonSet on each shoot worker node and is responsible for registering the CSI driver to the kubelet as well as mounting/unmounting the disks to the machines.

In order to tell the Kubernetes components that they are no longer responsible for volumes but CSI is used, the community has introduced feature gates. There is a general CSIMigration feature gate, plus two vendor-specific feature gates, e.g. CSIMigrationAWS and CSIMigrationAWSComplete:

  • If only the first two feature gates are enabled, then CSI migration is in progress. In this phase the kube-controller-manager and the kubelets are still partly responsible. Concretely, the kube-controller-manager will still provision/deprovision volumes using legacy storage class provisioners (e.g., kubernetes.io/aws-ebs), and it will attach/detach volumes for nodes that have not yet been migrated to CSI.
  • If all three feature gates are enabled then CSI migration is completed and no in-tree plugin will be used anymore. Both kube-controller-manager and kubelet pass on responsibility to the CSI drivers and controllers.
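
The two phases above can be sketched as a small classification over the enabled feature gates. This is only an illustration: migrationPhase and the Phase type are hypothetical names that do not exist in this package, and the provider suffix (e.g. "AWS") is passed in as a plain string.

```go
package main

import "fmt"

// Phase describes how far CSI migration has progressed for one provider.
// The type and its values are hypothetical and only used in this sketch.
type Phase string

const (
	PhaseInTree     Phase = "in-tree"     // in-tree plugins are fully responsible
	PhaseInProgress Phase = "in-progress" // CSIMigration + CSIMigration<Provider>
	PhaseComplete   Phase = "complete"    // all three feature gates are enabled
)

// migrationPhase classifies a set of feature gates for the given provider
// suffix, e.g. "AWS" for CSIMigrationAWS and CSIMigrationAWSComplete.
func migrationPhase(gates map[string]bool, provider string) Phase {
	general := gates["CSIMigration"]
	vendor := gates["CSIMigration"+provider]
	complete := gates["CSIMigration"+provider+"Complete"]

	switch {
	case general && vendor && complete:
		return PhaseComplete
	case general && vendor:
		return PhaseInProgress
	default:
		return PhaseInTree
	}
}

func main() {
	gates := map[string]bool{"CSIMigration": true, "CSIMigrationAWS": true}
	fmt.Println(migrationPhase(gates, "AWS"))
}
```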

For newly created clusters all feature gates can be enabled directly from the beginning and no migration is needed at all.

For existing clusters there are a few steps that must be performed. Usually, the kube-controller-manager ran with the --cloud-config and --external-cloud-volume-plugin flags when using the in-tree volume provisioners. Also, the kube-apiserver enables the PersistentVolumeLabel admission plugin, and the kubelet runs with --cloud-provider=aws. As part of the CSI migration, the CSIMigration<Provider>Complete feature gate may only be enabled if all nodes have been drained, and all kubelets have been updated with the feature gate flags.

As provider extensions usually perform a rolling update of worker machines when the Kubernetes minor version is upgraded, it is recommended to start the CSI migration during such a Kubernetes version upgrade.

⚠️ As of gardener/gardener#4971, the Kubernetes version used for specific worker pools can differ from the control plane's Kubernetes version. Hence, not all worker nodes might be rolled out when the Kubernetes version is updated. However, it is required to roll out all nodes when performing CSI migration.

Consequently, it is highly recommended that extensions validate Shoot resources to ensure that worker pool Kubernetes versions may only differ from the control plane Kubernetes version once the CSI migration version is reached.

For example, when CSI migration is performed with 1.18 (i.e., during the Shoot upgrade from 1.17 to 1.18), then it shall not be possible to specify .spec.provider.workers[].kubernetes.version if .spec.kubernetes.version is less than 1.18.
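
A minimal sketch of such a validation, using only the standard library, could look as follows. The function names and the representation of worker pool versions as a slice of strings (empty string meaning "not set") are assumptions made for this example; a real extension would validate the Shoot object in its admission webhook.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor parses the major and minor components of a version such as
// "1.18", "1.18.3", or "v1.18.3".
func majorMinor(version string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	return major, minor, nil
}

// validateWorkerVersions (hypothetical helper) rejects worker pool version
// overrides while the control plane is below the CSI migration version.
// An empty string in workerOverrides means the pool sets no version.
func validateWorkerVersions(controlPlane, csiMigration string, workerOverrides []string) error {
	cpMajor, cpMinor, err := majorMinor(controlPlane)
	if err != nil {
		return err
	}
	mMajor, mMinor, err := majorMinor(csiMigration)
	if err != nil {
		return err
	}
	if cpMajor > mMajor || (cpMajor == mMajor && cpMinor >= mMinor) {
		return nil // CSI migration version reached, overrides are allowed
	}
	for i, v := range workerOverrides {
		if v != "" {
			return fmt.Errorf("worker pool %d: kubernetes.version must not be set while control plane version %s is below CSI migration version %s", i, controlPlane, csiMigration)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateWorkerVersions("1.17.4", "1.18", []string{"1.16.9"}))
	fmt.Println(validateWorkerVersions("1.18.2", "1.18", []string{"1.17.4"}))
}
```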

Consequently, the migration can now happen by executing the following steps:

  1. The CSIMigration and CSIMigrationAWS feature gates must be enabled on all master components, i.e., kube-apiserver, kube-controller-manager, and kube-scheduler.
  2. Until the CSI migration has completed, the master components keep running with the cloud provider flags, which allows them to keep using the in-tree volume provisioners.
  3. A rolling update of the worker machines is triggered; new kubelets come up with all three CSI migration feature gates enabled and the --cloud-provider=external flag.
  4. Once only new worker machines exist, the master components can be updated with the CSIMigration<Provider>Complete feature gate, and all cloud-specific flags can be removed.
  5. As StorageClasses are immutable, the existing ones using a legacy in-tree provisioner can be deleted so that they can be recreated with the same name but using the new CSI provisioner. The CSI drivers ensure that they stay compatible with the legacy provisioner names.

How can this controller be used for CSI migration?

As motivated in the paragraphs above, the easiest way for provider extensions to start the CSI migration is together with a Kubernetes minor version upgrade, because this triggers the necessary rolling update of the worker machines.

The problem now is that steps 4) and 5) must happen only after no old nodes exist in the cluster anymore. This could be done in the Worker controller; however, that would be somewhat ugly and mix too many concerns together (the Worker controller is already pretty large). Having a dedicated CSI migration controller allows for better separation and maintainability, and reduces complexity.

The idea is that a provider extension adds this generic CSI migration controller, similar to how it adds other generic controllers (like ControlPlane, Infrastructure, etc.). It watches Cluster resources of shoots that have the respective provider extension type and at least the minimum Kubernetes version that was declared for starting CSI migration. When it detects such a Cluster, it starts its CSI migration flow.

The (soft) contract between the CSI migration controller and the control plane webhooks of provider extensions is that the CSI migration controller annotates the Cluster resource with csi-migration.extensions.gardener.cloud/needs-complete-feature-gates=true once the migration is finished. The control plane webhooks can read this annotation and, if it is present, configure the Kubernetes components accordingly (e.g., adding the CSIMigration<Provider>Complete feature gate and removing the cloud flags).

CSI Migration Flow
  1. Check if the shoot is newly created - if yes, annotate the Cluster object with csi-migration.extensions.gardener.cloud/needs-complete-feature-gates=true and exit.
  2. If the cluster is an existing one that is getting updated then
    1. If the shoot is hibernated then requeue and wait until it gets woken up.
    2. Wait until only new nodes exist in the shoot.
    3. Delete the legacy storage classes in the shoot.
    4. Add the csi-migration.extensions.gardener.cloud/needs-complete-feature-gates=true annotation to the Cluster.
    5. Send empty PATCH requests to the kube-apiserver, kube-controller-manager, and kube-scheduler Deployment resources to allow the control plane webhook to adapt the specification.
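
The flow above can be read as a decision function over the shoot's state. The following is a simplified sketch, not the controller's actual code: shootState and plan are hypothetical names, the actions are plain strings rather than API calls, and how "old" nodes are detected is left to the caller.

```go
package main

import "fmt"

// shootState is a hypothetical, simplified view of what the flow inspects.
type shootState struct {
	NewlyCreated bool
	Hibernated   bool
	OldNodes     int // nodes that have not yet been rolled to the new version
}

// plan returns the actions for one reconciliation and whether the Cluster
// should be requeued (the package's RequeueAfter constant is one minute).
func plan(s shootState) (actions []string, requeue bool) {
	switch {
	case s.NewlyCreated:
		// Step 1: nothing to migrate, mark the Cluster and exit.
		return []string{"annotate needs-complete-feature-gates=true"}, false
	case s.Hibernated:
		// Step 2.1: wait until the shoot is woken up.
		return nil, true
	case s.OldNodes > 0:
		// Step 2.2: wait until only new nodes exist.
		return nil, true
	default:
		// Steps 2.3 to 2.5.
		return []string{
			"delete legacy storage classes",
			"annotate needs-complete-feature-gates=true",
			"patch kube-apiserver, kube-controller-manager, kube-scheduler Deployments",
		}, false
	}
}

func main() {
	actions, requeue := plan(shootState{OldNodes: 2})
	fmt.Println(actions, requeue)
}
```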

Consequently, from the provider extension point of view, what needs to be done is

  1. Add the CSIMigration controller and start it.
  2. Deploy the CSI controllers to the seed and the CSI driver to the shoot as part of the ControlPlane reconciliation for shoots of the Kubernetes version that is used for CSI migration.
  3. Add the CSIMigration and CSIMigration<Provider> feature gates to the Kubernetes master components (together with the cloud flags) if the Cluster was not yet annotated with csi-migration.extensions.gardener.cloud/needs-complete-feature-gates=true.
  4. Add the CSIMigration<Provider>Complete feature gate and remove all cloud flags if the Cluster was annotated with csi-migration.extensions.gardener.cloud/needs-complete-feature-gates=true.
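
Steps 3 and 4 boil down to a decision based on the Cluster annotation. The sketch below shows that decision in isolation; controlPlaneConfig and configFor are hypothetical names, the annotations are passed as a plain map, and treating only the value "true" as "annotated" is an assumption of this example.

```go
package main

import "fmt"

// Same value as the package's AnnotationKeyNeedsComplete constant.
const annotationKeyNeedsComplete = "csi-migration.extensions.gardener.cloud/needs-complete-feature-gates"

// controlPlaneConfig is a hypothetical summary of what a control plane
// webhook would apply to the Kubernetes master components.
type controlPlaneConfig struct {
	FeatureGates   map[string]bool
	KeepCloudFlags bool
}

// configFor derives the feature gates and cloud flag handling from the
// Cluster annotations for a provider suffix such as "AWS".
func configFor(clusterAnnotations map[string]string, provider string) controlPlaneConfig {
	gates := map[string]bool{
		"CSIMigration":            true,
		"CSIMigration" + provider: true,
	}
	complete := clusterAnnotations[annotationKeyNeedsComplete] == "true"
	if complete {
		gates["CSIMigration"+provider+"Complete"] = true
	}
	// Cloud flags stay in place until the migration has been completed.
	return controlPlaneConfig{FeatureGates: gates, KeepCloudFlags: !complete}
}

func main() {
	cfg := configFor(map[string]string{annotationKeyNeedsComplete: "true"}, "AWS")
	fmt.Println(cfg.FeatureGates, cfg.KeepCloudFlags)
}
```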

Further References

Documentation

Constants

const (
	// ControllerName is the name of the controller
	ControllerName = "csimigration_controller"

	// AnnotationKeyNeedsComplete is a constant for an annotation on the Cluster resource that indicates that
	// the control plane components require the CSIMigration<Provider>Complete feature gates.
	AnnotationKeyNeedsComplete = "csi-migration.extensions.gardener.cloud/needs-complete-feature-gates"
	// AnnotationKeyControllerFinished is a constant for an annotation on the Cluster resource that indicates that
	// the CSI migration controller has nothing more to do because it completed earlier already.
	AnnotationKeyControllerFinished = "csi-migration.extensions.gardener.cloud/controller-finished"
)
const RequeueAfter = time.Minute

RequeueAfter is the duration to requeue a Cluster reconciliation if indicated by the CSI controller.

Variables

var NewClientForShoot = util.NewClientForShoot

NewClientForShoot is a function to create a new client for shoots.

Functions

func Add

func Add(mgr manager.Manager, args AddArgs) error

Add creates a new CSIMigration controller and adds it to the Manager. The controller is started when the Manager is started.

func CheckCSIConditions

func CheckCSIConditions(cluster *extensionscontroller.Cluster, csiMigrationVersion string) (useCSI bool, csiMigrationComplete bool, err error)

CheckCSIConditions takes the `Cluster` object and the Kubernetes version that shall be used for CSI migration. It returns two booleans - the first one indicates whether CSI shall be used at all (this may help the provider extension to decide whether to enable CSIMigration feature gates), and the second one indicates whether the CSI migration has been completed (this may help the provider extension to decide whether to enable the CSIMigration<Provider>Complete feature gate). If the shoot cluster version is higher than the CSI migration version then it always returns true for both variables. If it's lower than the CSI migration version then it always returns false for both variables. If it's the exact CSI migration (minor) version then it returns true for the first value (CSI migration shall be enabled), and true or false based on whether the "needs-complete-feature-gates" annotation is set on the Cluster object.
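
The decision table described above can be illustrated with a standalone sketch. It is not the package's implementation: checkCSIConditions here is a hypothetical lowercase stand-in that takes the shoot version and the Cluster annotations directly instead of a Cluster object, compares major and minor versions with the standard library only, and assumes the annotation counts as set when its value is "true".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const annotationKeyNeedsComplete = "csi-migration.extensions.gardener.cloud/needs-complete-feature-gates"

// majorMinor parses "1.18", "1.18.3", or "v1.18.3" into (1, 18).
func majorMinor(version string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor version in %q: %w", version, err)
	}
	return major, minor, nil
}

// checkCSIConditions mirrors the documented behaviour: above the migration
// version both results are true, below both are false, and at the exact
// minor version the second result depends on the annotation.
func checkCSIConditions(shootVersion, csiMigrationVersion string, annotations map[string]string) (useCSI, complete bool, err error) {
	sMajor, sMinor, err := majorMinor(shootVersion)
	if err != nil {
		return false, false, err
	}
	mMajor, mMinor, err := majorMinor(csiMigrationVersion)
	if err != nil {
		return false, false, err
	}
	cmp := sMajor - mMajor
	if cmp == 0 {
		cmp = sMinor - mMinor
	}
	switch {
	case cmp > 0:
		return true, true, nil
	case cmp < 0:
		return false, false, nil
	default:
		return true, annotations[annotationKeyNeedsComplete] == "true", nil
	}
}

func main() {
	fmt.Println(checkCSIConditions("1.18.3", "1.18", nil))
}
```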

func ClusterCSIMigrationControllerNotFinished

func ClusterCSIMigrationControllerNotFinished() predicate.Predicate

ClusterCSIMigrationControllerNotFinished is a predicate for an annotation on the cluster.

func NewReconciler

func NewReconciler(csiMigrationKubernetesVersion string, storageClassNameToLegacyProvisioner map[string]string) reconcile.Reconciler

NewReconciler creates a new reconcile.Reconciler that reconciles Cluster resources of Gardener's `extensions.gardener.cloud` API group.

Types

type AddArgs

type AddArgs struct {
	// ControllerOptions are the controller options used for creating a controller.
	// The options.Reconciler is always overridden with a reconciler created from the
	// given actuator.
	ControllerOptions controller.Options
	// Predicates are the predicates to use.
	Predicates []predicate.Predicate
	// CSIMigrationKubernetesVersion is the smallest Kubernetes version that is used for the CSI migration.
	CSIMigrationKubernetesVersion string
	// Type is the provider extension type.
	Type string
	// StorageClassNameToLegacyProvisioner is a map of storage class names to the used legacy provisioner name. As part
	// of the CSI migration they will be deleted so that new storage classes with the same name but a different CSI
	// provisioner can be created (storage classes are immutable).
	StorageClassNameToLegacyProvisioner map[string]string
}

AddArgs are arguments for adding a csimigration controller to a manager.
