esx

package
v1.9.0
Warning

This package is not in the latest version of its module.

Published: Aug 7, 2024 License: Apache-2.0 Imports: 23 Imported by: 0

README

ESX Controller

The ESX controller does two things for Kubernetes nodes running on virtual machines managed by a VMware vCenter. Firstly, it regularly checks whether a node's underlying ESX host is in, or is entering, maintenance mode. If so, the label cloud.sap/esx-in-maintenance is set to true.

Secondly, before an ESX host can complete entering maintenance mode, all virtual machines on it need to be turned off. If the cloud.sap/esx-reboot-ok label is set to true on every node (within the cluster) that belongs to an ESX host entering maintenance mode, the controller will cordon, drain and shut down these nodes (and keep them shut down). When the ESX host leaves maintenance mode, the controller turns the nodes on again and uncordons them. This behavior only occurs if the cloud.sap/esx-reboot-initiated annotation is set to true, so it does not interfere with other maintenance activities. The controller manages the cloud.sap/esx-reboot-initiated annotation itself, based on the cloud.sap/esx-in-maintenance and cloud.sap/esx-reboot-ok labels.
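The label logic above can be sketched as two predicates. This is a minimal illustration, not the package's ShouldShutdown/ShouldStart implementation: the simplified node type is hypothetical, treating the value alarm like true is an assumption, and so is requiring the label to read exactly false before starting nodes again.

```go
package main

import "fmt"

// Label and annotation keys as documented in the README.
const (
	labelInMaintenance       = "cloud.sap/esx-in-maintenance"
	labelRebootOK            = "cloud.sap/esx-reboot-ok"
	annotationRebootInitiated = "cloud.sap/esx-reboot-initiated"
)

// node is a hypothetical, simplified stand-in for a Kubernetes node.
type node struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

// esxInMaintenance treats "true" and (by assumption) "alarm" as maintenance.
func esxInMaintenance(n node) bool {
	v := n.Labels[labelInMaintenance]
	return v == "true" || v == "alarm"
}

// shouldShutdownHost is true only if every node on the ESX host sees the host
// in maintenance and carries cloud.sap/esx-reboot-ok=true.
func shouldShutdownHost(nodes []node) bool {
	if len(nodes) == 0 {
		return false
	}
	for _, n := range nodes {
		if !esxInMaintenance(n) || n.Labels[labelRebootOK] != "true" {
			return false
		}
	}
	return true
}

// shouldStartNode is true if the controller initiated the reboot and the
// ESX host has left maintenance mode.
func shouldStartNode(n node) bool {
	return n.Annotations[annotationRebootInitiated] == "true" &&
		n.Labels[labelInMaintenance] == "false"
}

func main() {
	a := node{Name: "node-a", Labels: map[string]string{labelInMaintenance: "true", labelRebootOK: "true"}}
	b := node{Name: "node-b", Labels: map[string]string{labelInMaintenance: "true"}}
	fmt.Println(shouldShutdownHost([]node{a, b})) // false: node-b lacks reboot-ok
	b.Labels[labelRebootOK] = "true"
	fmt.Println(shouldShutdownHost([]node{a, b})) // true
}
```

Requiring the label on every node of the host mirrors the README: a single VM left running would block the ESX host from completing maintenance mode anyway.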

Using the cloud.sap/esx-in-maintenance label together with the cloud.sap/esx-reboot-ok label enables ESX maintenance to be managed flexibly with the "main" maintenance controller.

Certain alarms can be specified in the configuration file. If an ESX host has a triggered alarm whose name matches one of the names in the configuration file, the cloud.sap/esx-in-maintenance label is set to alarm. When draining nodes whose ESX maintenance state is alarm, pods are deleted with a grace period of 0 (effectively force-deleting them).
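A sketch of how the configured alarm names might be matched and turned into a label value and a drain grace period. The set conversion mirrors what Config.AlarmsAsSet suggests, but its actual body is not shown here; letting a matching alarm take precedence over plain maintenance mode is an assumption, and all function names are hypothetical.

```go
package main

import "fmt"

// Label values as documented for the Maintenance type.
const (
	alarmMaintenance = "alarm"
	inMaintenance    = "true"
	noMaintenance    = "false"
)

// alarmsAsSet turns the configured alarm names into a set for O(1) lookups.
func alarmsAsSet(alarms []string) map[string]struct{} {
	set := make(map[string]struct{}, len(alarms))
	for _, a := range alarms {
		set[a] = struct{}{}
	}
	return set
}

// maintenanceState derives the label value; a matching alarm wins (assumption).
func maintenanceState(inMaintenanceMode bool, triggered []string, configured map[string]struct{}) string {
	for _, name := range triggered {
		if _, ok := configured[name]; ok {
			return alarmMaintenance
		}
	}
	if inMaintenanceMode {
		return inMaintenance
	}
	return noMaintenance
}

// gracePeriodSeconds returns 0 (force delete) for "alarm" and nil
// (use the pod's own grace period) otherwise.
func gracePeriodSeconds(state string) *int64 {
	if state == alarmMaintenance {
		zero := int64(0)
		return &zero
	}
	return nil
}

func main() {
	configured := alarmsAsSet([]string{"Host memory usage"})
	state := maintenanceState(false, []string{"Host memory usage"}, configured)
	fmt.Println(state, *gracePeriodSeconds(state)) // alarm 0
}
```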

It is assumed that node names equal the names of the hosting virtual machines. The availability zone within a cloud region is assumed to be the last character of the failure-domain.beta.kubernetes.io/zone label. The ESX host of each relevant node is tracked using the kubernetes.cloud.sap/host label.
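The naming assumptions above can be illustrated by grouping nodes per ESX host, roughly what ParseHostList is documented to do. This sketch works on plain label maps instead of v1.Node; skipping nodes without the host label (treating them as not relevant) is an assumption, as are the helper names.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	zoneLabel = "failure-domain.beta.kubernetes.io/zone"
	hostLabel = "kubernetes.cloud.sap/host"
)

type hostInfo struct {
	Name             string // ESX host from kubernetes.cloud.sap/host
	AvailabilityZone string // last character of the zone label
}

type host struct {
	hostInfo
	NodeNames []string // node names, assumed to equal the VM names
}

// hostInfoFor extracts ESX host and availability zone from a node's labels.
func hostInfoFor(labels map[string]string) (hostInfo, error) {
	esx := labels[hostLabel]
	if esx == "" {
		return hostInfo{}, errors.New("missing label " + hostLabel)
	}
	zone := labels[zoneLabel]
	if zone == "" {
		return hostInfo{}, errors.New("missing label " + zoneLabel)
	}
	return hostInfo{Name: esx, AvailabilityZone: zone[len(zone)-1:]}, nil
}

// groupByHost assigns nodes to ESX hosts, keeping first-seen host order.
func groupByHost(names []string, labels map[string]map[string]string) ([]host, error) {
	index := map[hostInfo]int{}
	var hosts []host
	for _, name := range names {
		if labels[name][hostLabel] == "" {
			continue // not on a tracked ESX host (assumption: skip, not fail)
		}
		info, err := hostInfoFor(labels[name])
		if err != nil {
			return nil, fmt.Errorf("node %s: %w", name, err)
		}
		i, seen := index[info]
		if !seen {
			i = len(hosts)
			index[info] = i
			hosts = append(hosts, host{hostInfo: info})
		}
		hosts[i].NodeNames = append(hosts[i].NodeNames, name)
	}
	return hosts, nil
}

func main() {
	names := []string{"vm-1", "vm-2", "vm-3"}
	labels := map[string]map[string]string{
		"vm-1": {hostLabel: "esx-01", zoneLabel: "region-1a"},
		"vm-2": {hostLabel: "esx-02", zoneLabel: "region-1b"},
		"vm-3": {hostLabel: "esx-01", zoneLabel: "region-1a"},
	}
	hosts, err := groupByHost(names, labels)
	if err != nil {
		panic(err)
	}
	for _, h := range hosts {
		fmt.Println(h.Name, h.AvailabilityZone, h.NodeNames)
	}
	// esx-01 a [vm-1 vm-3]
	// esx-02 b [vm-2]
}
```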

The nodes are also labeled with cloud.sap/esx-version, which contains the version of the underlying ESX host.

Installation

The ESX controller is bundled within the maintenance controller binary. It needs to be enabled using the --enable-esx-maintenance flag.

Configuration

The configuration has to be placed in ./config/esx.yaml.

intervals:
  # Defines how frequently the controller checks whether ESX hosts are entering maintenance mode
  check: # changing the check interval requires a pod restart to come into effect
    jitter: 0.1 # required
    period: 5m # required
  # Defines how long and how frequently to check for pod deletions while draining
  podDeletion:
    period: 5s # required
    timeout: 2m # required
  # Defines how long to wait for a VM to shut down gracefully.
  # If a VM does not terminate within timeout it will be "unplugged the hard way"
  vmShutdown:
    period: 5s # required
    timeout: 2m # required
  # Defines how long and how frequently to try to evict pods
  podEviction:
    period: 20s # required
    timeout: 5m # required
    force: false # If true, fall back to plain DELETE calls when evictions do not succeed
alarms:
  - "Host memory usage" # according to https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-82933270-1D72-4CF3-A1AF-E5A1343F62DE.html
vCenters:
  # Defines the URLs of the vCenters in the different availability zones.
  # $AZ is replaced with the single character availability zone.
  templateUrl: https://some-vcenter-url-$AZ # required
  # If true, the vCenter certificates are not validated
  insecure: false # optional, defaults to false
  # Credentials for the vCenter per availability zone
  credentials: # required
    a:
      username: user # required
      password: pass # required

Documentation


Constants

const AvailabilityZoneReplacer string = "$AZ"

Specifies the placeholder in a vCenter URL template that is replaced by the availability zone.

Variables

This section is empty.

Functions

func CheckAlarms added in v0.19.3

func CheckAlarms(ctx context.Context, params CheckParameters) ([]string, error)

Returns the names of the active (triggered) alarms of the host, as described here: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-82933270-1D72-4CF3-A1AF-E5A1343F62DE.html

func EnsureVMOff added in v0.21.0

func EnsureVMOff(ctx context.Context, params ShutdownParams) error

func FetchVersion added in v0.17.0

func FetchVersion(ctx context.Context, params CheckParameters) (string, error)

func RetrieveVM added in v0.21.0

func RetrieveVM(ctx context.Context, client *govmomi.Client, name string) (mo.VirtualMachine, error)

func ShouldShutdown

func ShouldShutdown(esx *Host) bool

Checks if all nodes on an ESX host need maintenance and are allowed to be shut down.

func ShouldShutdownNode added in v0.19.5

func ShouldShutdownNode(node *v1.Node) bool

func ShouldStart

func ShouldStart(node *v1.Node) bool

Checks if the controller initiated the maintenance and the underlying ESX is not in maintenance.

func ShutdownAllowed added in v0.19.4

func ShutdownAllowed(state Maintenance) bool

Types

type CheckParameters

type CheckParameters struct {
	VCenters *VCenters
	Host     HostInfo
	Log      logr.Logger
}

type Config

type Config struct {
	Intervals struct {
		Check struct {
			Jitter float64       `config:"jitter" validate:"min=0.001"`
			Period time.Duration `config:"period" validate:"required"`
		} `config:"check" validate:"required"`
		PodDeletion struct {
			Period  time.Duration
			Timeout time.Duration
		} `config:"podDeletion" validate:"required"`
		PodEviction struct {
			Period  time.Duration `config:"period" validate:"required"`
			Timeout time.Duration `config:"timeout" validate:"required"`
			Force   bool          `config:"force"`
		} `config:"podEviction" validate:"required"`
		VMShutdown struct {
			Period  time.Duration `config:"period" validate:"required"`
			Timeout time.Duration `config:"timeout" validate:"required"`
		} `config:"vmShutdown" validate:"required"`
	} `config:"intervals" validate:"required"`
	Alarms   []string
	VCenters VCenters `config:"vCenters" validate:"required"`
}

func (*Config) AlarmsAsSet added in v0.19.3

func (c *Config) AlarmsAsSet() map[string]struct{}

type Credential

type Credential struct {
	Username string `config:"username" validate:"required"`
	Password string `config:"password"`
}

type Host

type Host struct {
	HostInfo
	Nodes []v1.Node
}

func ParseHostList

func ParseHostList(nodes []v1.Node) ([]Host, error)

Assigns nodes to their underlying ESX hosts.

type HostInfo

type HostInfo struct {
	Name             string
	AvailabilityZone string
}

type Maintenance

type Maintenance string
const AlarmMaintenance Maintenance = "alarm"
const InMaintenance Maintenance = "true"
const NoMaintenance Maintenance = "false"
const UnknownMaintenance Maintenance = "unknown"

func CheckForMaintenance

func CheckForMaintenance(ctx context.Context, params CheckParameters) (Maintenance, error)

Performs a check for the specified host if allowed by timestamps.

type PollPowerOffParams added in v0.21.0

type PollPowerOffParams struct {
	// contains filtered or unexported fields
}

type Runnable

type Runnable struct {
	client.Client
	Log  logr.Logger
	Conf *rest.Config
}

func (*Runnable) CheckMaintenance

func (r *Runnable) CheckMaintenance(ctx context.Context, conf *Config, esx *Host) error

Checks the maintenance mode of the given ESX host and attaches the corresponding Maintenance label.

func (*Runnable) FetchVersion added in v0.17.0

func (r *Runnable) FetchVersion(ctx context.Context, vCenters *VCenters, esx *Host) error

func (*Runnable) NeedLeaderElection

func (r *Runnable) NeedLeaderElection() bool

func (*Runnable) Reconcile

func (r *Runnable) Reconcile(ctx context.Context)

func (*Runnable) ShutdownNodes

func (r *Runnable) ShutdownNodes(ctx context.Context, conf *Config, esx *Host) error

Shuts down nodes on the given ESX host if the host is in maintenance and the nodes are labeled accordingly.

func (*Runnable) Start

func (r *Runnable) Start(ctx context.Context) error

func (*Runnable) StartNodes

func (r *Runnable) StartNodes(ctx context.Context, vCenters *VCenters, esx *Host)

Starts the nodes on the given ESX host if this controller shut them down and the host is no longer in maintenance.

type ShutdownParams added in v0.21.0

type ShutdownParams struct {
	VCenters *VCenters
	Info     HostInfo
	NodeName string
	Period   time.Duration
	Timeout  time.Duration
	Log      logr.Logger
}

type VCenters

type VCenters struct {
	// URL to regional vCenters with the availability zone replaced by AvailabilityZoneReplacer.
	Template string `config:"templateUrl" validate:"required"`
	// If true the vCenters certificates are not validated.
	Insecure bool `config:"insecure"`
	// Pair of credentials per availability zone.
	Credentials map[string]Credential `config:"credentials" validate:"required"`
	// contains filtered or unexported fields
}

VCenters contains connection information to regional vCenters.

func (*VCenters) ClearCache

func (vc *VCenters) ClearCache(ctx context.Context, log logr.Logger)

func (*VCenters) Client

func (vc *VCenters) Client(ctx context.Context, availabilityZone string) (*govmomi.Client, error)

Returns a ready-to-use vCenter client for the given availability zone.

func (*VCenters) URL

func (vc *VCenters) URL(availabilityZone string) (*url.URL, error)

Returns the URL of the vCenter in the given availability zone.
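URL construction follows from templateUrl and AvailabilityZoneReplacer: "$AZ" is replaced with the single-character zone and the result is parsed. The following is a minimal sketch; rejecting zones that are not exactly one character is an assumption based on the configuration comment, and vCenterURL is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Mirrors the documented AvailabilityZoneReplacer constant.
const availabilityZoneReplacer = "$AZ"

// vCenterURL substitutes the availability zone into the template and parses it.
func vCenterURL(template, az string) (*url.URL, error) {
	if len(az) != 1 {
		return nil, fmt.Errorf("expected a single-character availability zone, got %q", az)
	}
	return url.Parse(strings.ReplaceAll(template, availabilityZoneReplacer, az))
}

func main() {
	u, err := vCenterURL("https://some-vcenter-url-$AZ", "a")
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Host) // some-vcenter-url-a
}
```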
