cpu-pooler

module
v0.3.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 1, 2020 License: BSD-3-Clause

README

CPU Pooler for Kubernetes

Build Status

Overview

CPU-Pooler for Kubernetes is a solution for Kubernetes to manage predefined, distinct CPU pools of Kubernetes Nodes, and physically separate the CPU resources of the containers connecting to the various pools.

Two explicit types of CPU pools are supported; exclusive and shared. If a container does not explicitly ask resources from these pools, it will belong to a third one, called default.

Request from exclusive pool will allocate CPU(s) for latency-sensitive containers requiring exclusive access.

Containers content with some, limited level of sharing, but still appreciating best-effort protection from noisy neighbours can request resources from the shared pool. Shared requests are fractional CPUs with one CPU unit exactly matching to one thousandth of a CPU core. (millicpu).

The default pool is synonymous to the Kubelet managed, by default existing shared pool. Worth noting that the shared and default pool's CPU sets, and characteristics are distinct. By supporting the "original" CPU allocation method in parallel with the enhanced, 3rd party containers totally content with the default CPU management policy of Kubernetes can be instantiated on a CPU-Pooler upgraded system without any configuration changes.

Topology awareness (NUMA)

CPU-Pooler is officially topology aware starting with the 0.4.0 release!

Moreover, unlike other external managers CPU-Pooler natively integrates into Kubernetes' own Topology Manager. CPU-Pooler reports the NUMA Node ID of all the CPUs belonging to an exclusive CPU pool to the upstream Topology Manager. This means whenever a Pod requests resources from Kubernetes where topology matters -e.g. SR-IOV virtual functions, exclusive CPUs, GPUs etc.- Kubernetes will automatically assign resources with their NUMA node aligned - CPU-Pooler managed cores included!

This feature is automatic, therefore it does not require any configuration from the user. For it to work though CPU-Pooler's version must be at least 0.4.0, while Kubernetes must be at least 1.17.X.

Components of the CPU-Pooler project

The CPU-Pooler project contains 4 core components:

  • a Kubernetes standard Device Plugin seamlessly integrating the CPU pools to Kubernetes as schedulable resources
  • a Kubernetes standard Informer making sure containers belonging to different CPU pools are always physically isolated from each other
  • process starter binary capable of pinning specific processes to specific cores even within the confines of a container
  • a mutating admission webhook for the Kubernetes core Pod API, validating and mutating CPU pool specific user requests

The Device Plugin's job is to advertise the pools as consumable resources to Kubelet through the existing DPAPI. The CPUs allocated by the plugin are communicated to the container as environment variables containing a list of physical core IDs. By default the application can set its processes CPU affinity according to the given CPU list, or can leave it up to the standard Linux Completely Fair Scheduler.

For the edge case where application does not implement functionality to set the CPU affinity of its processes, the CPU pooler provides mechanism to set it on behalf of the application. This opt-in functionality is enabled by configuring the application process information to the annotation field of its Pod spec. A mutating admission controller webhook is provided with the project to mutate the Pod's specification according to the needs of the starter binary (mounts, environment variables etc.). The process-starter binary has to be installed to host file system in /opt/bin directory.

Lastly, the CPUSetter sub-component implements total physical separation of containers via Linux cpusets. This Informer constantly watches the Pod API of Kubernetes, and is triggered whenever a Pod is created, or changes its state(e.g. restarted etc.) CPUSetter first calculates what is the appropriate cpuset for the container: the allocated CPUs in case of exclusive, the shared pool in case of shared, or the default in case the container did not explicitly ask for any pooled resources. CPUSetter then provisions the calculated set into the relevant parametet of the container's cgroupfs filesystem (cpuset.cpus). As CPUSetter is triggered by all Pods on all Nodes, we can be sure no containers can ever -even accidentally- access CPU resources not meant for them!

Using the allocated CPUs

By default CPU-Pooler only provisions the appropriate cpuset for a container based on its resource request, but does not intervene with how threads inside the container are scheduled between the allowed vCPUs.

For users who require explicitly pinning application process / thread(s) to allocated CPUs for some reason, the following options are available:

pod annotation

If container has multiple processes and multiple exclusive CPUs are allocated or different pool types are used, pod annotation can be used to configure processes and CPU amounts for them. In this case CPU Pooler takes care of pinning the processes to allocated -but only to the allocated- CPUs. This is suitable for single threaded processes running on exclusive CPUs. The container can also have process(es) running on shared CPUs

use environment variables for pinning

CPU Pooler sets allocated exclusive CPUs to environment variable EXCLUSIVE_CPUS and allocated shared CPUs to SHARED_CPUS environment variable. They contain CPU(s) as comma separated list. These variables can be used by the application to read the allocated CPU(s) and do pinning of threads / processes to the CPU(s).

In order to avoid possible race conditions occuring due to multiple processes trying to set affinity at the same time (or application trying to set it too early); CPU-Pooler can guarantee that the container's entrypoint is only executed once the proper CPU configuration has been provisioned. This is achieved by the process-starter component under some conditions. In order for this functionality to work as intended the command property must be configured in container's pod manifest. If the command is not configured, the process-starer won't be used because it is not known which process needs to be started in the container. In such cases we fall back to the native Linux thread scheduling mechanism, but depending on user activity this might result in exotic race conditions occuring.

Configuration

Kubelet

Kubelet treats Devices transparently, therefore it will never realize the CPU-Pooler managed resources are actually CPUs. In order to avoid double bookkeeping of CPU resources in the cluster, every Node's Kubelet hosting the CPU-Pooler Device Plugin should be configured according to the following formula: --system-reserved = <TOTAL_CPU_CAPACITY> - SIZEOF(DEFAULT_POOL) + <DEFAULT_SYSTEM_RESERVED> This setting effectively tells Kubelet to discount the capacity belonging to the CPU-Pooler managed shared and exclusive pools.

Besides that, please note that Kubelet's inbuilt CPU Manager needs to be disabled on the Nodes which run CPU-Pooler to avoid overwriting CPU-Pooler's more fine-grained cpuset configuration.

CPU pools

CPU pools are configured with a configMap named cpu-pooler-configmap. The schema of configMap is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cpu-pooler-configmap
data:
  poolconfig-<name>.yaml: |
    pools:
      exclusive_<poolname1>:
        cpus : "<list of cpus"
      exclusive_<poolname2>:
        cpus : "<list of cpus"
      shared_<poolname3>:
        cpus : "<list of cpus>"
      default:
        cpus : "<list of cpus>"
      nodeSelector:
        <key> : <value>

The poolconfig-.yaml file must exist in the data section. The cpu pools are defined in poolconfig-.yaml files. There must be at least one poolconfig-.yaml file in the data section. Pool name from the config will be the resource in the fully qualified resource name (nokia.k8s.io/<pool name>). The pool name must have pool type prefix - 'exclusive' for exclusive cpu pool or 'shared' for shared cpu pool. A CPU pool not having either of these special prefixes is considered as the cluster-wide 'default' CPU pool, and as such, CPU cores belonging to this pool will not be advertised to the Device Manager as schedulable resources. The nodeSelector is used to tell which node the pool configuration file belongs to. CPU pooler and CPUSetter components both read the node labels and select the config that matches the value of nodeSelector.

In the deployment directory there is a sample pool config with two exclusive pools (both have two cpus) and one shared pool (one cpu). Nodes for the pool configurations are selected by nodeType label.

Please note: currently only one shared pool is supported per Node!

Pod spec

The cpu-device-plugin advertises the resources of exclusive, and shared CPU pools as name: nokia.k8s.io/<poolname>. The poolname is pool name configured in cpu-pooler-configmap. The cpus are requested in the resources section of container in the pod spec.

Annotation:

Annotation schema is following and the name for the annotation is nokia.k8s.io/cpus. Resource being the the advertised resource name.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/container"
  },
  "definitions": {
    "container": {
      "type": "object",
      "required": [
        "container",
        "processes"
      ],
      "properties": {
        "container": {
          "type": "string"
        },
        "processes": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/process"
          }
        }
      }
    },
    "process": {
      "type": "object",
      "required": [
        "process",
        "args",
        "pool",
        "cpus"
      ],
      "properties": {
        "process": {
          "type": "string"
        },
        "args": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "pool": {
          "type": "string"
        },
        "cpus": {
          "type": "number"
        }
      }
    }
  }
}

An example is provided in cpu-test.yaml pod manifest in the deployment folder.

Restrictions

Following restrictions apply when allocating cpu from pools and configuring pools:

  • There can be only one shared pool in the node
  • Resources belonging to the default pool are not advertised. The default pool definition is only used by the CPUSetter component

Build

Mutating webhook

$ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t cpu-device-webhook -f build/Dockerfile.webhook  .

The device plugin

$ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t cpudp -f build/Dockerfile.cpudp  .

Process starter

$ dep ensure --vendor-only
$ CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' github.com/nokia/CPU-Pooler/cmd/process-starter

CPUSetter

$ docker build --build-arg http_proxy=$http_proxy --build-arg https_proxy=$https_proxy -t cpusetter -f build/Dockerfile.cpusetter  .

Installation

Install process starter to host file system:

$ cp process-starter /opt/bin/

Create cpu-device-plugin config and daemonset:

$ kubectl create -f deployment/cpu-pooler-config.yaml
$ kubectl create -f deployment/cpu-dev-ds.yaml

There is a helper script ./scripts/generate_cert.sh that generates certificate and key for the webhook admission controller. The script deployment/create-webhook-conf.sh can be used to create the webhook configuration from the provided manifest file (webhook-conf.yaml).

Create CPUSetter daemonset:

$ kubectl create -f deployment/cpusetter-ds.yaml

Following steps create the webhook server with necessary configuration (including the certifcate and key)

$ ./scripts/generate-cert.sh
$ 
$ kubectl create -f deployment/webhook-svc-depl.yaml
$ deployment/create-webhook-conf.sh deployment/webhook-conf.yaml

The cpu-device-plugin with webhook should be running now. Test the installation with a cpu test pod. First create image for the test container:

$ docker build -t busyloop test/

Start the test container:

$ kubectl create -f ./deployment/cpu-test.yaml

Upon instantiation you can observe that the cpuset.cpus parameter of the created containers' cpuset cgroupfs hierarchy are set to right values:

  • to the configured cpuset belonging to the shared pool for sharedtestcontainer
  • to the a cpuset containing only the ID of the 2 chosen cores for exclusivetestcontainer
  • to the configured cpuset belonging to the default pool for defaulttestcontainer

Directories

Path Synopsis
cmd
pkg
test

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL