Intel RMD Operator
Kubernetes Operator designed to provision and manage Intel Resource Management Daemon (RMD) instances in a Kubernetes cluster.
Prerequisites
- Node Feature Discovery (NFD) should be deployed in the cluster before running the operator. Once NFD has applied labels to nodes with capabilities compatible with RMD, such as Intel L3 Cache Allocation Technology, the operator can deploy RMD on those nodes.
Note: NFD is recommended, but not essential. Node labels can also be applied manually (see the example after this list). See the NFD repo for a full list of feature labels.
- A working RMD container image from the RMD repo compatible with the RMD Operator (see compatibility table below).
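If NFD is not deployed, a suitable label can be applied by hand. A minimal sketch, assuming the RDT L3 cache allocation feature label used by recent NFD releases (the exact label key depends on the NFD version):

# label key shown is an assumption; check your NFD version's feature labels
kubectl label node <node-name> feature.node.kubernetes.io/cpu-rdt.RDTL3CA=true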
Compatibility
| RMD Version | RMD Operator Version |
|-------------|----------------------|
| v0.1        | N/A                  |
| v0.2        | v0.1                 |
| v0.3        | v0.2                 |
Setup
Debug Mode
To use the operator with RMD in debug mode, the port number in build/manifests/rmd-pod.yaml must be set to 8081 before building the operator. Debug mode is advised for testing only.
TLS Enablement
To use the operator with RMD with TLS enabled, the port number in build/manifests/rmd-pod.yaml must be set to 8443 before building the operator. The certificates provided in this repository are taken from the RMD repository and should be used for testing only. For production, users should generate their own certificates and replace the existing ones. The client certs for the RMD operator are stored in the following locations:
CA: build/certs/public/ca.pem
Public Key: build/certs/public/cert.pem
Private Key: build/certs/private/key.pem
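As a quick sanity check that TLS is configured correctly, these client certificates can be used to query the RMD API directly. A minimal sketch, with hostname as a placeholder for a node running RMD:

$ curl --cacert build/certs/public/ca.pem \
  --cert build/certs/public/cert.pem \
  --key build/certs/private/key.pem \
  https://hostname:8443/v1/workloads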
Build
Note: The operator deploys pods with the RMD container. The Dockerfile for this container is located on the RMD repo and is out of scope for this project.
The operator supports RMD v0.2 only.
The pod spec used by the operator to deploy the RMD container is located at build/manifests/rmd-pod.yaml. Alterations to the image name/tag should be made here.
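For example, to pull the RMD image from a private registry, the image field in that manifest would be updated. A sketch only; the registry path, tag, and container name are illustrative, and the surrounding fields of the actual manifest are omitted:

containers:
- name: rmd                                # container name is illustrative
  image: registry.example.com/rmd:v0.2     # replace with your registry/tag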
Build binaries and create docker images for the operator and the node agent:
make all
Note: The Docker images built are intel-rmd-operator:latest and intel-rmd-node-agent:latest. Once built, these images should be stored in a remote Docker repository for use throughout the cluster.
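For example, the images might be tagged and pushed as follows (the registry address is a placeholder):

docker tag intel-rmd-operator:latest registry.example.com/intel-rmd-operator:latest
docker push registry.example.com/intel-rmd-operator:latest
docker tag intel-rmd-node-agent:latest registry.example.com/intel-rmd-node-agent:latest
docker push registry.example.com/intel-rmd-node-agent:latest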
Deploy
The deploy directory contains all specifications for the required RBAC objects. These objects can be inspected and deployed individually or created all at once using rbac.yaml:
kubectl create -f deploy/rbac.yaml
Create RmdNodeState CRD:
kubectl create -f deploy/crds/intel.com_rmdnodestates_crd.yaml
Create RmdWorkloads CRD:
kubectl create -f deploy/crds/intel.com_rmdworkloads_crd.yaml
Create Operator Pod:
kubectl create -f deploy/operator.yaml
Note: For the operator to deploy and run RMD instances, an up-to-date RMD Docker image is required.
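Once created, the operator pod should reach the Running state; a quick check (the pod name suffix will differ):

kubectl get pods | grep intel-rmd-operator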
Custom Resource Definitions (CRDs)
RmdWorkload
The RmdWorkload custom resource is the object used to define a workload for RMD.
RmdWorkload objects can be created directly via the RmdWorkload spec or automatically via the pod spec.
Direct configuration affords the user more control over specific cores and specific nodes on which they wish to configure a particular RmdWorkload. This section describes the direct configuration approach.
Automatic configuration utilizes pod annotations and the intel.com/l3_cache_ways extended resource to create an RmdWorkload for the same CPUs that are allocated to the pod. The automatic configuration approach is described later. This approach has a number of limitations and is less stable than direct configuration.
Examples
See the samples directory for RmdWorkload templates.
Cache
See samples/rmd-workload-guaranteed-cache.yaml:

apiVersion: intel.com/v1alpha1
kind: RmdWorkload
metadata:
  name: rmdworkload-guaranteed-cache
spec:
  coreIds: ["0","1","2","3"]
  cache:
    max: 2
    min: 2
  nodes: ["worker-node-1", "worker-node-2"]
This workload requests cache from the guaranteed group for CPUs 0 to 3 on nodes "worker-node-1" and "worker-node-2". See intel/rmd for details on cache pools/groups.
Note: Replace "worker-node-1" and "worker-node-2" in the nodes field with the actual node name(s) you wish to target with your RmdWorkload spec.
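Node names in the cluster can be listed with:

kubectl get nodes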
Creating this workload is the equivalent of:
$ curl -H "Content-Type: application/json" --request POST --data \
'{"core_ids":["0","1","2","3"],
"cache" : {"max": 2, "min": 2 } }' \
https://hostname:port/v1/workloads
P-State
See samples/rmd-workload-guaranteed-cache-pstate.yaml:

apiVersion: intel.com/v1alpha1
kind: RmdWorkload
metadata:
  name: rmdworkload-guaranteed-cache-pstate
spec:
  coreIds: ["4","5","6","7"]
  cache:
    max: 2
    min: 2
  pstate:
    ratio: "3.0"
    monitoring: "on"
  nodes: ["worker-node-1", "worker-node-2"]
This workload expands on the previous example by adding P-State parameters (ratio and monitoring), which take effect when the RMD P-State plugin is enabled.
Creating this workload is the equivalent of:
$ curl -H "Content-Type: application/json" --request POST --data \
'{"core_ids":["4","5","6","7"],
"cache" : {"max": 2, "min": 2 },
"pstate" : {"ratio": 3.0, "monitoring" : "on"} }' \
https://hostname:port/v1/workloads
Create RmdWorkload
kubectl create -f samples/rmd-workload-guaranteed-cache.yaml
List RmdWorkloads
kubectl get rmdworkloads
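Illustrative output after creating the workload above (the AGE column will differ):

NAME                           AGE
rmdworkload-guaranteed-cache   30s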
Display a particular RmdWorkload:
kubectl describe rmdworkload rmdworkload-guaranteed-cache
Name:         rmdworkload-guaranteed-cache
Namespace:    default
API Version:  intel.com/v1alpha1
Kind:         RmdWorkload
Spec:
  Cache:
    Max:  2
    Min:  2
  Core Ids:
    0
    1
    2
    3
  Nodes:
    worker-node-1
    worker-node-2
Status:
  Workload States:
    worker-node-1:
      Cos Name:  0_1_2_3-guarantee
      Id:        2
      Response:  Success: 200
      Status:    Successful
    worker-node-2:
      Cos Name:  0_1_2_3-guarantee
      Id:        2
      Response:  Success: 200
      Status:    Successful
This displays the RmdWorkload object including the spec as defined above and the status of the workload. Here, the status shows that this workload was configured successfully on nodes "worker-node-1" and "worker-node-2".
Delete RmdWorkload
When the user deletes an RmdWorkload object, a delete request is sent to the RMD API on every RMD instance on which that RmdWorkload is configured.
kubectl delete rmdworkload rmdworkload-guaranteed-cache
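On each node in the workload's nodes field, this is roughly equivalent to a DELETE request against the RMD API using the workload ID reported in the status above (hostname and port are placeholders):

$ curl --request DELETE https://hostname:port/v1/workloads/2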
Note: If the user only wishes to delete the RmdWorkload from a specific node, that node should be removed from the RmdWorkload spec's "nodes" field and the updated RmdWorkload object then re-applied:
kubectl apply -f samples/rmd-workload-guaranteed-cache.yaml
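For example, to remove the workload from "worker-node-2" only, the spec's nodes field would be reduced to:

nodes: ["worker-node-1"]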
RmdNodeState
The RmdNodeState custom resource is created for each node in the cluster which has RMD running. The purpose of this object is to allow the user to view all running workloads on a particular node at any given time.
Each RmdNodeState object will be named according to its corresponding node (i.e. rmd-node-state-<node-name>).
List all RmdNodeStates on the cluster
kubectl get rmdnodestates
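Illustrative output for a two-node cluster (the AGE column will differ):

NAME                           AGE
rmd-node-state-worker-node-1   2d
rmd-node-state-worker-node-2   2d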
Display a particular RmdNodeState, for example that of worker-node-1:
kubectl describe rmdnodestate rmd-node-state-worker-node-1
Name:         rmd-node-state-worker-node-1
Namespace:    default
API Version:  intel.com/v1alpha1
Kind:         RmdNodeState
Spec:
  Node:      worker-node-1
  Node UID:  75d03574-6991-4292-8f16-af43a8bfa9a6
Status:
  Workloads:
    rmdworkload-guaranteed-cache:
      Cache Max:  2
      Cache Min:  2
      Core IDs:   0,1,2,3
      Cos Name:   0_1_2_3-guarantee
      ID:         1
      Origin:     REST
      Status:     Successful
    rmdworkload-guaranteed-cache-pstate:
      Cache Max:  2
      Cache Min:  2
      Core IDs:   4,5,6,7
      Cos Name:   4_5_6_7-guarantee
      ID:         2
      Origin:     REST
      Status:     Successful
This example displays the RmdNodeState for worker-node-1. It shows that this node currently has two RMD workloads configured successfully.
Pod Requesting Cache Ways
It is also possible for the operator to create an RmdWorkload automatically by interpreting resource requests and annotations in the pod spec.
Warning: Automatic creation of workloads may be unstable and is not recommended in production for the RMD Operator v0.1. However, testing and feedback is welcomed to help stabilize this approach for future releases.
Under this approach, the user creates a pod with a container requesting exclusive CPUs from the Kubelet CPU Manager and available cache ways. The pod must also contain RMD specific pod annotations to describe the desired RmdWorkload.
It is then the responsibility of the operator and the node agent to do the following:
- Extract the RMD related data passed to the pod spec by the user.
- Discover which CPUs have been allocated to the container by the CPU Manager.
- Create the RmdWorkload object based on this information.
The following criteria must be met in order for the operator to successfully create an RmdWorkload for a container based on the pod spec.
- The container must request the extended resource intel.com/l3_cache_ways.
- The container must also request exclusive CPUs from CPU Manager.
- Pod annotations pertaining to the container requesting cache ways must be prefixed with that container's name. See example and table below.
Example
See samples/pod-guaranteed-cache.yaml:

apiVersion: v1
kind: Pod
metadata:
  generateName: guaranteed-cache-pod-
  labels:
    name: nginx
  annotations:
    nginx1_cache_min: "2"
spec:
  containers:
  - name: nginx1
    image: nginx
    resources:
      requests:
        memory: "64Mi"
        cpu: 3
        intel.com/l3_cache_ways: 2
      limits:
        memory: "64Mi"
        cpu: 3
        intel.com/l3_cache_ways: 2
This pod spec has one container requesting 3 exclusive CPUs and 2 cache ways. The number of cache ways requested is also interpreted as the value for max cache for the RmdWorkload. The min cache value is specified in the pod annotations. The naming convention for RMD workload related annotations must follow the table below.
Pod Annotations Naming Convention
Note: Annotations must be prefixed with the relevant container name as shown below.
| Specification      | Container Name | Required Annotation Name |
|--------------------|----------------|--------------------------|
| Min Cache          | nginx1         | nginx1_cache_min         |
| Policy             | nginx1         | nginx1_policy            |
| P-State Ratio      | nginx1         | nginx1_pstate_ratio      |
| P-State Monitoring | nginx1         | nginx1_pstate_monitoring |
Failure to follow the provided annotation naming convention will result in failure to create the desired workload.
Create Pod
kubectl create -f samples/pod-guaranteed-cache.yaml
Display RmdWorkload
If successful, the RmdWorkload will be created following the naming convention rmd-workload-<pod-name>:
kubectl describe rmdworkload rmd-workload-guaranteed-cache-pod-86676
Name:         rmd-workload-guaranteed-cache-pod-86676
Namespace:    default
API Version:  intel.com/v1alpha1
Kind:         RmdWorkload
Spec:
  Cache:
    Max:  2
    Min:  2
  Core Ids:
    1
    2
    49
  Nodes:
    worker-node-1
  Policy:
  Pstate:
    Monitoring:
    Ratio:
Status:
  Workload States:
    worker-node-1:
      Cos Name:  1_2_49-guarantee
      Id:        3
      Response:  Success: 200
      Status:    Successful
This output displays the RmdWorkload which has been created successfully based on the pod spec created above.
Note that CPUs 1, 2, and 49 have been allocated to the container by the CPU Manager. As this RmdWorkload was created automatically via the pod spec, the user has no control over which CPUs are used by the container.
In order to explicitly define which CPUs are to be allocated cache ways, the RmdWorkload must be created directly via the RmdWorkload spec and not the pod spec.
Delete Pod and RmdWorkload
When an RmdWorkload is created by the operator based on a pod spec, that pod object becomes the owner of the RmdWorkload object it creates. Therefore when a pod that owns an RmdWorkload is deleted, its RmdWorkload child is automatically garbage collected and thus removed from RMD.
kubectl delete pod rmd-workload-guaranteed-cache-pod-86676
Limitations in Creating RmdWorkloads via Pod Spec
- Only one container per pod may request L3 cache ways.
- Automatic configuration is only achievable with the native Kubernetes CPU Manager static policy (see the kubelet configuration sketch after this list).
- The user has no control over which CPUs are configured with the automatically created RmdWorkload policy as the CPU Manager is in charge of CPU allocation.
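The static policy mentioned above is enabled through the kubelet configuration. A minimal sketch, assuming the kubelet is configured via a KubeletConfiguration file; the reserved CPU values are illustrative:

kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
# Exclusive CPU allocation requires the static CPU Manager policy.
cpuManagerPolicy: static
# The static policy requires a non-zero CPU reservation for system daemons;
# the values below are illustrative.
kubeReserved:
  cpu: "1"
systemReserved:
  cpu: "1"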
Creating an RmdWorkload automatically via a pod spec is far less reliable than creating directly via an RmdWorkload spec. This is because the user no longer has the ability to explicitly define the specific CPUs on which the RmdWorkload will ultimately be configured.
CPU allocation for containers is the responsibility of the CPU Manager in Kubelet. As a result, the RmdWorkload will only be created after the pod is admitted. Once the RmdWorkload is created by the operator, the RmdWorkload information is sent to RMD in the form of an HTTPS post request.
Should the POST request to RMD fail at this point for any reason, the operator will then terminate the pod and, by association, the RmdWorkload.
To discover why the pod was terminated by the operator, it is necessary to check the operator pod's logs.
Example
kubectl logs intel-rmd-operator-6464fcfb94-4cvqn | grep guaranteed-cache-pod-9wbk9 -A 1
Example Output
{"level":"info","ts":1591601591.2043474,"logger":"controller_rmdworkload","msg":"Workload not found on RMD instance, create.","Request.Namespace":"default","Request.Name":"rmd-workload-guaranteed-cache-pod-2dnwh"} {"level":"error","ts":1591601591.2067816,"logger":"controller_rmdworkload.postWorkload","msg":"Failed to post workload to RMD","Response:":"Fail: Failed to validate workload. Reason: Workload validation in database failed. Details: CPU list 6 has been assigned\n","error":"Response status code error"...
Workflows
Direct Configuration: (workflow diagram)
Automatic Configuration: (workflow diagram)