Virtual GPU device plugin for Kubernetes
The virtual device plugin for Kubernetes is a DaemonSet that allows you to automatically:
- Expose an arbitrary number of virtual GPUs on the GPU nodes of your cluster.
- Run ML serving containers backed by accelerators with low latency and low cost in your Kubernetes cluster.
This repository contains the AWS virtual GPU implementation of the Kubernetes device plugin.
Prerequisites
The list of prerequisites for running the virtual device plugin is described below:
- NVIDIA drivers ~= 361.93
- nvidia-docker version > 2.0 (see how to install and its prerequisites)
- docker configured with nvidia as the default runtime.
- Kubernetes version >= 1.10
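The default-runtime prerequisite can be checked by inspecting the Docker daemon configuration. The sketch below is one possible check, not part of this project: it assumes the configuration lives in a JSON file (commonly /etc/docker/daemon.json) and uses Docker's "default-runtime" key. The function name and the sample runtime path are illustrative.

```python
import json


def default_runtime_is_nvidia(daemon_json_text: str) -> bool:
    """Return True if the Docker daemon config sets nvidia as the default runtime."""
    config = json.loads(daemon_json_text)
    return config.get("default-runtime") == "nvidia"


# Sample configuration for illustration only; the runtime path is an
# assumption and may differ on your nodes.
sample = """
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
  }
}
"""

print(default_runtime_is_nvidia(sample))
```

On a node, you would pass the contents of the daemon configuration file to the function instead of the sample string.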
Limitations
- This solution is built on top of Volta Multi-Process Service (MPS). You can only use it on instance types with Tesla V100 or newer GPUs. Currently, this means only Amazon EC2 P3 and Amazon EC2 G4 instances.
- By default, the virtual GPU device plugin sets the GPU compute mode to EXCLUSIVE_PROCESS, which means the GPU is assigned to the MPS process and individual process threads submit work to the GPU concurrently via the MPS server. The GPU cannot be used for any other purpose.
- The virtual GPU device plugin only works on instances with a single physical GPU, such as P3.2xlarge, if a workload requests more than 1 k8s.amazonaws.com/vgpu.
- The virtual GPU device plugin cannot be used together with the NVIDIA device plugin. You can label nodes and use a node selector to install the virtual GPU device plugin only on selected nodes.
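To confirm the compute mode described in the limitations above, you can query it with nvidia-smi on a GPU node. The following is a minimal sketch, not part of this project: the helper names are hypothetical, and it assumes nvidia-smi is on the PATH and reports the mode as a string such as Exclusive_Process or Default.

```python
import subprocess
from typing import List


def parse_compute_modes(nvidia_smi_output: str) -> List[str]:
    """Split nvidia-smi CSV output (one GPU per line) into a list of modes."""
    return [line.strip() for line in nvidia_smi_output.splitlines() if line.strip()]


def all_exclusive_process(modes: List[str]) -> bool:
    """Return True only if at least one GPU is listed and all are in EXCLUSIVE_PROCESS mode."""
    return bool(modes) and all(m.lower() == "exclusive_process" for m in modes)


def query_compute_modes() -> List[str]:
    """Run nvidia-smi on the local node and return the compute mode of each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_mode", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_compute_modes(result.stdout)
```

On a GPU node, `all_exclusive_process(query_compute_modes())` should return True once the plugin has configured the GPUs.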
High Level Design
Quick Start
Label GPU node groups
kubectl label node <your_k8s_node_name> k8s.amazonaws.com/accelerator=vgpu
Enabling virtual GPU Support in Kubernetes
Update the node selector label in the manifest file to match the labels of your GPU node group, then apply the manifest to your cluster.
$ kubectl create -f https://raw.githubusercontent.com/awslabs/aws-virtual-gpu-device-plugin/v0.1.0/manifests/device-plugin.yml
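After the DaemonSet is running, each labeled node should advertise the k8s.amazonaws.com/vgpu resource in its allocatable resources. The sketch below shows one way to verify this by parsing the output of `kubectl get nodes -o json`. The function name, node names, and the count of 16 are hypothetical and used only for illustration.

```python
import json
from typing import Dict

VGPU_RESOURCE = "k8s.amazonaws.com/vgpu"


def vgpus_per_node(nodes_json_text: str) -> Dict[str, int]:
    """Map each node name to the number of allocatable vGPUs it advertises."""
    nodes = json.loads(nodes_json_text)
    result = {}
    for node in nodes.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Extended resources are reported as integer strings; absent means 0.
        result[name] = int(allocatable.get(VGPU_RESOURCE, "0"))
    return result


# Hypothetical output of `kubectl get nodes -o json`, trimmed to the
# fields used here.
sample_nodes = json.dumps({
    "items": [
        {"metadata": {"name": "gpu-node-1"},
         "status": {"allocatable": {"cpu": "8", VGPU_RESOURCE: "16"}}},
        {"metadata": {"name": "cpu-node-1"},
         "status": {"allocatable": {"cpu": "4"}}},
    ]
})

print(vgpus_per_node(sample_nodes))
```

A node that reports 0 either does not match the node selector or has not yet registered the plugin.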
Running GPU Jobs
Virtual NVIDIA GPUs can now be consumed via container-level resource requirements using the resource name k8s.amazonaws.com/vgpu:
# extensions/v1beta1 Deployments are no longer served as of Kubernetes 1.16; use apps/v1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resnet-deployment
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: resnet-server
  template:
    metadata:
      labels:
        app: resnet-server
    spec:
      # hostIPC is required for MPS communication
      hostIPC: true
      containers:
      - name: resnet-container
        image: seedjeffwan/tensorflow-serving-gpu:resnet
        args:
        # Set this limit based on the number of vGPUs so that the tf-serving process does not occupy all of the GPU memory
        - --per_process_gpu_memory_fraction=0.2
        env:
        - name: MODEL_NAME
          value: resnet
        ports:
        - containerPort: 8501
        # Use the virtual GPU resource here
        resources:
          limits:
            k8s.amazonaws.com/vgpu: 1
        volumeMounts:
        - name: nvidia-mps
          mountPath: /tmp/nvidia-mps
      volumes:
      - name: nvidia-mps
        hostPath:
          path: /tmp/nvidia-mps
WARNING: if you don't request GPUs when using the device plugin, all the GPUs on the machine will be exposed inside your container.
Check the full example here
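The manifest comment about limiting GPU memory can be turned into a small calculation. This is a sketch under a stated assumption, not the plugin's own logic: it assumes each vGPU on a physical GPU should receive an equal share of GPU memory, which this README does not specify. The function and parameter names are hypothetical.

```python
def gpu_memory_fraction(requested_vgpus: int, vgpus_per_gpu: int) -> float:
    """Return a value for --per_process_gpu_memory_fraction.

    Assumes (not confirmed by the plugin) that GPU memory is divided
    evenly across the vGPUs exposed on one physical GPU.
    """
    if vgpus_per_gpu <= 0:
        raise ValueError("vgpus_per_gpu must be positive")
    if not 0 < requested_vgpus <= vgpus_per_gpu:
        raise ValueError("requested_vgpus must be between 1 and vgpus_per_gpu")
    return round(requested_vgpus / vgpus_per_gpu, 4)


# With 5 vGPUs per physical GPU, requesting 1 vGPU gives the 0.2 used in
# the manifest above (the value 5 is an illustrative assumption).
print(gpu_memory_fraction(1, 5))
```

Choosing a fraction slightly below the computed share leaves headroom for other allocations on the same GPU.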
Development
Please check Development for more details.
Credits
The project idea comes from @RenaudWasTaken's comment in kubernetes/kubernetes#52757 and from Alibaba's solution by @cheyang, described in "GPU Sharing Scheduler Extender Now Supports Fine-Grained Kubernetes Clusters".
License
This project is licensed under the Apache-2.0 License.