Cambricon Device Plugin for Kubernetes
About Cambricon Device Plugin
The Cambricon device plugin for Kubernetes is a Daemonset which allows you to automatically:
- Report the quantity of MLU on each nodes of your cluster.
- Monitor the health status of MLUs.
- Be capable to run containers with MLU enabled.
This repository contains Cambricon's official implementation of the Kubernetes device plugin.
Prerequisites
The prerequisites for running the Cambricon Device Plugin:
- Support MLU3xx devices
- For MLU 3xx needs driver >= 4.20.9
- For MLU 3xx needs cntoolkit >= 2.8.2 on your building machine
- For SMLU needs driver >= 5.10.27 and cntoolkit >= 3.9.0
For MLU driver version before 4.9.13, and need to support MLU2xx please use release v1.1.3.
For Kubernetes version < 1.19.0, mlulink topology-aware mode can not be used. If you want to use this feature, make sure your Kubernetes version >= 1.19.0.
Quick Start
Preparing your MLU Nodes
It is assumed that Cambricon MLU driver and neuware are installed on your MLU Nodes.
Download and build
git clone https://github.com/Cambricon/cambricon-k8s-device-plugin.git
cd cambricon-k8s-device-plugin/device-plugin
Set the following environment variables if you need.
env |
description |
APT_PROXY |
apt proxy address |
GOPROXY |
golang proxy address |
ARCH |
target platform architecture, amd64 or arm64, amd64 by default |
LIBCNDEV |
absolute path of the libcndev.so binary, neuware installation path by default |
BASE_IMAGE |
device plugin base image |
CNTOPO |
absolute path of the cntopo binary, neuware installation path by default |
Docker should be >= 17.05.0 on your building machine. If you want to cross build, make sure docker version >= 19.03.
For amd64:
GOPROXY=https://goproxy.cn ./build_image.sh
For arm64:
ARCH=arm64 GOPROXY=https://goproxy.cn ./build_image.sh
Please make sure Cambricon neuware and cncl is installed in your compiling environment.
It uses libcndev.so and cntopo binary on your compiling machine and generates docker image in folder ./image
.
Enabling MLU Support in Kubernetes
-
Push the docker image to the docker repo of your cluster or load the docker image on all your MLU nodes by:
docker load -i image/cambricon-device-plugin-amd64.tar
-
Enable MLU support in your cluster by deploying the daemonset in examples folder:
Set the args in the yaml file
args:
- --mode=default #device plugin mode: default, env-share, mim and topology-aware
- --virtualization-num=1 # virtualization number for each MLU, used only in env-share mode, set to 110 to support multi cards per container in env-share mode
- --mlulink-policy=best-effort # MLULink topology policy: best-effort, guaranteed or restricted, used only in topology-aware mode
- --cnmon-path=/usr/bin/cnmon # host machine cnmon path, must be absolute path. comment out this line to avoid mounting cnmon.
- --enable-device-type #comment to enable device registration with type info
#- --enable-console #uncomment to enable UART console device(/dev/ttyMS) in container
#- --disable-health-check #uncomment to disable health check
#- --mount-rpmsg #uncomment to mount RPMsg directory, will be deprecated in the near future
supported features:
- default: default mode
- env-share: a whole card can be allocated into multiple containers, should set
virtualization-num
as maximum number of containers one MLU can be allocated into.
- mim: supports
mim
mode, need to preset mim
in the host before device plugin starting
- topology-aware: device plugin is aware of MLULink topology and tries to allocate MLUs forming a ring. Set
mlulink-policy
as described below
MLULink topology policies, guaranteed and restricted only works for 1,2,4,8 requested MLUs:
- best-effort: allocate devices forming maximum number of rings whenever possible
- guaranteed: allocated devices must form at least one ring, otherwise return error
- restricted: allocated devices must form the best rings, otherwise return error
kubectl create -f examples/cambricon-device-plugin-daemonset.yaml
(Optional) If you do not want the daemonset way of deployment, edit the static pod template in examples folder and
put the file into your configured static pod folder (/etc/kubernetes/manifests
by default).
-
Create rbac role and config.
kubectl create -f examples/cambricon-device-plugin-config.yaml
Install with helm
This step is optional, you can also install device plugin via helm
-
Download and install helm firstly, refer https://helm.sh/docs/intro/install/
-
Clone this repo, cd device-plugin/deploys/helm
and run helm install release-x.x.x device-plugin/
Running MLU Jobs
Cambricon MLUs can now be consumed via container level resource requirements using the resource name cambricon.com/mlu
:
apiVersion: v1
kind: Pod
metadata:
name: pod1
spec:
restartPolicy: OnFailure
containers:
- image: ubuntu:16.04
name: pod1-ctr
command: ["sleep"]
args: ["100000"]
resources:
limits:
cambricon.com/mlu: "1" # use this when device type is not enabled, else delete this line.
#cambricon.com/mlu370: "1" #uncomment to use when device type is enabled
#cambricon.com/mlu370.share: "1" #uncomment to use device with env-share mode
#cambricon.com/mlu370.mim-2m.8gb: "1" #uncomment to use device with mim mode
Change log level
Support to change log level and validate without restarting
change to debug
curl -i http://{{device-plugin-pod-ip}}:30107/logLevel?level=debug
change to info
curl -i http://{{device-plugin-pod-ip}}:30107/logLevel?level=info
Upgrade Notice
Please see changelog for deprecation and breaking changes.