k8sgpusharing

module v0.0.0-...-78ba78e
Published: Feb 20, 2020 License: Apache-2.0

README

Please refer to NTHU-LSALAB/KubeShare

K8sGPUSharing

Make GPUs shareable in Kubernetes

Features

  1. Compatible with native K8s resource management
  2. Fine-grained resource definition
  3. Supports GPU compute and memory limits
  4. Supports multiple GPUs in a node
  5. Avoids the GPU fragmentation problem
  6. Supports GPU namespaces

Interested?

Follow this link

Directories

  • pkg: golang packages for K8s generated by code-generator release-1.14 (6c2a4329ac29)

Limitations

  • Only supports the Nvidia GPU device plugin with nvidia-docker2 in K8s.
    Not compatible with Docker (version >= 19) using the newer GPU resource API.
  • Currently only supports CUDA 9.0.
  • Requires Kubernetes version >= 1.10 (device plugin & CRD).

Prerequisites

  • A K8s cluster with Nvidia GPU device plugin.
  • kubectl with admin permissions.

Installation

kubectl create -f https://lsalab.cs.nthu.edu.tw/~ericyeh/gpusharing/crd.yaml
kubectl create -f https://lsalab.cs.nthu.edu.tw/~ericyeh/gpusharing/controller.yaml
kubectl create -f https://lsalab.cs.nthu.edu.tw/~ericyeh/gpusharing/daemonset.yaml

Using Shareable GPU

The extra information required by a shareable GPU (certain environment variables and volume mounts) is immutable once a Pod has been created, so it cannot be injected into Pods created by the user after the fact. We therefore define a new CustomResourceDefinition (CRD) named MtgpuPod (Multi-tenant GPU Pod) as the basic execution unit, taking the role normally played by a Pod.

An example MtgpuPod spec:

apiVersion: lsalab.nthu/v1
kind: MtgpuPod
metadata:
  name: pod1
  annotations:
    "lsalab.nthu/gpu_request": "0.5"
    "lsalab.nthu/gpu_limit": "1.0"
    "lsalab.nthu/gpu_mem": "1073741824" # 1Gi, in bytes
    "lsalab.nthu/GPUID": "abc"
spec:
  nodeName: node1 # must be assigned
  containers:
  - name: sleep
    image: nvidia/cuda:9.0-base
    command: ["sh", "-c"]
    args:
    - 'nvidia-smi -L'
    resources:
      requests:
        cpu: "1"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "500Mi"

Because K8s forbids floating-point requests for custom devices, the GPU resource usage definitions are moved to annotations (a small parsing sketch follows the list below).

  • lsalab.nthu/gpu_request: guaranteed GPU usage of the Pod; gpu_request <= "1.0".
  • lsalab.nthu/gpu_limit: maximum extra usage if the GPU still has free resources; gpu_request <= gpu_limit <= "1.0".
  • lsalab.nthu/gpu_mem: maximum GPU memory usage of the Pod, in bytes.
  • lsalab.nthu/GPUID: described in the section Controlling everything of shareable GPU.
  • spec is a normal PodSpec definition to be deployed.
  • spec.nodeName must be assigned (a deployed MtgpuPod must already be scheduled). More information in the section Cluster resources accounting.
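
As mentioned above, here is a small parsing and validation sketch for these annotations, written in Go with only the standard library. It is not the project's actual parser; only the annotation keys and the gpu_request <= gpu_limit <= 1.0 constraint come from this README, while the function name and error wording are illustrative assumptions.

package main

import (
	"fmt"
	"strconv"
)

// parseGPUAnnotations reads the GPU-sharing annotations listed above and
// checks the constraint 0 <= gpu_request <= gpu_limit <= 1.0.
func parseGPUAnnotations(ann map[string]string) (request, limit float64, memBytes int64, err error) {
	request, err = strconv.ParseFloat(ann["lsalab.nthu/gpu_request"], 64)
	if err != nil {
		return 0, 0, 0, fmt.Errorf("bad lsalab.nthu/gpu_request: %v", err)
	}
	limit, err = strconv.ParseFloat(ann["lsalab.nthu/gpu_limit"], 64)
	if err != nil {
		return 0, 0, 0, fmt.Errorf("bad lsalab.nthu/gpu_limit: %v", err)
	}
	memBytes, err = strconv.ParseInt(ann["lsalab.nthu/gpu_mem"], 10, 64)
	if err != nil {
		return 0, 0, 0, fmt.Errorf("bad lsalab.nthu/gpu_mem: %v", err)
	}
	if request < 0 || request > limit || limit > 1.0 {
		return 0, 0, 0, fmt.Errorf("want 0 <= gpu_request (%v) <= gpu_limit (%v) <= 1.0", request, limit)
	}
	return request, limit, memBytes, nil
}

func main() {
	// The annotation values from the MtgpuPod example above.
	req, lim, mem, err := parseGPUAnnotations(map[string]string{
		"lsalab.nthu/gpu_request": "0.5",
		"lsalab.nthu/gpu_limit":   "1.0",
		"lsalab.nthu/gpu_mem":     "1073741824",
	})
	fmt.Println(req, lim, mem, err) // 0.5 1 1073741824 <nil>
}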

Controlling everything of shareable GPU

A lsalab.nthu/GPUID (abbrev. GPUID) value is a randomly generated string of length 5 that temporarily represents a physical GPU card. A GPUID value is valid only while at least one Pod on the Node uses it. The same GPUID can be assigned to multiple Pods simultaneously if those Pods want to share the same physical GPU card.

A GPUID must be unique within the scope of a Node.
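
The README states only that a GPUID is a random string of length 5; the alphabet is not specified. A minimal generator sketch in Go, assuming a lowercase-alphanumeric alphabet and a hypothetical helper name newGPUID:

package main

import (
	"fmt"
	"math/rand"
)

// newGPUID returns a random 5-character identifier, e.g. "zxcvb".
// The lowercase-alphanumeric alphabet is an assumption for illustration;
// the README only states that the string has length 5.
func newGPUID() string {
	const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
	b := make([]byte, 5)
	for i := range b {
		b[i] = alphabet[rand.Intn(len(alphabet))]
	}
	return string(b)
}

func main() {
	fmt.Println(newGPUID())
}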

An example of controlling GPU sharing (one node with two physical GPUs):

GPU1             GPU2
+--------------+ +--------------+
|              | |              |
|              | |              |
|              | |              |
|              | |              |
|              | |              |
+--------------+ +--------------+

randomString(5): zxcvb (wants a new physical GPU)
Create Pod1 gpu_request:0.2 GPUID:zxcvb

GPU1             GPU2(zxcvb)
+--------------+ +--------------+
|              | |   Pod1:0.2   |
|              | |              |
|              | |              |
|              | |              |
|              | |              |
+--------------+ +--------------+

randomString(5): qwert (doesn't want to share with Pod1; wants a new physical GPU)
Create Pod2 gpu_request:0.3 GPUID:qwert

GPU1(qwert)      GPU2(zxcvb)
+--------------+ +--------------+
|   Pod2:0.3   | |   Pod1:0.2   |
|              | |              |
|              | |              |
|              | |              |
|              | |              |
+--------------+ +--------------+

Create Pod3 gpu_request:0.4 GPUID:zxcvb (wants to share with Pod1)

GPU1(qwert)      GPU2(zxcvb)
+--------------+ +--------------+
|   Pod2:0.3   | |   Pod1:0.2   |
|              | |   Pod3:0.4   |
|              | |              |
|              | |              |
|              | |              |
+--------------+ +--------------+

Delete Pod2 (GPUID qwert is no longer available)

GPU1             GPU2(zxcvb)
+--------------+ +--------------+
|              | |   Pod1:0.2   |
|              | |   Pod3:0.4   |
|              | |              |
|              | |              |
|              | |              |
+--------------+ +--------------+

randomString(5): asdfg (doesn't want to share with Pod1 or Pod3; wants a new physical GPU)
Create Pod4 gpu_request:0.5 GPUID:asdfg

GPU1(asdfg)      GPU2(zxcvb)
+--------------+ +--------------+
|   Pod4:0.5   | |   Pod1:0.2   |
|              | |   Pod3:0.4   |
|              | |              |
|              | |              |
|              | |              |
+--------------+ +--------------+

Compatible with Nvidia device plugin

The Occupy-Pod

The K8s default-scheduler cannot recognize MtgpuPods, so it may schedule ordinary Pods onto a Node whose physical GPUs are already in use by MtgpuPods. To prevent this, whenever a new GPUID is generated we run an Occupy-Pod that requests one GPU through the Nvidia device plugin (spec.containers[0].resources.requests: "nvidia.com/gpu": 1), telling the default-scheduler that this GPU is claimed by the MtgpuPod system.

Occupy-Pods are created in the kube-system namespace, with the naming format mtgpupod-occupypod-{NodeName}-{GPUID}.
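
For illustration, such an Occupy-Pod object can be sketched with the upstream k8s.io/api types as below. Only the name format, the kube-system namespace, and the "nvidia.com/gpu": 1 request/limit come from this README; the container name, image, and command are placeholder assumptions, not the controller's actual values.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// buildOccupyPod builds an Occupy-Pod manifest for the given node and GPUID,
// requesting one whole GPU from the Nvidia device plugin so that the
// default-scheduler sees the card as taken.
func buildOccupyPod(nodeName, gpuID string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			// Naming format from this README: mtgpupod-occupypod-{NodeName}-{GPUID}
			Name:      fmt.Sprintf("mtgpupod-occupypod-%s-%s", nodeName, gpuID),
			Namespace: "kube-system",
		},
		Spec: corev1.PodSpec{
			NodeName: nodeName, // pin the Occupy-Pod to the node that owns the GPUID
			Containers: []corev1.Container{{
				Name:    "occupy",               // placeholder name
				Image:   "nvidia/cuda:9.0-base", // placeholder image, reused from the MtgpuPod example
				Command: []string{"sh", "-c", "sleep infinity"},
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{"nvidia.com/gpu": resource.MustParse("1")},
					Limits:   corev1.ResourceList{"nvidia.com/gpu": resource.MustParse("1")},
				},
			}},
		},
	}
}

func main() {
	fmt.Println(buildOccupyPod("node1", "zxcvb").Name) // mtgpupod-occupypod-node1-zxcvb
}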

Cluster resources accounting

The K8s default-scheduler does not currently support shareable custom devices, so an MtgpuPod-scheduler is required to deploy MtgpuPods automatically. Although we may provide a basic MtgpuPod-scheduler in the future that supports only resource-request scheduling, the method for accounting shareable custom device resources is described below in pseudo-code (a Go sketch of the same logic follows it).

GPUResources := list of free GPU resources of every Node

for each Node:
    availableGPU := total number of GPUs on Node
    allocatedGPUMap := map of GPUID => remaining free fraction of that GPU

    for each Pod on Node:
        // skip Occupy-Pods: their GPU is accounted below through the GPUID that created them
        if ! Pod.Name.Contains("mtgpupod-occupypod"):
            availableGPU -= sum of "nvidia.com/gpu" requests of containers in Pod

    for each MtgpuPod on Node:
        if GPUID of MtgpuPod exists in allocatedGPUMap:
            allocatedGPUMap.Get(GPUID) -= GPU request of MtgpuPod
        else:
            availableGPU -= 1
            allocatedGPUMap.Add(GPUID, 1.0 - GPU request of MtgpuPod)

    GPUResources.Add(availableGPU, allocatedGPUMap)
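
The same accounting can be written as a short Go function for a single node. This is only a sketch of the pseudo-code above, not the project's scheduler; the pod and mtgpuPod structs are simplified stand-ins for the objects a real scheduler would read from the API server.

package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the objects a real scheduler would read from the API server.
type pod struct {
	name       string
	gpuRequest int64 // sum of "nvidia.com/gpu" requests over the Pod's containers
}

type mtgpuPod struct {
	gpuID      string  // lsalab.nthu/GPUID
	gpuRequest float64 // lsalab.nthu/gpu_request
}

type nodeGPUResources struct {
	availableGPU int64              // whole GPUs not yet bound to any Pod or GPUID
	allocatedGPU map[string]float64 // GPUID -> remaining free fraction of that GPU
}

// accountNode mirrors the pseudo-code above for a single node.
func accountNode(totalGPU int64, pods []pod, mtgpuPods []mtgpuPod) nodeGPUResources {
	res := nodeGPUResources{availableGPU: totalGPU, allocatedGPU: map[string]float64{}}

	// Ordinary Pods consume whole GPUs. Occupy-Pods are skipped here because
	// the GPU they hold is charged below through the GPUID that created them.
	for _, p := range pods {
		if !strings.Contains(p.name, "mtgpupod-occupypod") {
			res.availableGPU -= p.gpuRequest
		}
	}

	// MtgpuPods consume fractions of the GPU bound to their GPUID; the first
	// MtgpuPod seen for a GPUID removes one whole GPU from the free pool.
	for _, m := range mtgpuPods {
		if _, ok := res.allocatedGPU[m.gpuID]; ok {
			res.allocatedGPU[m.gpuID] -= m.gpuRequest
		} else {
			res.availableGPU--
			res.allocatedGPU[m.gpuID] = 1.0 - m.gpuRequest
		}
	}
	return res
}

func main() {
	// Matches the walkthrough above: Pod1 (0.2) and Pod3 (0.4) share GPUID "zxcvb"
	// on a node with two physical GPUs and no other GPU Pods.
	res := accountNode(2, nil, []mtgpuPod{
		{gpuID: "zxcvb", gpuRequest: 0.2},
		{gpuID: "zxcvb", gpuRequest: 0.4},
	})
	fmt.Printf("free whole GPUs: %d, per-GPUID free fraction: %v\n", res.availableGPU, res.allocatedGPU)
	// Output: free whole GPUs: 1, per-GPUID free fraction: map[zxcvb:0.4]
}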

Uninstallation

kubectl delete -f https://lsalab.cs.nthu.edu.tw/~ericyeh/gpusharing/crd.yaml
kubectl delete -f https://lsalab.cs.nthu.edu.tw/~ericyeh/gpusharing/controller.yaml
kubectl delete -f https://lsalab.cs.nthu.edu.tw/~ericyeh/gpusharing/daemonset.yaml

Issues

Currently the GPU memory usage control may not work properly.

Directories

Path Synopsis
pkg
    client/clientset/versioned: This package has the automatically generated clientset.
    client/clientset/versioned/fake: This package has the automatically generated fake clientset.
    client/clientset/versioned/scheme: This package contains the scheme of the automatically generated clientset.
    client/clientset/versioned/typed/mtgpupod/v1: This package has the automatically generated typed clients.
    client/clientset/versioned/typed/mtgpupod/v1/fake: Package fake has the automatically generated clients.
