nvidia-scheduler-extender

Version: v0.4.0 (latest) | Published: May 5, 2022 | License: Apache-2.0

Repository

github.com/CERIT-SC/nvidia-scheduler-extender

README

Overview

A growing number of data scientists run NVIDIA GPU-based inference tasks on Kubernetes. Some of these tasks can share a single NVIDIA GPU device, which increases GPU utilization. An important challenge is therefore how to share GPUs between pods, a topic the community is also very interested in.

This project provides a GPU sharing solution on native Kubernetes. It is based on the scheduler extender and device plugin mechanisms, so you can easily reuse it in your own Kubernetes cluster.
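To illustrate the scheduler extender idea, here is a minimal sketch of a filter endpoint: the scheduler posts a candidate list over HTTP and the extender answers with the nodes that pass. The types filterRequest and filterResult, their JSON field names, and the GiB unit are hypothetical simplifications made for this example. The real extender protocol exchanges full Kubernetes pod and node objects, and this is not the project's actual implementation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
)

// filterRequest is a hypothetical, simplified stand-in for what the
// scheduler sends to an extender's filter endpoint.
type filterRequest struct {
	RequestedGPUMemory int            `json:"requestedGPUMemory"` // assumed unit: GiB
	FreeGPUMemory      map[string]int `json:"freeGPUMemory"`      // node name -> largest free amount on one GPU
}

// filterResult is a hypothetical, simplified stand-in for the extender's reply.
type filterResult struct {
	NodeNames   []string          `json:"nodeNames"`
	FailedNodes map[string]string `json:"failedNodes"`
}

// filterNodes keeps the nodes that can hold the request on a single GPU
// and records a reason for every node that cannot.
func filterNodes(req filterRequest) filterResult {
	res := filterResult{NodeNames: []string{}, FailedNodes: map[string]string{}}
	for name, free := range req.FreeGPUMemory {
		if free >= req.RequestedGPUMemory {
			res.NodeNames = append(res.NodeNames, name)
		} else {
			res.FailedNodes[name] = "insufficient GPU memory on a single device"
		}
	}
	sort.Strings(res.NodeNames) // map iteration order is random; sort for stable output
	return res
}

// filterHandler decodes the request, applies filterNodes, and writes JSON back.
func filterHandler(w http.ResponseWriter, r *http.Request) {
	var req filterRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(filterNodes(req))
}

func main() {
	// An in-process test server plays the role of the deployed extender.
	srv := httptest.NewServer(http.HandlerFunc(filterHandler))
	defer srv.Close()

	body, _ := json.Marshal(filterRequest{
		RequestedGPUMemory: 6,
		FreeGPUMemory:      map[string]int{"node-a": 4, "node-b": 8},
	})
	resp, err := http.Post(srv.URL, "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var res filterResult
	if err := json.NewDecoder(resp.Body).Decode(&res); err != nil {
		panic(err)
	}
	fmt.Println("schedulable:", res.NodeNames)
	fmt.Println("rejected:", res.FailedNodes)
}
```

In this sketch only node-b passes the filter, because node-a has no single GPU with 6 GiB free. Keeping the decision in a pure function (filterNodes) separate from the HTTP handler makes the filtering logic easy to test without a running scheduler.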

Prerequisites

Kubernetes 1.11+
golang 1.10+
NVIDIA drivers ~= 361.93
Nvidia-docker version > 2.0 (see how to install it and its prerequisites)
Docker configured with Nvidia as the default runtime.

Design

For more details about the design of this project, please read this Design document.

Setup

You can follow this Installation Guide. If you are using Alibaba Cloud Kubernetes, please follow this doc to install with Helm Charts.

User Guide

You can check this User Guide.

Developing

Scheduler Extender

git clone https://github.com/CERIT-SC/nvidia-scheduler-extender.git && cd nvidia-scheduler-extender
docker build -t cheyang/gpushare-scheduler-extender .

Device Plugin

git clone https://github.com/CERIT-SC/nvidia-device-plugin.git && cd nvidia-device-plugin
docker build -t cheyang/gpushare-device-plugin .

Kubectl Extension

Requires golang > 1.10

mkdir -p $GOPATH/src/github.com/CERIT-SC
cd $GOPATH/src/github.com/CERIT-SC
git clone https://github.com/CERIT-SC/nvidia-device-plugin.git
cd nvidia-device-plugin
go build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/*.go

Demo

- Demo 1: Deploy multiple GPU-sharing pods and schedule them onto the same GPU device using a binpack strategy

- Demo 2: Reject GPU memory requests that fit at the node level but not on any single GPU device
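The two demos can be pictured with a small sketch. It is an illustration under stated assumptions, not the project's actual code: the function names pickDevice and fitsNodeLevel, the MiB unit, and the tie-breaking rule are all hypothetical. pickDevice applies a binpack choice (the fitting device with the least free memory), while fitsNodeLevel is the naive check that only compares the request against the node's total free GPU memory.

```go
package main

import "fmt"

// pickDevice returns the index of the GPU a request should be placed on,
// or -1 if no single device can hold it. freeMiB[i] is the free memory on
// device i (assumed unit: MiB). Binpack: among the devices that fit, prefer
// the one with the least free memory so that other devices stay empty.
func pickDevice(freeMiB []int, requestMiB int) int {
	best := -1
	for i, free := range freeMiB {
		if free < requestMiB {
			continue // this device alone cannot hold the request
		}
		if best == -1 || free < freeMiB[best] {
			best = i
		}
	}
	return best
}

// fitsNodeLevel is the naive check: it sums free memory across all devices
// and can therefore accept requests that no single GPU can serve.
func fitsNodeLevel(freeMiB []int, requestMiB int) bool {
	total := 0
	for _, free := range freeMiB {
		total += free
	}
	return total >= requestMiB
}

func main() {
	// Demo 2 situation: 8192 MiB free on the node, but only 4096 MiB per GPU.
	free := []int{4096, 4096}
	fmt.Println(fitsNodeLevel(free, 6144)) // true
	fmt.Println(pickDevice(free, 6144))    // -1

	// Demo 1 situation: the request goes to the fuller device that still fits.
	free = []int{8192, 3072}
	fmt.Println(pickDevice(free, 2048)) // 1
}
```

The first case shows why a node-level check alone is not enough: the sum of free memory covers the request, yet no individual device does, so the pod must be rejected for that node.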

gpushare device plugin

Roadmap

Integrate Nvidia MPS as an option for isolation
Automated deployment for Kubernetes clusters deployed by kubeadm
Scheduler Extender High Availability
Generic solution for GPU, RDMA, and other devices

Adopters

If you are interested in GPUShare and would like to share your experiences with others, you are warmly welcome to add your information to the ADOPTERS.md page. We will continuously discuss new requirements and feature designs with you in advance.

Acknowledgments

This GPU sharing solution is based on Nvidia Docker2, and their GPU sharing design is our reference. The Nvidia community is very supportive, and we are very grateful.

Directories

Path	Synopsis
cmd
pkg
cache
gpushare
routes
scheduler
utils
utils/signals
