nvidia_gpu_prometheus_exporter

v0.2.0
Published: Jan 25, 2025 License: Apache-2.0 Imports: 10 Imported by: 0

README

NVIDIA GPU Prometheus Exporter

This is a Prometheus exporter for NVIDIA GPU metrics. It uses the NVIDIA Go NVML bindings for the NVIDIA Management Library (NVML), a C-based API for monitoring NVIDIA GPU devices. Unlike some similar exporters, it does not call the nvidia-smi binary.

This Exporter is a fork of https://github.com/mindprince/nvidia_gpu_prometheus_exporter with the following main changes:

  • Added parsing of /run/gpustat/XX to obtain the job ID and the UID of the user running on the GPU. Slurm scripts that take advantage of this are available on the jobstats website.
  • Switched from the previous Go bindings to the NVIDIA Go NVML bindings.
  • Added support for MIG instance autodetection and stats.
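
The first change above can be illustrated with a minimal sketch. It assumes that each /run/gpustat/XX file holds a single line of the form "<jobid> <uid>"; that layout, and the names jobInfo and parseGPUStat, are assumptions made for illustration and are not taken from the exporter's source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// jobInfo holds the two values the README says are read per GPU.
// The type name is hypothetical.
type jobInfo struct {
	JobID uint64
	UID   uint64
}

// parseGPUStat parses content ASSUMED to look like "<jobid> <uid>".
// The real file layout may differ; check the Slurm scripts you deploy.
func parseGPUStat(content string) (jobInfo, error) {
	fields := strings.Fields(content)
	if len(fields) != 2 {
		return jobInfo{}, fmt.Errorf("expected 2 fields, got %d", len(fields))
	}
	jobID, err := strconv.ParseUint(fields[0], 10, 64)
	if err != nil {
		return jobInfo{}, fmt.Errorf("invalid job ID: %w", err)
	}
	uid, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return jobInfo{}, fmt.Errorf("invalid UID: %w", err)
	}
	return jobInfo{JobID: jobID, UID: uid}, nil
}

func main() {
	// Sample content standing in for a file such as /run/gpustat/XX.
	info, err := parseGPUStat("123456 1001\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("jobid=%d uid=%d\n", info.JobID, info.UID)
}
```

In a real setup the string would come from reading the per-GPU file, and a missing file would simply mean that no job is recorded for that GPU.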

Building

For example:

go build

Running

The exporter requires the following:

  • Access to the NVML library (libnvidia-ml.so.1).
  • Access to the GPU devices.

To make sure that the exporter can access the NVML library, either add its directory to the search path for shared libraries or set LD_LIBRARY_PATH to point to its location.

By default, the metrics are exposed on localhost:9445/metrics. The listen address, including the port, can be changed with the -web.listen-address flag.
