Directories ¶
Path | Synopsis |
---|---|
api | |
client | |
v1 | Package v1 provides the gpud v1 client for the server. |
cmd | |
 | Package components defines the common interfaces for the components. |
accelerator | Package accelerator contains the accelerator components and its query interface. |
accelerator/nvidia | Package nvidia contains the NVIDIA accelerator components and its query interface. |
accelerator/nvidia/bad-envs | Package badenvs tracks any bad environment variables that are globally set for the NVIDIA GPUs. |
accelerator/nvidia/bad-envs/id | Package id defines the ID for the bad-envs check. |
accelerator/nvidia/clock | Package clock monitors NVIDIA GPU clock events of all GPUs, such as HW Slowdown events |
accelerator/nvidia/clock-speed | Package clockspeed tracks the NVIDIA per-GPU clock speed. |
accelerator/nvidia/ecc | Package ecc tracks the NVIDIA per-GPU ECC errors and other ECC related information. |
accelerator/nvidia/error | Package error implements NVIDIA GPU driver error detector. |
accelerator/nvidia/error-xid-sxid | Package errorxidsxid implements NVIDIA GPU driver Xid/SXid error detector. |
accelerator/nvidia/error-xid-sxid/id | Package id is the identifier for the nvidia error xid sxid component. |
accelerator/nvidia/error/sxid | Package sxid tracks the NVIDIA GPU SXid errors scanning the dmesg. |
accelerator/nvidia/error/sxid/id | Package id provides the nvidia error sxid id component. |
accelerator/nvidia/error/xid | Package xid tracks the NVIDIA GPU Xid errors scanning the dmesg and using the NVIDIA Management Library (NVML). |
accelerator/nvidia/error/xid/id | Package id provides the nvidia error xid id component. |
accelerator/nvidia/fabric-manager | Package fabricmanager tracks the NVIDIA fabric manager version and its activeness. |
accelerator/nvidia/gpm | Package gpm tracks the NVIDIA per-GPU GPM metrics. |
accelerator/nvidia/gsp-firmware-mode | Package gspfirmwaremode tracks the NVIDIA GSP firmware mode. |
accelerator/nvidia/gsp-firmware-mode/id | Package id defines the GSP firmware component ID. |
accelerator/nvidia/infiniband | Package infiniband monitors the infiniband status of the system. |
accelerator/nvidia/infiniband/id | Package id provides the ID for the NVIDIA InfiniBand component. |
accelerator/nvidia/info | Package info provides relatively static information about the NVIDIA accelerator (e.g., GPU product names). |
accelerator/nvidia/memory | Package memory tracks the NVIDIA per-GPU memory usage. |
accelerator/nvidia/nccl | Package nccl monitors the NCCL status. |
accelerator/nvidia/nvlink | Package nvlink monitors the NVIDIA per-GPU nvlink devices. |
accelerator/nvidia/peermem | Package peermem monitors the peermem module status. |
accelerator/nvidia/persistence-mode | Package persistencemode tracks the NVIDIA persistence mode. |
accelerator/nvidia/persistence-mode/id | Package id defines the persistence mode component ID. |
accelerator/nvidia/power | Package power tracks the NVIDIA per-GPU power usage. |
accelerator/nvidia/processes | Package processes tracks the NVIDIA per-GPU processes. |
accelerator/nvidia/query | Package query implements "nvidia-smi --query" output helpers. |
accelerator/nvidia/query/fabric-manager-log | Package fabricmanagerlog implements the fabric manager log poller. |
accelerator/nvidia/query/infiniband | Package infiniband provides utilities to query infiniband status. |
accelerator/nvidia/query/metrics/clock | Package clock provides the NVIDIA clock metrics collection and reporting. |
accelerator/nvidia/query/metrics/clock-speed | Package clockspeed provides the NVIDIA clock speed metrics collection and reporting. |
accelerator/nvidia/query/metrics/ecc | Package ecc provides the NVIDIA ECC metrics collection and reporting. |
accelerator/nvidia/query/metrics/gpm | Package gpm provides the NVIDIA GPM metrics collection and reporting. |
accelerator/nvidia/query/metrics/memory | Package memory provides the NVIDIA memory metrics collection and reporting. |
accelerator/nvidia/query/metrics/nvlink | Package nvlink provides the NVIDIA nvlink metrics collection and reporting. |
accelerator/nvidia/query/metrics/power | Package power provides the NVIDIA power usage metrics collection and reporting. |
accelerator/nvidia/query/metrics/processes | Package processes provides the NVIDIA processes metrics collection and reporting. |
accelerator/nvidia/query/metrics/remapped-rows | Package remappedrows provides the NVIDIA row remapping metrics collection and reporting. |
accelerator/nvidia/query/metrics/temperature | Package temperature provides the NVIDIA temperature metrics collection and reporting. |
accelerator/nvidia/query/metrics/utilization | Package utilization provides the NVIDIA GPU utilization metrics collection and reporting. |
accelerator/nvidia/query/nccl | Package nccl contains the implementation of the NCCL (NVIDIA Collective Communications Library) query for NVIDIA GPUs. |
accelerator/nvidia/query/nvml | Package nvml implements the NVIDIA Management Library (NVML) interface. |
accelerator/nvidia/query/peermem | Package peermem contains the implementation of the peermem query for NVIDIA GPUs. |
accelerator/nvidia/query/sxid | Package sxid provides the NVIDIA SXID error details. |
accelerator/nvidia/query/xid | Package xid provides the NVIDIA XID error details. |
accelerator/nvidia/query/xid-sxid-state | Package xidsxidstate provides the persistent storage layer for the nvidia query results. |
accelerator/nvidia/remapped-rows | Package remappedrows tracks the NVIDIA per-GPU remapped rows. |
accelerator/nvidia/temperature | Package temperature tracks the NVIDIA per-GPU temperatures. |
accelerator/nvidia/utilization | Package utilization tracks the NVIDIA per-GPU utilization. |
common | Package common contains common types and functions used across multiple components. |
containerd | Package containerd contains the containerd components and its query interface. |
containerd/pod | Package pod tracks the current pods from the containerd CRI. |
cpu | Package cpu tracks the combined usage of all CPUs (not per-CPU). |
cpu/metrics | Package metrics implements the CPU metrics collection and reporting. |
diagnose | Package diagnose provides a way to diagnose the system and components. |
disk | Package disk tracks the disk usage of all the mount points specified in the configuration. |
disk/metrics | Package metrics implements the disk metrics collection and reporting. |
dmesg | Package dmesg scans and watches dmesg outputs for errors, as specified in the configuration (e.g., regex match NVIDIA GPU errors). |
docker | Package docker contains the docker components and its query interface. |
docker/container | Package container tracks the current containers from the docker runtime. |
fd | Package fd tracks the number of file descriptors used on the host. |
fd/metrics | Package metrics implements the file descriptor metrics collection and reporting. |
file | Package file provides a component that returns healthy if and only if all the specified files exist. |
file/id | Package id defines the component ID for the file component. |
info | Package info provides static information about the host (e.g., labels, IDs). |
k8s/pod | Package pod tracks the current pods from the kubelet read-only port. |
kernel-module | Package kernelmodule provides a component that checks the kernel modules in Linux. |
kernel-module/id | Package id defines the component ID for the kernel module component. |
library | Package library provides a component that returns healthy if and only if all the specified libraries exist. |
memory | Package memory tracks the memory usage of the host. |
memory/metrics | Package metrics implements the memory metrics collection and reporting. |
metrics | Package metrics implements metrics collection and reporting. |
metrics/state | Package state provides the persistent storage layer for the metrics. |
network/latency | Package latency tracks the global network connectivity statistics. |
network/latency/metrics | Package metrics implements the network latency metrics collection and reporting. |
os | Package os queries the host OS information (e.g., kernel version). |
power-supply | Package powersupply tracks the power supply/usage on the host. |
query | Package query provides the query/poller implementation. |
query/config | Package config provides the query/poller configuration. |
query/log | Package log provides the log file/output poller implementation. |
query/log/common | Package common provides the common log components. |
query/log/config | Package config provides the log poller configuration. |
query/log/state | Package state provides the persistent storage layer for the log poller. |
query/log/tail | Package tail implements the log file/output tail-ing operations. |
state | Package state provides the persistent storage layer for component states. |
systemd | Package systemd tracks the systemd state and unit files. |
tailscale | Package tailscale tracks the tailscale state (e.g., version) if available. |
 | Package config provides the gpud configuration data for the server. |
docs | |
apis | Package apis Code generated by swaggo/swag. |
 | Package errdefs provides common error definitions for gpud. |
internal | |
 | Package log provides the logging functionality for gpud. |
 | Package pkg contains a set of generic Go packages that are useful to gpud and possibly to other projects. |
aws/eks | Package eks implements EKS utils. |
dmesg | Package dmesg provides the functionality to poll the dmesg log. |
file | Package file implements file utils. |
host | Package host provides the host information. |
latency | Package latency contains logic for egress traffic from each device. |
latency/edge | Package edge provides a client for the Tailscale DERP (Designated Edge Router Protocol) service. |
latency/edge/derpmap | Package derpmap provides the tailscale derp map implementation. |
latency/edge/derpmap/sync | "sync" syncs the tailscale derp map. |
process | Package process provides the process runner implementation on the host. |
reboot | Package reboot provides a function to reboot the system. |
sqlite | Package sqlite provides a SQLite3 database utils. |
systemd | Package systemd provides the common systemd helper functions. |
 | Package rootkeys provides the root keys for the server. |
 | Package systemd provides the systemd artifacts and variables for the gpud server. |
third_party | |
tailscale/distsign | Package distsign implements signature and validation of arbitrary distributable files. |
 | Package update provides the update functionality for the server. |
 | Package version provides the version information for the gpud server. |