Documentation
Overview
Package components defines the common interfaces for the components.
Index
- Constants
- func GetAllComponents() map[string]Component
- func IsComponentRegistered(name string) bool
- func RegisterComponent(name string, comp Component) error
- type Component
- type Event
- type Info
- type Metric
- type OutputProvider
- type PromRegisterer
- type SettableComponent
- type State
- type WatchableComponent
Constants
const (
	StateHealthy      = "Healthy"
	StateUnhealthy    = "Unhealthy"
	StateInitializing = "Initializing"
	StateDegraded     = "Degraded"
)
Variables
This section is empty.
Functions
func GetAllComponents() map[string]Component
func IsComponentRegistered(name string) bool
Added in v0.1.5.
func RegisterComponent(name string, comp Component) error
Types
type Component
type Component interface {
	// Name defines the component name, which is also used for the
	// HTTP handler registration path. Must be globally unique.
	Name() string

	// Start is called upon server start.
	// It implements the component-specific poller start logic.
	Start() error

	// States returns the current states of the component.
	States(ctx context.Context) ([]State, error)

	// Events returns all the events from "since".
	Events(ctx context.Context, since time.Time) ([]Event, error)

	// Metrics returns all the metrics from the component, starting from "since".
	Metrics(ctx context.Context, since time.Time) ([]Metric, error)

	// Close is called upon server close.
	// It implements the component-specific poller cleanup logic.
	Close() error
}
Component represents an individual component of the system.
Each component check is independent of the others. However, the underlying implementations may share the same data sources to minimize querying overhead (e.g., nvidia-smi calls).
Each component implements its own output format inside the State struct. Components should use a consistent name for their HTTP handler and define const keys for the State ExtraInfo field.
func GetComponent
type Event
type Event struct {
	Time             metav1.Time              `json:"time"`
	Name             string                   `json:"name,omitempty"`
	Type             common.EventType         `json:"type,omitempty"`
	Message          string                   `json:"message,omitempty"`    // detailed message of the event
	ExtraInfo        map[string]string        `json:"extra_info,omitempty"` // any extra information the component may want to expose
	SuggestedActions *common.SuggestedActions `json:"suggested_actions,omitempty"`
}
type Metric
type Metric struct {
	components_metrics_state.Metric

	ExtraInfo map[string]string `json:"extra_info,omitempty"` // any extra information the component may want to expose
}
type OutputProvider
OutputProvider defines an optional component interface that returns the underlying output data.
type PromRegisterer
type PromRegisterer interface {
	RegisterCollectors(reg *prometheus.Registry, dbRW *sql.DB, dbRO *sql.DB, tableName string) error
}
PromRegisterer defines an optional component interface that supports Prometheus metrics.
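Optional interfaces like these are commonly detected with a type assertion, so only components that opt in are asked for the extra behavior. The docs do not show `OutputProvider`'s methods, so the `Output` method below is a hypothetical signature. A caller could presumably check for `PromRegisterer` the same way before calling `RegisterCollectors`. This sketch does not exercise that path because it needs the external Prometheus package.

```go
package main

import "fmt"

// Component is trimmed to Name for brevity.
type Component interface {
	Name() string
}

// OutputProvider with a hypothetical Output method; the real method set
// is not shown in these docs.
type OutputProvider interface {
	Output() (any, error)
}

// plain implements only Component.
type plain struct{}

func (plain) Name() string { return "plain" }

// withOutput also opts in to the optional interface.
type withOutput struct{}

func (withOutput) Name() string         { return "with-output" }
func (withOutput) Output() (any, error) { return map[string]int{"gpus": 8}, nil }

// describe checks at run time whether a component implements OutputProvider.
func describe(c Component) string {
	if op, ok := c.(OutputProvider); ok {
		out, err := op.Output()
		if err != nil {
			return c.Name() + ": output error: " + err.Error()
		}
		return fmt.Sprintf("%s: output %v", c.Name(), out)
	}
	return c.Name() + ": no output provider"
}

func main() {
	fmt.Println(describe(plain{}))      // plain: no output provider
	fmt.Println(describe(withOutput{})) // with-output: output map[gpus:8]
}
```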
type SettableComponent
type State
type State struct {
	Name             string                   `json:"name,omitempty"`
	Healthy          bool                     `json:"healthy,omitempty"`
	Health           string                   `json:"health,omitempty"`     // Healthy, Degraded, Unhealthy
	Reason           string                   `json:"reason,omitempty"`     // a detailed and processed reason on why the component is not healthy
	Error            string                   `json:"error,omitempty"`      // the unprocessed error returned from the component
	ExtraInfo        map[string]string        `json:"extra_info,omitempty"` // any extra information the component may want to expose
	SuggestedActions *common.SuggestedActions `json:"suggested_actions,omitempty"`
}
type WatchableComponent
type WatchableComponent interface {
	Component
}
WatchableComponent wraps a component with a watchable interface. It is useful for intercepting calls to the component's States method in order to track metrics.
Directories
| Path | Synopsis |
|---|---|
|  | Package accelerator contains the accelerator components and its query interface. |
| nvidia | Package nvidia contains the NVIDIA accelerator components and its query interface. |
| nvidia/bad-envs | Package badenvs tracks any bad environment variables that are globally set for the NVIDIA GPUs. |
| nvidia/bad-envs/id | Package id defines the ID for the bad-envs check. |
| nvidia/clock-speed | Package clockspeed tracks the NVIDIA per-GPU clock speed. |
| nvidia/clock-speed/id | Package id contains the ID for the clock-speed component. |
| nvidia/ecc | Package ecc tracks the NVIDIA per-GPU ECC errors and other ECC related information. |
| nvidia/ecc/id | Package id contains the ID for the ecc component. |
| nvidia/error | Package error implements NVIDIA GPU driver error detector. |
| nvidia/error/sxid | Package sxid tracks the NVIDIA GPU SXid errors scanning the dmesg. |
| nvidia/error/sxid/id | Package id provides the nvidia error sxid id component. |
| nvidia/error/xid | Package xid tracks the NVIDIA GPU Xid errors scanning the dmesg and using the NVIDIA Management Library (NVML). |
| nvidia/error/xid/id | Package id provides the nvidia error xid id component. |
| nvidia/fabric-manager | Package fabricmanager tracks the NVIDIA fabric manager version and its activeness. |
| nvidia/gpm | Package gpm tracks the NVIDIA per-GPU GPM metrics. |
| nvidia/gsp-firmware-mode | Package gspfirmwaremode tracks the NVIDIA GSP firmware mode. |
| nvidia/gsp-firmware-mode/id | Package id defines the GSP firmware component ID. |
| nvidia/hw-slowdown | Package hwslowdown monitors NVIDIA GPU hardware clock events of all GPUs, such as HW Slowdown events. |
| nvidia/hw-slowdown/id | Package id provides the ID for the hardware slowdown component. |
| nvidia/infiniband | Package infiniband monitors the infiniband status of the system. |
| nvidia/infiniband/id | Package id provides the ID for the NVIDIA InfiniBand component. |
| nvidia/info | Package info provides relatively static information about the NVIDIA accelerator (e.g., GPU product names). |
| nvidia/memory | Package memory tracks the NVIDIA per-GPU memory usage. |
| nvidia/nccl | Package nccl monitors the NCCL status. |
| nvidia/nvlink | Package nvlink monitors the NVIDIA per-GPU nvlink devices. |
| nvidia/peermem | Package peermem monitors the peermem module status. |
| nvidia/persistence-mode | Package persistencemode tracks the NVIDIA persistence mode. |
| nvidia/persistence-mode/id | Package id defines the persistence mode component ID. |
| nvidia/power | Package power tracks the NVIDIA per-GPU power usage. |
| nvidia/power/id | Package id defines the power component ID. |
| nvidia/processes | Package processes tracks the NVIDIA per-GPU processes. |
| nvidia/remapped-rows | Package remappedrows tracks the NVIDIA per-GPU remapped rows. |
| nvidia/temperature | Package temperature tracks the NVIDIA per-GPU temperatures. |
| nvidia/utilization | Package utilization tracks the NVIDIA per-GPU utilization. |
|  | Package containerd contains the containerd components and its query interface. |
| pod | Package pod tracks the current pods from the containerd CRI. |
| pod/id | Package id represents the containerd pod ID. |
|  | Package cpu tracks the combined usage of all CPUs (not per-CPU). |
| id | Package id represents the CPU component ID. |
| metrics | Package metrics implements the CPU metrics collection and reporting. |
|  | Package disk tracks the disk usage of all the mount points specified in the configuration. |
| id | Package id represents the disk component ID. |
| metrics | Package metrics implements the disk metrics collection and reporting. |
|  | Package docker contains the docker components and its query interface. |
| container | Package container tracks the current containers from the docker runtime. |
| container/id | Package id represents the Docker container ID. |
|  | Package fd tracks the number of file descriptors used on the host. |
| id | Package id defines the component ID for the file descriptor component. |
| metrics | Package metrics implements the file descriptor metrics collection and reporting. |
|  | Package file provides a component that returns healthy if and only if all the specified files exist. |
| id | Package id defines the component ID for the file component. |
|  | Package fuse monitors the FUSE (Filesystem in Userspace). |
| id | Package id provides the ID of the FUSE connection component. |
| metrics | Package metrics implements the FUSE connections metrics collection and reporting. |
|  | Package info provides static information about the host (e.g., labels, IDs). |
| id | Package id contains the ID for the info component. |
|  | Package kernelmodule provides a component that checks the kernel modules in Linux. |
| id | Package id defines the component ID for the kernel module component. |
| kubelet |  |
| pod | Package pod tracks the current pods from the kubelet read-only port. |
| pod/id | Package id represents the kubernetes pod ID. |
|  | Package library provides a component that returns healthy if and only if all the specified libraries exist. |
| id | Package id defines the library component ID. |
|  | Package memory tracks the memory usage of the host. |
| id | Package id provides the ID of the memory component. |
| metrics | Package metrics implements the memory metrics collection and reporting. |
| network |  |
| latency | Package latency tracks the global network connectivity statistics. |
| latency/id | Package id represents the network latency ID. |
| latency/metrics | Package metrics implements the network latency metrics collection and reporting. |
|  | Package os queries the host OS information (e.g., kernel version). |
| id | Package id represents the OS ID. |
|  | Package pci tracks the PCI devices and their Access Control Services (ACS) status. |
| id | Package id implements the PCI ID component. |
|  | Package powersupply tracks the power supply/usage on the host. |
| id | Package id defines the power supply component ID. |
|  | Package systemd tracks the systemd state and unit files. |
| id | Package id defines the systemd component ID. |
|  | Package tailscale tracks the tailscale state (e.g., version) if available. |
| id | Package id defines the tailscale component ID. |