components

package
v0.1.6
Published: Nov 13, 2024 | License: Apache-2.0 | Imports: 10 | Imported by: 0

Documentation

Overview

Package components defines the common interfaces for the components.

Constants

const (
	EventTypeMetric = "metric"
	EventTypeInfo   = "info"
	EventTypeWarn   = "warn"
	EventTypeError  = "error"
)

Variables

This section is empty.

Functions

func GetAllComponents

func GetAllComponents() map[string]Component

func IsComponentRegistered (added in v0.1.5)

func IsComponentRegistered(name string) bool

func RegisterComponent

func RegisterComponent(name string, comp Component) error

Types

type Component

type Component interface {
	// Returns the component name, which is also used as
	// the HTTP handler registration path.
	// Must be globally unique.
	Name() string

	// Returns the current states of the component.
	States(ctx context.Context) ([]State, error)

	// Returns all the events from "since".
	Events(ctx context.Context, since time.Time) ([]Event, error)

	// Returns all the metrics from the component.
	Metrics(ctx context.Context, since time.Time) ([]Metric, error)

	// Called upon server close.
	// Implements component-specific poller cleanup logic.
	Close() error
}

Component represents an individual component of the system.

Component checks are independent of one another, but the underlying implementations may share data sources to minimize querying overhead (e.g., nvidia-smi calls).

Each component implements its own output format inside the State struct. It is recommended that a component use a consistent name for its HTTP handler and define const keys for the State extra information field.

func GetComponent

func GetComponent(name string) (Component, error)
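The four registry functions (RegisterComponent, IsComponentRegistered, GetComponent, GetAllComponents) are listed without documentation, so their exact semantics are not stated here. The sketch below shows one plausible shape for such a registry using the same signatures. It is a local stand-in, not the package's implementation, and it assumes that duplicate registrations and lookups of unknown names return errors and that GetAllComponents returns a copy of the map.

```go
package main

import (
	"fmt"
	"sync"
)

// Component is trimmed to Name() for this sketch.
type Component interface {
	Name() string
}

var (
	mu       sync.RWMutex
	registry = map[string]Component{}
)

// RegisterComponent stores comp under name.
// Assumption: registering the same name twice is an error.
func RegisterComponent(name string, comp Component) error {
	mu.Lock()
	defer mu.Unlock()
	if _, ok := registry[name]; ok {
		return fmt.Errorf("component %q already registered", name)
	}
	registry[name] = comp
	return nil
}

// IsComponentRegistered reports whether name is present in the registry.
func IsComponentRegistered(name string) bool {
	mu.RLock()
	defer mu.RUnlock()
	_, ok := registry[name]
	return ok
}

// GetComponent looks up a component by name.
// Assumption: an unknown name yields an error rather than a nil component.
func GetComponent(name string) (Component, error) {
	mu.RLock()
	defer mu.RUnlock()
	c, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("component %q not found", name)
	}
	return c, nil
}

// GetAllComponents returns a copy so callers cannot mutate the registry.
func GetAllComponents() map[string]Component {
	mu.RLock()
	defer mu.RUnlock()
	out := make(map[string]Component, len(registry))
	for k, v := range registry {
		out[k] = v
	}
	return out
}

// named is a hypothetical component whose name is its own value.
type named string

func (n named) Name() string { return string(n) }

func main() {
	if err := RegisterComponent("disk", named("disk")); err != nil {
		fmt.Println("register failed:", err)
		return
	}
	fmt.Println(IsComponentRegistered("disk"), len(GetAllComponents()))
}
```

Because Name must be globally unique, rejecting a duplicate at registration time surfaces naming conflicts before two components compete for the same HTTP handler path.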

type Event

type Event struct {
	Time      metav1.Time       `json:"time"`
	Name      string            `json:"name,omitempty"`
	Type      string            `json:"type,omitempty"`       // optional: ErrCritical, ErrWarning, Info, Resolution, ...
	Message   string            `json:"message,omitempty"`    // detailed message of the event
	ExtraInfo map[string]string `json:"extra_info,omitempty"` // any extra information the component may want to expose

	SuggestedActions *common.SuggestedActions `json:"suggested_actions,omitempty"`
}

type Info

type Info struct {
	States  []State  `json:"states"`
	Events  []Event  `json:"events"`
	Metrics []Metric `json:"metrics"`
}

type Metric

type Metric struct {
	components_metrics_state.Metric
	ExtraInfo map[string]string `json:"extra_info,omitempty"` // any extra information the component may want to expose
}

type OutputProvider

type OutputProvider interface {
	Output() (any, error)
}

OutputProvider is an optional interface that a component may implement to return its underlying output data.
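Because the interface is optional, a caller would typically detect it with a type assertion. The sketch below illustrates that pattern with a trimmed Component interface and two hypothetical components. The helper rawOutput is not part of this package.

```go
package main

import "fmt"

// Component is trimmed to Name() for this sketch.
type Component interface {
	Name() string
}

// OutputProvider matches the optional interface documented above.
type OutputProvider interface {
	Output() (any, error)
}

// plain is a hypothetical component without the optional interface.
type plain struct{}

func (plain) Name() string { return "plain" }

// withOutput is a hypothetical component that also implements OutputProvider.
type withOutput struct{}

func (withOutput) Name() string { return "with-output" }

func (withOutput) Output() (any, error) {
	return map[string]int{"gpus": 8}, nil
}

// rawOutput returns the component's underlying output when it implements
// OutputProvider. The boolean reports whether the interface was implemented.
func rawOutput(c Component) (any, bool, error) {
	p, ok := c.(OutputProvider)
	if !ok {
		return nil, false, nil
	}
	out, err := p.Output()
	return out, true, err
}

func main() {
	for _, c := range []Component{plain{}, withOutput{}} {
		out, ok, err := rawOutput(c)
		fmt.Println(c.Name(), ok, out, err)
	}
}
```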

type PromRegisterer

type PromRegisterer interface {
	RegisterCollectors(reg *prometheus.Registry, db *sql.DB, tableName string) error
}

PromRegisterer is an optional interface for components that support Prometheus metrics.

type SettableComponent

type SettableComponent interface {
	SetStates(ctx context.Context, states ...State) error
	SetEvents(ctx context.Context, events ...Event) error
}

type State

type State struct {
	Name      string            `json:"name,omitempty"`
	Healthy   bool              `json:"healthy,omitempty"`
	Reason    string            `json:"reason,omitempty"`     // a detailed and processed reason on why the component is not healthy
	Error     string            `json:"error,omitempty"`      // the unprocessed error returned from the component
	ExtraInfo map[string]string `json:"extra_info,omitempty"` // any extra information the component may want to expose

	SuggestedActions *common.SuggestedActions `json:"suggested_actions,omitempty"`
}
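One consequence of the struct tags above is worth spelling out: since Healthy is tagged omitempty, encoding/json drops the field whenever it is false, so an unhealthy state serializes without a "healthy" key. The sketch below demonstrates this with a local copy of State that omits SuggestedActions to stay dependency-free. The encode helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// State is a local copy of the struct above without SuggestedActions.
type State struct {
	Name      string            `json:"name,omitempty"`
	Healthy   bool              `json:"healthy,omitempty"`
	Reason    string            `json:"reason,omitempty"`
	Error     string            `json:"error,omitempty"`
	ExtraInfo map[string]string `json:"extra_info,omitempty"`
}

// encode marshals a State to its JSON string form.
func encode(s State) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	healthy, _ := encode(State{Name: "disk", Healthy: true})
	fmt.Println(healthy) // {"name":"disk","healthy":true}

	unhealthy, _ := encode(State{Name: "disk", Reason: "mount point missing"})
	fmt.Println(unhealthy) // {"name":"disk","reason":"mount point missing"}
}
```

A consumer of this JSON should therefore treat a missing "healthy" key as false rather than as unknown.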

type WatchableComponent

type WatchableComponent interface {
	Component
}

WatchableComponent wraps a component with a watchable interface. It is useful for intercepting calls to the component's States method in order to track metrics.

Directories

Path Synopsis
Package accelerator contains the accelerator components and its query interface.
nvidia
Package nvidia contains the NVIDIA accelerator components and its query interface.
nvidia/bad-envs
Package badenvs tracks any bad environment variables that are globally set for the NVIDIA GPUs.
nvidia/bad-envs/id
Package id defines the ID for the bad-envs check.
nvidia/clock
Package clock monitors NVIDIA GPU clock events of all GPUs, such as HW Slowdown events
nvidia/clock-speed
Package clockspeed tracks the NVIDIA per-GPU clock speed.
nvidia/ecc
Package ecc tracks the NVIDIA per-GPU ECC errors and other ECC related information.
nvidia/error
Package error implements NVIDIA GPU driver error detector.
nvidia/error/sxid
Package sxid tracks the NVIDIA GPU SXid errors scanning the dmesg.
nvidia/error/sxid/id
Package id provides the nvidia error sxid id component.
nvidia/error/xid
Package xid tracks the NVIDIA GPU Xid errors scanning the dmesg and using the NVIDIA Management Library (NVML).
nvidia/error/xid/id
Package id provides the nvidia error xid id component.
nvidia/fabric-manager
Package fabricmanager tracks the NVIDIA fabric manager version and its activeness.
nvidia/gpm
Package gpm tracks the NVIDIA per-GPU GPM metrics.
nvidia/gsp-firmware-mode
Package gspfirmwaremode tracks the NVIDIA GSP firmware mode.
nvidia/gsp-firmware-mode/id
Package id defines the GSP firmware component ID.
nvidia/infiniband
Package infiniband monitors the infiniband status of the system.
nvidia/info
Package info provides relatively static information about the NVIDIA accelerator (e.g., GPU product names).
nvidia/memory
Package memory tracks the NVIDIA per-GPU memory usage.
nvidia/nccl
Package nccl monitors the NCCL status.
nvidia/nvlink
Package nvlink monitors the NVIDIA per-GPU nvlink devices.
nvidia/peermem
Package peermem monitors the peermem module status.
nvidia/persistence-mode
Package persistencemode tracks the NVIDIA persistence mode.
nvidia/persistence-mode/id
Package id defines the persistence mode component ID.
nvidia/power
Package power tracks the NVIDIA per-GPU power usage.
nvidia/processes
Package processes tracks the NVIDIA per-GPU processes.
nvidia/query
Package query implements "nvidia-smi --query" output helpers.
nvidia/query/fabric-manager-log
Package fabricmanagerlog implements the fabric manager log poller.
nvidia/query/metrics/clock
Package clock provides the NVIDIA clock metrics collection and reporting.
nvidia/query/metrics/clock-speed
Package clockspeed provides the NVIDIA clock speed metrics collection and reporting.
nvidia/query/metrics/ecc
Package ecc provides the NVIDIA ECC metrics collection and reporting.
nvidia/query/metrics/gpm
Package gpm provides the NVIDIA GPM metrics collection and reporting.
nvidia/query/metrics/memory
Package memory provides the NVIDIA memory metrics collection and reporting.
nvidia/query/metrics/nvlink
Package nvlink provides the NVIDIA nvlink metrics collection and reporting.
nvidia/query/metrics/power
Package power provides the NVIDIA power usage metrics collection and reporting.
nvidia/query/metrics/processes
Package processes provides the NVIDIA processes metrics collection and reporting.
nvidia/query/metrics/remapped-rows
Package remappedrows provides the NVIDIA row remapping metrics collection and reporting.
nvidia/query/metrics/temperature
Package temperature provides the NVIDIA temperature metrics collection and reporting.
nvidia/query/metrics/utilization
Package utilization provides the NVIDIA GPU utilization metrics collection and reporting.
nvidia/query/nccl
Package nccl contains the implementation of the NCCL (NVIDIA Collective Communications Library) query for NVIDIA GPUs.
nvidia/query/nvml
Package nvml implements the NVIDIA Management Library (NVML) interface.
nvidia/query/peermem
Package peermem contains the implementation of the peermem query for NVIDIA GPUs.
nvidia/query/sxid
Package sxid provides the NVIDIA SXID error details.
nvidia/query/xid
Package xid provides the NVIDIA XID error details.
nvidia/remapped-rows
Package remappedrows tracks the NVIDIA per-GPU remapped rows.
nvidia/temperature
Package temperature tracks the NVIDIA per-GPU temperatures.
nvidia/utilization
Package utilization tracks the NVIDIA per-GPU utilization.
Package common contains common types and functions used across multiple components.
Package containerd contains the containerd components and its query interface.
pod
Package pod tracks the current pods from the containerd CRI.
cpu
Package cpu tracks the combined usage of all CPUs (not per-CPU).
metrics
Package metrics implements the CPU metrics collection and reporting.
Package diagnose provides a way to diagnose the system and components.
Package disk tracks the disk usage of all the mount points specified in the configuration.
metrics
Package metrics implements the disk metrics collection and reporting.
Package dmesg scans and watches dmesg outputs for errors, as specified in the configuration (e.g., regex match NVIDIA GPU errors).
Package docker contains the docker components and its query interface.
container
Package container tracks the current containers from the docker runtime.
fd
Package fd tracks the number of file descriptors used on the host.
metrics
Package metrics implements the file descriptor metrics collection and reporting.
Package file provides a component that returns healthy if and only if all the specified files exist.
Package info provides static information about the host (e.g., labels, IDs).
k8s
pod
Package pod tracks the current pods from the kubelet read-only port.
Package library provides a component that returns healthy if and only if all the specified libraries exist.
Package memory tracks the memory usage of the host.
metrics
Package metrics implements the memory metrics collection and reporting.
Package metrics implements metrics collection and reporting.
state
Package state provides the persistent storage layer for the metrics.
network
latency
Package latency tracks the global network connectivity statistics.
latency/metrics
Package metrics implements the network latency metrics collection and reporting.
Package os queries the host OS information (e.g., kernel version).
Package powersupply tracks the power supply/usage on the host.
Package query provides the query/poller implementation.
config
Package config provides the query/poller configuration.
log
Package log provides the log file/output poller implementation.
log/common
Package common provides the common log components.
log/config
Package config provides the log poller configuration.
log/state
Package state provides the persistent storage layer for the log poller.
log/tail
Package tail implements the log file/output tail-ing operations.
Package state provides the persistent storage layer for component states.
Package systemd tracks the systemd state and unit files.
Package tailscale tracks the tailscale state (e.g., version) if available.
