Documentation ¶
Overview ¶
Package collector implements the different collectors of the exporter.
Index ¶
- Constants
- Variables
- func DisableDefaultCollectors()
- func GetAMDGPUDevices(rocmSmiPath string, logger log.Logger) (map[int]Device, error)
- func GetGPUDevices(gpuType string, logger log.Logger) (map[int]Device, error)
- func GetNvidiaGPUDevices(nvidiaSmiPath string, logger log.Logger) (map[int]Device, error)
- func IsNoDataError(err error) bool
- func LoadCgroupsV2Metrics(name string, cgroupfsPath string, controllers []string) (map[string]float64, error)
- func RegisterCollector(collector string, isDefaultEnabled bool, ...)
- func SanitizeMetricName(metricName string) string
- type CEEMSCollector
- type CEEMSExporter
- type CgroupMetric
- type Collector
- func NewCPUCollector(logger log.Logger) (Collector, error)
- func NewEmissionsCollector(logger log.Logger) (Collector, error)
- func NewIPMICollector(logger log.Logger) (Collector, error)
- func NewMeminfoCollector(logger log.Logger) (Collector, error)
- func NewRaplCollector(logger log.Logger) (Collector, error)
- func NewSlurmCollector(logger log.Logger) (Collector, error)
- type Device
- type JobProps
Constants ¶
const CEEMSExporterAppName = "ceems_exporter"
CEEMSExporterAppName is the kingpin app name.
const Namespace = "ceems"
Namespace defines the common namespace to be used by all metrics.
Variables ¶
var CEEMSExporterApp = *kingpin.New(
	CEEMSExporterAppName,
	"Prometheus Exporter to export compute (job, VM, pod) resource usage metrics.",
)
CEEMSExporterApp is the kingpin CLI app.
var ErrNoData = errors.New("collector returned no data")
ErrNoData indicates the collector found no data to collect, but had no other error.
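A minimal sketch of how a sentinel error such as ErrNoData is typically checked by a caller. The names errNoData, isNoData and update are local stand-ins invented for this example, and the use of errors.Is is an assumption; the package's own IsNoDataError may use a plain comparison instead.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoData is a local copy of the documented sentinel, for illustration only.
var errNoData = errors.New("collector returned no data")

// isNoData is a hypothetical stand-in for IsNoDataError. It assumes the
// check is a sentinel comparison that also matches wrapped errors.
func isNoData(err error) bool {
	return errors.Is(err, errNoData)
}

// update is a hypothetical collector step that reports "no data"
// when there is nothing to collect.
func update(samples []float64) error {
	if len(samples) == 0 {
		return errNoData
	}
	return nil
}

func main() {
	err := update(nil)
	switch {
	case err == nil:
		fmt.Println("collected")
	case isNoData(err):
		// Not a failure: the collector simply had nothing to report.
		fmt.Println("no data, skipping")
	default:
		fmt.Println("failed:", err)
	}
}
```

Separating "no data" from real failures lets a caller log the former quietly while still surfacing genuine errors.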
Functions ¶
func DisableDefaultCollectors ¶
func DisableDefaultCollectors()
DisableDefaultCollectors sets the collector state to false for all collectors which have not been explicitly enabled on the command line.
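The behaviour described above can be sketched as follows. The data layout (collectorState, forcedCollectors), the helper names and the collector names are assumptions made for this example and are not taken from the package.

```go
package main

import "fmt"

// collectorState maps a collector name to its enabled flag (hypothetical layout).
var collectorState = map[string]*bool{}

// forcedCollectors records collectors whose flag was set explicitly
// on the command line (hypothetical layout).
var forcedCollectors = map[string]bool{}

// register adds a collector with its default state.
func register(name string, enabledByDefault bool) {
	enabled := enabledByDefault
	collectorState[name] = &enabled
}

// disableDefaults turns off every collector that was not explicitly enabled.
func disableDefaults() {
	for name, enabled := range collectorState {
		if !forcedCollectors[name] {
			*enabled = false
		}
	}
}

func main() {
	// Collector names below are illustrative only.
	register("cpu", true)
	register("rapl", true)
	register("ipmi", false)

	forcedCollectors["rapl"] = true // simulate an explicit command-line flag
	disableDefaults()

	for _, name := range []string{"cpu", "rapl", "ipmi"} {
		fmt.Printf("%s enabled=%t\n", name, *collectorState[name])
	}
}
```

With this layout, only collectors named explicitly on the command line stay enabled, which is useful when a single collector should be scraped in isolation.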
func GetAMDGPUDevices ¶
GetAMDGPUDevices returns all GPU devices using the rocm-smi command.
Example output:
	bash-4.4$ rocm-smi --showproductname --showserial --csv
	device,Serial Number,Card series,Card model,Card vendor,Card SKU
	card0,20170000800c,deon Instinct MI50 32GB,0x0834,Advanced Micro Devices Inc. AMD/ATI,D16317
	card1,20170003580c,deon Instinct MI50 32GB,0x0834,Advanced Micro Devices Inc. AMD/ATI,D16317
	card2,20180003050c,deon Instinct MI50 32GB,0x0834,Advanced Micro Devices Inc. AMD/ATI,D16317
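As an illustration of how CSV output of this shape could be turned into a map keyed by device index, here is a small sketch. The amdDevice type and the parseRocmSmiCSV helper are hypothetical; the fields of the package's own Device type are not shown in this documentation, and running the rocm-smi binary itself is left out.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// amdDevice is a hypothetical stand-in for the package's Device type.
type amdDevice struct {
	Serial string
	Series string
}

// parseRocmSmiCSV parses CSV text shaped like the example output and
// indexes devices by the number that follows the "card" prefix.
func parseRocmSmiCSV(out string) (map[int]amdDevice, error) {
	r := csv.NewReader(strings.NewReader(out))
	r.FieldsPerRecord = -1 // tolerate rows with a different column count
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	devices := make(map[int]amdDevice)
	for _, rec := range records {
		// Skip the header row and anything that is not a device row.
		if len(rec) < 3 || !strings.HasPrefix(rec[0], "card") {
			continue
		}
		idx, err := strconv.Atoi(strings.TrimPrefix(rec[0], "card"))
		if err != nil {
			return nil, fmt.Errorf("unexpected device name %q: %w", rec[0], err)
		}
		devices[idx] = amdDevice{Serial: rec[1], Series: rec[2]}
	}
	return devices, nil
}

// sample reproduces the first rows of the example output.
const sample = `device,Serial Number,Card series,Card model,Card vendor,Card SKU
card0,20170000800c,deon Instinct MI50 32GB,0x0834,Advanced Micro Devices Inc. AMD/ATI,D16317
card1,20170003580c,deon Instinct MI50 32GB,0x0834,Advanced Micro Devices Inc. AMD/ATI,D16317
`

func main() {
	devices, err := parseRocmSmiCSV(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(devices), "devices; card1 serial:", devices[1].Serial)
}
```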
func GetGPUDevices ¶
GetGPUDevices returns the GPU devices of the given gpuType.
func GetNvidiaGPUDevices ¶
GetNvidiaGPUDevices returns all physical or MIG devices using the nvidia-smi command.
Example output:
	bash-4.4$ nvidia-smi --query-gpu=name,uuid --format=csv
	name, uuid
	Tesla V100-SXM2-32GB, GPU-f124aa59-d406-d45b-9481-8fcd694e6c9e
	Tesla V100-SXM2-32GB, GPU-61a65011-6571-a6d2-5ab8-66cbb6f7f9c3
nvidia-smi is used instead of the NVML Go bindings to avoid build issues. This way the exporter has no dependency on NVIDIA libraries and stays simple.
NOTE: Whether this command also returns MIG devices has not been verified.
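A rough sketch of parsing output in the "name, uuid" format shown in the example. The nvidiaDevice type and parseNvidiaSmiCSV helper are hypothetical, and numbering devices by their order of appearance is an assumption; invoking nvidia-smi is not shown.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nvidiaDevice is a hypothetical stand-in for the package's Device type.
type nvidiaDevice struct {
	Name string
	UUID string
}

// parseNvidiaSmiCSV reads "name, uuid" lines and numbers the devices
// in order of appearance (an assumption made for this sketch).
func parseNvidiaSmiCSV(out string) map[int]nvidiaDevice {
	devices := make(map[int]nvidiaDevice)
	sc := bufio.NewScanner(strings.NewReader(out))
	idx := 0
	for sc.Scan() {
		fields := strings.Split(sc.Text(), ",")
		if len(fields) != 2 {
			continue // blank or unexpected line
		}
		name := strings.TrimSpace(fields[0])
		uuid := strings.TrimSpace(fields[1])
		if uuid == "uuid" {
			continue // header row
		}
		devices[idx] = nvidiaDevice{Name: name, UUID: uuid}
		idx++
	}
	return devices
}

// sample reproduces the example output without the shell prompt line.
const sample = `name, uuid
Tesla V100-SXM2-32GB, GPU-f124aa59-d406-d45b-9481-8fcd694e6c9e
Tesla V100-SXM2-32GB, GPU-61a65011-6571-a6d2-5ab8-66cbb6f7f9c3
`

func main() {
	for idx, dev := range parseNvidiaSmiCSV(sample) {
		fmt.Printf("gpu %d: %s (%s)\n", idx, dev.Name, dev.UUID)
	}
}
```

Parsing plain text keeps the sketch free of cgo and vendor libraries, which matches the motivation given above for preferring nvidia-smi.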
func IsNoDataError ¶
func LoadCgroupsV2Metrics ¶
func LoadCgroupsV2Metrics(
	name string,
	cgroupfsPath string,
	controllers []string,
) (map[string]float64, error)
LoadCgroupsV2Metrics returns the cgroup v2 metrics found at the given path.
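For orientation, cgroup v2 exposes controller statistics as small text files, for example cpu.stat with one "key value" pair per line. The sketch below shows one way such a file could be read into a map[string]float64. The helper names and the "file.key" naming of map entries are assumptions made for this example; they do not describe how LoadCgroupsV2Metrics is implemented.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseFlatKeyed parses "key value" lines, the layout used by cgroup v2
// files such as cpu.stat, into a map keyed by "<prefix>.<key>".
func parseFlatKeyed(prefix, content string) (map[string]float64, error) {
	metrics := make(map[string]float64)
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue // skip blank or unexpected lines
		}
		value, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %w", line, err)
		}
		metrics[prefix+"."+fields[0]] = value
	}
	return metrics, nil
}

// readCPUStat reads <cgroupfsPath>/<name>/cpu.stat (hypothetical helper).
func readCPUStat(cgroupfsPath, name string) (map[string]float64, error) {
	data, err := os.ReadFile(filepath.Join(cgroupfsPath, name, "cpu.stat"))
	if err != nil {
		return nil, err
	}
	return parseFlatKeyed("cpu.stat", string(data))
}

func main() {
	// Parse an in-memory sample so the sketch runs without a cgroup filesystem.
	metrics, err := parseFlatKeyed("cpu.stat", "usage_usec 1500\nuser_usec 1000\nsystem_usec 500\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("usage_usec:", metrics["cpu.stat.usage_usec"])
}
```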
func RegisterCollector ¶
func SanitizeMetricName ¶
SanitizeMetricName sanitizes the given metric name by replacing invalid characters with underscores.
OpenMetrics and the Prometheus exposition format require a metric name to consist only of alphanumeric characters, "_" and ":", and it must not start with a digit. Since colons in a MetricFamily are reserved to signal that the MetricFamily is the result of a calculation or aggregation by a general-purpose monitoring system, colons are replaced as well.
Note: If not subsequently prepending a namespace and/or subsystem (e.g., with prometheus.BuildFQName), the caller must ensure that the supplied metricName does not begin with a digit.
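The replacement rule can be sketched with a single regular expression. The sanitize function below is an illustrative stand-in, not the package's implementation; it replaces each invalid character with one underscore, whereas the real function may collapse runs of invalid characters differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidChars matches every character that is not a letter, digit or
// underscore. Colons are deliberately treated as invalid, as described above.
var invalidChars = regexp.MustCompile(`[^0-9A-Za-z_]`)

// sanitize is a hypothetical stand-in for SanitizeMetricName.
func sanitize(metricName string) string {
	return invalidChars.ReplaceAllString(metricName, "_")
}

func main() {
	fmt.Println(sanitize("node.cpu-usage:total"))
}
```

As in the note above, this sketch does not guard against a leading digit; prepending a namespace is what makes the final name valid in that case.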
Types ¶
type CEEMSCollector ¶
type CEEMSCollector struct {
	Collectors map[string]Collector
	// contains filtered or unexported fields
}
CEEMSCollector implements the prometheus.Collector interface.
func NewCEEMSCollector ¶
func NewCEEMSCollector(logger log.Logger, filters ...string) (*CEEMSCollector, error)
NewCEEMSCollector creates a new CEEMSCollector.
func (CEEMSCollector) Collect ¶
func (n CEEMSCollector) Collect(ch chan<- prometheus.Metric)
Collect implements the prometheus.Collector interface.
func (CEEMSCollector) Describe ¶
func (n CEEMSCollector) Describe(ch chan<- *prometheus.Desc)
Describe implements the prometheus.Collector interface.
type CEEMSExporter ¶
type CEEMSExporter struct {
	App kingpin.Application
	// contains filtered or unexported fields
}
CEEMSExporter represents the `ceems_exporter` cli.
func NewCEEMSExporter ¶
func NewCEEMSExporter() (*CEEMSExporter, error)
NewCEEMSExporter returns a new CEEMSExporter instance.
func (*CEEMSExporter) Main ¶
func (b *CEEMSExporter) Main() error
Main is the entry point of the `ceems_exporter` command.
type CgroupMetric ¶
type CgroupMetric struct {
// contains filtered or unexported fields
}
CgroupMetric contains the metrics returned by a cgroup.
type Collector ¶
type Collector interface {
	// Get new metrics and expose them via prometheus registry.
	Update(ch chan<- prometheus.Metric) error
}
Collector is the interface a collector has to implement.
func NewCPUCollector ¶
NewCPUCollector returns a new Collector exposing kernel/system statistics.
func NewEmissionsCollector ¶
NewEmissionsCollector returns a new Collector exposing emission factor metrics.
func NewIPMICollector ¶
NewIPMICollector returns a new Collector exposing IPMI DCMI power metrics.
func NewMeminfoCollector ¶
NewMeminfoCollector returns a new Collector exposing memory stats.
func NewRaplCollector ¶
NewRaplCollector returns a new Collector exposing RAPL metrics.