Documentation ¶
Overview ¶
Package collector implements the different collectors of the exporter.
Index ¶
- Constants
- Variables
- func DisableDefaultCollectors()
- func GetAMDGPUDevices(rocmSmiPath string, logger log.Logger) (map[int]Device, error)
- func GetGPUDevices(gpuType string, logger log.Logger) (map[int]Device, error)
- func GetNvidiaGPUDevices(nvidiaSmiPath string, logger log.Logger) (map[int]Device, error)
- func IsNoDataError(err error) bool
- func LoadCgroupsV2Metrics(name string, cgroupfsPath string, controllers []string) (map[string]float64, error)
- func RegisterCollector(collector string, isDefaultEnabled bool, ...)
- func SanitizeMetricName(metricName string) string
- type CEEMSCollector
- type CEEMSExporter
- type CEEMSExporterServer
- type CgroupMetric
- type Collector
- func NewCPUCollector(logger log.Logger) (Collector, error)
- func NewEmissionsCollector(logger log.Logger) (Collector, error)
- func NewIPMICollector(logger log.Logger) (Collector, error)
- func NewMeminfoCollector(logger log.Logger) (Collector, error)
- func NewPerfCollector(logger log.Logger) (Collector, error)
- func NewRaplCollector(logger log.Logger) (Collector, error)
- func NewSlurmCollector(logger log.Logger) (Collector, error)
- type Config
- type Device
- type WebConfig
Constants ¶
const CEEMSExporterAppName = "ceems_exporter"
CEEMSExporterAppName is the kingpin app name.
const Namespace = "ceems"
Namespace defines the common namespace to be used by all metrics.
Variables ¶
var CEEMSExporterApp = *kingpin.New(
	CEEMSExporterAppName,
	"Prometheus Exporter to export compute (job, VM, pod) resource usage metrics.",
)
CEEMSExporterApp is the kingpin CLI app.
var (
)
Custom errors.
var ErrNoData = errors.New("collector returned no data")
ErrNoData indicates the collector found no data to collect, but had no other error.
Functions ¶
func DisableDefaultCollectors ¶
func DisableDefaultCollectors()
DisableDefaultCollectors sets the collector state to false for all collectors which have not been explicitly enabled on the command line.
func GetAMDGPUDevices ¶
GetAMDGPUDevices returns all GPU devices using the rocm-smi command. Example output:

	bash-4.4$ rocm-smi --showproductname --showserial --csv
	device,Serial Number,Card series,Card model,Card vendor,Card SKU
	card0,20170000800c,deon Instinct MI50 32GB,0x0834,Advanced Micro Devices Inc. AMD/ATI,D16317
	card1,20170003580c,deon Instinct MI50 32GB,0x0834,Advanced Micro Devices Inc. AMD/ATI,D16317
	card2,20180003050c,deon Instinct MI50 32GB,0x0834,Advanced Micro Devices Inc. AMD/ATI,D16317
func GetGPUDevices ¶
GetGPUDevices returns GPU devices.
func GetNvidiaGPUDevices ¶
GetNvidiaGPUDevices returns all physical or MIG devices using the nvidia-smi command. Example output:

	bash-4.4$ nvidia-smi --query-gpu=name,uuid --format=csv
	name, uuid
	Tesla V100-SXM2-32GB, GPU-f124aa59-d406-d45b-9481-8fcd694e6c9e
	Tesla V100-SXM2-32GB, GPU-61a65011-6571-a6d2-5ab8-66cbb6f7f9c3
nvidia-smi is used here instead of the NVML Go bindings to avoid build issues. This way the exporter has no dependency on NVIDIA libraries and stays simple.
NOTE: Hoping this command returns MIG devices too.
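As an illustration of how the CSV output shown above could be turned into a device map, here is a minimal sketch using the standard encoding/csv package. The gpuDevice type and the parseNvidiaSmiCSV function are hypothetical names introduced for this example; the fields of the package's Device type and its actual parsing logic are not shown in this documentation and may differ.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// gpuDevice is a hypothetical stand-in for the package's Device type,
// whose fields are not shown in this documentation.
type gpuDevice struct {
	Name string
	UUID string
}

// parseNvidiaSmiCSV parses the output of
// `nvidia-smi --query-gpu=name,uuid --format=csv` into a map keyed by
// the position of each device row (the header row is not counted).
func parseNvidiaSmiCSV(out string) (map[int]gpuDevice, error) {
	r := csv.NewReader(strings.NewReader(out))
	r.TrimLeadingSpace = true // fields are separated by ", "
	r.FieldsPerRecord = -1    // tolerate rows with a different field count

	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}

	devices := make(map[int]gpuDevice)
	for i, row := range rows {
		// Skip the header row and any row without both fields.
		if i == 0 || len(row) < 2 {
			continue
		}
		devices[i-1] = gpuDevice{Name: row[0], UUID: row[1]}
	}
	return devices, nil
}

func main() {
	out := "name, uuid\n" +
		"Tesla V100-SXM2-32GB, GPU-f124aa59-d406-d45b-9481-8fcd694e6c9e\n" +
		"Tesla V100-SXM2-32GB, GPU-61a65011-6571-a6d2-5ab8-66cbb6f7f9c3\n"

	devices, err := parseNvidiaSmiCSV(out)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(devices), devices[1].UUID)
}
```

The same approach would apply to the rocm-smi CSV output, with the column positions adjusted to that header.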
func IsNoDataError ¶
IsNoDataError returns true if error is ErrNoData.
func LoadCgroupsV2Metrics ¶
func LoadCgroupsV2Metrics(name string, cgroupfsPath string, controllers []string) (map[string]float64, error)
LoadCgroupsV2Metrics returns cgroup metrics from a given path.
func RegisterCollector ¶
func RegisterCollector(collector string, isDefaultEnabled bool, factory func(logger log.Logger) (Collector, error))
RegisterCollector registers a collector and its factory function, along with whether the collector is enabled by default.
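The interplay between registration and DisableDefaultCollectors can be pictured with a small registry. This is a simplified sketch under stated assumptions, not the package's implementation: the registry type, its methods, and the collector stand-in are all hypothetical, the factory here takes no logger, and the real package ties the enabled state to command-line flags.

```go
package main

import (
	"fmt"
	"sort"
)

// collector is a hypothetical, simplified stand-in for the package's
// Collector interface.
type collector interface {
	Name() string
}

// factoryFunc builds a collector. The documented factory also takes a logger.
type factoryFunc func() (collector, error)

// registry tracks factories and the enabled state of each collector.
type registry struct {
	factories map[string]factoryFunc
	enabled   map[string]bool
	forced    map[string]bool // collectors enabled explicitly by the user
}

func newRegistry() *registry {
	return &registry{
		factories: map[string]factoryFunc{},
		enabled:   map[string]bool{},
		forced:    map[string]bool{},
	}
}

// register stores the factory and the default enabled state.
func (r *registry) register(name string, isDefaultEnabled bool, f factoryFunc) {
	r.factories[name] = f
	r.enabled[name] = isDefaultEnabled
}

// enable marks a collector as explicitly enabled.
func (r *registry) enable(name string) {
	r.enabled[name] = true
	r.forced[name] = true
}

// disableDefaults turns off every collector not explicitly enabled.
func (r *registry) disableDefaults() {
	for name := range r.enabled {
		if !r.forced[name] {
			r.enabled[name] = false
		}
	}
}

// enabledNames returns the enabled collector names in sorted order.
func (r *registry) enabledNames() []string {
	var names []string
	for name, on := range r.enabled {
		if on {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

// namedCollector is a trivial collector used only for this example.
type namedCollector string

func (n namedCollector) Name() string { return string(n) }

func main() {
	r := newRegistry()
	r.register("cpu", true, func() (collector, error) { return namedCollector("cpu"), nil })
	r.register("perf", false, func() (collector, error) { return namedCollector("perf"), nil })

	r.enable("perf")    // as if requested on the command line
	r.disableDefaults() // cpu was only enabled by default, so it is turned off

	fmt.Println(r.enabledNames()) // [perf]
}
```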
func SanitizeMetricName ¶
SanitizeMetricName sanitizes the given metric name by replacing invalid characters with underscores.
OpenMetrics and the Prometheus exposition format require metric names to consist only of alphanumeric characters, "_" and ":", and not to start with a digit. Since colons in a MetricFamily name are reserved to signal that the MetricFamily is the result of a calculation or aggregation by a general-purpose monitoring system, colons are replaced as well.
Note: If not subsequently prepending a namespace and/or subsystem (e.g., with prometheus.BuildFQName), the caller must ensure that the supplied metricName does not begin with a digit.
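The replacement rule described above can be sketched with a regular expression. The sanitize function and the exact pattern are assumptions made for this example; in particular, this sketch collapses a run of invalid characters into a single underscore, which may not match what SanitizeMetricName does.

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidChars matches runs of characters other than ASCII letters,
// digits and underscores. Colons are deliberately treated as invalid.
var invalidChars = regexp.MustCompile(`[^0-9A-Za-z_]+`)

// sanitize replaces each run of invalid characters with one underscore.
// It does not fix a leading digit; that remains the caller's job.
func sanitize(metricName string) string {
	return invalidChars.ReplaceAllString(metricName, "_")
}

func main() {
	fmt.Println(sanitize("cpu:usage-total")) // cpu_usage_total
	fmt.Println(sanitize("9p.reads"))        // 9p_reads (leading digit kept)
}
```

The second call shows why the note about leading digits matters: the result is only a valid metric name once a namespace or subsystem is prepended.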
Types ¶
type CEEMSCollector ¶
type CEEMSCollector struct {
	Collectors map[string]Collector
	// contains filtered or unexported fields
}
CEEMSCollector implements the prometheus.Collector interface.
func NewCEEMSCollector ¶
func NewCEEMSCollector(logger log.Logger) (*CEEMSCollector, error)
NewCEEMSCollector creates a new CEEMSCollector.
func (CEEMSCollector) Close ¶ added in v0.3.0
func (n CEEMSCollector) Close(ctx context.Context) error
Close stops all the collectors and releases system resources.
func (CEEMSCollector) Collect ¶
func (n CEEMSCollector) Collect(ch chan<- prometheus.Metric)
Collect implements the prometheus.Collector interface.
func (CEEMSCollector) Describe ¶
func (n CEEMSCollector) Describe(ch chan<- *prometheus.Desc)
Describe implements the prometheus.Collector interface.
type CEEMSExporter ¶
type CEEMSExporter struct {
	App kingpin.Application
	// contains filtered or unexported fields
}
CEEMSExporter represents the `ceems_exporter` CLI.
func NewCEEMSExporter ¶
func NewCEEMSExporter() (*CEEMSExporter, error)
NewCEEMSExporter returns a new CEEMSExporter instance.
func (*CEEMSExporter) Main ¶
func (b *CEEMSExporter) Main() error
Main is the entry point of the `ceems_exporter` command.
type CEEMSExporterServer ¶ added in v0.3.0
type CEEMSExporterServer struct {
// contains filtered or unexported fields
}
CEEMSExporterServer implements the HTTP server for the exporter.
func NewCEEMSExporterServer ¶ added in v0.3.0
func NewCEEMSExporterServer(c *Config) (*CEEMSExporterServer, error)
NewCEEMSExporterServer creates a new CEEMSExporterServer instance.
func (*CEEMSExporterServer) Shutdown ¶ added in v0.3.0
func (s *CEEMSExporterServer) Shutdown(ctx context.Context) error
Shutdown stops the CEEMS exporter HTTP server.
func (*CEEMSExporterServer) Start ¶ added in v0.3.0
func (s *CEEMSExporterServer) Start() error
Start launches the CEEMS exporter HTTP server.
type CgroupMetric ¶
type CgroupMetric struct {
// contains filtered or unexported fields
}
CgroupMetric contains metrics returned by cgroup.
type Collector ¶
type Collector interface {
	// Get new metrics and expose them via prometheus registry.
	Update(ch chan<- prometheus.Metric) error
	// Stops each collector and cleans up system resources
	Stop(ctx context.Context) error
}
Collector is the interface a collector has to implement.
func NewCPUCollector ¶
NewCPUCollector returns a new Collector exposing kernel/system statistics.
func NewEmissionsCollector ¶
NewEmissionsCollector returns a new Collector exposing emission factor metrics.
func NewIPMICollector ¶
NewIPMICollector returns a new Collector exposing IPMI DCMI power metrics.
func NewMeminfoCollector ¶
NewMeminfoCollector returns a new Collector exposing memory stats.
func NewPerfCollector ¶ added in v0.3.0
NewPerfCollector returns a new perf-based collector that creates a profiler per compute unit.
func NewRaplCollector ¶
NewRaplCollector returns a new Collector exposing RAPL metrics.