nomad-exporter

command module

v0.0.0-...-09affa2 Published: Nov 27, 2024 License: Apache-2.0 Imports: 20 Imported by: 0

Repository

gitlab.com/istvandbsd/nomad-exporter

README

Nomad Prometheus Exporter

Originally a fork of yakshaving.art's nomad exporter, updated and slimmed down for Nomad 1.x.

Since the 1.0 milestone, Nomad has included an HTTP API endpoint, v1/metrics, that may make an exporter such as this one unnecessary: point Prometheus at that endpoint and you get an extensive set of metrics. What this exporter adds and changes is documented below.
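As a rough illustration of pointing a scraper at Nomad's own endpoint, the sketch below builds the metrics URL from an agent address. The format=prometheus query parameter comes from Nomad's HTTP API documentation rather than from this README, and the helper name nomadMetricsURL is hypothetical. Nomad typically also needs Prometheus metrics enabled in its telemetry configuration, so check your agent settings first.

```go
package main

import (
	"fmt"
	"net/url"
)

// nomadMetricsURL is a hypothetical helper (not part of this exporter).
// It takes a Nomad agent address such as "http://localhost:4646" and
// returns the URL of Nomad's native metrics endpoint in Prometheus format.
func nomadMetricsURL(addr string) (string, error) {
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("address %q needs a scheme and a host", addr)
	}
	u.Path = "/v1/metrics"
	u.RawQuery = url.Values{"format": {"prometheus"}}.Encode()
	return u.String(), nil
}

func main() {
	target, err := nomadMetricsURL("http://localhost:4646")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(target)
}
```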

Nomad version

Currently supports the Nomad 1.8.x API. There is no guarantee about the metrics supplied by Nomad's own endpoint, so check them before upgrading.

Running in Docker Compose

Use the provided compose file in the repo as a guide.

Running in Docker

docker run --rm -it -p 9441:9441 gitlab.com/yakshaving.art/nomad-exporter:latest -nomad.address http://nomad-server-address:4646

Running in Nomad

Use the provided HCL configuration file in the repo.

Options

-allow-stale-reads allow reading metrics from a non-leader server. See notes below
-concurrency int max number of goroutines to launch concurrently when poking the API (default 20)
-debug enable debug log level
-no-allocation-stats-metrics disable stats metrics collection
-no-allocations-metrics disable allocations metrics collection
-no-deployment-metrics disable deployment metrics collection
-no-eval-metrics disable eval metrics collection
-no-jobs-metrics disable jobs metrics collection
-no-node-metrics disable node metrics collection
-no-peer-metrics disable peer metrics collection
-no-serf-metrics disable serf metrics collection
-nomad.address string HTTP API address of a Nomad server or agent. (default "http://localhost:4646")
-nomad.timeout int HTTP read timeout when talking to the Nomad agent. In milliseconds (default 500)
-nomad.waittime int Timeout to wait for the Nomad agent to deliver fresh data. In milliseconds. (default 10)
-tls.ca-file string ca-file path to a PEM-encoded CA cert file to use to verify the connection to nomad server
-tls.ca-path string ca-path is the path to a directory of PEM-encoded CA cert files to verify the connection to nomad server
-tls.cert-file string cert-file is the path to the client certificate for Nomad communication
-tls.insecure insecure enables or disables SSL verification
-tls.key-file string key-file is the path to the key for cert-file
-tls.tls-server-name string tls-server-name sets the SNI for Nomad ssl connection
-version Print version information.
-web.listen-address string Address to listen on for web interface and telemetry. (default ":9441")
-web.telemetry-path string Path under which to expose metrics. (default "/metrics")
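The -concurrency option caps how many goroutines query the API at once. A minimal sketch of that kind of cap, using a buffered channel as a counting semaphore, is shown below. This is an assumption about the general technique, not the exporter's actual code, and runBounded is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// runBounded calls work(i) for every i in [0, n) while keeping at most
// limit goroutines running at a time. It returns the highest number of
// calls that were observed in flight simultaneously.
func runBounded(n, limit int, work func(i int)) int {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
		peak     int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when done

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			work(i)

			mu.Lock()
			inFlight--
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	results := make([]int, 100)
	peak := runBounded(len(results), 20, func(i int) {
		results[i] = i * 2 // stand-in for one API request
	})
	fmt.Println("peak concurrency:", peak)
}
```

A lower limit is gentler on the Nomad servers, at the cost of slower scrapes on large clusters.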

Environment Variables

Environment variables are loaded as argument defaults, so they can be overridden by setting the arguments directly. They offer a way of configuring the executable through the environment instead of through command arguments.

NOMAD_ADDR same as -nomad.address
NOMAD_CACERT same as -tls.ca-file
NOMAD_CAPATH same as -tls.ca-path
NOMAD_CLIENT_CERT same as -tls.cert-file
NOMAD_CLIENT_KEY same as -tls.key-file
NOMAD_SKIP_VERIFY same as -tls.insecure
NOMAD_SNI_TLS_SERVER_NAME same as -tls.tls-server-name
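The precedence described above (argument, then environment variable, then built-in default) can be sketched with the standard flag package. This is an illustrative sketch, not the exporter's source: envOr and parseAddress are hypothetical names, and the environment lookup is passed in as a function only to keep the example easy to test.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// envOr returns the value reported by getenv for key, or fallback when
// the variable is unset or empty.
func envOr(getenv func(string) string, key, fallback string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return fallback
}

// parseAddress resolves the Nomad address. NOMAD_ADDR becomes the flag
// default, so an explicit -nomad.address argument still wins.
func parseAddress(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("nomad-exporter", flag.ContinueOnError)
	addr := fs.String("nomad.address",
		envOr(getenv, "NOMAD_ADDR", "http://localhost:4646"),
		"HTTP API address of a Nomad server or agent.")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *addr, nil
}

func main() {
	addr, err := parseAddress(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using Nomad address:", addr)
}
```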

Leader Detection

The leader is identified by comparing the leader address obtained through the API call with the client address. If both point to the same hostname, the exporter is considered to be reading from the leader host.

If you are having problems identifying the leader, run the exporter with -debug to see what data it is handling.
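The hostname comparison can be sketched as follows. It assumes the leader address arrives as a bare host:port pair and the client address as a URL; both shapes, and the names hostOf and readingFromLeader, are assumptions made for illustration rather than a copy of the exporter's logic.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// hostOf extracts the hostname from either a URL ("http://host:4646")
// or a bare "host:port" pair.
func hostOf(addr string) (string, error) {
	if strings.Contains(addr, "://") {
		u, err := url.Parse(addr)
		if err != nil {
			return "", err
		}
		return u.Hostname(), nil
	}
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	return host, nil
}

// readingFromLeader reports whether the leader address and the client
// address name the same host. Ports are ignored on purpose, since the
// two addresses may use different ports on the same machine.
func readingFromLeader(leaderAddr, clientAddr string) (bool, error) {
	leader, err := hostOf(leaderAddr)
	if err != nil {
		return false, err
	}
	client, err := hostOf(clientAddr)
	if err != nil {
		return false, err
	}
	return strings.EqualFold(leader, client), nil
}

func main() {
	ok, err := readingFromLeader("10.0.0.5:4647", "http://10.0.0.5:4646")
	fmt.Println(ok, err)
}
```

A plain string comparison like this one does not resolve DNS, so in this sketch a leader reported by IP would not match a client address given by name. That kind of mismatch is worth looking for in the -debug output.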

Allow Reading Stale Metrics

By default, the exporter tries to identify the leader of the cluster and only collects metrics from it.

This is a defense mechanism: it prevents every server from being asked for metrics about every other node, which would load the whole cluster.

The -allow-stale-reads argument lifts this restriction and enables recording metrics from any host, regardless of whether it is the leader.
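Put together, the rule amounts to a small gate evaluated before each collection. The function below is a hypothetical sketch of that decision, not the exporter's own code.

```go
package main

import "fmt"

// shouldCollect is a hypothetical gate: collect when reading from the
// leader, or when stale reads were explicitly allowed.
func shouldCollect(isLeader, allowStaleReads bool) bool {
	return isLeader || allowStaleReads
}

func main() {
	cases := []struct{ leader, stale bool }{
		{true, false},
		{false, false},
		{false, true},
	}
	for _, c := range cases {
		fmt.Printf("leader=%t allow-stale-reads=%t collect=%t\n",
			c.leader, c.stale, shouldCollect(c.leader, c.stale))
	}
}
```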

Exported Metrics

Metric	Meaning	Type	Labels
nomad_up	Whether the exporter is able to talk to the nomad server.	Gauge
nomad_client_errors_total	Number of errors that were accounted for.	Gauge
nomad_leader	Whether the current host is the cluster leader.	Gauge
nomad_jobs_total	How many jobs are there in the cluster.	Gauge
nomad_node_info	Node information.	Gauge	name, version, class, status, drain, datacenter, scheduling_eligibility
nomad_raft_applied_index	Index being applied.	Gauge	datacenter, node
nomad_raft_peers	How many peers (servers) are in the Raft cluster.	Gauge	datacenter, node
nomad_serf_lan_members	How many members are in the cluster.	Gauge	datacenter, class, name, node_id, drain
nomad_serf_lan_member_status	Describe member state.		datacenter, class, node, drain
nomad_allocation	Allocation labeled with runtime information.	Gauge	status, desired_status, job_type, job_id, job_version, task_group, node
nomad_evals_total	The number of evaluations.	Gauge	status
nomad_tasks_total	The number of tasks.	Gauge	state, job_type, node
nomad_api_latency_seconds	nomad api latency for different queries	Histogram	query
nomad_api_node_latency_seconds	nomad api latency for different nodes and queries	Histogram	query, node
nomad_deployments_total	The number of deployments.	Gauge	status, job_id
nomad_deployment_task_group_desired_canaries_total	The number of desired canaries for the task group.	Gauge	job_id, job_version, task_group, promoted, auto_revert
nomad_deployment_task_group_desired_total	The number of desired allocs for the task group.	Gauge	job_id, job_version, task_group, promoted, auto_revert
nomad_deployment_task_group_healthy_allocs_total	The number of healthy allocs for the task group.	Gauge	job_id, job_version, task_group, promoted, auto_revert
nomad_deployment_task_group_placed_allocs_total	The number of placed allocs for the task group.	Gauge	job_id, job_version, task_group, promoted, auto_revert
nomad_deployment_task_group_unhealthy_allocs_total	The number of unhealthy allocs for the task group.	Gauge	job_id, job_version, task_group, promoted, auto_revert
nomad_allocation_memory_rss_bytes	Allocation memory usage.	Gauge	job, job_version, group, alloc, region, datacenter, node
nomad_allocation_memory_rss_bytes_limit	Allocation memory limit.	Gauge	job, job_version, group, alloc, region, datacenter, node
nomad_allocation_cpu_percent	Allocation CPU usage.	Gauge	job, job_version, group, alloc, region, datacenter, node
nomad_allocation_cpu_required	Allocation CPU Required.	Gauge	job, job_version, group, alloc, region, datacenter, node
nomad_allocation_cpu_user_mode	Allocation CPU User Mode Usage.	Gauge	job, job_version, group, alloc, region, datacenter, node
nomad_allocation_cpu_system_mode	Allocation CPU System Mode Usage.	Gauge	job, job_version, group, alloc, region, datacenter, node
nomad_allocation_cpu_throttle_time	Allocation throttled CPU.	Gauge	job, job_version, group, alloc, region, datacenter, node
nomad_task_cpu_total_ticks	Task CPU total ticks.	Gauge	job, job_version, group, alloc, region, datacenter, node, task
nomad_task_cpu_percent	Task CPU usage percent.	Gauge	job, job_version, group, alloc, region, datacenter, node, task
nomad_task_memory_rss_bytes	Task memory RSS usage in bytes.	Gauge	job, job_version, group, alloc, region, datacenter, node, task
nomad_node_resource_memory_bytes	Amount of allocatable memory the node has in bytes	Gauge	node, datacenter
nomad_node_allocated_memory_bytes	Amount of memory allocated to tasks on the node in bytes.	Gauge	node, datacenter
nomad_node_used_memory_bytes	Amount of memory used on the node in bytes.	Gauge	node, datacenter
nomad_node_resource_cpu_megahertz	Amount of allocatable CPU the node has in MHz.	Gauge	node, datacenter
nomad_node_resource_iops	Amount of allocatable IOPS the node has.	Gauge	node, datacenter
nomad_node_resource_disk_bytes	Amount of allocatable disk bytes the node has.	Gauge	node, datacenter
nomad_node_allocated_cpu_megahertz	Amount of allocated CPU on the node in MHz.	Gauge	node, datacenter
nomad_node_used_cpu_megahertz	Amount of CPU used on the node in MHz.	Gauge	node, datacenter
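To check a single unlabeled gauge such as nomad_up without a full Prometheus setup, the exporter's text output can be scanned line by line. The sketch below assumes the standard Prometheus text exposition format; gaugeValue is a hypothetical helper, and the sample input in main is made up for illustration rather than captured from a real exporter.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// gaugeValue scans Prometheus text exposition output for an unlabeled
// sample named metric and returns its value. Labeled samples such as
// name{label="x"} are skipped because their first field differs.
func gaugeValue(exposition, metric string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line, HELP, or TYPE comment
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != metric {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Made-up sample input, shaped like the body served at /metrics.
	sample := "# TYPE nomad_up gauge\nnomad_up 1\nnomad_jobs_total 7\n"
	if v, ok := gaugeValue(sample, "nomad_up"); ok {
		fmt.Println("nomad_up =", v)
	}
}
```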
