# OpenTelemetry Prometheus sidecar

This repository contains a sidecar for the Prometheus Server that sends metrics data to an OpenTelemetry Metrics Protocol endpoint. This software is derived from the Stackdriver Prometheus Sidecar.

![OpenTelemetry Prometheus Sidecar Diagram](https://github.com/lightstep/opentelemetry-prometheus-sidecar/raw/v0.18.1/docs/img/opentelemetry-prometheus-sidecar.png)
## Repository Status (2/21/2021)

This repository is maintained by Lightstep and will be donated to OpenTelemetry. We are moving this repository into the OpenTelemetry organization and will continue development on a public fork of the upstream Stackdriver Prometheus sidecar repository.

This code base is 100% OpenTelemetry and Prometheus, not a Lightstep-specific sidecar: it reads data collected and written by Prometheus, converts it into the OpenTelemetry data model, and writes it to an OpenTelemetry endpoint.

This sidecar sends OpenTelemetry Protocol version 0.7 (or later) over gRPC.
## Sidecar design

The sidecar includes:

- a Prometheus write-ahead log (WAL) reader
- a metadata cache that tracks active instruments
- configured settings:
  - additional resource attributes to apply to all metric timeseries
  - renaming and prefixing to change the name of metric timeseries
  - filters to avoid reporting specific metric timeseries
  - whether to use int64 (optional) or double (default) protocol encoding
The sidecar operates by continually (and concurrently) reading the
log, refreshing its view of the instrument metadata,
transforming the data into OpenTelemetry Protocol metrics, and sending
over gRPC to an OpenTelemetry metrics service.
## Installation

Lightstep publishes Docker images of this binary named `lightstep/opentelemetry-prometheus-sidecar:${VERSION}`, with the latest release always tagged `latest`.

To build from source, clone this repository, build a Docker image, push it to a private container registry, and then run the container as described below. To test and build a Docker image for the current operating system:

```
export DOCKER_IMAGE_NAME=my.image.reposito.ry/opentelemetry/prometheus-sidecar
export DOCKER_IMAGE_TAG=$(cat ./VERSION)
make docker
docker push ${DOCKER_IMAGE_NAME}:${DOCKER_IMAGE_TAG}
```
## Deployment

The sidecar is deployed next to an already-running Prometheus server. An example command line:

```
opentelemetry-prometheus-sidecar \
  --destination.endpoint=${DESTINATION} \
  --destination.header="Custom-Header=${VALUE}" \
  --destination.attribute="service.name=${SERVICE}" \
  --prometheus.wal=${WAL} \
  --prometheus.endpoint=${PROMETHEUS}
```

where:

- `DESTINATION`: destination address (`https://host:port`)
- `VALUE`: value for the `Custom-Header` request header
- `SERVICE`: value for the `service.name` resource attribute
- `WAL`: Prometheus WAL directory, defaults to `data/wal`
- `PROMETHEUS`: URL of the Prometheus UI.
See the list of command-line flags below.

Settings can also be passed through a configuration file. For example:

```
destination:
  endpoint: https://otlp.io:443
  headers:
    Custom-Header: custom-value
  attributes:
    service.name: my-service-name
prometheus:
  wal: /prometheus/wal
  endpoint: http://192.168.10.10:9191
```

See the example configuration yaml file in this repository.

The sidecar requires write access to the directory to store its progress between restarts.
## Kubernetes and Helm setup

This section describes how to configure the sidecar for a Prometheus server installed using the Prometheus Community Helm Charts.

### Sidecar Setup

To configure the core components of the Prometheus sidecar, add the following definition to your custom `values.yaml`:
```
server:
  sidecarContainers:
  - name: otel-sidecar
    image: lightstep/opentelemetry-prometheus-sidecar
    imagePullPolicy: Always
    args:
    - --prometheus.wal=/data/wal
    - --destination.endpoint=$(DESTINATION)
    - --destination.header=access-token=AAAAAAAAAAAAAAAA
    volumeMounts:
    - name: storage-volume
      mountPath: /data
    ports:
    - name: admin-port
      containerPort: 9091
    livenessProbe:
      httpGet:
        path: /-/health
        port: admin-port
      periodSeconds: 30
      failureThreshold: 2
```
The upstream Stackdriver Prometheus sidecar Kubernetes README contains more examples of how to patch an existing Prometheus deployment or deploy the sidecar without using Helm.
## Configuration

Most sidecar configuration settings can be set through command-line flags, while a few more rarely-used options are only settable through a yaml configuration file. To see all available command-line flags, run `opentelemetry-prometheus-sidecar --help`. The printed usage is shown below:
```
usage: opentelemetry-prometheus-sidecar [<flags>]

The OpenTelemetry Prometheus sidecar runs alongside the Prometheus
(https://prometheus.io/) Server and sends metrics data to an OpenTelemetry
(https://opentelemetry.io) Protocol endpoint.

Flags:
  -h, --help                 Show context-sensitive help (also try --help-long
                             and --help-man).
      --version              Show application version.
      --config-file=CONFIG-FILE
                             A configuration file.
      --destination.endpoint=DESTINATION.ENDPOINT
                             Destination address of an OpenTelemetry Metrics
                             protocol gRPC endpoint (e.g., https://host:port).
                             Use "http" (not "https") for an insecure
                             connection.
      --destination.attribute=DESTINATION.ATTRIBUTE ...
                             Destination resource attributes attached to OTLP
                             data (e.g., MyResource=Value1). May be repeated.
      --destination.header=DESTINATION.HEADER ...
                             Destination headers used for OTLP requests (e.g.,
                             MyHeader=Value1). May be repeated.
      --destination.timeout=DESTINATION.TIMEOUT
                             Destination timeout used for OTLP Export()
                             requests.
      --destination.compression=DESTINATION.COMPRESSION
                             Destination compression used for OTLP requests
                             (e.g., snappy, gzip, none).
      --diagnostics.endpoint=DIAGNOSTICS.ENDPOINT
                             Diagnostics address of an OpenTelemetry Metrics
                             protocol gRPC endpoint (e.g., https://host:port).
                             Use "http" (not "https") for an insecure
                             connection.
      --diagnostics.attribute=DIAGNOSTICS.ATTRIBUTE ...
                             Diagnostics resource attributes attached to OTLP
                             data (e.g., MyResource=Value1). May be repeated.
      --diagnostics.header=DIAGNOSTICS.HEADER ...
                             Diagnostics headers used for OTLP requests (e.g.,
                             MyHeader=Value1). May be repeated.
      --diagnostics.timeout=DIAGNOSTICS.TIMEOUT
                             Diagnostics timeout used for OTLP Export()
                             requests.
      --diagnostics.compression=DIAGNOSTICS.COMPRESSION
                             Diagnostics compression used for OTLP requests
                             (e.g., snappy, gzip, none).
      --prometheus.wal=PROMETHEUS.WAL
                             Directory from where to read the Prometheus TSDB
                             WAL. Default: data/wal
      --prometheus.endpoint=PROMETHEUS.ENDPOINT
                             Endpoint where Prometheus hosts its UI, API, and
                             serves its own metrics. Default:
                             http://127.0.0.1:9090/
      --prometheus.max-point-age=PROMETHEUS.MAX-POINT-AGE
                             Skip points older than this, to assist recovery.
                             Default: 25h0m0s
      --prometheus.max-timeseries-per-request=PROMETHEUS.MAX-TIMESERIES-PER-REQUEST
                             Send at most this number of timeseries per
                             request. Default: 2000
      --prometheus.max-shards=PROMETHEUS.MAX-SHARDS
                             Max number of shards, i.e. amount of concurrency.
                             Default: 2000
      --prometheus.scrape-interval=PROMETHEUS.SCRAPE-INTERVAL ...
                             Delay at startup until Prometheus completes a
                             scrape for this interval. Default waits for the
                             first scrape to complete; multiple intervals can
                             be set.
      --admin.port=ADMIN.PORT
                             Administrative port this process listens on.
                             Default: 9091
      --admin.listen-ip=ADMIN.LISTEN-IP
                             Administrative IP address this process listens on.
                             Default: 0.0.0.0
      --security.root-certificate=SECURITY.ROOT-CERTIFICATE ...
                             Root CA certificate to use for TLS connections, in
                             PEM format (e.g., root.crt). May be repeated.
      --opentelemetry.metrics-prefix=OPENTELEMETRY.METRICS-PREFIX
                             Customized prefix for exporter metrics. If not
                             set, none will be used.
      --filter=FILTER ...    PromQL metric and label matcher which must pass
                             for a series to be forwarded to OpenTelemetry. If
                             repeated, the series must pass any of the filter
                             sets to be forwarded.
      --startup.timeout=STARTUP.TIMEOUT
                             Timeout at startup to allow the endpoint to become
                             available. Default: 5m0s
      --healthcheck.period=HEALTHCHECK.PERIOD
                             Period for internal health checking; set at a
                             minimum to the shortest Prometheus scrape period.
      --log.level=LOG.LEVEL  Only log messages with the given severity or
                             above. One of: [debug, info, warn, error]
      --log.format=LOG.FORMAT
                             Output format of log messages. One of: [logfmt,
                             json]
      --log.verbose=LOG.VERBOSE
                             Verbose logging level: 0 = off, 1 = some, 2 =
                             more; 1 is automatically added when log.level is
                             'debug'; impacts logging from the gRPC library in
                             particular.
      --disable-supervisor   Disable the supervisor.
      --disable-diagnostics  Disable diagnostics by default; if unset,
                             diagnostics will be auto-configured to the primary
                             destination.
```
Two kinds of sidecar customization are available only through the configuration file. An example sidecar yaml configuration documents the available options.

Command-line flags and the configuration file can be used at the same time. Command-line parameter values override configuration-file parameter values, with one exception: configurations that support a map from string to string, including both request headers and resource attributes, are combined from both sources.
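As a sketch of the merge behavior (the file name, header names, and values below are hypothetical), a header set in the configuration file and a header set on the command line both reach the destination:

```shell
# sidecar.yaml (hypothetical) contains:
#   destination:
#     headers:
#       Custom-Header: custom-value

# Adding a second header on the command line:
opentelemetry-prometheus-sidecar \
  --config-file=sidecar.yaml \
  --destination.header="Access-Token=aabbcc"

# Because headers are a string-to-string map, the two sources are
# combined: OTLP requests carry both Custom-Header and Access-Token.
# A scalar such as destination.timeout would instead be overridden
# by the command-line value.
```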
## Startup safety

The sidecar waits until Prometheus finishes its first scrape(s) before beginning to process the WAL, to ensure that target information is available before the sidecar tries loading its metadata cache. When multiple scrape intervals are in use, all intervals should be monitored. Use the `--prometheus.scrape-interval=DURATION` flag to set the scrape intervals to monitor at startup.
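For example, if scrape jobs run at both 15s and 1m intervals (values hypothetical), the flag can be repeated so startup waits for a complete scrape of each interval:

```shell
# Wait at startup for one complete scrape at each configured interval.
opentelemetry-prometheus-sidecar \
  --prometheus.scrape-interval=15s \
  --prometheus.scrape-interval=1m \
  ...
```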
## Resources

Use the `--destination.attribute=KEY=VALUE` flag to add additional resource attributes to all exported timeseries.
## Filters

The `--filter` flag lets you provide filters that all series must pass before being sent to the destination. Filters use the same syntax as Prometheus instant vector selectors, e.g.:

```
opentelemetry-prometheus-sidecar --filter='{__name__!~"cadvisor_.+",job="k8s"}' ...
```

This drops all series that do not have the job label `k8s`, as well as all metrics whose name starts with `cadvisor_`.

For an equality filter on the metric name, you can use the simpler notation, e.g. `--filter='metric_name{label="foo"}'`.

The flag may be repeated to provide several sets of filters, in which case a metric is forwarded if it matches at least one of them.
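As a sketch of repeated filters (the selectors below are hypothetical), a series is forwarded if it matches either selector:

```shell
# Forward series matching EITHER filter set:
#   - the up metric from the k8s job, or
#   - any metric whose name starts with node_.
opentelemetry-prometheus-sidecar \
  --filter='up{job="k8s"}' \
  --filter='{__name__=~"node_.+"}' \
  ...
```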
## Metric renames

To change the name of a metric as it is exported, use the `metric_renames` section in the configuration file:

```
metric_renames:
- from: original_metric_name
  to: new_metric_name
# - ...
```
To change the output type, value type, or description of a metric instrument as it is exported, use the `static_metadata` section in the configuration file:

```
static_metadata:
- metric: some_metric_name
  type: counter # or gauge, or histogram
  value_type: double # or int64
  help: an arbitrary help string
# - ...
```

Note:

- All `static_metadata` entries must have `type` specified.
- If `value_type` is specified, it overrides the default value type for counters and gauges. All Prometheus metrics have a default value type of double.
## Monitoring

When run in the default configuration, the sidecar self-monitors for liveness and kills the process when it becomes unhealthy. Sidecar liveness can also be monitored directly using the `/-/health` endpoint. We recommend a period of 30 seconds and `failureThreshold: 2`, for example in your Kubernetes deployment:

```
ports:
- name: admin-port
  containerPort: 9091
livenessProbe:
  httpGet:
    path: /-/health
    port: admin-port
  periodSeconds: 30
  failureThreshold: 2
```
## Diagnostics

The sidecar is instrumented with the OpenTelemetry-Go SDK and runs with standard instrumentation packages, including runtime and host metrics.

By default, diagnostics are auto-configured to the primary destination. Configuration options are available to disable diagnostics or to configure an alternative destination and options.

### Configuration Options

Separate diagnostics settings can be configured to output OTLP, similar to configuring the primary destination, for example:

```
diagnostics:
  endpoint: https://otel-collector:443
  headers:
    Custom-Header: custom-value
  timeout: timeout-value
  attributes:
    extra.resource: extra-value
```

Likewise, these fields can be set using `--diagnostics.endpoint`, `--diagnostics.header`, `--diagnostics.timeout`, and `--diagnostics.attribute`.
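As a sketch, the same diagnostics settings shown in the yaml above (endpoint and values hypothetical) expressed as command-line flags:

```shell
opentelemetry-prometheus-sidecar \
  --diagnostics.endpoint=https://otel-collector:443 \
  --diagnostics.header="Custom-Header=custom-value" \
  --diagnostics.attribute="extra.resource=extra-value" \
  ...
```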
### Log levels

The Prometheus sidecar provides logging options for diagnosing issues:

- We recommend starting by setting `--log.level` to one of `debug`, `info`, `warn`, or `error`.
- Additional options are available to set the output format of the logs (`--log.format` to `logfmt` or `json`) and the amount of logging recorded (`--log.verbose` to `0` for off, `1` for some, `2` for more).
### Disabling Diagnostics

To disable diagnostics, set `--disable-diagnostics` to `true`.
### Diagnostic Outputs

The Prometheus sidecar has two main processes: a supervisor and a subordinate process. Diagnostic traces are sent from the supervisor process to monitor lifecycle events, and diagnostic metrics are sent from the subordinate process.

#### Sidecar Supervisor Tracing

Traces from the supervisor process are most helpful for diagnosing unknown problems. The supervisor signals lifecycle events and captures the subordinate's stderr, attaching it to the trace.

The sidecar outputs spans from its supervisor process with `service.name=opentelemetry-prometheus-sidecar-supervisor`. There are two kinds of spans: `operation=health-client` and `operation=shutdown-report`. These spans are also tagged with the attribute `sidecar-health`, with values `ok`, `not yet ready`, or `first time ready`.
#### Sidecar Metrics

Metrics from the subordinate process can help identify issues once the first metrics are successfully written. There are three standard host and runtime metrics to monitor:

##### Host and Runtime Metrics

- `process.cpu.time`, with tag (`state`: `user`, `sys`)
- `system.network.io`, with tag (`direction`: `read`, `write`)
- `runtime.go.mem.heap_alloc`
##### Internal Metrics

| Metric Name | Metric Type | Description | Additional Tags |
|---|---|---|---|
| sidecar.connect.duration | histogram | how many attempts to connect (and how long) | (error: true/false) |
| sidecar.export.duration | histogram | how many attempts to export (and how long) | (error: true/false) |
| sidecar.monitor.duration | histogram | how many attempts to scrape Prometheus /metrics (and how long) | (error: true/false) |
| sidecar.metadata.fetch.duration | histogram | how many attempts to fetch metadata from Prometheus (and how long) | (error: true/false) |
| sidecar.queue.outcome | counter | outcome of the sample in the queue | (outcome: success, failed, retry, aborted) |
| sidecar.queue.capacity | gauge | number of available slots for samples (i.e., points) in the queue; counts buffer size times current number of shards | |
| sidecar.queue.running | gauge | number of running shards, those which have not exited | |
| sidecar.queue.shards | gauge | number of current shards, as set by the queue manager | |
| sidecar.queue.size | gauge | number of samples (i.e., points) standing in a queue waiting to export | |
| sidecar.samples.processed | histogram | number of samples (i.e., points) read in a Prometheus WAL batch | |
| sidecar.samples.produced | counter | number of samples (i.e., points) read from the Prometheus WAL | |
| sidecar.series.dropped | counter | number of points dropped because of missing metadata | |
| sidecar.wal.size | gauge | size of the Prometheus WAL | |
| sidecar.wal.offset | gauge | current offset in the Prometheus WAL | |
| sidecar.segment.opens | counter | number of WAL segment open() calls | |
| sidecar.segment.reads | counter | number of WAL segment read() calls | |
| sidecar.segment.bytes | counter | number of WAL segment bytes read | |
## Upstream

This repository was copied into a private repository from this upstream fork of `stackdriver-prometheus-sidecar`, dated July 31, 2020.
## Compatibility

The matrix below lists the versions of Prometheus Server and other dependencies that have been qualified to work with releases of `opentelemetry-prometheus-sidecar`. If the matrix does not list whether two versions are compatible, assume they have not been verified yet but may be compatible. Feel free to contribute to the matrix if you have run the end-to-end test between a version of `opentelemetry-prometheus-sidecar` and Prometheus server.

| Sidecar Version | Compatible Prometheus Server Version(s) | Incompatible Prometheus Server Version(s) |
|---|---|---|
| 0.1.x | 2.10, 2.11, 2.13, 2.15, 2.16, 2.18, 2.19, 2.21, 2.22, 2.23 | 2.5 |