Telemeter
Telemeter is a set of components used for OpenShift remote health monitoring. It allows OpenShift clusters to push telemetry data about clusters to Red Hat, as Prometheus metrics.
telemeter-server needs to receive and send metrics across multiple security boundaries, and thus needs to perform several authentication, authorization and data integrity checks. It (currently) has two endpoints via which it receives metrics and forwards them to an upstream service as a Prometheus remote write request.
Telemeter implements a Prometheus federation push client and server to allow isolated Prometheus instances that cannot be scraped from a central Prometheus to instead perform authorized push federation to a central location.
The telemeter-client is deployed via the OpenShift Cluster Monitoring Operator and performs the following actions via a forwarder.Worker every 4 minutes and 30 seconds (by default):
- On initialization, telemeter-client sends a POST request to the /authorize endpoint of telemeter-server with its configured token (configured via --to-token/--to-token-file) as an auth header and the cluster ID as an id request query param (configured via --id). It exchanges the token for a JWT from this endpoint and also receives a set of labels to include. Each client is uniquely identified by a cluster ID, and all federated metrics are labelled with that ID. For more details on /authorize, see the "/authorize (for telemeter-client)" section.
- It caches this token and the labels in tokenStore and returns an HTTP round tripper. The round tripper checks the validity of the cached token and refreshes it before attaching it to any request it sends to telemeter-server.
- telemeter-client sends a GET request to the /federate endpoint of the in-cluster Prometheus instance and scrapes all metrics (authenticating via --from-ca-file + --from-token/--from-token-file). It retrieves the metrics from the response body and parses them into a []*client_model.MetricFamily. You can also use --match arguments to specify match rules while federating.
- telemeter-client performs some transformations on these collected metrics, to anonymize them, rename them and to add labels provided by the roundtripper tokenStore and CLI args.
- telemeter-client then encodes the metrics (of type []*client_model.MetricFamily) into a POST request body and sends it to the /upload endpoint of telemeter-server, thereby "pushing" metrics.
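The initial token exchange described in the first step can be sketched as follows. This is a minimal illustration rather than the telemeter-client code: the helper name newAuthorizeRequest, the example URL, and the use of the "Bearer" scheme for the auth header are assumptions made for this sketch.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newAuthorizeRequest (hypothetical helper) builds a POST request to the
// /authorize endpoint with the token in the auth header and the cluster ID
// in the "id" query parameter.
func newAuthorizeRequest(serverURL, token, clusterID string) (*http.Request, error) {
	u, err := url.Parse(serverURL)
	if err != nil {
		return nil, err
	}
	// Append the endpoint path without producing a double slash.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/authorize"

	q := u.Query()
	q.Set("id", clusterID)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodPost, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Assumption: the token is sent as a bearer token.
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := newAuthorizeRequest("https://telemeter.example.com", "my-token", "my-cluster-id")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

The request is only constructed here, not sent; a real client would pass it to an http.Client and read the JWT and labels from the response.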
Upon receiving a request at the /upload endpoint, telemeter-server does the following:
- It authorizes the request by inspecting the JWT attached in the auth header, via authorize.NewAuthorizeClientHandler, which uses the jwt.clientAuthorizer struct that implements the authorize.ClientAuthorizer interface, to uniquely identify the telemeter-client.
- If successfully identified, it passes authorize.Client into the request context, from which the cluster ID is extracted later on via the server.ClusterID middleware.
- It then checks whether the cluster that the request came from is under the configured request rate limit.
- If the request is under the rate limit, telemeter-server validates/transforms the metrics encoded in the request by checking the request body size and applying whitelist label matcher rules, label elision (configured via --whitelist and --elide-label) and cluster ID labels. It also overwrites all the timestamps that came with the metric families and records the drift, if any.
- The server then converts the received metric families to []prompb.TimeSeries. During conversion, however, it drops all the timestamps again and overwrites them with the current timestamp. It then marshals the result into a Prometheus remote write request and forwards it to the Observatorium API with an oauth2.Client (configured via OIDC flags), which attaches the correct auth header token after hitting SSO.
/authorize (for telemeter-client)
telemeter-server implements an authorization endpoint for telemeter-client which does the following:
- telemeter-server uses jwt.NewAuthorizeClusterHandler, which accepts POST requests that have an auth header token and an "id" query param.
- This handler uses tollbooth.NewAuthorizer, which implements the authorize.ClusterAuthorizer interface, to authorize that particular cluster. It uses authorize.AgainstEndpoint to send the cluster ID and token as a POST request to the authorization server (configured via --authorize). The authorization server returns a 200 status code if the cluster is identified correctly.
- tollbooth.AuthorizeCluster returns a subject, which is used as the client identifier in a generated signed JWT that is returned to the telemeter-client along with any labels.
telemeter-server also supports receiving remote write requests directly from in-cluster Prometheus (or any Prometheus with the appropriate auth header). In this case, telemeter-client is no longer needed.
Any client sending a remote write request needs to attach a composite token as an auth header to the request, so that telemeter-server can identify which cluster the request belongs to. You can generate the token as follows:
# jq prints the auth value with its JSON quotes, so $AUTH is inserted as-is;
# $CLUSTER_ID is a bare string and has to be quoted to produce valid JSON.
CLUSTER_ID="$(oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}')" && \
AUTH="$(oc get secret pull-secret -n openshift-config --template='{{index .data ".dockerconfigjson" | base64decode}}' | jq '.auths."cloud.openshift.com".auth')" && \
echo -n "{\"authorization_token\":$AUTH,\"cluster_id\":\"$CLUSTER_ID\"}" | base64 -w 0
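For clients that build the token programmatically, the same composite token can be sketched in Go. The field names come from the description in this document; the compositeToken type and the encodeToken/decodeToken helpers are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// compositeToken (hypothetical type) mirrors the JSON document that is
// base64-encoded into the auth header.
type compositeToken struct {
	AuthorizationToken string `json:"authorization_token"`
	ClusterID          string `json:"cluster_id"`
}

// encodeToken marshals the two fields to JSON and base64-encodes the result,
// like piping the JSON through `base64 -w 0`.
func encodeToken(authToken, clusterID string) (string, error) {
	raw, err := json.Marshal(compositeToken{AuthorizationToken: authToken, ClusterID: clusterID})
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

// decodeToken reverses encodeToken and rejects tokens with missing fields.
func decodeToken(encoded string) (compositeToken, error) {
	var t compositeToken
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return t, err
	}
	if err := json.Unmarshal(raw, &t); err != nil {
		return t, err
	}
	if t.AuthorizationToken == "" || t.ClusterID == "" {
		return t, errors.New("token is missing authorization_token or cluster_id")
	}
	return t, nil
}

func main() {
	encoded, err := encodeToken("example-auth-token", "example-cluster-id")
	if err != nil {
		panic(err)
	}
	decoded, err := decodeToken(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded)
	fmt.Println(decoded.ClusterID)
}
```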
The client is also responsible for ensuring that all metrics it sends carry the _id (cluster ID) label. Sending metric metadata is not supported.
Upon receiving a request at this endpoint, telemeter-server does the following:
- telemeter-server parses the bearer token (base64-encoded JSON with "cluster_id" and "authorization_token" fields) via authorize.NewHandler.
- It then sends this as a POST request to the authorization server (configured via --authorize) using authorize.AgainstEndpoint. The authorization server returns a 200 status code if the cluster is identified correctly.
- telemeter-server then checks the request body size and whether all metrics in the remote write request have the cluster ID label (_id by default). It also drops metrics that do not match the whitelist label matchers and elides labels (configured via --whitelist and --elide-label).
- It then forwards the request to the Observatorium API with an oauth2.Client (configured via OIDC flags), which attaches the correct auth header token after hitting SSO.
This is planned to be adopted by CMO.
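The label handling in the remote write steps above can be illustrated with a small sketch. It uses simplified stand-in types rather than the real prompb types, and the function names checkRequiredLabel and elideLabels are hypothetical; the sketch only checks that the cluster ID label is present and removes elided labels.

```go
package main

import "fmt"

// label and timeSeries are simplified stand-ins for the remote write types.
type label struct{ Name, Value string }

type timeSeries struct{ Labels []label }

// checkRequiredLabel returns an error if any series lacks a non-empty label
// with the given name (for example "_id").
func checkRequiredLabel(series []timeSeries, name string) error {
	for i, ts := range series {
		found := false
		for _, l := range ts.Labels {
			if l.Name == name && l.Value != "" {
				found = true
				break
			}
		}
		if !found {
			return fmt.Errorf("series %d is missing the %q label", i, name)
		}
	}
	return nil
}

// elideLabels returns a copy of the series with the named labels removed.
// The input slice is left unmodified.
func elideLabels(series []timeSeries, names ...string) []timeSeries {
	drop := make(map[string]bool, len(names))
	for _, n := range names {
		drop[n] = true
	}
	out := make([]timeSeries, 0, len(series))
	for _, ts := range series {
		kept := make([]label, 0, len(ts.Labels))
		for _, l := range ts.Labels {
			if !drop[l.Name] {
				kept = append(kept, l)
			}
		}
		out = append(out, timeSeries{Labels: kept})
	}
	return out
}

func main() {
	series := []timeSeries{
		{Labels: []label{{Name: "__name__", Value: "up"}, {Name: "_id", Value: "example-cluster"}, {Name: "pod", Value: "example-pod"}}},
	}
	if err := checkRequiredLabel(series, "_id"); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(elideLabels(series, "pod"))
}
```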
Note: Telemeter is alpha and may change significantly.
Get started
To see this in action, run
make test-integration
The command launches a two-instance telemeter-server cluster and a single telemeter-client to talk to that server, along with a Prometheus instance running on http://localhost:9090 that shows the federated metrics.
The client will scrape metrics from the local Prometheus, then send those to the telemeter-server, which will then forward metrics to Thanos Receive, which can be queried via a Thanos Querier.
To build binaries, run
make build
To execute the unit test suite, run
make test-unit
Adding new metrics to send via telemeter
Docs on why and how to send these metrics are available here.
Testing recording rule changes
Run
make test-rules