promdump
promdump dumps the head and persistent blocks of Prometheus. It supports filtering the persistent blocks by time range.
Why This Tool
When debugging Kubernetes clusters with restrictive access, I often find it helpful to get access to the in-cluster Prometheus metrics. To reduce the amount of back-and-forth with users (due to missing metrics, incorrect labels, etc.), it makes sense to ask them to "get me everything around the time of the incident".
The most common way to achieve this is to use commands like kubectl exec and kubectl cp to compress and dump Prometheus' entire data directory. On non-trivial clusters, the resulting compressed file can be very large. To import the data into a local test instance, I will need at least the same amount of disk space.
promdump is a tool that can be used to dump Prometheus data blocks. Unlike the promtool tsdb dump command, its output can be reused in another Prometheus instance. See this issue for a discussion of the limitations of the promtool tsdb dump output. And unlike the Prometheus TSDB snapshot API, promdump doesn't require Prometheus to be started with the --web.enable-admin-api option. Instead of dumping the entire TSDB, promdump offers the flexibility to filter persistent blocks by time range.
How It Works
The promdump CLI downloads the promdump-$(VERSION).tar.gz file from a public storage bucket to your local /tmp folder. The download will be skipped if such a file already exists. The -f option can be used to force a re-download.
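The skip-or-download decision described above can be pictured with a small shell sketch. This is only an illustration under stated assumptions, not promdump's actual implementation: the helper name needs_download, the true/false force argument and the placeholder archive name are all hypothetical.

```shell
# Hypothetical sketch of the cache check; not taken from promdump's source.
# Usage: needs_download <archive-path> <force: true|false>
# Returns 0 when a download is needed, 1 when the cached copy can be reused.
needs_download() {
  archive="$1"
  force="$2"
  if [ "${force}" = "true" ]; then
    return 0   # equivalent of passing -f: always fetch a fresh copy
  fi
  if [ -f "${archive}" ]; then
    return 1   # archive already on disk: skip the download
  fi
  return 0     # nothing cached yet
}

# Example: a scratch directory stands in for /tmp.
cache_dir="$(mktemp -d)"
archive="${cache_dir}/promdump-example.tar.gz"   # placeholder file name

needs_download "${archive}" false && echo "first run: download"
touch "${archive}"
needs_download "${archive}" false || echo "second run: reuse the cached file"
needs_download "${archive}" true && echo "forced: download again"
rm -rf "${cache_dir}"
```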
Then the CLI uploads the decompressed promdump binary to the targeted Prometheus container, via the pod's exec subresource.
Within the Prometheus container, promdump queries the Prometheus TSDB using the tsdb package. It reads and streams the WAL files, head block and persistent blocks to stdout, which can be redirected to a file on your local file system. To regulate the size of the dump, persistent blocks can be filtered by time range.
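To make the time-range filter more concrete, here is a minimal shell sketch that keeps a block when its time range overlaps the requested window. Whether promdump selects blocks by overlap or by strict containment is an assumption here, and the helpers block_overlaps and filter_blocks are hypothetical. Timestamps are Unix milliseconds, the unit Prometheus uses for minTime and maxTime in a block's meta.json.

```shell
# Hypothetical sketch; the selection rule (overlap) is an assumption.
# block_overlaps <block_min> <block_max> <want_min> <want_max>
# Succeeds when [block_min, block_max] intersects [want_min, want_max].
block_overlaps() {
  [ "$1" -le "$4" ] && [ "$2" -ge "$3" ]
}

# filter_blocks <want_min> <want_max>
# Reads "<id> <min_ms> <max_ms>" lines from stdin and prints the ids to keep.
filter_blocks() {
  want_min="$1"
  want_max="$2"
  while read -r id bmin bmax; do
    if block_overlaps "${bmin}" "${bmax}" "${want_min}" "${want_max}"; then
      echo "${id}"
    fi
  done
}

# Example with made-up block ids and time ranges.
filter_blocks 1618456750000 1618480000000 <<'EOF'
block-a 1618456750000 1618466400000
block-b 1618466400000 1618488000000
block-c 1618488000000 1618509600000
EOF
```

With an overlap rule, a block that only partly falls inside the window is still dumped in full, so the dump can contain samples slightly outside the requested range.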
⭐ promdump performs read-only operations on the TSDB.
When the data dump is completed, the promdump binary will be automatically deleted from your Prometheus container.
The restore subcommand can then be used to copy this dump file to another Prometheus container. When this container is restarted, it will reconstruct its in-memory index and chunks using the restored on-disk memory-mapped chunks and WAL.
The --debug option can be used to output more verbose logs for each command.
Getting Started
Install promdump as a kubectl plugin:
# coming soon! until then, see https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/#using-a-plugin
kubectl krew install promdump
For demonstration purposes, use kind to create two K8s clusters:
for i in {0..1}; do \
kind create cluster --name dev-0$i ;\
done
Install Prometheus on both clusters using the community Helm chart:
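# add the chart repository first if it is not already configured
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
for i in {0..1}; do \
helm --kube-context=kind-dev-0$i install prometheus prometheus-community/prometheus ;\
done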
Deploy a custom controller to cluster dev-00. This controller is annotated for metrics scraping:
kubectl --context=kind-dev-00 apply -f https://raw.githubusercontent.com/ihcsim/controllers/master/podlister/deployment.yaml
Port-forward to the Prometheus pod to find the custom demo_http_requests_total metric.
📝 Later, we will use promdump to copy the samples of this metric over to the dev-01 cluster.
CONTEXT="kind-dev-00"
POD_NAME=$(kubectl --context "${CONTEXT}" get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
kubectl --context="${CONTEXT}" port-forward "${POD_NAME}" 9090
📝 In subsequent commands, the -c and -d options can be used to change the container name and data directory.
Dump the data from the first cluster:
# check the tsdb metadata
kubectl promdump meta --context "${CONTEXT}" -p "${POD_NAME}"
Head Block Metadata
------------------------
Minimum time (UTC): | 2021-04-18 18:00:03
Maximum time (UTC): | 2021-04-18 20:34:48
Number of series | 18453
Persistent Blocks Metadata
----------------------------
Minimum time (UTC): | 2021-04-15 03:19:10
Maximum time (UTC): | 2021-04-18 18:00:00
Total number of blocks | 9
Total number of samples | 92561234
Total number of series | 181304
Total size | 139272005
# capture the data dump
TARFILE="dump-$(date +%s).tar.gz"
kubectl promdump \
--context "${CONTEXT}" \
-p "${POD_NAME}" \
--min-time "2021-04-15 03:19:10" \
--max-time "2021-04-18 20:34:48" > "${TARFILE}"
# view the content of the tar file. expect to see the 'chunks_head', 'wal' and
# persistent blocks directories.
tar -tf "${TARFILE}"
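The --min-time and --max-time values used above come straight from the meta output: the minimum time of the persistent blocks and the maximum time of the head block. As a rough sketch, they could be extracted from a saved copy of that output with awk. The helper meta_time is hypothetical, and the parsing assumes the output is laid out exactly as in the sample shown in this README.

```shell
# Hypothetical helper; assumes the meta output format shown in this README.
# meta_time <section-prefix> <Minimum|Maximum>
# Reads `kubectl promdump meta` output on stdin and prints the requested time.
meta_time() {
  awk -v section="$1" -v field="$2 time (UTC):" '
    /Metadata$/ { in_section = (index($0, section) == 1) }
    in_section && index($0, field) == 1 {
      sub(/^[^|]*\|[ \t]*/, "")   # drop everything up to and including "| "
      print
      exit
    }
  '
}

# Sample output copied from the dev-00 cluster above (abbreviated).
sample_meta() {
  cat <<'EOF'
Head Block Metadata
------------------------
Minimum time (UTC): | 2021-04-18 18:00:03
Maximum time (UTC): | 2021-04-18 20:34:48
Number of series | 18453
Persistent Blocks Metadata
----------------------------
Minimum time (UTC): | 2021-04-15 03:19:10
Maximum time (UTC): | 2021-04-18 18:00:00
Total number of blocks | 9
EOF
}

min_time="$(sample_meta | meta_time "Persistent Blocks" Minimum)"
max_time="$(sample_meta | meta_time "Head Block" Maximum)"
echo "--min-time \"${min_time}\" --max-time \"${max_time}\""
```

In practice, sample_meta would be replaced by the real meta subcommand, or by a file that its output was redirected to.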
Restore the data dump to the Prometheus pod on the dev-01 cluster, where we don't have the custom controller:
CONTEXT="kind-dev-01"
POD_NAME=$(kubectl --context "${CONTEXT}" get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
# check the tsdb metadata
kubectl promdump meta --context "${CONTEXT}" -p "${POD_NAME}"
Head Block Metadata
------------------------
Minimum time (UTC): | 2021-04-18 20:39:21
Maximum time (UTC): | 2021-04-18 20:47:30
Number of series | 20390
No persistent blocks found
# restore the data dump found at ${TARFILE}
kubectl promdump restore \
--context="${CONTEXT}" \
-p "${POD_NAME}" \
-t "${TARFILE}"
# check the metadata again. it should match that of the dev-00 cluster
kubectl promdump meta --context "${CONTEXT}" -p "${POD_NAME}"
Head Block Metadata
------------------------
Minimum time (UTC): | 2021-04-18 18:00:03
Maximum time (UTC): | 2021-04-18 20:35:48
Number of series | 18453
Persistent Blocks Metadata
----------------------------
Minimum time (UTC): | 2021-04-15 03:19:10
Maximum time (UTC): | 2021-04-18 18:00:00
Total number of blocks | 9
Total number of samples | 92561234
Total number of series | 181304
Total size | 139272005
# confirm that the WAL, head and persistent blocks are copied to the targeted
# Prometheus server
kubectl --context="${CONTEXT}" exec "${POD_NAME}" -c prometheus-server -- ls -al /data
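Comparing the two meta outputs by eye gets tedious, so here is a hedged sketch that compares only the persistent blocks section of two saved outputs. The helpers persistent_section and same_persistent_blocks are hypothetical, and the file names are placeholders. The head block section is deliberately ignored because it can differ slightly between the two instances.

```shell
# Hypothetical helpers; they assume the meta output format shown in this README.
# persistent_section <file>: print everything from the persistent blocks
# heading to the end of a saved `kubectl promdump meta` output.
persistent_section() {
  awk '/^Persistent Blocks Metadata/ { found = 1 } found { print }' "$1"
}

# same_persistent_blocks <source-file> <target-file>
# Succeeds only when both files have a persistent blocks section and the two
# sections are identical.
same_persistent_blocks() {
  src="$(persistent_section "$1")"
  dst="$(persistent_section "$2")"
  [ -n "${src}" ] && [ "${src}" = "${dst}" ]
}

# Example with abbreviated sample outputs saved to scratch files.
work_dir="$(mktemp -d)"
cat > "${work_dir}/dev-00.txt" <<'EOF'
Head Block Metadata
------------------------
Maximum time (UTC): | 2021-04-18 20:34:48
Persistent Blocks Metadata
----------------------------
Total number of blocks | 9
Total number of samples | 92561234
EOF
cat > "${work_dir}/dev-01.txt" <<'EOF'
Head Block Metadata
------------------------
Maximum time (UTC): | 2021-04-18 20:35:48
Persistent Blocks Metadata
----------------------------
Total number of blocks | 9
Total number of samples | 92561234
EOF

if same_persistent_blocks "${work_dir}/dev-00.txt" "${work_dir}/dev-01.txt"; then
  echo "persistent blocks match"
else
  echo "persistent blocks differ"
fi
rm -rf "${work_dir}"
```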
Restart the Prometheus pod:
kubectl --context="${CONTEXT}" delete po "${POD_NAME}"
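Port-forward to the pod to confirm that the samples of the demo_http_requests_total metric have been copied over. If the pod is managed by a Deployment, the replacement pod gets a new name, so look it up again first:
POD_NAME=$(kubectl --context "${CONTEXT}" get pods --namespace default -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
kubectl --context="${CONTEXT}" port-forward "${POD_NAME}" 9091:9090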
Make sure that the time frame of your query matches that of the restored data.
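One way to pin the query to the restored time frame is to call the Prometheus HTTP API directly, which accepts start and end as Unix timestamps on its /api/v1/query_range endpoint. The sketch below only builds the URL. The helpers to_epoch and query_range_url are hypothetical, the 60s step is an arbitrary choice, and the date invocation assumes GNU date.

```shell
# Hypothetical helpers; `date -u -d` assumes GNU date.
# to_epoch "<YYYY-MM-DD hh:mm:ss>": interpret the time as UTC, print Unix seconds.
to_epoch() {
  date -u -d "$1" +%s
}

# query_range_url <base-url> <metric> <start-utc> <end-utc> <step>
# A bare metric name needs no URL encoding; other expressions would.
query_range_url() {
  printf '%s/api/v1/query_range?query=%s&start=%s&end=%s&step=%s\n' \
    "$1" "$2" "$(to_epoch "$3")" "$(to_epoch "$4")" "$5"
}

# Example: the head block time range from the dump above, against the
# port-forwarded address used in this walkthrough.
query_range_url "http://localhost:9091" demo_http_requests_total \
  "2021-04-18 18:00:03" "2021-04-18 20:34:48" 60s
```

The printed URL can be passed to curl while the port-forward is running.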
FAQ
Q: I am not seeing the restored data.
A: There are a few things you can check:
- When generating the dump, make sure the start and end date times are specified in the UTC time zone.
- If using the Prometheus console, make sure the time filter falls within the time range of your data dump. You can confirm the time range of your restored data using the kubectl promdump meta subcommand.
- To see if the restore completed successfully, compare the TSDB metadata of the target Prometheus with that of the source Prometheus, using the meta subcommand. The head block metadata may deviate slightly depending on how old your data dump is.
- Use commands like kubectl exec to run commands like ls -al <data_dir> and cat <data_dir>/<data_block>/meta.json to confirm the data range of a particular data block.
- Try restarting the Prometheus pod after the restoration completes to give Prometheus a chance to replay the restored WALs. The restored data must be persisted to survive a restart.
- Check the Prometheus logs to see if there are any errors due to corrupted data blocks.
- Run the restore subcommand with the --debug flag to see if that provides more information.
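A block's meta.json stores its minTime and maxTime as Unix milliseconds, which are hard to compare with the UTC times used elsewhere in this README. The sketch below converts them. It is a minimal illustration: json_number, ms_to_utc and block_time_range are hypothetical helpers, the sample file is made up (its ulid value is a placeholder), and the date invocation assumes GNU date.

```shell
# Hypothetical helpers; `date -u -d @<seconds>` assumes GNU date.
# json_number <key> <file>: print the first numeric value stored under <key>.
json_number() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$2" | head -n 1
}

# ms_to_utc <milliseconds>: format a Unix timestamp in milliseconds as UTC.
ms_to_utc() {
  date -u -d "@$(( $1 / 1000 ))" '+%Y-%m-%d %H:%M:%S'
}

# block_time_range <meta.json>: print "<min UTC> -> <max UTC>" for one block.
block_time_range() {
  echo "$(ms_to_utc "$(json_number minTime "$1")") -> $(ms_to_utc "$(json_number maxTime "$1")")"
}

# Example with a made-up, abbreviated meta.json.
meta_file="$(mktemp)"
cat > "${meta_file}" <<'EOF'
{
  "ulid": "EXAMPLE-BLOCK-ID",
  "minTime": 1618456750000,
  "maxTime": 1618768800000,
  "version": 1
}
EOF
block_time_range "${meta_file}"
rm -f "${meta_file}"
```

To use it against a real block, first save the output of the cat command from the list above to a local file.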
Limitations
promdump is still in its experimental phase. It is used mainly to help with debugging issues, where data blocks are copied from one Prometheus instance to another development instance. Before restoring the data dump, promdump deletes the content of the data folder in the targeted Prometheus instance, to avoid corrupting the data blocks with conflicting segment errors such as:
opening storage failed: get segment range: segments are not sequential
It's not suitable for production backup/restore operations.
Like kubectl cp, promdump requires the tar binary in the Prometheus container.
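As background on the "segments are not sequential" error quoted above: Prometheus names its WAL segment files with zero-padded, consecutive numbers, and mixing segments from two instances can leave gaps in that sequence. The sketch below checks a directory for such gaps. The helper segments_sequential is hypothetical and is not how promdump or Prometheus performs the check. It assumes eight-digit segment names and ignores every other entry.

```shell
# Hypothetical sketch; assumes eight-digit, zero-padded segment file names.
# segments_sequential <wal-dir>
# Succeeds when the numbered segment files form a consecutive sequence.
segments_sequential() {
  prev=""
  for name in $(ls "$1" | grep -E '^[0-9]{8}$' | sort); do
    # Strip the zero padding so the value is not read as an octal number.
    n="$(echo "${name}" | sed 's/^0*//')"
    n="${n:-0}"
    if [ -n "${prev}" ] && [ "${n}" -ne $((prev + 1)) ]; then
      echo "gap between segments ${prev} and ${n}"
      return 1
    fi
    prev="${n}"
  done
  return 0
}

# Example: a scratch directory with segment 2 missing.
wal_dir="$(mktemp -d)"
touch "${wal_dir}/00000000" "${wal_dir}/00000001" "${wal_dir}/00000003"
if segments_sequential "${wal_dir}"; then
  echo "segments are sequential"
fi
rm -rf "${wal_dir}"
```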
Development
To run linters and unit tests:
make lint test
To produce local builds:
# the kubectl CLI plugin
make cli
# the promdump core
make core
To install Prometheus via Helm:
make hack/prometheus
To do a release:
git tag -a v$version
make dist release
Note that the GitHub Actions pipeline uses the same make release targets.
License
Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.