# Backup Sidecar
This Docker image is intended to run as a sidecar alongside database servers, taking regular backups on a schedule and uploading them to remote object storage.
## How To Use
### Prerequisites And Assumptions
Currently this image supports PostgreSQL, MySQL, MariaDB and MongoDB. In general, it supports any database for which the installed versions of `mysqldump`, `mongodump` or `pg_dump`/`pg_dumpall` work.
### Supported Targets
We tested with Minio S3 servers (edge), but it should work with any S3-compatible service. Google Cloud Storage (GCS) has also been tested and is supported.
Note: Only one target type is supported as of now. GCS takes precedence (and S3 is ignored) if the relevant `*gcs*` flags/env vars have been set.
### Setup & Configuration
Docker image:

```
docker pull registry.gitlab.com/mironet/magnolia-backup:latest
```
Set these environment variables to get the image going in server mode (the intended default):
```yaml
environment:
  - MGNLBACKUP_ADDRESS=:9999 # Listen on this address for HTTP endpoints.
  # S3-related config:
  - MGNLBACKUP_BUCKET=backup
  - MGNLBACKUP_S3_ACCESSKEY=minio
  - MGNLBACKUP_S3_SECRETKEY=minio123
  - MGNLBACKUP_S3_ENDPOINT=<s3-server>:9000
  # GCS-related config:
  - MGNLBACKUP_GCS_PROJECTID=project-id
  - GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
  # Cron config:
  - MGNLBACKUP_CRON=@every 24h # see https://godoc.org/github.com/robfig/cron for details
  # Commands to get an SQL dump:
  - MGNLBACKUP_CMD=pg_dumpall
  - MGNLBACKUP_ARGS=-h,localhost,-U,magnolia
```
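As a fuller illustration, the sidecar pattern in a Docker Compose file might look like the following sketch. Service names (`db`, `backup`), credentials and the Minio target are assumptions for the example, not required values:

```yaml
version: "3"
services:
  db:
    image: postgres:12
    environment:
      - POSTGRES_USER=magnolia
      - POSTGRES_PASSWORD=secret
  backup:
    image: registry.gitlab.com/mironet/magnolia-backup:latest
    environment:
      - MGNLBACKUP_ADDRESS=:9999
      - MGNLBACKUP_BUCKET=backup
      - MGNLBACKUP_S3_ACCESSKEY=minio
      - MGNLBACKUP_S3_SECRETKEY=minio123
      - MGNLBACKUP_S3_ENDPOINT=minio:9000
      - MGNLBACKUP_CRON=@every 24h
      - MGNLBACKUP_CMD=pg_dumpall
      - MGNLBACKUP_ARGS=-h,db,-U,magnolia
```

Note that the dump command runs inside the sidecar container and connects to the database over the network, so the host in `MGNLBACKUP_ARGS` is the database service name.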
## Available Environment Variables
### General Configuration
| Variable | Values/Types | Required | Default | Description |
|---|---|---|---|---|
| `MGNLBACKUP_ADDRESS` | `:9999` | yes | `:9999` | Listen on this address for HTTP endpoints. |
| `MGNLBACKUP_CMD` | `pg_dumpall`, `mysqldump`, `mongodump` | yes | | The command to run to produce a dump. |
| `MGNLBACKUP_ARGS` | `--host localhost --user myuser` | yes | | The arguments passed to the command. Working examples: `mysqldump`: `-h <HOST> -u <USER> -p<PASSWORD> <DATABASE>`; `mongodump`: `--uri="mongodb://<USER>:<PASSWORD>@<HOST>:27017/<DATABASE>" --archive` |
| `MGNLBACKUP_LOGLEVEL` | `error`, `warn`, `info`, `debug`, `trace` | no | `info` | Controls verbosity to help with debugging; `trace` is the most verbose option. |
### Cron Configuration
| Variable | Values/Types | Required | Default | Description |
|---|---|---|---|---|
| `MGNLBACKUP_CRON` | `@every 24h` | yes | `@every 24h` | See robfig/cron for more details. Classic cron expressions like `0 * * * *` are supported as well. |
| `MGNLBACKUP_CRON_TIMEOUT` | `time.Duration` | yes | `10m` | Timeout (maximum duration) for a single cron job. |
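Both robfig/cron specs and classic five-field cron expressions can be used. A few illustrative settings (values are examples, not recommendations):

```
MGNLBACKUP_CRON=@every 24h   # every 24 hours, counted from startup
MGNLBACKUP_CRON=0 3 * * *    # every day at 03:00
MGNLBACKUP_CRON_TIMEOUT=30m  # abort a backup job that runs longer than 30 minutes
```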
### General Object Storage Configuration
| Variable | Values/Types | Required | Default | Description |
|---|---|---|---|---|
| `MGNLBACKUP_BUCKET` | string | yes | | The name of the bucket. |
| `MGNLBACKUP_PREFIX` | string | no | | String prepended to the date in object names on object storage. |
| `MGNLBACKUP_CYCLE` | string | no | `15,4,3` | Backup retention cycle in the format `daily,weekly,monthly`. |
| `MGNLBACKUP_KEEPDAYS` | int | no | `15` | Keep at most this many days of backups; equivalent to `15,0,0` for `MGNLBACKUP_CYCLE`. |
| `MGNLBACKUP_HERITAGE` | string | no | `magnolia-backup-<version>` | Only objects in the bucket with this user tag are touched when recycling backups. New objects also get this tag. |
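To illustrate the `MGNLBACKUP_CYCLE` format, here is a small parsing sketch. The helper `parse_cycle` is hypothetical and not part of the image; it only demonstrates how the three comma-separated counts are read:

```python
def parse_cycle(cycle: str) -> tuple[int, int, int]:
    """Split a '<daily>,<weekly>,<monthly>' retention spec into counts."""
    daily, weekly, monthly = (int(n) for n in cycle.split(","))
    return daily, weekly, monthly

# The default keeps 15 daily, 4 weekly and 3 monthly backups.
print(parse_cycle("15,4,3"))  # -> (15, 4, 3)

# MGNLBACKUP_KEEPDAYS=15 is shorthand for a cycle of 15,0,0.
print(parse_cycle("15,0,0"))  # -> (15, 0, 0)
```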
### S3 Configuration
| Variable | Values/Types | Required | Default | Description |
|---|---|---|---|---|
| `MGNLBACKUP_S3_ACCESSKEY` | string | yes | | A valid access key. In the case of AWS, create an API user. |
| `MGNLBACKUP_S3_SECRETKEY` | string | yes | | The secret for the given access key. |
| `MGNLBACKUP_S3_ENDPOINT` | string | yes | `minio:9000` | The endpoint may include a port, e.g. `s3.eu-central-1.amazonaws.com` for AWS S3. |
| `MGNLBACKUP_S3_INSECURE` | `true`, `false` | no | | If `true`, connect to the target without using TLS (!). |
### Google Cloud Storage (GCS) Configuration
| Variable | Values/Types | Required | Default | Description |
|---|---|---|---|---|
| `MGNLBACKUP_GCS_PROJECTID` | string | yes | | You can find this ID in the Google Cloud Console. |
| `GOOGLE_APPLICATION_CREDENTIALS` | string | yes | | The path to the JSON file with the private key. |
Note: If you use GCS and run this as a Docker container, make sure to mount or copy the key file into the container, or use a secrets manager such as HashiCorp Vault.
## Features
After startup, the server runs the given command and arguments to produce a (database) dump. It then uploads the output of this dump as a gzip-compressed file to the target bucket according to the given cron schedule. Only one cron job can run at a time.
Environment variables are copied to the command being executed and expanded in any arguments passed to the command.
When a new backup is created, a sweep of the older backups is performed to clean out stale backups according to the `MGNLBACKUP_CYCLE` configuration.
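Conceptually, a scheduled run boils down to piping the dump command through gzip and uploading the result. The following shell sketch uses a stand-in `echo` instead of a real dump command; the file name and path are made up for the example:

```shell
set -eu
out=/tmp/demo-backup.sql.gz
# The real image effectively runs: $MGNLBACKUP_CMD $MGNLBACKUP_ARGS | gzip,
# then uploads the compressed stream to the configured bucket.
echo "SELECT 1;" | gzip > "$out"
gunzip -c "$out"   # prints the stand-in dump: SELECT 1;
```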
You can get a list of all dumps taken over time and download them directly.
## PostgreSQL WAL Archiving
In combination with PostgreSQL, the mode can be switched to WAL archiving. This provides better backup performance and adds point-in-time recovery on top of it. Depending on the size of the database, it might even be impossible or impractical to use `pg_dump` as a backup plan. This mode is PostgreSQL-specific, does not work with other databases, and requires at least PostgreSQL 9.
### Mode of Operation
Instead of periodically backing up a fresh dump of the whole database, in this mode a base backup is taken periodically according to `MGNLBACKUP_CRON`. This should be set to a large interval, such as every month, depending on the write volume the database is seeing. With PostgreSQL this procedure also does not seem to interfere much with the source database's performance, whereas `pg_dump` uses all available resources to finish as fast as possible.
After that, PostgreSQL is instructed to copy its transaction log each time a certain amount of data has accumulated (16 MiB by default). Each log file is then uploaded to object storage and deleted from the monitored archive folder. To restore, the base backup plus all transaction logs up to the desired point in time are needed.
### Configuration
The database needs to know how to copy the transaction log regularly, and we need to specify the monitored archive folder from which those logs are picked up after they have been copied. For PostgreSQL, the following is an example config:
```yaml
MGNLBACKUP_CMD: pg_basebackup
MGNLBACKUP_ARGS: '--host localhost --username $$POSTGRES_USER -D /scratch -Fp'
MGNLBACKUP_USE_PG_WAL: "true" # Uses tx log shipping/archiving.
MGNLBACKUP_SYNC_DIR: "/archive"
MGNLBACKUP_NO_STDOUT: "true"
```
In this case the monitored folder is `/archive`, which should be persistent in case an upload is interrupted. There is also the `/scratch` folder in this example; this directory is used as temporary space for the base backup and does not need to be persistent.
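On the PostgreSQL side, the server must be told to ship its transaction log into the monitored folder. A sketch of the relevant `postgresql.conf` settings follows; the `/archive` path matches the example above, but the exact values depend on your PostgreSQL version and setup:

```
wal_level = replica                # at least 'replica' is needed for WAL archiving
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
```

The `test ! -f` guard is a common safety measure so an already-archived segment is never overwritten.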
## HTTP Endpoints
There are a few endpoints you can use to fetch data or trigger actions.
| Location | Effect |
|---|---|
| `/list` | Shows the backup list page. If you send the `Accept: application/json` header in requests to this endpoint, you get the same information as JSON output. |
| `/dump` | Takes a single backup right now. Expected response is `202 Accepted` or `200 OK`; everything else is not ok :) |
| `/metrics` | Dumps metrics in Prometheus format. |
## Monitoring
In server mode this tool exposes a few metrics on the specified endpoint: the default `go_.*` and `process_.*` metrics, as well as `mgnlbackup_.*` metrics like total backup size on object storage, backup duration, count and errors.
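A minimal Prometheus scrape config for this endpoint might look like the following sketch; the job name and target host are assumptions for the example:

```yaml
scrape_configs:
  - job_name: magnolia-backup
    static_configs:
      - targets: ["backup-sidecar:9999"]
```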
## Examples
### Triggering a single database backup job
```
$ curl localhost:9999/dump?sync=true
{"name":"myprefix2020-01-16-155805.sql.gz","size":1048919}
```
### Scrape metrics
```
$ curl -v http://localhost:9999/metrics | grep -C 2 mgnl
# TYPE go_threads gauge
go_threads 10
# HELP mgnlbackup_backups The total number of daily backups.
# TYPE mgnlbackup_backups gauge
mgnlbackup_backups{interval="daily"} 2
mgnlbackup_backups{interval="monthly"} 1
mgnlbackup_backups{interval="weekly"} 1
# HELP mgnlbackup_bytes Total byte size of all backups combined in target storage.
# TYPE mgnlbackup_bytes gauge
mgnlbackup_bytes 3.7760302e+07
# HELP mgnlbackup_errors Number of erroneous, not completed backup jobs.
# TYPE mgnlbackup_errors counter
mgnlbackup_errors 1
# HELP mgnlbackup_seconds Time taken for backup jobs.
# TYPE mgnlbackup_seconds summary
mgnlbackup_seconds_sum 77.9812096
mgnlbackup_seconds_count 3
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
```
### Getting the current backup list in JSON format
```
$ curl -H 'Accept: application/json' http://localhost:9999/list | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2356    0  2356    0     0   176k      0 --:--:-- --:--:-- --:--:--  176k
[
  {
    "link": "http://minio:9000/backup/myprefix2020-05-10-143004.sql.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20200510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200510T144815Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22myprefix2020-05-10-143004.sql.gz%22&X-Amz-Signature=9d9ea696bf75037a0fdf121f1e0574383098d84f5452fdef8e0a486a2293bc9c",
    "name": "myprefix2020-05-10-143004.sql.gz",
    "last_modified": "2020-05-10T14:30:04.05Z",
    "size": 1048919,
    "daily": true,
    "weekly": true,
    "monthly": true
  },
  {
    "link": "http://minio:9000/backup/myprefix2020-05-10-143046.sql.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20200510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200510T144815Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22myprefix2020-05-10-143046.sql.gz%22&X-Amz-Signature=a786174e8922bf3f5337e533a989c84c843fc5f7f9d9fe1bd2c8fecfb83ee6e3",
    "name": "myprefix2020-05-10-143046.sql.gz",
    "last_modified": "2020-05-10T14:30:47Z",
    "size": 1048919,
    "daily": true,
    "weekly": false,
    "monthly": false
  },
  {
    "link": "http://minio:9000/backup/myprefix2020-05-10-143310.sql.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20200510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200510T144815Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22myprefix2020-05-10-143310.sql.gz%22&X-Amz-Signature=ead1549ba7c89f90c1d1b8732fbb9c4701c4d362fffc862f408a1d0fc0c88fd7",
    "name": "myprefix2020-05-10-143310.sql.gz",
    "last_modified": "2020-05-10T14:33:10.9Z",
    "size": 1048919,
    "daily": true,
    "weekly": false,
    "monthly": false
  },
  {
    "link": "http://minio:9000/backup/myprefix2020-05-10-143745.sql.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20200510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200510T144815Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22myprefix2020-05-10-143745.sql.gz%22&X-Amz-Signature=c53a0fc2748f14fe4b80b162432c281f30cbe74db36642480c934beb7a91d815",
    "name": "myprefix2020-05-10-143745.sql.gz",
    "last_modified": "2020-05-10T14:37:45.43Z",
    "size": 1048919,
    "daily": true,
    "weekly": false,
    "monthly": false
  }
]
```