magnolia-backup

Backup Sidecar

This Docker image is intended to be used as a sidecar to database servers: it takes regular backups on a schedule and uploads them to a remote object storage.

How To Use

Prerequisites And Assumptions

Currently this image supports PostgreSQL, MySQL, MariaDB and MongoDB. In general, it supports databases for which the installed versions of mysqldump, mongodump, or pg_dump/pg_dumpall work.

Supported Targets

We tested against MinIO S3 servers (edge), but it should work with any S3-compatible service.

Google Cloud Storage (GCS) has also been tested and is supported.

Note: As of now, only one target type is supported. GCS takes precedence (and S3 is ignored) if the relevant *gcs* flags/env vars have been set.

Setup & Configuration

Docker image:

docker pull registry.gitlab.com/mironet/magnolia-backup:latest

Set these environment variables to get the image going in server mode (the intended default).

environment:
  MGNLBACKUP_ADDRESS=:9999 # Listen on this address for HTTP endpoints.
  # S3-related config:
  MGNLBACKUP_BUCKET=backup
  MGNLBACKUP_S3_ACCESSKEY=minio
  MGNLBACKUP_S3_SECRETKEY=minio123
  MGNLBACKUP_S3_ENDPOINT=<s3-server>:9000
  # GCS-related config:
  MGNLBACKUP_GCS_PROJECTID=project-id
  GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
  # Cron config:
  MGNLBACKUP_CRON=@every 24h # see https://godoc.org/github.com/robfig/cron for details
  # Commands to get an SQL dump
  MGNLBACKUP_CMD=pg_dumpall
  MGNLBACKUP_ARGS=-h,localhost,-U,magnolia
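
As an illustration, a minimal docker-compose sketch running the image as a sidecar next to a PostgreSQL container could look like this (service names, credentials and the minio endpoint are made up for the example):

version: "3"
services:
  db:
    image: postgres:12
    environment:
      POSTGRES_USER: magnolia
      POSTGRES_PASSWORD: secret
  backup:
    image: registry.gitlab.com/mironet/magnolia-backup:latest
    environment:
      MGNLBACKUP_ADDRESS: ":9999"
      MGNLBACKUP_BUCKET: backup
      MGNLBACKUP_S3_ACCESSKEY: minio
      MGNLBACKUP_S3_SECRETKEY: minio123
      MGNLBACKUP_S3_ENDPOINT: minio:9000
      MGNLBACKUP_CRON: "@every 24h"
      MGNLBACKUP_CMD: pg_dumpall
      MGNLBACKUP_ARGS: -h,db,-U,magnolia
      PGPASSWORD: secret # copied into the dump command's environment (see Features below)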

Available Environment Variables

General Configuration

| Variable | Values/Types | Required | Default | Description |
|----------|--------------|----------|---------|-------------|
| MGNLBACKUP_ADDRESS | :9999 | yes | :9999 | Listen on this address for HTTP endpoints. |
| MGNLBACKUP_CMD | pg_dumpall, mysqldump, mongodump | yes | | The command which should be run. |
| MGNLBACKUP_ARGS | --host localhost --user myuser | yes | | The arguments which should be passed to the command. Working examples: mysqldump: -h <HOST> -u <USER> -p<PASSWORD> <DATABASE>; mongodump: --uri="mongodb://<USER>:<PASSWORD>@<HOST>:27017/<DATABASE>" --archive |
| MGNLBACKUP_LOGLEVEL | error, warn, info, debug, trace | no | info | Controls verbosity to help debugging; trace is the most verbose option. |
Cron Configuration

| Variable | Values/Types | Required | Default | Description |
|----------|--------------|----------|---------|-------------|
| MGNLBACKUP_CRON | @every 24h | yes | @every 24h | See robfig/cron for details. Standard cron expressions like 0 * * * * are supported as well. |
| MGNLBACKUP_CRON_TIMEOUT | time.Duration | yes | 10m | Timeout (maximum duration) for a single cron job. |
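
Both notations work; for example (interval syntax and classic five-field cron, as supported by robfig/cron):

MGNLBACKUP_CRON=@every 24h # interval notation
MGNLBACKUP_CRON=0 3 * * *  # classic cron: every day at 03:00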
General Object Storage Configuration

| Variable | Values/Types | Required | Default | Description |
|----------|--------------|----------|---------|-------------|
| MGNLBACKUP_BUCKET | string | yes | | The name of the bucket. |
| MGNLBACKUP_PREFIX | string | no | | String prepended to the date in object names on object storage. |
| MGNLBACKUP_CYCLE | string | no | 15,4,3 | Backup retention cycle in the format daily,weekly,monthly. |
| MGNLBACKUP_KEEPDAYS | int | no | 15 | Keep at most this many days of backups; equivalent to 15,0,0 for MGNLBACKUP_CYCLE. |
| MGNLBACKUP_HERITAGE | string | no | magnolia-backup-<version> | Only objects in the bucket with this user tag are touched when recycling backups. New objects will also carry this tag. |
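
For example, a custom retention could look like this (reading the three fields as retention counts, which matches the daily/weekly/monthly flags shown in the backup list output further below):

MGNLBACKUP_CYCLE=30,8,6 # keep 30 daily, 8 weekly and 6 monthly backups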
S3 Configuration

| Variable | Values/Types | Required | Default | Description |
|----------|--------------|----------|---------|-------------|
| MGNLBACKUP_S3_ACCESSKEY | string | yes | | A valid access key. In the case of AWS, create an API user. |
| MGNLBACKUP_S3_SECRETKEY | string | yes | | The secret for the given access key. |
| MGNLBACKUP_S3_ENDPOINT | string | yes | minio:9000 | The endpoint, which may include a port; an example for AWS S3 is s3.eu-central-1.amazonaws.com. |
| MGNLBACKUP_S3_INSECURE | true,false | no | | If true, connect to the target without using TLS (!) |
Google Cloud Storage (GCS) Configuration

| Variable | Values/Types | Required | Default | Description |
|----------|--------------|----------|---------|-------------|
| MGNLBACKUP_GCS_PROJECTID | string | yes | | You can find this ID in the Google Console. |
| GOOGLE_APPLICATION_CREDENTIALS | string | yes | | The path to the JSON file with the private key. |

Note: If you use GCS and run this as a Docker container, make sure to mount or copy the key file into the container, or use a KMS like HashiCorp Vault.
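
For example, mounting the key file into the container could look like this (host path and bucket name are illustrative):

docker run -d \
  -v /host/secrets/key.json:/secrets/key.json:ro \
  -e MGNLBACKUP_GCS_PROJECTID=project-id \
  -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/key.json \
  -e MGNLBACKUP_BUCKET=backup \
  registry.gitlab.com/mironet/magnolia-backup:latest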

Features

After startup, the server tries to run the given command and arguments to produce a (database) dump. It then uploads the output of this dump as a gzip-compressed file to the target bucket according to the given cron schedule. Only one cron job can run at a time.

Environment variables are copied to the command being executed and expanded in any arguments passed to the command.
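
For example, a variable can be referenced in the arguments and resolved before the dump command runs (DB_HOST is made up for this sketch):

DB_HOST=db.example.com
MGNLBACKUP_CMD=pg_dump
MGNLBACKUP_ARGS=-h,$DB_HOST,-U,magnolia,mydb # $DB_HOST is expanded before execution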

When a new backup is created, a sweep of the older backups is performed to clean out stale backups according to the MGNLBACKUP_CYCLE configuration.

You can get a list of all dumps taken over time and download them directly.

[Screenshot: backup list page]

PostgreSQL WAL Archiving

In combination with PostgreSQL, the mode can be switched to WAL archiving. This provides better backup performance and point-in-time recovery on top of it. Depending on the size of the database, it might even be impossible or impractical to use pg_dump as a backup plan. This mode is PostgreSQL-specific and doesn't work with other databases. It requires at least PostgreSQL 9.

Mode of Operation

Instead of periodically dumping the whole database, in this mode a base backup is taken periodically according to MGNLBACKUP_CRON. This should be set to a long interval, such as every month, depending on the write volume the database is seeing. With PostgreSQL this procedure does not seem to interfere much with the source database's performance, whereas pg_dump uses all available resources to finish as fast as possible.
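
For example, a monthly base backup can be scheduled like this (in robfig/cron, @monthly is shorthand for 0 0 1 * *, i.e. midnight on the first of the month):

MGNLBACKUP_CRON=@monthly # take a fresh base backup once a month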

WAL Archiving

After that, PostgreSQL is instructed to copy its transaction logs whenever a certain amount of data has been written (16 MiB by default). Each log file is then uploaded to the object storage and deleted from the monitored archive folder. To restore, the base backup and all transaction logs up to the desired point in time are needed.

Configuration

The database needs to know how to copy the transaction log regularly, and we need to specify the monitored archive folder to pick up those logs after they have been copied. For PostgreSQL, the following is an example config:

MGNLBACKUP_CMD: pg_basebackup
MGNLBACKUP_ARGS: '--host localhost --username $$POSTGRES_USER -D /scratch -Fp'
MGNLBACKUP_USE_PG_WAL: "true" # Uses tx log shipping/archiving.
MGNLBACKUP_SYNC_DIR: "/archive"
MGNLBACKUP_NO_STDOUT: "true"

In this case the monitored folder is /archive, which should be persistent in case an upload is interrupted. There's also the /scratch folder in this example; it is used as temporary space for the base backup and doesn't need to be persistent.
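
On the PostgreSQL side, WAL archiving itself must be enabled so that finished log segments end up in the monitored folder. A minimal sketch following the standard PostgreSQL documentation (this is PostgreSQL configuration, not part of this image; the exact archive_command depends on your setup):

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'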

HTTP Endpoints

There are a few endpoints you can use to fetch data or alter the configuration.

| Location | Effect |
|----------|--------|
| /list | Shows the backup list page above. If you send the Accept: application/json header in requests to this endpoint, you get the same information as JSON output. |
| /dump | Takes a single backup right now. Expected response is 202 Accepted or 200 OK, everything else is not ok :) |
| /metrics | Dumps metrics in Prometheus format. |

Monitoring

In server mode this tool exposes a few metrics on the specified endpoint: the default go_.* and process_.* metrics, as well as mgnlbackup_.* metrics like total backup size on object storage, backup duration, count, and errors.
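
A minimal Prometheus scrape config picking these up might look like this (job name and target are illustrative):

scrape_configs:
  - job_name: magnolia-backup
    static_configs:
      - targets: ['backup-sidecar:9999']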

Examples

Triggering a single database backup job
$ curl localhost:9999/dump?sync=true
{"name":"myprefix2020-01-16-155805.sql.gz","size":1048919}
Scrape metrics
$ curl -v http://localhost:9999/metrics | grep -C 2 mgnl
# TYPE go_threads gauge
go_threads 10
# HELP mgnlbackup_backups The total number of daily backups.
# TYPE mgnlbackup_backups gauge
mgnlbackup_backups{interval="daily"} 2
mgnlbackup_backups{interval="monthly"} 1
mgnlbackup_backups{interval="weekly"} 1
# HELP mgnlbackup_bytes Total byte size of all backups combined in target storage.
# TYPE mgnlbackup_bytes gauge
mgnlbackup_bytes 3.7760302e+07
# HELP mgnlbackup_errors Number of erroneous, not completed backup jobs.
# TYPE mgnlbackup_errors counter
mgnlbackup_errors 1
# HELP mgnlbackup_seconds Time taken for backup jobs.
# TYPE mgnlbackup_seconds summary
mgnlbackup_seconds_sum 77.9812096
mgnlbackup_seconds_count 3
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
Getting the current backup list in JSON format
$ curl -H 'Accept: application/json' http://localhost:9999/list | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2356    0  2356    0     0   176k      0 --:--:-- --:--:-- --:--:--  176k
[
  {
    "link": "http://minio:9000/backup/myprefix2020-05-10-143004.sql.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20200510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200510T144815Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22myprefix2020-05-10-143004.sql.gz%22&X-Amz-Signature=9d9ea696bf75037a0fdf121f1e0574383098d84f5452fdef8e0a486a2293bc9c",
    "name": "myprefix2020-05-10-143004.sql.gz",
    "last_modified": "2020-05-10T14:30:04.05Z",
    "size": 1048919,
    "daily": true,
    "weekly": true,
    "monthly": true
  },
  {
    "link": "http://minio:9000/backup/myprefix2020-05-10-143046.sql.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20200510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200510T144815Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22myprefix2020-05-10-143046.sql.gz%22&X-Amz-Signature=a786174e8922bf3f5337e533a989c84c843fc5f7f9d9fe1bd2c8fecfb83ee6e3",
    "name": "myprefix2020-05-10-143046.sql.gz",
    "last_modified": "2020-05-10T14:30:47Z",
    "size": 1048919,
    "daily": true,
    "weekly": false,
    "monthly": false
  },
  {
    "link": "http://minio:9000/backup/myprefix2020-05-10-143310.sql.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20200510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200510T144815Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22myprefix2020-05-10-143310.sql.gz%22&X-Amz-Signature=ead1549ba7c89f90c1d1b8732fbb9c4701c4d362fffc862f408a1d0fc0c88fd7",
    "name": "myprefix2020-05-10-143310.sql.gz",
    "last_modified": "2020-05-10T14:33:10.9Z",
    "size": 1048919,
    "daily": true,
    "weekly": false,
    "monthly": false
  },
  {
    "link": "http://minio:9000/backup/myprefix2020-05-10-143745.sql.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=minio%2F20200510%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200510T144815Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3D%22myprefix2020-05-10-143745.sql.gz%22&X-Amz-Signature=c53a0fc2748f14fe4b80b162432c281f30cbe74db36642480c934beb7a91d815",
    "name": "myprefix2020-05-10-143745.sql.gz",
    "last_modified": "2020-05-10T14:37:45.43Z",
    "size": 1048919,
    "daily": true,
    "weekly": false,
    "monthly": false
  }
]
