Backup Sidecar
This Docker image is intended to be used as a sidecar to database servers. It takes regular backups on a schedule and uploads them to a remote object storage.
How To Use
Prerequisites And Assumptions
Currently this image supports PostgreSQL, MySQL, MariaDB and MongoDB. In practice it supports any database for which the installed versions of mysqldump, mongodump and pg_dump/pg_dumpall work.
Supported Targets
We tested with Minio S3 servers (edge), but it should work with any S3-compatible service. Google Cloud Storage (GCS) has also been tested and is supported.
Note: Only one target type is supported at a time. GCS takes precedence (and S3 is ignored) if the relevant *gcs* flags/env vars have been set.
Setup & Configuration
Docker image:
docker pull registry.gitlab.com/mironet/magnolia-backup
Set these environment variables to get the image going in server mode (which is intended as the default).
environment:
MGNLBACKUP_ADDRESS=:9999 # Listen on this address for HTTP endpoints.
# S3-related config:
MGNLBACKUP_BUCKET=backup
MGNLBACKUP_S3_ACCESSKEY=minio
MGNLBACKUP_S3_SECRETKEY=minio123
MGNLBACKUP_S3_ENDPOINT=<s3-server>:9000
# GCS-related config:
MGNLBACKUP_GCS_PROJECTID=project-id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
# Cron config:
MGNLBACKUP_CRON=@every 24h # see https://godoc.org/github.com/robfig/cron for details
# Commands to get an SQL dump
MGNLBACKUP_CMD=pg_dumpall
MGNLBACKUP_ARGS=-h,localhost,-U,magnolia
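For orientation, here is a minimal docker-compose sketch of the sidecar pattern, wiring the variables above to a PostgreSQL service. Service names, image tags and credentials are placeholders and not part of this project:
services:
  db:
    image: postgres:13            # the database this sidecar backs up (image/tag is an example)
    environment:
      POSTGRES_USER: magnolia
      POSTGRES_PASSWORD: secret
  backup:
    image: registry.gitlab.com/mironet/magnolia-backup
    environment:                  # the variables from the block above go here
      MGNLBACKUP_CMD: pg_dumpall
      MGNLBACKUP_ARGS: -h,db,-U,magnolia   # "db" is the compose service name of the database
      PGPASSWORD: secret                   # passed through to pg_dumpall (env vars are copied to the command)
      # ... plus the S3/GCS and cron settings shown above
    ports:
      - "9999:9999"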
Available Environment Variables
General Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
`MGNLBACKUP_ADDRESS` | `:9999` | yes | `:9999` | Listen on this address for HTTP endpoints. |
`MGNLBACKUP_CMD` | `pg_dumpall`, `mysqldump`, `mongodump` | yes | | The command which should be run. |
`MGNLBACKUP_ARGS` | `--host localhost --user myuser` | yes | | The arguments passed to the command. Working examples: `mysqldump`: `-h <HOST> -u <USER> -p<PASSWORD> <DATABASE>`; `mongodump`: `--uri="mongodb://<USER>:<PASSWORD>@<HOST>:27017/<DATABASE>" --archive` |
`MGNLBACKUP_LOGLEVEL` | `error`, `warn`, `info`, `debug`, `trace` | no | `info` | Controls log verbosity to help with debugging. `trace` is the most verbose option. |
`MGNLBACKUP_TAGS_MY_LABEL` | string | no | | If an env var starting with `MGNLBACKUP_TAGS_` is present, the backup will carry its value as an additional tag. The tag name is derived from the part after `MGNLBACKUP_TAGS_` (lowercased), in this example `my_label=<value from env var>`. See the section about tags below. |
Cron Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
`MGNLBACKUP_CRON` | `@every 24h` | yes | `@every 24h` | See robfig/cron for more details. Standard cron expressions like `0 * * * *` are supported as well. |
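For example, both of the following schedule one backup per day:
MGNLBACKUP_CRON=@every 24h
# or a standard cron expression, e.g. every day at 03:00:
MGNLBACKUP_CRON=0 3 * * *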
Dump Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
`MGNLBACKUP_DUMP_TIMEOUT` | `time.Duration` | yes | `10h` | Timeout (maximum duration) for a single dump operation. |
General Object Storage Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
`MGNLBACKUP_BUCKET` | string | yes | | The name of the bucket. |
`MGNLBACKUP_PREFIX` | string | no | | String prepended to the date in object names on object storage. |
`MGNLBACKUP_CYCLE` | string | no | `15,4,3` | Backup retention cycle in the format `[daily,weekly,monthly]`, e.g. `15,4,3` keeps 15 daily, 4 weekly and 3 monthly backups. |
`MGNLBACKUP_KEEPDAYS` | int | no | `0` | Keep at most this many days of backups. If set to `0` (default), backups are kept forever. A positive integer `i` is equivalent to `i,0,0` for `MGNLBACKUP_CYCLE`. |
`MGNLBACKUP_HERITAGE` | string | no | `magnolia-backup-<version>` | Only objects in the bucket carrying this user tag are touched when recycling backups. New objects will also get this tag. |
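As an illustration of the retention settings (the values here are examples only):
# Keep 15 daily, 4 weekly and 3 monthly backups:
MGNLBACKUP_CYCLE=15,4,3
# Or simply keep the last 30 days (equivalent to a cycle of 30,0,0):
MGNLBACKUP_KEEPDAYS=30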
PostgreSQL WAL Archiving Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
`MGNLBACKUP_USE_PG_WAL` | `true`, `false` | no | `false` | Use PostgreSQL WAL archiving to object storage. |
`MGNLBACKUP_SYNC_DIR` | string | yes (if WAL archiving is enabled) | | Directory to continuously sync to the cloud target (used for continuous archiving of WAL logs). |
S3 Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
`MGNLBACKUP_S3_ACCESSKEY` | string | yes | | A valid access key. In the case of AWS, create an API user. |
`MGNLBACKUP_S3_SECRETKEY` | string | yes | | The secret for the given access key. |
`MGNLBACKUP_S3_ENDPOINT` | string | yes | `minio:9000` | The endpoint, which may include a port. An example for AWS S3: `s3.eu-central-1.amazonaws.com` |
`MGNLBACKUP_S3_INSECURE` | `true`, `false` | no | | If `true`, connect to the target without using TLS (!) |
`MGNLBACKUP_S3_REGION` | string | no | `us-east-1` | The S3 region used. |
Google Cloud Storage (GCS) Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
`MGNLBACKUP_GCS_PROJECTID` | string | yes | | You can find this ID in the Google Console. |
`MGNLBACKUP_GCS_LOCATION` | string | no | `EUROPE-WEST6` | Region in which to create the bucket if it does not exist yet. |
`MGNLBACKUP_GCS_LOCATION_TYPE` | string | no | `region` | Replication type (`multi-region`, `region` or `dual-region`). |
`GOOGLE_APPLICATION_CREDENTIALS` | string | yes | | The path to the JSON file with the private key. |
Note: If you use GCS and run this as a Docker container, make sure to mount or copy the key file into the container, or use a KMS like HashiCorp Vault.
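A minimal sketch of mounting the key file into the container (paths and the compose service name are assumptions):
backup:
  image: registry.gitlab.com/mironet/magnolia-backup
  environment:
    MGNLBACKUP_GCS_PROJECTID: project-id
    GOOGLE_APPLICATION_CREDENTIALS: /secrets/key.json
  volumes:
    - ./key.json:/secrets/key.json:ro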
Notable Command Line Arguments
--readonly
When the server is started in read-only mode it will not take any backups or upload any objects, with the exception of backup/restore bundles requested with the /bundle endpoint. This is useful if you need a "view" of all backups currently in the bucket.
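A sketch of starting the sidecar in this mode; whether flags can simply be appended to the image's entrypoint like this is an assumption, not documented behavior:
$ docker run --rm registry.gitlab.com/mironet/magnolia-backup --readonly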
Features
After startup, the server tries to issue the given command and args to get a (database/data) dump. It then uploads the output of this dump as a gzip-compressed file to the target bucket according to the given cron schedule. Only one cron job can run at the same time.
Various endpoints of the server, when called, will dump the database in one way or another. Note that only one dump can run at the same time. The server will return 429 Too Many Requests if a dump cannot be run because another one is already running.
Environment variables are copied to the command being executed and expanded in any arguments passed to the command.
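For example (variable names here are hypothetical, and shell-style $VAR references in the arguments are assumed), a password can be provided via the environment while a host variable is expanded inside the arguments:
DB_HOST=postgres
PGPASSWORD=secret                        # visible to pg_dumpall because the environment is copied to the command
MGNLBACKUP_CMD=pg_dumpall
MGNLBACKUP_ARGS=-h,$DB_HOST,-U,magnolia  # $DB_HOST is expanded before the command runs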
When a new backup is created, a sweep of the older backups is performed to clean out stale backups according to the MGNLBACKUP_CYCLE configuration.
You can get a list of all dumps taken over time and download them directly.
PostgreSQL WAL Archiving
In combination with PostgreSQL the mode can be switched to WAL archiving. It
provides better backup performance and point-in-time recovery on top of it.
Depending on the size of the data base it might even be impossible or
impractical to use pg_dump
as a backup plan. This mode is PostgreSQL-specific
and doesn't work with other data bases. It needs at least version 9+.
Mode of Operation
Instead of periodically dumping the whole database, in this mode a base backup is taken periodically according to MGNLBACKUP_CRON. This should be set to a large interval, such as every month; the right value of course depends on the write volume the database is seeing. With PostgreSQL this procedure also seems not to interfere much with the source database's performance, while pg_dump uses all resources to finish as fast as possible.
After that, PostgreSQL is instructed to copy its transaction logs whenever a certain amount of data has accumulated (16 MiB by default). Each such file is then uploaded to object storage and deleted from the monitored archive folder. To restore, the base backup and all transaction logs up to the desired point in time are needed.
Configuration
The database needs to know how to copy the transaction log regularly, and we need to specify the monitored archive folder where those logs are picked up after they have been copied. In the case of PostgreSQL the following is an example config:
MGNLBACKUP_USE_PG_WAL: "true" # Uses tx log shipping/archiving.
MGNLBACKUP_SYNC_DIR: "/archive"
In this case the monitored folder is /archive, which should be persistent in case an upload is interrupted.
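On the PostgreSQL side, a typical way to ship completed WAL segments into that folder is the standard archive_command setting. This is a sketch of stock PostgreSQL configuration, not something this image sets for you; /archive must be a volume shared between the database container and the sidecar:
# postgresql.conf (excerpt)
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'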
HTTP Endpoints
There are a few endpoints you can use to fetch data / alter the configuration.
Location | Effect |
---|---|
`/cache` | Send a DELETE request to this endpoint to clear the list cache. |
`/list` | Shows the backup list page. If you send the `Accept: application/json` header in requests to this endpoint, you get the same information as JSON output. |
`/list?query=<query>` | You can append a query matching tags. The syntax is similar to PromQL label matchers; see the section about querying tags below. |
`/list?orderby=<tag key>` | `orderby` defines which tag is used for ordering. If `orderby` is not specified, the list is ordered by ULID whenever `dir` is specified. |
`/list?dir=<direction>` | `dir` defines the sorting direction (`asc` or `desc`) of whatever has been specified by `orderby`. |
`/list?limit=<number of entries>` | `limit` limits the number of returned list entries. |
`/dump` | Takes a single backup right now. Expected response is `202 Accepted` or `200 OK`, everything else is not ok :) |
`/bundle/<RFC3339 time>` | Returns a list of files needed for a point-in-time restore. The point in time can be specified as an RFC3339 timestamp or just the word `now` (or nothing at all) for the current time. This endpoint only returns sensible results when using WAL archiving. |
`/bundle/<RFC3339 time>?query=<query>` | Same as above, but includes a query to select certain backup objects. This is useful for the `--readonly` mode. |
`/bundle/<RFC3339 time>?mode=restore&upload&query=<query>` | Stores a JSON-formatted bundle in object storage for restoration with the `boot` command and returns the direct download URL. |
`/bundle/<RFC3339 time>?download&query=<query>` | Downloads a tarball which can be used for starting PostgreSQL locally. The `recovery.conf` file is autogenerated inline. |
`/metrics` | Exposes metrics in Prometheus format. |
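For example, to get the ten newest entries as JSON, ordered by a tag (here the basebackup_start tag; sending the query via GET with --data-urlencode is assumed to work as in the bundle example further below):
$ curl -G -H 'Accept: application/json' 'localhost:9999/list?orderby=basebackup_start&dir=desc&limit=10' --data-urlencode 'query={daily="true"}' | jq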
About Bundles
A backup bundle is made up of at least a single base backup, a corresponding base backup meta file and the subsequent transaction logs. When you request a backup bundle with a specific timestamp, the correct base backup and meta file are selected, plus all transaction logs up to and including the desired timestamp.
In theory you can untar all the files listed and start PostgreSQL for a complete restore. The use of mgnlbackup boot with the --datasource switch is recommended though, because it conveniently creates a recovery.conf file so PostgreSQL knows how to behave when starting from the restored files.
For example, a request to /bundle/now would yield:
[
{
"link": "https://storage.googleapis.com/...",
"name": "test-2021-05-26-221754-basebackup.tar.gz",
"last_modified": "2021-05-26T22:17:46.685Z",
"size": 4134201,
"tags": {
"heritage": "heritagetest",
"variant": "tx_log_archiving"
}
},
{
"link": "https://storage.googleapis.com/...",
"name": "test-2021-05-26-221819-bundle.json.gz",
"last_modified": "2021-05-26T22:18:10.44Z",
"size": 734,
"tags": {
"basebackup_parent": "test-2021-05-26-221754-basebackup.tar.gz",
"basebackup_start": "2021-05-26T22:17:46Z",
"heritage": "heritagetest",
"variant": "tx_log_archiving"
}
},
{
"link": "https://storage.googleapis.com/...",
"name": "test-2021-05-26-221753-txlog.tar.gz",
"last_modified": "2021-05-26T22:18:14.636Z",
"size": 115867,
"tags": {
"basebackup_parent": "test-2021-05-26-221754-basebackup.tar.gz",
"basebackup_start": "2021-05-26T22:17:46Z",
"heritage": "heritagetest",
"variant": "tx_log_archiving"
}
}
]
Note: The output has been shortened for readability (link field).
This is interesting for informational usage (e.g. an API listing backup bundles).
You can ask for a bundle by requesting /bundle/now?mode=restore, which can be used for point-in-time restores:
{
"backup_list": [
{
"link": "https://storage.googleapis.com/...",
"name": "test-2021-05-26-221754-basebackup.tar.gz",
"last_modified": "2021-05-26T22:17:46.685Z",
"size": 4134201,
"tags": {
"heritage": "heritagetest",
"variant": "tx_log_archiving"
}
},
{
"link": "https://storage.googleapis.com/...",
"name": "test-2021-05-26-221819-bundle.json.gz",
"last_modified": "2021-05-26T22:18:10.44Z",
"size": 734,
"tags": {
"basebackup_parent": "test-2021-05-26-221754-basebackup.tar.gz",
"basebackup_start": "2021-05-26T22:17:46Z",
"heritage": "heritagetest",
"variant": "tx_log_archiving"
}
},
{
"link": "https://storage.googleapis.com/...",
"name": "test-2021-05-26-221753-txlog.tar.gz",
"last_modified": "2021-05-26T22:18:14.636Z",
"size": 115867,
"tags": {
"basebackup_parent": "test-2021-05-26-221754-basebackup.tar.gz",
"basebackup_start": "2021-05-26T22:17:46Z",
"heritage": "heritagetest",
"variant": "tx_log_archiving"
}
}
],
"point_in_time": "2021-05-26T22:42:57.0618294Z"
}
By asking for /bundle/now?mode=restore&upload you get a single pre-signed link back which can be fed to mgnlbackup boot --datasource directly (just the URL in the link field, not the whole JSON output).
{
"link": "https://storage.googleapis.com/..."
}
Note: The output has been shortened for readability.
This is useful for backup/restore automation.
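A minimal sketch of such an automated restore, assuming jq is available and that the boot command needs nothing beyond the pre-signed link:
$ LINK=$(curl -s 'localhost:9999/bundle/now?mode=restore&upload' | jq -r .link)
$ mgnlbackup boot --datasource "$LINK"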
About Tags
Every backup will be tagged by the lifecycler when uploaded. Tags are useful to add information about backups, like which k8s namespace they came from or which deployment they are part of. Tags are key/value pairs (in Go, a map[string]string).
A few default values will always be applied: the interval markers daily, weekly, monthly (if applicable), heritage (always) and tags from the environment variables.
For example this environment variable list ...
MGNLBACKUP_TAGS_NAMESPACE=integration
MGNLBACKUP_TAGS_COMPONENT=author-instance
MGNLBACKUP_TAGS_TIER=app
... will result in the following tags on the object storage object:
"tags": {
"basebackup_parent": "myprefix-2021-03-21-145255-basebackup.tar.gz",
"basebackup_start": "2021-03-21T14:52:59Z",
"daily": "true",
"heritage": "heritagetest",
"namespace": "integration",
"component": "author-instance",
"tier": "app"
}
Querying Tags
The /list endpoint supports filtering backups by tags based on a query provided in the request. This is an example with POST (forms):
curl -v -H 'accept: application/json' localhost:9999/list -d 'query={daily="true"}' | jq
The query syntax closely follows the syntax of PromQL label matchers. An empty query ({}) returns all results unfiltered.
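The same query can also be sent as a GET request, e.g. matching several tags at once (the tag values here are examples):
$ curl -G -H 'accept: application/json' localhost:9999/list --data-urlencode 'query={namespace="integration",component="author-instance"}' | jq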
Note: The heritage tag acts as a filter before any other filters. Even if there are other backups with different heritage tags, they will not be shown, since the tag functions as a strict separator between backup sets. If you don't need this functionality and prefer a global view, set the heritage tag to the same value for all of your backups or use the --ignore-heritage switch when starting the server.
Monitoring
In server mode this tool exposes a few metrics on the specified endpoint: the default go_.* and process_.* metrics, as well as mgnlbackup_.* metrics like total backup size on object storage, backup duration, count and errors.
Examples
Triggering a single database backup job
$ curl localhost:9999/dump?sync=true
{"name":"myprefix2020-01-16-155805.sql.gz","size":1048919}
Querying and Uploading a New Bundle
$ curl -v -G -H 'accept: application/json' 'localhost:9999/bundle/now?mode=restore&upload&' --data-urlencode 'query={release="prod",pod_name="prod-magnolia-helm-public-db-0",namespace="gitlab"}'
Scrape metrics
$ curl -v http://localhost:9999/metrics | grep -C 2 mgnl
# TYPE go_threads gauge
go_threads 10
# HELP mgnlbackup_backups The total number of daily backups.
# TYPE mgnlbackup_backups gauge
mgnlbackup_backups{interval="daily"} 2
mgnlbackup_backups{interval="monthly"} 1
mgnlbackup_backups{interval="weekly"} 1
# HELP mgnlbackup_bytes Total byte size of all backups combined in target storage.
# TYPE mgnlbackup_bytes gauge
mgnlbackup_bytes 3.7760302e+07
# HELP mgnlbackup_errors Number of erroneous, not completed backup jobs.
# TYPE mgnlbackup_errors counter
mgnlbackup_errors 1
# HELP mgnlbackup_seconds Time taken for backup jobs.
# TYPE mgnlbackup_seconds summary
mgnlbackup_seconds_sum 77.9812096
mgnlbackup_seconds_count 3
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
Getting the current backup list in JSON format
$ curl -H 'Accept: application/json' http://localhost:9999/list | jq
[
{
"link": "http://minio:9000/...",
"name": "myprefix-2021-03-20-185909-basebackup.tar.gz",
"last_modified": "2021-03-20T18:59:13.97Z",
"size": 3856884,
"daily": true,
"weekly": true,
"monthly": true,
"tags": {
"daily": "true",
"heritage": "heritagetest",
"monthly": "true",
"weekly": "true"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix-2021-03-20-185908-txlog.tar.gz",
"last_modified": "2021-03-20T18:59:38.29Z",
"size": 33339,
"daily": true,
"weekly": false,
"monthly": false,
"tags": {
"basebackup_parent": "myprefix-2021-03-20-185909-basebackup.tar.gz",
"basebackup_start": "2021-03-20T18:59:13Z",
"daily": "true",
"heritage": "heritagetest"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix-2021-03-21-145255-basebackup.tar.gz",
"last_modified": "2021-03-21T14:52:59.83Z",
"size": 3856670,
"daily": true,
"weekly": false,
"monthly": false,
"tags": {
"basebackup_parent": "myprefix-2021-03-20-185909-basebackup.tar.gz",
"basebackup_start": "2021-03-20T18:59:13Z",
"daily": "true",
"heritage": "heritagetest",
"namespace": "development"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix-2021-03-21-145254-txlog.tar.gz",
"last_modified": "2021-03-21T14:53:24.41Z",
"size": 33329,
"daily": true,
"weekly": false,
"monthly": false,
"tags": {
"basebackup_parent": "myprefix-2021-03-21-145255-basebackup.tar.gz",
"basebackup_start": "2021-03-21T14:52:59Z",
"daily": "true",
"heritage": "heritagetest",
"namespace": "development"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix-2021-03-21-145404-basebackup.tar.gz",
"last_modified": "2021-03-21T14:54:08.85Z",
"size": 3856918,
"daily": true,
"weekly": false,
"monthly": false,
"tags": {
"basebackup_parent": "myprefix-2021-03-21-145255-basebackup.tar.gz",
"basebackup_start": "2021-03-21T14:52:59Z",
"daily": "true",
"heritage": "heritagetest",
"namespace": "development",
"pod_name": "mysuperpod-author"
}
}
]
Note: The output has been shortened for readability.
Contributing
To start a local environment with PostgreSQL and the server you're building do:
make build-docker up
Go to http://localhost:9999/ for testing.
Generating Dummy Data
In PostgreSQL (enter with docker exec -it docker-postgres-1 psql magnolia -U magnolia
) use this to generate some dummy data:
CREATE TABLE public.employee (
id int8 NOT NULL,
name varchar(120) NOT NULL,
salary int8 NOT NULL,
CONSTRAINT emp_pk PRIMARY KEY (id)
);
WITH salary_list AS (
SELECT '{1000, 2000, 5000}'::INT[] salary
)
INSERT INTO public.employee
(id, name, salary)
SELECT n, 'Employee ' || n as name, salary[1 + mod(n, array_length(salary, 1))]
FROM salary_list, generate_series(1, 1000000) as n;
(stolen from here 🤭)