Backup Sidecar
This Docker image is intended to be used as a sidecar to database servers. It takes regular backups on a schedule and uploads them to a remote object storage.
How To Use
Prerequisites And Assumptions
Currently this image supports PostgreSQL, MySQL, MariaDB and MongoDB. Basically, it supports databases for which the installed versions of mysqldump, mongodump or pg_dump/pg_dumpall work.
Supported Targets
We tested with Minio S3 servers (edge), but it should work with any S3-compatible service.
Google Cloud Storage (GCS) and Azure Blob Storage have also been tested and are supported.
Note: Only one target type is supported at a time. GCS takes precedence (and S3 is ignored) if the relevant *gcs* flags/env vars have been set. Azure will be configured if the account name has been set (--az-account-name) and GCS/S3 are not configured.
Setup & Configuration
Docker image:
docker pull registry.gitlab.com/mironet/magnolia-backup
Set these environment variables to get the image going in server mode (which is intended as the default).
environment:
MGNLBACKUP_ADDRESS=:9999 # Listen on this address for HTTP endpoints.
# S3-related config:
MGNLBACKUP_BUCKET=backup
MGNLBACKUP_S3_ACCESSKEY=minio
MGNLBACKUP_S3_SECRETKEY=minio123
MGNLBACKUP_S3_ENDPOINT=<s3-server>:9000
# GCS-related config:
MGNLBACKUP_GCS_PROJECTID=project-id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
# Cron config:
MGNLBACKUP_CRON=@every 24h # see https://godoc.org/github.com/robfig/cron for details
# Commands to get an SQL dump
MGNLBACKUP_CMD=pg_dumpall
MGNLBACKUP_ARGS=-h,localhost,-U,magnolia
Available Environment Variables
General Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_ADDRESS | localhost:9999 | yes | localhost:9999 | Listen on this address for HTTP endpoints (except /metrics). |
MGNLBACKUP_METRICS_ADDRESS | :9997 | yes | :9997 | Listen on this address for the /metrics endpoint. |
MGNLBACKUP_CMD | pg_dumpall, mysqldump, mongodump | yes | | The command which should be run. |
MGNLBACKUP_ARGS | --host localhost --user myuser | yes | | The arguments which should be passed to the command. Working examples: mysqldump: -h <HOST> -u <USER> -p<PASSWORD> <DATABASE>; mongodump: --uri="mongodb://<USER>:<PASSWORD>@<HOST>:27017/<DATABASE>" --archive |
MGNLBACKUP_LOGLEVEL | error, warn, info, debug, trace | no | info | Controls log verbosity to help with debugging. trace is the most verbose option. |
MGNLBACKUP_TAGS_MY_LABEL | string | no | | If an env var starting with MGNLBACKUP_TAGS is present, the backup will carry its value as an additional tag. The tag name is derived from the part after MGNLBACKUP_TAGS_ (lowercase), in this example my_label=<value from env var>. See the section about tags below. |
MGNLBACKUP_MAX_SWEEP_REP_INTERVAL | time.Duration | no | 1h | The duration that must (at least) pass between two runs of the backup sweeping procedure. |
MGNLBACKUP_TX_LOG_SIZE_THRESHOLD | int | no | 16 | Threshold for merged transaction log size (in GiB). If the size of the "to be merged" transaction log exceeds this threshold, it will not be merged. |
Cron Configuration
Backup cron job:
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_CRON | @every 24h | yes | @every 24h | See robfig/cron for more details. Standard cron expressions like 0 * * * * are supported as well. |
Parts sweep cron job:
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_PARTS_SWEEP_SCHEDULE | 0 6 * * * (daily at 6 in the morning) | yes | @every 10h | See robfig/cron for more details. Standard cron expressions like 0 * * * * are supported as well. |
MGNLBACKUP_PARTS_SWEEP_DELAY | time.Duration | yes | 48h | Parts will not get swept as long as they are not older than this duration. |
MGNLBACKUP_PARTS_SWEEP_TIMEOUT | time.Duration | yes | 10h | Timeout (max duration) for a single part sweep operation. |
Dump Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_DUMP_TIMEOUT | time.Duration | yes | 10h | Timeout (max duration) for a single dump operation. |
General Object Storage Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_BUCKET | string | yes | | The name of the bucket (S3, GCS) or container (Azure). |
MGNLBACKUP_PREFIX | string | no | | String to prepend to the date in object names on object storage. |
MGNLBACKUP_CYCLE | string | no | 15,4,3 | Backup retention cycle in the format [daily,weekly,monthly]. |
MGNLBACKUP_KEEPDAYS | int | no | 0 | Keep this many days of backups at most. If keepdays is set to 0 (default), backups are kept forever. If keepdays is set to a positive integer i, this is the same as i,0,0 for MGNLBACKUP_CYCLE. Note: If a value is set for MGNLBACKUP_KEEPDAYS, it overwrites the value set for MGNLBACKUP_CYCLE. |
PostgreSQL WAL Archiving Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_USE_PG_WAL | true,false | no | false | Use PostgreSQL WAL archiving to object storage. |
MGNLBACKUP_SYNC_DIR | string | yes (if WAL archiving) | | Directory to continuously sync to the cloud target (used for continuous archiving of WAL logs). |
MGNLBACKUP_PG_DATA | string | no | /var/lib/postgresql/data | Where the PostgreSQL data is stored. |
MGNLBACKUP_PG_NAME | string | no | postgres | Database name to connect to. |
MGNLBACKUP_PG_USER | string | no | postgres | User to connect to the db for pg_wal. |
MGNLBACKUP_PG_PASS | string | no | | Password to connect to the db for pg_wal. |
MGNLBACKUP_PG_HOST | string | no | localhost:5432 | Host to connect to the db for pg_wal. |
MGNLBACKUP_TX_LOG_PATH | string | no | archive | Path relative to $PGDATA where the tx logs are restored to. |
S3 Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_S3_ACCESSKEY | string | yes | | A valid access key. In the case of AWS, create an API user. |
MGNLBACKUP_S3_SECRETKEY | string | yes | | The secret for the given access key. |
MGNLBACKUP_S3_ENDPOINT | string | yes | minio:9000 | The endpoint might include a port, e.g. for AWS S3: s3.eu-central-1.amazonaws.com |
MGNLBACKUP_S3_INSECURE | true,false | no | false | If this is true, connect to the target without using TLS (!) |
MGNLBACKUP_S3_INSECURE_SKIP_VERIFY | true,false | no | false | If this is true, connect to the TLS target without checking the certificate presented. |
MGNLBACKUP_S3_REGION | string | no | us-east-1 | The S3 region used. |
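For example, a configuration against AWS S3 might look like this (bucket name, region and key placeholders are illustrative):
MGNLBACKUP_BUCKET=backup
MGNLBACKUP_S3_ENDPOINT=s3.eu-central-1.amazonaws.com
MGNLBACKUP_S3_REGION=eu-central-1
MGNLBACKUP_S3_ACCESSKEY=<access key of the API user>
MGNLBACKUP_S3_SECRETKEY=<secret key of the API user>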
Google Cloud Storage (GCS) Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_GCS_PROJECTID | string | yes | | You can find this ID in the Google Console. |
MGNLBACKUP_GCS_LOCATION | string | no | EUROPE-WEST6 | Region in which to create the bucket if it does not yet exist. |
MGNLBACKUP_GCS_LOCATION_TYPE | string | no | region | Replication type (multi-region, region or dual-region). |
GOOGLE_APPLICATION_CREDENTIALS | string | yes | | The path to the JSON file with the private key. |
Note: If you use GCS and run this as a Docker container, make sure to mount/copy the key file inside the container or use a KMS like Hashicorp Vault.
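For example, with plain Docker the key file could be mounted read-only and referenced inside the container. The paths below are illustrative:
docker run -d \
  -v /secure/gcs-key.json:/key.json:ro \
  -e GOOGLE_APPLICATION_CREDENTIALS=/key.json \
  -e MGNLBACKUP_GCS_PROJECTID=project-id \
  -e MGNLBACKUP_BUCKET=backup \
  registry.gitlab.com/mironet/magnolia-backup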
Azure Blob Storage Configuration
Variable | Values/Types | Required | Default | Description |
---|---|---|---|---|
MGNLBACKUP_AZ_ACCOUNT_NAME | string | yes | | The storage account name as configured in Azure. |
MGNLBACKUP_AZ_ACCOUNT_KEY | string | yes | | The shared key to access the container (currently Azure AD auth is not supported). |
MGNLBACKUP_AZ_MERGE_CONCURRENCY | int | no | 8 | Max concurrent requests sent to the Azure API for Merge() operations. Does not influence Upload() operation concurrency. |
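For example, a minimal Azure configuration could look like this (account name and key are placeholders):
MGNLBACKUP_BUCKET=backup               # Used as the Azure container name.
MGNLBACKUP_AZ_ACCOUNT_NAME=mystorageaccount
MGNLBACKUP_AZ_ACCOUNT_KEY=<shared access key>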
Retention policy parameters - a clarification attempt
The two object storage retention policy parameters MGNLBACKUP_KEEPDAYS and MGNLBACKUP_CYCLE have often proven to be a cause of some confusion during configuration. And indeed, the differences between the two can be a bit subtle.
One must use MGNLBACKUP_CYCLE in the non-pg_wal mode and MGNLBACKUP_KEEPDAYS in the pg_wal mode. This is because in the non-pg_wal mode there exists the concept of daily, weekly and monthly backups. These are basically tags assigned to the backups at creation time. The daily tag will always be assigned. The weekly and monthly tags will only be assigned if no backup has been taken for a week or a month, respectively. During the backup sweep process, backups are only kept if they are no older than
- d × 24 hours for the daily tag,
- w × 7 days for the weekly tag and
- m × 1 month for the monthly tag,
where d is the first number in the retention cycle parameter, w is the second number and m is the third number. For example, the retention cycle 15,4,3 means that all daily backups are kept for at least 15 days, all weekly backups are kept for at least 4 weeks and all monthly backups are kept for at least 3 months. Note that if a backup has more than one of the three tags, all respective conditions must have expired in order for the backup to be deleted. For example, again with cycle 15,4,3, a backup is not deleted after 5 weeks if it has the weekly and the monthly tag. It will only be deleted once the monthly condition has also expired (thus after 3 months).
For the pg_wal mode the concept of daily, weekly and monthly backups does not exist. Instead, backups are "only" kept for a certain number of days. This number of days is specified by the parameter MGNLBACKUP_KEEPDAYS.
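As a quick reference, the two modes could be configured like this (the concrete numbers are just examples):
# Non-pg_wal mode: keep daily backups for at least 15 days, weekly for 4 weeks, monthly for 3 months.
MGNLBACKUP_CYCLE=15,4,3
# pg_wal mode: keep backups for 30 days.
MGNLBACKUP_KEEPDAYS=30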
Notable Command Line Arguments
--readonly
When starting the server in read-only mode it will not take any backups or upload any objects, with the exception of backup/restore bundles requested via the /bundle endpoint. This is useful if you need a "view" of all backups currently in the bucket.
Features
After startup, the server tries to issue the command and args given to get a (database/data) dump. It then uploads the output of this dump as a gzip-compressed file to the target bucket according to the cron schedule given. Only one cron job can run at the same time.
Various endpoints of the server, when called, will dump the database in one way
or another. Note that only one dump can run at the same time. The server will
return 429 Too Many Requests
if a dump cannot be run because there is already
one running.
Environment variables are copied to the command being executed and expanded in any arguments passed to the command.
When a new backup is created a sweep of the older backups is performed to clean
out stale backups according to the MGNLBACKUP_CYCLE
configuration.
You can get a list of all dumps taken over time and download them directly.
Parts sweep cron job
If the backup server is NOT started in read-only mode, it will try to regularly perform a parts sweeping routine. Currently this routine is only implemented for S3 backup storage. For other storages, parts sweeping is skipped.
Parts are swept from S3 storage if they are older than a certain parts sweep delay duration, have a variant tag set to tx_log_archiving and have all the tags defined by environment variables of the form MGNLBACKUP_TAGS_MY_LABEL.
The part sweeping cron job can be configured using environment variables.
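For example, a sketch using the defaults and example values from the tables above:
MGNLBACKUP_PARTS_SWEEP_SCHEDULE=0 6 * * *   # Daily at 6 in the morning.
MGNLBACKUP_PARTS_SWEEP_DELAY=48h            # Only sweep parts older than 48h.
MGNLBACKUP_PARTS_SWEEP_TIMEOUT=10h          # Abort a single sweep operation after 10h.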
PostgreSQL WAL Archiving
In combination with PostgreSQL the mode can be switched to WAL archiving. It provides better backup performance and point-in-time recovery on top of it. Depending on the size of the database it might even be impossible or impractical to use pg_dump as a backup plan. This mode is PostgreSQL-specific and doesn't work with other databases. It requires at least PostgreSQL 9.
Mode of Operation
Instead of periodically backing up a fresh dump of the whole database, in this mode a base backup is taken periodically according to MGNLBACKUP_CRON. This should be set to a large interval, e.g. every month; the right value of course depends on the volume the database is seeing. With PostgreSQL this procedure also seems not to interfere too much with the source database's performance, whereas pg_dump uses all resources to finish as fast as possible.
After that, PostgreSQL is instructed to archive its transaction log after it reaches a certain amount of data (16 MiB by default). Each such file is then uploaded to the object storage and deleted from the monitored archive folder. To restore, the base backup and all transaction logs up until the desired point in time are needed.
Configuration
The database needs to know how to copy the transaction log regularly, and we need to specify the monitored archive folder from which those logs are picked up after they have been copied. In the case of PostgreSQL the following is an example config:
MGNLBACKUP_USE_PG_WAL: "true" # Uses tx log shipping/archiving.
MGNLBACKUP_SYNC_DIR: "/archive"
In this case the monitored folder is /archive, which should be persistent in case the upload is interrupted.
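If the database does not run with the default connection settings, the pg_wal connection parameters can be set as well. A sketch combining the variables documented in the WAL archiving table above (defaults shown, the password is a placeholder):
MGNLBACKUP_USE_PG_WAL: "true"
MGNLBACKUP_SYNC_DIR: "/archive"
MGNLBACKUP_PG_HOST: "localhost:5432"
MGNLBACKUP_PG_NAME: "postgres"
MGNLBACKUP_PG_USER: "postgres"
MGNLBACKUP_PG_PASS: "<password>"
MGNLBACKUP_PG_DATA: "/var/lib/postgresql/data"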
Naming & ULIDs
All objects (base backups, meta files and transaction logs) belonging to a base backup are uploaded to the same folder <prefix>/<base backup id>, e.g. myprefix/01FWKQADD3K9YY1C1W881A3ZXG, on the object storage.
The base backup id is a ULID (Universally Unique Lexicographically Sortable Identifier) that guarantees that
- each base backup has a unique ID.
- when base backups are sorted lexicographically, base backups that were taken longer ago are listed first.
- a timestamp, referring to the creation time of the base backup, can be parsed (using this tool) from the id.
Inside the base backup folder the base backup itself and the meta file are located in a basebackup
folder whereas the transaction logs are located in a txlog
folder, yielding e.g. the following files on the object storage:
myprefix/01FWKQADD3K9YY1C1W881A3ZXG/basebackup/01FWKQADD40ZXCF60RPB5VFWS0.tar.gz
myprefix/01FWKQADD3K9YY1C1W881A3ZXG/basebackup/01FWKQAP0C35PFD2E67QVC92HF-meta.tar.gz
myprefix/01FWKQADD3K9YY1C1W881A3ZXG/txlog/01FWKQBAM5F82E8SQ6ED9NCK2D.tar.gz
Note that each file is again named according to the ULID that refers to the creation time of the respective file.
Tags
All objects (base backups, meta files and transaction logs) belonging to a base backup are tagged with backup_id=<base backup id> and expiry=<RFC3339 formatted time>. Base backups are additionally tagged with is_basebackup=true, meta files with is_basebackup_meta=true and transaction logs with is_txlog=true.
HTTP Endpoints
There are a few endpoints you can use to fetch data / alter the configuration.
Location | Effect |
---|---|
/cache | Send a DELETE request to this endpoint and the list cache will be cleared. |
/list | Shows the backup list page above. If you send the Accept: application/json header in requests to this endpoint you get the same information as JSON output. |
/list?query=<query> | You can append a query matching tags. The syntax is similar to PromQL label matchers. See the section about querying tags below. |
/list?orderby=<tag key> | orderby defines which tag is used for ordering. If orderby is not specified, the list is ordered by ULID whenever dir is specified. |
/list?dir=<direction> | dir defines the sorting direction (asc or desc) of whatever has been specified by orderby. |
/list?limit=<number of entries> | limit limits the returned number of list entries. |
/dump | Takes a single backup right now. Expected response is 202 Accepted or 200 OK, everything else is not ok :) |
/bundle/<RFC3339 time> | Returns a list of files needed for a point-in-time restore. The point in time can be specified as an RFC3339 timestamp or just the word now (or nothing at all) for the current time. This endpoint only returns sensible results when using WAL archiving. |
/bundle/<RFC3339 time>?query=<query> | Same as above but includes a query to select certain backup objects. This is useful for the --readonly mode. |
/bundle/<RFC3339 time>?mode=restore&upload&query=<query> | Stores a JSON-formatted bundle in object storage for restoration with the boot command and returns the direct download URL. |
/bundle/<RFC3339 time>?download&query=<query> | Download a tarball which can be used for starting PostgreSQL locally. The recovery.conf file will be autogenerated inline. |
/metrics | Dumps metrics in Prometheus format. |
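For example, assuming the server listens on localhost:9999 as in the setup example above, the ten most recent backups (ordered by ULID, newest first) could be listed with:
curl -H 'Accept: application/json' 'localhost:9999/list?dir=desc&limit=10' | jq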
About Bundles
A backup bundle is made up of at least a single base backup, a respective base backup meta file and the following transaction logs. When you request a backup bundle with a specific timestamp the correct base backup and meta file will be selected and all transaction logs up to and including the desired timestamp.
In theory you can untar all the files listed and start PostgreSQL for a complete restore. The use of mgnlbackup boot with the --datasource switch is recommended though, because it conveniently creates a recovery.conf file so PostgreSQL knows how to behave when starting from the restored files.
For example, a request to /bundle/now would yield:
[
{
"link": "https://storage.googleapis.com/...",
"name": "myprefix/01FWKQADD3K9YY1C1W881A3ZXG/basebackup/01FWKQADD40ZXCF60RPB5VFWS0.tar.gz",
"last_modified": "2022-02-23T16:41:16.32Z",
"size": 3923233,
"tags": {
"backup_id": "01FWKQADD3K9YY1C1W881A3ZXG",
"expiry": "",
"is_basebackup": "true",
"variant": "tx_log_archiving"
}
},
{
"link": "https://storage.googleapis.com/...",
"name": "myprefix/01FWKQADD3K9YY1C1W881A3ZXG/basebackup/01FWKQAP0C35PFD2E67QVC92HF-meta.tar.gz",
"last_modified": "2022-02-23T16:41:16.08Z",
"size": 269,
"tags": {
"backup_id": "01FWKQADD3K9YY1C1W881A3ZXG",
"expiry": "",
"is_basebackup_meta": "true",
"variant": "tx_log_archiving"
}
},
{
"link": "https://storage.googleapis.com/...",
"name": "myprefix/01FWKQADD3K9YY1C1W881A3ZXG/txlog/01FWKQBAM5F82E8SQ6ED9NCK2D.tar.gz",
"last_modified": "2022-02-23T16:41:37.34Z",
"size": 33307,
"tags": {
"backup_id": "01FWKQADD3K9YY1C1W881A3ZXG",
"expiry": "",
"is_basebackup": "false",
"is_txlog": "true",
"variant": "tx_log_archiving"
}
}
]
Note: The output has been shortened for readability (link field). An empty value of the expiry tag means that the respective file never expires and thus is never deleted automatically. One can set MGNLBACKUP_KEEPDAYS to change that. This is interesting for informational usage (e.g. an API listing backup bundles).
You can ask for a bundle by requesting /bundle/now?mode=restore, which can be used for point-in-time restores:
{
"backup_list": [
{
"link":"https://storage.googleapis.com/...",
"name": "myprefix/01FWNFGR2G2WH5B8A11B88H09P/basebackup/01FWNFGR2G2WH5B8A11CTVSYYW.tar.gz",
"last_modified": "2022-02-24T09:03:21.95Z",
"size": 3876614,
"tags": {
"backup_id": "01FWNFGR2G2WH5B8A11B88H09P",
"expiry": "",
"is_basebackup": "true",
"variant": "tx_log_archiving"
}
},
{
"link":"https://storage.googleapis.com/...",
"name": "myprefix/01FWNFGR2G2WH5B8A11B88H09P/basebackup/01FWNFGYG71ZF4J6T6JXTMKVWE-meta.tar.gz",
"last_modified": "2022-02-24T09:03:21.62Z",
"size": 268,
"tags": {
"backup_id": "01FWNFGR2G2WH5B8A11B88H09P",
"expiry": "",
"is_basebackup_meta": "true",
"variant": "tx_log_archiving"
}
},
{
"link":"https://storage.googleapis.com/...",
"name": "myprefix/01FWNFGR2G2WH5B8A11B88H09P/txlog/01FWNFHN9AQSH6FS2J7R37Q0SX.tar.gz",
"last_modified": "2022-02-24T09:03:45.12Z",
"size": 33266,
"tags": {
"backup_id": "01FWNFGR2G2WH5B8A11B88H09P",
"expiry": "",
"is_basebackup": "false",
"is_txlog": "true",
"variant": "tx_log_archiving"
}
}
],
"point_in_time": "2022-02-24T09:03:55.9444242Z"
}
By asking for /bundle/now?mode=restore&upload you get a single pre-signed link back which can be fed to mgnlbackup boot --datasource directly (just the URL in the link field, not the whole JSON output).
{
"link": "https://storage.googleapis.com/..."
}
Note: The output has been shortened for readability.
This is useful for backup/restore automation.
About Tags
Every backup will be tagged by the lifecycler when uploaded. Tags are useful to add information about backups, like which k8s namespace they came from or which deployment they are part of. Tags are key/value pairs (in Go a map[string]string).
A few default values will always be applied, like the interval markers daily, weekly, monthly (if applicable) and tags from the environment variables.
For example this environment variable list ...
MGNLBACKUP_TAGS_NAMESPACE=integration
MGNLBACKUP_TAGS_COMPONENT=author-instance
MGNLBACKUP_TAGS_TIER=app
... will result in the following tags on the object storage object:
"tags": {
"backup_id": "01FWNFGR2G2WH5B8A11B88H09P",
"expiry": "2022-05-24T15:20:13Z",
"is_basebackup": "true",
"variant": "tx_log_archiving",
"namespace": "integration",
"component": "author-instance",
"tier": "app"
}
Querying Tags
The /list endpoint supports filtering backups by tags based on a query provided in the request. This is an example with POST (forms):
curl -v -H 'accept: application/json' localhost:9999/list -d 'query={daily="true"}' | jq
The query syntax closely follows the syntax of PromQL label matchers.
An empty query ({}) returns all results unfiltered.
Meta Tags
All tags starting with __ are meta tags added by the system and not present in the object storage. They can be used to query by object name, for example:
curl -v -H 'accept: application/json' localhost:9999/list -d 'query={__name=~"myprefix/author-.*"}' | jq
The only meta tag currently supported is __name, which is always the full object name (with "folders").
Monitoring
In server mode this tool exposes a few metrics on the specified endpoint: the default go_.* and process_.* metrics as well as mgnlbackup_.* metrics like total backup size on object storage, backup duration, count and errors.
Examples
Triggering a single database backup job
$ curl localhost:9999/dump?sync=true
{"name":"myprefix/01FWNMBK39W3GDSRJFQAN8RZ50/basebackup/01FWNMBK3AGJCSB145NC9QC03E","size":3870612}
Querying and Uploading a New Bundle
curl -v -G -H 'accept: application/json' 'localhost:9999/bundle/now?mode=restore&upload&' --data-urlencode 'query={release="prod",pod_name="prod-magnolia-helm-public-db-0",namespace="gitlab"}' | jq
Note: Mind the -G flag. Without it, the queried key-value pairs might be ignored! More about -G here.
Scrape metrics
$ curl -v http://localhost:9997/metrics | grep -C 2 mgnl
# TYPE go_threads gauge
go_threads 8
# HELP mgnlbackup_backup_info Information about the current backup id.
# TYPE mgnlbackup_backup_info counter
mgnlbackup_backup_info{backup_id="01FWNMBK39W3GDSRJFQAN8RZ50"} 1
# HELP mgnlbackup_backups The total number of daily backups.
# TYPE mgnlbackup_backups gauge
mgnlbackup_backups{interval="daily"} 0
mgnlbackup_backups{interval="monthly"} 0
mgnlbackup_backups{interval="weekly"} 0
# HELP mgnlbackup_bytes Total byte size of all backups combined in target storage.
# TYPE mgnlbackup_bytes gauge
mgnlbackup_bytes 7.807669e+06
# HELP mgnlbackup_errors Number of erroneous, not completed backup jobs.
# TYPE mgnlbackup_errors counter
mgnlbackup_errors 0
# HELP mgnlbackup_seconds Time taken for backup jobs.
# TYPE mgnlbackup_seconds summary
mgnlbackup_seconds_sum 34.2809758
mgnlbackup_seconds_count 5
# HELP mgnlbackup_version_info Shows the current version of this program.
# TYPE mgnlbackup_version_info counter
mgnlbackup_version_info{version="v0.5.1-1-g9a9d89f"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
Getting the current backup list in JSON format
$ curl -H 'Accept: application/json' http://localhost:9999/list | jq
[
{
"link": "http://minio:9000/...",
"name": "myprefix/01FWNMAWPBNSZEP04TQTZK9W9G/basebackup/01FWNMAWPCSMFTPCWVKMEM3APC.tar.gz",
"last_modified": "2022-02-24T10:27:34.49Z",
"size": 3870299,
"tags": {
"backup_id": "01FWNMAWPBNSZEP04TQTZK9W9G",
"expiry": "",
"is_basebackup": "true",
"namespace": "development",
"pod_name": "mysuperpod-author",
"variant": "tx_log_archiving"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix/01FWNMAWPBNSZEP04TQTZK9W9G/basebackup/01FWNMB4M3GD4SYETCZRA56N2S-meta.tar.gz",
"last_modified": "2022-02-24T10:27:34.15Z",
"size": 269,
"tags": {
"backup_id": "01FWNMAWPBNSZEP04TQTZK9W9G",
"expiry": "",
"is_basebackup_meta": "true",
"namespace": "development",
"pod_name": "mysuperpod-author",
"variant": "tx_log_archiving"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix/01FWNMBK39W3GDSRJFQAN8RZ50/basebackup/01FWNMBK3AGJCSB145NC9QC03E.tar.gz",
"last_modified": "2022-02-24T10:27:57.27Z",
"size": 3870343,
"tags": {
"backup_id": "01FWNMBK39W3GDSRJFQAN8RZ50",
"expiry": "",
"is_basebackup": "true",
"namespace": "development",
"pod_name": "mysuperpod-author",
"variant": "tx_log_archiving"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix/01FWNMBK39W3GDSRJFQAN8RZ50/basebackup/01FWNMBTNPND08H4VG7TB36E77-meta.tar.gz",
"last_modified": "2022-02-24T10:27:56.85Z",
"size": 269,
"tags": {
"backup_id": "01FWNMBK39W3GDSRJFQAN8RZ50",
"expiry": "",
"is_basebackup_meta": "true",
"namespace": "development",
"pod_name": "mysuperpod-author",
"variant": "tx_log_archiving"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix/01FWNMBK39W3GDSRJFQAN8RZ50/txlog/01FWNMCQ6Q6WB4MQ72P6JPT3JD.tar.gz",
"last_modified": "2022-02-24T10:28:26.29Z",
"size": 66489,
"tags": {
"backup_id": "01FWNMBK39W3GDSRJFQAN8RZ50",
"expiry": "",
"is_basebackup": "false",
"is_txlog": "true",
"namespace": "development",
"pod_name": "mysuperpod-author",
"variant": "tx_log_archiving"
}
},
{
"link": "http://minio:9000/...",
"name": "myprefix/01FWNMBK39W3GDSRJFQAN8RZ50/txlog/01FWNMMZ58AV37XMVG3H61NTQV.tar.gz",
"last_modified": "2022-02-24T10:32:56.33Z",
"size": 16566,
"tags": {
"backup_id": "01FWNMBK39W3GDSRJFQAN8RZ50",
"expiry": "",
"is_basebackup": "false",
"is_txlog": "true",
"namespace": "development",
"pod_name": "mysuperpod-author",
"variant": "tx_log_archiving"
}
}
]
Note: The output has been shortened for readability.
Restore Bundle in local environment
To restore any Magnolia-Backup Bundle locally, you'll need to generate a valid Bundle URL first:
$ curl -s -G -H 'accept: application/json' 'localhost:9999/bundle/now?mode=restore&upload&' --data-urlencode 'query={release="dev",pod_name="dev-magnolia-helm-author-db-0",namespace="dev"}' | jq
{
"link": "https://mycompany-backup-bucket.s3.dualstack.us-west-2.amazonaws.com/-01GHY1Q3BPNP03E855DG1NSWS0-bundle.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA4[...]PNP03E855DG1NSWS0-bundle.json%22&X-Amz-Signature=009cf15cec3177cef773a1863e510621f77d93b1af1d527b6c88fb2ec404707b"
}
Attention: Mind the encoding of the URL string! If you use cURL to generate it, you might need to replace \u0026 with an ampersand (&) or other characters. You can avoid this by parsing the output with jq!
Set BUNDLE_URL to the received link and run:
# Link anonymized ;)
$ export BUNDLE_URL="https://mycompany-backup-bucket.s3.dualstack.us-west-2.amazonaws.com/-01GHY1Q3BPNP03E855DG1NSWS0-bundle.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA4[...]PNP03E855DG1NSWS0-bundle.json%22&X-Amz-Signature=009cf15cec3177cef773a1863e510621f77d93b1af1d527b6c88fb2ec404707b"
# Start local restore
$ make up-restore
## Example Output
[...]
docker-postgres-1 | 2022-11-14 16:14:06.709 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
docker-postgres-1 | 2022-11-14 16:14:06.709 GMT [1] LOG: listening on IPv6 address "::", port 5432
docker-postgres-1 | 2022-11-14 16:14:06.711 GMT [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
docker-postgres-1 | 2022-11-14 16:14:06.718 GMT [41] LOG: database system was shut down at 2022-11-14 16:14:06 GMT
docker-postgres-1 | 2022-11-14 16:14:06.721 GMT [1] LOG: database system is ready to accept connections
docker-restore-1 | time="2022-11-14T16:14:07Z" level=info msg="synced 10 B (18 B/s) ..."
docker-restore-1 | time="2022-11-14T16:14:07Z" level=info msg="synced 201.8 KiB (201.8 KiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:08Z" level=info msg="synced 897.8 KiB (561.0 KiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:08Z" level=info msg="synced 3.6 MiB (1.8 MiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:09Z" level=info msg="synced 8.7 MiB (3.5 MiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:10Z" level=info msg="synced 9.6 MiB (2.7 MiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:11Z" level=info msg="synced 9.6 MiB (1.9 MiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:11Z" level=info msg="synced 9.6 MiB (1.8 MiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:12Z" level=info msg="synced 9.7 MiB (1.7 MiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:12Z" level=info msg="synced 9.7 MiB (1.6 MiB/s) ..."
docker-restore-1 | time="2022-11-14T16:14:13Z" level=info msg="generating recovery.conf file ..."
docker-restore-1 | time="2022-11-14T16:14:13Z" level=info msg="synced 10.1 MiB (1.6 MiB/s) ..."
docker-restore-1 | 2022/11/14 16:14:13 extracted tarball into /var/lib/postgresql/data/mydata: 1928 files, 27 dirs (6.497706045s)
docker-restore-1 | time="2022-11-14T16:14:13Z" level=info msg="created marker file /var/lib/postgresql/data/mydata/.mgnl-backup-bootstrapped"
docker-restore-1 | time="2022-11-14T16:14:13Z" level=info msg="👉 request took 7.580686878s"
After the restore you'll probably need to verify the marker file, restart Postgres and log in to view the restored data (use the credentials from the backup source!):
# Abort docker-compose if still running and restart
$ make up-restore
docker-compose -f docker/docker-compose-restore.yml up
[...]
Attaching to docker-perms-1, docker-postgres-1, docker-restore-1
docker-perms-1 | chown: ls /var/lib/postgresql/data/mydata: No such file or directory
docker-restore-1 | time="2022-11-14T16:31:49Z" level=debug msg="setting log level to debug"
docker-restore-1 | time="2022-11-14T16:31:49Z" level=info msg="running version v0.5.10"
docker-restore-1 | time="2022-11-14T16:31:49Z" level=info msg="marker file /var/lib/postgresql/data/mydata/.mgnl-backup-bootstrapped present, not syncing"
docker-restore-1 | time="2022-11-14T16:31:49Z" level=info msg="👉 request took 6.191583ms"
docker-perms-1 exited with code 1
docker-restore-1 exited with code 0
docker-postgres-1 |
docker-postgres-1 | PostgreSQL Database directory appears to contain a database; Skipping initialization
[...]
$ docker exec -it docker-postgres-1 du -hs /var/lib/postgresql/data/mydata/
62.6M /var/lib/postgresql/data/mydata/
$ docker exec -it docker-postgres-1 ls /var/lib/postgresql/data/mydata/.mgnl-backup-bootstrapped
/var/lib/postgresql/data/mydata/.mgnl-backup-bootstrapped
$ docker exec -it docker-postgres-1 psql -U magnolia -W
Password:
psql (11.9)
Type "help" for help.
magnolia=#
Note: The restore will only start if the data-dir is empty, so as not to accidentally overwrite production data! The log will show:
level=warning msg="target dir /var/lib/postgresql/data/mydata not empty, not syncing"
You may need to clean your local environment on a failed restore and start over:
make clean-restore up-restore
Multisource storage
When starting the server in read-only mode, it is possible to specify multiple object storage sources. This is useful e.g. when backups from more than one storage should be listed with a single /list request, or when a /bundle request should search for matching bundles in more than just one object storage.
To run the server in multisource storage mode, set the environment variable MGNLBACKUP_MULTISOURCE to true.
The different sources (i.e. "sub"-storages) are defined in one (or more) yaml file(s). If it is one file, the path to that yaml file must be handed over to the server using the environment variable MGNLBACKUP_MULTISOURCE_YAML_PATHS.
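For example (the file path is only an illustration):
MGNLBACKUP_MULTISOURCE=true
MGNLBACKUP_MULTISOURCE_YAML_PATHS=/multisource.yaml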
The yaml file should be structured in the following way:
s3Confs:
<name of s3 object storage 1>:
endpoint: "<s3 endpoint>" # As in env var MGNLBACKUP_S3_ENDPOINT for a single source s3 storage.
bucket: "<s3 bucket name>" # As in MGNLBACKUP_BUCKET
region: "<s3 region>" # As in MGNLBACKUP_S3_REGION
accessKey: "<s3 access key>" # As in MGNLBACKUP_S3_ACCESSKEY
secretKey: "<s3 secret key>" # As in MGNLBACKUP_S3_SECRETKEY
insecure: <"true" or "false"> # As in MGNLBACKUP_S3_INSECURE (optional parameter), Note: Argument must be given as a string!
insecureSkipVerify: <"true" or "false"> # As in MGNLBACKUP_S3_INSECURE_SKIP_VERIFY (optional parameter), Note: Argument must be given as a string!
prefix: "<object prefix>" # As in MGNLBACKUP_PREFIX
<name of s3 object storage 2>:
endpoint: "<s3 endpoint>"
bucket: "<s3 bucket name>"
region: "<s3 region>"
accessKey: "<s3 access key>"
secretKey: "<s3 secret key>"
insecure: <"true" or "false">
insecureSkipVerify: <"true" or "false">
prefix: "<object prefix>"
...
...
...
<name of s3 object storage n>:
endpoint: "<s3 endpoint>"
bucket: "<s3 bucket name>"
region: "<s3 region>"
accessKey: "<s3 access key>"
secretKey: "<s3 secret key>"
insecure: <"true" or "false">
insecureSkipVerify: <"true" or "false">
prefix: "<object prefix>"
gcsConfs:
<name of gcs object storage 1>:
prefix: "<object prefix>" # As in MGNLBACKUP_PREFIX
bucket: "<gcs bucket name>" # As in MGNLBACKUP_BUCKET
projectID: "<gsc project id>" # As in MGNLBACKUP_GCS_PROJECTID
location: "<gsc location>" # As in MGNLBACKUP_GCS_LOCATION
locationType: "<gcs location type>" # As in MGNLBACKUP_GCS_LOCATION_TYPE
<name of gcs object storage 2>:
prefix: "<object prefix>"
bucket: "<gcs bucket name>"
projectID: "<gsc project id>"
location: "<gsc location>"
locationType: "<gcs location type>"
...
...
...
<name of gcs object storage n>:
prefix: "<object prefix>"
bucket: "<gcs bucket name>"
projectID: "<gsc project id>"
location: "<gsc location>"
locationType: "<gcs location type>"
azConfs:
<name of the az storage>:
prefix: "<object prefix>"
container: "<az container name>" # MGNLBACKUP_BUCKET
accountName: "<account name>"
accountKey: "<account shared access key>"
Note: When using GCS object storages in a multisource storage, remember to also set the env var GOOGLE_APPLICATION_CREDENTIALS.
One can also omit a storage type completely. For example this would be a valid multisource storage configuration:
s3Confs:
store1:
endpoint: "minio1:9000"
bucket: "9fe4200c9f3630e1-backup"
region: "ap-southeast-1"
accessKey: "minio"
secretKey: "minio123"
insecure: "true"
prefix: "myprefix"
store2:
endpoint: "minio2:9002"
bucket: "9fe4200c9f3630e2-backup"
region: "ap-southeast-2"
accessKey: "minio2"
secretKey: "minio2123"
insecure: "false"
insecureSkipVerify: "true"
prefix: "myprefix2"
Actually, this exact config could be used for local testing (see multisource.yaml in the docker directory).
A local docker testing environment can be started by:
make build-docker up-multi
Once the environment has started up one can access:
Component | Link | Credentials |
---|---|---|
minio 1 | http://localhost:9000 | User: minio, PW: minio123 |
minio 2 | http://localhost:9002 | User: minio2, PW: minio2123 |
backup server 1 | http://localhost:9999/list | |
backup server 2 | http://localhost:10001/list | |
multisource backup server | http://localhost:10003/list |
Use make down-multi clean-multi to stop and clean up that environment.
More than one config yaml
The config yaml can be split up into more than one yaml file. This can be useful e.g. in a k8s context, when you want to split the config into a config map file and a secret file.
Note: K8s config map data and secret data can both be mounted as a yaml file to a path accessible from the backup storage server.
To do so, structure the config map data and the secret data such that the whole yaml file content is in the value of the key-value pair (and the key can then be e.g. the yaml file name).
Refer to the config map doc and the secret doc for more detailed instructions. (The docs refer to pods, but it should work similarly for stateful sets.)
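As a rough sketch (the config map name is hypothetical, the content mirrors the s3Confs example above), such a config map could look like this and then be mounted as a file into the backup container:
apiVersion: v1
kind: ConfigMap
metadata:
  name: backup-multisource      # hypothetical name
data:
  multisource-cm.yaml: |        # the key doubles as the mounted file name
    s3Confs:
      store1:
        endpoint: "minio1:9000"
        bucket: "9fe4200c9f3630e1-backup"
        prefix: "myprefix"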
An arbitrary number of yaml files can be specified under MGNLBACKUP_MULTISOURCE_YAML_PATHS. The paths have to be separated by a semicolon. The configs of all the files are merged, and configs from later files overwrite configs from earlier files.
For example, the above config could be split up into a config map part (say /multisource-cm.yaml)
s3Confs:
store1:
endpoint: "minio1:9000"
bucket: "9fe4200c9f3630e1-backup"
region: "ap-southeast-1"
insecure: "true"
prefix: "myprefix"
store2:
endpoint: "minio2:9002"
bucket: "9fe4200c9f3630e2-backup"
region: "ap-southeast-2"
insecure: "true"
prefix: "myprefix2"
and a secret part (say /multisource-secret.yaml)
s3Confs:
store1:
accessKey: "minio"
secretKey: "minio123"
store2:
accessKey: "minio2"
secretKey: "minio2123"
Then MGNLBACKUP_MULTISOURCE_YAML_PATHS must be set to /multisource-cm.yaml;/multisource-secret.yaml to obtain the same configuration as in the single config file example.
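In env var form this would be:
MGNLBACKUP_MULTISOURCE=true
MGNLBACKUP_MULTISOURCE_YAML_PATHS=/multisource-cm.yaml;/multisource-secret.yaml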
Contributing
To start a local environment with PostgreSQL and the server you're building do:
make build-docker up
Go to http://localhost:9999/ for testing.
Generating Dummy Data
In PostgreSQL (enter with docker exec -it docker_postgres_1 psql magnolia -U magnolia) use this to generate some dummy data:
CREATE TABLE public.employee (
id int8 NOT NULL,
name varchar(120) NOT NULL,
salary int8 NOT NULL,
CONSTRAINT emp_pk PRIMARY KEY (id)
);
WITH salary_list AS (
SELECT '{1000, 2000, 5000}'::INT[] salary
)
INSERT INTO public.employee
(id, name, salary)
SELECT n, 'Employee ' || n as name, salary[1 + mod(n, array_length(salary, 1))]
FROM salary_list, generate_series(1, 1000000) as n;
(stolen from here 🤭)
Testing mTLS
Manually get a new download from a running backup server. First start the server with make up, then use this curl to test the mTLS part manually:
curl -v --cert testdata/tls/cert.pem --key testdata/tls/key.pem --cacert testdata/tls/ca.pem https://localhost:10000/hello
curl -v --cert testdata/tls/cert.pem --key testdata/tls/key.pem --cacert testdata/tls/ca.pem https://localhost:10000/download -o /tmp/download.tar.gz
NOTE: You can regenerate the testing certificates (make gen-certs), but it should not be needed. Do not forget to check them in if you regenerated them because of some issue with the old ones, though.
Testing Azure
You need access to a storage account (shared key) and provide the key via env vars. Call make like this:
export AZURE_STORAGE_ACCOUNT_KEY=<supersecretkey>
make up-az AZURE_STORAGE_ACCOUNT_KEY=$AZURE_STORAGE_ACCOUNT_KEY
Txlog Merge
Txlogs of the active base backup are regularly being merged according to the algorithm explained in this section. This is done because we want to prevent slow listing requests caused by lots of small txlog objects. By merging these small objects into larger ones we decrease the likelihood of slow list requests.
The merge algorithm first sorts all txlogs of a base backup according to their creation date (such that the oldest txlog is the first in the list). Merged txlogs keep the creation date of the original txlog. Thus the first txlog in the list will always remain the first txlog in the list, even after it has been merged.
Note: Sorting by creation date (usually) also sorts the txlogs by their size, such that the largest txlog is the first in the list (for the exception see the note in Example 4 below).
After that, txlogs are merged according to the following principles:
- A txlog is always merged with all subsequent txlogs in the list.
- If a txlog would not become at least twice the size it was before the merge, it is not merged.
- If the merged txlog would exceed the configurable txlog size threshold (default 16 GiB, see MGNLBACKUP_TX_LOG_SIZE_THRESHOLD), it is not merged.
These checks are always performed starting from the first txlog in the list and (if required) repeated for subsequent txlogs in the list. Once the first txlog is reached for which conditions 2 & 3 are met, this txlog is merged with all subsequent txlogs in the list and the txlog merge process is finished.
If there is no txlog in the list for which conditions 2 & 3 are met, then no txlogs are merged.
To illustrate the extent to which the txlog merge algorithm can reduce the number of txlog objects, consider the case where we have e.g. ~8 GiB of txlogs (which is not unrealistic for busy databases). With a default Postgres configuration a txlog can be at most 16 MiB in size. Let's say that all txlogs had exactly this maximum size. Then we would still have ~512 txlog objects of 16 MiB each to cover all ~8 GiB of txlogs.
Using the merge algorithm we would have far fewer than 512 objects. The first txlog must be ~4 GiB in size, because if it weren't, it would have been merged into a bigger txlog (since we have ~8 GiB in total). The second txlog thus must be ~2 GiB in size by the same argument, and so on, until we reach the smallest txlog, which is at least 16 MiB in size.
So in the worst case we have 9 txlogs to cover ~8 GiB:
txlog 1 (~4GiB) -> txlog 2 (~2GiB) -> txlog 3 (~1GiB) -> txlog 4 (~512MiB) ->
txlog 5 (~256MiB) -> txlog 6 (~128MiB) -> txlog 7 (~64MiB) -> txlog 8 (~32MiB) ->
txlog 9 (16MiB)
This is only the worst case for ~8 GiB and we'd usually have even fewer than 9 txlogs to cover the ~8 GiB. Just see what happens if we add another txlog to the list above (and for the sake of the argument assume that all the sizes are exact now):
txlog 1 (4GiB) -> txlog 2 (2GiB) -> txlog 3 (1GiB) -> txlog 4 (512MiB) ->
txlog 5 (256MiB) -> txlog 6 (128MiB) -> txlog 7 (64MiB) -> txlog 8 (32MiB) ->
txlog 9 (16MiB) -> txlog 10 (16MiB)
Running the txlog merge algorithm on that list would result in
txlog 1 (8GiB)
because txlog 1 can double its size by merging with txlogs 2-10.
Additional Examples
Example 1
txlog 1 (16MiB) -> txlog 2 (8MiB)
Nothing is being merged here because txlog 1 cannot double its size by merging with txlog 2.
Example 2
txlog 1 (16MiB) -> txlog 2 (8MiB) -> txlog 3 (8MiB)
Here txlog 1 can be merged with txlogs 2 & 3 to double its size. Thus after the merge we have:
txlog 1 (32 MiB)
Example 3
txlog 1 (1GiB) -> txlog 2 (16MiB) -> txlog 3 (8MiB) -> txlog 4 (8MiB)
Here no new txlog 1 is created because it couldn't become at least 2 GiB in size. Instead, txlog 2 is merged with txlogs 3 & 4 because it can double its size to 32 MiB. Thus after the merge we have:
txlog 1 (1GiB) -> txlog 2 (32MiB)
Example 4
Assume we have a txlog size threshold of 16 GiB and the following list of txlogs:
txlog 1 (16GiB) -> txlog 2 (8GiB) -> txlog 3 (6GiB) -> txlog 4 (4GiB) -> txlog 5 (2GiB)
Here txlog 1 & txlog 2 cannot be merged because they would exceed the txlog size threshold. But txlog 3 can be merged with txlogs 4 & 5 because it can double its size to 12 GiB. Thus after the merge we have:
txlog 1 (16GiB) -> txlog 2 (8GiB) -> txlog 3 (12GiB)
Note: We now have txlogs that are no longer sorted by size. The case where the txlog size threshold comes into play is the only case where the txlogs are not sorted by size after the merge. But this poses no problem because the size order is not a requirement for the txlog merge algorithm to work.
Source Files
- backups.go
- boot.go
- bytes.go
- cache.go
- chars.go
- cmd_boot.go
- cmd_dump.go
- cmd_middleware.go
- cmd_root.go
- db.go
- dump.go
- gitlab_variables.go
- io.go
- k8s.go
- lifecycle.go
- main.go
- metrics.go
- object_storage_conf.go
- pg_wal.go
- pitr.go
- server.go
- server_handlers.go
- server_sync.go
- server_webhook.go
- storage.go
- storage_azure.go
- storage_gcs.go
- storage_http.go
- storage_multi.go
- storage_s3.go
- treeprinter.go
- ulid.go
- webhook.go