terraflakes

command module
v0.0.1
Published: Apr 29, 2020; License: MIT; Imports: 9; Imported by: 0

Terraflakes

Repeatedly apply and destroy a Terraform environment with the goal of finding flakes, treating the Terraform configuration as a black box.

Status: EARLY BETA code

Usage

  1. Run terraflakes:

    $ cd /path/to/terraform-environment
    $ terraflakes
    [time passes...]
    [terraflakes terminates]
    
  2. Verify by hand that everything is cleaned up; if not, recover accordingly:

    $ terraform plan
    
Recommendation

Since terraflakes drives terraform and normally runs for a long time, it makes sense to run it inside a tmux session: if the connection to the host is interrupted, terraflakes and terraform keep going and can eventually terminate gracefully.

Options

--repeat N          repeat the apply/destroy sequence N times (see also --max-duration) [default: 10]
--max-duration D    stop after duration D (upper limit on --repeat) [default: 1h]

Secrets handling

Terraflakes calls terraform, so you will probably have to make secrets available to it, for example:

    $ summon terraflakes ...
    $ aws-vault exec PROFILE -- terraflakes ...
    ...

Interrupting Terraflakes

A single SIGINT (keyboard: Ctrl-C) or a SIGTERM sent to terraflakes will also be received by the underlying terraform process. Both processes then perform a graceful shutdown.

A second SIGINT or SIGTERM will terminate terraform immediately, probably leaving an inconsistent state. Don't do that!

A graceful shutdown does not mean that the workspace has been destroyed: run terraform plan, assess the output and clean up accordingly.

Safety first

Terraflakes attempts to make its default behavior as safe as possible, but this requires the user's collaboration.

Choose the value of --max-duration carefully: it must not exceed the lifetime of the cloud credentials needed to perform the terraform operations.

At the beginning of each cycle, terraflakes checks --max-duration and enters the apply/destroy sequence only if, based on the statistics collected so far, enough time remains to complete it. This reduces (but does not eliminate!) the risk of stale locks or inconsistent state when the credentials expire mid-flight.

Examples

The examples have a configurable failure rate, 30% by default (see examples/tf/random-failure.sh).

Warnings

1. This tool calls terraform destroy

Double-check that you are invoking it in the correct directory and correct workspace. DO NOT RUN IN YOUR PRODUCTION ENVIRONMENT.

The recommended approach is to use a dedicated cloud account for testing (so that it is impossible for the tests and the apply/destroy to spill into production).
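If you wrap terraflakes in a script, a pre-flight guard can refuse to start when the current workspace looks like production. The sketch below reads the workspace with `terraform workspace show`. The deny list and the `checkWorkspace` helper are hypothetical, not terraflakes options. Such a guard complements a dedicated cloud account but does not replace it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// checkWorkspace returns an error if the workspace name (trimmed,
// case-insensitive) is on the hypothetical deny list.
func checkWorkspace(name string, denied []string) error {
	ws := strings.TrimSpace(name)
	for _, d := range denied {
		if strings.EqualFold(ws, d) {
			return fmt.Errorf("refusing to run in workspace %q", ws)
		}
	}
	return nil
}

func main() {
	out, err := exec.Command("terraform", "workspace", "show").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine workspace:", err)
		os.Exit(1)
	}
	if err := checkWorkspace(string(out), []string{"prod", "production"}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("workspace check passed")
}
```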

2. Leftover resources cost money

If the final terraform destroy is not called or fails midway (e.g. the cloud credentials expire during a terraform operation, or there is a bug in terraflakes or elsewhere), this tool will leave resources in your cloud, and you will have to pay for them.

Consider a fallback mechanism to ensure no cloud resources are left around, such as a script that wraps terraflakes and sends an alarm in case of non-zero exit status.

Some failures cannot be recovered by a terraform destroy. For example, if you use shared state and the credentials expire mid-operation, terraform will not release the state lock and will leave recovery to human intervention (terraform force-unlock).

Build and install

  1. go build
  2. Copy the generated terraflakes executable to a directory in your $PATH.

License

This code is released under the MIT license.
