EKSphemeral: The EKS Ephemeral Cluster Manager
Do not use in production. This is a service for development and test environments. Also, this is not an official AWS offering but something MH9 cooked up.
Managing EKS clusters for dev/test environments manually is boring. You have to wait until they're up and available and have to remember to tear them down again to minimize costs.
How about automating these steps? Meet EKSphemeral :)
EKSphemeral is a simple Amazon EKS manager for ephemeral dev/test clusters, allowing you to launch an EKS cluster with an automatic tear-down after a given time.
Architecture
EKSphemeral has a control plane implemented in an AWS Lambda/Amazon S3 combo, and as its data plane it uses eksctl running in AWS Fargate. There are four scripts, eksp-*.sh, allowing you to install/uninstall EKSphemeral and to create and query clusters. Overall, the architecture looks as follows:
- The eksp-up.sh script provisions EKSphemeral's control plane (Lambda+S3). This is a one-time action; think of it as installing EKSphemeral in your AWS environment.
- Whenever you want to provision a throwaway EKS cluster, use eksp-create.sh. It will do two things:
  - Provision the cluster using eksctl running in Fargate (what we call the EKSphemeral data plane), and when that is completed,
  - Create a cluster spec entry in S3, via the /create endpoint of EKSphemeral's HTTP API.
- Once that is done, you can use eksp-list.sh to list all managed clusters or, should you wish to gather more information on a specific cluster, use eksp-list.sh $CLUSTERID to retrieve it. This script uses the /status endpoint of EKSphemeral's HTTP API.
- Every 5 minutes, a CloudWatch event triggers the execution of another Lambda function called DestroyClusterFunc, which notifies the owners of clusters that are about to expire (sending an email up to 5 minutes before the cluster is destroyed), and when the time comes, it tears the cluster down.
- Last but not least, if you want to get rid of EKSphemeral, use the eksp-down.sh script, removing all cluster specs in the S3 bucket and deleting all three Lambda functions.
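The periodic check performed by DestroyClusterFunc can be pictured as a small decision function. The sketch below is not the actual Lambda code: it assumes each cluster has a creation time and a timeout in minutes, and that owners are notified once 5 minutes or less remain. The function name eksp_action and its arguments are hypothetical.

```shell
# Decide what to do with a cluster, given (all hypothetical inputs):
#   $1 creation time in epoch seconds
#   $2 timeout in minutes (as in the cluster spec)
#   $3 current time in epoch seconds
# Prints "destroy", "notify", or "keep".
eksp_action() {
  created=$1
  timeout_min=$2
  now=$3
  # Seconds left before the cluster reaches its timeout.
  remaining=$(( created + timeout_min * 60 - now ))
  if [ "$remaining" -le 0 ]; then
    echo destroy
  elif [ "$remaining" -le 300 ]; then
    echo notify
  else
    echo keep
  fi
}
```

With a check every 5 minutes and a 5 minute notification window, every cluster passes through the "notify" state at least once before it is torn down, assuming its timeout is longer than the check interval.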
If you like, you can have a look at a 4 min video walkthrough before you try it out yourself. Since the minimal time for an end-to-end provisioning and usage cycle is ca. 40 min, the video walkthrough is time-compressed by a factor of roughly 10.
If you want to try it out yourself, follow the steps below.
Install
In order to use EKSphemeral, clone this repo, and make sure you've got jq, the aws CLI and the Fargate CLI installed.
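A quick way to confirm the prerequisites is to check that each tool is on your PATH. The following is a minimal sketch; the helper name missing_tools is made up for this example, and it assumes the Fargate CLI binary is called fargate, so adjust the names if your installation differs.

```shell
# Print the name of every command from the argument list that is not on PATH.
# Returns 0 if all commands are present, 1 otherwise.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example call (binary names are assumptions): missing_tools jq aws fargate
```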
Make sure to set the respective environment variables before you proceed. This is so that the install process knows which S3 bucket to use for the control plane's Lambda functions (EKSPHEMERAL_SVC_BUCKET) and where to put the cluster metadata (EKSPHEMERAL_CLUSTERMETA_BUCKET):
$ export EKSPHEMERAL_SVC_BUCKET=eks-svc
$ export EKSPHEMERAL_CLUSTERMETA_BUCKET=eks-cluster-meta
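If you script the installation, it can help to fail early when one of these variables is missing. The helper below is a hypothetical sketch (require_env is not part of EKSphemeral) that checks any list of variable names.

```shell
# Fail early if a required variable is unset or empty.
# Prints each missing variable name to stderr; returns 1 if any is missing.
require_env() {
  rc=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example call: require_env EKSPHEMERAL_SVC_BUCKET EKSPHEMERAL_CLUSTERMETA_BUCKET
```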
Create the S3 bucket for the Lambda functions like so:
$ aws s3api create-bucket \
--bucket $EKSPHEMERAL_SVC_BUCKET \
--create-bucket-configuration LocationConstraint=us-east-2 \
--region us-east-2
Create the S3 bucket for the cluster metadata like so:
$ aws s3api create-bucket \
--bucket $EKSPHEMERAL_CLUSTERMETA_BUCKET \
--create-bucket-configuration LocationConstraint=us-east-2 \
--region us-east-2
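The commands above hard-code us-east-2. If you adapt them to another region, keep in mind that us-east-1 is a special case for S3: there, the --create-bucket-configuration option must be left out. Whether EKSphemeral itself works outside us-east-2 is not covered here, so treat the following as a sketch; bucket_cmd is a hypothetical helper that only prints the command so you can review it before running it.

```shell
# Print the create-bucket command for a bucket and region without running it.
# us-east-1 is the one region where LocationConstraint must be omitted.
bucket_cmd() {
  bucket=$1
  region=$2
  if [ "$region" = "us-east-1" ]; then
    echo "aws s3api create-bucket --bucket $bucket --region $region"
  else
    echo "aws s3api create-bucket --bucket $bucket --create-bucket-configuration LocationConstraint=$region --region $region"
  fi
}
```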
Now that we have the S3 buckets set up, let's move on to the service code.
The following assumes that the S3 buckets have been set up as outlined above and that you have access to AWS configured locally.
$ ./eksp-up.sh
Note that in order to receive mail notifications about cluster creation and destruction (optional feature), you MUST verify both source and target email address in the Ireland region.
Now, let's check if there are already clusters managed by EKSphemeral:
$ ./eksp-list.sh
[]
Since we just installed EKSphemeral, there are no clusters, yet. Let's change that.
Use
Create clusters
Let's start off by creating a throwaway EKS cluster with the default values:
$ ./eksp-create.sh
Now, let's create a cluster named 2node-111-30, using the EKSPHEMERAL_SG security group, with two worker nodes, using Kubernetes version 1.11, with a 30 min timeout as defined in the example cluster spec file 2node-111-30.json:
$ cat svc/2node-111-30.json
{
  "name": "2node-111-30",
  "numworkers": 2,
  "kubeversion": "1.11",
  "timeout": 30,
  "owner": "hausenbl+notif@amazon.com"
}
$ ./eksp-create.sh 2node-111-30.json $EKSPHEMERAL_SG
Note that both the security group and the cluster spec file are optional. If not present, the first security group of the default VPC and default-cc.json will be used, as in the first example.
Further, note that, if you want to receive notification emails, you must verify both the source and target email address in the Ireland region.
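To define your own cluster, write a spec file with the same five fields as the example spec. The following sketch generates one with printf; write_spec is a hypothetical helper and the values in the usage note are placeholders.

```shell
# Write a cluster spec with the five fields used in the example spec file.
# Arguments: name, number of workers, Kubernetes version, timeout (min), owner.
# Note: values are inserted as-is, so they must not contain double quotes.
write_spec() {
  printf '{\n  "name": "%s",\n  "numworkers": %s,\n  "kubeversion": "%s",\n  "timeout": %s,\n  "owner": "%s"\n}\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```

For example, write_spec 3node-112-45 3 1.12 45 me@example.com > 3node-112-45.json produces a spec for a three-node cluster with a 45 min timeout, which you can then pass to eksp-create.sh in place of the example spec file.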
List clusters
Next, let's check what clusters are managed by EKSphemeral:
$ ./eksp-list.sh
["9be65bee-3baa-4fd0-aa3e-032325d5390c","dd72f73a-3457-4d4b-b997-08a2b376160b"]
Here, we get an array of cluster IDs back. We can use such a cluster ID as follows to look up the spec of a particular cluster:
$ ./eksp-list.sh dd72f73a-3457-4d4b-b997-08a2b376160b | jq
{
  "name": "default-eksp",
  "numworkers": 1,
  "kubeversion": "1.12",
  "timeout": 20,
  "owner": "nobody@example.com"
}
Prolong cluster lifetime
When you get a notification that one of your clusters is about to shut down, or really at any time before it shuts down, you can prolong the cluster lifetime using the eksp-prolong.sh script.
Let's say we want to keep the cluster with the ID 7a4aa952-9582-4d99-98a0-0ab1a4e56337 around for 40 min longer (with a remaining cluster runtime of 2 min). Here's what you would do:
$ ./eksp-prolong.sh 7a4aa952-9582-4d99-98a0-0ab1a4e56337 40
Trying to set the TTL of cluster 7a4aa952-9582-4d99-98a0-0ab1a4e56337 to 42 minutes, starting now
Successfully prolonged the lifetime of cluster 7a4aa952-9582-4d99-98a0-0ab1a4e56337 for 40 minutes. New TTL is 42 min starting now!
$ ./eksp-list.sh 7a4aa952-9582-4d99-98a0-0ab1a4e56337 | jq
{
  "name": "1node-112-10",
  "numworkers": 1,
  "kubeversion": "1.12",
  "timeout": 42,
  "owner": "hausenbl+notif@amazon.com"
}
Note that the prolong command updates the timeout field of your cluster spec, that is, the cluster TTL is counted from the moment you issue the prolong command, taking the remaining cluster runtime into account.
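Judging from the example output above, the new timeout appears to be the remaining runtime plus the requested extension. The sketch below only restates that arithmetic; new_ttl is a hypothetical name and not part of eksp-prolong.sh.

```shell
# New timeout after prolonging: remaining runtime plus the extension, in minutes.
new_ttl() {
  remaining_min=$1
  extend_min=$2
  echo $(( remaining_min + extend_min ))
}

new_ttl 2 40   # prints 42, matching the TTL in the example
```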
Uninstall
To uninstall EKSphemeral, use the following command. This will remove the control plane elements, that is, delete the Lambda functions and remove all cluster specs from the EKSPHEMERAL_CLUSTERMETA_BUCKET S3 bucket:
$ ./eksp-down.sh
Note that the service code bucket and the cluster metadata bucket are still around after this. You can either manually delete them or keep them around, to reuse them later.
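If you decide to delete the buckets, one option is aws s3 rb with the --force flag, which removes the objects in a bucket before removing the bucket itself (versioned objects are not covered by --force). Since this is destructive, the sketch below only prints the commands for review; bucket_cleanup_cmds is a hypothetical helper.

```shell
# Print the commands that would delete the given buckets, including their contents.
# Review the output, then run the lines yourself if they look right.
bucket_cleanup_cmds() {
  for bucket in "$@"; do
    echo "aws s3 rb s3://$bucket --force"
  done
}

# Example call:
#   bucket_cleanup_cmds "$EKSPHEMERAL_SVC_BUCKET" "$EKSPHEMERAL_CLUSTERMETA_BUCKET"
```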
Development
To learn how to customize and extend EKSphemeral or simply toy around with it, see the dedicated development docs.