checkpoint-restore-operator
This is a Kubernetes operator which tries to help managing checkpoints.
Description
Kubernetes 1.25 introduced the possibility to create stateful checkpoints
of container.
One of the early questions asked was how to control how many checkpoint archives
will be on the local disk. Having too many checkpoint archives might not be
useful, depending on the use case, but it might lead to a situation where no
more disk space is left due to a high number of checkpoint archives or if the
checkpoint archives are really large.
With the help of this operator it is possible to limit the number of checkpoint
archives per namespace/pod/container. The maximum number of checkpoints per
namespace/pod/container combination is 10
but it can be changed.
Available Parameters
The operator has the following parameters which can be changed:
maxCheckpointsPerContainer
: how many checkpoint archives should be kept
on disk for a certain namespace/pod/container combination. Defaults to 10
checkpointDirectory
: the directory where checkpoint archives are created
and which should be watched for the correct number of checkpoint archives.
Defaults to kubelet's default checkpoint location /var/lib/kubelet/checkpoints
.
A sample file can be found at config/samples/_v1_checkpointrestoreoperator.yaml
for setting these parameters.
Getting Started
You’ll need a Kubernetes cluster to run against. You can use
KIND to get a local cluster for testing, or run
against a remote cluster.
Note: Your controller will automatically use the current context in your
*kubeconfig file (i.e. whatever cluster kubectl cluster-info
shows).
Running on the Cluster
-
Install Instances of Custom Resources:
make install
-
Build and push your image to the location specified by IMG
:
make docker-build docker-push IMG=<some-registry>/checkpoint-restore-operator:tag
-
Deploy the controller to the cluster with the image specified by IMG
:
make deploy IMG=<some-registry>/checkpoint-restore-operator:tag
Uninstall CRDs
To delete the CRDs from the cluster:
make uninstall
Undeploy Controller
UnDeploy the controller from the cluster:
make undeploy
Contributing
While bug fixes can first be identified via an "issue", that is not required.
It's ok to just open up a PR with the fix, but make sure you include the same
information you would have included in an issue - like how to reproduce it.
PRs for new features should include some background on what use cases the
new code is trying to address. When possible and when it makes sense, try to
break-up larger PRs into smaller ones - it's easier to review smaller
code changes. But only if those smaller ones make sense as stand-alone PRs.
Regardless of the type of PR, all PRs should include:
- well documented code changes;
- additional testcases: ideally, they should fail w/o your code change applied;
- documentation changes.
Squash your commits into logical pieces of work that might want to be reviewed
separate from the rest of the PRs. Ideally, each commit should implement a
single idea, and the PR branch should pass the tests at every commit. GitHub
makes it easy to review the cumulative effect of many commits; so, when in
doubt, use smaller commits.
PRs that fix issues should include a reference like Closes #XXXX
in the
commit message so that github will automatically close the referenced issue
when the PR is merged.
Contributors must assert that they are in compliance with the Developer
Certificate of Origin 1.1. This is achieved
by adding a "Signed-off-by" line containing the contributor's name and e-mail
to every commit message. Your signature certifies that you wrote the patch or
otherwise have the right to pass it on as an open-source patch.
Test It Out
-
Install the CRDs into the cluster:
make install
-
Run your controller (this will run in the foreground, so switch to a new terminal if you want to leave it running):
make run
NOTE: You can also run this in one step by running: make install run
Modifying the API Definitions
If you are editing the API definitions, generate the manifests such as CRs or CRDs using:
make manifests
NOTE: Run make --help
for more information on all potential make
targets
More information can be found via the Kubebuilder Documentation
License and Copyright
Unless mentioned otherwise in a specific file's header, all code in
this project is released under the Apache 2.0 license.
The author of a change remains the copyright holder of their code
(no copyright assignment). The list of authors and contributors can be
retrieved from the git commit history and in some cases, the file headers.