chaoskube

command module

v0.10.0, published Aug 3, 2018. License: MIT.

Repository

github.com/djboris9/chaoskube

chaoskube periodically kills random pods in your Kubernetes cluster.

Why

Test how your system behaves under arbitrary pod failures.

Example

Running it will kill a pod in any namespace every 10 minutes by default.

$ chaoskube
INFO[0000] starting up              dryRun=true interval=10m0s version=v0.9.0
INFO[0000] connecting to cluster    serverVersion=v1.9.3+coreos.0 master="https://kube.you.me"
INFO[0000] setting pod filter       annotations= labels= namespaces=
INFO[0000] setting quiet times      daysOfYear="[]" timesOfDay="[]" weekdays="[]"
INFO[0000] setting timezone         location=UTC name=UTC offset=0
INFO[0001] terminating pod          name=kube-dns-v20-6ikos namespace=kube-system
INFO[0601] terminating pod          name=nginx-701339712-u4fr3 namespace=chaoskube
INFO[1201] terminating pod          name=kube-proxy-gke-earthcoin-pool-3-5ee87f80-n72s namespace=kube-system
INFO[1802] terminating pod          name=nginx-701339712-bfh2y namespace=chaoskube
INFO[2402] terminating pod          name=heapster-v1.2.0-1107848163-bhtcw namespace=kube-system
INFO[3003] terminating pod          name=l7-default-backend-v1.0-o2hc9 namespace=kube-system
INFO[3603] terminating pod          name=heapster-v1.2.0-1107848163-jlfcd namespace=kube-system
INFO[4203] terminating pod          name=nginx-701339712-bfh2y namespace=chaoskube
INFO[4804] terminating pod          name=nginx-701339712-51nt8 namespace=chaoskube
...

chaoskube lets you filter target pods by namespace, label and annotation, and exclude certain weekdays, times of day and days of the year from chaos.

How

You can install chaoskube with Helm. Follow Helm's Quickstart Guide and then install the chaoskube chart.

$ helm install stable/chaoskube

Refer to chaoskube on kubeapps.com to learn how to configure it and to find other useful Helm charts.

Otherwise, use the following manifest as a starting point.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chaoskube
  labels:
    app: chaoskube
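spec:
  replicas: 1
  # apps/v1 Deployments require a selector that matches the pod template labels
  selector:
    matchLabels:
      app: chaoskube
  template: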
    metadata:
      labels:
        app: chaoskube
    spec:
      containers:
      - name: chaoskube
        image: quay.io/linki/chaoskube:v0.9.0
        args:
        # kill a pod every 10 minutes
        - --interval=10m
        # only target pods in the test environment
        - --labels=environment=test
        # only consider pods with this annotation
        - --annotations=chaos.alpha.kubernetes.io/enabled=true
        # exclude all pods in the kube-system namespace
        - --namespaces=!kube-system
        # don't kill anything on weekends
        - --excluded-weekdays=Sat,Sun
        # don't kill anything during the night or at lunchtime
        - --excluded-times-of-day=22:00-08:00,11:00-13:00
        # don't kill anything as a joke or on christmas eve
        - --excluded-days-of-year=Apr1,Dec24
        # let's make sure we all agree on what the above times mean
        - --timezone=Europe/Berlin
        # terminate pods for real: this disables dry-run mode which is on by default
        # - --no-dry-run

By default chaoskube is friendly and does not kill anything. Once you have validated your target cluster, you can disable dry-run mode with --no-dry-run. You can also specify a more aggressive interval and other supported flags for your deployment.
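The behaviour described above can be sketched in a few lines of Go. This is a minimal illustration and not chaoskube's actual code: the function name runOnce, the kill callback and the pod names are all hypothetical. It picks one random victim from a list of candidates and only invokes the callback when dry-run mode is off.

```go
package main

import (
	"fmt"
	"math/rand"
)

// runOnce picks a random victim from candidates. In dry-run mode it only
// reports the victim; otherwise it calls kill (a hypothetical callback that
// would delete the pod). It returns the chosen name and whether it was killed.
func runOnce(candidates []string, dryRun bool, kill func(name string) error) (victim string, killed bool, err error) {
	if len(candidates) == 0 {
		// Nothing matched the filters, so there is nothing to do.
		return "", false, nil
	}
	victim = candidates[rand.Intn(len(candidates))]
	if dryRun {
		return victim, false, nil
	}
	if err := kill(victim); err != nil {
		return victim, false, err
	}
	return victim, true, nil
}

func main() {
	// Hypothetical pod names, for illustration only.
	pods := []string{"nginx-1", "nginx-2", "kube-dns"}
	victim, killed, err := runOnce(pods, true, func(name string) error {
		fmt.Println("deleting", name) // not reached in dry-run mode
		return nil
	})
	fmt.Println(victim, killed, err)
}
```

Calling something like runOnce once per interval would give the periodic behaviour; with dryRun set to true the callback is never invoked, which mirrors the friendly default.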

If you're running in a Kubernetes cluster and want to target the same cluster then this is all you need to do.

If you want to target a different cluster or run chaoskube locally, specify your cluster via the --master flag or provide a valid kubeconfig via the --kubeconfig flag. By default, it uses the standard kubeconfig path in your home directory, so whatever the current context is in there will be targeted.

If you want to increase or decrease the amount of chaos, change the interval between killings with the --interval flag. Alternatively, you can increase the number of replicas of your chaoskube deployment.

Remember that chaoskube by default kills any pod in all your namespaces, including system pods and itself.

chaoskube provides a simple HTTP endpoint that can be used to check that it is running. This can be used for Kubernetes liveness and readiness probes. By default, this listens on port 8080. To disable, pass --metrics-address="" to chaoskube.

Filtering targets

By default chaoskube considers every pod in the cluster. You can limit its search space by providing label, annotation and namespace selectors.

$ chaoskube --labels 'app=mate,chaos,stage!=production'
...
INFO[0000] setting pod filter       labels="app=mate,chaos,stage!=production"

This selects all pods that have the label app set to mate, the label chaos set to any value, and the label stage either unset or set to something other than production.
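To make these selector semantics concrete, here is a simplified matcher in Go. It is a sketch under stated assumptions, not the parser chaoskube or Kubernetes uses: the function name matches is hypothetical, and it only understands the four term forms key=value, key!=value, key and !key.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether a pod's labels satisfy a comma-separated selector.
// Simplified sketch: only "key=value", "key!=value", "key" and "!key" terms
// are supported. An empty selector matches everything.
func matches(selector string, labels map[string]string) bool {
	for _, term := range strings.Split(selector, ",") {
		term = strings.TrimSpace(term)
		switch {
		case term == "":
			continue
		case strings.Contains(term, "!="):
			kv := strings.SplitN(term, "!=", 2)
			// Satisfied when the key is unset or has a different value.
			if v, ok := labels[kv[0]]; ok && v == kv[1] {
				return false
			}
		case strings.Contains(term, "="):
			kv := strings.SplitN(term, "=", 2)
			// Satisfied only when the key is set to exactly this value.
			if v, ok := labels[kv[0]]; !ok || v != kv[1] {
				return false
			}
		case strings.HasPrefix(term, "!"):
			// Satisfied only when the key is absent.
			if _, ok := labels[term[1:]]; ok {
				return false
			}
		default:
			// Bare key: satisfied when the key exists with any value.
			if _, ok := labels[term]; !ok {
				return false
			}
		}
	}
	return true
}

func main() {
	selector := "app=mate,chaos,stage!=production"
	pods := []map[string]string{
		{"app": "mate", "chaos": "yes"},                        // stage unset
		{"app": "mate", "chaos": "", "stage": "staging"},       // chaos set to an empty value
		{"app": "mate", "chaos": "yes", "stage": "production"}, // excluded by stage
		{"app": "mate"},                                        // chaos label missing
	}
	for _, labels := range pods {
		fmt.Println(labels, matches(selector, labels))
	}
}
```

The first two label sets match the selector and the last two do not, which follows the description above: chaos only has to exist, while stage may be missing as long as it is not production.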

You can filter target pods by namespace selector as well.

$ chaoskube --namespaces 'default,testing,staging'
...
INFO[0000] setting pod filter       namespaces="default,staging,testing"

This will filter for pods in the three namespaces default, staging and testing.

You can also exclude namespaces and mix and match with the label and annotation selectors.

$ chaoskube \
    --labels 'app=mate,chaos,stage!=production' \
    --annotations '!scheduler.alpha.kubernetes.io/critical-pod' \
    --namespaces '!kube-system,!production'
...
INFO[0000] setting pod filter       annotations="!scheduler.alpha.kubernetes.io/critical-pod" labels="app=mate,chaos,stage!=production" namespaces="!kube-system,!production"

This further limits the search space of the above label selector by excluding any pods in the kube-system and production namespaces and by ignoring all pods that are marked as critical.
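The include and exclude forms of the namespace selector can be sketched as follows. This is an illustration only: the function name namespaceAllowed is hypothetical, and letting an exclusion win when a namespace is both listed and negated is an assumption rather than documented chaoskube behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// namespaceAllowed interprets a --namespaces style value such as
// "default,testing" or "!kube-system,!production".
// Assumptions of this sketch: plain names form an allow-list, names prefixed
// with "!" are always excluded, and an empty selector allows every namespace.
func namespaceAllowed(selector, namespace string) bool {
	hasIncludes := false // at least one plain (non-negated) name was given
	included := false    // namespace appears among the plain names
	for _, term := range strings.Split(selector, ",") {
		term = strings.TrimSpace(term)
		switch {
		case term == "":
			continue
		case strings.HasPrefix(term, "!"):
			if term[1:] == namespace {
				return false
			}
		default:
			hasIncludes = true
			if term == namespace {
				included = true
			}
		}
	}
	// Without an allow-list, everything not excluded is allowed.
	return !hasIncludes || included
}

func main() {
	for _, ns := range []string{"default", "staging", "kube-system", "production"} {
		fmt.Println(ns,
			namespaceAllowed("default,testing,staging", ns),
			namespaceAllowed("!kube-system,!production", ns))
	}
}
```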

The annotation selector can also be used to run chaoskube as a cluster addon and allow pods to opt-in to being terminated as you see fit. For example, you could run chaoskube like this:

$ chaoskube --annotations 'chaos.alpha.kubernetes.io/enabled=true' --debug
...
INFO[0000] setting pod filter       annotations="chaos.alpha.kubernetes.io/enabled=true"
DEBU[0000] found candidates         count=0
DEBU[0000] no victim found

Unless you already use that annotation somewhere, this will initially ignore all of your pods (you can see the number of candidates in debug mode). You can then opt individual deployments in to chaos mode by annotating their pods with chaos.alpha.kubernetes.io/enabled=true.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  template:
    metadata:
      annotations:
        chaos.alpha.kubernetes.io/enabled: "true"
    spec:
      ...

Limit the Chaos

You can limit when chaos is introduced by weekday, by time period of the day, by day of the year, or by any combination of these.

Pass a comma-separated list of abbreviated weekdays via the --excluded-weekdays option, a comma-separated list of time periods via the --excluded-times-of-day option, and/or a comma-separated list of days of the year via the --excluded-days-of-year option. Use --timezone to specify the timezone in which they are interpreted.

$ chaoskube \
    --excluded-weekdays=Sat,Sun \
    --excluded-times-of-day=22:00-08:00,11:00-13:00 \
    --excluded-days-of-year=Apr1,Dec24 \
    --timezone=Europe/Berlin
...
INFO[0000] setting quiet times      daysOfYear="[Apr 1 Dec24]" timesOfDay="[22:00-08:00 11:00-13:00]" weekdays="[Saturday Sunday]"
INFO[0000] setting timezone         location=Europe/Berlin name=CET offset=1

Use UTC, Local or pick a timezone name from the (IANA) tz database. If you're testing chaoskube from your local machine, then Local makes the most sense. Once you deploy chaoskube to your cluster, you should deploy it with a specific timezone, e.g. the one where most of your team members live, so that both your team and chaoskube have a common understanding of when a particular weekday begins and ends. If your team is spread across multiple timezones, it's probably best to pick UTC, which is also the default. Picking the wrong timezone shifts the meaning of a particular weekday by a couple of hours between you and the server.
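The effect of the timezone on quiet times can be illustrated with a short Go sketch. It is not chaoskube's implementation: the names inPeriod and suspended are hypothetical, and treating the start of a period as inclusive and its end as exclusive is an assumption. The same instant is converted to the configured timezone before it is compared against the excluded weekdays, time periods and days of the year.

```go
package main

import (
	"fmt"
	"strings"
	"time"
	_ "time/tzdata" // embed the tz database so LoadLocation works everywhere
)

// inPeriod reports whether t's wall-clock time lies inside an "HH:MM-HH:MM"
// period. Assumption: start is inclusive, end is exclusive. Periods such as
// "22:00-08:00" wrap around midnight.
func inPeriod(t time.Time, period string) (bool, error) {
	parts := strings.SplitN(period, "-", 2)
	if len(parts) != 2 {
		return false, fmt.Errorf("invalid period %q", period)
	}
	from, err := time.Parse("15:04", parts[0])
	if err != nil {
		return false, err
	}
	to, err := time.Parse("15:04", parts[1])
	if err != nil {
		return false, err
	}
	now := t.Hour()*60 + t.Minute()
	start := from.Hour()*60 + from.Minute()
	end := to.Hour()*60 + to.Minute()
	if start <= end {
		return now >= start && now < end, nil
	}
	return now >= start || now < end, nil // period wraps around midnight
}

// suspended reports whether chaos would be paused at the given RFC 3339
// instant once it is interpreted in the named timezone.
func suspended(instant, zone string, weekdays, periods, days []string) (bool, error) {
	loc, err := time.LoadLocation(zone)
	if err != nil {
		return false, err
	}
	t, err := time.Parse(time.RFC3339, instant)
	if err != nil {
		return false, err
	}
	t = t.In(loc)
	for _, d := range weekdays {
		// "Sat" matches "Saturday"; empty entries are ignored.
		if d != "" && strings.HasPrefix(t.Weekday().String(), d) {
			return true, nil
		}
	}
	for _, p := range periods {
		in, err := inPeriod(t, p)
		if err != nil {
			return false, err
		}
		if in {
			return true, nil
		}
	}
	for _, d := range days {
		if t.Format("Jan2") == d { // e.g. "Apr1" or "Dec24"
			return true, nil
		}
	}
	return false, nil
}

func main() {
	weekdays := []string{"Sat", "Sun"}
	periods := []string{"22:00-08:00", "11:00-13:00"}
	days := []string{"Apr1", "Dec24"}
	for _, zone := range []string{"UTC", "Europe/Berlin"} {
		quiet, err := suspended("2018-08-03T21:30:00Z", zone, weekdays, periods, days)
		fmt.Println(zone, quiet, err)
	}
}
```

The instant used in main is a Friday at 21:30 in UTC but already 23:30 in Berlin, so the same settings leave chaos active under UTC and suspend it under Europe/Berlin.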

Flags

Option	Description	Default
`--interval`	interval between pod terminations	10m
`--labels`	label selector to filter pods by	(matches everything)
`--annotations`	annotation selector to filter pods by	(matches everything)
`--namespaces`	namespace selector to filter pods by	(all namespaces)
`--excluded-weekdays`	weekdays when chaos is to be suspended, e.g. "Sat,Sun"	(no weekday excluded)
`--excluded-times-of-day`	times of day when chaos is to be suspended, e.g. "22:00-08:00"	(no times of day excluded)
`--excluded-days-of-year`	days of a year when chaos is to be suspended, e.g. "Apr1,Dec24"	(no days of year excluded)
`--timezone`	timezone from tz database, e.g. "America/New_York", "UTC" or "Local"	(UTC)
`--dry-run`	don't kill pods, only log what would have been done	true

Related work

There are several other projects that allow you to create some chaos in your Kubernetes cluster.

kube-monkey is a sophisticated pod-based chaos monkey for Kubernetes. Each morning it compiles a schedule of pod terminations that should happen throughout the day. It lets you specify a mean time between failures on a per-pod basis, a feature that chaoskube lacks. It can also be made aware of groups of pods forming an application so that it can treat them specially, e.g. kill all pods of an application at once. kube-monkey allows filtering targets globally via configuration options and also lets pods opt in to chaos via annotations. It understands a configuration file similar to the one used by Netflix's ChaosMonkey.
PowerfulSeal is indeed a powerful tool to trouble your Kubernetes setup. Besides killing pods it can also take out your Cloud VMs or kill your Docker daemon. It has a vast number of configuration options to define what can be killed and when. It also has an interactive mode that allows you to kill pods easily.
fabric8's chaos monkey: A chaos monkey that comes bundled as an app with fabric8's Kubernetes platform. It can be deployed via a UI and reports any actions taken as a chat message and/or desktop notification. It can be configured with an interval and a pod name pattern that possible targets must match.
k8aos: An interactive tool that can issue a series of random pod deletions across an entire Kubernetes cluster or scoped to a namespace.
pod-reaper kills pods based on an interval and a configurable chaos chance. It lets you specify possible target pods via a label selector and namespace. It can shut itself down successfully after a while and therefore might be well suited to Kubernetes Job objects. It can also be configured to kill every pod that has been running for longer than a configurable duration.
kubernetes-pod-chaos-monkey: A very simple random pod killer using kubectl, written in a couple of lines of bash. Given a namespace and an interval, it kills a random pod in that namespace at each interval. Pretty much how chaoskube worked in the beginning.

Acknowledgements

This project wouldn't be where it is without the ideas and help of several awesome contributors:

Thanks to @twildeboer and @klautcomputing, who sparked the idea of limiting chaos during certain times, such as business hours or holidays, and provided the first implementations of this feature in #54 and #55.
Thanks to @klautcomputing for the first attempt to solve the missing percentage feature as well as for providing the RBAC config files.
Thanks to @j0sh3rs for bringing the Helm chart to the latest version.
Thanks to @klautcomputing, @grosser and @twz123 for improvements to the Dockerfile and docs in #31, #40 and #58.