Controller Design of Chaos Mesh
This document describes the common controller specification in Chaos Mesh.
Although none of these "standards" should be treated as absolute requirements (the real world is full of trade-offs and corner cases), they should be carefully considered when you are trying to add a new controller.
One controller per field
One field should be "controlled" by at most one controller. This chapter lists several reasons for this design:
Avoid hidden bugs
Multiple controllers modifying a single object can lead to conflicts (the object version works like a global optimistic lock). The common way to resolve a conflict is to re-apply the modification on the latest object and retry. However, if multiple controllers want to modify a single field, how could they merge the conflict? What's more, such overlap easily hides a bug in the logic. Here is an example:
If you want to split "pause" and "duration" (the former common chaos logic) into two standalone controllers, let's try to describe their logic:
For the "pause" controller: when the annotation is added, the chaos should enter "not injected" mode, and when the annotation is removed, the chaos should enter "injected" mode.
For the "duration" controller: when the elapsed time exceeds the duration, the chaos should enter "not injected" mode.
Though this logic seems intuitive, there is a bug hidden in the conflicting "mode" (or the desiredPhase
in the current code). What will happen if the user
removes the annotation after the duration has been exceeded? The chaos will enter
"injected" mode and then turn back into "not injected" mode (with the help of the "duration"
controller), which is dirty and confusing.
If we obey the "one controller per field" rule, then these two should be combined
into one controller and never be split.
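For illustration, a single combined controller can compute the desired phase from both inputs in one place, so the two rules can never race with each other. The following sketch is only an assumption for this document: the helper name, annotation key, and phase strings are made up and do not come from the actual Chaos Mesh code.
import "time"

// decideDesiredPhase is the single place that decides the "mode"/desiredPhase,
// combining the "pause" and "duration" inputs. All names here are illustrative.
func decideDesiredPhase(annotations map[string]string, startTime time.Time, duration time.Duration, now time.Time) string {
    if _, paused := annotations["experiment.chaos-mesh.org/pause"]; paused {
        return "NotInjected"
    }
    if now.After(startTime.Add(duration)) {
        // The duration has been exceeded, so removing the pause annotation
        // afterwards can never bring the chaos back to "Injected".
        return "NotInjected"
    }
    return "Injected"
}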
Handle conflicts in an easier way
When retrying after a conflict error, we don't need to rerun the whole controller
logic (which may have side effects). Instead, we can save the value of the single
field, fetch the new object, and set the corresponding field on it before updating
again. This gives us more confidence in the retry attempt.
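As a sketch of this pattern (using client-go's retry helper; the function and its parameters are placeholders, not existing Chaos Mesh code):
import (
    "context"

    "k8s.io/client-go/util/retry"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// updateSingleField re-fetches the latest object on every attempt and
// re-applies only the one field this controller owns (via mutate), instead of
// rerunning the whole reconcile logic.
func updateSingleField(ctx context.Context, c client.Client, key client.ObjectKey, obj client.Object, mutate func()) error {
    return retry.RetryOnConflict(retry.DefaultRetry, func() error {
        if err := c.Get(ctx, key, obj); err != nil {
            return err
        }
        mutate() // e.g. set the saved desiredPhase value on the fresh object
        return c.Status().Update(ctx, obj)
    })
}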
Controllers should work standalone
The behavior of every controller should be defined carefully, and each controller
should be able to work without the others. The behavior of a controller should
also be simple and easy to understand. Try to summarize the action/logic of the
controller in one hundred words; if you fail, please reconsider whether it should
really be "one" controller rather than two or more (or even a new
CustomResource).
Controllers should be well documented
Every controller should be described with a short document.
Error Handling
According to the source code of controller-runtime:
// RunInformersAndControllers the syncHandler, passing it the namespace/Name string of the
// resource to be synced.
if result, err := c.Do.Reconcile(req); err != nil {
    c.Queue.AddRateLimited(req)
    log.Error(err, "Reconciler error", "controller", c.Name, "request", req)
    ctrlmetrics.ReconcileErrors.WithLabelValues(c.Name).Inc()
    ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "error").Inc()
    return false
} else if result.RequeueAfter > 0 {
    // The result.RequeueAfter request will be lost, if it is returned
    // along with a non-nil error. But this is intended as
    // We need to drive to stable reconcile loops before queuing due
    // to result.RequestAfter
    c.Queue.Forget(obj)
    c.Queue.AddAfter(req, result.RequeueAfter)
    ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue_after").Inc()
    return true
} else if result.Requeue {
    c.Queue.AddRateLimited(req)
    ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue").Inc()
    return true
}
If Reconcile returns Requeue without RequeueAfter, the request will be added
back to the rate-limited queue. The default rate limiter is constructed in this way:
// DefaultControllerRateLimiter is a no-arg constructor for a default rate limiter for a workqueue. It has
// both overall and per-item rate limiting. The overall is a token bucket and the per-item is exponential
func DefaultControllerRateLimiter() RateLimiter {
    return NewMaxOfRateLimiter(
        NewItemExponentialFailureRateLimiter(5*time.Millisecond, 1000*time.Second),
        // 10 qps, 100 bucket size. This is only for retry speed and its only the overall factor (not per item)
        &BucketRateLimiter{Limiter: rate.NewLimiter(rate.Limit(10), 100)},
    )
}
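To get a feel for these numbers, the small sketch below prints the per-item delay for repeated failures of the same request (the request key is made up): the delay doubles from 5ms up to the 1000s cap, while the shared token bucket limits the overall retry rate to about 10 per second.
package main

import (
    "fmt"

    "k8s.io/client-go/util/workqueue"
)

func main() {
    limiter := workqueue.DefaultControllerRateLimiter()
    // Each call simulates one more failure of the same request, so the
    // printed delays grow exponentially: 5ms, 10ms, 20ms, 40ms, 80ms.
    for i := 0; i < 5; i++ {
        fmt.Println(limiter.When("chaos-testing/network-delay"))
    }
}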
So this is a good enough error backoff without stopping the worker. When a
controller meets a retriable error, the simplest way to handle it is to return
ctrl.Result{Requeue: true}, nil.
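For example, a hedged sketch (the apply and isRetriable helpers are hypothetical placeholders; the Reconcile signature shown is the one used by recent controller-runtime versions):
import (
    "context"

    ctrl "sigs.k8s.io/controller-runtime"
)

// Reconciler, apply, and isRetriable stand in for a real chaos controller and
// its own error classification.
type Reconciler struct{}

func (r *Reconciler) apply(ctx context.Context, req ctrl.Request) error { return nil }

func isRetriable(err error) bool { return true }

func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    if err := r.apply(ctx, req); err != nil {
        if isRetriable(err) {
            // Put the request back on the rate-limited queue and let the
            // exponential back off drive the retry, without counting it as
            // a reconciler error.
            return ctrl.Result{Requeue: true}, nil
        }
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, nil
}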