cops-vigilante

Version: v1.1.3 (latest) Published: Jul 3, 2023 License: Apache-2.0

Repository

github.com/conplementag/cops-vigilante

README

cops-vigilante

Some problems in our day-to-day operations at Conplement AG are simply hair-pulling :)

The CoreOps Vigilante project is an attempt to take matters into our own hands and implement automation or self-healing capabilities for such problems. The "Tasks" performed by cops-vigilante are explained in the Tasks section below.

Installation

Installation is supported via Helm.

helm repo add cops-vigilante https://conplementag.github.io/cops-vigilante
helm repo update

helm install cops-vigilante cops-vigilante --namespace <<your-namespace>> --create-namespace --version v1.1.3

Tasks

AKS Windows nodes SNAT issue with complex networking setups (VPN, hub-spoke networks, etc.)

There is a SNAT issue with Windows nodes in AKS clusters which has not been fixed for a very long time. To summarize: in the Azure CNI networking configuration of AKS, Windows nodes always perform SNAT, meaning that packets sent by pods have their originating IP replaced by the node IP. Having NAT in a network should normally not be a problem, but here it does not work correctly (packets are dropped "randomly", performance suffers due to port exhaustion, etc.).

The only fix currently is to exclude certain CIDR ranges from this functionality. To do this, the 10-azure.conflist file has to be modified and loaded, which can be done either via VM Extensions or via HostProcess Windows containers.

The problem, however, is that the config is only applied when a new pod is scheduled on the node; no process runs on the node to read the config in the background. When a pod is scheduled, kubelet calls azurecni, which then loads the config. As the config creation / update does not occur atomically with node creation, we need a process which keeps scheduling pods for some time, to make sure the config is loaded at some point.

The SNAT task in cops-vigilante keeps track of ready Windows nodes and schedules Windows containers on them for approximately 30 minutes, after which the nodes are marked as "healed". The Windows container used is as small as possible: currently we use the mcr.microsoft.com/oss/kubernetes/pause image, which is under 1 MB in size!

If the node is considered healed, an annotation

cops-vigilante-snat-node-healed: "true"

will be added. Setting this annotation to "false", or removing it completely, will restart the healing process for this node.
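The annotation semantics described here can be sketched as a small check. The `needsHealing` helper is hypothetical and only mirrors the rule stated above (only the value "true" counts as healed); it is not taken from the project's source.

```go
package main

import "fmt"

// healedAnnotation is the annotation key described in the text.
const healedAnnotation = "cops-vigilante-snat-node-healed"

// needsHealing is a hypothetical helper: a node is treated as healed only
// when the annotation is present and set to exactly "true". A missing
// annotation or any other value (such as "false") restarts the healing.
func needsHealing(annotations map[string]string) bool {
	return annotations[healedAnnotation] != "true"
}

func main() {
	cases := []map[string]string{
		{healedAnnotation: "true"},
		{healedAnnotation: "false"},
		{}, // annotation removed completely
	}
	for _, c := range cases {
		fmt.Printf("%v -> needs healing: %t\n", c, needsHealing(c))
	}
}
```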

General features

TLS via CertManager

To enable TLS for the internal HTTP endpoint, use the Helm flag "create_certificates". Also make sure to set the "tls" option in the config section to true so that these certificates are loaded.

Prometheus integration via ServiceMonitor

Use the Helm flag "create_service_monitor" to enable the deployment of the Prometheus Operator's ServiceMonitor resource. The metrics provided are prefixed with "cops_vigilante_".

Development

Check CONTRIBUTING.md

License

Check LICENSE

Directories

Path	Synopsis
cmd/dev
cmd/vigilante
internal/vigilante
internal/vigilante/cli
internal/vigilante/clock
internal/vigilante/clock/testing
internal/vigilante/config
internal/vigilante/errors
internal/vigilante/http
internal/vigilante/metrics
internal/vigilante/scheduler
internal/vigilante/services
internal/vigilante/tasks
internal/vigilante/tasks/snat
internal/vigilante/tasks/snat/consts
internal/vigilante/tasks/snat/metrics
internal/vigilante/tasks/snat/testing
pkg/vigilante/helpers
