repairman


Version: v0.0.0-...-8a6f205 (latest). Published: Sep 30, 2019. License: Apache-2.0. Imports: 8. Imported by: 0.


Repository

github.com/awesomenix/repairman


README

RepairMan

Coordinate infrastructure maintenance on Kubernetes across multiple actors

Introduction

Infrastructure management has become a core part of the Kubernetes ecosystem: for example, the cluster-api project aims to manage the infrastructure lifecycle, node-problem-detector aims to detect infrastructure problems, and there are many more. Infrastructure maintenance typically does not consider application workloads and may cause more problems than it solves. A maintenance coordinator keeps infrastructure maintenance safe by honoring not only infrastructure failure limits (X of Y nodes are allowed to be repaired at a time) but also application workload downtime.

Design

You can set a maintenance limit (e.g. 5%), which is the maximum amount of maintenance allowed to occur at once cluster-wide.
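As a rough illustration of how a percentage limit could translate into a node count, here is a minimal Go sketch. The function names, the round-down behavior, and the floor of one node are all assumptions made for this example; they are not taken from the repairman source.

```go
package main

import "fmt"

// allowedConcurrent returns how many nodes may be under maintenance at once
// for a percentage limit. Rounding down, with a floor of one node so that
// small clusters can still make progress, is an assumption of this sketch.
func allowedConcurrent(totalNodes, limitPercent int) int {
	if totalNodes <= 0 || limitPercent <= 0 {
		return 0
	}
	allowed := totalNodes * limitPercent / 100
	if allowed < 1 {
		allowed = 1
	}
	return allowed
}

// canApprove reports whether one more maintenance may start, given how many
// nodes are already under maintenance.
func canApprove(inMaintenance, totalNodes, limitPercent int) bool {
	return inMaintenance < allowedConcurrent(totalNodes, limitPercent)
}

func main() {
	fmt.Println(allowedConcurrent(40, 5)) // 40 * 5 / 100 = 2
	fmt.Println(canApprove(1, 40, 5))     // one slot still free
	fmt.Println(canApprove(2, 40, 5))     // limit reached
}
```

With 40 nodes and a 5% limit, this sketch allows two nodes to be under maintenance at once; a third request would have to wait.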

We define two CRDs:

MaintenanceLimit CustomResourceDefinition
- Defines cluster-wide maintenance limits for the Kubernetes Node resource (v1).
- Optionally supports cordon and drain, with a grace period after the operation (v1).
- Optionally supports infrastructure drains (for example, detaching attached storage), with a grace period after the operation (v1).
- Defines a timeout for Approved->InProgress: if the state has not moved within X seconds, the repair is rejected so that another request can proceed (v1).
- Defines a timeout for InProgress->Completed: if the state has not moved within X seconds, the repair is rejected so that another request can proceed (v1).
- Defines cluster-wide maintenance limits for other Kubernetes resources, e.g. Deployment, DaemonSet, etc. (v2).
- Optionally defines a timeout between operations (v2).
- Defines configurable limits for certain label selectors (v2).
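The two state timeouts described above could be checked along the following lines. This is only a sketch: the `Timeouts` type, its field names, and `shouldReject` are hypothetical and do not come from the repairman API, and treating a zero timeout as "disabled" is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Timeouts is a hypothetical stand-in for the timeout settings described
// above; the field names are not taken from the repairman API.
type Timeouts struct {
	ApprovedToInProgress  time.Duration
	InProgressToCompleted time.Duration
}

// shouldReject reports whether a request that has sat in state for elapsed
// has exceeded the timeout for leaving that state. A zero timeout disables
// the check, and states without a timeout are never rejected here.
func shouldReject(state string, elapsed time.Duration, t Timeouts) bool {
	switch state {
	case "Approved":
		return t.ApprovedToInProgress > 0 && elapsed > t.ApprovedToInProgress
	case "InProgress":
		return t.InProgressToCompleted > 0 && elapsed > t.InProgressToCompleted
	default:
		return false
	}
}

func main() {
	t := Timeouts{
		ApprovedToInProgress:  60 * time.Second,
		InProgressToCompleted: 10 * time.Minute,
	}
	// Approved for 90s with a 60s timeout: the request would be rejected.
	fmt.Println(shouldReject("Approved", 90*time.Second, t))
	// InProgress for 90s with a 10m timeout: still within the limit.
	fmt.Println(shouldReject("InProgress", 90*time.Second, t))
}
```

Rejecting a stalled request frees its slot under the maintenance limit, which is what lets another request proceed.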
MaintenanceRequest CustomResourceDefinition
- An actor proposes a maintenance action for a specific resource (State=Pending).
- The actor waits for State=Approved, then updates the state to State=InProgress.
- Once the actor has completed the action, it updates the state to State=Completed.
- The maintenance operator looks at the list of repair requests and either approves or delays each one.
- Before a request runs, it is checked again to ensure that it is still safe to proceed.
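The request lifecycle can be read as a small state machine. The sketch below assumes the order Pending, Approved, InProgress, Completed, with InProgress taken from the timeout descriptions in the MaintenanceLimit section. The `State` type and the `advance` function are illustrative names, not part of the repairman API.

```go
package main

import "fmt"

// State is a hypothetical type for the MaintenanceRequest state field.
type State string

const (
	Pending    State = "Pending"
	Approved   State = "Approved"
	InProgress State = "InProgress"
	Completed  State = "Completed"
)

// next maps each state to the single state assumed to follow it.
// Completed is terminal and therefore has no entry.
var next = map[State]State{
	Pending:    Approved,   // set by the maintenance operator
	Approved:   InProgress, // set by the actor when it starts work
	InProgress: Completed,  // set by the actor when it finishes
}

// advance validates a transition and returns an error if it skips a step
// or leaves a terminal state.
func advance(from, to State) error {
	if want, ok := next[from]; !ok || want != to {
		return fmt.Errorf("invalid transition %s -> %s", from, to)
	}
	return nil
}

func main() {
	fmt.Println(advance(Pending, Approved))   // allowed
	fmt.Println(advance(Approved, Completed)) // skips InProgress
}
```

Keeping the transitions in a single map makes it easy to see who is expected to move the request at each step.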

Deploy

kubectl apply -f https://raw.githubusercontent.com/awesomenix/repairman/master/config/deployment/repairman-deployment.yaml

Integrations

DrainSafe


Source Files

main.go

Directories

Path	Synopsis
api
api/v1	Package v1 contains API Schema definitions for the repairman v1 API group (+groupName=repairman.k8s.io)
client
controllers
test
