kubemedic

module

v0.0.0-...-86ee7c6 Latest Latest Go to latest Published: Feb 5, 2025 License: Apache-2.0, Apache-2.0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

README ¶

KubeMedic - Safe Kubernetes Auto-Remediation

KubeMedic is a Kubernetes operator that safely automates common remediation tasks while protecting your cluster from unintended consequences.

Prerequisites

Required

Kubernetes cluster (v1.16+)

Metrics Server installed in your cluster

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Optional

Prometheus & Grafana for advanced metrics visualization
- KubeMedic works with the native Kubernetes metrics API by default
- Can be integrated with Prometheus for historical data and advanced querying
- Grafana dashboards available for visualization

Key Features

🛡️ Safe by Default

Protected system namespaces (kube-system, etc.)
Resource quotas and scaling limits
Automatic state backups before actions
Gradual scaling with automatic revert

🎯 Common Remediations

CPU/Memory-based scaling
Pod restart on high error rates
HPA limit adjustments
Temporary resource overrides

🔒 Built-in Safeguards

Maximum scale factor (2x by default)
Rate limiting and cooldown periods
Resource quota validation
Protected resources via labels

Quick Start

Install Metrics Server (if not already installed)

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Install KubeMedic

kubectl apply -f https://raw.githubusercontent.com/ikepcampbell/kubemedic/main/config/deploy/kubemedic.yaml

Create a Simple Policy

apiVersion: remediation.kubemedic.io/v1alpha1
kind: SelfRemediationPolicy
metadata:
  name: cpu-scaling
  namespace: my-app
spec:
  rules:
    - name: high-cpu-scale
      conditions:
        - type: PodCPUUsage    # Uses metrics-server directly
          threshold: "80"      # 80% CPU usage
          duration: "5m"
      actions:
        - type: ScaleUp
          target:
            kind: Deployment
            name: my-service
          scalingParams:
            temporaryMaxReplicas: 5
            scalingDuration: "30m"
            revertStrategy: "Gradual"

Monitoring Options

1. Basic Monitoring (Default)

Uses Kubernetes metrics API directly
Real-time metrics without historical data

View with kubectl:

kubectl top pods
kubectl get pods
kubectl describe selfremediationpolicy

2. Advanced Monitoring (Optional)

With Prometheus

# values.yaml
monitoring:
  prometheus:
    enabled: true
    serviceMonitor:
      enabled: true    # If using prometheus-operator
    rules:
      enabled: true    # Install default alerting rules

With Grafana

monitoring:
  grafana:
    enabled: true
    dashboards:
      enabled: true    # Install default dashboards

Testing KubeMedic

KubeMedic comes with comprehensive examples that include both policies and test applications. Each example is self-contained and includes step-by-step testing instructions.

Available Examples

CPU Scaling Test

# Apply the CPU scaling example
kubectl apply -f examples/cpu-scaling-with-test.yaml

# Follow the testing instructions in the file comments

Memory Scaling Test

# Apply the memory scaling example
kubectl apply -f examples/memory-scaling-with-test.yaml

# Follow the testing instructions in the file comments

Pod Restart Test

# Apply the pod restart example
kubectl apply -f examples/pod-restart-with-test.yaml

# Follow the testing instructions in the file comments

Monitoring Tests

Monitor your tests using standard Kubernetes tools:

# Watch pods and policies
kubectl get pods,selfremediationpolicy -w

# Monitor resource usage
kubectl top pods

# Check policy status
kubectl describe selfremediationpolicy

Safety Features

Protected Resources

metadata:
  labels:
    kubemedic.io/protected: "true"  # Prevents any remediation

Namespace Exclusion

metadata:
  labels:
    kubemedic.io/exclude: "true"  # Excludes namespace from remediation

Resource Limits

Maximum 2x scaling factor
Minimum 1 pod maintained
Maximum 2-hour remediation duration
Namespace quota validation

Configuration

values.yaml Highlights

rbac:
  # Namespace restrictions
  namespaceRestrictions:
    enabled: true
    denied: ["kube-system", "kube-public"]

  # Resource protection
  resourceRestrictions:
    enabled: true
    allowed: ["deployments", "statefulsets"]

remediation:
  # Safety limits
  safetyLimits:
    maxScaleFactor: 2
    minPods: 1
    maxScalingDuration: "2h"

monitoring:
  # Metrics source
  metricsSource: "kubernetes"  # or "prometheus"
  # Optional Prometheus integration
  prometheus:
    enabled: false
  # Optional Grafana integration
  grafana:
    enabled: false

Support

License

Apache License 2.0 - See LICENSE for details.

Directories ¶

Path	Synopsis
api
v1alpha1 Package v1alpha1 contains API Schema definitions for the remediation v1alpha1 API group +kubebuilder:object:generate=true +groupName=remediation.kubemedic.io	Package v1alpha1 contains API Schema definitions for the remediation v1alpha1 API group +kubebuilder:object:generate=true +groupName=remediation.kubemedic.io
cmd
internal
controller
pkg
webhook
test
utils

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL