cost-manager
cost-manager is a collection of Kubernetes
controllers that automate cost
reductions for the cluster they are running on.
Controllers
Here we provide details of the various cost-manager controllers.
spot-migrator
Spot VMs are unused compute capacity that many cloud providers offer at significantly
reduced cost (e.g. on GCP spot VMs provide a 60-91%
discount). Since spot VM availability
can fluctuate, it is common to configure workloads to run on spot VMs but to fall back to
on-demand VMs when spot VMs are unavailable. However, once workloads are running on on-demand VMs,
nothing prompts them to migrate back, even when spot VMs become available again.
To improve spot VM utilisation, spot-migrator periodically attempts to migrate workloads from
on-demand VMs to spot VMs. It drains on-demand Nodes to force cluster scale up, relying on the
fact that the cluster autoscaler attempts to expand the least expensive possible node group,
taking into account the reduced cost of spot VMs. If an on-demand VM is added to the cluster,
spot-migrator assumes that no more spot VMs are currently available and waits for the next
migration attempt (currently every hour). If no on-demand VMs were added, spot-migrator
continues to drain on-demand VMs until none are left in the cluster and all workloads are
running on spot VMs. Node draining respects PodDisruptionBudgets to ensure
that workloads are migrated whilst maintaining desired levels of availability.
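The drain loop described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the controller's actual implementation: the `Cluster` class, `drain_and_replace` and `migration_attempt` are hypothetical names, and the model assumes that draining a Node always triggers exactly one scale up and that the autoscaler picks a spot VM whenever one is available.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy cluster model (hypothetical, for illustration only)."""
    nodes: list          # each entry is "on-demand" or "spot"
    spot_available: int  # spot VMs the simulated autoscaler can still add

    def drain_and_replace(self, index):
        """Drain one node and let the simulated autoscaler add a replacement.

        Returns the kind of node that was added.
        """
        self.nodes.pop(index)
        if self.spot_available > 0:
            # Assumption: the cheaper spot node group is preferred when possible.
            self.spot_available -= 1
            added = "spot"
        else:
            added = "on-demand"
        self.nodes.append(added)
        return added


def migration_attempt(cluster):
    """Run one migration attempt and return the number of nodes drained."""
    drained = 0
    while "on-demand" in cluster.nodes:
        added = cluster.drain_and_replace(cluster.nodes.index("on-demand"))
        drained += 1
        if added == "on-demand":
            # An on-demand VM was added, so assume spot capacity is exhausted
            # and stop until the next scheduled attempt.
            break
    return drained


cluster = Cluster(nodes=["on-demand", "on-demand", "on-demand", "spot"], spot_available=2)
drained = migration_attempt(cluster)
print(drained, cluster.nodes)
```

In this run two on-demand Nodes are replaced by spot VMs, and the third drain brings back an on-demand VM, which ends the attempt. The sketch ignores PodDisruptionBudgets and the hourly schedule.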
Currently only GKE Standard clusters are supported. To allow spot-migrator to migrate workloads
to spot VMs with fallback to on-demand VMs, your cluster must be running at least one on-demand
node pool and at least one spot node pool.
apiVersion: cost-manager.io/v1alpha1
kind: CostManagerConfiguration
controllers:
- spot-migrator
cloudProvider:
  name: gcp
pod-safe-to-evict-annotator
Certain types of Pods can prevent the cluster autoscaler from removing a Node (e.g. Pods in the
kube-system Namespace that do not have a PodDisruptionBudget), leading to more Nodes in the
cluster than necessary. This can be particularly problematic for workloads that cluster operators
do not control and that can have a high number of replicas, such as kube-dns or the Konnectivity
agent, which are typically installed by cloud providers.
To allow the cluster autoscaler to evict all Pods that have not been explicitly marked as unsafe for
eviction, pod-safe-to-evict-annotator adds the
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
annotation to all Pods that have not
already been annotated; note that PodDisruptionBudgets can still be used to maintain desired levels
of availability.
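The annotation rule above amounts to setting a default. The following sketch shows that decision on a plain dictionary of Pod annotations. The function name `annotate_safe_to_evict` is hypothetical, and the sketch assumes that any existing value of the annotation, including "false", is left unchanged.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def annotate_safe_to_evict(pod_annotations):
    """Return a copy of the annotations with safe-to-evict defaulted to "true".

    An existing value (for example "false") is preserved, so Pods that were
    explicitly marked as unsafe for eviction stay that way.
    """
    updated = dict(pod_annotations)
    updated.setdefault(SAFE_TO_EVICT, "true")
    return updated


unannotated = annotate_safe_to_evict({})
opted_out = annotate_safe_to_evict({SAFE_TO_EVICT: "false"})
print(unannotated)
print(opted_out)
```

Copying the dictionary keeps the function free of side effects. A real controller would instead patch the Pod through the Kubernetes API.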
apiVersion: cost-manager.io/v1alpha1
kind: CostManagerConfiguration
controllers:
- pod-safe-to-evict-annotator
podSafeToEvictAnnotator:
  namespaceSelector:
    matchExpressions:
    - key: kubernetes.io/metadata.name
      operator: In
      values:
      - kube-system
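The namespaceSelector in this configuration looks like a standard Kubernetes label selector. Assuming it follows the usual matchExpressions semantics, and assuming the controller only annotates Pods in matching Namespaces, the evaluation can be sketched as follows. The helper names `matches_expression` and `namespace_selected` are hypothetical.

```python
def matches_expression(labels, expression):
    """Evaluate one matchExpressions entry against a Namespace's labels."""
    key = expression["key"]
    operator = expression["operator"]
    values = expression.get("values", [])
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        # A missing key also satisfies NotIn in Kubernetes label selectors.
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


def namespace_selected(labels, selector):
    """All expressions must match (they are ANDed together)."""
    expressions = selector.get("matchExpressions", [])
    return all(matches_expression(labels, e) for e in expressions)


selector = {
    "matchExpressions": [
        {"key": "kubernetes.io/metadata.name", "operator": "In", "values": ["kube-system"]}
    ]
}
in_kube_system = namespace_selected({"kubernetes.io/metadata.name": "kube-system"}, selector)
in_default = namespace_selected({"kubernetes.io/metadata.name": "default"}, selector)
print(in_kube_system, in_default)
```

Because Kubernetes sets the kubernetes.io/metadata.name label to the Namespace's name, this selector effectively picks the kube-system Namespace by name.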
Installation
You can install cost-manager into a GKE cluster with Workload
Identity enabled as
follows:
NAMESPACE="cost-manager"
kubectl get namespace "$NAMESPACE" || kubectl create namespace "$NAMESPACE"
LATEST_RELEASE_TAG="$(curl -s https://api.github.com/repos/hsbc/cost-manager/releases/latest | jq -r .tag_name)"
# GCP service account bound to the roles/compute.instanceAdmin role
GCP_SERVICE_ACCOUNT_EMAIL_ADDRESS="cost-manager@example.iam.gserviceaccount.com"
cat <<EOF > values.yaml
image:
  tag: $LATEST_RELEASE_TAG
config:
  apiVersion: cost-manager.io/v1alpha1
  kind: CostManagerConfiguration
  controllers:
  - spot-migrator
  - pod-safe-to-evict-annotator
  cloudProvider:
    name: gcp
  podSafeToEvictAnnotator:
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
        - kube-system
serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: $GCP_SERVICE_ACCOUNT_EMAIL_ADDRESS
EOF
helm template ./charts/cost-manager -n "$NAMESPACE" -f values.yaml | kubectl apply -f -
Testing
Build the Docker image and run the E2E tests using kind:
make image e2e
Contributing
Contributions are greatly appreciated. The project follows the typical GitHub pull request model.
See CONTRIBUTING.md for more details.