K8s Ephemeral Storage Metrics
A Prometheus exporter for ephemeral storage metrics of pods, containers, nodes, and volumes.
This project was created to address the lack of ephemeral storage monitoring in Kubernetes.
This project does not monitor CSI-backed ephemeral storage, e.g. generic ephemeral volumes.
Helm Install
helm repo add k8s-ephemeral-storage-metrics https://jmcgrath207.github.io/k8s-ephemeral-storage-metrics/chart
helm repo update
helm upgrade --install my-deployment k8s-ephemeral-storage-metrics/k8s-ephemeral-storage-metrics
Values
Key | Type | Default | Description |
---|---|---|---|
affinity | object | {} | |
containerSecurityContext.allowPrivilegeEscalation | bool | false | |
containerSecurityContext.capabilities.drop[0] | string | "ALL" | |
containerSecurityContext.privileged | bool | false | |
containerSecurityContext.readOnlyRootFilesystem | bool | false | |
containerSecurityContext.runAsNonRoot | bool | true | |
deploy_type | string | "Deployment" | Set as Deployment for single controller to query all nodes or Daemonset |
dev | object | {"enabled":false,"grow":{"image":"ghcr.io/jmcgrath207/k8s-ephemeral-storage-grow-test:latest","imagePullPolicy":"IfNotPresent"},"shrink":{"image":"ghcr.io/jmcgrath207/k8s-ephemeral-storage-shrink-test:latest","imagePullPolicy":"IfNotPresent"}} | For local development or testing that will deploy grow and shrink pods and debug service |
image.imagePullPolicy | string | "IfNotPresent" | |
image.imagePullSecrets | list | [] | |
image.repository | string | "ghcr.io/jmcgrath207/k8s-ephemeral-storage-metrics" | |
image.tag | string | "1.16.2" | |
interval | int | 15 | Polling node rate for exporter |
kubelet | object | {"insecure":false,"readOnlyPort":0,"scrape":false} | Scrape metrics through kubelet instead of kube api |
log_level | string | "info" | |
max_node_concurrency | int | 10 | Max number of concurrent query requests to the kubernetes API. |
metrics | object | {"adjusted_polling_rate":false,"ephemeral_storage_container_limit_percentage":true,"ephemeral_storage_container_volume_limit_percentage":true,"ephemeral_storage_container_volume_usage":true,"ephemeral_storage_inodes":true,"ephemeral_storage_node_available":true,"ephemeral_storage_node_capacity":true,"ephemeral_storage_node_percentage":true,"ephemeral_storage_pod_usage":true,"port":9100} | Set metrics you want to enable |
metrics.adjusted_polling_rate | bool | false | Create the ephemeral_storage_adjusted_polling_rate metrics to report Adjusted Poll Rate in milliseconds. Typically used for testing. |
metrics.ephemeral_storage_container_limit_percentage | bool | true | Percentage of ephemeral storage used by a container in a pod |
metrics.ephemeral_storage_container_volume_limit_percentage | bool | true | Percentage of ephemeral storage used by a container's volume in a pod |
metrics.ephemeral_storage_container_volume_usage | bool | true | Current ephemeral storage used by a container's volume in a pod |
metrics.ephemeral_storage_inodes | bool | true | Current ephemeral inode usage of pod |
metrics.ephemeral_storage_node_available | bool | true | Available ephemeral storage for a node |
metrics.ephemeral_storage_node_capacity | bool | true | Capacity of ephemeral storage for a node |
metrics.ephemeral_storage_node_percentage | bool | true | Percentage of ephemeral storage used on a node |
metrics.ephemeral_storage_pod_usage | bool | true | Current ephemeral byte usage of pod |
metrics.port | int | 9100 | Adjust the metric port as needed (default 9100) |
nodeSelector | object | {} | |
podAnnotations | object | {} | |
podSecurityContext.runAsNonRoot | bool | true | |
podSecurityContext.seccompProfile.type | string | "RuntimeDefault" | |
pprof | bool | false | Enable Pprof |
priorityClassName | string | nil | |
prometheus.enable | bool | true | |
prometheus.release | string | "kube-prometheus-stack" | |
prometheus.rules.enable | bool | false | Create PrometheusRules firing alerts when out of ephemeral storage |
prometheus.rules.labels | object | {"severity":"warning"} | What additional labels to set on alerts |
prometheus.rules.predictFilledHours | int | 12 | How many hours in the future to predict filling up of a volume |
rbac | object | {"create":true} | RBAC configuration |
serviceAccount | object | {"create":true,"name":null} | Service Account configuration |
serviceMonitor | object | {"additionalLabels":{},"enable":true,"metricRelabelings":[],"podTargetLabels":[],"relabelings":[],"targetLabels":[]} | Configure the Service Monitor |
serviceMonitor.additionalLabels | object | {} | Add labels to the ServiceMonitor.Spec |
serviceMonitor.metricRelabelings | list | [] | Set metricRelabelings as per https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.RelabelConfig |
serviceMonitor.podTargetLabels | list | [] | Set podTargetLabels as per https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.ServiceMonitorSpec |
serviceMonitor.relabelings | list | [] | Set relabelings as per https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.RelabelConfig |
serviceMonitor.targetLabels | list | [] | Set targetLabels as per https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.ServiceMonitorSpec |
tolerations | list | [] | |
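As a minimal sketch of how the values above might be overridden, the snippet below writes a small override file. The file name my-values.yaml and the chosen settings are illustrative assumptions, not recommendations from the project; only the key names are taken from the table.

```shell
# Write a hypothetical override file. Key names come from the Values table;
# the file name and the specific values are assumptions for illustration.
cat > my-values.yaml <<'EOF'
# Poll less often than the default of 15.
interval: 30
metrics:
  # Turn off a metric that is not needed.
  ephemeral_storage_inodes: false
prometheus:
  rules:
    # Create the PrometheusRules alerts (off by default).
    enable: true
    # Predict volume fill-up 6 hours ahead instead of 12.
    predictFilledHours: 6
EOF
```

The file can then be passed to the install command shown under Helm Install by appending -f my-values.yaml to it.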
Prometheus alert rules
To prevent multiple kinds of alerts from firing for a single container or emptyDir volume when both prometheus.enable and prometheus.rules.enable are on, add the following inhibition rules to your Alertmanager config:
- source_matchers:
    - alertname="EphemeralStorageVolumeFilledUp"
  target_matchers:
    - severity="warning"
    - alertname="EphemeralStorageVolumeFillingUp"
  equal:
    - pod_namespace
    - pod_name
    - volume_name
- source_matchers:
    - alertname="ContainerEphemeralStorageUsageAtLimit"
  target_matchers:
    - severity="warning"
    - alertname="ContainerEphemeralStorageUsageReachingLimit"
  equal:
    - pod_namespace
    - pod_name
    - exported_container
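For orientation, the sketch below shows where these rules might sit in a standalone Alertmanager configuration file: inhibition rules go under the top-level inhibit_rules key. The file name alertmanager.yml and the route and receiver named default are placeholder assumptions. Setups managed through the Prometheus Operator or a Helm chart usually supply this configuration by other means.

```shell
# Write a minimal, hypothetical alertmanager.yml. The route and receiver are
# placeholders; only the inhibit_rules entries come from this README.
cat > alertmanager.yml <<'EOF'
route:
  receiver: default
receivers:
  - name: default
inhibit_rules:
  - source_matchers:
      - alertname="EphemeralStorageVolumeFilledUp"
    target_matchers:
      - severity="warning"
      - alertname="EphemeralStorageVolumeFillingUp"
    equal:
      - pod_namespace
      - pod_name
      - volume_name
  - source_matchers:
      - alertname="ContainerEphemeralStorageUsageAtLimit"
    target_matchers:
      - severity="warning"
      - alertname="ContainerEphemeralStorageUsageReachingLimit"
    equal:
      - pod_namespace
      - pod_name
      - exported_container
EOF

# Optionally validate the file when amtool happens to be installed.
if command -v amtool >/dev/null 2>&1; then
  amtool check-config alertmanager.yml
fi
```

With these rules, a firing "filled up" or "at limit" alert suppresses the matching warning-level "filling up" or "reaching limit" alert that carries the same label values listed under equal.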
Contribute
Start minikube
make new_minikube
Run locally
make deploy_local
Run locally with Delve Debug
make deploy_debug
Then connect to localhost:30002 with Delve or your IDE.
Run e2e Test
make deploy_e2e
Debug e2e
make deploy_e2e_debug
Then run a debug session against deployment_test.go.
License
This project is licensed under the MIT License. See the LICENSE file for more details.