This plugin generates metrics derived from the state of the following
Kubernetes resources:
- daemonsets
- deployments
- endpoints
- ingress
- nodes
- persistentvolumes
- persistentvolumeclaims
- pods (containers)
- services
- statefulsets
- resourcequotas
Kubernetes is a fast-moving project, with a new minor release every 3 months.
As such, we aim to maintain support only for versions that are supported
by the major cloud providers; this is roughly 4 releases / 2 years.
This plugin supports Kubernetes 1.11 and later.
Series Cardinality Warning
This plugin may produce a high number of series which, when not controlled
for, will cause high load on your database. Use the following techniques to
avoid cardinality issues:
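As a minimal sketch of one such approach, the fragment below narrows what the plugin collects using only options that appear in the sample configuration further down. The `[[inputs.kube_inventory]]` table name is an assumption (the sample configuration does not show its header), and the namespace and resource choices are purely illustrative.

```toml
# Assumed plugin table name; adjust to match your Telegraf configuration.
[[inputs.kube_inventory]]
  ## Limit collection to a single namespace instead of all namespaces.
  namespace = "default"

  ## Gather only the resource types you actually chart or alert on.
  resource_include = ["deployments", "nodes", "statefulsets"]

  ## Do not turn any selectors into tags; selector tags add to series
  ## cardinality.
  selector_include = []
  selector_exclude = ["*"]
```

Whether this is enough depends on the size of the cluster; it only reduces the number of series, it does not cap it.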
Global configuration options
In addition to the plugin-specific configuration settings, plugins support
additional global and plugin configuration settings. These settings are used to
modify metrics, tags, and fields, or create aliases and configure ordering, etc.
See the Telegraf configuration documentation for more details.
# Read metrics from the Kubernetes API
## URL for the Kubernetes API.
## If empty, the in-cluster config with the pod's service account token is used.
# url = ""
## URL for the kubelet. If set, it is used to collect pod resource metrics.
# url_kubelet = ""
## Namespace to use. Set to "" to use all namespaces.
# namespace = "default"
## Node name to filter to. No filtering by default.
# node_name = ""
## Use bearer token for authorization. ('bearer_token' takes priority)
## Ignored if url is empty and in-cluster config is used.
## If both of these are empty, we'll use the default serviceaccount:
## at: /var/run/secrets/
## To auto-refresh the token, please use a file with the bearer_token option.
## If given a string, Telegraf cannot refresh the token periodically.
# bearer_token = "/var/run/secrets/"
## OR
## deprecated in 1.24.0; use bearer_token with a file
# bearer_token_string = "abc_123"
## Set response_timeout (default 5 seconds)
# response_timeout = "5s"
## Optional resources to exclude from gathering.
## Leave blank to try to gather everything available.
## Values can be "daemonsets", "deployments", "endpoints", "ingress",
## "nodes", "persistentvolumes", "persistentvolumeclaims", "pods", "services",
## "statefulsets"
# resource_exclude = [ "deployments", "nodes", "statefulsets" ]
## Optional Resources to include when gathering
## Overrides resource_exclude if both are set.
# resource_include = [ "deployments", "nodes", "statefulsets" ]
## Selectors to include and exclude as tags. Globs accepted.
## Note that an empty array for both will include all selectors as tags.
## selector_exclude overrides selector_include if both are set.
# selector_include = []
# selector_exclude = ["*"]
## Optional TLS Config
## Trusted root certificates for server
# tls_ca = "/path/to/cafile"
## Used for TLS client certificate authentication
# tls_cert = "/path/to/certfile"
## Used for TLS client certificate authentication
# tls_key = "/path/to/keyfile"
## Send the specified TLS server name via SNI
# tls_server_name = ""
## Use TLS but skip chain & host verification
# insecure_skip_verify = false
## Uncomment to remove deprecated metrics.
# fieldexclude = ["terminated_reason"]
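To illustrate the authentication options above, here is a hedged sketch of a configuration for a Telegraf instance running outside the cluster. The API URL and file paths are hypothetical placeholders, and the `[[inputs.kube_inventory]]` table name is assumed because the sample configuration does not show its header.

```toml
# Assumed plugin table name; adjust to match your Telegraf configuration.
[[inputs.kube_inventory]]
  ## Hypothetical API server address; replace with your own.
  url = "https://kubernetes.example.com:6443"

  ## Token read from a file, so a refreshed token can be picked up.
  ## Hypothetical path.
  bearer_token = "/etc/telegraf/kube-token"

  ## Hypothetical CA bundle used to verify the API server certificate.
  tls_ca = "/etc/telegraf/kube-ca.crt"

  response_timeout = "5s"
```

Pointing `bearer_token` at a file rather than using `bearer_token_string` follows the note in the sample configuration: a literal string cannot be refreshed, and that option is marked as deprecated.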
Kubernetes Permissions
If using RBAC authorization, you will need to create a cluster role to
list "persistentvolumes" and "nodes". You will then need to make an aggregated
ClusterRole that will eventually be bound to a user or group.
kind: ClusterRole
name: influx:cluster:viewer
labels: "true"
- apiGroups: [""]
resources: ["persistentvolumes", "nodes"]
verbs: ["get", "list"]
kind: ClusterRole
name: influx:telegraf
- matchLabels: "true"
- matchLabels: "true"
rules: [] # Rules are automatically filled in by the controller manager.
Bind the newly created aggregated ClusterRole with the following config file,
updating the subjects as needed.
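apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: influx:telegraf:viewer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: influx:telegraf
subjects:
  - kind: ServiceAccount
    name: telegraf
    namespace: default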
Quickstart in k3s
When monitoring k3s server instances, you can reuse the already generated
administration token. This is less secure than using a more restrictive,
dedicated telegraf user, but it is more convenient to set up.
# replace `telegraf` with the user the telegraf process is running as
$ install -o telegraf -m400 /var/lib/rancher/k3s/server/token /run/telegraf-kubernetes-token
$ install -o telegraf -m400 /var/lib/rancher/k3s/server/tls/client-admin.crt /run/telegraf-kubernetes-cert
$ install -o telegraf -m400 /var/lib/rancher/k3s/server/tls/client-admin.key /run/telegraf-kubernetes-key
bearer_token = "/run/telegraf-kubernetes-token"
tls_cert = "/run/telegraf-kubernetes-cert"
tls_key = "/run/telegraf-kubernetes-key"
kubernetes node status status
The node status ready can take one of three values:
| Tag value | Corresponding field value | Meaning  |
| --------- | ------------------------- | -------- |
| ready     | 0                         | NotReady |
| ready     | 1                         | Ready    |
| ready     | 2                         | Unknown  |
pv phase_type
The persistentvolume "phase" is saved in the phase tag with a correlated
numeric field called phase_type corresponding with that tag value.
| Tag value | Corresponding field value |
| --------- | ------------------------- |
| bound     | 0                         |
| failed    | 1                         |
| pending   | 2                         |
| released  | 3                         |
| available | 4                         |
| unknown   | 5                         |
pvc phase_type
The persistentvolumeclaim "phase" is saved in the phase tag with a correlated
numeric field called phase_type corresponding with that tag value.
| Tag value | Corresponding field value |
| --------- | ------------------------- |
| bound     | 0                         |
| lost      | 1                         |
| pending   | 2                         |
| unknown   | 3                         |
Example Output
kubernetes_configmap,configmap_name=envoy-config,namespace=default,resource_version=56593031 created=1544103867000000000i 1547597616000000000
kubernetes_daemonset,daemonset_name=telegraf,selector_select1=s1,namespace=logging number_unavailable=0i,desired_number_scheduled=11i,number_available=11i,number_misscheduled=8i,number_ready=11i,updated_number_scheduled=11i,created=1527758699000000000i,generation=16i,current_number_scheduled=11i 1547597616000000000
kubernetes_deployment,deployment_name=deployd,selector_select1=s1,namespace=default replicas_unavailable=0i,created=1544103082000000000i,replicas_available=1i 1547597616000000000
kubernetes_node,host=vjain node_count=8i 1628918652000000000
kubernetes_node,condition=Ready,host=vjain,node_name=ip-172-17-0-2.internal,status=True status_condition=1i 1629177980000000000
kubernetes_node,cluster_namespace=tools,condition=Ready,host=vjain,node_name=ip-172-17-0-2.internal,status=True allocatable_cpu_cores=4i,allocatable_memory_bytes=7186567168i,allocatable_millicpu_cores=4000i,allocatable_pods=110i,capacity_cpu_cores=4i,capacity_memory_bytes=7291424768i,capacity_millicpu_cores=4000i,capacity_pods=110i,spec_unschedulable=0i,status_condition=1i 1628918652000000000
kubernetes_resourcequota,host=vjain,namespace=default,resource=pods-high hard_cpu=1000i,hard_memory=214748364800i,hard_pods=10i,used_cpu=0i,used_memory=0i,used_pods=0i 1629110393000000000
kubernetes_resourcequota,host=vjain,namespace=default,resource=pods-low hard_cpu=5i,hard_memory=10737418240i,hard_pods=10i,used_cpu=0i,used_memory=0i,used_pods=0i 1629110393000000000
kubernetes_persistentvolume,phase=Released,pv_name=pvc-aaaaaaaa-bbbb-cccc-1111-222222222222,storageclass=ebs-1-retain phase_type=3i 1547597616000000000
kubernetes_persistentvolumeclaim,namespace=default,phase=Bound,pvc_name=data-etcd-0,selector_select1=s1,storageclass=ebs-1-retain phase_type=0i 1547597615000000000
kubernetes_pod,namespace=default,node_name=ip-172-17-0-2.internal,pod_name=tick1 last_transition_time=1547578322000000000i,ready="false" 1547597616000000000
kubernetes_service,cluster_ip=,namespace=redis-cache-0001,port_name=redis,port_protocol=TCP,selector_app=myapp,selector_io.kompose.service=redis,selector_role=slave,service_name=redis-slave created=1588690034000000000i,generation=0i,port=6379i,target_port=0i 1547597616000000000
kubernetes_pod_container,condition=Ready,host=vjain,pod_name=uefi-5997f76f69-xzljt,status=True status_condition=1i 1629177981000000000
kubernetes_pod_container,container_name=telegraf,namespace=default,node_name=ip-172-17-0-2.internal,pod_name=tick1,phase=Running,state=running,readiness=ready resource_requests_cpu_units=0.1,resource_limits_memory_bytes=524288000,resource_limits_cpu_units=0.5,restarts_total=0i,state_code=0i,state_reason="",phase_reason="",resource_requests_memory_bytes=524288000 1547597616000000000
kubernetes_statefulset,namespace=default,selector_select1=s1,statefulset_name=etcd replicas_updated=3i,spec_replicas=3i,observed_generation=1i,created=1544101669000000000i,generation=1i,replicas=3i,replicas_current=3i,replicas_ready=3i 1547597616000000000