etcd-diagnosis
Overview
etcd-diagnosis is a comprehensive tool for etcd diagnosis. It diagnoses running etcd clusters and generates a
report with a single command. It reuses most of the etcdctl global flags, so it is used in the same way as
etcdctl. The complete list of flags is shown below:
$ ./bin/etcd-diagnosis -h
An one-stop etcd diagnosis tool
Usage:
etcd-diagnosis [flags]
Flags:
--cacert string verify certificates of TLS-enabled secure servers using this CA bundle
--cert string identify secure client using this TLS certificate file
--cluster use all endpoints from the cluster member list
--command-timeout duration command timeout (excluding dial timeout) (default 5s)
--dial-timeout duration dial timeout for client connections (default 2s)
-d, --discovery-srv string domain name to query for SRV records describing cluster endpoints
--discovery-srv-name string service name to query when using DNS discovery
--endpoints strings comma separated etcd endpoints (default [127.0.0.1:2379])
--etcd-storage-quota-bytes int etcd storage quota in bytes (the value passed to etcd instance by flag --quota-backend-bytes) (default 2147483648)
-h, --help help for etcd-diagnosis
--insecure-discovery accept insecure SRV records describing cluster endpoints (default true)
--insecure-skip-tls-verify skip server certificate verification (CAUTION: this option should be enabled only for testing purposes)
--insecure-transport disable transport security for client connections (default true)
--keepalive-time duration keepalive time for client connections (default 2s)
--keepalive-timeout duration keepalive timeout for client connections (default 5s)
--key string identify secure client using this TLS key file
--password string password for authentication (if this option is used, --user option shouldn't include password)
--user string username[:password] for authentication (prompt if password is not supplied)
--version print the version and exit
Examples
Usage is straightforward. The example below diagnoses all the endpoints specified by the --endpoints flag
and writes the diagnosis result to both standard output and the file "etcd_diagnosis_report.json"
(see example report) in the current directory.
$ ./etcd-diagnosis --endpoints=https://10.0.1.10:2379,https://10.0.1.11:2379,https://10.0.1.12:2379 --cacert ./ca.crt --key ./etcd-diagnosis.key --cert ./etcd-diagnosis.crt
If the communication isn't protected by TLS (e.g. in a dev environment), use a command like the one below:
$ ./etcd-diagnosis --endpoints=http://10.0.1.10:2379,http://10.0.1.11:2379,http://10.0.1.12:2379
Design
The design is simple: one generic diagnosis engine plus extensible plugins. Each plugin performs one diagnosis
and implements the Plugin interface. Currently there are 5 plugins, listed in the table below:
| Name | Description |
|------|-------------|
| membershipChecker | Checks whether each endpoint has the same member list |
| epStatusChecker | Checks each endpoint's status and verifies whether the statuses are consistent |
| serializableReadChecker | Checks whether each endpoint can serve serializable read requests, and how long it takes to serve a read request |
| linearizableReadChecker | Checks whether each endpoint can serve linearizable read requests, and how long it takes to serve a read request |
| metricsChecker | Collects some Prometheus metrics from each endpoint |
| any else? | You are welcome to contribute new plugins! |
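
To make the "generic engine + plugins" idea concrete, here is a minimal, self-contained Go sketch. It is not the tool's actual code: the method set of `Plugin` shown here (`Name`, `Diagnose`), the `runAll` function, and the `memberListChecker` type are all hypothetical, and the member lists are hard-coded instead of being fetched from a live cluster. It only illustrates how an engine can stay unaware of individual checks while a plugin in the spirit of membershipChecker compares member lists across endpoints.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
	"strings"
)

// Plugin is a hypothetical stand-in for the tool's Plugin interface;
// the real method set may differ.
type Plugin interface {
	Name() string
	Diagnose() (any, error)
}

// memberListChecker mimics the idea behind membershipChecker: every
// endpoint should report the same member list. The lists are supplied
// directly here (assumption) rather than queried from etcd.
type memberListChecker struct {
	membersByEndpoint map[string][]string
}

func (c memberListChecker) Name() string { return "memberListChecker" }

func (c memberListChecker) Diagnose() (any, error) {
	// Normalize each endpoint's list (sorted, comma-joined) and count
	// how many distinct lists exist across all endpoints.
	distinct := map[string]bool{}
	for _, members := range c.membersByEndpoint {
		sorted := append([]string(nil), members...)
		sort.Strings(sorted)
		distinct[strings.Join(sorted, ",")] = true
	}
	return map[string]bool{"consistent": len(distinct) <= 1}, nil
}

// runAll plays the role of the generic engine: it iterates over the
// plugins and collects each result (or error) under the plugin's name.
func runAll(plugins []Plugin) map[string]any {
	report := make(map[string]any, len(plugins))
	for _, p := range plugins {
		result, err := p.Diagnose()
		if err != nil {
			report[p.Name()] = map[string]string{"error": err.Error()}
			continue
		}
		report[p.Name()] = result
	}
	return report
}

func main() {
	plugins := []Plugin{
		memberListChecker{membersByEndpoint: map[string][]string{
			"10.0.1.10:2379": {"m1", "m2", "m3"},
			"10.0.1.11:2379": {"m3", "m1", "m2"},
		}},
	}
	// Print the collected results as indented JSON.
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(runAll(plugins)); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With this shape, adding a new check means writing one more type that satisfies the interface and appending it to the plugin list; the engine loop does not change.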
Contributing
Any contribution (e.g. new plugins) is welcome!