# OpenShift v3 Diagnostics
This is a tool to help administrators and users resolve common problems that occur with OpenShift v3 deployments. It will likely remain under continuous development as the OpenShift Origin project progresses.
The goals of the diagnostics tool are summarized in this Trello card. Diagnostics are included as an `openshift` binary sub-command that analyzes OpenShift as it finds it, whether from the perspective of an OpenShift client or on an OpenShift host.
## Expected environment
OpenShift can be deployed in many ways: built from source, included in a VM image, in a Docker image, or as enterprise RPMs. Each of these implies a different configuration and environment. In order to keep assumptions about the environment to a minimum, the diagnostics have been added to the `openshift` binary itself, so that wherever there is an OpenShift server or client, the diagnostics can run in the exact same environment.
Diagnostics look for config files in standard locations; if a file is not found, the related diagnostics are simply skipped. Non-standard locations can be specified with flags.
Standard config file locations are:

- Client:
  - as indicated by the `--config` flag
  - as indicated by the `$KUBECONFIG` env var
  - the `~/.kube/config` file
- Master:
  - as indicated by the `--master-config` flag
  - `/etc/openshift/master/master-config.yaml`
- Node:
  - as indicated by the `--node-config` flag
  - `/etc/openshift/node/node-config.yaml`
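For example, to point the client diagnostics at a kubeconfig in a non-standard location (the path below is only illustrative):

```
openshift ex diagnostics --config=/path/to/admin.kubeconfig
```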
### Host environment
Master/node diagnostics will be most useful in a specific target environment: a deployment using RPMs and Ansible deployment logic. This provides two major benefits:
- master/node configuration is based on a configuration file in a standard location
- all components log to journald
Having configuration files where Ansible places them means you will generally not even need to specify where to find them. Running:

```
openshift ex diagnostics
```

by itself looks for master and node configs (in addition to the client config file) in the standard locations and uses them if found, which makes the Ansible-installed use case as simple as possible. It is also easy to specify configuration files when they are not in the expected Enterprise locations:

```
openshift ex diagnostics --master-config=... --node-config=...
```
Having logs in journald is necessary for the current log analysis logic. Other deployments may send logs to files or stdout, or combine node and master into one process; it may not be too hard to extend the analysis to other log sources, but the priority has been journald logs as created by components in systemd-based deployments (including docker, openvswitch, etc.).
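For reference, the journald entries that the analysis reads can also be inspected directly with `journalctl`. The relevant unit names vary by deployment; `docker` is a common unit on systemd hosts, while `openshift-master` below is only a hypothetical example:

```
journalctl -u docker             # logs from the Docker daemon
journalctl -u openshift-master   # hypothetical unit name; varies by deployment
```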
### Client environment
The user may have access only as an ordinary user, as a cluster-admin user, and/or may be running on a host where OpenShift master or node services are operating. The diagnostics will attempt to use as much access as the user has available.
A client with ordinary access should be able to diagnose its connection to the master and look for problems in builds and deployments for the current context.
A client with cluster-admin access should be able to diagnose the status of infrastructure.
## Writing diagnostics
Developers are encouraged to add to the available diagnostics as they encounter problems that are not easily communicated through the normal operation of the program, for example components with misconfigured connections, or problems that are buried in logs. The sanity you save may be your own.
A diagnostic is an object that conforms to the Diagnostic interface (see `pkg/diagnostics/types/diagnostic.go`). Diagnostic objects should be constructed by one of the builders in the `pkg/cmd/admin/diagnostics` package, based on whether they depend on client, cluster-admin, or host configuration. When executed, a diagnostic logs its findings into a result object. Diagnostics should assume they may run in parallel.
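As an illustration, a new diagnostic might look roughly like the sketch below. The real interface and result type are defined in `pkg/diagnostics/types`; the method set, the `Result` methods, and the `ExampleURLCheck` diagnostic here are assumptions for illustration, not the actual API.

```go
package example

import (
	"fmt"
	"net/url"
)

// Result is a simplified stand-in for the result object a diagnostic
// logs its findings into; the real type lives in pkg/diagnostics/types.
type Result struct {
	Errors   []string
	Warnings []string
}

func (r *Result) Error(id, text string) { r.Errors = append(r.Errors, id+": "+text) }
func (r *Result) Warn(id, text string)  { r.Warnings = append(r.Warnings, id+": "+text) }

// Diagnostic mirrors the general shape of the interface in
// pkg/diagnostics/types/diagnostic.go; the exact method set may differ.
type Diagnostic interface {
	Name() string
	Description() string
	CanRun() (bool, error) // can this diagnostic run in the current environment?
	Check() *Result        // run the checks and log findings
}

// ExampleURLCheck is a hypothetical diagnostic that validates a master URL
// taken from the client config.
type ExampleURLCheck struct {
	MasterURL string
}

func (d ExampleURLCheck) Name() string        { return "ExampleURLCheck" }
func (d ExampleURLCheck) Description() string { return "Check that the master URL is usable" }

func (d ExampleURLCheck) CanRun() (bool, error) {
	if d.MasterURL == "" {
		return false, fmt.Errorf("no master URL found in client config")
	}
	return true, nil
}

func (d ExampleURLCheck) Check() *Result {
	r := &Result{}
	u, err := url.Parse(d.MasterURL)
	if err != nil {
		// Almost certainly broken: report an error.
		r.Error("DExa1001", fmt.Sprintf("master URL %q is malformed: %v", d.MasterURL, err))
		return r
	}
	if u.Scheme == "http" {
		// Possibly intentional, but worth flagging: report a warning.
		r.Warn("DExa1002", "master URL uses http; credentials will be sent unencrypted")
	}
	return r
}
```

Since diagnostics may run in parallel, each should confine its findings to its own result object rather than writing to shared state.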
Diagnostics should prefer providing information over perfect accuracy, as they are the first line of (self-)support for users. On the other hand, judgment should be exercised to prevent sending users down useless paths or flooding them with non-issues that obscure real problems.
- Errors should be reserved for things that are almost certainly broken or causing problems, for example a broken URL.
- Warnings indicate issues that may be a problem but could be valid for some configurations or situations, for example a node being disabled (see the snippet below for how each is reported).
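Using the hypothetical `Result` methods from the sketch above (message IDs and texts are illustrative):

```go
// Almost certainly broken: reserve Error for this.
r.Error("DExa1001", "master URL is malformed and cannot be parsed")

// Could be valid in some configurations: use Warn instead.
r.Warn("DExa1002", "node is marked unschedulable; this may be intentional")
```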
### Message IDs
All messages should have a unique, unchanging, otherwise-meaningless message ID so that users can grep for specific errors/warnings without depending on text that may change. Although nothing yet depends on the IDs being unique, the message ID scheme attempts to ensure they are. That scheme is:
Initials of package + index of file in package + index of message in file
E.g. "DClu1001" is in package diagnostics/cluster (which needed to be differentiated from diagnostics/client), the first file indexed, and the first message in the file. This format is not important; it's just a convenience to help keep IDs unique. But don't change existing IDs.
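For example, a user who wants to track a specific finding across runs can grep for its ID (the ID here is the one from the example above):

```
openshift ex diagnostics | grep DClu1001
```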