Kee
Keep an Eye on Everything
What is this?
Kee is a tool I wrote to help investigate issues with backend systems. It is not a replacement for your monitoring tools (Grafana, CloudWatch, Datadog ...) but rather an addition. Kee is designed to run on your laptop, customised to your systems, and summarize their current overall health.
Kee reports on the status of a broad range of backends/platforms/providers, from version control (e.g. GitHub via the statuspage probes) to your production app (http probe), including cloud providers (aws_cloudwatch alarms probe) and orchestrators (k8s probes). The interface is mostly a simple text table that plainly shows what's wrong and what's fine.
Kee is for DevOps and SREs who love keyboard-navigable CLI tools configured with plaintext files.
Setup
- Install Go >= 1.18, and make sure your $PATH includes $GOPATH/bin
- Run go install gitlab.com/mwwaa/kee@v1.0.0
- Create and edit the following file: $HOME/.config/kee/config.hcl. See below for a configuration example.
- Run kee
Configuration example
This is a quickstart configuration example. Customise to your own use and taste.
probe "http" "m.w.fr" {
  interval      = "30s"
  url           = "https://maxime.walzberg.fr/"
  search_string = "Maxime Walzberg"
}

probe "statuspage_status" "CloudFlare" {
  interval = "30s"
  base_url = "https://www.cloudflarestatus.com"
  curious  = true
}

probe "dns" "w.fr.mx" {
  interval    = "30s"
  nameserver  = "beau.ns.cloudflare.com:53"
  name        = "walzberg.fr."
  record_type = "A"
  expected_values = [
    "172.67.160.104",
    "104.21.65.86",
  ]
  expect_all_values = true
}

probe "ping" "1111" {
  address  = "1.1.1.1"
  interval = "30s"
}

# This will only work if you have a "Minikube" running
probe "k8s_service" "Minikube" {
  interval = "10s"
  context  = "minikube"
  # Choose namespaces relevant to you
  namespaces = ["cats"]
}

# This will only work if you have AWS configuration and credentials set up, and have CloudWatch alarms configured
probe "aws_cloudwatch" "CloudWatch" {
  interval = "1m"
}
Configuration reference
Syntax
Kee uses HashiCorp's HCL configuration language (notably used by Terraform). But instead of resource and provider blocks, Kee has its own configuration schemas.
Please note that, while Kee supports HCL expressions, it does not implement the same HCL functions as Terraform.
Attributes that are not marked optional are required.
Location
On startup, kee will attempt to load the configuration from the following places:
- Command line argument, e.g. kee path/to/config.hcl
- ~/.config/kee/config.hcl
- ./kee.hcl
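The lookup above can be pictured with a small Go sketch. It assumes the places are tried in the order listed and that the first existing file wins, which the text does not state explicitly; candidatePaths and firstExisting are hypothetical names, not Kee's actual code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidatePaths lists the documented places in the order they are listed.
// cliArg is the optional command line argument ("" when absent).
func candidatePaths(cliArg, home string) []string {
	var paths []string
	if cliArg != "" {
		paths = append(paths, cliArg)
	}
	return append(paths,
		filepath.Join(home, ".config", "kee", "config.hcl"),
		"./kee.hcl",
	)
}

// firstExisting returns the first path for which exists reports true.
// Assumption: the first match wins; Kee's real lookup may behave differently.
func firstExisting(paths []string, exists func(string) bool) (string, bool) {
	for _, p := range paths {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists reports whether path is an existing regular file or similar.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && !info.IsDir()
}

func main() {
	home, _ := os.UserHomeDir()
	var cliArg string
	if len(os.Args) > 1 {
		cliArg = os.Args[1]
	}
	if p, ok := firstExisting(candidatePaths(cliArg, home), fileExists); ok {
		fmt.Println("would load:", p)
	} else {
		fmt.Println("no configuration file found")
	}
}
```

Passing the existence check as a function keeps the ordering logic easy to try without touching the file system.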
Variables
You can use variables in the configuration file, and set their values from command line options:
kee -v myvar myvalue
(you can use the values true and false for booleans, base-10 integers, or strings).
And/or with a variables file:
kee -f path/to/vars.hcl
with vars.hcl:
myvar = "myval"
myvar2 = 2
This makes it easy to configure identical probes for different environments (e.g. production, staging, dev...).
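To make the three kinds of -v values concrete, here is a rough Go sketch of how a raw command line value could be classified as a boolean, a base-10 integer, or a string. parseValue is a hypothetical helper written for illustration; how Kee actually converts values is not shown in this document.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseValue interprets a raw command line value: "true" and "false" become
// booleans, base-10 integers become int64, and anything else stays a string.
// This mirrors the description above, not Kee's real implementation.
func parseValue(raw string) any {
	switch raw {
	case "true":
		return true
	case "false":
		return false
	}
	if n, err := strconv.ParseInt(raw, 10, 64); err == nil {
		return n
	}
	return raw
}

func main() {
	for _, raw := range []string{"true", "42", "staging"} {
		v := parseValue(raw)
		fmt.Printf("%q -> %T %v\n", raw, v, v)
	}
}
```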
Available functions and constants
- toset(list): turns a list into a set. It does not accept duplicate values or values of different types.
- today(): returns today's date in the following format: YYYY-MM-DD.
- field refers to various fields of a status row:
  - field.id
  - field.description
  - field.severity
  - field.update
  - field.change
  - field.label
  - field.layer
- severity refers to the 4 possible severities of a status row:
  - severity.ok
  - severity.notice
  - severity.warning
  - severity.critical
- layer refers to the 4 possible layers of a status row:
  - layer.none
  - layer.infrastructure
  - layer.platform
  - layer.application
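For readers more used to Go's layouts than to YYYY-MM-DD, the format returned by today() corresponds to the layout "2006-01-02". The sketch below only illustrates that format; formatDay and sampleDay are hypothetical names and say nothing about how Kee implements today().

```go
package main

import (
	"fmt"
	"time"
)

// dateLayout is Go's reference-time spelling of YYYY-MM-DD.
const dateLayout = "2006-01-02"

// formatDay renders the date part of t as YYYY-MM-DD.
func formatDay(t time.Time) string {
	return t.Format(dateLayout)
}

// sampleDay is a fixed instant so the output is reproducible.
var sampleDay = time.Date(2024, time.March, 5, 14, 7, 9, 0, time.UTC)

func main() {
	fmt.Println(formatDay(sampleDay))  // 2024-03-05
	fmt.Println(formatDay(time.Now())) // today's date on this machine
}
```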
Durations
Durations/time intervals are expressed as strings in the format accepted by Go's time.ParseDuration, such as "30s" or "1h30m".
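If you are unsure whether a value is a valid duration, a few lines of Go using time.ParseDuration directly will tell you. The seconds helper below is just for illustration. Note that a bare number such as "30" is rejected because it has no unit.

```go
package main

import (
	"fmt"
	"time"
)

// seconds parses a duration string with time.ParseDuration and returns
// its length in seconds.
func seconds(s string) (float64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err
	}
	return d.Seconds(), nil
}

func main() {
	for _, s := range []string{"30s", "1h30m", "1m", "30"} {
		secs, err := seconds(s)
		if err != nil {
			fmt.Printf("%q is rejected: %v\n", s, err)
			continue
		}
		fmt.Printf("%q is %.0f seconds\n", s, secs)
	}
}
```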
Probes
A probe block configures a probe provider to watch a system or resource and report its status(es)/conditions.
Kee has several probe providers, listed below; each has its own specific attributes, much like Terraform's resource.
Probe blocks also have common attributes:
# The first label of the block is the provider type; the second is a human readable label for it.
# The tuple (provider type, label) must be unique within the configuration.
# You can define multiple probe blocks per configuration.
probe "provider" "label" {
  # optional, the interval attribute is a duration that defines how often the probe checks the conditions of a system or resource.
  interval = "30s"
  # optional, the minimum_severity attribute is a severity that will replace the probe's reported severity when the condition reported is not OK.
  # Use this if a probe doesn't present a condition as severe enough for critical resources in your system.
  minimum_severity = severity.critical
  # optional, the layer attribute assigns the statuses from the probe to the provided layer for improved search/filtering.
  layer = layer.platform
}
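The effect of minimum_severity can be sketched in Go as follows. This is one reading of the description above, under two assumptions: severities are ordered ok < notice < warning < critical, and a reported severity already above the minimum is left alone. The Severity type and applyMinimum are hypothetical and not taken from Kee's source.

```go
package main

import "fmt"

// Severity mirrors the four severities listed in this document.
type Severity int

const (
	OK Severity = iota
	Notice
	Warning
	Critical
)

func (s Severity) String() string {
	return [...]string{"ok", "notice", "warning", "critical"}[s]
}

// applyMinimum leaves OK untouched and raises any other reported severity
// to at least floor. Assumption: severities above floor are kept as-is.
func applyMinimum(reported, floor Severity) Severity {
	if reported == OK {
		return OK
	}
	if reported < floor {
		return floor
	}
	return reported
}

func main() {
	for _, reported := range []Severity{OK, Notice, Warning, Critical} {
		fmt.Printf("reported %v with minimum critical -> %v\n",
			reported, applyMinimum(reported, Critical))
	}
}
```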
Probe providers
Use the generated Go doc linked below (type Config) to find the probe-specific attributes:
- AWS:
  - aws_cloudwatch: report on CloudWatch alarms (reference)
- dns: check DNS record values (reference)
- http: check responses to HTTP requests (reference)
- Kubernetes (all k8s probes use the same configuration structure):
  - k8s_daemonset: report on the status of DaemonSet objects
  - k8s_deployment: report on the status of Deployment objects
  - k8s_node: report on the status of Node objects
  - k8s_pod: report on the status of Pod objects
  - k8s_service: report on the status of Service objects
- ping: send ICMP pings and monitor responses (reference)
- StatusPage API (for third parties that use Atlassian's StatusPage):
  - statuspage_status: reports overall system status (reference)
  - statuspage_components: reports on component details (reference)
Preferences
The preferences block sets general preferences such as the default severity filter, ordering and user interface refresh interval. At most one preferences block is allowed per configuration.
preferences {
  # optional, the refresh_interval attribute sets how often the interface will be refreshed. Use duration values.
  refresh_interval = "1s"
  # optional, when set to true, the refresh_on_update attribute refreshes the interface every time new information is gathered by a probe, in addition to the refresh_interval.
  refresh_on_update = false
  # optional, the default_filter attribute is a severity that will filter status rows on launch. This is merely a convenience and you can always update the filter while the app is running.
  default_filter = severity.notice
  # optional, the default_sort attribute is a field that will sort status rows on launch. The only fields currently accepted are severity, label and change. You can change ordering while the app is running.
  default_sort = field.change
  # optional, the display_time_for_statuses attribute, when set to true, displays the time and (if needed) the date of changes and updates rather than a duration.
  display_time_for_statuses = false
  # optional, the log_file attribute enables a JSON log file reporting all changes. This is intended for incident post-mortems.
  log_file = ".../logs/kee/${var.env}/${today()}.json"
  # optional, the minimum_error_severity attribute determines which severities count as errors (in summaries, like the count presented in the interface header or the JSON output from -t mode). It defaults to severity.notice.
  minimum_error_severity = severity.notice
  # A clock block configures a timezone location and display format for the clock in the header and for the Update and Change columns when display_time_for_statuses is set to true.
  # You can define zero, one or multiple clocks and use k to switch between them while kee is running.
  # The first label of the clock is a human readable label for the clock. For example, use "UTC" for a clock that displays the UTC time.
  clock "UTC" {
    # The location attribute is a name corresponding to a well known location on earth, for example "Europe/Paris".
    # You can also use the special values "UTC" and "Local" for the UTC timezone and the configured local one respectively.
    # See [time.LoadLocation].
    #
    # [time.LoadLocation]: https://pkg.go.dev/time#LoadLocation
    location = "UTC"
    # optional, the format attribute is the display format of the clock. Use the [time.Format] reference time to describe the expected format, e.g. "2006-01-02 15:04:05"
    format = "15:04:05"
    # optional, the status_format attribute is the display format for the Update and Change columns of a status row when the display_time_for_statuses attribute of the preferences block is set to true.
    status_format = "15:04:05"
    # optional, the status_day_format attribute is the same as above, but used when the change or update happened on a different day than today.
    status_day_format = "2006-01-02 15:04:05"
  }
}
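To preview what a clock's location and format values produce, you can try them in a short Go program using time.LoadLocation and Time.Format, the two standard library calls the comments above point to. renderClock and sampleInstant are hypothetical names used for this sketch only. The blank import of time/tzdata embeds the timezone database so that names like "Europe/Paris" resolve even on systems without one.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embedded timezone database, so named locations always resolve
)

// renderClock converts t to the named location and formats it with layout,
// which is written using Go's reference time (2006-01-02 15:04:05).
func renderClock(location, layout string, t time.Time) (string, error) {
	loc, err := time.LoadLocation(location)
	if err != nil {
		return "", err
	}
	return t.In(loc).Format(layout), nil
}

// sampleInstant is a fixed instant so the output is reproducible.
var sampleInstant = time.Date(2024, time.March, 5, 14, 7, 9, 0, time.UTC)

func main() {
	clocks := []struct{ location, layout string }{
		{"UTC", "15:04:05"},
		{"Europe/Paris", "15:04:05"},
		{"UTC", "2006-01-02 15:04:05"},
		{"Nowhere/Land", "15:04:05"}, // unknown location, reported as an error
	}
	for _, c := range clocks {
		out, err := renderClock(c.location, c.layout, sampleInstant)
		if err != nil {
			fmt.Printf("%s: %v\n", c.location, err)
			continue
		}
		fmt.Printf("%s (%s): %s\n", c.location, c.layout, out)
	}
}
```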
Theme
The theme block customizes Kee's appearance. At most one theme block is allowed per configuration.
Colors are defined using their string names; the reference list can be found here.
Nested *rule blocks for status/error counts and status row colors can appear zero, one or more times where expected. They are evaluated in lexical order; the first rule block that matches the related evaluation context sets the color of the element. When condition is omitted, the rule always evaluates to true (thus if such a rule is first in a list of rules, it prevents any other rule from ever being applied; if it is last, it acts as the default rule, applying whenever no other rule matches).
theme {
  # optional, the title attribute sets what is displayed on top of the header of the interface.
  title = "My production system"
  # optional, customize the looks of the top part of the interface
  header {
    # optional, background color
    bg_color = "black"
    # optional, text color
    fg_color = "snow"
    # optional, sets the status count colors depending on condition
    status_count_rule {
      # optional, when the provided HCL expression equals true, this rule is applied to the status count interface element.
      # variables available in this ctx: ctx.statuses and ctx.errors (in addition to env, constants and functions)
      condition = ctx.statuses < 50
      # optional, background color when this rule applies
      bg_color = "black"
      # optional, text color when this rule applies
      fg_color = "yellow"
    }
    # optional, sets the error count colors depending on condition
    error_count_rule {
      # optional, when the provided HCL expression equals true, this rule is applied to the error count interface element.
      # variables available in this ctx: ctx.statuses and ctx.errors (in addition to env, constants and functions)
      condition = ctx.errors > 1
      # optional, background color when this rule applies
      bg_color = "black"
      # optional, text color when this rule applies
      fg_color = "yellow"
    }
    # optional, sets the border color
    border_color = "black"
    # optional, sets the layer menu colors
    layer_menu {
      # optional, sets the text color
      fg_color = "snow"
      # optional, sets the text color for the selected item
      selected_fg_color = "yellow"
      # optional, sets the text color for the menu title
      title_fg_color = "snow"
    }
    # optional, sets the severity menu colors
    severity_menu {
      # optional, sets the text color
      fg_color = "snow"
      # optional, sets the text color for the selected item
      selected_fg_color = "yellow"
      # optional, sets the text color for the menu title
      title_fg_color = "snow"
    }
    # optional, sets the sort menu colors
    sort_menu {
      # optional, sets the text color
      fg_color = "snow"
      # optional, sets the text color for the selected item
      selected_fg_color = "yellow"
      # optional, sets the text color for the menu title
      title_fg_color = "snow"
    }
  }
  # optional, customize the looks of the status table
  status_table {
    # optional, background color for the table
    bg_color = "black"
    # optional, (external) border color for the table
    border_color = "black"
    # optional, background color for the header row
    header_bg_color = "black"
    # optional, text color for the header row
    header_fg_color = "snow"
    # optional, ordered list of columns to display (use the field constant)
    columns = [
      field.severity,
      field.layer,
      field.label,
      field.description,
      field.change,
      field.update,
    ]
    # optional, sets a status row's colors depending on condition
    rule {
      # optional, when the provided HCL expression equals true, this rule is applied to the status row being displayed.
      # variables available in this ctx: ctx.id, ctx.severity, ctx.layer, ctx.label, ctx.description, ctx.update_sec, ctx.change_sec
      condition = ctx.severity == severity.critical
      # optional, background color when this rule applies
      bg_color = "black"
      # optional, text color when this rule applies
      fg_color = "red"
    }
  }
}
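The first-match behaviour of rule blocks described at the start of this section can be sketched in Go. This is an illustration of the evaluation order only, with a nil condition standing in for an omitted condition attribute; Context, Rule and pickColor are hypothetical names, not Kee's types.

```go
package main

import "fmt"

// Context stands in for the ctx values available to header rules.
type Context struct {
	Statuses int
	Errors   int
}

// Rule is a simplified rule block. A nil Condition means the condition
// attribute was omitted, so the rule always matches.
type Rule struct {
	Condition func(Context) bool
	FgColor   string
}

// pickColor walks the rules in the order they were written and returns the
// color of the first one that matches, or fallback when none does.
func pickColor(rules []Rule, ctx Context, fallback string) string {
	for _, r := range rules {
		if r.Condition == nil || r.Condition(ctx) {
			return r.FgColor
		}
	}
	return fallback
}

// errorRules has a conditional rule followed by a catch-all default.
var errorRules = []Rule{
	{Condition: func(c Context) bool { return c.Errors > 1 }, FgColor: "yellow"},
	{FgColor: "snow"}, // no condition: applies whenever nothing above matched
}

func main() {
	fmt.Println(pickColor(errorRules, Context{Errors: 3}, "none")) // yellow
	fmt.Println(pickColor(errorRules, Context{Errors: 0}, "none")) // snow

	// Putting the catch-all first shadows every rule after it.
	shadowed := []Rule{errorRules[1], errorRules[0]}
	fmt.Println(pickColor(shadowed, Context{Errors: 3}, "none")) // snow
}
```

The last call shows why a rule without a condition belongs at the end of the list: placed first, it wins every time.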