⚠ The tool is still not ready for real production use yet.
Slotalk is a CLI tool that allows developers to embed Sloth SLO/SLI specifications as in-code annotations rather than a YAML file.
Similar to how Swaggo does for Swagger docs, Slotalk moves the SLO/SLI specification closer to where its relevant Prometheus metric was defined.
Slotalk can be used in tandem with the Sloth CLI to generate Prometheus alerts groups from the in-code annotations, which can be used in any Prometheus/Grafana monitoring system to keep track of the service's SLOs. See examples below.
Table of Contents
Motivation
- Experimentation, this was the main motivation behind development, testing libraries like: go/ast, wazero, participle.
- Developer experience, finding ways to improve developer experience when it comes to more platform engineering concepts like SLIs and SLOs. I want to see if moving these concepts closer to devs,
would make them less of an afterthought.
- More Experimentation, many of the cloud native tools I've seen, and I've worked on have been very targeted towards DevOps/SecOps and Platform Engineering personas,
so I wanted try my hand on building something for developers.
- Trying ways to avoid writing YAML...
Prerequisites
Try it!
Nix
Generate Prometheus SLO alert rules from an example metrics.go.
# creates a nix demo shell with slotalk and sloth. just follow the shell instructions
nix develop github:tfadeyi/slotalk#demo
Source
Generate Prometheus SLO alert rules from an example metrics.go.
- Install Slotalk
# install the latest version of slotalk
go install github.com/tfadeyi/slotalk@latest
# install the latest version of sloth
go install github.com/slok/sloth/cmd/sloth@latest
- Run
slotalk
and sloth
to generate Prometheus alert rules from code annotations.
curl https://gist.githubusercontent.com/tfadeyi/df60aebd858d1c76428c045d4df7b114/raw/dfb96773dfb64086280845b9a0776012cbd7d26b/metrics.go > metrics.go
cat metrics.go | slotalk init -f - > ./sloth_defs.yaml
sloth generate -i ./sloth_defs.yaml -o ./rules.yml
You now should have Prometheus alerting rules that can be added to your Prometheus configuration.
Prometheus configuration
# my global config
global:
scrape_interval: 5s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 5s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
- "rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: "exporter"
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ["localhost:9301"]
Installation
Go install
# install the latest version of slotalk
go install github.com/tfadeyi/slotalk@latest
Nix
nix run github:tfadeyi/slotalk
Pre-released binaries
Download a pre-compiled binary from the release page.
curl -LJO https://github.com/tfadeyi/slotalk/releases/download/v0.0.3/slotalk-linux-amd64.tar.gz && \
tar -xzvf slotalk-linux-amd64.tar.gz && \
cd slotalk-linux-amd64
Docker
docker pull ghcr.io/tfadeyi/slotalk:latest
Get Started
-
Add comments to your source code. See Declarative Comments.
-
Run slotalk
init in the project's root. This will parse your source code annotations and print the sloth definitions to standard out.
./slotalk init
You can also specify the specific file to parse by using the -f
flag.
./slotalk init -f metrics.go
Another way would be to pass the input file through pipe.
cat metrics.go | ./slotalk init -f -
CLI usage
Usage:
slotalk init [flags]
Flags:
--dirs strings Comma separated list of directories to be parses by the tool (default [/home/jetstack-oluwole/go/src/github.com/tfadeyi/slotalk])
-f, --file string Source file to parse.
--format strings Output format (yaml,json). (default [yaml])
-h, --help help for init
--lang string Language of the source files. (go) (default "go")
Global Flags:
--log-level string Only log messages with the given severity or above. One of: [none, debug, info, warn], errors will always be printed (default "info")
The Sloth definitions are added through declarative comments, as shown below.
// @sloth service chatgpt
// @sloth.slo name chat-gpt-availability
// @sloth.slo objective 95.0
// @sloth.sli error_query sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[{{.window}}])) OR on() vector(0)
// @sloth.sli total_query sum(rate(tenant_login_operations_total{client="chat-gpt"}[{{.window}}]))
// @sloth.slo description 95% of logins to the chat-gpt app should be successful.
// @sloth.alerting name ChatGPTAvailability
Service definitions
annotation |
description |
example |
service |
Required. The name of the service the definitions refer to. |
@sloth service chat-gpt |
version |
The version of the Sloth specification. |
@sloth version prometheus/v1 |
labels |
The labels associated to the Sloth service. |
@sloth labels foo bar @sloth labels test slo |
SLO definitions
annotation |
description |
example |
name |
Required. The name of the SLO. |
@sloth.slo name availability |
objective |
Required. The SLO Objective is target of the SLO the percentage (0, 100] (e.g 99.9). |
@sloth.slo objective 95.0 |
description |
Description is the description of the SLO. |
@sloth.slo description 95% of logins to the chat-gpt app should be successful annotations. (can be multilined) |
labels |
Labels are the Prometheus labels that will have all the recording and alerting rules for this specific SLO. These labels are merged with the previous level labels. |
@sloth.slo labels foo bar @sloth labels test slo |
Alerting definitions
Page Alerting definitions
annotation |
description |
example |
labels |
Labels are the Prometheus labels for the specific alert. For example can be useful to route the Page alert to specific Slack channel. |
@sloth.alerting.page labels severity critical |
annotations |
Annotations are the Prometheus annotations for the specific alert. |
@sloth.alerting.page annotations tier application |
Ticket Alerting definitions
annotation |
description |
example |
labels |
Labels are the Prometheus labels for the specific alert. For example can be useful to route the Page alert to specific Slack channel. |
@sloth.alerting.ticket labels severity critical |
annotations |
Annotations are the Prometheus annotations for the specific alert. |
@sloth.alerting.ticket annotations tier application |
Examples
Basic usage - Generate Sloth definitions using go:generate
The following example shows how to use go:generate
to generate Sloth definitions from in code annotations.
metrics.go
// @sloth service chatgpt
var (
// @sloth.slo name chat-gpt-availability
// @sloth.slo objective 95.0
// @sloth.sli error_query sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[{{.window}}])) OR on() vector(0)
// @sloth.sli total_query sum(rate(tenant_login_operations_total{client="chat-gpt"}[{{.window}}]))
// @sloth.slo description 95% of logins to the chat-gpt app should be successful.
// @sloth.alerting name ChatGPTAvailability
metricGaugeCertInventoryProcessingMessages = prometheus.NewGauge(
prometheus.GaugeOpts{
Namespace: "chatgpt",
Subsystem: "auth0",
Name: "tenant_login_operations_total",
})
tenantFailedLogins = prometheus.NewCounter(
prometheus.CounterOpts{
Namespace: "chatgpt",
Subsystem: "auth0",
Name: "tenant_failed_login_operations_total",
})
)
main.go
//go:generate slotalk init
package main
import (
)
// @sloth service chatgpt
func main() {
}
Running go generate, will allow the slotalk
to walk through the different packages parsing the in code annotations and
generate Sloth definitions.
go generate ./...
Result Sloth Definitions.
# Code generated by slotalk: https://github.com/tfadeyi/slotalk.
# DO NOT EDIT.
version: prometheus/v1
service: chatgpt
slos:
- name: chat-gpt-availability
description: 95% of logins to the chat-gpt app should be successful.
objective: 95
sli:
events:
error_query: sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[{{.window}}])) OR on() vector(0)
total_query: sum(rate(tenant_login_operations_total{client="chat-gpt"}[{{.window}}]))
alerting:
name: ChatGPTAvailability
Basic usage - Generate Prometheus alert groups from code annotations
This example shows how sloth's comments can be added next to the prometheus metrics defined in a metrics.go
file.
// @sloth service chatgpt
var (
// @sloth.slo name chat-gpt-availability
// @sloth.slo objective 95.0
// @sloth.sli error_query sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[{{.window}}])) OR on() vector(0)
// @sloth.sli total_query sum(rate(tenant_login_operations_total{client="chat-gpt"}[{{.window}}]))
// @sloth.slo description 95% of logins to the chat-gpt app should be successful.
// @sloth.alerting name ChatGPTAvailability
metricGaugeCertInventoryProcessingMessages = prometheus.NewGauge(
prometheus.GaugeOpts{
Namespace: "chatgpt",
Subsystem: "auth0",
Name: "tenant_login_operations_total",
})
tenantFailedLogins = prometheus.NewCounter(
prometheus.CounterOpts{
Namespace: "chatgpt",
Subsystem: "auth0",
Name: "tenant_failed_login_operations_total",
})
)
Now running the following command from the root of the project.
./slotalk init
This will generate the following sloth definitions being outputted to standard out.
version: prometheus/v1
service: "chatgpt"
slos:
- name: chat-gpt-availability
description: 95% of logins to the chat-gpt app should be successful.
objective: 95
sli:
events:
error_query: sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[{{.window}}])) OR on() vector(0)
total_query: sum(rate(tenant_login_operations_total{client="chat-gpt"}[{{.window}}]))
alerting:
name: "ChatGPTAvailability"
This specification can then be passed to the Sloth CLI to generate Prometheus alerting groups.
./slotalk init > sloth_defs.yaml && sloth generate -i sloth_defs.yaml
Resulting alert groups.
# Code generated by Sloth (v0.11.0): https://github.com/slok/sloth.
# DO NOT EDIT.
groups:
- name: sloth-slo-sli-recordings-foo-chat-gpt-availability
rules:
- record: slo:sli_error:ratio_rate5m
expr: |
(sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[5m])) OR on() vector(0))
/
(sum(rate(tenant_login_operations_total{client="chat-gpt"}[5m])))
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_window: 5m
- record: slo:sli_error:ratio_rate30m
expr: |
(sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[30m])) OR on() vector(0))
/
(sum(rate(tenant_login_operations_total{client="chat-gpt"}[30m])))
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_window: 30m
- record: slo:sli_error:ratio_rate1h
expr: |
(sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[1h])) OR on() vector(0))
/
(sum(rate(tenant_login_operations_total{client="chat-gpt"}[1h])))
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_window: 1h
- record: slo:sli_error:ratio_rate2h
expr: |
(sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[2h])) OR on() vector(0))
/
(sum(rate(tenant_login_operations_total{client="chat-gpt"}[2h])))
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_window: 2h
- record: slo:sli_error:ratio_rate6h
expr: |
(sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[6h])) OR on() vector(0))
/
(sum(rate(tenant_login_operations_total{client="chat-gpt"}[6h])))
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_window: 6h
- record: slo:sli_error:ratio_rate1d
expr: |
(sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[1d])) OR on() vector(0))
/
(sum(rate(tenant_login_operations_total{client="chat-gpt"}[1d])))
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_window: 1d
- record: slo:sli_error:ratio_rate3d
expr: |
(sum(rate(tenant_failed_login_operations_total{client="chat-gpt"}[3d])) OR on() vector(0))
/
(sum(rate(tenant_login_operations_total{client="chat-gpt"}[3d])))
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_window: 3d
- record: slo:sli_error:ratio_rate30d
expr: |
sum_over_time(slo:sli_error:ratio_rate5m{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"}[30d])
/ ignoring (sloth_window)
count_over_time(slo:sli_error:ratio_rate5m{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"}[30d])
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_window: 30d
- name: sloth-slo-meta-recordings-foo-chat-gpt-availability
rules:
- record: slo:objective:ratio
expr: vector(0.95)
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
- record: slo:error_budget:ratio
expr: vector(1-0.95)
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
- record: slo:time_period:days
expr: vector(30)
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
- record: slo:current_burn_rate:ratio
expr: |
slo:sli_error:ratio_rate5m{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"}
/ on(sloth_id, sloth_slo, sloth_service) group_left
slo:error_budget:ratio{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"}
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
- record: slo:period_burn_rate:ratio
expr: |
slo:sli_error:ratio_rate30d{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"}
/ on(sloth_id, sloth_slo, sloth_service) group_left
slo:error_budget:ratio{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"}
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
- record: slo:period_error_budget_remaining:ratio
expr: 1 - slo:period_burn_rate:ratio{sloth_id="foo-chat-gpt-availability", sloth_service="foo",
sloth_slo="chat-gpt-availability"}
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_service: foo
sloth_slo: chat-gpt-availability
- record: sloth_slo_info
expr: vector(1)
labels:
foo: bar
sloth_id: foo-chat-gpt-availability
sloth_mode: cli-gen-prom
sloth_objective: "95"
sloth_service: foo
sloth_slo: chat-gpt-availability
sloth_spec: prometheus/v1
sloth_version: v0.11.0
- name: sloth-slo-alerts-foo-chat-gpt-availability
rules:
- alert: K8sApiserverAvailabilityAlert
expr: |
(
max(slo:sli_error:ratio_rate5m{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"} > (14.4 * 0.05)) without (sloth_window)
and
max(slo:sli_error:ratio_rate1h{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"} > (14.4 * 0.05)) without (sloth_window)
)
or
(
max(slo:sli_error:ratio_rate30m{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"} > (6 * 0.05)) without (sloth_window)
and
max(slo:sli_error:ratio_rate6h{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"} > (6 * 0.05)) without (sloth_window)
)
labels:
sloth_severity: page
annotations:
summary: '{{$labels.sloth_service}} {{$labels.sloth_slo}} SLO error budget burn
rate is over expected.'
title: (page) {{$labels.sloth_service}} {{$labels.sloth_slo}} SLO error budget
burn rate is too fast.
- alert: K8sApiserverAvailabilityAlert
expr: |
(
max(slo:sli_error:ratio_rate2h{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"} > (3 * 0.05)) without (sloth_window)
and
max(slo:sli_error:ratio_rate1d{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"} > (3 * 0.05)) without (sloth_window)
)
or
(
max(slo:sli_error:ratio_rate6h{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"} > (1 * 0.05)) without (sloth_window)
and
max(slo:sli_error:ratio_rate3d{sloth_id="foo-chat-gpt-availability", sloth_service="foo", sloth_slo="chat-gpt-availability"} > (1 * 0.05)) without (sloth_window)
)
labels:
sloth_severity: ticket
annotations:
summary: '{{$labels.sloth_service}} {{$labels.sloth_slo}} SLO error budget burn
rate is over expected.'
title: (ticket) {{$labels.sloth_service}} {{$labels.sloth_slo}} SLO error budget
burn rate is too fast.
License
MIT, see LICENSE.md.