sloth

module
v0.11.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 3, 2022 License: Apache-2.0

README

sloth

Sloth

CI Go Report Card Apache 2 licensed GitHub release (latest SemVer) Kubernetes release OpenSLO

Introduction

Meet the easiest way to generate SLOs for Prometheus.

Sloth generates understandable, uniform and reliable Prometheus SLOs for any kind of service. Using a simple SLO spec that results in multiple metrics and multi window multi burn alerts.

https://sloth.dev

Features

  • Simple, maintainable and understandable SLO spec.
  • Reliable SLO metrics and alerts.
  • Based on Google SLO implementation and multi window multi burn alerts framework.
  • Autogenerates Prometheus SLI recording rules in different time windows.
  • Autogenerates Prometheus SLO metadata rules.
  • Autogenerates Prometheus SLO multi window multi burn alert rules (Page and warning).
  • SLO spec validation (including validate command for Gitops and CI).
  • Customization of labels, disabling different type of alerts...
  • A single way (uniform) of creating SLOs across all different services and teams.
  • Automatic Grafana dashboard to see all your SLOs state.
  • Single binary and easy to use CLI.
  • Kubernetes (Prometheus-operator) support.
  • Kubernetes Controller/operator mode with CRDs.
  • Support different SLI types.
  • Support for SLI plugins
  • A library with common SLI plugins.
  • OpenSLO support.
  • Safe SLO period windows for 30 and 28 days by default.
  • Customizable SLO period windows for advanced use cases.

Small Sloth SLO dashboard

Getting started

Release the Sloth!

sloth generate -i ./examples/getting-started.yml
version: "prometheus/v1"
service: "myservice"
labels:
  owner: "myteam"
  repo: "myorg/myservice"
  tier: "2"
slos:
  # We allow failing (5xx and 429) 1 request every 1000 requests (99.9%).
  - name: "requests-availability"
    objective: 99.9
    description: "Common SLO based on availability for HTTP request responses."
    labels:
      category: availability
    sli:
      events:
        error_query: sum(rate(http_request_duration_seconds_count{job="myservice",code=~"(5..|429)"}[{{.window}}]))
        total_query: sum(rate(http_request_duration_seconds_count{job="myservice"}[{{.window}}]))
    alerting:
      name: "MyServiceHighErrorRate"
      labels:
        category: "availability"
      annotations:
        # Overwrite default Sloth SLO alert summmary on ticket and page alerts.
        summary: "High error rate on 'myservice' requests responses"
      page_alert:
        labels:
          severity: "pageteam"
          routing_key: "myteam"
      ticket_alert:
        labels:
          severity: "slack"
          slack_channel: "#alerts-myteam"

This would be the result you would obtain from the above spec example.

Documentation

Check the docs to know more about the usage, examples, and other handy features!

SLI plugins

Looking for common SLI plugins? Check this repository, if you are looking for the sli plugins docs, check this instead.

Development and Contributing

Check CONTRIBUTING.md.

Directories

Path Synopsis
cmd
core
log
examples
pkg
kubernetes/gen/clientset/versioned
This package has the automatically generated clientset.
This package has the automatically generated clientset.
kubernetes/gen/clientset/versioned/fake
This package has the automatically generated fake clientset.
This package has the automatically generated fake clientset.
kubernetes/gen/clientset/versioned/scheme
This package contains the scheme of the automatically generated clientset.
This package contains the scheme of the automatically generated clientset.
kubernetes/gen/clientset/versioned/typed/sloth/v1
This package has the automatically generated typed clients.
This package has the automatically generated typed clients.
kubernetes/gen/clientset/versioned/typed/sloth/v1/fake
Package fake has the automatically generated clients.
Package fake has the automatically generated clients.
prometheus/api/v1
Package v1 Example YAML spec with 2 SLOs: version: "prometheus/v1" service: "k8s-apiserver" labels: cluster: "valhalla" component: "kubernetes" slos: - name: "requests-availability" objective: 99.9 description: "Common SLO based on availability for Kubernetes apiserver HTTP request responses." sli: events: error_query: sum(rate(apiserver_request_total{code=~"(5..|429)"}[{{.window}}])) total_query: sum(rate(apiserver_request_total[{{.window}}])) alerting: name: K8sApiserverAvailabilityAlert labels: category: "availability" annotations: runbook: "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh" page_alert: labels: severity: critical ticket_alert: labels: severity: warning - name: "requests-latency" objective: 99 description: "Common SLO based on latency for Kubernetes apiserver HTTP request responses." sli: events: error_query: | ( sum(rate(apiserver_request_duration_seconds_count{verb!="WATCH"}[{{.window}}])) - sum(rate(apiserver_request_duration_seconds_bucket{le="0.4",verb!="WATCH"}[{{.window}}])) ) total_query: sum(rate(apiserver_request_duration_seconds_count{verb!="WATCH"}[{{.window}}])) alerting: name: K8sApiserverLatencyAlert labels: category: "latency" annotations: runbook: "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh" page_alert: labels: severity: critical ticket_alert: labels: disable: true
Package v1 Example YAML spec with 2 SLOs: version: "prometheus/v1" service: "k8s-apiserver" labels: cluster: "valhalla" component: "kubernetes" slos: - name: "requests-availability" objective: 99.9 description: "Common SLO based on availability for Kubernetes apiserver HTTP request responses." sli: events: error_query: sum(rate(apiserver_request_total{code=~"(5..|429)"}[{{.window}}])) total_query: sum(rate(apiserver_request_total[{{.window}}])) alerting: name: K8sApiserverAvailabilityAlert labels: category: "availability" annotations: runbook: "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh" page_alert: labels: severity: critical ticket_alert: labels: severity: warning - name: "requests-latency" objective: 99 description: "Common SLO based on latency for Kubernetes apiserver HTTP request responses." sli: events: error_query: | ( sum(rate(apiserver_request_duration_seconds_count{verb!="WATCH"}[{{.window}}])) - sum(rate(apiserver_request_duration_seconds_bucket{le="0.4",verb!="WATCH"}[{{.window}}])) ) total_query: sum(rate(apiserver_request_duration_seconds_count{verb!="WATCH"}[{{.window}}])) alerting: name: K8sApiserverLatencyAlert labels: category: "latency" annotations: runbook: "https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapilatencyhigh" page_alert: labels: severity: critical ticket_alert: labels: disable: true
prometheus/plugin/v1
package plugin has all the API to load prometheus plugins using Yaegi.
package plugin has all the API to load prometheus plugins using Yaegi.
test

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL