shimesaba

package module
v0.4.1
Warning

This package is not in the latest version of its module.

Published: Nov 15, 2021 License: MIT Imports: 23 Imported by: 0

README

shimesaba

For SRE to operate and monitor services using Mackerel.

Description

shimesaba is a tool for tracking SLO/ErrorBudget using Mackerel as an SLI measurement service.

  1. Get and aggregate Mackerel (host/service) metrics within the calculated period.
  2. Calculate the SLI from the metric obtained in step 1 and determine if it is an SLO violation in the rolling window.
  3. Calculate the time (minutes) of SLO violation within the time frame of the rolling window and calculate the error budget.
  4. Post the calculated error budget, failure time for SLO violation, etc. as Mackerel service metric.
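The arithmetic behind steps 2 and 3 can be sketched as follows. This is a minimal illustration, not shimesaba's actual implementation: the function names and the per-minute boolean input are assumptions made for the example.

```go
package main

import "fmt"

// failureMinutes counts the minutes in the rolling window where the SLO was violated.
// ok[i] is true when every objective held during minute i (hypothetical input shape).
func failureMinutes(ok []bool) int {
	n := 0
	for _, v := range ok {
		if !v {
			n++
		}
	}
	return n
}

// remainingBudget returns the remaining error budget in minutes:
// window length * budget ratio - failure minutes.
func remainingBudget(windowMinutes int, budgetRatio float64, failed int) float64 {
	return float64(windowMinutes)*budgetRatio - float64(failed)
}

func main() {
	window := make([]bool, 40320) // 28 days of 1-minute checks
	for i := range window {
		window[i] = true
	}
	for i := 100; i < 110; i++ {
		window[i] = false // 10 minutes of SLO violation
	}
	failed := failureMinutes(window)
	fmt.Printf("failure=%dmin budget=%.2fmin\n", failed, remainingBudget(len(window), 0.001, failed))
}
```

A negative result from remainingBudget would mean the error budget has been used up.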

Install

binary packages

Releases.

Homebrew tap
$ brew install mashiike/tap/shimesaba

Usage

as CLI command
$ shimesaba -config config.yaml -mackerel-apikey <Mackerel API Key> run
NAME:
   shimesaba - A commandline tool for tracking SLO/ErrorBudget using Mackerel as an SLI measurement service.

USAGE:
   shimesaba [global options] command [command options] [arguments...]

VERSION:
   current

COMMANDS:
   dashboard  manage mackerel dashboard for SLI/SLO
   run        run shimesaba. this is main feature
   help, h    Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --config value, -c value           config file path, can set multiple [$CONFIG, $SHIMESABA_CONFIG]
   --debug                            output debug log (default: false) [$SHIMESABA_DEBUG]
   --mackerel-apikey value, -k value  for access mackerel API (default: *********) [$MACKEREL_APIKEY, $SHIMESABA_MACKEREL_APIKEY]
   --help, -h                         show help (default: false)
   --version, -v                      print the version (default: false)
2021/11/14 23:29:45 [error] Required flag "config" not set

The run command usage is as follows:

$ shimesaba run --help
NAME:
   main run - run shimesaba. this is main feature

USAGE:
   shimesaba -config <config file> run [command options]

OPTIONS:
   --dry-run         report output stdout and not put mackerel (default: false) [$SHIMESABA_DRY_RUN]
   --backfill value  generate report before n point (default: 3) [$BACKFILL, $SHIMESABA_BACKFILL]
   --help, -h        show help (default: false)
as AWS Lambda function

The shimesaba binary also runs as an AWS Lambda function. When it is run as the bootstrap of a Lambda function, shimesaba implicitly behaves as the run command.

CLI options can be specified through environment variables. For example, when the MACKEREL_APIKEY environment variable is set, its value is used for the -mackerel-apikey option.

Example Lambda functions configuration.

{
  "FunctionName": "shimesaba",
  "Environment": {
    "Variables": {
      "SHIMESABA_CONFIG": "config.yaml",
      "MACKEREL_APIKEY": "<Mackerel API KEY>"
    }
  },
  "Handler": "shimesaba",
  "MemorySize": 128,
  "Role": "arn:aws:iam::0123456789012:role/lambda-function",
  "Runtime": "provided.al2",
  "Timeout": 300
}
Configuration file

YAML format.

required_version: ">=0.0.0"

metrics:
  - id: alb_p90_response_time
    name: custom.alb.response.time_p90
    type: host
    service_name: prod
    host_name: dummy-alb
    aggregation_interval: 1m
    aggregation_method: max
  - id: component_response_time
    name: component.dummy.response_time
    type: service
    service_name: prod
    aggregation_interval: 1m
    aggregation_method: avg

definitions:
  - id: latency
    service_name: prod 
    time_frame: 28d #4weeks
    calculate_interval: 1h
    error_budget_size: 0.001
    objectives:
      - expr: alb_p90_response_time <= 1.0
      - expr: component_response_time <= 1.0

dashboard: dashboard.jsonnet
required_version

required_version accepts a version constraint string, which specifies which versions of shimesaba can be used with your configuration.

metrics

metrics accepts a list of Mackerel metric configurations.
shimesaba fetches the Mackerel metrics specified in this list.
The metrics described in this list can be referenced from the definitions settings described below. Each item in the list has the following settings:

id

Required.
An identifier to refer to in definitions. Must be unique in the list.

name

Required.
The metric identifier on Mackerel.

type

Required.
The type of metric. Set host for a host metric and service for a service metric.

service_name

Required.
The name of the service to which the metric belongs.

roles

Optional, only for type=host.
The roles used when searching for the hosts targeted by the host metric.

host_name

Optional, only for type=host.
The host name used when searching for the host targeted by the host metric.

aggregation_interval

Optional, default=1m
The interval over which metrics are aggregated. This is also the unit for determining SLO violations. For example, if you calculate an SLI using a metric with an aggregation interval of 5 minutes, SLO violations are checked in 5-minute increments.

aggregation_method

Optional, default=max
How to aggregate metrics. One of max, total, or avg.
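To make the interaction of aggregation_interval and aggregation_method concrete, here is a minimal sketch of bucketed aggregation. It is an illustration only: the point type, the use of Unix-second timestamps, and the bucket alignment are assumptions, not shimesaba's internal code.

```go
package main

import "fmt"

// point is a hypothetical raw metric sample.
type point struct {
	unix  int64 // timestamp in Unix seconds
	value float64
}

// aggregate groups points into buckets of intervalSec seconds and reduces
// each bucket with "max", "total" or "avg". Unknown methods fall back to max.
func aggregate(points []point, intervalSec int64, method string) map[int64]float64 {
	sums := map[int64]float64{}
	maxs := map[int64]float64{}
	counts := map[int64]int{}
	for _, p := range points {
		b := p.unix - p.unix%intervalSec // start of the bucket containing p
		if counts[b] == 0 || p.value > maxs[b] {
			maxs[b] = p.value
		}
		sums[b] += p.value
		counts[b]++
	}
	out := make(map[int64]float64, len(counts))
	for b, n := range counts {
		switch method {
		case "total":
			out[b] = sums[b]
		case "avg":
			out[b] = sums[b] / float64(n)
		default:
			out[b] = maxs[b]
		}
	}
	return out
}

func main() {
	pts := []point{{0, 1}, {30, 3}, {60, 5}}
	fmt.Println(aggregate(pts, 60, "max"))
	fmt.Println(aggregate(pts, 60, "avg"))
}
```

With a 60-second interval, the first two samples share a bucket and the third falls into the next one, so each objective is evaluated once per bucket.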

definitions

definitions accepts a list of SLI/SLO definition configurations.
Six Mackerel service metrics are posted per definition.

For example, if id is latency, the following service metrics are posted:

  • shimesaba.error_budget.latency: Remaining error budget (unit: minutes)
  • shimesaba.error_budget_percentage.latency: Percentage of the current error budget remaining. If it exceeds 100%, the error budget is used up.
  • shimesaba.error_budget_consumption.latency: Error budget newly consumed in this calculation window (unit: minutes)
  • shimesaba.error_budget_consumption_percentage.latency: Percentage of the error budget newly consumed in this calculation window
  • shimesaba.failure_time.latency: Time of SLO violation within the rolling window time frame (unit: minutes)
  • shimesaba.uptime.latency: Time that can be treated as normal operation within the rolling window time frame (unit: minutes)

Each item in the list has the following settings:

id

Required.
The identifier of the definition. The names of the posted service metrics are derived from this identifier.
Must be unique in the list.

service_name

Required.
The service to which the service metrics are posted.

time_frame

Required.
The size of the rolling window's time frame.
For example, if you specify 40320 minutes (28 days), the error budget is calculated from the SLI over the last 4 weeks.

calculate_interval

Required.

The shift width of the rolling window. Service metrics are posted to Mackerel for each interval of this width.
This width should be shorter than 1440 minutes (1 day), because Mackerel ignores posted metrics whose timestamps are more than 24 hours in the past *1.

*1 https://mackerel.io/ja/api-docs/entry/service-metrics#post

We recommend running shimesaba every hour with calculate_interval set to 60 minutes (1 hour).
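The relationship between time_frame, calculate_interval, and the run command's --backfill option can be pictured with the sketch below. It is a hypothetical illustration: aligning the latest data point to a multiple of the interval, and expressing times as Unix seconds, are assumptions made for the example rather than documented behavior.

```go
package main

import "fmt"

// windows returns [start, end] pairs (Unix seconds) for the last `backfill`
// data points. Each window is `frame` seconds long, and consecutive windows
// are shifted by `interval` seconds.
func windows(now, interval, frame int64, backfill int) [][2]int64 {
	latest := now - now%interval // assumption: align to an interval boundary
	out := make([][2]int64, 0, backfill)
	for i := backfill - 1; i >= 0; i-- {
		end := latest - int64(i)*interval
		out = append(out, [2]int64{end - frame, end})
	}
	return out
}

func main() {
	const hour = int64(3600)
	// 28-day time frame, 1-hour shift, 3 backfilled points.
	for _, w := range windows(1636900000, hour, 28*24*hour, 3) {
		fmt.Println(w[0], w[1])
	}
}
```

With an hourly shift, every backfilled point lies within a few hours of the present, well inside the 24-hour limit mentioned above.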

error_budget_size

Required.
How much error budget to allow relative to the width of the rolling window's time frame. For example, if time_frame is 40320 minutes and you specify 0.001 (0.1%), the size of the error budget is about 40 minutes (40320 × 0.001 = 40.32). This means that SLO violations of up to about 40 minutes are tolerated over the last 4 weeks.

objectives

Required.
A list of specific SLO definitions. Each item is an expr, a comparison expression in Go syntax. You can use an id specified in metrics like a variable. The right-hand side of the comparison must always be a numeric literal. If multiple expr are defined in objectives, all of them must be true. If any expr is false, the SLO is violated.

For example:
Assuming that you have obtained the metrics alb_2xx and alb_5xx, you can write the following comparison formula.

- expr: rate(alb_2xx, alb_2xx + alb_5xx) >= 0.95

rate() is a function provided to perform division safely, avoiding division by zero. This comparison means: if 2xx responses make up 95% or more of the combined 2xx and 5xx responses, the service is healthy.
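The idea of a division that guards against a zero denominator can be sketched as below. This is not shimesaba's rate() itself: the README does not say what rate() yields when the denominator is zero, so the (value, ok) return shape here is an assumption chosen for illustration.

```go
package main

import "fmt"

// rate returns num/den. When den is zero it reports ok=false instead of
// dividing, so callers can decide how to treat a period with no requests.
func rate(num, den float64) (value float64, ok bool) {
	if den == 0 {
		return 0, false
	}
	return num / den, true
}

func main() {
	alb2xx, alb5xx := 970.0, 30.0
	if r, ok := rate(alb2xx, alb2xx+alb5xx); ok {
		fmt.Println("healthy:", r >= 0.95)
	} else {
		fmt.Println("no requests in this interval")
	}
}
```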

Environment variable SSMWRAP_PATHS

shimesaba incorporates github.com/handlename/ssmwrap for parameter management.
If you specify comma-separated AWS Systems Manager Parameter Store paths in SSMWRAP_PATHS, the parameters under those paths are exported as environment variables.
This is useful when running as a Lambda function.

Usage of the dashboard subcommand

This subcommand can only be used when running as a CLI.
If the dashboard setting of the config file points to a dashboard definition file, you can manage the dashboard JSON using Go templates.

For example, you can build a simple dashboard by defining a Jsonnet file like the one below.

dashboard.jsonnet

local errorBudgetCounter(x, y, def_id, title) = {
  type: 'value',
  title: title,
  layout: {
    x: x,
    y: y,
    width: 10,
    height: 5,
  },
  metric: {
    type: 'service',
    name: 'shimesaba.error_budget.' + def_id,
    serviceName: 'shimesaba',
  },
  graph: null,
  range: null,
  fractionSize: 0,
  suffix: 'min',
};
{
  title: 'SLI/SLO',
  urlPath: '4oequPJEwwd',
  memo: '',
  widgets: [
    errorBudgetCounter(0, 0, 'availability', ''),
    errorBudgetCounter(10, 0, 'latency', ''),
    {
      type: 'markdown',
      title: 'SLO Definitions',
      layout: {
        x: 20,
        y: 0,
        width: 5,
        height: 20,
      },
      markdown: '{{file `definitions.md` | json_escape }}',
    },
  ],
}

definitions.md

{{ range $def_id, $def := .Definitions }}
## SLO {{ $def_id }}

- TimeFrame      : {{ $def.TimeFrame }}
- ErrorBudgetSize: {{ $def.ErrorBudgetSizeDuration }}  


{{ range $def.Objectives }}
- {{ . }}
{{ end }}
{{ end }}

LICENSE

MIT

Documentation

Index

Constants

This section is empty.

Variables

var ErrDashboardNotFound = errors.New("dashboard not found")

Functions

func BackfillOption

func BackfillOption(count int) func(*Options)

BackfillOption specifies how many points of data to calculate retroactively from the current time.

func DryRunOption

func DryRunOption(dryRun bool) func(*Options)

DryRunOption is an option to write the calculated error budget to standard output without posting it to Mackerel.
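Both BackfillOption and DryRunOption return a func(*Options), the functional options pattern. The sketch below shows how such options are typically applied. Because the fields of Options are unexported, it uses a local stand-in type: the options struct, its field names, and the apply helper are hypothetical, and only the default of 3 comes from the CLI help shown earlier.

```go
package main

import "fmt"

// options is a local stand-in for the package's Options type (hypothetical fields).
type options struct {
	dryRun   bool
	backfill int
}

// dryRunOption mirrors the shape of DryRunOption: it returns a setter closure.
func dryRunOption(v bool) func(*options) {
	return func(o *options) { o.dryRun = v }
}

// backfillOption mirrors the shape of BackfillOption.
func backfillOption(n int) func(*options) {
	return func(o *options) { o.backfill = n }
}

// apply starts from defaults and runs each option function in order.
func apply(optFns ...func(*options)) options {
	o := options{backfill: 3} // default taken from the run command help
	for _, fn := range optFns {
		fn(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", apply())
	fmt.Printf("%+v\n", apply(dryRunOption(true), backfillOption(5)))
}
```

A variadic list of setters lets callers of Run and DashboardBuild pass only the options they care about, while new options can be added later without changing the method signatures.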

func MetricsComparate added in v0.2.3

func MetricsComparate(c evaluator.Comparator, metrics Metrics, startAt, endAt time.Time) (map[time.Time]bool, error)

Types

type App

type App struct {
	// contains filtered or unexported fields
}

App manages life cycle

func New

func New(apikey string, cfg *Config) (*App, error)

New creates an app

func NewWithMackerelClient

func NewWithMackerelClient(client MackerelClient, cfg *Config) (*App, error)

NewWithMackerelClient is there to accept mock clients.

func (*App) DashboardBuild added in v0.3.0

func (app *App) DashboardBuild(ctx context.Context, optFns ...func(*Options)) error

func (*App) DashboardInit added in v0.3.0

func (app *App) DashboardInit(ctx context.Context, dashboardIDOrURL string) error

func (*App) Run

func (app *App) Run(ctx context.Context, optFns ...func(*Options)) error

Run performs the error budget calculation

type Config

type Config struct {
	RequiredVersion string `yaml:"required_version" json:"required_version"`

	Metrics     MetricConfigs     `yaml:"metrics" json:"metrics"`
	Definitions DefinitionConfigs `yaml:"definitions" json:"definitions"`

	Dashboard string `json:"dashboard,omitempty" yaml:"dashboard,omitempty"`
	// contains filtered or unexported fields
}

Config for App

func NewDefaultConfig

func NewDefaultConfig() *Config

NewDefaultConfig creates a default configuration.

func (*Config) Load

func (c *Config) Load(paths ...string) error

Load loads configuration file from file paths.

func (*Config) Restrict

func (c *Config) Restrict() error

Restrict restricts a configuration.

func (*Config) ValidateVersion

func (c *Config) ValidateVersion(version string) error

ValidateVersion validates a version satisfies required_version.

type Dashboard added in v0.3.0

type Dashboard = mackerel.Dashboard

Dashboard is an alias of mackerel.Dashboard

type Definition

type Definition struct {
	// contains filtered or unexported fields
}

Definition is SLI/SLO Definition

func NewDefinition

func NewDefinition(cfg *DefinitionConfig) (*Definition, error)

NewDefinition creates Definition from DefinitionConfig

func (*Definition) CreateRepoorts

func (d *Definition) CreateRepoorts(ctx context.Context, metrics Metrics) ([]*Report, error)

CreateRepoorts returns Report with Metrics

func (*Definition) ID

func (d *Definition) ID() string

ID returns DefinitionConfig.id

type DefinitionConfig

type DefinitionConfig struct {
	ID                string             `json:"id" yaml:"id"`
	TimeFrame         string             `yaml:"time_frame" json:"time_frame"`
	ServiceName       string             `json:"service_name" yaml:"service_name"`
	ErrorBudgetSize   float64            `yaml:"error_budget_size" json:"error_budget_size"`
	CalculateInterval string             `yaml:"calculate_interval" json:"calculate_interval"`
	Objectives        []*ObjectiveConfig `json:"objectives" yaml:"objectives"`
	// contains filtered or unexported fields
}

DefinitionConfig is a setting related to SLI/SLO

func (*DefinitionConfig) DurationCalculate

func (c *DefinitionConfig) DurationCalculate() time.Duration

DurationCalculate converts CalculateInterval to time.Duration

func (*DefinitionConfig) DurationTimeFrame

func (c *DefinitionConfig) DurationTimeFrame() time.Duration

DurationTimeFrame converts TimeFrame to time.Duration

func (*DefinitionConfig) MergeInto

func (c *DefinitionConfig) MergeInto(o *DefinitionConfig)

MergeInto merges DefinitionConfig together

func (*DefinitionConfig) Restrict

func (c *DefinitionConfig) Restrict() error

Restrict restricts a definition configuration.

type DefinitionConfigs

type DefinitionConfigs map[string]*DefinitionConfig

DefinitionConfigs is a collection of DefinitionConfig that ensures the uniqueness of IDs.

func (DefinitionConfigs) MarshalYAML

func (c DefinitionConfigs) MarshalYAML() (interface{}, error)

MarshalYAML implements yaml.Marshaler

func (DefinitionConfigs) Restrict

func (c DefinitionConfigs) Restrict() error

Restrict restricts a definition configuration.

func (DefinitionConfigs) String

func (c DefinitionConfigs) String() string

String implements fmt.Stringer

func (DefinitionConfigs) ToSlice

func (c DefinitionConfigs) ToSlice() []*DefinitionConfig

func (*DefinitionConfigs) UnmarshalYAML

func (c *DefinitionConfigs) UnmarshalYAML(unmarshal func(interface{}) error) error

UnmarshalYAML implements yaml.Unmarshaler

type MackerelClient

type MackerelClient interface {
	FindHosts(param *mackerel.FindHostsParam) ([]*mackerel.Host, error)
	FetchHostMetricValues(hostID string, metricName string, from int64, to int64) ([]mackerel.MetricValue, error)
	FetchServiceMetricValues(serviceName string, metricName string, from int64, to int64) ([]mackerel.MetricValue, error)
	PostServiceMetricValues(serviceName string, metricValues []*mackerel.MetricValue) error

	FindDashboards() ([]*mackerel.Dashboard, error)
	FindDashboard(dashboardID string) (*mackerel.Dashboard, error)
	CreateDashboard(param *mackerel.Dashboard) (*mackerel.Dashboard, error)
	UpdateDashboard(dashboardID string, param *mackerel.Dashboard) (*mackerel.Dashboard, error)
}

MackerelClient is an abstraction interface for mackerel-client-go.Client

type Metric

type Metric struct {
	// contains filtered or unexported fields
}

Metric handles aggregated Mackerel metrics

func NewMetric

func NewMetric(cfg *MetricConfig) *Metric

func (*Metric) AggregationInterval

func (m *Metric) AggregationInterval() time.Duration

AggregationInterval returns the aggregation interval for metrics

func (*Metric) AppendValue

func (m *Metric) AppendValue(t time.Time, v interface{}) error

AppendValue adds a value to the metric

func (*Metric) EndAt

func (m *Metric) EndAt() time.Time

EndAt returns the end time of the metric

func (*Metric) GetValue

func (m *Metric) GetValue(t time.Time) (float64, bool)

GetValue gets the value at the specified time

func (*Metric) GetValues

func (m *Metric) GetValues(startAt time.Time, endAt time.Time) map[time.Time]float64

GetValues gets the values for the specified time period

func (*Metric) ID

func (m *Metric) ID() string

ID is the identifier of the metric

func (*Metric) StartAt

func (m *Metric) StartAt() time.Time

StartAt returns the start time of the metric

func (*Metric) String

func (m *Metric) String() string

String implements fmt.Stringer

type MetricConfig

type MetricConfig struct {
	ID                  string     `yaml:"id,omitempty" json:"id,omitempty"`
	Type                MetricType `yaml:"type,omitempty" json:"type,omitempty"`
	Name                string     `yaml:"name,omitempty" json:"name,omitempty"`
	ServiceName         string     `yaml:"service_name,omitempty" json:"service_name,omitempty"`
	Roles               []string   `yaml:"roles,omitempty" json:"roles,omitempty"`
	HostName            string     `yaml:"host_name,omitempty" json:"host_name,omitempty"`
	AggregationInterval string     `yaml:"aggregation_interval,omitempty" json:"aggregation_interval,omitempty"`
	AggregationMethod   string     `json:"aggregation_method,omitempty" yaml:"aggregation_method,omitempty"`
	// contains filtered or unexported fields
}

MetricConfig handles metric information obtained from Mackerel

func (*MetricConfig) DurationAggregation added in v0.1.0

func (c *MetricConfig) DurationAggregation() time.Duration

DurationAggregation converts AggregationInterval to time.Duration

func (*MetricConfig) MergeInto

func (c *MetricConfig) MergeInto(o *MetricConfig)

MergeInto merges MetricConfigs together

func (*MetricConfig) Restrict

func (c *MetricConfig) Restrict() error

Restrict restricts a configuration.

func (*MetricConfig) String

func (c *MetricConfig) String() string

String outputs JSON

type MetricConfigs

type MetricConfigs map[string]*MetricConfig

MetricConfigs is a collection of MetricConfig

func (MetricConfigs) MarshalYAML

func (c MetricConfigs) MarshalYAML() (interface{}, error)

MarshalYAML controls YAML serialization

func (MetricConfigs) Restrict

func (c MetricConfigs) Restrict() error

Restrict restricts a metric configuration.

func (MetricConfigs) String

func (c MetricConfigs) String() string

String implements fmt.Stringer

func (MetricConfigs) ToSlice

func (c MetricConfigs) ToSlice() []*MetricConfig

ToSlice converts the collection to Slice

func (*MetricConfigs) UnmarshalYAML

func (c *MetricConfigs) UnmarshalYAML(unmarshal func(interface{}) error) error

UnmarshalYAML merges duplicate ID MetricConfig

type MetricType

type MetricType int

MetricType is the type of metric in Mackerel

const (
	HostMetric    MetricType = iota + 1 //host
	ServiceMetric                       //service
)

Reserved value

func MetricTypeString

func MetricTypeString(s string) (MetricType, error)

MetricTypeString retrieves an enum value from the enum constant's string name. Returns an error if the param is not part of the enum.
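The mapping between the strings used in the configuration file (host, service) and the enum values can be sketched with local types. This is a simplified illustration of the technique, not the package's generated code: the lowercase names and the error message are made up for the example.

```go
package main

import "fmt"

// metricType is a local stand-in for MetricType.
type metricType int

const (
	hostMetric    metricType = iota + 1 // "host"
	serviceMetric                       // "service"
)

// metricTypeString looks up the enum value for a name and returns an error
// for names that are not part of the enum.
func metricTypeString(s string) (metricType, error) {
	switch s {
	case "host":
		return hostMetric, nil
	case "service":
		return serviceMetric, nil
	}
	return 0, fmt.Errorf("%q is not a valid metric type", s)
}

func main() {
	for _, name := range []string{"host", "service", "unknown"} {
		v, err := metricTypeString(name)
		fmt.Println(name, v, err)
	}
}
```

Starting the constants at iota + 1 keeps the zero value free, so an unset type can be told apart from a valid one.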

func MetricTypeValues

func MetricTypeValues() []MetricType

MetricTypeValues returns all values of the enum

func (MetricType) IsAMetricType

func (i MetricType) IsAMetricType() bool

IsAMetricType returns "true" if the value is listed in the enum definition. "false" otherwise

func (MetricType) MarshalJSON

func (i MetricType) MarshalJSON() ([]byte, error)

MarshalJSON implements the json.Marshaler interface for MetricType

func (MetricType) MarshalText

func (i MetricType) MarshalText() ([]byte, error)

MarshalText implements the encoding.TextMarshaler interface for MetricType

func (MetricType) String

func (i MetricType) String() string

func (*MetricType) UnmarshalJSON

func (i *MetricType) UnmarshalJSON(data []byte) error

UnmarshalJSON implements the json.Unmarshaler interface for MetricType

func (*MetricType) UnmarshalText

func (i *MetricType) UnmarshalText(text []byte) error

UnmarshalText implements the encoding.TextUnmarshaler interface for MetricType

type Metrics

type Metrics map[string]*Metric

Metrics is a collection of metrics

func (Metrics) AggregationInterval

func (ms Metrics) AggregationInterval() time.Duration

AggregationInterval returns the longest aggregation period for the metric in the collection

func (Metrics) EndAt

func (ms Metrics) EndAt() time.Time

EndAt returns the latest end time of the metric in the collection

func (Metrics) Get

func (ms Metrics) Get(id string) (*Metric, bool)

Get uses an identifier to get the metric

func (Metrics) Set

func (ms Metrics) Set(m *Metric)

Set adds a metric to the collection

func (Metrics) StartAt

func (ms Metrics) StartAt() time.Time

StartAt returns the earliest start time in the metric in the collection

func (Metrics) String

func (ms Metrics) String() string

String implements fmt.Stringer

func (Metrics) ToSlice

func (ms Metrics) ToSlice() []*Metric

ToSlice converts the collection to a slice

type ObjectiveConfig

type ObjectiveConfig struct {
	Expr string `yaml:"expr" json:"expr"`
	// contains filtered or unexported fields
}

ObjectiveConfig is an SLO setting

func (*ObjectiveConfig) GetComparator added in v0.2.3

func (c *ObjectiveConfig) GetComparator() evaluator.Comparator

GetComparator returns a Comparator generated from ObjectiveConfig

func (*ObjectiveConfig) Restrict

func (c *ObjectiveConfig) Restrict() error

Restrict restricts a configuration.

type Options added in v0.3.0

type Options struct {
	// contains filtered or unexported fields
}

type Report

type Report struct {
	DefinitionID           string
	ServiceName            string
	DataPoint              time.Time
	TimeFrameStartAt       time.Time
	TimeFrameEndAt         time.Time
	UpTime                 time.Duration
	FailureTime            time.Duration
	ErrorBudgetSize        time.Duration
	ErrorBudget            time.Duration
	ErrorBudgetConsumption time.Duration
}

Report holds the SLI/SLO/ErrorBudget numbers for one rolling window

func (*Report) ErrorBudgetConsumptionRate

func (r *Report) ErrorBudgetConsumptionRate() float64

ErrorBudgetConsumptionRate returns ErrorBudgetConsumption/ErrorBudgetSize

func (*Report) ErrorBudgetUsageRate

func (r *Report) ErrorBudgetUsageRate() float64

ErrorBudgetUsageRate returns (1.0 - ErrorBudget/ErrorBudgetSize)
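The two formulas above can be worked through with a small stand-in type. This sketch only restates the documented expressions: the report struct here is a local, trimmed-down copy for illustration, and it assumes a non-zero error budget size.

```go
package main

import (
	"fmt"
	"time"
)

// report is a local stand-in holding only the fields the two rates need.
type report struct {
	errorBudgetSize        time.Duration
	errorBudget            time.Duration
	errorBudgetConsumption time.Duration
}

// consumptionRate is ErrorBudgetConsumption / ErrorBudgetSize.
func (r report) consumptionRate() float64 {
	return float64(r.errorBudgetConsumption) / float64(r.errorBudgetSize)
}

// usageRate is 1.0 - ErrorBudget / ErrorBudgetSize.
func (r report) usageRate() float64 {
	return 1.0 - float64(r.errorBudget)/float64(r.errorBudgetSize)
}

func main() {
	r := report{
		errorBudgetSize:        40 * time.Minute,
		errorBudget:            30 * time.Minute, // 10 minutes already used
		errorBudgetConsumption: 5 * time.Minute,  // 5 of them in this window
	}
	fmt.Printf("usage=%.3f consumption=%.3f\n", r.usageRate(), r.consumptionRate())
}
```

When ErrorBudget goes negative, the usage rate rises above 1.0, meaning the budget is exhausted.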

func (*Report) MarshalJSON

func (r *Report) MarshalJSON() ([]byte, error)

MarshalJSON implements json.Marshaler

func (*Report) String

func (r *Report) String() string

String implements fmt.Stringer

type Repository

type Repository struct {
	// contains filtered or unexported fields
}

Repository handles reading and writing data

func NewRepository

func NewRepository(client MackerelClient) *Repository

NewRepository creates a Repository

func (*Repository) FetchMetric

func (repo *Repository) FetchMetric(ctx context.Context, cfg *MetricConfig, startAt time.Time, endAt time.Time) (*Metric, error)

FetchMetric gets a Metric using a MetricConfig

func (*Repository) FetchMetrics

func (repo *Repository) FetchMetrics(ctx context.Context, cfgs MetricConfigs, startAt time.Time, endAt time.Time) (Metrics, error)

FetchMetrics gets multiple metrics together

func (*Repository) FindDashboard added in v0.3.0

func (repo *Repository) FindDashboard(dashboardIDOrURL string) (*Dashboard, error)

FindDashboard gets a Mackerel dashboard

func (*Repository) FindDashboardID added in v0.3.0

func (repo *Repository) FindDashboardID(dashboardIDOrURL string) (string, error)

FindDashboardID gets a Mackerel dashboard ID from a URL or an ID

func (*Repository) SaveDashboard added in v0.3.0

func (repo *Repository) SaveDashboard(ctx context.Context, dashboard *Dashboard) error

SaveDashboard posts a Mackerel dashboard

func (*Repository) SaveReports

func (repo *Repository) SaveReports(ctx context.Context, reports []*Report) error

SaveReports posts Reports to Mackerel

Directories

Path Synopsis
cmd
internal
