shimesaba

package module
v0.2.2 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 5, 2021 License: MIT Imports: 19 Imported by: 0

README

shimesaba

A tool for SREs who operate and monitor services using Mackerel.

Description

shimesaba is a tool for tracking SLO/ErrorBudget using Mackerel as an SLI measurement service.

  1. Get Mackerel (host/service) metrics and aggregate them over the calculation period.
  2. Calculate the SLI from the metrics obtained in step 1 and determine whether each interval in the rolling window violates the SLO.
  3. Calculate the total time (in minutes) of SLO violation within the time frame of the rolling window and derive the error budget.
  4. Post the calculated error budget, the failure time caused by SLO violations, and related values as Mackerel service metrics.

Install

binary packages

Binary packages are available from the GitHub Releases page.

Homebrew tap
$ brew install mashiike/tap/shimesaba

Usage

as CLI command
$ shimesaba -config config.yaml -mackerel-apikey <Mackerel API Key>
Usage of shimesaba:
  -backfill uint
        generate report before n point (default 3)
  -config value
        config file path, can set multiple
  -debug
        output debug log
  -dry-run
        report output stdout and not put mackerel
  -mackerel-apikey string
        for access mackerel API

as AWS Lambda function

The shimesaba binary also runs as an AWS Lambda function.

CLI options can also be specified through environment variables. For example, when the MACKEREL_APIKEY environment variable is set, its value is used for the -mackerel-apikey option.

Example Lambda function configuration:

{
  "FunctionName": "shimesaba",
  "Environment": {
    "Variables": {
      "CONFIG": "config.yaml",
      "MACKEREL_APIKEY": "<Mackerel API KEY>"
    }
  },
  "Handler": "shimesaba",
  "MemorySize": 128,
  "Role": "arn:aws:iam::0123456789012:role/lambda-function",
  "Runtime": "provided.al2",
  "Timeout": 300
}

Configuration file

YAML format.

required_version: ">=0.0.0"

metrics:
  - id: alb_p90_response_time
    name: custom.alb.response.time_p90
    type: host
    service_name: prod
    host_name: dummy-alb
    aggregation_interval: 1m
    aggregation_method: max
  - id: component_response_time
    name: component.dummy.response_time
    type: service
    service_name: prod
    aggregation_interval: 1m
    aggregation_method: avg

definitions:
  - id: latency
    service_name: prod 
    time_frame: 28d #4weeks
    calculate_interval: 1h
    error_budget_size: 0.001
    objectives:
      - expr: alb_p90_response_time <= 1.0
      - expr: component_response_time <= 1.0

required_version

required_version accepts a version constraint string that specifies which versions of shimesaba can be used with your configuration.

metrics

metrics accepts a list of Mackerel metric configurations.
shimesaba fetches the Mackerel metrics specified in this list.
The metrics in this list can be referenced from the definitions settings described below. Each item in the list has the following settings:

id

Required.
An identifier used to refer to the metric in definitions. Must be unique in the list.

name

Required.
The metric identifier on Mackerel.

type

Required.
The type of metric. Set host for a host metric and service for a service metric.

service_name

Required.
The name of the service to which the metric belongs.

roles

Optional, only type=host
Specifies the role when searching for hosts that are subject to host metrics.

host_name

Optional, only type=host
Specify the host name when searching for the host that is the target of host metrics.

aggregation_interval

Optional, default=1m. The interval over which metric values are aggregated. This is also the unit for determining SLO violations. For example, if you calculate the SLI using a metric with an aggregation interval of 5 minutes, SLO violations are checked in 5-minute increments.

aggregation_method

Optional, default=max. How metric values are aggregated. The available methods are max, total, and avg.

definitions

definitions accepts a list of SLI/SLO definition configurations.
Six Mackerel service metrics are posted per definition.

For example, if id is latency, the following service metric will be posted.

  • shimesaba.error_budget.latency: Remaining error budget (unit: minutes)
  • shimesaba.error_budget_percentage.latency: Percentage of current error budget remaining. If it exceeds 100%, the error budget is used up.
  • shimesaba.error_budget_consumption.latency: Error budget newly consumed in this calculation window (unit: minutes)
  • shimesaba.error_budget_consumption_percentage.latency: Percentage of error budget newly consumed in this calculation window
  • shimesaba.failure_time.latency: Time of SLO violation within the time frame of the rolling window (unit: minutes)
  • shimesaba.uptime.latency: Time that can be treated as normal operation within the time frame of the rolling window (unit: minutes)
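
The naming pattern in the list above can be written down as a short sketch. The helper metricNames is hypothetical; only the resulting names come from the list.

```go
package main

import "fmt"

// metricNames returns the six service metric names posted for a definition
// id, following the naming shown in the list above.
func metricNames(id string) []string {
	kinds := []string{
		"error_budget",
		"error_budget_percentage",
		"error_budget_consumption",
		"error_budget_consumption_percentage",
		"failure_time",
		"uptime",
	}
	names := make([]string, 0, len(kinds))
	for _, k := range kinds {
		names = append(names, fmt.Sprintf("shimesaba.%s.%s", k, id))
	}
	return names
}

func main() {
	for _, n := range metricNames("latency") {
		fmt.Println(n)
	}
}
```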

Each item in the list has the following settings:

id

Required.
The identifier of the definition. The names of the service metrics to post are derived from this identifier.
Must be unique in the list.

service_name

Required.
The service to which the service metrics are posted.

time_frame

Required. The size of the time frame of the rolling window.
For example, if you specify 40320 minutes, the error budget is calculated from the SLI for the last 4 weeks.

calculate_interval

Required.

The shift width of the rolling window. Service metrics are posted to Mackerel at the interval specified here.
We recommend keeping this width shorter than 1440 minutes (1 day), because Mackerel ignores posted metrics whose timestamps are more than 24 hours old *1.

*1 https://mackerel.io/ja/api-docs/entry/service-metrics#post

We recommend running shimesaba every hour with calculate_interval set to 60 minutes (1 hour).

error_budget_size

Required.
Specifies how large the error budget is relative to the time frame of the rolling window. For example, if time_frame is 40320 minutes and you specify 0.001 (0.1%), the size of the error budget is about 40 minutes (40320 × 0.001 = 40.32). This means that up to about 40 minutes of SLO violation is tolerated in the last 4 weeks.

objectives

Required.
A list of concrete SLO definitions. Each item is an expr, which holds a comparison expression in Go syntax. The id values specified in metrics can be used like variables. The right-hand side of the comparison must always be a numeric literal. If multiple expr are defined in objectives, all of them must be true; if any expr is false, the SLO is violated.

For example:
Assuming that you have obtained the metrics alb_2xx and alb_5xx, you can write the following comparison formula.

- expr: rate(alb_2xx, alb_2xx + alb_5xx) >= 0.95

rate() is a function provided to perform division safely, avoiding division by zero. This comparison means: if the ratio of 2xx responses to the sum of 2xx and 5xx responses is 95% or higher, the service is healthy.
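
A guarded division of this kind might look like the sketch below. It is written in plain Go rather than in the expr syntax, and returning 0 for a zero denominator is an assumption: the text does not say which value shimesaba's rate() yields in that case.

```go
package main

import "fmt"

// rate divides numerator by denominator without dividing by zero.
// Returning 0 when the denominator is 0 is an assumption of this sketch.
func rate(numerator, denominator float64) float64 {
	if denominator == 0 {
		return 0
	}
	return numerator / denominator
}

// healthy evaluates rate(alb_2xx, alb_2xx + alb_5xx) >= 0.95 for one interval.
func healthy(alb2xx, alb5xx float64) bool {
	return rate(alb2xx, alb2xx+alb5xx) >= 0.95
}

func main() {
	fmt.Println(healthy(990, 10)) // ratio 0.99
	fmt.Println(healthy(90, 10))  // ratio 0.90
	fmt.Println(healthy(0, 0))    // no requests; depends on the zero-denominator choice
}
```

Whether an interval without any requests counts as a violation follows directly from the value chosen for the zero-denominator case, so that choice is worth confirming against the actual implementation.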

Environment variable SSMWRAP_PATHS

shimesaba incorporates github.com/handlename/ssmwrap for parameter management.
If you specify AWS Systems Manager Parameter Store paths separated by commas, the parameters under those paths are exported as environment variables.
This is useful when running as a Lambda function.

LICENSE

MIT

Documentation

Index

Constants

This section is empty.

Variables

var (
	ErrNotComparativeExpression = errors.New("this expr is not comparative")
	ErrExprRightNotLiteral      = errors.New("expr right side is value literal")
)

Reserved errors

Functions

This section is empty.

Types

type App

type App struct {
	// contains filtered or unexported fields
}

App manages life cycle

func New

func New(apikey string, cfg *Config) (*App, error)

New creates an app

func NewWithMackerelClient

func NewWithMackerelClient(client MackerelClient, cfg *Config) (*App, error)

NewWithMackerelClient is there to accept mock clients.

func (*App) Run

func (app *App) Run(ctx context.Context, opts ...RunOption) error

Run performs the error budget calculation

type Config

type Config struct {
	RequiredVersion string `yaml:"required_version" json:"required_version"`

	Metrics     MetricConfigs     `yaml:"metrics" json:"metrics"`
	Definitions DefinitionConfigs `yaml:"definitions" json:"definitions"`
	// contains filtered or unexported fields
}

Config for App

func NewDefaultConfig

func NewDefaultConfig() *Config

NewDefaultConfig creates a default configuration.

func (*Config) Load

func (c *Config) Load(paths ...string) error

Load loads configuration file from file paths.

func (*Config) Restrict

func (c *Config) Restrict() error

Restrict restricts a configuration.

func (*Config) ValidateVersion

func (c *Config) ValidateVersion(version string) error

ValidateVersion validates that a version satisfies required_version.

type Definition

type Definition struct {
	// contains filtered or unexported fields
}

Definition is SLI/SLO Definition

func NewDefinition

func NewDefinition(cfg *DefinitionConfig) (*Definition, error)

NewDefinition creates Definition from DefinitionConfig

func (*Definition) CreateRepoorts

func (d *Definition) CreateRepoorts(ctx context.Context, metrics Metrics) ([]*Report, error)

CreateRepoorts returns Report with Metrics

func (*Definition) ID

func (d *Definition) ID() string

ID returns DefinitionConfig.id

type DefinitionConfig

type DefinitionConfig struct {
	ID                string             `json:"id" yaml:"id"`
	TimeFrame         string             `yaml:"time_frame" json:"time_frame"`
	ServiceName       string             `json:"service_name" yaml:"service_name"`
	ErrorBudgetSize   float64            `yaml:"error_budget_size" json:"error_budget_size"`
	CalculateInterval string             `yaml:"calculate_interval" json:"calculate_interval"`
	Objectives        []*ObjectiveConfig `json:"objectives" yaml:"objectives"`
	// contains filtered or unexported fields
}

DefinitionConfig is a setting related to SLI/SLO

func (*DefinitionConfig) DurationCalculate

func (c *DefinitionConfig) DurationCalculate() time.Duration

DurationCalculate converts CalculateInterval to time.Duration

func (*DefinitionConfig) DurationTimeFrame

func (c *DefinitionConfig) DurationTimeFrame() time.Duration

DurationTimeFrame converts TimeFrame to time.Duration

func (*DefinitionConfig) MergeInto

func (c *DefinitionConfig) MergeInto(o *DefinitionConfig)

MergeInto merges DefinitionConfig together

func (*DefinitionConfig) Restrict

func (c *DefinitionConfig) Restrict() error

Restrict restricts a definition configuration.

type DefinitionConfigs

type DefinitionConfigs map[string]*DefinitionConfig

DefinitionConfigs is a collection of DefinitionConfig that ensures the uniqueness of IDs.

func (DefinitionConfigs) MarshalYAML

func (c DefinitionConfigs) MarshalYAML() (interface{}, error)

MarshalYAML implements yaml.Marshaler

func (DefinitionConfigs) Restrict

func (c DefinitionConfigs) Restrict() error

Restrict restricts a definition configuration.

func (DefinitionConfigs) String

func (c DefinitionConfigs) String() string

String implements fmt.Stringer

func (DefinitionConfigs) ToSlice

func (c DefinitionConfigs) ToSlice() []*DefinitionConfig

func (*DefinitionConfigs) UnmarshalYAML

func (c *DefinitionConfigs) UnmarshalYAML(unmarshal func(interface{}) error) error

UnmarshalYAML implements yaml.Unmarshaler

type MackerelClient

type MackerelClient interface {
	FindHosts(param *mackerel.FindHostsParam) ([]*mackerel.Host, error)
	FetchHostMetricValues(hostID string, metricName string, from int64, to int64) ([]mackerel.MetricValue, error)
	FetchServiceMetricValues(serviceName string, metricName string, from int64, to int64) ([]mackerel.MetricValue, error)
	PostServiceMetricValues(serviceName string, metricValues []*mackerel.MetricValue) error
}

MackerelClient is an abstraction interface for mackerel-client-go.Client

type Metric

type Metric struct {
	// contains filtered or unexported fields
}

Metric handles aggregated Mackerel metrics

func NewMetric

func NewMetric(cfg *MetricConfig) *Metric

func (*Metric) AggregationInterval

func (m *Metric) AggregationInterval() time.Duration

AggregationInterval returns the aggregation interval for metrics

func (*Metric) AppendValue

func (m *Metric) AppendValue(t time.Time, v interface{}) error

AppendValue adds a value to the metric

func (*Metric) EndAt

func (m *Metric) EndAt() time.Time

EndAt returns the end time of the metric

func (*Metric) GetValue

func (m *Metric) GetValue(t time.Time) (float64, bool)

GetValue gets the value at the specified time

func (*Metric) GetValues

func (m *Metric) GetValues(startAt time.Time, endAt time.Time) map[time.Time]float64

GetValues gets the values for the specified time period

func (*Metric) ID

func (m *Metric) ID() string

ID is the identifier of the metric

func (*Metric) StartAt

func (m *Metric) StartAt() time.Time

StartAt returns the start time of the metric

func (*Metric) String

func (m *Metric) String() string

String implements fmt.Stringer

type MetricComparator

type MetricComparator struct {
	// contains filtered or unexported fields
}

MetricComparator is a comparison using multiple metrics

func NewMetricComparator

func NewMetricComparator(str string) (*MetricComparator, error)

NewMetricComparator creates MetricComparator from expr string

func (MetricComparator) Eval

func (mc MetricComparator) Eval(metrics Metrics, startAt, endAt time.Time) (map[time.Time]bool, error)

Eval performs a comparison

type MetricConfig

type MetricConfig struct {
	ID                  string     `yaml:"id" json:"id"`
	Type                MetricType `yaml:"type" json:"type"`
	Name                string     `yaml:"name" json:"name"`
	ServiceName         string     `yaml:"service_name" json:"service_name"`
	Roles               []string   `yaml:"roles" json:"roles"`
	HostName            string     `yaml:"host_name" json:"host_name"`
	AggregationInterval string     `yaml:"aggregation_interval" json:"aggregation_interval"`
	AggregationMethod   string     `json:"aggregation_method" yaml:"aggregation_method"`
	// contains filtered or unexported fields
}

MetricConfig handles metric information obtained from Mackerel

func (*MetricConfig) DurationAggregation added in v0.1.0

func (c *MetricConfig) DurationAggregation() time.Duration

DurationAggregation converts AggregationInterval to time.Duration

func (*MetricConfig) MergeInto

func (c *MetricConfig) MergeInto(o *MetricConfig)

MergeInto merges MetricConfigs together

func (*MetricConfig) Restrict

func (c *MetricConfig) Restrict() error

Restrict restricts a configuration.

func (*MetricConfig) String

func (c *MetricConfig) String() string

String returns the configuration as JSON

type MetricConfigs

type MetricConfigs map[string]*MetricConfig

MetricConfigs is a collection of MetricConfig

func (MetricConfigs) MarshalYAML

func (c MetricConfigs) MarshalYAML() (interface{}, error)

MarshalYAML controls YAML serialization

func (MetricConfigs) Restrict

func (c MetricConfigs) Restrict() error

Restrict restricts a metric configuration.

func (MetricConfigs) String

func (c MetricConfigs) String() string

String implements fmt.Stringer

func (MetricConfigs) ToSlice

func (c MetricConfigs) ToSlice() []*MetricConfig

ToSlice converts the collection to Slice

func (*MetricConfigs) UnmarshalYAML

func (c *MetricConfigs) UnmarshalYAML(unmarshal func(interface{}) error) error

UnmarshalYAML merges duplicate ID MetricConfig

type MetricType

type MetricType int

MetricType is the type of metric in Mackerel

const (
	HostMetric    MetricType = iota + 1 //host
	ServiceMetric                       //service
)

Reserved value

func MetricTypeString

func MetricTypeString(s string) (MetricType, error)

MetricTypeString retrieves an enum value from the enum constant's string name. Returns an error if the param is not part of the enum.

func MetricTypeValues

func MetricTypeValues() []MetricType

MetricTypeValues returns all values of the enum

func (MetricType) IsAMetricType

func (i MetricType) IsAMetricType() bool

IsAMetricType returns "true" if the value is listed in the enum definition. "false" otherwise

func (MetricType) MarshalJSON

func (i MetricType) MarshalJSON() ([]byte, error)

MarshalJSON implements the json.Marshaler interface for MetricType

func (MetricType) MarshalText

func (i MetricType) MarshalText() ([]byte, error)

MarshalText implements the encoding.TextMarshaler interface for MetricType

func (MetricType) String

func (i MetricType) String() string

func (*MetricType) UnmarshalJSON

func (i *MetricType) UnmarshalJSON(data []byte) error

UnmarshalJSON implements the json.Unmarshaler interface for MetricType

func (*MetricType) UnmarshalText

func (i *MetricType) UnmarshalText(text []byte) error

UnmarshalText implements the encoding.TextUnmarshaler interface for MetricType
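
The enum helpers above follow a familiar Go pattern, sketched here with lower-case, hypothetical names so it is not mistaken for the package's own code. Because the constants start at iota + 1, the zero value is not a valid type, which makes a missing type setting detectable.

```go
package main

import (
	"fmt"
)

// metricType mirrors MetricType; lower-case names mark this as a sketch.
type metricType int

const (
	hostMetric    metricType = iota + 1 // host
	serviceMetric                       // service
)

var metricTypeNames = map[metricType]string{
	hostMetric:    "host",
	serviceMetric: "service",
}

// isValid reports whether t is one of the declared constants.
func (t metricType) isValid() bool {
	_, ok := metricTypeNames[t]
	return ok
}

// String returns the textual name, or a numeric form for unknown values.
func (t metricType) String() string {
	if name, ok := metricTypeNames[t]; ok {
		return name
	}
	return fmt.Sprintf("metricType(%d)", int(t))
}

// parseMetricType converts a name such as "host" to its constant.
func parseMetricType(s string) (metricType, error) {
	for t, name := range metricTypeNames {
		if name == s {
			return t, nil
		}
	}
	return 0, fmt.Errorf("%q is not a valid metric type", s)
}

// UnmarshalText lets decoders that support encoding.TextUnmarshaler,
// such as encoding/json, read the textual form.
func (t *metricType) UnmarshalText(text []byte) error {
	v, err := parseMetricType(string(text))
	if err != nil {
		return err
	}
	*t = v
	return nil
}

func main() {
	var t metricType
	fmt.Println(t.isValid()) // the zero value is not a declared constant
	if err := t.UnmarshalText([]byte("service")); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(t, t.isValid())
}
```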

type Metrics

type Metrics map[string]*Metric

Metrics is a collection of metrics

func (Metrics) AggregationInterval

func (ms Metrics) AggregationInterval() time.Duration

AggregationInterval returns the longest aggregation period for the metric in the collection

func (Metrics) EndAt

func (ms Metrics) EndAt() time.Time

EndAt returns the latest end time of the metric in the collection

func (Metrics) Get

func (ms Metrics) Get(id string) (*Metric, bool)

Get uses an identifier to get the metric

func (Metrics) Set

func (ms Metrics) Set(m *Metric)

Set adds a metric to the collection

func (Metrics) StartAt

func (ms Metrics) StartAt() time.Time

StartAt returns the earliest start time in the metric in the collection

func (Metrics) String

func (ms Metrics) String() string

String implements fmt.Stringer

func (Metrics) ToSlice

func (ms Metrics) ToSlice() []*Metric

ToSlice converts the collection to Slice

type ObjectiveConfig

type ObjectiveConfig struct {
	Expr string `yaml:"expr" json:"expr"`
	// contains filtered or unexported fields
}

ObjectiveConfig is an SLO setting

func (*ObjectiveConfig) GetMetricComparator

func (c *ObjectiveConfig) GetMetricComparator() *MetricComparator

GetMetricComparator returns a MetricComparator generated from ObjectiveConfig

func (*ObjectiveConfig) Restrict

func (c *ObjectiveConfig) Restrict() error

Restrict restricts a configuration.

type Report

type Report struct {
	DefinitionID           string
	ServiceName            string
	DataPoint              time.Time
	TimeFrameStartAt       time.Time
	TimeFrameEndAt         time.Time
	UpTime                 time.Duration
	FailureTime            time.Duration
	ErrorBudgetSize        time.Duration
	ErrorBudget            time.Duration
	ErrorBudgetConsumption time.Duration
}

Report has SLI/SLO/ErrorBudget numbers in one rolling window

func (*Report) ErrorBudgetConsumptionRate

func (r *Report) ErrorBudgetConsumptionRate() float64

ErrorBudgetConsumptionRate returns ErrorBudgetConsumption/ErrorBudgetSize

func (*Report) ErrorBudgetUsageRate

func (r *Report) ErrorBudgetUsageRate() float64

ErrorBudgetUsageRate returns (1.0 - ErrorBudget/ErrorBudgetSize)

func (*Report) MarshalJSON

func (r *Report) MarshalJSON() ([]byte, error)

MarshalJSON implements json.Marshaler

func (*Report) String

func (r *Report) String() string

String implements fmt.Stringer

type Repository

type Repository struct {
	// contains filtered or unexported fields
}

Repository handles reading and writing data

func NewRepository

func NewRepository(client MackerelClient) *Repository

NewRepository creates Repository

func (*Repository) FetchMetric

func (repo *Repository) FetchMetric(ctx context.Context, cfg *MetricConfig, startAt time.Time, endAt time.Time) (*Metric, error)

FetchMetric gets Metric using MetricConfig

func (*Repository) FetchMetrics

func (repo *Repository) FetchMetrics(ctx context.Context, cfgs MetricConfigs, startAt time.Time, endAt time.Time) (Metrics, error)

FetchMetrics gets metrics together

func (*Repository) SaveReports

func (repo *Repository) SaveReports(ctx context.Context, reports []*Report) error

SaveReports posts Reports to Mackerel

type RunOption

type RunOption interface {
	// contains filtered or unexported methods
}

RunOption is an App.Run option

func BackfillOption

func BackfillOption(count int) RunOption

BackfillOption specifies how many points of data to calculate retroactively from the current time.

func DryRunOption

func DryRunOption(dryRun bool) RunOption

DryRunOption is an option to write the calculated error budget to standard output without posting it to Mackerel.
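
An interface with only unexported methods, combined with constructor functions, is the usual Go functional-options pattern. The sketch below shows that pattern with hypothetical lower-case names; the internal fields and the default of 3 (borrowed from the -backfill flag help) are assumptions, not the package's actual internals.

```go
package main

import "fmt"

// runConfig holds the settings that options modify (hypothetical name).
type runConfig struct {
	backfill int
	dryRun   bool
}

// runOption has an unexported method, so only this package can define options.
type runOption interface {
	apply(*runConfig)
}

// runOptionFunc adapts a plain function to the runOption interface.
type runOptionFunc func(*runConfig)

func (f runOptionFunc) apply(c *runConfig) { f(c) }

func backfillOption(count int) runOption {
	return runOptionFunc(func(c *runConfig) { c.backfill = count })
}

func dryRunOption(dryRun bool) runOption {
	return runOptionFunc(func(c *runConfig) { c.dryRun = dryRun })
}

// run applies the options over defaults, as a variadic Run method might.
func run(opts ...runOption) runConfig {
	cfg := runConfig{backfill: 3} // assumed default, taken from the CLI help
	for _, o := range opts {
		o.apply(&cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", run())
	fmt.Printf("%+v\n", run(backfillOption(5), dryRunOption(true)))
}
```

The pattern keeps the Run signature stable when new options are added, since callers pass only the options they need.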

Directories

Path Synopsis
cmd
internal
