incident

package
v0.0.0-...-2ad2f98
Warning

This package is not in the latest version of its module.

Published: Nov 26, 2024 License: BSD-3-Clause Imports: 19 Imported by: 4

Documentation

Constants

const (
	ALERT_NAME       = "alertname"
	CATEGORY         = "category"
	SEVERITY         = "severity"
	ID               = "id"
	ASSIGNED_TO      = "assigned_to"
	ABBR             = "abbr"
	OWNER            = "owner"
	ABBR_OWNER_REGEX = "abbr_owner_regex"
	K8S_POD_NAME     = "kubernetes_pod_name"
	COMMITTED_IMAGE  = "committedImage"
	LIVE_IMAGE       = "liveImage"
)

Well known keys for Incident.Params.

const (
	TX_RETRIES                   = 5
	NUM_RECENTLY_RESOLVED        = 20
	NUM_RECENTLY_RESOLVED_FOR_ID = 20
)
const (
	DirtyCommittedK8sImageAlertName = "DirtyCommittedK8sImage"
	StaleK8sImageAlertName          = "StaleK8sImage"
	DirtyRunningK8sConfigAlertName  = "DirtyRunningK8sConfig"
)

Well known alert names.

const DockerImageRegexString = ".+?-(?P<Owner>\\w+?)-\\w+?-(dirty|clean)$"

Matches images like gcr.io/skia-public/autoroll-be:2021-04-30T14_04_37Z-borenet-c3ecfbb-dirty
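The pattern's named group can be used to pull the owner out of an image name. This is a small sketch using only the standard `regexp` package; the helper name `ownerFromImage` is hypothetical and not part of this package. For the example image above, the lazy `.+?` prefix means the `Owner` group captures `borenet`.

```go
package main

import (
	"fmt"
	"regexp"
)

// DockerImageRegexString is copied from the constant documented above.
const DockerImageRegexString = ".+?-(?P<Owner>\\w+?)-\\w+?-(dirty|clean)$"

var dockerImageRegex = regexp.MustCompile(DockerImageRegexString)

// ownerFromImage (hypothetical helper) returns the value of the Owner
// capture group, or "" if the image name does not match the pattern.
func ownerFromImage(image string) string {
	m := dockerImageRegex.FindStringSubmatch(image)
	if m == nil {
		return ""
	}
	return m[dockerImageRegex.SubexpIndex("Owner")]
}

func main() {
	img := "gcr.io/skia-public/autoroll-be:2021-04-30T14_04_37Z-borenet-c3ecfbb-dirty"
	fmt.Println(ownerFromImage(img)) // borenet
}
```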

Functions

func AreIncidentsFlaky

func AreIncidentsFlaky(incidents []Incident, numThreshold int, durationThreshold int64, durationPercentage float32) bool

AreIncidentsFlaky is a utility function to help determine whether a slice of incidents is flaky. Flaky here means alerts that occasionally show up and go away on their own, with no action taken to resolve them. Such alerts are also typically short-lived.

numThreshold is the number of incidents required to have sufficient sample size. If len(incidents) < numThreshold then incidents are determined to be not flaky.

durationThreshold is the duration in seconds below which incidents could be considered to be flaky.

durationPercentage is the fraction of incidents that must have durations below durationThreshold: if at least durationPercentage of the incidents are that short, the incidents are determined to be flaky. For example, 0.50 means 50% and 1 means 100%.

Summary: the function reports the incidents as flaky only if both of the following hold:

  • At least durationPercentage of the incidents lasted less than durationThreshold.
  • The number of incidents is >= numThreshold, giving a sufficient sample size.
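The rule in the summary above can be sketched as follows. This is not the package's source; it assumes that an incident's duration is `LastSeen - Start` (both in seconds since the epoch) and that the share of short incidents is compared against durationPercentage with `>=`. The `incident` type is a hypothetical stand-in that keeps only the two timestamp fields.

```go
package main

import "fmt"

// incident is a hypothetical stand-in for incident.Incident, keeping only
// the timestamp fields (seconds since the epoch).
type incident struct {
	Start    int64
	LastSeen int64
}

// areIncidentsFlaky sketches the documented rule. Assumptions: duration is
// LastSeen-Start, and the short-incident share is compared with >=.
func areIncidentsFlaky(incidents []incident, numThreshold int, durationThreshold int64, durationPercentage float32) bool {
	// Too few samples (or none at all): never report flakiness.
	if len(incidents) == 0 || len(incidents) < numThreshold {
		return false
	}
	short := 0
	for _, in := range incidents {
		if in.LastSeen-in.Start < durationThreshold {
			short++
		}
	}
	return float32(short)/float32(len(incidents)) >= durationPercentage
}

func main() {
	incidents := []incident{
		{Start: 0, LastSeen: 30},    // 30s
		{Start: 100, LastSeen: 160}, // 60s
		{Start: 200, LastSeen: 900}, // 700s
	}
	// 2 of 3 incidents are shorter than 120s, and 2/3 >= 0.5.
	fmt.Println(areIncidentsFlaky(incidents, 3, 120, 0.5)) // true
	// Only 3 incidents, but 5 are required for a sufficient sample.
	fmt.Println(areIncidentsFlaky(incidents, 5, 120, 0.5)) // false
}
```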

Types

type Incident

type Incident struct {
	Key          string            `json:"key" datastore:"key"`                 // Key is the web-safe serialized Datastore key for the incident.
	ID           string            `json:"id" datastore:"id"`                   // Also appears in Params.
	Active       bool              `json:"active" datastore:"active"`           // Or archived.
	Start        int64             `json:"start" datastore:"start"`             // Time in seconds since the epoch.
	LastSeen     int64             `json:"last_seen" datastore:"last_seen"`     // Time in seconds since the epoch.
	Params       paramtools.Params `json:"params" datastore:"-"`                // Alert params; not stored directly, see ParamsSerial.
	ParamsSerial string            `json:"-" datastore:"params_serial,noindex"` // Params serialized as JSON for easy storing in the datastore.
	Notes        []note.Note       `json:"notes" datastore:"notes,flatten"`
}

Incident - An alert that is being acted on.

Each alert has an ID that is the same every time that exact alert fires. It is not to be confused with Key, which is the Datastore key for a single incident of an alert firing. There can be many Incidents in the Datastore with the same ID, but at most one will be Active.

func (*Incident) IsSilenced

func (in *Incident) IsSilenced(silences []silence.Silence, matchOnlyActiveSilences bool) bool

IsSilenced reports whether any of the given silences apply to this incident. It supports regexes (see skbug.com/9587).
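Regex-aware silence matching can be sketched as below. The layout of silence.Silence and the exact effect of matchOnlyActiveSilences are not documented here, so the `silence` type (an `Active` flag plus a map from param key to regex patterns) and the skipping of inactive silences are assumptions made for illustration only. In this sketch a silence applies when every key it lists has at least one pattern that fully matches the incident's value for that key.

```go
package main

import (
	"fmt"
	"regexp"
)

// silence is a hypothetical, simplified silence: for each param key, a list
// of regex patterns. The real silence.Silence type may differ.
type silence struct {
	Active   bool
	ParamSet map[string][]string
}

// matches reports whether every key in the silence has at least one pattern
// that fully matches the incident's value for that key. An empty ParamSet
// matches every incident; patterns that fail to compile never match.
func (s silence) matches(params map[string]string) bool {
	for key, patterns := range s.ParamSet {
		value, ok := params[key]
		if !ok {
			return false
		}
		found := false
		for _, p := range patterns {
			// Anchor the pattern so "foo" does not match "foobar".
			re, err := regexp.Compile("^(?:" + p + ")$")
			if err == nil && re.MatchString(value) {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// isSilenced returns true if any silence applies. When onlyActive is set,
// inactive silences are skipped (an assumed reading of matchOnlyActiveSilences).
func isSilenced(params map[string]string, silences []silence, onlyActive bool) bool {
	for _, s := range silences {
		if onlyActive && !s.Active {
			continue
		}
		if s.matches(params) {
			return true
		}
	}
	return false
}

func main() {
	params := map[string]string{"alertname": "BotMissing", "bot": "skia-e-linux-101"}
	silences := []silence{{Active: true, ParamSet: map[string][]string{"bot": {"skia-e-linux-1.*"}}}}
	fmt.Println(isSilenced(params, silences, true)) // true
}
```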

func (*Incident) Load

func (in *Incident) Load(ps []datastore.Property) error

Load converts the JSON params back into a map[string]string.

func (*Incident) Save

func (in *Incident) Save() ([]datastore.Property, error)

Save serializes the params as JSON.
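Load and Save have the signatures of the datastore.PropertyLoadSaver hooks, and together they move Params in and out of ParamsSerial. The sketch below shows only that JSON round trip with `encoding/json`, leaving out the Datastore property handling entirely; the `record` type and its `save`/`load` methods are hypothetical stand-ins, not this package's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record mirrors the two relevant Incident fields: Params is not stored
// directly (datastore:"-"), while ParamsSerial holds its JSON encoding.
type record struct {
	Params       map[string]string
	ParamsSerial string
}

// save fills ParamsSerial from Params, the step Save is described as doing.
func (r *record) save() error {
	b, err := json.Marshal(r.Params)
	if err != nil {
		return err
	}
	r.ParamsSerial = string(b)
	return nil
}

// load rebuilds Params from ParamsSerial, the step Load is described as doing.
func (r *record) load() error {
	params := map[string]string{}
	if err := json.Unmarshal([]byte(r.ParamsSerial), &params); err != nil {
		return err
	}
	r.Params = params
	return nil
}

func main() {
	r := record{Params: map[string]string{"owner": "borenet", "alertname": "StaleK8sImage"}}
	if err := r.save(); err != nil {
		panic(err)
	}
	// encoding/json writes map keys in sorted order.
	fmt.Println(r.ParamsSerial) // {"alertname":"StaleK8sImage","owner":"borenet"}

	loaded := record{ParamsSerial: r.ParamsSerial}
	if err := loaded.load(); err != nil {
		panic(err)
	}
	fmt.Println(loaded.Params["owner"]) // borenet
}
```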

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store stores and retrieves Incidents in Cloud Datastore.

func NewStore

func NewStore(ds *datastore.Client, ignoredAttr []string) *Store

NewStore creates a new Store.

ds - The Datastore client.
ignoredAttr - A list of keys to ignore when calculating an Incident's ID.
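The ID calculation itself is not documented on this page. To illustrate why ignoring some keys lets repeated firings of the same alert share an ID, here is a hypothetical scheme: hash the sorted key/value pairs of the params, skipping every key in ignoredAttr. The function `idFor` and its use of SHA-256 are assumptions for illustration, not the package's algorithm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// idFor is a hypothetical ID scheme: hash the sorted key/value pairs of the
// alert's params, skipping any key listed in ignoredAttr.
func idFor(params map[string]string, ignoredAttr []string) string {
	ignored := map[string]bool{}
	for _, k := range ignoredAttr {
		ignored[k] = true
	}
	keys := make([]string, 0, len(params))
	for k := range params {
		if !ignored[k] {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // Map iteration order is random; sort for a stable hash.
	h := sha256.New()
	for _, k := range keys {
		// Separate fields with NUL bytes so ("a","bc") and ("ab","c") differ.
		fmt.Fprintf(h, "%s\x00%s\x00", k, params[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	ignored := []string{"kubernetes_pod_name"}
	a := map[string]string{"alertname": "StaleK8sImage", "kubernetes_pod_name": "pod-1"}
	b := map[string]string{"alertname": "StaleK8sImage", "kubernetes_pod_name": "pod-2"}
	// The pod name is ignored, so both firings get the same ID.
	fmt.Println(idFor(a, ignored) == idFor(b, ignored)) // true
}
```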

func (*Store) AddNote

func (s *Store) AddNote(encodedKey string, note note.Note) (*Incident, error)

func (*Store) AlertArrival

func (s *Store) AlertArrival(m map[string]string) (*Incident, error)

AlertArrival turns alerts into Incidents, or archives Incidents if the arriving state is resolved.

Note that the returned incident may be nil even when the returned error is nil. For example, this could happen if we receive an alert for an incident that is no longer active.

func (*Store) Archive

func (s *Store) Archive(encodedKey string) (*Incident, error)

func (*Store) Assign

func (s *Store) Assign(encodedKey string, user string) (*Incident, error)

func (*Store) DeleteNote

func (s *Store) DeleteNote(encodedKey string, index int) (*Incident, error)

func (*Store) GetAll

func (s *Store) GetAll() ([]Incident, error)

GetAll returns a list of all active Incidents.

func (*Store) GetRecentlyResolved

func (s *Store) GetRecentlyResolved() ([]Incident, error)

GetRecentlyResolved returns the N most recently archived Incidents.

func (*Store) GetRecentlyResolvedForID

func (s *Store) GetRecentlyResolvedForID(id, excludeKey string) ([]Incident, error)

GetRecentlyResolvedForID returns the N most recently archived Incidents with the given id, excluding the Incident whose key is excludeKey.

func (*Store) GetRecentlyResolvedInRange

func (s *Store) GetRecentlyResolvedInRange(d string) ([]Incident, error)

GetRecentlyResolvedInRange returns the most recently archived Incidents in the given range.

d - The range in human units, e.g. "1w".

func (*Store) GetRecentlyResolvedInRangeWithID

func (s *Store) GetRecentlyResolvedInRangeWithID(d, id string) ([]Incident, error)

GetRecentlyResolvedInRangeWithID returns the most recently archived Incidents with the given id in the given range.

d - The range in human units, e.g. "1w".
id - The id of the incidents to return.
