Documentation ¶
Overview ¶
Package datapipe handles the processing of results from the agents' probes.
Threshold ¶
The Threshold states definition compares a field from a probe result against a fixed value to evaluate states.
States definition format (JSON):
{ "states": [{ "state": "critical|warning", "match": "all|any", "conditions": [{ "field": "field_name", "operator": "lt|le|eq|ne|ge|gt|match|notmatch", "value": X, "reason": "reason of failure" }] }] }
where:
- state: state of the probe. Default values: Critical, Warning
- match: logical operator on the tests performed (all <=> AND / any <=> OR)
- conditions: test(s) to perform to reach the state
- field: probe' field to apply the test to
- operator: type of operation to perform:
- lt / le
- eq / ne
- ge / gt
- match / notmatch
- value: reference value for the test.
- reason: explanation to provide the user with in case of failure.
History ¶
The History states definition compares a field from a probe result against a past result to evaluate states.
States definition format (JSON):
{ "interval": I "states": [{ "state": "critical|warning", "match": "all|any", "conditions": [{ "field": "field_name", "operator": "lt|le|eq|ne|ge|gt|match|notmatch", "tolerance": X, "reason": "reason of failure" }] }] }
where:
- interval: is how far back in history the newest result should be compared. Syntax is similar to `time.ParseDuration`
- state: state of the probe. Default values: Critical, Warning
- match: logical operator on the tests performed (all <=> AND / any <=> OR)
- conditions: test(s) to perform to reach the state
- field: probe' field to apply the test to
- operator: type of operation to perform:
- lt / le
- eq / ne
- ge / gt
- match / notmatch
- tolerance: a multiplying factor of the historical value to compare against. Ignored for non-numeric values. E.g. "operator":lt, "tolerance": 0.1 means that the state will be set if the latest value is "lower than 90% of the historical value" <=> (r < v - v * 0.1) (see below)
- reason: explanation to provide the user with in case of failure.
Tolerance for numeric comparisons can sometimes be confusing, this cheat sheet can be useful.
v-v*tol v v+v*tol <----------|*******^*******|----------> [__________[ "lt" [__________________________] "le" [ [_______________] ] "eq" [__________[ ]__________] "ne" [ [__________________________] "ge" ]__________] "gt" * Lower than: r < v-v*tol * Lower than or equal: r <= v+v*tol * Equal: r >= v-v*tol && r <= v+v*tol * Not equal: r < v-v*tol || r > v+v*tol * Greater than: r > v+v*tol * Greater than or equal: r >= v-v*tol
Trend ¶
The Trend states definition compares the rate of change of a field from a probe result against a fixed value to evaluate states. The rate of change, or "trend", can be expressed in natural language as: "The value has changed by X over the past I".
Example: assuming the CPU usage has been probed 5 minutes ago, returning a value of 5%. The probe runs again now and returns a value of 9%. That's "a +4% change over the past 5 minutes". This is a way to express the trend, or rate of change. The trend state definition requires an "interval" parameter, that allows expressing the rate of change over an arbitrary time duration. In our previous example, if the interval is set to "1m", the trend will be: "a +0.8% change over the past minute". The value used in the state evaluation will be 0.8.
States definition format (JSON):
{ "interval": I, "states": [{ "state": "critical|warning", "match": "all|any", "conditions": [{ "field": "field_name", "operator": "lt|le|eq|ne|ge|gt|match|notmatch", "difference": X, "reason": "reason of failure" }] }] }
where:
- interval: the interval to compute the trend over. This is the "over 5 minutes" part of the trend.
- state: state of the probe. Default values: Critical, Warning
- match: logical operator on the tests performed (all <=> AND / any <=> OR)
- conditions: test(s) to perform to reach the state
- field: probe' field to apply the test to
- operator: type of operation to perform:
- lt / le
- eq / ne
- ge / gt
- match / notmatch
- difference: difference value for the test. This is the "changed by +4%" part of the trend.
- reason: explanation to provide the user with in case of failure.
Index ¶
Constants ¶
const ( StateChange string = "state_change" StateRecurrence string = "state_recurrence" StateCombination string = "state_combination" StateFlap string = "state_flap" )
alertDefinition enum
const ( OpLowerThan operator = "lt" OpLowerEqual operator = "le" OpEqual operator = "eq" OpNotEqual operator = "ne" OpGreaterEqual operator = "ge" OpGreaterThan operator = "gt" OpMatch operator = "match" OpNotMatch operator = "notmatch" )
Comparison operators
const ( MatchAny match = "any" MatchAll match = "all" )
Match operator for conditions
const ( // StateCriticalLabel is the label of the field in the JSON definition of the "critical" state StateCriticalLabel string = "critical" // StateWarningLabel is the label of the field in the JSON definition of the "warning" state StateWarningLabel string = "warning" )
const ( // StateValueUnknown encodes a state that was impossible to evaluate StateValueUnknown = StateValue(api.State_UNKNOWN) // StateValueOK encodes an OK state StateValueOK = StateValue(api.State_OK) // StateValueWarning encodes a warning state StateValueWarning = StateValue(api.State_WARNING) // StateValueCritical encodes a critical state StateValueCritical = StateValue(api.State_CRITICAL) // StateValueMissingData encodes the state of a Probe Configuration // that hasn't received data in "a while" StateValueMissingData = StateValue(api.State_MISSING_DATA) // StateValueError encodes the state of a Probe Configuration // that encountered an error while collecting metrics StateValueError = StateValue(api.State_ERROR) )
const ( StateTypeThreshold string = "threshold" StateTypeHistory string = "history" StateTypeTrend string = "trend" StateTypeNone string = "none" )
StateType
const ( // AlertWaitingTime is the number of seconds to wait before sending another alert AlertWaitingTime time.Duration = 60 * time.Second )
Variables ¶
This section is empty.
Functions ¶
func ValidateStateConfigurationFormat ¶
ValidateStateConfigurationFormat tries to unmarshal the configuration provided in the right format
Types ¶
type Alert ¶
type Alert struct { State bool ProbeUUID uuid.UUID TargetUUID uuid.UUID AgentUUID uuid.UUID ChannelUUID uuid.UUID Reason string }
Alert represents an alert at a certain time
func GetAlert ¶
func GetAlert(result tsdb.Result, state *State, dbAlert *dbconf.Alert, ts tsdb.TSDB) (*Alert, error)
GetAlert evaluates an Alert.State based on the configuration
func (*Alert) SendAlert ¶
func (a *Alert) SendAlert(result tsdb.Result, state *State, db dbconf.ConfigurationDB, ts tsdb.TSDB) (bool, error)
SendAlert actually sends the Alert based on configuration and business rules. Returns true if the alert was actually sent, and false otherwise. Note that the alert can not be sent for a variety of reasons that don't designate an error in the system. As such, (false, nil) is a valid result.
type DataPipe ¶ added in v0.4.0
DataPipe is an interface that represents a Results-processing DataPipe as a service.
type History ¶
type History struct { Interval string `json:"interval"` States []struct { State string `json:"state"` Match match `json:"match"` Conditions []struct { Field string `json:"field"` Operator operator `json:"operator"` Tolerance float64 `json:"tolerance"` Reason string `json:"reason"` } `json:"conditions"` Tags []struct { Name string `json:"name"` Operator operator `json:"operator"` Value string `json:"value"` } `json:"tags"` } `json:"states"` }
History is the States definition for a history check
type State ¶
type State struct { Value StateValue Reason string }
State represents the state of a tsdb.Result when tested
type StateValue ¶
type StateValue int
StateValue encodes a value evaluated for a state
func (StateValue) String ¶
func (sv StateValue) String() string
type Threshold ¶
type Threshold struct { States []struct { State string `json:"state"` Match match `json:"match"` Conditions []struct { Field string `json:"field"` Operator operator `json:"operator"` Value interface{} `json:"value"` Reason string `json:"reason"` } `json:"conditions"` Tags []struct { Name string `json:"name"` Operator operator `json:"operator"` Value string `json:"value"` } `json:"tags"` } `json:"states"` }
Threshold is the States definition for a threshold check
type Trend ¶
type Trend struct { Interval string `json:"interval"` States []struct { State string `json:"state"` Match match `json:"match"` Conditions []struct { Field string `json:"field"` Operator operator `json:"operator"` Difference interface{} `json:"difference"` Reason string `json:"reason"` } `json:"conditions"` Tags []struct { Name string `json:"name"` Operator operator `json:"operator"` Value string `json:"value"` } `json:"tags"` } `json:"states"` }
Trend is the States definition for a trend check