Documentation ¶
Overview ¶
Package monitor provides core Blip components that, together, monitor one MySQL instance. Most monitoring logic happens in this package, but package metrics is closely related: the latter actually collects metrics, but it is driven by this package. Other Blip packages mostly set up and support monitors.
Index ¶
- Variables
- func NewLevelCollector(args LevelCollectorArgs) *lco
- func NewPlanChanger(args PlanChangerArgs) *planChanger
- func TickerDuration(d, e time.Duration)
- type Engine
- func (e *Engine) Collect(emrCtx context.Context, interval uint, levelName string, startTime time.Time) ([]*blip.Metrics, error)
- func (e *Engine) DB() *sql.DB
- func (e *Engine) MonitorId() string
- func (e *Engine) Prepare(ctx context.Context, plan blip.Plan, before, after func()) error
- func (e *Engine) Stop()
- type Exporter
- type LevelCollector
- type LevelCollectorArgs
- type LoadFunc
- type Loader
- func (ml *Loader) Count() uint
- func (ml *Loader) Load(ctx context.Context) error
- func (ml *Loader) Monitor(monitorId string) *Monitor
- func (ml *Loader) Monitors() []*Monitor
- func (ml *Loader) Print() string
- func (ml *Loader) Start(monitorId string, lock bool) error
- func (ml *Loader) StartMonitors()
- func (ml *Loader) Stop(monitorId string, lock bool) error
- func (ml *Loader) Unload(monitorId string, lock bool) error
- type LoaderArgs
- type Monitor
- type MonitorArgs
- type PlanChanger
- type PlanChangerArgs
- type StartMonitorFunc
Constants ¶
This section is empty.
Variables ¶
var (
	ErrMonitorNotLoaded = errors.New("monitor not loaded")
	ErrStopLoss         = errors.New("stop-loss prevents reloading")
)
var CollectParallel = 2
CollectParallel sets how many domains to collect in parallel. Currently, this is not configurable via Blip config; it can only be changed via integration.
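As a rough illustration of what a parallelism limit like this implies, the following sketch bounds concurrent work with a buffered channel used as a semaphore. It is not Blip's implementation: `collectParallel`, `collectDomains`, and the domain names are local to the example.

```go
package main

import (
	"fmt"
	"sync"
)

// collectParallel mirrors the idea behind monitor.CollectParallel.
// The name and value are local to this sketch.
var collectParallel = 2

// collectDomains calls collect once per domain, with at most collectParallel
// calls running at the same time. It returns the highest concurrency observed.
func collectDomains(domains []string, collect func(domain string)) int {
	sem := make(chan struct{}, collectParallel)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, d := range domains {
		wg.Add(1)
		sem <- struct{}{} // blocks while collectParallel collectors are running
		go func(domain string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when this collector returns

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			collect(domain)

			mu.Lock()
			running--
			mu.Unlock()
		}(d)
	}
	wg.Wait()
	return peak
}

func main() {
	// Illustrative domain names only.
	domains := []string{"status.global", "var.global", "innodb", "repl"}
	peak := collectDomains(domains, func(string) {})
	fmt.Println("peak concurrency did not exceed limit:", peak <= collectParallel)
}
```

An integration would change the real package variable (for example, `monitor.CollectParallel = 4`) before monitors start, since there is no config option for it.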
var Now func() time.Time = time.Now
Functions ¶
func NewLevelCollector ¶
func NewLevelCollector(args LevelCollectorArgs) *lco
func NewPlanChanger ¶
func NewPlanChanger(args PlanChangerArgs) *planChanger
func TickerDuration ¶
TickerDuration sets the internal ticker duration for testing. It is only called in tests; do not call it outside of testing.
Types ¶
type Engine ¶
Engine runs domain metric collectors to collect metrics. It's called by the LevelCollector (LCO) at intervals and expected to collect and return within an engine max runtime (EMR) passed to Collect. The LCO creates the Engine. On LCO.Stop, the Engine must stop/destroy all collectors because the LCO will stop/destroy the Engine. Like all Monitor components, an Engine is not restarted or reused; it's recreated if the Monitor is restarted.
func (*Engine) Collect ¶
func (e *Engine) Collect(emrCtx context.Context, interval uint, levelName string, startTime time.Time) ([]*blip.Metrics, error)
Collect collects the metrics at the given level. There are 3 return guarantees for the slice of metrics:
- metrics[0] is non-nil (always returns at least one blip.Metrics)
- metrics[n].Values is non-nil (but might be empty, no values)
- []metrics is sorted ascending by Interval
Collect returns when all collectors it starts return, or when emrCtx (engine max runtime) expires. The former is the normal case.
Both metrics and an error can be returned in the case of partial success: some collectors work but others fail. The caller should check the returned metrics even if an error is returned.
func (*Engine) Prepare ¶
Prepare prepares the engine to collect metrics for the plan. The engine must be successfully prepared for Collect() to work because Prepare() initializes metric collectors for every level of the plan. Prepare() can be called again when, for example, the PlanChanger detects a state change and calls the LevelCollector to change plans, which then calls this func with the new state plan.
Do not call this func concurrently! It does not guard against concurrent calls. Serialization is handled by the only caller: LevelCollector.ChangePlan().
func (*Engine) Stop ¶
func (e *Engine) Stop()
Stop stops the engine and cleans up any metrics associated with it. TODO: There is a possible race condition when this is called. Since Engine.Collect is called as a goroutine, an invocation of it could block waiting for Engine.Stop to unlock, after which Collect would run after cleanup has been called. This could result in a panic, though that should be caught and logged. Since the monitor is stopping anyway, this isn't a huge issue.
type Exporter ¶
Exporter emulates a Prometheus mysqld_exporter. It implements prom.Exporter.
func NewExporter ¶
func (Exporter) Collect ¶
func (e Exporter) Collect(ch chan<- prometheus.Metric)
Collect collects metrics. It is called indirectly via Scrape.
func (Exporter) Describe ¶
func (e Exporter) Describe(descs chan<- *prometheus.Desc)
type LevelCollector ¶
type LevelCollector interface {
	// Run runs the collector to collect metrics; it's a blocking call.
	Run(stopChan, doneChan chan struct{}) error

	// ChangePlan changes the plan; it's called by the PlanChanger.
	ChangePlan(newState, newPlanName string) error

	// Pause pauses metrics collection until ChangePlan is called.
	Pause()
}
LevelCollector (LCO) executes the current plan to collect metrics. It's also responsible for changing the plan when called by the PlanChanger.
The term "collector" is a little misleading because the LCO doesn't collect metrics, but it is the first step in the metrics collection process, which looks roughly like: LCO -> Engine -> metric collectors -> MySQL. In Run, the LCO checks every 1s for the highest level in the plan to collect. For example, after 5s it'll collect levels with a frequency divisible by 5s. See https://cashapp.github.io/blip/v1.0/intro/plans.
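The once-per-second level check can be approximated as a divisibility test on elapsed seconds. The sketch below is one reading of the description above, not the LCO's actual code: the `level` type, `levelsDue`, and the level names and frequencies are all hypothetical.

```go
package main

import "fmt"

// level is a hypothetical plan level: a name and a collection frequency
// in whole seconds.
type level struct {
	name string
	freq uint
}

// levelsDue returns the names of levels whose frequency evenly divides the
// number of seconds elapsed since collection started.
func levelsDue(levels []level, elapsed uint) []string {
	var due []string
	for _, l := range levels {
		if l.freq > 0 && elapsed%l.freq == 0 {
			due = append(due, l.name)
		}
	}
	return due
}

func main() {
	levels := []level{
		{name: "kpi", freq: 5},
		{name: "standard", freq: 20},
		{name: "rare", freq: 60},
	}
	fmt.Println(levelsDue(levels, 5))  // prints "[kpi]"
	fmt.Println(levelsDue(levels, 20)) // prints "[kpi standard]"
	fmt.Println(levelsDue(levels, 7))  // prints "[]"
}
```

With frequencies that divide one another, as here, the level with the longest frequency that is due also covers the shorter ones, which is one way to interpret "the highest level in the plan to collect".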
Metrics from MySQL flow back to the LCO as blip.Metrics, which the LCO passes to blip.Plugin.TransformMetrics if specified, then to all sinks specified for the monitor.
type LevelCollectorArgs ¶
type LoadFunc ¶
type LoadFunc func(blip.Config) ([]blip.ConfigMonitor, error)
LoadFunc is a callback that matches blip.Plugin.LoadMonitors. It's an arg to NewLoader, if specified by the user.
type Loader ¶
Loader is the singleton monitor loader and repo. It's created by the server and only used there (and via API calls). It's dynamic so monitors can be loaded (created) and unloaded (destroyed) while Blip is running, but the normal case is one load and start on Blip startup: Server.Boot calls Load, then Server.Run calls StartMonitors. The user can make API calls to reload while Blip is running.
Loader is safe for concurrent use, but it's currently only called by the Server.
func NewLoader ¶
func NewLoader(args LoaderArgs) *Loader
NewLoader creates a new Loader singleton. It's called in Server.Boot and Server.Run.
func (*Loader) Count ¶
Count returns the number of loaded monitors. It's used by the API for status.
func (*Loader) Load ¶
Load loads all configured monitors and unloads (stops and removes) monitors that have been removed or changed since the last call to Load. It does not start new monitors. Call StartMonitors after Load to start new (or previously stopped) monitors.
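To make the load/unload decision concrete, here is a minimal sketch that compares what is loaded against what is configured. It assumes each monitor can be reduced to an ID and a config fingerprint; how Loader actually detects a changed monitor is not stated here, so `diff` and the fingerprint strings are purely illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// diff compares loaded monitors with newly configured ones. Both maps are
// monitor ID -> config fingerprint (a hypothetical stand-in for "has the
// config changed?"). It returns the IDs to unload (removed or changed) and
// the IDs to load (new or changed), each sorted for stable output.
func diff(loaded, configured map[string]string) (unload, load []string) {
	for id, fp := range loaded {
		if newFp, ok := configured[id]; !ok || newFp != fp {
			unload = append(unload, id)
		}
	}
	for id, fp := range configured {
		if oldFp, ok := loaded[id]; !ok || oldFp != fp {
			load = append(load, id)
		}
	}
	sort.Strings(unload)
	sort.Strings(load)
	return unload, load
}

func main() {
	loaded := map[string]string{"db1": "a", "db2": "b", "db3": "c"}
	configured := map[string]string{"db1": "a", "db2": "B", "db4": "d"}

	unload, load := diff(loaded, configured)
	fmt.Println(unload) // prints "[db2 db3]": db2 changed, db3 removed
	fmt.Println(load)   // prints "[db2 db4]": db2 changed, db4 new
}
```

In this sketch, unchanged monitors (db1) are left alone, which matches the statement that Load only unloads monitors that were removed or changed.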
Server.Boot calls Load, then Server.Run calls StartMonitors.
Load checks for stop-loss and does local MySQL auto-detection, if these two features are enabled.
If Load returns error, the currently loaded monitors are not affected. The error indicates a problem loading monitors or a validation error.
This function is safe for concurrent use, but calls are serialized.
func (*Loader) Monitor ¶
Monitor returns one monitor by ID. It's used by the API to get single monitor status.
func (*Loader) Print ¶
Print prints all loaded monitors in blip.ConfigMonitor YAML format. It's used for --print-monitors.
func (*Loader) StartMonitors ¶
func (ml *Loader) StartMonitors()
StartMonitors starts all monitors that have been loaded but not started. This should be called after Load. On Blip startup, the server calls Load in Server.Boot, then StartMonitors in Server.Run. The user can reload by calling the server API: /monitors/reload.
This function is safe for concurrent use, but calls are serialized.
type LoaderArgs ¶
type Monitor ¶
type Monitor struct {
// contains filtered or unexported fields
}
Monitor monitors one MySQL instance. The monitor is a high-level component that runs (and keeps running) four monitor subsystems:
- Plan changer (PCH)
- Level collector (LCO)
- Blip heartbeat writer
- Exporter (Prometheus)
Each subsystem is optional based on the config, but LCO runs by default because it contains the Engine component that does actual metrics collection. If any subsystem crashes (returns for any reason or panics), the monitor stops and restarts all subsystems. The monitor doesn't stop until Stop is called. Consequently, if a monitor is not configured correctly (for example, it can't connect to MySQL), it keeps retrying and reporting the error forever.
Monitors are loaded, created, and initially started only by the monitor loader (Loader). A monitor can be stopped and started (again) via the server API.
A monitor is uniquely identified by its monitor ID, which should be included in all output by the monitor and its subsystems. The monitor ID is set when the monitor is loaded by the monitor loader (Loader), which calls blip.MonitorId to determine the value.
A monitor is completely self-contained and independent. For example, each monitor has its own LCO, engine, and metric collectors.
func NewMonitor ¶
func NewMonitor(args MonitorArgs) *Monitor
NewMonitor creates a new Monitor with the given arguments. The caller must call Boot then, if that does not return an error, Run to start monitoring the MySQL instance.
func (*Monitor) Config ¶
func (m *Monitor) Config() blip.ConfigMonitor
Config returns the monitor config.
type MonitorArgs ¶
type MonitorArgs struct {
	Config          blip.ConfigMonitor
	DbMaker         blip.DbFactory
	PlanLoader      *plan.Loader
	Sinks           []blip.Sink
	TransformMetric func([]*blip.Metrics) error
	HA              ha.Manager
}
MonitorArgs are required arguments to NewMonitor.
type PlanChanger ¶
type PlanChanger interface {
Run(stopChan, doneChan chan struct{}) error
}
PlanChanger (PCH) changes the plan based on database instance state. If the plan changes, the PCH calls the LevelCollector (LCO) to do the real low-level work of swapping plans, because the LCO executes plans. In this sense, "Changer" is a bit misleading because it doesn't change the plan, it just determines if/when the plan should change, and then tells the LCO to actually change the plan.
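The division of labor between the PCH and LCO can be sketched as follows: one component decides when the plan should change, and delegates the change through ChangePlan. Only the ChangePlan signature comes from the LevelCollector interface; `planSwapper`, `recordingLCO`, `changer`, and the state and plan names are hypothetical.

```go
package main

import "fmt"

// planSwapper is the subset of LevelCollector this sketch needs.
type planSwapper interface {
	ChangePlan(newState, newPlanName string) error
}

// recordingLCO is a fake LCO that records ChangePlan calls.
type recordingLCO struct{ calls []string }

func (r *recordingLCO) ChangePlan(newState, newPlanName string) error {
	r.calls = append(r.calls, newState+"="+newPlanName)
	return nil
}

// changer decides if and when the plan should change, then tells the LCO
// to do the actual swap.
type changer struct {
	lco   planSwapper
	plans map[string]string // state -> plan name (hypothetical mapping)
	state string            // last state applied
}

// observe is called with the currently detected state. It calls the LCO only
// when the state differs from the last one applied.
func (c *changer) observe(state string) error {
	if state == c.state {
		return nil // no change, nothing to do
	}
	plan, ok := c.plans[state]
	if !ok {
		return fmt.Errorf("no plan for state %q", state)
	}
	if err := c.lco.ChangePlan(state, plan); err != nil {
		return err
	}
	c.state = state
	return nil
}

func main() {
	lco := &recordingLCO{}
	c := &changer{
		lco:   lco,
		plans: map[string]string{"active": "full", "read-only": "light"},
	}
	for _, s := range []string{"active", "active", "read-only"} {
		if err := c.observe(s); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(lco.calls) // prints "[active=full read-only=light]"
}
```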
type PlanChangerArgs ¶
type PlanChangerArgs struct {
	MonitorId string
	Config    blip.ConfigPlanChange
	DB        *sql.DB
	LCO       LevelCollector
	HA        ha.Manager
}
type StartMonitorFunc ¶
type StartMonitorFunc func(blip.ConfigMonitor) bool
StartMonitorFunc is a callback that matches blip.Plugin.StartMonitor.