monitor

package
v1.2.1
Published: Jul 19, 2024 License: Apache-2.0 Imports: 26 Imported by: 0

Documentation

Overview

Package monitor provides the core Blip components that, together, monitor one MySQL instance. Most monitoring logic lives in this package, but package metrics is closely related: it actually collects the metrics, driven by this package. Other Blip packages mostly set up and support monitors.

Index

Constants

This section is empty.

Variables

var (
	ErrMonitorNotLoaded = errors.New("monitor not loaded")
	ErrStopLoss         = errors.New("stop-loss prevents reloading")
)
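
Both values are sentinel errors created with errors.New, so callers can presumably match them with errors.Is even after wrapping. The following is a minimal sketch under that assumption: the two variables are local stand-ins that mirror the ones above, and the reload and describe helpers are hypothetical, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins that mirror the package's sentinel errors so the
// sketch compiles without importing Blip.
var (
	ErrMonitorNotLoaded = errors.New("monitor not loaded")
	ErrStopLoss         = errors.New("stop-loss prevents reloading")
)

// reload is a hypothetical caller-side helper that wraps a sentinel
// error with extra context, as calling code often does.
func reload(blocked bool) error {
	if blocked {
		return fmt.Errorf("reload failed: %w", ErrStopLoss)
	}
	return nil
}

// describe classifies an error with errors.Is, which sees through wrapping.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrStopLoss):
		return "stop-loss"
	case errors.Is(err, ErrMonitorNotLoaded):
		return "not loaded"
	default:
		return "other"
	}
}

func main() {
	fmt.Println(describe(reload(true)))
	fmt.Println(describe(reload(false)))
}
```
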
var CollectParallel = 2

CollectParallel sets how many domains to collect in parallel. Currently, this is not configurable via Blip config; it can only be changed via integration.
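
To illustrate what a parallelism limit of this kind means, here is a generic sketch of bounded concurrency using a buffered channel as a counting semaphore. It is not Blip's scheduling code: collectAll, its callback, and the domain names in main are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// collectAll runs collect once per domain with at most `parallel` calls
// in flight. It is a generic illustration, not Blip's implementation.
func collectAll(domains []string, parallel int, collect func(string) int) map[string]int {
	if parallel < 1 {
		parallel = 1 // avoid blocking forever on an unbuffered semaphore
	}
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]int, len(domains))
		sem     = make(chan struct{}, parallel) // counting semaphore
	)
	for _, d := range domains {
		wg.Add(1)
		sem <- struct{}{} // blocks while `parallel` collectors are running
		go func(domain string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			n := collect(domain)
			mu.Lock()
			results[domain] = n
			mu.Unlock()
		}(d)
	}
	wg.Wait()
	return results
}

func main() {
	// Illustrative domain names only.
	domains := []string{"status.global", "var.global", "innodb"}
	got := collectAll(domains, 2, func(d string) int { return len(d) })
	fmt.Println(len(got))
}
```
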

var Now func() time.Time = time.Now

Functions

func NewLevelCollector

func NewLevelCollector(args LevelCollectorArgs) *lco

func NewPlanChanger

func NewPlanChanger(args PlanChangerArgs) *planChanger

func TickerDuration

func TickerDuration(d, e time.Duration)

TickerDuration sets the internal ticker duration for testing. It is meant to be called only from tests; do not call it outside of testing.

Types

type Engine

type Engine struct {
	*sync.Mutex
	// contains filtered or unexported fields
}

Engine runs domain metric collectors to collect metrics. It's called by the LevelCollector (LCO) at intervals and expected to collect and return within an engine max runtime (EMR) passed to Collect. The LCO creates the Engine. On LCO.Stop, the Engine must stop/destroy all collectors because the LCO will stop/destroy the Engine. Like all Monitor components, an Engine is not restarted or reused; it's recreated if the Monitor is restarted.

func NewEngine

func NewEngine(cfg blip.ConfigMonitor, db *sql.DB) *Engine

func (*Engine) Collect

func (e *Engine) Collect(emrCtx context.Context, interval uint, levelName string, startTime time.Time) ([]*blip.Metrics, error)

Collect collects the metrics at the given level. There are 3 return guarantees for the slice of metrics:

  • metrics[0] is non-nil (always returns at least one blip.Metrics)
  • metrics[n].Values is non-nil (but might be empty, no values)
  • []metrics is sorted ascending by Interval

Collect returns when all collectors it starts return, or when emrCtx (engine max runtime) expires. The former is the normal case.

Both metrics and an error can be returned in the case of partial success: some collectors work but others fail. The caller should check the returned metrics even if an error is returned.

func (*Engine) DB

func (e *Engine) DB() *sql.DB

func (*Engine) MonitorId

func (e *Engine) MonitorId() string

func (*Engine) Prepare

func (e *Engine) Prepare(ctx context.Context, plan blip.Plan, before, after func()) error

Prepare prepares the engine to collect metrics for the plan. The engine must be successfully prepared for Collect() to work because Prepare() initializes metric collectors for every level of the plan. Prepare() can be called again when, for example, the PlanChanger detects a state change and calls the LevelCollector to change plans, which then calls this func with the new state plan.

Do not call this func concurrently! It does not guard against concurrent calls. Serialization is handled by the only caller: LevelCollector.ChangePlan().

func (*Engine) Stop

func (e *Engine) Stop()

Stop stops the engine and cleans up any metrics associated with it. TODO: There is a possible race condition when this is called. Because Engine.Collect is called as a goroutine, an invocation of Collect could be blocked waiting for Engine.Stop to unlock, and would then run after cleanup has finished. This could result in a panic, though that should be caught and logged. Since the monitor is stopping anyway, this isn't a huge issue.

type Exporter

type Exporter struct {
	*sync.Mutex
	// contains filtered or unexported fields
}

Exporter emulates a Prometheus mysqld_exporter. It implements prom.Exporter.

func NewExporter

func NewExporter(cfg blip.ConfigExporter, plan blip.Plan, engine *Engine) *Exporter

func (Exporter) Collect

func (e Exporter) Collect(ch chan<- prometheus.Metric)

Collect collects metrics. It is called indirectly via Scrape.

func (Exporter) Describe

func (e Exporter) Describe(descs chan<- *prometheus.Desc)

func (Exporter) Plan

func (e Exporter) Plan() blip.Plan

func (Exporter) Scrape

func (e Exporter) Scrape() (string, error)

Scrape collects and returns metrics in Prometheus exposition format. This function is called in response to GET /metrics.

type LevelCollector

type LevelCollector interface {
	// Run runs the collector to collect metrics; it's a blocking call.
	Run(stopChan, doneChan chan struct{}) error

	// ChangePlan changes the plan; it's called by the PlanChanger.
	ChangePlan(newState, newPlanName string) error

	// Pause pauses metrics collection until ChangePlan is called.
	Pause()
}

LevelCollector (LCO) executes the current plan to collect metrics. It's also responsible for changing the plan when called by the PlanChanger.

The term "collector" is a little misleading because the LCO doesn't collect metrics, but it is the first step in the metrics collection process, which looks roughly like: LCO -> Engine -> metric collectors -> MySQL. In Run, the LCO checks every 1s for the highest level in the plan to collect. For example, after 5s it'll collect levels with a frequency divisible by 5s. See https://cashapp.github.io/blip/v1.0/intro/plans.

Metrics from MySQL flow back to the LCO as blip.Metrics, which the LCO passes to blip.Plugin.TransformMetrics if specified, then to all sinks specified for the monitor.
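
The 1s check described above can be pictured as a modulo test on elapsed seconds. The sketch below assumes a level is due when the elapsed time is a whole multiple of its frequency; the level type, the dueLevels helper, and the level names are hypothetical and simplified, so consult the linked plans documentation for the actual rules.

```go
package main

import (
	"fmt"
	"sort"
)

// level is a hypothetical, simplified plan level: a name and a collection
// frequency in whole seconds.
type level struct {
	Name string
	Freq uint
}

// dueLevels returns the names of the levels to collect after `elapsed`
// seconds, assuming a level is due when elapsed is a multiple of its
// frequency. Results are ordered by ascending frequency, so the last
// entry is the highest due level.
func dueLevels(elapsed uint, levels []level) []string {
	sorted := append([]level(nil), levels...) // copy; don't reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Freq < sorted[j].Freq })
	var due []string
	for _, l := range sorted {
		if l.Freq > 0 && elapsed%l.Freq == 0 {
			due = append(due, l.Name)
		}
	}
	return due
}

func main() {
	levels := []level{{"kpi", 5}, {"standard", 20}, {"rare", 60}}
	for _, t := range []uint{3, 5, 20, 60} {
		fmt.Println(t, dueLevels(t, levels))
	}
}
```
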

type LevelCollectorArgs

type LevelCollectorArgs struct {
	Config           blip.ConfigMonitor
	DB               *sql.DB
	PlanLoader       *plan.Loader
	Sinks            []blip.Sink
	TransformMetrics func([]*blip.Metrics) error
}

type LoadFunc

type LoadFunc func(blip.Config) ([]blip.ConfigMonitor, error)

LoadFunc is a callback that matches blip.Plugin.LoadMonitors. It's an arg to NewLoader, if specified by the user.

type Loader

type Loader struct {
	*sync.Mutex
	// contains filtered or unexported fields
}

Loader is the singleton monitor loader and repo. It's created by the server and only used there (and via API calls). It's dynamic so monitors can be loaded (created) and unloaded (destroyed) while Blip is running, but the normal case is one load and start on Blip startup: Server.Boot calls Load, then Server.Run calls StartMonitors. The user can make API calls to reload while Blip is running.

Loader is safe for concurrent use, but it's currently only called by the Server.

func NewLoader

func NewLoader(args LoaderArgs) *Loader

NewLoader creates a new Loader singleton. It's called in Server.Boot and Server.Run.

func (*Loader) Count

func (ml *Loader) Count() uint

Count returns the number of loaded monitors. It's used by the API for status.

func (*Loader) Load

func (ml *Loader) Load(ctx context.Context) error

Load loads all configured monitors and unloads (stops and removes) monitors that have been removed or changed since the last call to Load. It does not start new monitors. Call StartMonitors after Load to start new (or previously stopped) monitors.

Server.Boot calls Load, then Server.Run calls StartMonitors.

Load checks for stop-loss and does local MySQL auto-detection, if these two features are enabled.

If Load returns error, the currently loaded monitors are not affected. The error indicates a problem loading monitors or a validation error.

This function is safe for concurrent use, but calls are serialized.

func (*Loader) Monitor

func (ml *Loader) Monitor(monitorId string) *Monitor

Monitor returns one monitor by ID. It's used by the API to get single monitor status.

func (*Loader) Monitors

func (ml *Loader) Monitors() []*Monitor

Monitors returns a list of all currently loaded monitors.

func (*Loader) Print

func (ml *Loader) Print() string

Print prints all loaded monitors in blip.ConfigMonitor YAML format. It's used for --print-monitors.

func (*Loader) Start

func (ml *Loader) Start(monitorId string, lock bool) error

Start starts a monitor if it's not already running.

func (*Loader) StartMonitors

func (ml *Loader) StartMonitors()

StartMonitors starts all monitors that have been loaded but not started. This should be called after Load. On Blip startup, the server calls Load in Server.Boot, then StartMonitors in Server.Run. The user can reload by calling the server API: /monitors/reload.

This function is safe for concurrent use, but calls are serialized.

func (*Loader) Stop

func (ml *Loader) Stop(monitorId string, lock bool) error

Stop stops a monitor but does not unload it. It can be started again by calling Start.

func (*Loader) Unload

func (ml *Loader) Unload(monitorId string, lock bool) error

Unload stops and removes a monitor.

type LoaderArgs

type LoaderArgs struct {
	Config     blip.Config
	Factories  blip.Factories
	Plugins    blip.Plugins
	PlanLoader *plan.Loader
	RDSLoader  aws.RDSLoader
}

type Monitor

type Monitor struct {
	// contains filtered or unexported fields
}

Monitor monitors one MySQL instance. The monitor is a high-level component that runs (and keeps running) four monitor subsystems:

  • Plan changer (PCH)
  • Level collector (LCO)
  • Blip heartbeat writer
  • Exporter (Prometheus)

Each subsystem is optional based on the config, but LCO runs by default because it contains the Engine component that does actual metrics collection. If any subsystem crashes (returns for any reason or panics), the monitor stops and restarts all subsystems. The monitor doesn't stop until Stop is called. Consequently, if a monitor is not configured correctly (for example, it can't connect to MySQL), it keeps retrying and reporting the error forever.

Monitors are loaded, created, and initially started only by the Loader. A monitor can be stopped and started (again) via the server API.

A monitor is uniquely identified by its monitor ID, which should be included in all output by the monitor and its subsystems. The monitor ID is set when the monitor is loaded by the Loader, which calls blip.MonitorId to determine the value.

A monitor is completely self-contained and independent. For example, each monitor has its own LCO, engine, and metric collectors.

func NewMonitor

func NewMonitor(args MonitorArgs) *Monitor

NewMonitor creates a new Monitor with the given arguments. The caller must call Boot then, if that does not return an error, Run to start monitoring the MySQL instance.

func (*Monitor) Config

func (m *Monitor) Config() blip.ConfigMonitor

Config returns the monitor config.

func (*Monitor) DSN

func (m *Monitor) DSN() string

DSN returns the redacted DSN (no password).

func (*Monitor) MonitorId

func (m *Monitor) MonitorId() string

MonitorId returns the monitor ID.

func (*Monitor) Start

func (m *Monitor) Start() error

Start starts the monitor. If it's already running, it returns an error. It can be called again after calling Stop.

Start/stop monitors only through the Loader. DO NOT call Start or Stop directly, else the running state of the monitor and the Loader will be out of sync.

func (*Monitor) Stop

func (m *Monitor) Stop() error

Stop stops the monitor. It is idempotent and thread-safe.

Start/stop monitors only through the Loader. DO NOT call Start or Stop directly, else the running state of the monitor and the Loader will be out of sync.

type MonitorArgs

type MonitorArgs struct {
	Config          blip.ConfigMonitor
	DbMaker         blip.DbFactory
	PlanLoader      *plan.Loader
	Sinks           []blip.Sink
	TransformMetric func([]*blip.Metrics) error
	HA              ha.Manager
}

MonitorArgs are required arguments to NewMonitor.

type PlanChanger

type PlanChanger interface {
	Run(stopChan, doneChan chan struct{}) error
}

PlanChanger (PCH) changes the plan based on database instance state. If the plan changes, the PCH calls the LevelCollector (LCO) to do the real low-level work of swapping plans, because the LCO executes plans. In this sense, "Changer" is a bit misleading: the PCH doesn't change the plan itself, it only determines if and when the plan should change, then tells the LCO to actually change it.

type PlanChangerArgs

type PlanChangerArgs struct {
	MonitorId string
	Config    blip.ConfigPlanChange
	DB        *sql.DB
	LCO       LevelCollector
	HA        ha.Manager
}

type StartMonitorFunc

type StartMonitorFunc func(blip.ConfigMonitor) bool

StartMonitorFunc is a callback that matches blip.Plugin.StartMonitor.
