monitor

package

v1.2.1 | Published: Jul 19, 2024 | License: Apache-2.0 | Imports: 26 | Imported by: 0

Repository

github.com/cashapp/blip

Documentation ¶

Overview ¶

Package monitor provides the core Blip components that, together, monitor one MySQL instance. Most monitoring logic lives in this package, but package metrics is closely related: it actually collects the metrics, driven by this package. Other Blip packages mostly handle setup and support for monitors.

Index ¶

Variables
func NewLevelCollector(args LevelCollectorArgs) *lco
func NewPlanChanger(args PlanChangerArgs) *planChanger
func TickerDuration(d, e time.Duration)
type Engine
- func NewEngine(cfg blip.ConfigMonitor, db *sql.DB) *Engine
- func (e *Engine) Collect(emrCtx context.Context, interval uint, levelName string, startTime time.Time) ([]*blip.Metrics, error)
- func (e *Engine) DB() *sql.DB
- func (e *Engine) MonitorId() string
- func (e *Engine) Prepare(ctx context.Context, plan blip.Plan, before, after func()) error
- func (e *Engine) Stop()
type Exporter
- func NewExporter(cfg blip.ConfigExporter, plan blip.Plan, engine *Engine) *Exporter
- func (e Exporter) Collect(ch chan<- prometheus.Metric)
- func (e Exporter) Describe(descs chan<- *prometheus.Desc)
- func (e Exporter) Plan() blip.Plan
- func (e Exporter) Scrape() (string, error)
type LevelCollector
type LevelCollectorArgs
type LoadFunc
type Loader
- func NewLoader(args LoaderArgs) *Loader
- func (ml *Loader) Count() uint
- func (ml *Loader) Load(ctx context.Context) error
- func (ml *Loader) Monitor(monitorId string) *Monitor
- func (ml *Loader) Monitors() []*Monitor
- func (ml *Loader) Print() string
- func (ml *Loader) Start(monitorId string, lock bool) error
- func (ml *Loader) StartMonitors()
- func (ml *Loader) Stop(monitorId string, lock bool) error
- func (ml *Loader) Unload(monitorId string, lock bool) error
type LoaderArgs
type Monitor
- func NewMonitor(args MonitorArgs) *Monitor
- func (m *Monitor) Config() blip.ConfigMonitor
- func (m *Monitor) DSN() string
- func (m *Monitor) MonitorId() string
- func (m *Monitor) Start() error
- func (m *Monitor) Stop() error
type MonitorArgs
type PlanChanger
type PlanChangerArgs
type StartMonitorFunc

Constants ¶

This section is empty.

Variables ¶

var (
	ErrMonitorNotLoaded = errors.New("monitor not loaded")
	ErrStopLoss         = errors.New("stop-loss prevents reloading")
)

var CollectParallel = 2

CollectParallel sets how many domains to collect in parallel. Currently, this is not configurable via Blip config; it can only be changed via integration.

var Now func() time.Time = time.Now

Functions ¶

func NewLevelCollector ¶

func NewLevelCollector(args LevelCollectorArgs) *lco

func NewPlanChanger ¶

func NewPlanChanger(args PlanChangerArgs) *planChanger

func TickerDuration ¶

func TickerDuration(d, e time.Duration)

TickerDuration sets the internal ticker duration. It exists only for testing; do not call it outside of tests.

Types ¶

type Engine ¶

type Engine struct {
	*sync.Mutex
	// contains filtered or unexported fields
}

Engine runs domain metric collectors to collect metrics. It's called by the LevelCollector (LCO) at intervals and is expected to collect and return within the engine max runtime (EMR) passed to Collect. The LCO creates the Engine. On LCO.Stop, the Engine must stop/destroy all collectors because the LCO will stop/destroy the Engine. Like all Monitor components, an Engine is not restarted or reused; it's recreated if the Monitor is restarted.

func NewEngine ¶

func NewEngine(cfg blip.ConfigMonitor, db *sql.DB) *Engine

func (*Engine) Collect ¶

func (e *Engine) Collect(emrCtx context.Context, interval uint, levelName string, startTime time.Time) ([]*blip.Metrics, error)

Collect collects the metrics at the given level. There are 3 return guarantees for the slice of metrics:

metrics[0] is non-nil (always returns at least one blip.Metrics)
metrics[n].Values is non-nil (but might be empty, no values)
[]metrics is sorted ascending by Interval

Collect returns when all collectors it starts return, or when emrCtx (engine max runtime) expires. The former is the normal case.

Both metrics and an error can be returned in the case of partial success: some collectors work but others fail. The caller should check the returned metrics even if an error is returned.
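
As a minimal sketch of a caller that honors these guarantees, the following uses a simplified stand-in for blip.Metrics (the `Metrics` struct and `countValues` helper here are assumptions for illustration, not Blip's types). It processes whatever metrics were returned before reporting the error.

```go
package main

import (
	"errors"
	"fmt"
)

// Metrics is a simplified stand-in for blip.Metrics (assumption: only the
// fields needed for this illustration).
type Metrics struct {
	Interval uint
	Values   map[string][]float64
}

// countValues tallies the values in every returned Metrics. It relies on the
// documented guarantee that Values is non-nil but possibly empty.
func countValues(metrics []*Metrics) int {
	n := 0
	for _, m := range metrics {
		for _, vals := range m.Values {
			n += len(vals)
		}
	}
	return n
}

func main() {
	// Simulated partial success: one domain collected, another failed.
	metrics := []*Metrics{
		{Interval: 1, Values: map[string][]float64{"status.global": {42}}},
	}
	err := errors.New("var.global: query timeout")

	// Use the metrics first, then report the error.
	fmt.Println("values collected:", countValues(metrics))
	if err != nil {
		fmt.Println("partial collection:", err)
	}
}
```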

func (*Engine) DB ¶

func (e *Engine) DB() *sql.DB

func (*Engine) MonitorId ¶

func (e *Engine) MonitorId() string

func (*Engine) Prepare ¶

func (e *Engine) Prepare(ctx context.Context, plan blip.Plan, before, after func()) error

Prepare prepares the engine to collect metrics for the plan. The engine must be successfully prepared for Collect() to work because Prepare() initializes metric collectors for every level of the plan. Prepare() can be called again when, for example, the PlanChanger detects a state change and calls the LevelCollector to change plans, which then calls this func with the new state plan.

Do not call this func concurrently! It does not guard against concurrent calls. Serialization is handled by the only caller: LevelCollector.ChangePlan().

func (*Engine) Stop ¶

func (e *Engine) Stop()

Stop stops the engine and cleans up any metrics associated with it. TODO: There is a possible race condition when this is called. Because Engine.Collect is called as a goroutine, an invocation of Collect can block waiting for Engine.Stop to unlock, and then run after cleanup has finished. This could result in a panic, though that should be caught and logged. Since the monitor is stopping anyway, this isn't a huge issue.

type Exporter ¶

type Exporter struct {
	*sync.Mutex
	// contains filtered or unexported fields
}

Exporter emulates a Prometheus mysqld_exporter. It implements prom.Exporter.

func NewExporter ¶

func NewExporter(cfg blip.ConfigExporter, plan blip.Plan, engine *Engine) *Exporter

func (Exporter) Collect ¶

func (e Exporter) Collect(ch chan<- prometheus.Metric)

Collect collects metrics. It is called indirectly via Scrape.

func (Exporter) Describe ¶

func (e Exporter) Describe(descs chan<- *prometheus.Desc)

func (Exporter) Plan ¶

func (e Exporter) Plan() blip.Plan

func (Exporter) Scrape ¶

func (e Exporter) Scrape() (string, error)

Scrape collects and returns metrics in Prometheus exposition format. This function is called in response to GET /metrics.

type LevelCollector ¶

type LevelCollector interface {
	// Run runs the collector to collect metrics; it's a blocking call.
	Run(stopChan, doneChan chan struct{}) error

	// ChangePlan changes the plan; it's called by the PlanChanger.
	ChangePlan(newState, newPlanName string) error

	// Pause pauses metrics collection until ChangePlan is called.
	Pause()
}

LevelCollector (LCO) executes the current plan to collect metrics. It's also responsible for changing the plan when called by the PlanChanger.

The term "collector" is a little misleading because the LCO doesn't collect metrics, but it is the first step in the metrics collection process, which looks roughly like: LCO -> Engine -> metric collectors -> MySQL. In Run, the LCO checks every 1s for the highest level in the plan to collect. For example, after 5s it'll collect levels whose frequency evenly divides 5s. See https://cashapp.github.io/blip/v1.0/intro/plans.

Metrics from MySQL flow back to the LCO as blip.Metrics, which the LCO passes to blip.Plugin.TransformMetrics if specified, then to all sinks specified for the monitor.
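
The once-per-second level check described above can be sketched as follows. This assumes that "highest level" means the least frequent level that is due at the current tick, which is an interpretation of the text rather than something this page confirms. The `level` type, `levelAt` function, and level names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// level is a hypothetical stand-in for a plan level: a name and a collection
// frequency in seconds.
type level struct {
	Name string
	Freq uint
}

// levelAt returns the name of the least frequent (highest) level whose
// frequency evenly divides the elapsed seconds, or "" if no level is due.
func levelAt(levels []level, elapsed uint) string {
	sorted := append([]level(nil), levels...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Freq < sorted[j].Freq })
	for i := len(sorted) - 1; i >= 0; i-- {
		if sorted[i].Freq > 0 && elapsed%sorted[i].Freq == 0 {
			return sorted[i].Name
		}
	}
	return ""
}

func main() {
	levels := []level{{"kpi", 5}, {"standard", 20}, {"slow", 60}}
	for _, s := range []uint{5, 7, 20, 60} {
		fmt.Printf("%ds -> %q\n", s, levelAt(levels, s))
	}
}
```

In this sketch, nothing is due at 7s, the 5s level is due at 5s, and at 60s all three frequencies divide the elapsed time, so the 60s level is chosen.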

type LevelCollectorArgs ¶

type LevelCollectorArgs struct {
	Config           blip.ConfigMonitor
	DB               *sql.DB
	PlanLoader       *plan.Loader
	Sinks            []blip.Sink
	TransformMetrics func([]*blip.Metrics) error
}

type LoadFunc ¶

type LoadFunc func(blip.Config) ([]blip.ConfigMonitor, error)

LoadFunc is a callback that matches blip.Plugin.LoadMonitors. It's an arg to NewLoader, if specified by the user.

type Loader ¶

type Loader struct {
	*sync.Mutex
	// contains filtered or unexported fields
}

Loader is the singleton monitor loader and repo. It's created by the server and only used there (and via API calls). It's dynamic so monitors can be loaded (created) and unloaded (destroyed) while Blip is running, but the normal case is one load and start on Blip startup: Server.Boot calls Load, then Server.Run calls StartMonitors. The user can make API calls to reload while Blip is running.

Loader is safe for concurrent use, but it's currently only called by the Server.

func NewLoader ¶

func NewLoader(args LoaderArgs) *Loader

NewLoader creates a new Loader singleton. It's called in Server.Boot and Server.Run.

func (*Loader) Count ¶

func (ml *Loader) Count() uint

Count returns the number of loaded monitors. It's used by the API for status.

func (*Loader) Load ¶

func (ml *Loader) Load(ctx context.Context) error

Load loads all configured monitors and unloads (stops and removes) monitors that have been removed or changed since the last call to Load. It does not start new monitors. Call StartMonitors after Load to start new (or previously stopped) monitors.

Server.Boot calls Load, then Server.Run calls StartMonitors.

Load checks for stop-loss and does local MySQL auto-detection, if these two features are enabled.

If Load returns an error, the currently loaded monitors are not affected. The error indicates a problem loading monitors or a validation error.

This function is safe for concurrent use, but calls are serialized.
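
One common way to get "an error leaves the current monitors untouched" is to validate everything into a staging area and swap it in only on success. The sketch below shows that pattern under a mutex that serializes calls. It is a simplified assumption about the technique, not Blip's Load: the `repo` type and `load` method are hypothetical, and the real Load also keeps unchanged monitors and handles stop-loss.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// repo is a hypothetical stand-in for a monitor repository keyed by monitor ID.
type repo struct {
	mu       sync.Mutex
	monitors map[string]string // monitor ID -> hostname (simplified config)
}

// load validates every config into a staging map and swaps it in only if all
// configs are valid, so a failed load leaves r.monitors unchanged.
func (r *repo) load(cfgs map[string]string) error {
	r.mu.Lock() // serialize concurrent calls
	defer r.mu.Unlock()

	staged := make(map[string]string, len(cfgs))
	for id, host := range cfgs {
		if host == "" {
			return errors.New("monitor " + id + ": hostname is required")
		}
		staged[id] = host
	}
	r.monitors = staged // commit only after every config validated
	return nil
}

func main() {
	r := &repo{}
	fmt.Println("first load:", r.load(map[string]string{"db1": "10.0.0.1"}))
	fmt.Println("second load:", r.load(map[string]string{"db2": ""}))
	fmt.Println("monitors still loaded:", len(r.monitors))
}
```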

func (*Loader) Monitor ¶

func (ml *Loader) Monitor(monitorId string) *Monitor

Monitor returns one monitor by ID. It's used by the API to get single monitor status.

func (*Loader) Monitors ¶

func (ml *Loader) Monitors() []*Monitor

Monitors returns a list of all currently loaded monitors.

func (*Loader) Print ¶

func (ml *Loader) Print() string

Print prints all loaded monitors in blip.ConfigMonitor YAML format. It's used for --print-monitors.

func (*Loader) Start ¶

func (ml *Loader) Start(monitorId string, lock bool) error

Start starts a monitor if it's not already running.

func (*Loader) StartMonitors ¶

func (ml *Loader) StartMonitors()

StartMonitors starts all monitors that have been loaded but not started. This should be called after Load. On Blip startup, the server calls Load in Server.Boot, then StartMonitors in Server.Run. The user can reload by calling the server API: /monitors/reload.

This function is safe for concurrent use, but calls are serialized.

func (*Loader) Stop ¶

func (ml *Loader) Stop(monitorId string, lock bool) error

Stop stops a monitor but does not unload it. It can be started again by calling Start.

func (*Loader) Unload ¶

func (ml *Loader) Unload(monitorId string, lock bool) error

Unload stops and removes a monitor.

type LoaderArgs ¶

type LoaderArgs struct {
	Config     blip.Config
	Factories  blip.Factories
	Plugins    blip.Plugins
	PlanLoader *plan.Loader
	RDSLoader  aws.RDSLoader
}

type Monitor ¶

type Monitor struct {
	// contains filtered or unexported fields
}

Monitor monitors one MySQL instance. The monitor is a high-level component that runs (and keeps running) four monitor subsystems:

Plan changer (PCH)
Level collector (LCO)
Blip heartbeat writer
Exporter (Prometheus)

Each subsystem is optional based on the config, but LCO runs by default because it contains the Engine component that does actual metrics collection. If any subsystem crashes (returns for any reason or panics), the monitor stops and restarts all subsystems. The monitor doesn't stop until Stop is called. Consequently, if a monitor is not configured correctly (for example, it can't connect to MySQL), it retries and reports errors forever.

Monitors are loaded, created, and initially started only by the Loader. A monitor can be stopped and started (again) via the server API.

A monitor is uniquely identified by its monitor ID, which should be included in all output by the monitor and its subsystems. The monitor ID is set when the monitor is loaded by the Loader, which calls blip.MonitorId to determine the value.

A monitor is completely self-contained and independent. For example, each monitor has its own LCO, engine, and metric collectors.

func NewMonitor ¶

func NewMonitor(args MonitorArgs) *Monitor

NewMonitor creates a new Monitor with the given arguments. The monitor does not begin monitoring the MySQL instance until it is started (see Start).

func (*Monitor) Config ¶

func (m *Monitor) Config() blip.ConfigMonitor

Config returns the monitor config.

func (*Monitor) DSN ¶

func (m *Monitor) DSN() string

DSN returns the redacted DSN (no password).

func (*Monitor) MonitorId ¶

func (m *Monitor) MonitorId() string

MonitorId returns the monitor ID.

func (*Monitor) Start ¶

func (m *Monitor) Start() error

Start starts the monitor. If it's already running, it returns an error. It can be called again after calling Stop.

Start/stop monitors only through the Loader. DO NOT call Start or Stop directly, else the running state of the monitor and the Loader will be out of sync.

func (*Monitor) Stop ¶

func (m *Monitor) Stop() error

Stop stops the monitor. It is idempotent and thread-safe.

Start/stop monitors only through the Loader. DO NOT call Start or Stop directly, else the running state of the monitor and the Loader will be out of sync.
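
The Start and Stop contracts above (Start errors if already running, Stop is idempotent, and Start may be called again after Stop) can be sketched with a mutex-guarded flag. The `mon` type and `errAlreadyRunning` are hypothetical stand-ins that track only the running state; they are not Blip's Monitor.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errAlreadyRunning = errors.New("monitor already running")

// mon is a hypothetical stand-in that tracks only the running state.
type mon struct {
	mu      sync.Mutex
	running bool
}

// Start returns an error if the monitor is already running.
func (m *mon) Start() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.running {
		return errAlreadyRunning
	}
	m.running = true
	return nil
}

// Stop is idempotent: stopping a stopped monitor is a no-op.
func (m *mon) Stop() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.running = false
	return nil
}

func main() {
	m := &mon{}
	fmt.Println("start:", m.Start())
	fmt.Println("start again:", m.Start())
	fmt.Println("stop twice:", m.Stop(), m.Stop())
	fmt.Println("restart:", m.Start())
}
```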

type MonitorArgs ¶

type MonitorArgs struct {
	Config          blip.ConfigMonitor
	DbMaker         blip.DbFactory
	PlanLoader      *plan.Loader
	Sinks           []blip.Sink
	TransformMetric func([]*blip.Metrics) error
	HA              ha.Manager
}

MonitorArgs are required arguments to NewMonitor.

type PlanChanger ¶

type PlanChanger interface {
	Run(stopChan, doneChan chan struct{}) error
}

PlanChanger (PCH) changes the plan based on database instance state. If the plan changes, the PCH calls the LevelCollector (LCO) to do the real low-level work of swapping plans, because the LCO executes plans. In this sense, "Changer" is a bit misleading: the PCH doesn't change the plan itself; it only determines if and when the plan should change, then tells the LCO to make the change.

type PlanChangerArgs ¶

type PlanChangerArgs struct {
	MonitorId string
	Config    blip.ConfigPlanChange
	DB        *sql.DB
	LCO       LevelCollector
	HA        ha.Manager
}

type StartMonitorFunc ¶

type StartMonitorFunc func(blip.ConfigMonitor) bool

StartMonitorFunc is a callback that matches blip.Plugin.StartMonitor.
