Documentation ¶
Overview ¶
Package gwatchdog provides a Watchdog type that periodically communicates with subsystems that have opted in to the watchdog. Each subsystem that opts in provides an interval and jitter indicating how frequently the watchdog will poll the subsystem, and a timeout indicating the tolerable duration for the subsystem's response. If the subsystem does not respond within the tolerable duration, the watchdog triggers a termination by canceling the root context.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func IsTermination ¶
IsTermination reports whether the context was cancelled by the watchdog.
Types ¶
type FailureToRespondError ¶
type FailureToRespondError struct {
SubsystemName string
}
FailureToRespondError indicates a particular subsystem failed to respond to its watchdog monitor within the configured response expectation duration.
func (FailureToRespondError) Error ¶
func (e FailureToRespondError) Error() string
type ForcedTerminationError ¶
type ForcedTerminationError struct {
Reason string
}
ForcedTerminationError indicates that *Watchdog.Terminate was called.
func (ForcedTerminationError) Error ¶
func (e ForcedTerminationError) Error() string
type MonitorConfig ¶
type MonitorConfig struct {
	// The name of the subsystem being monitored, for reporting purposes.
	Name string

	// The watchdog will poll the subsystem every Interval + [-Jitter, +Jitter) duration.
	// The jitter range is uniformly distributed.
	Interval, Jitter time.Duration

	// If the subsystem does not both accept the signal
	// and close its Alive response channel within ResponseTimeout,
	// the watchdog sends a termination signal to the entire system.
	ResponseTimeout time.Duration
}
type Signal ¶
type Signal struct {
// Every signal will have a non-nil, non-closed Alive channel.
Alive chan<- struct{}
}
Signal is the value returned by *Watchdog.Monitor. The subsystem requesting the monitor must respond to the signal as soon as possible in order to prevent the watchdog from terminating the entire system.
type Watchdog ¶
type Watchdog struct {
// contains filtered or unexported fields
}
func NewNopWatchdog ¶
NewNopWatchdog returns a new Watchdog that disregards calls to *Watchdog.Monitor, but still respects calls to Terminate.
NewNopWatchdog should only be called in tests.
func NewWatchdog ¶
NewWatchdog returns a new Watchdog and a context associated with the watchdog and derived from the passed-in context.
The returned context is canceled if a subsystem that subscribes through *Watchdog.Monitor fails to respond to a signal within its configured response timeout, or more rarely, upon a call to *Watchdog.Terminate.
func (*Watchdog) Monitor ¶
func (w *Watchdog) Monitor(ctx context.Context, cfg MonitorConfig) <-chan Signal
Monitor configures a monitor for an individual subsystem. The subsystem requesting a monitor must receive from the returned channel in the subsystem's main loop and close the [Signal.Alive] channel to indicate timely receipt of the signal.
The Name field of cfg is used for reporting purposes, to indicate which subsystem is being monitored.
Under normal operation, a value will arrive on the returned channel every interval + [-jitter/2, +jitter/2) duration; the jitter duration is uniformly distributed. However, it will also be possible in the future for an operator to request a status check.
If the context is cancelled before the new monitor starts running, the returned channel is nil.
func (*Watchdog) Terminate ¶
Terminate forces the watchdog context to be cancelled with a cause of ForcedTerminationError.
func (*Watchdog) Wait ¶
func (w *Watchdog) Wait()
Wait blocks until w's background goroutines complete. The goroutines are tied to the lifecycle of the context passed to NewWatchdog, so neither calling Terminate nor failing to process a monitor signal is sufficient to unblock a call to Wait; the context passed to NewWatchdog must itself be canceled.