watchdog

package
v0.0.0-...-73a4147
Published: Aug 5, 2024 License: Apache-2.0, MIT Imports: 9 Imported by: 0

Documentation

Overview

Package watchdog monitors the sentry for tasks that may be stuck or looping indefinitely, causing hard-to-debug hangs in the untrusted app.

It works by periodically querying all tasks to check whether they are in user mode (RunUser), kernel mode (RunSys), or blocked in the kernel (OffCPU). Tasks that have been running in kernel mode for a long time in the same syscall without blocking are considered stuck and are reported.

When a stuck task is detected, the watchdog can take one of the following actions:

  1. LogWarning: Logs a warning message followed by a stack dump of all goroutines. If a task continues to be stuck, the message repeats every minute, unless a new stuck task is detected.
  2. Panic: Same as above, followed by panic().

Constants

This section is empty.

Variables

var DefaultOpts = Opts{

	TaskTimeout:       3 * time.Minute,
	TaskTimeoutAction: LogWarning,

	StartupTimeout:       30 * time.Second,
	StartupTimeoutAction: LogWarning,
}

DefaultOpts is a default set of options for the watchdog.

Functions

This section is empty.

Types

type Action

type Action int

Action defines what action to take when a stuck task is detected.

const (
	// LogWarning logs a warning message followed by a stack trace.
	LogWarning Action = iota

	// Panic will do the same logging as LogWarning and panic().
	Panic
)

func (*Action) Get

func (a *Action) Get() any

Get implements flag.Getter.

func (*Action) Set

func (a *Action) Set(v string) error

Set implements flag.Value.

func (Action) String

func (a Action) String() string

String returns Action's string representation.
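
Because Action provides Set, String, and Get, it can be registered directly with the standard flag package. The sketch below shows the general pattern on a local stand-in type, so it runs without the sentry. The `action` type, the accepted strings ("log", "logwarning", "panic"), the returned names, and the flag name are all assumptions made for illustration, not values confirmed by this page.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// action is a hypothetical stand-in for watchdog.Action.
type action int

const (
	logWarning action = iota
	panicAction
)

// Set parses a flag value. The accepted spellings are assumptions.
func (a *action) Set(v string) error {
	switch strings.ToLower(v) {
	case "log", "logwarning":
		*a = logWarning
	case "panic":
		*a = panicAction
	default:
		return fmt.Errorf("invalid action %q", v)
	}
	return nil
}

// String returns a readable name. The names are assumptions.
func (a action) String() string {
	switch a {
	case logWarning:
		return "logWarning"
	case panicAction:
		return "panic"
	default:
		return fmt.Sprintf("action(%d)", int(a))
	}
}

// Get returns the current value, which satisfies flag.Getter.
func (a *action) Get() any {
	return *a
}

func main() {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	act := logWarning
	// "watchdog-action" is a hypothetical flag name.
	fs.Var(&act, "watchdog-action", "action to take when a stuck task is detected")
	if err := fs.Parse([]string{"-watchdog-action=panic"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(act) // prints panic
}
```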

type Opts

type Opts struct {
	// TaskTimeout is the amount of time to allow a task to execute the
	// same syscall without blocking before it's declared stuck.
	TaskTimeout time.Duration

	// TaskTimeoutAction indicates what action to take when a stuck task
	// is detected.
	TaskTimeoutAction Action

	// StartupTimeout is the amount of time to allow between watchdog
	// creation and calling watchdog.Start.
	StartupTimeout time.Duration

	// StartupTimeoutAction indicates what action to take when
	// watchdog.Start is not called within the timeout.
	StartupTimeoutAction Action
}

Opts configures the watchdog.

type Watchdog

type Watchdog struct {
	// Configuration options are embedded.
	Opts
	// contains filtered or unexported fields
}

Watchdog is the main watchdog type. It controls a goroutine that periodically analyzes all tasks and reports any that appear to be stuck.

func New

func New(k *kernel.Kernel, opts Opts) *Watchdog

New creates a new watchdog.

func (*Watchdog) Start

func (w *Watchdog) Start()

Start starts the watchdog.

func (*Watchdog) Stop

func (w *Watchdog) Stop()

Stop requests the watchdog to stop and waits for it to finish.
