health

v0.3.0
Published: Jul 18, 2021 License: MIT Imports: 6 Imported by: 94

README

Health

A simple and flexible health check library for Go.

Table of Contents

  1. Features
  2. Getting Started
  3. Caching
  4. Periodic Checks
  5. Failure Tolerance
  6. Listening to Status Changes
  7. Listening to Lifecycle Events
  8. License

Features

This library allows you to build health checks that do not simply return HTTP status code 200 but actually check if all necessary components are healthy.

This library provides the following features:

  • Request based and fixed-schedule health checks.
  • Global and check-based timeout management.
  • Caching
  • Lifecycle hooks and status change listeners.
  • Failure tolerance based on fail count and/or time thresholds.
  • Provides an http.Handler and http.HandlerFunc that are fully compatible with net/http.

Getting Started

package main

import (
	"context"
	"database/sql"
	"fmt"
	"github.com/alexliesenfeld/health"
	_ "github.com/mattn/go-sqlite3"
	"net/http"
	"time"
)

func main() {
	db, _ := sql.Open("sqlite3", "simple.sqlite")
	defer db.Close()

	// Create a new Checker
	checker := health.NewChecker(

		// Configure a global timeout that will be applied to all checks.
		health.WithTimeout(10*time.Second),

		// A simple check to see if database connection is up.
		health.WithCheck(health.Check{
			Name:    "database",
			Timeout: 2 * time.Second, // A check-specific timeout.
			Check:   db.PingContext,
		}),

		// The following check will be executed periodically every 30 seconds.
		health.WithPeriodicCheck(30*time.Second, health.Check{
			Name: "search",
			Check: func(ctx context.Context) error {
				return fmt.Errorf("this makes the check fail")
			},
		}),
	)

	// Create a new http.Handler that provides health check information
	// serialized as a JSON string via HTTP.
	http.Handle("/health", health.NewHandler(checker))
	http.ListenAndServe(":3000", nil)
}

Because our search component is down, the request curl http://localhost:3000/health would yield a response with HTTP status code 503 (Service Unavailable) and the following JSON response body:

{
  "status": "down",
  "details": {
    "database": {
      "status": "up",
      "timestamp": "2021-07-01T08:05:14.603364Z"
    },
    "search": {
      "status": "down",
      "timestamp": "2021-07-01T08:05:08.522685Z",
      "error": "this makes the check fail"
    }
  }
}
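
To make the relationship between the aggregated status, the HTTP status code and the JSON body more concrete, here is a minimal standard-library sketch. It is not the library's implementation: the types systemStatus and componentStatus and the functions statusCode, newHandler and probe are hypothetical names that only mirror the response shape shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// componentStatus and systemStatus are hypothetical stand-ins that mirror
// the JSON shape of the response; they are not the library's types.
type componentStatus struct {
	Status string `json:"status"`
	Error  string `json:"error,omitempty"`
}

type systemStatus struct {
	Status  string                     `json:"status"`
	Details map[string]componentStatus `json:"details,omitempty"`
}

// statusCode maps an aggregated status to an HTTP status code:
// 200 for "up" and 503 for everything else.
func statusCode(status string) int {
	if status == "up" {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// newHandler returns a handler that serializes the result of check as JSON.
func newHandler(check func() systemStatus) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		result := check()
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(statusCode(result.Status))
		_ = json.NewEncoder(w).Encode(result)
	})
}

// probe sends one in-memory request to the handler and returns the
// response code and body.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := newHandler(func() systemStatus {
		return systemStatus{
			Status: "down",
			Details: map[string]componentStatus{
				"search": {Status: "down", Error: "this makes the check fail"},
			},
		}
	})
	code, body := probe(h)
	fmt.Println(code) // 503
	fmt.Print(body)
}
```

The sketch uses httptest so that it runs without opening a network port.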

Caching

Health check responses are cached to avoid sending too many requests to the services that your program checks and to mitigate "denial of service" attacks. The TTL is set to 1 second by default. If you do not want to use caching at all, you can disable it using the health.WithDisabledCache() configuration option.
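
The effect of a TTL cache can be illustrated with a small standard-library sketch. It is only an illustration of the idea, not the library's code: cachedCheck and countRuns are hypothetical names, and the clock is injected so that the behaviour is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cachedCheck wraps a check function and reuses its last result until
// ttl has elapsed. All names here are hypothetical.
type cachedCheck struct {
	mu       sync.Mutex
	ttl      time.Duration
	now      func() time.Time // injectable clock
	check    func() error
	lastRun  time.Time
	lastErr  error
	hasValue bool
}

// result returns the cached outcome if it is younger than ttl and
// runs the wrapped check otherwise.
func (c *cachedCheck) result() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.hasValue && c.now().Sub(c.lastRun) < c.ttl {
		return c.lastErr
	}
	c.lastErr = c.check()
	c.lastRun = c.now()
	c.hasValue = true
	return c.lastErr
}

// countRuns simulates requests arriving at the given offsets (in
// milliseconds) and reports how often the wrapped check actually ran.
func countRuns(ttlMillis int, offsetsMillis ...int) int {
	start := time.Unix(0, 0)
	current := start
	runs := 0
	c := &cachedCheck{
		ttl:   time.Duration(ttlMillis) * time.Millisecond,
		now:   func() time.Time { return current },
		check: func() error { runs++; return nil },
	}
	for _, off := range offsetsMillis {
		current = start.Add(time.Duration(off) * time.Millisecond)
		_ = c.result()
	}
	return runs
}

func main() {
	// Three requests within one second share one check run; the request
	// at 1500 ms arrives after the TTL and triggers a second run.
	fmt.Println(countRuns(1000, 0, 200, 900, 1500)) // 2
	// A TTL of zero means every request runs the check.
	fmt.Println(countRuns(0, 0, 1, 2)) // 3
}
```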

Periodic Checks

When executing health check functions synchronously (i.e. for every HTTP request), the overall response delay will be at least as long as that of your slowest check function. This is usually OK for smaller applications with a low number of quickly checkable dependencies and enabled caching. This approach, however, will likely be problematic for more involved applications that have many dependencies and/or some relatively slow check functions.

Rather than executing a health check function on every request that is received over the health endpoint, periodic checks execute the check function on a fixed schedule. With this approach, the health status is always read from a local cache. It allows responding to HTTP requests instantly without waiting for the check function to complete.

Periodic checks can be configured using the WithPeriodicCheck configuration option (see example above).
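
The following standard-library sketch shows the general pattern behind fixed-schedule checks: a background goroutine refreshes a stored status, and readers only ever see the stored value. It is a simplified illustration under assumed names (periodicCheck, newPeriodicCheck, runOnce, start, current, errSearchDown), not the library's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errSearchDown is a hypothetical error used by the example check.
var errSearchDown = errors.New("search is down")

// periodicCheck stores the outcome of the most recent run of check.
type periodicCheck struct {
	mu     sync.RWMutex
	status string // "unknown" until the first run has completed
	check  func() error
}

func newPeriodicCheck(check func() error) *periodicCheck {
	return &periodicCheck{status: "unknown", check: check}
}

// runOnce executes the check and stores the outcome.
func (p *periodicCheck) runOnce() {
	err := p.check()
	p.mu.Lock()
	defer p.mu.Unlock()
	if err != nil {
		p.status = "down"
	} else {
		p.status = "up"
	}
}

// start runs the check immediately and then once per period until
// stop is closed.
func (p *periodicCheck) start(period time.Duration, stop <-chan struct{}) {
	go func() {
		ticker := time.NewTicker(period)
		defer ticker.Stop()
		p.runOnce()
		for {
			select {
			case <-ticker.C:
				p.runOnce()
			case <-stop:
				return
			}
		}
	}()
}

// current returns the stored status without running the check.
func (p *periodicCheck) current() string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.status
}

func main() {
	p := newPeriodicCheck(func() error { return errSearchDown })
	fmt.Println(p.current()) // unknown: nothing has run yet

	stop := make(chan struct{})
	p.start(30*time.Second, stop)
	time.Sleep(100 * time.Millisecond) // give the first run time to finish
	fmt.Println(p.current())
	close(stop)
}
```

Reading the status never waits for the check function, which is why this approach suits slow or numerous dependencies.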

Failure Tolerance

This library lets you configure failure-tolerant checks that allow some degree of failure. The check is only considered failed when predefined tolerance thresholds are crossed.

Example

Let's assume that your app provides a REST API but also consumes messages from a Kafka topic. If the connection to Kafka is down, your app can still serve API requests, but it will not process any messages during this time. If the Kafka health check is configured without any failure tolerance, your whole application will become unhealthy. This is most likely not what you want. However, if Kafka is down for too long, there may indeed be a problem that requires attention. In this case, you still may want to flag your app unhealthy by returning a failing health check, so that it can be automatically restarted by your infrastructure.

Failure tolerant health checks let you configure this kind of behaviour.

health.WithCheck(health.Check{
    Name:    "unreliable-service",
    // Check is allowed to fail up to 4 times until considered unavailable
    MaxConsecutiveFails: 4,
    // Check is allowed to be in an erroneous state for up to 1 minute until considered unavailable.
    MaxTimeInError:      1 * time.Minute,
    Check: myCheckFunc,
}),
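
How the two thresholds might interact can be sketched as follows. This is an illustration only and rests on two assumptions that the text above does not confirm: that a check is reported as "down" only when both thresholds are exceeded, and that "exceeded" means strictly greater than the configured value. The names tolerance, evaluate and statusAfter are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// tolerance holds hypothetical thresholds that mirror the
// MaxConsecutiveFails and MaxTimeInError fields.
type tolerance struct {
	maxConsecutiveFails uint
	maxTimeInError      time.Duration
}

// evaluate reports "down" only once BOTH thresholds are exceeded.
// Combining them with AND is an assumption of this sketch.
func (t tolerance) evaluate(consecutiveFails uint, timeInError time.Duration) string {
	if consecutiveFails == 0 {
		return "up"
	}
	failsExceeded := consecutiveFails > t.maxConsecutiveFails
	timeExceeded := timeInError > t.maxTimeInError
	if failsExceeded && timeExceeded {
		return "down"
	}
	return "up"
}

// statusAfter evaluates a check with the thresholds from the example
// above (4 consecutive fails, 1 minute in error).
func statusAfter(fails uint, secondsInError int) string {
	t := tolerance{maxConsecutiveFails: 4, maxTimeInError: time.Minute}
	return t.evaluate(fails, time.Duration(secondsInError)*time.Second)
}

func main() {
	fmt.Println(statusAfter(3, 20)) // up: neither threshold exceeded
	fmt.Println(statusAfter(5, 20)) // up: only the fail count is exceeded
	fmt.Println(statusAfter(5, 90)) // down: both thresholds exceeded
}
```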

Listening to Status Changes

It can be useful to react to health status changes. For example, you might want to log status changes so that you can correlate logs more easily during root cause analysis, or perform actions to mitigate the impact of an unhealthy component.

This library allows you to configure listener functions that will be called when either the overall system health status or a component status changes.

Example

The example below shows a configuration that adds the following two listeners:

  • a status listener to a check that will be called whenever the status of the check changes (e.g., from "up" to "down"),
  • an overall system status listener that will be called whenever the overall system status changes.

health.WithPeriodicCheck(5*time.Second, health.Check{
    Name:   "search",
    Check:  myCheckFunc,
    // The listener signature follows the CheckStatusListener type.
    StatusListener: func(ctx context.Context, state health.CheckState) {
        log.Printf("status of component search changed to %s", state.Status)
    },
}),

// The listener signature follows the SystemStatusListener type.
health.WithStatusListener(func(ctx context.Context, status health.AvailabilityStatus, state map[string]health.CheckState) {
    log.Printf("overall system health status changed to %s", status)
}),
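
The key property of a status listener is that it fires on transitions only, not on every check run. A minimal standard-library sketch of that behaviour, using the hypothetical names statusTracker, update and transitions, could look like this:

```go
package main

import "fmt"

// statusTracker remembers the last known status and notifies its
// listener only when the status actually changes. Hypothetical type.
type statusTracker struct {
	last     string
	listener func(from, to string)
}

// update records a new status and calls the listener on a transition.
func (t *statusTracker) update(status string) {
	if status == t.last {
		return
	}
	from := t.last
	t.last = status
	if t.listener != nil {
		t.listener(from, status)
	}
}

// transitions feeds a sequence of statuses into a tracker that starts
// at "unknown" and returns the transitions the listener observed.
func transitions(statuses ...string) []string {
	var seen []string
	t := &statusTracker{
		last: "unknown",
		listener: func(from, to string) {
			seen = append(seen, from+"->"+to)
		},
	}
	for _, s := range statuses {
		t.update(s)
	}
	return seen
}

func main() {
	// Repeated identical statuses do not trigger the listener.
	fmt.Println(transitions("up", "up", "down", "down", "up"))
	// [unknown->up up->down down->up]
}
```

Because the listener runs inline in this sketch, a slow listener would delay status updates, which is one reason to keep listeners quick.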

Listening to Lifecycle Events

It can be useful to hook into the checking lifecycle to do some pre- and postprocessing. For example, you might want to add tracing (such as Jaeger traces and spans), or some logging functionality that requires you to perform some actions before and after a check function is executed.

This library allows you to add two kinds of listeners:

  • a BeforeCheckListener and AfterCheckListener for each individual component, and
  • a BeforeSystemCheckListener and AfterSystemCheckListener that are triggered before/after a full system check is executed.

Please refer to the documentation for more information.
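
To illustrate why lifecycle listeners receive and return a context, here is a standard-library sketch in which a "before" listener attaches a trace ID that the check and the "after" listener can read. The types beforeListener and afterListener are hypothetical simplifications of the listener types, and runCheck and traceRun are invented for this example.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a private key type for values stored in the context.
type ctxKey struct{}

// beforeListener and afterListener are hypothetical simplifications:
// both receive a context and return a (possibly extended) context.
type beforeListener func(ctx context.Context, name string) context.Context
type afterListener func(ctx context.Context, name string, err error) context.Context

// runCheck calls before, then the check with the context returned by
// before, and finally after.
func runCheck(ctx context.Context, name string, check func(context.Context) error,
	before beforeListener, after afterListener) error {
	if before != nil {
		ctx = before(ctx, name)
	}
	err := check(ctx)
	if after != nil {
		after(ctx, name, err)
	}
	return err
}

// traceRun runs one check with listeners that pass a trace ID through
// the context and returns the recorded events in order.
func traceRun(traceID string) []string {
	var events []string
	before := func(ctx context.Context, name string) context.Context {
		events = append(events, "before "+name)
		return context.WithValue(ctx, ctxKey{}, traceID)
	}
	after := func(ctx context.Context, name string, err error) context.Context {
		id, _ := ctx.Value(ctxKey{}).(string)
		events = append(events, fmt.Sprintf("after %s trace=%s err=%v", name, id, err))
		return ctx
	}
	check := func(ctx context.Context) error {
		id, _ := ctx.Value(ctxKey{}).(string)
		events = append(events, "check sees trace="+id)
		return nil
	}
	_ = runCheck(context.Background(), "database", check, before, after)
	return events
}

func main() {
	for _, e := range traceRun("abc") {
		fmt.Println(e)
	}
}
```

A listener that does not need to extend the context simply returns the context it was given.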

License

health is free software: you can redistribute it and/or modify it under the terms of the MIT Public License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the MIT Public License for more details.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewHandler added in v0.2.0

func NewHandler(checker Checker) http.Handler

NewHandler creates a new health check http.Handler. If periodic checks have been configured (see WithPeriodicCheck), they will be started as well (if not explicitly turned off using WithManualStart).

func NewHandlerFunc added in v0.3.0

func NewHandlerFunc(checker Checker) http.HandlerFunc

NewHandlerFunc creates a new health check http.HandlerFunc. If periodic checks have been configured (see WithPeriodicCheck), they will be started as well (if not explicitly turned off using WithManualStart).

func NewHandlerFuncWithConfig added in v0.3.0

func NewHandlerFuncWithConfig(checker Checker, cfg HandlerConfig) http.HandlerFunc

NewHandlerFuncWithConfig creates a new health check http.HandlerFunc. If periodic checks have been configured (see WithPeriodicCheck), they will be started as well (if not explicitly turned off using WithManualStart).

func NewHandlerWithConfig added in v0.3.0

func NewHandlerWithConfig(checker Checker, cfg HandlerConfig) http.Handler

NewHandlerWithConfig creates a new health check http.Handler. If periodic checks have been configured (see WithPeriodicCheck), they will be started as well (if not explicitly turned off using WithManualStart).

func WithAfterCheckListener added in v0.3.0

func WithAfterCheckListener(listener AfterSystemCheckListener) option

WithAfterCheckListener registers a handler function that will be called right after the availability status of the system was checked (see AfterSystemCheckListener). Attention: Ideally, this method should be quick and not block for too long.

func WithBeforeCheckListener added in v0.3.0

func WithBeforeCheckListener(listener BeforeSystemCheckListener) option

WithBeforeCheckListener registers a handler function that will be called right before the availability status of the system is checked (see BeforeSystemCheckListener). Attention: Ideally, this method should be quick and not block for too long.

func WithCacheDuration

func WithCacheDuration(duration time.Duration) option

WithCacheDuration sets the duration for how long the aggregated health check result will be cached. This is set to 1 second by default. Caching prevents each incoming HTTP request from triggering a new health check. A duration of 0 will effectively disable the cache and has the same effect as WithDisabledCache.

func WithCheck

func WithCheck(check Check) option

WithCheck adds a new health check that contributes to the overall service AvailabilityStatus. This check will be triggered each time the health check HTTP endpoint is called (and the cache has expired, see WithCacheDuration). If health checks are expensive or you expect a lot of calls to the health endpoint, consider using WithPeriodicCheck instead.

func WithDisabledCache

func WithDisabledCache() option

WithDisabledCache disables the check cache. This is not recommended in most cases. This will effectively lead to a health endpoint that initiates a new health check for each incoming HTTP request. This may have an impact on the systems that are being checked (especially if health checks are expensive). Caching also mitigates "denial of service" attacks.

func WithDisabledDetails added in v0.3.0

func WithDisabledDetails() option

WithDisabledDetails hides all data in the JSON response body except the AvailabilityStatus itself. Example: { "status":"down" }

func WithMaxErrorMessageLength

func WithMaxErrorMessageLength(length uint) option

WithMaxErrorMessageLength limits the maximum number of characters in error messages.

func WithPeriodicCheck

func WithPeriodicCheck(refreshPeriod time.Duration, check Check) option

WithPeriodicCheck adds a new health check that contributes to the overall service AvailabilityStatus. The health check will be performed on a fixed schedule and will not be executed for each HTTP request (in contrast to WithCheck). This makes it possible to process a much higher number of HTTP requests without calling the checked services too often, and to execute long-running checks. The health endpoint always returns the last result of the periodic check. When periodic checks are started (which happens automatically if WithManualStart is not used), they are also executed for the first time. Until all periodic checks have been executed at least once, the overall AvailabilityStatus will be "unknown" with HTTP status code 503 (Service Unavailable).

func WithStatusListener added in v0.3.0

func WithStatusListener(listener SystemStatusListener) option

WithStatusListener registers a handler function that will be called whenever the overall system health AvailabilityStatus changes. Attention: Ideally, this method should be quick and not block for too long.

func WithTimeout

func WithTimeout(timeout time.Duration) option

WithTimeout globally defines a timeout duration for all checks. You can still override this timeout by using the timeout value in the Check configuration. Default value is 30 seconds.
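
The interplay of a global and a check-specific timeout can be sketched with context.WithTimeout from the standard library. The sketch assumes the rule stated in the Check type's comments, namely that the check-specific timeout applies only when it is set and smaller than the global one. The names effectiveTimeout, runWithTimeout, slowCheck and timedOut are hypothetical and not part of the library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// effectiveTimeout picks the check-specific timeout when it is set
// and smaller than the global timeout, and the global one otherwise.
func effectiveTimeout(global, specific time.Duration) time.Duration {
	if specific > 0 && specific < global {
		return specific
	}
	return global
}

// runWithTimeout runs check with a deadline derived from both timeouts.
func runWithTimeout(global, specific time.Duration, check func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), effectiveTimeout(global, specific))
	defer cancel()
	return check(ctx)
}

// slowCheck simulates a dependency that needs 200 ms to answer and
// gives up early when the context is cancelled.
func slowCheck(ctx context.Context) error {
	select {
	case <-time.After(200 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timedOut reports whether slowCheck hits the deadline for the given
// timeouts (in milliseconds).
func timedOut(globalMillis, specificMillis int) bool {
	err := runWithTimeout(
		time.Duration(globalMillis)*time.Millisecond,
		time.Duration(specificMillis)*time.Millisecond,
		slowCheck,
	)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	// The 50 ms check-specific timeout wins over the 10 s global one.
	err := runWithTimeout(10*time.Second, 50*time.Millisecond, slowCheck)
	fmt.Println(err) // context deadline exceeded
}
```

Check functions such as db.PingContext honour the deadline because they receive the context directly.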

Types

type AfterCheckListener added in v0.3.0

type AfterCheckListener func(ctx context.Context, state CheckState) context.Context

AfterCheckListener is a callback function that will be called right after a component's availability status was checked. The listener is allowed to add or remove values to/from the context in parameter ctx. The new context is expected in the return value of the function. If you do not want to extend the context, just return the passed ctx parameter.

type AfterSystemCheckListener added in v0.3.0

type AfterSystemCheckListener func(ctx context.Context, state map[string]CheckState) context.Context

AfterSystemCheckListener is a callback function that will be called right after the availability status of the system was checked. The listener is allowed to add or remove values to/from the context in parameter ctx. The new context is expected in the return value of the function. If you do not want to extend the context, just return the passed ctx parameter.

type AvailabilityStatus added in v0.3.0

type AvailabilityStatus string

AvailabilityStatus expresses the availability of either a component or the whole system.

const (
	// StatusUnknown holds the information that the availability
	// status is not known yet, because no check was executed yet.
	StatusUnknown AvailabilityStatus = "unknown"
	// StatusUp holds the information that the system or component
	// is available.
	StatusUp AvailabilityStatus = "up"
	// StatusDown holds the information that the system or component
	// is not available.
	StatusDown AvailabilityStatus = "down"
)

type BeforeCheckListener added in v0.3.0

type BeforeCheckListener func(ctx context.Context, state CheckState) context.Context

BeforeCheckListener is a callback function that will be called right before a component's availability status is checked. The listener is allowed to add or remove values to/from the context in parameter ctx. The new context is expected in the return value of the function. If you do not want to extend the context, just return the passed ctx parameter.

type BeforeSystemCheckListener added in v0.3.0

type BeforeSystemCheckListener func(ctx context.Context, state map[string]CheckState) context.Context

BeforeSystemCheckListener is a callback function that will be called right before the availability status of the system is checked. The listener is allowed to add or remove values to/from the context in parameter ctx. The new context is expected in the return value of the function. If you do not want to extend the context, just return the passed ctx parameter.

type Check

type Check struct {
	// The Name must be unique among all checks. Name is a required attribute.
	Name string // Required
	// Check is the check function that will be executed to check availability.
	// This function must return an error if the checked service is considered
	// not available. Check is a required attribute.
	Check func(ctx context.Context) error // Required
	// Timeout will override the global timeout value, if it is smaller than
	// the global timeout (see WithTimeout).
	Timeout time.Duration // Optional
	// MaxTimeInError will set a duration for how long a service must be
	// in an error state until it is considered unavailable.
	MaxTimeInError time.Duration // Optional
	// MaxConsecutiveFails will set a maximum number of consecutive
	// check fails until the service is considered unavailable.
	MaxConsecutiveFails uint // Optional
	// StatusListener sets a listener that will be called
	// whenever the AvailabilityStatus of the check changes.
	StatusListener CheckStatusListener // Optional
	// BeforeCheckListener is a callback function that will be called
	// right before a component's availability status is checked.
	BeforeCheckListener BeforeCheckListener // Optional
	// AfterCheckListener is a callback function that will be called
	// right after a component's availability status was checked.
	AfterCheckListener AfterCheckListener // Optional
	// contains filtered or unexported fields
}

Check allows you to configure health checks.

type CheckState added in v0.3.0

type CheckState struct {
	// LastCheckedAt holds the time of when the check was last executed.
	LastCheckedAt *time.Time
	// LastSuccessAt holds the last time of when the check did not return an error.
	LastSuccessAt *time.Time
	// LastFailureAt holds the last time of when the check did return an error.
	LastFailureAt *time.Time
	// FirstCheckStartedAt holds the time of when the first check was started.
	FirstCheckStartedAt time.Time
	// LastResult holds the error of the last check (is nil if successful).
	LastResult error
	// ConsecutiveFails holds the number of how often the check failed in a row.
	ConsecutiveFails uint
	// The current availability status of the check.
	Status AvailabilityStatus
}

CheckState contains all state attributes of a component's check.

type CheckStatus added in v0.3.0

type CheckStatus struct {
	// Status is the availability status of a component.
	Status AvailabilityStatus `json:"status"`
	// Timestamp holds the time when the check happened.
	Timestamp *time.Time `json:"timestamp,omitempty"`
	// Error contains the error message, if a check was not successful.
	Error *string `json:"error,omitempty"`
}

CheckStatus holds a component's health information.

type CheckStatusListener added in v0.3.0

type CheckStatusListener func(ctx context.Context, state CheckState)

CheckStatusListener is a callback function that will be called when a component's availability status changes (e.g. from "up" to "down").

type Checker added in v0.3.0

type Checker interface {
	// Start will start all periodic checks and prepares the
	// checker for accepting health check requests.
	Start()
	// Stop will stop the checker (i.e. all periodic checks).
	Stop()
	// Check performs a health check. It expects a context that
	// may contain deadlines, which will be adhered to. The context
	// will be passed to downstream calls.
	Check(ctx context.Context) SystemStatus
	// GetRunningPeriodicCheckCount returns the number of currently
	// running periodic checks.
	GetRunningPeriodicCheckCount() int
}

Checker is the main checker interface and it encapsulates all health checking logic.

func NewChecker added in v0.3.0

func NewChecker(options ...option) Checker

NewChecker creates a standalone health checker. It operates in the same way as NewHandler (including how periodic checks configured via WithPeriodicCheck and the WithManualStart setting are treated), but returns the Checker directly instead of the handler.

type HandlerConfig added in v0.3.0

type HandlerConfig struct {
	StatusCodeUp            int
	StatusCodeDown          int
	DisableCheckerAutostart bool
}

type SystemStatus added in v0.3.0

type SystemStatus struct {
	// Status is the aggregated availability status of the system.
	Status AvailabilityStatus `json:"status"`
	// Details contains health information about all checked components.
	Details *map[string]CheckStatus `json:"details,omitempty"`
}

SystemStatus holds the aggregated system health information.
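
How component statuses could be folded into one aggregated status is sketched below. The precedence used here ("down" wins over "unknown", which wins over "up") is an assumption of this sketch; the text only states that a failing component makes the system "down" and that unfinished periodic checks lead to "unknown". The function name aggregate is hypothetical.

```go
package main

import "fmt"

// aggregate folds component statuses into one system status.
// Assumed precedence: "down" > "unknown" > "up".
func aggregate(components map[string]string) string {
	result := "up"
	for _, status := range components {
		switch status {
		case "down":
			return "down"
		case "unknown":
			result = "unknown"
		}
	}
	return result
}

func main() {
	fmt.Println(aggregate(map[string]string{"database": "up", "search": "down"}))    // down
	fmt.Println(aggregate(map[string]string{"database": "up", "search": "unknown"})) // unknown
	fmt.Println(aggregate(map[string]string{"database": "up", "search": "up"}))      // up
}
```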

type SystemStatusListener added in v0.3.0

type SystemStatusListener func(ctx context.Context, status AvailabilityStatus, state map[string]CheckState)

SystemStatusListener is a callback function that will be called when the system availability status changes (e.g. from "up" to "down").

Directories

Path Synopsis
checks module
examples module
