healthcheck

Module version v1.0.0 (this package is not in the latest version of its module)

Published: Nov 8, 2021 License: Apache-2.0

README

healthcheck

Healthcheck is a library for implementing Kubernetes liveness and readiness probe handlers in your Go application.

Features

  • Integrates easily with Kubernetes. This library explicitly separates liveness vs. readiness checks instead of lumping everything into a single category of check.

  • Optionally exposes each check as a Prometheus gauge metric. This allows for cluster-wide monitoring and alerting on individual checks.

  • Supports asynchronous checks, which run in a background goroutine at a fixed interval. These are useful for expensive checks that would otherwise add latency to every request to the liveness and readiness endpoints.

  • Includes a small library of generically useful checks for validating upstream DNS, TCP, HTTP, and database dependencies as well as checking basic health of the Go runtime.

  • Provides a handler that supports the gRPC Health Checking Protocol: it automatically sets the correct serving status on the gRPC health server.

Usage

See the GoDoc examples for more detail.

  • Install with go get or your favorite Go dependency manager: go get -u github.com/GlobalWebIndex/healthcheck

  • Import the packages: import "github.com/GlobalWebIndex/healthcheck/handlers" and, for the built-in checks, import "github.com/GlobalWebIndex/healthcheck/checks"

  • Create a handlers.Handler:

    health := handlers.NewHandler()
    

    or

    grpcHealth := handlers.NewGrpcHandler(...)
    
  • Configure some application-specific liveness checks (whether the app itself is unhealthy):

    // Our app is not happy if we've got more than 100 goroutines running.
    health.AddLivenessCheck("goroutine-threshold", checks.GoroutineCountCheck(100))
    
  • Configure some application-specific readiness checks (whether the app is ready to serve requests):

    // Our app is not ready if we can't resolve our upstream dependency in DNS.
    health.AddReadinessCheck(
        "upstream-dep-dns",
        checks.DNSResolveCheck("upstream.example.com", 50*time.Millisecond))
    
    // Our app is not ready if we can't connect to our database (`var db *sql.DB`) in <1s.
    health.AddReadinessCheck("database", checks.DatabasePingCheck(db, 1*time.Second))
    

    or

    // Our app is not ready if we can't connect to our database in under 1 second.
    // Run this check asynchronously, at 60 second intervals.
    health.AddReadinessCheck("postgres", checks.Async(checks.DatabaseSelectCheck(db, 1*time.Second), 60*time.Second))
    
    // Our app is not ready if the gRPC server 'weather' is not healthy.
    c, err := grpc.Dial(...)
    if err != nil {
        ...
    }
    defer c.Close()
    
    healthClient := grpc_health_v1.NewHealthClient(c)
    grpcHealth.AddGrpcReadinessCheck("grpc-weather", healthClient)
    
  • Expose the /live and /ready endpoints over HTTP (on port 8086):

    go http.ListenAndServe("0.0.0.0:8086", health)
    
  • Configure your Kubernetes container with HTTP liveness and readiness probes (see the Kubernetes documentation for more detail):

    # this is a bare bones example
    # copy and paste livenessProbe and readinessProbe as appropriate for your app
    apiVersion: v1
    kind: Pod
    metadata:
      name: heptio-healthcheck-example
    spec:
      containers:
      - name: liveness
        image: your-registry/your-container
    
        # define a liveness probe that checks every 5 seconds, starting after 5 seconds
        livenessProbe:
          httpGet:
            path: /live
            port: 8086
          initialDelaySeconds: 5
          periodSeconds: 5
    
        # define a readiness probe that checks every 5 seconds
        readinessProbe:
          httpGet:
            path: /ready
            port: 8086
          periodSeconds: 5
    
  • If one of your readiness checks fails, Kubernetes will stop routing traffic to that pod within a few seconds (depending on periodSeconds and other factors).

  • If one of your liveness checks fails or your app becomes totally unresponsive, Kubernetes will restart your container.
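Any func() error can be registered as a check; the Metrics example below passes inline closures directly. As an illustration only, a threshold check in the spirit of GoroutineCountCheck(100) might look like this. goroutineCountCheck is hypothetical, and the library's own implementation may differ.

```go
package main

import (
	"fmt"
	"runtime"
)

// goroutineCountCheck (hypothetical stand-in for checks.GoroutineCountCheck)
// returns a check that fails once more than threshold goroutines are running.
func goroutineCountCheck(threshold int) func() error {
	return func() error {
		count := runtime.NumGoroutine()
		if count > threshold {
			return fmt.Errorf("too many goroutines (%d > %d)", count, threshold)
		}
		return nil
	}
}

func main() {
	// A small program like this one runs far fewer than 100 goroutines.
	fmt.Println(goroutineCountCheck(100)())
	// A threshold of 0 always fails, because at least main is running.
	fmt.Println(goroutineCountCheck(0)())
}
```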

HTTP Endpoints

When you run go http.ListenAndServe("0.0.0.0:8086", health), two HTTP endpoints are exposed:

  • /live: liveness endpoint (HTTP 200 if healthy, HTTP 503 if unhealthy)
  • /ready: readiness endpoint (HTTP 200 if healthy, HTTP 503 if unhealthy)

Pass the ?full=1 query parameter to see the full check results as JSON. These are omitted by default for performance.

Documentation

Overview

Package healthcheck helps you implement Kubernetes liveness and readiness checks for your application. It supports synchronous and asynchronous (background) checks. It can optionally report each check's status as a set of Prometheus gauge metrics for cluster-wide monitoring and alerting, and it provides a handler with gRPC health checking support.

It also includes a small library of generic checks for DNS, TCP, and HTTP reachability as well as Goroutine usage.

Example
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"strings"
	"time"

	"github.com/GlobalWebIndex/healthcheck/checks"
	"github.com/GlobalWebIndex/healthcheck/handlers"
)

func main() {
	// Create a Handler that we can use to register liveness and readiness checks.
	health := handlers.NewHandler()

	// Add a readiness check to make sure an upstream dependency resolves in DNS.
	// If this fails we don't want to receive requests, but we shouldn't be
	// restarted or rescheduled.
	upstreamHost := "upstream.example.com"
	err := health.AddReadinessCheck(
		"upstream-dep-dns",
		checks.DNSResolveCheck(upstreamHost, 50*time.Millisecond))
	if err != nil {
		panic("`health.AddReadinessCheck()` failed")
	}

	// Add a liveness check to detect Goroutine leaks. If this fails we want
	// to be restarted/rescheduled.
	err = health.AddLivenessCheck("goroutine-threshold", checks.GoroutineCountCheck(100))
	if err != nil {
		panic("`health.AddLivenessCheck()` failed")
	}

	// Serve http://0.0.0.0:8080/live and http://0.0.0.0:8080/ready endpoints.
	// go http.ListenAndServe("0.0.0.0:8080", health)

	// Make a request to the readiness endpoint and print the response.
	fmt.Print(dumpRequest(health, "GET", "/ready"))

}

func dumpRequest(handler http.Handler, method string, path string) string {
	req, err := http.NewRequest(method, path, nil)
	if err != nil {
		panic(err)
	}
	rr := httptest.NewRecorder()
	handler.ServeHTTP(rr, req)
	dump, err := httputil.DumpResponse(rr.Result(), true)
	if err != nil {
		panic(err)
	}
	return strings.Replace(string(dump), "\r\n", "\n", -1)
}
Output:

HTTP/1.1 503 Service Unavailable
Connection: close
Content-Type: application/json; charset=utf-8

{}
Example (Advanced)
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"strings"
	"time"

	"github.com/GlobalWebIndex/healthcheck/checks"
	"github.com/GlobalWebIndex/healthcheck/handlers"
)

func main() {
	// Create a Handler that we can use to register liveness and readiness checks.
	health := handlers.NewHandler()

	// Make sure we can connect to an upstream dependency over TCP in less than
	// 50ms. Run this check asynchronously in the background every 10 seconds
	// instead of every time the /ready or /live endpoints are hit.
	//
	// Async is useful whenever a check is expensive (especially if it causes
	// load on upstream services).
	upstreamAddr := "upstream.example.com:5432"
	err := health.AddReadinessCheck(
		"upstream-dep-tcp",
		checks.Async(checks.TCPDialCheck(upstreamAddr, 50*time.Millisecond), 10*time.Second))
	if err != nil {
		panic("`health.AddReadinessCheck()` failed")
	}

	// Add a readiness check against the health of an upstream HTTP dependency
	upstreamURL := "http://upstream-svc.example.com:8080/healthy"
	err = health.AddReadinessCheck(
		"upstream-dep-http",
		checks.HTTPGetCheck(upstreamURL, 500*time.Millisecond))
	if err != nil {
		panic("`health.AddReadinessCheck()` failed")
	}

	// Implement a custom check with a 50 millisecond timeout.
	err = health.AddLivenessCheck("custom-check-with-timeout", checks.Timeout(func() error {
		// Simulate some work that could take a long time
		time.Sleep(time.Millisecond * 100)
		return nil
	}, 50*time.Millisecond))
	if err != nil {
		panic("`health.AddLivenessCheck()` failed")
	}

	// Expose the readiness endpoints on a custom path /healthz mixed into
	// our main application mux.
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("Hello, world!"))
	})
	mux.HandleFunc("/healthz", health.ReadyEndpoint)

	// Sleep for just a moment to make sure our Async handler had a chance to run
	time.Sleep(500 * time.Millisecond)

	// Make a sample request to the /healthz endpoint and print the response.
	fmt.Println(dumpRequest(mux, "GET", "/healthz"))

}

func dumpRequest(handler http.Handler, method string, path string) string {
	req, err := http.NewRequest(method, path, nil)
	if err != nil {
		panic(err)
	}
	rr := httptest.NewRecorder()
	handler.ServeHTTP(rr, req)
	dump, err := httputil.DumpResponse(rr.Result(), true)
	if err != nil {
		panic(err)
	}
	return strings.Replace(string(dump), "\r\n", "\n", -1)
}
Output:

HTTP/1.1 503 Service Unavailable
Connection: close
Content-Type: application/json; charset=utf-8

{}
Example (Database)
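// Connect to a database/sql database. connectToDatabase is a helper from the
// package's example code (not shown here), assumed to return a *sql.DB.
database := connectToDatabase()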

// Create a Handler that we can use to register liveness and readiness checks.
health := handlers.NewHandler()

// Add a readiness check so we don't receive requests unless we can reach
// the database in under 1 second.
err := health.AddReadinessCheck("database", checks.DatabaseSelectCheck(database, 1*time.Second))
if err != nil {
	panic("`health.AddReadinessCheck()` failed")
}

// Serve http://0.0.0.0:8080/live and http://0.0.0.0:8080/ready endpoints.
// go http.ListenAndServe("0.0.0.0:8080", health)

// Make a request to the readiness endpoint with ?full=1 and print the response.
// dumpRequest is the same helper defined in the other examples.
fmt.Print(dumpRequest(health, "GET", "/ready?full=1"))
Output:

HTTP/1.1 200 OK
Connection: close
Content-Type: application/json; charset=utf-8

{
    "database": "OK"
}
Example (Metrics)
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"strings"

	"github.com/GlobalWebIndex/healthcheck/handlers"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Create a new Prometheus registry (you'd likely already have one of these).
	registry := prometheus.NewRegistry()

	// Create a metrics-exposing Handler for the Prometheus registry
	// The healthcheck related metrics will be prefixed with the provided namespace
	health := handlers.NewMetricsHandler(registry, "example")

	// Add a simple readiness check that always fails.
	err := health.AddReadinessCheck("failing-check", func() error {
		return fmt.Errorf("example failure")
	})
	if err != nil {
		panic("`health.AddReadinessCheck()` failed")
	}

	// Add a liveness check that always succeeds
	err = health.AddLivenessCheck("successful-check", func() error {
		return nil
	})
	if err != nil {
		panic("`health.AddLivenessCheck()` failed")
	}

	// Create an "admin" listener on 0.0.0.0:9402
	adminMux := http.NewServeMux()
	// go http.ListenAndServe("0.0.0.0:9402", adminMux)

	// Expose prometheus metrics on /metrics
	adminMux.Handle("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))

	// Expose a liveness check on /live
	adminMux.HandleFunc("/live", health.LiveEndpoint)

	// Expose a readiness check on /ready
	adminMux.HandleFunc("/ready", health.ReadyEndpoint)

	// Make a request to the metrics endpoint and print the response.
	fmt.Println(dumpRequest(adminMux, "GET", "/metrics"))

}

func dumpRequest(handler http.Handler, method string, path string) string {
	req, err := http.NewRequest(method, path, nil)
	if err != nil {
		panic(err)
	}
	rr := httptest.NewRecorder()
	handler.ServeHTTP(rr, req)
	dump, err := httputil.DumpResponse(rr.Result(), true)
	if err != nil {
		panic(err)
	}
	return strings.Replace(string(dump), "\r\n", "\n", -1)
}
Output:

HTTP/1.1 200 OK
Connection: close
Content-Type: text/plain; version=0.0.4; charset=utf-8

# HELP example_healthcheck_status Current check status (0 indicates success, 1 indicates failure)
# TYPE example_healthcheck_status gauge
example_healthcheck_status{check="failing-check"} 1
example_healthcheck_status{check="successful-check"} 0
