nagios

package module
v0.10.0
Published: Sep 18, 2022 License: MIT Imports: 6 Imported by: 36

README

go-nagios

Shared Golang package for Nagios plugins

Status

Alpha quality.

This codebase is subject to change without notice and may break client code that depends on it. You are encouraged to vendor this package if you find it useful until such time that the API is considered stable.

Overview

This package contains common types and package-level variables used when developing Nagios plugins. The intent is to reduce code duplication between various plugins and help reduce typos associated with literal strings.

Features

  • Nagios state constants
    • state labels (e.g., StateOKLabel)
    • state exit codes (e.g., StateOKExitCode)
  • Nagios CheckOutputEOL constant
    • provides a consistent newline format for both Nagios Core and Nagios XI (and presumably other similar monitoring systems)
  • Nagios ServiceState type
    • simple label and exit code "wrapper"
    • useful in client code as a way to map internal check results to a Nagios service state value
  • Supports "branding" callback function to display application name, version, or other information as a "trailer" for check results provided to Nagios
    • this could be useful for identifying what version of a plugin determined the service or host state to be an issue
  • Panics from client code are captured and reported
    • panics are surfaced as CRITICAL state
    • service output and error details are overridden to make the panic prominent
  • Optional support for emitting performance data generated by plugins
    • NOTE: The implementation of this support is not yet stable and may change in the future. See GH-81 and GH-92 for additional details.
  • Support for collecting multiple errors from client code
  • Support for explicitly omitting Errors section in LongServiceOutput
    • this section is automatically omitted if no errors were recorded (by client code or panic handling code)
  • Support for explicitly omitting Thresholds section in LongServiceOutput
    • this section is automatically omitted if no thresholds were specified by client code
  • Automatically omit LongServiceOutput section if not specified by client code
  • Support for overriding text used for section headers/labels

Changelog

See the CHANGELOG.md file for the changes associated with each release of this application. Changes that have been merged to master but are not yet part of an official release may also be noted in the file under the Unreleased section. A helpful link to the Git commit history since the last official release is also provided for further review.

Examples

Import this library

Add this line to your imports like so:

package main

import (
  "fmt"
  "log"
  "os"

  "github.com/atc0005/go-nagios"
)

and pull in a specific version of this library that you'd like to use.

go get github.com/atc0005/go-nagios@v0.9.0

Alternatively, you can use the latest stable tag available to get started:

go get github.com/atc0005/go-nagios@latest

Use only the provided constants

After you've imported this library, reference the exported data types as you would from any other package. In this example, we reference a specific exit code for the OK state:

fmt.Println("OK: All checks have passed")
os.Exit(nagios.StateOKExitCode)

You can also use the provided state "labels" to avoid using literal string state values (recommended):

fmt.Printf(
    "%s: All checks have passed%s",
    nagios.StateOKLabel,
    nagios.CheckOutputEOL,
)

os.Exit(nagios.StateOKExitCode)

Basic plugin structure

First, create an instance of the ExitState type and immediately defer ReturnCheckResults() so that it runs as the last step in your client code.

Also, avoid calling os.Exit() directly from your code. If you do, this library is unable to function properly; this library expects that it will handle calling os.Exit() with the required exit code (and specifically formatted output).

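Also, if you do not defer ReturnCheckResults() immediately, any deferred functions registered before it will not run, because deferred calls run in last-in, first-out order and ReturnCheckResults() ends the program via os.Exit().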
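The ordering requirement follows from how Go runs deferred calls: the last one registered runs first. The sketch below uses only the standard library and hypothetical names (runPlugin, record) to make that order visible. It deliberately does not call os.Exit(), so every deferred call gets to run and can be observed.

```go
package main

import "fmt"

// runPlugin is a hypothetical stand-in for a plugin's main function. It
// returns the order in which its deferred functions ran.
func runPlugin() (order []string) {
	record := func(name string) { order = append(order, name) }

	// Registered first, so it runs last. In a real plugin this is the
	// position where ReturnCheckResults would be deferred.
	defer record("return check results")

	// Registered later, so these run earlier.
	defer record("close connection")
	defer record("flush logs")

	return order
}

func main() {
	for i, step := range runPlugin() {
		fmt.Printf("%d: %s\n", i+1, step)
	}
}
```

If the first and last defer statements were swapped, the "return check results" step would run first; in a real plugin that step exits the process, so the remaining cleanup would never happen.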

Here we're optimistic and we are going to note that all went well.

package main

import (
  // ...
)

func main() {

    var nagiosExitState = nagios.ExitState{
        LastError:         nil,
        ExitStatusCode:    nagios.StateOKExitCode,
    }

    defer nagiosExitState.ReturnCheckResults()

    // more stuff here

    nagiosExitState.ServiceOutput = certs.OneLineCheckSummary(
        nagios.StateOKLabel,
        certChain,
        certsSummary.Summary,
    )

    nagiosExitState.LongServiceOutput = certs.GenerateCertsReport(
        certChain,
        certsExpireAgeCritical,
        certsExpireAgeWarning,
    )
}

For handling error cases, the approach is roughly the same, only you call return explicitly to end execution of the client code and allow deferred functions to run.
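As a rough illustration of that error path, the sketch below records a failure and then returns instead of exiting. To stay runnable without the library it uses simplified local stand-ins (exitState, report, runCheck, errFetch), all of which are assumptions made for this example and not part of the package API. Unlike the real ReturnCheckResults method, the stand-in report method only prints and does not call os.Exit().

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
)

// Local copies of two documented exit code values.
const (
	stateOKExitCode       = 0
	stateCRITICALExitCode = 2
)

// errFetch is a hypothetical failure used to drive the example.
var errFetch = errors.New("connection refused")

// exitState is a simplified, hypothetical stand-in for nagios.ExitState.
type exitState struct {
	Errors         []error
	ExitStatusCode int
	ServiceOutput  string
}

// report prints the collected results. It stands in for the deferred
// ReturnCheckResults call, minus the os.Exit step.
func (es *exitState) report(w io.Writer) {
	fmt.Fprintln(w, es.ServiceOutput)
	for _, err := range es.Errors {
		fmt.Fprintf(w, "* %v\n", err)
	}
	fmt.Fprintf(w, "exit code: %d\n", es.ExitStatusCode)
}

// runCheck records a failure on es and returns early instead of exiting, so
// that deferred functions in the caller still run.
func runCheck(es *exitState, fetch func() error) {
	if err := fetch(); err != nil {
		es.Errors = append(es.Errors, err)
		es.ExitStatusCode = stateCRITICALExitCode
		es.ServiceOutput = "CRITICAL: failed to fetch certificates"
		return
	}
	es.ExitStatusCode = stateOKExitCode
	es.ServiceOutput = "OK: all checks have passed"
}

func main() {
	es := exitState{ExitStatusCode: stateOKExitCode}
	defer es.report(os.Stdout)

	runCheck(&es, func() error { return errFetch })
}
```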

Use a branding callback

In this example, we'll make a further assumption that you have a config value with an EmitBranding field to indicate whether the user/sysadmin has opted to emit branding information.

package main

import (
  // ...
)

func main() {

    var nagiosExitState = nagios.ExitState{
        LastError:         nil,
        ExitStatusCode:    nagios.StateOKExitCode,
    }

    defer nagiosExitState.ReturnCheckResults()

    // ...

    if config.EmitBranding {
        // If enabled, show application details at end of notification
        nagiosExitState.BrandingCallback = Branding("Notification generated by ")
    }

    // ...

}

The Branding function might look something like this:

// Branding accepts a message and returns a function that concatenates that
// message with version information. This function is intended to be called as
// a final step before application exit after any other output has already
// been emitted.
func Branding(msg string) func() string {
    return func() string {
        return strings.Join([]string{msg, Version()}, "")
    }
}

but you could just as easily create an anonymous function as the callback:

package main

import (
  // ...
)

func main() {

    var nagiosExitState = nagios.ExitState{
        LastError:         nil,
        ExitStatusCode:    nagios.StateOKExitCode,
    }

    defer nagiosExitState.ReturnCheckResults()

    if config.EmitBranding {
        // If enabled, show application details at end of notification
        nagiosExitState.BrandingCallback = func(msg string) func() string {
            return func() string {
                return "Notification generated by " + msg
            }
        }("HelloWorld")
    }

    // ...

}
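The Branding example above calls a Version function that is not shown. A minimal, self-contained sketch of how the pieces could fit together follows. The version variable, the "check-example" name and the body of Version are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// version is a hypothetical placeholder. Plugins commonly overwrite a
// variable like this at build time via -ldflags.
var version = "v0.0.0-dev"

// Version returns the (assumed) application name and version as one string.
func Version() string {
	return "check-example " + version
}

// Branding accepts a message and returns a function that concatenates that
// message with version information, matching the func() string signature
// used for branding callbacks.
func Branding(msg string) func() string {
	return func() string {
		return strings.Join([]string{msg, Version()}, "")
	}
}

func main() {
	callback := Branding("Notification generated by ")
	fmt.Println(callback())
}
```

Because the returned closure is only invoked when the callback runs, Version is evaluated at output time and not when Branding is called.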
Override section header/label text

In this example, we override the default text with values that better fit our use case.

package main

import (
  // ...
)

func main() {

    var nagiosExitState = nagios.ExitState{
        LastError:         nil,
        ExitStatusCode:    nagios.StateOKExitCode,
    }

    defer nagiosExitState.ReturnCheckResults()

    // Override default section headers with our custom values.
    nagiosExitState.SetErrorsLabel("VALIDATION ERRORS")
    nagiosExitState.SetDetailedInfoLabel("VALIDATION CHECKS REPORT")

    // more stuff here

    nagiosExitState.ServiceOutput = certs.OneLineCheckSummary(
        nagios.StateOKLabel,
        certChain,
        certsSummary.Summary,
    )

    nagiosExitState.LongServiceOutput = certs.GenerateCertsReport(
        certChain,
        certsExpireAgeCritical,
        certsExpireAgeWarning,
    )
}

Omit Errors, Thresholds sections

In this example, we hide or omit the Errors and Thresholds sections entirely.

package main

import (
  // ...
)

func main() {

    var nagiosExitState = nagios.ExitState{
        LastError:         nil,
        ExitStatusCode:    nagios.StateOKExitCode,
    }

    defer nagiosExitState.ReturnCheckResults()

    // Hide/Omit these sections from plugin output
    nagiosExitState.HideErrorsSection()
    nagiosExitState.HideThresholdsSection()

    // more stuff here

    nagiosExitState.ServiceOutput = certs.OneLineCheckSummary(
        nagios.StateOKLabel,
        certChain,
        certsSummary.Summary,
    )

    nagiosExitState.LongServiceOutput = certs.GenerateCertsReport(
        certChain,
        certsExpireAgeCritical,
        certsExpireAgeWarning,
    )
}

Collect and emit Performance Data

This example provides plugin runtime via a deferred anonymous function:

package main

import (
  // ...
)

func main() {

    // Start the timer. We'll use this to emit the plugin runtime as a
    // performance data metric.
    pluginStart := time.Now()

    var nagiosExitState = nagios.ExitState{
        LastError:         nil,
        ExitStatusCode:    nagios.StateOKExitCode,
    }

    // defer this from the start so it is the last deferred function to run
    defer nagiosExitState.ReturnCheckResults()

    // Collect last minute details just before ending plugin execution.
    defer func(exitState *nagios.ExitState, start time.Time, logger zerolog.Logger) {

        // Record plugin runtime, emit this metric regardless of exit
        // point/cause.
        runtimeMetric := nagios.PerformanceData{
            Label: "time",
            Value: fmt.Sprintf("%dms", time.Since(start).Milliseconds()),
        }
        if err := exitState.AddPerfData(false, runtimeMetric); err != nil {
            logger.Error().
                Err(err).
                Msg("failed to add time (runtime) performance data metric")
        }

        // Annotate errors (if applicable) with additional context to aid in
        // troubleshooting.
        exitState.Errors = annotateError(logger, exitState.Errors...)
    }(&nagiosExitState, pluginStart, cfg.Log)

    // more stuff here

    // This example also assumes that the check results indicate success. You
    // may opt to use a "IsOK()" style check and a switch statement to
    // have the below execute within a default branch.
    nagiosExitState.ServiceOutput = certs.OneLineCheckSummary(
        nagios.StateOKLabel,
        certChain,
        certsSummary.Summary,
    )

    nagiosExitState.LongServiceOutput = certs.GenerateCertsReport(
        certChain,
        certsExpireAgeCritical,
        certsExpireAgeWarning,
    )

    // ...

}

and this example provides multiple performance data values explicitly:

package main

import (
  // ...
)

func main() {

    // Start the timer. We'll use this to emit the plugin runtime as a
    // performance data metric.
    pluginStart := time.Now()

    var nagiosExitState = nagios.ExitState{
        LastError:         nil,
        ExitStatusCode:    nagios.StateOKExitCode,
    }

    // defer this from the start so it is the last deferred function to run
    defer nagiosExitState.ReturnCheckResults()

    // more stuff here

    pd := []nagios.PerformanceData{
        {
            Label: "time",
            Value: fmt.Sprintf("%dms", time.Since(pluginStart).Milliseconds()),
        },
        {
            Label: "datacenters",
            Value: fmt.Sprintf("%d", len(dcs)),
        },
        {
            Label: "triggered_alarms",
            Value: fmt.Sprintf("%d", len(triggeredAlarms)),
        },
    }

    if err := nagiosExitState.AddPerfData(false, pd...); err != nil {
        log.Error().
            Err(err).
            Msg("failed to add performance data")
    }

    // This example also assumes that the check results indicate success. You
    // may opt to use a "IsOK()" style check and a switch statement to
    // have the below execute within a default branch.
    nagiosExitState.ServiceOutput = certs.OneLineCheckSummary(
        nagios.StateOKLabel,
        certChain,
        certsSummary.Summary,
    )

    nagiosExitState.LongServiceOutput = certs.GenerateCertsReport(
        certChain,
        certsExpireAgeCritical,
        certsExpireAgeWarning,
    )

    // ...

}

This example drops the deferred handling of the plugin runtime metric in order to illustrate how it would look if handled directly alongside other metrics.

See GH-81 and GH-92 for further details.

License

From the LICENSE file:

MIT License

Copyright (c) 2020 Adam Chalkley

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Documentation

Overview

Package nagios provides common types, constants, package-level variables, performance data and methods for use with Nagios plugins.

OVERVIEW

This package provides common functionality for use by plugins used by Nagios (and similar) monitoring systems. The goal is to reduce code duplication for monitoring plugins written in the Go programming language.

PROJECT HOME

See our GitHub repo (https://github.com/atc0005/go-nagios) for the latest code, to file an issue or submit improvements for review and potential inclusion into the project.

FEATURES

  • Nagios state labels (e.g., StateOKLabel), state exit codes (e.g., StateOKExitCode)
  • Nagios ServiceState type useful in client code as a way to map internal check results to a Nagios service state value
  • Nagios CheckOutputEOL constant useful for consistent newline display in results displayed in web UI, email notifications
  • ExitState type with ReturnCheckResults method used to process and return all applicable check results to Nagios for further processing/display
  • Optional support for collecting/emitting performance data generated by plugins
  • Supports "branding" callback function to display application name, version, or other information as a "trailer" for check results provided to Nagios
  • Panics from client code are captured and reported
  • Support for collecting multiple errors from client code
  • Support for explicitly omitting Errors section in LongServiceOutput (automatically omitted if none were recorded)
  • Support for explicitly omitting Thresholds section in LongServiceOutput (automatically omitted if none were recorded)
  • Automatically omit LongServiceOutput section if not specified by client code
  • Support for overriding text used for section headers/labels

HOW TO USE

  • See the code documentation here for specifics
  • See the README for this project for examples

Constants

const (
	StateOKExitCode        int = 0
	StateWARNINGExitCode   int = 1
	StateCRITICALExitCode  int = 2
	StateUNKNOWNExitCode   int = 3
	StateDEPENDENTExitCode int = 4
)

Nagios plugin/service check states. These constants replicate the values from utils.sh which is normally found at one of these two locations, depending on which Linux distribution you're using:

  • /usr/lib/nagios/plugins/utils.sh
  • /usr/local/nagios/libexec/utils.sh

See also http://nagios-plugins.org/doc/guidelines.html

const (
	StateOKLabel        string = "OK"
	StateWARNINGLabel   string = "WARNING"
	StateCRITICALLabel  string = "CRITICAL"
	StateUNKNOWNLabel   string = "UNKNOWN"
	StateDEPENDENTLabel string = "DEPENDENT"
)

Nagios plugin/service check state "labels". These constants are provided as an alternative to using literal state strings throughout client application code.

const CheckOutputEOL string = " \n"

CheckOutputEOL is the newline character(s) used with formatted service and host check output. Based on previous testing, Nagios treats LF newlines (without a leading space) within the `$LONGSERVICEOUTPUT$` macro as literal values instead of parsing them for display purposes.

Using DOS EOL values with fmt.Fprintf() (or fmt.Fprintln()) gives expected formatting results in the Nagios Core web UI, but results in double newlines in Nagios XI output (see GH-109). Using a UNIX EOL with a single leading space appears to give the intended results for both Nagios Core and Nagios XI.
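A small sketch of how this line ending might be applied when assembling multi-line output. The checkOutputEOL constant below is a local copy of the documented value, and joinOutput is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// checkOutputEOL is a local copy of the documented CheckOutputEOL value: a
// single leading space followed by a UNIX newline.
const checkOutputEOL = " \n"

// joinOutput terminates every provided line with checkOutputEOL.
func joinOutput(lines ...string) string {
	var b strings.Builder
	for _, line := range lines {
		b.WriteString(line)
		b.WriteString(checkOutputEOL)
	}
	return b.String()
}

func main() {
	// %q makes the trailing space and newline visible.
	fmt.Printf("%q\n", joinOutput("OK: 3 certs checked", "next expiration in 42 days"))
}
```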

Variables

var (
	// ErrPanicDetected indicates that client code has an unhandled panic and
	// that this library detected it before it could cause the plugin to
	// abort. This error is included in the LongServiceOutput emitted by the
	// plugin.
	ErrPanicDetected = errors.New("plugin crash/panic detected")

	// ErrPerformanceDataMissingLabel indicates that client code did not
	// provide a PerformanceData value in the expected format; the label for
	// the label/value pair is missing.
	ErrPerformanceDataMissingLabel = errors.New("provided performance data missing required label")

	// ErrPerformanceDataMissingValue indicates that client code did not
	// provide a PerformanceData value in the expected format; the value for
	// the label/value pair is missing.
	ErrPerformanceDataMissingValue = errors.New("provided performance data missing required value")

	// ErrNoPerformanceDataProvided indicates that client code did not provide
	// the expected PerformanceData value(s).
	ErrNoPerformanceDataProvided = errors.New("no performance data provided")
)

Sentinel error collection. Exported for potential use by client code to detect & handle specific error scenarios.
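Client code would typically test for a sentinel with errors.Is, which keeps working even when the error has been wrapped with extra context. The sketch below shows the pattern with a local stand-in sentinel and two hypothetical helpers (addPerfData, shouldSkipPerfData); it is an assumption about how client code might react, not a description of the library's internals.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPerformanceDataProvided is a local stand-in for the exported
// ErrNoPerformanceDataProvided sentinel.
var errNoPerformanceDataProvided = errors.New("no performance data provided")

// addPerfData is a hypothetical helper that wraps the sentinel with context
// when it is given nothing to add.
func addPerfData(metrics ...string) error {
	if len(metrics) == 0 {
		return fmt.Errorf("collecting metrics: %w", errNoPerformanceDataProvided)
	}
	return nil
}

// shouldSkipPerfData reports whether err indicates only that no metrics were
// supplied, a condition client code may choose to tolerate.
func shouldSkipPerfData(err error) bool {
	return errors.Is(err, errNoPerformanceDataProvided)
}

func main() {
	if err := addPerfData(); shouldSkipPerfData(err) {
		fmt.Println("nothing to emit; continuing without performance data")
	}
}
```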

Functions

This section is empty.

Types

type ExitCallBackFunc added in v0.4.0

type ExitCallBackFunc func() string

ExitCallBackFunc represents a function that is called as a final step before application termination so that branding information can be emitted for inclusion in the notification. This helps identify which specific application (and its version) is responsible for the notification.

type ExitState added in v0.4.0

type ExitState struct {

	// LastError is the last error encountered which should be reported as
	// part of ending the service check (e.g., "Failed to connect to XYZ to
	// check contents of Inbox").
	//
	// Deprecated: Use Errors field or AddError method instead.
	LastError error

	// Errors is a collection of one or more recorded errors to be displayed
	// in LongServiceOutput as a list when ending the service check.
	Errors []error

	// ExitStatusCode is the exit or exit status code provided to the Nagios
	// instance that calls this service check. These status codes indicate to
	// Nagios which "state" the service is considered to be in. The most
	// common states are OK (0), WARNING (1) and CRITICAL (2).
	ExitStatusCode int

	// ServiceOutput is the first line of text output from the last service
	// check (e.g., "Ping OK").
	ServiceOutput string

	// LongServiceOutput is the full text output (aside from the first line)
	// from the last service check.
	LongServiceOutput string

	// WarningThreshold is the value used to determine when the service check
	// has crossed between an existing state into a WARNING state. This value
	// is used for display purposes.
	WarningThreshold string

	// CriticalThreshold is the value used to determine when the service check
	// has crossed between an existing state into a CRITICAL state. This value
	// is used for display purposes.
	CriticalThreshold string

	// BrandingCallback is a function that is called before application
	// termination to emit branding details at the end of the notification.
	// See also ExitCallBackFunc.
	BrandingCallback ExitCallBackFunc
	// contains filtered or unexported fields
}

ExitState represents the last known execution state of the application, including the most recent error and the final intended plugin state.

func (*ExitState) AddError added in v0.9.0

func (es *ExitState) AddError(err ...error)

AddError appends provided errors to the collection.

func (*ExitState) AddPerfData added in v0.8.0

func (es *ExitState) AddPerfData(skipValidate bool, pd ...PerformanceData) error

AddPerfData appends provided performance data. Validation is skipped if requested, otherwise an error is returned if validation fails. Validation failure results in no performance data being appended.

Client code may wish to disable validation if performing this step directly.

func (*ExitState) HideErrorsSection added in v0.9.0

func (es *ExitState) HideErrorsSection()

HideErrorsSection indicates that client code has opted to hide the errors section, regardless of whether values were previously provided for display.

func (*ExitState) HideThresholdsSection added in v0.9.0

func (es *ExitState) HideThresholdsSection()

HideThresholdsSection indicates that client code has opted to hide the thresholds section, regardless of whether values were previously provided for display.

func (*ExitState) ReturnCheckResults added in v0.4.0

func (es *ExitState) ReturnCheckResults()

ReturnCheckResults is intended to provide a reliable way to return a desired exit code from applications used as Nagios plugins. In most cases, this method should be registered as the first deferred function in client code. See remarks regarding "masking" or "swallowing" application panics.

Since Nagios relies on plugin exit codes to determine success/failure of checks, the approach that is most often used with other languages is to call something like os.Exit() directly and force an early exit of the application with an explicit exit code. Using os.Exit() directly in Go does not run deferred functions. Go-based plugins that do not rely on deferring function calls may be able to use os.Exit(), but introducing new dependencies later could introduce problems if those dependencies rely on deferring functions.

Before calling this method, client code should first set appropriate field values on the receiver. When called, this method will process them and exit with the desired exit code and status output.

To repeat, if scheduled via defer, this method should be registered first. Because this method calls os.Exit to set the intended plugin exit state, no other deferred functions will have an opportunity to run once it has been called; registering it first ensures that it runs last (deferred calls run in first-in, last-out order).

Because this method is (or should be) deferred first within client code, it will run after all other deferred functions. It will also run before a panic in client code forces the application to exit. As already noted, this method calls os.Exit to set the plugin exit state. Because os.Exit forces the application to terminate immediately without running other deferred functions or processing panics, this "masks", "swallows" or "blocks" panics from client code from surfacing. This method checks for unhandled panics and if found, overrides exit state details from client code and surfaces details from the panic instead as a CRITICAL state.
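The panic handling described above can be sketched with a deferred function that calls recover. The version below returns the exit code and error instead of calling os.Exit so that it can be run and inspected. The names runCheck, errPanicDetected and the constants are local stand-ins, and the way the error is wrapped is an assumption, not the library's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of two documented exit code values.
const (
	stateOKExitCode       = 0
	stateCRITICALExitCode = 2
)

// errPanicDetected is a local stand-in for the exported ErrPanicDetected.
var errPanicDetected = errors.New("plugin crash/panic detected")

// runCheck runs check and returns the exit code and error a plugin would
// report. A deferred function recovers from any panic and overrides the
// result with a CRITICAL state.
func runCheck(check func()) (code int, err error) {
	defer func() {
		if r := recover(); r != nil {
			code = stateCRITICALExitCode
			err = fmt.Errorf("%w: %v", errPanicDetected, r)
		}
	}()

	check()

	return stateOKExitCode, nil
}

func main() {
	code, err := runCheck(func() {
		var counts map[string]int
		counts["boom"] = 1 // assignment to a nil map panics
	})
	fmt.Println("exit code:", code)
	fmt.Println("error:", err)
}
```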

func (*ExitState) SetDetailedInfoLabel added in v0.9.0

func (es *ExitState) SetDetailedInfoLabel(newLabel string)

SetDetailedInfoLabel overrides the default detailed info label text.

func (*ExitState) SetErrorsLabel added in v0.9.0

func (es *ExitState) SetErrorsLabel(newLabel string)

SetErrorsLabel overrides the default errors label text.

func (*ExitState) SetOutputTarget added in v0.10.0

func (es *ExitState) SetOutputTarget(w io.Writer)

SetOutputTarget assigns a target for Nagios plugin output. By default output is emitted to os.Stdout.

func (*ExitState) SetThresholdsLabel added in v0.9.0

func (es *ExitState) SetThresholdsLabel(newLabel string)

SetThresholdsLabel overrides the default thresholds label text.

func (*ExitState) SkipOSExit added in v0.10.0

func (es *ExitState) SkipOSExit()

SkipOSExit indicates that the os.Exit(x) step used to signal to Nagios what state plugin execution has completed in (e.g., OK, WARNING, ...) should be skipped. If skipped, a message is logged to os.Stderr in place of the os.Exit(x) call.

Disabling the call to os.Exit is needed by tests to prevent panics in Go 1.16 and newer.

type PerformanceData added in v0.8.0

type PerformanceData struct {

	// Label is the single quoted text string used as a label for a specific
	// performance data point. The label length is arbitrary, but ideally the
	// first 19 characters are unique due to a limitation in RRD. There is
	// also a limitation in the amount of data that NRPE returns to Nagios.
	//
	// The popular convention used by plugin authors (and official
	// documentation) is to use underscores for separating multiple words. For
	// example, 'percent_packet_loss' instead of 'percent packet loss',
	// 'percentPacketLoss' or 'percent-packet-loss'.
	Label string

	// Value is the data point associated with the performance data label.
	//
	// Value is in class [-0-9.] and must be the same UOM as Min and Max UOM.
	// Value may be a literal "U" instead, this would indicate that the actual
	// value couldn't be determined.
	Value string

	// UnitOfMeasurement is an optional unit of measurement (UOM). If
	// provided, consists of a string of zero or more characters. Numbers,
	// semicolons or quotes are not permitted.
	//
	// Examples:
	//
	// 1) no unit specified - assume a number (int or float) of things (eg,
	// users, processes, load averages)
	// 2) s - seconds (also us, ms)
	// 3) % - percentage
	// 4) B - bytes (also KB, MB, TB)
	// 5) c - a continuous counter (such as bytes transmitted on an interface)
	UnitOfMeasurement string

	// Warn is in the range format (see the Section called Threshold and
	// Ranges). Must be the same UOM as Crit. An empty string is permitted.
	//
	// https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT
	Warn string

	// Crit is in the range format (see the Section called Threshold and
	// Ranges). Must be the same UOM as Warn. An empty string is permitted.
	//
	// https://nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT
	Crit string

	// Min is in class [-0-9.] and must be the same UOM as Value and Max. Min
	// is not required if UOM=%. An empty string is permitted.
	Min string

	// Max is in class [-0-9.] and must be the same UOM as Value and Min. Max
	// is not required if UOM=%. An empty string is permitted.
	Max string
}

PerformanceData represents the performance data generated by a Nagios plugin.

Plugin performance data is external data specific to the plugin used to perform the host or service check. Plugin-specific data can include things like percent packet loss, free disk space, processor load, number of current users, etc. - basically any type of metric that the plugin is measuring when it executes.

func (PerformanceData) Validate added in v0.8.0

func (pd PerformanceData) Validate() error

Validate performs basic validation of PerformanceData. An error is returned for any validation failures.
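To make the field descriptions more concrete, the sketch below validates a metric and renders it in the 'label'=value[UOM];[warn];[crit];[min];[max] layout described by the Nagios plugin guidelines. The perfData type mirrors the documented fields but is a local stand-in. The two validation checks are inferred from the sentinel error names, so the library's own Validate method may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// perfData is a local stand-in that mirrors the documented fields of
// PerformanceData so that this sketch runs without the library.
type perfData struct {
	Label, Value, UnitOfMeasurement string
	Warn, Crit, Min, Max            string
}

// Local stand-ins for the exported sentinel errors.
var (
	errMissingLabel = errors.New("provided performance data missing required label")
	errMissingValue = errors.New("provided performance data missing required value")
)

// validate requires a label and a value; both checks are assumptions based
// on the sentinel error names.
func (pd perfData) validate() error {
	switch {
	case strings.TrimSpace(pd.Label) == "":
		return errMissingLabel
	case strings.TrimSpace(pd.Value) == "":
		return errMissingValue
	}
	return nil
}

// String renders the metric as 'label'=value[UOM];[warn];[crit];[min];[max].
func (pd perfData) String() string {
	return fmt.Sprintf("'%s'=%s%s;%s;%s;%s;%s",
		pd.Label, pd.Value, pd.UnitOfMeasurement,
		pd.Warn, pd.Crit, pd.Min, pd.Max)
}

func main() {
	pd := perfData{
		Label:             "time",
		Value:             "874",
		UnitOfMeasurement: "ms",
		Warn:              "1000",
		Crit:              "5000",
	}
	if err := pd.validate(); err != nil {
		fmt.Println("invalid performance data:", err)
		return
	}
	fmt.Println(pd)
}
```

Keeping the unit in UnitOfMeasurement leaves Value purely numeric, which matches the field documentation above.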

type ServiceState added in v0.7.0

type ServiceState struct {

	// Label maps directly to one of the supported Nagios state labels.
	Label string

	// ExitCode is the exit or exit status code associated with a Nagios
	// service check.
	ExitCode int
}

ServiceState represents the status label and exit code for a service check.
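One way client code might use this type is to translate an internal measurement into a label and exit code pair. The sketch below does this for a hypothetical "days until certificate expiration" check. The serviceState type and the constants are local copies of the documented ones, and stateForDaysLeft with its thresholds is invented for illustration.

```go
package main

import "fmt"

// Local copies of the documented exit codes and labels.
const (
	stateOKExitCode       = 0
	stateWARNINGExitCode  = 1
	stateCRITICALExitCode = 2

	stateOKLabel       = "OK"
	stateWARNINGLabel  = "WARNING"
	stateCRITICALLabel = "CRITICAL"
)

// serviceState mirrors the documented ServiceState type.
type serviceState struct {
	Label    string
	ExitCode int
}

// stateForDaysLeft maps a hypothetical internal result (days until a
// certificate expires) to a service state using warn and crit thresholds.
func stateForDaysLeft(daysLeft, warn, crit int) serviceState {
	switch {
	case daysLeft <= crit:
		return serviceState{Label: stateCRITICALLabel, ExitCode: stateCRITICALExitCode}
	case daysLeft <= warn:
		return serviceState{Label: stateWARNINGLabel, ExitCode: stateWARNINGExitCode}
	default:
		return serviceState{Label: stateOKLabel, ExitCode: stateOKExitCode}
	}
}

func main() {
	state := stateForDaysLeft(20, 30, 15)
	fmt.Printf("%s: certificate expires in 20 days (exit code %d)\n", state.Label, state.ExitCode)
}
```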
