notmain

package
v0.0.0-...-d6e163c Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 20, 2024 License: MPL-2.0 Imports: 5 Imported by: 0

README

boulder-observer

A modular configuration driven approach to black box monitoring with Prometheus.

Usage

Options
$ ./boulder-observer -help
  -config string
        Path to boulder-observer configuration file (default "config.yml")
Starting the boulder-observer daemon
$ ./boulder-observer -config test/config-next/observer.yml
I152525 boulder-observer _KzylQI Versions: main=(Unspecified Unspecified) Golang=(go1.16.2) BuildHost=(Unspecified)
I152525 boulder-observer q_D84gk Initializing boulder-observer daemon from config: test/config-next/observer.yml
I152525 boulder-observer 7aq68AQ all monitors passed validation
I152527 boulder-observer yaefiAw kind=[HTTP] success=[true] duration=[0.130097] name=[https://letsencrypt.org-[200]]
I152527 boulder-observer 65CuDAA kind=[HTTP] success=[true] duration=[0.148633] name=[http://letsencrypt.org/foo-[200 404]]
I152530 boulder-observer idi4rwE kind=[DNS] success=[false] duration=[0.000093] name=[[2606:4700:4700::1111]:53-udp-A-google.com-recurse]
I152530 boulder-observer prOnrw8 kind=[DNS] success=[false] duration=[0.000242] name=[[2606:4700:4700::1111]:53-tcp-A-google.com-recurse]
I152530 boulder-observer 6uXugQw kind=[DNS] success=[true] duration=[0.022962] name=[1.1.1.1:53-udp-A-google.com-recurse]
I152530 boulder-observer to7h-wo kind=[DNS] success=[true] duration=[0.029860] name=[owen.ns.cloudflare.com:53-udp-A-letsencrypt.org-no-recurse]
I152530 boulder-observer ovDorAY kind=[DNS] success=[true] duration=[0.033820] name=[owen.ns.cloudflare.com:53-tcp-A-letsencrypt.org-no-recurse]
...

Configuration

Configuration is provided via a YAML file.

Root
Schema

debugaddr: The Prometheus scrape port prefixed with a single colon (e.g. :8040).

buckets: List of floats representing Prometheus histogram buckets (e.g [.001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10])

syslog: Map of log levels, see schema below.

  • stdoutlevel: Log level for stdout, see legend below.
  • sysloglevel:Log level for stdout, see legend below.

0: EMERG 1: ALERT 2: CRIT 3: ERR 4: WARN 5: NOTICE 6: INFO 7: DEBUG

monitors: List of monitors, see monitors for schema.

Example
debugaddr: :8040
buckets: [.001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10]
syslog:
  stdoutlevel: 6
  sysloglevel: 6
  -
    ...
Monitors
Schema

period: Interval between probing attempts (e.g. 1s 1m 1h).

kind: Kind of prober to use, see probers for schema.

settings: Map of prober settings, see probers for schema.

Example
monitors:
  - 
    period: 5s
    kind: DNS
    settings:
        ...
Probers
DNS
Schema

protocol: Protocol to use, options are: udp or tcp.

server: Hostname, IPv4 address, or IPv6 address surrounded with brackets + port of the DNS server to send the query to (e.g. example.com:53, 1.1.1.1:53, or [2606:4700:4700::1111]:53).

recurse: Bool indicating if recursive resolution is desired.

query_name: Name to query (e.g. example.com).

query_type: Record type to query, options are: A, AAAA, TXT, or CAA.

Example
monitors:
  - 
    period: 5s
    kind: DNS
    settings:
      protocol: tcp
      server: [2606:4700:4700::1111]:53
      recurse: false
      query_name: letsencrypt.org
      query_type: A
HTTP
Schema

url: Scheme + Hostname to send a request to (e.g. https://example.com).

rcodes: List of expected HTTP response codes.

useragent: String to set HTTP header User-Agent. If no useragent string is provided it will default to letsencrypt/boulder-observer-http-client.

Example
monitors:
  - 
    period: 2s
    kind: HTTP
    settings:
      url: http://letsencrypt.org/FOO
      rcodes: [200, 404]
      useragent: letsencrypt/boulder-observer-http-client
CRL
Schema

url: Scheme + Hostname to grab the CRL from (e.g. http://x1.c.lencr.org/).

Example
monitors:
  - 
    period: 1h
    kind: CRL
    settings:
      url: http://x1.c.lencr.org/
TLS
Schema

hostname: Hostname to run TLS check on (e.g. valid-isrgrootx1.letsencrypt.org).

rootOrg: Organization to check against the root certificate Organization (e.g. Internet Security Research Group).

rootCN: Name to check against the root certificate Common Name (e.g. ISRG Root X1). If not provided, root comparison will be skipped.

response: Expected site response; must be one of: valid, revoked or expired.

Example
monitors:
  - 
    period: 1h
    kind: TLS
    settings:
      hostname: valid-isrgrootx1.letsencrypt.org
      rootOrg: "Internet Security Research Group"
      rootCN: "ISRG Root X1"
      response: valid

Metrics

Observer provides the following metrics.

Global Metrics

These metrics will always be available.

obs_monitors

Count of configured monitors.

Labels:

kind: Kind of Prober the monitor is configured to use.

valid: Bool indicating whether settings provided could be validated for the kind of Prober specified.

obs_observations

Labels:

name: Name of the monitor.

kind: Kind of prober the monitor is configured to use.

duration: Duration of the probing in seconds.

success: Bool indicating whether the result of the probe attempt was successful.

Bucketed response times:

This is configurable, see buckets under root/schema.

CRL Metrics

These metrics will be available whenever a valid CRL prober is configured.

obs_crl_this_update

Unix timestamp value (in seconds) of the thisUpdate field for a CRL.

Labels:

url: Url of the CRL

Example Usage:

This is a sample rule that alerts when a CRL has a thisUpdate timestamp in the future, signalling that something may have gone wrong during its creation:

- alert: CRLThisUpdateInFuture
  expr: obs_crl_this_update{url="http://x1.c.lencr.org/"} > time()
  labels:
    severity: critical
  annotations:
    description: 'CRL thisUpdate is in the future'
obs_crl_next_update

Unix timestamp value (in seconds) of the nextUpdate field for a CRL.

Labels:

url: Url of the CRL

Example Usage:

This is a sample rule that alerts when a CRL has a nextUpdate timestamp in the past, signalling that the CRL was not updated on time:

- alert: CRLNextUpdateInPast
  expr: obs_crl_next_update{url="http://x1.c.lencr.org/"} < time()
  labels:
    severity: critical
  annotations:
    description: 'CRL nextUpdate is in the past'

Another potentially useful rule would be to notify when nextUpdate is within X days from the current time, as a reminder that the update is coming up soon.

obs_crl_revoked_cert_count

Count of revoked certificates in a CRL.

Labels:

url: Url of the CRL

TLS Metrics

These metrics will be available whenever a valid TLS prober is configured.

obs_tls_not_after

Unix timestamp value (in seconds) of the notAfter field for a subscriber certificate.

Labels:

hostname: Hostname of the site of the subscriber certificate

Example Usage:

This is a sample rule that alerts when a site has a notAfter timestamp indicating that the certificate will expire within the next 20 days:

  - alert: CertExpiresSoonWarning
    annotations:
      description: "The certificate at {{ $labels.hostname }} expires within 20 days, on: {{ $value | humanizeTimestamp }}"
    expr: (obs_tls_not_after{hostname=~"^[^e][a-zA-Z]*-isrgrootx[12][.]letsencrypt[.]org"}) <= time() + 1728000
    for: 60m
    labels:
      severity: warning
obs_tls_reason

This is a count that increments by one for each resulting reason of a TSL check. The reason is nil if the TLS Prober returns true and one of the following otherwise: internalError, ocspError, rootDidNotMatch, responseDidNotMatch.

Labels:

hostname: Hostname of the site of the subscriber certificate reason: The reason for TLS Probe returning false, and nil if it returns true

Example Usage:

This is a sample rule that alerts when TLS Prober returns false, providing insight on the reason for failure.

  - alert: TLSCertCheckFailed
    annotations:
      description: "The TLS probe for {{ $labels.hostname }} failed for reason: {{ $labels.reason }}. This potentially violents CP 2.2."
    expr: (rate(obs_observations_count{success="false",name=~"[a-zA-Z]*-isrgrootx[12][.]letsencrypt[.]org"}[5m])) > 0
    for: 5m
    labels:
      severity: critical

Development

Starting Prometheus locally

Please note, this assumes you've installed a local Prometheus binary.

prometheus --config.file=boulder/test/prometheus/prometheus.yml
Viewing metrics locally

When developing with a local Prometheus instance you can use this link to view metrics: link

Documentation

The Go Gopher

There is no documentation for this package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL