The highest tagged major version is v2.

minimal-server-monitoring


v1.1.4 | Published: Mar 23, 2024 | License: MIT

README

minimal-server-monitoring

This tool lets you monitor a typical home server running applications in containers and receive alerts on your smartphone. It is designed to be light and simple (no database, no GUI, a single configuration file).

Features

runs in a container (tested with both docker and podman)
sends notifications to any service supported by shoutrrr
alerts when a container is restarting forever
alerts when a container isn't started
alerts when a target is unreachable (ping)
alerts when available disk space is low
alerts when a systemd service has failed
notifies when a container image is updated (an alternative to watchtower if you run podman with podman-auto-update)

Versioning and packaging

This tool follows semantic versioning.

Pre-built images are available on GitHub Packages:

ghcr.io/mcarbonne/minimal-server-monitoring:main: main branch
ghcr.io/mcarbonne/minimal-server-monitoring:latest: latest tagged version
ghcr.io/mcarbonne/minimal-server-monitoring:x.x.x: one exact release
ghcr.io/mcarbonne/minimal-server-monitoring:x.x: latest patch release of a minor version
ghcr.io/mcarbonne/minimal-server-monitoring:x: latest release of a major version

For automatic updates (watchtower, podman-auto-update...), using the major version tag (e.g. ghcr.io/mcarbonne/minimal-server-monitoring:1) is recommended: you still receive minor and patch releases, but not breaking changes.

Minimal configuration

Bare minimum (container monitoring only, and alerts with shoutrrr)

docker run -e MACHINENAME=$(hostname) -e SHOUTRRR=XXXXXXX -v .../cache.json:/app/cache.json -v /var/run/docker.sock:/var/run/docker.sock:ro \
--name minimal-server-monitoring -d ghcr.io/mcarbonne/minimal-server-monitoring:1

Custom config.json

docker run \
-v .../config.json:/app/config.json:ro \
-v .../cache.json:/app/cache.json \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /run/systemd:/run/systemd:ro \
--name minimal-server-monitoring -d ghcr.io/mcarbonne/minimal-server-monitoring:1

-v .../config.json:/app/config.json:ro: override the default configuration file with your own settings. The default configuration file is available here; see example_config.json for an exhaustive list of available parameters.
-v .../cache.json:/app/cache.json: persist the cache
-v /var/run/docker.sock:/var/run/docker.sock:ro: give access to the host Docker daemon (required by the container provider). If you use podman, mount /run/podman/podman.sock:/var/run/docker.sock:ro instead.
-v /run/systemd:/run/systemd:ro: give access to the host systemd (required by the systemd provider)

Internal

flowchart TD
subgraph Scraping
    Storage
      Sc(Schedule scrapers)
      Sc-..->S1 & S2 & S3
      S1("`**Scraper n°1**
      - provider: container
      - scrape_interval: 15s`")
      S2("`**Scraper n°2**
      - provider: ping
      - scrape_interval: 30s`")
      S3(...)
    S1 & S2 & S3 -->SC
    SC{{Collect ScrapeResult}}
    Storage[(Storage)]
    S1 & S2 & S3<-.->Storage
end

SC--"- states\n- messages"-->AlertCenter

subgraph AlertCenter
    AC{{"Generate notifications"}}
    AC--notifications-->F
    F{{Filtering}}
    F--filtered notifications-->G
    G{{Grouping}}
end
G--filtered and grouped notifications-->Notifier
subgraph Notifier
    C{{Send notifications}}
    N1(Shoutrrr)
    N2(...)
    C-->N1
    C-->N2
end

Scraping

The scheduler runs every configured scraper at its own scrape_interval. Each scraper may emit multiple states and multiple messages. Contrary to some other monitoring tools, decisions (e.g. whether a metric is healthy) are made in the scrapers themselves.

Depending on the provider, several instances of it may be configured.

A State metric combines a metricId, a health state (boolean) and a message. Example: metricId: "container_XXXX_state", isHealthy: false, message: "XXXX isn't running"

A Message metric is the combination of a metricId and a message. Example: metricId: "container_XXXX_updated", message: "container XXXX was updated ...."
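To make the two metric kinds concrete, here is a minimal Go sketch of how they could be modelled. StateMetric, MessageMetric and their field names are illustrative assumptions, not the project's actual types; ScrapeResult only borrows its name from the diagram.

```go
package main

import "fmt"

// StateMetric is a hypothetical representation of a State metric:
// a metricId, a health flag and a human-readable message.
type StateMetric struct {
	MetricID  string
	IsHealthy bool
	Message   string
}

// MessageMetric is a hypothetical representation of a Message metric:
// a metricId and a message, without any health flag.
type MessageMetric struct {
	MetricID string
	Message  string
}

// ScrapeResult groups everything one scrape may emit:
// any number of states and any number of messages.
type ScrapeResult struct {
	States   []StateMetric
	Messages []MessageMetric
}

func main() {
	result := ScrapeResult{
		States: []StateMetric{
			{MetricID: "container_XXXX_state", IsHealthy: false, Message: "XXXX isn't running"},
		},
		Messages: []MessageMetric{
			{MetricID: "container_XXXX_updated", Message: "container XXXX was updated"},
		},
	}
	fmt.Printf("%d state(s), %d message(s)\n", len(result.States), len(result.Messages))
}
```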

Providers can persist data using Storage, a simple key-value database.

The following providers are implemented:

container

no parameters
only one instance allowed
messages (for every running container):
- when a container image is updated
states (for every running container):
- container status (check whether it is started)
- container restart (check whether it is restarting forever)
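The two container states could be derived from the container's reported state roughly as follows. This is a sketch under assumptions: ContainerInfo is a hypothetical, trimmed-down view of what the Docker/Podman API returns (State uses Docker's names such as "running", "restarting", "exited"), and the "container_<name>_restart" metricId is invented for illustration.

```go
package main

import "fmt"

// ContainerInfo is a hypothetical, trimmed-down view of one container.
type ContainerInfo struct {
	Name  string
	State string // Docker state name, e.g. "running", "restarting", "exited"
}

// StateMetric mirrors the State metric described in the Scraping section.
type StateMetric struct {
	MetricID  string
	IsHealthy bool
	Message   string
}

// containerStates derives the two states listed above for one container.
// A single "restarting" observation is not an alert by itself: the
// AlertCenter's unhealthy_threshold decides when a restart loop is reported.
func containerStates(c ContainerInfo) []StateMetric {
	// A restarting container counts as started; the restart state covers it.
	started := c.State == "running" || c.State == "restarting"
	stable := c.State != "restarting"

	startMsg := fmt.Sprintf("%s is running", c.Name)
	if !started {
		startMsg = fmt.Sprintf("%s isn't running", c.Name)
	}
	restartMsg := fmt.Sprintf("%s is not restarting", c.Name)
	if !stable {
		restartMsg = fmt.Sprintf("%s is restarting", c.Name)
	}
	return []StateMetric{
		{MetricID: fmt.Sprintf("container_%s_state", c.Name), IsHealthy: started, Message: startMsg},
		{MetricID: fmt.Sprintf("container_%s_restart", c.Name), IsHealthy: stable, Message: restartMsg},
	}
}

func main() {
	for _, s := range containerStates(ContainerInfo{Name: "web", State: "restarting"}) {
		fmt.Printf("%s healthy=%v (%s)\n", s.MetricID, s.IsHealthy, s.Message)
	}
}
```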

ping

parameter	description	required	default value
targets	list of IP addresses/hostnames to ping	yes	-
retry_count	how many times to retry a failed ping	no	3

provides one state: whether the target is reachable
multiple instances allowed

filesystemusage

parameter	description	required	default value
mountpoints	list of mount points to check	yes	-
threshold_percent	minimum percentage of available disk space; below it, the mountpoint is reported as unhealthy	no	20

provides one state per mountpoint
multiple instances allowed
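A possible way to compute the available-space percentage on Linux or macOS is syscall.Statfs, as in the sketch below. Whether the provider uses Statfs, and whether it compares free or available blocks, is an assumption; using Bavail (blocks available to unprivileged users) means the result can differ slightly from df because of reserved blocks.

```go
package main

import (
	"fmt"
	"syscall"
)

// availablePercent returns the percentage of blocks available to
// unprivileged users on the filesystem mounted at mountpoint.
func availablePercent(mountpoint string) (float64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountpoint, &st); err != nil {
		return 0, err
	}
	if st.Blocks == 0 {
		return 0, fmt.Errorf("%s reports zero blocks", mountpoint)
	}
	return float64(st.Bavail) / float64(st.Blocks) * 100, nil
}

// isHealthy applies threshold_percent: the mountpoint stays healthy while
// the available space is at or above the threshold.
func isHealthy(availPct, thresholdPercent float64) bool {
	return availPct >= thresholdPercent
}

func main() {
	pct, err := availablePercent("/")
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Printf("/: %.1f%% available, healthy=%v\n", pct, isHealthy(pct, 20))
}
```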

systemd

no parameters
only one instance allowed
states (for every service):
- service active state (ActiveState != failed)
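The health rule itself is a one-line comparison on each unit's ActiveState. The sketch below assumes the provider has already obtained a unit-to-ActiveState map from systemd (how it queries systemd through /run/systemd is not described here); serviceStates and the "systemd_<unit>_state" metricId are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// StateMetric mirrors the State metric described in the Scraping section.
type StateMetric struct {
	MetricID  string
	IsHealthy bool
	Message   string
}

// serviceStates returns one state per unit. Only ActiveState "failed" is
// unhealthy; "inactive", "activating" and other values are not alerted on.
func serviceStates(activeStates map[string]string) []StateMetric {
	units := make([]string, 0, len(activeStates))
	for u := range activeStates {
		units = append(units, u)
	}
	sort.Strings(units) // deterministic order across scrapes
	states := make([]StateMetric, 0, len(units))
	for _, u := range units {
		states = append(states, StateMetric{
			MetricID:  fmt.Sprintf("systemd_%s_state", u),
			IsHealthy: activeStates[u] != "failed",
			Message:   fmt.Sprintf("%s is %s", u, activeStates[u]),
		})
	}
	return states
}

func main() {
	units := map[string]string{"nginx.service": "active", "backup.service": "failed"}
	for _, s := range serviceStates(units) {
		fmt.Println(s.MetricID, s.IsHealthy)
	}
}
```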

AlertCenter

The AlertCenter:

generates notifications from scrape results
prevents you from being flooded with notifications (filtering + grouping)

Generate notifications

If a state is marked as failed unhealthy_threshold times in a row, a notification is sent (metric XX failed). If a state is marked as OK healthy_threshold times in a row, a notification is sent (metric XX OK).
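The threshold rule can be sketched as a small per-metric state machine. Assumptions made here and not stated in the README: a metric starts out as healthy, and a notification is sent only when the reported state flips, not again on every later scrape. ThresholdTracker and Observe are hypothetical names.

```go
package main

import "fmt"

// metricTracker holds the counters needed for one metricId.
type metricTracker struct {
	reportedHealthy bool // last state that was notified
	streak          int  // consecutive scrapes disagreeing with reportedHealthy
}

// ThresholdTracker applies unhealthy_threshold and healthy_threshold.
type ThresholdTracker struct {
	unhealthyThreshold int
	healthyThreshold   int
	metrics            map[string]*metricTracker
}

func NewThresholdTracker(unhealthyThreshold, healthyThreshold int) *ThresholdTracker {
	return &ThresholdTracker{
		unhealthyThreshold: unhealthyThreshold,
		healthyThreshold:   healthyThreshold,
		metrics:            map[string]*metricTracker{},
	}
}

// Observe records one scrape of a state and returns the notification text
// when the metric flips, or "" when nothing should be sent.
func (t *ThresholdTracker) Observe(metricID string, isHealthy bool) string {
	m, ok := t.metrics[metricID]
	if !ok {
		m = &metricTracker{reportedHealthy: true} // assumption: start healthy
		t.metrics[metricID] = m
	}
	if isHealthy == m.reportedHealthy {
		m.streak = 0 // streak broken, nothing changes
		return ""
	}
	m.streak++
	threshold := t.unhealthyThreshold
	if isHealthy {
		threshold = t.healthyThreshold
	}
	if m.streak < threshold {
		return ""
	}
	m.reportedHealthy = isHealthy
	m.streak = 0
	if isHealthy {
		return fmt.Sprintf("metric %s OK", metricID)
	}
	return fmt.Sprintf("metric %s failed", metricID)
}

func main() {
	t := NewThresholdTracker(3, 2)
	for i, healthy := range []bool{false, false, false, false, true, true} {
		if n := t.Observe("ping_192.168.1.1", healthy); n != "" {
			fmt.Printf("scrape %d: %s\n", i+1, n)
		}
	}
	// prints:
	// scrape 3: metric ping_192.168.1.1 failed
	// scrape 6: metric ping_192.168.1.1 OK
}
```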

Messages are forwarded as notifications (no processing at this step).

Filtering

Avoid sending too many notifications for a given metricId: each metricId may send at most 5 notifications every 30 minutes.

Grouping

When a notification arrives, wait up to 15 seconds for further notifications so that up to 10 of them are sent together as one group.

Notifier

Send all notifications to all configured notifiers. Multiple instances of each type are allowed.
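Fanning out to every notifier could look like the sketch below, which keeps sending when one backend fails and reports all errors together. The Notifier interface, sendToAll and the example backends are hypothetical; a real shoutrrr-based notifier would wrap shoutrrr's own sending function, which is not shown here. errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Notifier is a hypothetical interface implemented by every backend.
type Notifier interface {
	Send(message string) error
}

// sendToAll delivers the message to every configured notifier, continues
// past failures, and returns all errors joined (nil if none failed).
func sendToAll(notifiers []Notifier, message string) error {
	var errs []error
	for _, n := range notifiers {
		if err := n.Send(message); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// printNotifier is an example backend that writes to stdout.
type printNotifier struct{ prefix string }

func (p printNotifier) Send(message string) error {
	fmt.Println(p.prefix + message)
	return nil
}

// failingNotifier simulates an unreachable backend.
type failingNotifier struct{}

func (failingNotifier) Send(string) error { return errors.New("backend unreachable") }

func main() {
	notifiers := []Notifier{printNotifier{"[phone] "}, failingNotifier{}, printNotifier{"[chat] "}}
	if err := sendToAll(notifiers, "metric XX failed"); err != nil {
		fmt.Println("some notifiers failed:", err)
	}
}
```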

Directories

cmd/minimal-server-monitoring
pkg/alert
pkg/config
pkg/logging
pkg/notifier
pkg/scheduler
pkg/scraping
pkg/scraping/provider
pkg/storage
pkg/utils
