NOTE: some of this documentation is outdated because of a major refactoring.
Requires these ENV vars:
lambda-alertmanager?
- Provides simple & reliable alerting for your infrastructure.
- Uses so little resources that it is practically free to run.
- Monitors that your web properties are up, and
receives alerts from Prometheus,
Amazon CloudWatch alarms, alarms via an SNS topic or
any custom HTTP integration (as JSON).
- Runs entirely on AWS' reliable infrastructure (after setup there is nothing for you to manage or fix). The compute part is Lambda,
but we also use DynamoDB + streams (for state), IAM (for sandboxing AlertManager), API Gateway (for inbound HTTPS integrations),
CloudWatch Events (for scheduling) and SNS (inbound alarm receiving, outbound alert delivery).
- Acknowledgement model: each separate alarm is alerted only once until it is acknowledged from the UI,
even if the same alarm is submitted again. For example, Prometheus sends the same alert continuously
until the issue is resolved, but of course you want to receive the alert only once.
- Rate limiting: if shit hits the fan and hundreds of your alarms trigger all at once, you only get alerts
for the first, say, 10 alarms. The rate limit is configurable.
- Supports dead man's switches: a service has to periodically make a check-in. If the
check-ins stop coming, we raise an alert.
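The acknowledgement model, rate limiting and dead man's switch described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's actual implementation (which keeps its state in DynamoDB). The names `AlertStore`, `submit`, `acknowledge` and `overdue_checkins` are hypothetical, and the rate limit is assumed to count unacknowledged alerts, which may differ from how the real limiter counts.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlertStore:
    """Hypothetical in-memory stand-in for the alert state."""

    # Assumed limit, taken from the "say, 10" in the text above.
    max_active: int = 10
    # Alarm key -> details, for alerts that were delivered but not yet acknowledged.
    active: Dict[str, str] = field(default_factory=dict)

    def submit(self, key: str, details: str) -> bool:
        """Return True if an alert should be delivered for this alarm."""
        if key in self.active:
            # Already alerted and not yet acknowledged: stay silent.
            return False
        if len(self.active) >= self.max_active:
            # Rate limit reached: no more alerts until something is acknowledged.
            return False
        self.active[key] = details
        return True

    def acknowledge(self, key: str) -> None:
        """Acknowledging re-arms the alarm, so a later submission alerts again."""
        self.active.pop(key, None)


def overdue_checkins(
    last_checkin: Dict[str, float], now: float, ttl_seconds: float
) -> List[str]:
    """Dead man's switch: names of services whose last check-in is too old."""
    return sorted(
        name for name, ts in last_checkin.items() if now - ts > ttl_seconds
    )
```

With `max_active=2`, submitting the same alarm twice delivers one alert, a third distinct alarm is suppressed by the rate limit, and acknowledging one of the active alarms frees a slot again.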
Can send alerts to you (or many people) via:
- SMS (free: <= 100 alerts/month)
- Email
- Webhook
- Push to mobile device (though SMS is better in cases when you are travelling or otherwise not reachable via mobile data)
- Any combination of these (I use SMS + Email)
- Or anything that SNS supports (the above are just SNS transports)
Can directly monitor:
- HTTP/HTTPS checks via the AlertManager-Canary component (included but optional):
checks that your web properties are up and triggers an alert if not. Can even check all your properties
at 1-minute intervals, and runs efficiently because all the checks are executed in parallel. Tries to minimize
false positives by retrying each failed check once before generating an alarm.
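The "check in parallel, retry once before alarming" idea can be sketched like this. It is an illustration only and not the Canary's actual code: `http_ok`, `check_with_retry` and `failed_targets` are hypothetical names, and treating any 2xx response as "up" is an assumption.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx status (assumed definition of 'up')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Connection errors, timeouts and HTTP error statuses all count as a failed check.
        return False


def check_with_retry(url: str, check: Callable[[str], bool]) -> bool:
    """Run the check, and retry exactly once if the first attempt fails."""
    return check(url) or check(url)


def failed_targets(
    urls: List[str],
    check: Callable[[str], bool] = http_ok,
    max_workers: int = 16,
) -> List[str]:
    """Check all URLs in parallel and return the ones that failed twice."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda u: check_with_retry(u, check), urls))
    return [url for url, ok in zip(urls, results) if not ok]
```

Passing the check function as a parameter keeps the retry and parallelism logic testable without network access. Each URL returned by `failed_targets` would then be submitted as an alarm.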
Integrates with:
- Supports receiving alerts from Prometheus.
- Supports receiving alerts via SNS (= directly plugs into Amazon CloudWatch alarms)
or any other SNS-publishing source. For example, we receive alerts from CloudWatch -> AlertManager if our
queue processors stop processing work.
- Supports receiving alerts over https as JSON.
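As a rough illustration of the custom HTTPS integration, the sketch below builds a JSON POST request for an alert. The `/alerts/ingest` path and the `subject` / `details` field names are assumptions made for this example, and `build_alert_request` is a hypothetical helper. Check the custom integration docs for the actual endpoint and payload before relying on it.

```python
import json
import urllib.request


def build_alert_request(base_url: str, subject: str, details: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request that submits one alert.

    base_url is your API Gateway stage URL. The path and the field names
    below are assumptions, not confirmed by this README.
    """
    body = json.dumps({"subject": subject, "details": details}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/alerts/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually submit the alert, pass the returned request to `urllib.request.urlopen(...)`, using the stage URL of your own API Gateway deployment as `base_url`.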
How to install & other docs
Take note of your AWS region. These docs assume you are in the us-west-2
region.
If not, substitute your region code everywhere in these docs!
Follow these steps precisely, and you've got yourself a working installation:
- Set up SNS topics
- Set up DynamoDB
- Set up IAM
- Set up AlertManager
- Set up API Gateway (also includes: testing that this works)
- (recommended) Set up AlertManager-canary
- (optional) Set up Prometheus integration
- (optional) Set up custom integration
- (optional) Set up CloudWatch integration
Diagram
FAQ
Q: Why use this, uptimerobot.com is free?
A: uptimerobot.com is good, but:
- The free option only supports 5-minute check intervals while lambda-alertmanager supports 1-minute intervals.
- I don't trust its quality for my production usage: I had an issue where a failed check
stayed failed for more than 12 hours after the fix, even though I had manually verified that the
endpoint works. I had to pause and then resume the check, and right after that UptimeRobot
reported the check as OK.
- It does mainly HTTP/HTTPS checks, while lambda-alertmanager integrates with Prometheus, Amazon CloudWatch & others as well.
- Its free SMS messages come with no delivery guarantees, and better delivery requires their non-free "pro SMS".
lambda-alertmanager SMSes are all "pro SMS" and free up to a certain limit.
- lambda-alertmanager is simple, free, open source, runs "on premises" (in your AWS account) and should run forever
(AWS is not going anywhere).
- That being said, lambda-alertmanager is not "dead simple" to set up and you need an AWS account. If your use
case does not require lambda-alertmanager, you should probably choose uptimerobot. :)
Basic support (no guarantees) for issues / feature requests via GitHub issues.
Paid support is available via function61.com/consulting
Contact options (email, Twitter etc.) at function61.com