Canary checker is a Kubernetes-native platform for monitoring health across applications and infrastructure using both passive and active (synthetic) mechanisms.
Features
- Batteries Included - 35+ built-in check types
- Kubernetes Native - Health checks (or canaries) are CRDs that reflect health via the status field, making them compatible with GitOps, Flux Health Checks, Argo, Helm, etc.
- Secret Management - Leverage Kubernetes Secrets and ConfigMaps for authentication and connection details
- Prometheus - Prometheus-compatible metrics are exposed at /metrics. A Grafana dashboard is also available.
- Dependency Free - Runs an embedded Postgres instance by default, and can also be configured to use an external database.
- JUnit Export (CI/CD) - Export health check results to JUnit format for integration into CI/CD pipelines
- JUnit Import (k6/newman/puppeteer/etc) - Use any container that creates JUnit test results
- Scriptable - Go templates, JavaScript and CEL can be used to:
  - Evaluate whether a check is passing and which severity to use when failing
  - Extract a user-friendly error message
  - Transform and filter check responses into individual check results
  - Extract custom metrics
- Multi-Modal - While designed as a Kubernetes Operator, canary checker can also run as a CLI and a server without K8s
Getting Started
1. Install canary checker with Helm
helm repo add flanksource https://flanksource.github.io/charts
helm repo update
helm install \
  canary-checker \
  flanksource/canary-checker \
  -n canary-checker \
  --create-namespace \
  --wait
2. Create a new check and save it as canary.yaml
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: http-check
spec:
  interval: 30
  http:
    - name: basic-check
      url: https://httpbin.demo.aws.flanksource.com/status/200
    - name: failing-check
      url: https://httpbin.demo.aws.flanksource.com/status/500
2a. Run the check locally (Optional)
wget https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_linux_amd64 \
-O canary-checker && chmod +x canary-checker
./canary-checker run canary.yaml
3. Apply the check
kubectl apply -f canary.yaml
4. Check the health status
kubectl get canary
NAME         INTERVAL   STATUS   LAST CHECK   UPTIME 1H        LATENCY 1H   LAST TRANSITIONED
http-check   30         Passed   13s          18/18 (100.0%)   480ms        13s
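When this output feeds a script, for example a CI gate, the table can be parsed directly. The following is a minimal Python sketch, assuming the column layout shown above (NAME, INTERVAL and STATUS as the first three whitespace-separated fields). The function name `failing_canaries` is hypothetical and not part of canary checker.

```python
from typing import List


def failing_canaries(table: str) -> List[str]:
    """Return the names of canaries whose STATUS column is not 'Passed'."""
    failing = []
    lines = [line for line in table.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore rows that do not have the assumed columns
        name, _interval, status = fields[:3]
        if status != "Passed":
            failing.append(name)
    return failing


# Sample copied from the output above.
sample = """\
NAME         INTERVAL   STATUS   LAST CHECK   UPTIME 1H        LATENCY 1H   LAST TRANSITIONED
http-check   30         Passed   13s          18/18 (100.0%)   480ms        13s
"""
print(failing_canaries(sample))  # prints []
```

A wrapper could pipe `kubectl get canary` into this function and exit non-zero when the returned list is not empty.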
See fixtures for more examples and docs for more comprehensive documentation.
Use Cases
Synthetic Testing
Run simple HTTP/DNS/ICMP probes or more advanced full test suites using JMeter, K6, Playwright or Postman.
# Run a container that executes a playwright test, and then collect the
# JUnit formatted test results from the /tmp folder
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: playwright-junit
spec:
  interval: 120
  junit:
    - testResults: "/tmp/"
      name: playwright-junit
      spec:
        containers:
          - name: playwright
            image: ghcr.io/flanksource/canary-playwright:latest
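To make the JUnit hand-off more concrete, here is a rough Python sketch of how a JUnit XML report can be reduced to pass/fail counts. It is an illustration only, not canary checker's own parser; the function name `summarize_junit` and the sample report are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def summarize_junit(xml_text: str) -> Dict[str, int]:
    """Count passed, failed and skipped test cases in a JUnit XML report."""
    root = ET.fromstring(xml_text)
    summary = {"passed": 0, "failed": 0, "skipped": 0}
    # iter() walks the whole tree, so both <testsuites> and a bare
    # <testsuite> root are handled.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            summary["failed"] += 1
        elif case.find("skipped") is not None:
            summary["skipped"] += 1
        else:
            summary["passed"] += 1
    return summary


# Hypothetical report, similar to what a test container might write to /tmp/.
sample = """<testsuites>
  <testsuite name="playwright" tests="3">
    <testcase name="loads home page" time="1.2"/>
    <testcase name="logs in" time="2.5"><failure message="timeout"/></testcase>
    <testcase name="legacy flow"><skipped/></testcase>
  </testsuite>
</testsuites>"""
print(summarize_junit(sample))  # prints {'passed': 1, 'failed': 1, 'skipped': 1}
```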
Infrastructure Testing
Verify that infrastructure is fully operational by deploying new pods, spinning up new EC2 instances and pushing/pulling from Docker and Helm repositories.
# Schedule a new pod with an ingress and then time how long it takes to schedule, be ready, respond to an http request and finally be cleaned up.
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: pod-check
spec:
  interval: 30
  pod:
    - name: golang
      spec: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: hello-world-golang
          namespace: default
          labels:
            app: hello-world-golang
        spec:
          containers:
            - name: hello
              image: quay.io/toni0/hello-webserver-golang:latest
      port: 8080
      path: /foo/bar
      scheduleTimeout: 20000
      readyTimeout: 10000
      httpTimeout: 7000
      deleteTimeout: 12000
      ingressTimeout: 10000
      deadline: 60000
      httpRetryInterval: 200
      expectedContent: bar
      expectedHttpStatuses: [200, 201, 202]
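The HTTP phase of a check like this amounts to polling until the pod answers as expected or a timeout expires. The Python sketch below illustrates that idea under stated assumptions: the timeouts are taken to be milliseconds, and the exact retry semantics of canary checker may differ. The function `wait_for_http` and its injectable `fetch`, `clock` and `sleep` parameters are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Collection, Tuple


def wait_for_http(
    fetch: Callable[[], Tuple[int, str]],
    expected_statuses: Collection[int],
    expected_content: str,
    timeout_ms: int,
    retry_interval_ms: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """Poll fetch() until status and body match, or the timeout would be exceeded.

    Returns (passed, attempts).
    """
    deadline = clock() + timeout_ms / 1000.0
    attempts = 0
    while True:
        attempts += 1
        status, body = fetch()
        if status in expected_statuses and expected_content in body:
            return True, attempts
        # Give up if waiting one more interval would pass the deadline.
        if clock() + retry_interval_ms / 1000.0 > deadline:
            return False, attempts
        sleep(retry_interval_ms / 1000.0)


# Simulated endpoint: two 503s while the pod warms up, then a matching 200.
responses = iter([(503, ""), (503, ""), (200, "foo bar")])
now = [0.0]  # fake clock so the example runs instantly
passed, attempts = wait_for_http(
    fetch=lambda: next(responses),
    expected_statuses=[200, 201, 202],
    expected_content="bar",
    timeout_ms=7000,
    retry_interval_ms=200,
    clock=lambda: now[0],
    sleep=lambda seconds: now.__setitem__(0, now[0] + seconds),
)
print(passed, attempts)  # prints True 3
```

Injecting the clock and sleep functions keeps the sketch testable without real network calls or real waiting.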
Backup Checks / Batch File Monitoring
Check that batch file processes are functioning correctly by checking the age and size of files in local file systems, SFTP, SMB, S3 and GCS.
# Checks that a recent DB backup has been uploaded
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: folder-check
spec:
  schedule: 0 22 * * *
  folder:
    - path: s3://database-backups/prod
      name: prod-backup
      maxAge: 1d
      minSize: 10gb
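The age and size rules can be pictured with a short Python sketch. It assumes that only the newest file is evaluated, that `d` means 24 hours, and that `gb` means 1024^3 bytes; canary checker's own parsing and selection rules may differ. The names `check_backup`, `_parse` and the sample file list are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed unit tables, for illustration only.
_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def _parse(value: str, units: Dict[str, int]) -> int:
    """Convert strings such as '1d' or '10gb' into seconds or bytes."""
    match = re.fullmatch(r"(\d+)\s*([a-z]+)", value.strip().lower())
    if not match or match.group(2) not in units:
        raise ValueError(f"unsupported value: {value!r}")
    return int(match.group(1)) * units[match.group(2)]


def check_backup(
    files: Iterable[Tuple[str, float, int]], now: float, max_age: str, min_size: str
) -> List[str]:
    """files holds (name, modified_unix_time, size_bytes) tuples.

    Returns a list of failure messages; an empty list means the check passes.
    """
    files = list(files)
    if not files:
        return ["no files found"]
    name, modified, size = max(files, key=lambda f: f[1])  # newest file
    problems = []
    if now - modified > _parse(max_age, _SECONDS):
        problems.append(f"{name} is older than {max_age}")
    if size < _parse(min_size, _BYTES):
        problems.append(f"{name} is smaller than {min_size}")
    return problems


now = 1_700_000_000.0
files = [
    ("prod-monday.dump", now - 3 * 86400, 12 * 1024**3),
    ("prod-tuesday.dump", now - 3600, 11 * 1024**3),
]
print(check_backup(files, now, max_age="1d", min_size="10gb"))  # prints []
```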
Alert Aggregation
Aggregate alerts and recommendations from Prometheus, AWS CloudWatch, Dynatrace, etc.
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: alertmanager-check
spec:
  schedule: "*/5 * * * *"
  alertmanager:
    - url: alertmanager.monitoring.svc
      alerts:
        - .*
      ignore:
        - KubeScheduler.*
        - Watchdog
      transform:
        # for each alert, transform it into a new check
        javascript: |
          var out = _.map(results, function(r) {
            return {
              name: r.name,
              labels: r.labels,
              icon: 'alert',
              message: r.message,
              description: r.message,
            }
          })
          JSON.stringify(out);
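The `alerts` and `ignore` lists read as include and exclude patterns. The Python sketch below shows one plausible interpretation, assuming each entry is a regular expression that must match the whole alert name; the matching rules canary checker actually applies may differ. The function `select_alerts` and the sample alert names are hypothetical.

```python
import re
from typing import Iterable, List


def select_alerts(
    names: Iterable[str], include: Iterable[str], ignore: Iterable[str]
) -> List[str]:
    """Keep names that fully match an include pattern and no ignore pattern."""
    include = list(include)
    ignore = list(ignore)
    selected = []
    for name in names:
        included = any(re.fullmatch(pattern, name) for pattern in include)
        ignored = any(re.fullmatch(pattern, name) for pattern in ignore)
        if included and not ignored:
            selected.append(name)
    return selected


names = ["KubeSchedulerDown", "Watchdog", "HighErrorRate"]
print(select_alerts(names, include=[".*"], ignore=["KubeScheduler.*", "Watchdog"]))
# prints ['HighErrorRate']
```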
Prometheus Exporter Replacement
Export custom metrics from the result of any check, making it possible to replace various other Prometheus exporters that collect metrics via HTTP, SQL, etc.
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: exchange-rates
spec:
  schedule: "every 1 @hour"
  http:
    - name: exchange-rates
      url: https://api.frankfurter.app/latest?from=USD&to=GBP,EUR,ILS
      metrics:
        - name: exchange_rate
          type: gauge
          value: result.json.rates.GBP
          labels:
            - name: "from"
              value: "USD"
            - name: to
              value: GBP
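As a rough picture of what such a metric definition produces, the Python sketch below resolves the value expression against a response and renders a gauge sample in the Prometheus text exposition format. It substitutes a plain dotted-path lookup for the expression language canary checker uses, does not escape label values, and uses made-up rates; `lookup` and `render_gauge` are hypothetical helper names.

```python
from typing import Any, Dict


def lookup(path: str, data: Dict[str, Any]) -> Any:
    """Resolve a dotted path such as 'result.json.rates.GBP' in nested dicts."""
    current: Any = data
    for key in path.split("."):
        current = current[key]
    return current


def render_gauge(name: str, value: float, labels: Dict[str, str]) -> str:
    """Format one gauge sample in the Prometheus text exposition format."""
    label_text = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"# TYPE {name} gauge\n{name}{{{label_text}}} {value}"


# Made-up response body; real rates come from the checked URL.
response = {"result": {"json": {"rates": {"GBP": 0.79, "EUR": 0.92}}}}
value = lookup("result.json.rates.GBP", response)
print(render_gauge("exchange_rate", value, {"from": "USD", "to": "GBP"}))
# prints:
# # TYPE exchange_rate gauge
# exchange_rate{from="USD",to="GBP"} 0.79
```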
Canary checker is ideal for building platforms: developers can include health checks for their applications in whatever tooling they prefer, with secret management that uses native Kubernetes constructs.
apiVersion: v1
kind: Secret
metadata:
  name: basic-auth
stringData:
  user: john
  pass: doe
---
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: http-basic-auth-configmap
spec:
  http:
    - url: https://httpbin.demo.aws.flanksource.com/basic-auth/john/doe
      username:
        valueFrom:
          secretKeyRef:
            name: basic-auth
            key: user
      password:
        valueFrom:
          secretKeyRef:
            name: basic-auth
            key: pass
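Conceptually, each `secretKeyRef` is resolved to a plain value before the request is sent with HTTP basic authentication. The Python sketch below mimics that flow with an in-memory dictionary standing in for the Kubernetes Secret; it is an illustration, not canary checker's implementation, and `resolve` and `basic_auth_header` are hypothetical names.

```python
import base64
from typing import Any, Dict


def resolve(field: Dict[str, Any], secrets: Dict[str, Dict[str, str]]) -> str:
    """Look up the value referenced by a valueFrom.secretKeyRef block."""
    ref = field["valueFrom"]["secretKeyRef"]
    return secrets[ref["name"]][ref["key"]]


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP 'Authorization: Basic ...' header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Stand-in for the Secret and the check definition shown above.
secrets = {"basic-auth": {"user": "john", "pass": "doe"}}
check = {
    "username": {"valueFrom": {"secretKeyRef": {"name": "basic-auth", "key": "user"}}},
    "password": {"valueFrom": {"secretKeyRef": {"name": "basic-auth", "key": "pass"}}},
}
header = basic_auth_header(
    resolve(check["username"], secrets), resolve(check["password"], secrets)
)
print(header)
```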
Dashboard
Canary checker comes with a built-in dashboard by default.
There is also a Grafana dashboard, or you can build your own using the exposed metrics.
Getting Help
If you have any questions about canary checker:
Your feedback is always welcome!
Check Types
| Protocol | Status | Checks |
| --- | --- | --- |
| HTTP(s) | GA | Response body, headers and duration |
| DNS | GA | Response and duration |
| Ping/ICMP | GA | Duration and packet loss |
| TCP | GA | Port is open and connectable |
| Data Sources | | |
| SQL (MySQL, Postgres, SQL Server) | GA | Ability to log in, results, duration, health exposed via stored procedures |
| LDAP | GA | Ability to log in, response time |
| ElasticSearch / Opensearch | GA | Ability to log in, response time, size of search results |
| Mongo | Beta | Ability to log in, results, duration |
| Redis | GA | Ability to log in, results, duration |
| Prometheus | GA | Ability to log in, results, duration |
| Alerts | | Prometheus |
| Prometheus Alert Manager | GA | Pending and firing alerts |
| AWS CloudWatch Alarms | GA | Pending and firing alarms |
| Dynatrace Problems | Beta | Problems detected |
| DevOps | | |
| Git | GA | Query Git and GitHub repositories via SQL |
| Azure DevOps | Beta | |
| Integration Testing | | |
| JMeter | Beta | Runs and checks the result of a JMeter test |
| JUnit / BYO | Beta | Run a pod that saves JUnit test results |
| K6 | Beta | Runs K6 tests that export JUnit via a container |
| Newman | Beta | Runs Newman / Postman tests that export JUnit via a container |
| Playwright | Beta | Runs Playwright tests that export JUnit via a container |
| File Systems / Batch | | |
| Local Disk / NFS | GA | Check folders for files that are: too few/many, too old/new, too small/large |
| S3 | GA | Check contents of AWS S3 Buckets |
| GCS | GA | Check contents of Google Cloud Storage Buckets |
| SFTP | GA | Check contents of folders over SFTP |
| SMB / CIFS | GA | Check contents of folders over SMB/CIFS |
| Config | | |
| AWS Config | GA | Query AWS config using SQL |
| AWS Config Rule | GA | AWS Config Rules that are firing, Custom AWS Config queries |
| Config DB | GA | Custom config queries for Mission Control Config DB |
| Kubernetes Resources | GA | Kubernetes resources that are missing or are in a non-ready state |
| Backups | | |
| GCP Databases | GA | Backup freshness |
| Restic | Beta | Backup freshness and integrity |
| Infrastructure | | |
| EC2 | GA | Ability to launch new EC2 instances |
| Kubernetes Ingress | GA | Ability to schedule and then route traffic via an ingress to a pod |
| Docker/Containerd | Deprecated | Ability to push and pull containers via docker/containerd |
| Helm | Deprecated | Ability to push and pull helm charts |
| S3 Protocol | GA | Ability to read/write/list objects on an S3 compatible object store |
Contributing
See CONTRIBUTING.md
Thank you to all our contributors!
License
Canary Checker core (the code in this repository) is licensed under Apache 2.0 and accepts contributions via GitHub pull requests after signing a CLA.
The UI (Dashboard) is free to use with canary checker under a license exception of Flanksource UI.