spot-interruption-exporter

Command module, version v0.0.0. Published: Sep 29, 2023. License: Apache-2.0. Imports: 14. Imported by: 0.


Publishes a Prometheus counter, interruption_events_total, that increments by 1 each time a spot instance is interrupted.

This metric is useful because it:

  • helps correlate workload issues with spot interruption times

  • can show whether certain instance flavours are more susceptible to interruption

  • can show how much more susceptible single-zone clusters are to interruption

  • can serve as a signal for whether to promote spot instances to other environments

The app could be extended to support other cloud providers, but it currently supports only GCP.

[Image: spot-interruption-exporter-gcp]

Config

The app reads its config file from the path given in the $CONFIG_PATH environment variable. The file has the following structure:

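cloud_provider: gcp  # only gcp is currently supported
gcp:
  project_name: example  # GCP project containing the subscription
  subscription_name: spot-interruption-exporter-subscription  # Pub/Sub subscription the app reads events from
prometheus:
  port: 8090  # port the metrics endpoint listens on
  path: /metrics  # HTTP path the metrics are served on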
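For illustration, the config above maps naturally onto a Go struct. This sketch assumes the yaml struct tags would be consumed by a YAML library such as gopkg.in/yaml.v3 (decoding itself is not shown, to stay stdlib-only); the validation rules, configPath and listenAddr are hypothetical checks, not behaviour documented for this app.

```go
package main

import (
	"fmt"
	"os"
)

// Config mirrors the YAML layout above. The yaml tags assume a YAML
// library (e.g. gopkg.in/yaml.v3) performs the actual decoding.
type Config struct {
	CloudProvider string `yaml:"cloud_provider"`
	GCP           struct {
		ProjectName      string `yaml:"project_name"`
		SubscriptionName string `yaml:"subscription_name"`
	} `yaml:"gcp"`
	Prometheus struct {
		Port int    `yaml:"port"`
		Path string `yaml:"path"`
	} `yaml:"prometheus"`
}

// configPath returns $CONFIG_PATH, or an error if it is unset or empty.
func configPath() (string, error) {
	p, ok := os.LookupEnv("CONFIG_PATH")
	if !ok || p == "" {
		return "", fmt.Errorf("CONFIG_PATH is not set")
	}
	return p, nil
}

// validate applies hypothetical sanity checks to a decoded config.
func (c *Config) validate() error {
	if c.CloudProvider != "gcp" {
		return fmt.Errorf("unsupported cloud_provider %q (only gcp is supported)", c.CloudProvider)
	}
	if c.GCP.ProjectName == "" || c.GCP.SubscriptionName == "" {
		return fmt.Errorf("gcp.project_name and gcp.subscription_name are required")
	}
	if c.Prometheus.Port < 1 || c.Prometheus.Port > 65535 {
		return fmt.Errorf("prometheus.port %d is out of range", c.Prometheus.Port)
	}
	if len(c.Prometheus.Path) == 0 || c.Prometheus.Path[0] != '/' {
		return fmt.Errorf("prometheus.path must start with '/'")
	}
	return nil
}

// listenAddr builds the address a metrics server would bind to.
func (c *Config) listenAddr() string {
	return fmt.Sprintf(":%d", c.Prometheus.Port)
}

func main() {
	var c Config
	c.CloudProvider = "gcp"
	c.GCP.ProjectName = "example"
	c.GCP.SubscriptionName = "spot-interruption-exporter-subscription"
	c.Prometheus.Port = 8090
	c.Prometheus.Path = "/metrics"
	if err := c.validate(); err != nil {
		panic(err)
	}
	fmt.Println(c.listenAddr() + c.Prometheus.Path) // prints ":8090/metrics"
	if p, err := configPath(); err == nil {
		fmt.Println("would read config from", p)
	}
}
```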

Deploying

Infrastructure

You'll need to deploy the required infrastructure before standing up the application.

The GCP infrastructure that the app depends on can be created via

$ terraform -chdir=infra/gcp init
$ terraform -chdir=infra/gcp apply

and can be destroyed via

$ terraform -chdir=infra/gcp destroy

Kubernetes manifests

The kustomize/ directory holds the Kubernetes manifests. You will likely want to overlay the base resources; for an example of how to do this, see kustomize/example-overlay.

Verifying

You can publish a test preemption event to the Pub/Sub topic via

$ gcloud pubsub topics publish spot-interruption-exporter-topic --message '{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "message": "Instance was preempted."
    },
    "authenticationInfo": {
      "principalEmail": "system@google.com"
    },
    "serviceName": "compute.googleapis.com",
    "methodName": "compute.instances.preempted",
    "resourceName": "projects/mock-project/zones/europe-west1-c/instances/mock-instance-spot-3706-5b909138-nr65",
    "request": {
      "@type": "type.googleapis.com/compute.instances.preempted"
    }
  },
  "insertId": "qnwer3e38dfz",
  "resource": {
    "type": "gce_instance",
    "labels": {
      "instance_id": "184448819...",
      "project_id": "mock-project",
      "zone": "europe-west1-c"
    }
  },
  "timestamp": "2023-09-16T10:42:31.325309Z",
  "severity": "INFO",
  "logName": "projects/mock-project/logs/cloudaudit.googleapis.com%2Fsystem_event",
  "operation": {
    "id": "systemevent-1694860946116....",
    "producer": "compute.instances.preempted",
    "first": true,
    "last": true
  },
  "receiveTimestamp": "2023-09-16T10:42:31.782066320Z"
}'
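A consumer only needs a couple of fields from this message to recognise it as a preemption. The sketch below assumes that checking protoPayload.methodName and splitting protoPayload.resourceName is enough; the auditLogEntry type and preemptedInstance function are hypothetical, and the exporter's own filtering may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// auditLogEntry decodes only the fields of the audit-log message that this
// sketch inspects; everything else in the payload is ignored.
type auditLogEntry struct {
	ProtoPayload struct {
		MethodName   string `json:"methodName"`
		ResourceName string `json:"resourceName"`
	} `json:"protoPayload"`
}

// preemptedInstance returns the instance name and zone when data is a
// compute.instances.preempted entry; ok is false for any other method.
func preemptedInstance(data []byte) (string, string, bool, error) {
	var e auditLogEntry
	if err := json.Unmarshal(data, &e); err != nil {
		return "", "", false, err
	}
	if e.ProtoPayload.MethodName != "compute.instances.preempted" {
		return "", "", false, nil
	}
	// Expected shape: projects/<project>/zones/<zone>/instances/<name>
	parts := strings.Split(e.ProtoPayload.ResourceName, "/")
	if len(parts) != 6 || parts[0] != "projects" || parts[2] != "zones" || parts[4] != "instances" {
		return "", "", false, fmt.Errorf("unexpected resourceName %q", e.ProtoPayload.ResourceName)
	}
	return parts[5], parts[3], true, nil
}

func main() {
	msg := []byte(`{"protoPayload": {
		"methodName": "compute.instances.preempted",
		"resourceName": "projects/mock-project/zones/europe-west1-c/instances/mock-instance-spot-3706-5b909138-nr65"}}`)
	name, zone, ok, err := preemptedInstance(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok, zone, name) // prints "true europe-west1-c mock-instance-spot-3706-5b909138-nr65"
}
```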

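After sending a few messages, you should see the counter increase. Query the metrics endpoint on the port set in prometheus.port (the example config above uses 8090, while the command below assumes 8080):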

$ curl localhost:8080/metrics | grep interruption
# HELP interruption_events_total The total number of interruption events for a given cluster
# TYPE interruption_events_total counter
interruption_events_total{kubernetes_cluster="kubernetes-cluster"} 6

Documentation

Overview

Package main listens for interruption events from the specified Notifier and increments a counter each time an event is received.

Directories

Path             Synopsis
internal/cache   Package cache provides a simple TTL-based item cache.
internal/events  Package events defines an interface for receiving interruption events, along with cloud-provider-specific implementations.
