shepherd

command
v1.2.5
Published: Sep 23, 2024 License: Apache-2.0 Imports: 63 Imported by: 0

README

Shepherd service

The Shepherd service collects Prometheus metrics from a cluster and writes them to InfluxDB.

The service collects cluster-wide as well as per-pod metrics. It creates the clusterstats database in InfluxDB if one doesn't already exist. Cluster metrics are collected and stored in the crm-cluster measurement in the clusterstats database. They include the following:

  • cpu - cluster CPU utilization percentage
  • mem - cluster memory utilization percentage
  • disk - cluster filesystem utilization percentage
  • sendBytes - cluster tx traffic rate averaged over 1 minute
  • recvBytes - cluster rx traffic rate averaged over 1 minute
  • tcpConns - total number of established TCP connections on this cluster
  • tcpRetrans - total number of TCP retransmissions on this cluster
  • udpRecv - total number of rx UDP datagrams on this cluster
  • udpSend - total number of tx UDP datagrams on this cluster
  • udpRecvErr - total number of UDP receive errors on this cluster

In addition to the above values, a cluster tag holding the name of the cluster is added to each measurement.

Per-pod metrics are collected and stored in the crm-appinst measurement in the clusterstats database. The following metrics are collected:

  • cpu - CPU utilization of this pod as a percentage of total available CPU
  • mem - current memory footprint of a given pod in bytes
  • disk - filesystem usage for a given pod
  • sendBytes - tx traffic rate averaged over 1 minute for a given pod
  • recvBytes - rx traffic rate averaged over 1 minute for a given pod

In addition to the above values, cluster, dev, and app tags are added to the measurement to uniquely identify a particular time series.
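
To make the shape of a stored point concrete, here is a minimal Go sketch that renders one crm-appinst point in InfluxDB line protocol. It is not taken from shepherd's source: the linePoint helper, the tag values (cluster1, dev1, app1), and the sample numbers are hypothetical, and shepherd may well write points through a client library instead.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// tagEscaper escapes the characters that line protocol treats as
// delimiters inside tag keys, tag values, and field keys.
var tagEscaper = strings.NewReplacer(",", `\,`, "=", `\=`, " ", `\ `)

// sortedKeys returns the map keys in sorted order so output is deterministic.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// linePoint renders one point as:
//   measurement,tag=value,... field=value,... timestamp
// Hypothetical helper; the measurement name is written unescaped, so it
// must not contain commas or spaces.
func linePoint(measurement string, tags map[string]string, fields map[string]float64, unixNano int64) string {
	var b strings.Builder
	b.WriteString(measurement)
	for _, k := range sortedKeys(tags) {
		b.WriteString("," + tagEscaper.Replace(k) + "=" + tagEscaper.Replace(tags[k]))
	}
	sep := " "
	for _, k := range sortedKeys(fields) {
		b.WriteString(sep + tagEscaper.Replace(k) + "=" + strconv.FormatFloat(fields[k], 'f', -1, 64))
		sep = ","
	}
	b.WriteString(" " + strconv.FormatInt(unixNano, 10))
	return b.String()
}

func main() {
	// Illustrative values only; real tags come from the cluster, developer, and app names.
	tags := map[string]string{"cluster": "cluster1", "dev": "dev1", "app": "app1"}
	fields := map[string]float64{"cpu": 12.5, "mem": 1048576}
	fmt.Println(linePoint("crm-appinst", tags, fields, 1))
}
```

The three tags together select one time series, which is why the per-pod measurement carries all of them while the cluster measurement needs only the cluster tag.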

The metrics above are collected at a fixed, configurable interval by running queries against the Prometheus instance running in the cluster. See the Usage section for additional details on how to configure the interval, the InfluxDB address, and the Prometheus address.

Usage

This service is meant to run as a process (similar to crm) and can be started locally. Its command-line options are:

$ shepherd -h
Usage of shepherd:
  -cloudletKey string
    	Json or Yaml formatted cloudletKey for the cloudlet in which this CRM is instantiated; e.g. '{"operator_key":{"name":"DMUUS"},"name":"tmocloud1"}'
  -d string
    	comma separated list of [etcd api notify dmedb dmereq locapi mexos metrics upgrade]
  -influxdb string
    	InfluxDB address to export to (default "http://0.0.0.0:8086")
  -interval duration
    	Metrics collection interval (default 15s)
  -notifyAddrs string
    	CRM notify listener addresses (default "127.0.0.1:51001")
  -physicalName string
    	Physical infrastructure cloudlet name, defaults to cloudlet name in cloudletKey
  -platform string
    	Platform type of Cloudlet
  -tls string
    	server9 tls cert file.  Keyfile and CA file mex-ca.crt must be in same directory
  -vaultAddr string
    	Address to vault

Docker Image

Not currently available; coming soon.

TODO

  1. Find a better way to organize the metrics sent to InfluxDB. The current layout is too rigid to support configurable metrics, and adding a new metric later would be a hassle.
  2. Register shepherd with the country controller so that metrics can be sent through the notify framework and the controller, rather than shepherd, writes to InfluxDB.
  3. Azure support
