Documentation ¶
Overview ¶
Package collectors provides a number of Prometheus collectors which are capable of retrieving metrics from a Ceph cluster.
Index ¶
Constants ¶
const (
	// CephHealthOK denotes the status of a ceph cluster when healthy.
	CephHealthOK = "HEALTH_OK"

	// CephHealthWarn denotes the status of a ceph cluster when unhealthy but recovering.
	CephHealthWarn = "HEALTH_WARN"

	// CephHealthErr denotes the status of a ceph cluster when unhealthy and usually
	// in need of manual intervention.
	CephHealthErr = "HEALTH_ERR"
)
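To export these textual statuses as a Prometheus gauge, a collector has to map them onto numbers. The 0/1/2 encoding below is an assumption for this sketch (the package does not document its mapping here), and `healthToGauge` is a hypothetical helper, not part of the package API:

```go
package main

import "fmt"

const (
	CephHealthOK   = "HEALTH_OK"
	CephHealthWarn = "HEALTH_WARN"
	CephHealthErr  = "HEALTH_ERR"
)

// healthToGauge converts ceph's textual health status into a numeric
// value suitable for a prometheus gauge. The 0/1/2 encoding is an
// illustrative assumption, not the package's documented mapping.
func healthToGauge(status string) (float64, error) {
	switch status {
	case CephHealthOK:
		return 0, nil
	case CephHealthWarn:
		return 1, nil
	case CephHealthErr:
		return 2, nil
	}
	return 0, fmt.Errorf("unknown health status %q", status)
}

func main() {
	v, _ := healthToGauge(CephHealthWarn)
	fmt.Println(v) // prints 1
}
```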
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ClusterHealthCollector ¶
type ClusterHealthCollector struct {
	// HealthStatus shows the overall health status of a given cluster.
	HealthStatus prometheus.Gauge

	// TotalPGs shows the total no. of PGs the cluster consists of.
	TotalPGs prometheus.Gauge

	// DegradedPGs shows the no. of PGs that have some of the replicas missing.
	DegradedPGs prometheus.Gauge

	// StuckDegradedPGs shows the no. of PGs that have some of the replicas
	// missing, and are stuck in that state.
	StuckDegradedPGs prometheus.Gauge

	// UncleanPGs shows the no. of PGs that do not have all objects in the PG
	// that are supposed to be in it.
	UncleanPGs prometheus.Gauge

	// StuckUncleanPGs shows the no. of PGs that do not have all objects in the PG
	// that are supposed to be in it, and are stuck in that state.
	StuckUncleanPGs prometheus.Gauge

	// UndersizedPGs depicts the no. of PGs that have fewer copies than the
	// configured replication level.
	UndersizedPGs prometheus.Gauge

	// StuckUndersizedPGs depicts the no. of PGs that have fewer copies than the
	// configured replication level, and are stuck in that state.
	StuckUndersizedPGs prometheus.Gauge

	// StalePGs depicts the no. of PGs that are in an unknown state, i.e. the monitors
	// do not know anything about their latest state since their PG mapping was modified.
	StalePGs prometheus.Gauge

	// StuckStalePGs depicts the no. of PGs that are in an unknown state, i.e. the monitors
	// do not know anything about their latest state since their PG mapping was modified,
	// and are stuck in that state.
	StuckStalePGs prometheus.Gauge

	// DegradedObjectsCount gives the no. of RADOS objects that constitute the degraded PGs.
	// This includes object replicas in its count.
	DegradedObjectsCount prometheus.Gauge

	// MisplacedObjectsCount gives the no. of RADOS objects that constitute the misplaced PGs.
	// Misplaced PGs are PGs that are not in the storage locations that they should be in.
	// This is different from degraded PGs, which have fewer copies than they should.
	// This includes object replicas in its count.
	MisplacedObjectsCount prometheus.Gauge

	// OSDsDown shows the no. of OSDs that are in the DOWN state.
	OSDsDown prometheus.Gauge

	// OSDsUp shows the no. of OSDs that are in the UP state and are able to serve requests.
	OSDsUp prometheus.Gauge

	// OSDsIn shows the no. of OSDs that are marked as IN in the cluster.
	OSDsIn prometheus.Gauge

	// OSDsNum shows the total no. of OSDs the cluster has.
	OSDsNum prometheus.Gauge

	// RemappedPGs shows the no. of PGs that are currently remapped and need to be
	// moved to newer OSDs.
	RemappedPGs prometheus.Gauge

	// RecoveryIORate shows the i/o rate at which the cluster is performing its
	// ongoing recovery.
	RecoveryIORate prometheus.Gauge

	// RecoveryIOKeys shows the rate of rados keys recovery.
	RecoveryIOKeys prometheus.Gauge

	// RecoveryIOObjects shows the rate of rados objects being recovered.
	RecoveryIOObjects prometheus.Gauge

	// ClientIORead shows the total client read i/o on the cluster.
	ClientIORead prometheus.Gauge

	// ClientIOWrite shows the total client write i/o on the cluster.
	ClientIOWrite prometheus.Gauge

	// ClientIOOps shows the rate of total operations conducted by all clients
	// on the cluster.
	ClientIOOps prometheus.Gauge

	// ClientIOReadOps shows the rate of total read operations conducted by all
	// clients on the cluster.
	ClientIOReadOps prometheus.Gauge

	// ClientIOWriteOps shows the rate of total write operations conducted by all
	// clients on the cluster.
	ClientIOWriteOps prometheus.Gauge

	// CacheFlushIORate shows the i/o rate at which data is being flushed from the
	// cache pool.
	CacheFlushIORate prometheus.Gauge

	// CacheEvictIORate shows the i/o rate at which data is being evicted from the
	// cache pool.
	CacheEvictIORate prometheus.Gauge

	// CachePromoteIOOps shows the rate of operations promoting objects to the
	// cache pool.
	CachePromoteIOOps prometheus.Gauge
	// contains filtered or unexported fields
}
ClusterHealthCollector collects information about the health of the overall cluster. It surfaces changes in ceph's health parameters, in contrast to the data-usage statistics that ClusterUsageCollector reports.
func NewClusterHealthCollector ¶
func NewClusterHealthCollector(conn Conn) *ClusterHealthCollector
NewClusterHealthCollector creates a new instance of ClusterHealthCollector to collect health metrics on.
func (*ClusterHealthCollector) Collect ¶
func (c *ClusterHealthCollector) Collect(ch chan<- prometheus.Metric)
Collect sends all the collected metrics to the provided prometheus channel. It requires the caller to handle synchronization.
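Since Collect requires the caller to handle synchronization, a caller that may scrape concurrently typically serializes access with its own lock. The sketch below is stdlib-only: `metric`, `healthCollector`, and `lockedCollector` are hypothetical stand-ins for the prometheus types and this package's collector, used only to show the caller-side locking pattern:

```go
package main

import (
	"fmt"
	"sync"
)

// metric stands in for prometheus.Metric in this stdlib-only sketch.
type metric struct {
	name  string
	value float64
}

// healthCollector mimics the shape of a collector whose Collect is NOT
// safe for concurrent use, like ClusterHealthCollector per the docs.
type healthCollector struct {
	degradedPGs float64
}

func (c *healthCollector) Collect(ch chan<- metric) {
	ch <- metric{name: "degraded_pgs", value: c.degradedPGs}
}

// lockedCollector shows the caller-side synchronization the docs ask for:
// a mutex held for the duration of each Collect call.
type lockedCollector struct {
	mu sync.Mutex
	c  *healthCollector
}

func (l *lockedCollector) Collect(ch chan<- metric) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.c.Collect(ch)
}

func main() {
	lc := &lockedCollector{c: &healthCollector{degradedPGs: 3}}
	ch := make(chan metric, 1)
	lc.Collect(ch)
	m := <-ch
	fmt.Printf("%s %v\n", m.name, m.value) // prints "degraded_pgs 3"
}
```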
func (*ClusterHealthCollector) Describe ¶
func (c *ClusterHealthCollector) Describe(ch chan<- *prometheus.Desc)
Describe sends all the descriptions of individual metrics of ClusterHealthCollector to the provided prometheus channel.
type ClusterUsageCollector ¶
type ClusterUsageCollector struct {
	// GlobalCapacity displays the total storage capacity of the cluster. This
	// information is based on the actual no. of objects that are allocated. It
	// does not take overcommitment into consideration.
	GlobalCapacity prometheus.Gauge

	// UsedCapacity shows the storage under use.
	UsedCapacity prometheus.Gauge

	// AvailableCapacity shows the remaining capacity of the cluster that is
	// left unallocated.
	AvailableCapacity prometheus.Gauge

	// Objects shows the total no. of RADOS objects that are currently allocated.
	Objects prometheus.Gauge
	// contains filtered or unexported fields
}
A ClusterUsageCollector is used to gather all the global stats about a given ceph cluster. It is sometimes essential to know how fast the cluster is growing or shrinking as a whole in order to zero in on the cause. The pool specific stats are provided separately.
func NewClusterUsageCollector ¶
func NewClusterUsageCollector(conn Conn) *ClusterUsageCollector
NewClusterUsageCollector creates and returns a reference to a ClusterUsageCollector, internally defining each metric that displays cluster stats.
func (*ClusterUsageCollector) Collect ¶
func (c *ClusterUsageCollector) Collect(ch chan<- prometheus.Metric)
Collect sends the metric values for each metric pertaining to the global cluster usage over to the provided prometheus Metric channel.
func (*ClusterUsageCollector) Describe ¶
func (c *ClusterUsageCollector) Describe(ch chan<- *prometheus.Desc)
Describe sends the descriptors of each metric over to the provided channel. The corresponding metric values are sent separately.
type Conn ¶
type Conn interface {
	ReadDefaultConfigFile() error
	Connect() error
	Shutdown()
	MonCommand([]byte) ([]byte, string, error)
}
The Conn interface declares only the methods of *rados.Conn that are used in this repository. This keeps the rest of the implementation clean: *rados.Conn doesn't need to show up everywhere (it being more of an implementation detail in reality), and mocking becomes easier when unit-testing the collectors.
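The payoff of the narrow interface is that a test can substitute canned bytes for a live cluster. A minimal sketch, with the Conn interface reproduced from the docs and `fakeConn` as a hypothetical stub (the package's own stub is NoopConn, below):

```go
package main

import "fmt"

// Conn mirrors the interface from the docs: the subset of *rados.Conn
// that the collectors actually use.
type Conn interface {
	ReadDefaultConfigFile() error
	Connect() error
	Shutdown()
	MonCommand([]byte) ([]byte, string, error)
}

// fakeConn is a hypothetical stub that satisfies Conn by returning
// canned bytes instead of talking to a real cluster.
type fakeConn struct{ out string }

func (f *fakeConn) ReadDefaultConfigFile() error { return nil }
func (f *fakeConn) Connect() error               { return nil }
func (f *fakeConn) Shutdown()                    {}
func (f *fakeConn) MonCommand(cmd []byte) ([]byte, string, error) {
	return []byte(f.out), "", nil
}

func main() {
	var c Conn = &fakeConn{out: `{"health":{"overall_status":"HEALTH_OK"}}`}
	buf, _, _ := c.MonCommand([]byte(`{"prefix":"status","format":"json"}`))
	fmt.Println(string(buf))
}
```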
type MonitorCollector ¶
type MonitorCollector struct {
	// TotalKBs displays the total storage a given monitor node has.
	TotalKBs *prometheus.GaugeVec

	// UsedKBs depicts how much of the total storage our monitor process
	// has utilized.
	UsedKBs *prometheus.GaugeVec

	// AvailKBs shows the space left unused.
	AvailKBs *prometheus.GaugeVec

	// PercentAvail shows the amount of unused space as a percentage of total
	// space.
	PercentAvail *prometheus.GaugeVec

	// Store exposes information about the internal backing store.
	Store Store

	// ClockSkew shows how far the monitor clocks have skewed from each other. This
	// is an important metric because the functioning of Ceph's paxos depends on
	// the clocks being aligned as close to each other as possible.
	ClockSkew *prometheus.GaugeVec

	// Latency displays the time the monitors take to communicate between themselves.
	Latency *prometheus.GaugeVec

	// NodesinQuorum shows the size of the working monitor quorum. Any change in this
	// metric can imply a significant issue in the cluster if it is not manually changed.
	NodesinQuorum prometheus.Gauge
	// contains filtered or unexported fields
}
MonitorCollector is used to extract stats related to the monitors running within the Ceph cluster. Since we extract information pertaining to each monitor instance, most of these metrics are vector metrics, labeled per monitor.
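A vector metric keeps one sample per label combination, here one per monitor. The stdlib-only sketch below uses a hypothetical `labeledGauge` map as a stand-in for *prometheus.GaugeVec to show why per-monitor stats like ClockSkew need a vector rather than a single gauge:

```go
package main

import "fmt"

// labeledGauge is a stdlib stand-in for *prometheus.GaugeVec: one value
// per monitor name, which is why the docs call these "vector" metrics.
type labeledGauge map[string]float64

func main() {
	clockSkew := labeledGauge{}
	// One sample per monitor instance, keyed by a "monitor" label.
	for _, mon := range []struct {
		name string
		skew float64
	}{{"mon.a", 0.001}, {"mon.b", 0.004}} {
		clockSkew[mon.name] = mon.skew
	}
	fmt.Println(len(clockSkew), clockSkew["mon.b"]) // prints "2 0.004"
}
```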
func NewMonitorCollector ¶
func NewMonitorCollector(conn Conn) *MonitorCollector
NewMonitorCollector creates an instance of the MonitorCollector and instantiates the individual metrics that show information about the monitor processes.
func (*MonitorCollector) Collect ¶
func (m *MonitorCollector) Collect(ch chan<- prometheus.Metric)
Collect extracts the given metrics from the Monitors and sends them to the prometheus channel.
func (*MonitorCollector) Describe ¶
func (m *MonitorCollector) Describe(ch chan<- *prometheus.Desc)
Describe sends the descriptors of each Monitor related metric we have defined to the channel provided.
type NoopConn ¶
type NoopConn struct {
// contains filtered or unexported fields
}
NoopConn is the stub we use for mocking rados Conn. Unit testing each individual collector becomes a lot easier with it.
func NewNoopConn ¶
NewNoopConn returns an instance of *NoopConn. The string that we want returned at the end of the command we issue to ceph should be specified in the only input parameter.
func (*NoopConn) Connect ¶
Connect does not need to return an error. It satisfies rados.Conn's function with the same prototype.
func (*NoopConn) MonCommand ¶
MonCommand returns the output string stored in the NoopConn as is, making it seem like it actually ran a command and produced that string as a result.
func (*NoopConn) ReadDefaultConfigFile ¶
ReadDefaultConfigFile does not need to return an error. It satisfies rados.Conn's function with the same prototype.
type OSDCollector ¶
type OSDCollector struct {
	// CrushWeight is a persistent setting that affects how CRUSH assigns data to OSDs.
	// It displays the CRUSH weight for the OSD.
	CrushWeight *prometheus.GaugeVec

	// Depth displays the OSD's level of hierarchy in the CRUSH map.
	Depth *prometheus.GaugeVec

	// Reweight sets an override weight on the OSD.
	// It displays a value between 0 and 1.
	Reweight *prometheus.GaugeVec

	// Bytes displays the total bytes available in the OSD.
	Bytes *prometheus.GaugeVec

	// UsedBytes displays the total used bytes in the OSD.
	UsedBytes *prometheus.GaugeVec

	// AvailBytes displays the total available bytes in the OSD.
	AvailBytes *prometheus.GaugeVec

	// Utilization displays the current utilization of the OSD.
	Utilization *prometheus.GaugeVec

	// Variance displays the current variance of the OSD from the standard utilization.
	Variance *prometheus.GaugeVec

	// Pgs displays the total no. of placement groups in the OSD.
	// Available in the Ceph Jewel version.
	Pgs *prometheus.GaugeVec

	// CommitLatency displays, in seconds, how long it takes for an operation
	// to be applied to disk.
	CommitLatency *prometheus.GaugeVec

	// ApplyLatency displays, in seconds, how long it takes for an operation
	// to be applied to the backing filesystem.
	ApplyLatency *prometheus.GaugeVec

	// OSDIn displays the In state of the OSD.
	OSDIn *prometheus.GaugeVec

	// OSDUp displays the Up state of the OSD.
	OSDUp *prometheus.GaugeVec

	// TotalBytes displays the total bytes in all OSDs.
	TotalBytes prometheus.Gauge

	// TotalUsedBytes displays the total used bytes in all OSDs.
	TotalUsedBytes prometheus.Gauge

	// TotalAvailBytes displays the total available bytes in all OSDs.
	TotalAvailBytes prometheus.Gauge

	// AverageUtil displays the average utilization across all OSDs.
	AverageUtil prometheus.Gauge
	// contains filtered or unexported fields
}
OSDCollector displays statistics about the OSDs in the ceph cluster. An important aspect of monitoring OSDs is ensuring that when the cluster is up and running, all OSDs that are in the cluster are up and running, too.
func NewOSDCollector ¶
func NewOSDCollector(conn Conn) *OSDCollector
NewOSDCollector creates an instance of the OSDCollector and instantiates the individual metrics that show information about the OSD.
func (*OSDCollector) Collect ¶
func (o *OSDCollector) Collect(ch chan<- prometheus.Metric)
Collect sends all the collected metrics to the provided prometheus channel. It requires the caller to handle synchronization.
func (*OSDCollector) Describe ¶
func (o *OSDCollector) Describe(ch chan<- *prometheus.Desc)
Describe sends the descriptors of each OSDCollector related metrics we have defined to the provided prometheus channel.
type PoolUsageCollector ¶
type PoolUsageCollector struct {
	// UsedBytes tracks the amount of bytes currently allocated for the pool. This
	// does not factor in the overcommitment made for individual images.
	UsedBytes *prometheus.GaugeVec

	// RawUsedBytes tracks the amount of raw bytes currently used for the pool. This
	// factors in the replication factor (size) of the pool.
	RawUsedBytes *prometheus.GaugeVec

	// MaxAvail tracks the amount of bytes currently free for the pool,
	// which depends on the replication settings for the pool in question.
	MaxAvail *prometheus.GaugeVec

	// Objects shows the no. of RADOS objects created within the pool.
	Objects *prometheus.GaugeVec

	// DirtyObjects shows the no. of RADOS dirty objects in a cache-tier pool;
	// this doesn't make sense in a regular pool, see:
	// http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000557.html
	DirtyObjects *prometheus.GaugeVec

	// ReadIO tracks the read IO calls made for the images within each pool.
	ReadIO *prometheus.GaugeVec

	// ReadBytes tracks the read throughput made for the images within each pool.
	ReadBytes *prometheus.GaugeVec

	// WriteIO tracks the write IO calls made for the images within each pool.
	WriteIO *prometheus.GaugeVec

	// WriteBytes tracks the write throughput made for the images within each pool.
	WriteBytes *prometheus.GaugeVec
	// contains filtered or unexported fields
}
PoolUsageCollector displays statistics about each pool we have created in the ceph cluster.
func NewPoolUsageCollector ¶
func NewPoolUsageCollector(conn Conn) *PoolUsageCollector
NewPoolUsageCollector creates a new instance of PoolUsageCollector and returns its reference.
func (*PoolUsageCollector) Collect ¶
func (p *PoolUsageCollector) Collect(ch chan<- prometheus.Metric)
Collect extracts the current values of all the metrics and sends them to the prometheus channel.
func (*PoolUsageCollector) Describe ¶
func (p *PoolUsageCollector) Describe(ch chan<- *prometheus.Desc)
Describe fulfills the prometheus.Collector's interface and sends the descriptors of pool's metrics to the given channel.
type Store ¶
type Store struct {
	// TotalBytes displays the current size of the FileStore.
	TotalBytes *prometheus.GaugeVec

	// SSTBytes shows the amount used by LevelDB's sorted-string tables.
	SSTBytes *prometheus.GaugeVec

	// LogBytes shows the amount used by logs.
	LogBytes *prometheus.GaugeVec

	// MiscBytes shows the amount used by miscellaneous information.
	MiscBytes *prometheus.GaugeVec
}
Store displays information about the Monitor's FileStore, which is responsible for storing all the meta information about the cluster, including monmaps, osdmaps, and pgmaps, along with logs and other data.