Documentation ¶
Types ¶
type NodeQuarantiner ¶
type NodeQuarantiner struct {
// contains filtered or unexported fields
}
NodeQuarantiner determines whether nodes should be quarantined, i.e., removed from consideration when scheduling new jobs, based on the estimated failure probability of the node.
Specifically, a node is quarantined if both of the following are true:
1. The estimated failure probability exceeds failureProbabilityQuarantineThreshold.
2. The failure probability estimate was updated at most failureProbabilityEstimateTimeout ago.
func NewNodeQuarantiner ¶
func NewNodeQuarantiner(
	failureProbabilityQuarantineThreshold float64,
	failureProbabilityEstimateTimeout time.Duration,
	failureEstimator *failureestimator.FailureEstimator,
) (*NodeQuarantiner, error)
func (*NodeQuarantiner) Collect ¶
func (nq *NodeQuarantiner) Collect(ch chan<- prometheus.Metric)
func (*NodeQuarantiner) Describe ¶
func (nq *NodeQuarantiner) Describe(ch chan<- *prometheus.Desc)
func (*NodeQuarantiner) IsQuarantined ¶
func (nq *NodeQuarantiner) IsQuarantined(t time.Time, nodeName string) (taint v1.Taint, isQuarantined bool)
IsQuarantined reports whether the node is quarantined at time t. If it is, it returns true together with a taint expressing the reason; otherwise it returns false.
type QueueQuarantiner ¶
type QueueQuarantiner struct {
// contains filtered or unexported fields
}
QueueQuarantiner determines whether queues should be quarantined, i.e., whether we should reduce the rate at which we schedule jobs from the queue, based on the estimated failure probability of the queue.
Specifically, each queue has a quarantine factor associated with it equal to:
- Zero, if the failure probability estimate was last updated more than failureProbabilityEstimateTimeout ago.
- The failure probability estimate of the queue multiplied by quarantineFactorMultiplier, otherwise.
func NewQueueQuarantiner ¶
func NewQueueQuarantiner(
	quarantineFactorMultiplier float64,
	failureProbabilityEstimateTimeout time.Duration,
	failureEstimator *failureestimator.FailureEstimator,
) (*QueueQuarantiner, error)
func (*QueueQuarantiner) Collect ¶
func (qq *QueueQuarantiner) Collect(ch chan<- prometheus.Metric)
func (*QueueQuarantiner) Describe ¶
func (qq *QueueQuarantiner) Describe(ch chan<- *prometheus.Desc)
func (*QueueQuarantiner) QuarantineFactor ¶
func (qq *QueueQuarantiner) QuarantineFactor(t time.Time, queueName string) float64
QuarantineFactor returns a value in [0, 1] indicating the extent to which the queue should be quarantined, where 0.0 means not at all and 1.0 means completely.