quarantine

package
Version: v0.8.2
Warning

This package is not in the latest version of its module.

Published: Jun 13, 2024 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type NodeQuarantiner

type NodeQuarantiner struct {
	// contains filtered or unexported fields
}

NodeQuarantiner determines whether nodes should be quarantined, i.e., removed from consideration when scheduling new jobs, based on the estimated failure probability of the node.

Specifically, a node is quarantined if both of the following are true:
1. The estimated failure probability exceeds failureProbabilityQuarantineThreshold.
2. The failure probability estimate was updated at most failureProbabilityEstimateTimeout ago.
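
The two conditions above can be sketched as a small predicate. This is only an illustration under stated assumptions, not the package's implementation: the estimate type and the shouldQuarantine function are hypothetical names, and the real NodeQuarantiner reads its estimates from a failureestimator.FailureEstimator.

```go
package main

import (
	"fmt"
	"time"
)

// estimate is a hypothetical stand-in for a per-node failure probability
// estimate; it is not a type from the quarantine package.
type estimate struct {
	failureProbability float64
	lastUpdated        time.Time
}

// shouldQuarantine applies the two documented conditions: the estimate must
// exceed the threshold and must have been updated at most timeout ago.
func shouldQuarantine(e estimate, now time.Time, threshold float64, timeout time.Duration) bool {
	exceedsThreshold := e.failureProbability > threshold
	isFresh := now.Sub(e.lastUpdated) <= timeout
	return exceedsThreshold && isFresh
}

func main() {
	// Fixed timestamp and illustrative parameter values (assumptions).
	now := time.Date(2024, time.June, 13, 12, 0, 0, 0, time.UTC)
	threshold, timeout := 0.95, 10*time.Minute

	fresh := estimate{failureProbability: 0.99, lastUpdated: now.Add(-2 * time.Minute)}
	stale := estimate{failureProbability: 0.99, lastUpdated: now.Add(-time.Hour)}
	healthy := estimate{failureProbability: 0.10, lastUpdated: now.Add(-2 * time.Minute)}

	fmt.Println(shouldQuarantine(fresh, now, threshold, timeout))   // prints true
	fmt.Println(shouldQuarantine(stale, now, threshold, timeout))   // prints false
	fmt.Println(shouldQuarantine(healthy, now, threshold, timeout)) // prints false
}
```

Note that in this sketch a stale estimate never quarantines a node, however high its last recorded failure probability was.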

func NewNodeQuarantiner

func NewNodeQuarantiner(
	failureProbabilityQuarantineThreshold float64,
	failureProbabilityEstimateTimeout time.Duration,
	failureEstimator *failureestimator.FailureEstimator,
) (*NodeQuarantiner, error)

func (*NodeQuarantiner) Collect

func (nq *NodeQuarantiner) Collect(ch chan<- prometheus.Metric)

func (*NodeQuarantiner) Describe

func (nq *NodeQuarantiner) Describe(ch chan<- *prometheus.Desc)

func (*NodeQuarantiner) IsQuarantined

func (nq *NodeQuarantiner) IsQuarantined(t time.Time, nodeName string) (taint v1.Taint, isQuarantined bool)

IsQuarantined returns true if the node is quarantined, together with a taint expressing the reason why; otherwise it returns false.
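
A caller might use the boolean result to skip quarantined nodes when choosing where to schedule. The following is a minimal sketch that runs without the real package: the taint struct is a simplified stand-in for v1.Taint, and quarantineChecker, fakeChecker, schedulableNodes, and the taint key are all hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// taint is a simplified, hypothetical stand-in for the Kubernetes v1.Taint type.
type taint struct {
	Key    string
	Value  string
	Effect string
}

// quarantineChecker mirrors the shape of NodeQuarantiner.IsQuarantined so the
// sketch does not need to import the real package.
type quarantineChecker interface {
	IsQuarantined(t time.Time, nodeName string) (taint, bool)
}

// fakeChecker quarantines a fixed set of node names; for illustration only.
type fakeChecker map[string]taint

func (f fakeChecker) IsQuarantined(_ time.Time, nodeName string) (taint, bool) {
	t, ok := f[nodeName]
	return t, ok
}

// exampleNow is a fixed timestamp so the example is deterministic.
var exampleNow = time.Date(2024, time.June, 13, 12, 0, 0, 0, time.UTC)

// schedulableNodes returns the nodes that are not quarantined at time now,
// preserving the input order.
func schedulableNodes(c quarantineChecker, now time.Time, nodes []string) []string {
	var out []string
	for _, name := range nodes {
		if _, quarantined := c.IsQuarantined(now, name); quarantined {
			continue
		}
		out = append(out, name)
	}
	return out
}

func main() {
	checker := fakeChecker{
		// The taint key below is made up for this example.
		"node-b": {Key: "example.com/quarantined", Value: "true", Effect: "NoSchedule"},
	}
	nodes := []string{"node-a", "node-b", "node-c"}
	fmt.Println(schedulableNodes(checker, exampleNow, nodes)) // prints [node-a node-c]
}
```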

type QueueQuarantiner

type QueueQuarantiner struct {
	// contains filtered or unexported fields
}

QueueQuarantiner determines whether queues should be quarantined, i.e., whether we should reduce the rate at which we schedule jobs from the queue, based on the estimated failure probability of the queue.

Specifically, each queue has a quarantine factor associated with it equal to:
- Zero, if the failure probability estimate was last updated more than failureProbabilityEstimateTimeout ago.
- The failure probability estimate of the queue multiplied by quarantineFactorMultiplier, otherwise.
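
As a rough illustration of the rule above, the factor can be computed from the estimate, its age, and the two parameters. The quarantineFactor helper below is a hypothetical name, not part of the package, and it applies no clamping; whether the real implementation clamps the product is not stated here.

```go
package main

import (
	"fmt"
	"time"
)

// quarantineFactor is a hypothetical helper modelling the documented rule:
// zero once the estimate is older than timeout, otherwise
// failureProbability * multiplier. No clamping is applied in this sketch.
func quarantineFactor(failureProbability, multiplier float64, age, timeout time.Duration) float64 {
	if age > timeout {
		return 0
	}
	return failureProbability * multiplier
}

func main() {
	timeout := 10 * time.Minute // illustrative value (assumption)

	fmt.Println(quarantineFactor(0.5, 0.5, 2*time.Minute, timeout)) // fresh estimate: prints 0.25
	fmt.Println(quarantineFactor(0.5, 0.5, time.Hour, timeout))     // stale estimate: prints 0
}
```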

func NewQueueQuarantiner

func NewQueueQuarantiner(
	quarantineFactorMultiplier float64,
	failureProbabilityEstimateTimeout time.Duration,
	failureEstimator *failureestimator.FailureEstimator,
) (*QueueQuarantiner, error)

func (*QueueQuarantiner) Collect

func (qq *QueueQuarantiner) Collect(ch chan<- prometheus.Metric)

func (*QueueQuarantiner) Describe

func (qq *QueueQuarantiner) Describe(ch chan<- *prometheus.Desc)

func (*QueueQuarantiner) QuarantineFactor

func (qq *QueueQuarantiner) QuarantineFactor(t time.Time, queueName string) float64

QuarantineFactor returns a value in [0, 1] indicating the extent to which the queue should be quarantined, where 0.0 means not at all and 1.0 means completely.
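
How a caller turns this factor into a reduced scheduling rate is not specified on this page. One plausible reading, shown purely as an assumption, is to scale a base rate linearly by one minus the factor. The effectiveRate function and the base rate value are hypothetical.

```go
package main

import "fmt"

// effectiveRate scales baseRate by (1 - factor). The linear rule is an
// assumption made for this sketch, not documented behaviour of the package.
func effectiveRate(baseRate, factor float64) float64 {
	// Clamp defensively so an out-of-range factor cannot produce a negative
	// rate or a rate above baseRate.
	if factor < 0 {
		factor = 0
	}
	if factor > 1 {
		factor = 1
	}
	return baseRate * (1 - factor)
}

func main() {
	fmt.Println(effectiveRate(100, 0))    // not quarantined: prints 100
	fmt.Println(effectiveRate(100, 0.25)) // partially quarantined: prints 75
	fmt.Println(effectiveRate(100, 1))    // fully quarantined: prints 0
}
```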
