metrics

package
v0.0.0-...-e326408 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 20, 2024 License: Apache-2.0 Imports: 39 Imported by: 1

Documentation

Constants

const (
	MetricOvnkubeNamespace               = "ovnkube"
	MetricOvnkubeSubsystemController     = "controller"
	MetricOvnkubeSubsystemClusterManager = "clustermanager"
	MetricOvnkubeSubsystemNode           = "node"
	MetricOvnNamespace                   = "ovn"
	MetricOvnSubsystemDB                 = "db"
	MetricOvnSubsystemNorthd             = "northd"
	MetricOvnSubsystemController         = "controller"
	MetricOvsNamespace                   = "ovs"
	MetricOvsSubsystemVswitchd           = "vswitchd"
	MetricOvsSubsystemDB                 = "db"
)
const (
	DepthKey                   = "depth"
	AddsKey                    = "adds_total"
	QueueLatencyKey            = "queue_duration_seconds"
	WorkDurationKey            = "work_duration_seconds"
	UnfinishedWorkKey          = "unfinished_work_seconds"
	LongestRunningProcessorKey = "longest_running_processor_seconds"
	RetriesKey                 = "retries_total"
)

Metrics keys used by the workqueue.

Variables

var MetricCNIRequestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemNode,
	Name:      "cni_request_duration_seconds",
	Help:      "The duration of CNI server requests.",
	Buckets:   prometheus.ExponentialBuckets(.1, 2, 15)},

	[]string{"command", "err"},
)

MetricCNIRequestDuration is a Prometheus histogram that tracks the duration of CNI server requests, labeled by command and error.
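The histograms in this package share the bucket layout prometheus.ExponentialBuckets(.1, 2, 15). As a rough illustration of what that layout covers, the sketch below recomputes the bucket upper bounds with the standard library only. The helper exponentialBuckets is a hypothetical stand-in written for this example, not the Prometheus function itself.

```go
package main

import "fmt"

// exponentialBuckets is a stand-in (an assumption of this sketch) that follows
// the documented behaviour of prometheus.ExponentialBuckets: it returns count
// upper bounds, the first being start and each later one factor times the
// previous bound.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start *= factor
	}
	return buckets
}

func main() {
	b := exponentialBuckets(0.1, 2, 15)
	// prints "15 buckets, first 0.1s, last 1638.4s"
	fmt.Printf("%d buckets, first %.1fs, last %.1fs\n", len(b), b[0], b[len(b)-1])
}
```

With these arguments the finite buckets span 0.1 seconds to roughly 27 minutes, so any slower request falls into the implicit +Inf bucket.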

var MetricClusterManagerLeader = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemClusterManager,
	Name:      "leader",
	Help:      "Identifies whether the instance of ovnkube-cluster-manager is a leader(1) or not(0).",
})

MetricClusterManagerLeader identifies whether this instance of ovnkube-cluster-manager is a leader (1) or not (0).

var MetricClusterManagerReadyDuration = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemClusterManager,
	Name:      "ready_duration_seconds",
	Help:      "The duration for the cluster manager to get to ready state",
})
var MetricNodeReadyDuration = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemNode,
	Name:      "ready_duration_seconds",
	Help:      "The duration for the node to get to ready state.",
})
var MetricOVNKubeControllerLeader = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "leader",
	Help:      "Identifies whether the instance of ovnkube-controller is a leader(1) or not(0).",
})

MetricOVNKubeControllerLeader identifies whether this instance of ovnkube-controller is a leader (1) or not (0).

var MetricOVNKubeControllerReadyDuration = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "ready_duration_seconds",
	Help:      "The duration for the ovnkube-controller to get to ready state",
})
var MetricOVNKubeControllerSyncDuration = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "sync_duration_seconds",
	Help:      "The duration to sync and setup all handlers for a given resource"},
	[]string{
		"resource_name",
	})

MetricOVNKubeControllerSyncDuration is the time taken to complete the initial Watch for each resource. The resource name is carried in the resource_name label.

var MetricOvsInterfaceUpWait = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: MetricOvsNamespace,
	Subsystem: MetricOvsSubsystemVswitchd,
	Name:      "interface_up_wait_seconds_total",
	Help: "The total number of seconds that is required to wait for pod " +
		"Open vSwitch interface until its available",
})
var MetricRequeueServiceCount = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "requeue_service_total",
	Help:      "A metric that captures the number of times a service is requeued after failing to sync with OVN"},
)

MetricRequeueServiceCount is the number of times a service has been requeued after failing to sync with OVN.

var MetricResourceAddLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "resource_add_latency_seconds",
	Help:      "The duration to process all handlers for a given resource event - add.",
	Buckets:   prometheus.ExponentialBuckets(.1, 2, 15)},
)

MetricResourceAddLatency is the time taken to process a resource add event by a handler. This measures the latency for all of the handlers for a given resource.

var MetricResourceDeleteLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "resource_delete_latency_seconds",
	Help:      "The duration to process all handlers for a given resource event - delete.",
	Buckets:   prometheus.ExponentialBuckets(.1, 2, 15)},
)

MetricResourceDeleteLatency is the time taken to process a resource delete event by a handler. This measures the latency for all of the handlers for a given resource.

var MetricResourceRetryFailuresCount = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: MetricOvnkubeNamespace,
	Name:      "resource_retry_failures_total",
	Help:      "The total number of times processing a Kubernetes resource reached the maximum retry limit and was no longer processed",
})

MetricResourceRetryFailuresCount is the number of times retrying to reconcile a Kubernetes resource reached the maximum retry limit and will not be retried. This metric doesn't need Subsystem string since it is applicable for both master and node.
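To make the effect of an omitted Subsystem concrete, here is a minimal sketch of how Prometheus composes a fully qualified metric name: the non-empty namespace, subsystem, and name parts are joined with underscores. The helper fqName is a hypothetical standard-library stand-in written for this example; it is meant to behave like prometheus.BuildFQName but is not that function.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName (hypothetical helper) joins the non-empty parts with underscores.
// An empty name yields an empty result.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// No subsystem: prints "ovnkube_resource_retry_failures_total"
	fmt.Println(fqName("ovnkube", "", "resource_retry_failures_total"))
	// With subsystem: prints "ovnkube_controller_resource_update_total"
	fmt.Println(fqName("ovnkube", "controller", "resource_update_total"))
}
```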

var MetricResourceUpdateCount = prometheus.NewCounterVec(prometheus.CounterOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "resource_update_total",
	Help:      "The number of times a given resource event (add, update, or delete) has been handled"},
	[]string{
		"name",
		"event",
	},
)

MetricResourceUpdateCount is the number of times a resource event (add, update, or delete) has been handled. The resource name and the event type are carried in the name and event labels.

var MetricResourceUpdateLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "resource_update_latency_seconds",
	Help:      "The duration to process all handlers for a given resource event - update.",
	Buckets:   prometheus.ExponentialBuckets(.1, 2, 15)},
)

MetricResourceUpdateLatency is the time taken to process a resource update event by a handler. This measures the latency for all of the handlers for a given resource.

var MetricSyncServiceCount = prometheus.NewCounter(prometheus.CounterOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "sync_service_total",
	Help:      "A metric that captures the number of times a service is synced with OVN load balancers"},
)

MetricSyncServiceCount is the number of times a service has been synced with the OVN load balancers.

var MetricSyncServiceLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Namespace: MetricOvnkubeNamespace,
	Subsystem: MetricOvnkubeSubsystemController,
	Name:      "sync_service_latency_seconds",
	Help:      "The latency of syncing a service with the OVN load balancers",
	Buckets:   prometheus.ExponentialBuckets(.1, 2, 15)},
)

MetricSyncServiceLatency is the time taken to sync a service with the OVN load balancers.

Functions

func DecrementANPCount

func DecrementANPCount()

DecrementANPCount decrements the number of Admin Network Policies.

func DecrementBANPCount

func DecrementBANPCount()

DecrementBANPCount decrements the number of Baseline Admin Network Policies.

func DecrementEgressFirewallCount

func DecrementEgressFirewallCount()

DecrementEgressFirewallCount decrements the number of Egress firewalls.

func IncrementANPCount

func IncrementANPCount()

IncrementANPCount increments the number of Admin Network Policies.

func IncrementBANPCount

func IncrementBANPCount()

IncrementBANPCount increments the number of Baseline Admin Network Policies.

func IncrementEgressFirewallCount

func IncrementEgressFirewallCount()

IncrementEgressFirewallCount increments the number of Egress firewalls.

func MonitorIPSec

func MonitorIPSec(ovnNBClient libovsdbclient.Client)

MonitorIPSec will register a metric to determine if IPSec is enabled/disabled. It will also add a handler to NB libovsdb cache to update the IPSec metric. This function should only be called once.

func RecordEgressIPAssign

func RecordEgressIPAssign(duration time.Duration)

RecordEgressIPAssign records how long it took EgressIP to configure OVN.

func RecordEgressIPCount

func RecordEgressIPCount(count float64)

RecordEgressIPCount records the total number of Egress IPs. This total may include multiple Egress IPs per EgressIP CR.

func RecordEgressIPRebalance

func RecordEgressIPRebalance(count int)

RecordEgressIPRebalance records how many EgressIPs had to move to a different egress node.

func RecordEgressIPUnassign

func RecordEgressIPUnassign(duration time.Duration)

RecordEgressIPUnassign records how long it took EgressIP to unconfigure OVN.

func RecordEgressIPUnreachableNode

func RecordEgressIPUnreachableNode()

RecordEgressIPUnreachableNode records how many times EgressIP detected an unusable node.

func RecordEgressRoutingViaHost

func RecordEgressRoutingViaHost()

RecordEgressRoutingViaHost records the egress gateway mode of the cluster. The values are:

	0: shared gateway mode
	1: local gateway mode
	2: invalid mode
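The mapping from gateway mode to metric value can be sketched as a small switch. The mode strings "shared" and "local" and the helper routingViaHostValue are assumptions made for this example; the real function takes no arguments and reads the mode from the package configuration.

```go
package main

import "fmt"

// Hypothetical mode names used only in this sketch.
const (
	modeShared = "shared"
	modeLocal  = "local"
)

// routingViaHostValue (hypothetical helper) returns the gauge value that the
// documentation associates with each gateway mode.
func routingViaHostValue(mode string) float64 {
	switch mode {
	case modeShared:
		return 0
	case modeLocal:
		return 1
	default:
		// Anything else is reported as an invalid mode.
		return 2
	}
}

func main() {
	for _, m := range []string{modeShared, modeLocal, "unknown"} {
		fmt.Printf("%s -> %v\n", m, routingViaHostValue(m))
	}
}
```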

func RecordNetpolEvent

func RecordNetpolEvent(eventName string, duration time.Duration)

func RecordNetpolLocalPodEvent

func RecordNetpolLocalPodEvent(eventName string, duration time.Duration)

func RecordNetpolPeerNamespaceEvent

func RecordNetpolPeerNamespaceEvent(eventName string, duration time.Duration)

func RecordPodCreated

func RecordPodCreated(pod *kapi.Pod, netInfo util.NetInfo)

RecordPodCreated extracts the scheduled timestamp and records how long it took us to notice this and set up the pod's scheduling.

func RecordPodEvent

func RecordPodEvent(eventName string, duration time.Duration)

func RecordPodSelectorAddrSetNamespaceEvent

func RecordPodSelectorAddrSetNamespaceEvent(eventName string, duration time.Duration)

func RecordPodSelectorAddrSetPodEvent

func RecordPodSelectorAddrSetPodEvent(eventName string, duration time.Duration)

func RecordSubnetCount

func RecordSubnetCount(v4SubnetCount, v6SubnetCount float64, networkName string)

RecordSubnetCount records the number of available subnets per configuration for ovn-kubernetes.

func RecordSubnetUsage

func RecordSubnetUsage(v4SubnetsAllocated, v6SubnetsAllocated float64, networkName string)

RecordSubnetUsage records the number of subnets allocated for nodes.

func RegisterClusterManagerBase

func RegisterClusterManagerBase()

RegisterClusterManagerBase registers ovnkube cluster manager base metrics with the Prometheus registry. This function should only be called once.
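Several functions in this package must be called only once. A caller that cannot easily guarantee this structurally could guard the call with sync.Once, as in the sketch below. The function registerBase is a hypothetical stand-in for a register-once function such as RegisterClusterManagerBase, and the counter exists only to make the effect visible.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	registerOnce  sync.Once
	registrations int // counts how often the guarded function actually ran
)

// registerBase is a hypothetical stand-in for a function that must run once.
func registerBase() {
	registrations++
}

// ensureRegistered may be called from any number of code paths; the guarded
// function runs only on the first call.
func ensureRegistered() {
	registerOnce.Do(registerBase)
}

func main() {
	ensureRegistered()
	ensureRegistered()
	fmt.Println(registrations) // prints "1"
}
```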

func RegisterClusterManagerFunctional

func RegisterClusterManagerFunctional()

RegisterClusterManagerFunctional registers a collection of metrics that help us understand ovnkube-cluster-manager functions. Call once after leader election (LE) is won.

func RegisterNodeMetrics

func RegisterNodeMetrics(stopChan <-chan struct{})

func RegisterOVNKubeControllerBase

func RegisterOVNKubeControllerBase()

RegisterOVNKubeControllerBase registers ovnkube controller base metrics with the Prometheus registry. This function should only be called once.

func RegisterOVNKubeControllerFunctional

func RegisterOVNKubeControllerFunctional(stopChan <-chan struct{})

RegisterOVNKubeControllerFunctional registers a collection of metrics that help us understand ovnkube-controller functions. Call once after leader election (LE) is won.

func RegisterOVNKubeControllerPerformance

func RegisterOVNKubeControllerPerformance(nbClient libovsdbclient.Client)

RegisterOVNKubeControllerPerformance registers metrics that help us understand ovnkube-controller performance. Call once after leader election (LE) is won.

func RegisterOvnControllerMetrics

func RegisterOvnControllerMetrics(stopChan <-chan struct{})

func RegisterOvnDBMetrics

func RegisterOvnDBMetrics(clientset kubernetes.Interface, k8sNodeName string, stopChan <-chan struct{})

func RegisterOvnMetrics

func RegisterOvnMetrics(clientset kubernetes.Interface, k8sNodeName string, stopChan <-chan struct{})

func RegisterOvnNorthdMetrics

func RegisterOvnNorthdMetrics(clientset kubernetes.Interface, k8sNodeName string, stopChan <-chan struct{})

func RegisterOvsMetricsWithOvnMetrics

func RegisterOvsMetricsWithOvnMetrics(stopChan <-chan struct{})

func RegisterStandaloneOvsMetrics

func RegisterStandaloneOvsMetrics(stopChan <-chan struct{})

func RunOVNKubeFeatureDBObjectsMetricsUpdater

func RunOVNKubeFeatureDBObjectsMetricsUpdater(ovnNBClient libovsdbclient.Client, controllerName string, tickPeriod time.Duration, stopChan <-chan struct{})

func RunTimestamp

func RunTimestamp(stopChan <-chan struct{}, sbClient, nbClient libovsdbclient.Client)

RunTimestamp adds a goroutine that registers and updates timestamp metrics. This is so we can determine the 'freshness' of the components NB/SB DB and northd. This function must be called only once.
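Many functions in this package take a stop channel and keep updating metrics in the background until it is closed. The sketch below shows that general pattern with a ticker; runPeriodic and demo are hypothetical names, and the real updaters may be structured differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// runPeriodic (hypothetical helper) calls update once per period in its own
// goroutine until stop is closed. The returned channel is closed when the
// goroutine has exited.
func runPeriodic(period time.Duration, stop <-chan struct{}, update func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(period)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				update()
			case <-stop:
				return
			}
		}
	}()
	return done
}

// demo runs the updater for a short while and reports how many updates ran.
func demo() int64 {
	var updates int64
	stop := make(chan struct{})
	done := runPeriodic(time.Millisecond, stop, func() { atomic.AddInt64(&updates, 1) })
	time.Sleep(50 * time.Millisecond)
	close(stop) // signal shutdown
	<-done      // wait for the goroutine to finish
	return atomic.LoadInt64(&updates)
}

func main() {
	fmt.Println("updates:", demo())
}
```

Closing the stop channel rather than sending on it lets one channel shut down any number of such goroutines at once.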

func StartMetricsServer

func StartMetricsServer(bindAddress string, enablePprof bool, certFile string, keyFile string,
	stopChan <-chan struct{}, wg *sync.WaitGroup)

StartMetricsServer runs the Prometheus listener so that OVN K8s metrics can be collected. It puts the endpoint behind TLS if certFile and keyFile are defined.
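The TLS decision described above can be sketched with the standard net/http package: serve over TLS only when both file paths are supplied, otherwise fall back to plain HTTP. The helpers useTLS and serve are hypothetical and only illustrate the branching; they are not the package's implementation.

```go
package main

import (
	"fmt"
	"net/http"
)

// useTLS (hypothetical helper) reports whether both a certificate file and a
// key file were provided.
func useTLS(certFile, keyFile string) bool {
	return certFile != "" && keyFile != ""
}

// serve (hypothetical helper) starts srv with or without TLS depending on the
// supplied file paths. It blocks until the server stops.
func serve(srv *http.Server, certFile, keyFile string) error {
	if useTLS(certFile, keyFile) {
		return srv.ListenAndServeTLS(certFile, keyFile)
	}
	return srv.ListenAndServe()
}

func main() {
	// prints "false true"
	fmt.Println(useTLS("", ""), useTLS("tls.crt", "tls.key"))
}
```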

func StartOVNMetricsServer

func StartOVNMetricsServer(bindAddress, certFile, keyFile string, stopChan <-chan struct{}, wg *sync.WaitGroup)

StartOVNMetricsServer runs the Prometheus listener so that OVN metrics can be collected.

func UpdateEgressFirewallRuleCount

func UpdateEgressFirewallRuleCount(count float64)

UpdateEgressFirewallRuleCount records the number of Egress firewall rules.

Types

type ConfigDurationRecorder

type ConfigDurationRecorder struct {
	// contains filtered or unexported fields
}

func GetConfigDurationRecorder

func GetConfigDurationRecorder() *ConfigDurationRecorder

func (*ConfigDurationRecorder) AddOVN

func (cr *ConfigDurationRecorder) AddOVN(nbClient libovsdbclient.Client, kind, namespace, name string) (
	[]ovsdb.Operation, func(), time.Time, error)

AddOVN adds the OVN config duration to an existing recording that was previously started by calling Start. It returns ovsdb operations that the caller can append to the operations they wish to track. After the operations are successfully transacted to the ovsdb server, the caller must invoke the returned callback function to lock in the request for measurement and reporting. If the callback is not called, no OVN measurement is taken and no metrics are reported.

AddOVN is a no-op if Start was not previously called for the same kind/namespace/name. If AddOVN is called multiple times between Start and End for the same kind/namespace/name, the OVN durations are summed and added to the total. Processing of a given kind/namespace/name is assumed to be sequential.

func (*ConfigDurationRecorder) End

func (cr *ConfigDurationRecorder) End(kind, namespace, name string) time.Time

func (*ConfigDurationRecorder) Run

func (cr *ConfigDurationRecorder) Run(nbClient libovsdbclient.Client, kube kube.Interface, k float64,
	workerLoopPeriod time.Duration, stop <-chan struct{})

Run monitors the config duration for OVN-Kube master to configure k8 kinds. Whether a measurement is allowed depends on the number of k8 nodes, N [1], and on the argument k [2]: there is a probability that 1 out of N*k measurement attempts is allowed. If k=0, all measurements are allowed. workerLoopPeriod determines the period to process and publish metrics.

[1] 1<N<inf, [2] 0<=k<inf
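As a worked illustration of the sampling rule, the sketch below computes the chance that a single measurement attempt is allowed for a given node count N and argument k. The helper allowProbability is hypothetical, and capping the result at 1 when N*k is below 1 is an assumption of this sketch; the recorder's actual sampling code may differ.

```go
package main

import "fmt"

// allowProbability (hypothetical helper) returns the probability that one
// measurement attempt is allowed: 1 when k is 0, otherwise 1/(N*k), capped
// at 1. It assumes nodes >= 1.
func allowProbability(nodes int, k float64) float64 {
	if k == 0 {
		return 1
	}
	p := 1 / (float64(nodes) * k)
	if p > 1 {
		return 1
	}
	return p
}

func main() {
	fmt.Println(allowProbability(100, 0))   // k=0: every attempt is allowed
	fmt.Println(allowProbability(100, 0.5)) // about 1 in 50 attempts
}
```

Because the rate scales with 1/N, larger clusters sample a smaller share of events, which keeps the measurement overhead roughly bounded as the cluster grows.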

func (*ConfigDurationRecorder) Start

func (cr *ConfigDurationRecorder) Start(kind, namespace, name string) (time.Time, bool)

Start allows the caller to attempt measurement of a control plane configuration duration: the metric is the duration between the calls to Start and End. The caller must pass kind, namespace, and name, which are used to determine whether the object is allowed to record. To avoid locking, each goroutine that calls this function can determine for itself whether it is allowed to measure.

There is a mandatory two-step process to complete a measurement:

	Step 1) Call Start when you wish to begin a measurement, ideally when processing for the object starts.
	Step 2) Call End, which completes the measurement.

Optionally, call AddOVN when you are making a transaction to OVN in order to add the OVN duration (the time for OVN to apply the configuration to all nodes) to an existing measurement. This must be called between Start and End.

Not every call to Start results in a measurement; the rate of measurements depends on the number of nodes and on the argument k of Run. Only one measurement for a kind/namespace/name is allowed until the current measurement is ended (via End) and processed. This is guaranteed by workqueues (even with multiple workers) and informer event handlers.

type OVNDBClusterStatus

type OVNDBClusterStatus struct {
	// contains filtered or unexported fields
}

type PodRecorder

type PodRecorder struct {
	// contains filtered or unexported fields
}

func NewPodRecorder

func NewPodRecorder() PodRecorder

func (*PodRecorder) AddLSP

func (pr *PodRecorder) AddLSP(podUID kapimtypes.UID, netInfo util.NetInfo)

func (*PodRecorder) AddPod

func (pr *PodRecorder) AddPod(podUID kapimtypes.UID)

func (*PodRecorder) CleanPod

func (pr *PodRecorder) CleanPod(podUID kapimtypes.UID)

func (*PodRecorder) Run

func (pr *PodRecorder) Run(sbClient libovsdbclient.Client, stop <-chan struct{})

Run monitors pod setup latency.
