trimaran

package
v0.30.6
Published: Nov 3, 2024 License: Apache-2.0 Imports: 14 Imported by: 0

README

Trimaran: Load-aware scheduling plugins

Trimaran is a collection of load-aware scheduler plugins described in Trimaran: Real Load Aware Scheduling.

Currently, the collection consists of the following plugins.

  • TargetLoadPacking: Implements a packing policy up to a configured CPU utilization, then switches to a spreading policy among the hot nodes. (Supports the CPU resource; a configuration sketch follows this list.)
  • LoadVariationRiskBalancing: Equalizes the risk, defined as a combined measure of average utilization and variation in utilization, among nodes. (Supports CPU and memory resources.)
  • LowRiskOverCommitment: Evaluates the performance risk of overcommitment and selects the node with the lowest risk by taking into consideration (1) the resource limit values of pods (limit-aware) and (2) the actual load (utilization) on the nodes (load-aware). Thus, it provides a low-risk environment for pods and alleviates issues with overcommitment, while allowing pods to use their limits.
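
As a quick illustration, a minimal scheduler profile enabling TargetLoadPacking might look like the sketch below. The targetUtilization value (a CPU utilization percentage) is illustrative; consult each plugin's documentation for its full set of arguments.

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: trimaran
  plugins:
    score:
      enabled:
      - name: TargetLoadPacking
  pluginConfig:
  - name: TargetLoadPacking
    args:
      targetUtilization: 60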

The Trimaran plugins utilize a load-watcher to access resource utilization data via metrics providers. Currently, the load-watcher supports three metrics providers: Kubernetes Metrics Server, Prometheus Server, and SignalFx.

There are two modes for a Trimaran plugin to use the load-watcher: as a service or as a library.

load-watcher as a service

In this mode, the Trimaran plugin uses a deployed load-watcher service in the cluster as depicted in the figure below. A watcherAddress configuration parameter is required to define the load-watcher service endpoint. For example,

watcherAddress: http://xxxx.svc.cluster.local:2020

Instructions on how to build and deploy the load-watcher can be found here. The load-watcher service may also be deployed in the same scheduler pod, following the tutorial here.

[Figure: load-watcher as a service]
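
For instance, the plugin arguments in this mode carry only the watcherAddress; the plugin name and service address below are illustrative:

pluginConfig:
- name: LoadVariationRiskBalancing
  args:
    watcherAddress: http://load-watcher.monitoring.svc.cluster.local:2020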

load-watcher as a library

In this mode, the Trimaran plugin embeds the load-watcher as a library, which in turn accesses the configured metrics provider. In this case, we have three configuration parameters: metricProvider.type, metricProvider.address, and metricProvider.token.

[Figure: load-watcher as a library]

The configuration parameters should be set as follows.

  • metricProvider.type: the type of the metrics provider
    • KubernetesMetricsServer (default)
    • Prometheus
    • SignalFx
  • metricProvider.address: the address of the metrics provider endpoint, if needed. For the Kubernetes Metrics Server, this parameter may be ignored. For the Prometheus Server, an example setting is
    • http://prometheus-k8s.monitoring.svc.cluster.local:9090
  • metricProvider.token: set only if an authentication token is needed to access the metrics provider.

The load-watcher mode is selected based on whether the watcherAddress parameter is set: if it is, the load-watcher operates in the 'as a service' mode; otherwise, it operates in the 'as a library' mode.
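
Internally, the collector applies this selection when constructing its load-watcher client. The following Go sketch illustrates the idea; the constructor and option names follow the load-watcher client API and may differ across versions:

import (
	"github.com/paypal/load-watcher/pkg/watcher"
	loadwatcherapi "github.com/paypal/load-watcher/pkg/watcher/api"

	pluginConfig "sigs.k8s.io/scheduler-plugins/apis/config"
)

func newLoadWatcherClient(spec *pluginConfig.TrimaranSpec) (loadwatcherapi.Client, error) {
	if spec.WatcherAddress != "" {
		// 'as a service' mode: query the deployed load-watcher endpoint.
		return loadwatcherapi.NewServiceClient(spec.WatcherAddress)
	}
	// 'as a library' mode: embed the load-watcher, accessing the metrics provider directly.
	opts := watcher.MetricsProviderOpts{
		Name:      string(spec.MetricProvider.Type),
		Address:   spec.MetricProvider.Address,
		AuthToken: spec.MetricProvider.Token,
	}
	return loadwatcherapi.NewLibraryClient(opts)
}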

In addition to the above configuration parameters, the Trimaran plugin may have its own specific parameters.

Following is an example scheduler configuration.

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
profiles:
- schedulerName: trimaran
  plugins:
    score:
      enabled:
      - name: LoadVariationRiskBalancing
  pluginConfig:
  - name: LoadVariationRiskBalancing
    args:
      metricProvider:
        type: Prometheus
        address: http://prometheus-k8s.monitoring.svc.cluster.local:9090
      safeVarianceMargin: 1
      safeVarianceSensitivity: 2

Configure Prometheus Metric Provider under different environments
  1. Invalid self-signed SSL connection errors for Prometheus metric queries. Prometheus metric queries may fail with an invalid self-signed SSL connection error when the cluster environment disables the skipInsecureVerify option for HTTPS. In this case, you can configure insecureSkipVerify: true for the metricProvider to skip SSL verification.

    args:
      metricProvider:
        type: Prometheus
        address: http://prometheus-k8s.monitoring.svc.cluster.local:9090
        insecureSkipVerify: true
    
  2. OpenShift Prometheus authentication without tokens. OpenShift clusters disallow unverified clients from accessing their Prometheus metrics. To run a Trimaran plugin on OpenShift, set the environment variable ENABLE_OPENSHIFT_AUTH=true in your Trimaran scheduler deployment when running the load-watcher as a library.
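
    For example, the variable can be set in the scheduler Deployment's container spec. The fragment below is illustrative; the container name is an assumption.

    spec:
      template:
        spec:
          containers:
          - name: trimaran-scheduler
            env:
            - name: ENABLE_OPENSHIFT_AUTH
              value: "true"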

A note on multiple plugins

The Trimaran plugins have different, potentially conflicting, objectives. Thus, it is recommended not to enable them concurrently. Accordingly, each plugin is designed to have its own load-watcher.

Documentation

Constants

const (
	// MegaFactor : Mega unit multiplier
	MegaFactor = float64(1. / 1024. / 1024.)
)
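
Despite the name, the factor is binary (it divides by 1024 * 1024, i.e. bytes to MiB). A small illustration:

// Convert a memory quantity from bytes to (binary) megabytes.
memBytes := float64(8 * 1024 * 1024 * 1024) // 8 GiB in bytes
memMB := memBytes * MegaFactor              // 8192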

Variables

This section is empty.

Functions

func GetEffectiveResource added in v0.26.7

func GetEffectiveResource(pod *v1.Pod, fn func(container *v1.Container) v1.ResourceList) *framework.Resource

GetEffectiveResource : calculate effective resources of a pod (CPU and Memory)
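
For instance, passing an extractor for container requests yields the pod's effective requests; a sketch, assuming pod is a *v1.Pod in scope:

requests := GetEffectiveResource(pod, func(c *v1.Container) v1.ResourceList {
	return c.Resources.Requests
})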

func GetMuSigma added in v0.24.9

func GetMuSigma(rs *ResourceStats) (float64, float64)

GetMuSigma : get average and standard deviation from statistics

func GetResourceData added in v0.24.9

func GetResourceData(metrics []watcher.Metric, resourceType string) (avg float64, stDev float64, isValid bool)

GetResourceData : get data from measurements for a given resource type

func GetResourceLimits added in v0.26.7

func GetResourceLimits(pod *v1.Pod) *framework.Resource

GetResourceLimits : calculate the resource limits of a pod (CPU and Memory)

func GetResourceRequested added in v0.24.9

func GetResourceRequested(pod *v1.Pod) *framework.Resource

GetResourceRequested : calculate the resource requests of a pod (CPU and Memory)

func SetMaxLimits added in v0.26.7

func SetMaxLimits(requests *framework.Resource, limits *framework.Resource)

SetMaxLimits : set limits to max(limits, requests) for each resource. (Note: '(r *Resource) SetMaxResource(rl v1.ResourceList)' could have been used instead, but it takes a map as argument.)
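
A typical sequence computes a pod's requests and limits, then normalizes the limits to be at least the requests; a sketch:

podRequests := GetResourceRequested(pod)
podLimits := GetResourceLimits(pod)
SetMaxLimits(podRequests, podLimits) // afterwards, limits >= requests for each resource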

Types

type Collector added in v0.24.9

type Collector struct {
	// contains filtered or unexported fields
}

Collector : get data from load watcher, encapsulating the load watcher and its operations

Trimaran plugins have different, potentially conflicting, objectives. Thus, it is recommended not to enable them concurrently. As such, they are currently designed to each have its own Collector. If a need arises in the future to enable multiple Trimaran plugins, a restructuring to have a single Collector, serving the multiple plugins, may be beneficial for performance reasons.

func NewCollector added in v0.24.9

func NewCollector(logger klog.Logger, trimaranSpec *pluginConfig.TrimaranSpec) (*Collector, error)

NewCollector : create an instance of a data collector

func (*Collector) GetNodeMetrics added in v0.24.9

func (collector *Collector) GetNodeMetrics(logger klog.Logger, nodeName string) ([]watcher.Metric, *watcher.WatcherMetrics)

GetNodeMetrics : get metrics for a node from watcher
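
A plugin typically creates one Collector at construction time and queries it per node while scoring; a sketch, assuming args and logger come from the plugin's setup:

collector, err := NewCollector(logger, &args.TrimaranSpec)
if err != nil {
	return nil, err
}
// Later, e.g. in Score:
metrics, _ := collector.GetNodeMetrics(logger, nodeName)
if metrics == nil {
	// No fresh metrics for this node; fall back to a neutral score.
}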

type NodeRequestsAndLimits added in v0.26.7

type NodeRequestsAndLimits struct {
	// NodeRequest sum of requests of all pods on node
	NodeRequest *framework.Resource
	// NodeLimit sum of limits of all pods on node
	NodeLimit *framework.Resource
	// NodeRequestMinusPod is the NodeRequest without the requests of the pending pod
	NodeRequestMinusPod *framework.Resource
	// NodeLimitMinusPod is the NodeLimit without the limits of the pending pod
	NodeLimitMinusPod *framework.Resource
	// Nodecapacity is the capacity (allocatable) of node
	Nodecapacity *framework.Resource
}

NodeRequestsAndLimits : data related to requests and limits of resources on a node

func GetNodeRequestsAndLimits added in v0.26.7

func GetNodeRequestsAndLimits(logger klog.Logger, podInfosOnNode []*framework.PodInfo, node *v1.Node, pod *v1.Pod,
	podRequests *framework.Resource, podLimits *framework.Resource) *NodeRequestsAndLimits

GetNodeRequestsAndLimits : total requested and limits of resources on a given node plus a pod
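
For example, within a scoring plugin the pods on a node can be taken from the scheduler's snapshot; a sketch, assuming handle is the plugin's framework.Handle and podRequests/podLimits were computed as above:

nodeInfo, err := handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
if err != nil {
	return 0, framework.AsStatus(err)
}
nrl := GetNodeRequestsAndLimits(logger, nodeInfo.Pods, nodeInfo.Node(), pod, podRequests, podLimits)
// e.g. compare nrl.NodeLimit against nrl.Nodecapacity to assess overcommitment risk.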

type PodAssignEventHandler

type PodAssignEventHandler struct {
	// Maintains the node-name to podInfo mapping for pods successfully bound to nodes
	ScheduledPodsCache map[string][]podInfo
	sync.RWMutex
}

This event handler watches assigned Pods and caches them locally

func New

func New() *PodAssignEventHandler

New returns a new instance of PodAssignEventHandler, after starting a background goroutine for cache cleanup

func (*PodAssignEventHandler) AddToHandle added in v0.24.9

func (p *PodAssignEventHandler) AddToHandle(handle framework.Handle)

AddToHandle : add event handler to framework handle
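
A plugin typically wires the handler once in its constructor; a sketch:

podAssignEventHandler := New()
podAssignEventHandler.AddToHandle(handle) // register OnAdd/OnUpdate/OnDelete with the shared informer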

func (*PodAssignEventHandler) OnAdd

func (p *PodAssignEventHandler) OnAdd(obj interface{}, _ bool)

func (*PodAssignEventHandler) OnDelete

func (p *PodAssignEventHandler) OnDelete(obj interface{})

func (*PodAssignEventHandler) OnUpdate

func (p *PodAssignEventHandler) OnUpdate(oldObj, newObj interface{})

type ResourceStats added in v0.24.9

type ResourceStats struct {
	// average used (absolute)
	UsedAvg float64
	// standard deviation used (absolute)
	UsedStdev float64
	// req of pod
	Req float64
	// node capacity
	Capacity float64
}

ResourceStats : statistics data for a resource

func CreateResourceStats added in v0.24.9

func CreateResourceStats(logger klog.Logger, metrics []watcher.Metric, node *v1.Node, podRequest *framework.Resource,
	resourceName v1.ResourceName, watcherType string) (rs *ResourceStats, isValid bool)

CreateResourceStats : get resource statistics data from measurements for a node
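
These pieces compose into the usual statistics pipeline: fetch a node's metrics, build per-resource statistics, then read off the mean and standard deviation. A sketch, assuming the load-watcher's watcher.CPU resource-type constant:

metrics, _ := collector.GetNodeMetrics(logger, nodeName)
rs, ok := CreateResourceStats(logger, metrics, node, podRequests, v1.ResourceCPU, watcher.CPU)
if !ok {
	// No valid CPU measurements for this node.
} else {
	mu, sigma := GetMuSigma(rs)
	_ = mu // e.g. feed into the plugin's scoring formula
	_ = sigma
}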

Directories

Path: loadvariationriskbalancing
Synopsis: Package loadvariationriskbalancing plugin attempts to balance the risk in load variation across the cluster.
