cache

package
v0.27.9
Published: Jan 5, 2024 License: Apache-2.0 Imports: 23 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CleanRegisteredSchedulerProfileNames

func CleanRegisteredSchedulerProfileNames()

for testing only; NOT thread safe

func IsForeignPod

func IsForeignPod(pod *corev1.Pod) bool

func RegisterSchedulerProfileName

func RegisterSchedulerProfileName(schedProfileName string)

func SetupForeignPodsDetector

func SetupForeignPodsDetector(schedProfileName string, podInformer k8scache.SharedInformer, cc Interface)

func TrackAllForeignPods added in v0.26.7

func TrackAllForeignPods()

func TrackOnlyForeignPodsWithExclusiveResources added in v0.26.7

func TrackOnlyForeignPodsWithExclusiveResources()
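
The foreign-pod helpers above are typically wired together once at plugin startup. Below is a minimal sketch, not a definitive recipe: the profile name, the informer factory setup, and the helper name are assumptions, and the import paths are the usual locations of these packages.

import (
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"

	"sigs.k8s.io/scheduler-plugins/pkg/noderesourcetopology/cache"
)

// setupForeignPodTracking is a hypothetical helper combining the package-level
// foreign-pod functions; only the cache.* calls come from this package.
func setupForeignPodTracking(cs kubernetes.Interface, nrtCache cache.Interface) {
	// Only pods requesting exclusive resources can skew NUMA-level accounting,
	// so restrict tracking to them (TrackAllForeignPods is the wider alternative).
	cache.TrackOnlyForeignPodsWithExclusiveResources()

	// Pods owned by this scheduler profile are not considered foreign.
	cache.RegisterSchedulerProfileName("my-scheduler-profile")

	// Wire a shared pod informer so pods not owned by a registered profile
	// mark their node as having foreign pods in the cache.
	factory := informers.NewSharedInformerFactory(cs, 0)
	cache.SetupForeignPodsDetector("my-scheduler-profile", factory.Core().V1().Pods().Informer(), nrtCache)

	// The informer factory still has to be started and synced elsewhere.
}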

Types

type DiscardReserved added in v0.26.7

type DiscardReserved struct {
	// contains filtered or unexported fields
}

DiscardReserved is intended to solve a similar problem as the Overreserve cache: to minimize the number of incorrect scheduling decisions based on stale NRT data. Unfortunately, the Overreserve cache only works for the single-numa-node Topology Manager policy. DiscardReserved tries to minimize the number of Admission Errors and non-optimal placements when the NodeResourceTopologyMatch plugin is used to schedule Pods requesting resources from multiple NUMA domains.

There are scenarios where using DiscardReserved won't mitigate the drawbacks of using the Passthrough cache. An NRT update is expected once PostBind triggers, but there is no guarantee about when this will happen. In cases like:
- NFD (or any other component that advertises NRT) being nonfunctional
- a slow network
- a Pod being scheduled after the PostBind trigger but before the NRT update
the DiscardReserved cache will act the same as the Passthrough cache.
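
The constructors listed further below (NewDiscardReserved, NewPassthrough) both return this package's Interface, so the caching strategy can be selected in one place. A minimal sketch, assuming a boolean knob that is not part of this package:

import (
	ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"

	"sigs.k8s.io/scheduler-plugins/pkg/noderesourcetopology/cache"
)

// newNRTCache is a hypothetical factory helper: it picks DiscardReserved when
// PostBind-driven reservation cleanup is wanted, and plain Passthrough otherwise.
func newNRTCache(cl ctrlclient.Client, discardReservedEnabled bool) cache.Interface {
	if discardReservedEnabled {
		return cache.NewDiscardReserved(cl)
	}
	return cache.NewPassthrough(cl)
}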

func (*DiscardReserved) GetCachedNRTCopy added in v0.26.7

func (pt *DiscardReserved) GetCachedNRTCopy(ctx context.Context, nodeName string, _ *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool)

func (*DiscardReserved) NodeHasForeignPods added in v0.26.7

func (pt *DiscardReserved) NodeHasForeignPods(nodeName string, pod *corev1.Pod)

func (*DiscardReserved) NodeMaybeOverReserved added in v0.26.7

func (pt *DiscardReserved) NodeMaybeOverReserved(nodeName string, pod *corev1.Pod)

func (*DiscardReserved) PostBind added in v0.26.7

func (pt *DiscardReserved) PostBind(nodeName string, pod *corev1.Pod)

PostBind is invoked to clean up the reservationMap.

func (*DiscardReserved) ReserveNodeResources added in v0.26.7

func (pt *DiscardReserved) ReserveNodeResources(nodeName string, pod *corev1.Pod)

func (*DiscardReserved) UnreserveNodeResources added in v0.26.7

func (pt *DiscardReserved) UnreserveNodeResources(nodeName string, pod *corev1.Pod)

type Interface

type Interface interface {
	// GetCachedNRTCopy retrieves a NRT copy from cache, and then deducts over-reserved resources if necessary.
	// It will be used as the source of truth across the Pod's scheduling cycle.
	// Over-reserved resources are the resources consumed by pods scheduled to that node after the last update
	// of NRT pertaining to the same node, pessimistically overallocated on ALL the NUMA zones of the node.
	// The pod argument is used only for logging purposes.
	// Returns a boolean to signal the caller if the NRT data is clean. If false, then the node has foreign
	// Pods detected - so it should be ignored or handled differently by the caller.
	GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool)

	// NodeMaybeOverReserved declares that a node was filtered out because not enough resources were available.
	// This means the node is eligible for a resync. When a node is marked discarded (dirty), it does not matter
	// whether this is because of pessimistic overallocation or because the node truly cannot accommodate the request;
	// that is for the resync step to figure out.
	// The pod argument is used only for logging purposes.
	NodeMaybeOverReserved(nodeName string, pod *corev1.Pod)

	// NodeHasForeignPods declares that we observed one or more pods on this node which were not scheduled by this
	// scheduler instance. This means the resource accounting is likely out of date, so this function also signals
	// that a cache resync is needed for this node.
	// Until that happens, this node should not be considered in scheduling decisions, as if it had zero resources
	// available. Note this condition is different from "no topology info available":
	// the former is an always-fail, the latter an always-succeed.
	NodeHasForeignPods(nodeName string, pod *corev1.Pod)

	// ReserveNodeResources adds the resources requested by a pod to the assumed resources for the node on which the
	// pod is scheduled. This is a prerequisite for the pessimistic overallocation tracking.
	// Additionally, this function resets the discarded counter for the same node. Being able to handle a pod means
	// that this node still has resources available. If a node was previously discarded and then cleared, we interpret
	// this sequence of events as the previous pod simply requiring too much - a possible and benign condition.
	ReserveNodeResources(nodeName string, pod *corev1.Pod)

	// UnreserveNodeResources subtracts the resources required by the given pod from the node's assumed resources.
	UnreserveNodeResources(nodeName string, pod *corev1.Pod)

	// PostBind is called after a pod is successfully bound. These plugins are
	// informational. A common application of this extension point is for cleaning
	// up. If a plugin needs to clean-up its state after a pod is scheduled and
	// bound, PostBind is the extension point that it should register.
	PostBind(nodeName string, pod *corev1.Pod)
}
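
To make the contract above concrete, here is a minimal sketch of how a plugin might drive the cache through one pod's scheduling cycle. Only the Interface method calls come from this package; the surrounding function, the fits and bind placeholders, and the import paths are assumptions.

import (
	"context"
	"fmt"

	topologyv1alpha2 "github.com/k8stopologyawareschedwg/noderesourcetopology-api/pkg/apis/topology/v1alpha2"
	corev1 "k8s.io/api/core/v1"

	"sigs.k8s.io/scheduler-plugins/pkg/noderesourcetopology/cache"
)

// scheduleOnNode is a hypothetical walkthrough of one pod/node pair.
func scheduleOnNode(ctx context.Context, nrtCache cache.Interface, nodeName string, pod *corev1.Pod) error {
	// Filter-like step: obtain the NRT view the rest of the cycle will trust.
	nrt, clean := nrtCache.GetCachedNRTCopy(ctx, nodeName, pod)
	if !clean {
		// Foreign pods were detected: the accounting may be stale, skip this node.
		return fmt.Errorf("node %q has foreign pods", nodeName)
	}
	if nrt == nil || !fits(nrt, pod) {
		// Not enough room, possibly only because of pessimistic overallocation:
		// mark the node so the resync step can sort it out.
		nrtCache.NodeMaybeOverReserved(nodeName, pod)
		return fmt.Errorf("node %q cannot fit pod %s/%s", nodeName, pod.Namespace, pod.Name)
	}

	// Reserve-like step: account the pod's resources pessimistically.
	nrtCache.ReserveNodeResources(nodeName, pod)

	if err := bind(ctx, nodeName, pod); err != nil {
		// Unreserve on failure so the assumed resources are released.
		nrtCache.UnreserveNodeResources(nodeName, pod)
		return err
	}

	// PostBind: let the cache clean up its per-pod bookkeeping.
	nrtCache.PostBind(nodeName, pod)
	return nil
}

// fits and bind are placeholders standing in for the real plugin logic.
func fits(nrt *topologyv1alpha2.NodeResourceTopology, pod *corev1.Pod) bool { return true }

func bind(ctx context.Context, nodeName string, pod *corev1.Pod) error { return nil }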

func NewDiscardReserved added in v0.26.7

func NewDiscardReserved(client ctrlclient.Client) Interface

func NewPassthrough

func NewPassthrough(client ctrlclient.Client) Interface

type OverReserve

type OverReserve struct {
	// contains filtered or unexported fields
}

func (*OverReserve) FlushNodes

func (ov *OverReserve) FlushNodes(logID string, nrts ...*topologyv1alpha2.NodeResourceTopology)

FlushNodes drops all the cached information about the given nodes, resetting their state to clean.

func (*OverReserve) GetCachedNRTCopy

func (ov *OverReserve) GetCachedNRTCopy(ctx context.Context, nodeName string, pod *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool)

func (*OverReserve) NodeHasForeignPods

func (ov *OverReserve) NodeHasForeignPods(nodeName string, pod *corev1.Pod)

func (*OverReserve) NodeMaybeOverReserved

func (ov *OverReserve) NodeMaybeOverReserved(nodeName string, pod *corev1.Pod)

func (*OverReserve) NodesMaybeOverReserved

func (ov *OverReserve) NodesMaybeOverReserved(logID string) []string

NodesMaybeOverReserved returns a slice of all the node names which have been discarded previously, and which are therefore considered `dirty` in the cache. A node can be discarded for two reasons:
1. it legitimately cannot fit containers because it does not have enough free resources
2. it was pessimistically overallocated, so the node is a candidate for resync
This function lets the caller know which nodes should be considered for resync, avoiding the need to rescan the full node list.
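
A minimal sketch of how a caller might consume this list together with FlushNodes; the getFreshNRT helper is an assumption standing in for however fresh NRT objects are obtained (the built-in Resync below does something similar with its own heuristic).

import (
	topologyv1alpha2 "github.com/k8stopologyawareschedwg/noderesourcetopology-api/pkg/apis/topology/v1alpha2"

	"sigs.k8s.io/scheduler-plugins/pkg/noderesourcetopology/cache"
)

// resyncDirtyNodes is a hypothetical manual pass over the dirty nodes.
func resyncDirtyNodes(ov *cache.OverReserve, logID string, getFreshNRT func(string) *topologyv1alpha2.NodeResourceTopology) {
	for _, nodeName := range ov.NodesMaybeOverReserved(logID) {
		if nrt := getFreshNRT(nodeName); nrt != nil {
			// Replace the stale cached state with the freshly observed topology.
			ov.FlushNodes(logID, nrt)
		}
	}
}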

func (*OverReserve) PostBind added in v0.26.7

func (ov *OverReserve) PostBind(nodeName string, pod *corev1.Pod)

func (*OverReserve) ReserveNodeResources

func (ov *OverReserve) ReserveNodeResources(nodeName string, pod *corev1.Pod)

func (*OverReserve) Resync

func (ov *OverReserve) Resync()

Resync implements the cache resync loop step. For each dirty node, this function checks whether the latest available NRT information matches the node's cached state; if so, the node's cache can be Flush()ed.

The trigger for attempting to resync a node is not just that we overallocated it. If a node was overallocated but still has capacity, we keep using it. But we cannot predict when its capacity gets too low, because that would mean predicting future workload requests. The best heuristic found so far is to count how many times the node was skipped *AND* to crosscheck with its overallocation state. Only if a node *both* has pessimistic overallocation accounted to it *and* was discarded "too many" times (how many is too many is a runtime parameter which needs to be set and tuned) does it become a candidate for resync. Using just one of these two factors would lead to overly aggressive resync attempts, and thus to more, likely unnecessary, computation work on the scheduler side.
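
A minimal sketch of wiring this step into a periodic loop; the resync period value and stop channel are assumptions, and the real plugin may drive Resync from its own configuration instead.

import (
	"time"

	"sigs.k8s.io/scheduler-plugins/pkg/noderesourcetopology/cache"
)

// startResyncLoop periodically runs the cache resync step until stopCh is closed.
func startResyncLoop(ov *cache.OverReserve, period time.Duration, stopCh <-chan struct{}) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// Resync only flushes nodes that are both over-reserved and repeatedly
			// discarded, so frequent ticks stay relatively cheap.
			ov.Resync()
		case <-stopCh:
			return
		}
	}
}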

func (*OverReserve) Store

func (ov *OverReserve) Store() *nrtStore

to be used only in tests

func (*OverReserve) UnreserveNodeResources

func (ov *OverReserve) UnreserveNodeResources(nodeName string, pod *corev1.Pod)

type Passthrough

type Passthrough struct {
	// contains filtered or unexported fields
}

func (Passthrough) GetCachedNRTCopy

func (pt Passthrough) GetCachedNRTCopy(ctx context.Context, nodeName string, _ *corev1.Pod) (*topologyv1alpha2.NodeResourceTopology, bool)

func (Passthrough) NodeHasForeignPods

func (pt Passthrough) NodeHasForeignPods(nodeName string, pod *corev1.Pod)

func (Passthrough) NodeMaybeOverReserved

func (pt Passthrough) NodeMaybeOverReserved(nodeName string, pod *corev1.Pod)

func (Passthrough) PostBind added in v0.26.7

func (pt Passthrough) PostBind(nodeName string, pod *corev1.Pod)

func (Passthrough) ReserveNodeResources

func (pt Passthrough) ReserveNodeResources(nodeName string, pod *corev1.Pod)

func (Passthrough) UnreserveNodeResources

func (pt Passthrough) UnreserveNodeResources(nodeName string, pod *corev1.Pod)
