scheduler

package
v0.43.1
Published: Nov 4, 2024 License: Apache-2.0 Imports: 9 Imported by: 3

README

Knative Eventing Multi-Tenant Scheduler with High-Availability

An eventing source instance (for example, a KafkaSource) gets materialized as a virtual pod (vpod) and can be scaled up and down by increasing or decreasing the number of virtual pod replicas (vreplicas). A vreplica corresponds to a resource in the source that can be replicated for maximum distributed processing (for example, the number of consumers running in a consumer group).

The vpod multi-tenant scheduler is responsible for placing vreplicas onto real Kubernetes pods. Each pod is limited in capacity and can hold a maximum number of vreplicas. The scheduler takes a list of (source, # of vreplicas) tuples and computes a set of Placements. Placement information is added to the source status.

Scheduling strategies rely on pods having a sticky identity (StatefulSet replicas) and the current State of the cluster.

Components:

Scheduler

The scheduler allocates as many vreplicas as possible to the pod with the lowest possible StatefulSet ordinal number, and triggers the autoscaler when no more capacity is left to schedule vpods.
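The lowest-ordinal-first strategy can be sketched as follows. This is a simplified illustration with a hypothetical fixed per-pod capacity and a local stand-in for the Placement type, not the actual scheduler code:

```go
package main

import "fmt"

// placement is a simplified stand-in for duckv1alpha1.Placement.
type placement struct {
	PodName   string
	VReplicas int32
}

// schedule greedily fills pods starting from StatefulSet ordinal 0. It
// returns the computed placements plus the number of vreplicas left
// pending, which is what would trigger the autoscaler.
func schedule(vreplicas, pods, capacityPerPod int32) ([]placement, int32) {
	var placements []placement
	for ordinal := int32(0); ordinal < pods && vreplicas > 0; ordinal++ {
		n := vreplicas
		if n > capacityPerPod {
			n = capacityPerPod
		}
		placements = append(placements, placement{
			PodName:   fmt.Sprintf("adapter-%d", ordinal),
			VReplicas: n,
		})
		vreplicas -= n
	}
	return placements, vreplicas
}

func main() {
	placements, pending := schedule(25, 2, 10)
	// Two pods fill up completely; 5 vreplicas remain pending.
	fmt.Println(placements, pending)
}
```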

Autoscaler

The autoscaler scales up pod replicas of the statefulset adapter when there are vreplicas pending to be scheduled, and scales down if there are unused pods.
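The scale-up side of this can be sketched as a ceiling division. This is a hypothetical simplification; the real autoscaler also accounts for the configured scheduling policies:

```go
package main

import "fmt"

// desiredReplicas returns how many StatefulSet replicas are needed to
// absorb the pending vreplicas, assuming a fixed per-pod capacity.
func desiredReplicas(current, pendingVReplicas, capacityPerPod int32) int32 {
	if pendingVReplicas <= 0 {
		return current // nothing pending; scale-down is driven by the evictor
	}
	extra := (pendingVReplicas + capacityPerPod - 1) / capacityPerPod // ceil
	return current + extra
}

func main() {
	// 25 pending vreplicas at capacity 10 need 3 extra pods.
	fmt.Println(desiredReplicas(3, 25, 10)) // 6
}
```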

State Collector

Current state information about the cluster is collected after placing each vreplica and at regular intervals. Cluster information includes the free capacity of each pod, the list of schedulable pods (unschedulable pods are those marked for eviction during compaction), the number of pods (StatefulSet replicas), and the total number of vreplicas in each pod for each vpod (the spread).
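The free-capacity part of that state can be sketched like this (a simplification; the pod names and the capacity parameter are illustrative):

```go
package main

import "fmt"

// freeCapacity returns, per schedulable pod, how many more vreplicas fit.
// Pods marked unschedulable (e.g. annotated for eviction during
// compaction) are excluded entirely.
func freeCapacity(used map[string]int32, unschedulable map[string]bool, capacityPerPod int32) map[string]int32 {
	free := make(map[string]int32)
	for pod, n := range used {
		if unschedulable[pod] {
			continue
		}
		free[pod] = capacityPerPod - n
	}
	return free
}

func main() {
	free := freeCapacity(
		map[string]int32{"adapter-0": 7, "adapter-1": 2},
		map[string]bool{"adapter-1": true}, // marked for eviction
		10,
	)
	// Only adapter-0 is schedulable, with 3 free slots.
	fmt.Println(free)
}
```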

Evictor

The autoscaler periodically attempts to compact vreplicas into a smaller number of free replicas with lower ordinals. Vreplicas placed on higher-ordinal pods are evicted and rescheduled to pods with a lower ordinal using the same scheduling strategies.
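The compaction trigger can be sketched as comparing the number of pods in use against the minimum number needed at capacity (a simplification of the evictor's decision, with a local stand-in Placement type):

```go
package main

import "fmt"

// placement is a simplified stand-in for duckv1alpha1.Placement.
type placement struct {
	PodName   string
	VReplicas int32
}

// compactable reports whether the current placements could fit on fewer
// pods at the given capacity, in which case vreplicas on higher-ordinal
// pods are candidates for eviction and rescheduling.
func compactable(placements []placement, capacityPerPod int32) bool {
	var total int32
	for _, p := range placements {
		total += p.VReplicas
	}
	needed := (total + capacityPerPod - 1) / capacityPerPod
	return int32(len(placements)) > needed
}

func main() {
	spread := []placement{{"adapter-0", 3}, {"adapter-1", 3}}
	// 6 vreplicas fit on a single pod of capacity 10, so compaction applies.
	fmt.Println(compactable(spread, 10)) // true
}
```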

Normal Operation

  1. Busy scheduler:

The scheduler can be very busy allocating the best placements for multiple eventing sources at a time, using the configured scheduler predicates and priorities. During this time, the cluster could see StatefulSet replicas increasing as the autoscaler computes how many more pods are needed to complete scheduling successfully. The replicas could also decrease during idle time, whether caused by fewer events flowing through the system, by the evictor compacting vreplica placements into a smaller number of pods, or by the deletion of event sources. The current placements are stored in the eventing source's status field for observability.

  2. Software upgrades:

We can expect periodic software version upgrades or fixes to be performed on the Kubernetes cluster running the scheduler or on the Knative framework installed. Either of these scenarios could involve graceful rebooting of nodes and/or reapplying of controllers, adapters and other resources.

All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica scheduler. (For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member changes.)

  3. No more cluster resources:

When there are no resources available on existing nodes in the cluster to schedule more pods and the autoscaler continues to scale up replicas, the new pods are left in a Pending state until the cluster size is increased. The scheduler has nothing to do until then.

Disaster Recovery

Some failure scenarios are described below:

  1. Pod failure:

When a pod/replica in a StatefulSet goes down for some reason (but its node and zone are healthy), the StatefulSet spins up a new replica with the same pod identity almost immediately (the pod can come up on a different node).

All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica scheduler. (For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member changes.)



To learn more about Knative, please visit the /docs repository.

This repo falls under the Knative Code of Conduct

Documentation

Overview

Package scheduler is responsible for placing virtual pod (VPod) replicas within real pods.

Index

Constants

View Source
const (
	// PodAnnotationKey is an annotation used by the scheduler to be informed of pods
	// being evicted, so that it does not use them for placing vreplicas
	PodAnnotationKey = "eventing.knative.dev/unschedulable"
)
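A component honoring this annotation could filter pods like so (a sketch; only the annotation key matches the constant above, the helper name is hypothetical):

```go
package main

import "fmt"

// podAnnotationKey matches the scheduler's PodAnnotationKey constant.
const podAnnotationKey = "eventing.knative.dev/unschedulable"

// isSchedulable reports whether a pod's annotations allow it to receive
// new vreplica placements.
func isSchedulable(annotations map[string]string) bool {
	_, marked := annotations[podAnnotationKey]
	return !marked
}

func main() {
	fmt.Println(isSchedulable(map[string]string{podAnnotationKey: "true"})) // false
	fmt.Println(isSchedulable(nil))                                         // true
}
```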

Variables

This section is empty.

Functions

func GetTotalVReplicas

func GetTotalVReplicas(placements []duckv1alpha1.Placement) int32

GetTotalVReplicas returns the total number of placed virtual replicas
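Its behavior is a straightforward sum over the placement list; with a simplified local Placement type (standing in for duckv1alpha1.Placement) it amounts to:

```go
package main

import "fmt"

// Placement is a simplified stand-in for duckv1alpha1.Placement.
type Placement struct {
	PodName   string
	VReplicas int32
}

// getTotalVReplicas mirrors GetTotalVReplicas: it sums VReplicas across
// all placements.
func getTotalVReplicas(placements []Placement) int32 {
	var total int32
	for _, p := range placements {
		total += p.VReplicas
	}
	return total
}

func main() {
	fmt.Println(getTotalVReplicas([]Placement{{"adapter-0", 4}, {"adapter-1", 2}})) // 6
}
```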

Types

type Evictor

type Evictor func(pod *corev1.Pod, vpod VPod, from *duckv1alpha1.Placement) error

Evictor allows for vreplicas to be evicted. For instance, the evictor is used by the statefulset scheduler to move vreplicas to a pod with a lower ordinal.

type ScaleCache added in v0.41.0

type ScaleCache struct {
	// contains filtered or unexported fields
}

func NewScaleCache added in v0.41.0

func NewScaleCache(ctx context.Context, namespace string, scaleClient ScaleClient, config ScaleCacheConfig) *ScaleCache

func (*ScaleCache) GetScale added in v0.41.0

func (sc *ScaleCache) GetScale(ctx context.Context, statefulSetName string, options metav1.GetOptions) (*autoscalingv1.Scale, error)

func (*ScaleCache) Reset added in v0.41.0

func (sc *ScaleCache) Reset()

func (*ScaleCache) UpdateScale added in v0.41.0

func (sc *ScaleCache) UpdateScale(ctx context.Context, statefulSetName string, scale *autoscalingv1.Scale, opts metav1.UpdateOptions) (*autoscalingv1.Scale, error)

type ScaleCacheConfig added in v0.41.0

type ScaleCacheConfig struct {
	RefreshPeriod time.Duration `json:"refreshPeriod"`
}

type ScaleClient added in v0.41.0

type ScaleClient interface {
	GetScale(ctx context.Context, name string, options metav1.GetOptions) (*autoscalingv1.Scale, error)
	UpdateScale(ctx context.Context, name string, scale *autoscalingv1.Scale, options metav1.UpdateOptions) (*autoscalingv1.Scale, error)
}

type Scheduler

type Scheduler interface {
	// Schedule computes the new set of placements for vpod.
	Schedule(ctx context.Context, vpod VPod) ([]duckv1alpha1.Placement, error)
}

Scheduler is responsible for placing VPods into real Kubernetes pods.

type SchedulerFunc added in v0.29.0

type SchedulerFunc func(ctx context.Context, vpod VPod) ([]duckv1alpha1.Placement, error)

SchedulerFunc type is an adapter to allow the use of ordinary functions as Schedulers. If f is a function with the appropriate signature, SchedulerFunc(f) is a Scheduler that calls f.
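This is the same adapter pattern as http.HandlerFunc. With simplified stand-in types (the context argument and knative imports trimmed for brevity), the pattern looks like:

```go
package main

import "fmt"

// Simplified stand-ins for the package's types.
type Placement struct {
	PodName   string
	VReplicas int32
}

type VPod interface {
	GetVReplicas() int32
}

type Scheduler interface {
	Schedule(vpod VPod) ([]Placement, error)
}

// SchedulerFunc adapts an ordinary function to the Scheduler interface.
type SchedulerFunc func(vpod VPod) ([]Placement, error)

// Schedule implements Scheduler by calling f itself.
func (f SchedulerFunc) Schedule(vpod VPod) ([]Placement, error) { return f(vpod) }

// fixedVPod is a trivial VPod used only for the demonstration.
type fixedVPod struct{ n int32 }

func (v fixedVPod) GetVReplicas() int32 { return v.n }

func main() {
	// Any function with the right signature becomes a Scheduler.
	var s Scheduler = SchedulerFunc(func(vpod VPod) ([]Placement, error) {
		return []Placement{{"adapter-0", vpod.GetVReplicas()}}, nil
	})
	placements, _ := s.Schedule(fixedVPod{n: 3})
	fmt.Println(placements) // [{adapter-0 3}]
}
```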

func (SchedulerFunc) Schedule added in v0.29.0

func (f SchedulerFunc) Schedule(ctx context.Context, vpod VPod) ([]duckv1alpha1.Placement, error)

Schedule implements the Scheduler interface.

type VPod

type VPod interface {
	// GetKey returns the VPod key (namespace/name).
	GetKey() types.NamespacedName

	// GetVReplicas returns the number of expected virtual replicas
	GetVReplicas() int32

	// GetPlacements returns the current list of placements
	// Do not mutate!
	GetPlacements() []duckv1alpha1.Placement

	GetResourceVersion() string
}

VPod represents virtual replicas placed into real Kubernetes pods. The scheduler is responsible for placing VPods.
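A minimal conforming implementation might look like this (simplified: a string key and a local Placement type stand in for types.NamespacedName and duckv1alpha1.Placement, and the type name is hypothetical):

```go
package main

import "fmt"

// Placement is a simplified stand-in for duckv1alpha1.Placement.
type Placement struct {
	PodName   string
	VReplicas int32
}

// sourceVPod is a hypothetical event-source-backed VPod.
type sourceVPod struct {
	key             string // "namespace/name"
	vreplicas       int32
	placements      []Placement
	resourceVersion string
}

func (s *sourceVPod) GetKey() string             { return s.key }
func (s *sourceVPod) GetVReplicas() int32        { return s.vreplicas }
func (s *sourceVPod) GetPlacements() []Placement { return s.placements }
func (s *sourceVPod) GetResourceVersion() string { return s.resourceVersion }

func main() {
	v := &sourceVPod{key: "default/my-kafka-source", vreplicas: 4}
	fmt.Println(v.GetKey(), v.GetVReplicas())
}
```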

type VPodLister

type VPodLister func() ([]VPod, error)

VPodLister is the function signature for returning a list of VPods

