flytek8s

package
v1.9.28
Published: Oct 17, 2023 License: Apache-2.0 Imports: 29 Imported by: 4

Documentation

Constants

const Interrupted = "Interrupted"
const OOMKilled = "OOMKilled"
const PodKind = "pod"
const PrimaryContainerKey = "primary_container_name"
const ResourceNvidiaGPU = "nvidia.com/gpu"

ResourceNvidiaGPU is the name of the Nvidia GPU resource. Copied from: k8s.io/autoscaler/cluster-autoscaler/utils/gpu/gpu.go

const SIGKILL = 137

Variables

This section is empty.

Functions

func AddCoPilotToPod

func AddCoPilotToPod(ctx context.Context, cfg config.FlyteCoPilotConfig, coPilotPod *v1.PodSpec, iFace *core.TypedInterface, taskExecMetadata core2.TaskExecutionMetadata, inputPaths io.InputFilePaths, outputPaths io.OutputFilePaths, pilot *core.DataLoadingConfig) error

func AddFlyteCustomizationsToContainer

func AddFlyteCustomizationsToContainer(ctx context.Context, parameters template.Parameters,
	mode ResourceCustomizationMode, container *v1.Container) error

AddFlyteCustomizationsToContainer takes a container definition which specifies how to run a Flyte task and fills in templated command and argument values, updates resources and decorates environment variables with platform and task-specific customizations.

func AddPreferredNodeSelectorRequirements added in v1.9.20

func AddPreferredNodeSelectorRequirements(base *v1.Affinity, weight int32, new ...v1.NodeSelectorRequirement)

AddPreferredNodeSelectorRequirements appends the provided v1.NodeSelectorRequirement objects to an existing v1.Affinity object's list of preferred scheduling terms. See: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity-weight for how weights are used during scheduling.

func AddRequiredNodeSelectorRequirements added in v1.9.20

func AddRequiredNodeSelectorRequirements(base *v1.Affinity, new ...v1.NodeSelectorRequirement)

AddRequiredNodeSelectorRequirements adds the provided v1.NodeSelectorRequirement objects to an existing v1.Affinity object. If there are no existing required node selectors, the new v1.NodeSelectorRequirement will be added as-is. However, if there are existing required node selectors, we iterate over all existing node selector terms and append the node selector requirement. Note that multiple node selector terms are OR'd, and match expressions within a single node selector term are AND'd during scheduling. See: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity
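The merge rule described above can be illustrated with a minimal sketch. It uses simplified stand-in types (Requirement, Term) rather than the real v1.NodeSelectorRequirement and v1.NodeSelectorTerm, and the helper name addRequired is hypothetical; it is not the package's implementation.

```go
package main

import "fmt"

// Requirement is a simplified stand-in for v1.NodeSelectorRequirement.
type Requirement struct {
	Key      string
	Operator string
	Values   []string
}

// Term is a simplified stand-in for v1.NodeSelectorTerm.
// Expressions inside one Term are AND'd; separate Terms are OR'd.
type Term struct {
	MatchExpressions []Requirement
}

// addRequired (hypothetical helper) follows the documented rule:
// with no existing terms, the requirements form a single new term;
// otherwise they are appended to every existing term, so each OR'd
// branch must also satisfy them.
func addRequired(terms []Term, reqs ...Requirement) []Term {
	if len(terms) == 0 {
		return []Term{{MatchExpressions: append([]Requirement{}, reqs...)}}
	}
	out := make([]Term, len(terms))
	for i, t := range terms {
		merged := append([]Requirement{}, t.MatchExpressions...)
		out[i] = Term{MatchExpressions: append(merged, reqs...)}
	}
	return out
}

func main() {
	existing := []Term{
		{MatchExpressions: []Requirement{{Key: "zone", Operator: "In", Values: []string{"a"}}}},
		{MatchExpressions: []Requirement{{Key: "zone", Operator: "In", Values: []string{"b"}}}},
	}
	gpu := Requirement{Key: "gpu", Operator: "Exists"}
	for i, t := range addRequired(existing, gpu) {
		fmt.Printf("term %d has %d match expressions\n", i, len(t.MatchExpressions))
	}
}
```

Appending to every term, rather than adding a new term, matters because a new term would be OR'd with the existing ones and could be bypassed by a node that matches only an older term.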

func ApplyFlytePodConfiguration

func ApplyFlytePodConfiguration(ctx context.Context, tCtx pluginsCore.TaskExecutionContext, podSpec *v1.PodSpec, objectMeta *metav1.ObjectMeta, primaryContainerName string) (*v1.PodSpec, *metav1.ObjectMeta, error)

ApplyFlytePodConfiguration updates the PodSpec and ObjectMeta with Flyte configuration. This includes applying default k8s configuration and resource requests, injecting copilot containers, and merging with the configuration PodTemplate (if one exists).

func ApplyGPUNodeSelectors added in v1.9.20

func ApplyGPUNodeSelectors(podSpec *v1.PodSpec, gpuAccelerator *core.GPUAccelerator)

func ApplyInterruptibleNodeAffinity

func ApplyInterruptibleNodeAffinity(interruptible bool, podSpec *v1.PodSpec)

ApplyInterruptibleNodeAffinity configures the node-affinity for the pod using the configuration specified.

func ApplyInterruptibleNodeSelectorRequirement

func ApplyInterruptibleNodeSelectorRequirement(interruptible bool, affinity *v1.Affinity)

ApplyInterruptibleNodeSelectorRequirement configures the node selector requirement of the node-affinity using the configuration specified.

func ApplyResourceOverrides

func ApplyResourceOverrides(resources, platformResources v1.ResourceRequirements, assignIfUnset bool) v1.ResourceRequirements

ApplyResourceOverrides handles resource resolution, allocation and validation. Primarily, it ensures that container resources do not exceed defined platformResource limits and in the case of assignIfUnset, ensures that limits and requests are sensibly set for resources of all types. Furthermore, this function handles some clean-up such as converting GPU resources to the recognized Nvidia gpu resource name and deleting unsupported Storage-type resources.

func BuildIdentityPod

func BuildIdentityPod() *v1.Pod

func BuildRawContainer

func BuildRawContainer(ctx context.Context, tCtx pluginscore.TaskExecutionContext) (*v1.Container, error)

BuildRawContainer constructs a Container based on the definition passed by the TaskExecutionContext.

func BuildRawPod

BuildRawPod constructs a PodSpec and ObjectMeta based on the definition passed by the TaskExecutionContext. This definition does not include any configuration injected by Flyte.

func CalculateStorageSize

func CalculateStorageSize(requirements *v1.ResourceRequirements) *resource.Quantity

func CopilotCommandArgs

func CopilotCommandArgs(storageConfig *storage.Config) []string

func DataVolume

func DataVolume(name string, size *resource.Quantity) v1.Volume

func DecorateEnvVars

func DecorateEnvVars(ctx context.Context, envVars []v1.EnvVar, taskEnvironmentVariables map[string]string, id pluginsCore.TaskExecutionID) []v1.EnvVar

func DemystifyFailure

func DemystifyFailure(status v1.PodStatus, info pluginsCore.TaskInfo) (pluginsCore.PhaseInfo, error)

DemystifyFailure resolves the various Kubernetes pod failure modes to determine the most appropriate course of action.

func DemystifyPending

func DemystifyPending(status v1.PodStatus) (pluginsCore.PhaseInfo, error)

DemystifyPending is one of the core functions that helps FlytePropeller determine whether a pending pod is genuinely pending or is stuck in an unrecoverable state. In the latter case the pod should be marked as dead and the task should be retried. This has to be handled explicitly because K8s is still largely designed for long-running services that should recover from failures, whereas Flyte pods are completely automated and should either run or fail.

Important considerations: a pod's Pending status can have various causes and can sometimes signal a problem.

Case I: Pending because the image pull is failing and it is backing off.

This could be transient, so we can rely on the failure reason.
The failure transitions from ErrImagePull -> ImagePullBackOff

Case II: Not enough resources are available. This is tricky. It could be that the total number of
resources requested is beyond the capability of the system. For this we rely on configuration
and hence input gates. We should not let through bad requests that ask for a large number of resources.
If such a request does make it through, we will fail after a timeout.

func DemystifySuccess

func DemystifySuccess(status v1.PodStatus, info pluginsCore.TaskInfo) (pluginsCore.PhaseInfo, error)

func DeterminePrimaryContainerPhase

func DeterminePrimaryContainerPhase(primaryContainerName string, statuses []v1.ContainerStatus, info *pluginsCore.TaskInfo) pluginsCore.PhaseInfo

DeterminePrimaryContainerPhase takes the statuses of all containers in a pod and returns a pluginsCore.PhaseInfo corresponding to the phase of the primary container, which is identified by the provided name. This is useful for sidecars or pod jobs, where Flyte monitors the successful exit of a single container.
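As a rough illustration of the idea, the sketch below picks one container out of a list by name and derives a phase from its state alone. The containerStatus type is a simplified stand-in for v1.ContainerStatus, and the function name and string labels are hypothetical; the real function returns a pluginsCore.PhaseInfo.

```go
package main

import "fmt"

// containerStatus is a simplified stand-in for v1.ContainerStatus.
type containerStatus struct {
	Name       string
	Terminated bool // true once the container has exited
	ExitCode   int  // meaningful only when Terminated is true
}

// primaryPhase (hypothetical) reports a phase label for the container
// named primary and ignores every other container in the pod.
func primaryPhase(primary string, statuses []containerStatus) string {
	for _, s := range statuses {
		if s.Name != primary {
			continue
		}
		if !s.Terminated {
			return "running"
		}
		if s.ExitCode != 0 {
			return "failed"
		}
		return "succeeded"
	}
	// The primary container was not reported at all.
	return "unknown"
}

func main() {
	statuses := []containerStatus{
		{Name: "sidecar", Terminated: false},
		{Name: "main", Terminated: true, ExitCode: 0},
	}
	// The sidecar is still running, but only the primary container counts.
	fmt.Println(primaryPhase("main", statuses))
}
```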

func DownloadCommandArgs

func DownloadCommandArgs(fromInputsPath, outputPrefix storage.DataReference, toLocalPath string, format core.DataLoadingConfig_LiteralMapFormat, inputInterface *core.VariableMap) ([]string, error)

func FlyteCoPilotContainer

func FlyteCoPilotContainer(name string, cfg config.FlyteCoPilotConfig, args []string, volumeMounts ...v1.VolumeMount) (v1.Container, error)

func GetContainer added in v1.9.18

func GetContainer(podSpec *v1.PodSpec, name string) (*v1.Container, error)

func GetContextEnvVars

func GetContextEnvVars(ownerCtx context.Context) []v1.EnvVar

func GetExecutionEnvVars

func GetExecutionEnvVars(id pluginsCore.TaskExecutionID) []v1.EnvVar

func GetLastTransitionOccurredAt

func GetLastTransitionOccurredAt(pod *v1.Pod) metav1.Time

func GetPodTemplateUpdatesHandler

func GetPodTemplateUpdatesHandler(store *PodTemplateStore) cache.ResourceEventHandler

GetPodTemplateUpdatesHandler returns a new ResourceEventHandler which adds / removes PodTemplates to / from the provided PodTemplateStore.

func GetPodTolerations

func GetPodTolerations(interruptible bool, resourceRequirements ...v1.ResourceRequirements) []v1.Toleration

func GetReportedAt

func GetReportedAt(pod *v1.Pod) metav1.Time

func GetServiceAccountNameFromTaskExecutionMetadata

func GetServiceAccountNameFromTaskExecutionMetadata(taskExecutionMetadata pluginmachinery_core.TaskExecutionMetadata) string

func MergeResources

func MergeResources(in v1.ResourceRequirements, out *v1.ResourceRequirements)

func MergeWithBasePodTemplate

func MergeWithBasePodTemplate(ctx context.Context, tCtx pluginsCore.TaskExecutionContext,
	podSpec *v1.PodSpec, objectMeta *metav1.ObjectMeta, primaryContainerName string) (*v1.PodSpec, *metav1.ObjectMeta, error)

MergeWithBasePodTemplate attempts to merge the provided PodSpec and ObjectMeta with the configuration PodTemplate for this task.

func SidecarCommandArgs

func SidecarCommandArgs(fromLocalPath string, outputPrefix, rawOutputPath storage.DataReference, startTimeout time.Duration, iface *core.TypedInterface) ([]string, error)

func ToK8sContainer

func ToK8sContainer(ctx context.Context, tCtx pluginscore.TaskExecutionContext) (*v1.Container, error)

ToK8sContainer builds a Container based on the definition passed by the TaskExecutionContext. This involves applying all Flyte configuration including k8s plugins and resource requests.

func ToK8sEnvVar

func ToK8sEnvVar(env []*core.KeyValuePair) []v1.EnvVar

func ToK8sPodSpec

ToK8sPodSpec builds a PodSpec and ObjectMeta based on the definition passed by the TaskExecutionContext. This involves parsing the raw PodSpec definition and applying all Flyte configuration options.

func ToK8sResourceList

func ToK8sResourceList(resources []*core.Resources_ResourceEntry) (v1.ResourceList, error)

TODO we should modify the container resources to contain a map of enum values? Also we should probably create tolerations / taints, but we could do that as a post process

func ToK8sResourceRequirements

func ToK8sResourceRequirements(resources *core.Resources) (*v1.ResourceRequirements, error)

func UpdatePod

func UpdatePod(taskExecutionMetadata pluginsCore.TaskExecutionMetadata,
	resourceRequirements []v1.ResourceRequirements, podSpec *v1.PodSpec)

UpdatePod updates the base pod spec used to execute tasks. This is configured with plugins and task metadata-specific options.

Types

type NonInterruptibleTaskExecutionContext added in v1.9.18

type NonInterruptibleTaskExecutionContext struct {
	pluginsCore.TaskExecutionContext
	// contains filtered or unexported fields
}

A wrapper around a regular TaskExecutionContext that allows injecting a custom TaskExecutionMetadata which is non-interruptible.

func NewNonInterruptibleTaskExecutionContext added in v1.9.18

func NewNonInterruptibleTaskExecutionContext(ctx pluginsCore.TaskExecutionContext) NonInterruptibleTaskExecutionContext

func (NonInterruptibleTaskExecutionContext) TaskExecutionMetadata added in v1.9.18

type NonInterruptibleTaskExecutionMetadata added in v1.9.18

type NonInterruptibleTaskExecutionMetadata struct {
	pluginsCore.TaskExecutionMetadata
}

Wraps a regular TaskExecutionMetadata and overrides the IsInterruptible method to always return false. This is useful because the runner and the scheduler pods should never be interruptible.

func (NonInterruptibleTaskExecutionMetadata) IsInterruptible added in v1.9.18

func (n NonInterruptibleTaskExecutionMetadata) IsInterruptible() bool
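The wrapper relies on Go's interface embedding: every method of the embedded value is promoted unchanged, and a method declared on the wrapper shadows the promoted one. A minimal sketch of that pattern follows, using a hypothetical two-method Metadata interface as a stand-in for pluginsCore.TaskExecutionMetadata, which has many more methods.

```go
package main

import "fmt"

// Metadata is a hypothetical, trimmed-down stand-in for
// pluginsCore.TaskExecutionMetadata.
type Metadata interface {
	IsInterruptible() bool
	GetNamespace() string
}

// taskMetadata is a plain implementation used only for this example.
type taskMetadata struct {
	interruptible bool
	namespace     string
}

func (m taskMetadata) IsInterruptible() bool { return m.interruptible }
func (m taskMetadata) GetNamespace() string  { return m.namespace }

// nonInterruptible embeds Metadata, so GetNamespace is promoted as-is,
// while IsInterruptible is shadowed by the method below.
type nonInterruptible struct {
	Metadata
}

// IsInterruptible always reports false, regardless of the wrapped value.
func (nonInterruptible) IsInterruptible() bool { return false }

func main() {
	base := taskMetadata{interruptible: true, namespace: "flyte"}
	wrapped := nonInterruptible{Metadata: base}
	fmt.Println(base.IsInterruptible(), wrapped.IsInterruptible(), wrapped.GetNamespace())
	// prints "true false flyte"
}
```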

type PodTemplateStore

type PodTemplateStore struct {
	*sync.Map
	// contains filtered or unexported fields
}

PodTemplateStore maintains a thread-safe mapping of active PodTemplates with their associated namespaces.

var DefaultPodTemplateStore PodTemplateStore = NewPodTemplateStore()

func NewPodTemplateStore

func NewPodTemplateStore() PodTemplateStore

NewPodTemplateStore initializes a new PodTemplateStore

func (*PodTemplateStore) Delete

func (p *PodTemplateStore) Delete(podTemplate *v1.PodTemplate)

Delete removes the specified PodTemplate from the store.

func (*PodTemplateStore) LoadOrDefault

func (p *PodTemplateStore) LoadOrDefault(namespace string, podTemplateName string) *v1.PodTemplate

LoadOrDefault returns the PodTemplate with the specified name in the given namespace. If one does not exist, it attempts to retrieve the one with the same name in the defaultNamespace.

func (*PodTemplateStore) SetDefaultNamespace

func (p *PodTemplateStore) SetDefaultNamespace(namespace string)

SetDefaultNamespace sets the default namespace for the PodTemplateStore.

func (*PodTemplateStore) Store

func (p *PodTemplateStore) Store(podTemplate *v1.PodTemplate)

Store loads the specified PodTemplate into the store.
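The fallback lookup performed by LoadOrDefault can be sketched with a sync.Map from the standard library. This is an assumption-laden illustration: the templateStore type, its "namespace/name" key layout, and the use of a string in place of *v1.PodTemplate are all hypothetical and may differ from how PodTemplateStore organizes its data internally.

```go
package main

import (
	"fmt"
	"sync"
)

// templateStore is a hypothetical, simplified analogue of PodTemplateStore.
// Values are plain strings standing in for *v1.PodTemplate.
type templateStore struct {
	templates        sync.Map // key: "namespace/name"
	defaultNamespace string
}

// store records a template under its namespace and name.
func (s *templateStore) store(namespace, name, tmpl string) {
	s.templates.Store(namespace+"/"+name, tmpl)
}

// loadOrDefault looks in the requested namespace first and falls back to
// the default namespace when no template with that name is found there.
func (s *templateStore) loadOrDefault(namespace, name string) (string, bool) {
	if v, ok := s.templates.Load(namespace + "/" + name); ok {
		return v.(string), true
	}
	if v, ok := s.templates.Load(s.defaultNamespace + "/" + name); ok {
		return v.(string), true
	}
	return "", false
}

func main() {
	s := &templateStore{defaultNamespace: "flyte"}
	s.store("flyte", "base", "cluster-wide template")
	s.store("team-a", "base", "team-a template")

	fmt.Println(s.loadOrDefault("team-a", "base")) // namespace-specific hit
	fmt.Println(s.loadOrDefault("team-b", "base")) // falls back to default namespace
}
```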

type ResourceCustomizationMode

type ResourceCustomizationMode int
const (
	// ResourceCustomizationModeAssignResources is used for container tasks where resources are validated and assigned if necessary.
	ResourceCustomizationModeAssignResources ResourceCustomizationMode = iota
	// ResourceCustomizationModeMergeExistingResources is used for primary containers in pod tasks where container requests and limits are
	// merged, validated and assigned if necessary.
	ResourceCustomizationModeMergeExistingResources
	// ResourceCustomizationModeEnsureExistingResourcesInRange is used for secondary containers in pod tasks where requests and limits are only
	// adjusted if needed (downwards).
	ResourceCustomizationModeEnsureExistingResourcesInRange
)

func ResourceCustomizationModeString

func ResourceCustomizationModeString(s string) (ResourceCustomizationMode, error)

ResourceCustomizationModeString retrieves an enum value from the enum constant's string name. Returns an error if the param is not part of the enum.

func ResourceCustomizationModeValues

func ResourceCustomizationModeValues() []ResourceCustomizationMode

ResourceCustomizationModeValues returns all values of the enum.

func (ResourceCustomizationMode) IsAResourceCustomizationMode

func (i ResourceCustomizationMode) IsAResourceCustomizationMode() bool

IsAResourceCustomizationMode returns "true" if the value is listed in the enum definition, "false" otherwise.

func (ResourceCustomizationMode) String

func (i ResourceCustomizationMode) String() string
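The helpers above follow a common pattern for iota-based enums in Go: a name table, a validity check, a String method, and a parser that returns an error for unknown names. The sketch below shows that pattern on a hypothetical Mode type; the identifiers and the exact strings are assumptions and may not match what ResourceCustomizationMode actually produces.

```go
package main

import "fmt"

// Mode is a hypothetical enum mirroring the shape of ResourceCustomizationMode.
type Mode int

const (
	AssignResources Mode = iota
	MergeExistingResources
	EnsureExistingResourcesInRange
)

// modeNames maps each enum value to its string name, indexed by value.
var modeNames = []string{
	"AssignResources",
	"MergeExistingResources",
	"EnsureExistingResourcesInRange",
}

// IsAMode reports whether m is one of the declared constants.
func (m Mode) IsAMode() bool { return m >= 0 && int(m) < len(modeNames) }

// String returns the constant's name, or a numeric form for unknown values.
func (m Mode) String() string {
	if !m.IsAMode() {
		return fmt.Sprintf("Mode(%d)", int(m))
	}
	return modeNames[m]
}

// ModeString parses a name back into its enum value and returns an
// error (Go has no exceptions) when the name is not part of the enum.
func ModeString(s string) (Mode, error) {
	for i, n := range modeNames {
		if n == s {
			return Mode(i), nil
		}
	}
	return 0, fmt.Errorf("%s does not belong to Mode values", s)
}

func main() {
	m, err := ModeString("MergeExistingResources")
	fmt.Println(m, err)
	_, err = ModeString("Bogus")
	fmt.Println(err)
}
```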

type ResourceRequirement

type ResourceRequirement struct {
	Request resource.Quantity
	Limit   resource.Quantity
}

func AdjustOrDefaultResource

func AdjustOrDefaultResource(request, limit, platformDefault, platformLimit resource.Quantity) ResourceRequirement

AdjustOrDefaultResource validates that resources conform to platform limits and assigns defaults for Request and Limit values by using the Request when the Limit is unset, and vice versa.
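A minimal sketch of the defaulting and clamping described above, using int64 values in place of resource.Quantity and treating zero as "unset". The function name, the use of platformDefault when both values are unset, and the clamping order are assumptions made for illustration, not a copy of the package's logic.

```go
package main

import "fmt"

// requirement is a simplified stand-in for ResourceRequirement,
// with int64 in place of resource.Quantity (0 means unset).
type requirement struct {
	Request int64
	Limit   int64
}

// adjustOrDefault (hypothetical) fills in a missing request from the
// limit, a missing limit from the request, and falls back to the
// platform default when both are unset (an assumption). It then clamps
// the limit to the platform limit and the request to the limit.
func adjustOrDefault(request, limit, platformDefault, platformLimit int64) requirement {
	if request == 0 {
		if limit != 0 {
			request = limit
		} else {
			request = platformDefault
		}
	}
	if limit == 0 {
		limit = request
	}
	if platformLimit != 0 && limit > platformLimit {
		limit = platformLimit
	}
	if request > limit {
		request = limit
	}
	return requirement{Request: request, Limit: limit}
}

func main() {
	fmt.Println(adjustOrDefault(0, 4, 1, 8))  // only limit set
	fmt.Println(adjustOrDefault(2, 0, 1, 8))  // only request set
	fmt.Println(adjustOrDefault(0, 0, 1, 8))  // neither set
	fmt.Println(adjustOrDefault(16, 0, 1, 8)) // request above platform limit
}
```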

Directories

Path Synopsis
Package config contains configuration for the flytek8s module - which is global configuration for all Flyte K8s interactions.
