compute

package
v0.0.0-...-2e51320
This package is not in the latest version of its module.
Published: Dec 11, 2024 License: MIT Imports: 15 Imported by: 11

Documentation

Overview

Package compute contains code for accessing compute resources from many different cluster types, including AWS, Google Cloud, and HPC-style cluster schedulers.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type HPCBackend

type HPCBackend struct {
	Name      string
	SubmitCmd string
	CancelCmd string
	Template  string
	Conf      config.Config
	Event     events.Writer
	Database  tes.ReadOnlyServer
	// ExtractID is responsible for extracting the task id from the response
	// returned by the SubmitCmd.
	ExtractID func(string) string
	// MapStates takes a list of backend-specific IDs and calls out to the backend
	// (via squeue, qstat, condor_q, etc.) to get each task's current state. These states
	// are mapped to TES states, along with an optional reason for the mapping.
	// The Reconcile function can then use the response to update task states
	// and system logs to report errors reported by the backend.
	MapStates     func([]string) ([]*HPCTaskState, error)
	ReconcileRate time.Duration
	Log           *logger.Logger

	events.Computer
	// contains filtered or unexported fields
}

HPCBackend represents an HPC backend such as HTCondor, Slurm, Grid Engine, etc.
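
As an illustration of the ExtractID field, the following is a minimal sketch of an extractor for a Slurm-style backend. It assumes the submit command prints a line of the form "Submitted batch job 12345"; the names slurmIDPattern and extractSlurmID are hypothetical and are not part of this package, and returning an empty string when no ID is found is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// slurmIDPattern matches the success line assumed to be printed by the
// submit command, e.g. "Submitted batch job 12345". (Hypothetical name.)
var slurmIDPattern = regexp.MustCompile(`Submitted batch job (\d+)`)

// extractSlurmID has the same shape as HPCBackend.ExtractID: it takes the raw
// response of the submit command and returns the backend job ID.
// Returning "" when nothing matches is an assumption of this sketch.
func extractSlurmID(submitOutput string) string {
	m := slurmIDPattern.FindStringSubmatch(submitOutput)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(extractSlurmID("Submitted batch job 12345\n")) // prints "12345"
}
```

A function with this signature could be assigned to ExtractID when constructing a backend; other schedulers would need their own pattern.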

func (*HPCBackend) Cancel

func (b *HPCBackend) Cancel(ctx context.Context, taskID string) error

Cancel cancels a task via "qdel", "condor_rm", "scancel", etc.

func (*HPCBackend) CheckBackendParameterSupport

func (b *HPCBackend) CheckBackendParameterSupport(task *tes.Task) error

func (*HPCBackend) Close

func (b *HPCBackend) Close()

func (*HPCBackend) Reconcile

func (b *HPCBackend) Reconcile(ctx context.Context)

Reconcile loops through tasks and checks the status from Funnel's database against the status reported by the backend (Slurm, HTCondor, Grid Engine, etc.). This allows the backend to report system errors that prevented the worker process from running.

Currently this handles a narrow set of cases:

|---------------------|-----------------|--------------------|
| Funnel State        | Backend State   | Reconciled State   |
|---------------------|-----------------|--------------------|
| QUEUED              | FAILED          | SYSTEM_ERROR       |
| QUEUED              | QUEUED/PENDING* | SYSTEM_ERROR       |
| INITIALIZING        | FAILED          | SYSTEM_ERROR       |
| RUNNING             | FAILED          | SYSTEM_ERROR       |
|---------------------|-----------------|--------------------|

In this context a "FAILED" state is being used as a generic term that captures one or more terminal states for the backend.

*QUEUED/PENDING: this captures the case where the scheduler has a task that is stuck in the queued state because its resource request can never be fulfilled.
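
The cases listed above can be sketched as a small decision function. This is an illustration only, not the package's implementation: State is a local stand-in for tes.State, and reconciledState, backendFailed, and backendStuck are hypothetical names. How the real code detects a "stuck" job is not described here, so that is passed in as a flag.

```go
package main

import "fmt"

// State is a local stand-in for tes.State, used only in this sketch.
type State string

const (
	Queued       State = "QUEUED"
	Initializing State = "INITIALIZING"
	Running      State = "RUNNING"
	Complete     State = "COMPLETE"
	SystemError  State = "SYSTEM_ERROR"
)

// reconciledState returns the state a task should be moved to and whether a
// change is needed. backendFailed means the backend reports a terminal failure;
// backendStuck means the backend job is queued/pending with a resource request
// that can never be fulfilled.
func reconciledState(funnel State, backendFailed, backendStuck bool) (State, bool) {
	switch funnel {
	case Queued:
		if backendFailed || backendStuck {
			return SystemError, true
		}
	case Initializing, Running:
		if backendFailed {
			return SystemError, true
		}
	}
	// Every other combination is left untouched.
	return funnel, false
}

func main() {
	fmt.Println(reconciledState(Queued, false, true))   // SYSTEM_ERROR true
	fmt.Println(reconciledState(Running, false, false)) // RUNNING false
}
```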

func (*HPCBackend) Submit

func (b *HPCBackend) Submit(task *tes.Task) error

Submit submits a task via "qsub", "condor_submit", "sbatch", etc.
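
Since HPCBackend carries both a Template and a SubmitCmd, one plausible flow is to render a submission script from the template and hand it to the submit command. The sketch below shows only the rendering step with Go's text/template. The submitData type, its fields, and the template text are hypothetical; the fields that this package actually exposes to its templates are not documented on this page.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// submitData holds hypothetical template fields for this sketch.
type submitData struct {
	TaskID string
	Cpus   int
}

// slurmTemplate is an illustrative submission script, not the package's own.
const slurmTemplate = `#!/bin/bash
#SBATCH --job-name={{.TaskID}}
#SBATCH --cpus-per-task={{.Cpus}}
`

// renderScript fills the template with the task's values and returns the
// resulting script text, which could then be written to disk and passed to
// a submit command such as sbatch.
func renderScript(data submitData) (string, error) {
	t, err := template.New("submit").Parse(slurmTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	script, err := renderScript(submitData{TaskID: "task-1", Cpus: 4})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(script)
}
```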

func (*HPCBackend) WriteEvent

func (b *HPCBackend) WriteEvent(ctx context.Context, ev *events.Event) error

WriteEvent writes an event to the compute backend. Currently, only TASK_CREATED is handled, which calls Submit.

type HPCTaskState

type HPCTaskState struct {
	ID       string
	TESState tes.State
	State    string
	Reason   string
	Remove   bool
}

HPCTaskState is a structure used by Reconcile to represent the state of a task in Funnel and the HPC backend.
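
To show how a MapStates function might populate this structure, here is a minimal sketch. It assumes the scheduler query has already been reduced to lines of the form "<id> <state>", and it only reports jobs in states that this sketch treats as terminal failures. TESState, failedStates, and parseStates are hypothetical names; Remove is left at its zero value because its meaning is not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

// TESState is a local stand-in for tes.State.
type TESState string

// HPCTaskState mirrors the documented struct, with the stand-in state type.
type HPCTaskState struct {
	ID       string
	TESState TESState
	State    string
	Reason   string
	Remove   bool
}

// failedStates lists backend states treated as terminal failures in this sketch.
var failedStates = map[string]bool{"FAILED": true, "NODE_FAIL": true, "TIMEOUT": true}

// parseStates converts "<id> <state>" lines into task states, keeping only
// jobs whose backend state maps to a TES system error.
func parseStates(output string) []*HPCTaskState {
	var states []*HPCTaskState
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // skip blank or malformed lines
		}
		id, state := fields[0], fields[1]
		if failedStates[state] {
			states = append(states, &HPCTaskState{
				ID:       id,
				TESState: "SYSTEM_ERROR",
				State:    state,
				Reason:   "backend reported terminal state " + state,
			})
		}
	}
	return states
}

func main() {
	for _, s := range parseStates("101 RUNNING\n102 NODE_FAIL\n") {
		fmt.Printf("%s %s -> %s\n", s.ID, s.State, s.TESState)
	}
}
```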

Directories

Path Synopsis
Package batch contains code for accessing compute resources via AWS Batch.
Package gridengine contains code for accessing compute resources via Open Grid Engine.
Package htcondor contains code for accessing compute resources via HTCondor.
Package kubernetes contains code for accessing compute resources via the Kubernetes v1 Batch API.
Package local contains code for accessing compute resources via the local computer, for Funnel development and debugging.
Package noop contains a compute backend that does nothing, for testing purposes.
Package pbs contains code for accessing compute resources via PBS/Torque.
Code generated by mockery v1.0.0.
Package slurm contains code for accessing compute resources via Slurm.
