Documentation ¶
Overview ¶
Package compute contains code for accessing compute resources from many different cluster types, including AWS, Google Cloud, and HPC-style cluster schedulers.
Index ¶
- type HPCBackend
- func (b *HPCBackend) Cancel(ctx context.Context, taskID string) error
- func (b *HPCBackend) CheckBackendParameterSupport(task *tes.Task) error
- func (b *HPCBackend) Close()
- func (b *HPCBackend) Reconcile(ctx context.Context)
- func (b *HPCBackend) Submit(task *tes.Task) error
- func (b *HPCBackend) WriteEvent(ctx context.Context, ev *events.Event) error
- type HPCTaskState
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type HPCBackend ¶
type HPCBackend struct {
	Name      string
	SubmitCmd string
	CancelCmd string
	Template  string
	Conf      config.Config
	Event     events.Writer
	Database  tes.ReadOnlyServer
	// ExtractID is responsible for extracting the task id from the response
	// returned by the SubmitCmd.
	ExtractID func(string) string
	// MapStates takes a list of backend-specific ids and calls out to the backend
	// (via squeue, qstat, condor_q, etc.) to get each task's current state. These states
	// are mapped to TES states along with an optional reason for this mapping.
	// The Reconcile function can then use the response to update the task states
	// and system logs to report errors reported by the backend.
	MapStates     func([]string) ([]*HPCTaskState, error)
	ReconcileRate time.Duration
	Log           *logger.Logger
	events.Computer
	// contains filtered or unexported fields
}
HPCBackend represents an HPC backend such as HTCondor, Slurm, Grid Engine, etc.
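To illustrate the ExtractID field, here is a minimal sketch of a function with the required `func(string) string` shape, written for Slurm. It assumes the submit command prints sbatch's usual confirmation line ("Submitted batch job <id>"). The names `slurmIDPattern` and `extractSlurmID` are hypothetical and not part of this package, and returning an empty string when no id is found is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// slurmIDPattern matches the job id in sbatch's confirmation message,
// e.g. "Submitted batch job 12345". (Hypothetical helper, not from the package.)
var slurmIDPattern = regexp.MustCompile(`Submitted batch job (\d+)`)

// extractSlurmID has the signature expected by HPCBackend.ExtractID.
// It returns the job id found in the submit output, or an empty string
// if the output does not contain one (an assumption made for this sketch).
func extractSlurmID(out string) string {
	m := slurmIDPattern.FindStringSubmatch(out)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(extractSlurmID("Submitted batch job 12345\n"))
}
```

A function like this could be assigned to the ExtractID field when constructing a backend; other schedulers would need a different pattern for their own submit output.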
func (*HPCBackend) Cancel ¶
func (b *HPCBackend) Cancel(ctx context.Context, taskID string) error
Cancel cancels a task via "qdel", "condor_rm", "scancel", etc.
func (*HPCBackend) CheckBackendParameterSupport ¶
func (b *HPCBackend) CheckBackendParameterSupport(task *tes.Task) error
func (*HPCBackend) Close ¶
func (b *HPCBackend) Close()
func (*HPCBackend) Reconcile ¶
func (b *HPCBackend) Reconcile(ctx context.Context)
Reconcile loops through tasks and checks the status from Funnel's database against the status reported by the backend (slurm, htcondor, grid engine, etc). This allows the backend to report system errors that prevented the worker process from running.
Currently this handles a narrow set of cases:
|---------------------|-----------------|--------------------|
|    Funnel State     |  Backend State  |  Reconciled State  |
|---------------------|-----------------|--------------------|
|       QUEUED        |     FAILED      |    SYSTEM_ERROR    |
|       QUEUED        | QUEUED/PENDING* |    SYSTEM_ERROR    |
|    INITIALIZING     |     FAILED      |    SYSTEM_ERROR    |
|       RUNNING       |     FAILED      |    SYSTEM_ERROR    |
In this context a "FAILED" state is being used as a generic term that captures one or more terminal states for the backend.
*QUEUED/PENDING: this captures the case where the scheduler has a task that is stuck in the queued state because of a resource request that can never be fulfilled.
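The mapping in the table above can be sketched as a small pure function. This is an illustration only, not the package's implementation: states are plain strings rather than TES types, "FAILED" stands in for any terminal failure state of the backend, and the `unsatisfiable` flag is a hypothetical input marking a queued task whose resource request can never be fulfilled (how that condition is detected is not described here).

```go
package main

import "fmt"

// reconcileState returns the reconciled state for a task and whether the
// task's state should be updated. It encodes only the four rows of the
// table; every other combination leaves the Funnel state unchanged.
func reconcileState(funnelState, backendState string, unsatisfiable bool) (string, bool) {
	switch funnelState {
	case "QUEUED":
		// Backend failure, or a task stuck in the queue on a request
		// that can never be fulfilled.
		stuck := (backendState == "QUEUED" || backendState == "PENDING") && unsatisfiable
		if backendState == "FAILED" || stuck {
			return "SYSTEM_ERROR", true
		}
	case "INITIALIZING", "RUNNING":
		if backendState == "FAILED" {
			return "SYSTEM_ERROR", true
		}
	}
	return funnelState, false
}

func main() {
	state, changed := reconcileState("RUNNING", "FAILED", false)
	fmt.Println(state, changed)
}
```

Keeping the decision separate from the polling loop makes the table easy to test without a scheduler available.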
func (*HPCBackend) Submit ¶
func (b *HPCBackend) Submit(task *tes.Task) error
Submit submits a task via "qsub", "condor_submit", "sbatch", etc.
func (*HPCBackend) WriteEvent ¶
func (b *HPCBackend) WriteEvent(ctx context.Context, ev *events.Event) error
WriteEvent writes an event to the compute backend. Currently, only TASK_CREATED is handled, which calls Submit.
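The dispatch described for WriteEvent can be sketched with stand-in types. Everything below (`eventType`, `event`, `backend`, `submit`, `writeEvent`) is hypothetical and only mirrors the documented behavior: a task-created event triggers a submit, and every other event type is ignored. The real method works with `*events.Event` and `*tes.Task`, and returning nil for ignored events is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// eventType is a stand-in for the event types defined in the events package.
type eventType int

const (
	taskCreated eventType = iota
	taskStateChanged
)

// event is a simplified stand-in for events.Event.
type event struct {
	Type   eventType
	TaskID string
}

// backend records submitted task ids instead of calling a scheduler.
type backend struct {
	submitted []string
}

// submit stands in for HPCBackend.Submit.
func (b *backend) submit(taskID string) error {
	if taskID == "" {
		return errors.New("empty task id")
	}
	b.submitted = append(b.submitted, taskID)
	return nil
}

// writeEvent handles only task-created events; all others are no-ops.
func (b *backend) writeEvent(ev event) error {
	switch ev.Type {
	case taskCreated:
		return b.submit(ev.TaskID)
	}
	return nil
}

func main() {
	b := &backend{}
	_ = b.writeEvent(event{Type: taskCreated, TaskID: "task-1"})
	_ = b.writeEvent(event{Type: taskStateChanged, TaskID: "task-1"})
	fmt.Println(len(b.submitted))
}
```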
Directories ¶
Path | Synopsis |
---|---|
batch | Package batch contains code for accessing compute resources via AWS Batch. |
gridengine | Package gridengine contains code for accessing compute resources via Open Grid Engine. |
htcondor | Package htcondor contains code for accessing compute resources via HTCondor. |
kubernetes | Package kubernetes contains code for accessing compute resources via the Kubernetes v1 Batch API. |
local | Package local contains code for accessing compute resources via the local computer, for Funnel development and debugging. |
noop | Package noop contains a compute backend that does nothing, for testing purposes. |
pbs | Package pbs contains code for accessing compute resources via PBS/Torque. |
 | Code generated by mockery v1.0.0. |
slurm | Package slurm contains code for accessing compute resources via Slurm. |