slurm

package
v0.0.0-...-b186a73
Published: Feb 5, 2025 License: MIT Imports: 30 Imported by: 0

Documentation

Constants

This section is empty.

Variables

Functions

func GetSessionContext

func GetSessionContext(r *http.Request) string

func GetSessionContextMessage

func GetSessionContextMessage(sessionContext string) string

func SLURMBatchSubmit

func SLURMBatchSubmit(Ctx context.Context, config SlurmConfig, path string) (string, error)

SLURMBatchSubmit submits the job script at the given path to the SLURM queue. From that point on, the SLURM scheduler manages the job. It returns the output of the sbatch command and the first encountered error.
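
Since the function returns the raw sbatch output, a caller will usually want the job ID out of it. The following is a minimal sketch, assuming the usual "Submitted batch job <id>" success line printed by sbatch; parseJobID is a hypothetical helper and not part of this package.

```go
package main

import (
	"fmt"
	"regexp"
)

// submittedRe matches the line sbatch prints on success,
// e.g. "Submitted batch job 123456".
var submittedRe = regexp.MustCompile(`Submitted batch job (\d+)`)

// parseJobID is a hypothetical helper (not part of this package) that
// extracts the job ID from the string returned by SLURMBatchSubmit.
func parseJobID(sbatchOutput string) (string, error) {
	m := submittedRe.FindStringSubmatch(sbatchOutput)
	if m == nil {
		return "", fmt.Errorf("no job ID found in sbatch output %q", sbatchOutput)
	}
	return m[1], nil
}

func main() {
	jid, err := parseJobID("Submitted batch job 123456\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(jid)
}
```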

Types

type CreateStruct

type CreateStruct struct {
	PodUID string `json:"PodUID"`
	PodJID string `json:"PodJID"`
}

type JidStruct

type JidStruct struct {
	PodUID       string    `json:"PodUID"`
	PodNamespace string    `json:"PodNamespace"`
	JID          string    `json:"JID"`
	StartTime    time.Time `json:"StartTime"`
	EndTime      time.Time `json:"EndTime"`
}
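
The JSON tags above determine how a JidStruct looks when serialized. The sketch below round-trips a value through encoding/json; the type is redeclared locally so the example compiles on its own, and encodeJID and decodeJID are hypothetical helpers, not functions of this package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// JidStruct is a local copy of the documented type, redeclared here only
// so that the example is self-contained.
type JidStruct struct {
	PodUID       string    `json:"PodUID"`
	PodNamespace string    `json:"PodNamespace"`
	JID          string    `json:"JID"`
	StartTime    time.Time `json:"StartTime"`
	EndTime      time.Time `json:"EndTime"`
}

// encodeJID serializes a JidStruct using the field names given by its tags.
func encodeJID(j JidStruct) (string, error) {
	b, err := json.Marshal(j)
	return string(b), err
}

// decodeJID parses the JSON produced by encodeJID back into a JidStruct.
func decodeJID(s string) (JidStruct, error) {
	var j JidStruct
	err := json.Unmarshal([]byte(s), &j)
	return j, err
}

func main() {
	// All values below are made up for illustration.
	j := JidStruct{
		PodUID:       "pod-uid-example",
		PodNamespace: "default",
		JID:          "123456",
		StartTime:    time.Date(2025, 2, 5, 12, 0, 0, 0, time.UTC),
	}
	s, err := encodeJID(j)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```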

type ResourceLimits

type ResourceLimits struct {
	CPU    int64
	Memory int64
}

type SidecarHandler

type SidecarHandler struct {
	Config SlurmConfig
	JIDs   *map[string]*JidStruct
	Ctx    context.Context
}

func (*SidecarHandler) CreateDirectories

func (h *SidecarHandler) CreateDirectories() error

CreateDirectories ensures that the required directories exist at runtime.

func (*SidecarHandler) GetLogsFollowMode

func (h *SidecarHandler) GetLogsFollowMode(
	spanCtx context.Context,
	podUid string,
	w http.ResponseWriter,
	r *http.Request,
	path string,
	req commonIL.LogStruct,
	containerOutputPath string,
	containerOutput []byte,
	sessionContext string,
) error

GetLogsFollowMode returns logs in follow mode (that is, it keeps reading until the container terminates), as requested with "kubectl logs -f".

func (*SidecarHandler) GetLogsHandler

func (h *SidecarHandler) GetLogsHandler(w http.ResponseWriter, r *http.Request)

GetLogsHandler reads a job's output file and returns what was logged there. The returned content depends on the provided parameters (Tail, LimitBytes, Timestamps, etc.).

func (*SidecarHandler) LoadJIDs

func (h *SidecarHandler) LoadJIDs() error

LoadJIDs loads job IDs into the main JIDs struct from files in the root folder. This is useful, for example, when the sidecar went down and had to be restarted while jobs were still running. It returns only an error in case of failure.
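
The recovery step can be sketched roughly as follows. The on-disk layout is not described here, so this example assumes one sub-directory per pod UID holding a file named JobID.jid; that layout, the file name, and the helpers loadJIDs and demo are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// jidFileName is an assumed file name; the real on-disk layout is not
// described in this documentation.
const jidFileName = "JobID.jid"

// loadJIDs scans root for one sub-directory per pod (named after the pod
// UID) and reads the job ID stored in each. Directories without a job ID
// file are skipped; any other read error aborts the scan.
func loadJIDs(root string) (map[string]string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, fmt.Errorf("listing %s: %w", root, err)
	}
	jids := make(map[string]string)
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		data, err := os.ReadFile(filepath.Join(root, e.Name(), jidFileName))
		if os.IsNotExist(err) {
			continue
		}
		if err != nil {
			return nil, fmt.Errorf("reading job ID for %s: %w", e.Name(), err)
		}
		jids[e.Name()] = strings.TrimSpace(string(data))
	}
	return jids, nil
}

// demo builds a throwaway root folder with one pod directory and reloads it.
func demo() (map[string]string, error) {
	root, err := os.MkdirTemp("", "jids")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	podDir := filepath.Join(root, "pod-uid-example")
	if err := os.Mkdir(podDir, 0o755); err != nil {
		return nil, err
	}
	if err := os.WriteFile(filepath.Join(podDir, jidFileName), []byte("123456\n"), 0o644); err != nil {
		return nil, err
	}
	return loadJIDs(root)
}

func main() {
	jids, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(jids)
}
```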

func (*SidecarHandler) ReadLogs

func (h *SidecarHandler) ReadLogs(logsPath string, span trace.Span, ctx context.Context, w http.ResponseWriter, sessionContextMessage string) ([]byte, error)

ReadLogs reads the file at logsPath if it exists and returns empty content otherwise. Waiting matters here: returning an empty array without waiting would cause a JSON unmarshal error in the InterLink VK. Any error unrelated to the file not existing (for example, a permission error) is treated as a failure. Errors are already handled inside this function.
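
The distinction between a missing file and any other failure can be sketched as below. This is only an illustration of that one rule, not the package's implementation: readIfExists is a hypothetical helper, and the waiting, tracing, and HTTP error reporting that ReadLogs performs are left out.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// readIfExists is a hypothetical helper: it returns the file content, an
// empty slice when the file does not exist, and an error for anything else
// (permission denied, path is a directory, and so on).
func readIfExists(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		// A missing file is not a failure: report "no logs yet".
		return []byte{}, nil
	}
	if err != nil {
		return nil, fmt.Errorf("reading %s: %w", path, err)
	}
	return data, nil
}

func main() {
	// The file name below is made up; it is not expected to exist.
	missing := filepath.Join(os.TempDir(), "no-such-job-output.log")
	data, err := readIfExists(missing)
	fmt.Println(len(data), err)
}
```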

func (*SidecarHandler) StatusHandler

func (h *SidecarHandler) StatusHandler(w http.ResponseWriter, r *http.Request)

StatusHandler runs squeue --me and uses regular expressions to extract the status of the running jobs.
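
The regular-expression step might look like the sketch below. It assumes the default squeue column layout (JOBID, PARTITION, NAME, USER, ST, ...) and job names without spaces; the handler may well use a different output format, and parseSqueue is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"regexp"
)

// squeueLineRe captures the job ID (first column) and the state code
// (fifth column) of each data line; the header line does not match because
// its first column is not numeric.
var squeueLineRe = regexp.MustCompile(
	`(?m)^[ \t]*(\d+)[ \t]+\S+[ \t]+\S+[ \t]+\S+[ \t]+([A-Z]+)[ \t]`)

// parseSqueue maps job IDs to SLURM state codes such as "R" or "PD".
func parseSqueue(output string) map[string]string {
	states := make(map[string]string)
	for _, m := range squeueLineRe.FindAllStringSubmatch(output, -1) {
		states[m[1]] = m[2]
	}
	return states
}

func main() {
	// Sample text in the default squeue layout; the values are made up.
	sample := "  JOBID PARTITION   NAME  USER ST  TIME NODES NODELIST(REASON)\n" +
		" 123456     debug job.sh alice  R  0:42     1 node01\n" +
		" 123457     debug   job2 alice PD  0:00     1 (Priority)\n"
	fmt.Println(parseSqueue(sample))
}
```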

func (*SidecarHandler) StopHandler

func (h *SidecarHandler) StopHandler(w http.ResponseWriter, r *http.Request)

StopHandler runs a scancel command and updates the JIDs and the cached statuses.

func (*SidecarHandler) SubmitHandler

func (h *SidecarHandler) SubmitHandler(w http.ResponseWriter, r *http.Request)

SubmitHandler generates and submits a SLURM batch script according to the provided data. 1 Pod = 1 Job. If a Pod has multiple containers, every container is a line with its parameters in the SLURM script.
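
The "one line per container" rule can be illustrated with a small script generator. Everything in this sketch is an assumption made for illustration: the container type, the buildScript helper, the use of the pod UID as job name, and the exact singularity exec command line are not taken from the package.

```go
package main

import (
	"fmt"
	"strings"
)

// container is a hypothetical, stripped-down view of a Pod container.
type container struct {
	Image   string
	Command []string
}

// buildScript renders one batch script per Pod: a shebang, a job name
// directive, and then exactly one command line per container.
func buildScript(podUID string, containers []container) string {
	var b strings.Builder
	b.WriteString("#!/bin/bash\n")
	fmt.Fprintf(&b, "#SBATCH --job-name=%s\n", podUID)
	for _, c := range containers {
		fmt.Fprintf(&b, "singularity exec %s %s\n", c.Image, strings.Join(c.Command, " "))
	}
	return b.String()
}

func main() {
	// Pod UID, images, and commands are made up.
	script := buildScript("pod-uid-example", []container{
		{Image: "docker://busybox", Command: []string{"echo", "first"}},
		{Image: "docker://busybox", Command: []string{"echo", "second"}},
	})
	fmt.Print(script)
}
```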

type SingularityCommand

type SingularityCommand struct {
	// contains filtered or unexported fields
}

type SlurmConfig

type SlurmConfig struct {
	VKConfigPath      string `yaml:"VKConfigPath"`
	Sbatchpath        string `yaml:"SbatchPath"`
	Scancelpath       string `yaml:"ScancelPath"`
	Squeuepath        string `yaml:"SqueuePath"`
	Sidecarport       string `yaml:"SidecarPort"`
	Socket            string `yaml:"Socket"`
	ExportPodData     bool   `yaml:"ExportPodData"`
	Commandprefix     string `yaml:"CommandPrefix"`
	ImagePrefix       string `yaml:"ImagePrefix"`
	DataRootFolder    string `yaml:"DataRootFolder"`
	Namespace         string `yaml:"Namespace"`
	Tsocks            bool   `yaml:"Tsocks"`
	Tsockspath        string `yaml:"TsocksPath"`
	Tsockslogin       string `yaml:"TsocksLoginNode"`
	BashPath          string `yaml:"BashPath"`
	VerboseLogging    bool   `yaml:"VerboseLogging"`
	ErrorsOnlyLogging bool   `yaml:"ErrorsOnlyLogging"`
	// contains filtered or unexported fields
}

SlurmConfig holds the whole configuration.

var SlurmConfigInst SlurmConfig

func NewSlurmConfig

func NewSlurmConfig() (SlurmConfig, error)

NewSlurmConfig returns a SlurmConfig value, which is used by many other functions, and the first encountered error.
