What is it
farctl is a simple CLI tool that lets machine learning engineers deploy MPIJobs to a Kubernetes cluster without Kubernetes-specific knowledge or manually deploying YAML files. It imitates this project; I reinvented the wheel, again, for learning purposes.
How to install
go install github.com/FFFFFaraway/farctl@latest
How to use
Submit MPIJob
Write your deep learning code using Horovod. For an example, see here.
(Optional) Upload the code to a publicly available platform such as GitHub or GitLab.
Submit the job, for example:
farctl submit test -i https://github.com/FFFFFaraway/sample-python-train.git -c "python generate_data.py" -c "python main.py" --gang -n 2
- test is the name of the submitted MPIJob.
- -i denotes the URL to git clone, or a local directory path (default: . ).
- -c denotes a command to run as the entry point. Pass -c multiple times to run multiple commands.
- --gang enables the gang scheduler, which requires an extra installation.
- -n denotes the number of workers to create.
- Other options can be found by typing:
farctl submit -h
Another example uses a local code directory. It first copies . (the current directory) recursively to all worker pods:
farctl submit local-test -c "echo ab" -c "echo cd"
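When a job has several entry-point commands, repeating -c by hand gets tedious. The following is a minimal sketch of a wrapper that adds one -c per command. The function name submit_job is hypothetical and not part of farctl; it uses only the flags described above (-i, -c, -n).

```shell
# submit_job NAME SOURCE WORKERS CMD...
# Hypothetical helper (not part of farctl): runs `farctl submit` with one
# -c flag for each CMD, using only the -i, -c and -n flags documented above.
submit_job() {
    name=$1
    source_path=$2   # git URL or local directory, passed to -i
    workers=$3       # number of workers, passed to -n
    shift 3

    # Rewrite the remaining arguments in place: each CMD becomes "-c CMD".
    # The for loop iterates over a snapshot of the original list, so
    # appending to "$@" and shifting inside the loop is safe.
    for cmd in "$@"; do
        set -- "$@" -c "$cmd"
        shift
    done

    farctl submit "$name" -i "$source_path" "$@" -n "$workers"
}
```

For example, `submit_job test https://github.com/FFFFFaraway/sample-python-train.git 2 "python generate_data.py" "python main.py"` runs the same command as the first example, minus --gang.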
List MPIJob
farctl list
We can monitor the status of MPIJobs here:
Namespace  Name  ReadyWorkers/Total  LauncherStatus  Age
farctl     test  1/2                 WaitingWorkers  1m17s

Namespace  Name  ReadyWorkers/Total  LauncherStatus  Age
farctl     test  2/2                 Running         2m3s
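The ReadyWorkers/Total column can also be checked from a script. Here is a minimal sketch, assuming the output of farctl list is whitespace-separated with a header line and the columns in the order shown above. The helper name workers_ready is hypothetical and not part of farctl.

```shell
# workers_ready NAME
# Hypothetical helper (not part of farctl): reads `farctl list` output on
# stdin and succeeds only if job NAME is listed and all its workers are ready.
# Assumes whitespace-separated columns:
#   Namespace Name ReadyWorkers/Total LauncherStatus Age
workers_ready() {
    awk -v name="$1" '
        NR > 1 && $2 == name {
            split($3, w, "/")          # "1/2" -> w[1] = ready, w[2] = total
            found = 1
            ready = (w[1] == w[2])
        }
        END {
            if (found && ready) exit 0
            exit 1
        }
    '
}
```

Usage: `farctl list | workers_ready test && echo "all workers ready"`.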
Get MPIJob Log
When the LauncherStatus becomes Running, we can access the log of the MPIJob:
farctl log test
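To avoid checking by hand, we can poll farctl list until the launcher is running and then fetch the log. This is a minimal sketch, assuming the whitespace-separated column layout shown in the List MPIJob section. The function name wait_for_running and its retry parameters are hypothetical and not part of farctl.

```shell
# wait_for_running NAME [TRIES] [INTERVAL]
# Hypothetical helper (not part of farctl): polls `farctl list` until the
# LauncherStatus of job NAME is Running. Gives up after TRIES polls
# (default 60), sleeping INTERVAL seconds (default 5) between polls.
wait_for_running() {
    name=$1
    tries=${2:-60}
    interval=${3:-5}

    while [ "$tries" -gt 0 ]; do
        # Column 2 is the job name, column 4 is LauncherStatus (assumed layout).
        status=$(farctl list | awk -v name="$name" 'NR > 1 && $2 == name { print $4 }')
        if [ "$status" = "Running" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

Usage: `wait_for_running test && farctl log test`. The retry limit keeps the loop from running forever if the job never reaches Running.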
Get applied MPIJob configuration
farctl get test
Delete MPIJob
farctl delete test