Morphling
Morphling is an auto-configuration framework for machine learning model serving (inference) on Kubernetes. Check the website for details.
The Morphling paper was accepted at ACM SoCC 2021:
Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving
Overview
Morphling searches for the optimal configuration of your ML/DL model serving deployments.
It finds the best container-level configurations (e.g., resource allocations and runtime parameters) through empirical trials, in which a few configurations are sampled and evaluated for performance.
Features
Key benefits include:
- Automated tuning workflows hidden behind simple APIs.
- Out-of-the-box stress-test clients for ML model serving.
- Cloud-agnostic; tested on AWS, Alicloud, etc.
- ML-framework-agnostic; supports popular frameworks, including TensorFlow, PyTorch, etc.
- Equipped with a variety of customizable hyper-parameter tuning algorithms.
Getting started
Install using YAML files
Install CRDs
From the root directory of the git repository, run
kubectl apply -k config/crd/bases
Install Morphling Components
kubectl create namespace morphling-system
kubectl apply -k manifests/configmap
kubectl apply -k manifests/controllers
kubectl apply -k manifests/pv
kubectl apply -k manifests/mysql-db
kubectl apply -k manifests/db-manager
kubectl apply -k manifests/ui
kubectl apply -k manifests/algorithm
By default, Morphling is installed in the morphling-system namespace.
The official Morphling component images are hosted on Docker Hub.
Check if all components are running successfully:
kubectl get deployment -n morphling-system
Expected output:
NAME READY UP-TO-DATE AVAILABLE AGE
morphling-algorithm-server 1/1 1 1 34s
morphling-controller 1/1 1 1 9m23s
morphling-db-manager 1/1 1 1 9m11s
morphling-mysql 1/1 1 1 9m15s
morphling-ui 1/1 1 1 4m53s
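The readiness check above can also be automated. The following is a minimal sketch, assuming the plain-text table printed by `kubectl get deployment` is available as a string; the function name `unready_deployments` is hypothetical and not part of Morphling.

```python
from typing import List


def unready_deployments(kubectl_output: str) -> List[str]:
    """Return the names of deployments whose READY column is not n/n."""
    unready = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        name, ready = fields[0], fields[1]
        ready_count, _, desired_count = ready.partition("/")
        if ready_count != desired_count:
            unready.append(name)
    return unready


# The expected output shown above, copied as a sample input.
SAMPLE_OUTPUT = """\
NAME READY UP-TO-DATE AVAILABLE AGE
morphling-algorithm-server 1/1 1 1 34s
morphling-controller 1/1 1 1 9m23s
morphling-db-manager 1/1 1 1 9m11s
morphling-mysql 1/1 1 1 9m15s
morphling-ui 1/1 1 1 4m53s
"""

if __name__ == "__main__":
    print(unready_deployments(SAMPLE_OUTPUT))
```

In practice the input string could come from running `kubectl get deployment -n morphling-system` through `subprocess.run(..., capture_output=True, text=True)`.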
Uninstall Morphling controller
bash script/undeploy.sh
Delete CRDs
kubectl get crd | grep morphling.kubedl.io | cut -d ' ' -f 1 | xargs kubectl delete crd
Install using Helm chart
Install Helm
Helm is a package manager for Kubernetes. A demo installation on macOS:
brew install helm
Check the Helm website for more details.
Install Morphling
From the root directory, run
helm install morphling ./helm/morphling --create-namespace -n morphling-system
You can override the default values defined in values.yaml with the --set flag.
For example, to set custom CPU and memory resource requests:
helm install morphling ./helm/morphling --create-namespace -n morphling-system --set resources.requests.cpu=1024m --set resources.requests.memory=2Gi
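To illustrate how a dotted `--set` key addresses a nested entry in values.yaml, here is a simplified sketch. It is not Helm's implementation: it ignores escaped dots, list indices, comma-separated pairs, and type coercion, and the default values below are placeholders rather than the chart's actual defaults.

```python
from typing import Any, Dict, List


def apply_set_flags(values: Dict[str, Any], set_flags: List[str]) -> Dict[str, Any]:
    """Overlay simple `a.b.c=value` overrides onto a nested dict, in place."""
    for flag in set_flags:
        path, _, value = flag.partition("=")
        keys = path.split(".")
        node = values
        for key in keys[:-1]:
            # Walk down the path, creating intermediate mappings when missing.
            child = node.get(key)
            if not isinstance(child, dict):
                child = {}
                node[key] = child
            node = child
        node[keys[-1]] = value  # values stay strings in this sketch
    return values


# Placeholder defaults (hypothetical, for illustration only).
placeholder_values = {"resources": {"requests": {"cpu": "100m", "memory": "128Mi"}}}

merged = apply_set_flags(
    placeholder_values,
    ["resources.requests.cpu=1024m", "resources.requests.memory=2Gi"],
)

if __name__ == "__main__":
    print(merged["resources"]["requests"])
```

Each dot in the key descends one level, so `resources.requests.cpu` targets the `cpu` entry under `requests` under `resources`.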
Helm installs the CRDs and the other Morphling components in the morphling-system namespace.
Uninstall Morphling
helm uninstall morphling -n morphling-system
Delete all Morphling CRDs
kubectl get crd | grep morphling.kubedl.io | cut -d ' ' -f 1 | xargs kubectl delete crd
Morphling UI
Morphling UI is built upon Ant Design.
If you are installing Morphling with YAML files, from the root directory, run
kubectl apply -k manifests/ui
If you are installing Morphling with the Helm chart, the Morphling UI is deployed automatically.
Check that the Morphling UI service is running:
kubectl -n morphling-system get svc morphling-ui
Expected output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
morphling-ui NodePort 10.96.63.162 <none> 9091:30680/TCP 44m
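For a NodePort service, kubectl prints each PORT(S) entry as `port:nodePort/protocol`, so in the sample output the service port is 9091 and the node port is 30680. The sketch below splits such an entry; the names `ServicePort` and `parse_port_column` are hypothetical helpers for illustration.

```python
from typing import NamedTuple, Optional


class ServicePort(NamedTuple):
    port: int                  # port exposed by the service inside the cluster
    node_port: Optional[int]   # port opened on each node, if any
    protocol: str


def parse_port_column(entry: str) -> ServicePort:
    """Parse one PORT(S) entry such as '9091:30680/TCP' or '9091/TCP'."""
    ports, _, protocol = entry.partition("/")
    port, _, node_port = ports.partition(":")
    return ServicePort(
        port=int(port),
        node_port=int(node_port) if node_port else None,
        protocol=protocol or "TCP",
    )


if __name__ == "__main__":
    print(parse_port_column("9091:30680/TCP"))
```

The node port is assigned by the cluster, so the value on your cluster will likely differ from the sample.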
If you are using minikube, you can access the UI with port-forward:
kubectl -n morphling-system port-forward --address 0.0.0.0 svc/morphling-ui 30263:9091
Then you can access the UI at http://localhost:30263/.
For a detailed UI deployment and development guide, see UI.md.
Running Examples
This example demonstrates how to tune the configuration of a mobilenet model deployed with TensorFlow Serving under Morphling.
For demonstration, we choose two configurations to tune:
the first is the number of CPU cores (resource allocation), and the second is the maximum serving batch size (runtime parameter).
We use grid search for configuration sampling.
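As a rough illustration of what grid search means here, the sketch below enumerates every combination of the two tunable parameters and picks the one with the highest measured objective. The candidate values are made up for illustration (the real ones are defined in the experiment YAML), and `measure` stands in for whatever performance evaluation a trial performs; this is not Morphling's algorithm server code.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence

Config = Dict[str, int]


def grid_configurations(search_space: Dict[str, Sequence[int]]) -> List[Config]:
    """Enumerate the Cartesian product of all candidate values."""
    names = list(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def best_configuration(configs: List[Config], measure: Callable[[Config], float]) -> Config:
    """Evaluate every configuration once and return the best-scoring one."""
    return max(configs, key=measure)


# Hypothetical candidate values for the two parameters tuned in this example.
example_space = {"cpu": [1, 2, 4], "BATCH_SIZE": [1, 8, 32]}
example_grid = grid_configurations(example_space)

if __name__ == "__main__":
    # 3 CPU options x 3 batch sizes = 9 trials to evaluate.
    print(len(example_grid))
```

Grid search is exhaustive, so the number of trials grows multiplicatively with each added parameter; the other sampling algorithms mentioned under Features trade completeness for fewer trials.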
Submit the configuration tuning experiment
kubectl -n morphling-system apply -f https://raw.githubusercontent.com/alibaba/morphling/main/examples/experiment/experiment-mobilenet-grid.yaml
To start a multi-framework tuning experiment:
kubectl -n morphling-system apply -f examples/experiment/experiment-grid.yaml
You can specify the model name in examples/experiment/experiment-grid.yaml. Note that with INFERENCE_FRAMEWORK=vllm and DTYPE=int8, bitsandbytes only supports LLMs with the LLaMA architecture (LlamaForCausalLM). So far, only tuning between the float16/bfloat16 and int8 data types is supported. Make sure there are enough resources for LLM serving.
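The constraints above can be checked before submitting an experiment. This is a minimal sketch of such a pre-flight check, not something Morphling ships: the function name is hypothetical, and `model_architecture` is assumed to be the architecture string of the model you plan to serve.

```python
from typing import List

SUPPORTED_DTYPES = {"float16", "bfloat16", "int8"}


def check_llm_tuning_config(
    inference_framework: str, dtype: str, model_architecture: str
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if dtype not in SUPPORTED_DTYPES:
        problems.append(
            f"DTYPE={dtype} is not supported; choose one of {sorted(SUPPORTED_DTYPES)}"
        )
    if (
        inference_framework == "vllm"
        and dtype == "int8"
        and model_architecture != "LlamaForCausalLM"
    ):
        problems.append(
            "INFERENCE_FRAMEWORK=vllm with DTYPE=int8 requires a "
            "LlamaForCausalLM model"
        )
    return problems


if __name__ == "__main__":
    for problem in check_llm_tuning_config("vllm", "int8", "LlamaForCausalLM"):
        print(problem)
```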
Monitor the status of the configuration tuning experiment
kubectl get -n morphling-system pe
kubectl describe -n morphling-system pe
kubectl -n morphling-system get trial
Get the searched optimal configuration
kubectl -n morphling-system get pe
Expected output:
NAME STATE AGE OBJECT NAME OPTIMAL OBJECT VALUE OPTIMAL PARAMETERS
mobilenet-experiment-grid Succeeded 12m qps 32 [map[category:resource name:cpu value:4] map[category:env name:BATCH_SIZE value:32]]
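The OPTIMAL PARAMETERS column is printed as a list of `map[key:value ...]` entries. If you need the values in a script, the sketch below turns that text into a list of dictionaries. It assumes keys and values contain no spaces or brackets; the function name is hypothetical. For anything beyond a quick check, reading the structured object with `kubectl get pe -o json` is likely more robust.

```python
import re
from typing import Dict, List


def parse_optimal_parameters(column: str) -> List[Dict[str, str]]:
    """Parse text like '[map[a:1 b:2] map[a:3 b:4]]' into a list of dicts."""
    parameters = []
    for body in re.findall(r"map\[([^\]]*)\]", column):
        entry = {}
        for pair in body.split():
            key, _, value = pair.partition(":")  # split on the first colon only
            entry[key] = value
        parameters.append(entry)
    return parameters


# The column value from the expected output above.
SAMPLE_COLUMN = (
    "[map[category:resource name:cpu value:4] "
    "map[category:env name:BATCH_SIZE value:32]]"
)

if __name__ == "__main__":
    for parameter in parse_optimal_parameters(SAMPLE_COLUMN):
        print(parameter["name"], "=", parameter["value"])
```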
Delete the tuning experiment
kubectl -n morphling-system delete pe --all
Workflow
See Morphling Workflow to check how Morphling tunes ML serving configurations automatically in a Kubernetes-native way.
Developer Guide
Build the controller manager binary
make manager
Run the tests
make test
Generate manifests, e.g., CRD, RBAC YAML files, etc.
make manifests
Build the multi-inference-framework Docker image
Before building the image, download the right version of the vllm .whl file to the pkg/server directory (see the download guidance).
For example, if the CUDA version is 11.8 and you want vllm version 0.6.1.post1, download vllm-0.6.1.post1+cu118-cp310-cp310-manylinux1_x86_64.whl to the pkg/server directory. Note that the Python version in this image is 3.10.
Then modify the arguments CUDA_VERSION and VLLM_FILE in script/docker_build.sh, and build the image.
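The wheel file name encodes both the CUDA build and the Python version, so a mismatch can be caught before the image build. The sketch below splits the name following the standard wheel naming layout (name, version, Python tag, ABI tag, platform tag) and compares the tags; the helper names are hypothetical, and wheels with an optional build tag are not handled.

```python
from typing import NamedTuple, Optional


class WheelInfo(NamedTuple):
    version: str              # public version, e.g. "0.6.1.post1"
    cuda_tag: Optional[str]   # local version label, e.g. "cu118"
    python_tag: str           # e.g. "cp310"


def parse_vllm_wheel(filename: str) -> WheelInfo:
    """Split a wheel file name into its version, CUDA label, and Python tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name layout: {filename}")
    _name, full_version, python_tag, _abi_tag, _platform_tag = parts
    version, _, local_label = full_version.partition("+")
    return WheelInfo(version, local_label or None, python_tag)


def matches_environment(filename: str, cuda_version: str, python_version: str) -> bool:
    """Check the wheel against a CUDA version ('11.8') and Python version ('3.10')."""
    info = parse_vllm_wheel(filename)
    expected_cuda = "cu" + cuda_version.replace(".", "")
    expected_python = "cp" + python_version.replace(".", "")
    return info.cuda_tag == expected_cuda and info.python_tag == expected_python


EXAMPLE_WHEEL = "vllm-0.6.1.post1+cu118-cp310-cp310-manylinux1_x86_64.whl"

if __name__ == "__main__":
    print(parse_vllm_wheel(EXAMPLE_WHEEL))
    print(matches_environment(EXAMPLE_WHEEL, "11.8", "3.10"))
```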
Build the component Docker images, e.g., Morphling controller, DB-Manager
make docker-build
Push the component Docker images
make docker-push
To develop/debug Morphling controller manager locally, please check the debug guide.
If you have any questions or want to contribute, GitHub issues or pull requests are warmly welcome.