morphling

module
v0.0.0-...-764b39c
Published: Nov 1, 2024 License: Apache-2.0

README

Morphling

Morphling is an auto-configuration framework for machine learning model serving (inference) on Kubernetes. Check the website for details.

The Morphling paper was accepted at ACM SoCC 2021:
Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving

Overview

Morphling searches for the optimal configuration of your ML/DL model serving deployments. It finds the best container-level configurations (e.g., resource allocations and runtime parameters) through empirical trials, in which a small number of configurations are sampled and evaluated for performance.

Features

Key benefits include:

  • Automated tuning workflows hidden behind simple APIs.
  • Out-of-the-box stress-test clients for ML model serving.
  • Cloud agnostic; tested on AWS, Alicloud, etc.
  • ML-framework agnostic, with general support for popular frameworks, including TensorFlow, PyTorch, etc.
  • Equipped with a variety of customizable hyper-parameter tuning algorithms.

Getting started

Install using YAML files

Install CRDs

From the root directory of the git repository, run:

kubectl apply -k config/crd/bases

Install Morphling Components

kubectl create namespace morphling-system

kubectl apply -k manifests/configmap
kubectl apply -k manifests/controllers
kubectl apply -k manifests/pv
kubectl apply -k manifests/mysql-db
kubectl apply -k manifests/db-manager
kubectl apply -k manifests/ui
kubectl apply -k manifests/algorithm

By default, Morphling is installed in the morphling-system namespace.

The official Morphling component images are hosted on Docker Hub.

Check if all components are running successfully:

kubectl get deployment -n morphling-system

Expected output:

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
morphling-algorithm-server   1/1     1            1           34s
morphling-controller         1/1     1            1           9m23s
morphling-db-manager         1/1     1            1           9m11s
morphling-mysql              1/1     1            1           9m15s
morphling-ui                 1/1     1            1           4m53s
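
To block until the components are ready instead of re-running the check by hand, the step above can be wrapped in a small helper. This is a minimal sketch, not part of Morphling: the function name wait_for_morphling and the 120s default timeout are illustrative assumptions, and the deployment names are copied from the expected output above.

```shell
# Deployment names as listed in the expected output above.
MORPHLING_DEPLOYMENTS="morphling-algorithm-server morphling-controller morphling-db-manager morphling-mysql morphling-ui"

# wait_for_morphling [namespace] [timeout]   (hypothetical helper)
# Waits for each deployment to finish rolling out and returns non-zero
# as soon as one of them is not ready within the timeout.
wait_for_morphling() {
  local ns="${1:-morphling-system}"
  local timeout="${2:-120s}"
  local d
  for d in $MORPHLING_DEPLOYMENTS; do
    kubectl -n "$ns" rollout status "deployment/$d" --timeout="$timeout" || return 1
  done
}
```

Calling wait_for_morphling with no arguments checks the morphling-system namespace. Pass a different namespace or timeout as the first and second arguments if your installation differs.
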

Uninstall Morphling controller

bash script/undeploy.sh

Delete CRDs

kubectl get crd | grep morphling.kubedl.io | cut -d ' ' -f 1 | xargs kubectl delete crd

Install using Helm chart

Install Helm

Helm is a package manager for Kubernetes. For example, to install it on macOS with Homebrew:

brew install helm

Check the Helm website for more details.

Install Morphling

From the root directory, run

helm install morphling ./helm/morphling --create-namespace -n morphling-system

You can override the default values defined in values.yaml with the --set flag. For example, to set custom CPU and memory requests:

helm install morphling ./helm/morphling --create-namespace -n morphling-system --set resources.requests.cpu=1024m --set resources.requests.memory=2Gi

Helm installs the CRDs and the other Morphling components in the morphling-system namespace.
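
When several values are overridden, a values file can be easier to maintain than repeated --set flags. The sketch below writes a file equivalent to the two --set overrides shown earlier; the helper name write_morphling_values and the file name my-values.yaml are hypothetical, and only the resources.requests keys from the command above are assumed to exist in the chart.

```shell
# write_morphling_values <path>   (hypothetical helper)
# Writes a Helm values file mirroring the --set overrides shown earlier.
write_morphling_values() {
  cat > "$1" <<'EOF'
resources:
  requests:
    cpu: 1024m
    memory: 2Gi
EOF
}

# Usage (not executed here):
#   write_morphling_values my-values.yaml
#   helm install morphling ./helm/morphling --create-namespace -n morphling-system -f my-values.yaml
```

The -f flag tells Helm to read overrides from the file. Any key not listed keeps the default from values.yaml.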

Uninstall Morphling

helm uninstall morphling -n morphling-system

Delete all Morphling CRDs

kubectl get crd | grep morphling.kubedl.io | cut -d ' ' -f 1 | xargs kubectl delete crd

Morphling UI

Morphling UI is built upon Ant Design.

If you installed Morphling with YAML files, run the following from the root directory:

kubectl apply -k manifests/ui

If you installed Morphling with the Helm chart, the Morphling UI is deployed automatically.

Check that the Morphling UI is running successfully:

kubectl -n morphling-system get svc morphling-ui

Expected output:

NAME           TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
morphling-ui   NodePort   10.96.63.162   <none>        9091:30680/TCP   44m

If you are using minikube, you can access the UI through port forwarding:

kubectl -n morphling-system port-forward --address 0.0.0.0 svc/morphling-ui 30263:9091

You can then access the UI at http://localhost:30263/.

For a detailed UI deployment and development guide, see UI.md.

Running Examples

This example demonstrates how to tune the configuration of a MobileNet model deployed with TensorFlow Serving under Morphling.

For demonstration, we tune two configurations: the number of CPU cores (a resource allocation) and the maximum serving batch size (a runtime parameter). We use grid search for configuration sampling.
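
Grid search evaluates every combination of the candidate values, so the number of trials is the product of the candidate counts. The sketch below only enumerates such a grid; it does not call Morphling. The candidate lists and the function name grid_points are illustrative assumptions, not the values used in experiment-mobilenet-grid.yaml.

```shell
# Candidate values are illustrative assumptions, not Morphling defaults.
CPU_CORES="1 2 4"
BATCH_SIZES="8 16 32"

# grid_points   (hypothetical helper)
# Prints one "cpu=<cores> BATCH_SIZE=<size>" line per combination.
grid_points() {
  local cpu batch
  for cpu in $CPU_CORES; do
    for batch in $BATCH_SIZES; do
      echo "cpu=$cpu BATCH_SIZE=$batch"
    done
  done
}

grid_points
```

With three candidates for each parameter this prints nine combinations, each of which would correspond to one sampling trial.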

Submit the configuration tuning experiment

kubectl -n morphling-system apply -f https://raw.githubusercontent.com/alibaba/morphling/main/examples/experiment/experiment-mobilenet-grid.yaml

To start a multi-framework tuning experiment:

kubectl -n morphling-system apply -f examples/experiment/experiment-grid.yaml

You can specify the model name in examples/experiment/experiment-grid.yaml. Note that with INFERENCE_FRAMEWORK=vllm and DTYPE=int8, bitsandbytes supports only LLMs with the Llama architecture (LlamaForCausalLM). Currently, tuning is supported only across the float16/bfloat16 and int8 data types. Make sure enough resources are available for LLM serving.
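
The constraints above can be checked before submitting an experiment. The following is a minimal sketch of such a pre-flight check; check_serving_combo is a hypothetical helper, not a Morphling command, and it encodes only the two rules stated above.

```shell
# check_serving_combo <framework> <dtype> <architecture>   (hypothetical helper)
# Returns 0 when the combination satisfies the documented constraints.
check_serving_combo() {
  local framework="$1" dtype="$2" arch="$3"
  # Only these data types are supported for tuning.
  case "$dtype" in
    float16|bfloat16|int8) ;;
    *) echo "unsupported DTYPE: $dtype" >&2; return 1 ;;
  esac
  # vllm with int8 (bitsandbytes) is limited to the Llama architecture.
  if [ "$framework" = "vllm" ] && [ "$dtype" = "int8" ] && [ "$arch" != "LlamaForCausalLM" ]; then
    echo "vllm with int8 requires a LlamaForCausalLM model" >&2
    return 1
  fi
  return 0
}
```

For example, check_serving_combo vllm int8 LlamaForCausalLM succeeds, while the same call with any other architecture name fails with a message on stderr.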

Monitor the status of the configuration tuning experiment

kubectl get -n morphling-system pe
kubectl describe -n morphling-system pe

Monitor sampling trials (performance test)

kubectl -n morphling-system get trial

Get the searched optimal configuration

kubectl -n morphling-system get pe

Expected output:

NAME                        STATE       AGE   OBJECT NAME   OPTIMAL OBJECT VALUE   OPTIMAL PARAMETERS
mobilenet-experiment-grid   Succeeded   12m   qps           32                     [map[category:resource name:cpu value:4] map[category:env name:BATCH_SIZE value:32]]
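
The OPTIMAL PARAMETERS column is printed as a list of maps, which is awkward to reuse in scripts. As a rough sketch, the filter below extracts name=value pairs from that column. It assumes the output has exactly the shape shown above, so treat it as brittle; optimal_params is a hypothetical helper name.

```shell
# optimal_params   (hypothetical helper)
# Reads `kubectl get pe` output on stdin and prints one name=value line
# for each "name:<n> value:<v>" pair found in the OPTIMAL PARAMETERS column.
optimal_params() {
  grep -o 'name:[^] ]* value:[^] ]*' |
    sed 's/^name:\([^ ]*\) value:\(.*\)$/\1=\2/'
}

# Usage (not executed here):
#   kubectl -n morphling-system get pe | optimal_params
```

For the sample output above, this prints cpu=4 and BATCH_SIZE=32 on separate lines.
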

Delete the tuning experiment

kubectl -n morphling-system delete pe --all

Workflow

See Morphling Workflow to check how Morphling tunes ML serving configurations automatically in a Kubernetes-native way.

Developer Guide

Build the controller manager binary

make manager

Run the tests

make test

Generate manifests, e.g., CRD, RBAC YAML files, etc.

make manifests

Build the multi-inference-framework Docker image

Before building the image, download the appropriate version of the vllm .whl file to the pkg/server directory (the guidance to download). For example, if the CUDA version is 11.8 and you want vllm version 0.6.1.post1, download vllm-0.6.1.post1+cu118-cp310-cp310-manylinux1_x86_64.whl to the pkg/server directory. Note that the Python version in this image is 3.10. Then modify the CUDA_VERSION and VLLM_FILE arguments in script/docker_build.sh and build the image.
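
The wheel file name in the example encodes the vllm version, the CUDA version, and the Python version. The sketch below rebuilds that name from its parts. The pattern is inferred from the single example above, the platform tag manylinux1_x86_64 is hard-coded, and vllm_wheel_name is a hypothetical helper, so check the result against the actual release file list before using it as VLLM_FILE.

```shell
# vllm_wheel_name <vllm_version> <cuda_version> <python_version>   (hypothetical helper)
# Builds a wheel file name following the pattern of the example above.
vllm_wheel_name() {
  local cuda_tag py_tag
  cuda_tag="cu$(printf '%s' "$2" | tr -d '.')"   # 11.8 -> cu118
  py_tag="cp$(printf '%s' "$3" | tr -d '.')"     # 3.10 -> cp310
  echo "vllm-$1+${cuda_tag}-${py_tag}-${py_tag}-manylinux1_x86_64.whl"
}

vllm_wheel_name 0.6.1.post1 11.8 3.10
```

With the arguments shown, the function prints the file name used in the example above.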

Build the component Docker images, e.g., Morphling controller, DB-Manager

make docker-build

Push the component Docker images

make docker-push

To develop/debug Morphling controller manager locally, please check the debug guide.

Community

If you have any questions or want to contribute, GitHub issues and pull requests are welcome.

Directories

Path                                Synopsis
api
v1alpha1                            Package v1alpha1 contains API Schema definitions for the morphling v1alpha1 API group +kubebuilder:object:generate=true +groupName=morphling.kubedl.io +kubebuilder:subresource:status
v1alpha1/grpc_proto/health          Package grpc_health_v1 is a generated protocol buffer package.
cmd
console
pkg
mock/db                             Package mock_backends is a generated GoMock package.
mock/profilingexperiment/sampling   Package mock_sampling is a generated GoMock package.
mock/trial                          Package mock_dbclient is a generated GoMock package.
