inference-manager
The inference-manager manages inference runtimes (e.g., vLLM and Ollama) in containers, loads models, and processes inference requests.
Set up Inference Server/Engine for development
Requirements:
Run the following command:
make setup-all
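After the command finishes, a quick sanity check can be useful. This is only a sketch: the grep pattern is a convenience, and the exact namespaces and pod names depend on your setup.
# The inference-manager pods should eventually reach the Running state.
kubectl get pods -A | grep inference-manager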
[!TIP]
- If you run only make helm-reapply-inference-server or make helm-reapply-inference-engine, it rebuilds the inference-manager container images, deploys them with the local Helm chart, and restarts the containers.
- You can configure parameters in .values.yaml (see the sketch below).
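As a rough sketch of how an override file might look, the snippet below writes a minimal .values.yaml. The key shown is a hypothetical placeholder; check the Helm chart's default values for the actual parameter names.
# Hypothetical override file; replace the key below with parameters
# actually defined in the chart's values.yaml.
cat > .values.yaml <<'EOF'
inference-manager-engine:
  replicaCount: 1
EOF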
Run vLLM on ARM macOS
To run vLLM on an ARM CPU (macOS), you need to build the vLLM CPU image yourself:
git clone https://github.com/vllm-project/vllm.git
cd vllm
docker build -f Dockerfile.arm -t vllm-cpu-env --shm-size=4g .
kind load docker-image vllm-cpu-env:latest
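Optionally, confirm the image is now present on the kind node before redeploying; the node name below assumes a cluster created with the default kind cluster name.
# crictl runs inside the kind node container; "kind-control-plane" is the
# node name for a cluster created with the default name "kind".
docker exec kind-control-plane crictl images | grep vllm-cpu-env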
Then, run make with the RUNTIME option:
make setup-all RUNTIME=vllm
[!NOTE]
See the vLLM ARM installation documentation for details.
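Once the redeploy completes, a vLLM runtime pod should come up. The commands below are a generic check, with the namespace and pod name left as placeholders for your environment.
# Find the vLLM runtime pod, then follow its startup logs.
kubectl get pods -A | grep vllm
kubectl logs -n <namespace> <vllm-pod-name> -f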
Try out inference APIs
with curl:
curl --request POST http://localhost:8080/v1/chat/completions -d '{
"model": "google-gemma-2b-it-q4_0",
"messages": [{"role": "user", "content": "hello"}]
}'
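Since the endpoint follows the OpenAI-style chat completions schema, you can also try a streaming request; that this deployment streams responses is an assumption based on that schema rather than something confirmed here.
curl --request POST http://localhost:8080/v1/chat/completions -d '{
  "model": "google-gemma-2b-it-q4_0",
  "messages": [{"role": "user", "content": "hello"}],
  "stream": true
}'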
with llma:
export LLMARINER_API_KEY=dummy
llma chat completions create \
--model google-gemma-2b-it-q4_0 \
--role system \
--completion 'hi'
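To see which models are available before sending requests, you can also query the models endpoint; that the gateway exposes the OpenAI-compatible /v1/models path on the same port is an assumption.
# List the models the server knows about (path and port assumed from the
# OpenAI-compatible API surface).
curl http://localhost:8080/v1/models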