inference-manager

Running with Docker Compose

Run the following commands:

docker-compose build
docker-compose up
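You can check that the containers came up (the exact service names depend on the compose file) with:

docker-compose ps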

You then need to exec into the engine container and pull a model by running the following commands:

export OLLAMA_HOST=0.0.0.0:8080
ollama pull gemma:2b
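To open that shell in the engine container, docker-compose exec should work (the service name inference-manager-engine here is an assumption; check docker-compose ps for the actual name):

docker-compose exec inference-manager-engine bash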

Then you can hit inference-manager-server at port 8080.

curl --request POST http://localhost:8080/v1/chat/completions -d '{
  "model": "gemma:2b",
  "messages": [{"role": "user", "content": "hello"}]
}'
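The request body follows the OpenAI chat completions format, so, assuming the server also honors the standard stream field, a streamed response can be requested like this:

curl --request POST http://localhost:8080/v1/chat/completions -d '{
  "model": "gemma:2b",
  "messages": [{"role": "user", "content": "hello"}],
  "stream": true
}'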

Running Engine Locally

Run the following commands:

make build-docker-engine
docker run \
  -v ./configs/engine:/config \
  -p 8080:8080 \
  -p 8081:8081 \
  llm-operator/inference-manager-engine \
  run \
  --config /config/config.yaml

Then hit the HTTP endpoint and verify that Ollama responds.

curl http://localhost:8080/api/generate -d '{
  "model": "gemma:2b",
  "prompt":"Why is the sky blue?"
}'

If you want to load models from your local filesystem, you can additionally mount a volume for them.

docker run \
  -v ./configs/engine:/config \
  -p 8080:8080 \
  -p 8081:8081 \
  -v ./models:/models \
  llm-operator/inference-manager-engine \
  run \
  --config /config/config.yaml

Then import the models into Ollama.
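You can find the container ID by filtering docker ps on the engine image (assuming the image tag used above):

docker ps --filter "ancestor=llm-operator/inference-manager-engine"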

docker exec -it <container ID> bash

export OLLAMA_HOST=0.0.0.0:8080
ollama create <model-name> -f <modelfile>

Here are example Modelfiles. The first loads a base GGUF model; the second applies an adapter to a previously created model:

FROM /models/gemma-2b-it.gguf
TEMPLATE """<start_of_turn>user
{{ if .System }}{{ .System }} {{ end }}{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>
"""
PARAMETER repeat_penalty 1
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

FROM gemma-2b-it
ADAPTER /models/ggml-adapter-model.bin
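For example, assuming the first Modelfile is saved as /models/Modelfile.gemma and OLLAMA_HOST is set as above (the file path and the model name gemma-2b-local are illustrative), you could run:

ollama create gemma-2b-local -f /models/Modelfile.gemma
ollama list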
