inference-manager

TODO

Here are some additional notes:

Running Engine Locally

Run the following commands:

make build-docker-engine
docker run \
  -v ./config:/config \
  -v ./adapter:/adapter \
  -p 8080:8080 \
  -p 8081:8081 \
  llm-operator/inference-manager-engine \
  run \
  --config /config/config.yaml

./config/config.yaml has the following content:

internalGrpcPort: 8081
ollamaPort: 8080
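
If helpful, here is a minimal Go sketch of how a config file with these two keys could be parsed. The struct and field names are hypothetical illustrations only; the engine's actual configuration type lives under engine/ and may differ.

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// engineConfig is a hypothetical struct matching the two keys shown above.
type engineConfig struct {
	InternalGRPCPort int `yaml:"internalGrpcPort"`
	OllamaPort       int `yaml:"ollamaPort"`
}

func main() {
	b, err := os.ReadFile("./config/config.yaml")
	if err != nil {
		panic(err)
	}
	var c engineConfig
	if err := yaml.Unmarshal(b, &c); err != nil {
		panic(err)
	}
	fmt.Printf("internal gRPC port: %d, Ollama port: %d\n", c.InternalGRPCPort, c.OllamaPort)
}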

./adapter contains ggml-adapter-model.bin (the fine-tuned model adapter).

Then hit the HTTP endpoint and verify that Ollama responds:

curl http://localhost:8080/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "Why is the sky blue?"
}'
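
For a programmatic check, the same request can be sent from Go. This is a minimal sketch that assumes Ollama's default streaming behavior for /api/generate (newline-delimited JSON objects with "response" and "done" fields):

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// generateChunk holds the fields of each streamed JSON object that this sketch uses.
type generateChunk struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

func main() {
	body := []byte(`{"model": "gemma:2b", "prompt": "Why is the sky blue?"}`)
	resp, err := http.Post("http://localhost:8080/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Ollama streams one JSON object per line; print each partial response as it arrives.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		var chunk generateChunk
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			panic(err)
		}
		fmt.Print(chunk.Response)
		if chunk.Done {
			fmt.Println()
			break
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}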

Next, register a new model that uses the adapter, and hit the endpoint again (a Go sketch of the registration call follows these commands):

grpcurl \
  -d '{"model_name": "gemma:2b-fine-tuned", "base_model": "gemma:2b", "adapter_path": "/adapter/ggml-adapter-model.bin"}' \
  -plaintext localhost:8081 \
  llmoperator.inference_engine.v1.InferenceEngineInternalService/RegisterModel

curl http://localhost:8080/api/generate -d '{
  "model": "gemma:2b-fine-tuned",
  "prompt": "Why is the sky blue?"
}'
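
The registration call can also be made from Go. This is only a sketch: the import path, generated client, and message names below are assumptions inferred from the grpcurl call above and the api/v1 directory, following standard protoc-gen-go-grpc naming, and may not match the actual generated package.

package main

import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	// Assumed import path for the generated api/v1 package; adjust to the real module path.
	v1 "github.com/llm-operator/inference-manager/api/v1"
)

func main() {
	// Dial the engine's internal gRPC port (internalGrpcPort: 8081 in the config above).
	conn, err := grpc.Dial("localhost:8081", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Client and request names are assumptions based on the grpcurl JSON, not confirmed API.
	client := v1.NewInferenceEngineInternalServiceClient(conn)
	_, err = client.RegisterModel(context.Background(), &v1.RegisterModelRequest{
		ModelName:   "gemma:2b-fine-tuned",
		BaseModel:   "gemma:2b",
		AdapterPath: "/adapter/ggml-adapter-model.bin",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("model registered")
}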

Directories

Path        Synopsis
api/v1      Package v1 is a reverse proxy.
engine/cmd
server/cmd