# AIKit ✨
AIKit is a comprehensive platform to quickly get started with hosting, deploying, building, and fine-tuning large language models (LLMs).
AIKit offers two main capabilities:

- Inference: AIKit uses LocalAI, which supports a wide range of inference capabilities and formats. LocalAI provides a drop-in replacement REST API that is OpenAI API compatible, so you can use any OpenAI API compatible client, such as Kubectl AI, Chatbot-UI, and many more, to send requests to open LLMs (see the configuration sketch after this list).
- Fine-Tuning: AIKit offers an extensible fine-tuning interface. It supports Unsloth for a fast, memory-efficient, and easy fine-tuning experience.
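Pointing a generic client at a locally running AIKit server usually takes nothing more than overriding the base URL. A minimal sketch, assuming the client honors the standard OpenAI SDK environment variables; the key value is an arbitrary placeholder, since the local server does not require authentication:

```bash
# Redirect any client that reads the standard OpenAI SDK environment variables
# to the local AIKit server (assumption: the client honors these variables).
export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=not-needed   # placeholder; the local server needs no key
```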
👉 For full documentation, please see the AIKit website!
## Features
## Quick Start
You can get started with AIKit quickly on your local machine without a GPU!
```bash
docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b
```
After running this, navigate to http://localhost:8080/chat to access the WebUI!
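Once the container is up, you can also sanity-check it from the command line. A quick sketch, assuming the server exposes the OpenAI-style model listing endpoint; the response should include `llama-3.1-8b-instruct`:

```bash
# List the models the local server is serving (OpenAI-style models API)
curl http://localhost:8080/v1/models
```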
### API
AIKit provides an OpenAI API compatible endpoint, so you can use any OpenAI API compatible client to send requests to open LLMs!
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
  }'
```
Output should be similar to:
```json
{
  // ...
  "model": "llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of applications and services, allowing developers to focus on writing code rather than managing infrastructure."
      }
    }
  ],
  // ...
}
```
That's it! 🎉 The API is OpenAI compatible, so this is a drop-in replacement for any OpenAI API compatible client.
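The same endpoint can stream tokens as they are generated. A sketch, assuming the server honors the OpenAI `stream` parameter; `curl -N` disables output buffering so chunks print as they arrive:

```bash
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}],
    "stream": true
  }'
```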
## Pre-made Models
AIKit comes with pre-made models that you can use out-of-the-box!

If a model you need isn't included, you can always create your own images and host them in a container registry of your choice!
### CPU
| Model | Optimization | Parameters | Command | Model Name | License |
|-------|--------------|------------|---------|------------|---------|
| 🦙 Llama 3.2 | Instruct | 1B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:1b` | `llama-3.2-1b-instruct` | Llama |
| 🦙 Llama 3.2 | Instruct | 3B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:3b` | `llama-3.2-3b-instruct` | Llama |
| 🦙 Llama 3.1 | Instruct | 8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b` | `llama-3.1-8b-instruct` | Llama |
| 🦙 Llama 3.1 | Instruct | 70B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:70b` | `llama-3.1-70b-instruct` | Llama |
| Ⓜ️ Mixtral | Instruct | 8x7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
| 🅿️ Phi 3.5 | Instruct | 3.8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b` | `phi-3.5-3.8b-instruct` | MIT |
| 🔡 Gemma 2 | Instruct | 2B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/gemma2:2b` | `gemma-2-2b-instruct` | Gemma |
| ⌨️ Codestral 0.1 | Code | 22B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b` | MNPL |
### NVIDIA CUDA
> [!NOTE]
> To enable GPU acceleration, please see GPU Acceleration.
> Note that the only difference between the CPU and GPU commands is the `--gpus all` flag, which enables GPU acceleration.
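Before pulling a multi-gigabyte model image, it can be worth confirming that Docker can see your GPU at all. A quick sketch, assuming the NVIDIA drivers and the NVIDIA Container Toolkit are installed; the CUDA image tag is only an example:

```bash
# If this prints your GPU table, the --gpus all flag will work for AIKit images too
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```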
| Model | Optimization | Parameters | Command | Model Name | License |
|-------|--------------|------------|---------|------------|---------|
| 🦙 Llama 3.2 | Instruct | 1B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:1b` | `llama-3.2-1b-instruct` | Llama |
| 🦙 Llama 3.2 | Instruct | 3B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:3b` | `llama-3.2-3b-instruct` | Llama |
| 🦙 Llama 3.1 | Instruct | 8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.1:8b` | `llama-3.1-8b-instruct` | Llama |
| 🦙 Llama 3.1 | Instruct | 70B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.1:70b` | `llama-3.1-70b-instruct` | Llama |
| Ⓜ️ Mixtral | Instruct | 8x7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | Apache |
| 🅿️ Phi 3.5 | Instruct | 3.8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b` | `phi-3.5-3.8b-instruct` | MIT |
| 🔡 Gemma 2 | Instruct | 2B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/gemma2:2b` | `gemma-2-2b-instruct` | Gemma |
| ⌨️ Codestral 0.1 | Code | 22B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b` | MNPL |
| 📸 Flux 1 Dev | Text to image | 12B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/flux1:dev` | `flux-1-dev` | FLUX.1 [dev] Non-Commercial License |
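Since Flux 1 Dev is a text-to-image model, it is exercised through the image endpoint rather than chat completions. A sketch, assuming the server exposes the OpenAI-style image generation route for this model; the prompt is only an example:

```bash
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux-1-dev",
    "prompt": "a futuristic city skyline at sunset"
  }'
```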
## What's next?
👉 For more information, and to learn how to fine-tune models or create your own images, please see the AIKit website!