A lightweight P2P-based cache system for model distributions on Kubernetes.
data:image/s3,"s3://crabby-images/8e3bc/8e3bc70430a393cbad70e7da7cf3427131c550d7" alt="Latest Release"
Name Story: the name Manta is inspired by Manta Style, an item in Dota 2 that creates two illusions of your hero, much like peers in a P2P network.
Architecture
data:image/s3,"s3://crabby-images/6a7b0/6a7b0ddeb43b6c54ade590cb2405248d0f516ffa" alt="architecture"
Note: llmaz is just one possible integration; Manta can also be deployed and used independently.
Features Overview
- Model Preheat: Models can be preloaded into clusters, or onto specified nodes, to accelerate model serving.
- Model Cache: Models are cached after downloading for faster model loading.
- Model Lifecycle Management: Manage the model lifecycle automatically with different policies, such as `Retain` or `Delete`.
- Plugin Framework: Filter and Score plugins can be extended to pick the best candidates.
- Memory Management (WIP): Manage the memory reserved for caching, with an LRU algorithm for garbage collection.
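The Filter and Score plugin framework can be pictured with a minimal Python sketch in the style of the Kubernetes scheduling framework's Filter and Score extension points: each plugin first rejects unsuitable nodes, then the surviving nodes are ranked by the sum of their plugin scores. The `Node` type, the `Plugin` protocol, `DiskFitPlugin`, and `pick_best` are hypothetical names chosen for illustration, not Manta's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Node:
    # Hypothetical node view: name, labels, and free disk space in GiB.
    name: str
    labels: dict[str, str] = field(default_factory=dict)
    free_disk_gib: float = 0.0


class Plugin(Protocol):
    # Hypothetical plugin interface with one hook per phase.
    def filter(self, node: Node) -> bool: ...
    def score(self, node: Node) -> float: ...


class DiskFitPlugin:
    """Hypothetical plugin: requires enough free disk, prefers more headroom."""

    def __init__(self, model_size_gib: float) -> None:
        self.model_size_gib = model_size_gib

    def filter(self, node: Node) -> bool:
        # Reject nodes that cannot hold the model at all.
        return node.free_disk_gib >= self.model_size_gib

    def score(self, node: Node) -> float:
        # More space left over after caching the model scores higher.
        return node.free_disk_gib - self.model_size_gib


def pick_best(nodes: list[Node], plugins: list[Plugin]) -> Node | None:
    # Filter phase: a node survives only if every plugin accepts it.
    candidates = [n for n in nodes if all(p.filter(n) for p in plugins)]
    if not candidates:
        return None
    # Score phase: sum the plugin scores and return the highest-scoring node.
    return max(candidates, key=lambda n: sum(p.score(n) for p in plugins))
```

For example, with nodes offering 5, 50 and 20 GiB of free disk and a 10 GiB model, the 5 GiB node is filtered out and the 50 GiB node wins on score. Summing raw scores is only one simple aggregation choice; a real framework might normalize or weight each plugin's score.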
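The LRU-based garbage collection mentioned under Memory Management can be illustrated with a minimal sketch of a byte-budgeted LRU cache built on `collections.OrderedDict`: using a model moves it to the "recent" end, and adding a model that would exceed the budget evicts from the "oldest" end. Since the feature is still WIP, the `ModelLRU` class, its byte-based budget, and its method names are assumptions for illustration only.

```python
from __future__ import annotations

from collections import OrderedDict


class ModelLRU:
    """Hypothetical byte-budgeted LRU that evicts least recently used models."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # Model name -> size in bytes, ordered from least to most recently used.
        self._entries: OrderedDict[str, int] = OrderedDict()

    def touch(self, model: str) -> bool:
        # Mark a cached model as recently used; return False if it is not cached.
        if model not in self._entries:
            return False
        self._entries.move_to_end(model)
        return True

    def add(self, model: str, size_bytes: int) -> list[str]:
        """Cache a model, evicting LRU entries as needed; return evicted names."""
        if size_bytes > self.capacity_bytes:
            raise ValueError(f"{model} does not fit in the cache budget")
        # Re-adding a model replaces its old entry.
        if model in self._entries:
            self.used_bytes -= self._entries.pop(model)
        evicted = []
        # Evict from the least recently used end until the new model fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            victim, victim_size = self._entries.popitem(last=False)
            self.used_bytes -= victim_size
            evicted.append(victim)
        self._entries[model] = size_bytes
        self.used_bytes += size_bytes
        return evicted
```

With a 10-byte budget, adding models `a` and `b` of 4 bytes each, touching `a`, and then adding a 4-byte model `c` evicts `b`, because `a` was used more recently.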
Quick Start
Installation
Read the Installation guide for instructions.
Preheat Models
A sample that preloads the `Qwen/Qwen2.5-0.5B-Instruct` model:
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
```
To preload the model onto specified nodes, use `nodeSelector`:
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  nodeSelector:
    zone: zone-a
```
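Assuming `nodeSelector` follows the usual Kubernetes semantics (a node qualifies only if it carries every listed label with exactly the listed value, and an empty selector matches all nodes), the node selection above can be sketched as follows. `matches_node_selector` and `select_nodes` are hypothetical helpers, and nodes are modeled as a plain name-to-labels mapping rather than real Kubernetes objects.

```python
from __future__ import annotations


def matches_node_selector(node_labels: dict[str, str], node_selector: dict[str, str]) -> bool:
    """Return True if the node carries every key/value pair in the selector.

    Assumes Kubernetes-style nodeSelector semantics: exact match on every
    entry (logical AND); an empty selector matches every node.
    """
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def select_nodes(nodes: dict[str, dict[str, str]], node_selector: dict[str, str]) -> list[str]:
    # `nodes` maps a node name to its labels; return matching names, sorted.
    return sorted(
        name for name, labels in nodes.items()
        if matches_node_selector(labels, node_selector)
    )
```

With the selector `{"zone": "zone-a"}` from the manifest, a node labeled `zone: zone-a` plus extra labels still matches, while a node in `zone-b` or one with no `zone` label does not.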
Delete Models
To remove the model weights once the `Torrent` is deleted, set `reclaimPolicy: Delete` (the default is `Retain`):
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  reclaimPolicy: Delete
```
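The effect of `reclaimPolicy` on deletion can be sketched as a small decision hook: with `Delete` the cached weights are removed, and with `Retain` (the default) they stay on disk. This is a minimal, single-directory sketch under stated assumptions: `on_torrent_deleted`, the dict-shaped `spec`, and the local `weights_dir` are hypothetical, and a real controller would clean up copies on every node holding the model.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# The README states that Retain is the default reclaim policy.
DEFAULT_RECLAIM_POLICY = "Retain"


def on_torrent_deleted(spec: dict, weights_dir: Path) -> bool:
    """Hypothetical cleanup hook; return True if the model weights were removed."""
    policy = spec.get("reclaimPolicy", DEFAULT_RECLAIM_POLICY)
    if policy not in ("Retain", "Delete"):
        raise ValueError(f"unknown reclaimPolicy: {policy!r}")
    if policy == "Delete" and weights_dir.exists():
        # Delete: remove the cached weights together with the Torrent.
        shutil.rmtree(weights_dir)
        return True
    # Retain: keep the cached weights on disk for later reuse.
    return False
```

Validating the policy before touching the filesystem means an unrecognized value fails loudly instead of silently falling back to either behavior.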
For more details, refer to the APIs.
Join us for more discussions:
Contributions
All kinds of contributions are welcome! Please follow CONTRIBUTING.md.