Documentation ¶
Overview ¶
Package v1alpha1 contains API Schema definitions for the v1alpha1 API group
+kubebuilder:object:generate=true
+groupName=llmaz.io
Index ¶
Constants ¶
const (
	ModelFamilyNameLabelKey = "llmaz.io/model-family-name"
	ModelNameLabelKey       = "llmaz.io/model-name"
	HUGGING_FACE            = "Huggingface"
	MODEL_SCOPE             = "ModelScope"
)
Variables ¶
var (
	// GroupVersion is the group version used to register these objects.
	GroupVersion = schema.GroupVersion{Group: "llmaz.io", Version: "v1alpha1"}

	// SchemeGroupVersion is an alias to GroupVersion for client-go libraries.
	// It is required by pkg/client/informers/externalversions/...
	SchemeGroupVersion = GroupVersion

	// SchemeBuilder is used to add Go types to the GroupVersionKind scheme.
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}

	// AddToScheme adds the types in this group-version to the given scheme.
	AddToScheme = SchemeBuilder.AddToScheme
)
Functions ¶
func Resource ¶
func Resource(resource string) schema.GroupResource
Resource is required by pkg/client/listers/...
Types ¶
type Flavor ¶
type Flavor struct {
	// Name represents the flavor name, which will be used in model claims.
	Name FlavorName `json:"name"`

	// Requests defines the required accelerators to serve the model, like nvidia.com/gpu: 8.
	// When the GPU number is greater than 8, like 32, multi-host inference is enabled:
	// 32/8=4 hosts will be grouped as a unit, and each host will have a resource request
	// of nvidia.com/gpu: 8. This may change in the future if the per-host GPU limit is lifted.
	// Setting CPU and memory usage here is not recommended:
	// if using a playground, define the CPU/memory usage in the backendConfig;
	// if using a service, define the CPU/memory in the container resources.
	// Note: if you define the same accelerator requests in the playground/service as well,
	// the requests here will be overridden.
	// +optional
	Requests v1.ResourceList `json:"requests,omitempty"`

	// NodeSelector represents the node candidates for Pod placement; if a node doesn't
	// meet the nodeSelector, it will be filtered out by the resourceFungibility scheduler plugin.
	// If nodeSelector is empty, every node is a candidate.
	// +optional
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`

	// Params stores other useful parameters that will be consumed by autoscaling components
	// like cluster-autoscaler or Karpenter.
	// E.g. when scaling up nodes with 8x NVIDIA A100, the parameter can be injected as
	// instance-type: p4d.24xlarge for AWS.
	// +optional
	Params map[string]string `json:"params,omitempty"`
}
Flavor defines the accelerator requirements for a model and the necessary parameters for autoscaling. Right now, it is used in two places:
- Pod scheduling, with node selectors specified.
- Cluster autoscaling, with essential parameters provided.
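The multi-host grouping arithmetic described in the Requests comment can be sketched as follows. The function name and the per-host limit of 8 are assumptions for illustration, not part of the API:

```go
package main

import "fmt"

// perHostGPULimit mirrors the per-host accelerator limit described above (assumed 8).
const perHostGPULimit = 8

// groupHosts returns the host count and the per-host GPU request for a total
// GPU requirement, following the 32/8=4 example in the Flavor docs.
// Hypothetical helper for illustration only.
func groupHosts(totalGPUs int) (hosts, gpusPerHost int) {
	if totalGPUs <= perHostGPULimit {
		// Fits on a single host; request everything there.
		return 1, totalGPUs
	}
	// Multi-host inference: split into groups of perHostGPULimit.
	return totalGPUs / perHostGPULimit, perHostGPULimit
}

func main() {
	hosts, perHost := groupHosts(32)
	fmt.Printf("hosts=%d, nvidia.com/gpu per host=%d\n", hosts, perHost) // hosts=4, nvidia.com/gpu per host=8
}
```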
func (*Flavor) DeepCopy ¶
func (in *Flavor) DeepCopy() *Flavor
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new Flavor.
func (*Flavor) DeepCopyInto ¶
func (in *Flavor) DeepCopyInto(out *Flavor)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type FlavorName ¶
type FlavorName string
type ModelClaim ¶
type ModelClaim struct {
	// ModelName represents the name of the Model.
	ModelName ModelName `json:"modelName,omitempty"`

	// InferenceFlavors represents a list of flavors with fungibility support
	// to serve the model.
	// If set, the flavor names should be a subset of the model's configured flavors.
	// If not set, the model's configured flavors will be used by default.
	// +optional
	InferenceFlavors []FlavorName `json:"inferenceFlavors,omitempty"`
}
ModelClaim represents a claim for one model. It is the standard claim mode of multiModelsClaim, compared to other modes like SpeculativeDecoding.
func (*ModelClaim) DeepCopy ¶
func (in *ModelClaim) DeepCopy() *ModelClaim
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelClaim.
func (*ModelClaim) DeepCopyInto ¶
func (in *ModelClaim) DeepCopyInto(out *ModelClaim)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelClaims ¶ added in v0.0.6
type ModelClaims struct {
	// Models represents a list of models with roles specified; there may be
	// multiple models here to support state-of-the-art technologies like
	// speculative decoding, where one model is the main (target) model and
	// another is the draft model.
	// +kubebuilder:validation:MinItems=1
	Models []ModelRefer `json:"models,omitempty"`

	// InferenceFlavors represents a list of flavors with fungibility support
	// to serve the model.
	// - If not set, the flavors of the first (index-0) model apply by default.
	// - If set, the flavor names will be looked up following the model order.
	// +optional
	InferenceFlavors []FlavorName `json:"inferenceFlavors,omitempty"`
}
ModelClaims represents multiple claims for different models.
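The flavor-resolution rule in the InferenceFlavors comment can be sketched as below. This is a simplified reading of the documented semantics, with hypothetical names; the controller's actual logic may differ:

```go
package main

import "fmt"

// resolveFlavors sketches the lookup rule described above: when claim-level
// flavors are unset, the first (index-0) model's flavors apply by default;
// otherwise the claim-level flavor names are used as given.
// Hypothetical helper with assumed semantics, for illustration only.
func resolveFlavors(claimFlavors []string, modelFlavors [][]string) []string {
	if len(claimFlavors) == 0 && len(modelFlavors) > 0 {
		// No claim-level override: fall back to the main (index-0) model.
		return modelFlavors[0]
	}
	return claimFlavors
}

func main() {
	// Two models (main + draft), no claim-level flavors set.
	fmt.Println(resolveFlavors(nil, [][]string{{"a100"}, {"h100"}})) // [a100]
}
```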
func (*ModelClaims) DeepCopy ¶ added in v0.0.6
func (in *ModelClaims) DeepCopy() *ModelClaims
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelClaims.
func (*ModelClaims) DeepCopyInto ¶ added in v0.0.6
func (in *ModelClaims) DeepCopyInto(out *ModelClaims)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelHub ¶
type ModelHub struct {
	// Name refers to the model registry, such as huggingface.
	// +kubebuilder:default=Huggingface
	// +kubebuilder:validation:Enum={Huggingface,ModelScope}
	// +optional
	Name *string `json:"name,omitempty"`

	// ModelID refers to the model identifier on the model hub,
	// such as meta-llama/Meta-Llama-3-8B.
	ModelID string `json:"modelID,omitempty"`

	// Filename refers to a specific model file rather than the whole repo.
	// This is helpful for downloading a specific GGUF model rather than the
	// whole repo, which includes all kinds of quantized models.
	// TODO: this is only supported with Huggingface; add support for ModelScope
	// in the near future.
	// Note: once filename is set, allowPatterns and ignorePatterns should be left unset.
	Filename *string `json:"filename,omitempty"`

	// Revision refers to a Git revision id, which can be a branch name, a tag, or a commit hash.
	// +kubebuilder:default=main
	// +optional
	Revision *string `json:"revision,omitempty"`

	// AllowPatterns: only files matching at least one of the patterns will be downloaded.
	// +optional
	AllowPatterns []string `json:"allowPatterns,omitempty"`

	// IgnorePatterns: files matching any of the patterns will not be downloaded.
	// +optional
	IgnorePatterns []string `json:"ignorePatterns,omitempty"`
}
ModelHub represents the model registry for model downloads.
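The mutual-exclusion note on Filename (it should not be combined with allowPatterns or ignorePatterns) can be expressed as a small check. The local mirror type and the validation helper below are illustrative assumptions, not part of the API:

```go
package main

import (
	"errors"
	"fmt"
)

// modelHub is a simplified local mirror of the ModelHub fields involved in
// the rule, for illustration only.
type modelHub struct {
	Filename       *string
	AllowPatterns  []string
	IgnorePatterns []string
}

// validateHub enforces the documented note: once filename is set,
// allowPatterns and ignorePatterns should be left unset.
// Hypothetical helper, not part of the API.
func validateHub(h modelHub) error {
	if h.Filename != nil && (len(h.AllowPatterns) > 0 || len(h.IgnorePatterns) > 0) {
		return errors.New("filename is mutually exclusive with allowPatterns/ignorePatterns")
	}
	return nil
}

func main() {
	f := "model-q4_0.gguf"
	err := validateHub(modelHub{Filename: &f, AllowPatterns: []string{"*.gguf"}})
	fmt.Println(err != nil) // true: invalid combination
}
```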
func (*ModelHub) DeepCopy ¶
func (in *ModelHub) DeepCopy() *ModelHub
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelHub.
func (*ModelHub) DeepCopyInto ¶
func (in *ModelHub) DeepCopyInto(out *ModelHub)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelRefer ¶ added in v0.0.7
type ModelRefer struct {
	// Name represents the model name.
	Name ModelName `json:"name"`

	// Role represents the model role once more than one model is required,
	// such as the draft role, which means running with speculative decoding;
	// default arguments for the backend will then be searched in the backendRuntime
	// under the name speculative-decoding.
	// +kubebuilder:validation:Enum={main,draft}
	// +kubebuilder:default=main
	// +optional
	Role *ModelRole `json:"role,omitempty"`
}
ModelRefer refers to a created Model along with its role.
func (*ModelRefer) DeepCopy ¶ added in v0.0.7
func (in *ModelRefer) DeepCopy() *ModelRefer
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelRefer.
func (*ModelRefer) DeepCopyInto ¶ added in v0.0.7
func (in *ModelRefer) DeepCopyInto(out *ModelRefer)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelRole ¶ added in v0.0.6
type ModelRole string
const (
	// MainRole represents the main model; if only one model is required,
	// it must be the main model. Only one main model is allowed.
	MainRole ModelRole = "main"

	// DraftRole represents the draft model in speculative decoding;
	// the main model is then the target model.
	DraftRole ModelRole = "draft"
)
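The rule stated above, that exactly one main model is allowed, can be sketched as a validation pass over the declared roles. The helper below is a hypothetical illustration, not the controller's actual validation code:

```go
package main

import (
	"errors"
	"fmt"
)

// validateRoles checks the documented rule that exactly one model carries
// the main role. Hypothetical helper, for illustration only.
func validateRoles(roles []string) error {
	mains := 0
	for _, r := range roles {
		if r == "main" {
			mains++
		}
	}
	if mains != 1 {
		return errors.New("exactly one main model is required")
	}
	return nil
}

func main() {
	// A typical speculative-decoding setup: one main model, one draft model.
	fmt.Println(validateRoles([]string{"main", "draft"}) == nil) // true
}
```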
type ModelSource ¶
type ModelSource struct {
	// ModelHub represents the model registry for model downloads.
	// +optional
	ModelHub *ModelHub `json:"modelHub,omitempty"`

	// URI represents various kinds of model sources following the URI protocol, e.g.:
	// - OSS: oss://<bucket>.<endpoint>/<path-to-your-model>
	// +optional
	URI *URIProtocol `json:"uri,omitempty"`
}
ModelSource represents the source of the model. Only one model source will be used.
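Parsing the documented OSS form, oss://&lt;bucket&gt;.&lt;endpoint&gt;/&lt;path-to-your-model&gt;, can be sketched as below. The function is a hypothetical helper; the real loader's parsing logic may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// parseOSSURI splits a URI of the form oss://<bucket>.<endpoint>/<path-to-your-model>
// into its parts. Hypothetical helper for illustration only.
func parseOSSURI(uri string) (bucket, endpoint, path string, ok bool) {
	rest, found := strings.CutPrefix(uri, "oss://")
	if !found {
		return "", "", "", false
	}
	// Split host from object path at the first slash.
	host, path, found := strings.Cut(rest, "/")
	if !found {
		return "", "", "", false
	}
	// The bucket is the first host label; the rest is the endpoint.
	bucket, endpoint, found = strings.Cut(host, ".")
	if !found {
		return "", "", "", false
	}
	return bucket, endpoint, path, true
}

func main() {
	b, e, p, ok := parseOSSURI("oss://mybucket.oss-cn-hangzhou.aliyuncs.com/models/llama3")
	fmt.Println(ok, b, e, p) // true mybucket oss-cn-hangzhou.aliyuncs.com models/llama3
}
```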
func (*ModelSource) DeepCopy ¶
func (in *ModelSource) DeepCopy() *ModelSource
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelSource.
func (*ModelSource) DeepCopyInto ¶
func (in *ModelSource) DeepCopyInto(out *ModelSource)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelSpec ¶
type ModelSpec struct {
	// FamilyName represents the model type, like llama2, which will be auto-injected
	// into the labels with the key `llmaz.io/model-family-name`.
	FamilyName ModelName `json:"familyName"`

	// Source represents the source of the model; there are several ways to load
	// a model, such as loading from Huggingface, an OCI registry, S3, a host path and so on.
	Source ModelSource `json:"source"`

	// InferenceFlavors represents the accelerator requirements to serve the model.
	// Flavors are fungible following the priority represented by the slice order.
	// +kubebuilder:validation:MaxItems=8
	// +optional
	InferenceFlavors []Flavor `json:"inferenceFlavors,omitempty"`
}
ModelSpec defines the desired state of Model
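The label auto-injection described for FamilyName, using the package's label-key constants, can be sketched as follows. The helper function is a hypothetical illustration of the documented behavior:

```go
package main

import "fmt"

// Label keys matching the package constants documented above.
const (
	modelFamilyNameLabelKey = "llmaz.io/model-family-name"
	modelNameLabelKey       = "llmaz.io/model-name"
)

// injectLabels sketches the auto-injection described above: the family name
// and model name are written into the object's labels.
// Hypothetical helper, for illustration only.
func injectLabels(labels map[string]string, familyName, modelName string) map[string]string {
	if labels == nil {
		labels = map[string]string{}
	}
	labels[modelFamilyNameLabelKey] = familyName
	labels[modelNameLabelKey] = modelName
	return labels
}

func main() {
	l := injectLabels(nil, "llama2", "llama2-7b")
	fmt.Println(l[modelFamilyNameLabelKey]) // llama2
}
```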
func (*ModelSpec) DeepCopy ¶
func (in *ModelSpec) DeepCopy() *ModelSpec
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelSpec.
func (*ModelSpec) DeepCopyInto ¶
func (in *ModelSpec) DeepCopyInto(out *ModelSpec)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelStatus ¶
type ModelStatus struct {
	// Conditions represents the model conditions.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
ModelStatus defines the observed state of Model
func (*ModelStatus) DeepCopy ¶
func (in *ModelStatus) DeepCopy() *ModelStatus
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelStatus.
func (*ModelStatus) DeepCopyInto ¶
func (in *ModelStatus) DeepCopyInto(out *ModelStatus)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type OpenModel ¶
type OpenModel struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ModelSpec   `json:"spec,omitempty"`
	Status ModelStatus `json:"status,omitempty"`
}
OpenModel is the Schema for the open models API
func (*OpenModel) DeepCopy ¶
func (in *OpenModel) DeepCopy() *OpenModel
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new OpenModel.
func (*OpenModel) DeepCopyInto ¶
func (in *OpenModel) DeepCopyInto(out *OpenModel)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (*OpenModel) DeepCopyObject ¶
func (in *OpenModel) DeepCopyObject() runtime.Object
DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
type OpenModelList ¶
type OpenModelList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []OpenModel `json:"items"`
}
OpenModelList contains a list of OpenModel
func (*OpenModelList) DeepCopy ¶
func (in *OpenModelList) DeepCopy() *OpenModelList
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new OpenModelList.
func (*OpenModelList) DeepCopyInto ¶
func (in *OpenModelList) DeepCopyInto(out *OpenModelList)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (*OpenModelList) DeepCopyObject ¶
func (in *OpenModelList) DeepCopyObject() runtime.Object
DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.