Documentation ¶
Index ¶
- Constants
- Variables
- func GetAKSGPUImageSHA(size string) string
- func GetGPUDriverType(size string) string
- func GetGPUDriverVersion(size string) string
- func GetSubnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string
- func GetVMName(providerID string) (string, error)
- func GetVnetSubnetIDComponents(vnetSubnetID string) (vnetSubnetResource, error)
- func ImageReferenceToString(imageRef *armcompute.ImageReference) string
- func IsMarinerEnabledGPUSKU(vmSize string) bool
- func IsNvidiaEnabledSKU(vmSize string) bool
- func IsVMDeleting(vm armcompute.VirtualMachine) bool
- func MkVMID(resourceGroupName string, vmName string) string
- func PrettySlice[T any](s []T, maxItems int) string
- func ResourceIDToProviderID(ctx context.Context, id string) string
- func StringMap(list v1.ResourceList) map[string]string
- func UseGridDrivers(size string) bool
- func WithDefaultFloat64(key string, def float64) float64
- type NvidiaSKUConfig
Constants ¶
const (
	Nvidia470CudaDriverVersion = "470.82.01"

	// https://github.com/Azure/AgentBaker/blob/ddf36a24eafd02ce0589657ff2dc799125f4ad37/parts/linux/cloud-init/artifacts/components.json#L562
	NvidiaCudaDriverVersion = "550.90.12"
	AKSGPUCudaVersionSuffix = "20241021235610"

	NvidiaGridDriverVersion = "535.161.08"
	AKSGPUGridVersionSuffix = "20241021235607"
)
TODO: Get these from agentbaker
Variables ¶
var ConvergedGPUDriverSizes = map[string]bool{
	"standard_nv6ads_a10_v5":   true,
	"standard_nv12ads_a10_v5":  true,
	"standard_nv18ads_a10_v5":  true,
	"standard_nv36ads_a10_v5":  true,
	"standard_nv72ads_a10_v5":  true,
	"standard_nv36adms_a10_v5": true,
	"standard_nc8ads_a10_v4":   true,
	"standard_nc16ads_a10_v4":  true,
	"standard_nc32ads_a10_v4":  true,
}
ConvergedGPUDriverSizes lists the VM sizes that use a "converged" driver, which supports both CUDA and GRID workloads.
How do you figure out which sizes belong here? Ask HPC or find out by trial and error: on these sizes, vanilla CUDA drivers fail to install, with opaque errors. See https://github.com/Azure/azhpc-extensions/blob/daaefd78df6f27012caf30f3b54c3bd6dc437652/NvidiaGPU/resources.json
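The keys of ConvergedGPUDriverSizes are all lowercase, so a caller holding a mixed-case size name such as "Standard_NV6ads_A10_v5" would need to normalize it before the lookup. The following is a minimal sketch of that pattern, not the package's own code: convergedSizes is a two-entry copy of the map for illustration, and usesConvergedDriver is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// convergedSizes copies two entries of ConvergedGPUDriverSizes for illustration.
var convergedSizes = map[string]bool{
	"standard_nv6ads_a10_v5": true,
	"standard_nc8ads_a10_v4": true,
}

// usesConvergedDriver is a hypothetical helper. It lowercases the size
// before the lookup because the map keys are all lowercase.
func usesConvergedDriver(size string) bool {
	return convergedSizes[strings.ToLower(size)]
}

func main() {
	fmt.Println(usesConvergedDriver("Standard_NV6ads_A10_v5")) // true
	fmt.Println(usesConvergedDriver("Standard_D2s_v3"))        // false
}
```

A map[string]bool returns false for missing keys, so sizes that are not listed need no special handling.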
Functions ¶
func GetAKSGPUImageSHA ¶
func GetAKSGPUImageSHA(size string) string
func GetGPUDriverType ¶ added in v0.5.5
func GetGPUDriverType(size string) string
GetGPUDriverType returns the type of GPU driver for the given VM SKU ("grid" or "cuda").
func GetGPUDriverVersion ¶
func GetGPUDriverVersion(size string) string
NV-series GPUs target graphics workloads, whereas NC-series GPUs target compute. NV sizes typically use GRID rather than CUDA drivers, and CUDA driver installation fails on them. NVv1 seems to run with CUDA, while NVv5 requires GRID. NVv3 is untested on AKS, NVv4 uses AMD GPUs so it does not apply, and NVv2 no longer seems to exist (?).
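The GRID-versus-CUDA split described above can be sketched as a lookup followed by a choice between two version constants. This is only an illustration under stated assumptions: the version strings are taken from the Constants section, but gridSizes (a one-entry subset) and driverFor are hypothetical names, and the package's actual selection rule may include cases not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	cudaVersion = "550.90.12"  // value of NvidiaCudaDriverVersion in this package
	gridVersion = "535.161.08" // value of NvidiaGridDriverVersion in this package
)

// gridSizes is an illustrative subset; treating map membership as the
// GRID selection rule is an assumption of this sketch.
var gridSizes = map[string]bool{"standard_nv36ads_a10_v5": true}

// driverFor is a hypothetical helper returning the driver type and version.
func driverFor(size string) (driverType, version string) {
	if gridSizes[strings.ToLower(size)] {
		return "grid", gridVersion
	}
	return "cuda", cudaVersion
}

func main() {
	t, v := driverFor("Standard_NV36ads_A10_v5")
	fmt.Println(t, v) // grid 535.161.08
	t, v = driverFor("Standard_NC6s_v3")
	fmt.Println(t, v) // cuda 550.90.12
}
```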
func GetSubnetResourceID ¶ added in v0.4.0
func GetSubnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string
GetSubnetResourceID constructs the subnet resource ID.
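For orientation, Azure subnet resource IDs follow the standard ARM path layout under the Microsoft.Network provider. The sketch below builds such an ID from the same four parameters; subnetResourceID is a hypothetical stand-in, and whether the package produces exactly this casing is an assumption.

```go
package main

import "fmt"

// subnetResourceID is a hypothetical stand-in that formats the four
// components into the usual ARM subnet resource ID layout.
func subnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s",
		subscriptionID, resourceGroupName, virtualNetworkName, subnetName,
	)
}

func main() {
	// Placeholder values, not real Azure identifiers.
	fmt.Println(subnetResourceID("sub-id", "my-rg", "my-vnet", "my-subnet"))
}
```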
func GetVMName ¶
func GetVMName(providerID string) (string, error)
GetVMName parses the provider ID stored on the node to get the name of the VM associated with that node.
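One way to picture the parsing step: assuming the provider ID ends with a ".../virtualMachines/<name>" segment (the usual shape for Azure VM provider IDs, but an assumption here), the VM name is whatever follows that marker. The helper vmNameFromProviderID below is hypothetical and may differ from the package's parsing and error messages.

```go
package main

import (
	"fmt"
	"strings"
)

// vmNameFromProviderID is a hypothetical sketch. It assumes the provider ID
// ends with "/virtualMachines/<name>" and returns <name>.
func vmNameFromProviderID(providerID string) (string, error) {
	const marker = "/virtualMachines/"
	i := strings.LastIndex(providerID, marker)
	if i < 0 {
		return "", fmt.Errorf("parsing provider ID %q: no %q segment", providerID, marker)
	}
	name := providerID[i+len(marker):]
	// Reject an empty name or trailing path segments after the name.
	if name == "" || strings.Contains(name, "/") {
		return "", fmt.Errorf("parsing provider ID %q: invalid VM name", providerID)
	}
	return name, nil
}

func main() {
	// Placeholder provider ID, not a real resource.
	id := "azure:///subscriptions/sub-id/resourceGroups/my-rg/providers/Microsoft.Compute/virtualMachines/aks-node-1"
	name, err := vmNameFromProviderID(id)
	fmt.Println(name, err) // aks-node-1 <nil>
}
```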
func GetVnetSubnetIDComponents ¶ added in v0.4.0
func GetVnetSubnetIDComponents(vnetSubnetID string) (vnetSubnetResource, error)
func ImageReferenceToString ¶ added in v0.7.0
func ImageReferenceToString(imageRef *armcompute.ImageReference) string
func IsMarinerEnabledGPUSKU ¶
func IsMarinerEnabledGPUSKU(vmSize string) bool
IsMarinerEnabledGPUSKU determines if a VM SKU has NVIDIA driver support on Mariner.
func IsNvidiaEnabledSKU ¶
func IsNvidiaEnabledSKU(vmSize string) bool
IsNvidiaEnabledSKU determines if a VM SKU has NVIDIA driver support.
func IsVMDeleting ¶ added in v0.7.0
func IsVMDeleting(vm armcompute.VirtualMachine) bool
func PrettySlice ¶ added in v0.7.0
func PrettySlice[T any](s []T, maxItems int) string
PrettySlice formats a slice as a string, truncating it after maxItems elements so that the output does not get too long.
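A generic truncating formatter of this kind can be sketched as follows. The output format used here (comma-separated items followed by "and N other(s)") is an assumption for illustration, and prettySlice is a hypothetical stand-in rather than the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// prettySlice is a hypothetical sketch: it prints at most maxItems
// elements and summarizes how many were left out.
func prettySlice[T any](s []T, maxItems int) string {
	var sb strings.Builder
	for i, elem := range s {
		if i >= maxItems {
			// Stop printing elements and report the remainder.
			fmt.Fprintf(&sb, " and %d other(s)", len(s)-i)
			break
		}
		if i > 0 {
			sb.WriteString(", ")
		}
		fmt.Fprint(&sb, elem)
	}
	return sb.String()
}

func main() {
	fmt.Println(prettySlice([]string{"a", "b", "c", "d"}, 2)) // a, b and 2 other(s)
	fmt.Println(prettySlice([]int{1, 2}, 5))                  // 1, 2
}
```

Using a type parameter with fmt.Fprint keeps the helper usable for any element type that prints sensibly, at the cost of relying on each type's default formatting.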
func StringMap ¶ added in v0.7.0
func StringMap(list v1.ResourceList) map[string]string
StringMap returns the string map representation of the resource list
func UseGridDrivers ¶ added in v0.5.5
func UseGridDrivers(size string) bool
func WithDefaultFloat64 ¶ added in v0.7.0
func WithDefaultFloat64(key string, def float64) float64
WithDefaultFloat64 returns the float64 value of the supplied environment variable or, if the variable is not present, the supplied default value. If the float64 conversion fails, it also returns the default.
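The behavior described above maps directly onto os.LookupEnv and strconv.ParseFloat from the standard library. The sketch below is one plausible way to write it, not necessarily the package's code; withDefaultFloat64, demo, and the SKETCH_* variable names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// withDefaultFloat64 is a hypothetical sketch: it returns def when the
// variable is unset or when its value does not parse as a float64.
func withDefaultFloat64(key string, def float64) float64 {
	val, ok := os.LookupEnv(key)
	if !ok {
		return def
	}
	f, err := strconv.ParseFloat(val, 64)
	if err != nil {
		return def
	}
	return f
}

// demo exercises the three cases: set and valid, set but invalid, unset.
func demo() [3]float64 {
	os.Setenv("SKETCH_RATIO", "0.25")
	os.Setenv("SKETCH_BAD", "not-a-number")
	os.Unsetenv("SKETCH_MISSING")
	return [3]float64{
		withDefaultFloat64("SKETCH_RATIO", 1.0),
		withDefaultFloat64("SKETCH_BAD", 1.0),
		withDefaultFloat64("SKETCH_MISSING", 1.0),
	}
}

func main() {
	fmt.Println(demo()) // [0.25 1 1]
}
```

Note that falling back silently on a parse failure hides misconfiguration; a caller that cares may want to log the bad value separately.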