Documentation ¶
Index ¶
- Constants
- Variables
- func GetAKSGPUImageSHA(size string) string
- func GetGPUDriverVersion(size string) string
- func GetSubnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string
- func GetVMName(providerID string) (string, error)
- func GetVnetSubnetIDComponents(vnetSubnetID string) (vnetSubnetResource, error)
- func IsMarinerEnabledGPUSKU(vmSize string) bool
- func IsNvidiaEnabledSKU(vmSize string) bool
- func MkVMID(resourceGroupName string, vmName string) string
- func ResourceIDToProviderID(ctx context.Context, id string) string
Constants ¶
const (
	Nvidia470CudaDriverVersion = "cuda-470.82.01"
	Nvidia550CudaDriverVersion = "cuda-550.54.15"
	Nvidia535GridDriverVersion = "grid-535.161.08"

	// These SHAs will change once we update aks-gpu images in aks-gpu repository. We do that fairly rarely at this time.
	// So for now these will be kept here like this and periodically bump them
	AKSGPUGridSHA = "sha-d1f0ca"
	AKSGPUCudaSHA = "sha-2d4c96"
)
TODO: Get these from agentbaker
Variables ¶
var (
	/* If a new GPU sku becomes available, add a key to this map, but only if you have a confirmation
	   that we have an agreement with NVIDIA for this specific gpu.
	*/
	NvidiaEnabledSKUs = map[string]bool{
		"standard_nv6": true,
		"standard_nv12": true,
		"standard_nv12s_v3": true,
		"standard_nv24": true,
		"standard_nv24s_v3": true,
		"standard_nv24r": true,
		"standard_nv48s_v3": true,
		"standard_nd6s": true,
		"standard_nd12s": true,
		"standard_nd24s": true,
		"standard_nd24rs": true,
		"standard_nc6s_v2": true,
		"standard_nc12s_v2": true,
		"standard_nc24s_v2": true,
		"standard_nc24rs_v2": true,
		"standard_nc6s_v3": true,
		"standard_nc12s_v3": true,
		"standard_nc24s_v3": true,
		"standard_nc24rs_v3": true,
		"standard_nd40s_v3": true,
		"standard_nd40rs_v2": true,
		"standard_nc4as_t4_v3": true,
		"standard_nc8as_t4_v3": true,
		"standard_nc16as_t4_v3": true,
		"standard_nc64as_t4_v3": true,
		"standard_nd96asr_v4": true,
		"standard_nd112asr_a100_v4": true,
		"standard_nd120asr_a100_v4": true,
		"standard_nd96amsr_a100_v4": true,
		"standard_nd112amsr_a100_v4": true,
		"standard_nd120amsr_a100_v4": true,
		"standard_nc24ads_a100_v4": true,
		"standard_nc48ads_a100_v4": true,
		"standard_nc96ads_a100_v4": true,
		"standard_ncads_a100_v4": true,
		"standard_nc8ads_a10_v4": true,
		"standard_nc16ads_a10_v4": true,
		"standard_nc32ads_a10_v4": true,
		"standard_nv6ads_a10_v5": true,
		"standard_nv12ads_a10_v5": true,
		"standard_nv18ads_a10_v5": true,
		"standard_nv36ads_a10_v5": true,
		"standard_nv36adms_a10_v5": true,
		"standard_nv72ads_a10_v5": true,
		"standard_nd96ams_v4": true,
		"standard_nd96ams_a100_v4": true,
	}

	// List of GPU SKUs currently enabled and validated for Mariner. Will expand the support
	// to cover other SKUs available in Azure
	MarinerNvidiaEnabledSKUs = map[string]bool{
		"standard_nc6s_v3": true,
		"standard_nc12s_v3": true,
		"standard_nc24s_v3": true,
		"standard_nc24rs_v3": true,
		"standard_nd40s_v3": true,
		"standard_nd40rs_v2": true,
		"standard_nc4as_t4_v3": true,
		"standard_nc8as_t4_v3": true,
		"standard_nc16as_t4_v3": true,
		"standard_nc64as_t4_v3": true,
	}
)
var ConvergedGPUDriverSizes = map[string]bool{
	"standard_nv6ads_a10_v5": true,
	"standard_nv12ads_a10_v5": true,
	"standard_nv18ads_a10_v5": true,
	"standard_nv36ads_a10_v5": true,
	"standard_nv72ads_a10_v5": true,
	"standard_nv36adms_a10_v5": true,
	"standard_nc8ads_a10_v4": true,
	"standard_nc16ads_a10_v4": true,
	"standard_nc32ads_a10_v4": true,
}
ConvergedGPUDriverSizes lists the VM sizes that use a "converged" driver to support both CUDA and GRID workloads.
How do you figure this out? Ask HPC or find out by trial and error. Installing vanilla CUDA drivers on these sizes fails with opaque errors. See https://github.com/Azure/azhpc-extensions/blob/daaefd78df6f27012caf30f3b54c3bd6dc437652/NvidiaGPU/resources.json
Functions ¶
func GetAKSGPUImageSHA ¶
func GetAKSGPUImageSHA(size string) string
func GetGPUDriverVersion ¶
func GetGPUDriverVersion(size string) string
NV-series GPUs target graphics workloads, whereas NC-series GPUs target compute. NV sizes typically use GRID drivers rather than CUDA drivers, and CUDA driver installation fails on them. NVv1 seems to run with CUDA; NVv5 requires GRID. NVv3 is untested on AKS, NVv4 uses AMD GPUs so it does not apply, and NVv2 no longer seems to exist (?).
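The selection described above can be sketched as a simple map lookup. This is a minimal illustration, not the package's implementation: `sketchDriverVersion` and `convergedSizes` are hypothetical names, the map is a shortened copy of ConvergedGPUDriverSizes, and the rule "converged sizes get the GRID driver, everything else gets the 550 CUDA driver" is an assumption. The role of Nvidia470CudaDriverVersion is not documented here, so it is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Driver version strings copied from the Constants section.
const (
	nvidia550CudaDriverVersion = "cuda-550.54.15"
	nvidia535GridDriverVersion = "grid-535.161.08"
)

// convergedSizes is a shortened, illustrative copy of ConvergedGPUDriverSizes.
var convergedSizes = map[string]bool{
	"standard_nv36ads_a10_v5": true,
	"standard_nc8ads_a10_v4":  true,
}

// sketchDriverVersion is a hypothetical helper, not the package's code.
// Assumption: converged sizes get the GRID driver, all other sizes get CUDA.
func sketchDriverVersion(size string) string {
	// Map keys are lowercase, so normalize the size name before the lookup.
	if convergedSizes[strings.ToLower(size)] {
		return nvidia535GridDriverVersion
	}
	return nvidia550CudaDriverVersion
}

func main() {
	fmt.Println(sketchDriverVersion("Standard_NV36ads_A10_v5"))
	fmt.Println(sketchDriverVersion("Standard_NC6s_v3"))
}
```

Keeping the decision in a map means a newly validated size needs only a new key rather than a new branch.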
func GetSubnetResourceID ¶ added in v0.4.0
func GetSubnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string
GetSubnetResourceID constructs the subnet resource ID.
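For orientation, an Azure subnet resource ID follows the standard ARM path layout. The sketch below builds one from the same four parameters; `sketchSubnetResourceID` is a hypothetical stand-in, and whether the package formats the ID exactly this way (for example, the casing of path segments) is an assumption.

```go
package main

import "fmt"

// sketchSubnetResourceID is a hypothetical stand-in for GetSubnetResourceID.
// It assumes the standard ARM path layout for a subnet under a virtual network.
func sketchSubnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s",
		subscriptionID, resourceGroupName, virtualNetworkName, subnetName,
	)
}

func main() {
	// All values below are placeholders.
	fmt.Println(sketchSubnetResourceID(
		"00000000-0000-0000-0000-000000000000", "my-rg", "my-vnet", "my-subnet"))
}
```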
func GetVMName ¶
func GetVMName(providerID string) (string, error)
GetVMName parses the provider ID stored on the node to get the VM name associated with that node.
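A rough sketch of this kind of parsing follows. It assumes the provider ID has the form `azure:///subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<vmName>`; `sketchVMName` is a hypothetical helper, and the package may validate the ID differently (for instance with a regular expression).

```go
package main

import (
	"fmt"
	"strings"
)

// sketchVMName is a hypothetical stand-in for GetVMName.
// Assumption: the VM name is the last path segment and the segment before it
// is "virtualMachines" (compared case-insensitively).
func sketchVMName(providerID string) (string, error) {
	parts := strings.Split(providerID, "/")
	n := len(parts)
	if n < 2 || !strings.EqualFold(parts[n-2], "virtualMachines") || parts[n-1] == "" {
		return "", fmt.Errorf("unexpected provider ID format: %q", providerID)
	}
	return parts[n-1], nil
}

func main() {
	// Placeholder provider ID for illustration only.
	id := "azure:///subscriptions/sub/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/aks-node-1"
	name, err := sketchVMName(id)
	fmt.Println(name, err)
}
```

Returning an error for IDs that do not match lets callers distinguish a malformed ID from an empty name.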
func GetVnetSubnetIDComponents ¶ added in v0.4.0
func GetVnetSubnetIDComponents(vnetSubnetID string) (vnetSubnetResource, error)
func IsMarinerEnabledGPUSKU ¶
func IsMarinerEnabledGPUSKU(vmSize string) bool
IsMarinerEnabledGPUSKU determines if a VM SKU has NVIDIA driver support on Mariner.
func IsNvidiaEnabledSKU ¶
func IsNvidiaEnabledSKU(vmSize string) bool
IsNvidiaEnabledSKU determines if a VM SKU has NVIDIA driver support.
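The keys in NvidiaEnabledSKUs are lowercase, while Azure size names are usually written in mixed case (for example `Standard_NC6s_v3`). The sketch below shows one way such a check could work; `sketchIsNvidiaEnabled` and `nvidiaSKUs` are hypothetical names, the map is a two-entry excerpt, and the normalization step is an assumption rather than confirmed package behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// nvidiaSKUs is a two-entry excerpt of NvidiaEnabledSKUs, for illustration only.
var nvidiaSKUs = map[string]bool{
	"standard_nc6s_v3":    true,
	"standard_nd96asr_v4": true,
}

// sketchIsNvidiaEnabled is a hypothetical stand-in for IsNvidiaEnabledSKU.
// Assumption: the size name is trimmed and lowercased before the lookup.
func sketchIsNvidiaEnabled(vmSize string) bool {
	return nvidiaSKUs[strings.ToLower(strings.TrimSpace(vmSize))]
}

func main() {
	fmt.Println(sketchIsNvidiaEnabled("Standard_NC6s_v3"))
	fmt.Println(sketchIsNvidiaEnabled("Standard_D4s_v5"))
}
```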
Types ¶
This section is empty.