Documentation ¶
Index ¶
- Variables
- func GPUCount() (cnt uint)
- func GPUSlots() (cnt uint, freeCnt uint)
- func GetCUDAInfo() (outDevs cudaDevices, err kv.Error)
- func GetDevices(slots uint) (devices []string, err kv.Error)
- func GetSlots(name string) (slots uint, err kv.Error)
- func HasCUDA() bool
- func LargestFreeGPUMem() (freeMem uint64)
- func LargestFreeGPUSlots() (cnt uint)
- func MonitorGPUs(ctx context.Context, statusC chan<- []string, errC chan<- kv.Error)
- func ReturnGPU(alloc *GPUAllocated) (err kv.Error)
- func TotalFreeGPUSlots() (cnt uint)
- type GPUAllocated
- type GPUAllocations
- type GPUTrack
Constants ¶
This section is empty.
Variables ¶
var (
	// UseGPU is used for specific types of testing to disable GPU tests when
	// GPU cards are potentially present but need to be disabled. This flag
	// is not used during production to change behavior in any way.
	UseGPU *bool

	// CudaInitErr records the result of the CUDA library initialization that would
	// impact ongoing operation.
	CudaInitErr *kv.Error

	// CudaInitWarnings records warnings and errors that are deemed not to be fatal
	// to the ongoing CUDA library usage but are of importance.
	CudaInitWarnings = []kv.Error{}

	// CudaInTest is used to check if the running process is a go test process. If so,
	// this will disable certain types of checking when using very limited GPU
	// hardware.
	CudaInTest = false
)
Functions ¶
func GetCUDAInfo ¶
func GetDevices ¶
GetDevices returns a list of the possible devices that support a specified compute slot count. Cards are returned in ascending order, from the smallest-capacity cards to the largest and most expensive. This function incorporates the AWS naming for cards when using the EC2 information functions to extract card details.
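The lookup described above can be pictured with a small self-contained sketch. It is not the package's implementation: the `cardSlots` table, its card names, and its slot counts are hypothetical placeholders, and the sketch assumes that "support" means a card offers at least the requested number of slots.

```go
package main

import (
	"fmt"
	"sort"
)

// cardSlots is a hypothetical name-to-slots table. The names and numbers
// are placeholders and do not reflect real AWS card names or capacities.
var cardSlots = map[string]uint{
	"card-small":  1,
	"card-medium": 2,
	"card-large":  4,
}

// devicesFor returns the names of cards offering at least the requested
// number of slots, ordered from the smallest capacity to the largest.
func devicesFor(slots uint) []string {
	devices := []string{}
	for name, s := range cardSlots {
		if s >= slots {
			devices = append(devices, name)
		}
	}
	// Map iteration order is random, so sort by capacity explicitly.
	sort.Slice(devices, func(i, j int) bool {
		return cardSlots[devices[i]] < cardSlots[devices[j]]
	})
	return devices
}

func main() {
	fmt.Println(devicesFor(2))
}
```

Sorting after filtering keeps the ascending-capacity guarantee independent of how the table is stored.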
func LargestFreeGPUMem ¶
func LargestFreeGPUMem() (freeMem uint64)
LargestFreeGPUMem returns the largest amount of free GPU memory available on any of the individual cards accessible to the runner
func LargestFreeGPUSlots ¶
func LargestFreeGPUSlots() (cnt uint)
LargestFreeGPUSlots returns the largest number of free GPU slots available on any single device
func MonitorGPUs ¶
MonitorGPUs, once all of the devices in the tracking map have been initialized and it has been started as a goroutine, checks the devices for ECC and other errors, marking failed GPUs
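A caller would typically run a monitor of this shape in its own goroutine and drain both channels until the context is cancelled. The sketch below illustrates that pattern only. The `monitor` function is a hypothetical stand-in that shares the channel layout of MonitorGPUs, substitutes the standard `error` type for `kv.Error`, and takes an injected `check` function and polling interval that are assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errECC is a placeholder error used by the example checks.
var errECC = errors.New("ECC failure reported")

// monitor is a hypothetical stand-in with the same channel shape as
// MonitorGPUs. It polls check on every tick and forwards the result until
// ctx is cancelled. Sends are guarded so cancellation never blocks it.
func monitor(ctx context.Context, interval time.Duration,
	check func() ([]string, error), statusC chan<- []string, errC chan<- error) {

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			status, err := check()
			if err != nil {
				select {
				case errC <- err:
				case <-ctx.Done():
					return
				}
				continue
			}
			select {
			case statusC <- status:
			case <-ctx.Done():
				return
			}
		}
	}
}

// firstReport starts monitor as a goroutine and returns the first status
// or error it delivers, cancelling the monitor on the way out.
func firstReport(check func() ([]string, error)) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	statusC := make(chan []string)
	errC := make(chan error)
	go monitor(ctx, 10*time.Millisecond, check, statusC, errC)

	select {
	case status := <-statusC:
		return status, nil
	case err := <-errC:
		return nil, err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func main() {
	status, err := firstReport(func() ([]string, error) {
		return []string{"GPU-0 healthy"}, nil
	})
	fmt.Println(status, err)

	_, err = firstReport(func() ([]string, error) { return nil, errECC })
	fmt.Println(err)
}
```

Guarding every channel send with `ctx.Done()` means the goroutine exits promptly even if the caller has stopped reading.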
func ReturnGPU ¶
func ReturnGPU(alloc *GPUAllocated) (err kv.Error)
ReturnGPU releases the GPU allocation passed in. It validates some of the allocation details but otherwise relies on an honor system.
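One way to picture the partial validation is a tracking set that remembers which allocations are outstanding, similar in spirit to the Tracking field documented on GPUTrack. The sketch below is an assumption about how such a check could work, not the package's code: the `track` and `allocation` types, the `allocate` and `release` methods, and the string tracking key are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder errors for the sketch.
var (
	errAlreadyAllocated = errors.New("card is already allocated")
	errUnknownAlloc     = errors.New("allocation is not being tracked")
)

// track is a hypothetical, trimmed stand-in for GPUTrack.
type track struct {
	Slots     uint
	Allocated bool
	Tracking  map[string]struct{} // keys of outstanding allocations
}

// allocation is a hypothetical stand-in for GPUAllocated.
type allocation struct {
	Slots uint
	id    string // unexported tracking key
}

// allocate reserves the whole card and records the tracking key.
func (t *track) allocate(id string) (*allocation, error) {
	if t.Allocated {
		return nil, errAlreadyAllocated
	}
	t.Allocated = true
	t.Tracking[id] = struct{}{}
	return &allocation{Slots: t.Slots, id: id}, nil
}

// release frees the card only if the allocation is still being tracked,
// which rejects double releases and unknown allocations.
func (t *track) release(a *allocation) error {
	if a == nil {
		return errUnknownAlloc
	}
	if _, ok := t.Tracking[a.id]; !ok {
		return errUnknownAlloc
	}
	delete(t.Tracking, a.id)
	t.Allocated = false
	return nil
}

func main() {
	card := &track{Slots: 2, Tracking: map[string]struct{}{}}
	a, _ := card.allocate("job-1")
	fmt.Println(card.release(a)) // first release is accepted
	fmt.Println(card.release(a)) // second release is rejected
}
```

The check catches accidental double releases, but nothing stops a caller from continuing to use a card after returning it, which is where the honor system comes in.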
func TotalFreeGPUSlots ¶
func TotalFreeGPUSlots() (cnt uint)
TotalFreeGPUSlots returns the total number of free GPU slots across all devices
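The function names suggest a distinction between a per-device maximum and a sum over all devices. The sketch below illustrates that distinction under stated assumptions: `gpuTrack` is a hypothetical, trimmed copy of the GPUTrack fields, and an unallocated card is assumed to offer all of its slots and memory while an allocated card offers none.

```go
package main

import "fmt"

// gpuTrack is a hypothetical, trimmed stand-in for GPUTrack.
type gpuTrack struct {
	UUID      string
	Slots     uint
	Mem       uint64
	Allocated bool
}

// largestFreeSlots returns the most slots offered by any single free card.
func largestFreeSlots(cards []gpuTrack) (cnt uint) {
	for _, c := range cards {
		if !c.Allocated && c.Slots > cnt {
			cnt = c.Slots
		}
	}
	return cnt
}

// totalFreeSlots returns the sum of slots over all free cards.
func totalFreeSlots(cards []gpuTrack) (cnt uint) {
	for _, c := range cards {
		if !c.Allocated {
			cnt += c.Slots
		}
	}
	return cnt
}

// largestFreeMem returns the most memory offered by any single free card.
func largestFreeMem(cards []gpuTrack) (freeMem uint64) {
	for _, c := range cards {
		if !c.Allocated && c.Mem > freeMem {
			freeMem = c.Mem
		}
	}
	return freeMem
}

func main() {
	cards := []gpuTrack{
		{UUID: "gpu-a", Slots: 2, Mem: 8 << 30},
		{UUID: "gpu-b", Slots: 4, Mem: 16 << 30, Allocated: true},
		{UUID: "gpu-c", Slots: 1, Mem: 4 << 30},
	}
	fmt.Println(largestFreeSlots(cards), totalFreeSlots(cards), largestFreeMem(cards))
}
```

A job that must fit on one card should be sized against the per-device maximum, while the total is more useful for judging overall spare capacity.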
Types ¶
type GPUAllocated ¶
type GPUAllocated struct {
	Slots uint              // The number of GPU slots given from the allocation
	Mem   uint64            // The amount of memory given to the allocation
	Env   map[string]string // Any environment variables the device allocator wants the runner to use
	// contains filtered or unexported fields
}
GPUAllocated is used to record the allocation/reservation of a GPU resource on behalf of a caller
type GPUAllocations ¶
type GPUAllocations []*GPUAllocated
GPUAllocations records the set of allocations that together are presented to a caller.
type GPUTrack ¶
type GPUTrack struct {
	UUID       string              // The UUID designation for the GPU being managed
	Slots      uint                // The number of logical slots the GPU has, based on its throughput/size
	Mem        uint64              // The amount of memory the GPU has
	Allocated  bool                // Indicates if the card is currently allocated
	EccFailure *kv.Error           // Any ECC failure related error messages, nil if no errors were encountered
	Tracking   map[string]struct{} // Used to validate allocations as they are released
}
GPUTrack is used to track usage of GPU cards and any errors generated by the cards at the hardware level
func GPUInventory ¶
GPUInventory can be used to extract a copy of the current state of the GPU hardware seen within the runner