cuda

package

v0.0.0-...-f09375e Published: Feb 12, 2025 License: Apache-2.0 Imports: 15 Imported by: 0

Repository

github.com/leaf-ai/studio-go-runner

Documentation ¶

Index ¶

Variables
func GPUCount() (cnt uint)
func GPUSlots() (cnt uint, freeCnt uint)
func GetCUDAInfo() (outDevs cudaDevices, err kv.Error)
func GetDevices(slots uint) (devices []string, err kv.Error)
func GetSlots(name string) (slots uint, err kv.Error)
func HasCUDA() bool
func LargestFreeGPUMem() (freeMem uint64)
func LargestFreeGPUSlots() (cnt uint)
func MonitorGPUs(ctx context.Context, statusC chan<- []string, errC chan<- kv.Error)
func ReturnGPU(alloc *GPUAllocated) (err kv.Error)
func TotalFreeGPUSlots() (cnt uint)
type GPUAllocated
type GPUAllocations
- func AllocGPU(maxGPU uint, maxGPUMem uint64, unitsOfAllocation []int, cardCount int, ...) (alloc GPUAllocations, err kv.Error)
type GPUTrack
- func GPUInventory() (gpus []GPUTrack, err kv.Error)

Constants ¶

This section is empty.

Variables ¶

var (

	// UseGPU is used by specific types of testing to disable GPU tests when GPU
	// cards are potentially present but need to be disabled. This flag is not
	// used during production to change behavior in any way
	UseGPU *bool

	// CudaInitErr records the result of the CUDA library initialization that would
	// impact ongoing operation
	CudaInitErr *kv.Error

	// CudaInitWarnings records warnings and errors that are deemed not to be fatal
	// to the ongoing CUDA library usage but are of importance
	CudaInitWarnings = []kv.Error{}

	// CudaInTest is used to check if the running process is a go test process. If so,
	// this will disable certain types of checking when using very limited GPU
	// hardware
	CudaInTest = false
)

Functions ¶

func GPUCount ¶

func GPUCount() (cnt uint)

GPUCount returns the number of allocatable GPU resources

func GPUSlots ¶

func GPUSlots() (cnt uint, freeCnt uint)

GPUSlots returns the total and free number of GPU capacity slots within the machine

func GetCUDAInfo ¶

func GetCUDAInfo() (outDevs cudaDevices, err kv.Error)

func GetDevices ¶

func GetDevices(slots uint) (devices []string, err kv.Error)

GetDevices will return a list of the possible devices that support a specified compute slot count. The returned order of cards is ascending going from the smaller capacity cards to the largest and most expensive. This function incorporates the AWS naming for cards when using the EC2 information functions to extract card details.

func GetSlots ¶

func GetSlots(name string) (slots uint, err kv.Error)

GetSlots is used to retrieve the number of compute slots that a named card is capable of
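
The relationship between a card name, its slot count, and the ascending device list can be illustrated with a minimal, self-contained sketch. It does not call this package. The card names and slot counts in `slotTable` are hypothetical placeholders rather than the AWS names or values the package uses, and treating "support a specified compute slot count" as "offers at least that many slots" is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// slotTable maps hypothetical card names to hypothetical slot counts.
// These are illustrative values only, not the package's real table.
var slotTable = map[string]uint{
	"example-small":  2,
	"example-medium": 4,
	"example-large":  8,
}

// slotsFor mirrors the shape of GetSlots: look up a card by name and
// report an error for names that are not recognized.
func slotsFor(name string) (uint, error) {
	slots, ok := slotTable[name]
	if !ok {
		return 0, fmt.Errorf("unrecognized card %q", name)
	}
	return slots, nil
}

// devicesFor mirrors the shape of GetDevices: return the cards offering
// at least the requested slots (an assumption), smallest capacity first.
func devicesFor(slots uint) []string {
	names := []string{}
	for name, s := range slotTable {
		if s >= slots {
			names = append(names, name)
		}
	}
	sort.Slice(names, func(i, j int) bool {
		return slotTable[names[i]] < slotTable[names[j]]
	})
	return names
}

func main() {
	if slots, err := slotsFor("example-medium"); err == nil {
		fmt.Println("example-medium slots:", slots)
	}
	fmt.Println("cards for 4 slots:", devicesFor(4))
}
```

Sorting by capacity means a caller that takes the first entry gets the smallest card able to satisfy the request.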

func HasCUDA ¶

func HasCUDA() bool

func LargestFreeGPUMem ¶

func LargestFreeGPUMem() (freeMem uint64)

LargestFreeGPUMem will obtain the largest amount of free GPU memory available on any of the individual cards accessible to the runner

func LargestFreeGPUSlots ¶

func LargestFreeGPUSlots() (cnt uint)

LargestFreeGPUSlots gets the largest number of single device free GPU slots

func MonitorGPUs ¶

func MonitorGPUs(ctx context.Context, statusC chan<- []string, errC chan<- kv.Error)

MonitorGPUs, when started as a goroutine and once all of the devices in the tracking map have been initialized, checks the devices for ECC and other errors, marking failed GPUs
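
A caller would typically run a monitor of this shape in a goroutine and drain both channels until the context is cancelled. The sketch below shows that consumption pattern only. `fakeMonitor` is a hypothetical stand-in that emits canned messages, and the built-in `error` type is used in place of `kv.Error` so that the example stays self-contained.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// fakeMonitor is a hypothetical stand-in with the same channel shape as
// MonitorGPUs. It alternates status and error messages until ctx is done.
func fakeMonitor(ctx context.Context, statusC chan<- []string, errC chan<- error) {
	for i := 0; ; i++ {
		if i%2 == 0 {
			select {
			case statusC <- []string{fmt.Sprintf("check %d: devices healthy", i)}:
			case <-ctx.Done():
				return
			}
		} else {
			select {
			case errC <- errors.New("simulated ECC failure"):
			case <-ctx.Done():
				return
			}
		}
	}
}

// consume reads from both channels until limit messages have arrived or
// the context is cancelled, and reports how many of each kind it saw.
func consume(ctx context.Context, statusC <-chan []string, errC <-chan error, limit int) (statuses, failures int) {
	for statuses+failures < limit {
		select {
		case <-ctx.Done():
			return statuses, failures
		case <-statusC:
			statuses++
		case <-errC:
			failures++
		}
	}
	return statuses, failures
}

// runDemo wires the monitor and the consumer together. The deferred
// cancel lets the monitor goroutine exit after the consumer returns.
func runDemo() (statuses, failures int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	statusC := make(chan []string)
	errC := make(chan error)
	go fakeMonitor(ctx, statusC, errC)

	return consume(ctx, statusC, errC, 4)
}

func main() {
	statuses, failures := runDemo()
	fmt.Printf("statuses: %d, failures: %d\n", statuses, failures)
}
```

Guarding every send with `ctx.Done()` prevents the monitor goroutine from leaking when the consumer stops reading.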

func ReturnGPU ¶

func ReturnGPU(alloc *GPUAllocated) (err kv.Error)

ReturnGPU releases the GPU allocation passed in. It will validate some of the allocation details, but otherwise operates on an honor system.

func TotalFreeGPUSlots ¶

func TotalFreeGPUSlots() (cnt uint)

TotalFreeGPUSlots gets the total number of free GPU slots across all devices

Types ¶

type GPUAllocated ¶

type GPUAllocated struct {
	Slots uint              // The number of GPU slots given from the allocation
	Mem   uint64            // The amount of memory given to the allocation
	Env   map[string]string // Any environment variables the device allocator wants the runner to use
	// contains filtered or unexported fields
}

GPUAllocated is used to record the allocation/reservation of a GPU resource on behalf of a caller

type GPUAllocations ¶

type GPUAllocations []*GPUAllocated

GPUAllocations records the allocations that together are presented to a caller.

func AllocGPU ¶

func AllocGPU(maxGPU uint, maxGPUMem uint64, unitsOfAllocation []int, cardCount int, live bool) (alloc GPUAllocations, err kv.Error)

AllocGPU will select the default allocation pool for GPUs and call the allocation for it.
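
The allocate-then-return lifecycle, including the partial validation that ReturnGPU is documented to perform, can be sketched with local stand-in types. Everything below is hypothetical: the `pool`, `card`, and `allocation` types, the smallest-fit selection rule, and the `CUDA_VISIBLE_DEVICES` entry placed in `Env` are assumptions for illustration, not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// card is a hypothetical stand-in for one tracked GPU.
type card struct {
	uuid      string
	slots     uint
	allocated bool
	tracking  map[string]struct{} // IDs of allocations handed out for this card
}

// allocation is a hypothetical stand-in for GPUAllocated.
type allocation struct {
	id    string
	uuid  string
	Slots uint
	Env   map[string]string
}

// pool is a hypothetical allocator over a fixed set of cards.
type pool struct {
	cards  []*card
	nextID int
}

func newCard(uuid string, slots uint) *card {
	return &card{uuid: uuid, slots: slots, tracking: map[string]struct{}{}}
}

// alloc reserves the smallest free card with at least the requested slots.
func (p *pool) alloc(slots uint) (*allocation, error) {
	var best *card
	for _, c := range p.cards {
		if c.allocated || c.slots < slots {
			continue
		}
		if best == nil || c.slots < best.slots {
			best = c
		}
	}
	if best == nil {
		return nil, errors.New("no free GPU with enough slots")
	}
	p.nextID++
	id := fmt.Sprintf("alloc-%d", p.nextID)
	best.allocated = true
	best.tracking[id] = struct{}{}
	return &allocation{
		id:    id,
		uuid:  best.uuid,
		Slots: best.slots,
		// Assumed environment entry; the real allocator may set others.
		Env: map[string]string{"CUDA_VISIBLE_DEVICES": best.uuid},
	}, nil
}

// release frees a card, rejecting allocations the card does not recognize.
func (p *pool) release(a *allocation) error {
	for _, c := range p.cards {
		if c.uuid != a.uuid {
			continue
		}
		if _, ok := c.tracking[a.id]; !ok {
			return errors.New("allocation not recognized for this GPU")
		}
		delete(c.tracking, a.id)
		c.allocated = false
		return nil
	}
	return errors.New("unknown GPU")
}

// demo allocates two slots, releases them, then tries to release again.
func demo() (uuid string, doubleReleaseRejected bool) {
	p := &pool{cards: []*card{newCard("GPU-large", 8), newCard("GPU-small", 2)}}
	a, err := p.alloc(2)
	if err != nil {
		return "", false
	}
	if err := p.release(a); err != nil {
		return a.uuid, false
	}
	return a.uuid, p.release(a) != nil
}

func main() {
	uuid, rejected := demo()
	fmt.Printf("allocated %s, double release rejected: %v\n", uuid, rejected)
}
```

The tracking set catches a double release or a release against the wrong card, but nothing stops a caller from continuing to use a GPU after returning it, which is why the scheme still depends on callers behaving correctly.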

type GPUTrack ¶

type GPUTrack struct {
	UUID       string              // The UUID designation for the GPU being managed
	Slots      uint                // The number of logical slots the GPU has, based on its throughput/size
	Mem        uint64              // The amount of memory the GPU has
	Allocated  bool                // Indicates if the card is currently allocated
	EccFailure *kv.Error           // Any ECC failure related error messages, nil if no errors were encountered
	Tracking   map[string]struct{} // Used to validate allocations as they are released
}

GPUTrack is used to track usage of GPU cards and any errors generated by the cards at the hardware level

func GPUInventory ¶

func GPUInventory() (gpus []GPUTrack, err kv.Error)

GPUInventory can be used to extract a copy of the current state of the GPU hardware seen within the runner
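
An inventory snapshot of this kind is enough to derive both a total and a largest single-card free slot figure, which is the distinction between TotalFreeGPUSlots and LargestFreeGPUSlots. The sketch below uses a local `gpuTrack` stand-in rather than the package type, and assumes that a card contributes all of its slots when it is neither allocated nor marked with an ECC failure. The package's own accounting may differ.

```go
package main

import "fmt"

// gpuTrack is a local stand-in for GPUTrack, reduced to the fields this
// sketch needs. It is not the package's own type.
type gpuTrack struct {
	UUID      string
	Slots     uint
	Allocated bool
	EccFailed bool // stands in for a non-nil EccFailure
}

// freeSlotSummary returns the sum of slots across usable cards and the
// largest slot count found on any single usable card.
func freeSlotSummary(gpus []gpuTrack) (total uint, largest uint) {
	for _, gpu := range gpus {
		if gpu.Allocated || gpu.EccFailed {
			continue // skip cards that cannot accept work
		}
		total += gpu.Slots
		if gpu.Slots > largest {
			largest = gpu.Slots
		}
	}
	return total, largest
}

func main() {
	inventory := []gpuTrack{
		{UUID: "GPU-a", Slots: 2},
		{UUID: "GPU-b", Slots: 4, Allocated: true},
		{UUID: "GPU-c", Slots: 8, EccFailed: true},
		{UUID: "GPU-d", Slots: 4},
	}
	total, largest := freeSlotSummary(inventory)
	fmt.Printf("total free slots: %d, largest single-card free slots: %d\n", total, largest)
}
```

A job that must fit on one device should be sized against the largest single-card figure, while the total only indicates aggregate capacity across cards.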
