Documentation ¶
Index ¶
- func BestFit(req *sproto.AllocateRequest, agent *agentState) float64
- func MakeFitFunction(fittingPolicy string) func(*sproto.AllocateRequest, *agentState) float64
- func Single[K comparable, V any](m map[K]V) (kr K, vr V, ok bool)
- func WorstFit(req *sproto.AllocateRequest, agent *agentState) float64
- type HardConstraint
- type ResourceManager
- func (a ResourceManager) CheckMaxSlotsExceeded(name string, slots int) (bool, error)
- func (a ResourceManager) DisableSlot(req *apiv1.DisableSlotRequest) (resp *apiv1.DisableSlotResponse, err error)
- func (a ResourceManager) EnableSlot(req *apiv1.EnableSlotRequest) (resp *apiv1.EnableSlotResponse, err error)
- func (a ResourceManager) GetAgents(msg *apiv1.GetAgentsRequest) (resp *apiv1.GetAgentsResponse, err error)
- func (a *ResourceManager) GetSlot(req *apiv1.GetSlotRequest) (resp *apiv1.GetSlotResponse, err error)
- func (a ResourceManager) NotifyContainerRunning(msg sproto.NotifyContainerRunning) error
- func (a ResourceManager) ResolveResourcePool(name string, workspaceID, slots int) (string, error)
- func (a ResourceManager) TaskContainerDefaults(pool string, fallbackConfig model.TaskContainerDefaultsConfig) (result model.TaskContainerDefaultsConfig, err error)
- func (a ResourceManager) ValidateResourcePool(name string) error
- func (a ResourceManager) ValidateResourcePoolAvailability(name string, slots int) ([]command.LaunchWarning, error)
- func (a ResourceManager) ValidateResources(name string, slots int, command bool) error
- type Scheduler
- type SoftConstraint
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BestFit ¶
func BestFit(req *sproto.AllocateRequest, agent *agentState) float64
BestFit returns a float affinity score between 0 and 1 for the affinity between the task and the agent. This method attempts to allocate tasks to the agent that is both most utilized and offers the fewest slots. This method should be used when the cluster is dominated by multi-slot applications.
func MakeFitFunction ¶
func MakeFitFunction(fittingPolicy string) func( *sproto.AllocateRequest, *agentState) float64
MakeFitFunction returns the corresponding fitting function.
func Single ¶
func Single[K comparable, V any](m map[K]V) (kr K, vr V, ok bool)
Single asserts there's a single element in the map and take it.
func WorstFit ¶
func WorstFit(req *sproto.AllocateRequest, agent *agentState) float64
WorstFit returns a float affinity score between 0 and 1 for the affinity between the task and the agent. This method attempts to allocate tasks to the agent that is least utilized. This method should be used when the cluster is dominated by single-slot applications.
Types ¶
type HardConstraint ¶
type HardConstraint func(req *sproto.AllocateRequest, agent *agentState) bool
HardConstraint returns true if the task can be assigned to the agent and false otherwise.
type ResourceManager ¶
type ResourceManager struct {
*actorrm.ResourceManager
}
ResourceManager is a resource manager for Determined-managed resources.
func New ¶
func New( system *actor.System, db *db.PgDB, echo *echo.Echo, config *config.ResourceConfig, opts *aproto.MasterSetAgentOptions, cert *tls.Certificate, ) *ResourceManager
New returns a new ResourceManager, which manages communicating with and scheduling on Determined agents.
func (ResourceManager) CheckMaxSlotsExceeded ¶
func (a ResourceManager) CheckMaxSlotsExceeded(name string, slots int) (bool, error)
CheckMaxSlotsExceeded checks if the job exceeded the maximum number of slots.
func (ResourceManager) DisableSlot ¶
func (a ResourceManager) DisableSlot( req *apiv1.DisableSlotRequest, ) (resp *apiv1.DisableSlotResponse, err error)
DisableSlot implements 'det slot disable...' functionality.
func (ResourceManager) EnableSlot ¶
func (a ResourceManager) EnableSlot( req *apiv1.EnableSlotRequest, ) (resp *apiv1.EnableSlotResponse, err error)
EnableSlot implements 'det slot enable...' functionality.
func (ResourceManager) GetAgents ¶
func (a ResourceManager) GetAgents( msg *apiv1.GetAgentsRequest, ) (resp *apiv1.GetAgentsResponse, err error)
GetAgents gets the state of connected agents. Go around the RM and directly to the agents actor to avoid blocking asks through it.
func (*ResourceManager) GetSlot ¶
func (a *ResourceManager) GetSlot( req *apiv1.GetSlotRequest, ) (resp *apiv1.GetSlotResponse, err error)
GetSlot implements rm.ResourceManager.
func (ResourceManager) NotifyContainerRunning ¶
func (a ResourceManager) NotifyContainerRunning( msg sproto.NotifyContainerRunning, ) error
NotifyContainerRunning receives a notification from the container to let the master know that the container is running.
func (ResourceManager) ResolveResourcePool ¶
func (a ResourceManager) ResolveResourcePool(name string, workspaceID, slots int) (string, error)
ResolveResourcePool fully resolves the resource pool name.
func (ResourceManager) TaskContainerDefaults ¶
func (a ResourceManager) TaskContainerDefaults( pool string, fallbackConfig model.TaskContainerDefaultsConfig, ) (result model.TaskContainerDefaultsConfig, err error)
TaskContainerDefaults returns TaskContainerDefaults for the specified pool.
func (ResourceManager) ValidateResourcePool ¶
func (a ResourceManager) ValidateResourcePool(name string) error
ValidateResourcePool validates existence of a resource pool.
func (ResourceManager) ValidateResourcePoolAvailability ¶
func (a ResourceManager) ValidateResourcePoolAvailability(name string, slots int) ( []command.LaunchWarning, error, )
ValidateResourcePoolAvailability is a default implementation to satisfy the interface.
func (ResourceManager) ValidateResources ¶
func (a ResourceManager) ValidateResources(name string, slots int, command bool) error
ValidateResources ensures enough resources are available for a command.
type Scheduler ¶
type Scheduler interface { Schedule(rp *resourcePool) ([]*sproto.AllocateRequest, []model.AllocationID) JobQInfo(rp *resourcePool) map[model.JobID]*sproto.RMJobInfo }
Scheduler schedules tasks on agents. Its only function Schedule is called to determine which pending requests can be fulfilled and which scheduled tasks can be terminated. Schedule is expected to ba called every time there is a change to the cluster status, for example, new agents being connected, devices being disabled, and etc,. Schedule should avoid unnecessary shuffling tasks on agents to avoid the overhead of restarting a preempted task.
func MakeScheduler ¶
func MakeScheduler(conf *config.SchedulerConfig) Scheduler
MakeScheduler returns the corresponding scheduler implementation.
func NewFairShareScheduler ¶
func NewFairShareScheduler() Scheduler
NewFairShareScheduler creates a new scheduler that schedules tasks according to the max-min fairness of groups. For groups that are above their fair share, the scheduler requests them to terminate their idle tasks until they have achieved their fair share.
func NewPriorityScheduler ¶
func NewPriorityScheduler(config *config.SchedulerConfig) Scheduler
NewPriorityScheduler creates a new scheduler that schedules tasks via priority.
func NewRoundRobinScheduler ¶
func NewRoundRobinScheduler() Scheduler
NewRoundRobinScheduler creates a new scheduler that schedules tasks via round-robin of groups sorted low to high by their current allocated slots.
type SoftConstraint ¶
type SoftConstraint func(req *sproto.AllocateRequest, agent *agentState) float64
SoftConstraint returns a score from 0 (lowest) to 1 (highest) representing how optimal is the state of the cluster if the task were assigned to the agent.