Documentation ¶
Index ¶
- type Config
- type Metrics
- type RecoveryHandler
- type Server
- type ServerProcess
- type ServiceHandler
- func (h *ServiceHandler) DequeueGangs(ctx context.Context, req *resmgrsvc.DequeueGangsRequest) (*resmgrsvc.DequeueGangsResponse, error)
- func (h *ServiceHandler) EnqueueGangs(ctx context.Context, req *resmgrsvc.EnqueueGangsRequest) (*resmgrsvc.EnqueueGangsResponse, error)
- func (h *ServiceHandler) GetActiveTasks(ctx context.Context, req *resmgrsvc.GetActiveTasksRequest) (*resmgrsvc.GetActiveTasksResponse, error)
- func (h *ServiceHandler) GetHostsByScores(ctx context.Context, req *resmgrsvc.GetHostsByScoresRequest) (*resmgrsvc.GetHostsByScoresResponse, error)
- func (h *ServiceHandler) GetOrphanTasks(ctx context.Context, req *resmgrsvc.GetOrphanTasksRequest) (*resmgrsvc.GetOrphanTasksResponse, error)
- func (h *ServiceHandler) GetPendingTasks(ctx context.Context, req *resmgrsvc.GetPendingTasksRequest) (*resmgrsvc.GetPendingTasksResponse, error)
- func (h *ServiceHandler) GetPlacements(ctx context.Context, req *resmgrsvc.GetPlacementsRequest) (*resmgrsvc.GetPlacementsResponse, error)
- func (h *ServiceHandler) GetPreemptibleTasks(ctx context.Context, req *resmgrsvc.GetPreemptibleTasksRequest) (*resmgrsvc.GetPreemptibleTasksResponse, error)
- func (h *ServiceHandler) GetStreamHandler() *eventstream.Handler
- func (h *ServiceHandler) GetTasksByHosts(ctx context.Context, req *resmgrsvc.GetTasksByHostsRequest) (*resmgrsvc.GetTasksByHostsResponse, error)
- func (h *ServiceHandler) KillTasks(ctx context.Context, req *resmgrsvc.KillTasksRequest) (*resmgrsvc.KillTasksResponse, error)
- func (h *ServiceHandler) NotifyTaskUpdates(ctx context.Context, req *resmgrsvc.NotifyTaskUpdatesRequest) (*resmgrsvc.NotifyTaskUpdatesResponse, error)
- func (h *ServiceHandler) SetPlacements(ctx context.Context, req *resmgrsvc.SetPlacementsRequest) (*resmgrsvc.SetPlacementsResponse, error)
- func (h *ServiceHandler) UpdateTasksState(ctx context.Context, req *resmgrsvc.UpdateTasksStateRequest) (*resmgrsvc.UpdateTasksStateResponse, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct { // HTTP port which hostmgr is listening on HTTPPort int `yaml:"http_port"` // GRPC port which hostmgr is listening on GRPCPort int `yaml:"grpc_port"` // Period to run task scheduling in seconds TaskSchedulingPeriod time.Duration `yaml:"task_scheduling_period"` // Period to run entitlement calculator EntitlementCaculationPeriod time.Duration `yaml:"entitlement_calculation_period"` // Period to run task reconciliation TaskReconciliationPeriod time.Duration `yaml:"task_reconciliation_period"` // RM Task Config RmTaskConfig *task.Config `yaml:"task"` // Config for task preemption PreemptionConfig *common.PreemptionConfig `yaml:"preemption"` // Period to run host drainer HostDrainerPeriod time.Duration `yaml:"host_drainer_period"` // This flag will enable/disable host scorer service EnableHostScorer bool `yaml:"enable_host_scorer"` // HostManagerAPIVersion is the API version that the Resource Manager // should use to talk to Host Manager. HostManagerAPIVersion api.Version `yaml:"hostmgr_api_version"` // UseHostPool is the config switch to use host pool in Resource manager UseHostPool bool `yaml:"use_host_pool"` }
Config is Resource Manager specific configuration
type Metrics ¶
type Metrics struct { APIEnqueueGangs tally.Counter EnqueueGangSuccess tally.Counter EnqueueGangFail tally.Counter APIDequeueGangs tally.Counter DequeueGangSuccess tally.Counter DequeueGangTimeout tally.Counter APIGetPreemptibleTasks tally.Counter GetPreemptibleTasksSuccess tally.Counter GetPreemptibleTasksTimeout tally.Counter APISetPlacements tally.Counter SetPlacementSuccess tally.Counter SetPlacementFail tally.Counter APIGetPlacements tally.Counter GetPlacementSuccess tally.Counter GetPlacementFail tally.Counter APILaunchedTasks tally.Counter RecoverySuccess tally.Counter RecoveryFail tally.Counter RecoveryRunningSuccessCount tally.Counter RecoveryRunningFailCount tally.Counter RecoveryEnqueueFailedCount tally.Counter RecoveryEnqueueSuccessCount tally.Counter RecoveryTimer tally.Timer PlacementQueueLen tally.Gauge PlacementFailed tally.Counter Elected tally.Gauge }
Metrics is a placeholder for all metrics in resmgr.
func NewMetrics ¶
NewMetrics returns a new instance of resmgr.Metrics.
type RecoveryHandler ¶
type RecoveryHandler struct {
// contains filtered or unexported fields
}
RecoveryHandler performs recovery of jobs which are in non-terminated states and re-queues the tasks in the pending queue.
This is performed in 2 phases when the resource manager gains leadership ¶
Phase 1 - Performs recovery of all the *running* tasks by adding to the task tracker so that the resource accounting can be done and transitions the task state machine to the correct state. Failure to perform recovery of any task in this phase results in the failure of the whole recovery process and resource manager would fail to start up. After successful completion of this phase the handler returns so that the entitlement calculation can start and resource manager doesn't block anymore incoming requests.
Phase 2 - This phase is performed in the background and involves recovery of non-running tasks by the re-enqueueing them resource manager. Failure in this phase is non-fatal.
Recovery of maintenance queue is performed
func NewRecovery ¶
func NewRecovery( parent tally.Scope, taskStore storage.TaskStore, activeJobsOps ormobjects.ActiveJobsOps, jobConfigOps ormobjects.JobConfigOps, jobRuntimeOps ormobjects.JobRuntimeOps, handler *ServiceHandler, tree respool.Tree, config Config, hostmgrClient hostsvc.InternalHostServiceYARPCClient, ) *RecoveryHandler
NewRecovery initializes the RecoveryHandler
func (*RecoveryHandler) Start ¶
func (r *RecoveryHandler) Start() error
Start loads all the jobs and tasks which are not in terminal state and requeue them
func (*RecoveryHandler) Stop ¶
func (r *RecoveryHandler) Stop() error
Stop stops the recovery handler
type Server ¶
type Server struct { sync.Mutex ID string // The peloton resource manager master address // contains filtered or unexported fields }
Server struct for handling the zk election
func NewServer ¶
func NewServer( parent tally.Scope, httpPort, grpcPort int, tree ServerProcess, recoveryHandler ServerProcess, entitlementCalculator ServerProcess, reconciler ServerProcess, preemptor ServerProcess, drainer ServerProcess, batchScorer ServerProcess) *Server
NewServer will create the elect handle object
func (*Server) GainedLeadershipCallback ¶
GainedLeadershipCallback is the callback when the current node becomes the leader
func (*Server) GetID ¶
GetID function returns the peloton resource manager master address required to implement leader.Nomination
func (*Server) HasGainedLeadership ¶
HasGainedLeadership returns true iff once GainedLeadershipCallback completes
func (*Server) LostLeadershipCallback ¶
LostLeadershipCallback is the callback when the current node lost leadership
func (*Server) ShutDownCallback ¶
ShutDownCallback is the callback to shut down gracefully if possible
type ServerProcess ¶
ServerProcess is the interface for a process inside a server which starts and stops based on the leadership delegation of the server
type ServiceHandler ¶
type ServiceHandler struct {
// contains filtered or unexported fields
}
ServiceHandler implements peloton.private.resmgr.ResourceManagerService
func NewServiceHandler ¶
func NewServiceHandler( d *yarpc.Dispatcher, parent tally.Scope, rmTracker rmtask.Tracker, batchScorer hostmover.Scorer, tree respool.Tree, preemptionQueue preemption.Queue, hostmgrClient hostsvc.InternalHostServiceYARPCClient, conf Config) *ServiceHandler
NewServiceHandler initializes the handler for ResourceManagerService
func NewTestServiceHandler ¶
func NewTestServiceHandler() *ServiceHandler
NewTestServiceHandler returns an empty new ServiceHandler ptr for testing.
func (*ServiceHandler) DequeueGangs ¶
func (h *ServiceHandler) DequeueGangs( ctx context.Context, req *resmgrsvc.DequeueGangsRequest, ) (*resmgrsvc.DequeueGangsResponse, error)
DequeueGangs implements ResourceManagerService.DequeueGangs
func (*ServiceHandler) EnqueueGangs ¶
func (h *ServiceHandler) EnqueueGangs( ctx context.Context, req *resmgrsvc.EnqueueGangsRequest, ) (*resmgrsvc.EnqueueGangsResponse, error)
EnqueueGangs implements ResourceManagerService.EnqueueGangs
func (*ServiceHandler) GetActiveTasks ¶
func (h *ServiceHandler) GetActiveTasks( ctx context.Context, req *resmgrsvc.GetActiveTasksRequest, ) (*resmgrsvc.GetActiveTasksResponse, error)
GetActiveTasks returns the active tasks in the scheduler based on the filters The filters can be particular task states, job ID or resource pool ID.
func (*ServiceHandler) GetHostsByScores ¶
func (h *ServiceHandler) GetHostsByScores( ctx context.Context, req *resmgrsvc.GetHostsByScoresRequest, ) (*resmgrsvc.GetHostsByScoresResponse, error)
GetHostsByScores returns a list of batch hosts with lowest host scores
func (*ServiceHandler) GetOrphanTasks ¶
func (h *ServiceHandler) GetOrphanTasks( ctx context.Context, req *resmgrsvc.GetOrphanTasksRequest, ) (*resmgrsvc.GetOrphanTasksResponse, error)
GetOrphanTasks returns the list of orphan tasks
func (*ServiceHandler) GetPendingTasks ¶
func (h *ServiceHandler) GetPendingTasks( ctx context.Context, req *resmgrsvc.GetPendingTasksRequest, ) (*resmgrsvc.GetPendingTasksResponse, error)
GetPendingTasks returns the pending tasks from a resource pool in the order in which they were added up to a max limit number of gangs. Eg specifying a limit of 10 would return pending tasks from the first 10 gangs in the queue. The tasks are grouped according to their gang membership since a gang is the unit of admission.
func (*ServiceHandler) GetPlacements ¶
func (h *ServiceHandler) GetPlacements( ctx context.Context, req *resmgrsvc.GetPlacementsRequest, ) (*resmgrsvc.GetPlacementsResponse, error)
GetPlacements implements ResourceManagerService.GetPlacements
func (*ServiceHandler) GetPreemptibleTasks ¶
func (h *ServiceHandler) GetPreemptibleTasks( ctx context.Context, req *resmgrsvc.GetPreemptibleTasksRequest) (*resmgrsvc.GetPreemptibleTasksResponse, error)
GetPreemptibleTasks returns tasks which need to be preempted from the resource pool
func (*ServiceHandler) GetStreamHandler ¶
func (h *ServiceHandler) GetStreamHandler() *eventstream.Handler
GetStreamHandler returns the stream handler
func (*ServiceHandler) GetTasksByHosts ¶
func (h *ServiceHandler) GetTasksByHosts(ctx context.Context, req *resmgrsvc.GetTasksByHostsRequest) (*resmgrsvc.GetTasksByHostsResponse, error)
GetTasksByHosts returns all tasks of the given task type running on the given list of hosts.
func (*ServiceHandler) KillTasks ¶
func (h *ServiceHandler) KillTasks( ctx context.Context, req *resmgrsvc.KillTasksRequest, ) (*resmgrsvc.KillTasksResponse, error)
KillTasks kills the task
func (*ServiceHandler) NotifyTaskUpdates ¶
func (h *ServiceHandler) NotifyTaskUpdates( ctx context.Context, req *resmgrsvc.NotifyTaskUpdatesRequest) (*resmgrsvc.NotifyTaskUpdatesResponse, error)
NotifyTaskUpdates is called by HM to notify task updates
func (*ServiceHandler) SetPlacements ¶
func (h *ServiceHandler) SetPlacements( ctx context.Context, req *resmgrsvc.SetPlacementsRequest, ) (*resmgrsvc.SetPlacementsResponse, error)
SetPlacements implements ResourceManagerService.SetPlacements
func (*ServiceHandler) UpdateTasksState ¶
func (h *ServiceHandler) UpdateTasksState( ctx context.Context, req *resmgrsvc.UpdateTasksStateRequest) (*resmgrsvc.UpdateTasksStateResponse, error)
UpdateTasksState will be called to notify the resource manager about the tasks which have been moved to cooresponding state , by that resource manager can take appropriate actions for those tasks. As an example if the tasks been launched then job manager will call resource manager to notify it is launched by that resource manager can stop timer for launching state. Similarly if task is been failed to be launched in host manager due to valid failure then job manager will tell resource manager about the task to be killed by that resource manager can remove the task from the tracker and relevant resource accounting can be done.
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
Package respool is responsible for 1.
|
Package respool is responsible for 1. |