resmgr

package
v0.0.0-...-c0686e8 Latest
Published: Dec 10, 2022 License: Apache-2.0 Imports: 41 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Config

type Config struct {
	// HTTP port on which the resource manager is listening
	HTTPPort int `yaml:"http_port"`

	// GRPC port on which the resource manager is listening
	GRPCPort int `yaml:"grpc_port"`

	// Period to run task scheduling in seconds
	TaskSchedulingPeriod time.Duration `yaml:"task_scheduling_period"`

	// Period to run entitlement calculator
	EntitlementCaculationPeriod time.Duration `yaml:"entitlement_calculation_period"`

	// Period to run task reconciliation
	TaskReconciliationPeriod time.Duration `yaml:"task_reconciliation_period"`

	// RM Task Config
	RmTaskConfig *task.Config `yaml:"task"`

	// Config for task preemption
	PreemptionConfig *common.PreemptionConfig `yaml:"preemption"`

	// Period to run host drainer
	HostDrainerPeriod time.Duration `yaml:"host_drainer_period"`

	// This flag will enable/disable host scorer service
	EnableHostScorer bool `yaml:"enable_host_scorer"`

	// HostManagerAPIVersion is the API version that the Resource Manager
	// should use to talk to Host Manager.
	HostManagerAPIVersion api.Version `yaml:"hostmgr_api_version"`

	// UseHostPool is the config switch to use host pool in Resource manager
	UseHostPool bool `yaml:"use_host_pool"`
}

Config is the Resource Manager-specific configuration.

type Metrics

type Metrics struct {
	APIEnqueueGangs    tally.Counter
	EnqueueGangSuccess tally.Counter
	EnqueueGangFail    tally.Counter

	APIDequeueGangs    tally.Counter
	DequeueGangSuccess tally.Counter
	DequeueGangTimeout tally.Counter

	APIGetPreemptibleTasks     tally.Counter
	GetPreemptibleTasksSuccess tally.Counter
	GetPreemptibleTasksTimeout tally.Counter

	APISetPlacements    tally.Counter
	SetPlacementSuccess tally.Counter
	SetPlacementFail    tally.Counter

	APIGetPlacements    tally.Counter
	GetPlacementSuccess tally.Counter
	GetPlacementFail    tally.Counter

	APILaunchedTasks tally.Counter

	RecoverySuccess             tally.Counter
	RecoveryFail                tally.Counter
	RecoveryRunningSuccessCount tally.Counter
	RecoveryRunningFailCount    tally.Counter
	RecoveryEnqueueFailedCount  tally.Counter
	RecoveryEnqueueSuccessCount tally.Counter
	RecoveryTimer               tally.Timer

	PlacementQueueLen tally.Gauge
	PlacementFailed   tally.Counter

	Elected tally.Gauge
}

Metrics is a placeholder for all metrics in resmgr.

func NewMetrics

func NewMetrics(scope tally.Scope) *Metrics

NewMetrics returns a new instance of resmgr.Metrics.

type RecoveryHandler

type RecoveryHandler struct {
	// contains filtered or unexported fields
}

RecoveryHandler performs recovery of jobs which are in non-terminated states and re-queues the tasks in the pending queue.

This is performed in two phases when the resource manager gains leadership:

Phase 1 - Recovers all the *running* tasks by adding them to the task tracker, so that resource accounting can be done, and transitions each task state machine to the correct state. Failure to recover any task in this phase fails the whole recovery process, and the resource manager fails to start up. After this phase completes successfully, the handler returns so that the entitlement calculation can start and the resource manager no longer blocks incoming requests.

Phase 2 - Runs in the background and recovers the non-running tasks by re-enqueueing them in the resource manager. Failure in this phase is non-fatal.

Recovery of maintenance queue is performed

func NewRecovery

func NewRecovery(
	parent tally.Scope,
	taskStore storage.TaskStore,
	activeJobsOps ormobjects.ActiveJobsOps,
	jobConfigOps ormobjects.JobConfigOps,
	jobRuntimeOps ormobjects.JobRuntimeOps,
	handler *ServiceHandler,
	tree respool.Tree,
	config Config,
	hostmgrClient hostsvc.InternalHostServiceYARPCClient,
) *RecoveryHandler

NewRecovery initializes the RecoveryHandler

func (*RecoveryHandler) Start

func (r *RecoveryHandler) Start() error

Start loads all the jobs and tasks which are not in a terminal state and requeues them.

func (*RecoveryHandler) Stop

func (r *RecoveryHandler) Stop() error

Stop stops the recovery handler

type Server

type Server struct {
	sync.Mutex

	ID string // The peloton resource manager master address
	// contains filtered or unexported fields
}

Server is the struct that handles the zk election.

func NewServer

func NewServer(
	parent tally.Scope,
	httpPort,
	grpcPort int,
	tree ServerProcess,
	recoveryHandler ServerProcess,
	entitlementCalculator ServerProcess,
	reconciler ServerProcess,
	preemptor ServerProcess,
	drainer ServerProcess,
	batchScorer ServerProcess) *Server

NewServer creates the election handler object.

func (*Server) GainedLeadershipCallback

func (s *Server) GainedLeadershipCallback() (err error)

GainedLeadershipCallback is the callback when the current node becomes the leader

func (*Server) GetID

func (s *Server) GetID() string

GetID returns the Peloton resource manager master address. It is required to implement leader.Nomination.

func (*Server) HasGainedLeadership

func (s *Server) HasGainedLeadership() bool

HasGainedLeadership returns true only once GainedLeadershipCallback has completed.

func (*Server) LostLeadershipCallback

func (s *Server) LostLeadershipCallback() error

LostLeadershipCallback is the callback when the current node loses leadership

func (*Server) ShutDownCallback

func (s *Server) ShutDownCallback() error

ShutDownCallback is the callback to shut down gracefully if possible

type ServerProcess

type ServerProcess interface {
	Start() error
	Stop() error
}

ServerProcess is the interface for a process inside a server which starts and stops based on the leadership delegation of the server
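As a rough illustration of how a server might drive ServerProcess values on leadership changes, here is a minimal sketch. The interface is copied from the declaration above; `namedProcess`, `startAll`, and `stopAll` are hypothetical, and the ordering and error handling used by the real Server are not documented on this page.

```go
package main

import "fmt"

// ServerProcess mirrors the interface declared in this package.
type ServerProcess interface {
	Start() error
	Stop() error
}

// namedProcess is a hypothetical process that only records its state.
type namedProcess struct {
	name    string
	running bool
}

func (p *namedProcess) Start() error { p.running = true; return nil }
func (p *namedProcess) Stop() error  { p.running = false; return nil }

// startAll starts the processes in order and stops at the first error.
// It models what a gained-leadership callback could do.
func startAll(procs []ServerProcess) error {
	for _, p := range procs {
		if err := p.Start(); err != nil {
			return err
		}
	}
	return nil
}

// stopAll stops every process and returns the first error seen, if any.
// It models what a lost-leadership callback could do.
func stopAll(procs []ServerProcess) error {
	var first error
	for _, p := range procs {
		if err := p.Stop(); err != nil && first == nil {
			first = err
		}
	}
	return first
}

func main() {
	tree := &namedProcess{name: "tree"}
	recovery := &namedProcess{name: "recovery"}
	procs := []ServerProcess{tree, recovery}

	if err := startAll(procs); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Println("after gaining leadership:", tree.running, recovery.running)

	if err := stopAll(procs); err != nil {
		fmt.Println("stop failed:", err)
	}
	fmt.Println("after losing leadership:", tree.running, recovery.running)
}
```

Because every component shares the same two-method interface, the election code does not need to know what each process does.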

type ServiceHandler

type ServiceHandler struct {
	// contains filtered or unexported fields
}

ServiceHandler implements peloton.private.resmgr.ResourceManagerService

func NewServiceHandler

func NewServiceHandler(
	d *yarpc.Dispatcher,
	parent tally.Scope,
	rmTracker rmtask.Tracker,
	batchScorer hostmover.Scorer,
	tree respool.Tree,
	preemptionQueue preemption.Queue,
	hostmgrClient hostsvc.InternalHostServiceYARPCClient,
	conf Config) *ServiceHandler

NewServiceHandler initializes the handler for ResourceManagerService

func NewTestServiceHandler

func NewTestServiceHandler() *ServiceHandler

NewTestServiceHandler returns an empty new ServiceHandler ptr for testing.

func (*ServiceHandler) DequeueGangs

DequeueGangs implements ResourceManagerService.DequeueGangs

func (*ServiceHandler) EnqueueGangs

EnqueueGangs implements ResourceManagerService.EnqueueGangs

func (*ServiceHandler) GetActiveTasks

GetActiveTasks returns the active tasks in the scheduler based on the filters. The filters can be particular task states, a job ID, or a resource pool ID.

func (*ServiceHandler) GetHostsByScores

GetHostsByScores returns a list of batch hosts with lowest host scores

func (*ServiceHandler) GetOrphanTasks

GetOrphanTasks returns the list of orphan tasks

func (*ServiceHandler) GetPendingTasks

GetPendingTasks returns the pending tasks from a resource pool in the order in which they were added, up to a maximum number of gangs. For example, specifying a limit of 10 returns the pending tasks from the first 10 gangs in the queue. The tasks are grouped according to their gang membership, since a gang is the unit of admission.
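The limit semantics can be illustrated with a small sketch: the limit counts gangs, not tasks, and the result keeps the gang grouping. The `gang` type and `pendingTasks` function are hypothetical and only model the described behaviour. The real request and response types are not shown on this page.

```go
package main

import "fmt"

// gang is a hypothetical stand-in: a group of task IDs admitted together.
type gang []string

// pendingTasks returns the tasks of the first limit gangs in queue order,
// grouped by gang. The limit applies to gangs, not to individual tasks.
func pendingTasks(queue []gang, limit int) []gang {
	if limit < 0 {
		limit = 0
	}
	if limit > len(queue) {
		limit = len(queue)
	}
	out := make([]gang, limit)
	for i := 0; i < limit; i++ {
		// Copy each gang so callers cannot mutate the queue.
		out[i] = append(gang(nil), queue[i]...)
	}
	return out
}

func main() {
	queue := []gang{
		{"t1", "t2", "t3"},
		{"t4"},
		{"t5", "t6"},
	}
	// A limit of 2 yields four tasks here, because the first gang has three.
	for i, g := range pendingTasks(queue, 2) {
		fmt.Printf("gang %d: %v\n", i, g)
	}
}
```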

func (*ServiceHandler) GetPlacements

GetPlacements implements ResourceManagerService.GetPlacements

func (*ServiceHandler) GetPreemptibleTasks

GetPreemptibleTasks returns tasks which need to be preempted from the resource pool

func (*ServiceHandler) GetStreamHandler

func (h *ServiceHandler) GetStreamHandler() *eventstream.Handler

GetStreamHandler returns the stream handler

func (*ServiceHandler) GetTasksByHosts

GetTasksByHosts returns all tasks of the given task type running on the given list of hosts.

func (*ServiceHandler) KillTasks

KillTasks kills the task

func (*ServiceHandler) NotifyTaskUpdates

NotifyTaskUpdates is called by the Host Manager (HM) to notify the resource manager of task updates.

func (*ServiceHandler) SetPlacements

SetPlacements implements ResourceManagerService.SetPlacements

func (*ServiceHandler) UpdateTasksState

UpdateTasksState is called to notify the resource manager about tasks that have moved to the corresponding state, so that the resource manager can take appropriate action for those tasks. For example, if the tasks have been launched, the job manager calls the resource manager to notify it of the launch, so that the resource manager can stop the timer for the launching state. Similarly, if a task fails to launch in the host manager due to a valid failure, the job manager tells the resource manager that the task is to be killed, so that the resource manager can remove the task from the tracker and do the relevant resource accounting.

Directories

Path Synopsis
Package respool is responsible for 1.
