nodemanager

package
v0.0.16
Warning

This package is not in the latest version of its module.

Published: Jul 16, 2020 License: Apache-2.0 Imports: 20 Imported by: 1

Documentation

Overview

Not sure yet whether this is a good pattern for decoupling the pod_controller from the node controller, but going to give it a try.

Index

Constants

This section is empty.

Variables

var (
	BootTimeout         time.Duration = 300 * time.Second
	HealthyTimeout      time.Duration = 90 * time.Second
	HealthcheckPause    time.Duration = 5 * time.Second
	SpotRequestPause    time.Duration = 60 * time.Second
	BootImage           cloud.Image   = cloud.Image{}
	MaxBootPerIteration int           = 10
)

These are variables rather than constants because that makes testing easier. Non-const timeouts were endorsed by Mitchell Hashimoto.

Functions

This section is empty.

Types

type BindingNodeScaler

type BindingNodeScaler struct {
	// contains filtered or unexported fields
}

func NewBindingNodeScaler

func NewBindingNodeScaler(nodeReg StatusUpdater, standbyNodes []StandbyNodeSpec, cloudStatus cloud.StatusKeeper, defaultVolumeSize string, fixedSizeVolume bool) *BindingNodeScaler

func (*BindingNodeScaler) Compute

func (s *BindingNodeScaler) Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)

A brief summary of how we figure out what nodes need to be started and what nodes need to be shut down:

1. We only care about waiting pods and available or creat(ing|ed) nodes.

2. Re-generate the podNodeBinding map by looking at the existing bindings from nodes to pods in each node's node.Status.BoundPodName. Also make sure that any pods listed in there are actually still waiting (because the user might have killed a pod). Along the way we keep track of unbound pods and nodes.

3. Match any unbound pods to unbound nodes. Before doing that, order our pods and nodes so that we choose the most specific matches for our pods and nodes. E.g. a pod with a supplied Placement will match a specific node before a pod with no placement spec matches/takes that specific node.

4. Any remaining unbound pods that haven't been matched will get a node booted for them, with the exception of node requests that we know cannot be fulfilled due to unavailability in the cloud.

5. Finally, make sure that we have enough nodes to satisfy our standby pools of nodes.

At the end of this process, return the nodes that we should start, the nodes that need to be shut down and the current bindings map (so that the dispatcher can be fast).
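Steps 2 through 4 above can be sketched roughly as follows. This is not the package's implementation: pod, node, and matchPods are simplified, hypothetical stand-ins, placement is reduced to a single instance-type string, and standby pools and cloud unavailability are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// pod and node are simplified, hypothetical stand-ins for api.Pod and api.Node.
type pod struct {
	Name      string
	Placement string // required instance type; "" means any node will do
}

type node struct {
	Name         string
	InstanceType string
	BoundPodName string
}

// matchPods rebuilds the pod-to-node bindings from each node's BoundPodName,
// matches the remaining pods to free nodes (pods with a placement go first),
// and returns the bindings plus the pods that still need a node booted.
// Note that it updates BoundPodName on the nodes it is given.
func matchPods(nodes []*node, pods []*pod) (map[string]string, []*pod) {
	waiting := make(map[string]*pod, len(pods))
	for _, p := range pods {
		waiting[p.Name] = p
	}
	bindings := make(map[string]string)
	var free []*node
	for _, n := range nodes {
		if _, ok := waiting[n.BoundPodName]; ok {
			bindings[n.BoundPodName] = n.Name
		} else {
			// The bound pod is gone (or there never was one): the node is free.
			n.BoundPodName = ""
			free = append(free, n)
		}
	}
	var unbound []*pod
	for _, p := range pods {
		if _, ok := bindings[p.Name]; !ok {
			unbound = append(unbound, p)
		}
	}
	// Most specific first: pods with an explicit placement choose before others.
	sort.SliceStable(unbound, func(i, j int) bool {
		return unbound[i].Placement != "" && unbound[j].Placement == ""
	})
	var needBoot []*pod
	for _, p := range unbound {
		matched := false
		for _, n := range free {
			if n.BoundPodName == "" && (p.Placement == "" || p.Placement == n.InstanceType) {
				n.BoundPodName = p.Name
				bindings[p.Name] = n.Name
				matched = true
				break
			}
		}
		if !matched {
			needBoot = append(needBoot, p)
		}
	}
	return bindings, needBoot
}

func main() {
	nodes := []*node{
		{Name: "n1", InstanceType: "t3.large", BoundPodName: "gone"},
		{Name: "n2", InstanceType: "t3.small", BoundPodName: "a"},
	}
	pods := []*pod{{Name: "a"}, {Name: "b"}, {Name: "c", Placement: "t3.large"}}
	bindings, needBoot := matchPods(nodes, pods)
	fmt.Println("bindings:", bindings)
	for _, p := range needBoot {
		fmt.Println("needs a node:", p.Name)
	}
}
```

The ordering step is what lets pod c claim the only t3.large node even though pod b, which would accept any node, appears earlier in the input.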

type NodeController

type NodeController struct {
	Config             NodeControllerConfig
	NodeRegistry       *registry.NodeRegistry
	LogRegistry        *registry.LogRegistry
	PodReader          registry.PodLister
	NodeDispenser      *NodeDispenser
	NodeScaler         ScalingAlgorithm
	CloudClient        cloud.CloudClient
	NodeClientFactory  nodeclient.ItzoClientFactoryer
	Events             *events.EventSystem
	PoolLoopTimer      *stats.LoopTimer
	ImageIdCache       *timeoutmap.TimeoutMap
	CloudInitFile      *cloudinitfile.File
	CertificateFactory *certs.CertificateFactory
	CloudStatus        cloud.StatusKeeper
	BootImageSpec      cloud.BootImageSpec
}

func (*NodeController) Dump

func (c *NodeController) Dump() []byte

func (*NodeController) ResumeWaits

func (c *NodeController) ResumeWaits()

When the server restarts, there may be old nodes that were starting or restarting and that we were polling in order to know when to change their state to available. We need to restart those polls.

func (*NodeController) Start

func (c *NodeController) Start(quit <-chan struct{}, wg *sync.WaitGroup)

func (*NodeController) StopCreatingNodes

func (c *NodeController) StopCreatingNodes()

If the controller was shut down while creating a node, the node will remain in creating indefinitely since we don't have an instanceID for it. Kill it here.

type NodeControllerConfig

type NodeControllerConfig struct {
	PoolInterval      time.Duration
	HeartbeatInterval time.Duration
	ReaperInterval    time.Duration
	ItzoVersion       string
	ItzoURL           string
	CellConfig        map[string]string
}

When configuring these intervals we want the following constraints to be satisfied:

1. The pool interval should be longer than the heartbeat interval.

2. The heartbeat interval should be longer than the heartbeat client timeout.

type NodeDispenser

type NodeDispenser struct {
	NodeRequestChan chan NodeRequest
	NodeReturnChan  chan NodeReturn
}

func NewNodeDispenser

func NewNodeDispenser() *NodeDispenser

func (*NodeDispenser) RequestNode

func (e *NodeDispenser) RequestNode(requestingPod api.Pod) NodeReply

We pass in a copy of the requesting pod (by value rather than by pointer) for safety reasons.

func (*NodeDispenser) ReturnNode

func (e *NodeDispenser) ReturnNode(nodeName string, unused bool)

type NodeReply

type NodeReply struct {
	Node *api.Node
	// When there's no binding for a pod, that either means the
	// pod is new or something might have gone wrong with the pod
	// spec, possibly it was created by a replicaSet and we can't
	// satisfy the placement spec of the pod.  We use NoBinding
	// to signal that we can't currently create a node for the pod.
	// if a pod remains unbound for too long, we can act accordingly
	// (e.g. for a replicaSet pod, we kill the pod).
	NoBinding bool
}

type NodeRequest

type NodeRequest struct {
	ReplyChan chan NodeReply
	// contains filtered or unexported fields
}
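The exported ReplyChan field suggests a request/reply pattern over channels, where each request carries its own channel for the answer. The sketch below shows that general pattern only. The reply, request, and dispenser types and the serve method are hypothetical simplifications, and the package's actual dispatch logic is not shown in these docs.

```go
package main

import "fmt"

// reply and request are simplified, hypothetical stand-ins for NodeReply and
// NodeRequest.
type reply struct {
	NodeName  string
	NoBinding bool
}

type request struct {
	PodName   string
	ReplyChan chan reply
}

type dispenser struct {
	requests chan request
}

// RequestNode sends a request that carries its own reply channel, then blocks
// until the serving goroutine answers on that channel.
func (d *dispenser) RequestNode(podName string) reply {
	req := request{PodName: podName, ReplyChan: make(chan reply)}
	d.requests <- req
	return <-req.ReplyChan
}

// serve answers requests from a bindings map (pod name to node name) until
// the requests channel is closed. Using a map lookup here is an assumption.
func (d *dispenser) serve(bindings map[string]string) {
	for req := range d.requests {
		nodeName, ok := bindings[req.PodName]
		req.ReplyChan <- reply{NodeName: nodeName, NoBinding: !ok}
	}
}

func main() {
	d := &dispenser{requests: make(chan request)}
	go d.serve(map[string]string{"web-1": "node-a"})

	fmt.Printf("%+v\n", d.RequestNode("web-1"))
	fmt.Printf("%+v\n", d.RequestNode("web-2"))
	close(d.requests)
}
```

Because only the serving goroutine reads the bindings, callers on any goroutine can request nodes without taking a lock.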

type NodeReturn

type NodeReturn struct {
	NodeName string
	Unused   bool
}

type ScalingAlgorithm

type ScalingAlgorithm interface {
	// todo, figure out what we really need to pass in
	// and return value will likely get much more complex
	//Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, int)
	Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)
}

type StandbyNodeSpec

type StandbyNodeSpec struct {
	InstanceType string `json:"instanceType"`
	Count        int    `json:"count"`
	Spot         bool   `json:"spot"`
}

StandbyNodeSpec is used externally in provider.yaml to specify a pool of standby (buffered) nodes.
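The struct tags give the field names a configuration entry would use. As a minimal sketch, the example below decodes a list of specs from JSON with the standard library. How provider.yaml itself is loaded is not shown in these docs, so the JSON input, the instance type values, and the parseStandby helper are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StandbyNodeSpec is copied from the type declaration above.
type StandbyNodeSpec struct {
	InstanceType string `json:"instanceType"`
	Count        int    `json:"count"`
	Spot         bool   `json:"spot"`
}

// parseStandby is a hypothetical helper that decodes a JSON array of specs.
func parseStandby(data []byte) ([]StandbyNodeSpec, error) {
	var specs []StandbyNodeSpec
	if err := json.Unmarshal(data, &specs); err != nil {
		return nil, err
	}
	return specs, nil
}

func main() {
	// Example input; the instance types are illustrative values only.
	input := []byte(`[
		{"instanceType": "t3.nano", "count": 2, "spot": true},
		{"instanceType": "c5.large", "count": 1}
	]`)
	specs, err := parseStandby(input)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range specs {
		fmt.Printf("%+v\n", s)
	}
}
```

A field left out of the input, such as spot in the second entry, keeps its zero value.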

type StatusUpdater

type StatusUpdater interface {
	UpdateStatus(*api.Node) (*api.Node, error)
}
