nodemanager

package

Version: v0.0.11 (latest) | Published: Jul 2, 2020 | License: Apache-2.0 | Imports: 20 | Imported by: 1

Repository

github.com/elotl/kip

Documentation ¶

Overview ¶

Not sure if this is a good pattern for decoupling the pod_controller from the node controller, but going to give it a try.

Index ¶

Variables
type BindingNodeScaler
- func NewBindingNodeScaler(nodeReg StatusUpdater, standbyNodes []StandbyNodeSpec, ...) *BindingNodeScaler
- func (s *BindingNodeScaler) Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)
type NodeController
type NodeControllerConfig
type NodeDispenser
- func NewNodeDispenser() *NodeDispenser
- func (e *NodeDispenser) RequestNode(requestingPod api.Pod) NodeReply
- func (e *NodeDispenser) ReturnNode(nodeName string, unused bool)
type NodeReply
type NodeRequest
type NodeReturn
type ScalingAlgorithm
type StandbyNodeSpec
type StatusUpdater

Constants ¶

This section is empty.

Variables ¶

var (
	BootTimeout         time.Duration = 300 * time.Second
	HealthyTimeout      time.Duration = 90 * time.Second
	HealthcheckPause    time.Duration = 5 * time.Second
	SpotRequestPause    time.Duration = 60 * time.Second
	BootImage           cloud.Image   = cloud.Image{}
	MaxBootPerIteration int           = 10
)

Making these vars makes testing easier. Non-const timeouts were endorsed by Mitchell Hashimoto.
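As a rough illustration of why package-level vars help in tests, the sketch below overrides a timeout for the duration of a test and then restores it. It is self-contained and does not use this package: `bootTimeout`, `waitForBoot` and `withBootTimeout` are hypothetical stand-ins, not part of nodemanager.

```go
package main

import (
	"fmt"
	"time"
)

// bootTimeout is a hypothetical stand-in for a package-level var such as
// BootTimeout. Because it is a var and not a const, tests can shorten it.
var bootTimeout time.Duration = 300 * time.Second

// waitForBoot polls ready until it returns true or bootTimeout elapses.
func waitForBoot(ready func() bool) bool {
	deadline := time.Now().Add(bootTimeout)
	for time.Now().Before(deadline) {
		if ready() {
			return true
		}
		time.Sleep(time.Millisecond)
	}
	return false
}

// withBootTimeout overrides bootTimeout and returns a func that restores
// the previous value, so a test can defer the restore.
func withBootTimeout(d time.Duration) (restore func()) {
	old := bootTimeout
	bootTimeout = d
	return func() { bootTimeout = old }
}

func main() {
	restore := withBootTimeout(20 * time.Millisecond)
	defer restore()
	// The node never becomes ready, so this gives up after about 20ms
	// instead of waiting the full five minutes.
	fmt.Println(waitForBoot(func() bool { return false })) // prints false
}
```

With the real package, a test would presumably assign to `nodemanager.BootTimeout` directly and restore the old value when it finishes.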

Functions ¶

This section is empty.

Types ¶

type BindingNodeScaler ¶

type BindingNodeScaler struct {
	// contains filtered or unexported fields
}

func NewBindingNodeScaler ¶

func NewBindingNodeScaler(nodeReg StatusUpdater, standbyNodes []StandbyNodeSpec, cloudStatus cloud.StatusKeeper, defaultVolumeSize string, fixedSizeVolume bool) *BindingNodeScaler

func (*BindingNodeScaler) Compute ¶

func (s *BindingNodeScaler) Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)

A brief summary of how we figure out what nodes need to be started and what nodes need to be shut down:

1. We only care about waiting pods and available or creat(ing|ed) nodes.

2. Re-generate the podNodeBinding map by looking at the existing bindings from nodes to pods in each node's node.Status.BoundPodName. Also make sure that any pods listed in there are actually still waiting (because the user might have killed a pod). Along the way we keep track of unbound pods and nodes.

3. Match any unbound pods to unbound nodes. Before doing that, order our pods and nodes so that we choose the most specific matches for our pods and nodes. E.g. a pod with a supplied Placement will match a specific node before a pod with no placement spec matches/takes that specific node.

4. Any remaining unbound pods that haven't been matched will get a node booted for them, with the exception of node requests that we know cannot be fulfilled due to unavailability in the cloud.

5. Finally, make sure that we have enough nodes to satisfy our standby pools of nodes.

At the end of this process, return the nodes that we should start, the nodes that need to be shut down and the current bindings map (so that the dispatcher can be fast).
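The steps above can be sketched in a heavily simplified, self-contained form. This is not the package's implementation: the `pod` and `node` types, the `compute` function, matching on instance type alone, the pod-name to node-name direction of the bindings map, and the instance type strings are all assumptions made for illustration. Placement-based ordering and cloud unavailability checks are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// pod and node are hypothetical, trimmed-down stand-ins for api.Pod and api.Node.
type pod struct {
	Name         string
	InstanceType string
}

type node struct {
	Name         string
	InstanceType string
	BoundPodName string
}

// compute returns instance types to start, names of nodes to stop, and a
// pod-name -> node-name bindings map. standby maps instance type to pool size.
func compute(nodes []node, pods []pod, standby map[string]int) (start, stop []string, bindings map[string]string) {
	bindings = map[string]string{}
	// Step 1: the caller is assumed to pass only waiting pods and usable nodes.
	waiting := map[string]bool{}
	for _, p := range pods {
		waiting[p.Name] = true
	}
	// Step 2: rebuild bindings from each node's BoundPodName, dropping
	// bindings to pods that are no longer waiting.
	var free []node
	for _, n := range nodes {
		if waiting[n.BoundPodName] {
			if _, taken := bindings[n.BoundPodName]; !taken {
				bindings[n.BoundPodName] = n.Name
				continue
			}
		}
		free = append(free, n)
	}
	// Steps 3 and 4: match unbound pods to free nodes of the same type,
	// and request a new node for every pod left unmatched.
	used := make([]bool, len(free))
	for _, p := range pods {
		if _, bound := bindings[p.Name]; bound {
			continue
		}
		matched := false
		for i, n := range free {
			if !used[i] && n.InstanceType == p.InstanceType {
				bindings[p.Name], used[i], matched = n.Name, true, true
				break
			}
		}
		if !matched {
			start = append(start, p.InstanceType)
		}
	}
	// Step 5: keep free nodes up to the standby pool size, stop the rest,
	// and start nodes for any pool that is still short.
	spare := map[string]int{}
	for i, n := range free {
		if used[i] {
			continue
		}
		if spare[n.InstanceType] < standby[n.InstanceType] {
			spare[n.InstanceType]++
		} else {
			stop = append(stop, n.Name)
		}
	}
	types := make([]string, 0, len(standby))
	for t := range standby {
		types = append(types, t)
	}
	sort.Strings(types) // deterministic order for the start list
	for _, t := range types {
		for i := spare[t]; i < standby[t]; i++ {
			start = append(start, t)
		}
	}
	return start, stop, bindings
}

func main() {
	nodes := []node{{"n1", "t3.nano", "web"}, {"n2", "t3.nano", "gone"}, {"n3", "c5.large", ""}}
	pods := []pod{{"web", "t3.nano"}, {"db", "t3.nano"}, {"batch", "m5.large"}}
	start, stop, bindings := compute(nodes, pods, map[string]int{"t3.nano": 1})
	fmt.Println(start, stop, bindings)
}
```

In this example n1 keeps its binding to web, n2 loses its stale binding and is reused for db, batch triggers a new m5.large node, n3 is stopped, and one extra t3.nano is started to refill the standby pool.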

type NodeController ¶

type NodeController struct {
	Config             NodeControllerConfig
	NodeRegistry       *registry.NodeRegistry
	LogRegistry        *registry.LogRegistry
	PodReader          registry.PodLister
	NodeDispenser      *NodeDispenser
	NodeScaler         ScalingAlgorithm
	CloudClient        cloud.CloudClient
	NodeClientFactory  nodeclient.ItzoClientFactoryer
	Events             *events.EventSystem
	PoolLoopTimer      *stats.LoopTimer
	ImageIdCache       *timeoutmap.TimeoutMap
	CloudInitFile      *cloudinitfile.File
	CertificateFactory *certs.CertificateFactory
	CloudStatus        cloud.StatusKeeper
	BootImageSpec      cloud.BootImageSpec
}

func (*NodeController) Dump ¶

func (c *NodeController) Dump() []byte

func (*NodeController) ResumeWaits ¶

func (c *NodeController) ResumeWaits()

When we restart the server, there may be old nodes that we were starting or restarting and polling in order to know when to change their state to available. We need to restart those polls.

func (*NodeController) Start ¶

func (c *NodeController) Start(quit <-chan struct{}, wg *sync.WaitGroup)

func (*NodeController) StopCreatingNodes ¶

func (c *NodeController) StopCreatingNodes()

If the controller was shut down while creating a node, the node will remain in creating indefinitely since we don't have an instanceID for it. Kill it here.

type NodeControllerConfig ¶

type NodeControllerConfig struct {
	PoolInterval      time.Duration
	HeartbeatInterval time.Duration
	ReaperInterval    time.Duration
	ItzoVersion       string
	ItzoURL           string
	CellConfig        map[string]string
}

When configuring these intervals we want the following constraints to be satisfied:

1. The pool interval should be longer than the heartbeat interval.

2. The heartbeat interval should be longer than the heartbeat client timeout.
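A minimal sketch of checking these two constraints at startup follows. The `intervals` type and its `validate` method are hypothetical, and `HeartbeatClientTimeout` is an assumed name: the heartbeat client timeout is not a field of NodeControllerConfig and its real location is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// intervals is a hypothetical holder for the values the constraints mention.
type intervals struct {
	PoolInterval           time.Duration
	HeartbeatInterval      time.Duration
	HeartbeatClientTimeout time.Duration // assumed name, not a documented field
}

// validate reports the first constraint that is violated, or nil.
func (c intervals) validate() error {
	if c.PoolInterval <= c.HeartbeatInterval {
		return errors.New("pool interval must be longer than heartbeat interval")
	}
	if c.HeartbeatInterval <= c.HeartbeatClientTimeout {
		return errors.New("heartbeat interval must be longer than heartbeat client timeout")
	}
	return nil
}

func main() {
	ok := intervals{30 * time.Second, 10 * time.Second, 5 * time.Second}
	bad := intervals{10 * time.Second, 10 * time.Second, 5 * time.Second}
	fmt.Println(ok.validate())
	fmt.Println(bad.validate())
}
```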

type NodeDispenser ¶

type NodeDispenser struct {
	NodeRequestChan chan NodeRequest
	NodeReturnChan  chan NodeReturn
}

func NewNodeDispenser ¶

func NewNodeDispenser() *NodeDispenser

func (*NodeDispenser) RequestNode ¶

func (e *NodeDispenser) RequestNode(requestingPod api.Pod) NodeReply

The requesting pod is passed by value, so the dispenser works on its own copy, for safety reasons.

func (*NodeDispenser) ReturnNode ¶

func (e *NodeDispenser) ReturnNode(nodeName string, unused bool)
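The exported channels and the reply channel in NodeRequest suggest a request/reply pattern over channels. The sketch below shows that general pattern only. How the real controller consumes NodeRequestChan and NodeReturnChan is an assumption, and every type and function here (`dispenser`, `serve`, and so on) is a hypothetical stand-in.

```go
package main

import "fmt"

// Hypothetical, trimmed-down versions of NodeReply, NodeRequest and NodeReturn.
type nodeReply struct {
	NodeName  string
	NoBinding bool
}

type nodeRequest struct {
	PodName   string
	ReplyChan chan nodeReply
}

type nodeReturn struct {
	NodeName string
	Unused   bool
}

type dispenser struct {
	requests chan nodeRequest
	returns  chan nodeReturn
}

func newDispenser() *dispenser {
	return &dispenser{make(chan nodeRequest), make(chan nodeReturn)}
}

// requestNode sends a request carrying its own reply channel and blocks
// until the serving goroutine answers.
func (d *dispenser) requestNode(podName string) nodeReply {
	reply := make(chan nodeReply)
	d.requests <- nodeRequest{podName, reply}
	return <-reply
}

// returnNode hands a node back to the serving goroutine.
func (d *dispenser) returnNode(name string, unused bool) {
	d.returns <- nodeReturn{name, unused}
}

// serve is the single owner of bindings (pod name -> node name), so no
// locking is needed. It runs until quit is closed.
func (d *dispenser) serve(bindings map[string]string, quit <-chan struct{}) {
	for {
		select {
		case req := <-d.requests:
			name, ok := bindings[req.PodName]
			if ok {
				delete(bindings, req.PodName) // each node is handed out once
			}
			req.ReplyChan <- nodeReply{NodeName: name, NoBinding: !ok}
		case <-d.returns:
			// A real controller would update or terminate the node here.
		case <-quit:
			return
		}
	}
}

func main() {
	d := newDispenser()
	quit := make(chan struct{})
	go d.serve(map[string]string{"web": "node-1"}, quit)
	fmt.Println(d.requestNode("web")) // bound node is dispensed
	fmt.Println(d.requestNode("web")) // no binding left for this pod
	d.returnNode("node-1", false)
	close(quit)
}
```

Keeping all state inside one goroutine is the usual reason for this design: callers such as a pod controller only ever touch channels, which is one way to get the decoupling the overview mentions.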

type NodeReply ¶

type NodeReply struct {
	Node *api.Node
	// When there's no binding for a pod, that either means the
	// pod is new or something might have gone wrong with the pod
	// spec, possibly it was created by a replicaSet and we can't
	// satisfy the placement spec of the pod.  We use NoBinding
	// to signal that we can't currently create a node for the pod.
	// If a pod remains unbound for too long, we can act accordingly
	// (e.g. for a replicaSet pod, we kill the pod).
	NoBinding bool
}

type NodeRequest ¶

type NodeRequest struct {
	ReplyChan chan NodeReply
	// contains filtered or unexported fields
}

type NodeReturn ¶

type NodeReturn struct {
	NodeName string
	Unused   bool
}

type ScalingAlgorithm ¶

type ScalingAlgorithm interface {
	// todo, figure out what we really need to pass in
	// and return value will likely get much more complex
	//Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, int)
	Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)
}

type StandbyNodeSpec ¶

type StandbyNodeSpec struct {
	InstanceType string `json:"instanceType"`
	Count        int    `json:"count"`
	Spot         bool   `json:"spot"`
}

Used externally in provider.yaml to specify a buffered (standby) node.
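Since the struct carries JSON tags, a list of standby specs can be decoded with the standard library. The sketch below redeclares the struct locally so it runs on its own. `parseStandby` is a hypothetical helper, the instance types are example values, and the exact layout of the surrounding provider.yaml is not shown here, so the input is plain JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StandbyNodeSpec is copied from the package so the example is self-contained.
type StandbyNodeSpec struct {
	InstanceType string `json:"instanceType"`
	Count        int    `json:"count"`
	Spot         bool   `json:"spot"`
}

// parseStandby is a hypothetical helper that decodes a JSON array of specs.
func parseStandby(data []byte) ([]StandbyNodeSpec, error) {
	var specs []StandbyNodeSpec
	if err := json.Unmarshal(data, &specs); err != nil {
		return nil, err
	}
	return specs, nil
}

func main() {
	// Example values only: two on-demand nodes of one type, one spot node of another.
	input := []byte(`[
		{"instanceType": "t3.nano", "count": 2, "spot": false},
		{"instanceType": "c5.large", "count": 1, "spot": true}
	]`)
	specs, err := parseStandby(input)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, s := range specs {
		fmt.Printf("%s x%d spot=%v\n", s.InstanceType, s.Count, s.Spot)
	}
}
```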

type StatusUpdater ¶

type StatusUpdater interface {
	UpdateStatus(*api.Node) (*api.Node, error)
}
