Documentation ¶
Overview ¶
Not sure if this is a good pattern for decoupling the pod_controller from the node controller... Going to give it a try.
Index ¶
Constants ¶
const (
	ParameterCACertificate     = "ca.crt"
	ParameterServerCertificate = "server.crt"
	ParameterServerKey         = "server.key"
	ParameterItzoVersion       = "itzo_version"
	ParameterItzoURL           = "itzo_url"
	ParameterCellConfig        = "cell_config.yaml"
)
Variables ¶
var (
	// TODO: this was changed to handle mac1.metal boot, ideally we should have different
	// bootTimeouts depending on instance family
	BootTimeout         time.Duration = 20 * time.Minute
	HealthyTimeout      time.Duration = 90 * time.Second
	HealthcheckPause    time.Duration = 5 * time.Second
	SpotRequestPause    time.Duration = 60 * time.Second
	BootImage           cloud.Image   = cloud.Image{}
	MaxBootPerIteration int           = 10
)
Making these variables (rather than constants) makes testing easier; non-const timeouts were endorsed by Mitchell Hashimoto.
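For example, a test inside this package could shrink the timeouts and restore the defaults afterwards. A minimal sketch; the test name and scenario are hypothetical:

import (
	"testing"
	"time"
)

// Hypothetical test illustrating why the timeouts are vars: a test can
// override them and restore the package defaults when it is done.
func TestBootTimesOutQuickly(t *testing.T) {
	oldBoot, oldHealthy := BootTimeout, HealthyTimeout
	BootTimeout = 50 * time.Millisecond
	HealthyTimeout = 10 * time.Millisecond
	defer func() { BootTimeout, HealthyTimeout = oldBoot, oldHealthy }()

	// ... exercise code paths that wait on BootTimeout / HealthyTimeout ...
}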
Functions ¶
This section is empty.
Types ¶
type BindingNodeScaler ¶
type BindingNodeScaler struct {
// contains filtered or unexported fields
}
func NewBindingNodeScaler ¶
func NewBindingNodeScaler(nodeReg StatusUpdater, standbyNodes []StandbyNodeSpec, bootLimiter *InstanceBootLimiter, defaultVolumeSize string, fixedSizeVolume bool) *BindingNodeScaler
func (*BindingNodeScaler) Compute ¶
func (s *BindingNodeScaler) Compute(nodes []*api.Node, pods []*api.Pod) ([]*api.Node, []*api.Node, map[string]string)
A brief summary of how we figure out which nodes need to be started and which need to be shut down:
1. We only care about waiting pods and nodes that are available, creating, or created.
2. Regenerate the podNodeBinding map from the existing node-to-pod bindings recorded in node.Status.BoundPodName. Also make sure that any pods listed there are actually still waiting (the user might have killed a pod). Along the way, we keep track of unbound pods and nodes.
3. Match any unbound pods to unbound nodes. Before doing that, we order our pods and nodes so that we choose the most specific matches.
4. Any remaining unbound pods get a node booted for them, with the exception of node requests that we know cannot be fulfilled due to unavailability in the cloud.
5. Finally, make sure that we have enough nodes to satisfy our standby pools of nodes.
At the end of this process, return the nodes that we should start, the nodes that need to be shut down and the current bindings map (so that the dispatcher can be fast).
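To make the bookkeeping concrete, here is a self-contained miniature of steps 2-4. It is only a sketch: the types, the instance-type matching rule, and the example data are simplified stand-ins for the real api.Pod/api.Node logic, and it ignores specificity ordering, the InstanceBootLimiter, and the standby pools (step 5).

// A runnable miniature of steps 2-4 with simplified local types.
package main

import "fmt"

type pod struct{ name, instanceType string }
type node struct{ name, instanceType, boundPodName string }

func compute(nodes []*node, pods []*pod) (start []string, bindings map[string]string) {
	bindings = map[string]string{}
	waiting := map[string]*pod{}
	for _, p := range pods {
		waiting[p.name] = p
	}

	// Step 2: rebuild bindings from each node's bound pod name, dropping
	// bindings to pods that are no longer waiting.
	var unboundNodes []*node
	for _, n := range nodes {
		if p, ok := waiting[n.boundPodName]; ok {
			bindings[p.name] = n.name
			delete(waiting, p.name)
		} else {
			unboundNodes = append(unboundNodes, n)
		}
	}

	// Steps 3-4: match remaining pods to unbound nodes; anything left
	// over gets a new node booted for it.
	for _, p := range waiting {
		matched := false
		for i, n := range unboundNodes {
			if n.instanceType == p.instanceType {
				bindings[p.name] = n.name
				unboundNodes = append(unboundNodes[:i], unboundNodes[i+1:]...)
				matched = true
				break
			}
		}
		if !matched {
			start = append(start, p.instanceType)
		}
	}
	return start, bindings
}

func main() {
	nodes := []*node{{name: "n1", instanceType: "t3.small", boundPodName: "web"}}
	pods := []*pod{
		{name: "web", instanceType: "t3.small"},
		{name: "db", instanceType: "m5.large"},
	}
	start, bindings := compute(nodes, pods)
	fmt.Println(start, bindings) // [m5.large] map[web:n1]
}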
type InstanceBootLimiter ¶ added in v1.0.0
type InstanceBootLimiter struct {
// contains filtered or unexported fields
}
func NewInstanceBootLimiter ¶ added in v1.0.0
func NewInstanceBootLimiter() *InstanceBootLimiter
func (*InstanceBootLimiter) AddUnavailableInstance ¶ added in v1.0.0
func (s *InstanceBootLimiter) AddUnavailableInstance(instanceType string, spot bool)
func (*InstanceBootLimiter) IsUnavailableInstance ¶ added in v1.0.0
func (s *InstanceBootLimiter) IsUnavailableInstance(instanceType string, spot bool) bool
func (*InstanceBootLimiter) Start ¶ added in v1.0.0
func (s *InstanceBootLimiter) Start()
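The intended usage appears to be: record instance types the cloud could not supply, and check the limiter before requesting more of them. A hedged sketch; the helper function, the cloudLaunch callback, and the assumption that Start ages out stale entries are all illustrative, not the package's documented behavior:

// bootOrSkip is a hypothetical helper showing how a caller might use
// the limiter; the real wiring lives in the scaler and node controller.
func bootOrSkip(limiter *InstanceBootLimiter, cloudLaunch func() error, instanceType string, spot bool) bool {
	if limiter.IsUnavailableInstance(instanceType, spot) {
		// Recently marked unavailable; skip booting this type for now.
		return false
	}
	if err := cloudLaunch(); err != nil {
		// Assumed usage: on a capacity/unavailability error, remember the
		// shortage so later iterations stop requesting this instance type.
		limiter.AddUnavailableInstance(instanceType, spot)
		return false
	}
	return true
}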
type InstanceConfig ¶ added in v1.0.5
type NodeController ¶
type NodeController struct {
	Config             NodeControllerConfig
	NodeRegistry       *registry.NodeRegistry
	LogRegistry        *registry.LogRegistry
	PodReader          registry.PodLister
	NodeDispenser      *NodeDispenser
	NodeScaler         ScalingAlgorithm
	CloudClient        cloud.CloudClient
	NodeClientFactory  nodeclient.ItzoClientFactoryer
	Events             *events.EventSystem
	PoolLoopTimer      *stats.LoopTimer
	ImageIdCache       *timeoutmap.TimeoutMap
	CloudInitFile      *cloudinitfile.File
	CertificateFactory *certs.CertificateFactory
	BootLimiter        *InstanceBootLimiter
	BootImageSpec      cloud.BootImageSpec
}
func (*NodeController) Dump ¶
func (c *NodeController) Dump() []byte
func (*NodeController) ResumeWaits ¶
func (c *NodeController) ResumeWaits()
When the server restarts, there may be old nodes that were starting or restarting and that we were polling in order to know when to change their state to available. We need to restart those polls.
func (*NodeController) Start ¶
func (c *NodeController) Start(quit <-chan struct{}, wg *sync.WaitGroup)
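Based on the signature, the usual quit-channel/WaitGroup pattern applies. A sketch only: controller construction is omitted, and it is assumed that Start runs its loops in the background and registers them on wg:

quit := make(chan struct{})
wg := &sync.WaitGroup{}
controller.Start(quit, wg)

// ... on shutdown, signal the loops and wait for them to exit:
close(quit)
wg.Wait()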
func (*NodeController) StopCreatingNodes ¶
func (c *NodeController) StopCreatingNodes()
If the controller was shut down while creating a node, that node will remain in the creating state indefinitely, since we don't have an instanceID for it. Kill it here.
type NodeControllerConfig ¶
type NodeControllerConfig struct {
	PoolInterval           time.Duration
	HeartbeatInterval      time.Duration
	ReaperInterval         time.Duration
	ItzoVersion            string
	ItzoURL                string
	CellConfig             map[string]string
	UseCloudParameterStore bool
	DefaultIAMPermissions  string
}
When configuring these intervals, we want the following constraints to be satisfied:
1. The pool interval should be longer than the heartbeat interval.
2. The heartbeat interval should be longer than the heartbeat client timeout.
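For example, a configuration satisfying both constraints might look like this (the values are illustrative only, not recommended defaults):

cfg := NodeControllerConfig{
	PoolInterval:      10 * time.Second, // constraint 1: longer than HeartbeatInterval
	HeartbeatInterval: 5 * time.Second,  // constraint 2: longer than the heartbeat client timeout (e.g. 2s)
	ReaperInterval:    30 * time.Second,
}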
type NodeDispenser ¶
type NodeDispenser struct {
	NodeRequestChan chan NodeRequest
	NodeReturnChan  chan NodeReturn
}
func NewNodeDispenser ¶
func NewNodeDispenser() *NodeDispenser
func (*NodeDispenser) RequestNode ¶
func (e *NodeDispenser) RequestNode(requestingPod api.Pod) NodeReply
We pass in a copy of the requesting pod for safety reasons (so the dispenser does not share mutable pod state with the caller).
func (*NodeDispenser) ReturnNode ¶
func (e *NodeDispenser) ReturnNode(nodeName string, unused bool)
type NodeReply ¶
type NodeReply struct {
	Node *api.Node
	// When there's no binding for a pod, that either means the pod is
	// new or something might have gone wrong with the pod spec;
	// possibly it was created by a replicaSet and we can't satisfy the
	// placement spec of the pod. We use NoBinding to signal that we
	// can't currently create a node for the pod. If a pod remains
	// unbound for too long, we can act accordingly (e.g. for a
	// replicaSet pod, we kill the pod).
	NoBinding bool
}
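A rough sketch of how a caller (e.g. the dispatcher) might use the dispenser. The function, the retry/kill policy, and the pod.Name field access are assumptions for illustration, not the package's actual behavior:

// placePod is a hypothetical caller-side helper.
func placePod(d *NodeDispenser, pod api.Pod) (*api.Node, bool) {
	reply := d.RequestNode(pod) // the pod is passed by value, i.e. a copy
	if reply.NoBinding {
		// No node can currently be created for this pod; the caller can
		// retry later or, for a replicaSet pod stuck too long, kill it.
		return nil, false
	}
	return reply.Node, true
}

When the pod is finished with the node (or never used it), it is presumably handed back with something like d.ReturnNode(reply.Node.Name, false).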
type NodeRequest ¶
type NodeRequest struct {
	ReplyChan chan NodeReply
	// contains filtered or unexported fields
}