debuggingsnapshot

package
v0.0.0-...-83b693c
Published: Jan 8, 2025 License: Apache-2.0 Imports: 8 Imported by: 0

README

Debugging Snapshotter

The debugging snapshotter is a tool that captures the internal state of cluster-autoscaler at a point in time, to help debug autoscaling issues.

Requirements

Requires cluster-autoscaler version 1.24 or later.

What data can the snapshotter capture?

https://github.com/kubernetes/autoscaler/blob/8cf630a3e33ed3656cb4e669461bec197b77f2bb/cluster-autoscaler/debuggingsnapshot/debugging_snapshot.go#L60C1-L71C1

type DebuggingSnapshotImpl struct {
	NodeList                      []*ClusterNode          `json:"NodeList"`
	UnscheduledPodsCanBeScheduled []*v1.Pod               `json:"UnscheduledPodsCanBeScheduled"`
	Error                         string                  `json:"Error,omitempty"`
	StartTimestamp                time.Time               `json:"StartTimestamp"`
	EndTimestamp                  time.Time               `json:"EndTimestamp"`
	TemplateNodes                 map[string]*ClusterNode `json:"TemplateNodes"`
}

Development

Add the following flag to your cluster-autoscaler configuration to enable the snapshotter feature.

--debugging-snapshot-enabled=true

To access the snapshot from the command line, use the following command:

curl http://127.0.0.1:8085/snapshotz > FILE_NAME.json

How to navigate the JSON file?

cat FILE_NAME.json | jq 'keys'
cat FILE_NAME.json | jq '.NodeList | keys' # to see how many nodes are running
cat FILE_NAME.json | jq '.TemplateNodes | keys' # to see template nodes
cat FILE_NAME.json | jq '.UnscheduledPodsCanBeScheduled | keys' # to see unscheduled pods that can be scheduled
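The same top-level inspection can be sketched in Go. The example below is illustrative only: `snapshotFile`, `summary`, and `summarize` are hypothetical names, not part of this package. Node and pod payloads are kept as raw JSON so that no Kubernetes imports are needed, and the inline sample stands in for a saved snapshot file.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// snapshotFile mirrors the top-level JSON keys of DebuggingSnapshotImpl.
// Hypothetical helper type: payloads stay as raw JSON on purpose.
type snapshotFile struct {
	NodeList                      []json.RawMessage          `json:"NodeList"`
	UnscheduledPodsCanBeScheduled []json.RawMessage          `json:"UnscheduledPodsCanBeScheduled"`
	Error                         string                     `json:"Error,omitempty"`
	TemplateNodes                 map[string]json.RawMessage `json:"TemplateNodes"`
}

// summary holds the counts one would otherwise read off the jq output.
type summary struct {
	Nodes         int
	PendingPods   int
	TemplateNames []string
	Error         string
}

// summarize decodes a snapshot and counts its top-level entries.
func summarize(data []byte) (summary, error) {
	var s snapshotFile
	if err := json.Unmarshal(data, &s); err != nil {
		return summary{}, fmt.Errorf("decode snapshot: %w", err)
	}
	names := make([]string, 0, len(s.TemplateNodes))
	for name := range s.TemplateNodes {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	return summary{
		Nodes:         len(s.NodeList),
		PendingPods:   len(s.UnscheduledPodsCanBeScheduled),
		TemplateNames: names,
		Error:         s.Error,
	}, nil
}

func main() {
	// Made-up sample; in practice read the saved file with os.ReadFile.
	sample := []byte(`{"NodeList":[{"Node":{},"Pods":[]}],"UnscheduledPodsCanBeScheduled":[],"TemplateNodes":{"group-a":{}}}`)
	sum, err := summarize(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("nodes=%d pending=%d templates=%v error=%q\n",
		sum.Nodes, sum.PendingPods, sum.TemplateNames, sum.Error)
}
```

To decode full node and pod objects instead of raw JSON, the fields could be typed with the package's own `ClusterNode` and `v1.Pod`, at the cost of importing the Kubernetes modules.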

Documentation

Types

type ClusterNode

type ClusterNode struct {
	Node *v1.Node  `json:"Node"`
	Pods []*v1.Pod `json:"Pods"`
}

ClusterNode captures a single nodeInfo entity, i.e. the Node spec and all the pods on that node.

func GetClusterNodeCopy

func GetClusterNodeCopy(template *framework.NodeInfo) *ClusterNode

GetClusterNodeCopy is a utility function that copies a template node and filters its values.

type DebuggingSnapshot

type DebuggingSnapshot interface {
	// SetClusterNodes is a setter to capture all the ClusterNode
	SetClusterNodes([]*framework.NodeInfo)
	// SetUnscheduledPodsCanBeScheduled is a setter for all pods which are unscheduled,
	// but they can be scheduled. i.e. pods which aren't triggering scale-up
	SetUnscheduledPodsCanBeScheduled([]*v1.Pod)
	// SetTemplateNodes is a setter for all the TemplateNodes present in the cluster
	// incl. templates for which there are no nodes
	SetTemplateNodes(map[string]*framework.NodeInfo)
	// SetErrorMessage sets the error message in the snapshot
	SetErrorMessage(string)
	// SetEndTimestamp sets the timestamp in the snapshot,
	// when all the data collection is finished
	SetEndTimestamp(time.Time)
	// SetStartTimestamp sets the timestamp in the snapshot,
	// when all the data collection is started
	SetStartTimestamp(time.Time)
	// GetOutputBytes return the output state of the Snapshot with bool to specify if
	// the snapshot has the error message set
	GetOutputBytes() ([]byte, bool)
	// Cleanup clears the internal data obj of the snapshot, readying for next request
	Cleanup()
}

DebuggingSnapshot is the interface that any debugging snapshot implementation must satisfy, including any custom implementation to be used by DebuggingSnapshotter.

type DebuggingSnapshotImpl

type DebuggingSnapshotImpl struct {
	NodeList                      []*ClusterNode          `json:"NodeList"`
	UnscheduledPodsCanBeScheduled []*v1.Pod               `json:"UnscheduledPodsCanBeScheduled"`
	Error                         string                  `json:"Error,omitempty"`
	StartTimestamp                time.Time               `json:"StartTimestamp"`
	EndTimestamp                  time.Time               `json:"EndTimestamp"`
	TemplateNodes                 map[string]*ClusterNode `json:"TemplateNodes"`
}

DebuggingSnapshotImpl is the struct used to collect all the data to be output. Add all new output fields to this struct; keeping everything in a single object makes encoding and decoding the data easier.

func (*DebuggingSnapshotImpl) Cleanup

func (s *DebuggingSnapshotImpl) Cleanup()

Cleanup cleans up all the data in the snapshot without changing the pointer reference

func (*DebuggingSnapshotImpl) GetOutputBytes

func (s *DebuggingSnapshotImpl) GetOutputBytes() ([]byte, bool)

GetOutputBytes returns the output state of the Snapshot, with a bool that specifies whether the snapshot has the error message set.
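The `([]byte, bool)` contract can be illustrated with a trimmed-down stand-in. This is a sketch under assumptions, not the package's implementation: `miniSnapshot` and `outputBytes` are hypothetical names, and returning the marshalling error text as the payload is an assumption made only to keep the example total.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// miniSnapshot is a hypothetical, reduced stand-in for DebuggingSnapshotImpl.
type miniSnapshot struct {
	Error          string    `json:"Error,omitempty"`
	StartTimestamp time.Time `json:"StartTimestamp"`
	EndTimestamp   time.Time `json:"EndTimestamp"`
}

// outputBytes returns the JSON encoding of the snapshot and reports whether
// the error message is set.
func (s *miniSnapshot) outputBytes() ([]byte, bool) {
	out, err := json.Marshal(s)
	if err != nil {
		// Assumption: surface the marshalling failure as the payload.
		return []byte(err.Error()), true
	}
	return out, s.Error != ""
}

func main() {
	ok := &miniSnapshot{StartTimestamp: time.Unix(0, 0).UTC(), EndTimestamp: time.Unix(1, 0).UTC()}
	out, hasErr := ok.outputBytes()
	fmt.Println(string(out), hasErr)

	failed := &miniSnapshot{Error: "data collection failed"}
	out, hasErr = failed.outputBytes()
	fmt.Println(string(out), hasErr)
}
```

A caller such as an HTTP handler could use the bool to choose a status code before writing the bytes; whether the real handler does so is not stated here.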

func (*DebuggingSnapshotImpl) SetClusterNodes

func (s *DebuggingSnapshotImpl) SetClusterNodes(nodeInfos []*framework.NodeInfo)

SetClusterNodes is the setter for Node Group Info. All filtering/prettifying of data should be done here.

func (*DebuggingSnapshotImpl) SetEndTimestamp

func (s *DebuggingSnapshotImpl) SetEndTimestamp(t time.Time)

SetEndTimestamp is the setter for end timestamp

func (*DebuggingSnapshotImpl) SetErrorMessage

func (s *DebuggingSnapshotImpl) SetErrorMessage(error string)

SetErrorMessage sets the error message in the snapshot

func (*DebuggingSnapshotImpl) SetStartTimestamp

func (s *DebuggingSnapshotImpl) SetStartTimestamp(t time.Time)

SetStartTimestamp is the setter for start timestamp

func (*DebuggingSnapshotImpl) SetTemplateNodes

func (s *DebuggingSnapshotImpl) SetTemplateNodes(templates map[string]*framework.NodeInfo)

SetTemplateNodes is the setter for TemplateNodes

func (*DebuggingSnapshotImpl) SetUnscheduledPodsCanBeScheduled

func (s *DebuggingSnapshotImpl) SetUnscheduledPodsCanBeScheduled(podList []*v1.Pod)

SetUnscheduledPodsCanBeScheduled is the setter for UnscheduledPodsCanBeScheduled

type DebuggingSnapshotter

type DebuggingSnapshotter interface {

	// StartDataCollection will check the State(s) and enable data
	// collection for the loop if applicable
	StartDataCollection()
	// SetClusterNodes is a setter to capture all the ClusterNode
	SetClusterNodes([]*framework.NodeInfo)
	// SetUnscheduledPodsCanBeScheduled is a setter for all pods which are unscheduled
	// but they can be scheduled. i.e. pods which aren't triggering scale-up
	SetUnscheduledPodsCanBeScheduled([]*v1.Pod)
	// SetTemplateNodes is a setter for all the TemplateNodes present in the cluster
	// incl. templates for which there are no nodes
	SetTemplateNodes(map[string]*framework.NodeInfo)
	// ResponseHandler is the http response handler to manage incoming requests
	ResponseHandler(http.ResponseWriter, *http.Request)
	// IsDataCollectionAllowed checks the internal State of the snapshotter
	// to find if data can be collected. This can be used before preprocessing
	// for the snapshot
	IsDataCollectionAllowed() bool
	// Flush triggers the flushing of the snapshot
	Flush()
	// Cleanup clears the internal data beans of the snapshot, readying for next request
	Cleanup()
}

DebuggingSnapshotter is the interface for debugging snapshot

func NewDebuggingSnapshotter

func NewDebuggingSnapshotter(isDebuggerEnabled bool) DebuggingSnapshotter

NewDebuggingSnapshotter returns a new instance of DebuggingSnapshotter

type DebuggingSnapshotterImpl

type DebuggingSnapshotterImpl struct {
	// State captures the internal state of the snapshotter
	State *DebuggingSnapshotterState
	// DebuggingSnapshot is the data bean for the snapshot
	DebuggingSnapshot DebuggingSnapshot
	// Mutex synchronises access to the methods/states in the critical section
	Mutex *sync.Mutex
	// Trigger is the channel on which the Response Handler waits on to know
	// when there is data to be flushed back to the channel from the Snapshot
	Trigger chan struct{}
	// CancelRequest is the cancel function for the snapshot request. It is used to
	// terminate any ongoing request when CA is shutting down
	CancelRequest context.CancelFunc
}

DebuggingSnapshotterImpl is the impl for DebuggingSnapshotter

func (*DebuggingSnapshotterImpl) Cleanup

func (d *DebuggingSnapshotterImpl) Cleanup()

Cleanup clears the internal data sets of the snapshot

func (*DebuggingSnapshotterImpl) Flush

func (d *DebuggingSnapshotterImpl) Flush()

Flush is the impl for DebuggingSnapshotter.Flush. It checks whether any data has been collected or data collection failed.

func (*DebuggingSnapshotterImpl) IsDataCollectionAllowed

func (d *DebuggingSnapshotterImpl) IsDataCollectionAllowed() bool

IsDataCollectionAllowed encapsulates the check for whether data collection is currently active. It should be used by setters and by any function that depends on the data collection State before doing extra processing. For example, if you want to pre-process a particular State in cloud-provider for the snapshot, check this func in the loop before doing that extra processing.
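A minimal sketch of that guard, assuming only the `IsDataCollectionAllowed() bool` method from the interface above. `collectionGate`, `fixedGate`, and `collectExtraState` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// collectionGate is the single DebuggingSnapshotter method this sketch needs.
type collectionGate interface {
	IsDataCollectionAllowed() bool
}

// fixedGate is a hypothetical stand-in that always gives the same answer.
type fixedGate bool

func (g fixedGate) IsDataCollectionAllowed() bool { return bool(g) }

// collectExtraState runs the expensive pre-processing only while a snapshot
// is actually being collected, and skips it otherwise.
func collectExtraState(gate collectionGate, expensive func() []string) []string {
	if !gate.IsDataCollectionAllowed() {
		return nil
	}
	return expensive()
}

func main() {
	calls := 0
	expensive := func() []string {
		calls++
		return []string{"node-group-a"}
	}
	fmt.Println(collectExtraState(fixedGate(false), expensive), calls) // prints "[] 0"
	fmt.Println(collectExtraState(fixedGate(true), expensive), calls)  // prints "[node-group-a] 1"
}
```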

func (*DebuggingSnapshotterImpl) IsDataCollectionAllowedNoLock

func (d *DebuggingSnapshotterImpl) IsDataCollectionAllowedNoLock() bool

IsDataCollectionAllowedNoLock encapsulates the check for whether data collection is currently active. The NoLock variant is for cases where the caller has already acquired the lock for a single transactional execution.
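The reason for a NoLock variant is that Go's `sync.Mutex` is not reentrant: a method that already holds the lock would deadlock if it called the locking check. The sketch below shows the general pattern with a hypothetical `gate` type; it does not reproduce the package's actual fields or logic.

```go
package main

import (
	"fmt"
	"sync"
)

// gate is a hypothetical type guarding a flag and a slice with one mutex.
type gate struct {
	mu      sync.Mutex
	allowed bool
	items   []string
}

// isAllowed acquires the lock itself; intended for external callers.
func (g *gate) isAllowed() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.isAllowedNoLock()
}

// isAllowedNoLock assumes the caller already holds g.mu.
func (g *gate) isAllowedNoLock() bool { return g.allowed }

// add performs the check and the append in one critical section. Calling
// isAllowed here instead would block forever on the lock add already holds.
func (g *gate) add(item string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.isAllowedNoLock() {
		return false
	}
	g.items = append(g.items, item)
	return true
}

func main() {
	g := &gate{}
	fmt.Println(g.add("node-1"), len(g.items))
	g.allowed = true // set directly for brevity; single goroutine only
	fmt.Println(g.add("node-1"), len(g.items), g.isAllowed())
}
```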

func (*DebuggingSnapshotterImpl) ResponseHandler

func (d *DebuggingSnapshotterImpl) ResponseHandler(w http.ResponseWriter, r *http.Request)

ResponseHandler is the impl for request handler

func (*DebuggingSnapshotterImpl) SetClusterNodes

func (d *DebuggingSnapshotterImpl) SetClusterNodes(nodeInfos []*framework.NodeInfo)

SetClusterNodes is the setter for Node Group Info. All filtering/prettifying of data should be done here.

func (*DebuggingSnapshotterImpl) SetTemplateNodes

func (d *DebuggingSnapshotterImpl) SetTemplateNodes(templates map[string]*framework.NodeInfo)

SetTemplateNodes is the setter for TemplateNodes

func (*DebuggingSnapshotterImpl) SetUnscheduledPodsCanBeScheduled

func (d *DebuggingSnapshotterImpl) SetUnscheduledPodsCanBeScheduled(podList []*v1.Pod)

SetUnscheduledPodsCanBeScheduled is the setter for UnscheduledPodsCanBeScheduled

func (*DebuggingSnapshotterImpl) StartDataCollection

func (d *DebuggingSnapshotterImpl) StartDataCollection()

StartDataCollection changes the State to start data collection once the trigger has been enabled. It is called at the start of the runLoop for consistency, because the trigger can arrive mid-loop, which would otherwise lead to partial data collection.

type DebuggingSnapshotterState

type DebuggingSnapshotterState int

DebuggingSnapshotterState is the type for the debugging snapshot State machine. The states guide the workflow of the snapshot.

const (
	// SNAPSHOTTER_DISABLED is when debuggingSnapshot is disabled on the cluster and no action can be taken
	SNAPSHOTTER_DISABLED DebuggingSnapshotterState = iota + 1
	// LISTENING is set when snapshotter is enabled on the cluster and is ready to listen to a
	// snapshot request. Used by ResponseHandler to wait on to listen to request
	LISTENING
	// TRIGGER_ENABLED is set by ResponseHandler if a valid snapshot request is received
	// it states that a snapshot request needs to be processed
	TRIGGER_ENABLED
	// START_DATA_COLLECTION is used to synchronise the collection of data.
	// Since the trigger is an asynchronous process, data collection could be started mid-loop
	// leading to incomplete data. So setter methods wait for START_DATA_COLLECTION before collecting data
	// which is set at the start of the next loop after receiving the trigger
	START_DATA_COLLECTION
	// DATA_COLLECTED is set by setter func (also used by setter func for data collection)
	// This is set to let Flush know that at least some data collected and there isn't
	// an error State leading to no data collection
	DATA_COLLECTED
)

The DebuggingSnapshotterState values help navigate the different workflows of the snapshot capture.
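The workflow described in the constant comments can be sketched as a small transition function. This is an illustration, not the package's code: the `state` type mirrors the constants locally, the `event` names are hypothetical, and the transitions are inferred from the comments. In particular, returning to LISTENING after a flush is an assumption.

```go
package main

import "fmt"

// state is a local mirror of DebuggingSnapshotterState, starting at iota + 1.
type state int

const (
	snapshotterDisabled state = iota + 1
	listening
	triggerEnabled
	startDataCollection
	dataCollected
)

func (s state) String() string {
	switch s {
	case snapshotterDisabled:
		return "SNAPSHOTTER_DISABLED"
	case listening:
		return "LISTENING"
	case triggerEnabled:
		return "TRIGGER_ENABLED"
	case startDataCollection:
		return "START_DATA_COLLECTION"
	case dataCollected:
		return "DATA_COLLECTED"
	}
	return "UNKNOWN"
}

// event is a hypothetical label for what drives each transition.
type event int

const (
	requestReceived event = iota // handler accepted a snapshot request
	loopStarted                  // start of the next autoscaler loop
	setterCalled                 // a setter stored some data
	flushed                      // Flush ran (assumed to reset to LISTENING)
)

// next returns the state after e; unmatched pairs leave the state unchanged,
// so a disabled snapshotter never leaves SNAPSHOTTER_DISABLED.
func next(s state, e event) state {
	switch {
	case s == listening && e == requestReceived:
		return triggerEnabled
	case s == triggerEnabled && e == loopStarted:
		return startDataCollection
	case s == startDataCollection && e == setterCalled:
		return dataCollected
	case (s == startDataCollection || s == dataCollected) && e == flushed:
		return listening
	}
	return s
}

func main() {
	s := listening
	for _, e := range []event{requestReceived, loopStarted, setterCalled, flushed} {
		n := next(s, e)
		fmt.Printf("%v -> %v\n", s, n)
		s = n
	}
}
```

Note that a request arriving mid-loop only moves the state to TRIGGER_ENABLED; setters have no effect until the next loop start, which matches the reasoning given for START_DATA_COLLECTION.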
