Disclaimer
The Sig-Node community has reached a general consensus, as a best practice, to
avoid introducing any new checkpointing support. We reached this understanding
after struggling with some hard-to-debug issues in production environments
caused by checkpointing.
Any change to the checkpointed data structure is considered incompatible, and a
component should add its own handling if it needs to stay backward compatible
when reading old-format checkpoint files.
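Because the framework treats any change to the checkpointed data structure as incompatible, a component that changes its format has to recognize the old format itself. The Go sketch below shows one way to do that: try the current format first, and upgrade a legacy blob on read. Everything in it is hypothetical and assumed for illustration (JSON encoding, the `checkpointV1`/`checkpointV2` types and field names, the `legacy` resource name, and `decodeCheckpoint`). It is not how any particular Kubelet component versions its checkpoints.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// checkpointV1 is a hypothetical old format: a flat list of device IDs.
type checkpointV1 struct {
	Devices []string `json:"devices"`
}

// checkpointV2 is a hypothetical new format with an explicit version
// and device IDs grouped by resource name.
type checkpointV2 struct {
	Version int                 `json:"version"`
	Entries map[string][]string `json:"entries"`
}

// legacyResource is the hypothetical resource name given to devices read
// from a v1 checkpoint, which did not record one.
const legacyResource = "legacy"

// decodeCheckpoint accepts either format and always returns the v2 shape.
// The component, not the checkpointing framework, owns this upgrade path.
func decodeCheckpoint(blob []byte) (checkpointV2, error) {
	var v2 checkpointV2
	if err := json.Unmarshal(blob, &v2); err == nil && v2.Version == 2 {
		return v2, nil
	}
	var v1 checkpointV1
	if err := json.Unmarshal(blob, &v1); err != nil {
		return checkpointV2{}, fmt.Errorf("unrecognized checkpoint: %w", err)
	}
	if v1.Devices == nil {
		return checkpointV2{}, errors.New("checkpoint matches neither v1 nor v2")
	}
	return checkpointV2{Version: 2, Entries: map[string][]string{legacyResource: v1.Devices}}, nil
}

func main() {
	cp, err := decodeCheckpoint([]byte(`{"devices":["dev-0","dev-1"]}`))
	fmt.Println(cp.Version, cp.Entries[legacyResource], err) // 2 [dev-0 dev-1] <nil>
}
```

The explicit `version` field is what lets the reader tell the two shapes apart. Without it, an old blob would unmarshal into the new struct without error and silently lose its data.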
Introduction
This folder contains a framework and primitives, the Checkpointing Manager,
which several other Kubelet submodules (dockershim, devicemanager, pods, and
cpumanager) use to implement checkpointing at the submodule level. As explained
in the Disclaimer section above, think twice before introducing any further
checkpointing in Kubelet. If checkpointing is still required, this folder
provides the common APIs and the framework for implementing it. Using the same
APIs across all submodules helps maintain consistency at the Kubelet level.
Below is the history of checkpointing support in Kubelet.
// CheckpointManager provides the interface to manage checkpoints.
type CheckpointManager interface {
	// CreateCheckpoint persists checkpoint in CheckpointStore. checkpointKey is the key for utilstore to locate checkpoint.
	// For file backed utilstore, checkpointKey is the file name to write the checkpoint data.
	CreateCheckpoint(checkpointKey string, checkpoint Checkpoint) error
	// GetCheckpoint retrieves checkpoint from CheckpointStore.
	GetCheckpoint(checkpointKey string, checkpoint Checkpoint) error
	// WARNING: RemoveCheckpoint will not return error if checkpoint does not exist.
	RemoveCheckpoint(checkpointKey string) error
	// ListCheckpoints returns the list of existing checkpoints.
	ListCheckpoints() ([]string, error)
}