bootstrapmonitor

package

v1.12.0-initial-poc.3 Published: Oct 4, 2024 License: BSD-3-Clause Imports: 27 Imported by: 0

Repository

github.com/ava-labs/avalanchego

README ¶

bootstrap-monitor

Code rooted at this package implements a bootstrap-monitor binary intended to enable continuous bootstrap testing for avalanchego networks.

Bootstrap testing

Bootstrapping an avalanchego node on a persistent network like mainnet or fuji requires that the version of avalanchego that the node is running be compatible with the historical data of that network. Bootstrapping regularly is a good way of insuring against regressions in compatibility.

Types of bootstrap testing for C-Chain

The X-Chain and P-Chain always synchronize all state. The bulk of the data for testnet and mainnet is on the C-Chain, which can be synchronized in two ways:

State Sync

A bootstrap with state sync enabled (the default) ensures that only recent blocks will be processed.

Full Sync

All history will be processed, though with pruning (enabled by default) not all history will be stored.

To enable full sync, supply state-sync-enabled: false as C-Chain configuration.
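As a minimal sketch, the C-Chain configuration for a full sync test could be produced as JSON like this. The `cChainConfig` type and `fullSyncConfig` function are hypothetical names used only for illustration, and the struct models just the one key mentioned above; how the resulting JSON is delivered to the node is outside the scope of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cChainConfig is a hypothetical type that models only the single
// C-Chain setting discussed above. Real C-Chain configuration
// supports many more keys.
type cChainConfig struct {
	StateSyncEnabled bool `json:"state-sync-enabled"`
}

// fullSyncConfig returns the JSON for a C-Chain configuration that
// disables state sync, which selects a full sync.
func fullSyncConfig() (string, error) {
	b, err := json.Marshal(cChainConfig{StateSyncEnabled: false})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	cfg, err := fullSyncConfig()
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg) // prints {"state-sync-enabled":false}
}
```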

Overview

The intention of bootstrap-monitor is to enable a Kubernetes StatefulSet to perform continuous bootstrap testing for a given avalanchego configuration. It ensures that a testing pod either starts or resumes a test and, upon completion of a test, polls for a new image to test and initiates a new test when one is found.

Both the init and wait-for-completion commands of the bootstrap-monitor binary are intended to run as containers of a pod alongside an avalanchego container. The pod is expected to be managed by a StatefulSet to ensure the pod is restarted on failure and that only a single pod runs at a time to avoid contention for the backing data volume. Both commands derive the configuration of a bootstrap test from the pod:
- The network targeted by the test is determined by the value of the AVAGO_NETWORK_NAME env var set for the avalanchego container.
- Whether state sync is enabled is determined by the value of the AVAGO_CHAIN_CONFIG_CONTENT env var set for the avalanchego container.
- The image used by the test is determined by the image configured for the avalanchego container.
- The versions of the avalanchego image used by the test are determined by the pod annotation with key avalanche.avax.network/avalanchego-versions.
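The derivation listed above can be sketched without any Kubernetes dependency. The `pod`, `container`, and `testConfig` types and the `configFromPod` function below are hypothetical stand-ins, not the package's API. The chain configuration value and the versions annotation are carried through as raw strings because their encoding is not described here.

```go
package main

import (
	"errors"
	"fmt"
)

// container and pod are hypothetical stand-ins for the Kubernetes API types.
type container struct {
	Name  string
	Image string
	Env   map[string]string
}

type pod struct {
	Annotations map[string]string
	Containers  []container
}

// testConfig holds the values that are derived from the pod.
type testConfig struct {
	Network            string
	Image              string
	ChainConfigContent string // raw env value; decoding is out of scope here
	VersionsAnnotation string // raw annotation value; empty if not yet set
}

const versionsAnnotationKey = "avalanche.avax.network/avalanchego-versions"

// configFromPod reads the test configuration from the named node container.
func configFromPod(p pod, nodeContainerName string) (*testConfig, error) {
	for _, c := range p.Containers {
		if c.Name != nodeContainerName {
			continue
		}
		network, ok := c.Env["AVAGO_NETWORK_NAME"]
		if !ok {
			return nil, errors.New("AVAGO_NETWORK_NAME is not set")
		}
		return &testConfig{
			Network:            network,
			Image:              c.Image,
			ChainConfigContent: c.Env["AVAGO_CHAIN_CONFIG_CONTENT"],
			VersionsAnnotation: p.Annotations[versionsAnnotationKey],
		}, nil
	}
	return nil, fmt.Errorf("container %q not found", nodeContainerName)
}

func main() {
	p := pod{
		Containers: []container{{
			Name:  "avago",
			Image: "example/avalanchego:latest", // illustrative image name
			Env:   map[string]string{"AVAGO_NETWORK_NAME": "fuji"},
		}},
	}
	cfg, err := configFromPod(p, "avago")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Network, cfg.Image)
}
```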

When a bootstrap testing pod is inevitably rescheduled or restarted, the contents of the PersistentVolumeClaim configured by the managing StatefulSet persist, allowing the interrupted test to resume.

Both the init and wait-for-completion commands of the bootstrap-monitor attempt to read serialized test details (namely the image used for the test and the start time of the test) from the same data volume used by the avalanchego node. These details are written by the init command when it determines that a new test is starting.

The bootstrap-monitor init command is intended to run as an init container of an avalanchego node. It ensures that the ID of the image and its associated versions are recorded for the test, and that the contents of the pod's data volume are either cleared for a new test or retained to enable resuming a previously started test. It accomplishes this by:
- Mounting the same data volume as the avalanchego node
- Reading bootstrap test configuration as described previously
- Determining the image ID and versions for an image if the avalanchego image for the pod uses the latest tag. This will only need to be performed for the first pod that a bootstrap testing StatefulSet runs. Subsequent pods from the same StatefulSet should have an image qualified with its SHA and version details set by the previous test run's wait-for-completion pod.
  - A new pod will be started with the latest image to execute avalanchego --versions-json to determine the image ID (which includes a sha256 hash) of the image and its avalanchego versions. Those values will then be applied to the StatefulSet managing the pod which will prompt pod deletion and recreation with the updated values. This ensures that a test result can be associated with both a specific image SHA and the avalanchego versions (including commit hash) of the binary that the image provides.
  - A separate pod is used because the image ID of a non-init avalanchego container using a latest-tagged image is only available when that container runs rather than when an init container runs.
  - While it would be possible to add an init container running the same avalanchego image as the primary avalanchego container, have it run the version command, and then have a subsequent bootstrap-monitor init container read those results, the use of a separate pod for SHA and versions discovery would still be required by the wait-for-completion command. It seemed preferable to have only a single way to discover image details.
- Attempting to read the serialized test details from a file on the data volume. This file will not exist if the data volume has not been used before.
- Comparing the image from the serialized test details to the image in the test configuration.
  - If the images differ (or the file was not present), the data volume is initialized for a new test:
    - The data volume is cleared
    - The image from the test configuration and the time are serialized to a file on the data volume
  - If the images are the same, the data volume is used as-is to enable resuming an in-progress test.

bootstrap-monitor wait-for-completion is intended to run as a sidecar of the avalanchego container. It polls the health of the node container to detect when a bootstrap test has completed successfully and then polls for a new image to test. When a new image is found, the managing StatefulSet is updated with the details of the image to trigger a new test. The process to detect a new image is the same as described for the init command.

Package details

Filename	Purpose
bootstrap_test_config.go	Defines how the configuration for a bootstrap test is read from a pod
common.go	Defines code common between init and wait
init.go	Defines how a bootstrap test is initialized
wait.go	Defines how a bootstrap test is determined to have completed and how a new one is initiated
cmd/main.go	The binary entrypoint for the `bootstrap-monitor`
e2e/e2e_test.go	The e2e test that validates `bootstrap-monitor`

Supporting files

Filename	Purpose
scripts/build_bootstrap_monitor.sh	Builds the `bootstrap-monitor` binary
scripts/build_bootstrap_monitor_image.sh	Builds the image for the `bootstrap-monitor`
scripts/tests.e2e.bootstrap_monitor.go	Script for running the `bootstrap-monitor` e2e test

The test script is used by the GitHub Actions workflow that validates the bootstrap-monitor binary and image.
The image build script is used by the GitHub Actions workflow that publishes repo images post-merge.

Alternatives considered

Run bootstrap tests on GitHub workers

Public GitHub workers are not compatible with bootstrap testing because the available storage of 30GB is insufficient for even a state sync bootstrap.
Self-hosted GitHub workers are not compatible with bootstrap testing due to the 5-day maximum duration for a job running on a self-hosted runner. State sync bootstrap usually completes within 5 days, but full sync bootstrap usually takes much longer.

Adding a 'bootstrap mode' to avalanchego

If avalanchego supported a --bootstrap-mode flag that exited on successful bootstrap, and a pod configured with this flag used an image with a latest tag, the pod would continuously bootstrap, exit, and restart with the current latest image. While appealingly simple, this approach doesn't directly support:

- A mechanism for resuming a long-running bootstrap. Given the expected duration of a bootstrap test, and the fact that a workload on Kubernetes is not guaranteed to run without interruption, a separate init process is suggested to enable resumption of an interrupted test.
- A mechanism for reporting disk usage and duration of execution.

Documentation ¶

Index ¶

Constants
func InitBootstrapTest(log logging.Logger, namespace string, podName string, nodeContainerName string, ...) error
func NodeDataDir(path string) string
func WaitForCompletion(log logging.Logger, namespace string, podName string, nodeContainerName string, ...) error
func WaitForPodCondition(ctx context.Context, clientset *kubernetes.Clientset, namespace string, ...) error
type BootstrapTestConfig
- func GetBootstrapTestConfigFromPod(ctx context.Context, clientset *kubernetes.Clientset, namespace string, ...) (*BootstrapTestConfig, error)
type ImageDetails

Constants ¶

const (
	BootstrapStartingMessage = "Starting bootstrap test"
	BootstrapResumingMessage = "Resuming bootstrap test"
)

const (
	ImageUnchanged = "Image unchanged"
)

const VersionsAnnotationKey = "avalanche.avax.network/avalanchego-versions"

Variables ¶

This section is empty.

Functions ¶

func InitBootstrapTest ¶

func InitBootstrapTest(log logging.Logger, namespace string, podName string, nodeContainerName string, dataDir string) error

func NodeDataDir ¶

func NodeDataDir(path string) string

func WaitForCompletion ¶

func WaitForCompletion(
	log logging.Logger,
	namespace string,
	podName string,
	nodeContainerName string,
	dataDir string,
	healthCheckInterval time.Duration,
	imageCheckInterval time.Duration,
) error

func WaitForPodCondition ¶

func WaitForPodCondition(ctx context.Context, clientset *kubernetes.Clientset, namespace string, podName string, conditionType corev1.PodConditionType) error

WaitForPodCondition watches the specified pod until the status includes the specified condition.

Types ¶

type BootstrapTestConfig ¶

type BootstrapTestConfig struct {
	Network          string            `json:"network"`
	StateSyncEnabled bool              `json:"stateSyncEnabled"`
	Image            string            `json:"image"`
	Versions         *version.Versions `json:"versions,omitempty"`
}
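To illustrate how the JSON tags on this struct behave, here is a sketch using a local copy of the type. The lowercase `bootstrapTestConfig` mirrors the fields and tags shown above, while `versions` is a hypothetical stand-in for `version.Versions` (its single field is invented for the example) and `encode` is a hypothetical helper. The image name is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// versions is a hypothetical stand-in for version.Versions.
type versions struct {
	Application string `json:"application"`
}

// bootstrapTestConfig is a local copy of the documented struct's
// fields and JSON tags, with the stand-in versions type.
type bootstrapTestConfig struct {
	Network          string    `json:"network"`
	StateSyncEnabled bool      `json:"stateSyncEnabled"`
	Image            string    `json:"image"`
	Versions         *versions `json:"versions,omitempty"`
}

// encode serializes a configuration to JSON.
func encode(cfg bootstrapTestConfig) (string, error) {
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(bootstrapTestConfig{
		Network:          "fuji",
		StateSyncEnabled: true,
		Image:            "example/avalanchego:latest",
	})
	if err != nil {
		panic(err)
	}
	// The nil Versions pointer is dropped because of omitempty.
	fmt.Println(out)
}
```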

func GetBootstrapTestConfigFromPod ¶

func GetBootstrapTestConfigFromPod(ctx context.Context, clientset *kubernetes.Clientset, namespace string, podName string, nodeContainerName string) (*BootstrapTestConfig, error)

GetBootstrapTestConfigFromPod extracts the bootstrap test configuration from the specified pod.

type ImageDetails ¶

type ImageDetails struct {
	Image    string
	Versions *version.Versions
}

Directories ¶

Path	Synopsis
cmd
