testruns

package
v1.0.0
Published: Apr 5, 2022 License: MIT Imports: 24 Imported by: 0

Documentation

Index

Constants

const ParallelResultCalculation = 4

ParallelResultCalculation dictates how many results can be calculated in parallel

const PerformanceDataVersion = 3
const TestResultVersion = 2

Increase this if the test result calculation changed - this forces recalculation on startup of the coordinator

Variables

This section is empty.

Functions

This section is empty.

Types

type PortIncrement

type PortIncrement int

PortIncrement is a type that specifies the offset from the default port based on predefined purposes

const PortIncrementClientPort PortIncrement = 2

PortIncrementClientPort is the offset from the standard port for the Client listeners (for instance the endpoints the shards expose allowing clients to query the spent status of a UHS)

const PortIncrementDefaultPort PortIncrement = 0

PortIncrementDefaultPort is the offset from the standard port for default listeners. These are the listeners for the system role's default work, which in most cases (Atomizer, Shard, Sentinel) is receiving transactions for processing.

const PortIncrementRaftPort PortIncrement = 1

PortIncrementRaftPort is the offset from the standard port for the RAFT listeners. This is used by RAFT replicated components such as the Atomizer and 2PC Shards/Coordinators to communicate between cluster participants

type TestManagerConfig

type TestManagerConfig struct {
	MaxAgents int `json:"maxAgents"`
}

TestManagerConfig is the main type in which parameters for the controller can be persisted

type TestRunManager

type TestRunManager struct {
	// contains filtered or unexported fields
}

func NewTestRunManager

func NewTestRunManager(
	c *coordinator.Coordinator,
	am *agents.AgentsManager,
	src *sources.SourcesManager,
	ev chan coordinator.Event,
	awsm *awsmgr.AwsManager,
	commitHash string,
) (*TestRunManager, error)

func (*TestRunManager) BreakAndTerminateAllCmds

func (t *TestRunManager) BreakAndTerminateAllCmds(
	tr *common.TestRun,
	cmds []runningCommand,
) error

BreakAndTerminateAllCmds will instruct the agent running a command to send an os.Interrupt signal followed by an os.Kill signal to it, for each of the commands in the runningCommands array.

func (*TestRunManager) CalculateFlameGraph

func (t *TestRunManager) CalculateFlameGraph(
	tr *common.TestRun,
	commandID string,
) error

CalculateFlameGraph uses a python script to turn the data gathered by running commands with `perf` enabled into a flame graph. This uses the code from https://github.com/brendangregg/FlameGraph, which is cloned next to the coordinator binary by the Dockerfile

func (*TestRunManager) CalculatePerformancePlot

func (t *TestRunManager) CalculatePerformancePlot(
	tr *common.TestRun,
	commandID, plotType string,
) error

CalculatePerformancePlot uses a python script to turn the data gathered by the performance counters on the agent into a performance metric plot. The performance data is stored for each command separately, and this method is given the ID of the command to calculate the plot for. plotType can be any of the following:

		system_memory
		network_buffers
		cpu_usage
		num_threads
		process_cpu_usage
		process_disk_usage
		disk_usage
		flamegraph

func (*TestRunManager) CalculateResults

func (t *TestRunManager) CalculateResults(
	tr *common.TestRun,
	recalc bool,
) (*common.TestResult, error)

CalculateResults will enqueue the result calculation if needed onto the job channel and await its completion, returning the result. Use `recalc` set to `true` to force calculation even if results are already present

func (*TestRunManager) CheckPreseed

func (t *TestRunManager) CheckPreseed(tr *common.TestRun) error

func (*TestRunManager) CompileBinaries

func (t *TestRunManager) CompileBinaries(
	tr *common.TestRun,
	seeder bool,
) error

func (*TestRunManager) Config

func (t *TestRunManager) Config() TestManagerConfig

Config returns the entire config of the TestRunManager

func (*TestRunManager) ContinueSweep

func (t *TestRunManager) ContinueSweep(tr *common.TestRun, sweepID string)

ContinueSweep will identify the next test run in a one-at-a-time test sweep and schedule it for execution

func (*TestRunManager) CopyOutputs

func (t *TestRunManager) CopyOutputs(
	tr *common.TestRun,
	envs map[int32][]byte,
	ignoreErrors bool,
) error

CopyOutputs will use the `copyFiles` map to instruct the agents to upload all indicated files from their file systems to S3 so that the coordinator can download them later

func (*TestRunManager) CreateStartSequenceAtomizer

func (t *TestRunManager) CreateStartSequenceAtomizer(
	tr *common.TestRun,
	archiverDone chan []runningCommand,
	errChan chan error,
) []startSequenceEntry

CreateStartSequenceAtomizer uses the test run configuration to determine in which sequence the agent roles should be started, and returns an array of startSequenceEntry elements that are ordered in the sequence in which they should be started up.

func (*TestRunManager) CreateStartSequenceTwoPhase

func (t *TestRunManager) CreateStartSequenceTwoPhase(
	tr *common.TestRun,
) []startSequenceEntry

CreateStartSequenceTwoPhase uses the test run configuration to determine in which sequence the agent roles should be started, and returns an array of startSequenceEntry elements that are ordered in the sequence in which they should be started up.

func (*TestRunManager) DeployBinaries

func (t *TestRunManager) DeployBinaries(
	tr *common.TestRun,
	binariesInS3Path string,
) (map[int32][]byte, error)

DeployBinaries deploys the prebuilt binaries to all involved agents, and returns a map of agentID => environmentID for all environments created on the test agents. It calls PrepareAgentWithBinariesForCommit for each role in the testrun

func (*TestRunManager) DeployConfig

func (t *TestRunManager) DeployConfig(
	tr *common.TestRun,
	envs map[int32][]byte,
	cfg []byte,
) error

DeployConfig is a convenience method to send a DeployFileRequestMsg to all agents that are part of a testrun with the contents of the system-wide configuration file. It will be deployed at a location relative to the environment folder in which the test is running on that agent, and its path can be resolved through the %CFG% substitution parameter, which we pass to all commands we run

func (*TestRunManager) ExecuteTestRun

func (t *TestRunManager) ExecuteTestRun(tr *common.TestRun)

ExecuteTestRun is the main function that executes a test run

func (*TestRunManager) FailRoles

func (t *TestRunManager) FailRoles(tr *common.TestRun, cancel chan bool)

FailRoles is run in a goroutine by RunBinaries to fail roles that were configured to fail at a certain point in the test run. The `cancel` channel is monitored; if anything is sent on it, the failure logic is aborted. This mainly happens when the main executing logic fails and aborts the test run.
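
The cancel-or-fire pattern described here can be sketched with a `select` over a timer and the cancel channel. This is an illustration only, not the package's implementation: `failAfter`, `demoDelay` and the callback are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"time"
)

// demoDelay is an assumed delay used only for this illustration.
const demoDelay = 20 * time.Millisecond

// failAfter waits for delay and then calls fail, unless something arrives on
// cancel first. It reports whether fail was invoked.
func failAfter(delay time.Duration, cancel <-chan bool, fail func()) bool {
	select {
	case <-time.After(delay):
		fail()
		return true
	case <-cancel:
		return false
	}
}

func main() {
	cancel := make(chan bool, 1)

	// Nothing is sent on cancel, so the timer wins and the role is failed.
	fired := failAfter(demoDelay, cancel, func() { fmt.Println("failing role") })
	fmt.Println("fired:", fired)

	// The main logic aborted the run: the pending failure is skipped.
	cancel <- true
	fired = failAfter(100*demoDelay, cancel, func() { fmt.Println("failing role") })
	fmt.Println("fired:", fired)
}
```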

func (*TestRunManager) FailTestRun

func (t *TestRunManager) FailTestRun(tr *common.TestRun, err error)

FailTestRun will set the status of a testrun to failed, with the given error as reason. It will then terminate the AWS roles that are still active and copy any test run outputs/performance data that was uploaded to S3 before the test had failed. Lastly, it will reschedule the test run if it was configured to be rescheduled on failures

func (*TestRunManager) GenerateConfig

func (t *TestRunManager) GenerateConfig(tr *common.TestRun) ([]byte, error)

func (*TestRunManager) GenerateConfigAtomizer

func (t *TestRunManager) GenerateConfigAtomizer(
	tr *common.TestRun,
) ([]byte, error)

func (*TestRunManager) GenerateConfigTwoPhase

func (t *TestRunManager) GenerateConfigTwoPhase(
	tr *common.TestRun,
) ([]byte, error)

GenerateConfigTwoPhase creates a configuration file to place on all nodes such that the system roles can properly find each other and are configured as was dictated by the scheduled test definition in the UI

func (*TestRunManager) GenerateMatrix

func (t *TestRunManager) GenerateMatrix() ([]*common.MatrixResult, time.Time, time.Time)

GenerateMatrix generates a matrix of results from all testruns. Note that this is a quite costly function

func (*TestRunManager) GenerateMatrixForRuns

func (t *TestRunManager) GenerateMatrixForRuns(
	trs []*common.TestRun,
) ([]*common.MatrixResult, time.Time, time.Time)

GenerateMatrixForRuns generates a matrix of results for all test runs passed in the `trs` parameter. A matrix will identify the different configurations in the set of testruns, and average the testresults that share the same configuration. The return value is a list of matrix results as well as the earliest and latest time tests in this set were executed
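
The grouping and averaging step might look roughly like the sketch below. It assumes a configuration can be reduced to a single comparable key and that one numeric metric is averaged; the `run` and `matrixRow` types and the `Throughput` field are hypothetical and do not reflect the actual common.MatrixResult layout. The earliest/latest time tracking is left out.

```go
package main

import "fmt"

// run is a hypothetical, reduced view of a test run: a configuration key and
// a single result metric.
type run struct {
	Config     string
	Throughput float64
}

// matrixRow is a hypothetical matrix entry holding the averaged metric.
type matrixRow struct {
	Config        string
	Runs          int
	AvgThroughput float64
}

// buildMatrix groups runs by configuration and averages their results,
// keeping configurations in order of first appearance.
func buildMatrix(runs []run) []matrixRow {
	sums := map[string]float64{}
	counts := map[string]int{}
	var order []string
	for _, r := range runs {
		if counts[r.Config] == 0 {
			order = append(order, r.Config)
		}
		sums[r.Config] += r.Throughput
		counts[r.Config]++
	}
	rows := make([]matrixRow, 0, len(order))
	for _, c := range order {
		rows = append(rows, matrixRow{c, counts[c], sums[c] / float64(counts[c])})
	}
	return rows
}

func main() {
	rows := buildMatrix([]run{{"2 shards", 100}, {"4 shards", 50}, {"2 shards", 300}})
	for _, row := range rows {
		fmt.Printf("%s: %d runs, average %.1f\n", row.Config, row.Runs, row.AvgThroughput)
	}
}
```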

func (*TestRunManager) GenerateSweepMatrix

func (t *TestRunManager) GenerateSweepMatrix(
	sweepIDs []string,
) ([]*common.MatrixResult, time.Time, time.Time)

GenerateSweepMatrix generates a matrix of results from all testruns in a single sweep

func (*TestRunManager) GetAllRolesSorted

func (t *TestRunManager) GetAllRolesSorted(
	tr *common.TestRun,
	role common.SystemRole,
) []*common.TestRunRole

GetAllRolesSorted extracts all roles of a particular type from the set of roles in the testrun, and sorts them by Index
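
A minimal sketch of the filter-then-sort behavior, using a hypothetical `role` struct in place of common.TestRunRole (whose real fields are not shown on this page):

```go
package main

import (
	"fmt"
	"sort"
)

// role is a hypothetical stand-in for common.TestRunRole.
type role struct {
	Kind  string
	Index int
}

// rolesSorted keeps only the roles of the requested kind and orders them by
// Index, ascending.
func rolesSorted(all []role, kind string) []role {
	var out []role
	for _, r := range all {
		if r.Kind == kind {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Index < out[j].Index })
	return out
}

func main() {
	all := []role{{"shard", 2}, {"sentinel", 0}, {"shard", 0}, {"shard", 1}}
	fmt.Println(rolesSorted(all, "shard"))
}
```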

func (*TestRunManager) GetLogFiles

func (t *TestRunManager) GetLogFiles(
	tr *common.TestRun,
	cmds []runningCommand,
	envs map[int32][]byte,
) error

GetLogFiles instructs the agents to upload the log files gathered while running the command(s) to S3

func (*TestRunManager) GetPerformanceProfiles

func (t *TestRunManager) GetPerformanceProfiles(
	tr *common.TestRun,
	cmds []runningCommand,
	envs map[int32][]byte,
) error

GetPerformanceProfiles instructs the agents to upload the performance data gathered while running the command(s) to S3

func (*TestRunManager) GetRequiredVCPUs

func (t *TestRunManager) GetRequiredVCPUs(tr *common.TestRun) map[string]int32

GetRequiredVCPUs will use the region and VCPU count of the chosen launch templates for all the roles in the test run to build a total tally map of region => vcpu_count and return it.
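
The tally can be pictured as a simple per-region sum. In this sketch the `role` struct is hypothetical and carries the region and vCPU count directly, whereas the real method derives them from each role's launch template.

```go
package main

import "fmt"

// role is a hypothetical stand-in that carries the region and vCPU count
// directly.
type role struct {
	Region string
	VCPUs  int32
}

// requiredVCPUs sums the vCPU counts of all roles per region.
func requiredVCPUs(roles []role) map[string]int32 {
	tally := make(map[string]int32)
	for _, r := range roles {
		tally[r.Region] += r.VCPUs
	}
	return tally
}

func main() {
	roles := []role{{"us-east-1", 8}, {"us-west-2", 4}, {"us-east-1", 16}}
	fmt.Println(requiredVCPUs(roles))
}
```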

func (*TestRunManager) GetRoleEndpoint

func (t *TestRunManager) GetRoleEndpoint(
	tr *common.TestRun,
	role *common.TestRunRole,
	portIncrement PortIncrement,
) (string, error)

GetRoleEndpoint will return the IP and port at which a particular role in our test is / should be listening. This endpoint is derived from the IP address reported by the agent and the port number based on the default for that role, and the specified increment (Default, RAFT or Client)
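
As a rough illustration of how the increment combines with a role's default port: the constants below mirror the PortIncrement values documented above, while `roleEndpoint` and the base port 5000 are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// PortIncrement mirrors the type documented above.
type PortIncrement int

const (
	PortIncrementDefaultPort PortIncrement = 0
	PortIncrementRaftPort    PortIncrement = 1
	PortIncrementClientPort  PortIncrement = 2
)

// roleEndpoint is a hypothetical helper: it joins the agent-reported IP with
// the role's base port plus the requested increment.
func roleEndpoint(ip string, basePort int, inc PortIncrement) string {
	return net.JoinHostPort(ip, strconv.Itoa(basePort+int(inc)))
}

func main() {
	// 5000 is an assumed base port, not a value taken from the package.
	fmt.Println(roleEndpoint("10.0.0.7", 5000, PortIncrementDefaultPort))
	fmt.Println(roleEndpoint("10.0.0.7", 5000, PortIncrementRaftPort))
	fmt.Println(roleEndpoint("10.0.0.7", 5000, PortIncrementClientPort))
}
```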

func (*TestRunManager) GetTestRun

func (t *TestRunManager) GetTestRun(runID string) (*common.TestRun, bool)

GetTestRun returns a single test run by its ID

func (*TestRunManager) GetTestRuns

func (t *TestRunManager) GetTestRuns() []*common.TestRun

GetTestRuns returns all testruns known to the system

func (*TestRunManager) HandleCommandFailure

func (t *TestRunManager) HandleCommandFailure(
	tr *common.TestRun,
	allCmds []runningCommand,
	envs map[int32][]byte,
	fail *common.ExecutedCommand,
) error

HandleCommandFailure is called when one of the commands fails during the testrun. It will kill all the other commands and download the performance profiles and outputs that are available, so they can be inspected.

func (*TestRunManager) HasAWSRoles

func (t *TestRunManager) HasAWSRoles(tr *common.TestRun) bool

HasAWSRoles will return true if the test run has roles that (are supposed to) run on AWS EC2

func (*TestRunManager) Is2PC

func (t *TestRunManager) Is2PC(architectureID string) bool

Is2PC returns whether the given architectureID is a two-phase commit architecture - this stems from the time when 2PC existed in both an on-disk and in-memory shard configuration

func (*TestRunManager) IsAtomizer

func (t *TestRunManager) IsAtomizer(architectureID string) bool

func (*TestRunManager) KillAwsAgents

func (t *TestRunManager) KillAwsAgents(tr *common.TestRun) error

KillAwsAgents will terminate all running EC2 instances for the specified test run

func (*TestRunManager) LoadAllTestRuns

func (t *TestRunManager) LoadAllTestRuns()

LoadAllTestRuns is run on startup of the controller to scan the entire directory of testrun data and load the relevant test runs into memory

func (*TestRunManager) LoadConfig

func (t *TestRunManager) LoadConfig() error

LoadConfig loads the configuration variables from persistence (file). It also sends a real-time update for the frontend to know what the current value is

func (*TestRunManager) LoadTestResult

func (t *TestRunManager) LoadTestResult(tr *common.TestRun)

LoadTestResult loads the test result, which is stored separately from the test run's metadata, from disk

func (*TestRunManager) LoadTestRun

func (t *TestRunManager) LoadTestRun(id string) (*common.TestRun, error)

LoadTestRun loads a single test run from disk

func (*TestRunManager) NormalizeRole

func (t *TestRunManager) NormalizeRole(
	role common.SystemRole,
) common.SystemRole

NormalizeRole converts variants of certain roles (for instance the two-phase commit shard) to one standardized role name - this is specifically used when generating the configuration file where both locking_shard and shard require configuration prefix "shard". Full translation table:

2PC Shard -> Shard
2PC Sentinel -> Sentinel
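
A translation table like this is naturally expressed as a map lookup with a pass-through default. In the sketch below only "locking_shard" and "shard" come from the description above; the sentinel role strings and the local `SystemRole` type are assumed for illustration.

```go
package main

import "fmt"

// SystemRole is a local stand-in for common.SystemRole.
type SystemRole string

// variants maps role variants to their standardized name. Only
// "locking_shard" and "shard" come from the description above; the sentinel
// names are assumed for illustration.
var variants = map[SystemRole]SystemRole{
	"locking_shard": "shard",
	"sentinel_2pc":  "sentinel",
}

// normalizeRole returns the standardized name for a variant and leaves any
// other role untouched.
func normalizeRole(r SystemRole) SystemRole {
	if std, ok := variants[r]; ok {
		return std
	}
	return r
}

func main() {
	fmt.Println(normalizeRole("locking_shard"), normalizeRole("atomizer"))
}
```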

func (*TestRunManager) PersistConfig

func (t *TestRunManager) PersistConfig() error

PersistConfig saves the configuration variables to persistence (file). It also sends a real-time update for the frontend to know what the current value is

func (*TestRunManager) PersistTestRun

func (t *TestRunManager) PersistTestRun(tr *common.TestRun)

PersistTestRun stores the test run data in the persisted state. At present, this is a flat directory structure with JSON files. This could be changed into a database at some point - but this has been deemed not a priority.

func (*TestRunManager) PreseedShards

func (t *TestRunManager) PreseedShards(
	tr *common.TestRun,
	envs map[int32][]byte,
) error

PreseedShards will instruct the agents that run shard roles to download the relevant shard preseed set from S3 and unpack it into the right spot for the shard to pick it up on startup.

func (*TestRunManager) RedownloadTestOutputsFromS3

func (t *TestRunManager) RedownloadTestOutputsFromS3(tr *common.TestRun) error

RedownloadTestOutputsFromS3 will enumerate all files in the S3 bucket for the given testrun and download them all. This can be triggered from the user interface. This exists because sometimes files get either corrupted or fail downloading and redoing the downloads can help fix that.

func (*TestRunManager) Reschedule

func (t *TestRunManager) Reschedule(tr *common.TestRun)

Reschedule will insert a copy of the passed (failed) testrun into the queue if the maximum number of retries has not been reached. It will reset the agent IDs as well, since we need to spawn new AWS roles to run this test.

func (*TestRunManager) ResultCalculator

func (t *TestRunManager) ResultCalculator()

ResultCalculator is the main processor for the resultCalculationChan. It will be started `ParallelResultCalculation` times in the background and read from the channel to see which result calculations need to be performed. Once a calculation request has been read from the channel, it will use the result calculation python script to produce the results.
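
The pattern of a fixed number of workers draining one job channel, with the caller awaiting completion, can be sketched as follows. The `calcJob` type, `startWorkers` and `calculate` are hypothetical names, and the string result stands in for the output of the Python script.

```go
package main

import (
	"fmt"
	"sync"
)

// ParallelResultCalculation mirrors the constant documented above.
const ParallelResultCalculation = 4

// calcJob is a hypothetical request: the run to process and a channel on
// which the worker reports completion.
type calcJob struct {
	runID string
	done  chan string
}

// startWorkers launches ParallelResultCalculation goroutines that all read
// from the same job channel. The returned function blocks until every worker
// has exited, which happens once the channel is closed.
func startWorkers(jobs <-chan calcJob) (wait func()) {
	var wg sync.WaitGroup
	for i := 0; i < ParallelResultCalculation; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				// Stand-in for invoking the result calculation script.
				j.done <- "result for " + j.runID
			}
		}()
	}
	return wg.Wait
}

// calculate enqueues a job and blocks until a worker has finished it.
func calculate(jobs chan<- calcJob, runID string) string {
	done := make(chan string, 1)
	jobs <- calcJob{runID: runID, done: done}
	return <-done
}

func main() {
	jobs := make(chan calcJob)
	wait := startWorkers(jobs)
	fmt.Println(calculate(jobs, "run-1"))
	close(jobs)
	wait()
}
```

Capping the number of workers bounds how many expensive calculations run at once, while callers still get a simple blocking call.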

func (*TestRunManager) RetrySpawn

func (t *TestRunManager) RetrySpawn(id string)

RetrySpawn is used to manually initiate respawning of the AWS roles that are not online yet

func (*TestRunManager) RunBinaries

func (t *TestRunManager) RunBinaries(
	tr *common.TestRun,
	envs map[int32][]byte,
	cmd chan *common.ExecutedCommand,
	failures chan *common.ExecutedCommand,
) error

RunBinaries is a convenience method that will execute the correct method based on the architecture configured for the test run, or return an error if the architecture is unknown

func (*TestRunManager) RunBinariesAtomizer

func (t *TestRunManager) RunBinariesAtomizer(
	tr *common.TestRun,
	envs map[int32][]byte,
	cmd chan *common.ExecutedCommand,
	failures chan *common.ExecutedCommand,
) error

func (*TestRunManager) RunBinariesTwoPhase

func (t *TestRunManager) RunBinariesTwoPhase(
	tr *common.TestRun,
	envs map[int32][]byte,
	cmd chan *common.ExecutedCommand,
	failures chan *common.ExecutedCommand,
) error

RunBinariesTwoPhase will orchestrate the running of all roles for a full cycle test with the two-phase commit architecture

func (*TestRunManager) RunForAllAgents

func (t *TestRunManager) RunForAllAgents(
	f func(role *common.TestRunRole) error,
	tr *common.TestRun,
	description string,
	timeout time.Duration,
) error

func (*TestRunManager) ScheduleTestRun

func (t *TestRunManager) ScheduleTestRun(tr *common.TestRun)

ScheduleTestRun will add the given testrun to the set of queued testruns. This method will assign a testrun its ID and set the creation time, initiate certain fields with their defaults if not set, persist it and broadcast it over the real-time channel so that other users connected to the system will learn about the test run's existence without refreshing the browser

func (*TestRunManager) Scheduler

func (t *TestRunManager) Scheduler()

Scheduler is the main loop that checks if Queued testruns can commence execution by looking at the total number of active agents in the Running testruns, and considers vCPU limits on EC2 to prevent trying to start a test run for which the account does not have enough allowance
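
The admission check described here could be approximated as below. This is a sketch under stated assumptions: `canStart` and all of its parameters are hypothetical, and the real scheduler obtains agent counts and vCPU allowances from its own state and from AWS.

```go
package main

import "fmt"

// canStart reports whether a queued run fits within the agent cap and the
// per-region vCPU allowance. All parameters are hypothetical stand-ins.
func canStart(needAgents, activeAgents, maxAgents int, needVCPU, freeVCPU map[string]int32) bool {
	if activeAgents+needAgents > maxAgents {
		return false
	}
	for region, n := range needVCPU {
		if n > freeVCPU[region] {
			return false
		}
	}
	return true
}

func main() {
	free := map[string]int32{"us-east-1": 64}
	fmt.Println(canStart(4, 10, 16, map[string]int32{"us-east-1": 32}, free)) // fits
	fmt.Println(canStart(8, 10, 16, map[string]int32{"us-east-1": 32}, free)) // too many agents
	fmt.Println(canStart(4, 10, 16, map[string]int32{"us-east-1": 96}, free)) // not enough vCPUs
}
```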

func (*TestRunManager) SetMaxAgents

func (t *TestRunManager) SetMaxAgents(max int) error

SetMaxAgents changes the maximum number of parallel running agents which is used by the scheduler. The scheduler will not run testruns from the queue that would exceed this number of agents active

func (*TestRunManager) ShouldCalculateResults

func (t *TestRunManager) ShouldCalculateResults(tr *common.TestRun) bool

ShouldCalculateResults returns whether a result calculation is necessary - currently only determined by the absence of test results

func (*TestRunManager) ShouldTerminate

func (t *TestRunManager) ShouldTerminate(tr *common.TestRun) bool

ShouldTerminate does a non-blocking read on the TerminateChan of the given testrun and returns true if anything is read from it - signaling that the user has manually terminated the run
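
A non-blocking channel read in Go is a `select` with a `default` case. The sketch below shows the idiom with a plain `chan bool`; the actual type of TerminateChan is not shown on this page, so that choice is an assumption.

```go
package main

import "fmt"

// shouldTerminate performs a non-blocking read: it returns true if a value
// was waiting on the channel and false otherwise, without ever blocking.
func shouldTerminate(terminate <-chan bool) bool {
	select {
	case <-terminate:
		return true
	default:
		return false
	}
}

func main() {
	terminate := make(chan bool, 1)
	fmt.Println(shouldTerminate(terminate)) // nothing sent yet

	terminate <- true // the user requested termination
	fmt.Println(shouldTerminate(terminate))
}
```

Note that the read consumes the signal, so a second call after a single send returns false again.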

func (*TestRunManager) SnapshotAgents

func (t *TestRunManager) SnapshotAgents(tr *common.TestRun)

SnapshotAgents will take a copy of the current status of the connected agents such that we preserve them for later inspection.

func (*TestRunManager) SpawnAWSInstances

func (t *TestRunManager) SpawnAWSInstances(tr *common.TestRun) bool

SpawnAWSInstances will look for the roles needed to be spawned on AWS EC2 and initiate spawning them

func (*TestRunManager) StartRoleBinaries

func (t *TestRunManager) StartRoleBinaries(
	cmds []runningCommand,
	roles []*common.TestRunRole,
	tr *common.TestRun,
	envs map[int32][]byte,
	cmd chan *common.ExecutedCommand,
	wait bool,
) ([]runningCommand, error)

StartRoleBinaries is a convenience method to start a set of test run roles from a particular test run on the agents that are supposed to run those roles. Gets passed the current set of running commands and will return the set with the commands run by this routine appended. The method uses AgentsManager.ExecuteCommand to do the actual command execution. See the documentation for that method for explanation for the parameter `wait`. `envs` is the map of agent ID to environment ID, which is created from the main test run logic when deploying the binaries. `cmd` is a channel where executed commands get signaled to by the ExecuteCommand method.

func (*TestRunManager) SubstituteParameters

func (t *TestRunManager) SubstituteParameters(
	params []string,
	r *common.TestRunRole,
	tr *common.TestRun,
) []string

SubstituteParameters will replace placeholders in commands and command line parameters with values based on the role's configuration or index in a cluster
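
Placeholder substitution of this kind can be sketched with strings.NewReplacer. The %CFG% placeholder is mentioned under DeployConfig; the %IDX% placeholder, the `substitute` function and its parameters are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// substitute replaces placeholders in every parameter. %CFG% is the
// placeholder named in the DeployConfig description; %IDX% is a hypothetical
// one used here to show an index-based substitution.
func substitute(params []string, cfgPath string, index int) []string {
	r := strings.NewReplacer(
		"%CFG%", cfgPath,
		"%IDX%", strconv.Itoa(index),
	)
	out := make([]string, len(params))
	for i, p := range params {
		out[i] = r.Replace(p)
	}
	return out
}

func main() {
	params := []string{"./shardd", "%CFG%", "shard%IDX%"}
	fmt.Println(substitute(params, "/tmp/env/config.cfg", 3))
}
```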

func (*TestRunManager) Terminate

func (t *TestRunManager) Terminate(id string)

Terminate will terminate a test run. If the test run is queued, it will change its status to Canceled. If the testrun is running, it will signal the request for termination through the test run's TerminateChan, which is read by ShouldTerminate at certain points in the test run's execution logic, at which time the test run logic will be terminated cleanly.

func (*TestRunManager) TerminateIfNeeded

func (t *TestRunManager) TerminateIfNeeded(
	tr *common.TestRun,
	allCmds []runningCommand,
	envs map[int32][]byte,
	failures chan *common.ExecutedCommand,
) bool

TerminateIfNeeded will call ShouldTerminate to determine if the test run needs to be terminated, or check the failures channel for any failed command that warrants terminating the test run. It then proceeds to take care of all the actions needed to cleanly terminate the test run. It returns true if the test run was terminated

func (*TestRunManager) TestRunsLoaded

func (t *TestRunManager) TestRunsLoaded() bool

TestRunsLoaded indicates if the system has completed loading the test runs

func (*TestRunManager) UpdateStatus

func (t *TestRunManager) UpdateStatus(
	tr *common.TestRun,
	newStatus common.TestRunStatus,
	details string,
)

UpdateStatus will set the status property of the testrun, append the new status to the test run log if it's not a duplicate of the current status. It will also set the start/complete time to Now() if those times are Zero and the status is Running / Completed. It will also send this status update over the real-time event channel such that the UI can update these statuses in real-time
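
The bookkeeping rules in this description (log only on change, stamp times only once) can be illustrated with a reduced, hypothetical `testRun` struct. The status strings and field names are assumptions, and the real method additionally broadcasts the update over the event channel.

```go
package main

import (
	"fmt"
	"time"
)

// testRun is a hypothetical, reduced stand-in for common.TestRun.
type testRun struct {
	Status    string
	Log       []string
	Started   time.Time
	Completed time.Time
}

// updateStatus appends the status to the log unless it duplicates the current
// one, and stamps the start/complete time the first time the matching status
// is seen.
func updateStatus(tr *testRun, newStatus string) {
	if tr.Status != newStatus {
		tr.Log = append(tr.Log, newStatus)
	}
	tr.Status = newStatus
	if newStatus == "Running" && tr.Started.IsZero() {
		tr.Started = time.Now()
	}
	if newStatus == "Completed" && tr.Completed.IsZero() {
		tr.Completed = time.Now()
	}
}

func main() {
	tr := &testRun{}
	updateStatus(tr, "Running")
	updateStatus(tr, "Running") // duplicate, not logged again
	updateStatus(tr, "Completed")
	fmt.Println(tr.Log, !tr.Started.IsZero(), !tr.Completed.IsZero())
}
```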

func (*TestRunManager) UploadBinaries

func (t *TestRunManager) UploadBinaries(
	tr *common.TestRun,
	seeder bool,
) (string, error)

UploadBinaries uploads the binaries for this testrun to S3

func (*TestRunManager) UploadConfig

func (t *TestRunManager) UploadConfig(cfg []byte, tr *common.TestRun) error

UploadConfig uploads the contents of the configuration file for the system to S3 for future reference

func (*TestRunManager) ValidateTestRun

func (t *TestRunManager) ValidateTestRun(
	tr *common.TestRun,
) []error

ValidateTestRun validates the role composition of the test run by calling the architecture-specific function and returns all errors reported

func (*TestRunManager) ValidateTestRunAtomizer

func (t *TestRunManager) ValidateTestRunAtomizer(
	tr *common.TestRun,
) []error

ValidateTestRunAtomizer validates the role composition of the test run for an atomizer commit system. Reports all errors back as an array

func (*TestRunManager) ValidateTestRunTwoPhase

func (t *TestRunManager) ValidateTestRunTwoPhase(
	tr *common.TestRun,
) []error

ValidateTestRunTwoPhase validates the role composition of the test run for a two-phase commit system. Reports all errors back as an array

func (*TestRunManager) WaitForAWSInstances

func (t *TestRunManager) WaitForAWSInstances(
	tr *common.TestRun,
) (bool, bool, bool)

WaitForAWSInstances will use the spawned instance IDs set to the role information by SpawnAWSInstances to determine if all the roles needed for the test run are online and ready to begin the test. It will retry spawning if it takes too long, and fail if it doesn't succeed after three retries.

func (*TestRunManager) WaitForRoleOnline

func (t *TestRunManager) WaitForRoleOnline(
	tr *common.TestRun,
	role *common.TestRunRole,
	portIncrement PortIncrement,
	timeout time.Duration,
) error

WaitForRoleOnline will wait until it's able to open a TCP connection to the endpoint of the role at the given `portIncrement`. This can for instance be used to wait for the RAFT port of a follower node to be online before continuing the start sequence and start the leader of that RAFT cluster. It does not do any semantic check if the TCP endpoint is actually processing data. It just opens and closes the connection - once it can successfully open the connection the method returns. If the `timeout` specified has elapsed and no connection is possible, the method returns an error
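
A dial-and-close polling loop of this kind might look like the sketch below. It takes a ready-made address instead of a role and port increment; `waitForOnline`, `listenLocal` and `pollInterval` are hypothetical names, and the retry interval is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// pollInterval is an assumed pause between connection attempts.
const pollInterval = 100 * time.Millisecond

// waitForOnline retries a TCP dial until it succeeds or the timeout elapses.
// It only opens and closes the connection; no data is exchanged.
func waitForOnline(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, pollInterval)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%s not online after %s: %w", addr, timeout, err)
		}
		time.Sleep(pollInterval)
	}
}

// listenLocal opens a throwaway listener so the sketch can be tried locally.
func listenLocal() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	return ln.Addr().String(), func() { ln.Close() }, nil
}

func main() {
	addr, stop, err := listenLocal()
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer stop()
	fmt.Println("online check:", waitForOnline(addr, time.Second))
}
```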

func (*TestRunManager) WaitForRolesOnline

func (t *TestRunManager) WaitForRolesOnline(
	tr *common.TestRun,
	roles []*common.TestRunRole,
	portIncrement PortIncrement,
	timeout time.Duration,
) error

WaitForRolesOnline will

func (*TestRunManager) WriteLog

func (t *TestRunManager) WriteLog(
	tr *common.TestRun,
	format string,
	a ...interface{},
)

WriteLog writes a statement to a testrun's log file and sends it over the real-time channel to the UI
