Image test suites
What is being tested
The tests combine several types: end-to-end tests of certain software components,
image validations, and feature validations. Together they represent the quality
assurance bar for releasing a supported GCE Image: every test suite here must pass
before Google engineers will release a new GCE image.
Tests are broken down by suite below:
Test Suites
Test suite: shapevalidation
Test that a VM can boot and access the virtual hardware of the large machine shape in a VM family.
Test$FAMILYMem
Test that the available system memory is at least the expected amount of memory for this VM shape.
Test$FAMILYCpu
Test that the number of active processors is equal to the number of processors expected for this VM shape.
Test$FAMILYNuma
Test that the number of active NUMA nodes is equal to the number of NUMA nodes expected for this VM shape.
Test suite: cvm
TestSEVEnabled/TestSEVSNPEnabled/TestTDXEnabled
Validate that an instance can boot with the specified confidential instance type and load its guest kernel module.
Test suite: disk
TestDiskResize
Validate the filesystem is resized on reboot after a disk resize.
- Background: As a convenience feature on supported GCE Images, if you resize the
underlying disk to be larger, a set of scripts invoked during boot will
automatically resize the root partition and filesystem to take advantage of the
new space.
- Test logic: Launch a VM with the default disk size. Wait for it to boot up, then resize the
disk and reboot the VM via the API. Wait for the VM to boot again, and validate
the new size as reported by the operating system matches the expected size.
Test suite: hostnamevalidation
Tests which verify that the metadata hostname is created and works with the DNS record.
TestHostname
Test that the system hostname is correctly set.
- Background: The hostname is one of many pieces of 'dynamic' configuration that supported
GCE Images will set for you. This contrasts with the 'static' configuration
that is already present on the image to be tested. Dynamic
configuration allows a single GCE Image to be used on many VMs without
pre-modification.
- Test logic: Retrieve the intended FQDN from metadata (which is authoritative) and
compare the hostname part of it (first label) to the currently set hostname as
returned by the kernel.
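The comparison described above can be sketched as follows. This is a minimal illustration, not the suite's actual code: the function name and the sample FQDN are hypothetical, and in a real test the FQDN would come from the metadata server and the hostname from the kernel.

```go
package main

import (
	"fmt"
	"strings"
)

// hostnameMatchesFQDN reports whether hostname equals the first label of fqdn.
// Hypothetical helper for illustration only.
func hostnameMatchesFQDN(fqdn, hostname string) bool {
	first, _, _ := strings.Cut(fqdn, ".")
	return first != "" && first == hostname
}

func main() {
	// Sample values; a real test would read these from metadata and the OS.
	fqdn := "vm-1.us-central1-a.c.example-project.internal"
	fmt.Println(hostnameMatchesFQDN(fqdn, "vm-1"))
}
```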
TestFQDN
Test that the fully-qualified domain name is correctly set.
- Background: The FQDN is a complicated concept in Linux operating systems, and setting it in
an incorrect way can lead to unexpected behavior in some software.
- Test logic: Retrieve the intended FQDN from metadata and compare the full value to the
output of /bin/hostname -f. See man 1 hostname for more details.
TestCustomHostname
Test that custom domain names are correctly set.
- Background: The domain name for a VM matches the configured internal GCE DNS setting
(https://cloud.google.com/compute/docs/internal-dns). By default, this will be the zonal or
global DNS name. However, if you specify a custom domain name at instance creation time, this
will be used instead.
- Test logic: Launch a VM with a custom domain name. Validate the domain name as with TestFQDN.
TestHostKeysGeneratedOnce
Validate that SSH host keys are only generated once per instance.
- Background: The Google guest agent will generate new SSH host keys on the first boot of an
instance. This is a dynamic configuration to enable GCE Images to be used on
many instances, as multiple instances sharing host keys or having predictable
host keys is a security risk. However, the host keys should remain constant for
the lifetime of an instance, as changing them after the first generation may
prevent new SSH connections.
- Test logic: Launch a VM and confirm the guest agent generates unique host keys on startup.
Restart the guest agent and confirm the host keys are not changed.
Test suite: hotattach
TestFileHotAttach
Validate that hot attach disks work: a file can be written to the disk, the disk can be detached and
reattached, and the file can still be read.
Test suite: imageboot
TestGuestBoot
Test that the VM can boot.
TestGuestReboot
Test that the VM can reboot.
- Background: Some categories of errors can produce an OS image that boots but cannot
successfully reboot. Documenting these errors is out of scope for this document,
but this test is a regression test against this category of error.
- Test logic: Launch a VM and create a 'marker file' on disk. Reboot the VM and validate the
marker file exists on the second boot.
TestGuestSecureBoot
Test that a VM launched with Secure Boot features works properly.
- Background: Secure Boot is a UEFI firmware feature that is supported on certain GCE Images
and VM types. Documenting how Secure Boot works is out of scope for this
document.
- Test logic: Launch a VM with Secure Boot enabled via the shielded instance config. Validate
that Secure Boot is enabled by querying the appropriate EFI variable through the
sysfs/efivarfs interface.
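As a rough sketch of the EFI variable check: on Linux, a file in efivarfs begins with a 4-byte attribute field followed by the variable data, and the SecureBoot variable's data is a single byte where 1 means enabled. The function name is hypothetical, and the path in the comment is an assumption about which variable is queried, since the text above does not name it.

```go
package main

import "fmt"

// secureBootEnabled interprets the raw contents of an efivarfs SecureBoot file.
// The first 4 bytes are the variable attributes; the 5th byte is the value.
// Hypothetical helper for illustration only.
func secureBootEnabled(raw []byte) (bool, error) {
	if len(raw) < 5 {
		return false, fmt.Errorf("short efivar read: got %d bytes, want at least 5", len(raw))
	}
	return raw[4] == 1, nil
}

func main() {
	// Sample bytes shaped like a read of (assumed path):
	// /sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c
	sample := []byte{0x06, 0x00, 0x00, 0x00, 0x01}
	enabled, err := secureBootEnabled(sample)
	fmt.Println(enabled, err)
}
```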
TestGuestShutdownScript
Test that shutdown scripts can run for around two minutes (as a proxy for
'forever').
- Background: We guarantee shutdown scripts will block the system shutdown process until the
script completes. For scripts which never complete, this would cause the server
to remain in a 'shutting down' state forever. However, VMs that are stopped via
the API are first sent an ACPI soft-shutdown signal which triggers the OS
shutdown process, invoking this script. But after a set amount of time
(currently 90 seconds), if the VM is still running, the GCE API will hard-reset
the VM. It's not possible to validate that the shutdown script will run
'forever'. However, we validate that it will run at least until hard-reset
occurs.
- Test logic: Launch a VM with a shutdown script in metadata. The shutdown script writes an
increasing counter value every second to a file on disk, forever. Since this
causes the graceful shutdown process to never succeed, the API hard-resets the
VM after 2 minutes. After the VM finishes shutdown, start the VM and inspect the
last value written to the file. It should be greater than 110, representing a
shutdown time of approximately 2 minutes.
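The final check on the counter file might look like the sketch below. It assumes the script writes one integer per line, which the description above does not specify; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lastCounter returns the last whitespace-separated integer in contents.
// Assumes one counter value per line (an assumption for this sketch).
func lastCounter(contents string) (int, error) {
	fields := strings.Fields(contents)
	if len(fields) == 0 {
		return 0, fmt.Errorf("no counter values found")
	}
	return strconv.Atoi(fields[len(fields)-1])
}

// ranLongEnough reports whether the last counter value is strictly greater than min.
func ranLongEnough(contents string, min int) (bool, error) {
	n, err := lastCounter(contents)
	if err != nil {
		return false, err
	}
	return n > min, nil
}

func main() {
	// Sample tail of a counter file; a real test would read it from disk.
	ok, err := ranLongEnough("109\n110\n111\n", 110)
	fmt.Println(ok, err)
}
```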
Test suite: licensevalidation
A suite which tests that Linux licensing and Windows activation are working successfully.
TestLinuxLicense
Validate the image has the appropriate license attached.
- Background: Several of the supported GCE Images are subject to licensing agreements with the
OS vendor. This is represented with the GCE License resource, which is attached
to the GCE Image resource. Official GCE Images should not be released without
the appropriate license.
- Test logic: Connect to the metadata server from the VM and confirm the license available in
metadata matches the expected value.
Test suite: network
TestDefaultMTU
Validate the primary interface has the correct MTU of 1460.
- Background: The default MTU for a GCE VPC is 1460. Setting the correct MTU on the network
interface to match will prevent unnecessary packet fragmentation.
- Test logic: Identify the primary network interface using metadata, and confirm it has the
correct MTU using the golang 'net' package, which uses the netlink interface on
Linux (same as the ip command).
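A minimal sketch of an MTU check with the Go 'net' package follows. It simplifies the logic above by inspecting every non-loopback interface instead of identifying the primary one from metadata, and the helper name is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// gceDefaultMTU is the default MTU for a GCE VPC, as stated above.
const gceDefaultMTU = 1460

// mtuError returns an error if got differs from want. Hypothetical helper.
func mtuError(name string, got, want int) error {
	if got != want {
		return fmt.Errorf("interface %s has MTU %d, want %d", name, got, want)
	}
	return nil
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		fmt.Println("listing interfaces:", err)
		return
	}
	for _, iface := range ifaces {
		// Skip loopback; a real test would select the primary NIC via metadata.
		if iface.Flags&net.FlagLoopback != 0 {
			continue
		}
		if err := mtuError(iface.Name, iface.MTU, gceDefaultMTU); err != nil {
			fmt.Println("FAIL:", err)
		} else {
			fmt.Println("ok:", iface.Name)
		}
	}
}
```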
Test suite: networkperf
Validate the network performance of an image reaches at least 85% of advertised
speeds.
- Background: Reaching advertised speeds is important, as failing to reach them means that
there are problems with the image or its drivers. The 85% number is chosen as
that is the baseline that the performance tests generally can match or exceed.
Reaching 100% of the advertised speeds is unrealistic in real scenarios.
- Test logic: Launch a server VM and client VM, then run an iperf test between the two to test
network speeds. This test launches up to 3 sets of servers and clients: default
network, jumbo frames network, and Tier 1 networking.
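The 85% pass criterion amounts to a simple comparison, sketched here. The function name and the sample figures are illustrative only; real measurements would come from the iperf run.

```go
package main

import "fmt"

// minFraction is the share of the advertised speed a run must reach (85%).
const minFraction = 0.85

// meetsTarget reports whether the measured speed is at least 85% of the
// advertised speed. Both values must use the same unit.
func meetsTarget(measuredGbps, advertisedGbps float64) bool {
	return measuredGbps >= minFraction*advertisedGbps
}

func main() {
	// Illustrative numbers, not real measurements.
	fmt.Println(meetsTarget(9.1, 10))
	fmt.Println(meetsTarget(8.0, 10))
}
```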
Test suite: oslogin
Validate that the user can SSH using OSLogin, and that the guest agent can correctly provision a
VM to utilize OSLogin.
- Background: OSLogin is a utility that helps manage users' keys and access for SSH. It also provides
features such as the ability to authenticate users using 2FA, security keys, or certificates.
- Test logic: Launch a client VM and two server VMs. Each of the server VMs will perform a check to
make sure the guest agent responds correctly to OSLogin metadata changes, and the client VM will use
test users to SSH to each of the server VMs. The methods covered by this test are normal SSH and 2FA SSH.
Test suite: packagevalidation
TestNTPService
Test that a time synchronization package is installed and properly configured.
- Background: Linux operating systems require time synchronization software to be running to
correct any drift in the system clock. Correct clock time is required for a wide
variety of applications, and virtual machines are particularly prone to clock
drift.
- Test logic: Validate that an appropriate time synchronization package is installed using the
system package manager, and read its configuration file to verify that it is
configured to check the Google-provided time server.
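The configuration check could be sketched as below. Two assumptions are made that the text above does not state: that the configuration uses chrony/ntpd-style "server" or "pool" directives, and that the Google-provided time server is named metadata.google.internal. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// usesTimeServer reports whether config contains an uncommented
// "server <host>" or "pool <host>" directive for the given host.
// Assumes chrony/ntpd-style syntax (an assumption for this sketch).
func usesTimeServer(config, host string) bool {
	for _, line := range strings.Split(config, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		if (fields[0] == "server" || fields[0] == "pool") && fields[1] == host {
			return true
		}
	}
	return false
}

func main() {
	// Sample config text; a real test would read the package's config file.
	config := "# time sources\nserver metadata.google.internal iburst\n"
	fmt.Println(usesTimeServer(config, "metadata.google.internal"))
}
```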
TestStandardPrograms
Validate that Google-provided programs are present.
- Background: Google-provided Linux OS images come with certain Google utilities such as
gsutil and gcloud preinstalled as a convenience.
- Test logic: Attempt to invoke the utilities, confirming they are present, found in the PATH,
and executable.
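One way to approximate this check in Go is exec.LookPath, which searches the PATH for an executable file. This sketch only resolves the programs and does not invoke them, so it is a lighter check than the one described; the function name is hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingPrograms returns the names that cannot be resolved to an
// executable in the PATH. Hypothetical helper for illustration only.
func missingPrograms(names []string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	missing := missingPrograms([]string{"gsutil", "gcloud"})
	if len(missing) == 0 {
		fmt.Println("all programs found")
		return
	}
	fmt.Println("missing programs:", missing)
}
```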
TestGuestPackages
Validate that the Google guest environment packages are installed.
- Background: Google-provided Linux OS images come with the Google guest environment
preinstalled. The guest environment enables many GCE features to function.
- Test logic: Validate that the guest environment packages are installed using the system
package manager.
Test suite: security
TestKernelSecuritySettings
Validate that sysctl tunables have the correct values.
- Background: Linux has a wide variety of kernel tunables exposed via the sysctl interface.
Supported GCE Images are built with some of these settings predefined for best
behavior in the GCE environment, for example
"net.ipv4.icmp_echo_ignore_broadcasts", which configures the kernel not to
respond to broadcast pings.
- Test logic: Read each sysctl option from the /proc/sys filesystem interface and confirm it
has the correct value.
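Reading a sysctl through /proc/sys can be sketched as follows: dots in the sysctl name map to path separators. The helper names are hypothetical, and the expected value of 1 for the example key is an assumption based on the description above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// sysctlPath maps a dotted sysctl name to its file under /proc/sys.
func sysctlPath(name string) string {
	return "/proc/sys/" + strings.ReplaceAll(name, ".", "/")
}

// readSysctl returns the trimmed value of the named sysctl.
func readSysctl(name string) (string, error) {
	raw, err := os.ReadFile(sysctlPath(name))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(raw)), nil
}

func main() {
	const name = "net.ipv4.icmp_echo_ignore_broadcasts"
	got, err := readSysctl(name)
	if err != nil {
		// Expected on systems without /proc/sys (for example, non-Linux hosts).
		fmt.Println("could not read", name+":", err)
		return
	}
	// "1" is the assumed expected value for this sketch.
	fmt.Printf("%s = %s (want %s)\n", name, got, "1")
}
```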
TestAutomaticUpdates
Validate automatic security updates are enabled on supported distributions.
- Background: Some Linux distributions provide a mechanism to automatically install package
updates that are marked as security updates. We enable these updates in supported GCE Images.
- Test logic: Confirm the relevant automatic updates package is installed, and that the
relevant configuration options are set in the configuration files.
TestPasswordSecurity
Validate security settings for SSHD and system accounts.
- Background: As part of the default configuration provided in supported GCE Images, certain
security settings are applied. These include ensuring that password-based
logins and root logins via SSH are disabled, and that system accounts have the
correct password and shell settings.
- Test logic: Read the SSHD configuration file and confirm it has the 'PasswordAuthentication
no' and 'PermitRootLogin no' directives set. Read the /etc/passwd file and
confirm all users have disabled passwords, and that 'system account' users
(those with UID < 1000) have the correct shell set (typically set to 'nologin'
or 'false').
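The SSHD half of this check could be sketched as below. It covers only the two directives named above and not the /etc/passwd checks; the function name is hypothetical. Keyword matching is case-insensitive and the first occurrence wins, which matches how sshd reads its configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// sshdDirective returns the value of the first uncommented occurrence of key
// in an sshd_config-style text. Hypothetical helper for illustration only.
func sshdDirective(config, key string) (string, bool) {
	for _, line := range strings.Split(config, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && strings.EqualFold(fields[0], key) {
			return fields[1], true
		}
	}
	return "", false
}

func main() {
	// Sample config text; a real test would read the SSHD configuration file.
	config := "#PasswordAuthentication yes\nPasswordAuthentication no\nPermitRootLogin no\n"
	for _, key := range []string{"PasswordAuthentication", "PermitRootLogin"} {
		value, found := sshdDirective(config, key)
		fmt.Printf("%s: found=%v value=%q ok=%v\n", key, found, value, found && value == "no")
	}
}
```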
Test suite: storageperf
This test suite verifies PD performance on Linux and Windows. The following documentation is relevant for working with these tests, as of January 2024.
Performance limits: https://cloud.google.com/compute/docs/disks/performance. In addition to machine type and vCPU performance limits, most disks have a performance limit per VM, as well as a performance limit per GB.
FIO command options: https://cloud.google.com/compute/docs/disks/benchmarking-pd-performance. To reach maximum IOPS and bandwidth MB per second, the disk needs to be warmed up with a "random write" fio task before running the benchmarking test.
Hyperdisk limits: https://cloud.google.com/compute/docs/disks/benchmark-hyperdisk-performance. Hyperdisk disk types have a much higher performance limit and limit per GB of disk size. To reach the highest performance values on Linux, some additional fio options may be required.
TestRandomReadIOPS and TestSequentialReadIOPS
Checks random and sequential read performance on files and compares it to an expected IOPS value
(in a future change, this will be compared to the documented IOPS value).
- Background: The public documentation for machine shapes and types lists certain values for
read IOPS. This test was designed to verify that the read IOPS which are attainable
are within a certain range (such as 97%) of the documented value.
- Test logic: FIO is downloaded based on the machine type and distribution. Next, the fio program
is run and the JSON output is returned. From the JSON output, we can get the read IOPS
value which was achieved, and check that it is above a certain threshold.
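Extracting read IOPS from fio's JSON output can be sketched as follows. The sketch relies on the jobs[].read.iops fields of fio's JSON format and sums across jobs; the function name, the sample figures, and the choice of 97% as the threshold (taken from the "such as 97%" example above) are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// fioOutput models only the parts of fio's JSON output needed here.
type fioOutput struct {
	Jobs []struct {
		Read struct {
			IOPS float64 `json:"iops"`
		} `json:"read"`
	} `json:"jobs"`
}

// readIOPS returns the total read IOPS across all jobs in raw fio JSON.
func readIOPS(raw string) (float64, error) {
	var out fioOutput
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return 0, err
	}
	if len(out.Jobs) == 0 {
		return 0, errors.New("fio output contains no jobs")
	}
	var total float64
	for _, job := range out.Jobs {
		total += job.Read.IOPS
	}
	return total, nil
}

func main() {
	// Sample output fragment and a hypothetical expected value.
	sample := `{"jobs":[{"jobname":"read_iops","read":{"iops":14800.0}}]}`
	const expected = 15000.0
	iops, err := readIOPS(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("read IOPS %.0f, meets threshold: %v\n", iops, iops >= 0.97*expected)
}
```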
TestRandomWriteIOPS and TestSequentialWriteIOPS
Checks random and sequential file write performance on a disk and compares it to an expected IOPS value
(in a future change, this will be compared to a documented IOPS value).
- Background: Similar to the read IOPS tests, we want to verify that write IOPS on disks work at
the rate we expect for both random and sequential writes.