Integration Testing
Integration tests are implemented as Kokoro builds that run on each PR. Each build first builds the Ops Agent and then runs the tests against that agent. The Kokoro builds are split up by distro.
Setup
You will need a GCP project to run VMs in. This is referred to as ${PROJECT} in the following instructions. The project needs sufficient quota to run many tests in parallel, as well as a firewall rule that allows SSH connections over port 22. Googlers are recommended to use our prebuilt testing project; ask a teammate (e.g. martijnvs@) for the project ID.
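If your project does not already have such a firewall rule, one way to create it is sketched below (this assumes the default network; the rule name allow-ssh is an arbitrary placeholder):

gcloud compute firewall-rules create allow-ssh --project="${PROJECT}" --allow=tcp:22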
You will also need a GCS bucket that is used to transfer files onto the testing VMs. This is referred to as ${TRANSFERS_BUCKET}. For Googlers, stackdriver-test-143416-untrusted-file-transfers is recommended.
You will need gcloud to be installed. Run gcloud auth login to set up gcloud authentication (if you haven't done that already).
To give the tests credentials to be able to access Google APIs as you, run the following command and do what it says (it may ask you to run a command on a separate machine if your main machine doesn't have the ability to open a browser window):
gcloud --billing-project="${PROJECT}" auth application-default login
Once these steps are complete, you should be able to run the below commands.
Ops Agent Test
This test exercises "core" features of the Ops Agent such as watching syslog or a custom log file. It is implemented in ops_agent_test.go. It can be run outside of Kokoro with some setup (see above).
Testing Command
When the setup steps are complete, you can run ops_agent_test from the Makefile:
make integration_tests PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET}
Alternatively, you can export PROJECT and TRANSFERS_BUCKET in your environment and simply call the target.
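For example (the project and bucket names below are placeholders):

export PROJECT=my-test-project               # placeholder GCP project ID
export TRANSFERS_BUCKET=my-transfers-bucket  # placeholder GCS bucket
make integration_tests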
You can also specify the ZONES and IMAGE_SPECS variables if you would like to run the tests on something other than the defaults (us-central1-b for ZONES and debian-cloud:debian-11 for IMAGE_SPECS).
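For example, a sketch that targets a different zone and image (both values are illustrative; image specs follow the same project:family format as the default):

make integration_tests PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET} \
  ZONES=us-east1-b IMAGE_SPECS=ubuntu-os-cloud:ubuntu-2204-lts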
The above commands run the tests against the stable Ops Agent. To test against a pre-built but unreleased agent, you can add the AGENT_PACKAGES_IN_GCS environment variable onto your command like this:
AGENT_PACKAGES_IN_GCS=gs://ops-agents-public-buckets-test-logs/prod/stackdriver_agents/testing/consumer/ops_agent/build/bullseye_x86_64/2677/20240711-200228/result \
  make integration_tests PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET}
You can obtain such a URI by:
- Take a previous Kokoro run with a successful build and go to the Invocation Details page. Get the value corresponding to the GCS key. For example: https://console.cloud.google.com/storage/browser/ops-agents-public-buckets-test-logs/prod/stackdriver_agents/testing/consumer/ops_agent/build/bullseye_x86_64/2677/20240711-200228
- Replace https://console.cloud.google.com/storage/browser/ at the beginning of the URL with gs://, put /result on the end, and pass that as AGENT_PACKAGES_IN_GCS.
Googlers can also provide a REPO_SUFFIX to test an agent built by our release scripts. When doing so, you may need to supply ARTIFACT_REGISTRY_REGION=us as well, once b/266410466 is completed.
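A sketch of such an invocation (the REPO_SUFFIX value here is a made-up example):

make integration_tests PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET} \
  REPO_SUFFIX=20240711-1 ARTIFACT_REGISTRY_REGION=us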
Third Party Apps Test
This test attempts to verify, for each application in supported_applications.txt, that the application can be installed on a real GCE VM and that a single representative metric is successfully uploaded to Google Cloud Monitoring.
Testing Command
The make target third_party_apps_test similarly requires PROJECT and TRANSFERS_BUCKET to be specified in the environment or on the command line.
make third_party_apps_test PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET}
As above, you can supply AGENT_PACKAGES_IN_GCS
or REPO_SUFFIX
to test a pre-built agent.
Additionally, to run only specific third-party applications, you can use a command like:
go test -v ./integration_test \
-tags=integration_test \
-test.run="TestThirdPartyApps/.*/(nvml|dcgm)"
Make sure the platform you specify is included in the IMAGE_SPECS environment variable.
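Putting it together, a sketch of such a run (this assumes the test binary reads PROJECT, TRANSFERS_BUCKET, and IMAGE_SPECS from the environment, as the Makefile targets do):

PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET} \
IMAGE_SPECS=debian-cloud:debian-11 \
go test -v ./integration_test \
  -tags=integration_test \
  -test.run="TestThirdPartyApps/.*/(nvml|dcgm)"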
Testing Flow
The test is designed to be highly parameterizable. It reads various files from third_party_apps_data and decides what to do based on their contents. First it reads metadata.yaml and uses that to set some testing options, such as which platforms to skip and whether the application supports Windows or Linux.
Each application is tested in parallel. For each, the test will:
- Bring up a GCE VM
- Install the application on the VM by running applications/<application>/<platform>/install on the VM
- Install the Ops Agent (built from the contents of the PR) on the VM
- Configure the Ops Agent to look for the application's logs/metrics by running applications/<application>/enable on the VM
- Run the applications/<application>/exercise script to send some load to the application, so that we can get it to generate some logs/metrics
- Wait for up to 7 minutes for logs matching the expectations in applications/<application>/expected_logs.yaml to appear in the Google Cloud Logging backend
- Wait up to 7 minutes for metrics matching the expectations in expected_metrics of applications/<application>/metadata.yaml to appear in the Google Cloud Monitoring backend
The test is designed so that simply modifying files in the third_party_apps_data directory is sufficient to get the test runner to do the right thing. But we do expect that we will need to make big changes to both the data directory and the test runner before it is really meeting our needs.
By default, the test will skip any applications that were not impacted by the currently modified set of files. However, if the modified files are unrelated to any apps, it assumes that all apps are impacted.
Adding a new third-party application
You will need to add and modify a few files. Start by adding your new application to agent/<linux_or_windows>/supported_applications.txt. Then, inside applications/<application>/, add:
- <platform>/install to install the application
- enable to configure the Ops Agent to read the application's metrics exposed in the previous step
- (if necessary) exercise. This is only needed sometimes, e.g. to get the application to log to a particular file.
- Inside metadata.yaml, add short_name (e.g. solr) and long_name (e.g. Apache Solr).
- Some integrations will have steps for configuring the instance, e.g. Apache Hadoop.
- (if you want to test logging) expected_logs in metadata.yaml
- (if you want to test metrics) expected_metrics in metadata.yaml
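For a hypothetical application myapp, the resulting layout might look like this (myapp is a placeholder; the actual <platform> directory names follow whatever the existing applications in the repo use):

applications/myapp/
├── <platform>/
│   └── install
├── enable
├── exercise        (only if needed)
└── metadata.yaml   (short_name, long_name, expected_logs, expected_metrics)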
expected_logs
We use the expected_logs section inside the metadata.yaml file both as a test artifact and as a source for documentation, e.g. the Apache (httpd) public doc. All logs ingested by the integration should be documented here.
A sample expected_logs snippet looks like:
expected_logs:
- log_name: apache_access
fields:
- name: httpRequest.requestMethod
value_regex: GET
type: string
description: HTTP method
- name: jsonPayload.host
type: string
description: Contents of the Host header
- name: jsonPayload.user
type: string
description: Authenticated username for the request
- name: required; it will be used in e2e searching for the matching logs
- type: required, informational
- description: required, informational
- value_regex: optional; the value of the LogEntry field will be used in e2e searching for the matching logs
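For intuition, a LogEntry that would satisfy the apache_access expectations above might look roughly like this (an abbreviated sketch; the field values are invented):

httpRequest:
  requestMethod: GET        # matches the value_regex "GET"
jsonPayload:
  host: www.example.com     # no value_regex given, so any string matches
  user: alice
logName: projects/${PROJECT}/logs/apache_access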
expected_metrics
We use the expected_metrics section inside the metadata.yaml file both as a test artifact and as a source for documentation. All metrics ingested by the integration should be documented here.
A sample expected_metrics snippet looks like:
expected_metrics:
- type: workload.googleapis.com/apache.current_connections
value_type: INT64
kind: GAUGE
monitored_resource: gce_instance
labels:
server_name: .*
representative: true
type, value_type, and kind come directly from the metric descriptor for that metric. monitored_resource should always be gce_instance.
labels is an exhaustive list of labels associated with the metric. Each key in labels is a label name, and its value is a regular expression. During the test, each label returned by the time series for that metric is checked against labels: every label in the time series must be present in labels, and its value must match the regular expression.
For example, if a metric defines a label operation whose values can only be read or write, then an appropriate labels map in expected_metrics would be as follows:
labels:
operation: read|write
Exactly one metric from each integration's expected_metrics must have representative: true. This metric can be used to detect when the integration is enabled. A representative metric cannot be optional.
With optional: true, the metric will be skipped during the test. This can be useful for metrics that are not guaranteed to be present during the test, for example due to platform differences or unimplemented test setup procedures. An optional metric cannot be representative.
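A sketch combining the two flags (the metric names here are invented for illustration; the structure follows the sample above):

expected_metrics:
- type: workload.googleapis.com/myapp.connections   # hypothetical metric
  value_type: INT64
  kind: GAUGE
  monitored_resource: gce_instance
  representative: true   # exactly one per integration; cannot be optional
- type: workload.googleapis.com/myapp.cache_hits    # hypothetical metric
  value_type: INT64
  kind: CUMULATIVE
  monitored_resource: gce_instance
  optional: true         # skipped during the test; cannot be representative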
expected_metrics can be generated or updated using generate_expected_metrics.go:
PROJECT="${PROJECT}" \
SCRIPTS_DIR=third_party_apps_data \
go run ./cmd/generate_expected_metrics
This queries all metric descriptors under workload.googleapis.com/, agent.googleapis.com/iis/, and agent.googleapis.com/mssql/. The optional variable FILTER is also provided to make it quicker to test individual integrations. For example:
PROJECT="${PROJECT}" \
SCRIPTS_DIR=third_party_apps_data \
FILTER='metric.type=starts_with("workload.googleapis.com/apache")' \
go run ./cmd/generate_expected_metrics
Existing expected_metrics files are updated with any new metrics that are retrieved. Any existing metrics within the file will be overwritten with newly retrieved ones, except that existing labels patterns are preserved.
Test Logs
The Kokoro presubmit will have a "Details" link next to it. Clicking there will take you to a publicly-visible GCS bucket that contains various log files. It's a little tricky to figure out which one(s) to look at first, so here's a guide for that.
TLDR: start in sponge_log.xml to see what failed, then drill down to the corresponding main_log.txt from there.
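For example, once the logs are downloaded locally, one quick way to spot failing test cases (a sketch; the gs:// path is whatever the "Details" link points at, shown here as placeholders):

gsutil -m cp -r gs://<bucket>/<run>/logs .   # <bucket>/<run> come from the "Details" link
grep -B1 '<failure' logs/sponge_log.xml      # show failing test cases with their names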
Here are the full contents uploaded to the GCS bucket for a single test run. The "Details" link takes you directly to the "logs" subdirectory to save you a hop.
└── logs
├── sponge_log.xml
└── TestThirdPartyApps_debian-cloud:debian-11_jetty
├── VM_initialization.txt
├── config.yaml.txt
├── fluent_bit_main.conf.txt
├── fluent_bit_metrics.txt
├── fluent_bit_parser.conf.txt
├── health-checks.log.txt
├── journalctl_output.txt
├── logging-module.log.txt
├── main_log.txt
├── metrics-module.log.txt
├── nvidia-installer.log.txt
├── otel.yaml.txt
├── otel_metrics.txt
├── syslog.txt
└── systemctl_status_for_ops_agent.txt
Let's go through each of these files and discuss what they are.
TODO: Document log files for a Windows VM.
- sponge_log.xml: Structured data about which tests passed/failed, but not very human readable.
- main_log.txt: The main log for the particular test shard (e.g. TestThirdPartyApps_debian-cloud:debian-11_jetty) that ran. This is the place to start if you are wondering what happened to a particular shard.
- syslog.txt: The system's /var/log/{syslog,messages}. Highly useful. OTel collector logs can be found here by searching for otelopscol.
- logging-module.log.txt: The Fluent-Bit log file.
- journalctl_output.txt: The output of running journalctl -xe. Useful when the Ops Agent can't start/restart properly, often due to malformed config files.
- otel.yaml.txt: The generated config file used to start the OTel collector.
- VM_initialization.txt: Only useful to look at when we can't bring up a fresh VM properly.
- fluent_bit_main.conf.txt, fluent_bit_parser.conf.txt: Fluent-Bit config files.
Vendored Dependencies
Due to being throttled by some sites, notably archive.apache.org, we keep a local copy of various large installers instead of downloading them fresh each time. These are stored in https://console.cloud.google.com/storage/browser/ops-agents-public-buckets-vendored-deps/mirrored-content, and the script mirror_content.sh is intended to help upload installers there. Run it like this (using Cassandra as an example):
./mirror_content.sh https://archive.apache.org/dist/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz
And then change the install script(s) for Cassandra to download from
https://storage.googleapis.com/ops-agents-public-buckets-vendored-deps/mirrored-content/archive.apache.org/dist/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz
instead of the original URL.
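For instance, the fetch line in an install script might change like this (a hypothetical excerpt; the actual scripts may use a different download command):

# Before: download directly from the upstream site.
curl -fLO https://archive.apache.org/dist/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz
# After: download from the vendored-deps mirror instead.
curl -fLO https://storage.googleapis.com/ops-agents-public-buckets-vendored-deps/mirrored-content/archive.apache.org/dist/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz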