Integration Testing
Integration tests are implemented as Kokoro builds that run on each PR. Each build first builds the Ops Agent and then runs the tests against that agent. The Kokoro builds are split up by distro.
Setup
You will need a GCP project to run VMs in. This is referred to as ${PROJECT} in the following instructions. The project needs sufficient quota to run many tests in parallel, as well as a firewall rule that allows SSH connections over port 22. Googlers are recommended to use our prebuilt testing project; ask a teammate (e.g. martijnvs@) for the project ID.
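If your project does not already have such a firewall rule, one way to create it is sketched below (this assumes the default network; the rule name allow-ssh is an arbitrary placeholder):

gcloud compute firewall-rules create allow-ssh --project="${PROJECT}" --allow=tcp:22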
You will also need a GCS bucket that is used to transfer files onto the testing VMs. This is referred to as ${TRANSFERS_BUCKET}. For Googlers, stackdriver-test-143416-untrusted-file-transfers is recommended.
You will need gcloud to be installed. Run gcloud auth login to set up gcloud authentication (if you haven't done that already).
To give the tests credentials to be able to access Google APIs as you, run the following command and do what it says (it may ask you to run a command on a separate machine if your main machine doesn't have the ability to open a browser window):
gcloud --billing-project="${PROJECT}" auth application-default login
Once these steps are complete, you should be able to run the below commands.
Ops Agent Test
This test exercises "core" features of the Ops Agent such as watching syslog or a custom log file. It is implemented in ops_agent_test.go. It can be run outside of Kokoro with some setup (see above).
Testing Command
When the setup steps are complete, you can run ops_agent_test from the Makefile:
make integration_tests PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET}
Alternatively, you can export PROJECT and TRANSFERS_BUCKET in your environment and simply call the target.
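For example (the project and bucket names below are placeholders):

export PROJECT=my-test-project               # placeholder GCP project ID
export TRANSFERS_BUCKET=my-transfers-bucket  # placeholder GCS bucket
make integration_tests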
You can also specify the ZONES and IMAGE_SPECS variables if you would like to run the tests on something other than the defaults (us-central1-b for ZONES and debian-cloud:debian-11 for IMAGE_SPECS).
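For example, a sketch that targets a different zone and image (both values are illustrative; image specs follow the same project:family format as the default):

make integration_tests PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET} \
  ZONES=us-east1-b IMAGE_SPECS=ubuntu-os-cloud:ubuntu-2204-lts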
The above commands run the tests against the stable Ops Agent. To test against a pre-built but unreleased agent, you can add the AGENT_PACKAGES_IN_GCS environment variable onto your command like this:
AGENT_PACKAGES_IN_GCS=gs://ops-agents-public-buckets-test-logs/prod/stackdriver_agents/testing/consumer/ops_agent/build/bullseye_x86_64/2677/20240711-200228/result \
  make integration_tests PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET}
You can obtain such a URI by:
- Take a previous Kokoro run with a successful build and go to the Invocation Details page. Get the value corresponding to the GCS key. For example: https://console.cloud.google.com/storage/browser/ops-agents-public-buckets-test-logs/prod/stackdriver_agents/testing/consumer/ops_agent/build/bullseye_x86_64/2677/20240711-200228
- Replace https://console.cloud.google.com/storage/browser/ at the beginning of the URL with gs://, put /result on the end, and pass that as AGENT_PACKAGES_IN_GCS.
Googlers can also provide a REPO_SUFFIX to test an agent built by our release scripts. When doing so, you may need to supply ARTIFACT_REGISTRY_REGION=us as well, once b/266410466 is completed.
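A sketch of such an invocation (the REPO_SUFFIX value here is a made-up example):

make integration_tests PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET} \
  REPO_SUFFIX=20240711-1 ARTIFACT_REGISTRY_REGION=us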
Third Party Apps Test
This test attempts to verify, for each application in supported_applications.txt, that the application can be installed on a real GCE VM and that a single representative metric is successfully uploaded to Google Cloud Monitoring.
Testing Command
The make target third_party_apps_test similarly requires PROJECT and TRANSFERS_BUCKET to be specified in the environment or on the command line.
make third_party_apps_test PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET}
As above, you can supply AGENT_PACKAGES_IN_GCS
or REPO_SUFFIX
to test a pre-built agent.
Additionally, to run only specific third-party applications, you can use a command like:
go test -v ./integration_test \
-tags=integration_test \
-test.run="TestThirdPartyApps/.*/(nvml|dcgm)"
Make sure the platform you specify is included in the IMAGE_SPECS environment variable.
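Putting it together, a sketch of such a run (this assumes the test binary reads PROJECT, TRANSFERS_BUCKET, and IMAGE_SPECS from the environment, as the Makefile targets do):

PROJECT=${PROJECT} TRANSFERS_BUCKET=${TRANSFERS_BUCKET} \
IMAGE_SPECS=debian-cloud:debian-11 \
go test -v ./integration_test \
  -tags=integration_test \
  -test.run="TestThirdPartyApps/.*/(nvml|dcgm)"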
Testing Flow
The test is designed to be highly parameterizable. It reads various files from third_party_apps_data and decides what to do based on their contents. First it reads metadata.yaml and uses that to set some testing options, such as which platforms to skip and whether the application supports Windows or Linux.
Each application is tested in parallel. For each, the test will:
- Bring up a GCE VM
- Install the application on the VM by running applications/<application>/<platform>/install on the VM
- Install the Ops Agent (built from the contents of the PR) on the VM
- Configure the Ops Agent to look for the application's logs/metrics by running applications/<application>/enable on the VM
- Run the applications/<application>/exercise script to send some load to the application, so that we can get it to generate some logs/metrics
- Wait for up to 7 minutes for logs matching the expectations in applications/<application>/expected_logs.yaml to appear in the Google Cloud Logging backend
- Wait up to 7 minutes for metrics matching the expectations in expected_metrics of applications/<application>/metadata.yaml to appear in the Google Cloud Monitoring backend
The test is designed so that simply modifying files in the third_party_apps_data directory is sufficient to get the test runner to do the right thing. But we do expect that we will need to make big changes to both the data directory and the test runner before it is really meeting our needs.
By default, the test will skip any applications that were not impacted by the currently modified set of files. However, if the modified files are unrelated to any apps, it assumes that all apps are impacted.
Adding a new third-party application
You will need to add and modify a few files. Start by adding your new application to agent/<linux_or_windows>/supported_applications.txt. Then, inside applications/<application>/, add:
- <platform>/install to install the application
- enable to configure the Ops Agent to read the application's metrics exposed in the previous step
- (if necessary) exercise. This is only needed sometimes, e.g. to get the application to log to a particular file.
- Inside metadata.yaml, add short_name (e.g. solr) and long_name (e.g. Apache Solr).
- Some integrations will have steps for configuring the instance, e.g. Apache Hadoop.
- (if you want to test logging) expected_logs in metadata.yaml
- (if you want to test metrics) expected_metrics in metadata.yaml
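For a hypothetical application myapp, the resulting layout might look like this (myapp is a placeholder; the actual <platform> directory names follow whatever the existing applications in the repo use):

applications/myapp/
├── <platform>/
│   └── install
├── enable
├── exercise        (only if needed)
└── metadata.yaml   (short_name, long_name, expected_logs, expected_metrics)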
expected_logs
We use the expected_logs section inside the metadata.yaml file both as a test artifact and as a source for documentation, e.g. the Apache (httpd) public doc. All logs ingested by the integration should be documented here.
A sample expected_logs snippet looks like:
expected_logs:
- log_name: apache_access
fields:
- name: httpRequest.requestMethod
value_regex: GET
type: string
description: HTTP method
- name: jsonPayload.host
type: string
description: Contents of the Host header
- name: jsonPayload.user
type: string
description: Authenticated username for the request
- name: required; it will be used in e2e searching for the matching logs
- type: required, informational
- description: required, informational
- value_regex: optional; the value of the LogEntry field will be used in e2e searching for the matching logs
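For intuition, a LogEntry that would satisfy the apache_access expectations above might look roughly like this (an abbreviated sketch; the field values are invented):

httpRequest:
  requestMethod: GET        # matches the value_regex "GET"
jsonPayload:
  host: www.example.com     # no value_regex given, so any string matches
  user: alice
logName: projects/${PROJECT}/logs/apache_access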
expected_metrics
We use the expected_metrics section inside the metadata.yaml file both as a test artifact and as a source for documentation. All metrics ingested by the integration should be documented here.
A sample expected_metrics snippet looks like:
expected_metrics:
- type: workload.googleapis.com/apache.current_connections
value_type: INT64
kind: GAUGE
monitored_resource: gce_instance
labels:
server_name: .*
representative: true
type, value_type, and kind come directly from the metric descriptor for that metric. monitored_resource should always be gce_instance.
labels is an exhaustive list of labels associated with the metric. Each key in labels is a label name, and its value is a regular expression. During the test, each label returned by the time series for that metric is checked against labels: every label in the time series must be present in labels, and its value must match the regular expression.
For example, if a metric defines a label operation whose values can only be read or write, then an appropriate labels map in expected_metrics would be as follows:
labels:
operation: read|write
Exactly one metric from each integration's expected_metrics must have representative: true. This metric can be used to detect when the integration is enabled. A representative metric cannot be optional.
With optional: true, the metric will be skipped during the test. This can be useful for metrics that are not guaranteed to be present during the test, for example due to platform differences or unimplemented test setup procedures. An optional metric cannot be representative.
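A sketch combining the two flags (the metric names here are invented for illustration; the structure follows the sample above):

expected_metrics:
- type: workload.googleapis.com/myapp.connections   # hypothetical metric
  value_type: INT64
  kind: GAUGE
  monitored_resource: gce_instance
  representative: true   # exactly one per integration; cannot be optional
- type: workload.googleapis.com/myapp.cache_hits    # hypothetical metric
  value_type: INT64
  kind: CUMULATIVE
  monitored_resource: gce_instance
  optional: true         # skipped during the test; cannot be representative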
expected_metrics can be generated or updated using generate_expected_metrics.go:
PROJECT="${PROJECT}" \
SCRIPTS_DIR=third_party_apps_data \
go run ./cmd/generate_expected_metrics
This queries all metric descriptors under workload.googleapis.com/, agent.googleapis.com/iis/, and agent.googleapis.com/mssql/. The optional variable FILTER is also provided to make it quicker to test individual integrations. For example:
PROJECT="${PROJECT}" \
SCRIPTS_DIR=third_party_apps_data \
FILTER='metric.type=starts_with("workload.googleapis.com/apache")' \
go run ./cmd/generate_expected_metrics
Existing expected_metrics files are updated with any new metrics that are retrieved. Any existing metrics within the file will be overwritten with newly retrieved ones, except that existing labels patterns are preserved.
Test Logs
The Kokoro presubmit will have a "Details" link next to it. Clicking there will take you to a publicly-visible GCS bucket that contains various log files. It's a little tricky to figure out which one(s) to look at first, so here's a guide for that.
TLDR: start in sponge_log.xml to see what failed, then drill down to the corresponding main_log.txt from there.
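For example, once the logs are downloaded locally, one quick way to spot failing test cases (a sketch; the gs:// path is whatever the "Details" link points at, shown here as placeholders):

gsutil -m cp -r gs://<bucket>/<run>/logs .   # <bucket>/<run> come from the "Details" link
grep -B1 '<failure' logs/sponge_log.xml      # show failing test cases with their names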
Here are the full contents uploaded to the GCS bucket for a single test run. The "Details" link takes you directly to the "logs" subdirectory to save you a hop.
└── logs
├── sponge_log.xml
└── TestThirdPartyApps_debian-cloud:debian-11_jetty
├── VM_initialization.txt
├── config.yaml.txt
├── fluent_bit_main.conf.txt
├── fluent_bit_metrics.txt
├── fluent_bit_parser.conf.txt
├── health-checks.log.txt
├── journalctl_output.txt
├── logging-module.log.txt
├── main_log.txt
├── metrics-module.log.txt
├── nvidia-installer.log.txt
├── otel.yaml.txt
├── otel_metrics.txt
├── syslog.txt
└── systemctl_status_for_ops_agent.txt
Let's go through each of these files and discuss what they are.
TODO: Document log files for a Windows VM.
- sponge_log.xml: Structured data about which tests passed/failed, but not very human readable.
- main_log.txt: The main log for the particular test shard (e.g. TestThirdPartyApps_debian-cloud:debian-11_jetty) that ran. This is the place to start if you are wondering what happened to a particular shard.
- syslog.txt: The system's /var/log/{syslog,messages}. Highly useful. OTel collector logs can be found here by searching for otelopscol.
- logging-module.log.txt: The Fluent-Bit log file.
- journalctl_output.txt: The output of running journalctl -xe. Useful when the Ops Agent can't start/restart properly, often due to malformed config files.
- otel.yaml.txt: The generated config file used to start the OTel collector.
- VM_initialization.txt: Only useful to look at when we can't bring up a fresh VM properly.
- fluent_bit_main.conf.txt, fluent_bit_parser.conf.txt: Fluent-Bit config files.
Vendored Dependencies
Due to being throttled by some sites, notably archive.apache.org, we keep a local copy of various large installers instead of downloading them fresh each time. These are stored in https://console.cloud.google.com/storage/browser/ops-agents-public-buckets-vendored-deps/mirrored-content, and the script mirror_content.sh is intended to help upload installers there. Run it like this (using Cassandra as an example):
./mirror_content.sh https://archive.apache.org/dist/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz
And then change the install script(s) for Cassandra to download from
https://storage.googleapis.com/ops-agents-public-buckets-vendored-deps/mirrored-content/archive.apache.org/dist/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz
instead of the original URL.
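For instance, the fetch line in an install script might change like this (a hypothetical excerpt; the actual scripts may use a different download command):

# Before: download directly from the upstream site.
curl -fLO https://archive.apache.org/dist/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz
# After: download from the vendored-deps mirror instead.
curl -fLO https://storage.googleapis.com/ops-agents-public-buckets-vendored-deps/mirrored-content/archive.apache.org/dist/cassandra/4.0.1/apache-cassandra-4.0.1-bin.tar.gz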