README
OpenTelemetry Protocol with Apache Arrow
The OpenTelemetry Protocol with Apache Arrow project is an effort within OpenTelemetry to use Apache Arrow libraries for bulk data transport in OpenTelemetry collection pipelines. This repository is the home of the OpenTelemetry Protocol with Apache Arrow protocol and reference implementation.
Quick start
Instructions for building an OpenTelemetry Collector with the modules in this repository are provided in BUILDING.md.
Examples for running the OpenTelemetry Collector with the modules in this repository are documented in collector/examples.
Overview
OpenTelemetry and Apache Arrow have similar charters, so it was natural to think about combining them. Both projects offer vendor-neutral interfaces with a cross-language interface specification, so that their implementation will feel familiar to users as they move between programming languages, and both specify a data model that is used throughout the project.
The OpenTelemetry project defines OTLP, the "OpenTeLemetry Protocol", as the standard form of telemetry data in OpenTelemetry, designed to be as similar as possible to the data model underlying the project. OTLP is defined in terms of Google protocol buffer definitions.
OTLP is a stateless protocol, where export requests map directly into the data model, nothing is omitted, and little is shared. OTLP export requests do not contain external or internal references, making the data relatively simple and easy to interpret. Because of this design, users of OTLP will typically configure network compression. In environments where telemetry data will be shipped to a service provider across a wide-area network, users would like more compression than can be achieved using a row-based data model and a stateless protocol.
Project goals
This project is organized in phases. Our initial aim is to facilitate traffic reduction between a pair of OpenTelemetry collectors as illustrated in the following diagram.
The collector provided in this repository implements a new Arrow Receiver and Exporter able to fall back to standard OTLP when needed. The following diagram is an overview of this integration. In this first phase, the internal representation of the telemetry data is still fundamentally row-oriented.
Ultimately, we believe that an end-to-end OpenTelemetry Protocol with Apache Arrow pipeline will enable telemetry pipelines with substantially lower overhead to be built. These are our future milestones for OpenTelemetry and Apache Arrow integration:
- Extend OpenTelemetry client SDKs to natively support the OpenTelemetry Protocol with Apache Arrow Protocol
- Extend the OpenTelemetry collector with direct support for OpenTelemetry Protocol with Apache Arrow pipelines
- Extend the OpenTelemetry data model with native support for multivariate metrics
- Output OpenTelemetry data to the Parquet file format, part of the Apache Arrow ecosystem
Improve network-level compression with OpenTelemetry Protocol with Apache Arrow
The first general-purpose application for the project is traffic reduction. At a high level, this protocol performs the following steps to compactly encode and transmit telemetry using Apache Arrow.
- Separate the OpenTelemetry Resource and Scope elements from the hierarchy, then encode and transmit each distinct entity once per stream lifetime.
- Calculate distinct attribute sets used by Resources, Scopes, Metrics, Logs, Spans, Span Events, and Span Links, then encode and transmit each distinct entity once per stream lifetime.
- Use Apache Arrow's built-in support for encoding dictionaries and leverage other purpose-built low-level facilities, such as delta-dictionaries and sorting, to encode structures compactly.
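The "encode and transmit each distinct entity once per stream lifetime" idea in the steps above can be illustrated with a small, stream-scoped dictionary. This is only a sketch under stated assumptions: the `attrs`, `streamDict`, and `canonicalKey` names are hypothetical, it uses plain Go maps rather than the Apache Arrow libraries, and it is not the encoding used by the reference implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// attrs is a simplified stand-in for an OpenTelemetry attribute set
// (hypothetical type; real attributes are typed, not just strings).
type attrs map[string]string

// canonicalKey renders an attribute set in sorted key order so that equal
// sets always produce the same dictionary key, whatever the map order.
func canonicalKey(a attrs) string {
	keys := make([]string, 0, len(a))
	for k := range a {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var sb strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&sb, "%q=%q;", k, a[k])
	}
	return sb.String()
}

// streamDict assigns a small integer ID to each distinct attribute set
// seen during the lifetime of one stream (hypothetical, for illustration).
type streamDict struct {
	ids map[string]uint32
}

func newStreamDict() *streamDict {
	return &streamDict{ids: make(map[string]uint32)}
}

// encode returns the ID for a, and reports whether the full attribute set
// still has to be transmitted (true only the first time the set is seen).
func (d *streamDict) encode(a attrs) (uint32, bool) {
	key := canonicalKey(a)
	if existing, ok := d.ids[key]; ok {
		return existing, false
	}
	id := uint32(len(d.ids))
	d.ids[key] = id
	return id, true
}

func main() {
	dict := newStreamDict()
	batch := []attrs{
		{"service.name": "checkout", "host.name": "a"},
		{"service.name": "checkout", "host.name": "a"},
		{"service.name": "cart", "host.name": "b"},
		{"host.name": "a", "service.name": "checkout"},
	}
	for _, a := range batch {
		id, isNew := dict.encode(a)
		fmt.Printf("id=%d sendFullSet=%v\n", id, isNew)
	}
}
```

Because the dictionary lives as long as the stream, a repeated attribute set costs only its integer ID after the first transmission. This is also why a stateful protocol can compress better than a stateless one, where every request must carry the full set again.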
Here is a diagram showing how the protocol transforms OTLP Log Records into column-oriented data, which also makes the data more compressible.
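As a rough textual counterpart to that transformation, the following sketch transposes a slice of row-oriented log records into per-field columns and dictionary-encodes the low-cardinality severity field. The `logRecord` and `logColumns` types are hypothetical simplifications, and plain Go slices stand in for Arrow arrays; the actual Arrow schemas used by the project are described in its data model documentation.

```go
package main

import "fmt"

// logRecord is a simplified, row-oriented log record (hypothetical type;
// real OTLP log records carry many more fields).
type logRecord struct {
	TimeUnixNano uint64
	Severity     string
	Body         string
}

// logColumns holds the same data column by column. Severity is
// dictionary-encoded: each row stores an index into SeverityDict.
type logColumns struct {
	TimeUnixNano []uint64
	SeverityDict []string
	SeverityIdx  []uint32
	Body         []string
}

// toColumns transposes rows into columns, building the severity
// dictionary in order of first appearance.
func toColumns(rows []logRecord) logColumns {
	cols := logColumns{}
	dictIndex := make(map[string]uint32)
	for _, r := range rows {
		idx, ok := dictIndex[r.Severity]
		if !ok {
			idx = uint32(len(cols.SeverityDict))
			dictIndex[r.Severity] = idx
			cols.SeverityDict = append(cols.SeverityDict, r.Severity)
		}
		cols.TimeUnixNano = append(cols.TimeUnixNano, r.TimeUnixNano)
		cols.SeverityIdx = append(cols.SeverityIdx, idx)
		cols.Body = append(cols.Body, r.Body)
	}
	return cols
}

func main() {
	rows := []logRecord{
		{TimeUnixNano: 1000, Severity: "INFO", Body: "started"},
		{TimeUnixNano: 1005, Severity: "ERROR", Body: "failed"},
		{TimeUnixNano: 1009, Severity: "INFO", Body: "retrying"},
	}
	cols := toColumns(rows)
	fmt.Printf("time: %v\n", cols.TimeUnixNano)
	fmt.Printf("severity dict: %v idx: %v\n", cols.SeverityDict, cols.SeverityIdx)
	fmt.Printf("body: %v\n", cols.Body)
}
```

Grouping values of the same field together places similar bytes next to each other, which is what makes the columnar form friendlier to a general-purpose compressor.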
Project status
The first phase of the project has entered the Beta stability level, as defined by the OpenTelemetry collector guidelines. We do not plan to make breaking changes in this protocol without first engineering an approach that ensures forward and backward compatibility for existing and new users. We believe it is safe to begin using these components for production data in non-critical workloads.
Project deliverables
We are pleased to release two new collector components, presently housed in this repository.
We are working with the maintainers of the OpenTelemetry Collector-Contrib to merge these components into that repository. See our tracking issue.
The OpenTelemetry Protocol with Apache Arrow exporter and receiver components are drop-in compatible with the core collector's OTLP exporter and receiver components. Users with an established OTLP collection pipeline between two OpenTelemetry Collectors can re-build their collectors with otelarrow components, then simply replace the component name otlp with otelarrow. The exporter and receiver both support falling back to standard OTLP in case either side does not recognize the protocol, so the upgrade should be painless. The OpenTelemetry Protocol with Apache Arrow receiver serves both OpenTelemetry Protocol with Apache Arrow and OTLP on the standard port for OTLP gRPC (4317).
See the Exporter and Receiver documentation for details and sample configurations.
Project documentation
This package is a reference implementation of the OpenTelemetry Protocol with Apache Arrow protocol specified in this OTEP, which is currently the best source of information about OpenTelemetry Protocol with Apache Arrow. The Donation request describes how the project began.
Here are several more resources that are available to learn more about OpenTelemetry Protocol with Apache Arrow.
- Arrow Data Model - Mapping OTLP entities to Arrow Schemas.
- Benchmark results - Based on synthetic and production data.
- Validation process - Encoding/Decoding validation process.
- Articles describing some of the Arrow techniques used behind the scenes to optimize compression ratio and memory usage:
Benchmark summary
The following chart shows the compressed message size (in bytes) as a function of the batch size for metrics (univariate), logs, and traces. The bottom of the chart shows the reduction factor for both the standard OTLP protocol (with ZSTD compression) and the OpenTelemetry Protocol with Apache Arrow protocol (ZSTD) in comparison with an uncompressed OTLP protocol.
The next chart follows the same logic but shows the results for multivariate metrics (see left column).
The following heatmap shows, for different combinations of batch size and connection duration (expressed as the number of batches per stream), the additional compression gain, as a percentage, of this new protocol over OTLP, with both compressed using ZSTD. The data used here comes from span traffic captured in a production environment. The gains are substantial in most cases. Notably, the gains over OTLP+ZSTD are more significant for moderate-sized batches (e.g., 100 and 1000 spans per batch), which makes this protocol attractive for scenarios where the additional latency introduced by batching must be minimized. It is mostly with micro-batches (e.g., 10 spans per batch) that the overhead of the Arrow schema becomes prohibitive and the advantage of a columnar representation becomes negligible. In other cases, this initial overhead is offset after just the first few batches. The columnar organization also lends itself better to compression. For very large batch sizes, ZSTD does an excellent job as long as the compression window is sufficiently large, but even in this case, the new protocol remains superior. As previously mentioned, these compression gains can be higher for traffic predominantly containing multivariate metrics.
For more details, see the following benchmark results page.
Developers
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. For more information, please read CONTRIBUTING.md.
License
OpenTelemetry Protocol with Apache Arrow Protocol Adapter is licensed under Apache 2.0.
Directories
Path | Synopsis |
---|---|
api | |
experimental/arrow/v1/mock | Package mock is a generated GoMock package. |
collector (module) | |
cmd/otelarrowcol (module) | |
examples/printer (module) | |
exporter/fileexporter (module) | |
exporter/otelarrowexporter (module) | |
receiver/filereceiver (module) | |
receiver/otelarrowreceiver (module) | |
pkg | |
arrow | Package arrow provides a set of utility functions to access Arrow data structures. |
benchmark | Package benchmark is a framework for benchmarking the performance of the OTLP protocol vs the OTLP Arrow protocol. |
benchmark/dataset | Package dataset defines the concept of dataset used in this benchmarking framework. |
benchmark/profileable | Package profileable defines the different protocols that can be profiled. |
benchmark/profileable/arrow | Package arrow implements the Profile interface for the OTLP Arrow protocol. |
benchmark/profileable/otlp | Package otlp implements the Profile interface for the OTLP protocol. |
datagen | Package datagen is a basic framework for generating fake telemetry data for benchmarking. |
otel | Package otel provides a set of functions to convert OTLP entities to OTLP Arrow entities and vice versa. |
otel/arrow_record | Package arrow_record contains the consumer and producer for OTLP Arrow protocol. |
otel/arrow_record/mock | Package mock is a generated GoMock package. |
otel/assert | Package assert provides a set of helper functions to assert conditions in OTLP Arrow tests. |
otel/common | Package common defines the common types and functions used by the packages logs, metrics and traces. |
otel/common/arrow | Package arrow contains common types and functions used to convert OTLP entities into their Arrow representation. |
otel/common/otlp | Package otlp contains common types and functions used to convert OTLP Arrow entities into their OTLP representation. |
otel/constants | Package constants defines the constants used in the sibling packages. |
otel/logs | Package logs provides functions to convert OTLP logs to OTLP Arrow logs and vice versa. |
otel/logs/arrow | Package arrow contains types and functions used to convert OTLP logs into their Arrow representation. |
otel/logs/otlp | Package otlp contains types and functions used to convert OTLP Arrow logs into their OTLP representation. |
otel/metrics | Package metrics provides functions to convert OTLP metrics to OTLP Arrow metrics and vice versa. |
otel/metrics/arrow | Package arrow contains types and functions used to convert OTLP metrics into their Arrow representation. |
otel/metrics/otlp | Package otlp contains types and functions used to convert OTLP Arrow metrics into their OTLP representation. |
otel/traces | Package traces provides functions to convert OTLP traces to OTLP Arrow traces and vice versa. |
otel/traces/arrow | Package arrow contains types and functions used to convert OTLP traces into their Arrow representation. |
otel/traces/otlp | Package otlp contains types and functions used to convert OTLP Arrow traces into their OTLP representation. |
 | Package tools contains tools used to benchmark OTLP, OTLP Arrow protocols, and to generate fake data. |
logs_benchmark | Package main contains a CLI tool for benchmarking the logs between OTLP and OTLP Arrow protocols. |
logs_gen | Package main contains a CLI tool for generating fake OTLP logs. |
metrics_benchmark | Package main contains a CLI tool for benchmarking the metrics between OTLP and OTLP Arrow protocols. |
metrics_gen | Package main contains a CLI tool for generating fake OTLP metrics. |
trace_benchmark | Package main contains a CLI tool for benchmarking the traces between OTLP and OTLP Arrow protocols. |
trace_gen | Package main contains a CLI tool for generating fake OTLP traces. |
trace_head | Package main contains a CLI tool used to extract the first n spans from a trace file (protobuf file). |
trace_verify | |