OpenTelemetry Protocol with Apache Arrow Exporter
Exports telemetry data using OpenTelemetry Protocol with Apache
Arrow components, with support for both OpenTelemetry Protocol with
Apache Arrow and the standard OpenTelemetry Protocol (OTLP) via gRPC.
Getting Started
The OpenTelemetry Protocol with Apache
Arrow exporter combines
the features and configuration syntax of the core OpenTelemetry
Collector OTLP
exporter
component with additional support for the OpenTelemetry Protocol with
Apache Arrow.
OpenTelemetry Protocol with Apache Arrow supports column-oriented data
transport using the Apache Arrow data format. This component converts
OTLP data into an optimized representation and then sends batches of
data using Apache Arrow to encode the stream. The OpenTelemetry
Protocol with Apache Arrow receiver component contains logic to reverse the process used in this
component.
The use of an OpenTelemetry Protocol with Apache Arrow
exporter-receiver pair is recommended when the network is expensive.
Typically, expect to see a 50% reduction in bandwidth compared with
the same data being sent using standard OTLP/gRPC with Zstd
compression, batch sizes being equal.
This component includes all the features and configuration of the core
OTLP exporter, making it possible to upgrade from the core OTLP
exporter component. This is as simple as replacing "otlp" with
"otelarrow" as the component name in the collector configuration.
To enable the OpenTelemetry Protocol with Apache Arrow exporter,
include it in the list of exporters for a pipeline. The endpoint
setting is required. The tls setting is required for insecure
transport.
endpoint (no default): host:port to which the exporter is going to send OTLP trace data,
using the gRPC protocol. The valid syntax is described here.
If a scheme of https is used then client transport security is enabled and overrides the insecure setting.
tls: see TLS Configuration Settings for the full set of available options.
Example:
exporters:
  otelarrow/secure:
    endpoint: external-collector:4317
    tls:
      cert_file: file.cert
      key_file: file.key
  otelarrow/insecure:
    endpoint: internal-collector:4317
    tls:
      insecure: true
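Upgrading from the core OTLP exporter, as described earlier, amounts to renaming the component and listing it in a pipeline. The following is a minimal sketch of that change, not a recommended deployment: the endpoint value, the otlp receiver, and the traces pipeline are hypothetical and stand in for whatever the existing configuration already contains.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  # Previously declared as "otlp"; only the component name changes.
  otelarrow:
    endpoint: collector.example.com:4317  # hypothetical endpoint

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otelarrow]  # previously [otlp]
```

Settings that were already present under the otlp exporter should carry over unchanged, since this component includes the configuration of the core OTLP exporter.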
By default, zstd compression is enabled at the gRPC level. See
compression configuration below. To disable gRPC-level compression,
configure "none":
exporters:
  otelarrow:
    compression: none
    endpoint: ...
    tls: ...
Configuration
Several helper files are leveraged to provide additional capabilities automatically.
Arrow-specific Configuration
In the arrow configuration block, the following settings enable and
disable the use of OpenTelemetry Protocol with Apache Arrow as opposed
to standard OTLP.
disabled (default: false): disables use of Arrow, causing the exporter to use standard OTLP
disable_downgrade (default: false): prevents this exporter from using standard OTLP.
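As an illustration of the two settings above, the sketch below declares one exporter that is never allowed to fall back to standard OTLP and another that uses standard OTLP only. The instance names after the slash are hypothetical, and the endpoint and tls values are left elided as in the other examples.

```yaml
exporters:
  # Arrow only: never downgrade to standard OTLP.
  otelarrow/arrow-only:
    endpoint: ...
    tls: ...
    arrow:
      disable_downgrade: true
  # Standard OTLP only: Arrow is switched off.
  otelarrow/otlp-only:
    endpoint: ...
    tls: ...
    arrow:
      disabled: true
```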
The following setting determines how long a stream will stay open.
By default, stream lifetime is limited to 30 seconds because the
compression benefit of keeping a stream open longer is limited at that
point, and shorter streams make load balancing easier.
max_stream_lifetime (default: 30s): duration after which streams are recycled.
The following setting determines memory and CPU resources that the
exporter will use:
num_streams (default: max(1, NumCPU()/2)): the number of concurrent Arrow streams
The num_streams default limits the exporter stream count to half the
number of CPUs or 1, whichever is greater. When num_streams is
greater than one, a configurable policy determines how load is
distributed across streams. The supported policies are leastloaded,
which picks the stream with the smallest number of outstanding
requests, and leastloadedN for N <= num_streams, which limits the
decision to a random subset of N streams.
prioritizer (default: "leastloaded"): policy for distributing load across multiple streams.
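A possible combination of these two settings is sketched below. It assumes that both num_streams and prioritizer live in the arrow block alongside the other Arrow-specific settings, and that the leastloadedN policy is written with N appended directly to the name (leastloaded4 here); the values 8 and 4 are arbitrary illustrations.

```yaml
exporters:
  otelarrow:
    endpoint: ...
    tls: ...
    arrow:
      num_streams: 8             # overrides the max(1, NumCPU()/2) default
      prioritizer: leastloaded4  # pick the least-loaded of 4 randomly chosen streams (4 <= num_streams)
```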
The following configuration values allow for separate streams per unique
metadata combinations:
metadata_keys (default = empty): When set, this exporter will create one
arrow exporter instance per distinct combination of values in the
client.Metadata.
metadata_cardinality_limit (default = 1000): When metadata_keys is not empty,
this setting limits the number of unique combinations of metadata key values
that will be processed over the lifetime of the exporter.
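The sketch below shows how the metadata settings might be combined. Two things here are assumptions rather than statements from this document: the settings are placed at the top level of the exporter configuration (this section does not say where they nest), and x-tenant-id is a hypothetical metadata key chosen only for illustration.

```yaml
exporters:
  otelarrow:
    endpoint: ...
    tls: ...
    # Assumed placement: top level of the exporter configuration.
    metadata_keys:
      - x-tenant-id                  # hypothetical key; one exporter instance per distinct value
    metadata_cardinality_limit: 100  # lower than the default of 1000
```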
Network Configuration
This component uses round_robin by default as the gRPC load
balancer. This can be modified using the balancer_name setting, for
example, to configure the pick_first balancer:
exporters:
  otelarrow:
    balancer_name: pick_first
    endpoint: ...
    tls: ...
When the server or an intermediate proxy uses a keepalive setting, the
Arrow-specific max_stream_lifetime setting is critical to avoiding
abrupt termination of Arrow streams, which causes retries of the
in-flight requests. Set the maximum stream lifetime so that the
lifetime plus the export timeout stays within the smallest keepalive
limit imposed by the server or any intermediate proxy.
exporters:
  otelarrow:
    timeout: 30s
    arrow:
      max_stream_lifetime: 9m30s
    endpoint: ...
    tls: ...
When this is configured, the stream will terminate cleanly without
causing retries, with OK gRPC status.
The corresponding otelarrowreceiver keepalive setting, which is
compatible with the one above, reads:
receivers:
  otelarrow:
    protocols:
      grpc:
        keepalive:
          server_parameters:
            max_connection_age: 1m
            max_connection_age_grace: 10m
Exporter metrics
In addition to the standard exporterhelper and obsreport metrics,
this component provides network-level measurement instruments
which we anticipate will become part of exporterhelper and/or
obsreport in the future. At the normal level of metrics detail:
otelcol_exporter_sent: uncompressed bytes sent, prior to compression
otelcol_exporter_sent_wire: compressed bytes sent, on the wire.
Arrow's compression performance can be derived by dividing the average
otelcol_exporter_sent value by the average otelcol_exporter_sent_wire
value.
At the detailed level of metrics detail, the stream of data returned
to the exporter is also instrumented:
otelcol_exporter_recv: uncompressed bytes received, after decompression
otelcol_exporter_recv_wire: compressed bytes received, on the wire.
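To see the receive-side instruments, the Collector's own telemetry has to run at the detailed level. The fragment below is a sketch that assumes the detail level referred to here is the Collector's standard service telemetry metrics level, which this document does not state explicitly.

```yaml
service:
  telemetry:
    metrics:
      level: detailed  # at "normal", only the sent and sent_wire instruments are described as available
```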
Compression Configuration
The exporter supports configuring Zstd compression at both the gRPC
and the Arrow level. The exporter metrics described above will be
correct in either case. The default settings are subject to change as
we gain experience.
See the Collector compression comparison for general information
about the choice of Zstd by default, and for other general
compression configuration and benchmark information.
For the OpenTelemetry Protocol with Apache Arrow streams specifically,
the Zstd compression level can be configured at the gRPC level.
However, there is an important caveat: the gRPC-Go library requires
that compressor implementations be registered statically. These
libraries use compressors named zstdarrow1, zstdarrow2, ...,
zstdarrow10, supporting 10 configurable compression levels. Note,
however, that these configurations are static and only one unique
configuration is possible per level. It is possible to configure
multiple OpenTelemetry Protocol with Apache Arrow exporters with
different Zstd configurations simply by using distinct levels.
Under arrow, the zstd sub-configuration has the following fields:
level: in the range 1-10; determines a number of defaults (default 5)
window_size_mib: size of the Zstd window in MiB; 0 means the size is determined by the level (default 0)
concurrency: controls background CPU used for compression; 0 lets the zstd library decide (default 1)
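The three fields above can be set together, as in the sketch below. The level of 7 is an arbitrary illustration; the other two values simply restate the documented defaults. Because only one configuration is possible per level, any other exporter in the same Collector that uses level 7 would need to use these same window and concurrency values.

```yaml
exporters:
  otelarrow:
    endpoint: ...
    tls: ...
    arrow:
      zstd:
        level: 7            # 1-10; selects the statically registered zstdarrow7 compressor
        window_size_mib: 0  # 0: derive the window size from the level
        concurrency: 1      # background CPU used for compression
```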
The exporter supports configuring compression at the Arrow
columnar-protocol level.
payload_compression (default "zstd"): compression applied at the Arrow IPC level.
Compression at the Arrow level is enabled by default because it boosts
compression slightly and helps Arrow payloads meet gRPC maximum
request size limits. Compression settings at the Arrow IPC level
cannot be further configured.
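Since the two compression layers are configured independently, one possible arrangement is to turn off gRPC-level compression and rely on Arrow IPC compression alone. The fragment below is only a sketch of that combination: it assumes payload_compression nests under the arrow block, and whether this arrangement helps or hurts overall compression depends on the workload.

```yaml
exporters:
  otelarrow:
    endpoint: ...
    tls: ...
    compression: none            # disables gRPC-level compression
    arrow:
      payload_compression: zstd  # the default, stated explicitly
```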
For example, two exporters may be configured with different zstd
configurations, provided they use different levels:
exporters:
  otelarrow/best:
    compression: zstd  # describes gRPC-level compression (default "zstd")
    arrow:
      zstd:
        level: 10  # describes gRPC-level compression level (default 5)
  otelarrow/fastest:
    compression: zstd
    arrow:
      zstd:
        level: 1  # 1 is the "fastest" compression level
Batching Configuration
This exporter includes a new, experimental batcher configuration for
batching in the exporterhelper module, but this mode is disabled by
default. This batching support works when combined with queue_sender
functionality.
exporters:
  otelarrow:
    batcher:
      enabled: true
    sending_queue:
      enabled: true
      storage: file_storage/otc
extensions:
  file_storage/otc:
    directory: /var/lib/storage/otc
The built-in batcher is only recommended with a persistent queue;
otherwise it cannot provide back-pressure to the caller. If building
a custom build of the OpenTelemetry Collector, we recommend using the
Concurrent Batch Processor to provide simultaneous back-pressure,
concurrency, and batching functionality. See more discussion on this
issue.
exporters:
  otelarrow:
    batcher:
      enabled: false
    sending_queue:
      enabled: false
processors:
  concurrentbatch:
    send_batch_max_size: 1500
    send_batch_size: 1000
    timeout: 1s