DataSet Exporter
This exporter sends logs to DataSet.
See the Getting Started guide.
Configuration
Required Settings
- dataset_url (no default): The URL of the DataSet API that ingests the data. Most likely https://app.scalyr.com.
- api_key (no default): The "Log Write" API key required to use the API. See the instructions on how to get an API key.
If you do not want to specify api_key in the file, you can use the Collector's built-in environment variable substitution and set api_key: ${env:DATASET_API_KEY}.
Server Host Settings
Specifying the server host is crucial for DataSet to function correctly.
DataSet expects the server host value to be provided in the serverHost attribute.
If the server host value is stored in a different attribute, you can use the resourceprocessor or attributesprocessor to copy it into the serverHost attribute.
You can also use the server_host settings (described below) to populate the serverHost attribute with different values.
The serverHost attribute is populated by the first of the following rules that applies:
- If the serverHost attribute is specified and not empty in the log or trace, then it is used.
- If the serverHost attribute is specified and not empty in the resource, then it is used.
- If the host.name attribute is specified and not empty in the resource, then it is used.
- If the server_host.server_host setting is specified and not empty, then it is used.
- If the server_host.use_hostname setting is set to true, the hostname of the node is used.
Make sure to provide the appropriate server host value in the serverHost attribute to ensure the proper functionality of DataSet and accurate handling of events.
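The precedence described above can be sketched in a few lines of Python. This is an illustrative model only, not the exporter's implementation: the function name resolve_server_host and its parameters are hypothetical, and returning an empty string when no rule applies is an assumption, since that case is not described here.

```python
import socket
from typing import Mapping, Optional


def resolve_server_host(
    record_attrs: Mapping[str, str],
    resource_attrs: Mapping[str, str],
    configured_server_host: str = "",
    use_hostname: bool = True,
    node_hostname: Optional[str] = None,
) -> str:
    """Return the serverHost value using the documented precedence."""
    candidates = [
        record_attrs.get("serverHost", ""),    # 1. log or trace attribute
        resource_attrs.get("serverHost", ""),  # 2. resource attribute
        resource_attrs.get("host.name", ""),   # 3. resource host.name
        configured_server_host,                # 4. server_host.server_host setting
    ]
    for value in candidates:
        if value:  # "specified and not empty"
            return value
    if use_hostname:
        # 5. fall back to the hostname of the node
        return node_hostname if node_hostname is not None else socket.gethostname()
    # Assumption: nothing matched and use_hostname is false.
    return ""


print(resolve_server_host({}, {"host.name": "host-pay-01"}, "server-pay-01"))
# host-pay-01
```

Passing node_hostname explicitly keeps the sketch deterministic; when it is omitted, the local machine's hostname is looked up.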
Optional Settings
- debug (default = false): Adds session_key to the server fields. Useful for debugging throughput issues.
- buffer:
  - max_lifetime (default = 5s): The maximum delay between sending batches from the same source.
  - group_by (default = []): The list of attributes by which events are grouped. They are moved from the event attributes to the session info and shown as server fields in the UI.
  - retry_initial_interval (default = 5s): Time to wait after the first failure before retrying.
  - retry_max_interval (default = 30s): The upper bound on the backoff interval.
  - retry_max_elapsed_time (default = 300s): The maximum amount of time spent trying to send a buffer.
  - retry_shutdown_timeout (default = 30s): The maximum time spent trying to send data to DataSet during shutdown. This value should be shorter than the container's grace period.
- logs:
  - export_resource_info_on_event (default = false): Include LogRecord resource information (if available) on the DataSet event.
  - export_resource_prefix (default = 'resource.attributes.'): The prefix added to resource attribute names when export_resource_info_on_event is enabled.
  - export_scope_info_on_event (default = true): Include LogRecord scope information (if available) on the DataSet event.
  - export_scope_prefix (default = 'scope.attributes.'): The prefix added to scope attribute names when export_scope_info_on_event is enabled.
  - export_separator (default = '.'): The separator to add between keys when flattening nested structures (maps, arrays).
  - export_distinguishing_suffix (default = '_'): A suffix string to resolve naming collisions when flattening.
  - decompose_complex_message_field (default = false): Decompose complex body / message field types (e.g. maps, arrays) into separate fields.
  - decomposed_complex_message_prefix (default = 'body.map.'): A prefix string to use when a complex message is decomposed.
- traces:
  - export_separator (default = '.'): The separator to add between keys when flattening nested structures (maps, arrays).
  - export_distinguishing_suffix (default = '_'): A suffix string to resolve naming collisions when flattening.
- server_host:
  - server_host (default = ''): Specifies the server host to be used for the events.
  - use_hostname (default = true): Determines whether the hostname of the node should be used as the server host for the events. When set to true, the node's hostname is automatically used.
- retry_on_failure: See retry_on_failure
- sending_queue: See sending_queue
- timeout: See timeout
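To get a feel for how the buffer retry settings listed above interact, the sketch below computes one possible sequence of waits. It assumes a plain exponential backoff with a hypothetical multiplier of 2, no jitter, and no time spent on the requests themselves; the exporter's actual growth factor and randomization are not stated here, so treat the numbers as illustrative only. The function name backoff_schedule is made up for this example.

```python
from typing import List


def backoff_schedule(
    initial: float = 5.0,       # retry_initial_interval, in seconds
    max_interval: float = 30.0,  # retry_max_interval, in seconds
    max_elapsed: float = 300.0,  # retry_max_elapsed_time, in seconds
    multiplier: float = 2.0,     # assumption: not a documented setting
) -> List[float]:
    """Return the waits between attempts that fit within max_elapsed."""
    waits: List[float] = []
    elapsed = 0.0
    wait = initial
    while elapsed + wait <= max_elapsed:
        waits.append(wait)
        elapsed += wait
        # Grow the wait, but never beyond the configured upper bound.
        wait = min(wait * multiplier, max_interval)
    return waits


print(backoff_schedule()[:5])  # [5.0, 10.0, 20.0, 30.0, 30.0]
```

Under these assumptions, the waits grow until they reach retry_max_interval and then stay there until retry_max_elapsed_time is used up.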
Attributes
Enabled attributes are exported in the order:
- Log properties
- Body
- Resource attributes
- Scope attributes
- Log attributes
If there is a name conflict, the export_distinguishing_suffix value is appended to the later attribute's name. If the export_distinguishing_suffix value is an empty string, then the value from the last attribute is used.
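The ordering and conflict rules above can be modeled with a short Python sketch. It is a simplified illustration, not the exporter's code: the helper names flatten, add_field, and build_event are hypothetical, every group is treated as enabled (resource, scope, and decomposed body), and keying array elements by their index is an assumption.

```python
from typing import Any, Dict, Iterator, Mapping, Tuple


def flatten(value: Any, separator: str, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (key, scalar) pairs; nested keys are joined with the separator."""
    if isinstance(value, Mapping):
        items = value.items()
    elif isinstance(value, (list, tuple)):
        # Assumption: array elements are keyed by their index.
        items = enumerate(value)
    else:
        yield path, value
        return
    for key, item in items:
        child = f"{path}{separator}{key}" if path else str(key)
        yield from flatten(item, separator, child)


def add_field(event: Dict[str, Any], key: str, value: Any, suffix: str) -> None:
    """Insert a field, resolving name conflicts with the distinguishing suffix."""
    if suffix:
        while key in event:
            key += suffix
    # With an empty suffix, the later value overwrites the earlier one.
    event[key] = value


def build_event(message, body, resource, scope, attributes, *,
                body_prefix="", resource_prefix="", scope_prefix="",
                separator=".", suffix="_") -> Dict[str, Any]:
    """Combine the groups in the order: log properties, body, resource, scope, log attributes."""
    event: Dict[str, Any] = {}
    add_field(event, "message", message, suffix)
    groups = [(body_prefix, body), (resource_prefix, resource),
              (scope_prefix, scope), ("", attributes)]
    for prefix, group in groups:
        for key, value in flatten(group, separator):
            add_field(event, prefix + key, value, suffix)
    return event


event = build_event(
    '{"b": 1, "x": "b"}',
    {"b": 1, "x": "b"}, {"r": 2, "x": "r"}, {"s": 3, "x": "s"},
    {"a": 4, "x": "a", "map": {"m1": 5, "m2": 6}},
    separator="-",
)
print(event["x"], event["x_"], event["x__"], event["x___"])  # b r s a
```

With empty prefixes, each later group that reuses the name x gets one more suffix appended; passing suffix="" instead leaves a single x holding the value from the last group.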
Example
Example LogRecord:
Log
- body:
  - b: 1
  - x: "b"
- resource:
  - r: 2
  - x: "r"
- scope:
  - s: 3
  - x: "s"
- attribute:
  - a: 4
  - x: "a"
  - map:
    - m1: 5
    - m2: 6
Then the event will look like:
- Default settings for logs:
- Everything enabled:
  - Configuration:
    logs:
      export_resource_info_on_event: true
      export_resource_prefix: "r."
      export_scope_info_on_event: true
      export_scope_prefix: "s."
      decompose_complex_message_field: true
      decomposed_complex_message_prefix: "m."
      export_separator: "-"
      export_distinguishing_suffix: "_"
  - Event:
    - message: "{\"b\": 1, \"x\": \"b\"}"
    - m.b: 1
    - m.x: "b"
    - r.r: 2
    - r.x: "r"
    - s.s: 3
    - s.x: "s"
    - a: 4
    - x: "a"
    - map-m1: 5
    - map-m2: 6
- Everything enabled, prefixes are empty strings:
  - Configuration:
    logs:
      export_resource_info_on_event: true
      export_resource_prefix: ""
      export_scope_info_on_event: true
      export_scope_prefix: ""
      decompose_complex_message_field: true
      decomposed_complex_message_prefix: ""
      export_separator: "-"
      export_distinguishing_suffix: "_"
  - Event:
    - message: "{\"b\": 1, \"x\": \"b\"}"
    - b: 1
    - x: "b"
    - r: 2
    - x_: "r"
    - s: 3
    - x__: "s"
    - a: 4
    - x___: "a"
    - map-m1: 5
    - map-m2: 6
- Everything enabled, prefixes are empty strings, suffix is empty string:
  - Configuration:
    logs:
      export_resource_info_on_event: true
      export_resource_prefix: ""
      export_scope_info_on_event: true
      export_scope_prefix: ""
      decompose_complex_message_field: true
      decomposed_complex_message_prefix: ""
      export_separator: "-"
      export_distinguishing_suffix: ""
  - Event:
    - message: "{\"b\": 1, \"x\": \"b\"}"
    - b: 1
    - r: 2
    - s: 3
    - a: 4
    - x: "a"
    - map-m1: 5
    - map-m2: 6
Field names can contain dots (.), underscores (_), and hyphens (-). You must escape slashes in Search and PowerQueries. For example, search for the field name app.kubernetes.io/component as app.kubernetes.io\/component.
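If field names are turned into queries programmatically, the escaping can be done with a small helper. The sketch below is a minimal example; the function name escape_field_name is hypothetical, and it only handles the forward slash mentioned above.

```python
def escape_field_name(name: str) -> str:
    """Escape forward slashes in a field name for Search and PowerQueries."""
    return name.replace("/", "\\/")


print(escape_field_name("app.kubernetes.io/component"))
# app.kubernetes.io\/component
```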
Example
processors:
  attributes:
    # the attributes processor expects its rules under "actions"
    actions:
      - key: serverHost
        action: insert
        from_attribute: container_id
  resource:
    attributes:
      - key: serverHost
        from_attribute: node_id
        action: insert
exporters:
  dataset/logs:
    # DataSet API URL, https://app.eu.scalyr.com for DataSet EU instance
    dataset_url: https://app.scalyr.com
    # API Key
    api_key: your_api_key
    buffer:
      # Send buffer to the API at least every 5s
      max_lifetime: 5s
      # Group data based on these attributes
      group_by:
        - container_id
      # Try to send data to DataSet for at most 30s during shutdown
      retry_shutdown_timeout: 30s
    server_host:
      # If the serverHost attribute is not specified or empty,
      # use the value from the env variable SERVER_HOST
      server_host: ${env:SERVER_HOST}
      # If server_host is not set, use the hostname value
      use_hostname: true
  dataset/traces:
    # DataSet API URL, https://app.eu.scalyr.com for DataSet EU instance
    dataset_url: https://app.scalyr.com
    # API Key
    api_key: your_api_key
    buffer:
      max_lifetime: 15s
      group_by:
        - resource_service.instance.id
service:
  pipelines:
    logs:
      receivers: [otlp]
      # both processors that populate serverHost must be part of the pipeline
      processors: [batch, attributes, resource]
      # add dataset among your exporters
      exporters: [dataset/logs]
    traces:
      receivers: [otlp]
      processors: [batch]
      # add dataset among your exporters
      exporters: [dataset/traces]
Handling serverHost Attribute
Based on the given configuration and scenarios, here's the expected behavior:
- Resource: {'node_id': 'node-pay-01', 'host.name': 'host-pay-01'}, Log: {'container_id': 'cont-pay-01'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
  - Since the attribute container_id is set, attributesprocessor will copy this value to the serverHost attribute.
  - The resulting serverHost will be cont-pay-01.
- Resource: {'node_id': 'node-pay-01', 'host.name': 'host-pay-01'}, Log: {'attribute.foo': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
  - Since the resource attribute node_id is set, resourceprocessor will copy this value to the serverHost attribute.
  - The resulting serverHost will be node-pay-01.
- Resource: {'host.name': 'host-pay-01'}, Log: {'attribute.foo': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
  - Since the resource attribute host.name is set, it will be used.
  - The resulting serverHost will be host-pay-01.
- Resource: {}, Log: {'attribute.foo': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
  - Since none of the attributes container_id, node_id, or host.name is set, the value of the environment variable SERVER_HOST (configured through the server_host setting) will be used as serverHost.
  - The resulting serverHost will be server-pay-01.
- Resource: {}, Log: {'attribute.foo': 'Bar'}, Env: SERVER_HOST='', Hostname: ip-172-31-27-19
  - Since none of the attributes container_id, node_id, or host.name is set and the environment variable SERVER_HOST is empty, the hostname of the node (ip-172-31-27-19) will be used as the fallback value for serverHost.
  - The resulting serverHost will be ip-172-31-27-19.
Metrics
To enable metrics, you have to:
- Run the collector with the feature gate telemetry.useOtelForInternalMetrics enabled. This can be done by passing one additional parameter: --feature-gates=telemetry.useOtelForInternalMetrics.
- Enable metrics scraping in the configuration and add the receiver to the service pipelines:
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['0.0.0.0:8888']
...
service:
  pipelines:
    metrics:
      # add prometheus among metrics receivers
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlphttp/prometheus, debug]
Available Metrics
Available metrics contain dataset in their name. There are counters related to the number of processed events (events), buffers (buffer), and transferred bytes (bytes).
There are also histograms related to response times (responseTime) and payload size (payloadSize).
There are several counters related to events/buffers:
- enqueued: the number of received entities
- processed: the number of entities that were accepted by the next layer
- dropped: the number of entities that were not accepted by the next layer
- broken: the number of entities that were somehow corrupted during processing (should be 0)
The number of entities that are still in the queue can be computed as enqueued - (processed + dropped + broken).
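As a worked example of this formula, the sketch below computes the queue size from one set of counter readings. The function name entities_in_queue and the sample numbers are made up for illustration; real values would come from the scraped counters.

```python
def entities_in_queue(enqueued: int, processed: int, dropped: int, broken: int) -> int:
    """Return the number of entities still waiting in the queue."""
    return enqueued - (processed + dropped + broken)


# Hypothetical counter readings: 1000 received, 950 accepted, 20 rejected, 0 corrupted.
print(entities_in_queue(enqueued=1000, processed=950, dropped=20, broken=0))  # 30
```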