datasetexporter

Version: v0.116.0
Published: Dec 17, 2024 License: Apache-2.0 Imports: 27 Imported by: 1

README

DataSet Exporter

Status
Stability alpha: logs, traces
Distributions contrib
Issues Open issues Closed issues
Code Owners @atoulme, @martin-majlis-s1, @zdaratom-s1, @tomaz-s1

This exporter sends logs and traces to DataSet.

See the Getting Started guide.

Configuration

Required Settings
  • dataset_url (no default): The URL of the DataSet API that ingests the data. Most likely https://app.scalyr.com.
  • api_key (no default): The "Log Write" API key required to use the API. See the instructions on how to get an API key.

If you do not want to store api_key in the configuration file, you can use the built-in environment variable expansion and set api_key: ${env:DATASET_API_KEY}.

Server Host Settings

Specifying the server host is crucial for ensuring the correct functionality of DataSet. DataSet expects the server host value to be provided in the serverHost attribute. If the server host value is stored in a different attribute, you can use the resourceprocessor or attributesprocessor to copy it into the serverHost attribute.

You can also utilize the server_host settings (described below) to populate the serverHost attribute with different values.

The process of populating the serverHost attribute works as follows:

  • If the serverHost attribute is specified and not empty in the log or trace, then it is used.
  • If the serverHost attribute is specified and not empty in the resource, then it is used.
  • If the host.name attribute is specified and not empty in the resource, then it is used.
  • If the server_host.server_host setting is specified and not empty, then it is used.
  • If the server_host.use_hostname setting is set to true, the hostname of the node is used.

Make sure to provide the appropriate server host value in the serverHost attribute to ensure the proper functionality of DataSet and accurate handling of events.
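
The priority order above can be sketched as a small Go function. This is only an illustration of the documented rules, not the exporter's actual code: the function name resolveServerHost and its parameters are hypothetical, and attributes are simplified to string maps.

```go
package main

import "fmt"

// resolveServerHost is a hypothetical helper that mirrors the documented
// priority order. It is a sketch, not the exporter's implementation.
func resolveServerHost(eventAttrs, resourceAttrs map[string]string, configuredHost string, useHostname bool, nodeHostname string) string {
	// 1. serverHost on the log or trace itself.
	if v := eventAttrs["serverHost"]; v != "" {
		return v
	}
	// 2. serverHost on the resource.
	if v := resourceAttrs["serverHost"]; v != "" {
		return v
	}
	// 3. host.name on the resource.
	if v := resourceAttrs["host.name"]; v != "" {
		return v
	}
	// 4. The server_host.server_host setting.
	if configuredHost != "" {
		return configuredHost
	}
	// 5. The node hostname, if server_host.use_hostname is true.
	if useHostname {
		return nodeHostname
	}
	return ""
}

func main() {
	host := resolveServerHost(
		map[string]string{},
		map[string]string{"host.name": "host-pay-01"},
		"server-pay-01", true, "ip-172-31-27-19",
	)
	fmt.Println(host) // prints "host-pay-01"
}
```

Each rule is checked only when all earlier rules produced no value, which is why a non-empty host.name on the resource wins over the configured server_host value.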

Optional Settings
  • debug (default = false): Adds session_key to the server fields. It's useful for debugging throughput issues.
  • buffer:
    • max_lifetime (default = 5s): The maximum delay between sending batches from the same session.
    • purge_older_than (default = 30s): The maximum delay between receiving data for the same session after which resources associated with it are purged.
    • group_by (default = []): The list of attributes based on which events should be grouped. They are moved from the event attributes to the session info and shown as server fields in the UI.
    • retry_initial_interval (default = 5s): Time to wait after the first failure before retrying.
    • retry_max_interval (default = 30s): The upper bound on the backoff interval between retries.
    • retry_max_elapsed_time (default = 300s): The maximum amount of time spent trying to send a buffer.
    • retry_shutdown_timeout (default = 30s): The maximum time for which the exporter tries to send data to DataSet during shutdown. This value should be shorter than the container's grace period.
    • max_parallel_outgoing (default = 100): The maximum number of parallel outgoing requests.
  • logs:
    • export_resource_info_on_event (default = false): Include LogRecord resource information (if available) on the DataSet event.
    • export_resource_prefix (default = 'resource.attributes.'): A prefix string for the resource, if export_resource_info_on_event is enabled.
    • export_scope_info_on_event (default = true): Include LogRecord scope information (if available) on the DataSet event.
    • export_scope_prefix (default = 'scope.attributes.'): A prefix string for the scope, if export_scope_info_on_event is enabled.
    • export_separator (default = '.'): The separator to add between keys when flattening nested structures (maps, arrays).
    • export_distinguishing_suffix (default = '_'): A suffix string to resolve naming collisions when flattening.
    • decompose_complex_message_field (default = false): Decompose complex body / message field types (e.g. maps, arrays) into separate fields.
    • decomposed_complex_message_prefix (default = 'body.map.'): A prefix string to use when a complex message is decomposed.
  • traces:
    • export_separator (default = '.'): The separator to add between keys when flattening nested structures (maps, arrays).
    • export_distinguishing_suffix (default = '_'): A suffix string to resolve naming collisions when flattening.
  • server_host:
    • server_host (default = ''): Specifies the server host to be used for the events.
    • use_hostname (default = true): Determines whether the hostname of the node should be used as the server host for the events. When set to true, the node's hostname is automatically used.
  • retry_on_failure: See retry_on_failure
  • sending_queue: See sending_queue
  • timeout: See timeout
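
To get a feel for how the buffer retry settings interact, the following Go sketch lists the waits between retries for the default values. It assumes that the wait doubles after every failure; the multiplier, the function name backoffSchedule, and the exact stopping rule are assumptions made for illustration and may differ from what the exporter does.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule lists the waits before each retry under an assumed
// doubling backoff: it starts at initial, is capped at maxInterval, and
// stops once the summed waits would exceed maxElapsed.
func backoffSchedule(initial, maxInterval, maxElapsed time.Duration) []time.Duration {
	if initial <= 0 {
		return nil
	}
	var waits []time.Duration
	var elapsed time.Duration
	wait := initial
	for elapsed+wait <= maxElapsed {
		waits = append(waits, wait)
		elapsed += wait
		wait *= 2 // assumed multiplier
		if wait > maxInterval {
			wait = maxInterval
		}
	}
	return waits
}

func main() {
	// Defaults: retry_initial_interval, retry_max_interval, retry_max_elapsed_time.
	waits := backoffSchedule(5*time.Second, 30*time.Second, 300*time.Second)
	fmt.Println(len(waits), waits)
}
```

Under these assumptions the waits grow from 5s to the 30s cap within a few attempts, and the remaining retries are spaced 30s apart until the 300s budget is used up.
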
Attributes

Enabled attributes are exported in the order:

  1. Log properties
  2. Body
  3. Resource attributes
  4. Scope attributes
  5. Log attributes

If there is a name conflict, the export_distinguishing_suffix value is appended to the later attribute's name. If the export_distinguishing_suffix value is an empty string, then the value from the last attribute is used.
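
The conflict rule can be illustrated with a short Go sketch. It is not the exporter's code: the field type and the mergeFields function are hypothetical, and the sketch assumes that the suffix is appended repeatedly until the name is free.

```go
package main

import "fmt"

// field is one key/value pair coming from a single attribute source.
type field struct {
	key   string
	value any
}

// mergeFields adds fields to the event in the given export order.
// On a name conflict the suffix is appended until the name is free.
// With an empty suffix the later value overwrites the earlier one.
func mergeFields(fields []field, suffix string) map[string]any {
	event := make(map[string]any)
	for _, f := range fields {
		key := f.key
		if suffix != "" {
			for {
				if _, taken := event[key]; !taken {
					break
				}
				key += suffix
			}
		}
		event[key] = f.value
	}
	return event
}

func main() {
	// The same key "x" arrives from body, resource, scope, and log attributes.
	fields := []field{{"x", "b"}, {"x", "r"}, {"x", "s"}, {"x", "a"}}
	fmt.Println(mergeFields(fields, "_"))
	fmt.Println(mergeFields(fields, ""))
}
```

With the suffix "_" all four values survive under distinct names; with an empty suffix only the value from the last source remains under x.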

Example

Example LogRecord:

Log
- body:
  - b: 1
  - x: "b"
- resource:
  - r: 2
  - x: "r"
- scope:
  - s: 3
  - x: "s"
- attribute:
  - a: 4
  - x: "a"
  - map:
    - m1: 5
    - m2: 6

Then the event will look like:

  • Default settings for logs:
    • Event:
      - message: "{\"b\": 1, \"x\": \"b\"}"
      - scope.attributes.s: 3
      - scope.attributes.x: "s"
      - a: 4
      - x: "a"
      - map.m1: 5
      - map.m2: 6
      
  • Everything enabled:
    • Configuration:
        logs:
          export_resource_info_on_event: true
          export_resource_prefix: "r."
          export_scope_info_on_event: true
          export_scope_prefix: "s."
          decompose_complex_message_field: true
          decomposed_complex_message_prefix: "m."
          export_separator: "-"
          export_distinguishing_suffix: "_"
      
    • Event:
      - message: "{\"b\": 1, \"x\": \"b\"}"
      - m.b: 1
      - m.x: "b"
      - r.r: 2
      - r.x: "r"
      - s.s: 3
      - s.x: "s"
      - a: 4
      - x: "a"
      - map-m1: 5
      - map-m2: 6
      
  • Everything enabled, prefixes are empty strings:
    • Configuration:
        logs:
          export_resource_info_on_event: true
          export_resource_prefix: ""
          export_scope_info_on_event: true
          export_scope_prefix: ""
          decompose_complex_message_field: true
          decomposed_complex_message_prefix: ""
          export_separator: "-"
          export_distinguishing_suffix: "_"
      
    • Event:
      - message: "{\"b\": 1, \"x\": \"b\"}"
      - b: 1
      - x: "b"
      - r: 2
      - x_: "r"
      - s: 3
      - x__: "s"
      - a: 4
      - x___: "a"
      - map-m1: 5
      - map-m2: 6
      
  • Everything enabled, prefixes are empty strings, suffix is empty string:
    • Configuration:
        logs:
          export_resource_info_on_event: true
          export_resource_prefix: ""
          export_scope_info_on_event: true
          export_scope_prefix: ""
          decompose_complex_message_field: true
          decomposed_complex_message_prefix: ""
          export_separator: "-"
          export_distinguishing_suffix: ""
      
    • Event:
      - message: "{\"b\": 1, \"x\": \"b\"}"
      - b: 1
      - r: 2
      - s: 3
      - a: 4
      - x: "a"
      - map-m1: 5
      - map-m2: 6
      

Field names can contain dots (.), underscores (_), and hyphens (-). You must escape slashes in Search and PowerQueries. For example, search for the field name app.kubernetes.io/component as app.kubernetes.io\/component.
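
If queries are built programmatically, the escaping can be done with a one-line helper. The function name escapeFieldName is hypothetical, and the sketch handles only the slash escaping described above.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeFieldName escapes slashes in a field name so that it can be
// used in Search and PowerQueries.
func escapeFieldName(name string) string {
	return strings.ReplaceAll(name, "/", `\/`)
}

func main() {
	fmt.Println(escapeFieldName("app.kubernetes.io/component"))
	// prints: app.kubernetes.io\/component
}
```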

Example
processors:
  attributes:
    - key: serverHost
      action: insert
      from_attribute: container_id
  resource:
    attributes:
      - key: serverHost
        from_attribute: node_id
        action: insert

exporters:
  dataset/logs:
    # DataSet API URL, https://app.eu.scalyr.com for DataSet EU instance
    dataset_url: https://app.scalyr.com
    # API Key
    api_key: your_api_key
    buffer:
      # Send buffer to the API at least every 5s
      max_lifetime: 5s
      # Group data based on these attributes
      group_by:
        - container_id
      # try to send data to the DataSet for at most 30s during shutdown
      retry_shutdown_timeout: 30s
    server_host:
      # If the serverHost attribute is not specified or empty,
      # use the value from the env variable SERVER_HOST
      server_host: ${env:SERVER_HOST}
      # If server_host is not set, use the hostname value
      use_hostname: true

  dataset/traces:
    # DataSet API URL, https://app.eu.scalyr.com for DataSet EU instance
    dataset_url: https://app.scalyr.com
    # API Key
    api_key: your_api_key
    buffer:
      max_lifetime: 15s
      group_by:
        - resource_service.instance.id

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch, attributes, resource]
      # add dataset among your exporters
      exporters: [dataset/logs]
    traces:
      receivers: [otlp]
      processors: [batch]
      # add dataset among your exporters
      exporters: [dataset/traces]
Handling serverHost Attribute

Based on the given configuration and scenarios, here's the expected behavior:

  1. Resource: {'node_id': 'node-pay-01', 'host.name': 'host-pay-01'}, Log: {'container_id': 'cont-pay-01'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
    • Since the attribute container_id is set, attributesprocessor will copy this value to the serverHost.
    • Used serverHost will be cont-pay-01.
  2. Resource: {'node_id': 'node-pay-01', 'host.name': 'host-pay-01'}, Log: {'attribute.foo': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
    • Since the resource attribute node_id is set, resourceprocessor will copy this value to the serverHost.
    • Used serverHost will be node-pay-01.
  3. Resource: {'host.name': 'host-pay-01'}, Log: {'attribute.foo': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
    • Since the resource attribute host.name is set, it will be used.
    • Used serverHost will be host-pay-01.
  4. Resource: {}, Log: {'attribute.foo': 'Bar'}, Env: SERVER_HOST='server-pay-01', Hostname: ip-172-31-27-19
    • Since neither serverHost nor host.name is set in the log or the resource, the server_host.server_host setting, which is populated from the environment variable SERVER_HOST, is used.
    • Used serverHost will be server-pay-01.
  5. Resource: {}, Log: {'attribute.foo': 'Bar'}, Env: SERVER_HOST='', Hostname: ip-172-31-27-19
    • Since neither serverHost nor host.name is set and the environment variable SERVER_HOST is empty, the hostname of the node (ip-172-31-27-19) is used as the fallback value for serverHost because use_hostname is true.
    • Used serverHost will be ip-172-31-27-19.

Metrics

To enable metrics you have to:

  1. Run the collector with the telemetry.useOtelForInternalMetrics feature gate enabled, for example by passing the additional parameter --feature-gates=telemetry.useOtelForInternalMetrics.
  2. Enable metrics scraping in the configuration and add the receiver to the service pipelines:
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: 'otel-collector'
              scrape_interval: 5s
              static_configs:
                - targets: ['0.0.0.0:8888']
    ...
    service:
      pipelines:
        metrics:
          # add prometheus among metrics receivers
          receivers: [prometheus]
          processors: [batch]
          exporters: [otlphttp/prometheus, debug]
    
Available Metrics

Available metrics contain dataset in their name. There are counters related to the number of processed events (events), buffers (buffer), sessions (sessions), and transferred bytes (bytes). There are also histograms related to response times (responseTime) and payload size (payloadSize).

There are several counters related to events/buffers:

  • enqueued - the number of received entities
  • processed - the number of entities that were accepted by the next layer
  • dropped - the number of entities that were not accepted by the next layer
  • broken - the number of entities that were somehow corrupted during processing (should be 0)

The number of entities that are still in the queue can be computed as enqueued - (processed + dropped + broken).
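
As a worked example of this formula, the following Go sketch computes the queue size from the four counters. The function name pendingEntities and the sample values are made up for illustration.

```go
package main

import "fmt"

// pendingEntities returns how many entities are still in the queue,
// computed as enqueued - (processed + dropped + broken).
func pendingEntities(enqueued, processed, dropped, broken int64) int64 {
	return enqueued - (processed + dropped + broken)
}

func main() {
	// Hypothetical counter values read from the metrics endpoint.
	fmt.Println(pendingEntities(1000, 950, 20, 0)) // prints 30
}
```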

Documentation

Overview

Package datasetexporter implements an exporter that sends data to DataSet.

Index

Constants

const (
	Service = ResourceType("service")
	Process = ResourceType("process")
)
const ServiceNameKey = "service.name"

Variables

This section is empty.

Functions

func NewFactory

func NewFactory() exporter.Factory

NewFactory creates a new factory for DataSet exporters.

Types

type BufferSettings added in v0.78.0

type BufferSettings struct {
	MaxLifetime          time.Duration `mapstructure:"max_lifetime"`
	PurgeOlderThan       time.Duration `mapstructure:"purge_older_than"`
	GroupBy              []string      `mapstructure:"group_by"`
	RetryInitialInterval time.Duration `mapstructure:"retry_initial_interval"`
	RetryMaxInterval     time.Duration `mapstructure:"retry_max_interval"`
	RetryMaxElapsedTime  time.Duration `mapstructure:"retry_max_elapsed_time"`
	RetryShutdownTimeout time.Duration `mapstructure:"retry_shutdown_timeout"`
	MaxParallelOutgoing  int           `mapstructure:"max_parallel_outgoing"`
}

type Config

type Config struct {
	DatasetURL                string              `mapstructure:"dataset_url"`
	APIKey                    configopaque.String `mapstructure:"api_key"`
	Debug                     bool                `mapstructure:"debug"`
	BufferSettings            `mapstructure:"buffer"`
	TracesSettings            `mapstructure:"traces"`
	LogsSettings              `mapstructure:"logs"`
	ServerHostSettings        `mapstructure:"server_host"`
	configretry.BackOffConfig `mapstructure:"retry_on_failure"`
	QueueSettings             exporterhelper.QueueConfig   `mapstructure:"sending_queue"`
	TimeoutSettings           exporterhelper.TimeoutConfig `mapstructure:"timeout"`
}

func (*Config) String

func (c *Config) String() string

String returns a string representation of the Config object. It includes all the fields and their values in the format "field_name: field_value".

func (*Config) Unmarshal

func (c *Config) Unmarshal(conf *confmap.Conf) error

func (*Config) Validate

func (c *Config) Validate() error

Validate checks if all required fields in Config are set and have valid values. If any of the required fields are missing or have invalid values, it returns an error.
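
The kind of check described here can be sketched as follows. This is not the package's implementation: sketchConfig is a hypothetical stand-in that holds only the two required settings (dataset_url and api_key), and the error messages are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// sketchConfig is a hypothetical stand-in for Config that holds only
// the two required settings.
type sketchConfig struct {
	DatasetURL string
	APIKey     string
}

// validate returns an error if a required setting is missing.
func (c *sketchConfig) validate() error {
	if c.DatasetURL == "" {
		return errors.New("dataset_url is required")
	}
	if c.APIKey == "" {
		return errors.New("api_key is required")
	}
	return nil
}

func main() {
	cfg := &sketchConfig{DatasetURL: "https://app.scalyr.com"}
	fmt.Println(cfg.validate()) // prints "api_key is required"
}
```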

type DatasetExporter added in v0.78.0

type DatasetExporter struct {
	// contains filtered or unexported fields
}

type ExporterConfig added in v0.78.0

type ExporterConfig struct {
	// contains filtered or unexported fields
}

type LogsSettings added in v0.80.0

type LogsSettings struct {
	// ExportResourceInfo is optional flag to signal that the resource info is being exported to DataSet while exporting Logs.
	// This is especially useful when reducing DataSet billable log volume.
	// Default value: false
	ExportResourceInfo bool `mapstructure:"export_resource_info_on_event"`

	// ExportResourcePrefix is prefix for the resource attributes when they are exported (see ExportResourceInfo).
	// Default value: resource.attributes.
	ExportResourcePrefix string `mapstructure:"export_resource_prefix"`

	// ExportScopeInfo is an optional flag that signals if scope info should be exported (when available) with each event. If scope
	// information is not utilized, it makes sense to disable exporting it since it will result in increased billable log volume.
	// Default value: true
	ExportScopeInfo bool `mapstructure:"export_scope_info_on_event"`

	// ExportScopePrefix is prefix for the scope attributes when they are exported (see ExportScopeInfo).
	// Default value: scope.attributes.
	ExportScopePrefix string `mapstructure:"export_scope_prefix"`

	// DecomposeComplexMessageField is an optional flag to signal that message / body of complex types (e.g. a map) should be
	// decomposed / deconstructed into multiple fields. This is usually done outside of the main DataSet integration on the
	// client side (e.g. as part of the attribute processor or similar) or on the server side (DataSet server side JSON parser
	// for message field) and that's why this functionality is disabled by default.
	DecomposeComplexMessageField bool `mapstructure:"decompose_complex_message_field"`

	// DecomposedComplexMessagePrefix is prefix for the decomposed complex message (see DecomposeComplexMessageField).
	// Default value: body.map.
	DecomposedComplexMessagePrefix string `mapstructure:"decomposed_complex_message_prefix"`
	// contains filtered or unexported fields
}

type ResourceType added in v0.78.0

type ResourceType string

type ServerHostSettings added in v0.83.0

type ServerHostSettings struct {
	UseHostName bool   `mapstructure:"use_hostname"`
	ServerHost  string `mapstructure:"server_host"`
}

type TracesSettings added in v0.78.0

type TracesSettings struct {
	// contains filtered or unexported fields
}

Directories

Path Synopsis
internal
