googlecloudpubsubexporter

v0.97.0
Published: Mar 26, 2024 License: Apache-2.0 Imports: 24 Imported by: 4

README

Google Cloud Pubsub Exporter

Status
Stability: beta (traces, metrics, logs)
Distributions: contrib, observiq
Code Owners: @alexvanboxel

⚠️ This is a community-provided module. It has been developed and extensively tested at Collibra, but it is not officially supported by GCP.

This exporter sends OTLP messages to a Google Cloud Pubsub topic.

The following configuration options are supported:

  • project (Optional): The Google Cloud project of the topic.
  • topic (Required): The topic to publish OTLP data to. The topic name should be a fully qualified resource name (e.g. projects/otel-project/topics/otlp).
  • compression (Optional): Sets the payload compression. Only gzip is supported. The default is no compression.
  • watermark (Optional): Controls how the ce-time attribute is set (see the Watermark section for more info).
    • behavior (Optional): current sets the ce-time attribute to the system clock; earliest sets the attribute to the smallest timestamp of all the messages.
    • allow_drift (Optional): The maximum difference between the ce-time attribute and the system clock. When the drift is set to 0, the maximum drift from the clock is allowed (only applicable to earliest).
  • endpoint (Optional): Overrides the default Pubsub endpoint, useful when connecting to the PubSub emulator instance or switching between global and regional service endpoints.
  • insecure (Optional): Allows “insecure” SSL connections and transfers, useful when connecting to a local emulator instance. Only has an effect if endpoint is not empty.

Example configuration:

exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
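
Because the topic option expects a fully qualified resource name, it can help to check the shape of the value before deploying a configuration. The following is a minimal sketch of such a check. The pattern and the function name isFullyQualifiedTopic are assumptions made for illustration; they are not taken from the exporter's own validation code, which may be stricter or looser.

```go
package main

import (
	"fmt"
	"regexp"
)

// topicPattern is an illustrative pattern for projects/<project>/topics/<topic>.
// It is an assumption for this sketch, not the exporter's actual validation rule.
var topicPattern = regexp.MustCompile(`^projects/[^/]+/topics/[^/]+$`)

// isFullyQualifiedTopic reports whether name looks like a fully qualified
// Pub/Sub topic resource name.
func isFullyQualifiedTopic(name string) bool {
	return topicPattern.MatchString(name)
}

func main() {
	for _, name := range []string{"projects/otel-project/topics/otlp", "otlp"} {
		fmt.Printf("%-40s fully qualified: %v\n", name, isFullyQualifiedTopic(name))
	}
}
```

A short name such as otlp does not match, while projects/otel-project/topics/otlp does.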

Pubsub topic

The Google Cloud Pubsub exporter does not automatically create topics; it expects the topic to be created up front. Security-wise, it's best to give the collector its own service account and grant that account the Pub/Sub Publisher permission on the topic.

Messages

The messages published on the topic are CloudEvents compliant and use the binary content mode defined in the Google Cloud Pub/Sub Protocol Binding for CloudEvents.

The data field is an ExportTraceServiceRequest, ExportMetricsServiceRequest or ExportLogsServiceRequest for traces, metrics or logs respectively. Each message is accompanied by the following attributes:

Attribute         Description
ce-specversion    Follows version 1.0 of the CloudEvents spec
ce-source         The source is this exporter: /opentelemetry/collector/googlecloudpubsub/<version>
ce-id             A random UUID that uniquely identifies the message
ce-time           A watermark indicating when the events encapsulated in the OTLP message were generated. The behavior depends on the watermark setting in the configuration
ce-type           Depending on the data: org.opentelemetry.otlp.traces.v1, org.opentelemetry.otlp.metrics.v1 or org.opentelemetry.otlp.logs.v1
content-type      The content type is application/protobuf
content-encoding  Indicates that the payload is compressed. Only gzip compression is supported

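
To make the attribute list concrete, here is a minimal sketch of how such an attribute map could be assembled. It is not the exporter's implementation: the helper names newUUID and buildAttributes, the version string, and the choice of RFC 3339 with nanoseconds for ce-time are all assumptions for illustration.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// newUUID returns a random (version 4) UUID string built with crypto/rand.
// Hypothetical helper for this sketch.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// buildAttributes assembles the attributes described in the table above.
// ceTime is expected to be an already formatted timestamp (assumed RFC 3339).
func buildAttributes(ceType, version, ceTime string, gzipped bool) (map[string]string, error) {
	id, err := newUUID()
	if err != nil {
		return nil, err
	}
	attrs := map[string]string{
		"ce-specversion": "1.0",
		"ce-source":      "/opentelemetry/collector/googlecloudpubsub/" + version,
		"ce-id":          id,
		"ce-time":        ceTime,
		"ce-type":        ceType,
		"content-type":   "application/protobuf",
	}
	// content-encoding is only present when the payload is compressed.
	if gzipped {
		attrs["content-encoding"] = "gzip"
	}
	return attrs, nil
}

func main() {
	attrs, err := buildAttributes("org.opentelemetry.otlp.traces.v1", "v0.97.0",
		time.Now().UTC().Format(time.RFC3339Nano), true)
	if err != nil {
		panic(err)
	}
	for k, v := range attrs {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```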
Compression

By default, the messages are not compressed. Compressing the messages can reduce the Pubsub cost to as little as 20% of the uncompressed cost. To enable compression, set compression to gzip.

exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
    compression: gzip

The exporter will add the content-encoding attribute to the message. The receiver looks at this attribute to detect the compression used on the payload.

Only gzip is supported.
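
The interplay between the payload and the content-encoding attribute can be sketched as follows. This is an illustration using only the Go standard library, not the exporter's or receiver's actual code; the function names compress and decode are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compress gzips a payload, as a publisher would before setting
// content-encoding to "gzip".
func compress(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(payload); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode inspects the content-encoding attribute to decide how to read data.
func decode(data []byte, attrs map[string]string) ([]byte, error) {
	switch enc := attrs["content-encoding"]; enc {
	case "":
		return data, nil // no compression
	case "gzip":
		zr, err := gzip.NewReader(bytes.NewReader(data))
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		return io.ReadAll(zr)
	default:
		return nil, fmt.Errorf("unsupported content-encoding %q", enc)
	}
}

func main() {
	payload := bytes.Repeat([]byte("otlp-payload "), 100)
	zipped, err := compress(payload)
	if err != nil {
		panic(err)
	}
	restored, err := decode(zipped, map[string]string{"content-encoding": "gzip"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("original=%d bytes, compressed=%d bytes, round trip ok=%v\n",
		len(payload), len(zipped), bytes.Equal(payload, restored))
}
```

A consumer that ignores the attribute would try to parse compressed bytes as protobuf, so checking content-encoding first is the safer order.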

Watermark

A watermark is a threshold that indicates when a stream processing framework (like Apache Beam) expects all the data in a window to have arrived. If new data arrives with a timestamp that is in the window but older than the watermark, it is considered late data. The watermark section changes the behaviour of the ce-time attribute of the message. If you don't use such frameworks, you can ignore this section and ce-time will be set to the current time. For more reliable watermark behaviour in such streaming pipelines, it is better to set the ce-time attribute to the earliest timestamp of the messages embedded in the Pubsub message.

Setting the behaviour to earliest scans all the embedded messages before sending the actual Pubsub message to determine the earliest timestamp. You have to set allow_drift, the allowed maximum drift for the ce-time timestamp, if you want the behaviour to have an effect, as the default is 0s.

exporters:
  googlecloudpubsub:
    project: my-project
    topic: projects/my-project/topics/otlp-traces
    watermark:
      behavior: earliest
      allow_drift: 1h

The default behavior sets the watermark to the current time of the processor. This timestamp will not differ much from the timestamp that is attached to a Pubsub message. Most users who only use Pubsub as a global distribution system will not need anything else.

If you use Google Cloud Dataflow and want to rely on its advanced streaming features, you may want to change the behavior of the watermark and de-duplication. You can leverage the unique id (ce-id) and timestamp (ce-time) attributes on the message. In Apache Beam (the framework used by Dataflow) you can set the attribute names on the Pubsub connector via the .withTimestampAttribute("ce-time") and .withIdAttribute("ce-id") methods. A good setting for this scenario is behavior: earliest with a reasonable allow_drift of 1h.

Allowed behavior values are current or earliest. For allow_drift the default is 0s, so make sure to set the value.
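
The earliest behavior with a drift cap can be sketched as a small pure function. This is an illustration of the idea described in this section, not the exporter's code: the function name earliestWatermark and its signature are hypothetical, timestamps are modeled as Unix nanoseconds, and any special interpretation of a zero drift is deliberately not modeled.

```go
package main

import (
	"fmt"
	"time"
)

// earliestWatermark returns the smallest event timestamp, but never earlier
// than nowNanos minus allowDrift and never later than nowNanos.
// All timestamps are Unix nanoseconds. Hypothetical helper for this sketch;
// special handling of a zero drift is intentionally left out.
func earliestWatermark(nowNanos int64, allowDrift time.Duration, eventNanos []int64) int64 {
	watermark := nowNanos
	for _, ts := range eventNanos {
		if ts < watermark {
			watermark = ts
		}
	}
	// Cap the watermark so it does not drift further than allowed from the clock.
	if floor := nowNanos - int64(allowDrift); watermark < floor {
		watermark = floor
	}
	return watermark
}

func main() {
	now := time.Now()
	events := []int64{
		now.Add(-10 * time.Minute).UnixNano(),
		now.Add(-3 * time.Hour).UnixNano(), // older than the allowed drift
	}
	wm := earliestWatermark(now.UnixNano(), time.Hour, events)
	fmt.Println("now:      ", now.UTC().Format(time.RFC3339))
	fmt.Println("watermark:", time.Unix(0, wm).UTC().Format(time.RFC3339))
}
```

With an allow_drift of 1h, the event that is three hours old pulls the watermark back only to one hour before the current time.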

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewFactory

func NewFactory() exporter.Factory

NewFactory creates a factory for Google Cloud Pub/Sub exporter.

Types

type Config

type Config struct {
	// Timeout for all API calls. If not set, defaults to 12 seconds.
	exporterhelper.TimeoutSettings `mapstructure:",squash"` // squash ensures fields are correctly decoded in embedded struct.
	exporterhelper.QueueSettings   `mapstructure:"sending_queue"`
	configretry.BackOffConfig      `mapstructure:"retry_on_failure"`
	// Google Cloud Project ID where the Pubsub client will connect to
	ProjectID string `mapstructure:"project"`
	// User agent that will be used by the Pubsub client to connect to the service
	UserAgent string `mapstructure:"user_agent"`
	// Override of the Pubsub Endpoint, leave empty for the default endpoint
	Endpoint string `mapstructure:"endpoint"`
	// Only has effect if Endpoint is not ""
	Insecure bool `mapstructure:"insecure"`

	// The fully qualified resource name of the Pubsub topic
	Topic string `mapstructure:"topic"`
	// Compression of the payload (only gzip is supported, no compression is the default)
	Compression string `mapstructure:"compression"`
	// Watermark defines the watermark (the ce-time attribute on the message) behavior
	Watermark WatermarkConfig `mapstructure:"watermark"`
}

func (*Config) Validate added in v0.45.0

func (config *Config) Validate() error

type WatermarkBehavior added in v0.47.0

type WatermarkBehavior int

type WatermarkConfig added in v0.47.0

type WatermarkConfig struct {
	// Behavior of the watermark. Currently, only  of the message (none, earliest and current, current being the default)
	// will set the timestamp on pubsub based on timestamps of the events inside the message
	Behavior string `mapstructure:"behavior"`
	// Indication on how much the timestamp can drift from the current time, the timestamp will be capped to the allowed
	// maximum. A duration of 0 is the same as maximum duration
	AllowedDrift time.Duration `mapstructure:"allowed_drift"`
}

WatermarkConfig customizes the behavior of the watermark

Directories

Path Synopsis
internal
