# File Log Receiver

Tails and parses logs from files.
## Configuration

| Field | Default | Description |
| --- | --- | --- |
| `include` | required | A list of file glob patterns that match the file paths to be read. |
| `exclude` | `[]` | A list of file glob patterns to exclude from reading. This is applied against the paths matched by `include`. |
| `exclude_older_than` | | Exclude files whose modification time is older than the specified age. |
| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. |
| `multiline` | | A `multiline` configuration block. See below for more details. |
| `force_flush_period` | `500ms` | Time since new data was last found in the file, after which a partial log at the end of the file may be emitted. |
| `encoding` | `utf-8` | The encoding of the file being read. See the list of supported encodings below for available options. |
| `preserve_leading_whitespaces` | `false` | Whether to preserve leading whitespaces. |
| `preserve_trailing_whitespaces` | `false` | Whether to preserve trailing whitespaces. |
| `include_file_name` | `true` | Whether to add the file name as the attribute `log.file.name`. |
| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. |
| `include_file_name_resolved` | `false` | Whether to add the file name after symlink resolution as the attribute `log.file.name_resolved`. |
| `include_file_path_resolved` | `false` | Whether to add the file path after symlink resolution as the attribute `log.file.path_resolved`. |
| `include_file_owner_name` | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported on Windows. |
| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported on Windows. |
| `include_file_record_number` | `false` | Whether to add the record number in the file as the attribute `log.file.record_number`. |
| `poll_interval` | `200ms` | The duration between filesystem polls. |
| `fingerprint_size` | `1kb` | The number of bytes with which to identify a file. The first bytes in the file are used as the fingerprint. Decreasing this value at any point will cause existing fingerprints to be forgotten, meaning that all files will be read from the beginning (one time). |
| `max_log_size` | `1MiB` | The maximum size of a log entry to read. A log entry will be truncated if it is larger than `max_log_size`. Protects against reading large amounts of data into memory. |
| `max_concurrent_files` | `1024` | The maximum number of log files from which logs will be read concurrently. If the number of files matched by the `include` pattern exceeds this number, files will be processed in batches. |
| `max_batches` | `0` | Only applicable when files must be batched in order to respect `max_concurrent_files`. This value limits the number of batches that will be processed during a single poll interval. A value of `0` indicates no limit. |
| `delete_after_read` | `false` | If `true`, each log file will be read and then immediately deleted. Requires that the `filelog.allowFileDeletion` feature gate is enabled. Must be `false` when `start_at` is set to `end`. |
| `acquire_fs_lock` | `false` | Whether to attempt to acquire a filesystem lock before reading a file (Unix only). |
| `attributes` | `{}` | A map of `key: value` pairs to add to the entry's attributes. |
| `resource` | `{}` | A map of `key: value` pairs to add to the entry's resource. |
| `operators` | `[]` | An array of operators. See below for more details. |
| `storage` | none | The ID of a storage extension to be used to store file offsets. File offsets allow the receiver to pick up where it left off in the case of a collector restart. If no storage extension is used, the receiver will manage offsets in memory only. |
| `header` | `nil` | Specifies options for parsing header metadata. Requires that the `filelog.allowHeaderMetadataParsing` feature gate is enabled. Must not be set when `start_at` is set to `end`. See below for details. |
| `header.pattern` | required for header metadata parsing | A regex that matches every header line. |
| `header.metadata_operators` | required for header metadata parsing | A list of operators used to parse metadata from the header. |
| `retry_on_failure.enabled` | `false` | If `true`, the receiver will pause reading a file and attempt to resend the current batch of logs if it encounters an error from downstream components. |
| `retry_on_failure.initial_interval` | `1s` | Time to wait after the first failure before retrying. |
| `retry_on_failure.max_interval` | `30s` | Upper bound on the retry backoff interval. Once this value is reached, the delay between consecutive retries remains constant at the specified value. |
| `retry_on_failure.max_elapsed_time` | `5m` | Maximum amount of time (including retries) spent trying to send a logs batch to a downstream consumer. Once this value is reached, the data is discarded. Retrying never stops if set to `0`. |
| `ordering_criteria.regex` | | Regular expression used for sorting. Should contain the named capture groups that are to be used in `regex_key`. |
| `ordering_criteria.group_by` | | Regular expression used for grouping, which is done before sorting. Should contain a named capture group. |
| `ordering_criteria.top_n` | `1` | The number of files to track when using file ordering. The top `n` files are tracked after applying the ordering criteria. |
| `ordering_criteria.sort_by.sort_type` | | Type of sorting to be performed (e.g., `numeric`, `alphabetical`, `timestamp`, `mtime`). |
| `ordering_criteria.sort_by.location` | | Relevant if `sort_type` is set to `timestamp`. Defines the location of the timestamp of the file. |
| `ordering_criteria.sort_by.format` | | Relevant if `sort_type` is set to `timestamp`. Defines the `strptime` format of the timestamp being sorted. |
| `ordering_criteria.sort_by.ascending` | | Sort direction. |
| `compression` | | Indicates the compression format of the input files. If set accordingly, files will be read using a reader that uncompresses the file before scanning its content. Options are an empty string (no compression) or `gzip`. |
Note that by default, no logs will be read from a file that is not actively being written to, because `start_at` defaults to `end`.
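As a minimal sketch, reading an existing file from its beginning could be configured as follows (the file path is illustrative):

```yaml
receivers:
  filelog:
    include: [ /var/log/app.log ]
    # Read the whole file, not just data appended after startup.
    start_at: beginning
```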
## Operators

Each operator performs a single responsibility, such as parsing a timestamp or JSON. Chain operators together to process logs into the desired format.

- Every operator has a `type`.
- Every operator can be given a unique `id`. If you use the same type of operator more than once in a pipeline, you must specify an `id`. Otherwise, the `id` defaults to the value of `type`.
- Operators output to the next operator in the pipeline. The last operator in the pipeline emits from the receiver. Optionally, the `output` parameter can be used to specify the `id` of another operator to which logs will be passed directly.
- Only parsers and general purpose operators should be used.
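As a sketch of how operators chain, the following configuration routes logs through two operators; the `id` values, the regex, the layout, and the file path are all illustrative:

```yaml
receivers:
  filelog:
    include: [ /var/log/app.log ]
    operators:
      # First operator in the pipeline; given an explicit id.
      - type: regex_parser
        id: parse_line
        regex: '^(?P<time>\S+) (?P<msg>.*)$'
        # Pass logs directly to the operator with id "parse_time".
        output: parse_time
      # Second operator, referenced by the output above.
      - type: time_parser
        id: parse_time
        parse_from: attributes.time
        layout: '%Y-%m-%dT%H:%M:%S'
```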
## Multiline configuration

If set, the `multiline` configuration block instructs the `file_input` operator to split log entries on a pattern other than newlines.

The `multiline` configuration block must contain exactly one of `line_start_pattern` or `line_end_pattern`. These are regex patterns that match either the beginning of a new log entry or the end of a log entry.

The `omit_pattern` setting can be used to omit the start/end pattern from each entry.
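For instance, a `multiline` block that begins a new entry at each line starting with a marker, and strips that marker from the emitted entry, could look like the following (the marker and path are illustrative):

```yaml
receivers:
  filelog:
    include: [ /var/log/app.log ]
    multiline:
      # Each log entry begins with the literal prefix "START ".
      line_start_pattern: '^START '
      # Strip the matched prefix from each emitted entry.
      omit_pattern: true
```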
## Supported encodings

| Key | Description |
| --- | --- |
| `nop` | No encoding validation. Treats the file as a stream of raw bytes. |
| `utf-8` | UTF-8 encoding. |
| `utf-16le` | UTF-16 encoding with little-endian byte order. |
| `utf-16be` | UTF-16 encoding with big-endian byte order. |
| `ascii` | ASCII encoding. |
| `big5` | The Big5 Chinese character encoding. |

Other less common encodings are supported on a best-effort basis. See https://www.iana.org/assignments/character-sets/character-sets.xhtml for other available encodings.
## Header metadata parsing

To enable header metadata parsing, the `filelog.allowHeaderMetadataParsing` feature gate must be set, and `start_at` must be `beginning`.

If set, the file input operator will attempt to read a header from the start of the file. Each header line must match the `header.pattern` pattern. Each line is emitted into a pipeline defined by `header.metadata_operators`. Any attributes on the resultant entry from the embedded pipeline will be merged with the attributes from previous lines (attribute collisions are resolved with an upsert strategy). After all header lines are read, the final merged header attributes will be present on every log line emitted for the file.

The header lines themselves are not emitted by the receiver.
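As an illustrative sketch, a file whose header lines start with `#` and carry `key: value` pairs could be parsed like this (the header format, regex, and path are assumptions, not a prescribed layout):

```yaml
receivers:
  filelog:
    include: [ /var/log/app/*.log ]
    start_at: beginning
    header:
      # Every line beginning with "#" is treated as a header line.
      pattern: '^#'
      metadata_operators:
        # Extract "key: value" pairs from each header line.
        - type: regex_parser
          regex: '^#\s*(?P<key>[^:]+):\s*(?P<value>.*)$'
```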
## Additional Terminology and Features
- An entry is the base representation of log data as it moves through a pipeline. All operators either create, modify, or consume entries.
- A field is used to reference values in an entry.
- A common expression syntax is used in several operators. For example, expressions can be used to filter or route entries.
## Parsers with Embedded Operations

Many parser operators can be configured to embed certain follow-up operations, such as timestamp and severity parsing. For more information, see complex parsers.
## Time parameters

All time parameters must have the unit of time specified, e.g. `200ms`, `1s`, `1m`.
## Log Rotation

The File Log Receiver can read files that are being rotated.
## Example - Tailing a simple json file

Receiver Configuration:

```yaml
receivers:
  filelog:
    include: [ /var/log/myservice/*.json ]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
```
## Example - Tailing a plaintext file

Receiver Configuration:

```yaml
receivers:
  filelog:
    include: [ /simple.log ]
    operators:
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.sev
```

The above configuration will read logs from the `simple.log` file. Some examples of logs that it will read:

```
2023-06-19 05:20:50 ERROR This is a test error message
2023-06-20 12:50:00 DEBUG This is a test debug message
```
## Example - Multiline logs parsing

Receiver Configuration:

```yaml
receivers:
  filelog:
    include:
      - /var/log/example/multiline.log
    multiline:
      line_start_pattern: ^Exception
```

The above configuration parses multiline logs, splitting a new entry every time the `^Exception` pattern is matched. For example:

```
Exception in thread 1 "main" java.lang.NullPointerException
        at com.example.myproject.Book.getTitle(Book.java:16)
        at com.example.myproject.Author.getBookTitles(Author.java:25)
        at com.example.myproject.Bootstrap.main(Bootstrap.java:14)
Exception in thread 2 "main" java.lang.NullPointerException
        at com.example.myproject.Book.getTitle(Book.java:16)
        at com.example.myproject.Author.getBookTitles(Author.java:25)
        at com.example.myproject.Bootstrap.main(Bootstrap.java:44)
```
## Example - Reading compressed log files

Receiver Configuration:

```yaml
receivers:
  filelog:
    include:
      - /var/log/example/compressed.log.gz
    compression: gzip
```

The above configuration reads gzip-compressed log files by setting the `compression` option to `gzip`. When this option is set, all files ending with that suffix are scanned using a gzip reader that decompresses the file content before scanning through it. Note that if a compressed file is expected to be updated, the additional compressed logs must be appended to the compressed file, rather than recompressing the whole content and overwriting the previous file.
## Offset tracking

The `storage` setting allows you to define the proper storage extension for storing file offsets. While the storage parameter can ensure that log files are consumed accurately, it is possible that logs are dropped while moving downstream through other components in the collector. For additional resiliency, see the Fault tolerant log collection example.
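As a sketch, pairing the receiver with the `file_storage` extension to persist offsets across restarts could look like the following (the directory and file path are illustrative):

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage

receivers:
  filelog:
    include: [ /var/log/app.log ]
    # Persist offsets so reading resumes where it left off after a restart.
    storage: file_storage
```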
Here is some of the information the file log receiver stores:

- The number of files it is currently tracking (`knownFiles`).
- For each file being tracked:
  - The fingerprint of the file (`Fingerprint.first_bytes`).
  - The byte offset from the start of the file, indicating the position from which the file log receiver continues reading the file (`Offset`).
  - An arbitrary set of file attributes, such as the name of the file (`FileAttributes`).

Exactly how this information is serialized depends on the type of storage being used.
## Troubleshooting

### Tracking symlinked files

If the receiver is being used to track a symlinked file and the symlink target is expected to change frequently, make sure to set `poll_interval` to a value lower than the symlink update frequency.
### Telemetry metrics

Enabling Collector metrics will also provide telemetry metrics for the state of the receiver's file consumption. Specifically, the `otelcol_fileconsumer_open_files` and `otelcol_fileconsumer_reading_files` metrics are provided.