input/

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/ozontech/file.d

Links

Open Source Insights

README ¶

Input plugins

dmesg

It reads kernel events from /dev/kmsg

More details...

fake

It provides an API to test pipelines and other plugins.

More details...

file

It watches for files in the provided directory and reads them line by line.

Each line should contain only one event. It also correctly handles rotations (rename/truncate) and symlinks.

From time to time, it instantly releases and reopens descriptors of the completely processed files. Such behavior allows files to be deleted by a third party software even though file.d is still working (in this case the reopening will fail).

A watcher is trying to use the file system events to detect file creation and updates. But update events don't work with symlinks, so watcher also periodically manually fstat all tracking files to detect changes.

⚠ It supports the commitment mechanism. But "least once delivery" is guaranteed only if files aren't being truncated. However, file.d correctly handles file truncation, there is a little chance of data loss. It isn't a file.d issue. The data may have been written just before the file truncation. In this case, you may miss to read some events. If you care about the delivery, you should also know that the logrotate manual clearly states that copy/truncate may cause data loss even on a rotating stage. So use copy/truncate or similar actions only if your data isn't critical. In order to reduce potential harm of truncation, you can turn on notifications of file changes. By default the plugin is notified only on file creations. Note that following for changes is more CPU intensive.

⚠ Use add_file_name plugin if you want to add filename to events.

Reading docker container log files:

pipelines:
  example_docker_pipeline:
    input:
        type: file
        watching_dir: /var/lib/docker/containers
        offsets_file: /data/offsets.yaml
        filename_pattern: "*-json.log"
        persistence_mode: async

More details...

http

Reads events from HTTP requests with the body delimited by a new line.

Also, it emulates some protocols to allow receiving events from a wide range of software that use HTTP to transmit data. E.g. file.d may pretend to be Elasticsearch allows clients to send events using Elasticsearch protocol. So you can use Elasticsearch filebeat output plugin to send data to file.d.

⚠ Currently event commitment mechanism isn't implemented for this plugin. Plugin answers with HTTP code OK 200 right after it has read all the request body. It doesn't wait until events are committed.

Example: Emulating elastic through http:

pipelines:
  example_k8s_pipeline:
    settings:
      capacity: 1024
    input:
      # define input type.
      type: http
      # pretend elastic search, emulate it's protocol.
      emulate_mode: "elasticsearch"
      # define http port.
      address: ":9200"
    actions:
      # parse elastic search query.
      - type: parse_es
      # decode elastic search json.
      - type: json_decode
        # field is required.
        field: message
    output:
      # Let's write to kafka example.
      type: kafka
      brokers: [kafka-local:9092, kafka-local:9091]
      default_topic: yourtopic-k8s-data
      use_topic_field: true
      topic_field: pipeline_kafka_topic

      # Or we can write to file:
      # type: file
      # target_file: "./output.txt"

Setup:

# run server.
# config.yaml should contains yaml config above.
go run ./cmd/file.d --config=config.yaml

# now do requests.
curl "localhost:9200/_bulk" -H 'Content-Type: application/json' -d \
'{"index":{"_index":"index-main","_type":"span"}}
{"message": "hello", "kind": "normal"}
'

More details...

journalctl

Reads journalctl output.

More details...

k8s

It reads Kubernetes logs and also adds pod meta-information. Also, it joins split logs into a single event.

Source log file should be named in the following format:
[pod-name]_[namespace]_[container-name]-[container-id].log

E.g. my_pod-1566485760-trtrq_my-namespace_my-container-4e0301b633eaa2bfdcafdeba59ba0c72a3815911a6a820bf273534b0f32d98e0.log

An information which plugin adds:

k8s_node – node name where pod is running;
k8s_node_label_* – node labels;
k8s_pod – pod name;
k8s_namespace – pod namespace name;
k8s_container – pod container name;
k8s_label_* – pod labels.

⚠ Use add_file_name plugin if you want to add filename to events.

Example:

pipelines:
  example_k8s_pipeline:
    input:
      type: k8s
      offsets_file: /data/offsets.yaml
      file_config:                        // customize file plugin
        persistence_mode: sync
        read_buffer_size: 2048

More details...

kafka

It reads events from multiple Kafka topics using sarama library.

It guarantees at "at-least-once delivery" due to the commitment mechanism.

Example Standard example:

pipelines:
  example_pipeline:
    input:
      type: kafka
      brokers: [kafka:9092, kafka:9091]
      topics: [topic1, topic2]
      offset: newest
    # output plugin is not important in this case, let's emulate s3 output.
    output:
      type: s3
      file_config:
        retention_interval: 10s
      endpoint: "s3.fake_host.org:80"
      access_key: "access_key1"
      secret_key: "secret_key2"
      bucket: "bucket-logs"
      bucket_field_event: "bucket_name"

More details...
Generated using insane-doc

Directories ¶

Path	Synopsis
dmesg
fake
file
http
journalctl
k8s
kafka

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL