conduit

module

v0.12.0-nightly.20240913 Published: Sep 12, 2024 License: Apache-2.0

Repository

github.com/conduitio/conduit

README

Conduit

Data Integration for Production Data Stores. 💫

Overview

Conduit is a data streaming tool written in Go. It aims to provide the best user experience for building and running real-time data pipelines. Conduit comes with batteries included: it provides a UI, common connectors, processors and observability data out of the box.

Conduit pipelines are built out of simple building blocks which run in their own goroutines and are connected using Go channels. This makes Conduit pipelines incredibly performant on multi-core machines. Conduit guarantees that the order of received records won't change. It also takes care of consistency by propagating acknowledgments to the start of the pipeline only when a record has been successfully processed on all destinations.
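To make the goroutine-and-channel model more concrete, here is a minimal sketch in Go. It is not Conduit's implementation: the record and message types, the runPipeline function and the fan-out to several destinations are assumptions made purely for illustration. Each destination runs in its own goroutine and reads from its own channel, and a record counts as acknowledged only once every destination has processed it.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical stand-in for a pipeline record (not Conduit's type).
type record struct {
	position int
	payload  string
}

// message carries a record plus a callback that every destination calls
// once it has finished processing the record.
type message struct {
	rec record
	ack func()
}

// runPipeline sends the payloads to the given number of destinations.
// It returns the positions in the order in which they were fully
// acknowledged, and the payloads each destination wrote.
func runPipeline(payloads []string, destinations int) ([]int, [][]string) {
	var (
		mu    sync.Mutex
		acked []int
		wg    sync.WaitGroup
	)
	written := make([][]string, destinations)
	chans := make([]chan message, destinations)

	// One goroutine per destination, each fed by its own channel.
	for i := range chans {
		chans[i] = make(chan message)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for msg := range chans[i] {
				written[i] = append(written[i], msg.rec.payload)
				msg.ack()
			}
		}(i)
	}

	// The source loop runs in the calling goroutine to keep the sketch short.
	for pos, p := range payloads {
		pos := pos
		pending := destinations // destinations that still have to ack this record
		ack := func() {
			mu.Lock()
			defer mu.Unlock()
			pending--
			if pending == 0 {
				acked = append(acked, pos) // acknowledged on all destinations
			}
		}
		msg := message{rec: record{position: pos, payload: p}, ack: ack}
		for _, ch := range chans {
			ch <- msg
		}
	}
	for _, ch := range chans {
		close(ch)
	}
	wg.Wait()
	return acked, written
}

func main() {
	acked, written := runPipeline([]string{"a", "b", "c"}, 2)
	fmt.Println("acknowledged positions:", acked)
	fmt.Println("written per destination:", written)
}
```

Because every destination handles its records sequentially, the acknowledgments in this sketch come back in the same order in which the records were sent.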

Conduit connectors are plugins that communicate with Conduit via a gRPC interface. This means that plugins can be written in any language as long as they conform to the required interface.

Conduit was created and open-sourced by Meroxa.

Quick start
Installation guide
Configuring Conduit
Storage
Connectors
Processors
API
UI
Documentation
Contributing

Quick start

Download and extract the latest release.
Download the example pipeline and put it in the directory named pipelines in the same directory as the Conduit binary.
Run Conduit (./conduit). The example pipeline will start automatically.
Write something to file example.in in the same directory as the Conduit binary.
```
echo "hello conduit" >> example.in
```

Read the contents of example.out and notice an OpenCDC record:

$ cat example.out
{"position":"MTQ=","operation":"create","metadata":{"file.path":"./example.in","opencdc.readAt":"1663858188836816000","opencdc.version":"v1"},"key":"MQ==","payload":{"before":null,"after":"aGVsbG8gY29uZHVpdA=="}}

The string hello conduit is stored base64-encoded in the field payload.after. Let's decode it:
```
$ cat example.out | jq ".payload.after | @base64d"
"hello conduit"
```
Explore the UI by opening http://localhost:8080 and build your own pipeline!
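The same decoding can be done in Go. The sketch below is only an illustration: the openCDCRecord struct is a hypothetical type that mirrors the fields visible in the example output, not the record type from Conduit's own packages. It relies on encoding/json decoding base64 strings into []byte fields, and it assumes the payload holds raw data, as in this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// openCDCRecord mirrors only the fields shown in the example output.
// It is an illustrative type, not Conduit's own record type.
type openCDCRecord struct {
	Position  []byte            `json:"position"`
	Operation string            `json:"operation"`
	Metadata  map[string]string `json:"metadata"`
	Key       []byte            `json:"key"`
	Payload   struct {
		Before []byte `json:"before"`
		After  []byte `json:"after"`
	} `json:"payload"`
}

// exampleLine is the record from example.out shown in the quick start.
const exampleLine = `{"position":"MTQ=","operation":"create","metadata":{"file.path":"./example.in","opencdc.readAt":"1663858188836816000","opencdc.version":"v1"},"key":"MQ==","payload":{"before":null,"after":"aGVsbG8gY29uZHVpdA=="}}`

// decodeRecord parses one JSON line; encoding/json base64-decodes
// string values into the []byte fields automatically.
func decodeRecord(line string) (openCDCRecord, error) {
	var rec openCDCRecord
	err := json.Unmarshal([]byte(line), &rec)
	return rec, err
}

func main() {
	rec, err := decodeRecord(exampleLine)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("operation=%s key=%s after=%q\n", rec.Operation, rec.Key, rec.Payload.After)
}
```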

Installation guide

Download binary and run

Download a pre-built binary from the latest release and simply run it!

./conduit

Once you see that the service is running, you may access a user-friendly web interface at http://localhost:8080. You can also interact with the Conduit API directly; we recommend navigating to http://localhost:8080/openapi and exploring the HTTP API through Swagger UI.

Conduit can be configured through command line parameters. To view the full list of available options, run ./conduit --help or see configuring Conduit.

Homebrew

Make sure you have Homebrew installed on your machine, then run:

brew update
brew install conduit

Debian

Download the right .deb file for your machine architecture from the latest release, then run:

dpkg -i conduit_0.10.0_Linux_x86_64.deb

RPM

Download the right .rpm file for your machine architecture from the latest release, then run:

rpm -i conduit_0.10.0_Linux_x86_64.rpm

Build from source

Requirements:

Go
Node.js (18.x)
Yarn (latest 1.x)
Ember CLI
Make

git clone git@github.com:ConduitIO/conduit.git
cd conduit
make
./conduit

Note that you can also build Conduit with make build-server, which only compiles the server and skips the UI. This command requires only Go and builds the binary much faster. That makes it useful for development purposes or for running Conduit as a simple backend service.

Docker

Our Docker images are hosted on GitHub's Container Registry. To run the latest Conduit version, you should run the following command:

docker run -p 8080:8080 conduit.docker.scarf.sh/conduitio/conduit:latest

The Docker image includes the UI, which you can access by navigating to http://localhost:8080.

Configuring Conduit

Conduit accepts CLI flags, environment variables and a configuration file to configure its behavior. Each CLI flag has a corresponding environment variable and a corresponding field in the configuration file. Conduit uses the value for each configuration option based on the following priorities:

CLI flags (highest priority) - if a CLI flag is provided it will always be respected, regardless of the environment variable or configuration file. To see a full list of available flags run conduit --help.
Environment variables (lower priority) - an environment variable is only used if no CLI flag is provided for the same option. Environment variables have the prefix CONDUIT and contain underscores instead of dots and hyphens (e.g. the flag -db.postgres.connection-string corresponds to CONDUIT_DB_POSTGRES_CONNECTION_STRING).
Configuration file (lowest priority) - Conduit by default loads the file conduit.yaml placed in the same folder as Conduit. The path to the file can be customized using the CLI flag -config. Providing a configuration file is optional, and any value in it can be overridden by an environment variable or a flag. The file content should be a YAML document where keys can be split hierarchically on dots (.). For example:
```
db:
  type: postgres # corresponds to flag -db.type and env variable CONDUIT_DB_TYPE
  postgres:
    connection-string: postgres://localhost:5432/conduitdb # -db.postgres.connection-string or CONDUIT_DB_POSTGRES_CONNECTION_STRING
```
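The naming rule and the priority order described above can be sketched in Go as follows. This is an illustration only, not Conduit's configuration code: envName and resolve are hypothetical helpers, and the three maps stand in for flags, environment variables and file values that have already been parsed.

```go
package main

import (
	"fmt"
	"strings"
)

// envName derives the environment variable name for a flag such as
// "-db.postgres.connection-string": dots and hyphens become underscores,
// the name is upper-cased and prefixed with CONDUIT.
func envName(flagName string) string {
	name := strings.TrimLeft(flagName, "-")
	name = strings.NewReplacer(".", "_", "-", "_").Replace(name)
	return "CONDUIT_" + strings.ToUpper(name)
}

// resolve returns the value for an option using the priority
// CLI flag > environment variable > configuration file.
// The maps are hypothetical stand-ins for already-parsed inputs.
func resolve(option string, flags, env, file map[string]string) (string, bool) {
	if v, ok := flags[option]; ok {
		return v, true
	}
	if v, ok := env[envName(option)]; ok {
		return v, true
	}
	v, ok := file[option]
	return v, ok
}

func main() {
	flags := map[string]string{} // no CLI flag given
	env := map[string]string{"CONDUIT_DB_TYPE": "sqlite"}
	file := map[string]string{"db.type": "postgres"}

	v, _ := resolve("db.type", flags, env, file)
	fmt.Println(envName("-db.type"), "=", v)
}
```

In this run the environment variable wins over the configuration file because no flag was provided for db.type.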

Storage

Conduit's own data (information about pipelines, connectors, etc.) can be stored in the following databases:

BadgerDB (default)
PostgreSQL
SQLite

It's also possible to store all the data in memory, which is sometimes useful for development purposes.

The database type used can be configured with the db.type parameter (through any of the configuration options in Conduit). For example, the CLI flag to use a PostgreSQL database with Conduit is as follows: -db.type=postgres.

Changing database parameters (e.g. the PostgreSQL connection string) is done through parameters of the following form: db.<db type>.<parameter name>. For example, the CLI flag to use a PostgreSQL instance listening on localhost:5432 would be: -db.postgres.connection-string=postgres://localhost:5432/conduitdb.

The full example in our case would be:

./conduit -db.type=postgres -db.postgres.connection-string="postgresql://localhost:5432/conduitdb"
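As a small illustration of the db.<db type>.<parameter name> pattern, the following Go sketch assembles the flags for a given database type. The dbFlags helper is hypothetical and not part of Conduit. It only formats strings, so it does not check whether a parameter actually exists for the chosen database.

```go
package main

import (
	"fmt"
	"sort"
)

// dbFlags builds CLI flags of the form -db.type=<type> and
// -db.<type>.<parameter>=<value>. It is a hypothetical helper that only
// formats strings; it does not validate parameter names.
func dbFlags(dbType string, params map[string]string) []string {
	flags := []string{"-db.type=" + dbType}

	// Sort parameter names so the output is deterministic.
	names := make([]string, 0, len(params))
	for name := range params {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		flags = append(flags, fmt.Sprintf("-db.%s.%s=%s", dbType, name, params[name]))
	}
	return flags
}

func main() {
	flags := dbFlags("postgres", map[string]string{
		"connection-string": "postgres://localhost:5432/conduitdb",
	})
	fmt.Println(flags)
}
```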

Connectors

For the full list of available connectors, see the Connector List. If there's a connector that you're looking for that isn't available in Conduit, please file an issue.

Conduit loads standalone connectors at startup. The connector binaries need to be placed in the connectors directory relative to the Conduit binary so Conduit can find them. Alternatively, the path to the standalone connectors can be adjusted using the CLI flag -connectors.path.

Conduit ships with a number of built-in connectors:

File connector provides a source/destination to read/write a local file (useful for quickly trying out Conduit without additional setup).
Kafka connector provides a source/destination for Apache Kafka.
Postgres connector provides a source/destination for PostgreSQL.
S3 connector provides a source/destination for AWS S3.
Generator connector provides a source which generates random data (useful for testing).
Log connector provides a destination which logs all records (useful for testing).

Additionally, we have prepared a Kafka Connect wrapper that allows you to run any Apache Kafka Connect connector as part of a Conduit pipeline.

If you are interested in writing a connector yourself, have a look at our Go Connector SDK. Since standalone connectors communicate with Conduit through gRPC, they can be written in virtually any programming language, as long as the connector follows the Conduit Connector Protocol.

Processors

A processor is a component that operates on a single record that flows through a pipeline. It can either change the record (i.e. transform it) or filter it out based on some criteria.
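The two roles of a processor, transforming a record or filtering it out, can be sketched in Go like this. The record and processor types and the apply function are hypothetical and exist only for illustration; Conduit's processors are built on its own interfaces.

```go
package main

import (
	"fmt"
	"strings"
)

// record and processor are hypothetical types used only for this sketch.
type record struct {
	key     string
	payload string
}

// processor returns the (possibly changed) record and whether to keep it.
type processor func(record) (record, bool)

// upperCase transforms a record by upper-casing its payload.
func upperCase(r record) (record, bool) {
	r.payload = strings.ToUpper(r.payload)
	return r, true
}

// dropEmpty filters out records that have an empty payload.
func dropEmpty(r record) (record, bool) {
	return r, r.payload != ""
}

// apply runs every record through the processors in order, one record
// at a time, and drops a record as soon as one processor filters it out.
func apply(in []record, procs ...processor) []record {
	var out []record
	for _, r := range in {
		keep := true
		for _, p := range procs {
			r, keep = p(r)
			if !keep {
				break
			}
		}
		if keep {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	in := []record{{"1", "hello conduit"}, {"2", ""}, {"3", "bye"}}
	for _, r := range apply(in, dropEmpty, upperCase) {
		fmt.Println(r.key, r.payload)
	}
}
```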

Conduit provides a number of builtin processors, which can be used to manipulate fields, send requests to HTTP endpoints, and more. Check Builtin processors for the list of builtin processors and their documentation.

Conduit also provides the ability to write your own Standalone Processor, or you can use the builtin processor custom.javascript to write custom processors in JavaScript.

More detailed information as well as examples can be found in the Processors documentation.

API

Conduit exposes a gRPC API and an HTTP API.

The gRPC API is by default running on port 8084. You can define a custom address using the CLI flag -grpc.address. To learn more about the gRPC API please have a look at the protobuf file.

The HTTP API is by default running on port 8080. You can define a custom address using the CLI flag -http.address. It is generated using gRPC gateway and is thus providing the same functionality as the gRPC API. To learn more about the HTTP API please have a look at the API documentation, OpenAPI definition or run Conduit and navigate to http://localhost:8080/openapi to open a Swagger UI which makes it easy to try it out.
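A quick way to check from Go that the HTTP API is reachable is to request the /openapi path mentioned above. The sketch below is an assumption-laden illustration: probe and stubConduit are hypothetical helpers, and a stub server stands in for a running Conduit instance so that the example runs anywhere. Against a real instance with default settings, the base URL would be http://localhost:8080.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probe sends a GET request and returns the HTTP status code.
func probe(url string) (int, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// stubConduit starts a local stub that only answers on /openapi.
// It stands in for a running Conduit instance in this sketch.
func stubConduit() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/openapi", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := stubConduit()
	defer srv.Close()

	// Replace srv.URL with http://localhost:8080 to probe a real instance
	// that uses the default HTTP address.
	code, err := probe(srv.URL + "/openapi")
	if err != nil {
		fmt.Println("HTTP API not reachable:", err)
		return
	}
	fmt.Println("status:", code)
}
```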

UI

Conduit comes with a web UI that makes building data pipelines a breeze. You can access it at http://localhost:8080. See the installation guide for instructions on how to build Conduit with the UI.

For more information about the UI refer to the Readme in /ui.

Documentation

To learn more about how to use Conduit visit Conduit.io/docs.

If you are interested in the internals of Conduit, we have prepared some technical documentation:

Pipeline Semantics explains the internals of how a Conduit pipeline works.
Pipeline Configuration Files explains how you can define pipelines using YAML files.
Processors contains examples and more information about Conduit processors.
Conduit Architecture will give you a high-level overview of Conduit.
Conduit Metrics provides more information about how Conduit exposes metrics.
Conduit Package structure provides more information about the different packages in Conduit.

Contributing

For a complete guide to contributing to Conduit, see the Contribution Guide.

We welcome you to join the community and contribute to Conduit to make it better! When something does not work as intended, please check whether there is already an issue that describes your problem; otherwise, please open an issue and let us know. When you are not sure how to do something, please open a discussion or hit us up on Discord.

We also value contributions in the form of pull requests. When opening a PR please ensure:

You have followed the Code Guidelines.
There is no other pull request for the same update/change.
You have written unit tests.
You have made sure that the PR is of reasonable size and can be easily reviewed.

Directories

Path	Synopsis
cmd/conduit
pkg/conduit	Package conduit wires up everything under the hood of a Conduit instance including metrics, telemetry, logging, and server construction.
pkg/connector
pkg/foundation/cerrors	Package cerrors contains functions related to error handling.
pkg/foundation/ctxutil
pkg/foundation/grpcutil
pkg/foundation/log
pkg/foundation/metrics
pkg/foundation/metrics/measure
pkg/foundation/metrics/noop	Package noop exposes implementations of metrics which do not do anything.
pkg/foundation/metrics/prometheus
pkg/inspector
pkg/orchestrator
pkg/orchestrator/mock	Package mock is a generated GoMock package.
pkg/pipeline
pkg/pipeline/stream	Package stream defines a message and nodes that can be composed into a data pipeline.
pkg/pipeline/stream/mock	Package mock is a generated GoMock package.
pkg/plugin
pkg/plugin/connector
pkg/plugin/connector/builtin
pkg/plugin/connector/connutils
pkg/plugin/connector/mock	Package mock is a generated GoMock package.
pkg/plugin/connector/standalone
pkg/plugin/connector/standalone/test/testplugin	Package main contains a plugin used for testing purposes.
pkg/plugin/processor
pkg/plugin/processor/builtin
pkg/plugin/processor/builtin/impl
pkg/plugin/processor/builtin/impl/avro	Package avro is a generated GoMock package.
pkg/plugin/processor/builtin/impl/avro/internal
pkg/plugin/processor/builtin/impl/base64
pkg/plugin/processor/builtin/impl/custom
pkg/plugin/processor/builtin/impl/field
pkg/plugin/processor/builtin/impl/json
pkg/plugin/processor/builtin/impl/unwrap
pkg/plugin/processor/builtin/impl/webhook
pkg/plugin/processor/builtin/internal
pkg/plugin/processor/builtin/internal/diff	Package diff computes differences between text files or strings.
pkg/plugin/processor/builtin/internal/diff/difftest	Package difftest supplies a set of tests that will operate on any implementation of a diff algorithm as exposed by "github.com/conduitio/conduit/pkg/plugin/processor/builtin/internal/diff"
pkg/plugin/processor/builtin/internal/diff/lcs	package lcs contains code to find longest-common-subsequences (and diffs)
pkg/plugin/processor/builtin/internal/diff/testenv
pkg/plugin/processor/builtin/internal/exampleutil
pkg/plugin/processor/mock	Package mock is a generated GoMock package.
pkg/plugin/processor/procutils
pkg/plugin/processor/standalone
pkg/plugin/processor/standalone/test/wasm_processors/chaos
pkg/plugin/processor/standalone/test/wasm_processors/specify_error
pkg/processor
pkg/processor/mock	Package mock is a generated GoMock package.
pkg/provisioning
pkg/provisioning/config
pkg/provisioning/config/yaml
pkg/provisioning/config/yaml/internal
pkg/provisioning/config/yaml/v1
pkg/provisioning/config/yaml/v2
pkg/provisioning/mock	Package mock is a generated GoMock package.
pkg/provisioning/test/pipelines1
pkg/provisioning/test/pipelines2
pkg/provisioning/test/pipelines3
pkg/provisioning/test/pipelines4-integration-test
pkg/schemaregistry
pkg/schemaregistry/fromschema
pkg/schemaregistry/schemaregistrytest
pkg/schemaregistry/toschema
pkg/web/api
pkg/web/api/fromproto
pkg/web/api/mock	Package mock is a generated GoMock package.
pkg/web/api/status
pkg/web/api/toproto
pkg/web/openapi
pkg/web/ui
proto/api/v1	Package apiv1 is a reverse proxy.
