Conduit
Data Integration for Production Data Stores. 💫
Overview
Conduit is a data streaming tool written in Go. It aims to provide the best user
experience for building and running real-time data pipelines. Conduit comes with
batteries included: it provides a UI, common connectors, processors and
observability data out of the box.
Conduit pipelines are built out of simple building blocks which run in their own
goroutines and are connected using Go channels. This lets the stages of a
pipeline run in parallel on multi-core machines. Conduit guarantees that the
order of received records won't change. It also takes care of consistency by
propagating acknowledgments to the start of the pipeline only when a record is
successfully processed on all destinations.
Conduit connectors are plugins that communicate with Conduit via a gRPC
interface. This means that plugins can be written in any language as long as
they conform to the required interface.
Conduit was created and open-sourced by Meroxa.
Quick start
- Download and extract the latest release.
- Download the example pipeline and put it in the directory named pipelines in
  the same directory as the Conduit binary.
- Run Conduit (./conduit). The example pipeline will start automatically.
- Write something to file example.in in the same directory as the Conduit
  binary.
echo "hello conduit" >> example.in
- Read the contents of example.out and notice an OpenCDC record:
$ cat example.out
{"position":"MTQ=","operation":"create","metadata":{"file.path":"./example.in","opencdc.readAt":"1663858188836816000","opencdc.version":"v1"},"key":"MQ==","payload":{"before":null,"after":"aGVsbG8gY29uZHVpdA=="}}
- The field payload.after holds the string hello conduit in base64 encoded
  form. Let's decode it:
$ cat example.out | jq ".payload.after | @base64d"
"hello conduit"
- Explore the UI by opening http://localhost:8080 and build your own pipeline!
Installation guide
Download binary and run
Download a pre-built binary from the latest release and simply run it!
./conduit
Once you see that the service is running, you can access a user-friendly web
interface at http://localhost:8080. You can also interact with the Conduit API
directly. We recommend navigating to http://localhost:8080/openapi and exploring
the HTTP API through Swagger UI.
Conduit can be configured through command line parameters. To view the full list
of available options, run ./conduit --help or see configuring Conduit.
Homebrew
Make sure you have Homebrew installed on your machine, then run:
brew update
brew install conduit
Debian
Download the right .deb file for your machine architecture from the latest
release, then run:
dpkg -i conduit_0.10.0_Linux_x86_64.deb
RPM
Download the right .rpm file for your machine architecture from the latest
release, then run:
rpm -i conduit_0.10.0_Linux_x86_64.rpm
Build from source
Requirements:
git clone git@github.com:ConduitIO/conduit.git
cd conduit
make
./conduit
Note that you can also build Conduit with make build-server, which only compiles
the server and skips the UI. This command requires only Go and builds the binary
much faster. That makes it useful for development purposes or for running
Conduit as a simple backend service.
Docker
Our Docker images are hosted on GitHub's Container Registry. To run the latest
Conduit version, you should run the following command:
docker run -p 8080:8080 conduit.docker.scarf.sh/conduitio/conduit:latest
The Docker image includes the UI. You can access it by navigating to
http://localhost:8080.
Configuring Conduit
Conduit accepts CLI flags, environment variables and a configuration file to
configure its behavior. Each CLI flag has a corresponding environment variable
and a corresponding field in the configuration file. Conduit uses the value for
each configuration option based on the following priorities:
- CLI flags (highest priority) - if a CLI flag is provided it will always be
  respected, regardless of the environment variable or configuration file. To
  see a full list of available flags run conduit --help.
- Environment variables (lower priority) - an environment variable is only used
  if no CLI flag is provided for the same option. Environment variables have
  the prefix CONDUIT and contain underscores instead of dots and hyphens (e.g.
  the flag -db.postgres.connection-string corresponds to
  CONDUIT_DB_POSTGRES_CONNECTION_STRING).
- Configuration file (lowest priority) - Conduit by default loads the file
  conduit.yaml placed in the same folder as Conduit. The path to the file can
  be customized using the CLI flag -config. It is not required to provide a
  configuration file and any value in the configuration file can be overridden
  by an environment variable or a flag. The file content should be a YAML
  document where keys can be hierarchically split on the dot character (.). For
  example:
  db:
    type: postgres # corresponds to flag -db.type and env variable CONDUIT_DB_TYPE
    postgres:
      connection-string: postgres://localhost:5432/conduitdb # -db.postgres.connection-string or CONDUIT_DB_POSTGRES_CONNECTION_STRING
Storage
Conduit's own data (information about pipelines, connectors, etc.) can be stored
in the following databases:
- BadgerDB (default)
- PostgreSQL
- SQLite
It's also possible to store all the data in memory, which is sometimes useful
for development purposes.
The database type used can be configured with the db.type parameter (through
any of the configuration options in Conduit). For example, the CLI flag to use a
PostgreSQL database with Conduit is as follows: -db.type=postgres.
Changing database parameters (e.g. the PostgreSQL connection string) is done
through parameters of the following form: db.<db type>.<parameter name>. For
example, the CLI flag to use a PostgreSQL instance listening on localhost:5432
would be: -db.postgres.connection-string=postgres://localhost:5432/conduitdb.
The full example in our case would be:
./conduit -db.type=postgres -db.postgres.connection-string="postgresql://localhost:5432/conduitdb"
Connectors
For the full list of available connectors, see the Connector List. If there's a
connector that you're looking for that isn't available in Conduit, please file
an issue.
Conduit loads standalone connectors at startup. The connector binaries need to
be placed in the connectors directory relative to the Conduit binary so Conduit
can find them. Alternatively, the path to the standalone connectors can be
adjusted using the CLI flag -connectors.path.
Conduit ships with a number of built-in connectors:
- File connector provides a source/destination to read/write a local file
  (useful for quickly trying out Conduit without additional setup).
- Kafka connector provides a source/destination for Apache Kafka.
- Postgres connector provides a source/destination for PostgreSQL.
- S3 connector provides a source/destination for AWS S3.
- Generator connector provides a source which generates random data (useful for
  testing).
- Log connector provides a destination which logs all records (useful for
  testing).
Additionally, we have prepared a Kafka Connect wrapper that allows you to run
any Apache Kafka Connect connector as part of a Conduit pipeline.
If you are interested in writing a connector yourself, have a look at our Go
Connector SDK. Since standalone connectors communicate with Conduit through
gRPC they can be written in virtually any programming language, as long as the
connector follows the Conduit Connector Protocol.
Processors
A processor is a component that operates on a single record that flows through a
pipeline. It can either change the record (i.e. transform it) or filter
it out based on some criteria.
Conduit provides a number of built-in processors, which can be used to
manipulate fields, send requests to HTTP endpoints, and more. See Builtin
processors for the full list and documentation.
Conduit also lets you write your own Standalone Processor, or you can use the
built-in processor custom.javascript to write custom processors in JavaScript.
More detailed information as well as examples can be found in the Processors
documentation.
API
Conduit exposes a gRPC API and an HTTP API.
The gRPC API is by default running on port 8084. You can define a custom address
using the CLI flag -grpc.address. To learn more about the gRPC API please have a
look at the protobuf file.
The HTTP API is by default running on port 8080. You can define a custom address
using the CLI flag -http.address. It is generated using gRPC gateway and thus
provides the same functionality as the gRPC API. To learn more about the HTTP
API, please have a look at the API documentation or the OpenAPI definition, or
run Conduit and navigate to http://localhost:8080/openapi to open a Swagger UI
which makes it easy to try it out.
UI
Conduit comes with a web UI that makes building data pipelines a breeze. You can
access it at http://localhost:8080. See the installation guide for instructions
on how to build Conduit with the UI.
For more information about the UI, refer to the Readme in /ui.
Documentation
To learn more about how to use Conduit, visit Conduit.io/docs.
If you are interested in internals of Conduit we have prepared some technical
documentation:
Contributing
For a complete guide to contributing to Conduit, see the Contribution Guide.
We welcome you to join the community and contribute to Conduit to make it
better! When something does not work as intended, please check if there is
already an issue that describes your problem. If there is none, please open an
issue and let us know. When you are not sure how to do something, please open a
discussion or hit us up on Discord.
We also value contributions in the form of pull requests. When opening a PR,
please ensure:
- You have followed the Code Guidelines.
- There is no other pull request for the same update/change.
- You have written unit tests.
- You have made sure that the PR is of reasonable size and can be easily
  reviewed.