flows

v0.1.0
Published: Aug 16, 2023 License: GPL-3.0 Imports: 19 Imported by: 10

README

Flows

Simple declarative and functional dataflow with Kafka.

Developing

Tools Used

Follow each tool's setup guide in its respective repository or documentation.

  • Golang
  • Gomock
  • Podman / docker
  • Protoc
  • Protoc-gen-go
  • Psql

Commands

The Makefile is used heavily.

Commands used when coding:

make test
make tidy
make update
make mocks
make proto
make cov
make htmlcov

Commands used when running examples locally:

make reset
make run
make listen
make group-delete

The main function lives in the main folder rather than the project root, because the project root holds the library package.

Running Example

See the source in the example folder.

docker-compose up -d
make reset
make run

  1. docker-compose up -d starts ZooKeeper, Kafka, and PostgreSQL in detached mode, with their ports exposed to your host network
  2. make reset cleans up the Kafka topics and the PostgreSQL table, and adds some example events
  3. make run starts the example word count application

There are three examples. Switch between them manually by commenting and uncommenting code in main.go in the main folder.

Test

Some tests use testcontainers-go, which is set up for a rootless Podman context. Support for switching between container technologies is planned for the future.

Principles

  • Container first
  • Do one thing, and only one thing well
  • Easy-to-change integrations
  • Protobuf as primary bytes encoding
  • Sane defaults
  • Simple format helpers, with bytes as default
  • Idiomatic Go

Integrations

  • Kafka using Sarama
  • Postgresql using Bun

Patterns

Flows sits at the core of the Kappa Architecture, where it tackles four elements:

  1. Stateless functions
  2. Stateful functions
  3. Join as a combination of stateless and stateful functions
  4. Materialisation

Stateless

Stateless functions can be used to perform simple operations such as:

  1. Basic event filtering
  2. Event mirroring
  3. Merging multiple topics into one
  4. Exploding events
  5. Interfacing with external parties with at-least-once semantics
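
As an illustration of the stateless operations listed above, the sketch below implements a filter and an explode as plain Go functions. The Message type and the function names are hypothetical and are not part of this package's API; a real job would wire equivalent functions through the package's stateless configurations.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical key/value event; it is not a type from this package.
type Message struct {
	Key   string
	Value string
}

// FilterNonEmpty is a stateless filter: it keeps an event only when its value
// is not blank. The second return value reports whether the event is kept.
func FilterNonEmpty(m Message) (Message, bool) {
	if strings.TrimSpace(m.Value) == "" {
		return Message{}, false
	}
	return m, true
}

// ExplodeWords is a stateless explode: one input event becomes one output
// event per whitespace-separated word, keyed by the word itself.
func ExplodeWords(m Message) []Message {
	words := strings.Fields(m.Value)
	out := make([]Message, 0, len(words))
	for _, w := range words {
		out = append(out, Message{Key: w, Value: "1"})
	}
	return out
}

func main() {
	in := []Message{{Key: "a", Value: "hello world"}, {Key: "b", Value: "  "}}
	for _, m := range in {
		kept, ok := FilterNonEmpty(m)
		if !ok {
			continue // blank events are dropped
		}
		for _, o := range ExplodeWords(kept) {
			fmt.Println(o.Key, o.Value)
		}
	}
}
```

Neither function reads or writes anything outside its arguments, which is what makes it safe to retry under at-least-once delivery.
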
Stateful

Stateful functions perform state machine operations on exactly one topic, such as:

  1. Reduce or aggregate
  2. Validation
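
The two stateful operations above can be sketched as pure functions from (previous state, event) to (next state, output). CountState, SeenState, and the in-memory map standing in for the state store are assumptions made for this example, not this package's types.

```go
package main

import "fmt"

// CountState is a hypothetical per-key aggregate state.
type CountState struct {
	Count int
}

// Reduce is a stateful aggregate step: given the previous state for a key and
// an increment, it returns the next state and the value to publish.
func Reduce(prev CountState, increment int) (CountState, int) {
	next := CountState{Count: prev.Count + increment}
	return next, next.Count
}

// SeenState is a hypothetical per-key validation state.
type SeenState struct {
	Seen bool
}

// ValidateFirst is a stateful validation step: it accepts only the first
// event for a key and rejects every later one.
func ValidateFirst(prev SeenState) (SeenState, bool) {
	if prev.Seen {
		return prev, false
	}
	return SeenState{Seen: true}, true
}

func main() {
	// In-memory stand-in for the state store, keyed by message key.
	store := map[string]CountState{}
	for _, word := range []string{"a", "b", "a"} {
		next, out := Reduce(store[word], 1)
		store[word] = next
		fmt.Println(word, out)
	}
}
```
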
Join

To ensure events are published for the multiple topics that are being joined, there are two options:

  1. Maintain publishing state for all topics (currently, the last result state is global per key; this can be changed to be per topic per key)
  2. Merge topics into one intermediate topic, and perform a stateful function

This codebase uses option two, for the following reasons:

  1. It avoids having to limit the number of messages published. Under option one, each published message adds to the state written into the store, and the implications of a growing state are obvious
  2. A stateless map used to merge topics costs less latency than failed transaction locks
  3. Kafka can be configured to be scalable enough in terms of throughput
  4. Parallelism of the intermediate topic (partition count) can be higher than the source topics
  5. Avoiding transactions reduces cost and increases speed, especially on cloud services
  6. Avoiding distributed data contention allows local state caching, reducing data query latency for cache hits

Join Pattern

It is recommended to use your own custom intermediate topic mapper for better control of your dataflow. However, a standard implementation is provided as a reference.
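
A custom intermediate topic mapper can be quite small. The sketch below assumes a hypothetical Envelope record that tags each event with its source topic, then joins the two sides with a single stateful function. The type names, the "left" and "right" source labels, and the "|" separator are illustrative and do not reflect the standard implementation shipped with this package.

```go
package main

import "fmt"

// Envelope is a hypothetical intermediate-topic record. It tags each event
// with its source topic so one stateful function can tell the sides apart.
type Envelope struct {
	Source string
	Key    string
	Value  string
}

// ToIntermediate is the stateless mapper that merges a source topic into the
// intermediate topic by wrapping the event with its origin.
func ToIntermediate(source, key, value string) Envelope {
	return Envelope{Source: source, Key: key, Value: value}
}

// JoinState holds the latest value seen from each side for one key.
type JoinState struct {
	Left, Right       string
	HasLeft, HasRight bool
}

// Join is the stateful function on the intermediate topic. It records the
// incoming side and emits a joined value once both sides have been seen.
func Join(prev JoinState, e Envelope) (JoinState, string, bool) {
	next := prev
	switch e.Source {
	case "left":
		next.Left, next.HasLeft = e.Value, true
	case "right":
		next.Right, next.HasRight = e.Value, true
	}
	if next.HasLeft && next.HasRight {
		return next, next.Left + "|" + next.Right, true
	}
	return next, "", false
}

func main() {
	state := map[string]JoinState{} // in-memory stand-in for the state store
	events := []Envelope{
		ToIntermediate("left", "k1", "alice"),
		ToIntermediate("right", "k1", "42"),
	}
	for _, e := range events {
		next, out, ok := Join(state[e.Key], e)
		state[e.Key] = next
		if ok {
			fmt.Println(e.Key, out)
		}
	}
}
```

Because both sides arrive on one topic, the join needs no cross-topic transaction; ordering per key is whatever the intermediate topic's partitioning provides.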

Consumers

Consumer functions can be used to materialise events into databases, either as a batch or per message. They can also be used to interface with external parties where relevant.

Limitations

To keep the implementation simple, this project does not yet support temporal operations. Examples of temporal operations that are not yet considered for implementation:

  1. Publishing only at the end of a time window. With states, a window can be emulated, but an output will be published for each message received instead of only at the end of the window.
  2. Per-key publication rate limiting. This could be implemented by combining state storage, commit offsets, and real-time ticks, but that would complicate the required interfaces.

To Do

  1. Integrations
    1. MongoDB
    2. Cassandra
    3. AWS DynamoDB
    4. GCP Bigtable
  2. Local state caching
  3. Unit test coverage
  4. Replace Prometheus with OpenTelemetry

Notes

This project solves dataflow in a very specific way. If you are interested in improving this project or have feedback, please contact me.

Kafka Migration

To migrate stateful functions to a mirrored Kafka cluster:

  1. Stopping the job
  2. Clearing the internal column / field
  3. Starting the job

Because offset numbers differ in a mirrored Kafka cluster, additional application-side deduplication is required to ensure that a stateful operation is not executed twice. Such deduplication can be performed using a unique identifier carried in a Kafka header.

However, if the application can already tolerate at-least-once execution, migration will cause no problems.
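
A minimal sketch of such application-side deduplication is shown below. It assumes each event carries a unique ID that was read from a Kafka header; the Event and State types, and the choice to keep the seen IDs inside the state, are assumptions for illustration rather than this package's behaviour.

```go
package main

import "fmt"

// Event is a hypothetical consumed message. ID stands for a unique
// identifier that the producer placed in a Kafka header.
type Event struct {
	ID     string
	Amount int
}

// State is a hypothetical per-key state that stores the aggregate together
// with the IDs of events already applied.
type State struct {
	Total int
	Seen  map[string]bool
}

// Apply is idempotent per event ID: replaying an event that was already
// applied returns the previous state unchanged.
func Apply(prev State, e Event) State {
	if prev.Seen[e.ID] {
		return prev
	}
	// Copy the set so the previous state is not mutated.
	seen := make(map[string]bool, len(prev.Seen)+1)
	for id := range prev.Seen {
		seen[id] = true
	}
	seen[e.ID] = true
	return State{Total: prev.Total + e.Amount, Seen: seen}
}

func main() {
	var state State
	// "e1" is delivered twice, as could happen after migrating clusters.
	events := []Event{
		{ID: "e1", Amount: 5},
		{ID: "e1", Amount: 5},
		{ID: "e2", Amount: 3},
	}
	for _, e := range events {
		state = Apply(state, e)
	}
	fmt.Println(state.Total)
}
```

The seen set grows without bound in this sketch; a practical version would cap it, for example by retaining only recent IDs.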

Functions

Functions are passed around as plain function values, because a method with a pointer receiver can be used directly as a method value. As proof, the following unit test passes.

// The package name is illustrative; use the package under test.
package flows

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

type TestStruct struct {
	Test int
}

func (t *TestStruct) Inc() {
	t.Test += 1
}

type TestStructTwo struct {
	Test int
}

func (t TestStructTwo) Get() int {
	return t.Test
}

func Inc(fn func()) {
	fn()
}

func Get(fn func() int) int {
	return fn()
}

func TestPointerStuff(t *testing.T) {
	assert := assert.New(t)
	testOne := TestStruct{
		Test: 1,
	}
	Inc(testOne.Inc)
	assert.Equal(2, testOne.Test)

	testTwo := &TestStruct{
		Test: 4,
	}
	Inc(testTwo.Inc)
	assert.Equal(5, testTwo.Test)

	testThree := TestStructTwo{
		Test: 100,
	}
	assert.Equal(100, Get(testThree.Get))

	testFour := &TestStructTwo{
		Test: 200,
	}
	assert.Equal(200, Get(testFour.Get))
}

Rootless Podman

You only need to set DOCKER_HOST to the socket path reported by podman info. Example:

export DOCKER_HOST=unix:///run/user/1000/podman/podman.sock

Documentation

Constants

const (
	QualifierKafkaProducerConfiguration  = "QualifierKafkaProducerConfiguration"
	QualifierKafkaProducer               = "QualifierKafkaProducer"
	QualifierKafkaConsumerConfiguration  = "QualifierKafkaConsumerConfiguration"
	QualifierKafkaConsumer               = "QualifierKafkaConsumer"
	QualifierKafkaConsumerHandler        = "QualifierKafkaConsumerHandler"
	QualifierKafkaConsumerSingleFunction = "QualifierKafkaConsumerSingleFunction"
	QualifierKafkaConsumerBatchFunction  = "QualifierKafkaConsumerBatchFunction"
	QualifierKafkaConsumerKeyFunction    = "QualifierKafkaConsumerKeyFunction"
)
const (
	QualifierPostgresqlConnectionConfiguration        = "QualifierPostgresqlConnectionConfiguration"
	QualifierPostgresqlConnection                     = "QualifierPostgresqlConnection"
	QualifierPostgresqlSingleStateRepository          = "QualifierPostgresqlSingleStateRepository"
	QualifierPostgresqlSingleStateRepositoryTableName = "QualifierPostgresqlSingleStateRepositoryTableName"
	QualifierPostgresqlUpsertRepository               = "QualifierPostgresqlUpsertRepository"
)
const (
	QualifierRetryConfiguration = "QualifierRetryConfiguration"
	QualifierRetry              = "QualifierRetry"
)
const (
	QualifierRouteConfiguration = "QualifierRouteConfiguration"
	QualifierRoute              = "QualifierRoute"
)
const (
	QualifierRuntime = "QualifierRuntime"
)

Variables

This section is empty.

Functions

func GetKafkaProducer added in v0.0.20

func GetKafkaProducer(ctx context.Context) (message.Producer, error)

func GetPostgresqlSingleStateRepository added in v0.0.20

func GetPostgresqlSingleStateRepository(ctx context.Context) (stateful.SingleStateRepository, error)

func GetPostgresqlUpsertRepository added in v0.0.20

func GetPostgresqlUpsertRepository[T any](ctx context.Context) (materialise.UpsertRepository[T], error)

func GetRetry added in v0.0.20

func GetRetry(ctx context.Context) (*runtime_retry.Retry, error)

func InjectedRuntimes added in v0.0.20

func InjectedRuntimes() []runtime.Runtime

func InjectorKafkaConsumerHandlerConfiguration added in v0.1.0

func InjectorKafkaConsumerHandlerConfiguration(ctx context.Context) (runtime.Configuration[*runtime_sarama.Consumer], error)

func InjectorKafkaConsumerKeyedHandler added in v0.1.0

func InjectorKafkaConsumerKeyedHandler(ctx context.Context) (runtime_sarama.ConsumerLoop, error)

func InjectorPostgresqlSingleStateRepository added in v0.0.20

func InjectorPostgresqlSingleStateRepository(ctx context.Context) (stateful.SingleStateRepository, error)

func InjectorPostgresqlUpsertRepository added in v0.0.20

func InjectorPostgresqlUpsertRepository[T any](ctx context.Context) (materialise.UpsertRepository[T], error)

func InjectorRouteProducer added in v0.0.21

func InjectorRouteProducer(ctx context.Context) (runtime.Configuration[*runtime_bunrouter.Router], error)

func InjectorRuntime added in v0.0.20

func InjectorRuntime(qualifier string) inverse.Injector[runtime.Runtime]

func RegisterConsumer added in v0.1.0

func RegisterConsumer()

Consumer

func RegisterConsumerKeyedConfig added in v0.1.0

func RegisterConsumerKeyedConfig(config []runtime.Configuration[*runtime_sarama.Consumer])

func RegisterPostgresql added in v0.0.20

func RegisterPostgresql(config []runtime.Configuration[*runtime_bun.PostgresqlConnection])

Postgresql connection

func RegisterPostgresqlSingleState added in v0.0.20

func RegisterPostgresqlSingleState(tableName string)

Single state repository

func RegisterPostgresqlUpsert added in v0.0.20

func RegisterPostgresqlUpsert[T any]()

Upsert materialiser

func RegisterProducer added in v0.0.20

func RegisterProducer()

func RegisterProducerConfig added in v0.0.20

func RegisterProducerConfig(config []runtime.Configuration[*runtime_sarama.Producer])

Producer

func RegisterRetry added in v0.0.20

func RegisterRetry(config []runtime.Configuration[*runtime_retry.Retry])

Retry

func RegisterRoute added in v0.0.20

func RegisterRoute(config []runtime.Configuration[*runtime_bunrouter.Router])

Types

type JoinPostgresqlFunctionConfiguration

type JoinPostgresqlFunctionConfiguration struct {
	PostgresqlConfiguration    []runtime.Configuration[*runtime_bun.PostgresqlConnection]
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	StatefulFunctions          map[string]stateful.SingleFunction
	PersistenceIdFunctions     map[string]stateful.PersistenceIdFunction[[]byte, []byte]
	IntermediateTopicName      string
	PersistenceTableName       string
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

func (JoinPostgresqlFunctionConfiguration) Runtime

type Main

type Main interface {
	Register(i string, r func() runtime.Runtime) error
	Start(i string) error
}

func NewMain added in v0.0.10

func NewMain() Main

type MaterialisePostgresqlFunctionConfiguration added in v0.0.4

type MaterialisePostgresqlFunctionConfiguration[T any] struct {
	PostgresqlConfiguration    []runtime.Configuration[*runtime_bun.PostgresqlConnection]
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	MaterialiseMapFunction     materialise.MapFunction[message.Bytes, message.Bytes, T]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

func (MaterialisePostgresqlFunctionConfiguration[T]) Runtime added in v0.0.4

type MaterialisePostgresqlOneToOneFunctionConfiguration added in v0.0.17

type MaterialisePostgresqlOneToOneFunctionConfiguration[S any, IK any, IV any] struct {
	Name                       string
	InputTopic                 topic.Topic[IK, IV]
	Function                   materialise.MapFunction[IK, IV, S]
	InputBroker                string
	OutputBroker               string
	HttpPort                   int
	PostgresConnectionString   string
	PostgresqlConfiguration    []runtime.Configuration[*runtime_bun.PostgresqlConnection]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

Wiring configuration

func (MaterialisePostgresqlOneToOneFunctionConfiguration[S, IK, IV]) Runtime added in v0.0.17

type RouterAdapterConfiguration added in v0.0.16

type RouterAdapterConfiguration[Request any, InputKey any, InputValue any] struct {
	Name                       string
	ProduceTopic               topic.Topic[InputKey, InputValue]
	ProduceBroker              string
	RequestBodyFormat          format.Format[Request]
	RequestMapFunction         stateless.OneToOneFunction[message.Bytes, Request, InputKey, InputValue]
	HttpPort                   int
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

Wiring configuration

func (RouterAdapterConfiguration[Request, InputKey, InputValue]) Runtime added in v0.0.16

func (c RouterAdapterConfiguration[Request, InputKey, InputValue]) Runtime() runtime.Runtime

type RouterConfiguration added in v0.0.6

type RouterConfiguration struct {
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

Wiring configuration

func (RouterConfiguration) Runtime added in v0.0.6

func (c RouterConfiguration) Runtime() runtime.Runtime

type RuntimeFacade added in v0.0.13

type RuntimeFacade struct {
	Runtimes []runtime.Runtime
}

func (*RuntimeFacade) Start added in v0.0.13

func (r *RuntimeFacade) Start() error

func (*RuntimeFacade) Stop added in v0.0.13

func (r *RuntimeFacade) Stop()

type StatefulPostgresqlFunctionConfiguration

type StatefulPostgresqlFunctionConfiguration struct {
	PostgresqlConfiguration    []runtime.Configuration[*runtime_bun.PostgresqlConnection]
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	StatefulFunction           stateful.SingleFunction
	PersistenceIdFunction      stateful.PersistenceIdFunction[[]byte, []byte]
	PersistenceTableName       string
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

Wiring configuration

func (StatefulPostgresqlFunctionConfiguration) Runtime

type StatefulPostgresqlOneToOneFunctionConfiguration added in v0.0.15

type StatefulPostgresqlOneToOneFunctionConfiguration[S any, IK any, IV any, OK any, OV any] struct {
	Name                       string
	InputTopic                 topic.Topic[IK, IV]
	OutputTopic                topic.Topic[OK, OV]
	Function                   stateful.OneToOneFunction[S, IK, IV, OK, OV]
	InputBroker                string
	OutputBroker               string
	HttpPort                   int
	StateFormat                format.Format[S]
	StateKeyFunction           stateful.PersistenceIdFunction[IK, IV]
	PostgresTable              string
	PostgresConnectionString   string
	PostgresqlConfiguration    []runtime.Configuration[*runtime_bun.PostgresqlConnection]
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

Wiring configuration

func (StatefulPostgresqlOneToOneFunctionConfiguration[S, IK, IV, OK, OV]) Runtime added in v0.0.15

func (c StatefulPostgresqlOneToOneFunctionConfiguration[S, IK, IV, OK, OV]) Runtime() runtime.Runtime

type StatefulPostgresqlOneToTwoFunctionConfiguration added in v0.0.15

type StatefulPostgresqlOneToTwoFunctionConfiguration[S any, IK any, IV any, OK1 any, OV1 any, OK2 any, OV2 any] struct {
	Name                       string
	InputTopic                 topic.Topic[IK, IV]
	OutputTopicOne             topic.Topic[OK1, OV1]
	OutputTopicTwo             topic.Topic[OK2, OV2]
	Function                   stateful.OneToTwoFunction[S, IK, IV, OK1, OV1, OK2, OV2]
	InputBroker                string
	OutputBroker               string
	HttpPort                   int
	StateFormat                format.Format[S]
	StateKeyFunction           stateful.PersistenceIdFunction[IK, IV]
	PostgresTable              string
	PostgresConnectionString   string
	PostgresqlConfiguration    []runtime.Configuration[*runtime_bun.PostgresqlConnection]
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

Wiring configuration

func (StatefulPostgresqlOneToTwoFunctionConfiguration[S, IK, IV, OK1, OV1, OK2, OV2]) Runtime added in v0.0.15

func (c StatefulPostgresqlOneToTwoFunctionConfiguration[S, IK, IV, OK1, OV1, OK2, OV2]) Runtime() runtime.Runtime

type StatelessOneToOneConfiguration added in v0.0.15

type StatelessOneToOneConfiguration[IK any, IV any, OK any, OV any] struct {
	Name                       string
	InputTopic                 topic.Topic[IK, IV]
	OutputTopic                topic.Topic[OK, OV]
	Function                   stateless.OneToOneFunction[IK, IV, OK, OV]
	InputBroker                string
	OutputBroker               string
	HttpPort                   int
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

func (StatelessOneToOneConfiguration[IK, IV, OK, OV]) Runtime added in v0.0.15

func (c StatelessOneToOneConfiguration[IK, IV, OK, OV]) Runtime() runtime.Runtime

type StatelessOneToOneExplodeConfiguration added in v0.0.15

type StatelessOneToOneExplodeConfiguration[IK any, IV any, OK any, OV any] struct {
	Name                       string
	InputTopic                 topic.Topic[IK, IV]
	OutputTopic                topic.Topic[OK, OV]
	Function                   stateless.OneToOneExplodeFunction[IK, IV, OK, OV]
	InputBroker                string
	OutputBroker               string
	HttpPort                   int
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

func (StatelessOneToOneExplodeConfiguration[IK, IV, OK, OV]) Runtime added in v0.0.15

func (c StatelessOneToOneExplodeConfiguration[IK, IV, OK, OV]) Runtime() runtime.Runtime

type StatelessOneToTwoConfiguration added in v0.0.15

type StatelessOneToTwoConfiguration[IK any, IV any, OK1 any, OV1 any, OK2 any, OV2 any] struct {
	Name                       string
	InputTopic                 topic.Topic[IK, IV]
	OutputTopicOne             topic.Topic[OK1, OV1]
	OutputTopicTwo             topic.Topic[OK2, OV2]
	Function                   stateless.OneToTwoFunction[IK, IV, OK1, OV1, OK2, OV2]
	InputBroker                string
	OutputBroker               string
	HttpPort                   int
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

func (StatelessOneToTwoConfiguration[IK, IV, OK1, OV1, OK2, OV2]) Runtime added in v0.0.15

func (c StatelessOneToTwoConfiguration[IK, IV, OK1, OV1, OK2, OV2]) Runtime() runtime.Runtime

type StatelessSingleFunctionConfiguration

type StatelessSingleFunctionConfiguration struct {
	KafkaProducerConfiguration []runtime.Configuration[*runtime_sarama.Producer]
	KafkaConsumerConfiguration []runtime.Configuration[*runtime_sarama.Consumer]
	RetryConfiguration         []runtime.Configuration[*runtime_retry.Retry]
	StatelessFunction          stateless.SingleFunction
	RouteConfiguration         []runtime.Configuration[*runtime_bunrouter.Router]
}

Wiring configuration

func (StatelessSingleFunctionConfiguration) Runtime

Directories

Path Synopsis
Package mock is a generated GoMock package.
Package mock is a generated GoMock package.
