Build the project to run a "simulated" transactional data source from which Airbyte will extract data.
Overview
Snowflake offers a beginner's workshop called the Virtual Zero-to-Snowflake hands-on lab. The lab uses sample data from Citibike; roughly the past 7 years of it are available in a public AWS S3 bucket. All of that data can be loaded from the S3 bucket directly into Snowflake in a single bulk load using a Snowflake feature called a Stage.
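For reference, that single-load path looks roughly like the following Snowflake SQL. The stage name, bucket URL, and table are placeholders here, not the lab's actual objects:

```sql
-- Placeholder names; the lab's real bucket URL and table DDL differ.
CREATE OR REPLACE STAGE citibike_stage
  URL = 's3://<public-citibike-bucket>/trips/';

-- A single bulk COPY pulls every staged file into the target table.
COPY INTO trips
  FROM @citibike_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```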
However, one of the goals of this project is to play around with Airbyte and try out its change data capture capability. So we want to be able to do the following:
Instead of loading all available transactional data from S3 at once, we want to simulate the "citibike" application database (running on Postgres) where transactions would be stored.
A Go program called simulator performs a series of incremental batch insertions of (trip and rider) transactions from S3 into the Postgres database. The insert frequency can be adjusted in the config.yaml file (see the sketches after this list).
Airbyte can be configured to extract only the data added to Postgres since the last sync, and then load it into Snowflake.
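To illustrate the frequency knob mentioned above, the config might contain something like the following. The key names are hypothetical; check config.yaml in this repo for the actual schema:

```yaml
# Hypothetical keys; see the repo's config.yaml for the real schema.
simulator:
  batch_size: 100             # rows inserted per batch
  insert_interval_seconds: 5  # time to wait between batches
```

And here is a minimal Go sketch of what such a ticker-driven batch-insert loop could look like. It illustrates the pattern, not the simulator's actual code; the connection string, table schema, and rows are placeholders:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver
)

// insertBatch writes one batch of trip rows inside a transaction,
// so each tick looks like a single application commit.
func insertBatch(db *sql.DB, trips [][]any) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit succeeds

	stmt, err := tx.Prepare(
		`INSERT INTO trips (trip_id, started_at, ended_at) VALUES ($1, $2, $3)`)
	if err != nil {
		return err
	}
	defer stmt.Close()

	for _, t := range trips {
		if _, err := stmt.Exec(t...); err != nil {
			return err
		}
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("postgres",
		"host=localhost user=postgres password=postgres dbname=citibike sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// One small batch per tick; the interval stands in for the
	// frequency setting in config.yaml.
	for range time.Tick(5 * time.Second) {
		batch := [][]any{
			{1, time.Now().Add(-10 * time.Minute), time.Now()}, // placeholder row
		}
		if err := insertBatch(db, batch); err != nil {
			log.Printf("batch insert failed: %v", err)
		}
	}
}
```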
Setup
Open a shell and start Postgres by running:
```sh
make docker-up
```
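Under the hood, docker-up presumably wraps a Docker Compose file. Here is a sketch of the kind of Postgres service it might start; the image, port, and credentials are placeholders, not this repo's actual values. Note that Airbyte's CDC mode requires Postgres to run with logical replication enabled:

```yaml
# Sketch only; see the repo's actual compose file for the real service.
services:
  postgres:
    image: postgres:15
    command: ["postgres", "-c", "wal_level=logical"] # needed for CDC
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: citibike
```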
Build and run the simulator:
```sh
make build
bin/simulator start
```
When we are done, run the following to tear down and release the resources:
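```sh
# Assumed teardown target mirroring `make docker-up`; check the Makefile
# for the exact name.
make docker-down
```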