tinyETL
Tiny ETL tool in Go language.
ETL stands for Extract-Transform-Load
Purpose
The purpose of the project was to implement a test task for a job interview.
The original task: filter list of customers in a range of 100km from a geo point and return id-name pairs sorted by ID (ascending).
Build & Quality status
Build:
Architecture
- Workflow is defined by a chain of workers.
- Uses streaming processing where possible to minimize memory footprint.
- Data items are processed throw workflow and are wrapped in a
WorkItem
container
that holds data item as Data() interface{}
. It also has a reference to a previous worker
for easier logging/troubleshooting in case of unexpected input.
- Asynchronous processing is not built-in at the moment but should be fairly easy to add.
Though current implementation allows workers return
etl.Iterator
and easily use goroutines for async processing.
- The library was designed in such a way that it requires minimum
boiler-plating & coding from developers.
Project structure
How to run examples/customers
To run the program you would need Go language installed, preferably version 1.10.
-
change current directory to examples/customers
cd $GOPATH/src/github.com/astec/tinyetl/examples/customers
-
Get all Go source code dependencies:
go get ./...
-
Build the program using Go compiler
go build .
This should produce customers
executable file (customers.exe
on Windows).
-
To get hints for program arguments run with --help
flag:
>customers.exe --help
usage: customers.exe [<flags>]
Flags:
--help Show context-sensitive help (also try --help-long and
--help-man).
-i, --input=INPUT Input file or URL
-s, --sort=SORT Specifies how to sort customers: id, name. Prepend '-' for
descending order.
-
File to be processed can be specified with --input
or short -i
parameter.
customers --input=customers.txt
It defaults to customers.txt
.
-
Sorting can be specified with --sort
or short -s
parameter.
customers --sort=id
Currently 2 options supported: id
and by name
.
If you want to sort in descending order prepend value with "-
":
customers --sort=-id
How to run tests
Tests & unit tests are implemented using standard Go testing convention.
Test files are located next to code files with _test
postfix. E.g.:
code.go
code_test.go
To run all tests:
go test ./...
Some quick links for tests:
Notes about examples/customers
implementation
- To minimize memory footprint
CustomerExtended
structure is mapped to CustomerShot
before sorting.
- ffjson autogenerated code is used to speedup JSON unmarshalling.
- To speedup development process input file name defaults to
customers.txt
if not specified.
- The utility should support an
--input
parameter as a comma-separated list of files, though this has not been tested yet.
Missing functionality in ETL library
This is a test demo project. Due to time & purpose constraint it misses few things essential for production use. E.g:
- Logging
- Statistics & Performance counters
- etc.