Go ETL
Go ETL using pipelines
Start
make start
To load a new pipeline, edit ./config/config.yaml.
To add a custom ETL, create a new plugin in ./plugins and add it to config.yaml.
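Each ETL in this project exposes Extract, Transform and Load methods that return a models.DataProcessor. As a rough illustration of how such a pipeline could be chained, here is a minimal, self-contained sketch. The DataProcessor interface, the processorFunc adapter, the run function and the demo type are all hypothetical stand-ins written for this example; the real go-etl interfaces may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// DataProcessor is a hypothetical stand-in for models.DataProcessor.
// The real interface in go-etl may declare different methods.
type DataProcessor interface {
	Process(in string) (string, error)
}

// ETL mirrors the three methods that plugins in this README implement.
type ETL interface {
	Extract() DataProcessor
	Transform() DataProcessor
	Load() DataProcessor
}

// processorFunc adapts a plain function to the DataProcessor interface.
type processorFunc func(string) (string, error)

func (f processorFunc) Process(in string) (string, error) { return f(in) }

// run feeds the output of each stage into the next one, in the order
// Extract, Transform, Load, and stops at the first error.
func run(e ETL) (string, error) {
	data := ""
	for _, stage := range []DataProcessor{e.Extract(), e.Transform(), e.Load()} {
		out, err := stage.Process(data)
		if err != nil {
			return "", err
		}
		data = out
	}
	return data, nil
}

// demo is a toy ETL used only for illustration.
type demo struct{}

func (demo) Extract() DataProcessor {
	return processorFunc(func(string) (string, error) { return "hello", nil })
}

func (demo) Transform() DataProcessor {
	return processorFunc(func(s string) (string, error) { return strings.ToUpper(s), nil })
}

func (demo) Load() DataProcessor {
	return processorFunc(func(s string) (string, error) { return "loaded:" + s, nil })
}

func main() {
	out, err := run(demo{})
	if err != nil {
		fmt.Println("pipeline failed:", err)
		return
	}
	fmt.Println(out) // prints "loaded:HELLO"
}
```

The point of the sketch is only the shape: three stages, each consuming the previous stage's output.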
Examples
Add ETL code to config.yaml
- Add to config/config.yaml:
workers:
  - schedule:
      hour: 20 # UTC time
      minute: 0
    job:
      name: http-requestor
      code: |
        package main

        import (
          "github.com/topfreegames/go-etl/models"
          "github.com/topfreegames/go-etl/processors"
        )

        type etl string

        func (e etl) Extract() models.DataProcessor {
          return processors.NewHTTPRequestor("GET", "http://localhost:8080")
        }

        func (e etl) Transform() models.DataProcessor {
          return &processors.Logger{}
        }

        func (e etl) Load() models.DataProcessor {
          return &processors.Null{}
        }

        // ETL is the exported symbol of this plugin
        var ETL etl
- Start:
make start
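The schedule block above asks for a daily run at 20:00 UTC. To make the timing concrete, the sketch below computes the next run for a given hour and minute. It is an illustration only: nextRun and utc are hypothetical helpers, and the choice to skip to the next day when the current time is exactly the scheduled time is an assumption, not something the go-etl source confirms.

```go
package main

import (
	"fmt"
	"time"
)

// utc builds a UTC timestamp at minute precision (helper for readability).
func utc(year int, month time.Month, day, hour, minute int) time.Time {
	return time.Date(year, month, day, hour, minute, 0, 0, time.UTC)
}

// nextRun returns the first instant strictly after now at which the UTC
// clock reads hour:minute. Treating "exactly now" as already past is an
// assumption made for this sketch.
func nextRun(now time.Time, hour, minute int) time.Time {
	now = now.UTC()
	next := time.Date(now.Year(), now.Month(), now.Day(), hour, minute, 0, 0, time.UTC)
	if !next.After(now) {
		// Today's slot has passed, so schedule for tomorrow.
		next = next.AddDate(0, 0, 1)
	}
	return next
}

func main() {
	now := utc(2024, time.March, 10, 21, 30)
	next := nextRun(now, 20, 0)
	fmt.Println("next run:", next.Format(time.RFC3339)) // next run: 2024-03-11T20:00:00Z
	fmt.Println("wait:", next.Sub(now))                 // wait: 22h30m0s
}
```

Working in UTC keeps the calculation free of daylight-saving jumps, which matches the "UTC time" note in the config.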
Create a new ETL plugin
- Create a new plugin in ./plugins like this:
// ./plugins/http-requestor/main.go
package main

import (
	"github.com/topfreegames/go-etl/models"
	"github.com/topfreegames/go-etl/processors"
)

type etl string

func (e etl) Extract() models.DataProcessor {
	return processors.NewHTTPRequestor("GET", "http://localhost:8080")
}

func (e etl) Transform() models.DataProcessor {
	return &processors.Logger{}
}

func (e etl) Load() models.DataProcessor {
	return &processors.Null{}
}

// ETL is the exported symbol of this plugin
var ETL etl
- Build the plugin binary:
make plugins
- Add to config/config.yaml:
workers:
  - period: 1h
    job:
      name: http-requestor
- Start:
make start
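The period: 1h setting above suggests a job that repeats at a fixed interval. A minimal sketch of that idea follows, assuming the value uses Go's duration syntax as accepted by time.ParseDuration. That assumption is not confirmed by the go-etl source, and parsePeriod and runEvery are hypothetical helpers written for this example.

```go
package main

import (
	"fmt"
	"time"
)

// parsePeriod converts a value such as "1h" into a time.Duration and
// rejects zero or negative periods. Whether go-etl parses the field this
// way is an assumption.
func parsePeriod(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid period %q: %w", s, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("period must be positive, got %q", s)
	}
	return d, nil
}

// runEvery waits one period, calls job, and repeats until job has run the
// given number of times.
func runEvery(period time.Duration, times int, job func()) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for i := 0; i < times; i++ {
		<-ticker.C
		job()
	}
}

func main() {
	period, err := parsePeriod("1h")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("http-requestor would run every", period)

	// Use a short period here so the demonstration finishes quickly.
	runs := 0
	runEvery(10*time.Millisecond, 3, func() { runs++ })
	fmt.Println("runs:", runs)
}
```

Validating the period before starting the ticker matters because time.NewTicker panics on a non-positive duration.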
Next steps
- Better logging
- Shared state (maybe Redis?) so the application can be replicated without executing the same job twice
- Do not crash the application on a bad script (plugin not found, or code that doesn't compile)
- Unit tests
- Integration tests