dp-dd-csv-filter

Version: v1.0.3. Published: Feb 24, 2017. License: MIT.

Repository

github.com/onsdigital/dp-dd-csv-filter

README

dp-dd-csv-filter

The application retrieves a specified CSV file from an AWS S3 bucket and filters it by dimension values. The filtered output is then written to a new file in an AWS S3 bucket.

The /filter endpoint accepts an HTTP POST request with a JSON FilterRequest body. The curl example under "Getting started" uses the fields inputUrl, outputUrl and dimensions.

Getting started

First grab the code

go get github.com/ONSdigital/dp-dd-csv-filter

You will need Kafka running locally. Set the following environment variables (this example uses the default ports):

ZOOKEEPER=localhost:2181
KAFKA=localhost:9092

Install Kafka:

brew install kafka
brew services start zookeeper
brew services start kafka

Run the Kafka console consumer

kafka-console-consumer --zookeeper $ZOOKEEPER --topic filter-request

Run the Kafka console producer

kafka-console-producer --broker-list $KAFKA --topic filter-request

Run the filter

make debug

The following curl command instructs the application to fetch the specified input file from the AWS bucket, filter it, and write the result to the specified output file in the bucket:

curl -H "Content-Type: application/json" -X POST -d '{ "inputUrl": "s3://dp-csv-splitter-1/Open-Data-for-filter.csv", "outputUrl": "s3://dp-csv-splitter-1/Open-Data-filtered.csv", "dimensions": { "NACE": [ "08 - Other mining and quarrying", "1012 - Processing and preserving of poultry meat"], "Prodcom Elements": [ "Work done", "Waste Products"] } }' http://localhost:21100/filter
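
The same request can be built from Go. The sketch below assumes the JSON keys shown in the curl command; the Go field names and the helper newFilterHTTPRequest are hypothetical and are not taken from this project's source.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// FilterRequest mirrors the JSON body used in the curl example.
// Assumption: the Go field names are illustrative; only the JSON keys
// come from the README.
type FilterRequest struct {
	InputURL   string              `json:"inputUrl"`
	OutputURL  string              `json:"outputUrl"`
	Dimensions map[string][]string `json:"dimensions"`
}

// newFilterHTTPRequest builds a POST request carrying the JSON-encoded body.
func newFilterHTTPRequest(endpoint string, fr FilterRequest) (*http.Request, error) {
	body, err := json.Marshal(fr)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	fr := FilterRequest{
		InputURL:  "s3://dp-csv-splitter-1/Open-Data-for-filter.csv",
		OutputURL: "s3://dp-csv-splitter-1/Open-Data-filtered.csv",
		Dimensions: map[string][]string{
			"NACE":             {"08 - Other mining and quarrying", "1012 - Processing and preserving of poultry meat"},
			"Prodcom Elements": {"Work done", "Waste Products"},
		},
	}
	req, err := newFilterHTTPRequest("http://localhost:21100/filter", fr)
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it against a running instance:
	//   resp, err := http.DefaultClient.Do(req)
}
```

The example only builds the request, so it runs without a live service; sending it is left as the commented-out call.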

Alternatively, paste the following line into the Kafka console producer started above:

{ "inputUrl": "s3://dp-csv-splitter/Open-Data-v3.csv", "outputUrl": "s3://dp-dd-csv-filter/Open-Data-v3.csv", "dimensions": { "NACE": [ "CI_0000072", "CI_0008197"], "Prodcom Elements": [ "CI_0021513", "CI_0021514"] } }
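
The inputUrl and outputUrl values are s3:// URLs, while S3 clients generally address an object by bucket and key. A minimal sketch of splitting such a URL with the standard library follows; the helper name splitS3URL is hypothetical, and how this project actually parses these values is not shown in the README.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitS3URL splits "s3://bucket/path/to/key" into its bucket and key.
// Assumption: the URL host is the bucket name and the path is the object key.
func splitS3URL(raw string) (bucket, key string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "s3" || u.Host == "" {
		return "", "", fmt.Errorf("not an s3 URL: %q", raw)
	}
	key = strings.TrimPrefix(u.Path, "/")
	if key == "" {
		return "", "", fmt.Errorf("missing object key in %q", raw)
	}
	return u.Host, key, nil
}

func main() {
	bucket, key, err := splitS3URL("s3://dp-csv-splitter/Open-Data-v3.csv")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bucket:", bucket)
	fmt.Println("key:", key)
}
```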

The project includes a small data set in the sample_csv directory for test usage.

Configuration

Environment variable	Default	Description
BIND_ADDR	":21100"	The host and port to bind to.
KAFKA_ADDR	"http://localhost:9092"	The Kafka address to request messages from.
S3_BUCKET	"dp-csv-splitter-1"	The name of the AWS S3 bucket to read the CSV files from.
AWS_REGION	"eu-west-1"	The AWS region to use.
KAFKA_CONSUMER_GROUP	"filter-request"	The name of the Kafka group to read messages from.
KAFKA_CONSUMER_TOPIC	"filter-request"	The name of the Kafka topic to read messages from.

Contributing

See CONTRIBUTING for details.

License

Released under the MIT license; see LICENSE for details.

Directories

aws
config
filter
handlers
message
event