ddbimport
Import CSV data into DynamoDB.
Features
- Comma separated (CSV) files
- Tab separated (TSV) files
- Large file sizes
- Local files
- Files on S3
- Parallel imports using AWS Step Functions to import > 4M rows per minute
- No dependencies (no need for .NET, Python, Node.js, Docker, AWS CLI etc.)
Warning
This program will use up all available DynamoDB capacity. It is not designed for use against production tables. Use at your own risk.
Installation
Download binaries for macOS, Linux, and Windows at https://github.com/a-h/ddbimport/releases
A Docker image is available:
docker pull adrianhesketh/ddbimport
Usage
Import a local file from your computer:
ddbimport -inputFile ../data.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
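A minimal sketch of what this local import boils down to, using the AWS SDK for Go: read rows from the tab-separated file and write them with BatchWriteItem in batches of up to 25 items (the API limit). It is a simplified illustration based on the command above, not the actual ddbimport implementation.

```go
package main

import (
	"encoding/csv"
	"io"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	f, err := os.Open("data.csv")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	r := csv.NewReader(f)
	r.Comma = '\t' // -delimiter tab

	header, err := r.Read()
	if err != nil {
		panic(err)
	}

	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("eu-west-2")))
	client := dynamodb.New(sess)

	var batch []*dynamodb.WriteRequest
	flush := func() {
		if len(batch) == 0 {
			return
		}
		out, err := client.BatchWriteItem(&dynamodb.BatchWriteItemInput{
			RequestItems: map[string][]*dynamodb.WriteRequest{"ddbimport": batch},
		})
		if err != nil {
			panic(err)
		}
		// A real importer must also retry out.UnprocessedItems; omitted here.
		_ = out
		batch = nil
	}

	for {
		row, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		item := map[string]*dynamodb.AttributeValue{}
		for i, name := range header {
			if name == "year" { // -numericFields year
				item[name] = &dynamodb.AttributeValue{N: aws.String(row[i])}
				continue
			}
			item[name] = &dynamodb.AttributeValue{S: aws.String(row[i])}
		}
		batch = append(batch, &dynamodb.WriteRequest{PutRequest: &dynamodb.PutRequest{Item: item}})
		if len(batch) == 25 {
			flush()
		}
	}
	flush()
}
```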
Import a file from S3, running the import from your computer:
ddbimport -bucketRegion eu-west-2 -bucketName infinityworks-ddbimport -bucketKey data1M.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
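In this mode the data is read from S3, but the import still runs on your computer. The sketch below shows how an S3 object can be streamed row by row with the AWS SDK for Go, so large files never have to be held in memory. Bucket, key, and region are taken from the example above; this is illustrative rather than the actual ddbimport code.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("eu-west-2"))) // -bucketRegion
	obj, err := s3.New(sess).GetObject(&s3.GetObjectInput{
		Bucket: aws.String("infinityworks-ddbimport"), // -bucketName
		Key:    aws.String("data1M.csv"),              // -bucketKey
	})
	if err != nil {
		panic(err)
	}
	defer obj.Body.Close()

	// obj.Body is an io.ReadCloser, so rows can be parsed as they download.
	r := csv.NewReader(obj.Body)
	r.Comma = '\t'
	var rows int
	for {
		if _, err := r.Read(); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		rows++
	}
	fmt.Println("rows:", rows)
}
```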
Import a file from S3 using the remote ddbimport Step Function:
ddbimport -remote -bucketRegion eu-west-2 -bucketName infinityworks-ddbimport -bucketKey data1M.csv -delimiter tab -numericFields year -tableRegion eu-west-2 -tableName ddbimport
Install the ddbimport Step Function:
ddbimport -install -stepFnRegion=eu-west-2
Benchmarks
Inserts per second when importing the Google Books Ngram (English 1-gram) dataset.
To reproduce the results:
Create a DynamoDB table
aws dynamodb create-table \
--table-name ddbimport \
--attribute-definitions AttributeName=ngram,AttributeType=S AttributeName=year,AttributeType=N \
--key-schema AttributeName=ngram,KeyType=HASH AttributeName=year,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST
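Before importing, it is worth waiting for the new table to reach the ACTIVE state. A small optional sketch using the AWS SDK for Go (the region is assumed to be eu-west-2 to match the usage examples above):

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("eu-west-2")))
	client := dynamodb.New(sess)
	// Polls DescribeTable until the table exists and is ACTIVE.
	if err := client.WaitUntilTableExists(&dynamodb.DescribeTableInput{
		TableName: aws.String("ddbimport"),
	}); err != nil {
		panic(err)
	}
	fmt.Println("table is ready")
}
```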
Download Google data
curl http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-1M-1gram-20090715-0.csv.zip -o 0.csv.zip
Prepare the data
# Add a tab-separated header row.
printf "ngram\tyear\tmatch_count\tpage_count\tvolume_count\n" > data.csv
# Unzip the data, append it, and remove the temporary file.
unzip 0.csv.zip
cat googlebooks-eng-1M-1gram-20090715-0.csv >> data.csv
rm googlebooks-eng-1M-1gram-20090715-0.csv
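data.csv should now contain the header row followed by tab-separated records in the same column order. The rows below are purely illustrative placeholders showing the expected layout (fields separated by tab characters), not values from the real dataset:

```
ngram	year	match_count	page_count	volume_count
example	1999	12	10	8
example	2000	15	11	9
```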
Resources
Learn about the project here:
Building from source
Ensure that $GOPATH/bin is in your $PATH (by default, that is ~/go/bin). This is needed for statik (https://github.com/rakyll/statik) to package the Serverless application into the ddbimport binary.
Install a supported version of Node.js (v12 seems to work fine) and NPM (https://www.npmjs.com/get-npm) or Yarn (https://classic.yarnpkg.com/en/docs/install/#mac-stable).
1. git clone git@github.com:a-h/ddbimport; cd ddbimport
2. Edit version/version.go and set the Version const to a non-empty value (see the sketch after this list). Without this, installation in steps 7-8 will fail.
3. yarn global add serverless or npm -g install serverless, whichever you prefer.
4. sls plugin install -n serverless-step-functions
5. make -C sls package
6. go build -o ddbimport cmd/main.go. This is your main binary.
7. Run ./ddbimport -install -stepFnRegion your-region and wait a minute or so. You can check the CloudFormation console; a stack named ddbimport should now have been created.
8. Run the same command again. This now uploads the binary containing the two Lambda function handlers and sets up the Step Function itself. If it fails with an error that the S3 key was not found, you probably skipped step 2.
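For step 2, the change is a one-line edit. The exact contents of version/version.go are not reproduced here, but assuming it declares the constant along these lines, any non-empty string will do:

```go
// version/version.go (assumed layout; the real file may differ)
package version

// Version must be non-empty before running steps 7-8.
const Version = "0.0.1-local"
```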