data_migration_pipeline

Version v0.0.0-...-23229c8 | Published: Feb 14, 2025 | License: Apache-2.0 | Imports: 12 | Imported by: 0

README

Template: Data migration pipeline

A simple data migration pipeline in Go that uses Apache Beam to migrate a MySQL CSV dump into Spanner. To run it, follow these steps:

Create an empty directory and cd into it.

```
$ mkdir ~/spanner_migration_project
$ cd ~/spanner_migration_project
```

Install the abc binary.
```
$ go install github.com/abcxyz/abc/cmd/abc@latest
$ abc --help
```
This requires Go to be installed (https://go.dev/doc/install) and the Go binary directory (by default $HOME/go/bin) to be on your $PATH.
Render the template defined in the t directory. This writes a file named main.go, containing the template program, to your working directory.
```
$ abc templates render github.com/abcxyz/abc/t/data_migration_pipeline@latest
```
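The generated main.go is configured through command-line flags. As a minimal sketch of how those flags could be parsed with Go's standard flag package: the flag names come from the commands in this guide, while the config struct, help text, and required-flag check are assumptions, not the template's actual code.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// config mirrors the command-line flags used in this guide. The flag names
// come from the commands below; the struct, help text, and validation are
// hypothetical and may differ from the generated main.go.
type config struct {
	InputCSVPath    string
	SpannerDatabase string
	SpannerTable    string
	DryRun          bool
}

// parseFlags parses args (without the program name) into a config and
// rejects missing required values.
func parseFlags(args []string) (*config, error) {
	fs := flag.NewFlagSet("data_migration_pipeline", flag.ContinueOnError)
	cfg := &config{}
	fs.StringVar(&cfg.InputCSVPath, "input-csv-path", "", "path to the MySQL CSV dump")
	fs.StringVar(&cfg.SpannerDatabase, "spanner-database", "",
		"projects/<project>/instances/<instance>/databases/<database>")
	fs.StringVar(&cfg.SpannerTable, "spanner-table", "", "destination Spanner table")
	fs.BoolVar(&cfg.DryRun, "dry-run", false, "run without writing to Spanner")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if cfg.InputCSVPath == "" || cfg.SpannerDatabase == "" || cfg.SpannerTable == "" {
		return nil, errors.New("-input-csv-path, -spanner-database, and -spanner-table are required")
	}
	return cfg, nil
}

func main() {
	// Demonstration only: parse the dry-run arguments used later in this
	// guide ("my-project" stands in for [your-project-id]). A real program
	// would pass os.Args[1:] instead.
	cfg, err := parseFlags([]string{
		"-input-csv-path", "test-data.csv",
		"-spanner-database", "projects/my-project/instances/test-instance/databases/testdb",
		"-spanner-table", "mytable",
		"-dry-run=true",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *cfg)
}
```

Using a dedicated FlagSet with ContinueOnError keeps the parser testable, since a bad argument returns an error instead of exiting the process.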
Start a local Spanner emulator. If the emulator is not already installed, you will be prompted to download and install it.
```
$ gcloud components update
$ gcloud emulators spanner start
```
Create a dedicated gcloud configuration that disables authentication and overrides the Spanner API endpoint. While this configuration is active, your gcloud commands are sent to the emulator instead of the production service. You can switch back to your previous configuration at the end of this guide.
```
$ gcloud config configurations create emulator
$ gcloud config set auth/disable_credentials true
$ gcloud config set project [your-project-id]
$ gcloud config set api_endpoint_overrides/spanner http://localhost:9020/
```

Create a test database to host your pipeline output.

```
$ gcloud spanner instances create test-instance \
    --config=emulator-config --description="Test Instance" --nodes=1
$ gcloud spanner databases create testdb --instance=test-instance --ddl='CREATE TABLE mytable (Id STRING(36)) PRIMARY KEY(Id)'
```
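The database created above is later referenced by its full name, projects/[your-project-id]/instances/test-instance/databases/testdb. A small sketch for building and checking that name before starting a run; databasePath and splitDatabasePath are hypothetical helpers, not part of the template or the Spanner client library.

```go
package main

import (
	"fmt"
	"strings"
)

// databasePath builds the fully qualified database name passed to
// -spanner-database: projects/P/instances/I/databases/D.
func databasePath(project, instance, database string) string {
	return fmt.Sprintf("projects/%s/instances/%s/databases/%s", project, instance, database)
}

// splitDatabasePath reverses databasePath and rejects malformed names, which
// catches a mistyped -spanner-database value before any work is done.
func splitDatabasePath(path string) (project, instance, database string, err error) {
	parts := strings.Split(path, "/")
	if len(parts) != 6 || parts[0] != "projects" || parts[2] != "instances" ||
		parts[4] != "databases" || parts[1] == "" || parts[3] == "" || parts[5] == "" {
		return "", "", "", fmt.Errorf("invalid Spanner database path %q", path)
	}
	return parts[1], parts[3], parts[5], nil
}

func main() {
	// "my-project" stands in for [your-project-id].
	path := databasePath("my-project", "test-instance", "testdb")
	fmt.Println(path)

	project, instance, database, err := splitDatabasePath(path)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(project, instance, database)
}
```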

Make sure the local Spanner emulator keeps running in a separate terminal tab.

Point your client libraries to the emulator. When the pipeline starts, the client library checks the SPANNER_EMULATOR_HOST environment variable and, if it is set, connects to the emulator at that address instead of the production service.
```
$ export SPANNER_EMULATOR_HOST=localhost:9010
```
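Note that the client library talks gRPC to the emulator on port 9010, while the gcloud override above uses the emulator's REST port 9020. A pipeline can also log which endpoint it is about to use, which catches a forgotten export before any data is written. A minimal sketch, assuming a hypothetical spannerTarget helper; it only inspects the variable, and the actual redirection is done by the client library.

```go
package main

import (
	"fmt"
	"os"
)

// spannerTarget describes where a Spanner client would connect, given an
// environment lookup function such as os.LookupEnv. This sketch treats an
// unset or empty SPANNER_EMULATOR_HOST as "use the production endpoint".
func spannerTarget(lookup func(string) (string, bool)) string {
	if host, ok := lookup("SPANNER_EMULATOR_HOST"); ok && host != "" {
		return "emulator at " + host
	}
	return "production Spanner endpoint"
}

func main() {
	fmt.Println("Spanner target:", spannerTarget(os.LookupEnv))
}
```

Passing the lookup function in, rather than calling os.Getenv directly, lets the check be exercised without modifying the real environment.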

Run the pipeline in dry-run mode and check the reported metrics, such as the total record count.

```
$ go run main.go -input-csv-path "test-data.csv" \
    -spanner-database "projects/[your-project-id]/instances/test-instance/databases/testdb" \
    -spanner-table "mytable" -dry-run=true
```

The -dry-run=true flag activates dry-run mode.

Run the pipeline for real to write the records into Spanner. Without the -dry-run flag, the pipeline performs the actual write.

```
$ go run main.go -input-csv-path "test-data.csv" \
    -spanner-database "projects/[your-project-id]/instances/test-instance/databases/testdb" \
    -spanner-table "mytable"
```
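Before writing, each CSV record has to be mapped to the columns of mytable. As a sketch, assuming the CSV has a header row naming the columns (here only Id), the helper below builds one map per row and rejects Id values longer than the STRING(36) column allows. With the Go client library (cloud.google.com/go/spanner), maps of this shape can be turned into mutations with spanner.InsertOrUpdateMap and applied with Client.Apply; the generated main.go may instead write through Apache Beam, so treat rowsToColumnMaps as a hypothetical illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// maxIDLength matches the Id STRING(36) column created for mytable above.
// Spanner measures STRING lengths in Unicode characters, hence RuneCount.
const maxIDLength = 36

// rowsToColumnMaps converts parsed CSV records into one column-name -> value
// map per row, using header as the column names. The map shape matches what
// spanner.InsertOrUpdateMap(table, map[string]interface{}) accepts; this
// sketch only builds the maps and never contacts Spanner.
func rowsToColumnMaps(header []string, records [][]string) ([]map[string]interface{}, error) {
	rows := make([]map[string]interface{}, 0, len(records))
	for i, rec := range records {
		if len(rec) != len(header) {
			return nil, fmt.Errorf("record %d has %d fields, want %d", i, len(rec), len(header))
		}
		row := make(map[string]interface{}, len(header))
		for j, col := range header {
			row[col] = rec[j]
		}
		// Reject keys that Spanner would refuse for a STRING(36) column.
		if id, ok := row["Id"].(string); ok && utf8.RuneCountInString(id) > maxIDLength {
			return nil, fmt.Errorf("record %d: Id %q exceeds STRING(%d)", i, id, maxIDLength)
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	header := []string{"Id"}
	records := [][]string{
		{"6f1b2c3d-0000-4000-8000-000000000001"},
		{"6f1b2c3d-0000-4000-8000-000000000002"},
	}
	rows, err := rowsToColumnMaps(header, records)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("prepared %d rows for mytable\n", len(rows))
}
```

Validating lengths before the write means a bad row fails fast with its record index, rather than surfacing later as an error from Spanner.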

Verify that the MySQL CSV dump was migrated to your Spanner database.

```
$ gcloud spanner databases execute-sql testdb --instance=test-instance --sql='SELECT * FROM mytable'
```

Switch back to your default gcloud configuration.

```
$ gcloud config configurations activate default
```

Documentation

Overview

Package main implements a simple MySQL to Spanner data migration example.

Source Files

main.go
