DEX TUSD Go Server

A resumable file upload server for OCIO Data Exchange (DEX)

Overview

The DEX Upload API is an open-source tool that allows users to upload and manage data sets for public health initiatives. It is designed for ease of use and customization while also ensuring compliance with federal standards.

The DEX Upload API is built on the tus open protocol, which provides a way to upload large files in smaller chunks over HTTP. Each file chunk is sent as an HTTP PATCH request. The last PATCH request tells tus to combine the chunks into the full file. Once the file upload is complete, the server can also push it out to other destinations.
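As an illustration, a minimal tus exchange against a locally running server might look like the following sketch, assuming the default http://localhost:8080 address and /files/ base path; the metadata keys your deployment actually requires depend on the sender manifest configuration, and test.csv is just a placeholder file.

# 1. Create the upload (Upload-Length is the total file size in bytes;
#    Upload-Metadata values are base64-encoded)
curl -i -X POST http://localhost:8080/files/ \
  -H "Tus-Resumable: 1.0.0" \
  -H "Upload-Length: 1048576" \
  -H "Upload-Metadata: filename $(echo -n test.csv | base64)"

# 2. Send the file contents in one or more PATCH requests to the Location URL
#    returned by the POST above
curl -X PATCH http://localhost:8080/files/<upload-id> \
  -H "Tus-Resumable: 1.0.0" \
  -H "Upload-Offset: 0" \
  -H "Content-Type: application/offset+octet-stream" \
  --data-binary @test.csv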

Key Features
  • Resumable uploads via tus
  • File metadata validation
  • File routing
  • Upload multiple files in parallel
  • Configurable authN/authZ middleware
  • Support for distributed file locking to enable horizontal scaling

Folder Structure

The repo is structured (as feasible) based on the golang-standards/project-layout

References

  • Based on the tus open protocol for resumable file uploads
  • Based on the tusd official reference implementation

Getting Started

1. Install Required Tools

Install Go and/or a container tool (e.g., Docker or Podman).

2. Clone the Repo

Clone the repo and change into the upload-server/ directory inside it

git clone git@github.com:CDCgov/data-exchange-upload.git 

cd data-exchange-upload/upload-server
3. Start the Server

Running the server with the default configurations will use the local file system for the storage backend and the delivery targets. It will start the Upload API server at http://localhost:8080 and a Tus client at http://localhost:8081/, which will allow you to upload files to the Upload API server.

These files will be uploaded by default to the upload-server/uploads/tus-prefix/ directory. The base uploads/ directory name and location can be changed using the LOCAL_FOLDER_UPLOADS_TUS environment variable. The tus-prefix/ name can be changed using the TUS_UPLOAD_PREFIX environment variable.
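For example, to override both locations (illustrative values):

export LOCAL_FOLDER_UPLOADS_TUS=./data/uploads
export TUS_UPLOAD_PREFIX=incoming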

Information about file uploads and delivery is stored as reports and events in the upload-server/uploads/reports/ and upload-server/uploads/events/ directories, respectively. If delivery targets are specified in the sender manifest config file, uploaded files will be delivered to the corresponding directories in upload-server/uploads/.

Default folder structure

|-- upload-server
    |-- uploads
        |-- edav // edav delivery destination
        |-- ehdi // ehdi delivery destination
        |-- eicr // eicr delivery destination
        |-- events // file upload and delivery events
        |-- ncird // ncird delivery destination
        |-- reports // file upload and delivery reports
        |-- routing // routing delivery destination
        |-- tus-prefix // file upload destination

All of the following commands should be run from the upload-server/ directory.

Flags
Application Configuration

-appconf passes in an environment variable file to use for configuration

The following forms are permitted:

-appconf=.env
-appconf .env
--appconf=.env
--appconf .env
Using Go
Running from the Code

Run the code

go run ./cmd/main.go

Run the code with the flag

go run ./cmd/main.go -appconf=.env 
Running from the Binary

Build the binary

go build -o ./dextusd ./cmd/main.go

Run the binary

./dextusd

Run the binary with the flag

./dextusd -appconf=.env
Using Docker
Running Using the Dockerfile

Build the image

docker build -t dextusdimage .

Run the container

docker run -d -p 8080:8080 -p 8081:8081 --name dextusd dextusdimage

Note: -p 8080:8080 -p 8081:8081 must be included so that you can access the endpoints

Run the container with the flag, passing in the .env in the same directory

docker run -d -p 8080:8080 -p 8081:8081 -v .:/conf --name dextusd dextusdimage -appconf=/conf/.env

Note: -v .:/conf mounts a volume from your current directory to a directory in the container, which can be named anything except /app because that is used for the binary. The path passed to the -appconf flag needs to point to this mounted volume (see Mount Volume).

Running Using Docker Compose

This is the easiest way to start the service locally because, in addition to starting the service, it also starts the Redis cache, Prometheus, and Grafana. If there is a .env file in the same directory as the docker-compose.yml file, its values will automatically be used when building and starting the containers, and the -appconf flag will be used to pass the file into the service.

Start the containers using

docker-compose up -d

Configurations

Configuration of the upload-server is managed through environment variables. These environment variables can be set directly in the terminal

(Mac or Linux)

export SERVER_PORT=8082
go run ./cmd/main.go

(Windows)

set SERVER_PORT=8082
go run ./cmd/main.go

or you can create a file and set them in it

upload-server/env-file:

SERVER_PORT=8082

then pass the file in using the -appconf flag

go run ./cmd/main.go -appconf env-file

or load it into the session using the source command

source env-file
go run ./cmd/main.go

If you name this file .env you can get the benefits of the dotenv file format. For instance, it will automatically be recognized and loaded by tools like docker-compose. The .env.example file in the upload-server/ directory contains all of the available environment variables for configuring the system. Add any environment variables you would like to set to your .env file.

[!WARNING] Never check your .env file into source control. It should only be on your local computer or on the server you are using it on.

Configuration Documentation

Please see the env-configs documentation for complete details on the environment variables that can be used in the .env file.

Common Service Configurations

upload-server/.env:

## logging and environment
# enable or disable DEBUG logging level, default=INFO
LOGGER_DEBUG_ON= 
# environment for the service, system default=DEV
ENVIRONMENT= 

## server configs
# protocol used by the server (http or https), default=http
SERVER_PROTOCOL= 
# hostname of the server, default=localhost
SERVER_HOSTNAME= 
# port on which the server runs, default=8080
SERVER_PORT= 
# url path for handling tusd upload requests, default=/files/
TUSD_HANDLER_BASE_PATH= 
# url path for handling tusd info requests, default=/info/
TUSD_HANDLER_INFO_PATH=
# maximum number of retries for event processing, default=3 
EVENT_MAX_RETRY_COUNT= 
# comma-separated list of keys from the sender manifest config to count in the metrics, default=data_stream_id,data_stream_route,sender_id
METRICS_LABELS_FROM_MANIFEST= 

# tusd
# relative file system path to the tus uploads directory within the storage backend location, default=tus-prefix
TUS_UPLOAD_PREFIX= 

# ui
# port on which the UI client runs, default=8081
UI_PORT= 
# CSRF token used for form security (32 byte string), default=1qQBJumxRABFBLvaz5PSXBcXLE84viE42x4Aev359DvLSvzjbXSme3whhFkESatW
CSRF_TOKEN= 

[!WARNING] The default CSRF_TOKEN is for development purposes only. You should replace it with a new string; you can generate a 32 byte string here
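For example, assuming OpenSSL is available locally, you can generate a random token yourself instead of using an online generator:

# 32 random bytes, hex-encoded to a 64-character string
openssl rand -hex 32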

Configuring Distributed File Locking with Redis

When you want to scale this service horizontally, you'll need to use a distributed file locking mechanism to prevent upload corruption. You can read more about the limitations of Tus's support for concurrent requests here. This service comes with a Redis implementation of a distributed file lock out of the box. All you need is a Redis instance that is accessible from the servers you will deploy this service to. This is provided for you in the docker-compose setup. After the Redis instance is set up, set the following environment variable to enable the use of your Redis instance:

upload-server/.env:

# connection string to the Redis instance
REDIS_CONNECTION_STRING=

Note: The full URI of the Redis instance must include authentication credentials such as username/password or access token. Make sure to use rediss:// instead of redis:// to use TLS for this traffic.
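For example, a connection string with a hypothetical host and credentials might look like:

upload-server/.env:

# hypothetical example; replace the host, port, and credentials with your own
REDIS_CONNECTION_STRING=rediss://default:your-password@redis.example.com:6380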

Configuring OAuth Token Verification Middleware

The Upload API has OAuth token verification middleware for the /files/ and /info/ endpoints. You can read more about it here. OAuth is disabled by default. If you would like to enable it, you need to set the following environment variables with your OAuth settings:

upload-server/.env:

# enable or disable OAuth token verification
OAUTH_AUTH_ENABLED=false
# URL of the OAuth token issuer
OAUTH_ISSUER_URL=
# space-separated list of required scopes
OAUTH_REQUIRED_SCOPES=
# optionally, URL for OAuth introspection, used for opaque tokens
OAUTH_INTROSPECTION_URL=
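For example, with a hypothetical identity provider and scope:

upload-server/.env:

# hypothetical example values
OAUTH_AUTH_ENABLED=true
OAUTH_ISSUER_URL=https://idp.example.com/oauth2/default
OAUTH_REQUIRED_SCOPES=dex:upload
OAUTH_INTROSPECTION_URL=https://idp.example.com/oauth2/default/v1/introspect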
Configuring the storage backend

This service currently supports the local file system, Azure, and AWS S3 as storage backends. You can only use one storage backend at a time. If the Azure configurations are set, Azure will be the storage backend regardless of any other settings. If the Azure configurations are not set and the S3 configurations are set, S3 will be the storage backend. If neither the Azure nor the S3 configurations are set, the local file system will be the storage backend.

Local file system

By default, this service uses the file system of the host machine it is running on as the storage backend, so no environment variables need to be set. You can change the directory the service uses as the base of the uploads:

upload-server/.env:

# relative file system path to the base upload directory, default=./uploads
LOCAL_FOLDER_UPLOADS_TUS= 
Sender manifest config location

By default, the service uses the sender manifest config files located in ../upload-configs. The files within this directory are split into the subdirectories v1 and v2 depending on their version. The service will default to the v2 files if there are two versions of the same sender manifest config. You can change the base of the configs directory:

upload-server/.env:

# relative file system path to the sender manifest configuration directory, default=../upload-configs
UPLOAD_CONFIG_PATH= 
Azure Storage Account

To upload to an Azure Storage Account, you'll need to collect the name, access key, and endpoint URI of the account. You also need to create a Blob container within the account. You must set the following environment variables to use Azure.

upload-server/.env:

# Azure storage account name
AZURE_STORAGE_ACCOUNT= 
# Azure storage account private access key or SAS token
AZURE_STORAGE_KEY= 
# Azure storage endpoint URL
AZURE_ENDPOINT= 
# container name for tus base upload storage
TUS_AZURE_CONTAINER_NAME= 
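For example, with a hypothetical storage account (the endpoint follows the usual https://<account>.blob.core.windows.net pattern):

upload-server/.env:

# hypothetical example values
AZURE_STORAGE_ACCOUNT=mydexstorage
AZURE_STORAGE_KEY=<access key or SAS token>
AZURE_ENDPOINT=https://mydexstorage.blob.core.windows.net
TUS_AZURE_CONTAINER_NAME=uploads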
Azure local development

For local development, you can use Azurite to emulate Azure Storage. There is a Docker Compose file included here, docker-compose.azurite.yml, that creates and starts an Azurite container. You only need to set the AZURE_STORAGE_KEY. You can get the default Azurite key here. To start the service with an Azure storage backend, run

podman-compose -f docker-compose.yml -f docker-compose.azurite.yml up -d
Optional Azure authentication

To use Azure Service Principal for authentication, set:

upload-server/.env:

# Azure storage account service principal tenant id
AZURE_TENANT_ID= 
# Azure storage account service principal client id
AZURE_CLIENT_ID= 
# Azure storage account service principal client secret
AZURE_CLIENT_SECRET= 
Optional Azure blob for sender manifest config files

If you would like to store the sender manifest config files on Azure, create a blob container for them using the same credentials as the upload blob container. Copy the ../upload-configs/v1 and ../upload-configs/v2 directories to the blob container. Set DEX_MANIFEST_CONFIG_CONTAINER_NAME to the new blob container name:

upload-server/.env:

# container name for sender manifest configuration files
DEX_MANIFEST_CONFIG_CONTAINER_NAME= 

If DEX_MANIFEST_CONFIG_CONTAINER_NAME is not set, the sender manifest config files on the file system will be used.
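One way to copy the config directories into the container is with the Azure CLI (a sketch, assuming the CLI is installed, you are authenticated, and the container already exists; the account and container names are illustrative):

# uploads the v1/ and v2/ directories into the config container
az storage blob upload-batch \
  --account-name mydexstorage \
  --destination upload-configs \
  --source ../upload-configs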

S3

To use an AWS S3 bucket as the storage backend, you'll need to create a bucket to upload to within S3 and give a user or service read and write access to it. Then set the bucket name and endpoint URI of the S3 instance.

upload-server/.env:

# s3-compatible storage endpoint URL, must start with `http` or `https`
S3_ENDPOINT= 
# bucket name for tus base upload storage
S3_BUCKET_NAME= 
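For example, with a hypothetical bucket in us-east-1:

upload-server/.env:

# hypothetical example values
S3_ENDPOINT=https://s3.us-east-1.amazonaws.com
S3_BUCKET_NAME=my-dex-upload-bucket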
AWS local development

For local development, you can use MinIO to emulate the AWS S3 API. There is a Docker Compose file included here, docker-compose.minio.yml, that creates and starts a MinIO container. To start the service with an AWS S3 storage backend, run

podman-compose -f docker-compose.yml -f docker-compose.minio.yml up -d
AWS Authentication

Authentication is handled using the standard AWS environment variables

upload-server/.env:

# username or user ID of the user or service account with read and write access to the bucket
AWS_ACCESS_KEY_ID= 
# password or private key of a user or service account with read and write access to the bucket
AWS_SECRET_ACCESS_KEY= 
# optional, session token for authentication (typically used for short lived keys)
AWS_SESSION_TOKEN= 
# region of the s3 bucket
AWS_REGION= 

or using a profile in an AWS Credential file with the AWS CLI

~/.aws/credentials:

[default]
aws_access_key_id = <YOUR_DEFAULT_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_DEFAULT_SECRET_ACCESS_KEY>
aws_session_token = <YOUR_SESSION_TOKEN>

[test-account]
aws_access_key_id = <YOUR_TEST_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_TEST_SECRET_ACCESS_KEY>

and set the region in an AWS Config file

~/.aws/config:

[default]
region = <REGION>

[profile test-account]
region = <REGION>

Note: If you use a credential profile that is not [default], you need to explicitly set the AWS_PROFILE environment variable to the profile you want to use, before starting the service.
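For example, to use the test-account profile defined above when running from source:

export AWS_PROFILE=test-account
go run ./cmd/main.go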

Optional AWS S3 bucket for sender manifest config files

If you would like to store the sender manifest config files in an AWS S3 bucket, set the DEX_S3_MANIFEST_CONFIG_FOLDER_NAME environment variable to the directory in the bucket to use. Optionally, you can also create a new bucket for the configs and set the DEX_MANIFEST_CONFIG_BUCKET_NAME environment variable to that new bucket name. The new bucket must use the same credentials as the upload bucket. Copy the ../upload-configs/v1 and ../upload-configs/v2 directories to the config folder in the bucket:

upload-server/.env:

# directory name for the sender manifest configuration files within the upload bucket
DEX_S3_MANIFEST_CONFIG_FOLDER_NAME= 
# bucket name for the sender manifest configurations; if not set, it defaults to the upload bucket
DEX_MANIFEST_CONFIG_BUCKET_NAME= 

If neither DEX_MANIFEST_CONFIG_BUCKET_NAME nor DEX_S3_MANIFEST_CONFIG_FOLDER_NAME is set, the sender manifest config files on the file system will be used.
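One way to copy the config directories into the bucket is with the AWS CLI (a sketch; the bucket and folder names are illustrative):

# uploads the v1/ and v2/ directories into the config folder of the bucket
aws s3 cp ../upload-configs/ s3://my-dex-upload-bucket/upload-configs/ --recursive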

Configuring the reports location

By default, the reports of upload and delivery activity are written to the ./uploads/reports directory in the local file system.

Local file system reports directory

To change the location of the report files

upload-server/.env:

# relative file system path to the reports directory
LOCAL_REPORTS_FOLDER= 
Azure report service bus

Create an Azure service bus queue or topic for publishing the report messages. Set the following environment variables with the details from the new service bus

upload-server/.env:

# Azure connection string with credential to the queue or topic
REPORTER_CONNECTION_STRING= 
# queue name for sending reports, use if the service bus is a queue
REPORTER_QUEUE= 
# topic name for sending reports, use if the service bus is a topic
REPORTER_TOPIC= 
Configuring the event publication and subscription

By default, the event messages about upload and delivery activity are written to the ./uploads/events directory in the local file system.

Local file system events directory

To change the location of the event files

upload-server/.env:

# relative file system path to the events directory
LOCAL_EVENTS_FOLDER= 
Azure event publisher and subscriber service buses
Event publisher

Create an Azure service bus topic for publishing event messages. Set the following environment variables with the details from the new service bus topic

upload-server/.env:

# Azure connection string with credentials to the event publisher topic
PUBLISHER_CONNECTION_STRING= 
# topic name for the event publisher service bus
PUBLISHER_TOPIC= 
Subscriber

Create a subscription for the desired Azure service bus topic. Set the environment variables with the details from the new subscription

upload-server/.env:

# Azure connection string with credentials to the event subscription
SUBSCRIBER_CONNECTION_STRING= 
# topic name to subscribe to for receiving events
SUBSCRIBER_TOPIC= 
# subscription name for the event subscriber
SUBSCRIBER_SUBSCRIPTION= 
Configuring upload routing and delivery targets

This service is capable of copying uploaded files to other storage locations, even ones that are outside the on-prem or cloud environment your service is deployed to. This is useful when you want your files to land in particular storage locations based on their metadata. Setting this up begins with the creation of a YML file that defines delivery groups, each with one or more delivery targets. These targets currently support Azure Blob, S3, and the local file system.

By default, this service will use the YML file located at configs/local/delivery.yml, but you can create your own and point to it via the DEX_DELIVERY_CONFIG_FILE environment variable.

Start by defining programs, which act as delivery groups

configs/local/delivery.yml:

programs:
  - data_stream_id: teststream1
    data_stream_route: testroute1
  - data_stream_id: teststream2
    data_stream_route: testroute2

Next, define at least one delivery target for each group. Each of these target endpoints can be configured independently to point to a local file system directory, an Azure Blob container, or an AWS S3 bucket. Specify the type of connection you want by setting the type field to az-blob, s3, or file.

Azure blob target

Create an Azure Blob container or get the values from an existing Blob container. Then, set the required connection information, which can be a SAS token and connection string, or an Azure service principal. Note that the service will create the container if it does not already exist.

configs/local/delivery.yml:

programs:
  - data_stream_id: teststream1
    data_stream_route: testroute1
    delivery_targets:
      - name: target1
        type: az-blob
        endpoint: https://target1.blob.core.windows.net
        container_name: target1_container
        tenant_id: $AZURE_TENANT_ID
        client_id: $AZURE_CLIENT_ID
        client_secret: $AZURE_CLIENT_SECRET
  - data_stream_id: teststream2
    data_stream_route: testroute2
    delivery_targets:
      - name: target2
        type: az-blob
        endpoint: https://target2.blob.core.windows.net
        container_name: target2_container
        tenant_id: $AZURE_TENANT_ID
        client_id: $AZURE_CLIENT_ID
        client_secret: $AZURE_CLIENT_SECRET

Note that you can substitute environment variables using the $ notation. This lets you keep secrets like service principal credentials or SAS tokens out of this configuration file.

AWS S3 bucket target

Create an AWS S3 bucket or get the values from an existing S3 bucket. Then, set the access credentials and endpoint for the bucket in the following way:

configs/local/delivery.yml:

programs:
  - data_stream_id: teststream1
    data_stream_route: testroute1
    delivery_targets:
      - name: target1
        type: s3
        endpoint: https://target1.s3.aws.com
        bucket_name: target1
        access_key_id: $S3_ACCESS_KEY_ID
        secret_access_key: $S3_SECRET_ACCESS_KEY
        REGION: us-east-1
  - data_stream_id: teststream2
    data_stream_route: testroute2
    delivery_targets:
      - name: target2
        type: s3
        endpoint: https://target2.s3.aws.com
        bucket_name: target2
        access_key_id: $S3_ACCESS_KEY_ID
        secret_access_key: $S3_SECRET_ACCESS_KEY
        REGION: us-east-1
Local file system target

To use a local file system target, you simply need to set a directory path. Note that the service will create the path if it does not exist

configs/local/delivery.yml:

programs:
  - data_stream_id: teststream1
    data_stream_route: testroute1
    delivery_targets:
      - name: target1
        type: file
        path: /my/uploads/target1
  - data_stream_id: teststream2
    data_stream_route: testroute2
    delivery_targets:
      - name: target2
        type: file
        path: /my/uploads/target2
Configuring Processing Status API Integration

The upload server can be run locally with the Processing Status API to integrate features from that service into the Upload end-to-end flow. Setting this up makes it possible to integrate reporting structures into the bigger Upload workflow. The Processing Status API repository will need to be cloned locally to access its features for integration. This setup currently assumes that the repositories live adjacent to each other on the local filesystem.

Build Processing Status Report Sink

The PStatus report sink needs to be built so your container system gets an image with the changes. To do this, run the following from the pstatus-report-sink-ktor directory.

For building the client for Podman:

./gradlew jibDockerBuild -Djib.dockerClient.executable=$(which podman)

For building the client for Docker:

./gradlew jibDockerBuild

This will create a local container image that can be used in the following steps.

From the upload-server directory in the Upload Server repository, run the following command to build a system using both Upload and PS API.

[!NOTE] To avoid port collisions, set the environment variable SINK_PORT to a value other than 8080. Suggested port: 8082

podman-compose -f docker-compose.yml -f docker-compose.localstack.yml -f ../../data-exchange-processing-status/docker-compose.yml -f compose.pstatus.yml up -d

This will set up the system so that:

  • Upload API is available locally on port 8080
  • Upload UI is available locally on port 8081
  • PS API GraphQL endpoint is available locally on port 8090

Testing

Unit Tests

[!TIP] Before running unit tests, make sure to clean the file system with the clean.sh script. This removes any temporary upload and report files that the tests generated.

Run the unit tests

go test ./...

Run the unit tests with code coverage

go test -coverprofile=c.out ./...
go tool cover -html=c.out
Integration Tests (with MinIO and Azurite)

Set the environment variable AZURE_STORAGE_KEY in your .env file or locally in your terminal. You can get the default key here.

podman-compose -f docker-compose.yml -f docker-compose.azurite.yml -f docker-compose.minio.yml -f docker-compose.testing.yml up --exit-code-from upload-server

VS Code

When using VS Code, we recommend using the Go extension made by the Go Team at Google

.vscode/launch.json:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Launch Package",
            "type": "go",
            "request": "launch",
            "mode": "auto",
            "program": "cmd/main.go",
            "cwd": "${workspaceFolder}",
            "args": []
        }
    ]
}
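To start the server with a configuration file, you can pass the -appconf flag through the args array, for example:

"args": ["-appconf=.env"]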
