Grants Ingest Service
Ingests and indexes data related to grants.
What's this?
This repository contains both IaC (terraform) and runtime code for the Grants Ingest pipeline
service. The purpose of this service is to collect information about grant opportunities from
third party sources, such as Grants.gov, and organize that data into
per-grant data records that can be consumed whenever updates to the underlying data occur.
Architecture
This service consists of an event-driven pipeline that uses AWS Lambda for running various
compute tasks. Additionally, Amazon S3 is used for archival storage of raw source data, while
DynamoDB provides an active index of processed grant opportunity records. At the end of the ingestion
pipeline, newly-created/-updated grant opportunity records are sent to an event bus that delivers
the updates to subscribers.
High-level architecture
Component-level architecture
Code Organization
Code for this service can generally be considered under two categories:
IaC
Infrastructure-as-code (IaC) used to provision the target environment, which is written
with Terraform, and normally run during deployment. The main Terraform project is located in
this directory; infrastructure specific to a particular step in the pipeline is organized into
Terraform modules located within the modules
subdirectory.
Runtime
Runtime code that executes (e.g. by AWS Lambda) within the target environment in response to
some triggering event. Runtime code is written using Go (currently targeting version 1.23.x
),
which is organized in the repository root directory according to the following conventions:
cmd/
: This directory contains one subdirectory per Lambda function, and should provide a single
main
package that can be compiled into a per-function binary. Each subdirectory of cmd
should have a name that obviously aligns with the particular Lambda function for which it is
written (which generally should correspond to the Terraform module directory used to provision
the Lambda function and its dependencies). For example:
- Lambda handler code:
cmd/DownloadGrantsGovDB
- Terraform module:
modules/DownloadGrantsGovDB
pkg/
: This directory contains "library code" used by one or more Lambda functions in the project,
organized into per-package subdirectories according to Go convention.
internal/
: This directory is similar to pkg/
but contains packages that are only intended
to ever be used internally by this project and make no guarantees about third-party compatibility.
Generally, reusable libraries providing common functionality used by multiple Lambda functions
should reside here.
Development
During development, infrastructure and runtime code can be tested by configuring Terraform
to target an actual ("sandbox", dedicated for development) AWS environment or a mock AWS
environment simulated with LocalStack if an AWS sandbox environment
is unavailable.
Prerequisites
To begin, make sure the following tools are available in your development workspace:
Note: This document assumes usage of Docker-Compose method.
However, you can choose whatever LocalStack installation option best suits your environment.
[!TIP]
Check that you have the necessary dependencies for LocalStack development by running task local:check-dependencies
Environment Variables
We recommend setting the following environment variables in your development workspace when working
with LocalStack:
AWS_SDK_LOAD_CONFIG=true
AWS_REGION=us-west-2
AWS_DEFAULT_REGION=us-west-2
AWS_ACCESS_KEY_ID=testing
AWS_SECRET_ACCESS_KEY=testing
LOCALSTACK_VOLUME_DIR
- Should be set to whatever path you want to persist LocalStack data.
If not set, the default behavior is to target a
./volume
subdirectory
from where docker compose
is run.
Provisioning Infrastructure
Quickstart
Certain steps described in the "Manual Provisioning" section below may be achieved using the following
shortcut commands provided by this repository's Taskfile:
- Prerequisite: Ensure LocalStack is started by running
docker compose up -d
- Deploy to LocalStack for the first time:
task local:from-scratch
- Subsequent deployments to LocalStack:
task local:deploy
- Invoke the
DownloadGrantsGovDB
Lambda function: task local:invoke-DownloadGrantsGovDB
See .taskfile/local.yml
for more information and other shortcuts.
Manual Provisioning
After starting LocalStack, create a terraform state deployment bucket in the localstack environment. This guide, as well as the provided Terraform backend file for local development, assumes that
this bucket is named local-terraform
and has a region of us-west-2
.
To create this bucket, navigate to the terraform folder, then run the following command:
awslocal s3 mb s3://local-terraform
If using the provided docker-compose
then this runs when the localstack container is started.
Next, initialize the Terraform project using the local.s3.tfbackend
Terraform state backend
configuration file provided by this repository. Note that you may need to modify the endpoint
setting in this file if your LocalStack environment uses a different host/port other than the
default of localhost:4566
.
tflocal init -backend-config="local.s3.tfbackend" -reconfigure
Once this command completes successfully, use tflocal
to "provision" mock infrastructure in your LocalStack environment. This will create the Lambda functions, so it requires the functions to be
built first. Use task build
to build the functions. Then use the local.tfvars
file provided by this
repository to provision the mock infrastructure:
tflocal apply -var-file=local.tfvars
Hint: It's normal for task build
and/or tflocal apply
to run for up to a few minutes.
If you notice that these commands are taking an excessively long time or seem to hang, try
cancelling the operation, then run task prebuild-lambda
and try running your command again.
Running Lambda Functions
Once your Lambda function has been deployed to your running LocalStack environment, you can
invoke it in order to test its execution and observe log output. Since each Lambda function
behaves differently and expects different inputs, the exact nature of the debugging cycle
will vary between functions. However, the following example demonstrating invocation of the
function defined in module.DownloadGrantsGovDB
may be used as a starting point.
In general, Lambda functions are invoked with a JSON payload that represents the event
input expected by the Lambda handler. Again, the structure of this payload varies depending
on the invocation source(s) configured for each Lambda function. In the case of the Lambda
function defined in module.DownloadGrantsGovDB
, the expected payload is a JSON object
containing a "timestamp"
key whose value is an ISO-8601 timestamp string, from which
the date of a desired Grants.gov database export is derived. For example:
{"timestamp": "2023-03-20T05:00:00-04:00"}
When using awslocal
(or the official AWS CLI, for that matter), also note that the payload
must be provided as a Base64-encoded encoded representation of the payload. The following example
demonstrates an invocation of a Lambda function named grants-ingest-DownloadGrantsGovDB
that provides the above example JSON in Base64 encoding and prints the Lambda output to stdout:
awslocal lambda invoke \
--function-name grants-ingest-DownloadGrantsGovDB \
--payload $(printf '{"timestamp": "2023-03-20T05:00:00-04:00"}' | base64) \
/dev/stdout
null
{
"StatusCode": 200,
"LogResult": "",
"ExecutedVersion": "$LATEST"
}
The above output displays the Lambda function invocation result following its successful execution;
the output you see will vary depending on the input, the function version that was invoked,
whether an unhandled error cause execution to fail, etc.
In order to view execution logs outputted during Lambda invocation, use CloudWatch Logs (just as
you would when conducting tests against a genuine AWS environment). The following command can
be used to observe log output in real-time:
awslocal logs tail /aws/lambda/grants-ingest-DownloadGrantsGovDB --follow
When actively debugging, it is useful to run a command similar to the above in a separate terminal
prior to invoking the Lambda function under test, as it only displays logs emitted after the
awslocal logs tail
started. You can use the --since
option (with or without --follow
)
in order to display historical logs from previous invocations. For example, the following command
shows logs emitted in the past 1 hour, and will continue to display new logs as they are emitted:
awslocal logs tail /aws/lambda/grants-ingest-DownloadGrantsGovDB --since 1h --follow
Running Common Tasks
This repository provides a Taskfile.yml
file for defining and running common tasks related
to development. You can install the task
utility
and then run task
in your command-line environment to see a list of the available helpers.
Go compiler architecture
This project is configured to target ARM64 CPU architecture when building Go binaries,
since that is what we run on staging and production.
Some contributors who develop on 64-bit Intel machines may have issues running Lambdas on LocalStack with this configuration.
Currently, the suggested workaround is to make the following modifications to target 64-bit Intel when building locally and deploying to LocalStack:
- In
Taskfile.yml
, modify the go-build-lambda
task's command to set GOARCH=amd64
(replaces GOARCH=arm64
).
- In
terraform/local.tfvars
, add the following Terraform input variable: lambda_arch = "x86_64"
.
Quick Reference
The following items can be referred to as a quick "cheat-sheet" for development:
- Install Go dependencies:
go mod download
or go get ./...
- Run Go unit tests:
task test
- Ensure all Go and Terraform code is formatted properly:
task fmt
- Warm up your Go cache:
task prebuild-lambda
- We recommend running this command periodically, especially before compiling Go binaries,
running
tflocal plan
, and/or tflocal apply
.
- Compile binary Lambda function handlers:
task build
- Compile the CLI tool:
task build-cli
- Run all QA checks normally executed during CI:
task check
- Initialize and deploy a test environment after starting LocalStack:
task local:from-scratch
Contributing
This project wouldn’t exist without the hard work of
many people! Please
see CONTRIBUTING.md
to find out how you can help.
Releasing to Production
Note: Releases are versioned using a YYYY.inc
scheme that represents the year of the
release, and the incremental release number for that year. You can view a list of all historical
releases on the Releases page.
Maintainers with the requisite access may release to Production by performing the following
steps:
- Navigate to the list of Releases
for this repository.
- Locate the draft for the next release, and click the pencil icon to edit.
- Provide a high-level summary of the release in the Summary section of the release notes.
- Optionally, make any necessary edits to the other sections of the prepared release notes.
- Ensure "Set as a pre-release" is checked at the bottom of the edit page.
- Click the "Publish Release" button.
At this point, the release will be published (as a pre-release) and the
deployment pipeline
will automatically begin preparing the changes that will be rolled out to Production.
Once a Terraform plan has been created for the release, repository administrators will be notified
for review and final approval. After the plan has been approved and applied to Production,
the release will be automatically updated to remove the pre-release state, and a timestamp
for the deployment will be appended to the release notes.