v0.0.0-...-334bf7d | Published: Jun 3, 2020 | License: MIT

Repository

github.com/covidtrace/aggregator

README

covidtrace/aggregator

Aggregator processes raw COVID Trace input files (locations or tokens/beacons) to produce anonymized public data. It also generates query hints when directories in the public data bucket grow larger than a predefined threshold. These hints allow the app to better control data usage.

Details

The COVID Trace app has symptomatic users upload CSV files directly to input buckets. There are two types of files: location files and token/beacon files. Location files contain a Unix timestamp rounded up to the hour, an S2 Geometry Cell ID, and a verified status (which is currently always set to false). Token files contain a Unix timestamp rounded up to the hour, a beacon UUID, and an S2 Geometry Cell ID.
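As a rough illustration of the location file format described above, the following sketch parses rows into a struct and shows one way to round a timestamp up to the hour. The column order (timestamp, cell ID, verified), the absence of a header row, the sample cell ID, and the names `LocationRecord`, `parseLocations`, and `roundUpToHour` are all assumptions made for this example, not taken from the Aggregator source. Token files could be handled the same way with a beacon UUID column.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// LocationRecord mirrors one row of a location file.
// The field order is an assumption based on the description above.
type LocationRecord struct {
	Timestamp time.Time
	CellID    string // S2 cell ID, kept as an opaque string in this sketch
	Verified  bool
}

// roundUpToHour rounds a non-negative Unix timestamp (seconds) up to the next
// full hour; values already on the hour are returned unchanged.
func roundUpToHour(secs int64) int64 {
	return (secs + 3599) / 3600 * 3600
}

// parseLocations reads headerless CSV rows of the assumed form
// "timestamp,cell_id,verified".
func parseLocations(data string) ([]LocationRecord, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = 3 // reject rows with a different column count
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	out := make([]LocationRecord, 0, len(rows))
	for _, row := range rows {
		secs, err := strconv.ParseInt(row[0], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad timestamp %q: %w", row[0], err)
		}
		verified, err := strconv.ParseBool(row[2])
		if err != nil {
			return nil, fmt.Errorf("bad verified flag %q: %w", row[2], err)
		}
		out = append(out, LocationRecord{
			Timestamp: time.Unix(secs, 0).UTC(),
			CellID:    row[1],
			Verified:  verified,
		})
	}
	return out, nil
}

func main() {
	// Hypothetical sample row; the cell ID value is made up.
	recs, err := parseLocations("1591142400,89c25a31,false\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(recs[0].Timestamp, recs[0].CellID, recs[0].Verified)
	fmt.Println(roundUpToHour(1591142401))
}
```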

Published files contain aggregated and anonymized input data at various S2 Geometry Cell ID levels. Files are aggregated at different S2 Cell ID levels to allow clients to control data usage. In particular, higher S2 cell levels cover smaller areas, so a file for a higher-level cell is more specific and contains fewer data points.
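To make the level trade-off concrete, here is a minimal sketch that groups cell IDs by their ancestor cell at a chosen level. It uses the standard 64-bit S2 cell ID bit layout instead of an S2 library so that it stays self-contained. The function names and the hand-picked leaf IDs are assumptions for illustration, and this is not necessarily how the Aggregator itself groups data.

```go
package main

import "fmt"

// maxLevel is the level of S2 leaf cells.
const maxLevel = 30

// parentAtLevel returns the ancestor of a 64-bit S2 cell ID at the given
// level. It assumes 0 <= level <= the cell's own level.
func parentAtLevel(id uint64, level int) uint64 {
	// The lowest set bit of a cell ID marks its level; each level uses 2 bits.
	lsb := uint64(1) << uint(2*(maxLevel-level))
	// Clear everything below the marker bit, then set the marker bit.
	return (id & -lsb) | lsb
}

// countByLevel groups cell IDs by their ancestor at the given level and
// returns the number of data points that fall in each ancestor cell.
func countByLevel(ids []uint64, level int) map[uint64]int {
	counts := make(map[uint64]int)
	for _, id := range ids {
		counts[parentAtLevel(id, level)]++
	}
	return counts
}

func main() {
	// Hand-picked leaf IDs: two on cube face 0 and one on face 1.
	ids := []uint64{0x1, 0x3, 0x3000000000000001}
	coarse := countByLevel(ids, 0)
	fine := countByLevel(ids, maxLevel)
	fmt.Println(len(coarse), len(fine))
}
```

At level 0 the three points collapse into two face cells, while at level 30 each point gets its own cell, which is the sense in which higher levels hold fewer data points per cell.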

Jobs

The following jobs are performed periodically by the Aggregator.

Aggregate Locations

Fetch and aggregate all location input files, producing several output files.

Aggregate Tokens

Fetch and aggregate all token/beacon input files, producing several output files.

Hinting

List all prefixes in the published data bucket, then recursively compute the size of each prefix. If the size of a particular prefix exceeds a threshold, create a 0_HINT file that indicates to clients that they should subdivide queries for that prefix into more specific queries.
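The hinting step can be sketched roughly as follows, using an in-memory map of object names to sizes in place of real Cloud Storage listings. The bucket layout, the function name `hintPaths`, and the decision to place the 0_HINT file directly under the oversized prefix are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// hintPaths returns the names of the 0_HINT objects that would be created
// for every prefix whose total (recursive) size exceeds threshold bytes.
// objects maps a slash-separated object name to its size in bytes.
func hintPaths(objects map[string]int64, threshold int64) []string {
	sizes := make(map[string]int64)
	for name, size := range objects {
		// Credit the object's size to every ancestor prefix, e.g.
		// "a/b/1.csv" contributes to both "a/" and "a/b/".
		for i, c := range name {
			if c == '/' {
				sizes[name[:i+1]] += size
			}
		}
	}
	var hints []string
	for prefix, total := range sizes {
		if total > threshold {
			hints = append(hints, prefix+"0_HINT")
		}
	}
	sort.Strings(hints) // map iteration order is random, so sort for stable output
	return hints
}

func main() {
	// Hypothetical bucket contents.
	objects := map[string]int64{
		"a/b/1.csv": 60,
		"a/b/2.csv": 60,
		"a/c/1.csv": 10,
	}
	fmt.Println(hintPaths(objects, 100))
}
```

A real implementation would list objects from the bucket and write the hint files back, likely with bounded concurrency.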

Deploying

Aggregator is deployed as a Google Cloud Run service that is triggered by several Cloud Scheduler jobs at different intervals. The Aggregator is controlled by environment variables and a configuration file.

HINTING_THRESHOLD="number of bytes at which a prefix will be subdivided"
GOROUTINE_LIMIT="max number of goroutines to spawn when interacting with Cloud Storage"
CONFIG_FILE="URL to config file"
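A service might read these variables at startup along the following lines. This is a sketch only: the `Settings` struct, the `loadSettings` function, and the validation rules (all three variables required, goroutine limit at least 1) are assumptions, not the Aggregator's actual behavior. The lookup function is passed in so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Settings holds the values of the three environment variables listed above.
type Settings struct {
	HintingThreshold int64  // bytes at which a prefix is subdivided
	GoroutineLimit   int    // max goroutines used for Cloud Storage calls
	ConfigFile       string // URL of the configuration file
}

// loadSettings reads and validates the settings using the supplied lookup
// function (os.Getenv in production, a stub in tests).
func loadSettings(getenv func(string) string) (Settings, error) {
	var s Settings
	var err error

	s.HintingThreshold, err = strconv.ParseInt(getenv("HINTING_THRESHOLD"), 10, 64)
	if err != nil {
		return s, fmt.Errorf("HINTING_THRESHOLD: %w", err)
	}

	s.GoroutineLimit, err = strconv.Atoi(getenv("GOROUTINE_LIMIT"))
	if err != nil {
		return s, fmt.Errorf("GOROUTINE_LIMIT: %w", err)
	}
	if s.GoroutineLimit < 1 {
		return s, fmt.Errorf("GOROUTINE_LIMIT must be at least 1, got %d", s.GoroutineLimit)
	}

	s.ConfigFile = getenv("CONFIG_FILE")
	if s.ConfigFile == "" {
		return s, fmt.Errorf("CONFIG_FILE is not set")
	}
	return s, nil
}

func main() {
	s, err := loadSettings(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```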

Directories

Path	Synopsis
aggregate
config
hinting
serve