Grafeas - DynamoDb
This project provides a Grafeas implementation that supports using AWS DynamoDB as a storage mechanism.
Building
Build using the provided Makefile or via Docker.
```shell
# Either build via make
make build
# or docker
docker build --rm .
```
Unit tests
Tests run against a DynamoDB instance on port 8000. The Makefile can start and stop a locally installed, Java-based DynamoDB instance (DynamoDB Local); this requires port 8000 to be free.
```shell
# BRING YOUR OWN DYNAMODB ON PORT 8000
make test
# OR USE THE LOCAL JAVA DYNAMODB INSTANCE
make pre-test test post-test
```
Configuring
The server reads a configuration file passed in via the `--config` argument. That file should be in YAML format and follow the specification laid down by the main Grafeas project. To use DynamoDB, an additional configuration namespace must also be set:
```yaml
grafeas:
  storage_type: dynamodb
  dynamodb:
    table: "Name_of_table_to_use_within_DynamoDB"
    aws:
      endpoint: "http://localhost:1234"
      region: "eu-west-1"
  ...
```
The AWS configuration options are used for defining how to interact with DynamoDB. They are optional.
Option | Meaning | Example |
---|---|---|
endpoint | AWS endpoint. Set if you wish to run against a local DynamoDB instance, otherwise leave blank or do not use. | http://localhost:8000 |
region | AWS region. Set if you wish to run against a DynamoDB instance in a region other than your default, otherwise leave blank or do not use. | eu-west-1 |
Each option is mapped to the corresponding field of the AWS SDK `Config`, whose name is the capitalised form of the key used in the Grafeas YAML config (for example, `region` becomes `Region`).
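The mapping from YAML keys to AWS configuration fields can be sketched with stand-in types so that it runs with the standard library only. `awsOptions`, `sdkConfig` and `toSDKConfig` are hypothetical names; `sdkConfig` imitates the pointer fields (`Endpoint`, `Region`) of the Go SDK v1 `aws.Config`, where a nil pointer means "not set".

```go
package main

import "fmt"

// awsOptions mirrors the optional `aws:` block of the YAML config
// (hypothetical type; YAML keys are lowercase: endpoint, region).
type awsOptions struct {
	Endpoint string // yaml: endpoint
	Region   string // yaml: region
}

// sdkConfig is a stand-in for the AWS SDK Config struct. Its fields are
// pointers so that "unset" can be told apart from "empty".
type sdkConfig struct {
	Endpoint *string
	Region   *string
}

// toSDKConfig copies each non-blank option onto the field with the
// capitalised name; blank options stay nil so the SDK's own defaults
// would apply.
func toSDKConfig(o awsOptions) sdkConfig {
	var c sdkConfig
	if o.Endpoint != "" {
		c.Endpoint = &o.Endpoint
	}
	if o.Region != "" {
		c.Region = &o.Region
	}
	return c
}

func main() {
	c := toSDKConfig(awsOptions{Endpoint: "http://localhost:8000"})
	fmt.Println(*c.Endpoint, c.Region == nil) // http://localhost:8000 true
}
```

Leaving a field nil rather than setting it to an empty string matches the table's advice to leave an option blank or omit it when it is not needed.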
The `...` in the snippet above stands for any other configuration required by Grafeas. A simple working example is below:
```yaml
grafeas:
  api:
    # Endpoint address
    address: "0.0.0.0:8080"
    # PKI configuration (optional)
    cafile: ca.crt
    keyfile: ca.key
    certfile: ca.crt
    # CORS configuration (optional)
    cors_allowed_origins:
      # - "http://example.net"
  storage_type: dynamodb
  dynamodb:
    table: "Name_of_table_to_use_within_DynamoDB"
```
This instance of Grafeas also supports all the storage mechanisms defined within the main Grafeas project. Note that if `dynamodb` is not specified as the `storage_type`, this instance of Grafeas will use the default storage mechanism (currently `memstore`).
The configuration file is specified with the `--config` argument, for example `--config /path/to/config.yaml`.
Running the Server
This implementation requires a DynamoDB instance to operate against, which may be hosted by AWS or run locally. For a local instance, the following environment variables must be set (to any value): `AWS_REGION`, `AWS_ACCESS_KEY` and `AWS_SECRET_ACCESS_KEY`.
If an AWS-hosted instance is the target, credentials are obtained from the host system in the standard way described in AWS's documentation.
Pass the name of a configuration file to the executable via the `--config` command-line argument:
```shell
cd go/v1beta1
go run main/main.go --config /path/to/your/config.yaml
```
This will start the Grafeas gRPC and REST APIs on `localhost:8080`.
Builds of the master branch publish Docker images to the GitHub Package Registry. There is no versioning at present. The server can be started using the image:
```shell
docker run -p 8080:8080 -v /path/to/config.yaml:/grafeas/config.yaml docker.pkg.github.com/john-tipper/grafeas-dynamodb/grafeas-dynamodb:latest --config /grafeas/config.yaml
```
DynamoDB Details
Preamble
If you haven't used DynamoDB before, then these two resources are fantastic by way of an introduction to data modelling in NoSQL:
- https://www.youtube.com/watch?v=HaEPXoXVf2k
- https://www.trek10.com/blog/dynamodb-single-table-relational-modeling/
The AWS DynamoDB Developer Guide is also a useful reference.
Data Model
The Grafeas data is stored in a single table, as per AWS best practice. That table name is customisable via configuration. If the table does not exist when Grafeas is started, then the application will attempt to create it.
Data is stored using 4 columns, called `PartitionKey`, `SortKey`, `Data` and `Json`. What is actually stored in each column depends on the item being stored.
There are 2 indices:
- Global Primary Index (GPI):
  - Hash: `PartitionKey`
  - Range: `SortKey`
- Global Secondary Index (GSI1):
  - Hash: `SortKey`
  - Range: `Data`
Data Object | PartitionKey | SortKey | Data | Json |
---|---|---|---|---|
Project | Project name (`projects/[PROJECT ID]`) | `"PROJECT"` | Project name (`projects/[PROJECT ID]`) | Json representation of Project |
Note | Note name (`projects/[PROJECT ID]/notes/[NOTE ID]`) | `"NOTE"` | Project name (`projects/[PROJECT ID]`) | Json representation of Note |
Occurrence | Occurrence name (`projects/[PROJECT_ID]/occurrences/[OCCURRENCE_ID]`) | `"OCCURRENCE"` | Project ID | Json representation of Occurrence |
Occurrence note | Occurrence name (`projects/[PROJECT_ID]/occurrences/[OCCURRENCE_ID]`) | Note name (`projects/[NOTE PROJECT ID]/notes/[NOTE ID]`) | Occurrence name (`projects/[PROJECT_ID]/occurrences/[OCCURRENCE_ID]`) | Json representation of Occurrence |
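To make the table concrete, here is a Go sketch that builds the rows for each data object in memory. The `item` type and the `projectItem`, `noteItem` and `occurrenceItems` helpers are hypothetical and do not call DynamoDB; they only reproduce the column layout above, assuming well-formed resource names.

```go
package main

import (
	"fmt"
	"strings"
)

// item is an in-memory view of one table row, using the four column names.
type item struct {
	PartitionKey, SortKey, Data, Json string
}

// projectItem builds the single row stored for a Project.
func projectItem(projectName, js string) item {
	return item{PartitionKey: projectName, SortKey: "PROJECT", Data: projectName, Json: js}
}

// noteItem builds the single row stored for a Note; Data holds the owning
// project name, i.e. the part of the note name before "/notes/".
func noteItem(noteName, js string) item {
	project, _, _ := strings.Cut(noteName, "/notes/")
	return item{PartitionKey: noteName, SortKey: "NOTE", Data: project, Json: js}
}

// occurrenceItems builds the two rows stored per Occurrence: the
// "OCCURRENCE" row (Data = project ID) and the "occurrence note" row,
// whose SortKey is the name of the referenced Note.
func occurrenceItems(occName, noteName, js string) []item {
	projectID := strings.Split(occName, "/")[1] // "projects/<id>/occurrences/..."
	return []item{
		{PartitionKey: occName, SortKey: "OCCURRENCE", Data: projectID, Json: js},
		{PartitionKey: occName, SortKey: noteName, Data: occName, Json: js},
	}
}

func main() {
	rows := []item{
		projectItem("projects/p1", `{"name":"projects/p1"}`),
		noteItem("projects/p1/notes/n1", `{"name":"projects/p1/notes/n1"}`),
	}
	// The occurrence lives in project p2 but references a note in project p1.
	rows = append(rows, occurrenceItems(
		"projects/p2/occurrences/o1", "projects/p1/notes/n1", `{"name":"projects/p2/occurrences/o1"}`)...)
	for _, r := range rows {
		fmt.Printf("%-28s | %-22s | %s\n", r.PartitionKey, r.SortKey, r.Data)
	}
}
```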
Projects and Notes can be fetched by ID using the GPI, or all items of a given type can be listed via the GSI, in which case they are sorted lexicographically by the `Data` attribute (the GSI range key). It's important to realise that Notes and Occurrences may be stored in different projects (this is the recommendation within the Grafeas documentation).
Note that when an Occurrence is created, 2 rows are written to the table (this is the Adjacency List pattern described in the blog and video listed above). The first row allows querying by ID using the GPI, or listing all Occurrences via the GSI, as is the case for Projects and Notes. The second row stores the associated Note name in the `SortKey` column, which means that the Note associated with a given Occurrence can be retrieved via the GPI (parse the Note name from the Occurrence, then query the GPI for that Note). Additionally, all Occurrences across all projects associated with a given Note can be queried using the GSI, because its hash key (`SortKey`) contains the Note name of interest.
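The GSI access pattern can be simulated in memory to show why the second "occurrence note" row is useful. `queryGSI1` below is a hypothetical stand-in for a DynamoDB Query on the secondary index: it keeps the rows whose `SortKey` (the GSI hash key) matches and orders them by `Data` (the GSI range key). Real DynamoDB does this server-side; Go's byte-wise string comparison matches its ordering of string keys.

```go
package main

import (
	"fmt"
	"sort"
)

// item holds the key columns of a row (Json omitted for brevity).
type item struct {
	PartitionKey, SortKey, Data string
}

// queryGSI1 returns the rows whose SortKey equals hashKey, sorted by Data.
func queryGSI1(rows []item, hashKey string) []item {
	var out []item
	for _, r := range rows {
		if r.SortKey == hashKey {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Data < out[j].Data })
	return out
}

func main() {
	note := "projects/vendor/notes/n1"
	rows := []item{
		{"projects/b/occurrences/1", "OCCURRENCE", "b"},
		{"projects/b/occurrences/1", note, "projects/b/occurrences/1"},
		{"projects/a/occurrences/2", "OCCURRENCE", "a"},
		{"projects/a/occurrences/2", note, "projects/a/occurrences/2"},
		{note, "NOTE", "projects/vendor"},
	}
	// Every occurrence, in any project, that references the note.
	for _, r := range queryGSI1(rows, note) {
		fmt.Println(r.PartitionKey)
	}
}
```

Querying with the hash key `"OCCURRENCE"` instead would list every Occurrence, ordered by project ID.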
Pagination support is provided out of the box with DynamoDB; see the main Grafeas documentation for how to use this.
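As a hedged illustration only (not necessarily how this project does it), one common way to carry DynamoDB's `LastEvaluatedKey` through the `page_token`/`next_page_token` fields of the Grafeas list APIs is to serialise it into an opaque string. `encodeToken` and `decodeToken` are hypothetical, and the key is simplified to string attributes; for a GSI query it would also contain the index key attributes.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// encodeToken turns a last-evaluated key into an opaque page token.
// An empty key means there are no more pages, signalled by "".
func encodeToken(lastKey map[string]string) (string, error) {
	if len(lastKey) == 0 {
		return "", nil
	}
	b, err := json.Marshal(lastKey)
	if err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(b), nil
}

// decodeToken reverses encodeToken; an empty token means "first page".
func decodeToken(token string) (map[string]string, error) {
	if token == "" {
		return nil, nil
	}
	b, err := base64.URLEncoding.DecodeString(token)
	if err != nil {
		return nil, err
	}
	var key map[string]string
	if err := json.Unmarshal(b, &key); err != nil {
		return nil, err
	}
	return key, nil
}

func main() {
	tok, _ := encodeToken(map[string]string{"PartitionKey": "projects/p1/notes/n7", "SortKey": "NOTE"})
	key, _ := decodeToken(tok)
	fmt.Println(key["PartitionKey"], key["SortKey"])
}
```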
No support is currently provided for migration of schemas in the event of changes to the Grafeas structure and thus any such migrations will need to be performed manually.
Consistency and Billing
Strongly consistent reads are used for queries and gets that use the GPI; all other reads (those against the GSI) are eventually consistent, because DynamoDB global secondary indexes only support eventually consistent reads. Billing is set to on-demand (per-request) mode: this becomes more expensive if you have many queries, but for experimenting at low query volumes it will likely be the cheapest option.
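The consistency rule above can be expressed as a small decision when building a read request. `readParams` and `newReadParams` are hypothetical stand-ins for the `IndexName` and `ConsistentRead` parameters of DynamoDB's GetItem/Query calls, and `"GSI1"` is this README's label for the index rather than a confirmed index name.

```go
package main

import "fmt"

// readParams is a trimmed-down stand-in for DynamoDB read parameters.
type readParams struct {
	IndexName      string // "" means the table's own key (the GPI)
	ConsistentRead bool
}

// newReadParams asks for a strongly consistent read only when reading via
// the table's key; DynamoDB rejects ConsistentRead=true on a GSI query.
func newReadParams(indexName string) readParams {
	return readParams{IndexName: indexName, ConsistentRead: indexName == ""}
}

func main() {
	fmt.Printf("%+v\n", newReadParams(""))     // {IndexName: ConsistentRead:true}
	fmt.Printf("%+v\n", newReadParams("GSI1")) // {IndexName:GSI1 ConsistentRead:false}
}
```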
Contributing
Pull requests welcome.
License
Grafeas-dynamodb is under the Apache 2.0 license. See the LICENSE file for details.