README ¶
ds-api
This API provides API access to the Spinup Data Set service.
Endpoints
GET /v1/ds/ping
GET /v1/ds/version
GET /v1/ds/metrics
POST /v1/ds/{account}/datasets/{group}
GET /v1/ds/{account}/datasets/{group}/{id}
PATCH /v1/ds/{account}/datasets/{group}/{id}
PUT /v1/ds/{account}/datasets/{group}/{id}
DELETE /v1/ds/{account}/datasets/{group}/{id}
POST /v1/ds/{account}/datasets/{group}/{id}/attachments
DELETE /v1/ds/{account}/datasets/{group}/{id}/attachments
GET /v1/ds/{account}/datasets/{group}/{id}/attachments
GET /v1/ds/{account}/datasets/{group}/{id}/instances
POST /v1/ds/{account}/datasets/{group}/{id}/instances
DELETE /v1/ds/{account}/datasets/{group}/{id}/instances/{instance_id}
GET /v1/ds/{account}/datasets/{group}/{id}/logs
GET /v1/ds/{account}/datasets/{group}/{id}/users
POST /v1/ds/{account}/datasets/{group}/{id}/users
DELETE /v1/ds/{account}/datasets/{group}/{id}/users
PUT /v1/ds/{account}/datasets/{group}/{id}/users
Usage
Create a dataset
POST /v1/ds/{account}/datasets/{group}
{
"name": "awesome-dataset-of-stuff",
"type": "s3",
"derivative": true,
"tags": [
{ "key": "Application", "value": "ButWhyyyyy" },
{ "key": "COA", "value": "Take.My.Money" },
{ "key": "CreatedBy", "value": "SomeGuy" }
],
"metadata": {
"description": "The hugest dataset of awesome stuff",
"created_at": "2018-03-28T07:36:01.123Z",
"created_by": "drzoidberg",
"data_classifications": ["hipaa","pii"],
"data_format": "file",
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"modified_at": "2019-03-28T07:36:01.123Z",
"modified_by": "pfry",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": ["e15d2282-9c68-46b5-801c-2b5a62484624", "a7c082ee-f711-48fa-8a57-25c95b3a6ddd"]
}
}
Response
{
"id": "d37b375b-d136-4b17-8666-5036dc554a66",
"repository": "dataset-localdev-d37b375b-d136-4b17-8666-5036dc554a66",
"metadata": {
"id": "d37b375b-d136-4b17-8666-5036dc554a66",
"name": "awesome-dataset-of-stuff",
"description": "The hugest dataset of awesome stuff",
"created_at": "2020-03-11T18:41:32Z",
"created_by": "drzoidberg",
"data_classifications": [
"hipaa",
"pii"
],
"data_format": "file",
"data_storage": "s3",
"derivative": true,
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"modified_at": "2020-03-11T18:41:32Z",
"modified_by": "pfry",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": [
"e15d2282-9c68-46b5-801c-2b5a62484624",
"a7c082ee-f711-48fa-8a57-25c95b3a6ddd"
]
}
}
Response Code | Definition |
---|---|
202 Accepted | creation request accepted |
400 Bad Request | badly formed request |
403 Forbidden | you don't have access to bucket |
404 Not Found | account not found |
409 Conflict | bucket or iam policy already exists |
429 Too Many Requests | service or rate limit exceeded |
500 Internal Server Error | a server error occurred |
503 Service Unavailable | an AWS service is unavailable |
Get information about a dataset
GET /v1/ds/{account}/datasets/{group}/{id}
{
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"metadata": {
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"name": "awesome-dataset-of-stuff",
"description": "The hugest dataset of awesome stuff",
"created_at": "2020-03-16T15:38:14Z",
"created_by": "drzoidberg",
"data_classifications": [
"hipaa",
"pii"
],
"data_format": "file",
"data_storage": "s3",
"derivative": true,
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"modified_at": "2020-03-16T15:38:14Z",
"modified_by": "pfry",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": [
"d37b375b-d136-4b17-8666-5036dc554a66",
]
},
"repository": {
"name": "dataset-localdev-bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"empty": false,
"tags": [
{
"key": "CreatedBy",
"value": "SomeGuy"
},
{
"key": "spinup:org",
"value": "localdev"
},
{
"key": "ID",
"value": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8"
},
{
"key": "COA",
"value": "Take.My.Money"
},
{
"key": "Application",
"value": "ButWhyyyyy"
},
{
"key": "Name",
"value": "awesome-dataset-of-stuff"
}
]
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | dataset not found |
500 Internal Server Error | a server error occurred |
Promote a dataset
PATCH /v1/ds/{account}/datasets/{group}/{id}
Headers:
X-Forwarded-User: awong
Response
{
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"metadata": {
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"name": "awesome-dataset-of-stuff",
"description": "The hugest dataset of awesome stuff",
"created_at": "2020-03-16T15:38:14Z",
"created_by": "drzoidberg",
"data_classifications": [
"hipaa",
"pii"
],
"data_format": "file",
"data_storage": "s3",
"derivative": false,
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"finalized_at": "2020-06-01T19:27:35Z",
"finalized_by": "awong",
"modified_at": "2020-06-01T19:27:35Z",
"modified_by": "awong",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": [
"d37b375b-d136-4b17-8666-5036dc554a66",
]
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | dataset not found |
409 Conflict | dataset already finalized |
500 Internal Server Error | a server error occurred |
Update dataset metadata
PUT /v1/ds/{account}/datasets/{group}/{id}
Headers:
X-Forwarded-User: awong
Request:
{
"metadata": {
"description": "It's actually a tiny dataset"
}
}
Response
{
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"metadata": {
"id": "bb4f6316-53e2-45ae-97c7-fa7fd17f78a8",
"name": "awesome-dataset-of-stuff",
"description": "It's actually a tiny dataset",
"created_at": "2020-03-16T15:38:14Z",
"created_by": "drzoidberg",
"data_classifications": [
"hipaa",
"pii"
],
"data_format": "file",
"data_storage": "s3",
"derivative": false,
"dua_url": "https://allmydata.s3.amazonaws.com/duas/huge_awesome_dua.pdf",
"finalized_at": "2020-06-01T19:27:35Z",
"finalized_by": "awong",
"modified_at": "2020-06-01T21:31:05Z",
"modified_by": "awong",
"proctor_response_url": "https://allmydata.s3.amazonaws.com/proctor/huge_awesome_study.json",
"source_ids": [
"d37b375b-d136-4b17-8666-5036dc554a66",
]
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | dataset not found |
500 Internal Server Error | a server error occurred |
Delete a dataset
DELETE /v1/ds/{account}/datasets/{group}/{id}
Headers:
X-Forwarded-User: awong
Response Code | Definition |
---|---|
204 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | dataset not found |
500 Internal Server Error | a server error occurred |
Create attachment for a dataset
POST /v1/ds/{account}/datasets/{group}/{id}/attachments
The request needs to be a multipart/form-data
with the following parameters:
name
- the name of the attachment as it should be saved, e.g.eula.txt
attachment
- the content of the file being uploaded
Response
[
"eula.txt"
]
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request, or file too big |
404 Not Found | dataset not found |
500 Internal Server Error | a server error occurred |
Delete attachment from a dataset
DELETE /v1/ds/{account}/datasets/{group}/{id}/attachments
{
"attachment_name": "dummy.doc"
}
Response
Response Code | Definition |
---|---|
204 OK | attachment deleted, if it existed |
400 Bad Request | bad request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
Get attachments for a dataset
GET /v1/ds/{account}/datasets/{group}/{id}/attachments
Response
[
{
"Name": "Dataset Data Use Agreement.pdf",
"Modified": "2020-05-17T02:04:27Z",
"Size": 3708454,
"URL": "https://dataset-localdev-3cadbe31-27e9-4f7a-9515-51ec9d754022.s3.amazonaws.com/_attachments/Dataset%20Data%20Use%20Agreement.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXQVXYEBXA5X5LRN3%2F20200518%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200518T132423Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=342d937b7b726408c2efe41493d126ea577204f85ffe77ffc9b3cf22af80c7ea"
},
{
"Name": "eula.txt",
"Modified": "2020-05-18T13:19:34Z",
"Size": 6920,
"URL": "https://dataset-localdev-3cadbe31-27e9-4f7a-9515-51ec9d754022.s3.amazonaws.com/_attachments/eula.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAXQVXYEBXA5X5LRN3%2F20200518%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200518T132423Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=c2d7f7165ce3c099e8eefcb14e3b4c7e0e6a319af48d6727f25519f35488b14a"
}
]
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
List all instances that have access to a dataset
GET /v1/ds/{account}/datasets/{group}/{id}/instances
{
"id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
"access": {
"i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"
}
}
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
Grant dataset access to an instance
POST /v1/ds/{account}/datasets/{group}/{id}/instances
{
"instance_id": "i-01f9bfb7ee683e807"
}
Response
{
"id": "95db5a7b-466b-4aa7-bbe1-1e23ed860f32",
"access": {
"i-01f9bfb7ee683e807": "instanceRole_i-01f9bfb7ee683e807"
}
}
Response Code | Definition |
---|---|
200 OK | instance access granted |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
Revoke dataset access from an instance
DELETE /v1/ds/{account}/datasets/{group}/{id}/instances/{instance_id}
Response Code | Definition |
---|---|
204 OK | instance access revoked |
400 Bad Request | bad request, or instance doesn't have access |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
Get audit logs for a dataset
GET /v1/ds/{account}/datasets/{group}/{id}/logs
Response
[
"11/19/2020, 17:07:28 - Created dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (CreatedBy: drzoidberg)",
"11/19/2020, 17:51:39 - Updated metadata for dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: awong)",
"11/19/2020, 17:56:33 - Finalized original dataset 3819c173-e1a8-4fe5-b55c-b224bb86ddbd (ModifiedBy: me)"
]
Response Code | Definition |
---|---|
200 OK | okay |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
500 Internal Server Error | a server error occurred |
Create a user for a dataset
POST /v1/ds/{account}/datasets/{group}/{id}/users
Request body is empty.
Response
{
"user": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr",
"group": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpGrp",
"policy": "dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpPlc",
"credentials": {
"akid": "XXXXXXXXXXXXXXXXXXXX",
"secret": "secretsecretsecretsecretsecretsecret",
}
}
Response Code | Definition |
---|---|
200 OK | instance access granted |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
409 Conflict | user already exists |
500 Internal Server Error | a server error occurred |
Delete a user for a dataset
DELETE /v1/ds/{account}/datasets/{group}/{id}/users
Response
Response Code | Definition |
---|---|
200 OK | instance access granted |
400 Bad Request | badly formed request |
404 Not Found | account/dataset/user not found |
500 Internal Server Error | a server error occurred |
Get a user for a dataset
GET /v1/ds/{account}/datasets/{group}/{id}/users
Response
{
"dataset-ssdev-95db5a7b-466b-4aa7-bbe1-1e23ed860f32-DsTmpUsr": {
"keys": {
"XXXXXXXXXXXXXXXXXXXX": "Inactive",
"YYYYYYYYYYYYYYYYYYYY": "Active"
}
}
}
Response Code | Definition |
---|---|
200 OK | instance access granted |
400 Bad Request | badly formed request |
404 Not Found | account/dataset/user not found |
500 Internal Server Error | a server error occurred |
Update a user's key for a dataset
PUT /v1/ds/{account}/datasets/{group}/{id}/users
Request body is empty.
Response
{
"keys": {
"XXXXXXXXXXXXXXXXXXXXX": "Inactive"
},
"credentials": {
"akid": "YYYYYYYYYYYYYYYYYYYYY",
"secret": "secretsecretsecretsecretsecretsecret"
}
}
Response Code | Definition |
---|---|
200 OK | instance access granted |
400 Bad Request | badly formed request |
404 Not Found | account/dataset not found |
429 Limit Exceeded | maximum number of keys |
500 Internal Server Error | a server error occurred |
Authentication
Authentication is accomplished using a pre-shared key (hashed string) in the X-Auth-Token
header.
API Configuration
API configuration is via config/config.json
, an example config file is provided.
You can specify a single metadataRepository
where metadata about all the different data sets will be stored. Currently, the only supported type is s3
, so you need to provide an S3 bucket and credentials with full access to that bucket. For example, if you created a bucket called spinup-example-metadata-repository
, then the IAM policy would be:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::spinup-example-metadata-repository",
"arn:aws:s3:::spinup-example-metadata-repository/*"
]
}
]
}
You can then define a list of accounts
for the actual dataset repositories - that's where the data sets will be stored. Currently, the only supported type is s3
, so you need to provide credentials in each account with the appropriate S3 and IAM access. This is a good starting IAM policy if you don't modify the default name and path prefixes:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "iam:*",
"Resource": [
"arn:aws:iam::*:role/spinup/dataset/*",
"arn:aws:iam::*:instance-profile/spinup/dataset/*",
"arn:aws:iam::*:group/spinup/dataset/*",
"arn:aws:iam::*:user/spinup/dataset/*",
"arn:aws:iam::*:policy/spinup/dataset/*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:GetRole",
"iam:GetInstanceProfile",
"iam:ListAttachedRolePolicies",
"iam:PassRole"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "s3:*",
"Resource": [
"arn:aws:s3::*:dataset-*"
]
},
{
"Effect": "Allow",
"Action": [
"ec2:AssociateIamInstanceProfile",
"ec2:DescribeIamInstanceProfileAssociations",
"ec2:DescribeInstances",
"ec2:DisassociateIamInstanceProfile"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "logs:CreateLogGroup",
"Resource": "arn:aws:logs:*:*:log-group:/spinup/ORG/*"
},
{
"Effect": "Allow",
"Action": [
"logs:ListTagsLogGroup",
"logs:CreateLogStream",
"logs:TagLogGroup",
"logs:DescribeLogGroups",
"logs:DeleteLogGroup",
"logs:DescribeLogStreams",
"logs:GetLogEvents",
"logs:PutRetentionPolicy",
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:*:*:log-group:/spinup/ORG/*:log-stream:*",
"arn:aws:logs:*:*:log-group:/spinup/ORG/*"
]
}
]
}
Dataset groups
When creating a data set you need to specify a group that it belongs to. The group could be any arbitrary string and it just provides a way to group similar datasets together (e.g. data sets that are part of the same application or department). Currently, the group is only used for logging purposes but eventually it will play a more significant role.
Authors
E Camden Fisher camden.fisher@yale.edu Tenyo Grozev tenyo.grozev@yale.edu
License
GNU Affero General Public License v3.0 (GNU AGPLv3)
Copyright (c) 2020 Yale University
Documentation ¶
There is no documentation for this package.