Provly
Provly & Provenance
Provly is a rough implementation of the W3C's provenance standards. It maps the relationships between objects across time. Note that
these may be the same object changing over time; this is covered in the data models section.
In order to do this Provly provides the following data models:
These models are connected through a set of relationships defined in the Prov spec.
The goal of this particular provenance implementation is to map activities in the research process to make scientific reproducibility easier. To accomplish this goal, the implementation strays from the W3C reference where necessary.
Example
This represents an example graph that might be stored in Provly to describe a hypothetical experiment. The experiment revolves around a packet of seeds. While most papers written about such an experiment would only describe the packet of seeds, linking to this prov graph gives a description of the origin of the seed packet as well as the analysis that was done to create the paper. We can see who was involved in each process of the experiment, and where entities or agents represent software, we are given hashes of the document to ensure exact replication.
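The shape of such a graph can be sketched in Go. This is a minimal illustration, not Provly's actual schema: the `Node`, `Edge`, `HashContent`, and `Upstream` names and all node IDs are hypothetical, and the relation names (`wasGeneratedBy`, `used`, `wasAssociatedWith`) are borrowed from the W3C PROV vocabulary. SHA-256 is assumed as the hash for software artifacts; the hash Provly actually uses is not stated here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Kind is one of the three core PROV node types.
type Kind string

const (
	Entity   Kind = "entity"
	Activity Kind = "activity"
	Agent    Kind = "agent"
)

// Node is a vertex in the provenance graph (hypothetical type).
// Hash is only filled in when the node represents software.
type Node struct {
	ID   string
	Kind Kind
	Hash string
}

// Edge is a directed PROV relation pointing from the later
// node to the earlier thing it depends on.
type Edge struct {
	From, Relation, To string
}

// HashContent returns the hex-encoded SHA-256 digest of content.
func HashContent(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Upstream follows edges breadth-first from start and returns the ID
// of every node that start depends on, directly or indirectly.
func Upstream(edges []Edge, start string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, e := range edges {
			if e.From == cur && !seen[e.To] {
				seen[e.To] = true
				out = append(out, e.To)
				queue = append(queue, e.To)
			}
		}
	}
	return out
}

func main() {
	// Placeholder script contents, used only to produce a digest.
	script := []byte("placeholder analysis script")
	nodes := []Node{
		{ID: "seed-packet", Kind: Entity},
		{ID: "analysis", Kind: Activity},
		{ID: "paper", Kind: Entity},
		{ID: "researcher", Kind: Agent},
		{ID: "analysis-script", Kind: Agent, Hash: HashContent(script)},
	}
	edges := []Edge{
		{"paper", "wasGeneratedBy", "analysis"},
		{"analysis", "used", "seed-packet"},
		{"analysis", "wasAssociatedWith", "analysis-script"},
		{"analysis", "wasAssociatedWith", "researcher"},
	}
	for _, n := range nodes {
		fmt.Println(n.Kind, n.ID, n.Hash)
	}
	fmt.Println("paper depends on:", Upstream(edges, "paper"))
}
```

Walking the edges back from the paper reaches the analysis, the seed packet, and the people and software involved, which is the kind of question the graph is meant to answer.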
Getting Started
This section describes setting up Provly on your machine for development or personal use. If you are interested in using an existing Provly instance through an API, contact the repo owners.
Required software:
# Start the database
make start-db
# Start tracing server
make start-zipkins
# Run migrations
make migrate
# Start server
go run ./cmd/provly-api --zipkin-reporter-uri=0.0.0.0:9411
Command default options
--web-api-host=0.0.0.0:3000
--web-debug-host=0.0.0.0:4000
--web-read-timeout=10s
--web-write-timeout=10s
--web-shutdown-timeout=5s
--db-user=root
--db-host=http://localhost:8529
--db-name=provly
--zipkin-local-endpoint=0.0.0.0:3000
--zipkin-reporter-uri=http://zipkins:9411/api/v2/spans
--zipkin-service-name=provly-api
--zipkin-probability=0.05
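For readers who want to see how defaults like these behave when overridden, here is a minimal sketch using Go's standard `flag` package. It is an assumption for illustration only: the `Config` type and `ParseConfig` function are hypothetical, only a handful of the options above are included, and Provly itself may parse its options with a different library.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// Config holds a subset of the options listed above (hypothetical type).
type Config struct {
	WebAPIHost         string
	WebReadTimeout     time.Duration
	WebShutdownTimeout time.Duration
	DBName             string
	ZipkinProbability  float64
}

// ParseConfig applies the documented defaults, then overrides them
// with any flags present in args.
func ParseConfig(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("provly-api", flag.ContinueOnError)
	fs.StringVar(&cfg.WebAPIHost, "web-api-host", "0.0.0.0:3000", "address the API listens on")
	fs.DurationVar(&cfg.WebReadTimeout, "web-read-timeout", 10*time.Second, "HTTP read timeout")
	fs.DurationVar(&cfg.WebShutdownTimeout, "web-shutdown-timeout", 5*time.Second, "graceful shutdown timeout")
	fs.StringVar(&cfg.DBName, "db-name", "provly", "database name")
	fs.Float64Var(&cfg.ZipkinProbability, "zipkin-probability", 0.05, "fraction of requests to trace")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	// Override one option and keep the defaults for the rest.
	cfg, err := ParseConfig([]string{"--web-read-timeout=30s"})
	if err != nil {
		fmt.Println("bad flags:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Any option that is not passed keeps its default, so a single flag such as `--web-read-timeout=30s` changes only the read timeout.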
Testing
go test ./...
Loading demo data
The data used to create the diagram above can be loaded into the database by running:
make demo
Points of interest
There are now four services that you can interact with to help with development.
- API - running on :3000
- Monitoring & Debug - running on :4000
- Arango Graph DB - running on :8529
- Zipkin Tracing - running on :9411
Data models
A goal of provenance is to track relationships between entities across time, using activities as the main
catalyst for change. This model is often conceptually different from the data models used in applications. Understanding these differences is key to using Provly effectively.
While most applications build up relationships between different objects at a single point in time (normally as current as possible), Provly builds up relationships between versions of a single object across a time range. As a result, each item in Provly has two identifiers: a canonical ID, which identifies the resource to the outside world, and a provenance ID, which identifies a particular version of that resource.
If this is hard to conceptualize, consider the proverb "If the blade of an axe is replaced, and then its handle, is it still the same axe?" This could be modelled in Provly as follows:
As you can see, as the axe goes through transformations its canonical ID does not change, but it gets a new provenance ID after each transformation.
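The two-identifier idea can also be sketched in Go. This is an illustration under assumptions, not Provly's actual types: `Version`, `Step`, `Transform`, and `History` are hypothetical names, and the IDs are made up for the axe example.

```go
package main

import "fmt"

// Version is one state of a resource (hypothetical type).
// CanonicalID names the resource for the outside world;
// ProvenanceID names this particular version of it.
type Version struct {
	CanonicalID  string
	ProvenanceID string
	Description  string
}

// Step describes one transformation and the provenance ID it produces.
type Step struct {
	ProvenanceID string
	Description  string
}

// Transform returns the next version of prev. The canonical ID is
// carried over unchanged; only the provenance ID is new.
func Transform(prev Version, s Step) Version {
	return Version{
		CanonicalID:  prev.CanonicalID,
		ProvenanceID: s.ProvenanceID,
		Description:  s.Description,
	}
}

// History applies each step in order and returns every version, oldest first.
func History(first Version, steps []Step) []Version {
	versions := []Version{first}
	for _, s := range steps {
		prev := versions[len(versions)-1]
		versions = append(versions, Transform(prev, s))
	}
	return versions
}

func main() {
	axe := Version{CanonicalID: "axe", ProvenanceID: "axe-v1", Description: "original blade and handle"}
	history := History(axe, []Step{
		{ProvenanceID: "axe-v2", Description: "blade replaced"},
		{ProvenanceID: "axe-v3", Description: "handle replaced"},
	})
	for _, v := range history {
		fmt.Printf("%s\t%s\t%s\n", v.CanonicalID, v.ProvenanceID, v.Description)
	}
}
```

Every row printed shares the canonical ID `axe`, while the provenance ID differs per version, so a client can ask either for "the axe" or for one specific state of it.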
Contributing
All contributions are welcome. Please contact the authors to get involved!