Gorge
Gorge is a service which harvests hydrological data (river's discharge and water level) on schedule.
Harvested data is stored in database and can be queried later.
Usage
Gorge is distributed as docker image with two binary files:
gorge-server
(entrypoint) - web server with REST API
gorge-cli
- command-line client for this server. Since image is distroless, use docker exec gorge gorge-cli
to call it
Launching
gorge-server
accepts configuration via cli arguments (use gorge-server --help
). You can pass them via docker-compose command field, like this:
command:
[
"--pg-db",
"gorge",
"--debug",
"--log-format",
"plain",
"--db-chunk-size",
"1000",
]
Here is the list of available flags:
--cache string Either 'inmemory' or 'redis' (default "redis")
--db string Either 'inmemory' or 'postgres' (default "postgres")
--db-chunk-size int Measurements will be saved to db in chunks of this size. When set to 0, they will be saved in one chunk, which can cause errors
--debug Enables debug mode, sets log level to debug
--endpoint string Endpoint path (default "/")
--http-timeout int Request timeout in seconds (default 60)
--http-user-agent string User agent for requests sent from scripts. Leave empty to use fake browser agent (default "whitewater.guide robot")
--log-format string Set this to 'json' to output log in json (default "json")
--log-level string Log level. Leave empty to discard logs (default "warn")
--pg-db string Postgres database (default "postgres")
--pg-host string Postgres host (default "db")
--pg-password string Postgres password
--pg-user string Postgres user (default "postgres")
--port string Port (default "7080")
--redis-host string Redis host (default "redis")
--redis-port string Redis port (default "6379")
Postgres and redis can also be configured using folowing environment variables:
- POSTGRES_HOST
- POSTGRES_DB
- POSTGRES_USER
- POSTGRES_PASSWORD
- REDIS_HOST
- REDIS_PORT
Environment variables have lower priority than cli flags.
Gorge uses database to store harvested measurements and scheduled jobs. It comes with postgres and sqlite drivers. Postgres with timescaledb extension is recommended for production. Gorge will initialize all the required tables. Check out sql migration file if you're curious about db schema.
Gorge uses cache to store safe-to-lose data: latest measurement each gauge and harvest statuses. It comes with redis (recommended) and embedded redis drivers.
Development
Preferred way of development is to develop inside docker container. I do this in VS Code. There's a compose file for this purpose.
There's a modd tool installed in dev image, which enables liver reloading and tests. Start it using make run
.
If you want to develop on host machine, you'll need following tools installed on it (they're installed in docker image, see Dockerfile for more info):
- libproj shared library, to convert coordinate systems
- go-bindata to embed sql scripts
- modd it's actually optional
Some tests require postgres. You cannot run them inside docker container (unless you want to mess with docker-inside-docker). They're excluded from main test set, I run them using make test-nodocker
from host machine or CI environment.
Writing scripts
Here are some recommendations for writing scripts for new sources
- Write tests, but when testing, do not use calls to real URLs, because unit tests can flood upstream with requests
- Round locations to 5 digits precision link, round levels and flows to what seems reasonable
- When converting coordinates, use core.ToEPSG4326 utility function. It uses PROJ internally
- Use
core.Client
http client, which sets timeout, user-agent and has various helpers
- Do not bother with sorting results - this is done by script consumers
- Do not filter by
codes
and since
inside worker. They are meant to be passed to upstream. Empty codes
for all-at-once script must return all available measurements.
- Return null value (
nulltype.NullFloat64{}
) for level/flow when it's not provided
- Pay extra attention to time zones!
- Pass variables like access keys via script options
- Provide sample http requests (see
requests.http
files)
Env variables
Container makes use of following env variables. Env variables have lesser priority than config values.
Name |
Default value |
Desription |
POSTGRES_HOST |
|
Postgres connection details - host |
POSTGRES_DB |
|
Postgres connection details - database name |
POSTGRES_USER |
|
Postgres connection details - user |
POSTGRES_PASSWORD |
|
Postgres connection details - password |
REDIS_HOST |
redis |
Redis connection details - host |
REDIS_PORT |
6379 |
Redis connection details - port |
TODO
- Virtual gauges
- Statuses
- What happens when one component is broken?
- Authorization
- Pushing
- Subscriptions
- Advanced scheduling, new harvest mode: batched
- Scripts as Go plugins
- Send logs to sentry
- Per-script binaries for third-party consumption
- Autogenerate typescript definitions
- Add "upstream" JSON field to allow upstream methods pass arbitrary fields