Data node
Version 0.53.0
A service exposing read only APIs built on top of Vega platform.
Data node provides the following core features:
- Consume all events from Vega core
- Aggregates received events and stores the aggregated data
- Serves stored data via APIs
- Allows advanced configuration Configure a node
Links
- For new developers, see Getting Started.
- For updates, see the Change log for major updates.
- For architecture, please read the documentation to learn about the design for the system and its architecture.
- Please open an issue if anything is missing or unclear in this documentation.
Table of Contents (click to expand)
Installation
To install see Getting Started.
Configuration
Data node is initialised with a set of default configuration with the command data-node init
. To override any of the defaults edit your config.toml
typically found in the ~/.data-node
directory. Example:
[Matching]
Level = 0
ProRataMode = false
LogPriceLevelsDebug = false
LogRemovedOrdersDebug = false
PostgreSQL
As of version 0.53, data node uses PostgreSQL as its storage back end instead of the previous mix of in-memory and BadgerDB file stores. We also make use of Postgres extension called TimescaleDB, which adds a number of time series specific features.
Postgres is not an embedded database, but a separate server application that needs to be running before datanode starts, and a side effect of this transition is a little bit of setup is required by the data node operator.
By default, data node will attempt to connect to a database called vega
listening on localhost:5432
, using the username and password vega
. This is of course all configurable in data node’s config.toml
file.
We are developing using PostgreSQL 14.2
and Timescale 2.7.1
and strongly recommend that you also use the same versions.
[SQLStore]
UseEmbedded = false
[SQLStore.ConnectionConfig]
Host = "localhost"
Port = 5432
Username = "vega"
Password = "vega"
Database = "vega"
UseTransactions = true
Persistence
Currently the database is destroyed if it exists and recreated at data node start-up, though we expect this to change in the not too distant future once the schema has settled down and we add support for starting/stopping data nodes without replaying the entire chain.
There are a few different ways you can get postgres & timescale up and running.
Using docker
This is probably the most straightforward and reliable way to get up and running.
Timescale supply a docker image, so assuming you already have docker installed, it is a simple matter of:
docker run --rm \
-d
-e POSTGRES_USER=vega \
-e POSTGRES_PASSWORD=vega \
-e POSTGRES_DB=vega \
-p 5432:5432 \
timescale/timescaledb:2.7.1-pg14
Using your operating system's native packages
Timescale have a set of instructions for installing Postgres/Timescale using .deb
or .rpm
they have built. If you follow these and get postgres running as a system service you'll then have to create a database, user, and password for the data node to use. For example:
➜ ~ sudo -u postgres psql
psql (14.3 (Ubuntu 14.3-0ubuntu0.22.04.1))
Type "help" for help.
postgres=# create database vega;
CREATE DATABASE
postgres=# create user vega with password 'vega';
CREATE ROLE
postgres=# grant all privileges on database vega to vega;
GRANT
Using 'embedded' PostgreSQL
As mentioned above, PostgreSQL is not an embedded database. However, the good folks over at embedded-postgres-go didn't let that stop them trying.
This go package allows us to start a PostgreSQL server from the data-node. It does this by
- Examining your system to figure out what platform/architecture it is
- Downloading an appropriate PostgreSQL binary installation
- Unpacking it to a temporary location
- Configuring and launching Postgres as a child process of data-node
embedded-postgres-go doesn't come with support for TimescaleDB so we forked it and built a set of our own binaries for a limited set of platforms which we host on GitHub.
We use it for running integration tests and it works quite well however, we haven't tested it on a wide range of platforms, and ran into a few odd issues usually related to linking to various system libraries or sometimes not shutting down cleanly.
You can launch postgres in this way either with the command either using
data-node postgres run
Which will launch embedded postgres in it's own process or
Or by setting
[SQLStore]
UseEmbedded = true
Which will cause data-node to launch Postgres as it starts up, and stop it when it exits. While convenient, if data-node is forcefully killed and doesn't have chance to shutdown it is possible for postgres to keep on running. Postgres then needs to be manually killed to prevent 'unable to bind to port' errors on the next start.
In both cases, the files for the database will be stored in your 'state' directory, e.g. ~/.local/state/vega/data-node/
on Linux.
Building from source
It's quite straightforward; if this is your preferred option you probably already know how to do it. There are instructions on the timescale website.
Using a cloud database provider
This isn't something we've tested yet, but it's something we plan to investigate in the future. Feel very free to give it a try; our main concern is that the latency of the connection may cause data-node to be unable to process blocks as fast as they are produced.
Timescale provide a hosted service, I believe AWS
do as well.
Vega core streaming
Data requires an instance of Vega core node for it's meaningful function. Please see Vega Getting Started.
The data node will listen on default port 3002
for incoming connections from Vega core node.
APIs
In order for clients to communicate with data nodes, we expose a set of APIs and methods for reading data.
There are currently three protocols to communicate with the data node APIs:
gRPC
gRPC is an open source remote procedure call (RPC) system initially developed at Google. In data node the gRPC API features streaming of events in addition to standard procedure calls.
The default port (configurable) for the gRPC API is 3007
and matches the gRPC protobuf definition.
GraphQL
GraphQL is an open-source data query and manipulation language for APIs, and a runtime for fulfilling queries with existing data, originally developed at Facebook. The Console uses the GraphQL API to retrieve data including streaming of events.
The GraphQL API is defined by a schema. External clients will use this schema to communicate with Vega.
Queries can be tested using the GraphQL playground app which is bundled with a node. The default port (configurable) for the playground app is 3008
accessing this in a web browser will show a web app for testing custom queries, mutations and subscriptions.
GraphQL SSL
GraphQL subscriptions do not work properly unless the HTTPS is enabled.
To enable TLS on the GraphQL port, set
[Gateway.GraphQL]
HTTPSEnabled = true
You will need your data node to be reachable over the internet with a proper fully qualified domain name, and a matching certificate. If you already have a certificate and corresponding private key file, you can specify them as follows:
[Gateway.GraphQL]
CertificateFile = "/path/to/certificate/file"
KeyFile = "/path/to/key/file"
If you prefer, the data node can manage this for you by automatically generating a certificate and using LetsEncrypt
to sign it for you.
[Gateway.GraphQL]
HTTPSEnabled = true
AutoCertDomain = "my.lovely.domain.com"
However, it is a requirement of the LetsEncrypt
validation process that the the server answering its challenge is running on the standard HTTPS port (443). This means you must either
- Forward port 443 on your machine to the GraphQL port (3008 by default) using
iptables
or similar
- Directly use port 443 for the GraphQL server in data-node by specifying
[Gateway.GraphQL]
Port = 443
Note that Linux systems generally require processes listening on ports under 1024 to either
- run as root, or
- be specifically granted permission, e.g. by launching with
setcap cap_net_bind_service=ep data-node run
GraphQL Complexity
Currently the GraphQL complexity limit is globally set to 3750.
This setting is theoretic at the moment and will be refined and have different levels for different queries/resolvers in the future.
The intention behind this limit is to prevent the VEGA system from being anused by heavy queries (DOS).
The complexity level is mostly affected by the number of objects a query contains. So the heaviest ones we currently have in the system (based on discussion with Matt) are:
SimpleMarkets (embedded candles):
1 candle: 151
91 candles: 788
MarketInfo (embedded candles):
1 candle: 399
91 candles: 1036
Orders (embedded orders):
Complexity for:
1 order is: 163
80 orders: 4003
Trades (embedded trades):
Complexity with
1 trade is:
118
75 trades: 1393
Positions (embedded positions):
Complexity with 1 position is 129
For 40 positions is:
2500
The approximate number of positions queries by customers is 40.
At the moment we do not have a precise idea what limit would be appropriate to set for candles and orders. This would take some time and experience.
So for 40 positions - complexity is 2500. A theoretical value of 3000 is set as a maximum + 25% -> 3750.
The GraphQL will return error for queries that have complexity above the set limit: "GraphQL error: Query is too complex to execute" and will not proceed with execution.
Further settings for GraphQL limits will be customized for specific evil queries and will be set for the specific GraphQL resolvers methods. That would also affect subscriptions and oneoff queries.
REST
REST provides a standard between computer systems on the web, making it easier for systems to communicate with each other. It is arguably simpler to work with than gRPC and GraphQL. In Vega the REST API is a reverse proxy to the gRPC API, however it does not support streaming.
The default port (configurable) for the REST API is 3009
and we use a reverse proxy to the gRPC API to deliver the REST API implementation.
Troubleshooting & debugging
The application has structured logging capability, the first port of call for a crash is probably the Vega and Tendermint logs which are available on the console if running locally or by journal plus syslog if running on test networks. Default location for log files:
Each internal Go package has a logging level that can be set at runtime by configuration. Setting the logging Level
to -1
for a package will enable all debugging messages for the package which can be useful when trying to analyse a crash or issue.