State Sync Snapshotting
The `snapshots` package implements automatic support for Tendermint state sync in Cosmos SDK-based applications. State sync allows a new node joining a network to simply fetch a recent snapshot of the application state instead of fetching and applying all historical blocks. This can reduce the time needed to join the network by several orders of magnitude (e.g. weeks to minutes), but the node will not contain historical data from previous heights.
This document describes the Cosmos SDK implementation of the ABCI state sync interface. For more information on Tendermint state sync in general, see:
- Tendermint Core State Sync for Developers
- ABCI State Sync Spec
- ABCI State Sync Method/Type Reference
Overview
For an overview of how Cosmos SDK state sync is set up and configured by developers and end-users, see the Cosmos SDK State Sync Guide.
Briefly, the Cosmos SDK takes state snapshots at regular height intervals given by `state-sync.snapshot-interval` and stores them as binary files in the filesystem under `<node_home>/data/snapshots/`, with metadata in a LevelDB database `<node_home>/data/snapshots/metadata.db`. The number of recent snapshots to keep is given by `state-sync.snapshot-keep-recent`.
Snapshots are taken asynchronously, i.e. new blocks will be applied concurrently with snapshots being taken. This is possible because IAVL supports querying immutable historical heights. However, this requires `state-sync.snapshot-interval` to be a multiple of `pruning-keep-every`, to prevent a height from being removed while it is being snapshotted.
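That constraint can be sketched as a small validation check. The function name and the standalone check below are illustrative, not the SDK's actual validation code:

```go
package main

import "fmt"

// validateSnapshotInterval is a hypothetical helper illustrating the
// constraint above: snapshot heights must survive pruning, so a non-zero
// snapshot interval must be a multiple of pruning-keep-every.
func validateSnapshotInterval(snapshotInterval, pruningKeepEvery uint64) error {
	if snapshotInterval == 0 || pruningKeepEvery == 0 {
		return nil // snapshotting or fixed-interval pruning disabled; nothing to check
	}
	if snapshotInterval%pruningKeepEvery != 0 {
		return fmt.Errorf("snapshot-interval %d must be a multiple of pruning-keep-every %d",
			snapshotInterval, pruningKeepEvery)
	}
	return nil
}

func main() {
	fmt.Println(validateSnapshotInterval(1000, 100)) // valid: every snapshot height is kept
	fmt.Println(validateSnapshotInterval(1000, 300)) // invalid: height 1000 could be pruned
}
```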
When a remote node is state syncing, Tendermint calls the ABCI method `ListSnapshots` to list available local snapshots and `LoadSnapshotChunk` to load a binary snapshot chunk. When the local node is being state synced, Tendermint calls `OfferSnapshot` to offer a discovered remote snapshot to the local application and `ApplySnapshotChunk` to apply a binary snapshot chunk to the local application. See the resources linked above for more details on these methods and how Tendermint performs state sync.
The Cosmos SDK does not currently do any incremental verification of snapshots during restoration, i.e. only after the entire snapshot has been restored will Tendermint compare the app hash against the trusted hash from the chain. Cosmos SDK snapshots and chunks do contain hashes as checksums to guard against IO corruption and non-determinism, but these are not tied to the chain state and can be trivially forged by an adversary. This was considered out of scope for the initial implementation, but can be added later without changes to the ABCI state sync protocol.
Snapshot Metadata
The ABCI Protobuf type for a snapshot is listed below (refer to the ABCI spec for field details):
```protobuf
message Snapshot {
  uint64 height = 1;   // The height at which the snapshot was taken
  uint32 format = 2;   // The application-specific snapshot format
  uint32 chunks = 3;   // Number of chunks in the snapshot
  bytes  hash = 4;     // Arbitrary snapshot hash, equal only if identical
  bytes  metadata = 5; // Arbitrary application metadata
}
```
Because the `metadata` field is application-specific, the Cosmos SDK uses a similar type `cosmos.base.snapshots.v1beta1.Snapshot` with its own metadata representation:
```protobuf
// Snapshot contains Tendermint state sync snapshot info.
message Snapshot {
  uint64   height = 1;
  uint32   format = 2;
  uint32   chunks = 3;
  bytes    hash = 4;
  Metadata metadata = 5 [(gogoproto.nullable) = false];
}

// Metadata contains SDK-specific snapshot metadata.
message Metadata {
  repeated bytes chunk_hashes = 1; // SHA-256 chunk hashes
}
```
The `format` is currently `1`, defined in `snapshots.types.CurrentFormat`. This must be increased whenever the binary snapshot format changes, and it may be useful to support past formats in newer versions.
The `hash` is a SHA-256 hash of the entire binary snapshot, used to guard against IO corruption and non-determinism across nodes. Note that this is not tied to the chain state, and can be trivially forged (but Tendermint will always compare the final app hash against the chain app hash). Similarly, the `chunk_hashes` are SHA-256 checksums of each binary chunk.

The `metadata` field is Protobuf-serialized before it is placed into the ABCI snapshot.
Snapshot Format
The current version `1` snapshot format is a zlib-compressed, length-prefixed Protobuf stream of `cosmos.base.store.v1beta1.SnapshotItem` messages, split into chunks at exact 10 MB boundaries.
```protobuf
// SnapshotItem is an item contained in a rootmulti.Store snapshot.
message SnapshotItem {
  // item is the specific type of snapshot item.
  oneof item {
    SnapshotStoreItem store = 1;
    SnapshotIAVLItem  iavl = 2 [(gogoproto.customname) = "IAVL"];
  }
}

// SnapshotStoreItem contains metadata about a snapshotted store.
message SnapshotStoreItem {
  string name = 1;
}

// SnapshotIAVLItem is an exported IAVL node.
message SnapshotIAVLItem {
  bytes key = 1;
  bytes value = 2;
  int64 version = 3;
  int32 height = 4;
}
```
Snapshots are generated by `rootmulti.Store.Snapshot()` as follows:

- Set up a `protoio.NewDelimitedWriter` that writes length-prefixed serialized `SnapshotItem` Protobuf messages.
- Iterate over each IAVL store in lexicographical order by store name.
  - Emit a `SnapshotStoreItem` containing the store name.
  - Start an IAVL export for the store using `iavl.ImmutableTree.Export()`.
  - Iterate over each IAVL node.
    - Emit a `SnapshotIAVLItem` for the IAVL node.
- Pass the serialized Protobuf output stream to a zlib compression writer.
- Split the zlib output stream into chunks at exact 10 MB boundaries.
Snapshots are restored via `rootmulti.Store.Restore()` as the inverse of the above, using `iavl.MutableTree.Import()` to reconstruct each IAVL tree.
Snapshot Storage
Snapshot storage is managed by `snapshots.Store`, with metadata in a `db.DB` database and binary chunks in the filesystem. Note that this is only used to store locally taken snapshots that are being offered to other nodes. When the local node is being state synced, Tendermint will take care of buffering and storing incoming snapshot chunks before they are applied to the application.
Metadata is generally stored in a LevelDB database at `<node_home>/data/snapshots/metadata.db`. It contains serialized `cosmos.base.snapshots.v1beta1.Snapshot` Protobuf messages, with a key given by the concatenation of a key prefix, the big-endian height, and the big-endian format. Chunk data is stored as regular files under `<node_home>/data/snapshots/<height>/<format>/<chunk>`.
The `snapshots.Store` API is based on streaming IO, and integrates easily with the `snapshots.types.Snapshotter` snapshot/restore interface implemented by `rootmulti.Store`. The `Store.Save()` method stores a snapshot given as a `<-chan io.ReadCloser` channel of binary chunk streams, and `Store.Load()` loads the snapshot as a channel of binary chunk streams -- the same stream types used by `Snapshotter.Snapshot()` and `Snapshotter.Restore()` to take and restore snapshots using streaming IO.
The store also provides many other methods, such as `List()` to list stored snapshots, `LoadChunk()` to load a single snapshot chunk, and `Prune()` to prune old snapshots.
Taking Snapshots
`snapshots.Manager` is a high-level snapshot manager that integrates a `snapshots.types.Snapshotter` (i.e. the `rootmulti.Store` snapshot functionality) and a `snapshots.Store`, providing an API that maps easily onto the ABCI state sync API. The `Manager` will also make sure only one operation is in progress at a time, e.g. to prevent multiple snapshots being taken concurrently.
During `BaseApp.Commit`, once a state transition has been committed, the height is checked against the `state-sync.snapshot-interval` setting. If the committed height should be snapshotted, a goroutine `BaseApp.snapshot()` is spawned that calls `snapshots.Manager.Create()` to create the snapshot.
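The height check can be sketched as follows; the function name is illustrative, not the SDK's actual one:

```go
package main

import "fmt"

// shouldSnapshot mirrors the check described above: snapshot when a
// non-zero snapshot interval evenly divides the committed height.
func shouldSnapshot(height, snapshotInterval uint64) bool {
	return snapshotInterval > 0 && height%snapshotInterval == 0
}

func main() {
	for _, h := range []uint64{999, 1000, 1001, 2000} {
		fmt.Printf("height %d: snapshot=%v\n", h, shouldSnapshot(h, 1000))
	}
}
```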
`Manager.Create()` will do some basic pre-flight checks, and then start generating a snapshot by calling `rootmulti.Store.Snapshot()`. The chunk stream is passed into `snapshots.Store.Save()`, which stores the chunks in the filesystem and records the snapshot metadata in the snapshot database.

Once the snapshot has been generated, `BaseApp.snapshot()` then removes any old snapshots based on the `state-sync.snapshot-keep-recent` setting.
Serving Snapshots
When a remote node is discovering snapshots for state sync, Tendermint will call the `ListSnapshots` ABCI method to list the snapshots present on the local node. This is dispatched to `snapshots.Manager.List()`, which in turn dispatches to `snapshots.Store.List()`.
When a remote node is fetching snapshot chunks during state sync, Tendermint will call the `LoadSnapshotChunk` ABCI method to fetch a chunk from the local node. This dispatches to `snapshots.Manager.LoadChunk()`, which in turn dispatches to `snapshots.Store.LoadChunk()`.
Restoring Snapshots
When the operator has configured the local Tendermint node to run state sync (see the resources listed in the introduction for details on Tendermint state sync), it will discover snapshots across the P2P network and offer their metadata in turn to the local application via the `OfferSnapshot` ABCI call.
`BaseApp.OfferSnapshot()` attempts to start a restore operation by calling `snapshots.Manager.Restore()`. This may fail, e.g. if the snapshot format is unknown (it may have been generated by a different version of the Cosmos SDK), in which case Tendermint will offer other discovered snapshots.
If the snapshot is accepted, `Manager.Restore()` will record that a restore operation is in progress, and spawn a separate goroutine that runs a synchronous `rootmulti.Store.Restore()` snapshot restoration which will be fed snapshot chunks until it is complete.
Tendermint will then start fetching and buffering chunks, providing them in order via ABCI `ApplySnapshotChunk` calls. These dispatch to `Manager.RestoreChunk()`, which passes the chunks to the ongoing restore process, checking if errors have been encountered yet (e.g. due to checksum mismatches or invalid IAVL data). Once the final chunk is passed, `Manager.RestoreChunk()` will wait for the restore process to complete before returning.
Once the restore is completed, Tendermint will go on to call the `Info` ABCI call to fetch the app hash, and compare this against the trusted chain app hash at the snapshot height to verify the restored state. If it matches, Tendermint goes on to process blocks.
Documentation

Index
- func DrainChunks(chunks <-chan io.ReadCloser)
- func IsFormatSupported(snapshotter types.ExtensionSnapshotter, format uint32) bool
- type ChunkReader
- type ChunkWriter
- type Manager
- func (m *Manager) Create(height uint64) (*types.Snapshot, error)
- func (m *Manager) List() ([]*types.Snapshot, error)
- func (m *Manager) LoadChunk(height uint64, format uint32, chunk uint32) ([]byte, error)
- func (m *Manager) Prune(retain uint32) (uint64, error)
- func (m *Manager) RegisterExtensions(extensions ...types.ExtensionSnapshotter) error
- func (m *Manager) Restore(snapshot types.Snapshot) error
- func (m *Manager) RestoreChunk(chunk []byte) (bool, error)
- type Store
- func (s *Store) Delete(height uint64, format uint32) error
- func (s *Store) Get(height uint64, format uint32) (*types.Snapshot, error)
- func (s *Store) GetLatest() (*types.Snapshot, error)
- func (s *Store) List() ([]*types.Snapshot, error)
- func (s *Store) Load(height uint64, format uint32) (*types.Snapshot, <-chan io.ReadCloser, error)
- func (s *Store) LoadChunk(height uint64, format uint32, chunk uint32) (io.ReadCloser, error)
- func (s *Store) Prune(retain uint32) (uint64, error)
- func (s *Store) Save(height uint64, format uint32, chunks <-chan io.ReadCloser) (*types.Snapshot, error)
- type StreamReader
- type StreamWriter
Constants
This section is empty.
Variables
This section is empty.
Functions
func DrainChunks
func DrainChunks(chunks <-chan io.ReadCloser)
DrainChunks drains and closes all remaining chunks from a chunk channel.
func IsFormatSupported (added in v0.46.0)
func IsFormatSupported(snapshotter types.ExtensionSnapshotter, format uint32) bool
IsFormatSupported reports whether the snapshotter supports restoration from the given format.
Types
type ChunkReader
type ChunkReader struct {
// contains filtered or unexported fields
}
ChunkReader reads chunks from a channel of io.ReadClosers and outputs them as an io.Reader
func NewChunkReader
func NewChunkReader(ch <-chan io.ReadCloser) *ChunkReader
NewChunkReader creates a new ChunkReader.
type ChunkWriter
type ChunkWriter struct {
// contains filtered or unexported fields
}
ChunkWriter reads an input stream, splits it into fixed-size chunks, and writes them to a sequence of io.ReadClosers via a channel.
func NewChunkWriter
func NewChunkWriter(ch chan<- io.ReadCloser, chunkSize uint64) *ChunkWriter
NewChunkWriter creates a new ChunkWriter. If chunkSize is 0, no chunking will be done.
func (*ChunkWriter) CloseWithError
func (w *ChunkWriter) CloseWithError(err error)
CloseWithError closes the writer and sends an error to the reader.
type Manager
type Manager struct {
// contains filtered or unexported fields
}
Manager manages snapshot and restore operations for an app, making sure only a single long-running operation is in progress at any given time, and provides convenience methods mirroring the ABCI interface.
Although the ABCI interface (and this manager) passes chunks as byte slices, the internal snapshot/restore APIs use IO streams (i.e. chan io.ReadCloser), for two reasons:
- In the future, ABCI should support streaming. Consider e.g. InitChain during chain upgrades, which currently passes the entire chain state as an in-memory byte slice. https://github.com/tendermint/tendermint/issues/5184
- io.ReadCloser streams automatically propagate IO errors, and can pass arbitrary errors via io.Pipe.CloseWithError().
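The second point can be demonstrated directly with the standard library:

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// demoPipeError shows an io.Pipe writer passing an arbitrary error to the
// reader via CloseWithError: the reader receives all bytes written so far,
// then the error, with no side channel needed.
func demoPipeError() (int, error) {
	pr, pw := io.Pipe()
	go func() {
		pw.Write([]byte("partial data"))
		pw.CloseWithError(errors.New("checksum mismatch"))
	}()
	data, err := io.ReadAll(pr)
	return len(data), err
}

func main() {
	n, err := demoPipeError()
	fmt.Printf("read %d bytes, err: %v\n", n, err)
}
```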
func NewManager
func NewManager(store *Store, multistore types.Snapshotter) *Manager
NewManager creates a new manager.
func NewManagerWithExtensions (added in v0.46.0)
func NewManagerWithExtensions(store *Store, multistore types.Snapshotter, extensions map[string]types.ExtensionSnapshotter) *Manager
NewManagerWithExtensions creates a new manager with extension snapshotters.
func (*Manager) List
List lists snapshots, mirroring ABCI ListSnapshots. It can be concurrent with other operations.
func (*Manager) LoadChunk
LoadChunk loads a chunk into a byte slice, mirroring ABCI LoadChunk. It can be called concurrently with other operations. If the chunk does not exist, nil is returned.
func (*Manager) RegisterExtensions (added in v0.46.0)
func (m *Manager) RegisterExtensions(extensions ...types.ExtensionSnapshotter) error
RegisterExtensions registers extension snapshotters with the manager.
type Store
type Store struct {
// contains filtered or unexported fields
}
Store is a snapshot store, containing snapshot metadata and binary chunks.
func (*Store) Load
Load loads a snapshot (both metadata and binary chunks). The chunks must be consumed and closed. Returns nil if the snapshot does not exist.
func (*Store) LoadChunk
LoadChunk loads a chunk from disk, or returns nil if it does not exist. The caller must call Close() on it when done.
type StreamReader (added in v0.46.0)
type StreamReader struct {
// contains filtered or unexported fields
}
StreamReader sets up a restore stream pipeline: chan io.ReadCloser -> chunkReader -> zlib -> delimited Protobuf -> ExportNode.
func NewStreamReader (added in v0.46.0)
func NewStreamReader(chunks <-chan io.ReadCloser) (*StreamReader, error)
NewStreamReader sets up a restore stream pipeline.
func (*StreamReader) Close (added in v0.46.0)
func (sr *StreamReader) Close() error
Close implements the io.Closer interface.
type StreamWriter (added in v0.46.0)
type StreamWriter struct {
// contains filtered or unexported fields
}
StreamWriter sets up a stream pipeline to serialize snapshot nodes: Exported Items -> delimited Protobuf -> zlib -> buffer -> chunkWriter -> chan io.ReadCloser.
func NewStreamWriter (added in v0.46.0)
func NewStreamWriter(ch chan<- io.ReadCloser) *StreamWriter
NewStreamWriter sets up a stream pipeline to serialize snapshot DB records.
func (*StreamWriter) Close (added in v0.46.0)
func (sw *StreamWriter) Close() error
Close implements the io.Closer interface.
func (*StreamWriter) CloseWithError (added in v0.46.0)
func (sw *StreamWriter) CloseWithError(err error)
CloseWithError passes the error to the chunkWriter.