Documentation
Index
Constants
This section is empty.
Variables
var (
	// ErrInvalidMembers indicates that the provided member nodes are invalid.
	ErrInvalidMembers = errors.New("invalid members")
	// ErrPathNotExist indicates that the specified exported snapshot directory
	// does not exist.
	ErrPathNotExist = errors.New("path does not exist")
	// ErrIncompleteSnapshot indicates that the specified exported snapshot
	// directory does not contain a complete snapshot.
	ErrIncompleteSnapshot = errors.New("snapshot is incomplete")
)
Functions
func ImportSnapshot
func ImportSnapshot(nhConfig config.NodeHostConfig, srcDir string, memberNodes map[uint64]string, replicaID uint64) (err error)
ImportSnapshot is used to repair a Raft shard that has already had a quorum of its nodes permanently lost or damaged. Such a repair is only required when the Raft shard has permanently lost its quorum. You are not supposed to use this function when the shard still has a majority of its nodes running, or when the node failures are not permanent. In our experience, a well-monitored and well-managed Dragonboat system can usually avoid using the ImportSnapshot tool by always replacing permanently dead nodes with available ones in time.
ImportSnapshot imports the exported snapshot found in the specified srcDir directory into the system and rewrites the history of node replicaID, so that the node owns the imported snapshot and the membership of the Raft shard is rewritten to the details specified in memberNodes.
ImportSnapshot is typically invoked by a DevOps tool that is separate from the Dragonboat-based application. The NodeHost instance on that host must not be running when ImportSnapshot is invoked.
As an example, consider a Raft shard with three nodes whose ReplicaID values are 1, 2 and 3. They run on three distributed hosts, each with a running NodeHost instance, and their RaftAddress values are m1, m2 and m3. The ShardID value of the Raft shard is 100. Suppose the hosts identified by m2 and m3 suddenly become permanently unavailable, causing the Raft shard to lose its quorum. To repair the shard, we can use the ImportSnapshot function to overwrite the state and membership of the Raft shard.
Assume we have two other running hosts identified as m4 and m5, and we want two new nodes with ReplicaID 4 and 5 to replace the permanently lost nodes 2 and 3. In this case, the memberNodes map should contain the following content:
memberNodes := map[uint64]string{
	1: "m1",
	4: "m4",
	5: "m5",
}
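Because a wrong memberNodes value rewrites the shard's membership permanently, a DevOps tool may want to sanity-check it before calling ImportSnapshot. The following is a minimal sketch of such a pre-flight check; `checkMembers` is hypothetical, is not part of Dragonboat, and the checks Dragonboat itself performs before returning ErrInvalidMembers may differ. It only verifies that addresses are non-empty and unique and that the local replicaID appears in the map.

```go
package main

import (
	"errors"
	"fmt"
)

// checkMembers is a hypothetical pre-flight check run by a DevOps tool
// before ImportSnapshot. It is not Dragonboat's own validation logic.
func checkMembers(memberNodes map[uint64]string, replicaID uint64) error {
	if len(memberNodes) == 0 {
		return errors.New("memberNodes is empty")
	}
	seen := make(map[string]uint64, len(memberNodes))
	for id, addr := range memberNodes {
		if addr == "" {
			return fmt.Errorf("replica %d has an empty address", id)
		}
		// Two replicas of the same shard should not share one address.
		if other, ok := seen[addr]; ok {
			return fmt.Errorf("replicas %d and %d share address %q", other, id, addr)
		}
		seen[addr] = id
	}
	// The node being rewritten on this host must be part of the new membership.
	if _, ok := memberNodes[replicaID]; !ok {
		return fmt.Errorf("replica %d is not in memberNodes", replicaID)
	}
	return nil
}

func main() {
	// The memberNodes value from the example: nodes 1, 4 and 5 on m1, m4, m5.
	memberNodes := map[uint64]string{1: "m1", 4: "m4", 5: "m5"}
	fmt.Println(checkMembers(memberNodes, 4))        // <nil>
	fmt.Println(checkMembers(memberNodes, 2) != nil) // true
}
```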
We first shut down the NodeHost instances on all involved hosts and then call the ImportSnapshot function from the DevOps tool. Assuming the directory /backup/shard100 contains the exported snapshot we previously saved using NodeHost's ExportSnapshot method:
- on m1, we call ImportSnapshot(nhConfig1, "/backup/shard100", memberNodes, 1)
- on m4, we call ImportSnapshot(nhConfig4, "/backup/shard100", memberNodes, 4)
- on m5, we call ImportSnapshot(nhConfig5, "/backup/shard100", memberNodes, 5)
The nhConfig* values used above should be the same as the ones used to start your NodeHost instances. They are expected to differ slightly between m1, m4 and m5 to reflect the differences between these hosts, e.g. their RaftAddress values. The srcDir value is "/backup/shard100" on all three hosts, and on each host that directory should contain the exact same snapshot. The memberNodes value should be the same across all three hosts.
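The rules above (shared srcDir and memberNodes, host-specific config and ReplicaID) can be captured in a small plan that a DevOps tool generates once and then executes host by host. This is a hedged sketch: `nodeHostConfig` is a stand-in carrying only a RaftAddress field so the example compiles without Dragonboat, and `importStep` and `planImport` are hypothetical names. A real tool would load, on each host, the exact NodeHost configuration used to start NodeHost there.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeHostConfig is a stand-in for the real NodeHost configuration type,
// reduced to the single field this sketch needs.
type nodeHostConfig struct {
	RaftAddress string
}

// importStep (hypothetical) describes one ImportSnapshot call on one host.
type importStep struct {
	Config      nodeHostConfig
	SrcDir      string
	MemberNodes map[uint64]string
	ReplicaID   uint64
}

// planImport builds one step per member, ordered by ReplicaID. All steps
// share srcDir and memberNodes; only the config and ReplicaID differ.
func planImport(srcDir string, memberNodes map[uint64]string) []importStep {
	ids := make([]uint64, 0, len(memberNodes))
	for id := range memberNodes {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	steps := make([]importStep, 0, len(ids))
	for _, id := range ids {
		steps = append(steps, importStep{
			Config:      nodeHostConfig{RaftAddress: memberNodes[id]},
			SrcDir:      srcDir,
			MemberNodes: memberNodes,
			ReplicaID:   id,
		})
	}
	return steps
}

func main() {
	memberNodes := map[uint64]string{1: "m1", 4: "m4", 5: "m5"}
	for _, s := range planImport("/backup/shard100", memberNodes) {
		// On the real host, with NodeHost stopped, this step would become
		// ImportSnapshot(nhConfig, s.SrcDir, s.MemberNodes, s.ReplicaID).
		fmt.Printf("on %s: ImportSnapshot(cfg, %q, memberNodes, %d)\n",
			s.Config.RaftAddress, s.SrcDir, s.ReplicaID)
	}
}
```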
Once ImportSnapshot has been called on all three of those hosts, the history of the Raft shard is overwritten to a state in which:
- there are 3 nodes in the Raft shard, with ReplicaID values 1, 4 and 5, running on hosts m1, m4 and m5.
- nodes 2 and 3 are permanently removed from the shard. You should never restart either of them, as hosts m2 and m3 are supposed to be permanently unavailable.
- the state captured in the snapshot becomes the state of the shard. All proposals more recent than the state of the snapshot are lost.
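Before running the tool, an operator may want to double-check exactly which replicas the new membership drops and adds, for example to record nodes 2 and 3 as never-to-be-restarted. The sketch below only compares two membership maps; `membershipDiff` is a hypothetical helper and is not how ImportSnapshot rewrites membership internally.

```go
package main

import (
	"fmt"
	"sort"
)

// membershipDiff (hypothetical) reports which ReplicaIDs disappear and which
// appear when oldMembers is replaced by newMembers. Results are sorted.
func membershipDiff(oldMembers, newMembers map[uint64]string) (removed, added []uint64) {
	for id := range oldMembers {
		if _, ok := newMembers[id]; !ok {
			removed = append(removed, id)
		}
	}
	for id := range newMembers {
		if _, ok := oldMembers[id]; !ok {
			added = append(added, id)
		}
	}
	sort.Slice(removed, func(i, j int) bool { return removed[i] < removed[j] })
	sort.Slice(added, func(i, j int) bool { return added[i] < added[j] })
	return removed, added
}

func main() {
	before := map[uint64]string{1: "m1", 2: "m2", 3: "m3"}
	after := map[uint64]string{1: "m1", 4: "m4", 5: "m5"}
	removed, added := membershipDiff(before, after)
	fmt.Println("removed:", removed) // removed: [2 3]
	fmt.Println("added:", added)     // added: [4 5]
}
```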
Once the NodeHost instances are restarted on m1, m4 and m5, nodes 1, 4 and 5 of Raft shard 100 can be restarted in the same way as after rebooting hosts m1, m4 and m5.
It is your application's responsibility to make m4 and m5 aware that nodes 4 and 5 are now running there.
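One way an application might learn which replica it owns after the repair is to look up its own RaftAddress in the same memberNodes value passed to ImportSnapshot. This is a minimal sketch under that assumption; `localReplicaID` is a hypothetical helper, and how the application actually distributes memberNodes to m4 and m5 (config file, service registry, and so on) is left open.

```go
package main

import "fmt"

// localReplicaID (hypothetical) returns the ReplicaID assigned to the host
// whose RaftAddress is raftAddress, and false if the host is not a member.
func localReplicaID(memberNodes map[uint64]string, raftAddress string) (uint64, bool) {
	for id, addr := range memberNodes {
		if addr == raftAddress {
			return id, true
		}
	}
	return 0, false
}

func main() {
	memberNodes := map[uint64]string{1: "m1", 4: "m4", 5: "m5"}
	if id, ok := localReplicaID(memberNodes, "m4"); ok {
		// Here the application would restart this replica of shard 100 the
		// same way it restarts existing replicas after a host reboot.
		fmt.Printf("start ReplicaID %d of shard 100\n", id)
	}
}
```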
Types
This section is empty.