Documentation ¶
Overview ¶
Package multiraft implements the Raft distributed consensus algorithm.
In contrast to other Raft implementations, this version is optimized for the case where one server is a part of many Raft consensus groups (likely with overlapping membership). This entails the use of a shared log and coalesced timers for heartbeats.
A cluster consists of a collection of nodes; the local node is represented by a MultiRaft object. Each node may participate in any number of groups. Nodes must have a globally unique ID (a string), and groups have a globally unique name. The application is responsible for providing a Transport interface that knows how to communicate with other nodes based on their IDs, and a Storage interface to manage persistent data.
The Raft protocol is documented in "In Search of an Understandable Consensus Algorithm" by Diego Ongaro and John Ousterhout. https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
Index ¶
- type Config
- type EventCommandCommitted
- type EventLeaderElection
- type EventMembershipChangeCommitted
- type MemoryStorage
- type MultiRaft
- func (m *MultiRaft) ChangeGroupMembership(groupID uint64, commandID string, changeType raftpb.ConfChangeType, ...) chan struct{}
- func (m *MultiRaft) CreateGroup(groupID uint64) error
- func (m *MultiRaft) RemoveGroup(groupID uint64) error
- func (m *MultiRaft) Start() error
- func (m *MultiRaft) Stop()
- func (m *MultiRaft) SubmitCommand(groupID uint64, commandID string, command []byte) chan struct{}
- type NodeID
- type RaftMessageRequest
- type RaftMessageResponse
- type ServerInterface
- type Storage
- type Ticker
- type Transport
- type WriteableGroupStorage
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct { Storage Storage Transport Transport // Ticker may be nil to use real time and TickInterval. Ticker Ticker // A new election is called if the ElectionTimeout elapses with no contact from the leader. // The actual ElectionTimeout is chosen randomly from the range [ElectionTimeoutMin, // ElectionTimeoutMax) to minimize the chances of several servers trying to become leaders // simultaneously. The Raft paper suggests a range of 150-300ms for local networks; // geographically distributed installations should use higher values to account for the // increased round trip time. ElectionTimeoutTicks int HeartbeatIntervalTicks int TickInterval time.Duration // If Strict is true, some warnings become fatal panics and additional (possibly expensive) // sanity checks will be done. Strict bool EntryFormatter raft.EntryFormatter }
Config contains the parameters necessary to construct a MultiRaft object.
type EventCommandCommitted ¶
type EventCommandCommitted struct { GroupID uint64 // CommandID is the application-supplied ID for this command. The same CommandID // may be seen multiple times, so the application should remember this CommandID // for deduping. CommandID string // Index is the raft log index for this event. The application should persist // the Index of the last applied command atomically with any effects of that // command. Index uint64 Command []byte }
An EventCommandCommitted is broadcast whenever a command has been committed.
type EventLeaderElection ¶
An EventLeaderElection is broadcast when a group starts or completes an election. NodeID is zero when an election is in progress.
type EventMembershipChangeCommitted ¶
type EventMembershipChangeCommitted struct { // GroupID, CommandID, and Index are the same as for EventCommandCommitted. GroupID uint64 CommandID string Index uint64 NodeID NodeID ChangeType raftpb.ConfChangeType Payload []byte // Callback should be invoked when this event and its payload have been // processed. A non-nil error aborts the membership change. Callback func(error) }
An EventMembershipChangeCommitted is broadcast whenever a membership change has been committed.
type MemoryStorage ¶
type MemoryStorage struct {
// contains filtered or unexported fields
}
MemoryStorage is an in-memory implementation of Storage for testing.
func NewMemoryStorage ¶
func NewMemoryStorage() *MemoryStorage
NewMemoryStorage creates a MemoryStorage.
func (*MemoryStorage) GroupStorage ¶
func (m *MemoryStorage) GroupStorage(groupID uint64) WriteableGroupStorage
GroupStorage implements the Storage interface.
type MultiRaft ¶
type MultiRaft struct { Config Events chan interface{} // contains filtered or unexported fields }
MultiRaft represents a local node in a raft cluster. The owner is responsible for consuming the Events channel in a timely manner.
func NewMultiRaft ¶
NewMultiRaft creates a MultiRaft object.
func (*MultiRaft) ChangeGroupMembership ¶
func (m *MultiRaft) ChangeGroupMembership(groupID uint64, commandID string, changeType raftpb.ConfChangeType, nodeID NodeID, payload []byte) chan struct{}
ChangeGroupMembership submits a proposed membership change to the cluster. Payload is an opaque blob that will be returned in EventMembershipChangeCommitted.
func (*MultiRaft) CreateGroup ¶
CreateGroup creates a new consensus group and joins it. The initial membership of this group is determined by the InitialState method of the group's Storage object.
func (*MultiRaft) RemoveGroup ¶
RemoveGroup destroys the consensus group with the given ID. No events for this group will be emitted after this method returns (but some events may still be in the channel buffer).
func (*MultiRaft) Stop ¶
func (m *MultiRaft) Stop()
Stop terminates the running raft instance and shuts down all network interfaces.
func (*MultiRaft) SubmitCommand ¶
SubmitCommand sends a command (a binary blob) to the cluster. This method returns when the command has been successfully sent, not when it has been committed. The returned channel is closed when the command is committed. As long as this node is alive, the command will be retried until committed (e.g. in the event of leader failover). There is no guarantee that commands will be committed in the same order as they were originally submitted.
type NodeID ¶
type NodeID uint64
NodeID is a type alias for a raft node ID. Note that a raft node corresponds to a cockroach node+store combination.
type RaftMessageRequest ¶
RaftMessageRequest wraps a raft message.
type RaftMessageResponse ¶
type RaftMessageResponse struct { }
RaftMessageResponse is empty (raft uses a one-way messaging model; if a response is needed it will be sent as a separate message).
type ServerInterface ¶
type ServerInterface interface {
RaftMessage(req *RaftMessageRequest, resp *RaftMessageResponse) error
}
ServerInterface is the methods we expose for use by net/rpc.
type Storage ¶
type Storage interface {
GroupStorage(groupID uint64) WriteableGroupStorage
}
The Storage interface is supplied by the application to manage persistent storage of raft data.
type Ticker ¶
type Ticker interface { // This channel will be readable once per tick. The time value returned is unspecified; // the channel has this type for compatibility with time.Ticker but other implementations // may not return real times. Chan() <-chan time.Time }
Ticker encapsulates the timing-related parts of the raft protocol.
type Transport ¶
type Transport interface { // Listen informs the Transport of the local node's ID and callback interface. // The Transport should associate the given id with the server object so other Transport's // Connect methods can find it. Listen(id NodeID, server ServerInterface) error // Stop undoes a previous Listen. Stop(id NodeID) // Send a message to the given node. Send(id NodeID, req *RaftMessageRequest) error // Close all associated connections. Close() }
The Transport interface is supplied by the application to manage communication with other nodes. It is responsible for mapping from IDs to some communication channel (in the simplest case, a host:port pair could be used as an ID, although this would make it impossible to move an instance from one host to another except by syncing up a new node from scratch).
func NewLocalRPCTransport ¶
func NewLocalRPCTransport() Transport
NewLocalRPCTransport creates a Transport for local testing use. MultiRaft instances sharing the same local Transport can find and communicate with each other by ID (which can be an arbitrary string). Each instance binds to a different unused port on localhost. Because this is just for local testing, it doesn't use TLS.
type WriteableGroupStorage ¶
type WriteableGroupStorage interface { raft.Storage Append(entries []raftpb.Entry) error ApplySnapshot(snap raftpb.Snapshot) error SetHardState(st raftpb.HardState) error }
WriteableGroupStorage represents a single group within a Storage. It is implemented by *raft.MemoryStorage.
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
Package storagetest is a test suite for raft.Storage implementations.
|
Package storagetest is a test suite for raft.Storage implementations. |