Documentation ¶
Overview ¶
Package multiraft implements the Raft distributed consensus algorithm.
In contrast to other Raft implementations, this version is optimized for the case where one server is a part of many Raft consensus groups (likely with overlapping membership). This entails the use of a shared log and coalesced timers for heartbeats.
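As a rough illustration of why coalescing helps when groups overlap, the following sketch regroups per-group heartbeats by destination node. The function name coalesceHeartbeats and the map layout are assumptions made for this example; they are not part of the package's API and do not describe its internal implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// coalesceHeartbeats takes a map from group ID to the follower node IDs of
// that group and returns a map from node ID to the groups it should hear
// about. One message per node can then carry heartbeats for every shared
// group. Hypothetical helper, not part of multiraft.
func coalesceHeartbeats(followers map[uint64][]uint64) map[uint64][]uint64 {
	byNode := make(map[uint64][]uint64)
	for groupID, nodes := range followers {
		for _, nodeID := range nodes {
			byNode[nodeID] = append(byNode[nodeID], groupID)
		}
	}
	// Sort each group list so the result does not depend on map iteration order.
	for _, groups := range byNode {
		sort.Slice(groups, func(i, j int) bool { return groups[i] < groups[j] })
	}
	return byNode
}

func main() {
	// Groups 1 and 2 overlap on nodes 20 and 30; group 3 only reaches node 30.
	followers := map[uint64][]uint64{
		1: {20, 30},
		2: {20, 30},
		3: {30},
	}
	for nodeID, groups := range coalesceHeartbeats(followers) {
		fmt.Printf("node %d: one heartbeat covering groups %v\n", nodeID, groups)
	}
}
```

With five per-group heartbeats owed in this example, only two messages need to be sent per tick, one to each destination node.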
A cluster consists of a collection of nodes; the local node is represented by a MultiRaft object. Each node may participate in any number of groups. Every node and every group must have a globally unique ID (both appear as uint64 values in the method signatures of this package). The application is responsible for providing a Transport interface that knows how to communicate with other nodes based on their IDs, and a Storage interface to manage persistent data.
The Raft protocol is documented in "In Search of an Understandable Consensus Algorithm" by Diego Ongaro and John Ousterhout. https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
Index ¶
- type ChangeMembershipOperation
- type ChangeMembershipPayload
- type ClientInterface
- type Config
- type EventCommandCommitted
- type EventLeaderElection
- type GroupPersistentState
- type LogEntry
- type LogEntryState
- type MemoryStorage
- type MultiRaft
- func (m *MultiRaft) ChangeGroupMembership(groupID uint64, changeOp ChangeMembershipOperation, nodeID uint64) error
- func (m *MultiRaft) CreateGroup(groupID uint64, initialMembers []uint64) error
- func (m *MultiRaft) DoRPC(name string, req, resp interface{}) error
- func (m *MultiRaft) Start()
- func (m *MultiRaft) Stop()
- func (m *MultiRaft) SubmitCommand(groupID uint64, command []byte) error
- type RPCInterface
- type SendMessageRequest
- type SendMessageResponse
- type ServerInterface
- type Storage
- type Ticker
- type Transport
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ChangeMembershipOperation ¶
type ChangeMembershipOperation int8
ChangeMembershipOperation indicates the operation being performed by a ChangeMembershipPayload.
const (
	// ChangeMembershipAddObserver adds a non-voting node. The given node will
	// retrieve a snapshot and catch up with logs.
	ChangeMembershipAddObserver ChangeMembershipOperation = iota
	// ChangeMembershipRemoveObserver removes a non-voting node.
	ChangeMembershipRemoveObserver
	// ChangeMembershipAddMember adds a full (voting) node. The given node must already be an
	// observer; it will be removed from the Observers list when this
	// operation is processed.
	// TODO(bdarnell): enforce the requirement that a node be added as an observer first.
	ChangeMembershipAddMember
	// ChangeMembershipRemoveMember removes a voting node. It is not possible to remove the
	// last node; the result of attempting to do so is undefined.
	ChangeMembershipRemoveMember
)
Values for ChangeMembershipOperation.
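The observer-then-member cycle described in the comments above can be sketched as follows. The membership type and its apply method are hypothetical and exist only for this example. The sketch also chooses to reject two cases (promoting a non-observer, removing the last member) that the package leaves as a TODO or as undefined behavior, so it should be read as one possible policy rather than what multiraft does.

```go
package main

import (
	"errors"
	"fmt"
)

// Local mirror of the documented constants so the sketch compiles on its own.
type ChangeMembershipOperation int8

const (
	ChangeMembershipAddObserver ChangeMembershipOperation = iota
	ChangeMembershipRemoveObserver
	ChangeMembershipAddMember
	ChangeMembershipRemoveMember
)

// membership is a hypothetical in-memory view of one group's configuration.
type membership struct {
	members   map[uint64]bool
	observers map[uint64]bool
}

// apply updates the configuration for a single operation on a single node.
func (g *membership) apply(op ChangeMembershipOperation, node uint64) error {
	switch op {
	case ChangeMembershipAddObserver:
		g.observers[node] = true
	case ChangeMembershipRemoveObserver:
		delete(g.observers, node)
	case ChangeMembershipAddMember:
		// Assumption of this sketch: enforce the observer-first requirement.
		if !g.observers[node] {
			return errors.New("node must be an observer before becoming a member")
		}
		delete(g.observers, node)
		g.members[node] = true
	case ChangeMembershipRemoveMember:
		// Assumption of this sketch: refuse instead of leaving it undefined.
		if g.members[node] && len(g.members) == 1 {
			return errors.New("refusing to remove the last member")
		}
		delete(g.members, node)
	default:
		return fmt.Errorf("unknown operation %d", op)
	}
	return nil
}

func main() {
	g := &membership{members: map[uint64]bool{1: true}, observers: map[uint64]bool{}}
	fmt.Println("promote without observing:", g.apply(ChangeMembershipAddMember, 2))
	fmt.Println("add observer:", g.apply(ChangeMembershipAddObserver, 2))
	fmt.Println("promote observer:", g.apply(ChangeMembershipAddMember, 2))
	fmt.Println("members:", len(g.members), "observers:", len(g.observers))
}
```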
type ChangeMembershipPayload ¶
type ChangeMembershipPayload struct {
	Operation ChangeMembershipOperation
	Node      uint64
}
ChangeMembershipPayload is the Payload of an entry with Type LogEntryChangeMembership. Nodes are added or removed one at a time to minimize the risk of quorum failures in the new configuration.
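For context on the quorum concern, a Raft group needs a majority of its voting members to make progress. The small sketch below computes that majority for a few group sizes; the quorum helper is illustrative and is not a function exported by this package.

```go
package main

import "fmt"

// quorum returns the number of votes needed for a majority of n voting members.
func quorum(n int) int {
	return n/2 + 1
}

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum(n), n-quorum(n))
	}
}
```

Growing a group from three to four voting members raises the quorum from two to three, so each single-node change shifts the majority by at most one vote.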
type ClientInterface ¶
type ClientInterface interface {
	Go(serviceMethod string, args interface{}, reply interface{}, done chan *rpc.Call) *rpc.Call
	Close() error
}
ClientInterface is the interface expected of the client provided by a transport. It is satisfied by rpc.Client, but could be implemented in other ways (using rpc.Call as a dumb data structure).
type Config ¶
type Config struct {
	Storage   Storage
	Transport Transport
	// Ticker may be nil to use real time and TickInterval.
	Ticker Ticker

	// A new election is called if the ElectionTimeout elapses with no contact from the leader.
	// The actual ElectionTimeout is chosen randomly from the range [ElectionTimeoutMin,
	// ElectionTimeoutMax) to minimize the chances of several servers trying to become leaders
	// simultaneously. The Raft paper suggests a range of 150-300ms for local networks;
	// geographically distributed installations should use higher values to account for the
	// increased round trip time.
	ElectionTimeoutTicks   int
	HeartbeatIntervalTicks int
	TickInterval           time.Duration

	// If Strict is true, some warnings become fatal panics and additional (possibly expensive)
	// sanity checks will be done.
	Strict bool
}
Config contains the parameters necessary to construct a MultiRaft object.
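The Config comment describes picking the election timeout at random from a half-open range. A minimal sketch of that idea, expressed in ticks, might look like the following. The helper randomElectionTicks, its parameters, and the 10ms tick interval are assumptions for illustration; how the package derives the range from ElectionTimeoutTicks is not stated here.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomElectionTicks picks a timeout uniformly from [minTicks, maxTicks).
// intn is any source of uniform integers in [0, n), such as rand.Intn.
// Hypothetical helper; it requires maxTicks > minTicks.
func randomElectionTicks(intn func(n int) int, minTicks, maxTicks int) int {
	return minTicks + intn(maxTicks-minTicks)
}

func main() {
	const tickInterval = 10 * time.Millisecond // assumed value for this example

	// 15 to 30 ticks at 10ms per tick corresponds to the 150-300ms range
	// that the Raft paper suggests for local networks.
	ticks := randomElectionTicks(rand.Intn, 15, 30)
	fmt.Printf("election timeout: %d ticks (%v)\n", ticks, time.Duration(ticks)*tickInterval)
}
```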
type EventCommandCommitted ¶
type EventCommandCommitted struct {
	Command []byte
}
An EventCommandCommitted is broadcast whenever a command has been committed.
type EventLeaderElection ¶
An EventLeaderElection is broadcast when a group completes an election.
TODO(bdarnell): emit EventLeaderElection from follower nodes as well.
type GroupPersistentState ¶
GroupPersistentState is a unified view of the readable data (except for log entries) about a group; used by Storage.LoadGroups.
type LogEntryState ¶
LogEntryState is used by Storage.GetLogEntries to bundle a LogEntry with its index and an optional error.
type MemoryStorage ¶
type MemoryStorage struct {
	// contains filtered or unexported fields
}
MemoryStorage is an in-memory implementation of Storage for testing.
func NewMemoryStorage ¶
func NewMemoryStorage() *MemoryStorage
NewMemoryStorage creates a MemoryStorage.
func (*MemoryStorage) AppendLogEntries ¶
func (m *MemoryStorage) AppendLogEntries(groupID uint64, entries []*LogEntry) error
AppendLogEntries implements the Storage interface.
func (*MemoryStorage) SetGroupState ¶
func (m *MemoryStorage) SetGroupState(groupID uint64, state *GroupPersistentState) error
SetGroupState implements the Storage interface.
type MultiRaft ¶
type MultiRaft struct {
	Config
	Events chan interface{}
	// contains filtered or unexported fields
}
MultiRaft represents a local node in a raft cluster. The owner is responsible for consuming the Events channel in a timely manner.
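One way to consume the Events channel is a loop with a type switch, typically run in its own goroutine. The sketch below uses a local stand-in for EventCommandCommitted and a hypothetical drainEvents helper. Whether events arrive as values or pointers, and whether the channel is ever closed, are assumptions of this example, so both forms are handled.

```go
package main

import "fmt"

// Local stand-in for the documented event type so the sketch compiles on its own.
type EventCommandCommitted struct {
	Command []byte
}

// drainEvents reads events until the channel is closed and returns the
// committed commands in arrival order. Other event types are skipped.
// Hypothetical helper, not part of multiraft.
func drainEvents(events <-chan interface{}) [][]byte {
	var committed [][]byte
	for e := range events {
		switch e := e.(type) {
		case *EventCommandCommitted:
			committed = append(committed, e.Command)
		case EventCommandCommitted:
			committed = append(committed, e.Command)
		default:
			// Leader elections and any other events are ignored here.
		}
	}
	return committed
}

func main() {
	events := make(chan interface{}, 2)
	events <- &EventCommandCommitted{Command: []byte("set x=1")}
	events <- &EventCommandCommitted{Command: []byte("set y=2")}
	close(events)

	for _, cmd := range drainEvents(events) {
		fmt.Printf("applying %q\n", cmd)
	}
}
```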
func NewMultiRaft ¶
NewMultiRaft creates a MultiRaft object.
func (*MultiRaft) ChangeGroupMembership ¶
func (m *MultiRaft) ChangeGroupMembership(groupID uint64, changeOp ChangeMembershipOperation, nodeID uint64) error
ChangeGroupMembership submits a proposed membership change to the cluster.
TODO(bdarnell): same concerns as SubmitCommand
TODO(bdarnell): do we expose ChangeMembershipAdd{Member,Observer} to the application level or does MultiRaft take care of the non-member -> observer -> full member cycle?
func (*MultiRaft) CreateGroup ¶
func (m *MultiRaft) CreateGroup(groupID uint64, initialMembers []uint64) error
CreateGroup creates a new consensus group and joins it. The application should arrange to call CreateGroup on all nodes named in initialMembers.
func (*MultiRaft) Start ¶
func (m *MultiRaft) Start()
Start runs the raft algorithm in a background goroutine.
func (*MultiRaft) Stop ¶
func (m *MultiRaft) Stop()
Stop terminates the running raft instance and shuts down all network interfaces.
func (*MultiRaft) SubmitCommand ¶
func (m *MultiRaft) SubmitCommand(groupID uint64, command []byte) error
SubmitCommand sends a command (a binary blob) to the cluster. This method returns when the command has been successfully sent, not when it has been committed.
TODO(bdarnell): should SubmitCommand wait until the commit?
TODO(bdarnell): what do we do if we lose leadership before a command we proposed commits?
type RPCInterface ¶
type RPCInterface interface {
	SendMessage(req *SendMessageRequest, resp *SendMessageResponse) error
}
RPCInterface is the set of methods we expose for use by net/rpc.
type SendMessageRequest ¶
SendMessageRequest wraps a raft message.
type SendMessageResponse ¶
type SendMessageResponse struct { }
SendMessageResponse is empty (raft uses a one-way messaging model; if a response is needed it will be sent as a separate message).
type ServerInterface ¶
ServerInterface is a generic interface based on net/rpc.
type Storage ¶
type Storage interface {
	// SetGroupState is called to update the persistent state for the given group.
	SetGroupState(groupID uint64, state *GroupPersistentState) error

	// AppendLogEntries is called to add entries to the log. The entries will always span
	// a contiguous range of indices just after the current end of the log.
	AppendLogEntries(groupID uint64, entries []*LogEntry) error
}
The Storage interface is supplied by the application to manage persistent storage of raft data.
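To make the contiguity guarantee concrete, here is a minimal map-backed sketch of the two Storage methods. The field layouts of GroupPersistentState and LogEntry are not shown in this documentation, so the versions below (including the Index field and 1-based indexing) are simplified, hypothetical stand-ins. The mapStorage type is likewise invented for the example and is not the package's MemoryStorage.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified stand-ins for the package's types.
type GroupPersistentState struct {
	CurrentTerm int
}

type LogEntry struct {
	Index   int
	Payload []byte
}

// Storage mirrors the two methods documented above.
type Storage interface {
	SetGroupState(groupID uint64, state *GroupPersistentState) error
	AppendLogEntries(groupID uint64, entries []*LogEntry) error
}

// mapStorage keeps everything in memory, keyed by group ID.
type mapStorage struct {
	mu     sync.Mutex
	states map[uint64]*GroupPersistentState
	logs   map[uint64][]*LogEntry
}

func newMapStorage() *mapStorage {
	return &mapStorage{
		states: make(map[uint64]*GroupPersistentState),
		logs:   make(map[uint64][]*LogEntry),
	}
}

func (s *mapStorage) SetGroupState(groupID uint64, state *GroupPersistentState) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.states[groupID] = state
	return nil
}

// AppendLogEntries rejects batches that do not start right after the
// current end of the group's log. Indices are assumed to start at 1.
func (s *mapStorage) AppendLogEntries(groupID uint64, entries []*LogEntry) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	next := len(s.logs[groupID]) + 1
	for i, e := range entries {
		if e.Index != next+i {
			return fmt.Errorf("group %d: entry index %d, want %d", groupID, e.Index, next+i)
		}
	}
	s.logs[groupID] = append(s.logs[groupID], entries...)
	return nil
}

// Compile-time check that mapStorage satisfies the interface.
var _ Storage = (*mapStorage)(nil)

func main() {
	s := newMapStorage()
	fmt.Println(s.AppendLogEntries(7, []*LogEntry{{Index: 1}, {Index: 2}}))
	fmt.Println(s.AppendLogEntries(7, []*LogEntry{{Index: 4}}))
}
```

A durable implementation would additionally need to flush writes to stable storage before returning; this sketch skips that entirely.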
type Ticker ¶
type Ticker interface {
	// This channel will be readable once per tick. The time value returned is unspecified;
	// the channel has this type for compatibility with time.Ticker but other implementations
	// may not return real times.
	Chan() <-chan time.Time
}
Ticker encapsulates the timing-related parts of the raft protocol.
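Because the time value is unspecified, a test can drive the protocol with a ticker that fires only on demand. The manualTicker type below is a hypothetical helper written for this example; it is not provided by the package.

```go
package main

import (
	"fmt"
	"time"
)

// Ticker mirrors the interface documented above.
type Ticker interface {
	Chan() <-chan time.Time
}

// manualTicker fires only when Tick is called, which makes timing in tests
// deterministic. Hypothetical helper, not part of multiraft.
type manualTicker struct {
	ch chan time.Time
}

func newManualTicker() *manualTicker {
	// A buffer of one lets Tick return before the reader picks up the tick.
	return &manualTicker{ch: make(chan time.Time, 1)}
}

// Chan implements Ticker.
func (m *manualTicker) Chan() <-chan time.Time {
	return m.ch
}

// Tick delivers a single tick. It blocks if the previous tick has not been
// consumed yet. The zero time is sent because the value is unspecified.
func (m *manualTicker) Tick() {
	m.ch <- time.Time{}
}

// Compile-time check that manualTicker satisfies the interface.
var _ Ticker = (*manualTicker)(nil)

func main() {
	ticker := newManualTicker()
	ticker.Tick()
	<-ticker.Chan()
	fmt.Println("tick consumed")
}
```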
type Transport ¶
type Transport interface {
	// Listen informs the Transport of the local node's ID and callback interface.
	// The Transport should associate the given id with the server object so other Transports'
	// Connect methods can find it.
	Listen(id uint64, server ServerInterface) error

	// Stop undoes a previous Listen.
	Stop(id uint64)

	// Connect looks up a node by id and returns a stub interface to submit RPCs to it.
	Connect(id uint64) (ClientInterface, error)
}
The Transport interface is supplied by the application to manage communication with other nodes. It is responsible for mapping from IDs to some communication channel (in the simplest case, a host:port pair could be used as an ID, although this would make it impossible to move an instance from one host to another except by syncing up a new node from scratch).
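The ID-to-channel mapping could be backed by a small directory that is updated when a node moves, so the node keeps its ID across hosts. The sketch below shows only that lookup layer; the directory type, its methods, and the address strings are hypothetical, and a real Transport would still need to implement Listen, Stop, and Connect on top of something like it.

```go
package main

import (
	"fmt"
	"sync"
)

// directory maps node IDs to network addresses. Hypothetical helper, not
// part of multiraft.
type directory struct {
	mu    sync.RWMutex
	addrs map[uint64]string
}

func newDirectory() *directory {
	return &directory{addrs: make(map[uint64]string)}
}

// Register records or replaces the address for a node ID.
func (d *directory) Register(id uint64, addr string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.addrs[id] = addr
}

// Lookup returns the current address for a node ID.
func (d *directory) Lookup(id uint64) (string, error) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	addr, ok := d.addrs[id]
	if !ok {
		return "", fmt.Errorf("unknown node %d", id)
	}
	return addr, nil
}

func main() {
	d := newDirectory()
	d.Register(1, "host-a:9000")
	d.Register(1, "host-b:9000") // node 1 moved; its ID stays the same
	addr, err := d.Lookup(1)
	fmt.Println(addr, err)
}
```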