Documentation ¶
Overview ¶
Package multiraft implements the Raft distributed consensus algorithm.
In contrast to other Raft implementations, this version is optimized for the case where one server is a part of many Raft consensus groups (likely with overlapping membership). This entails the use of a shared log and coalesced timers for heartbeats.
A cluster consists of a collection of nodes; the local node is represented by a MultiRaft object. Each node may participate in any number of groups. Nodes must have a globally unique ID (a string), and groups have a globally unique name. The application is responsible for providing a Transport interface that knows how to communicate with other nodes based on their IDs, and a Storage interface to manage persistent data.
The Raft protocol is documented in "In Search of an Understandable Consensus Algorithm" by Diego Ongaro and John Ousterhout. https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
Index ¶
- Variables
- type AppendEntriesRequest
- type AppendEntriesResponse
- type ClientInterface
- type Clock
- type Config
- type EventCommandCommitted
- type EventLeaderElection
- type GroupElectionState
- type GroupID
- type GroupMembers
- type GroupPersistentState
- type LogEntry
- type LogEntryState
- type LogEntryType
- type MemoryStorage
- func (m *MemoryStorage) AppendLogEntries(groupID GroupID, entries []*LogEntry) error
- func (m *MemoryStorage) GetLogEntries(groupID GroupID, firstIndex, lastIndex int, ch chan<- *LogEntryState)
- func (m *MemoryStorage) GetLogEntry(groupID GroupID, index int) (*LogEntry, error)
- func (m *MemoryStorage) LoadGroups() <-chan *GroupPersistentState
- func (m *MemoryStorage) SetGroupElectionState(groupID GroupID, electionState *GroupElectionState) error
- func (m *MemoryStorage) TruncateLog(groupID GroupID, lastIndex int) error
- type MultiRaft
- type NodeID
- type RPCInterface
- type RequestHeader
- type RequestVoteRequest
- type RequestVoteResponse
- type Role
- type ServerInterface
- type Storage
- type Transport
Constants ¶
This section is empty.
Variables ¶
var RealClock = realClock{}
RealClock is the standard implementation of the Clock interface.
Functions ¶
This section is empty.
Types ¶
type AppendEntriesRequest ¶
type AppendEntriesRequest struct { RequestHeader GroupID GroupID Term int LeaderID NodeID PrevLogIndex int PrevLogTerm int Entries []*LogEntry LeaderCommit int }
AppendEntriesRequest is a part of the Raft protocol. It is public so it can be used by the net/rpc system but should not be used outside this package except to serialize it.
type AppendEntriesResponse ¶
AppendEntriesResponse is a part of the Raft protocol. It is public so it can be used by the net/rpc system but should not be used outside this package except to serialize it.
type ClientInterface ¶
type ClientInterface interface { Go(serviceMethod string, args interface{}, reply interface{}, done chan *rpc.Call) *rpc.Call Close() error }
ClientInterface is the interface expected of the client provided by a transport. It is satisfied by rpc.Client, but could be implemented in other ways (using rpc.Call as a dumb data structure)
type Clock ¶
type Clock interface { Now() time.Time // NewElectionTimer returns a timer to be used for triggering elections. The resulting // Timer struct will have its C field filled out, but may not be a "real" timer, so it // must be stopped with StopElectionTimer instead of t.Stop() NewElectionTimer(time.Duration) *time.Timer StopElectionTimer(*time.Timer) }
Clock encapsulates the timing-related parts of the raft protocol. Types of events are separated in the API (i.e. NewElectionTimer() instead of NewTimer() so they can be triggered individually in tests.
type Config ¶
type Config struct { Storage Storage Transport Transport // Clock may be nil to use real time. Clock Clock // A new election is called if the ElectionTimeout elapses with no contact from the leader. // The actual ElectionTimeout is chosen randomly from the range [ElectionTimeoutMin, // ElectionTimeoutMax) to minimize the chances of several servers trying to become leaders // simultaneously. The Raft paper suggests a range of 150-300ms for local networks; // geographically distributed installations should use higher values to account for the // increased round trip time. ElectionTimeoutMin time.Duration ElectionTimeoutMax time.Duration // If Strict is true, some warnings become fatal panics and additional (possibly expensive) // sanity checks will be done. Strict bool }
Config contains the parameters necessary to construct a MultiRaft object.
type EventCommandCommitted ¶
type EventCommandCommitted struct {
Command []byte
}
An EventCommandCommitted is broadcast whenever a command has been committed.
type EventLeaderElection ¶
An EventLeaderElection is broadcast when a group completes an election. TODO(bdarnell): emit EventLeaderElection from follower nodes as well.
type GroupElectionState ¶
type GroupElectionState struct { // CurrentTerm is the highest term this node has seen. CurrentTerm int // VotedFor is the node this node has voted for in CurrentTerm's election. It is zero // If this node has not yet voted in CurrentTerm. VotedFor NodeID }
GroupElectionState records the votes this node has made so that it will not change its vote after a restart.
func (*GroupElectionState) Equal ¶
func (g *GroupElectionState) Equal(other *GroupElectionState) bool
Equal compares two GroupElectionStates.
type GroupID ¶
type GroupID int64
GroupID is a unique identifier for a consensus group within the cluster.
type GroupMembers ¶
type GroupMembers struct { // Members contains the current members of the group and is always non-empty. // When ProposedMembers is non-empty, the group is in a "joint consensus" phase and // a quorum must be reached independently among both Members and ProposedMembers. // 'Members' never changes except to be set to the ProposedMembers of the most recently // committed GroupMembers. Members []NodeID ProposedMembers []NodeID // NonVotingMembers receive logs for the group but do not participate in elections // or quorum decisions. When a new node is added to the group it is initially // a NonVotingMember until it has caught up to the current log position. NonVotingMembers []NodeID }
GroupMembers maintains the current and future members of the group. It is updated when log entries are received rather than when they are committed as a part of the "joint consensus" protocol (section 6 of the Raft paper).
type GroupPersistentState ¶
type GroupPersistentState struct { GroupID GroupID ElectionState GroupElectionState Members GroupMembers LastLogIndex int LastLogTerm int }
GroupPersistentState is a unified view of the readable data (except for log entries) about a group; used by Storage.LoadGroups.
type LogEntry ¶
type LogEntry struct { Term int Index int Type LogEntryType Payload []byte }
LogEntry represents a persistent log entry. Payloads are opaque to the raft system. TODO(bdarnell): we will need both opaque payloads for the application and raft-subsystem payloads for membership changes.
type LogEntryState ¶
LogEntryState is used by Storage.GetLogEntries to bundle a LogEntry with its index and an optional error.
type LogEntryType ¶
type LogEntryType int8
LogEntryType is the type of a LogEntry.
const (
LogEntryCommand LogEntryType = iota
)
LogEntryCommand is for application-level commands sent via MultiRaft.SendCommand; other LogEntryTypes are for internal use.
type MemoryStorage ¶
type MemoryStorage struct {
// contains filtered or unexported fields
}
MemoryStorage is an in-memory implementation of Storage for testing.
func NewMemoryStorage ¶
func NewMemoryStorage() *MemoryStorage
NewMemoryStorage creates a MemoryStorage.
func (*MemoryStorage) AppendLogEntries ¶
func (m *MemoryStorage) AppendLogEntries(groupID GroupID, entries []*LogEntry) error
AppendLogEntries implements the Storage interface.
func (*MemoryStorage) GetLogEntries ¶
func (m *MemoryStorage) GetLogEntries(groupID GroupID, firstIndex, lastIndex int, ch chan<- *LogEntryState)
GetLogEntries implements the Storage interface.
func (*MemoryStorage) GetLogEntry ¶
func (m *MemoryStorage) GetLogEntry(groupID GroupID, index int) (*LogEntry, error)
GetLogEntry implements the Storage interface.
func (*MemoryStorage) LoadGroups ¶
func (m *MemoryStorage) LoadGroups() <-chan *GroupPersistentState
LoadGroups implements the Storage interface.
func (*MemoryStorage) SetGroupElectionState ¶
func (m *MemoryStorage) SetGroupElectionState(groupID GroupID, electionState *GroupElectionState) error
SetGroupElectionState implements the Storage interface.
func (*MemoryStorage) TruncateLog ¶
func (m *MemoryStorage) TruncateLog(groupID GroupID, lastIndex int) error
TruncateLog implements the Storage interface.
type MultiRaft ¶
type MultiRaft struct { Config Events chan interface{} // contains filtered or unexported fields }
MultiRaft represents a local node in a raft cluster. The owner is responsible for consuming the Events channel in a timely manner.
func NewMultiRaft ¶
NewMultiRaft creates a MultiRaft object.
func (*MultiRaft) CreateGroup ¶
CreateGroup creates a new consensus group and joins it. The application should arrange to call CreateGroup on all nodes named in initialMembers.
func (*MultiRaft) Start ¶
func (m *MultiRaft) Start()
Start runs the raft algorithm in a background goroutine.
func (*MultiRaft) Stop ¶
func (m *MultiRaft) Stop()
Stop terminates the running raft instance and shuts down all network interfaces.
func (*MultiRaft) SubmitCommand ¶
SubmitCommand sends a command (a binary blob) to the cluster. This method returns when the command has been successfully sent, not when it has been committed. TODO(bdarnell): should SubmitCommand wait until the commit?
type NodeID ¶
type NodeID int32
NodeID is a unique non-zero identifier for the node within the cluster.
type RPCInterface ¶
type RPCInterface interface { RequestVote(req *RequestVoteRequest, resp *RequestVoteResponse) error AppendEntries(req *AppendEntriesRequest, resp *AppendEntriesResponse) error }
RPCInterface is the methods we expose for use by net/rpc.
type RequestHeader ¶
RequestHeader contains fields common to all RPC requests.
type RequestVoteRequest ¶
type RequestVoteRequest struct { RequestHeader GroupID GroupID Term int CandidateID NodeID LastLogIndex int LastLogTerm int }
RequestVoteRequest is a part of the Raft protocol. It is public so it can be used by the net/rpc system but should not be used outside this package except to serialize it.
type RequestVoteResponse ¶
RequestVoteResponse is a part of the Raft protocol. It is public so it can be used by the net/rpc system but should not be used outside this package except to serialize it.
type Role ¶
type Role int
Role represents the state of the node in a group.
Nodes can be either observers, followers, candidates, or leaders. Observers receive replicated logs but do not vote. There is at most one Leader per term; a node cannot become a Leader without first becoming a Candiate and winning an election.
type ServerInterface ¶
ServerInterface is a generic interface based on net/rpc.
type Storage ¶
type Storage interface { // LoadGroups is called at startup to load all previously-existing groups. // The returned channel should be closed once all groups have been loaded. LoadGroups() <-chan *GroupPersistentState // SetGroupElectionState is called to update the election state for the given group. SetGroupElectionState(groupID GroupID, electionState *GroupElectionState) error // AppendLogEntries is called to add entries to the log. The entries will always span // a contiguous range of indices just after the current end of the log. AppendLogEntries(groupID GroupID, entries []*LogEntry) error // TruncateLog is called to delete all log entries with index > lastIndex. TruncateLog(groupID GroupID, lastIndex int) error // GetLogEntry is called to synchronously retrieve an entry from the log. GetLogEntry(groupID GroupID, index int) (*LogEntry, error) // GetLogEntries is called to asynchronously retrieve entries from the log, // from firstIndex to lastIndex inclusive. If there is an error the storage // layer should send one LogEntryState with a non-nil error and then close the // channel. GetLogEntries(groupID GroupID, firstIndex, lastIndex int, ch chan<- *LogEntryState) }
The Storage interface is supplied by the application to manage persistent storage of raft data.
type Transport ¶
type Transport interface { // Listen informs the Transport of the local node's ID and callback interface. // The Transport should associate the given id with the server object so other Transport's // Connect methods can find it. Listen(id NodeID, server ServerInterface) error // Stop undoes a previous Listen. Stop(id NodeID) // Connect looks up a node by id and returns a stub interface to submit RPCs to it. Connect(id NodeID) (ClientInterface, error) }
The Transport interface is supplied by the application to manage communication with other nodes. It is responsible for mapping from IDs to some communication channel (in the simplest case, a host:port pair could be used as an ID, although this would make it impossible to move an instance from one host to another except by syncing up a new node from scratch).
func NewLocalRPCTransport ¶
func NewLocalRPCTransport() Transport
NewLocalRPCTransport creates a Transport for local testing use. MultiRaft instances sharing the same local Transport can find and communicate with each other by ID (which can be an arbitrary string). Each instance binds to a different unused port on localhost.