raft

v0.0.0-...-ba1c585

Warning: This package is not in the latest version of its module.

Published: Jun 29, 2017 License: BSD-3-Clause Imports: 23 Imported by: 0

README

// Copyright 2015 The Vanadium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

This is an implementation of the Raft agreement protocol.  Each raft
member maintains a log of commands in the same order.  All commands
go through a single master.  The master keeps track of the commit point,
i.e., the point in the log up to which a quorum of servers have stored the
entries.  Each server can apply the commands up to the commit point.  When a
client starts a member, it provides a callback for applying the commands as
they are committed.  It is up to the application to make sure commands are
idempotent.  For example, it can remember the last command applied and not
let any old ones be reapplied.
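The commit point described above can be computed by the master from the last index each member has stored. The sketch below is a minimal illustration, assuming the master knows that index for every member (itself included); `commitPoint` and the local `Index` mirror are hypothetical, not part of this package. Note that full Raft also requires the entry at the new commit point to belong to the leader's current term before the point is advanced by counting replicas; that check is omitted here.

```go
package main

import (
	"fmt"
	"sort"
)

// Index mirrors the package's Index type (a position in the log).
type Index uint64

// commitPoint returns the highest index that a majority of the members
// have stored. stored[i] is the last log index persisted by member i.
// Hypothetical helper: it ignores Raft's current-term restriction.
func commitPoint(stored []Index) Index {
	if len(stored) == 0 {
		return 0
	}
	s := append([]Index(nil), stored...)
	// Sort a copy in descending order.
	sort.Slice(s, func(i, j int) bool { return s[i] > s[j] })
	// With n members a quorum is n/2+1; the entry at position n/2
	// (0-based) of the descending order is stored by that many members.
	return s[len(s)/2]
}

func main() {
	// Five members: indices 9, 7, 5 and 5 are all >= 5, so 5 is committed.
	fmt.Println(commitPoint([]Index{7, 5, 9, 5, 3}))
}
```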

Raft members use the file system to persist their log records across crashes.
To speed things up, we currently don't sync to disk after writing each record.
Because of the idempotent callback, this will work as long as a member of the
quorum survives each master reelection, but I may eventually have to rethink this.

The leader sends heartbeat messages at a fixed interval (hb).  Each follower
triggers a new election if it hasn't heard from the leader within (2 + x) * hb,
where x is a random number between 0 and 1, i.e., somewhere between two and
three heartbeat intervals.  The random interval reduces, but does not
eliminate, the likelihood of two elections starting simultaneously.

The VDL protocol is internal, i.e., it is just for raft members to talk to each
other.

Documentation


Constants

const (
	RoleCandidate = iota // Requesting to be voted leader.
	RoleFollower
	RoleLeader
	RoleStopped
)
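Raft.Status reports the role as a plain int, so these constants are the values to compare against; with iota they are RoleCandidate = 0, RoleFollower = 1, RoleLeader = 2 and RoleStopped = 3. The hypothetical `roleName` helper below (with local copies of the constants) turns that int into readable text, e.g. for logging.

```go
package main

import "fmt"

// Local copies of the package's role constants, in the same iota order.
const (
	RoleCandidate = iota // Requesting to be voted leader.
	RoleFollower
	RoleLeader
	RoleStopped
)

// roleName converts the int returned as 'role' by Raft.Status into text.
func roleName(role int) string {
	switch role {
	case RoleCandidate:
		return "candidate"
	case RoleFollower:
		return "follower"
	case RoleLeader:
		return "leader"
	case RoleStopped:
		return "stopped"
	}
	return fmt.Sprintf("unknown(%d)", role)
}

func main() {
	fmt.Println(roleName(RoleLeader), roleName(7))
}
```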
const ClientEntry = byte(0)
const RaftEntry = byte(1)

Variables

This section is empty.

Functions

This section is empty.

Types

type ControlEntry

type ControlEntry struct {
	InUse       bool
	CurrentTerm Term
	VotedFor    string
}

ControlEntries are appended to the log to record changes in the current term and/or the leader voted for during elections.

type Entry

type Entry struct {
	LogEntry
	ControlEntry
}
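Entry embeds both LogEntry and ControlEntry, so their fields are promoted and can be read directly from an Entry (e.Term, e.Cmd, e.InUse, e.VotedFor). The sketch below uses local copies of the types; `describe` is hypothetical and assumes, without confirmation from the package, that InUse marks an Entry whose control half (a term or vote change) is the meaningful part.

```go
package main

import "fmt"

// Local copies of the package's types.
type Term uint64
type Index uint64

type LogEntry struct {
	Term  Term
	Index Index
	Cmd   []byte
	Type  byte
}

type ControlEntry struct {
	InUse       bool
	CurrentTerm Term
	VotedFor    string
}

type Entry struct {
	LogEntry
	ControlEntry
}

// describe is hypothetical: it ASSUMES InUse means the ControlEntry half
// carries the information. Fields are accessed via Go's field promotion.
func describe(e Entry) string {
	if e.InUse {
		return fmt.Sprintf("control: term=%d votedFor=%q", e.CurrentTerm, e.VotedFor)
	}
	return fmt.Sprintf("log: term=%d index=%d cmd=%q", e.Term, e.Index, e.Cmd)
}

func main() {
	vote := Entry{ControlEntry: ControlEntry{InUse: true, CurrentTerm: 4, VotedFor: "host1:8000"}}
	cmd := Entry{LogEntry: LogEntry{Term: 4, Index: 12, Cmd: []byte("set x 1")}}
	fmt.Println(describe(vote))
	fmt.Println(describe(cmd))
}
```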

type Index

type Index uint64

Index is an index into the log. The log entries are numbered sequentially. At the moment, the indices of the entries passed to RaftClient.Apply() should be sequential, but that will change if we introduce system entries. For example, we could have an entry type that is used to add members to the set of replicas.

func (Index) VDLIsZero

func (x Index) VDLIsZero() bool

func (*Index) VDLRead

func (x *Index) VDLRead(dec vdl.Decoder) error

func (Index) VDLReflect

func (Index) VDLReflect(struct {
	Name string `vdl:"v.io/x/ref/lib/raft.Index"`
})

func (Index) VDLWrite

func (x Index) VDLWrite(enc vdl.Encoder) error

type LogEntry

type LogEntry struct {
	Term  Term
	Index Index
	Cmd   []byte
	Type  byte
}

The log consists of LogEntries. 'error' starts nil and is never written to stable storage. It represents the result of RaftClient.Apply(Cmd, Index). This is a hack, but I haven't figured out a better way.
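The constants ClientEntry (0) and RaftEntry (1) are bytes, which suggests they are the values stored in LogEntry.Type. Assuming that, and assuming RaftEntry marks entries generated by the library itself, an application inspecting a log could keep only its own commands as below. `clientCommands` is a hypothetical helper; note that a zero-valued LogEntry has Type == ClientEntry.

```go
package main

import "fmt"

// Local copies of the package's types and constants.
type Term uint64
type Index uint64

const ClientEntry = byte(0)
const RaftEntry = byte(1)

type LogEntry struct {
	Term  Term
	Index Index
	Cmd   []byte
	Type  byte
}

// clientCommands returns the commands of entries whose Type is ClientEntry,
// skipping entries ASSUMED to be internal to the library (RaftEntry).
func clientCommands(entries []LogEntry) []string {
	var out []string
	for _, e := range entries {
		if e.Type == ClientEntry {
			out = append(out, string(e.Cmd))
		}
	}
	return out
}

func main() {
	entries := []LogEntry{
		{Term: 1, Index: 1, Cmd: []byte("set x 1"), Type: ClientEntry},
		{Term: 2, Index: 2, Type: RaftEntry},
		{Term: 2, Index: 3, Cmd: []byte("set y 2"), Type: ClientEntry},
	}
	fmt.Println(clientCommands(entries))
}
```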

func (LogEntry) VDLIsZero

func (x LogEntry) VDLIsZero() bool

func (*LogEntry) VDLRead

func (x *LogEntry) VDLRead(dec vdl.Decoder) error

func (LogEntry) VDLReflect

func (LogEntry) VDLReflect(struct {
	Name string `vdl:"v.io/x/ref/lib/raft.LogEntry"`
})

func (LogEntry) VDLWrite

func (x LogEntry) VDLWrite(enc vdl.Encoder) error

type Raft

type Raft interface {
	// AddMember adds a new member to the server set.  "id" is actually a network address for the member,
	// currently host:port.  This has to be done before starting the server.
	AddMember(ctx *context.T, id string) error

	// Id returns the id of this member.
	Id() string

	// Start starts the local server communicating with other members.
	Start()

	// Stop terminates the server.  It cannot be Start'ed again.
	Stop()

	// Append appends a new command to the replicated log.  The command will be Apply()ed at each member
	// once a quorum has logged it.  Append() returns once a quorum has logged the command and at least
	// the leader has Apply()ed it.  'applyError' is the error returned by Apply(), while
	// 'raftError' is returned by the raft library itself to report that the Append could not be
	// performed.

	// Status returns the state of the raft.
	Status() (myId string, role int, leader string)

	// StartElection forces an election.  Normally just used for debugging.
	StartElection()
}

func NewRaft

func NewRaft(ctx *context.T, config *RaftConfig, client RaftClient) (Raft, error)

NewRaft creates a new raft server.

type RaftClient

type RaftClient interface {
	// Apply applies a logged command, 'cmd', to the client. The commands will
	// be delivered in the same order and with the same 'index' to all clients.
	// 'index' is a monotonically increasing number and is just an index into the
	// common log.
	//
	// Whenever a client restarts (after a crash perhaps) or falls too far behind
	// (as in a partitioned network), it will be reinitialized with a RestoreFromSnapshot
	// and then have all subsequent logged commands replayed to it.
	//
	// A client that wishes to may return empty snapshots, i.e., just close the error
	// channel without writing anything, and take care of reliably storing its database
	// itself.  In that case it must remember the highest index it has seen if it wishes
	// to avoid replays.  Hence the index is supplied with the Apply().
	Apply(cmd []byte, index Index) error

	// SaveToSnapshot requests the application to write a snapshot to 'wr'.
	// Until SaveToSnapshot returns, no commands will be Apply()ed.  Closing
	// the response channel signals that the snapshot is finished.  Any
	// error written to the response channel will be logged by the library
	// and the library will discard the snapshot if any error is returned.
	SaveToSnapshot(ctx *context.T, wr io.Writer, response chan<- error) error

	// RestoreFromSnapshot requests the application to rebuild its database from the snapshot
	// it must read from 'rd'.  'index' is the last index applied to the snapshot.  No Apply()s
	// will be performed until RestoreFromSnapshot() returns. 'index' can be ignored
	// or used for debugging.
	RestoreFromSnapshot(ctx *context.T, index Index, rd io.Reader) error
}

RaftClient defines the callbacks from the Raft library to the application.

type RaftConfig

type RaftConfig struct {
	LogDir            string            // Directory in which to put log and snapshot files.
	HostPort          string            // For RPCs from other members.
	ServerName        string            // Where to mount if not empty.
	Heartbeat         time.Duration     // Time between heartbeats.
	SnapshotThreshold int64             // Approximate number of log entries between snapshots.
	Acl               access.AccessList // For sending RPC to the members.
}

RaftConfig is passed to NewRaft to avoid lots of parameters.

type Term

type Term uint64

Term is a counter incremented each time a member starts an election. The log will show gaps in Term numbers because not all elections succeed.
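Term gaps arise because each election attempt increments the term even when it fails. The sketch below shows the textbook Raft voting rule that these terms feed into, using the same state a ControlEntry records (CurrentTerm, VotedFor). It is the standard rule from the Raft protocol, not code taken from this package; `voter` and `grantVote` are hypothetical.

```go
package main

import "fmt"

type Term uint64
type Index uint64

// voter holds the state a member persists in a ControlEntry (CurrentTerm,
// VotedFor) plus the term and index of its last log entry. Hypothetical type.
type voter struct {
	CurrentTerm Term
	VotedFor    string
	LastTerm    Term
	LastIndex   Index
}

// grantVote applies the standard Raft voting rule: reject stale terms,
// adopt a newer term (clearing the vote), vote at most once per term,
// and only for a candidate whose log is at least as up to date.
func (v *voter) grantVote(cand string, term, lastTerm Term, lastIndex Index) bool {
	if term < v.CurrentTerm {
		return false
	}
	if term > v.CurrentTerm {
		v.CurrentTerm = term
		v.VotedFor = ""
	}
	if v.VotedFor != "" && v.VotedFor != cand {
		return false
	}
	upToDate := lastTerm > v.LastTerm || (lastTerm == v.LastTerm && lastIndex >= v.LastIndex)
	if !upToDate {
		return false
	}
	v.VotedFor = cand
	return true
}

func main() {
	// The candidate's term-4 election failed, so it now asks for votes in
	// term 5; this voter never saw term 4, hence the gap in Term numbers.
	v := &voter{CurrentTerm: 3, LastTerm: 3, LastIndex: 10}
	fmt.Println(v.grantVote("host1:8000", 5, 3, 10)) // true
	fmt.Println(v.grantVote("host2:8000", 5, 3, 12)) // false: already voted in term 5
}
```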

func (Term) VDLIsZero

func (x Term) VDLIsZero() bool

func (*Term) VDLRead

func (x *Term) VDLRead(dec vdl.Decoder) error

func (Term) VDLReflect

func (Term) VDLReflect(struct {
	Name string `vdl:"v.io/x/ref/lib/raft.Term"`
})

func (Term) VDLWrite

func (x Term) VDLWrite(enc vdl.Encoder) error
