Documentation ¶
Overview ¶
Package wal provides an implementation of write ahead log that is used by etcd.
A WAL is created at a particular directory and is made up of a number of segmented WAL files. Inside each file the raft state and entries are appended to it with the Save method:
	metadata := []byte{}
	w, err := wal.Create(zap.NewExample(), "/var/lib/etcd", metadata)
	...
	err := w.Save(s, ents)
After saving a raft snapshot to disk, the SaveSnapshot method should be called to record it, so that the WAL can match the saved snapshot when restarting.
err := w.SaveSnapshot(walpb.Snapshot{Index: 10, Term: 2})
When a user has finished using a WAL it must be closed:
w.Close()
Each WAL file is a stream of WAL records. A WAL record consists of a length field and a WAL record protobuf. The record protobuf contains a CRC, a type, and a data payload. The length field is a 64-bit packed structure holding the length of the remaining logical record data in its lower 56 bits and its physical padding in the lowest three bits of the most significant byte. Each record is 8-byte aligned so that the length field is never torn. The CRC contains the CRC32 value of all record protobufs preceding the current record.
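The packed length field described above can be sketched with plain bit operations. This is a minimal illustration and not the package's encoder: the helper names encodeFrameSize and decodeFrameSize are hypothetical, and it assumes the three padding bits are the low three bits of the most significant byte.

```go
package main

import "fmt"

// encodeFrameSize packs a record's byte length and the padding needed for
// 8-byte alignment into one 64-bit length field (hypothetical helper).
func encodeFrameSize(dataBytes int) (lenField uint64, padBytes int) {
	lenField = uint64(dataBytes)
	// Pad so that the next record starts on an 8-byte boundary.
	padBytes = (8 - dataBytes%8) % 8
	// Assumption: padding lives in the low three bits of the top byte.
	lenField |= uint64(padBytes) << 56
	return lenField, padBytes
}

// decodeFrameSize splits a length field back into record length and padding.
func decodeFrameSize(lenField uint64) (recBytes, padBytes int) {
	recBytes = int(lenField & (1<<56 - 1)) // lower 56 bits hold the length
	padBytes = int((lenField >> 56) & 0x7) // three bits hold the padding
	return recBytes, padBytes
}

func main() {
	lenField, pad := encodeFrameSize(13)
	rec, decodedPad := decodeFrameSize(lenField)
	fmt.Printf("field=%016x record=%d padding=%d (decoded %d)\n",
		lenField, rec, pad, decodedPad)
}
```

A 13-byte record needs 3 bytes of padding to reach the next multiple of 8, while a 16-byte record needs none and its length field equals the plain length.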
WAL files are placed inside the directory in the following format: $seq-$index.wal
The first WAL file to be created will be 0000000000000000-0000000000000000.wal indicating an initial sequence of 0 and an initial raft index of 0. The first entry written to WAL MUST have raft index 0.
The WAL will cut its current tail WAL file if its size exceeds 64 MB. This will increment an internal sequence number and cause a new file to be created. If the last raft index saved was 0x20 and this is the first time cut has been called on this WAL, then the sequence will increment from 0x0 to 0x1. The new file will be: 0000000000000001-0000000000000021.wal. If a second cut happens after 0x10 more entries with incrementing indexes have been saved, then the file will be called: 0000000000000002-0000000000000031.wal.
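The $seq-$index.wal naming scheme can be reproduced with fmt alone. The sketch below assumes both fields are zero-padded, 16-digit hexadecimal numbers, as the file names above suggest; walName and parseWALName are illustrative helpers written for this example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// walName formats a segment file name from a sequence number and the
// raft index of the first entry expected in that file.
func walName(seq, index uint64) string {
	return fmt.Sprintf("%016x-%016x.wal", seq, index)
}

// parseWALName recovers the sequence number and index from a file name.
func parseWALName(name string) (seq, index uint64, err error) {
	if !strings.HasSuffix(name, ".wal") {
		return 0, 0, errors.New("not a WAL file name")
	}
	_, err = fmt.Sscanf(name, "%016x-%016x.wal", &seq, &index)
	return seq, index, err
}

func main() {
	fmt.Println(walName(0, 0))         // the first file of a new WAL
	fmt.Println(walName(1, 0x20+1))    // first cut after last index 0x20
	fmt.Println(walName(2, 0x30+1))    // second cut after 0x10 more entries
	seq, index, err := parseWALName("0000000000000002-0000000000000031.wal")
	fmt.Println(seq, index, err)
}
```

The index in the name is one past the last saved raft index at the time of the cut, which is why a cut at 0x20 produces a file ending in 21.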
At a later time a WAL can be opened at a particular snapshot. If there is no snapshot, an empty snapshot should be passed in.
	w, err := wal.Open(zap.NewExample(), "/var/lib/etcd", walpb.Snapshot{Index: 10, Term: 2})
	...
The snapshot must have been written to the WAL.
Additional items cannot be Saved to this WAL until all the items from the given snapshot to the end of the WAL are read first:
metadata, state, ents, err := w.ReadAll()
This will give you the metadata, the last raftpb.HardState and the slice of raftpb.Entry items in the log.
Index ¶
- Constants
- Variables
- func Exist(dir string) bool
- func MinimalEtcdVersion(ents []raftpb.Entry) *semver.Version
- func MustUnmarshalEntry(d []byte) raftpb.Entry
- func MustUnmarshalState(d []byte) raftpb.HardState
- func ReadWALVersion(w *WAL) (*walVersion, error)
- func Repair(lg *zap.Logger, dirpath string) bool
- func ValidSnapshotEntries(lg *zap.Logger, walDir string) ([]walpb.Snapshot, error)
- func Verify(lg *zap.Logger, walDir string, snap walpb.Snapshot) (*raftpb.HardState, error)
- func VisitFileDescriptor(file protoreflect.FileDescriptor, visitor Visitor) error
- type Decoder
- type Visitor
- type WAL
- func (w *WAL) Close() error
- func (w *WAL) ReadAll() (metadata []byte, state raftpb.HardState, ents []raftpb.Entry, err error)
- func (w *WAL) ReleaseLockTo(index uint64) error
- func (w *WAL) Reopen(lg *zap.Logger, snap walpb.Snapshot) (*WAL, error)
- func (w *WAL) Save(st raftpb.HardState, ents []raftpb.Entry) error
- func (w *WAL) SaveSnapshot(e walpb.Snapshot) error
- func (w *WAL) SetUnsafeNoFsync()
- func (w *WAL) Sync() error
Constants ¶
const (
	MetadataType int64 = iota + 1
	EntryType
	StateType
	CrcType
	SnapshotType
)
Variables ¶
var (
	// SegmentSizeBytes is the preallocated size of each wal segment file.
	// The actual size might be larger than this. In general, the default
	// value should be used, but this is defined as an exported variable
	// so that tests can set a different segment size.
	SegmentSizeBytes int64 = 64 * 1000 * 1000 // 64MB

	ErrMetadataConflict = errors.New("wal: conflicting metadata found")
	ErrFileNotFound     = errors.New("wal: file not found")
	ErrCRCMismatch      = walpb.ErrCRCMismatch
	ErrSnapshotMismatch = errors.New("wal: snapshot mismatch")
	ErrSnapshotNotFound = errors.New("wal: snapshot not found")
	ErrSliceOutOfRange  = errors.New("wal: slice bounds out of range")
	ErrDecoderNotFound  = errors.New("wal: decoder not found")
)
Functions ¶
func MinimalEtcdVersion ¶
MinimalEtcdVersion returns the minimal etcd version able to interpret the entries in the WAL. It is determined by looking at the entries since the last snapshot and returning the highest etcd version annotation among the messages, fields, enums and enum values used.
func MustUnmarshalEntry ¶
func MustUnmarshalState ¶
func ReadWALVersion ¶
ReadWALVersion reads the remaining entries from an opened WAL and returns a struct that implements the schema.WAL interface.
func ValidSnapshotEntries ¶
ValidSnapshotEntries returns all the valid snapshot entries in the wal logs in the given directory. Snapshot entries are valid if their index is less than or equal to the most recent committed hardstate.
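The validity rule can be pictured as a simple filter over snapshot records. The sketch below is an illustration only: Snapshot is a local stand-in for walpb.Snapshot, validSnapshots is a hypothetical helper, and the commit index is passed in directly instead of being read from the WAL.

```go
package main

import "fmt"

// Snapshot is a local stand-in for walpb.Snapshot (assumption: only the
// index and term matter for this illustration).
type Snapshot struct {
	Index uint64
	Term  uint64
}

// validSnapshots keeps the snapshots whose index does not exceed the
// commit index of the most recent hard state.
func validSnapshots(snaps []Snapshot, commit uint64) []Snapshot {
	var valid []Snapshot
	for _, s := range snaps {
		if s.Index <= commit {
			valid = append(valid, s)
		}
	}
	return valid
}

func main() {
	snaps := []Snapshot{{Index: 5, Term: 1}, {Index: 10, Term: 2}, {Index: 15, Term: 2}}
	// With a commit index of 12, the snapshot at index 15 is filtered out.
	fmt.Println(validSnapshots(snaps, 12))
}
```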
func Verify ¶
Verify reads through the given WAL and verifies that it is not corrupted. It creates a new decoder to read through the records of the given WAL. It does not conflict with any open WAL, but it is recommended not to call this function after opening the WAL for writing. If it cannot read out the expected snap, it will return ErrSnapshotNotFound. If the loaded snap doesn't match with the expected one, it will return error ErrSnapshotMismatch.
func VisitFileDescriptor ¶
func VisitFileDescriptor(file protoreflect.FileDescriptor, visitor Visitor) error
VisitFileDescriptor calls visitor on each field and enum value with the etcd version read from the proto definition. If a field or enum value is not annotated, visitor will be called with nil. Upon encountering an invalid annotation, it returns immediately with an error.
Types ¶
type Decoder ¶
type Decoder interface {
	Decode(rec *walpb.Record) error
	LastOffset() int64
	LastCRC() uint32
	UpdateCRC(prevCrc uint32)
}
func NewDecoder ¶
func NewDecoder(r ...fileutil.FileReader) Decoder
func NewDecoderAdvanced ¶
func NewDecoderAdvanced(continueOnCrcError bool, r ...fileutil.FileReader) Decoder
type WAL ¶
type WAL struct {
// contains filtered or unexported fields
}
WAL is a logical representation of the stable storage. WAL is either in read mode or append mode but not both. A newly created WAL is in append mode, and ready for appending records. A just opened WAL is in read mode, and ready for reading records. The WAL will be ready for appending after reading out all the previous records.
func Create ¶
Create creates a WAL ready for appending records. The given metadata is recorded at the head of each WAL file, and can be retrieved with ReadAll after the WAL is opened with Open.
func Open ¶
Open opens the WAL at the given snap. The snap SHOULD have been previously saved to the WAL, or the following ReadAll will fail. The returned WAL is ready to read and the first record will be the one after the given snap. The WAL cannot be appended to before reading out all of its previous records.
func OpenForRead ¶
OpenForRead opens the WAL files for reading only. Writing to a read-only WAL panics.
func (*WAL) ReadAll ¶
ReadAll reads out records of the current WAL. If opened in write mode, it must read out all records until EOF; otherwise an error is returned. If opened in read mode, it will try to read all records if possible. If it cannot read out the expected snap, it will return ErrSnapshotNotFound. If the loaded snap doesn't match the expected one, it will return all the records and the error ErrSnapshotMismatch. TODO: detect not-last-snap error. TODO: maybe loosen the match checking. After ReadAll, the WAL will be ready for appending new records.
ReadAll suppresses WAL entries that got overridden (i.e. a newer entry with the same index exists in the log). Such a situation can happen in cases described in Figure 7 of the Raft paper (http://web.stanford.edu/~ouster/cgi-bin/papers/raft-atc14.pdf).
ReadAll may return entries that are not yet committed and are therefore subject to being overridden. Do not apply entries that have index > state.commit, as they are subject to change.
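Both rules can be sketched on a simplified log: a later entry replaces any earlier entry at the same or a higher index, and only entries up to the commit index are safe to apply. Entry is a local stand-in for raftpb.Entry, and appendEntry and committed are hypothetical helpers; this is not presented as the package's implementation.

```go
package main

import "fmt"

// Entry is a local stand-in for raftpb.Entry (assumption: index and term
// are enough to show the behavior).
type Entry struct {
	Index uint64
	Term  uint64
}

// appendEntry adds e to ents, first dropping earlier entries whose index
// is greater than or equal to e.Index, since e overrides them.
func appendEntry(ents []Entry, e Entry) []Entry {
	for len(ents) > 0 && ents[len(ents)-1].Index >= e.Index {
		ents = ents[:len(ents)-1]
	}
	return append(ents, e)
}

// committed returns the entries that are safe to apply, that is, those
// with an index no greater than the commit index.
func committed(ents []Entry, commit uint64) []Entry {
	var out []Entry
	for _, e := range ents {
		if e.Index <= commit {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Records as they might appear on disk: indexes 2 and 3 were written
	// in term 1 and later rewritten in term 2.
	records := []Entry{{1, 1}, {2, 1}, {3, 1}, {2, 2}, {3, 2}}
	var ents []Entry
	for _, e := range records {
		ents = appendEntry(ents, e)
	}
	fmt.Println(ents)               // the term-1 entries at 2 and 3 are gone
	fmt.Println(committed(ents, 2)) // only entries up to the commit index
}
```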
func (*WAL) ReleaseLockTo ¶
ReleaseLockTo releases the locks that have a smaller index than the given index, except the largest one among them. For example, if the WAL is holding locks 1,2,3,4,5,6, ReleaseLockTo(4) will release locks 1,2 but keep 3. ReleaseLockTo(5) will release 1,2,3 but keep 4.
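The "release everything smaller, except the largest of those" rule is easy to get off by one, so here is a small model of it. The function releaseLockTo below is a hypothetical stand-alone helper that works on a sorted slice of lock indexes; it only mirrors the examples in the paragraph above.

```go
package main

import "fmt"

// releaseLockTo splits locks (sorted ascending) into those released and
// those kept for the given index. The largest lock below index is kept.
func releaseLockTo(locks []uint64, index uint64) (released, kept []uint64) {
	keep := -1 // position of the largest lock smaller than index
	for i, l := range locks {
		if l >= index {
			break
		}
		keep = i
	}
	if keep <= 0 {
		// Nothing is smaller, or only one lock is: release nothing.
		return nil, locks
	}
	return locks[:keep], locks[keep:]
}

func main() {
	locks := []uint64{1, 2, 3, 4, 5, 6}
	released, kept := releaseLockTo(locks, 4)
	fmt.Println(released, kept) // releases 1,2 and keeps 3 onward
	released, kept = releaseLockTo(locks, 5)
	fmt.Println(released, kept) // releases 1,2,3 and keeps 4 onward
}
```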
func (*WAL) SetUnsafeNoFsync ¶
func (w *WAL) SetUnsafeNoFsync()