verifier

package
v0.4.1
Published: Oct 20, 2023 License: MPL-2.0 Imports: 10 Imported by: 10

Documentation

Constants

const (
	// ExtensionMagicPrefix is the prefix we prepend to log Extensions fields to
	// disambiguate from other middleware that may use extensions. This value is
	// carefully constructed to be completely invalid as the beginning of a
	// protobuf (3) wire protocol message since the other known user of this field
	// encodes its data that way. If the first byte were 0xa8 this would be a
	// valid protobuf field encoding for an int field, however currently the 3
	// least significant bits encode the field type as 7, which is not a valid
	// type in the current spec. Even if this does change in the future, the
	// field's tag number encoded here is 123456789 so it's extremely unlikely
	// that any valid protobuf schema will ever have enough fields or arbitrarily
	// decide to assign field tags that large (though unrecognized tags would be
	// ignored on decode). Finally, the value of the field is the varint encoding
	// of the randomly chosen value 53906 so if type 7 is ever valid in the future
	// and used as a length-prefixed type, the length decoded would be way longer
	// than the buffer making it invalid.
	ExtensionMagicPrefix uint64 = 0xafd1f9d60392a503
)
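
The properties described in the comment can be checked mechanically. The following is a minimal sketch, assuming the prefix is serialized big-endian so that `0xaf` is the first byte on the wire; `decodePrefix` is a hypothetical helper and not part of this package. It reads the eight bytes as a protobuf key varint followed by a value varint.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ExtensionMagicPrefix is copied from the constant documented above.
const ExtensionMagicPrefix uint64 = 0xafd1f9d60392a503

// decodePrefix is a hypothetical helper. It assumes the prefix is written
// big-endian, then decodes a protobuf-style key varint and a value varint.
func decodePrefix(prefix uint64) (fieldNum uint64, wireType uint64, value uint64) {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], prefix)

	// The key packs the field number (upper bits) and wire type (low 3 bits).
	key, n := binary.Uvarint(buf[:])
	// The remaining bytes hold the varint-encoded value.
	value, _ = binary.Uvarint(buf[n:])
	return key >> 3, key & 0x7, value
}

func main() {
	fieldNum, wireType, value := decodePrefix(ExtensionMagicPrefix)
	fmt.Println(fieldNum, wireType, value) // prints: 123456789 7 53906
}
```

The decoded wire type is 7, the field number is 123456789, and the value is 53906, matching the three claims in the comment.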

Variables

var ErrRangeMismatch = errors.New("range mismatch")

ErrRangeMismatch is the error returned in a VerificationReport when the follower does not have enough logs on disk to fill the checkpoint's range, so verification is bound to fail. It is distinct from a plain failure to read a log because it is expected to happen just after truncations, or when the checkpoint interval is too large for the number of logs retained. Implementations may choose to detect this and report it as a warning rather than a failure, since it indicates only an inability to report correctly, not an actual error in processing data.
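
One way to act on this distinction is to map a report's error to a log level. The sketch below uses a local stand-in (`errRangeMismatch`) for the package variable so that it compiles on its own; `severity` is a hypothetical helper, and the level names are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// errRangeMismatch is a local stand-in for verifier.ErrRangeMismatch so the
// sketch compiles without the library.
var errRangeMismatch = errors.New("range mismatch")

// severity is a hypothetical helper that maps a report's Err to a log level.
// A range mismatch is downgraded to a warning; anything else is an error.
func severity(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errRangeMismatch):
		return "warn"
	default:
		return "error"
	}
}

func main() {
	wrapped := fmt.Errorf("verifying checkpoint: %w", errRangeMismatch)
	fmt.Println(severity(nil), severity(wrapped), severity(errors.New("read failed")))
	// prints: ok warn error
}
```

Using `errors.Is` rather than `==` keeps the check working if the error is wrapped somewhere along the way.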

var (
	// MetricDefinitions describe the metrics emitted by this library via the
	// provided metrics.Collector implementation. It's public so that these can be
	// registered during init with metrics clients that support pre-defining
	// metrics.
	MetricDefinitions = metrics.Definitions{
		Counters: []metrics.Descriptor{
			{
				Name: "checkpoints_written",
				Desc: "checkpoints_written counts the number of checkpoint entries" +
					" written to the LogStore.",
			},
			{
				Name: "ranges_verified",
				Desc: "ranges_verified counts the number of log ranges for which a" +
					" verification report has been completed.",
			},
			{
				Name: "read_checksum_failures",
				Desc: "read_checksum_failures counts the number of times a range of" +
					" logs between two check points contained at least one corruption.",
			},
			{
				Name: "write_checksum_failures",
				Desc: "write_checksum_failures counts the number of times a follower" +
					" has a different checksum to the leader at the point where it" +
					" writes to the log. This could be caused by either a disk-corruption" +
					" on the leader (unlikely) or some other corruption of the log" +
					" entries in-flight.",
			},
			{
				Name: "dropped_reports",
				Desc: "dropped_reports counts how many times the verifier routine was" +
					" still busy when the next checksum came in and so verification for" +
					" a range was skipped. If you see this happen consider increasing" +
					" the interval between checkpoints.",
			},
		},
	}
)

Functions

This section is empty.

Types

type ErrChecksumMismatch

type ErrChecksumMismatch string

ErrChecksumMismatch is the error type returned in a VerificationReport where the log range's checksum didn't match.

func (ErrChecksumMismatch) Error

func (e ErrChecksumMismatch) Error() string

Error implements error

type IsCheckpointFn

type IsCheckpointFn func(*raft.Log) (bool, error)

IsCheckpointFn is a function that decides whether the contents of a raft log's Data represent a checkpoint message. It is called on every append, so it must be relatively fast in the common case. If it returns true for a log, the log's Extensions field will be used to encode verification metadata and must be empty; if it is not empty, the append will fail and force the leader to step down. The same happens if an error is returned.
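
As an illustration of a cheap checkpoint test, the sketch below inspects a single byte of the log data. `Log` is a trimmed stand-in for `raft.Log`, and `checkpointMarker` is a hypothetical value an application might reserve for its checkpoint command; neither comes from this package.

```go
package main

import (
	"errors"
	"fmt"
)

// Log is a stand-in for raft.Log with only the field this sketch needs.
type Log struct {
	Data []byte
}

// checkpointMarker is a hypothetical first byte that an application could
// reserve to tag its checkpoint command.
const checkpointMarker byte = 0xC7

// isCheckpoint has the same shape as IsCheckpointFn (using the stand-in Log).
// It looks at one byte only, so it stays cheap on every append.
func isCheckpoint(l *Log) (bool, error) {
	if l == nil {
		return false, errors.New("nil log")
	}
	return len(l.Data) > 0 && l.Data[0] == checkpointMarker, nil
}

func main() {
	ok, err := isCheckpoint(&Log{Data: []byte{checkpointMarker, 0x01}})
	fmt.Println(ok, err) // prints: true <nil>

	ok, err = isCheckpoint(&Log{Data: []byte("ordinary command")})
	fmt.Println(ok, err) // prints: false <nil>
}
```

Because a returned error forces the leader to step down, a real implementation would likely reserve errors for genuinely unrecoverable input.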

type LogRange

type LogRange struct {
	Start uint64
	End   uint64
}

LogRange describes the set of logs in the range [Start, End). That is, End is NOT inclusive.
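
The half-open convention means a range holds `End - Start` entries and never includes the entry at `End`. A small sketch, where `length` and `contains` are hypothetical helpers rather than methods of this package:

```go
package main

import "fmt"

// LogRange mirrors the struct documented above.
type LogRange struct {
	Start uint64
	End   uint64
}

// length returns the number of indexes in [Start, End).
func length(r LogRange) uint64 {
	if r.End <= r.Start {
		return 0
	}
	return r.End - r.Start
}

// contains reports whether idx falls in [Start, End).
func contains(r LogRange, idx uint64) bool {
	return idx >= r.Start && idx < r.End
}

func main() {
	r := LogRange{Start: 10, End: 15}
	fmt.Println(length(r), contains(r, 10), contains(r, 15)) // prints: 5 true false
}
```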

func (LogRange) String

func (r LogRange) String() string

String implements Stringer

type LogStore

type LogStore struct {
	// contains filtered or unexported fields
}

LogStore is a raft.LogStore that acts as middleware around an underlying persistent store. It supports periodically verifying that ranges of logs read back from the LogStore match the values written, and that they match the values read from the LogStores of other peers, even though peers will have different actual log ranges due to independent snapshotting and truncation.

Verification of the underlying log implementation may be performed as follows:

  1. The application provides an implementation of `IsCheckpointFn` that is able to identify whether the encoded data represents a checkpoint command.
  2. The application's raft leader may then periodically append such a checkpoint log to be replicated out.
  3. When the LogStore has a log appended for which the `IsCheckpointFn` returns true, it writes the current cumulative checksum over log entries since the last checkpoint into the log's Extensions field. Since hashicorp/raft only replicates to peers _after_ a trip through the LogStore, this checksum will be replicated.
  4. When a follower has a log appended for which the `IsCheckpointFn` returns true and which already carries non-empty Extensions metadata, it triggers a background verification.
  5. Verification happens in the background and reads all logs from the underlying store since the last checkpoint, calculating their checksums cumulatively before calling the configured ReportFn with a summary of what it found.
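
The core of the steps above, a cumulative checksum over a range that is recomputed from stored logs and compared, can be sketched as follows. This is a simplified in-memory simulation, not the package's implementation: `entry` and `rangeSum` are hypothetical names, and FNV-1a is an assumption, since the text above does not say which checksum the package uses.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// entry is a stand-in for a raft log entry.
type entry struct {
	Index uint64
	Data  []byte
}

// rangeSum computes a cumulative checksum over logs with Index in [start, end).
// FNV-1a is an assumption made for this sketch only.
func rangeSum(logs []entry, start, end uint64) uint64 {
	h := fnv.New64a()
	var idx [8]byte
	for _, l := range logs {
		if l.Index < start || l.Index >= end {
			continue
		}
		// Mix in the index as well as the data so reordering is detected.
		binary.BigEndian.PutUint64(idx[:], l.Index)
		h.Write(idx[:])
		h.Write(l.Data)
	}
	return h.Sum64()
}

func main() {
	leader := []entry{{1, []byte("a")}, {2, []byte("b")}, {3, []byte("c")}}
	// The sum the leader would attach to the checkpoint log.
	expected := rangeSum(leader, 1, 4)

	// The follower's stored copy, with entry 2 corrupted on disk.
	follower := []entry{{1, []byte("a")}, {2, []byte("X")}, {3, []byte("c")}}
	readSum := rangeSum(follower, 1, 4)

	fmt.Println(expected == rangeSum(leader, 1, 4), expected == readSum) // prints: true false
}
```

The follower learns the expected sum only from the checkpoint log itself, which is why the checksum has to be written before replication.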

func NewLogStore

func NewLogStore(store raft.LogStore, checkpointFn IsCheckpointFn, reportFn ReportFn, mc metrics.Collector) *LogStore

NewLogStore creates a verifying LogStore. checkpointFn and reportFn must be supplied here, before the store is passed to Raft, or may be left as nil to bypass verification. Close must be called when the log store is no longer needed, to clean up background verification.

func (*LogStore) Close

func (s *LogStore) Close() error

Close cleans up the background verification routine and calls Close on the underlying store if it is an io.Closer.

func (*LogStore) DeleteRange

func (s *LogStore) DeleteRange(min uint64, max uint64) error

DeleteRange deletes a range of log entries. The range is inclusive.

func (*LogStore) FirstIndex

func (s *LogStore) FirstIndex() (uint64, error)

FirstIndex returns the first index written. 0 for no entries.

func (*LogStore) GetLog

func (s *LogStore) GetLog(index uint64, log *raft.Log) error

GetLog gets a log entry at a given index.

func (*LogStore) IsMonotonic added in v0.3.0

func (s *LogStore) IsMonotonic() bool

IsMonotonic implements the raft.MonotonicLogStore interface. This is a shim to expose the underlying store as monotonically indexed or not.

func (*LogStore) LastIndex

func (s *LogStore) LastIndex() (uint64, error)

LastIndex returns the last index written. 0 for no entries.

func (*LogStore) StoreLog

func (s *LogStore) StoreLog(log *raft.Log) error

StoreLog stores a log entry.

func (*LogStore) StoreLogs

func (s *LogStore) StoreLogs(logs []*raft.Log) error

StoreLogs stores multiple log entries.

type ReportFn

type ReportFn func(VerificationReport)

ReportFn is a function that will be called after every checkpoint has been verified. It will not be called concurrently. The VerificationReport may represent a failure to report, so its Err field should be checked. If checkpoints arrive faster than they can be verified, some will be skipped and no report will be made for those ranges; the next report that is delivered will contain the skipped range for logging. Note that ReportFn is called synchronously by the verifier, so it should not block for long or it may cause the verifier to miss later checkpoints.
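
Since the callback runs synchronously, one possible pattern is to hand reports to a buffered channel and do slow work (logging, alerting) elsewhere. The sketch below assumes a trimmed stand-in for VerificationReport; `newReportFn` is a hypothetical constructor, not part of this package.

```go
package main

import "fmt"

// VerificationReport is a trimmed stand-in for the package's report type.
type VerificationReport struct {
	Err error
}

// newReportFn is a hypothetical constructor. The returned function has the
// ReportFn shape and never blocks: if the queue is full, the report is
// dropped and counted. The counter is not synchronized, which relies on the
// documented guarantee that ReportFn is not called concurrently; read it
// from another goroutine only with additional synchronization.
func newReportFn(queue chan<- VerificationReport, dropped *int) func(VerificationReport) {
	return func(r VerificationReport) {
		select {
		case queue <- r:
		default:
			*dropped++
		}
	}
}

func main() {
	queue := make(chan VerificationReport, 1)
	dropped := 0
	report := newReportFn(queue, &dropped)

	report(VerificationReport{}) // fills the queue
	report(VerificationReport{}) // queue full: dropped instead of blocking

	fmt.Println(len(queue), dropped) // prints: 1 1
}
```

Dropping on a full queue trades completeness of reporting for the guarantee that the verifier is never stalled by a slow consumer.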

type VerificationReport

type VerificationReport struct {
	// Range is the range of raft indexes over which the leader calculated its
	// checksum. In steady state it typically starts with the index of the
	// previous checkpoint command, but after an election it could be an arbitrary
	// point in the log. If the range is no longer in the server's log (due to not
	// seeing one yet or it being truncated too soon) this will be reported as an
	// Err - a longer log retention (`raft.Config.TrailingLogs`) or shorter
	// interval between checkpoints should be chosen if this happens often.
	Range LogRange

	// ExpectedSum is a uint64 checksum over the logs in the range as calculated
	// by the leader before appending to disk.
	ExpectedSum uint64

	// WrittenSum is the uint64 checksum calculated over the logs in the range of
	// a follower as it wrote them to its own LogStore. It might be zero to
	// indicate that the follower has not written all the logs in Range since
	// startup and so its written sum will be invalid. Risk of collision with
	// genuine zero sum is acceptable. If zero the verifier will have ignored it
	// and not raised an error if it didn't match expected.
	WrittenSum uint64

	// ReadSum is the uint64 checksum calculated over the logs in the range as
	// read from the underlying LogStore in the range [Range.Start, Range.End).
	ReadSum uint64

	// Err indicates any error that prevented the report from being completed or
	// the result of the report. It will be set to ErrChecksumMismatch if the
	// report was conducted correctly, but the log data written or read checksum
	// did not match the leader's write checksum. The message in the error
	// describes the nature of the failure.
	Err error

	// SkippedRange indicates the ranges of logs covered by any checkpoints that
	// we skipped due to spending too much time verifying. If this is regularly
	// non-nil it likely indicates that the checkpoint frequency is too high.
	SkippedRange *LogRange

	// Elapsed records how long it took to read the range and generate the report.
	Elapsed time.Duration
}

VerificationReport describes the result of attempting to verify the contents of all logs in a range compared with the input the leader delivered for that same range.
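
The relationship between the three checksum fields can be illustrated with a small sketch. In practice the Err field already carries the verdict; `diagnose` below is a hypothetical helper that only restates the documented field semantics, and checking the written sum before the read sum is an arbitrary choice made for this example.

```go
package main

import "fmt"

// report is a trimmed stand-in holding the three checksum fields documented
// above.
type report struct {
	ExpectedSum, WrittenSum, ReadSum uint64
}

// diagnose is a hypothetical helper. A zero WrittenSum is ignored, as
// documented; otherwise each sum is compared with the leader's ExpectedSum.
func diagnose(r report) string {
	switch {
	case r.WrittenSum != 0 && r.WrittenSum != r.ExpectedSum:
		return "write mismatch"
	case r.ReadSum != r.ExpectedSum:
		return "read mismatch"
	default:
		return "ok"
	}
}

func main() {
	fmt.Println(diagnose(report{ExpectedSum: 42, WrittenSum: 42, ReadSum: 42})) // prints: ok
	fmt.Println(diagnose(report{ExpectedSum: 42, WrittenSum: 0, ReadSum: 42}))  // prints: ok
	fmt.Println(diagnose(report{ExpectedSum: 42, WrittenSum: 41, ReadSum: 42})) // prints: write mismatch
	fmt.Println(diagnose(report{ExpectedSum: 42, WrittenSum: 42, ReadSum: 7}))  // prints: read mismatch
}
```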
