Version: v0.0.0-...-467abdf

Published: Feb 26, 2019 License: Apache-2.0 Imports: 21 Imported by: 0


Durable is a package, written in the Go programming language, that contains an implementation of a journal that may be used for write-ahead logging, and a utility for atomically writing files.

The atomic file utility follows guidelines inferred from [1] to achieve atomic writes on a variety of filesystems.

The journal allocates and overwrites fixed-size files, called volumes, using fdatasync for durability. The contents of each volume optionally contain a table of contents, written just before closing the volume, to identify what journal entries are present in the volume. CRCs are used throughout to detect corruption. An incomplete final entry, as could be caused by a power failure, and a corrupted final entry are distinguished by the use of a 32-bit magic number written at a known location in each disk sector.
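The distinction between an incomplete and a corrupted final entry can be illustrated with a small sketch. It is not the package's code: it assumes that blank volumes are zero-filled, that a nonzero magic number sits at the start of every written sector, and that a CRC covers the remaining bytes. The names layout, bodyCRC, and classify, the 16-byte sector size, and the magic value are all hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

const sectorSize = 16 // hypothetical; real sectors are typically 512 or 4096 bytes

var magic = [4]byte{0xD0, 0x0D, 0xFE, 0xED} // hypothetical nonzero magic number

// layout copies payload into n zero-filled sectors, each starting with magic.
func layout(payload []byte, n int) []byte {
	region := make([]byte, n*sectorSize)
	for s := 0; s < n; s++ {
		sector := region[s*sectorSize : (s+1)*sectorSize]
		copy(sector, magic[:])
		payload = payload[copy(sector[len(magic):], payload):]
	}
	return region
}

// bodyCRC checksums the bytes that follow the magic number in every sector.
func bodyCRC(region []byte) uint32 {
	h := crc32.NewIEEE()
	for off := 0; off+sectorSize <= len(region); off += sectorSize {
		h.Write(region[off+len(magic) : off+sectorSize])
	}
	return h.Sum32()
}

// classify returns "partial" if some sector lacks the magic number (the write
// never reached it), "corrupt" if all sectors were written but the CRC does
// not match, and "intact" otherwise.
func classify(region []byte, want uint32) string {
	for off := 0; off+sectorSize <= len(region); off += sectorSize {
		if !bytes.Equal(region[off:off+len(magic)], magic[:]) {
			return "partial"
		}
	}
	if bodyCRC(region) != want {
		return "corrupt"
	}
	return "intact"
}

// demo classifies an untouched region, a torn copy, and a bit-flipped copy.
func demo() (ok, torn, flipped string) {
	region := layout([]byte("hello, journal entry!"), 2)
	want := bodyCRC(region)
	ok = classify(region, want)

	t := append([]byte(nil), region...)
	for i := sectorSize; i < len(t); i++ {
		t[i] = 0 // the second sector never reached the disk
	}
	torn = classify(t, want)

	f := append([]byte(nil), region...)
	f[len(magic)+1] ^= 0xFF // damage one payload byte
	flipped = classify(f, want)
	return
}

func main() {
	fmt.Println(demo())
}
```

A sector that still reads as zeros where the magic number should be was never written, so the entry is treated as incomplete rather than damaged.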

If you are interested in using this package, would like to report a bug, or wish to see additions or changes, please submit a pull request or contact me at ben at woozlesaurus dot com.

Durable's API is not yet frozen, and I am planning one backwards-incompatible change to it. Should you wish to have a frozen version of the API, please submit a pull request or email me.

[1] Pillai, T.S., Chidambaram, V., Alagappan, R., Al-Kiswany, S., Arpaci-Dusseau, A.C. and Arpaci-Dusseau, R.H., 2014, October. All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications. In OSDI (pp. 433-448).



Package durable contains an implementation of a journal that may be used for write-ahead logging, and a utility for atomically writing files.



const (
	Error int = iota
	Warning
	Info
	Debug
)

Log levels for the durable package. To configure a logger to pass into a function in the durable package, do the following, for example:

var log0 *log.Logger = log.New(...)
log1 := loglvl.New(Warning, log0)
opts := durable.StandardJournalOptions()
opts.Logger = log1
jrnl, err := durable.NewJournal(dir, done, opts)
const MinAtomicFileLength = fileHeaderLen + crc64Len

MinAtomicFileLength is the minimum length of a file written using AtomicFileWriter. A file having this length contains zero bytes written using the AtomicFileWriter's Write function.
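To see why a file with no payload still has a nonzero minimum length, here is a minimal sketch of a header-plus-CRC framing. It assumes a 4-byte header holding only the magic number and a CRC-64 (ECMA) trailer covering header and payload; the actual values of fileHeaderLen and the CRC's coverage are not stated here, so headerLen, encode, and decode are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc64"
)

const (
	headerLen = 4 // hypothetical: the header holds only the magic number
	crcLen    = 8 // a 64-bit CRC
	minLen    = headerLen + crcLen
)

var table = crc64.MakeTable(crc64.ECMA)

// encode frames payload as header, payload, CRC-64 of everything before it.
func encode(magic [4]byte, payload []byte) []byte {
	out := make([]byte, 0, minLen+len(payload))
	out = append(out, magic[:]...)
	out = append(out, payload...)
	var sum [crcLen]byte
	binary.BigEndian.PutUint64(sum[:], crc64.Checksum(out, table))
	return append(out, sum[:]...)
}

// decode verifies the trailing CRC and returns the magic number and payload.
func decode(file []byte) (magic [4]byte, payload []byte, err error) {
	if len(file) < minLen {
		return magic, nil, errors.New("file shorter than header plus CRC")
	}
	body := file[:len(file)-crcLen]
	if crc64.Checksum(body, table) != binary.BigEndian.Uint64(file[len(file)-crcLen:]) {
		return magic, nil, errors.New("CRC mismatch")
	}
	copy(magic[:], body[:headerLen])
	return magic, body[headerLen:], nil
}

func main() {
	empty := encode([4]byte{'D', 'R', 'B', 'L'}, nil)
	fmt.Println(len(empty) == minLen) // an empty payload yields the minimum length
	_, payload, err := decode(encode([4]byte{'D', 'R', 'B', 'L'}, []byte("abc")))
	fmt.Println(string(payload), err)
}
```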

const VolumeSuffix = ".vol"

VolumeSuffix is the suffix of volume files.




func ScanJournal

func ScanJournal(callbacks JournalScannerCallbacks,
	magic MagicNumber,
	dir string,
	from, to, meta, data int,
	partial int64,
	logger loglvl.Logger) (err error)

ScanJournal iterates over journal entries in a directory. Functions of the callbacks parameter are invoked for each entry.

Only volumes whose volume number v satisfies from <= v < to are scanned. ScanJournal returns an error if the magic number of any scanned volume does not equal magic. When a partial write is detected, at most partial bytes of the volume containing the suspected partial write are read; if partial is negative, all remaining bytes of that volume are read. meta and data should be the expected upper bounds on the lengths of the metadata and data slices, respectively.


type AtomicFileReader

type AtomicFileReader struct {
	// contains filtered or unexported fields
}

Type AtomicFileReader reads a file written by AtomicFileWriter.

func NewAtomicFileReader

func NewAtomicFileReader(filename string, logger loglvl.Logger) AtomicFileReader

NewAtomicFileReader returns an AtomicFileReader for the file with the given name. logger is used to log errors encountered when closing the file. NewAtomicFileReader panics if logger.Logger is nil.

func (*AtomicFileReader) Close

func (a *AtomicFileReader) Close() error

Close closes a's file.

func (*AtomicFileReader) Open

func (a *AtomicFileReader) Open() (magic MagicNumber, err error)

Open opens a's file. It returns a's magic number, and any error encountered.

func (*AtomicFileReader) Read

func (a *AtomicFileReader) Read(p []byte) (n int, err error)

Read reads up to len(p) bytes into p. It returns the number of bytes read, and any error encountered. If this is the first call to Read or ReadByte, it checks the file's CRC.

func (*AtomicFileReader) ReadByte

func (a *AtomicFileReader) ReadByte() (b byte, err error)

ReadByte reads a single byte from a's file. It returns the byte read and any error encountered. If this is the first call to ReadByte or Read, it checks the file's CRC.

type AtomicFileWriter

type AtomicFileWriter struct {
	// contains filtered or unexported fields
}

Type AtomicFileWriter writes a file atomically. The write is atomic in the sense that if the file with the name supplied to NewAtomicFileWriter is successfully written, it will contain exactly the bytes written via the Write calls (plus a file header and a 64-bit CRC). This atomicity guarantee holds for many common file systems, e.g., ext4. Calls must be made in the following order: Open, Write (zero or more calls), SyncAndRename.

AtomicFileWriter creates a temporary file in Open, to which Write calls append bytes, and then renames the temporary file in SyncAndRename.
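The temporary-file-then-rename sequence described above can be sketched with the standard library alone. This is an illustration of the general pattern, not AtomicFileWriter's implementation: it omits the file header and CRC, and writeFileAtomically and demo are hypothetical names. Syncing a directory handle is assumed to be supported, as it is on Linux.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomically writes data to a temporary file next to filename, syncs
// it, renames it over filename, and then syncs the containing directory.
func writeFileAtomically(filename string, data []byte) (err error) {
	dir := filepath.Dir(filename)
	tmp, err := os.CreateTemp(dir, filepath.Base(filename)+".tmp*")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tmp.Close()           // harmless if already closed
			os.Remove(tmp.Name()) // harmless if already renamed
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil { // make the contents durable before the rename
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	if err = os.Rename(tmp.Name(), filename); err != nil {
		return err
	}
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync() // make the rename itself durable
}

// demo overwrites a file twice and reports its contents and the number of
// files left in the directory.
func demo() (contents string, files int, err error) {
	dir, err := os.MkdirTemp("", "atomic")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	name := filepath.Join(dir, "state.bin")
	for _, v := range []string{"v1", "v2"} {
		if err = writeFileAtomically(name, []byte(v)); err != nil {
			return "", 0, err
		}
	}
	got, err := os.ReadFile(name)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	return string(got), len(entries), nil
}

func main() {
	fmt.Println(demo())
}
```

The cleanup in the deferred function plays the role that RemoveTmp plays for AtomicFileWriter: on any failure, the temporary file is removed and the destination is left untouched.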

func NewAtomicFileWriter

func NewAtomicFileWriter(filename string, magic MagicNumber, logger loglvl.Logger) AtomicFileWriter

NewAtomicFileWriter returns an AtomicFileWriter for the file with the given name. The magic bytes are written in the file's header (by the Open call). logger is used to log errors encountered when closing the temporary file. NewAtomicFileWriter panics if logger.Logger is nil.

func (*AtomicFileWriter) Open

func (a *AtomicFileWriter) Open() (err error)

Open opens a's temporary file for writing, and writes a file header to it.

func (*AtomicFileWriter) RemoveTmp

func (a *AtomicFileWriter) RemoveTmp() error

RemoveTmp removes the temporary file created by Open. It should be used if Open, Write, or SyncAndRename returns a non-nil error.

func (*AtomicFileWriter) SyncAndRename

func (a *AtomicFileWriter) SyncAndRename() (err error)

SyncAndRename appends a CRC to a's temporary file, syncs and closes the file, renames it, and syncs its directory.

func (*AtomicFileWriter) Write

func (a *AtomicFileWriter) Write(p []byte) (n int, err error)

Write writes to a's temporary file. It returns the number of bytes written, and any error encountered.

type Journal

type Journal struct {
	// contains filtered or unexported fields
}

A journal is an on-disk data structure that stores a sequence of entries. A common use for a journal is to store write-ahead or undo/redo information.

Each entry in a journal consists of metadata and data. Metadata may be information about the data, or other header-type information, and may be much shorter than data. For example, metadata could contain a transaction sequence number.

A journal is written using one or more calls to the Append function, and is read using the ScanJournal function. On scanning a journal, entries with irrelevant metadata (e.g., a sequence number no greater than a snapshot's sequence number, or an entry type irrelevant to the scan) can be skipped.

A journal is stored as a sequence of files, named for example

0000000000.vol, 0000000001.vol, 0000000002.vol, etc.

where each file is called a volume. A single journal entry may be split across multiple volumes.
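The naming scheme in the example above can be sketched as follows. The sketch assumes that the ten-digit zero padding seen in the example names is fixed; volumeName and volumeNumber are hypothetical helpers, not the package's Volume methods.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const volumeSuffix = ".vol" // mirrors VolumeSuffix

// volumeName formats a volume number as a zero-padded file name.
func volumeName(n int) string {
	return fmt.Sprintf("%010d%s", n, volumeSuffix)
}

// volumeNumber parses a file name produced by volumeName.
func volumeNumber(name string) (int, error) {
	if !strings.HasSuffix(name, volumeSuffix) {
		return 0, fmt.Errorf("%q lacks suffix %q", name, volumeSuffix)
	}
	digits := strings.TrimSuffix(name, volumeSuffix)
	if len(digits) != 10 {
		return 0, fmt.Errorf("%q: want 10 digits before the suffix", name)
	}
	// ParseUint rejects signs; bit size 31 keeps the result within math.MaxInt32.
	n, err := strconv.ParseUint(digits, 10, 31)
	if err != nil {
		return 0, err
	}
	return int(n), nil
}

func main() {
	name := volumeName(2)
	n, err := volumeNumber(name)
	fmt.Println(name, n, err)
}
```

Zero padding makes lexicographic order match numeric order, so a sorted directory listing yields volumes in sequence.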

func NewJournal

func NewJournal(dir string, done <-chan struct{}, opts JournalOptions) (Journal, error)

NewJournal returns a Journal that writes volumes in the directory dir. done is a channel that, when closed, will cause the volume maker goroutine to return. opts is described in JournalOptions. NewJournal returns a non-nil error if an existing temporary blank volume cannot be removed.

func (*Journal) Append

func (jrnl *Journal) Append(meta, data []byte) error

Append appends an entry (metadata and data) to a journal. meta and data must each be no longer than 2^31 - 5 bytes. (Of course, practically speaking, it makes no sense to append an entry with meta or data even close to that size.)

func (*Journal) Close

func (jrnl *Journal) Close() error

Close closes a journal, returning any error encountered. It also writes the table of contents to the current volume before closing it.

func (*Journal) Flush

func (jrnl *Journal) Flush() error

Flush flushes buffered data to the underlying volume.

func (*Journal) Start

func (jrnl *Journal) Start(group *sync.WaitGroup) error

Start checks the journal info file, starts a goroutine that produces blank volumes, and opens a volume. The journal info file contains expected volume size, TOC length, and sector size. The blank volumes are named 000.bvl, 001.bvl, etc. Start returns any error encountered in opening the volume. Stopping the volume maker goroutine should be done by closing the done channel that was an argument to NewJournal, and then calling group.Wait(). Closing the journal should be done by calling Close.
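The shutdown protocol described above (close done, then wait on the group) is a common Go pattern. The sketch below uses a stand-in goroutine that only produces blank volume names; startMaker and run are hypothetical, and no files are created.

```go
package main

import (
	"fmt"
	"sync"
)

// startMaker is a hypothetical stand-in for the volume maker started by
// Journal.Start. It offers blank volume names until done is closed.
func startMaker(done <-chan struct{}, group *sync.WaitGroup, blanks chan<- string) {
	group.Add(1)
	go func() {
		defer group.Done()
		for i := 0; ; i++ {
			name := fmt.Sprintf("%03d.bvl", i)
			select {
			case blanks <- name:
			case <-done:
				return
			}
		}
	}()
}

// run takes two blank names, then stops the maker and waits for it to exit.
func run() []string {
	done := make(chan struct{})
	var group sync.WaitGroup
	blanks := make(chan string)

	startMaker(done, &group, blanks)
	got := []string{<-blanks, <-blanks}

	close(done)  // ask the maker goroutine to return
	group.Wait() // block until it has returned
	return got
}

func main() {
	fmt.Println(run())
}
```

Waiting on the group after closing done ensures the goroutine is no longer touching the directory when the caller proceeds.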

func (*Journal) Sync

func (jrnl *Journal) Sync() error

Sync ensures that data appended is durably stored. If it is called, Flush should be called first.
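The reason for calling Flush before Sync can be shown with a buffered writer over a file. This sketch assumes the journal buffers appended bytes in memory in a way similar to bufio.Writer; bufferedLog, size, and demo are hypothetical and are not the package's types.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// bufferedLog is a hypothetical stand-in for a journal's write path.
type bufferedLog struct {
	f *os.File
	w *bufio.Writer
}

func (l *bufferedLog) append(p []byte) error { _, err := l.w.Write(p); return err }
func (l *bufferedLog) flush() error          { return l.w.Flush() } // memory -> file
func (l *bufferedLog) sync() error           { return l.f.Sync() }  // file -> stable storage

// size returns the current length of f as seen by the file system.
func size(f *os.File) (int64, error) {
	st, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return st.Size(), nil
}

// demo reports the file size before and after flushing a small entry.
func demo() (before, after int64, err error) {
	f, err := os.CreateTemp("", "journal")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	l := &bufferedLog{f: f, w: bufio.NewWriter(f)}
	if err = l.append([]byte("entry")); err != nil {
		return 0, 0, err
	}
	if before, err = size(f); err != nil {
		return 0, 0, err
	}
	if err = l.flush(); err != nil {
		return 0, 0, err
	}
	if after, err = size(f); err != nil {
		return 0, 0, err
	}
	// Only now does syncing cover the entry; syncing before the flush would
	// have made an empty file durable.
	err = l.sync()
	return before, after, err
}

func main() {
	fmt.Println(demo())
}
```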

type JournalOptions

type JournalOptions struct {
	// Magic is the journal file's magic number, and is written at the start of
	// each sector to detect partial writes. NewJournal panics if Magic is zero.
	Magic MagicNumber

	// Size is the size of volume files in bytes. NewJournal panics if
	// Size > 2^32.
	Size int64

	// Blanks is the number of blank volume files.
	Blanks uint8

	// Alloc specifies the behavior of the maker of blank volume files.
	//   * WriteBlanks means to allocate such files by sequentially writing zeros
	//     until Size bytes are written.
	//   * FallocateBlanks means to use syscall.Fallocate to allocate blank
	//     volumes, but not to overwrite with zeros.
	//   * FallocateAndWriteBlanks means to use syscall.Fallocate, and then to
	//     overwrite with zeros.
	// Recommendations about which strategy to use are as follows:
	//   * If the fallocate system call is not available on your system/s, use
	//     WriteBlanks.
	//   * Otherwise, if using flash storage, use FallocateBlanks, because of
	//     endurance concerns.
	//   * Otherwise, use FallocateAndWriteBlanks.
	Alloc VolumeAllocOption

	// TOCLength is the maximum length of table of contents (TOC) data.
	// If TOCLength is smaller than 1, then TOC data will not be written.
	// NewJournal panics if TOCLength > 0 and TOCCallbacks == nil.
	TOCLength int

	// NextVolume is called when a volume is closed, if non-nil and TOCLength is
	// greater than 0. It provides the contents of tables of contents (TOCs);
	// the returned slice is the table of contents of the volume being closed.
	// Journal.Append will return an error if the returned slice has length
	// greater than the table of contents length. volumeN is the number of the
	// volume being closed. The spill parameter indicates whether the most recent
	// journal entry will also be written to the next volume.
	// Given the sequence of calls:
	//   callbacks.Next(volumeN, spill)
	//   journal.Append(entry0, _)
	// where Append(entry0, _) is the first Append call after the Next call,
	// the following conditions hold:
	//   * volumeN+1 is guaranteed to contain all or part of entry0.
	//   * volumeN is guaranteed not to contain any of entry0.
	// Also, given the sequence of calls:
	//   journal.Append(entry1, _)
	//   journal.Append(entry2, _)
	//   callbacks.Next(volumeM, spill)
	// where Append(entry1, _) and Append(entry2, _) are the Append calls
	// preceding the Next call, the following conditions hold:
	//   * volumeM will contain all or part of entry2.
	//   * volumeM+1 will not contain entry1.
	//   * volumeM+1 will contain part of entry2 if and only if spill is true.
	NextVolume func(volumeN int, spill bool) (tableOfContents []byte)

	// SectorSize should be the length L of the storage device's unit of atomic
	// writes. Smaller values than L are acceptable (e.g., if a disk's sector
	// size is 4096, then SectorSize==512 is acceptable). SectorSize must be less
	// than or equal to 2^16. For proper partial write detection, SectorSize must
	// be less than or equal to L.
	SectorSize int

	// Logger is a logger for use by the journal. The durable package's log levels
	// are Error, Warning, Info, and Debug. See those constants or the function
	// StandardJournalOptions for an example of how to configure Logger.
	// NewJournal panics if Logger.Logger is nil.
	Logger loglvl.Logger
}

JournalOptions is the type used to provide options to the function NewJournal. For reasonable defaults, use the function StandardJournalOptions.

func StandardJournalOptions

func StandardJournalOptions() JournalOptions

StandardJournalOptions returns JournalOptions that are reasonable defaults.

type JournalScannerCallbacks

type JournalScannerCallbacks interface {
	// Meta is invoked when metadata is read. A return value of true indicates
	// that the data portion should also be read, and false indicates the data
	// portion should be skipped. This allows application code to skip irrelevant
	// journal entries (e.g., if a sequence number contained in the metadata is
	// no greater than the sequence number of a snapshot).
	Meta(meta []byte) (readData bool)

	// Data is invoked when metadata and data have been read.
	Data(meta, data []byte)

	// NextVolume is invoked when a volume is opened.
	// The volume parameter is the volume number.
	NextVolume(volume int)

	// If Stop returns true, then the journal scanner stops scanning; no further
	// metadata or data will be read. If it returns false, then the journal
	// continues scanning as long as there are more journal entries.
	Stop() bool
}

ScanJournal invokes functions of JournalScannerCallbacks during the scanning of a journal. Three of these functions (Meta, Data, and NextVolume) inform the application code (e.g., recovery logic) about information in the journal. The Stop function, in contrast, lets the journal scanner know whether it should stop scanning.
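A recovery routine might implement these callbacks roughly as follows. To stay self-contained, the sketch declares a local copy of the interface (scannerCallbacks) and drives it with a hypothetical feed function instead of calling ScanJournal. It also assumes that each entry's metadata begins with an 8-byte big-endian sequence number, which is an application-level choice rather than anything the package requires.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// scannerCallbacks is a local copy of the JournalScannerCallbacks method set.
type scannerCallbacks interface {
	Meta(meta []byte) (readData bool)
	Data(meta, data []byte)
	NextVolume(volume int)
	Stop() bool
}

// replayer collects entries newer than a snapshot.
type replayer struct {
	snapshotSeq uint64
	applied     []string
	volumes     []int
}

// Meta asks for the data only when the entry is newer than the snapshot.
func (r *replayer) Meta(meta []byte) bool {
	return len(meta) >= 8 && binary.BigEndian.Uint64(meta) > r.snapshotSeq
}

// Data copies the payload, since the slice may be reused by the caller.
func (r *replayer) Data(meta, data []byte) { r.applied = append(r.applied, string(data)) }

func (r *replayer) NextVolume(volume int) { r.volumes = append(r.volumes, volume) }

func (r *replayer) Stop() bool { return false }

// feed is a hypothetical driver that imitates a scanner over in-memory
// entries; the real ScanJournal reads volumes from disk.
func feed(cb scannerCallbacks, seqs []uint64, payloads []string) {
	cb.NextVolume(0)
	for i, seq := range seqs {
		if cb.Stop() {
			return
		}
		meta := make([]byte, 8)
		binary.BigEndian.PutUint64(meta, seq)
		if cb.Meta(meta) {
			cb.Data(meta, []byte(payloads[i]))
		}
	}
}

func main() {
	r := &replayer{snapshotSeq: 2}
	feed(r, []uint64{1, 2, 3, 4}, []string{"a", "b", "c", "d"})
	fmt.Println(r.applied, r.volumes)
}
```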

type MagicNumber

type MagicNumber [magicNumberLen]byte

A magic number is four bytes that identify the format of a file. Magic numbers are also used in journal volumes to detect partial writes. Except in AtomicFile, magic numbers should be nonzero.

type Volume

type Volume struct {
	// contains filtered or unexported fields
}

A volume is a single file that is part of a journal. The Volume type has utility functions that form a volume's file name and full path from a directory and volume number, that parse directory and number from a volume's full path, and that read the table of contents of a volume.

func NewVolume

func NewVolume(dir string, n int) Volume

NewVolume returns a Volume with the given directory and number. It panics if n > math.MaxInt32.

func NewVolumeFromFilename

func NewVolumeFromFilename(filename string) (Volume, error)

NewVolumeFromFilename returns a volume with directory and number extracted from filename. If filename is not a string that would have been produced by Path, it returns a non-nil error.

func (Volume) Dir

func (v Volume) Dir() string

Dir returns the directory of volume v.

func (Volume) Name

func (v Volume) Name() string

Name returns the file name of volume v.

func (Volume) Num

func (v Volume) Num() int

Num returns the number of volume v.

func (Volume) Path

func (v Volume) Path() string

Path returns the path of volume v.

func (Volume) ReadTOC

func (v Volume) ReadTOC(buf []byte, log loglvl.Logger) (
	toc []byte, magic MagicNumber, err error)

ReadTOC reads the table of contents (TOC) of a volume. The returned TOC will be written into a slice of buf, if len(buf) is at least 4 plus the length of the TOC. log is used for logging file close errors. ReadTOC returns the TOC and the volume's file header, along with any error encountered. If the volume does not have a TOC, then it returns a nil slice for toc. ReadTOC panics if log.Logger is nil.

type VolumeAllocOption

type VolumeAllocOption int

Type VolumeAllocOption provides allocation options for volumes. See JournalOptions for details.

const (
	// Write zeros
	WriteBlanks VolumeAllocOption = iota

	// Use syscall.Fallocate
	FallocateBlanks

	// Use syscall.Fallocate, then overwrite with zeros
	FallocateAndWriteBlanks
)
type VolumeScanner

type VolumeScanner struct {
	Dir       string
	Volume    int
	DataSize  int
	Callbacks VolumeScannerCallbacks
	Log       loglvl.Logger
	// contains filtered or unexported fields
}

VolumeScanner allows the inspection of a single volume. To scan a journal, use the function ScanJournal instead of VolumeScanner. VolumeScanner is public for the main program voldump, in durable/cmd/voldump.go.

func (*VolumeScanner) Close

func (vs *VolumeScanner) Close() error

Close closes any resources held by a volume scanner.

func (*VolumeScanner) Scan

func (vs *VolumeScanner) Scan() error

Scan reads a volume, invoking functions of vs.Callbacks as records are encountered.

type VolumeScannerCallbacks

type VolumeScannerCallbacks interface {
	// Entry is invoked when an entry header is read.
	// metaLen is the length of the entry's metadata,
	// and dataLen is the length of the entry's data.
	Entry(metaLen, dataLen uint32, offset int64)

	// Segment is invoked when a segment is read.
	// kind is the type of segment, either "meta" or "data".
	// contents is the contents of the segment.
	Segment(kind string, contents []byte, offset int64)

	// NextVolume is invoked when a next volume record is read.
	NextVolume(offset int64)

	// TOC is invoked when the table of contents is read.
	// data is the contents of the table of contents.
	TOC(data []byte, offset int64)
}

Functions of VolumeScannerCallbacks are invoked during the scanning of a volume, as volume records are encountered. Each function has an offset parameter, which is the offset in the file of the record type byte.
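A dumping tool could implement these callbacks by recording one line per record. The sketch below declares a local copy of the interface (volumeCallbacks) and a hypothetical summary type; it is not voldump's code, and the callbacks are invoked by hand in main rather than by VolumeScanner.Scan.

```go
package main

import "fmt"

// volumeCallbacks is a local copy of the VolumeScannerCallbacks method set.
type volumeCallbacks interface {
	Entry(metaLen, dataLen uint32, offset int64)
	Segment(kind string, contents []byte, offset int64)
	NextVolume(offset int64)
	TOC(data []byte, offset int64)
}

// summary tallies what a scan encounters and keeps a printable line per record.
type summary struct {
	entries              int
	metaBytes, dataBytes int
	tocLen               int
	lines                []string
}

var _ volumeCallbacks = (*summary)(nil) // compile-time interface check

func (s *summary) Entry(metaLen, dataLen uint32, offset int64) {
	s.entries++
	s.lines = append(s.lines, fmt.Sprintf("%8d entry meta=%d data=%d", offset, metaLen, dataLen))
}

func (s *summary) Segment(kind string, contents []byte, offset int64) {
	switch kind {
	case "meta":
		s.metaBytes += len(contents)
	case "data":
		s.dataBytes += len(contents)
	}
	s.lines = append(s.lines, fmt.Sprintf("%8d segment %s len=%d", offset, kind, len(contents)))
}

func (s *summary) NextVolume(offset int64) {
	s.lines = append(s.lines, fmt.Sprintf("%8d next volume", offset))
}

func (s *summary) TOC(data []byte, offset int64) {
	s.tocLen = len(data)
	s.lines = append(s.lines, fmt.Sprintf("%8d toc len=%d", offset, len(data)))
}

func main() {
	s := &summary{}
	// Hand-made calls with made-up offsets, standing in for a real scan.
	s.Entry(8, 5, 0)
	s.Segment("meta", make([]byte, 8), 9)
	s.Segment("data", []byte("hello"), 17)
	s.TOC([]byte{1, 2}, 100)
	for _, line := range s.lines {
		fmt.Println(line)
	}
}
```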


voldump is a program for dumping the contents of a volume.
