posix

package
v0.1.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 5, 2024 License: Apache-2.0 Imports: 18 Imported by: 0

README

POSIX Design

This document describes how the storage implementation for running Tessera on a POSIX-compliant filesystem is intended to work.

Overview

POSIX provides for a small number of atomic operations on compliant filesystems.

This design leverages those to safely maintain a Merkle tree log on disk, in a format which can be exposed directly via a read-only endpoint to clients of the log (for example, using nginx or similar).

In contrast with some of other other storage backends, sequencing and integration of entries into the tree is synchronous.

The implementation uses a .state/ directory to coordinate operation. This directory does not need to be visible to log clients, but it does not contain sensitive data and so it isn't a problem if it is made visible.

Life of a leaf

In the description below, when we talk about writing to files - either appending or creating new ones, the actual process used always follows the following pattern:

  1. Create a temporary file on the same filesystem as the target location
  2. If we're appending data, copy the contents of the prefix location into the temporary file
  3. Write any new/additional data into the temporary file
  4. Close the temporary file
  5. Rename the temporary file into the target location.

The final step in the dance above is atomic according to the POSIX spec, so in performing this sequence of actions we can avoid corrupt or partially written files being part of the tree.

  1. Leaves are submitted by the binary built using Tessera via a call the storage's Add func.
  2. The storage library batches these entries up, and, after a configurable period of time has elapsed or the batch reaches a configurable size threshold, the batch is sequenced and integrated into the tree:
    1. An advisory lock is taken on .state/treeState.lock file. This helps prevent multiple frontends from stepping on each other, but isn't necesary for safety.
    2. Flushed entries are assigned contiguous sequence numbers, and written out into entry bundle files.
    3. Integrate newly added leaves into Merkle tree, and write tiles out as files.
    4. Update ./state/treeState file with the new size & root hash.
  3. Asynchronously, at an interval determined by the WithCheckpointInterval option, the checkpoint file will be updated:
    1. An advisory lock is taken on .state/publish.lock
    2. If the last-modified date of the checkpoint file is older than the checkpoint update interval, a new checkpoint which commits to the latest tree state is produced and written to the checkpoint file.

Filesystems

This implementation has been somewhat tested on both a local ext4 filesystem and on a distributed CephFS instance on GCP, in both cases with multiple personality binaries attempting to add new entries concurrently.

Documentation

Overview

Copyright 2024 The Tessera authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type NewTreeFunc

type NewTreeFunc func(size uint64, root []byte) error

NewTreeFunc is the signature of a function which receives information about newly integrated trees.

type Storage

type Storage struct {
	// contains filtered or unexported fields
}

Storage implements storage functions for a POSIX filesystem. It leverages the POSIX atomic operations.

func New

func New(ctx context.Context, path string, create bool, opts ...func(*options.StorageOptions)) (*Storage, error)

New creates a new POSIX storage. - path is a directory in which the log should be stored - create must only be set when first creating the log, and will create the directory structure and an empty checkpoint

func (*Storage) Add

Add takes an entry and queues it for inclusion in the log. Upon placing the entry in an in-memory queue to be sequenced, it returns a future that will evaluate to either the sequence number assigned to this entry, or an error. This future is made available when the entry is queued. Any further calls to Add after this returns will guarantee that the later entry appears later in the log than any earlier entries. Concurrent calls to Add are supported, but the order they are queued and thus included in the log is non-deterministic.

If the future resolves to a non-error state then it means that the entry is both sequenced and integrated into the log. This means that a checkpoint will be available that commits to this entry.

It is recommended that the caller keeps the process running until all futures returned by this method have successfully evaluated. Terminating earlier than this will likely mean that some of the entries added are not committed to by a checkpoint, and thus are not considered to be in the log.

func (*Storage) ReadCheckpoint

func (s *Storage) ReadCheckpoint(_ context.Context) ([]byte, error)

func (*Storage) ReadEntryBundle

func (s *Storage) ReadEntryBundle(_ context.Context, index, logSize uint64) ([]byte, error)

ReadEntryBundle retrieves the Nth entries bundle for a log of the given size.

func (*Storage) ReadTile

func (s *Storage) ReadTile(_ context.Context, level, index, logSize uint64) ([]byte, error)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL