batchmap

package
v1.6.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 28, 2024 License: Apache-2.0 Imports: 14 Imported by: 7

README

Experimental Beam Map Generation

Generates Verifiable Maps using Beam Go. Generating a map in batch scales better than incremental for large numbers of key/values.

Motivation

This experimental map replaces a more interactive DB-backed map which was deleted and tracked by #2284. Unlike Logs, Maps don't have efficient consistency proofs. Without efficient consistency proofs, clients of the data structure will struggle to keep in sync with the freshest version of it while verify that everything that they had previously relied on in the data structure is still present.

One solution proposed to this (e.g. in Think Global, Act Local) is to put the roots of the map in a log. Clients will verify their own keyspace within the map for each revision and providing their area is correct, and that everyone else has also done this for their own keyspaces, then evolution of the map has been verified.

If these map revisions are being created quickly, then the number of revisions to be verified can require too much computing power for all but very dedicated clients, and this could easily lead to a situation where verification is not being performed.

This observation that verification cost scales linearly with the number of map revisions came along at a similar time to performance problems that were identified with large maps built on top of the interactive DB-backed map. Using a parallelized batch solution solves both of these issues; the temptation to create map revisions every few seconds (or less!) is removed, and massive scale is possible through parallel processing of the subtrees within the map.

Status

This code is experimental! This code is free to change outside of semantic versioning in the trillian repository.

This batch map has been run at scale to create a map with 2^30 leaves in under 25 minutes.

The resulting map is output as tiles, in which the tree is divided from the root in a configurable number of prefix strata. Each strata contains a single byte of the 256-bit path. Tiles are only output if they are non-empty.

The design of this batchmap is as a library rather than a service. The tiles that are returned must be stored by the client and serving them to users is outside the remit of this library. This may change in the future as common deployments are identified.

Running the demo

Building the map

The following instructions show how to run this code locally using the Python portable runner. Setting up a Python environment is beyond the scope of this demo, but instructions can be found at Beam Python Tips. These instructions are for Linux/MacOS but can likely be adapted for Windows.

Ensure that the Python runner is running:

  1. Check out github.com/apache/beam (tested at branch release-2.24.0)
  2. cd sdks/python within that repository
  3. python -m apache_beam.runners.portability.local_job_service_main --port=8099

In another terminal:

  1. Check out this repository and cd to the folder containing this README
  2. go run ./cmd/build/mapdemo.go --output=/tmp/mapv1 --runner=universal --endpoint=localhost:8099 --environment_type=LOOPBACK

The pipeline should run and generate files under the ouput directory, each of which contains a tile from the map. Note that a new file will be constructed for each tile output, which can get very large if the key_count or prefix_strata parameters are changed from their default values! If these parameters are set too high, one could run out of inodes on your file system. You have been warned.

The demo intends only to show the usage of the API and provide a simple way to test locally running the pipeline. It is not intended to demonstrate where data would be sourced from, or how the output Tiles should be used. See the comments in the demo script for more details.

Verifying the map

This requires a map to have been constructed using the previous instructions. Verifying that a particular key/value is set correctly within the tiles can be done with the command:

  • go run cmd/verify/verify.go --logtostderr --map_dir=/tmp/mapv1 --key=5

The map_dir must match the directory provided as output in the previous stage. The parameters for value_salt and tree_id must also match those used in the map construction as they are used during the value construction/hashing.

If the expected value is committed to correctly by the tiles, then you will see an output line similar to:

key 5 found at path 11cd1b2203ad4a3a11ff479d1ee75a59c9f33a73c5f5cf45bda87b656237e9ed, with value '[v1]5' (1e27e661ca57f2231fb41b7ef861ab702ce7412921e4df9eb106db0d8b442227) committed to by map root 4365e3c65742fdfeb60079b677ccf4a264405c0d18fc7db1706690a1b06db73c

Setting the key parameter to a key outside the range generated in the map will show non-inclusion, as will changing the tree_id or value_salt parameters.

Documentation

Overview

Package batchmap is a library to be used within Beam pipelines to construct verifiable data structures.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Create

func Create(s beam.Scope, entries beam.PCollection, treeID int64, hash crypto.Hash, prefixStrata int) (beam.PCollection, error)

Create builds a new map from the given PCollection of *Entry. Outputs the resulting Merkle tree tiles as a PCollection of *Tile.

The keys in the input PCollection must be 256-bit, uniformly distributed, and unique within the input. The values in the input PCollection must be 256-bit. treeID should be a unique ID for the lifetime of this map. This is used as part of the hashing algorithm to provide preimage resistance. If the tiles are to be imported into Trillian for serving, this must match the tree ID within Trillian. The internal hash algorithm can be picked between SHA256 and SHA512_256. The internal nodes will use this algorithm via the CONIKS strategy. prefixStrata is the number of 8-bit prefix strata. Any path from root to leaf will have prefixStrata+1 tiles.

func Update

func Update(s beam.Scope, base, delta beam.PCollection, treeID int64, hash crypto.Hash, prefixStrata int) (beam.PCollection, error)

Update takes an existing base map (PCollection of *Tile), applies the delta (PCollection of *Entry) and returns the resulting map as a PCollection of *Tile. The deltas can add new keys to the map or overwrite existing keys. Keys cannot be deleted (though their value can be set to a sentinel value).

treeID, hash, and prefixStrata must match the values passed into the original call to Create that started the base map.

Types

type Entry

type Entry struct {
	// The key that uniquely identifies this key/value.
	// These keys must be distributed uniformly and randomly across the key space.
	HashKey []byte
	// Hash of the value to be committed to. This will literally be set as a hash
	// of a TileLeaf in a leaf Tile.
	// It is the job of the code constructing this Entry to ensure that this has
	// appropriate preimage protection and domain separation. This means this will
	// likely be set to something like H(salt || data).
	//
	// TODO(mhutchinson): Revisit this. My preference is that this is set to
	// H(data), and the map synthesis will set the hash to H(salt||H(data)).
	// This allows the map to always be constructed with good security.
	HashValue []byte
}

Entry is a single key/value to be committed to by the map.

type Tile

type Tile struct {
	// The path from the root of the map to the root of this tile.
	Path []byte
	// The computed hash of this subtree.
	// TODO(mhutchinson): Consider removing this.
	RootHash []byte
	// All non-empty leaves in this tile, sorted left-to-right.
	Leaves []*TileLeaf
}

Tile represents a perfect subtree covering the whole height of a stratum within a sparse map.

type TileLeaf

type TileLeaf struct {
	// The path from the root of the container Tile to this leaf.
	Path []byte
	// The hash value being committed to.
	Hash []byte
}

TileLeaf is a leaf value of a Tile. If it belongs to a leaf tile then this represents one of the values that the map commits to. Otherwise, this leaf represents the root of the subtree in the stratum below.

Directories

Path Synopsis
cmd
build
mapdemo is a simple example that shows how a verifiable map can be constructed in Beam.
mapdemo is a simple example that shows how a verifiable map can be constructed in Beam.
verify
verify is a simple example that shows how a verifiable map can be used to demonstrate inclusion.
verify is a simple example that shows how a verifiable map can be used to demonstrate inclusion.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL