hashtree

package
v1.7.0-e5352a5c2b844f4...
Published: Mar 6, 2018 License: Apache-2.0 Imports: 11 Imported by: 19

README

This is a small library for working with modified Merkle trees. We store one of these data structures in block storage (e.g. S3) for each PFS commit so that, with each subsequent commit, we know which files changed and need to be reprocessed by any pipelines.
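The change-detection idea can be illustrated with a toy Merkle tree. This is only a sketch under stated assumptions: the `node` type and the `sortedNames`, `hashNode`, and `changed` helpers are hypothetical and are not part of this package, and the hashing scheme (SHA-256 over a name, a kind tag, and contents) is chosen for illustration rather than taken from the library.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// node is a hypothetical in-memory tree node, not a type from this package.
// Files leave children nil; directories use a non-nil (possibly empty) map.
type node struct {
	content  []byte
	children map[string]*node
}

// sortedNames returns the child names of n in a deterministic order.
func sortedNames(n *node) []string {
	names := make([]string, 0, len(n.children))
	for name := range n.children {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// hashNode hashes a node's name and contents. A directory's hash covers the
// hashes of its children, so a change anywhere below it changes its hash too.
func hashNode(name string, n *node) [sha256.Size]byte {
	h := sha256.New()
	h.Write([]byte(name))
	if n.children == nil {
		h.Write([]byte{'f'}) // kind tag: file
		h.Write(n.content)
	} else {
		h.Write([]byte{'d'}) // kind tag: directory
	}
	for _, child := range sortedNames(n) {
		sum := hashNode(child, n.children[child])
		h.Write(sum[:])
	}
	var out [sha256.Size]byte
	copy(out[:], h.Sum(nil))
	return out
}

// changed returns the paths of files that are new or modified in cur relative
// to old. Subtrees whose hashes match are skipped without being descended.
func changed(name, path string, old, cur *node) []string {
	if old != nil && hashNode(name, old) == hashNode(name, cur) {
		return nil
	}
	if cur.children == nil {
		return []string{path}
	}
	var out []string
	for _, child := range sortedNames(cur) {
		var prev *node
		if old != nil {
			prev = old.children[child] // a lookup in a nil map safely yields nil
		}
		out = append(out, changed(child, path+"/"+child, prev, cur.children[child])...)
	}
	return out
}

func main() {
	old := &node{children: map[string]*node{
		"a.txt": {content: []byte("1")},
		"dir":   {children: map[string]*node{"b.txt": {content: []byte("2")}}},
	}}
	cur := &node{children: map[string]*node{
		"a.txt": {content: []byte("1")},
		"dir": {children: map[string]*node{
			"b.txt": {content: []byte("3")}, // modified
			"c.txt": {content: []byte("4")}, // added
		}},
	}}
	fmt.Println(changed("", "", old, cur)) // [/dir/b.txt /dir/c.txt]
}
```

The sketch recomputes hashes on every comparison to stay short; a tree that stores each node's hash, as NodeProto does, avoids that repeated work. Deleted files are not reported here.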

Documentation

Overview

Package hashtree is a generated protocol buffer package.

It is generated from these files:

server/pkg/hashtree/hashtree.proto

It has these top-level messages:

FileNodeProto
DirectoryNodeProto
NodeProto
HashTreeProto

Index

Constants

This section is empty.

Variables

var (
	ErrInvalidLengthHashtree = fmt.Errorf("proto: negative length found during unmarshaling")
	ErrIntOverflowHashtree   = fmt.Errorf("proto: integer overflow")
)

Functions

func Serialize added in v1.3.6

func Serialize(h HashTree) ([]byte, error)

Serialize serializes a HashTree so that it can be persisted. Also see Deserialize(bytes).

Types

type DirectoryNodeProto

type DirectoryNodeProto struct {
	// Children of this directory. Note that paths are relative, so if "/foo/bar"
	// has a child "baz", that means that there is a file at "/foo/bar/baz".
	//
	// 'Children' is ordered alphabetically, to quickly check if a new file is
	// overwriting an existing one.
	Children []string `protobuf:"bytes,3,rep,name=children" json:"children,omitempty"`
}

DirectoryNodeProto is a node corresponding to a directory.
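To illustrate why keeping 'Children' sorted makes the overwrite check cheap, here is a minimal sketch that uses a binary search. The `insertChild` helper is hypothetical and not part of this package, and it assumes "alphabetically" means Go's byte-wise string ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// insertChild adds name to a sorted slice of child names and keeps it sorted.
// It reports whether name was already present, in which case the slice is
// returned unchanged. (Hypothetical helper, not part of the hashtree package.)
func insertChild(children []string, name string) ([]string, bool) {
	// Binary search for the position where name is, or would be inserted.
	i := sort.SearchStrings(children, name)
	if i < len(children) && children[i] == name {
		return children, true
	}
	// Grow by one, shift the tail right, and place name at index i.
	children = append(children, "")
	copy(children[i+1:], children[i:])
	children[i] = name
	return children, false
}

func main() {
	children := []string{"bar", "foo"}
	children, existed := insertChild(children, "baz")
	fmt.Println(children, existed) // [bar baz foo] false
	children, existed = insertChild(children, "baz")
	fmt.Println(children, existed) // [bar baz foo] true
}
```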

func (*DirectoryNodeProto) Descriptor

func (*DirectoryNodeProto) Descriptor() ([]byte, []int)

func (*DirectoryNodeProto) GetChildren

func (m *DirectoryNodeProto) GetChildren() []string

func (*DirectoryNodeProto) Marshal added in v1.5.0

func (m *DirectoryNodeProto) Marshal() (dAtA []byte, err error)

func (*DirectoryNodeProto) MarshalTo added in v1.5.0

func (m *DirectoryNodeProto) MarshalTo(dAtA []byte) (int, error)

func (*DirectoryNodeProto) ProtoMessage

func (*DirectoryNodeProto) ProtoMessage()

func (*DirectoryNodeProto) Reset

func (m *DirectoryNodeProto) Reset()

func (*DirectoryNodeProto) Size added in v1.5.0

func (m *DirectoryNodeProto) Size() (n int)

func (*DirectoryNodeProto) String

func (m *DirectoryNodeProto) String() string

func (*DirectoryNodeProto) Unmarshal added in v1.5.0

func (m *DirectoryNodeProto) Unmarshal(dAtA []byte) error

type ErrCode

type ErrCode uint8

ErrCode identifies different kinds of errors returned by methods in HashTree below. The ErrCode of any such error can be retrieved with Code().

const (
	// OK is returned on success
	OK ErrCode = iota

	// Unknown is returned by Code() when an error wasn't emitted by the HashTree
	// implementation.
	Unknown

	// Internal is returned when a HashTree encounters a bug (usually due to the
	// violation of an internal invariant).
	Internal

	// CannotDeserialize is returned when Deserialize(bytes) fails, perhaps due to
	// 'bytes' being corrupted.
	CannotDeserialize

	// Unsupported is returned when Deserialize(bytes) encounters an unsupported
	// (likely old) serialized HashTree.
	Unsupported

	// PathNotFound is returned when Get() or DeleteFile() is called with a path
	// that doesn't lead to a node.
	PathNotFound

	// MalformedGlob is returned when Glob() is called with an invalid glob
	// pattern.
	MalformedGlob

	// PathConflict is returned when a path that is expected to point to a
	// directory in fact points to a file, or the reverse. For example:
	// 1. PutFile is called with a path that points to a directory.
	// 2. PutFile is called with a path that contains a prefix that
	//    points to a file.
	// 3. Merge is forced to merge a directory into a file
	PathConflict
)

func Code

func Code(err error) ErrCode

Code returns the error code of 'err' if it was returned by one of the HashTree methods, or Unknown if 'err' was emitted by some other function. Error codes are defined in interface.go.
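One way such a lookup could work is sketched below. The constants mirror the ones documented above, but the `codedError` type, the `errForeign` value, and the body of `Code` are assumptions made for illustration, not the package's actual implementation. In particular, returning OK for a nil error is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCode and its values mirror the constants documented above.
type ErrCode uint8

const (
	OK ErrCode = iota
	Unknown
	Internal
	CannotDeserialize
	Unsupported
	PathNotFound
	MalformedGlob
	PathConflict
)

// codedError is a hypothetical error type that carries an ErrCode.
type codedError struct {
	code ErrCode
	msg  string
}

func (e *codedError) Error() string { return e.msg }

// errForeign stands in for an error produced outside the hash tree code.
var errForeign = errors.New("disk full")

// Code returns the code attached to err, OK for a nil error (an assumption of
// this sketch), and Unknown for any error that carries no code.
func Code(err error) ErrCode {
	if err == nil {
		return OK
	}
	var ce *codedError
	if errors.As(err, &ce) {
		return ce.code
	}
	return Unknown
}

func main() {
	notFound := &codedError{code: PathNotFound, msg: `no node at "/missing"`}
	wrapped := fmt.Errorf("get failed: %w", notFound)

	fmt.Println(Code(notFound) == PathNotFound) // true
	fmt.Println(Code(wrapped) == PathNotFound)  // true
	fmt.Println(Code(errForeign) == Unknown)    // true
	fmt.Println(Code(nil) == OK)                // true
}
```

Callers can then branch on the code, for example treating PathNotFound as "file absent" while surfacing every other error.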

type FileNodeProto

type FileNodeProto struct {
	// Objects references the objects in the object store which contain the
	// file's content.
	Objects []*pfs.Object `protobuf:"bytes,4,rep,name=objects" json:"objects,omitempty"`
}

FileNodeProto is a node corresponding to a file (which is also a leaf node).

func (*FileNodeProto) Descriptor

func (*FileNodeProto) Descriptor() ([]byte, []int)

func (*FileNodeProto) GetObjects added in v1.3.19

func (m *FileNodeProto) GetObjects() []*pfs.Object

func (*FileNodeProto) Marshal added in v1.5.0

func (m *FileNodeProto) Marshal() (dAtA []byte, err error)

func (*FileNodeProto) MarshalTo added in v1.5.0

func (m *FileNodeProto) MarshalTo(dAtA []byte) (int, error)

func (*FileNodeProto) ProtoMessage

func (*FileNodeProto) ProtoMessage()

func (*FileNodeProto) Reset

func (m *FileNodeProto) Reset()

func (*FileNodeProto) Size added in v1.5.0

func (m *FileNodeProto) Size() (n int)

func (*FileNodeProto) String

func (m *FileNodeProto) String() string

func (*FileNodeProto) Unmarshal added in v1.5.0

func (m *FileNodeProto) Unmarshal(dAtA []byte) error

type HashTree

type HashTree interface {
	// Open makes a deep copy of the HashTree and returns the copy
	Open() OpenHashTree

	// Get retrieves a file.
	Get(path string) (*NodeProto, error)

	// List retrieves the list of files and subdirectories of the directory at
	// 'path'.
	List(path string) ([]*NodeProto, error)

	// Glob returns a list of files and directories that match 'pattern'.
	Glob(pattern string) ([]*NodeProto, error)

	// FSSize gets the size of the file system that this tree represents.
	// It's essentially a helper around the SubtreeSize of the node returned by
	// h.Get("/").
	FSSize() int64

	// Walk calls a given function against every node in the hash tree.
	// The order of traversal is not guaranteed.  If any invocation of the
	// function returns an error, the walk stops and returns the error.
	Walk(path string, f func(path string, node *NodeProto) error) error

	// Diff returns the diff of 2 HashTrees at particular paths. It takes a
	// callback function f, which will be called with paths that are not
	// identical to the same path in the other HashTree.
	// Specify '-1' for a fully recursive diff, or '1' for a shallow diff.
	Diff(oldHashTree HashTree, newPath string, oldPath string, recursiveDepth int64, f func(path string, node *NodeProto, new bool) error) error
}

HashTree is the signature of a hash tree provided by this library. To get a new HashTree, create an OpenHashTree with NewHashTree(), modify it, and then call Finish() on it.
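The Walk contract described in the interface comments (unordered traversal that stops at the first callback error) can be sketched over a plain map. This is an illustration only: the `walk` function, the `errStop` sentinel, and the use of a `map[string]int64` in place of a real tree are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errStop is a hypothetical sentinel a callback can return to end the walk.
var errStop = errors.New("stop walking")

// walk calls f for every entry of fs at or below root, in no particular order.
// It stops at the first error returned by f and hands that error back.
func walk(fs map[string]int64, root string, f func(path string, size int64) error) error {
	prefix := strings.TrimSuffix(root, "/") + "/"
	for path, size := range fs {
		if path != root && !strings.HasPrefix(path, prefix) {
			continue // outside the subtree rooted at root
		}
		if err := f(path, size); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fs := map[string]int64{"/a": 1, "/dir": 5, "/dir/b": 2, "/dir/c": 3}

	visited := 0
	err := walk(fs, "/dir", func(path string, size int64) error {
		visited++
		return nil
	})
	fmt.Println(visited, err) // 3 <nil>

	calls := 0
	err = walk(fs, "/", func(path string, size int64) error {
		calls++
		return errStop // abort on the first node
	})
	fmt.Println(calls, err == errStop) // 1 true
}
```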

func Deserialize added in v1.3.6

func Deserialize(serialized []byte) (HashTree, error)

Deserialize deserializes a hash tree so that it can be read or modified.

type HashTreeProto

type HashTreeProto struct {
	// Version is an arbitrary version number, set by the corresponding library
	// in hashtree.go.  This ensures that if the hash function used to create
	// these trees is changed, we won't run into errors when deserializing old
	// trees. The current version is 1.
	Version int32 `protobuf:"varint,1,opt,name=version,proto3" json:"version,omitempty"`
	// Fs maps each node's path to the NodeProto with that node's details.
	// See "Potential Optimizations" at the end for a compression scheme that
	// could be useful if this map gets too large.
	//
	// Note that the key must end in "/" if and only if the value has .dir_node set
	// (i.e. iff the path points to a directory).
	Fs map[string]*NodeProto `` /* 131-byte string literal not displayed */
}

HashTreeProto is a tree corresponding to the complete file contents of a pachyderm repo at a given commit (based on a Merkle Tree). We store one HashTree for every PFS commit.
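The key convention described in the comment on Fs (a key ends in "/" exactly when its node is a directory) can be checked mechanically. The sketch below uses a hypothetical `entry` type in place of NodeProto and a hypothetical `checkKeys` helper; neither is part of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical stand-in for NodeProto that records only whether
// the node is a directory.
type entry struct {
	isDir bool
}

// checkKeys verifies that a key ends in "/" if and only if its value is a
// directory, and returns an error describing the first violation it finds.
func checkKeys(fs map[string]entry) error {
	for key, e := range fs {
		if strings.HasSuffix(key, "/") != e.isDir {
			return fmt.Errorf("key %q does not match node kind (isDir=%v)", key, e.isDir)
		}
	}
	return nil
}

func main() {
	good := map[string]entry{
		"/":        {isDir: true},
		"/foo/":    {isDir: true},
		"/foo/bar": {isDir: false},
	}
	fmt.Println(checkKeys(good)) // <nil>

	bad := map[string]entry{"/foo": {isDir: true}} // directory without trailing "/"
	fmt.Println(checkKeys(bad) != nil) // true
}
```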

func (*HashTreeProto) Descriptor

func (*HashTreeProto) Descriptor() ([]byte, []int)

func (*HashTreeProto) Diff added in v1.4.8

func (h *HashTreeProto) Diff(old HashTree, newPath string, oldPath string, recursiveDepth int64, f func(string, *NodeProto, bool) error) error

Diff implements HashTree.Diff

func (*HashTreeProto) FSSize added in v1.5.0

func (h *HashTreeProto) FSSize() int64

FSSize returns the size of the file system that the hashtree represents.

func (*HashTreeProto) Get

func (h *HashTreeProto) Get(path string) (*NodeProto, error)

Get retrieves the contents of a file.

func (*HashTreeProto) GetFs

func (m *HashTreeProto) GetFs() map[string]*NodeProto

func (*HashTreeProto) GetVersion

func (m *HashTreeProto) GetVersion() int32

func (*HashTreeProto) Glob

func (h *HashTreeProto) Glob(pattern string) ([]*NodeProto, error)

Glob returns a list of files and directories that match 'pattern'. The nodes returned have their 'Name' field set to their full paths.
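As a rough illustration of glob matching over full paths, the sketch below filters a list of paths with the standard library's `path.Match`. The `glob` helper is hypothetical, and the package's own pattern syntax may differ from `path.Match`, where `*` does not cross a "/" separator.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// glob returns the paths that match pattern, sorted for stable output.
// A malformed pattern yields an error that wraps path.ErrBadPattern.
func glob(paths []string, pattern string) ([]string, error) {
	var matches []string
	for _, p := range paths {
		ok, err := path.Match(pattern, p)
		if err != nil {
			return nil, fmt.Errorf("malformed glob %q: %w", pattern, err)
		}
		if ok {
			matches = append(matches, p)
		}
	}
	sort.Strings(matches)
	return matches, nil
}

func main() {
	paths := []string{"/dir/a.txt", "/dir/b.log", "/dir/sub/c.txt", "/top.txt"}

	fmt.Println(glob(paths, "/dir/*.txt")) // [/dir/a.txt] <nil>

	_, err := glob(paths, "/dir/[") // unterminated character class
	fmt.Println(err != nil)         // true
}
```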

func (*HashTreeProto) List

func (h *HashTreeProto) List(path string) ([]*NodeProto, error)

List retrieves the list of files and subdirectories of the directory at 'path'.

func (*HashTreeProto) Marshal added in v1.5.0

func (m *HashTreeProto) Marshal() (dAtA []byte, err error)

func (*HashTreeProto) MarshalTo added in v1.5.0

func (m *HashTreeProto) MarshalTo(dAtA []byte) (int, error)

func (*HashTreeProto) Open added in v1.3.6

func (h *HashTreeProto) Open() OpenHashTree

Open makes a deep copy of the HashTree and returns the copy

func (*HashTreeProto) ProtoMessage

func (*HashTreeProto) ProtoMessage()

func (*HashTreeProto) Reset

func (m *HashTreeProto) Reset()

func (*HashTreeProto) Size added in v1.3.19

func (m *HashTreeProto) Size() (n int)

func (*HashTreeProto) String

func (m *HashTreeProto) String() string

func (*HashTreeProto) Unmarshal added in v1.5.0

func (m *HashTreeProto) Unmarshal(dAtA []byte) error

func (*HashTreeProto) Walk added in v1.4.7

func (h *HashTreeProto) Walk(path string, f func(string, *NodeProto) error) error

Walk implements HashTree.Walk

type NodeProto

type NodeProto struct {
	// Name is the name (not path) of the file/directory (e.g. /lib).
	Name string `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"`
	// Hash is a hash of the node's name and contents (which includes the
	// BlockRefs of a file and the Children of a directory). This can be used to
	// detect if the name or contents have changed between versions.
	Hash []byte `protobuf:"bytes,2,opt,name=hash,proto3" json:"hash,omitempty"`
	// subtree_size is the size of the subtree under this node; i.e. if this is a
	// directory, subtree_size includes all children.
	SubtreeSize int64 `protobuf:"varint,3,opt,name=subtree_size,json=subtreeSize,proto3" json:"subtree_size,omitempty"`
	// Exactly one of the following fields must be set. The type of this node will
	// be determined by which field is set.
	FileNode *FileNodeProto      `protobuf:"bytes,4,opt,name=file_node,json=fileNode" json:"file_node,omitempty"`
	DirNode  *DirectoryNodeProto `protobuf:"bytes,5,opt,name=dir_node,json=dirNode" json:"dir_node,omitempty"`
}

NodeProto is a node in the file tree (either a file or a directory)

func (*NodeProto) Descriptor

func (*NodeProto) Descriptor() ([]byte, []int)

func (*NodeProto) GetDirNode

func (m *NodeProto) GetDirNode() *DirectoryNodeProto

func (*NodeProto) GetFileNode

func (m *NodeProto) GetFileNode() *FileNodeProto

func (*NodeProto) GetHash

func (m *NodeProto) GetHash() []byte

func (*NodeProto) GetName

func (m *NodeProto) GetName() string

func (*NodeProto) GetSubtreeSize

func (m *NodeProto) GetSubtreeSize() int64

func (*NodeProto) Marshal added in v1.5.0

func (m *NodeProto) Marshal() (dAtA []byte, err error)

func (*NodeProto) MarshalTo added in v1.5.0

func (m *NodeProto) MarshalTo(dAtA []byte) (int, error)

func (*NodeProto) ProtoMessage

func (*NodeProto) ProtoMessage()

func (*NodeProto) Reset

func (m *NodeProto) Reset()

func (*NodeProto) Size added in v1.5.0

func (m *NodeProto) Size() (n int)

func (*NodeProto) String

func (m *NodeProto) String() string

func (*NodeProto) Unmarshal added in v1.5.0

func (m *NodeProto) Unmarshal(dAtA []byte) error

type OpenHashTree added in v1.3.6

type OpenHashTree interface {
	HashTree
	// GetOpen retrieves a file.
	GetOpen(path string) (*OpenNode, error)

	// PutFile appends data to a file (and creates the file if it doesn't exist).
	PutFile(path string, objects []*pfs.Object, size int64) error

	// PutFileOverwrite is the same as PutFile, except that instead of
	// appending the objects to the end of the given file, the objects
	// are inserted to the given index, and the existing objects starting
	// from the given index are removed.
	//
	// sizeDelta is the delta between the size of the objects added and
	// the size of the objects removed.
	PutFileOverwrite(path string, objects []*pfs.Object, overwriteIndex *pfs.OverwriteIndex, sizeDelta int64) error

	// PutDir creates a directory (or does nothing if one exists).
	PutDir(path string) error

	// DeleteFile deletes a regular file or directory (along with its children).
	DeleteFile(path string) error

	// Merge adds all of the files and directories in each tree in 'trees' into
	// this tree. If it errors, this tree will be left in an undefined state and
	// should be discarded. If you'd like to be able to revert to the previous
	// state of the tree, you should Finish and then Open the tree.
	Merge(trees ...HashTree) error

	// Finish makes a deep copy of the OpenHashTree, updates all of the hashes and
	// node size metadata in the copy, and returns the copy
	Finish() (HashTree, error)
}

OpenHashTree is like HashTree, except that it can be modified. Once an OpenHashTree is Finish()ed, the hash and size stored with each node will be updated (until then, the hashes and sizes stored in an OpenHashTree will be stale).
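The overwrite semantics described for PutFileOverwrite can be read as a slice operation: keep the objects before the index, drop the rest, and append the new ones. The sketch below follows that reading. It is an interpretation, not the library's code: the `object` type stands in for `*pfs.Object` together with a size, a plain `int` stands in for `*pfs.OverwriteIndex`, and `overwriteFrom` is a hypothetical helper.

```go
package main

import "fmt"

// object is a hypothetical stand-in for *pfs.Object plus the size of its data.
type object struct {
	hash string
	size int64
}

// overwriteFrom keeps existing[:index], drops existing[index:], and appends
// added in its place. It also returns sizeDelta: the bytes added minus the
// bytes removed.
func overwriteFrom(existing, added []object, index int) ([]object, int64, error) {
	if index < 0 || index > len(existing) {
		return nil, 0, fmt.Errorf("index %d out of range [0, %d]", index, len(existing))
	}
	var sizeDelta int64
	for _, o := range existing[index:] {
		sizeDelta -= o.size
	}
	for _, o := range added {
		sizeDelta += o.size
	}
	// Build a fresh slice so the caller's existing slice is left untouched.
	out := make([]object, 0, index+len(added))
	out = append(out, existing[:index]...)
	out = append(out, added...)
	return out, sizeDelta, nil
}

func main() {
	existing := []object{{"a", 10}, {"b", 20}, {"c", 30}}
	added := []object{{"x", 5}}

	out, delta, err := overwriteFrom(existing, added, 1)
	fmt.Println(out, delta, err) // [{a 10} {x 5}] -45 <nil>
}
```

With an index equal to the current number of objects, nothing is removed and the operation degenerates to an append, which matches the described relationship to PutFile.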

func NewHashTree added in v1.3.6

func NewHashTree() OpenHashTree

NewHashTree creates a new, empty hash tree implementing OpenHashTree.

type OpenNode added in v1.3.6

type OpenNode struct {
	Name string
	Size int64

	FileNode *FileNodeProto
	DirNode  *DirectoryNodeProto
}

OpenNode is similar to NodeProto, except that it doesn't include the Hash field (which is not generally meaningful in an OpenHashTree)
