Documentation ¶
Index ¶
- Constants
- Variables
- func DecodeBlob(blob []byte) ([]byte, error)
- func DecodePackFile(r io.Reader, f func(chunk []byte)) error
- func InitBandwidthLimit(uploadBytesPerSecond, downloadBytesPerSecond int)
- func NewHashesReader(hashes []Hash, sem chan bool, backend Backend) io.ReadCloser
- func NewLimitedDownloadReader(r io.Reader) io.Reader
- func NewLimitedUploadReader(r io.Reader) io.Reader
- func PackBlob(h Hash, chunk []byte, packFileSize int64) (idx, pack []byte)
- func SetLogger(l *u.Logger)
- type Backend
- type BlobLocation
- type ChunkIndex
- type FileStorage
- type GCSOptions
- type Hash
- type HashSplitter
- type MerkleHash
- type PackFileBackend
- func (pb *PackFileBackend) Fsck()
- func (pb *PackFileBackend) HashExists(hash Hash) bool
- func (pb *PackFileBackend) Hashes() map[Hash]struct{}
- func (pb *PackFileBackend) ListMetadata() map[string]time.Time
- func (pb *PackFileBackend) LogStats()
- func (pb *PackFileBackend) MetadataExists(name string) bool
- func (pb *PackFileBackend) Read(hash Hash) (io.ReadCloser, error)
- func (pb *PackFileBackend) ReadMetadata(name string) []byte
- func (pb *PackFileBackend) String() string
- func (pb *PackFileBackend) SyncWrites()
- func (pb *PackFileBackend) Write(chunk []byte) Hash
- func (pb *PackFileBackend) WriteMetadata(name string, contents []byte)
- type RobustWriteCloser
Constants ¶
const HashSize = 32
HashSize is the number of bytes in the hash values returned to represent chunks of data.
Variables ¶
var (
	ErrHashNotFound       = errors.New("hash not found")
	ErrHashMismatch       = errors.New("hash value mismatch")
	ErrIndexMagicWrong    = errors.New("index entry has incorrect magic number")
	ErrBlobMagicWrong     = errors.New("blob has incorrect magic number")
	ErrPrematureEndOfData = errors.New("premature end of data")
)
var BlobMagic = [4]byte{'B', 'L', '0', 'B'}
var IdxMagic = [4]byte{'I', 'd', 'x', '2'}
Functions ¶
func DecodeBlob ¶
DecodeBlob takes a blob read from a pack file (at the location described by a BlobLocation) and returns the chunk stored in that blob.
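PackBlob (listed above) produces both the blob bytes and the corresponding index entry; the sketch below shows the round trip through PackBlob and DecodeBlob. The interpretation of packFileSize as the current size of the pack file being appended to is an assumption based on its name, and the bytes/fmt imports are omitted.

// packAndCheck packs a chunk into blob form and immediately decodes it again,
// verifying the round trip. h is the chunk's Hash (e.g. as returned by
// Backend.Write).
func packAndCheck(h Hash, chunk []byte, packFileSize int64) error {
	idx, blob := PackBlob(h, chunk, packFileSize)
	_ = idx // index entry bytes; normally collected into the pack's index file

	got, err := DecodeBlob(blob)
	if err != nil {
		return err
	}
	if !bytes.Equal(got, chunk) {
		return fmt.Errorf("decoded chunk doesn't match original")
	}
	return nil
}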
func DecodePackFile ¶
DecodePackFile reads a pack file from the given reader, decodes it into blobs, and calls the given callback function with each blob's chunk.
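For example, a minimal sketch that counts the chunks in a pack file on disk (the path is supplied by the caller; the os import is omitted):

// countChunks decodes the pack file at packPath and reports how many chunks it holds.
func countChunks(packPath string) (int, error) {
	f, err := os.Open(packPath)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	n := 0
	if err := DecodePackFile(f, func(chunk []byte) { n++ }); err != nil {
		return 0, err
	}
	return n, nil
}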
func InitBandwidthLimit ¶
func InitBandwidthLimit(uploadBytesPerSecond, downloadBytesPerSecond int)
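From its name and the NewLimitedUploadReader/NewLimitedDownloadReader functions listed above, InitBandwidthLimit appears to configure process-wide transfer rate limits; that reading, and the idea that it should be called once at startup, are assumptions. A sketch:

// copyWithUploadLimit throttles reads from src to the configured upload rate
// while copying to dst.
func copyWithUploadLimit(dst io.Writer, src io.Reader) error {
	// Assumed: limits are process-wide and set once at startup.
	InitBandwidthLimit(1<<20 /* 1 MiB/s up */, 4<<20 /* 4 MiB/s down */)

	_, err := io.Copy(dst, NewLimitedUploadReader(src))
	return err
}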
func NewHashesReader ¶
func NewHashesReader(hashes []Hash, sem chan bool, backend Backend) io.ReadCloser
NewHashesReader returns an io.ReadCloser that reads multiple hashes in parallel from the given storage backend. It supplies the bytes of the hashes' chunks concatenated together into a single stream. If non-nil, the sem parameter is used to limit the number of active readers; otherwise a fixed number of reader goroutines are launched.
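A sketch of streaming several chunks back-to-back into a writer; using a buffered channel whose capacity bounds concurrency is an assumption about how sem is intended to be used:

// concatChunks writes the concatenated contents of the given hashes' chunks to dst.
func concatChunks(dst io.Writer, hashes []Hash, backend Backend) error {
	sem := make(chan bool, 4) // assumed: capacity limits the number of active readers
	r := NewHashesReader(hashes, sem, backend)
	defer r.Close()

	_, err := io.Copy(dst, r)
	return err
}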
Types ¶
type Backend ¶
type Backend interface {
	// String returns the name of the Backend in the form of a string.
	String() string

	// LogStats reports any statistics that the Backend may have gathered
	// during the course of its operation.
	LogStats()

	// Fsck checks the consistency of the data in the Backend and reports
	// any problems found via the logger specified by SetLogger.
	Fsck()

	// Write saves the provided chunk of data to storage, returning a Hash
	// that uniquely identifies it. Any write errors are fatal and
	// terminate the program.
	Write(chunk []byte) Hash

	// SyncWrites ensures that all chunks of data provided to Write have
	// in fact reached permanent storage. Calls to Read may not find
	// data stored by Write if SyncWrites hasn't been called after the
	// call to Write.
	SyncWrites()

	// Read returns an io.ReadCloser that provides the chunk for the given
	// hash. If the given hash doesn't exist in the backend, an error is
	// returned.
	Read(hash Hash) (io.ReadCloser, error)

	// HashExists reports whether a blob of data with the given hash exists
	// in the storage backend.
	HashExists(hash Hash) bool

	// Hashes returns a map that has all of the hashes stored by the
	// storage backend.
	Hashes() map[Hash]struct{}

	// WriteMetadata saves the given data in the storage backend,
	// associating it with the given name. It's mostly used for storing
	// data that we don't want to run through the dedupe process and want
	// to be able to easily access directly by name.
	WriteMetadata(name string, data []byte)

	// ReadMetadata returns the metadata for a given name that was stored
	// with WriteMetadata.
	ReadMetadata(name string) []byte

	// MetadataExists indicates whether the given named metadata is
	// present in the storage backend.
	MetadataExists(name string) bool

	// ListMetadata returns a map from all of the existing metadata
	// to the time each one was created.
	ListMetadata() map[string]time.Time
}
Backend describes a general interface for low-level data storage; users can provide chunks of data that a storage backend will store (on disk, in the cloud, etc.), and are returned a Hash that identifies each such chunk. Implementations should apply deduplication so that if the same chunk is supplied multiple times, it will only be stored once.
Note: it isn't safe in general for multiple threads to call Backend methods concurrently, though the Read() method may be called by multiple threads (as long as others aren't calling other Backend methods).
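A minimal sketch of a write/read round trip against any Backend implementation:

// roundTrip stores a chunk, makes it durable, and reads it back.
func roundTrip(backend Backend, chunk []byte) ([]byte, error) {
	h := backend.Write(chunk) // write errors are fatal, so there's no error to check
	backend.SyncWrites()      // ensure the chunk is durable before reading it back

	r, err := backend.Read(h)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}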
func NewCompressed ¶
NewCompressed returns a new storage.Backend that applies gzip compression to the contents of chunks stored in the provided underlying backend. Note: the contents of metadata files are not compressed.
func NewDisk ¶
NewDisk returns a new storage.Backend that stores data to the given dir. This directory should be empty the first time NewDisk is called with it.
func NewEncrypted ¶
NewEncrypted returns a storage.Backend that applies AES encryption to the chunk data stored in the underlying storage.Backend. Note: metadata contents and the names of named hashes are not encrypted.
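These constructors are designed to be layered. Their full signatures aren't shown above, so the sketch below assumes NewDisk takes a directory path and NewCompressed wraps another Backend; both are assumptions based on the descriptions:

// newLocalBackend is a hypothetical helper that stores compressed chunks on disk.
func newLocalBackend(dir string) Backend {
	return NewCompressed(NewDisk(dir)) // assumed signatures: NewDisk(string) Backend, NewCompressed(Backend) Backend
}

NewEncrypted would typically be layered in the same way, with compression applied before encryption (encrypted data doesn't compress well); its constructor presumably also requires key material, which isn't shown here.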
func NewGCS ¶
func NewGCS(options GCSOptions) Backend
type BlobLocation ¶
BlobLocation is the external representation, returned to callers, of the location of a blob in a pack file.
type ChunkIndex ¶
type ChunkIndex struct {
// contains filtered or unexported fields
}
ChunkIndex maintains an index from hashes to the locations of their blobs in pack files.
func (*ChunkIndex) AddIndexFile ¶
func (c *ChunkIndex) AddIndexFile(packName string, idx []byte) (int, error)
AddIndexFile takes the entire contents of an index file and associates its index entries with the given pack file name. It returns the number of entries added and an error, if any.
func (*ChunkIndex) AddSingle ¶
func (c *ChunkIndex) AddSingle(hash Hash, packName string, offset, length int64)
func (*ChunkIndex) Hashes ¶
func (c *ChunkIndex) Hashes() map[Hash]struct{}
func (*ChunkIndex) Lookup ¶
func (c *ChunkIndex) Lookup(hash Hash) (BlobLocation, error)
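A sketch of registering a blob's location and looking it up again; the usability of a zero-value ChunkIndex and the pack file name are assumptions for illustration:

// findBlob records where h's blob lives and then looks it up.
func findBlob(c *ChunkIndex, h Hash) {
	// Hypothetical location: pack "p-000001.pack", byte offset 4096, length 512.
	c.AddSingle(h, "p-000001.pack", 4096, 512)

	loc, err := c.Lookup(h)
	if err == ErrHashNotFound {
		return // h isn't in the index
	}
	_ = loc // BlobLocation describing the pack file and byte range of the blob
}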
type FileStorage ¶
type FileStorage interface {
	// CreateFile returns a RobustWriteCloser for a file with the given name;
	// a fatal error occurs if a file with that name already exists.
	CreateFile(name string) RobustWriteCloser

	// ReadFile returns the contents of the given file. If length is zero, the
	// whole file contents are returned; otherwise the segment starting at offset
	// with given length is returned.
	//
	// TODO: it might be more idiomatic to return e.g. an io.ReadCloser,
	// but between the GCS backend needing to be able to retry reads and
	// the fact that callers usually want a []byte in the end anyway, this
	// seems more straightforward overall.
	ReadFile(name string, offset int64, length int64) ([]byte, error)

	// ForFiles calls the given callback function for all files with the
	// given directory prefix, providing the file path and its creation
	// time.
	ForFiles(prefix string, f func(path string, created time.Time))

	String() string

	// Fsck checks the validity of the stored data. The returned Boolean
	// value indicates whether or not the caller should continue and
	// perform its own checks on the contents of the data as well.
	Fsck() bool
}
FileStorage is a simple abstraction for a storage system.
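For example, a sketch of enumerating stored files with ForFiles; the "packs/" prefix is hypothetical and the time import is omitted:

// listPacks returns the paths of all files under the (hypothetical) "packs/" prefix.
func listPacks(fs FileStorage) []string {
	var paths []string
	fs.ForFiles("packs/", func(path string, created time.Time) {
		paths = append(paths, path)
	})
	return paths
}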
type GCSOptions ¶
type Hash ¶
Hash encodes a fixed-size secure hash of a collection of bytes.
type HashSplitter ¶
type HashSplitter struct {
// contains filtered or unexported fields
}
HashSplitter chooses chunk boundaries in a stream of bytes using a rolling checksum. The lowest bits of the checksum seem to be the most useful for deciding splits; splitting based on, say, 4 bits in the middle is fiddly, especially when the chosen bits span the 16th bit.
func NewHashSplitter ¶
func NewHashSplitter(splitBits uint) *HashSplitter
func (*HashSplitter) AddByte ¶
func (hs *HashSplitter) AddByte(b byte)
func (*HashSplitter) Reset ¶
func (hs *HashSplitter) Reset()
func (*HashSplitter) SplitFromReader ¶
func (hs *HashSplitter) SplitFromReader(reader io.ByteReader) (ret []byte)
func (*HashSplitter) SplitNow ¶
func (hs *HashSplitter) SplitNow() bool
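A sketch of manual chunking with AddByte, SplitNow, and Reset; whether Reset must be called after each split is an assumption. For io.ByteReader inputs, SplitFromReader performs this loop for you.

// chunkBytes splits data into chunks of average size 1<<splitBits.
func chunkBytes(data []byte, splitBits uint) [][]byte {
	hs := NewHashSplitter(splitBits)
	var chunks [][]byte
	var cur []byte
	for _, b := range data {
		hs.AddByte(b)
		cur = append(cur, b)
		if hs.SplitNow() {
			chunks = append(chunks, cur)
			cur = nil
			hs.Reset() // assumed: start the rolling checksum fresh for the next chunk
		}
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}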
type MerkleHash ¶
func DecodeMerkleHash ¶
func DecodeMerkleHash(r io.Reader) (sh MerkleHash)
func MerkleFromSingle ¶
func MerkleFromSingle(hash Hash) MerkleHash
func NewMerkleHash ¶
func NewMerkleHash(b []byte) MerkleHash
func SplitAndStore ¶
func SplitAndStore(r io.Reader, backend Backend, splitBits uint) MerkleHash
SplitAndStore splits the bytes of the given io.Reader into chunks of average size 1<<splitBits using a rolling checksum, stores the chunks in the given storage backend, and returns the hash of the root of a Merkle tree that identifies the stored data.
func (*MerkleHash) Bytes ¶
func (sh *MerkleHash) Bytes() []byte
func (*MerkleHash) Fsck ¶
func (h *MerkleHash) Fsck(backend Backend)
func (*MerkleHash) NewReader ¶
func (h *MerkleHash) NewReader(sem chan bool, backend Backend) io.ReadCloser
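Putting the pieces together, a sketch of storing a stream and later reading it back. The metadata name "root" is hypothetical, and passing nil for sem (to get a default number of reader goroutines) follows the NewHashesReader description above.

// storeStream chunks src into backend and records the Merkle root as metadata.
func storeStream(backend Backend, src io.Reader) MerkleHash {
	mh := SplitAndStore(src, backend, 16) // ~64 KiB average chunks
	backend.WriteMetadata("root", mh.Bytes())
	backend.SyncWrites()
	return mh
}

// restoreStream reads the recorded Merkle root and streams its contents to dst.
func restoreStream(backend Backend, dst io.Writer) error {
	mh := NewMerkleHash(backend.ReadMetadata("root"))
	r := mh.NewReader(nil, backend) // nil sem: default reader concurrency
	defer r.Close()
	_, err := io.Copy(dst, r)
	return err
}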
type PackFileBackend ¶
type PackFileBackend struct {
// contains filtered or unexported fields
}
PackFileBackend implements the storage.Backend interface, but depends on an implementation of the FileStorage interface to handle the mechanics of storing and retrieving files. This allows functionality that's common to the disk and GCS backends to be implemented in a single place.
func (*PackFileBackend) Fsck ¶
func (pb *PackFileBackend) Fsck()
func (*PackFileBackend) HashExists ¶
func (pb *PackFileBackend) HashExists(hash Hash) bool
func (*PackFileBackend) Hashes ¶
func (pb *PackFileBackend) Hashes() map[Hash]struct{}
func (*PackFileBackend) ListMetadata ¶
func (pb *PackFileBackend) ListMetadata() map[string]time.Time
func (*PackFileBackend) LogStats ¶
func (pb *PackFileBackend) LogStats()
func (*PackFileBackend) MetadataExists ¶
func (pb *PackFileBackend) MetadataExists(name string) bool
func (*PackFileBackend) Read ¶
func (pb *PackFileBackend) Read(hash Hash) (io.ReadCloser, error)
func (*PackFileBackend) ReadMetadata ¶
func (pb *PackFileBackend) ReadMetadata(name string) []byte
func (*PackFileBackend) String ¶
func (pb *PackFileBackend) String() string
func (*PackFileBackend) SyncWrites ¶
func (pb *PackFileBackend) SyncWrites()
func (*PackFileBackend) Write ¶
func (pb *PackFileBackend) Write(chunk []byte) Hash
func (*PackFileBackend) WriteMetadata ¶
func (pb *PackFileBackend) WriteMetadata(name string, contents []byte)
type RobustWriteCloser ¶
type RobustWriteCloser interface {
	Write(b []byte)
	Close()
}
RobustWriteCloser is like an io.WriteCloser, except that it treats any errors as fatal and thus has no error return values. Write always writes all bytes given to it, and once a call to Close returns, the contents have been successfully committed to storage.
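A sketch of how a RobustWriteCloser obtained from FileStorage.CreateFile is used; the file name is hypothetical:

// writeFile stores contents in a new file via the given FileStorage.
func writeFile(fs FileStorage, contents []byte) {
	w := fs.CreateFile("indices/pack-000001.idx") // fatal if the file already exists
	w.Write(contents) // always writes everything; errors are fatal, not returned
	w.Close()         // once this returns, the contents are durably stored
}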