repo

Version: v0.3.0-pre1
Published: Jan 21, 2019 License: Apache-2.0 Imports: 24 Imported by: 0

README

Kopia Repository

Features

Kopia Repository organizes raw blob storage, such as Google Cloud Storage or Amazon S3 buckets, into content-addressable storage with:

  • deduplication
  • client-side encryption
  • caching
  • object splitting and merging
  • packaging and indexing (organizing many small objects into larger ones)
  • shared access from multiple computers
  • simple manifest management for storing label-addressable content

All Repository features are implemented client-side, without any need for a custom server, thus encryption keys never leave the client.

The primary user of Repository is Kopia, which stores its filesystem snapshots in content-addressable storage, but Repository is designed to be a general-purpose storage system.

Repository implements 4 storage layers:

  • Object Storage for storing objects of arbitrary size with encryption and deduplication
  • Manifest Storage for storing small JSON-based manifests indexed by arbitrary labels (key=value)
  • Block Storage for storing content-addressable, indivisible blocks of relatively small sizes (up to 10-20MB each) with encryption and deduplication
  • Raw BLOB storage provides raw access to physical blocks
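
The relationship between the object and block layers can be illustrated with a minimal, standard-library-only sketch. It is not Kopia's implementation: the `blockStore` type, the `putBlock`/`putObject`/`getObject` helpers, the tiny fixed chunk size, and the use of plain SHA-256 as the block ID are all assumptions made for illustration. It shows how splitting an object into content-addressed blocks gives deduplication as a side effect.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blockStore is a hypothetical in-memory stand-in for the block layer:
// it maps a content-derived ID to the block's bytes.
type blockStore map[string][]byte

// putBlock stores a block under the hex SHA-256 of its contents and returns
// the ID. Identical contents map to the same ID, so they are stored once.
func (s blockStore) putBlock(data []byte) string {
	sum := sha256.Sum256(data)
	id := hex.EncodeToString(sum[:])
	if _, ok := s[id]; !ok {
		s[id] = append([]byte(nil), data...)
	}
	return id
}

// putObject splits data into chunks of at most chunkSize bytes, stores each
// chunk as a block, and returns the ordered list of block IDs.
// A non-positive chunkSize stores the whole object as a single block.
func (s blockStore) putObject(data []byte, chunkSize int) []string {
	if chunkSize <= 0 {
		chunkSize = len(data)
	}
	var ids []string
	for len(data) > 0 {
		n := chunkSize
		if n > len(data) {
			n = len(data)
		}
		ids = append(ids, s.putBlock(data[:n]))
		data = data[n:]
	}
	return ids
}

// getObject reassembles an object from its ordered block IDs.
func (s blockStore) getObject(ids []string) ([]byte, error) {
	var buf bytes.Buffer
	for _, id := range ids {
		b, ok := s[id]
		if !ok {
			return nil, fmt.Errorf("block %v not found", id)
		}
		buf.Write(b)
	}
	return buf.Bytes(), nil
}

func main() {
	store := blockStore{}
	// "abcd" appears twice, so three chunks produce only two stored blocks.
	ids := store.putObject([]byte("abcdabcdxyz"), 4)
	fmt.Println("chunks:", len(ids), "unique blocks:", len(store))

	data, err := store.getObject(ids)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(string(data))
}
```

A real block layer would also encrypt each block and pack many blocks into larger files; both are omitted here.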

Usage

Initialize repository in a given storage (this is done only once).

// connect to a Google Cloud Storage bucket.
st, err := gcs.New(ctx, &gcs.Options{
  Bucket: "my-bucket",
})
if err != nil {
  log.Fatalf("unable to connect to storage: %v", err)
}
password := "my-super-secret-password"
if err := repo.Initialize(ctx, st, &repo.NewRepositoryOptions{
  BlockFormat: block.FormattingOptions{
    Hash:       "HMAC-SHA256-128",
    Encryption: "AES-256-CTR",
  },
}, password); err != nil {
  log.Fatalf("unable to initialize repository: %v", err)
}

Now connect to the repository, which creates a local configuration file that persists all connection details.

configFile := "/tmp/my-repo.config"
if err := repo.Connect(ctx, configFile, st, password, repo.ConnectOptions{
  CachingOptions: block.CachingOptions{
    CacheDirectory:    cacheDirectory,
    MaxCacheSizeBytes: 100000000,
  },
}); err != nil {
  log.Fatalf("unable to connect to repository: %v", err)
}

To open the repository, use:

ctx := context.Background()
rep, err := repo.Open(ctx, configFile, password, nil)
if err != nil {
  log.Fatalf("unable to open the repository: %v", err)
}

// repository must be closed at the end.
defer rep.Close(ctx)

Writing objects:


w := rep.Objects.NewWriter(ctx, object.WriterOptions{})
defer w.Close()

// w implements io.Writer
fmt.Fprintf(w, "hello world")

// Object ID is a function of contents written, so every time we write "hello world" we're guaranteed to get exactly the same ID.
objectID, err := w.Result()
if err != nil {
  log.Fatalf("upload failed: %v", err)
}
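
The claim that an object ID is a function of the contents can be sketched with the standard library alone. This is an illustration, not Kopia's code: it assumes that the "HMAC-SHA256-128" hash named in the initialization example means HMAC-SHA256 truncated to 128 bits, and the `contentID` helper and the example secret are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentID is a hypothetical helper: HMAC-SHA256 of the content under a
// secret, truncated to 128 bits (16 bytes) and hex-encoded.
func contentID(secret, content []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(content) // hash.Hash.Write never returns an error
	return hex.EncodeToString(mac.Sum(nil)[:16])
}

func main() {
	// Assumption: a real repository would derive this secret itself.
	secret := []byte("example-secret")

	a := contentID(secret, []byte("hello world"))
	b := contentID(secret, []byte("hello world"))
	c := contentID(secret, []byte("hello world!"))

	// Same content gives the same ID; different content gives a different ID.
	fmt.Println(a == b, a == c, len(a)) // prints "true false 32"
}
```

Keying the hash with a secret means the same content yields the same ID within one repository, while someone without the secret cannot confirm a guess about the content from the ID alone.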

Reading objects:

rd, err := rep.Objects.Open(ctx, objectID)
if err != nil {
  log.Fatalf("open failed: %v", err)
}
defer rd.Close()

data, err := ioutil.ReadAll(rd)
if err != nil {
  log.Fatalf("read failed: %v", err)
}

// Outputs "hello world"
log.Printf("data: %v", string(data))

Saving manifest with a given set of labels:

labels := map[string]string{
  "type": "custom-object",
  "my-kind": "greeting",
}

payload := map[string]string{
  "myObjectID": objectID,
}

manifestID, err := rep.Manifests.Put(ctx, labels, payload)
if err != nil {
  log.Fatalf("manifest put failed: %v", err)
}

log.Printf("saved manifest %v", manifestID)

Loading manifests matching labels:

manifests, err := rep.Manifests.Find(ctx, labels)
if err != nil {
  log.Fatalf("unable to find manifests: %v", err)
}
for _, m := range manifests {
  var val map[string]string

  if err := rep.Manifests.Get(ctx, m.ID, &val); err != nil {
    log.Fatalf("unable to load manifest %v: %v", m.ID, err)
  }

  log.Printf("loaded manifest: %v created at %v", val["myObjectID"], m.ModTime)
}
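
Label-based lookup can be pictured as a subset match over key=value pairs. The sketch below is an assumption about the matching rule (a manifest matches when it carries every requested label), not a description of the manifest package internals; the `entry` type and the `matches` and `find` helpers are hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical in-memory manifest record.
type entry struct {
	ID     string
	Labels map[string]string
}

// matches reports whether every key=value pair in want is present in have.
func matches(have, want map[string]string) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// find returns the entries whose labels contain all requested labels,
// preserving input order.
func find(entries []entry, want map[string]string) []entry {
	var out []entry
	for _, e := range entries {
		if matches(e.Labels, want) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	entries := []entry{
		{ID: "m1", Labels: map[string]string{"type": "custom-object", "my-kind": "greeting"}},
		{ID: "m2", Labels: map[string]string{"type": "custom-object", "my-kind": "farewell"}},
		{ID: "m3", Labels: map[string]string{"type": "snapshot"}},
	}

	// Querying by a subset of labels returns every entry that carries them.
	for _, e := range find(entries, map[string]string{"type": "custom-object"}) {
		fmt.Println(e.ID)
	}
}
```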

FAQ

  1. How stable is it?

This library is still in development and is not ready for general use.

The repository data format is still subject to change, including backwards-incompatible changes, which will require data migration, although at some point before v1.0 we will declare the format to be stable and will maintain backward compatibility going forward.

  2. How big can a repository get?

There's no inherent size limit, but a rule of thumb should be no more than 10 TB (at least for now, until we test with larger repositories).

The data is efficiently packed into a small number of files and stored, but indexes need to be cached locally and will consume disk space and RAM.

For example:

One sample repository holding 480 GB of data from a home NAS (a mix of photos, videos, documents, and music files) contains:

  • 1874361 content-addressable blocks/objects
  • 27485 physical objects (packs) in cloud storage bucket (typically between 20MB and 30MB each)
  • 70 MB of indexes
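
To put these figures in proportion, the short calculation below derives two averages from them. It assumes 70 MB means 70,000,000 bytes; the `ratios` helper is hypothetical and exists only for this example.

```go
package main

import "fmt"

// ratios derives per-pack and per-block averages from repository totals.
func ratios(blocks, packs, indexBytes float64) (blocksPerPack, indexBytesPerBlock float64) {
	return blocks / packs, indexBytes / blocks
}

func main() {
	// Figures quoted in the sample repository; 70 MB is taken as 70e6 bytes.
	blocksPerPack, indexBytesPerBlock := ratios(1874361, 27485, 70e6)
	fmt.Printf("blocks per pack: %.1f\n", blocksPerPack)
	fmt.Printf("index bytes per block: %.1f\n", indexBytesPerBlock)
}
```

That works out to roughly 68 blocks per pack and about 37 bytes of index per block, which is why the index for nearly two million blocks still fits in tens of megabytes.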

  3. How safe is the data?

Your data can only be as safe as the underlying storage, so it's recommended to use one of the high-quality cloud storage solutions, which nowadays provide very high durability, high throughput, and low latency at a very reasonable price.

In addition to that, Kopia employs several data protection techniques, such as encryption, checksumming to detect accidental bit flips, redundant storage of indexes, and others.

WARNING: It's not recommended to trust all your data to Kopia just yet - always have another backup.

  4. I'd like to contribute

Sure, get started by filing an Issue or sending a Pull request.

  5. I found a security issue

Please notify us privately at jaak@jkowalski.net so we can work on addressing the issue and releasing a patch.

Licensing

Kopia is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Disclaimer

Kopia is a personal project and is not affiliated with, supported or endorsed by Google.

Cryptography Notice

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with symmetric algorithms. The form and manner of this distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

Documentation

Overview

Package repo implements content-addressable Repository on top of BLOB storage.

Constants

const FormatBlockID = "kopia.repository"

FormatBlockID is the identifier of a storage block that describes repository format.

Variables

var (
	BuildInfo    = "unknown"
	BuildVersion = "v0-unofficial"
)

BuildInfo is the build information of Kopia.

Functions

func Connect

func Connect(ctx context.Context, configFile string, st storage.Storage, password string, opt ConnectOptions) error

Connect connects to the repository in the specified storage and persists the configuration and credentials in the file provided.

func Disconnect

func Disconnect(configFile string) error

Disconnect removes the specified configuration file and any local cache directories.

func Initialize

func Initialize(ctx context.Context, st storage.Storage, opt *NewRepositoryOptions, password string) error

Initialize creates initial repository data structures in the specified storage with given credentials.

func RecoverFormatBlock

func RecoverFormatBlock(ctx context.Context, st storage.Storage, filename string, optionalLength int64) ([]byte, error)

RecoverFormatBlock attempts to recover a format block replica from the specified file. The format block can be either a prefix or a suffix of the given file. Optionally, the length can be provided (if known) to speed up recovery.

func SetCachingConfig

func SetCachingConfig(ctx context.Context, configFile string, opt block.CachingOptions) error

SetCachingConfig changes caching configuration for a given repository config file.

Types

type ConnectOptions

type ConnectOptions struct {
	block.CachingOptions
}

ConnectOptions specifies options when persisting configuration to connect to a repository.

type LocalConfig

type LocalConfig struct {
	Storage storage.ConnectionInfo `json:"storage"`
	Caching block.CachingOptions   `json:"caching"`
}

LocalConfig is a configuration of Kopia stored in a configuration file.

func (*LocalConfig) Load

func (lc *LocalConfig) Load(r io.Reader) error

Load reads local configuration from the specified reader.

func (*LocalConfig) Save

func (lc *LocalConfig) Save(w io.Writer) error

Save writes the configuration to the specified writer.

type NewRepositoryOptions

type NewRepositoryOptions struct {
	UniqueID     []byte // force the use of particular unique ID
	BlockFormat  block.FormattingOptions
	DisableHMAC  bool
	ObjectFormat object.Format // object format
}

NewRepositoryOptions specifies options that apply to newly created repositories. All fields are optional; when not provided, reasonable defaults are used.

type Options

type Options struct {
	TraceStorage         func(f string, args ...interface{}) // Logs all storage access using provided Printf-style function
	ObjectManagerOptions object.ManagerOptions
}

Options provides configuration parameters for connection to a repository.

type Repository

type Repository struct {
	Blocks    *block.Manager
	Objects   *object.Manager
	Storage   storage.Storage
	Manifests *manifest.Manager
	UniqueID  []byte

	ConfigFile     string
	CacheDirectory string
	// contains filtered or unexported fields
}

Repository represents storage where both content-addressable and user-addressable data is kept.

func Open

func Open(ctx context.Context, configFile string, password string, options *Options) (rep *Repository, err error)

Open opens a Repository specified in the configuration file.

func OpenWithConfig

func OpenWithConfig(ctx context.Context, st storage.Storage, lc *LocalConfig, password string, options *Options, caching block.CachingOptions) (*Repository, error)

OpenWithConfig opens the repository with a given configuration, avoiding the need for a config file.

func (*Repository) Close

func (r *Repository) Close(ctx context.Context) error

Close closes the repository and releases all resources.

func (*Repository) Flush

func (r *Repository) Flush(ctx context.Context) error

Flush waits for all in-flight writes to complete.

func (*Repository) Refresh

func (r *Repository) Refresh(ctx context.Context) error

Refresh makes external changes visible to the repository.

func (*Repository) RefreshPeriodically

func (r *Repository) RefreshPeriodically(ctx context.Context, interval time.Duration)

RefreshPeriodically periodically refreshes the repository to reflect the changes made by other hosts.

func (*Repository) Upgrade

func (r *Repository) Upgrade(ctx context.Context) error

Upgrade upgrades repository data structures to the latest version.

Directories

Path                        Synopsis
block                       Package block implements repository support for content-addressable storage blocks.
examples/upload_download    Command repository_api demonstrates the use of Kopia's Repository API.
internal/repologging        Package repologging provides loggers.
internal/repotesting        Package repotesting contains test utilities for working with repositories.
internal/retry              Package retry implements exponential retry policy.
internal/storagetesting     Package storagetesting is used for testing Storage implementations.
manifest                    Package manifest implements support for managing JSON-based manifests in repository.
object                      Package object implements repository support for content-addressable objects of arbitrary size.
storage                     Package storage implements simple storage of immutable, unstructured binary large objects (BLOBs).
storage/filesystem          Package filesystem implements filesystem-based Storage.
storage/gcs                 Package gcs implements Storage based on Google Cloud Storage bucket.
storage/logging             Package logging implements wrapper around Storage that logs all activity.
storage/providers           Package providers registers all storage providers that are included as part of Kopia.
storage/s3                  Package s3 implements Storage based on an S3 bucket.
storage/webdav              Package webdav implements WebDAV-based Storage.
tests
