chunky

package module
v0.2.1
Published: Jan 12, 2025 License: MIT Imports: 23 Imported by: 0

README

Chunky

Efficiently store versioned data.

Chunky was built to ship code and binaries to remote servers. Once the data is on those servers, Chunky helps you quickly switch between versions.

Chunky is like if rsync and git had a baby.

Chunky uses content-defined chunking (CDC), powered by Restic's chunker library, to store data efficiently on disk. When you upload a new version, only the files that have changed are uploaded. Preliminary estimates suggest that a repo is about half the size of the original codebase while storing every version!
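
To make the idea of content-defined chunking concrete, here is a minimal, standard-library-only sketch. It is not Chunky's implementation and does not use Restic's chunker (which relies on Rabin fingerprints); the rolling byte sum, the window, minChunk, maxChunk and mask constants, and the split and store helpers are all assumptions chosen for illustration. The point is that cut points depend on nearby content rather than on fixed offsets, so an insertion tends to disturb only the chunks around it.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/rand"
)

// Illustrative parameters only; they are not Chunky's defaults.
const (
	window   = 16       // bytes covered by the rolling sum
	minChunk = 64       // never cut before this many bytes
	maxChunk = 1024     // always cut at this many bytes
	mask     = 1<<7 - 1 // cut when the low 7 bits of the sum are zero
)

// split cuts data into chunks whose boundaries are chosen by a rolling
// sum over the last `window` bytes, bounded by minChunk and maxChunk.
func split(data []byte) [][]byte {
	var chunks [][]byte
	start := 0
	var sum uint32
	for i := range data {
		sum += uint32(data[i])
		if i-start >= window {
			sum -= uint32(data[i-window]) // slide the window forward
		}
		size := i - start + 1
		if (size >= minChunk && sum&mask == 0) || size >= maxChunk {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			sum = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// store records each chunk by its SHA-256 and reports how many chunks
// the data produced and how many of them were not already stored.
func store(seen map[[32]byte]bool, data []byte) (total, added int) {
	for _, c := range split(data) {
		total++
		id := sha256.Sum256(c)
		if !seen[id] {
			seen[id] = true
			added++
		}
	}
	return total, added
}

func main() {
	v1 := make([]byte, 64<<10)
	rand.New(rand.NewSource(1)).Read(v1)

	// v2 is v1 with four bytes inserted in the middle.
	half := len(v1) / 2
	v2 := append(append(append([]byte{}, v1[:half]...), "edit"...), v1[half:]...)

	seen := map[[32]byte]bool{}
	total1, new1 := store(seen, v1)
	total2, new2 := store(seen, v2)
	fmt.Printf("v1: %d chunks, %d stored\n", total1, new1)
	fmt.Printf("v2: %d chunks, %d stored\n", total2, new2)
}
```

Running it shows how many chunks of the second version actually had to be stored; with a fixed-offset splitter, every chunk after the insertion point would have changed.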

Status: Early release. It's working and I'm using it in production, but it lacks tests, and at this stage I have mostly optimized the binary files for understandability rather than efficiency. I think there's a more efficient way to link and store commits and packs. I'd encourage you to look for ways to make Chunky more efficient!

Examples

Upload a directory
$ chunky upload . vagrant@127.0.0.1:2222/my-repo

This example uploads the entire current directory to the vagrant@127.0.0.1:2222/my-repo remote repo.

This will create a directory my-repo in the $HOME/vagrant directory on your remote machine.

List all versions
$ chunky list vagrant@127.0.0.1:2222/my-repo
20241105033915 (latest) 69kB Matt Mueller 39 minutes ago
20241105033613          69kB Matt Mueller 42 minutes ago
20241105033612          69kB Matt Mueller 42 minutes ago
20241105033611          69kB Matt Mueller 42 minutes ago
20241105033610 (v0.0.0) 69kB Matt Mueller 42 minutes ago
20241105033609          69kB Matt Mueller 42 minutes ago
20241105033607          69kB Matt Mueller 42 minutes ago
20241105032915          68kB Matt Mueller 49 minutes ago

This command lists all the versions on the vagrant@127.0.0.1:2222/my-repo remote repo.

Tag a revision
$ chunky tag vagrant@127.0.0.1:2222/my-repo 20241105033612 v0.0.1

This command tags the 20241105033612 commit with the v0.0.1 tag.

$ chunky list vagrant@127.0.0.1:2222/my-repo
20241105033915 (latest) 69kB Matt Mueller 39 minutes ago
20241105033613          69kB Matt Mueller 42 minutes ago
20241105033612 (v0.0.1) 69kB Matt Mueller 42 minutes ago
20241105033611          69kB Matt Mueller 42 minutes ago
20241105033610 (v0.0.0) 69kB Matt Mueller 42 minutes ago
20241105033609          69kB Matt Mueller 42 minutes ago
20241105033607          69kB Matt Mueller 42 minutes ago
20241105032915          68kB Matt Mueller 49 minutes ago

Download a version
$ chunky download vagrant@127.0.0.1:2222/my-repo v0.0.1 my-repo-v1

This command downloads the v0.0.1 revision from the vagrant@127.0.0.1:2222/my-repo remote repo into the my-repo-v1 directory.

By default, this overwrites existing files but does not delete local files that no longer exist in the remote repository. To fully sync my-repo-v1 with the remote repository, include the --sync flag.

$ chunky download --sync vagrant@127.0.0.1:2222/my-repo v0.0.1 my-repo-v1
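
The difference between the two download modes can be sketched as a small set computation. This is only an illustration of the behavior described above, not Chunky's code; the stale helper and the file names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// stale returns the local paths that are absent from the downloaded
// revision. A plain download would leave them alone; a download with
// --sync would remove them.
func stale(local, remote []string) []string {
	inRemote := make(map[string]bool, len(remote))
	for _, p := range remote {
		inRemote[p] = true
	}
	var out []string
	for _, p := range local {
		if !inRemote[p] {
			out = append(out, p)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	local := []string{"main.go", "old.txt", "README.md"}
	remote := []string{"main.go", "README.md", "new.txt"}

	fmt.Println("written in both modes:", remote)
	fmt.Println("removed only with --sync:", stale(local, remote))
}
```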

Usage

Chunky ships with a CLI and programmatic API.

CLI

You can install the CLI using Go with:

go install github.com/matthewmueller/chunky/cmd/chunky@latest

If that succeeds, you should be able to type chunky and see the help menu below.

$ chunky

  Usage:
    $ chunky [command]

  Description:
    efficiently store versioned data

  Commands:
    cat       show a file
    create    create a new repository
    download  download a directory from a repository
    list      list repository
    show      show a revision
    tag       tag a commit
    upload    upload a directory to a repository

  Advanced Commands:
    cat-commit  show a commit
    cat-pack    show a pack
    cat-tag     show a tag
    clean       clean a repository and local cache

API

Chunky also includes a programmatic API. For now, that's undocumented, but it should be fairly straightforward to understand by looking in the ./cli directory.

You can also review the documentation on go.dev.

Similar Tools

  • Git: You need a Git extension to store large binaries, and overall it requires too much ceremony when you just want to sync a directory to a remote machine. You may also want to sync files listed in your .gitignore to production servers (e.g. compiled assets).
  • Restic: An excellent file backup tool. Chunky took a lot of design inspiration from Restic. Restic doesn't have a programmatic API and the backups are encrypted, which is not helpful in a server-side setting.
  • Rsync: Great for syncing files to a remote machine, but does not store older versions of those files.

Adding New Repositories

Chunky currently supports two repository backends:

  1. Local: Store your repository in your local filesystem
  2. SFTP: Store your repository on a remote server

I'd encourage you to contribute new repository backends to Chunky. The interface is quite straightforward to implement:

type Repo interface {
	// Unique key that identifies the repository (used as a cache key)
	Key() string
	// Upload from a filesystem to the repository
	Upload(ctx context.Context, from fs.FS) error
	// Download paths from the repository to a filesystem
	Download(ctx context.Context, to repos.FS, paths ...string) error
	// Walk the repository
	Walk(ctx context.Context, dir string, fn fs.WalkDirFunc) error
	// Close the repository
	Close() error
}

Development

First, clone the repo:

git clone https://github.com/matthewmueller/chunky
cd chunky

Next, install dependencies:

go mod tidy

Finally, try running the tests:

go test ./...

License

MIT

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

func New

func New(log *slog.Logger) *Client

func (*Client) Download

func (c *Client) Download(ctx context.Context, in *Download) error

Download a directory from a repository at a specific revision

func (*Client) FindCommit added in v0.1.0

func (c *Client) FindCommit(ctx context.Context, in *FindCommit) (*Commit, error)

FindCommit finds a commit by a revision

func (*Client) ListTags added in v0.1.2

func (c *Client) ListTags(ctx context.Context, in *ListTags) (allTags []*Tag, err error)

func (*Client) TagRevision added in v0.1.2

func (c *Client) TagRevision(ctx context.Context, in *TagRevision) error

TagRevision tags a revision

func (*Client) Upload

func (c *Client) Upload(ctx context.Context, in *Upload) error

Upload a directory to a repository

type Commit added in v0.1.0

type Commit = commits.Commit

Commit represents a commit

type Download

type Download struct {
	From     repos.Repo
	To       repos.FS
	Revision string

	// MaxCacheSize is the maximum size of the LRU for caching packs (default: 512MiB)
	MaxCacheSize string

	// LimitDownload is the maximum download speed per second (default: unlimited)
	LimitDownload string

	// Concurrency is the number of concurrent downloads (default: num cpus * 2)
	Concurrency *int
	// contains filtered or unexported fields
}

type FindCommit added in v0.1.0

type FindCommit struct {
	Repo     repos.Repo
	Revision string
}

type ListTags added in v0.1.2

type ListTags struct {
	Repo repos.Repo
}

type Tag added in v0.1.2

type Tag struct {
	Name    string
	Commits []string
}

type TagRevision added in v0.1.2

type TagRevision struct {
	Repo     repos.Repo
	Tag      string
	Revision string
}

type Upload

type Upload struct {
	From   fs.FS
	To     repos.Repo
	Cache  repos.FS
	User   string
	Tags   []string
	Ignore func(path string) bool

	// MaxPackSize is the maximum pack size (default: 32MiB)
	MaxPackSize string

	// MinChunkSize is the minimum chunk size (default: 512KiB)
	MinChunkSize string

	// MaxChunkSize is the maximum chunk size (default: 8MiB)
	MaxChunkSize string

	// LimitUpload is the maximum upload rate (default: unlimited)
	LimitUpload string

	// Concurrency is the number of files to upload concurrently (default: num cpus * 2)
	Concurrency *int
	// contains filtered or unexported fields
}

Directories

Path Synopsis
cmd
internal
cli
humanize
Package humanize Based on: https://github.com/dustin/go-humanize/blob/master/bytes.go
lru
singleflight
Package singleflight provides a duplicate function call suppression mechanism.
