splitfs
A filesystem that chunks files up.
What does it do?
splitfs takes a directory and mirrors it onto another. However, each file within that directory is presented as a directory of its own, containing one or more "chunk files", each of which corresponds to a portion of the original file.
For example:
$ tree /testdata
/testdata
├── 10KB.data (size=10KiB)
├── 20KB.data (size=20KiB)
├── Andy_Mabbett_-_RSC_-_How_to_Edit_Wikipedia_-_01_-_italic_bold.webm (size=7.5M)
├── file-sources.txt (size=262 bytes)
├── Flower-300x300_dtf.jpg (size=168K)
└── subdir
    ├── 20KB.symlink -> ../20KB.data
    └── 40KB.data (size=40KiB)
1 directory, 7 files
$ splitfs --chunk_size=10KiB ./testdata /mnt
$ tree /mnt
/mnt
├── 10KB.data
│   └── 71da741724c3f289_00000001_of_00000001.splitfs.chunk
├── 20KB.data
│   ├── 1b15a91efc959b04_00000001_of_00000002.splitfs.chunk
│   └── 1b15a91efc959b04_00000002_of_00000002.splitfs.chunk
├── Andy_Mabbett_-_RSC_-_How_to_Edit_Wikipedia_-_01_-_italic_bold.webm
│   ├── 764cc4cecb7e72ce_00000001_of_00000765.splitfs.chunk
│   ├── 764cc4cecb7e72ce_00000002_of_00000765.splitfs.chunk
│   │   ...
│   ├── 764cc4cecb7e72ce_00000764_of_00000765.splitfs.chunk
│   └── 764cc4cecb7e72ce_00000765_of_00000765.splitfs.chunk
├── file-sources.txt
│   └── 74c420ec46a3845a_00000001_of_00000001.splitfs.chunk
├── Flower-300x300_dtf.jpg
│   ├── efa4bffab14f7017_00000001_of_00000017.splitfs.chunk
│   ├── efa4bffab14f7017_00000002_of_00000017.splitfs.chunk
│   │   ...
│   ├── efa4bffab14f7017_00000016_of_00000017.splitfs.chunk
│   └── efa4bffab14f7017_00000017_of_00000017.splitfs.chunk
└── subdir
    ├── 20KB.symlink -> ../20KB.data
    └── 40KB.data
        ├── 36d928335f3367da_00000001_of_00000004.splitfs.chunk
        ├── 36d928335f3367da_00000002_of_00000004.splitfs.chunk
        ├── 36d928335f3367da_00000003_of_00000004.splitfs.chunk
        └── 36d928335f3367da_00000004_of_00000004.splitfs.chunk
8 directories, 790 files
Note: The chunked filesystem is read-only.
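Judging by the listing above, each original file yields ceil(file size / chunk size) chunks, named <filename hash>_<chunk index>_of_<total chunks>.splitfs.chunk (see the filename_* flags below). For example, the ~168KiB JPEG split into ceil(168 / 10) = 17 chunks: 16 full 10KiB chunks plus a smaller final one. A quick sanity check against the mount (hypothetical session; the count comes straight from the listing above):
$ ls /mnt/Flower-300x300_dtf.jpg | wc -l
17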
Why?
Think of it as a filesystem-wide split(1). Some use cases:
- You want to do more efficient backups of large sparse files for which only some random parts change at a time.
- You want to back up large files to a service whose API only lets you upload one file at a time in a non-resumable fashion.
- You want more efficient redundant copy detection for append-only files (you need to turn off total chunk counts and mtimes in chunk filenames for this; see the example after this list).
- You want to split a lot of files in a large directory structure but don't want to try hacking up a recursive shell loop to do it.
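For the append-only use case, a mount might look like this (hypothetical paths, and assuming the standard Go --flag=false syntax for the boolean flags described under "Flags" below):
$ splitfs --filename_includes_total_chunks=false --filename_includes_mtime=false /srv/append-only /mnt/chunked
With those two pieces of information kept out of chunk filenames, existing chunks should keep stable names as a file grows, so a deduplicating backup tool only has to pick up the new or changed chunks.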
Downloading and building from source
Because go get uses https to download Git repositories, while perot.me only serves them over the git:// protocol, you have to manually fetch the repository in the right place.
# (if you haven't defined `GOPATH`, Go defaults to `GOPATH=~/go`)
$ export GOPATH="$HOME/go"
$ mkdir -p "$GOPATH/src/perot.me"
$ git clone git://perot.me/splitfs "$GOPATH/src/perot.me/splitfs"
$ go get -v perot.me/splitfs
$ go build perot.me/splitfs
$ ./splitfs
Usage of splitfs:
splitfs [options] <source directory> <target mountpoint>
[...]
Usage
splitfs [flags] <source_directory> <mountpoint>
Flags
chunk_size: The size of each chunk. Must be suffixed by a unit (B, KiB, MiB, GiB, TiB). Default is 32MiB.
exclude_regexp: If specified, files whose full path (rooted at the source directory) matches this regular expression will show up as regular files in the mountpoint, rather than getting chunked.
filename_hash: Algorithm for filename hashes in chunked filenames.
filename_includes_total_chunks: Controls whether or not chunk filenames will contain the total number of chunks of the overall file.
filename_includes_mtime: Controls whether or not chunk filenames will contain the mtime of the overall file.
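As an example, to mount with 1MiB chunks while leaving .txt files unchunked (hypothetical paths and pattern):
$ splitfs --chunk_size=1MiB --exclude_regexp='\.txt$' /data /mnt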
How do I get my files back from chunks?
For a one-off, just use cat:
$ cat /mnt/Flower-300x300_dtf.jpg/*.splitfs.chunk > /tmp/reconstituted.jpg
$ sha1sum /testdata/Flower-300x300_dtf.jpg /tmp/reconstituted.jpg
43e31dc3b3c541cf266d678b4309f73ca4d12cb6 /testdata/Flower-300x300_dtf.jpg
43e31dc3b3c541cf266d678b4309f73ca4d12cb6 /tmp/reconstituted.jpg
For more than just a one-off, wait until I implement unsplitfs, or do it yourself and send a pull request.
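In the meantime, here is a rough, untested sketch of the idea, assuming the default chunk naming shown above (zero-padded indices, so shell glob order matches chunk order); it skips symlinks and any files left unchunked via exclude_regexp:
#!/bin/sh
# unsplit.sh (hypothetical): rebuild original files from a splitfs mount.
# Usage: ./unsplit.sh <splitfs mountpoint> <output directory>
src="$1"; dst="$2"
find "$src" -type d | while read -r dir; do
    # A directory that contains chunk files corresponds to one original file.
    if ls "$dir"/*.splitfs.chunk >/dev/null 2>&1; then
        out="$dst/${dir#"$src"/}"
        mkdir -p "$(dirname "$out")"
        # Zero-padded chunk indices make lexical glob order equal chunk order.
        cat "$dir"/*.splitfs.chunk > "$out"
    fi
done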