archives

v0.0.0-...-ab120f0
Published: Nov 21, 2024 License: MIT Imports: 31 Imported by: 1

README

archives

Introducing mholt/archives - a cross-platform, multi-format Go library for working with archives and compression formats with a unified API and as virtual file systems compatible with io/fs.

Features

  • Stream-oriented APIs
  • Automatically identify archive and compression formats:
    • By file name
    • By stream peeking (headers)
  • Traverse directories, archives, and other files uniformly as io/fs file systems
  • Compress and decompress files
  • Create and extract archive files
  • Walk or traverse into archive files
  • Extract only specific files from archives
  • Insert into (append to) .tar and .zip archives without re-creating entire archive
  • Numerous archive and compression formats supported
  • Read from password-protected 7-Zip and RAR files
  • Extensible (add more formats just by registering them)
  • Cross-platform, static binary
  • Pure Go (no cgo)
  • Multithreaded Gzip
  • Adjustable compression levels
  • Super-fast Snappy implementation (via S2)

Supported compression formats

  • brotli (.br)
  • bzip2 (.bz2)
  • flate (.zip)
  • gzip (.gz)
  • lz4 (.lz4)
  • lzip (.lz)
  • snappy (.sz) and S2 (.s2)
  • xz (.xz)
  • zlib (.zz)
  • zstandard (.zst)

Supported archive formats

  • .zip
  • .tar (including any compressed variants like .tar.gz)
  • .rar (read-only)
  • .7z (read-only)

Library use

$ go get github.com/mholt/archives

Create archive

Creating archives can be done entirely without needing a real disk or storage device. All you need is a list of FileInfo structs, which can be implemented without a real file system.

However, creating archives from a disk is very common, so you can use the FilesFromDisk() function to help you map filenames on disk to their paths in the archive.

In this example, we add 4 files and a directory (which includes its contents recursively) to a .tar.gz file:

ctx := context.TODO()

// map files on disk to their paths in the archive using default settings (second arg)
files, err := archives.FilesFromDisk(ctx, nil, map[string]string{
	"/path/on/disk/file1.txt": "file1.txt",
	"/path/on/disk/file2.txt": "subfolder/file2.txt",
	"/path/on/disk/file3.txt": "",              // put in root of archive as file3.txt
	"/path/on/disk/file4.txt": "subfolder/",    // put in subfolder as file4.txt
	"/path/on/disk/folder":    "Custom Folder", // contents added recursively
})
if err != nil {
	return err
}

// create the output file we'll write to
out, err := os.Create("example.tar.gz")
if err != nil {
	return err
}
defer out.Close()

// we can use the CompressedArchive type to gzip a tarball
// (compression is not required; you could use Tar directly)
format := archives.CompressedArchive{
	Compression: archives.Gz{},
	Archival:    archives.Tar{},
}

// create the archive
err = format.Archive(ctx, out, files)
if err != nil {
	return err
}

Extract archive

Extracting an archive, extracting from an archive, and walking an archive are all the same function.

Simply use your format type (e.g. Zip) to call Extract(). You'll pass in a context (for cancellation), the input stream, and a callback function to handle each file.

// the type that will be used to read the input stream
var format archives.Zip

err := format.Extract(ctx, input, func(ctx context.Context, f archives.FileInfo) error {
	// do something with the file here; or, if you only want a specific file or directory,
	// just return until you come across the desired f.NameInArchive value(s)
	return nil
})
if err != nil {
	return err
}

Identifying formats

When you have an input stream with unknown contents, this package can identify it for you. It will try matching based on filename and/or the header (which peeks at the stream):

// unless your stream is an io.Seeker, use the returned stream value to
// ensure you re-read the bytes consumed during Identify()
format, stream, err := archives.Identify(ctx, "filename.tar.zst", stream)
if err != nil {
	return err
}

// you can now type-assert format to whatever you need

// want to extract something?
if ex, ok := format.(archives.Extractor); ok {
	// ... proceed to extract
}

// or maybe it's compressed and you want to decompress it?
if decomp, ok := format.(archives.Decompressor); ok {
	rc, err := decomp.OpenReader(stream)
	if err != nil {
		return err
	}
	defer rc.Close()

	// read from rc to get decompressed data
}

Identify() works by reading an arbitrary number of bytes from the beginning of the stream (just enough to check for file headers). It buffers them and returns a new reader that lets you re-read them. If your input stream is an io.Seeker, however, no buffer is created; Identify() uses Seek() to rewind instead, and the returned stream is the same as the input.
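The peek-then-replay behavior described above can be sketched with the standard library alone. This is only an illustration of the idea, not the package's implementation: `peek` and `peekDemo` are hypothetical helpers, and the two-byte gzip magic number is used purely as a sample header.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// peek reads up to n bytes from r and returns them along with a reader
// positioned as if nothing had been read. Hypothetical helper for this sketch.
func peek(r io.Reader, n int) ([]byte, io.Reader, error) {
	var start int64
	seeker, seekable := r.(io.Seeker)
	if seekable {
		pos, err := seeker.Seek(0, io.SeekCurrent)
		if err != nil {
			return nil, nil, err
		}
		start = pos
	}
	buf := make([]byte, n)
	m, err := io.ReadFull(r, buf)
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return nil, nil, err
	}
	head := buf[:m]
	if seekable {
		// Rewind instead of buffering; the caller gets the original reader back.
		if _, err := seeker.Seek(start, io.SeekStart); err != nil {
			return nil, nil, err
		}
		return head, r, nil
	}
	// Replay the peeked bytes, then continue with the rest of the stream.
	return head, io.MultiReader(bytes.NewReader(head), r), nil
}

// peekDemo peeks two bytes of data and then reads the returned stream fully.
func peekDemo(data string, seekable bool) (head, all string, err error) {
	var src io.Reader = strings.NewReader(data)
	if !seekable {
		src = struct{ io.Reader }{src} // hide Seek to simulate a pipe or socket
	}
	h, rest, err := peek(src, 2)
	if err != nil {
		return "", "", err
	}
	b, err := io.ReadAll(rest)
	return string(h), string(b), err
}

func main() {
	for _, seekable := range []bool{false, true} {
		head, all, err := peekDemo("\x1f\x8bpayload", seekable)
		fmt.Printf("seekable=%v gzip=%v replayed=%d err=%v\n",
			seekable, head == "\x1f\x8b", len(all), err)
	}
}
```

In both cases the caller sees the full stream again, which is why the README advises using the returned stream rather than the original one.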

Virtual file systems

This is my favorite feature.

Let's say you have a directory on disk, an archive, a compressed archive, any other regular file, or a stream of any of the above! You don't really care; you just want to use it uniformly no matter what it is.

Simply create a file system:

// filename could be:
// - a folder ("/home/you/Desktop")
// - an archive ("example.zip")
// - a compressed archive ("example.tar.gz")
// - a regular file ("example.txt")
// - a compressed regular file ("example.txt.gz")
// and/or the last argument could be a stream of any of the above
fsys, err := archives.FileSystem(ctx, filename, nil)
if err != nil {
	return err
}

This is a fully-featured fs.FS, so you can open files and read directories, no matter what kind of file the input was.

For example, to open a specific file:

f, err := fsys.Open("file")
if err != nil {
	return err
}
defer f.Close()

If you opened a regular file or archive, you can read from it. If it's a compressed file, reads are automatically decompressed.

If you opened a directory (either real or in an archive), you can list its contents:

if dir, ok := f.(fs.ReadDirFile); ok {
	// 0 gets all entries, but you can pass > 0 to paginate
	entries, err := dir.ReadDir(0)
	if err != nil {
		return err
	}
	for _, e := range entries {
		fmt.Println(e.Name())
	}
}

Or get a directory listing this way:

entries, err := fs.ReadDir(fsys, "Playlists")
if err != nil {
	return err
}
for _, e := range entries {
	fmt.Println(e.Name())
}

Or maybe you want to walk all or part of the file system, but skip a folder named .git:

err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
	if err != nil {
		return err
	}
	if path == ".git" {
		return fs.SkipDir
	}
	fmt.Println("Walking:", path, "Dir?", d.IsDir())
	return nil
})
if err != nil {
	return err
}

The archives package lets you do it all.

Important .tar note: Tar files do not efficiently implement file system semantics due to their roots in sequential-access design for tapes. File systems inherently assume random access, but tar files need to be read from the beginning to access something at the end. This is especially slow when the archive is compressed. Optimizations have been implemented to amortize ReadDir() calls so that fs.WalkDir() only has to scan the archive once, but they use more memory. Open calls require another scan to find the file. It may be more efficient to use Tar.Extract() directly if file system semantics are not important to you.
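To make the sequential-access cost concrete, here is a small sketch that uses only the standard library's archive/tar (not this package). The helper names `buildTar` and `scanFor` are made up for the example. It counts how many entries must be visited before a given name is found.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// buildTar writes one regular file per name, in order, into an in-memory tarball.
func buildTar(names []string, body string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, n := range names {
		hdr := &tar.Header{Name: n, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// scanFor reads entries from the start of the archive until it finds name,
// and reports how many entries it had to visit to get there.
func scanFor(archive []byte, name string) (int, bool, error) {
	tr := tar.NewReader(bytes.NewReader(archive))
	visited := 0
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return visited, false, nil
		}
		if err != nil {
			return visited, false, err
		}
		visited++
		if hdr.Name == name {
			return visited, true, nil
		}
	}
}

func main() {
	archive, err := buildTar([]string{"a.txt", "b.txt", "c.txt"}, "data")
	if err != nil {
		panic(err)
	}
	visited, found, err := scanFor(archive, "c.txt")
	fmt.Println(visited, found, err)
}
```

Every lookup of a late entry repeats this scan from the beginning, which is the overhead the note warns about.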

Use with http.FileServer

It can be used with http.FileServer to browse archives and directories in a browser. However, due to how http.FileServer works, don't use it directly with compressed files; instead, wrap it as follows:

fileServer := http.FileServer(http.FS(archiveFS))
http.HandleFunc("/", func(writer http.ResponseWriter, request *http.Request) {
	// disable range request
	writer.Header().Set("Accept-Ranges", "none")
	request.Header.Del("Range")
	
	// disable content-type sniffing
	ctype := mime.TypeByExtension(filepath.Ext(request.URL.Path))
	writer.Header()["Content-Type"] = nil
	if ctype != "" {
		writer.Header().Set("Content-Type", ctype)
	}
	fileServer.ServeHTTP(writer, request)
})

By default, http.FileServer tries to sniff the Content-Type if it can't be inferred from the file name. To do this, the http package reads from the file and then Seeks back to the start, which the library can't currently do. The same goes for Range requests. Seeking in archives is not currently supported by this package due to limitations in dependencies.

If Content-Type is desirable, you can register it yourself.
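One way to register a type is the standard library's mime.AddExtensionType, which feeds the mime.TypeByExtension lookup used in the wrapper above. A minimal sketch follows; the `.mydata` extension, its media type, and the `registerType` helper are made-up values for illustration.

```go
package main

import (
	"fmt"
	"mime"
)

// registerType associates ext with ctype and returns what a later
// mime.TypeByExtension lookup reports for that extension.
func registerType(ext, ctype string) (string, error) {
	if err := mime.AddExtensionType(ext, ctype); err != nil {
		return "", err
	}
	return mime.TypeByExtension(ext), nil
}

func main() {
	// Hypothetical extension and media type, chosen only for this example.
	got, err := registerType(".mydata", "application/x-mydata")
	fmt.Println(got, err)
}
```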

Compress data

Compression formats let you open writers to compress data:

// wrap underlying writer w
compressor, err := archives.Zstd{}.OpenWriter(w)
if err != nil {
	return err
}
defer compressor.Close()

// writes to compressor will be compressed

Decompress data

Similarly, compression formats let you open readers to decompress data:

// wrap underlying reader r
decompressor, err := archives.Snappy{}.OpenReader(r)
if err != nil {
	return err
}
defer decompressor.Close()

// reads from decompressor will be decompressed

Append to tarball and zip archives

Tar and Zip archives can be appended to without creating a whole new archive by calling Insert() on a tar or zip stream. However, for tarballs, this requires that the tarball is not compressed (due to complexities with modifying compression dictionaries).

Here is an example that appends a file to a tarball on disk:

tarball, err := os.OpenFile("example.tar", os.O_RDWR, 0644)
if err != nil {
	return err
}
defer tarball.Close()

// prepare a text file for the root of the archive
files, err := archives.FilesFromDisk(context.Background(), nil, map[string]string{
	"/home/you/lastminute.txt": "",
})
if err != nil {
	return err
}

err = archives.Tar{}.Insert(context.Background(), tarball, files)
if err != nil {
	return err
}

The code is similar for inserting into a Zip archive, except you'll call Insert() on a Zip{} value instead.

Documentation

Index

Constants

const (
	ZipMethodBzip2 = 12
	// TODO: LZMA: Disabled - because 7z isn't able to unpack ZIP+LZMA ZIP+LZMA2 archives made this way - and vice versa.
	// ZipMethodLzma     = 14
	ZipMethodZstd = 93
	ZipMethodXz   = 95
)

Additional compression methods not offered by archive/zip. See https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT section 4.4.5.

Variables

var NoMatch = fmt.Errorf("no formats matched")

NoMatch is a special error returned if there are no matching formats.

Functions

func FileSystem

func FileSystem(ctx context.Context, filename string, stream ReaderAtSeeker) (fs.FS, error)

FileSystem identifies the format of the input and returns a read-only file system. The input can be a filename, stream, or both.

If only a filename is specified, it may be a path to a directory, archive file, compressed archive file, compressed regular file, or any other regular file on disk. If the filename is a directory, its contents are accessed directly from the device's file system. If the filename is an archive file, the contents can be accessed like a normal directory; compressed archive files are transparently decompressed as contents are accessed. And if the filename is any other file, it is the only file in the returned file system; if the file is compressed, it is transparently decompressed when read from.

If a stream is specified, the filename (if available) is used as a hint to help identify its format. Streams of archive files must be able to be made into an io.SectionReader (for safe concurrency) which requires io.ReaderAt and io.Seeker (to efficiently determine size). The automatic format identification requires io.Reader and will use io.Seeker if supported to avoid buffering.
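The reason for requiring both io.ReaderAt and io.Seeker can be sketched with the standard library: Seek gives the size cheaply, and each io.SectionReader built on top of ReadAt keeps its own read offset, so several readers can work on the same stream at once. The `readerAtSeeker` interface and helper names below are stand-ins written for this example, not the package's own code.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// readerAtSeeker lists the two capabilities the text asks of an archive stream.
type readerAtSeeker interface {
	io.ReaderAt
	io.Seeker
}

// streamSize uses Seek to find the size without reading any data.
func streamSize(s readerAtSeeker) (int64, error) {
	return s.Seek(0, io.SeekEnd)
}

// readConcurrently reads the whole stream n times in parallel. Each goroutine
// gets its own SectionReader, so no read offset is shared between them.
func readConcurrently(data string, n int) ([]string, error) {
	src := strings.NewReader(data)
	size, err := streamSize(src)
	if err != nil {
		return nil, err
	}
	out := make([]string, n)
	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			b, err := io.ReadAll(io.NewSectionReader(src, 0, size))
			out[i], errs[i] = string(b), err
		}(i)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	got, err := readConcurrently("archive bytes", 3)
	fmt.Println(got, err)
}
```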

Whether the data comes from disk or a stream, it is peeked at to automatically detect which format to use.

This function essentially offers uniform read access to various kinds of files: directories, archives, compressed archives, individual files, and file streams are all treated the same way.

NOTE: The performance of compressed tar archives is not great due to overhead with decompression. However, the fs.WalkDir() use case has been optimized to create an index on first call to ReadDir().

func RegisterFormat

func RegisterFormat(format Format)

RegisterFormat registers a format. It should be called during init. Duplicate formats by name are not allowed and will panic.
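The panic-on-duplicate contract can be pictured with a toy registry. Everything here (`format`, `fake`, `register`, `tryRegister`, and keying by extension) is a simplified stand-in invented for the sketch; it is not how the package stores its formats.

```go
package main

import "fmt"

// format is a pared-down stand-in for the package's Format interface.
type format interface{ Extension() string }

// fake is a trivial format whose extension is its own value.
type fake string

func (f fake) Extension() string { return string(f) }

// registry maps a format name to the registered format.
var registry = map[string]format{}

// register adds f to the registry and panics if its name is already taken.
func register(f format) {
	name := f.Extension()
	if _, dup := registry[name]; dup {
		panic("format " + name + " is already registered")
	}
	registry[name] = f
}

// tryRegister reports whether registering f panicked.
func tryRegister(f format) (panicked bool) {
	defer func() { panicked = recover() != nil }()
	register(f)
	return false
}

func main() {
	fmt.Println(tryRegister(fake(".foo"))) // first registration
	fmt.Println(tryRegister(fake(".foo"))) // duplicate name
}
```

Registering from an init function means a duplicate is caught at program start rather than at first use.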

func TopDirOpen

func TopDirOpen(fsys fs.FS, name string) (fs.File, error)

TopDirOpen is a special Open() function that may be useful if a file system root was created by extracting an archive.

It first tries the file name as given, but if that returns an error, it tries the name without the first element of the path. In other words, if "a/b/c" returns an error, then "b/c" will be tried instead.

Consider an archive that contains a file "a/b/c". When the archive is extracted, the contents may be created without a new parent/root folder to contain them, and the path of the same file outside the archive may be lacking an exclusive root or parent container. Thus it is likely for a file system created for the same files extracted to disk to be rooted at one of the top-level files/folders from the archive instead of a parent folder. For example, the file known as "a/b/c" when rooted at the archive becomes "b/c" after extraction when rooted at "a" on disk (because no new, exclusive top-level folder was created). This difference in paths can make it difficult to use archives and directories uniformly. Hence these TopDir* functions which attempt to smooth over the difference.

Some extraction utilities do create a container folder for archive contents when extracting, in which case the user may give that path as the root. In that case, these TopDir* functions are not necessary (but aren't harmful either). They are primarily useful if you are not sure whether the root is an archive file or is an extracted archive file, as they will work with the same filename/path inputs regardless of the presence of a top-level directory.
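The fallback rule can be sketched against any fs.FS. The version below is a simplified illustration using testing/fstest, not the package's implementation; `openTopDir` and `resolves` are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// openTopDir tries name as given; if that fails, it retries without the
// first path element ("a/b/c" becomes "b/c").
func openTopDir(fsys fs.FS, name string) (fs.File, error) {
	f, err := fsys.Open(name)
	if err == nil {
		return f, nil
	}
	if _, rest, ok := strings.Cut(name, "/"); ok {
		return fsys.Open(rest)
	}
	return nil, err
}

// resolves reports whether name can be opened in a file system that mimics
// an archive extracted without its top-level folder "a".
func resolves(name string) bool {
	extracted := fstest.MapFS{"b/c": {Data: []byte("hello")}}
	f, err := openTopDir(extracted, name)
	if err != nil {
		return false
	}
	f.Close()
	return true
}

func main() {
	for _, name := range []string{"a/b/c", "b/c", "a/missing"} {
		fmt.Println(name, resolves(name))
	}
}
```

Both the in-archive path "a/b/c" and the on-disk path "b/c" resolve to the same file, which is the uniformity these functions aim for.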

func TopDirReadDir

func TopDirReadDir(fsys fs.FS, name string) ([]fs.DirEntry, error)

TopDirReadDir is like TopDirOpen but for ReadDir.

func TopDirStat

func TopDirStat(fsys fs.FS, name string) (fs.FileInfo, error)

TopDirStat is like TopDirOpen but for Stat.

Types

type Archival

type Archival interface {
	Format
	Archiver
	Extractor
}

Archival is an archival format that can create/write archives as well as extract from (read) them.

type ArchiveAsyncJob

type ArchiveAsyncJob struct {
	File   FileInfo
	Result chan<- error
}

ArchiveAsyncJob contains a File to be archived and a channel that the result of the archiving should be returned on. EXPERIMENTAL: Subject to change or removal.

type ArchiveFS

type ArchiveFS struct {
	// set one of these
	Path   string            // path to the archive file on disk, or...
	Stream *io.SectionReader // ...stream from which to read archive

	Format  Extractor       // the archive format
	Prefix  string          // optional subdirectory in which to root the fs
	Context context.Context // optional; mainly for cancellation
	// contains filtered or unexported fields
}

ArchiveFS allows reading an archive (or a compressed archive) using a consistent file system interface. Essentially, it allows traversal and reading of archive contents the same way as any normal directory on disk. The contents of compressed archives are transparently decompressed.

A valid ArchiveFS value must set either Path or Stream, but not both. If Path is set, a literal file will be opened from the disk. If Stream is set, new SectionReaders will be implicitly created to access the stream, enabling safe, concurrent access.

NOTE: Due to Go's file system APIs (see package io/fs), the performance of ArchiveFS can suffer when using fs.WalkDir(). To mitigate this, an optimized fs.ReadDirFS has been implemented that indexes the entire archive on the first call to ReadDir() (since the entire archive needs to be walked for every call to ReadDir() anyway, as archive contents are often unordered). The first call to ReadDir(), i.e. near the start of the walk, will be slow for large archives, but should be instantaneous after. If you don't care about walking a file system in directory order, consider calling Extract() on the underlying archive format type directly, which walks the archive in entry order, without needing to do any sorting.
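The indexing idea can be illustrated with a single pass over unordered entry names that groups them by parent directory. This is a rough sketch under simplifying assumptions (names only, no file metadata); `buildIndex` and `list` are invented for the example and do not reflect the package's internal data structures.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// buildIndex makes one pass over entry names (in any order) and groups them
// by parent directory, adding directories that never appear as own entries.
func buildIndex(names []string) map[string][]string {
	children := map[string]map[string]bool{}
	for _, name := range names {
		// Walk up from the entry toward the root, recording each level.
		for name = path.Clean(name); name != "." && name != "/"; name = path.Dir(name) {
			dir := path.Dir(name)
			if children[dir] == nil {
				children[dir] = map[string]bool{}
			}
			children[dir][path.Base(name)] = true
		}
	}
	index := make(map[string][]string, len(children))
	for dir, set := range children {
		for child := range set {
			index[dir] = append(index[dir], child)
		}
		sort.Strings(index[dir])
	}
	return index
}

// list returns the sorted children of dir as a comma-separated string.
func list(index map[string][]string, dir string) string {
	return strings.Join(index[dir], ",")
}

func main() {
	idx := buildIndex([]string{"docs/guide/intro.md", "readme.txt", "docs/api.md"})
	fmt.Println(list(idx, "."))
	fmt.Println(list(idx, "docs"))
}
```

After the first pass, every directory listing is a map lookup, which matches the "slow once, fast afterwards" behavior described above.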

Note that fs.FS implementations, including this one, reject paths starting with "./". This can be problematic sometimes, as it is not uncommon for tarballs to contain a top-level/root directory literally named ".", which can happen if a tarball is created in the same directory it is archiving. The underlying Extract() calls are faithful to entries with this name, but file systems have certain semantics around "." that restrict its use. For example, a file named "." cannot be created on a real file system because it is a special name that means "current directory".

We had to decide whether to honor the true name in the archive, or honor file system semantics. Given that this is a virtual file system and other code using the fs.FS APIs will trip over a literal directory named ".", we choose to honor file system semantics. Files named "." are ignored; directories with this name are effectively transparent; their contents get promoted up a directory/level. This means that a file at "./x", where "." is a literal directory name, will have its name passed in as "x" in WalkDir callbacks. If you need the raw, uninterpreted values from an archive, use the formats' Extract() method directly. See https://github.com/golang/go/issues/70155 for a little more background.

This does have one negative edge case... a tar containing contents like [x . ./x] will have a conflict on the file named "x" because "./x" will also be accessed with the name of "x".
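The promotion rule and the resulting conflict can be modeled in a few lines. The sketch assumes path.Clean is a fair approximation of the normalization; the package may handle other cases (such as "..") differently, and `fsName` and `conflicts` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path"
)

// fsName maps a raw archive entry name to the name a virtual file system
// would expose. ok is false for a literal "." entry, which is ignored.
func fsName(raw string) (name string, ok bool) {
	clean := path.Clean(raw)
	if clean == "." {
		return "", false
	}
	return clean, true
}

// conflicts returns the exposed names that more than one raw entry maps to.
func conflicts(raw []string) []string {
	seen := map[string]int{}
	var dup []string
	for _, r := range raw {
		if n, ok := fsName(r); ok {
			seen[n]++
			if seen[n] == 2 {
				dup = append(dup, n)
			}
		}
	}
	return dup
}

func main() {
	// The edge case from the text: entries x, ., and ./x.
	fmt.Println(conflicts([]string{"x", ".", "./x"}))
}
```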

func (ArchiveFS) Open

func (f ArchiveFS) Open(name string) (fs.File, error)

Open opens the named file from within the archive. If name is "." then the archive file itself will be opened as a directory file.

func (*ArchiveFS) ReadDir

func (f *ArchiveFS) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir reads the named directory from within the archive. If name is "." then the root of the archive content is listed.

func (ArchiveFS) Stat

func (f ArchiveFS) Stat(name string) (fs.FileInfo, error)

Stat stats the named file from within the archive. If name is "." then the archive file itself is statted and treated as a directory file.

func (*ArchiveFS) Sub

func (f *ArchiveFS) Sub(dir string) (fs.FS, error)

Sub returns an FS corresponding to the subtree rooted at dir.

type Archiver

type Archiver interface {
	// Archive writes an archive file to output with the given files.
	//
	// Context cancellation must be honored.
	Archive(ctx context.Context, output io.Writer, files []FileInfo) error
}

Archiver can create a new archive.

type ArchiverAsync

type ArchiverAsync interface {
	Archiver

	// Use ArchiveAsync if you can't pre-assemble a list of all
	// the files for the archive. Close the jobs channel after
	// all the files have been sent.
	//
	// This won't return until the channel is closed.
	ArchiveAsync(ctx context.Context, output io.Writer, jobs <-chan ArchiveAsyncJob) error
}

ArchiverAsync is an Archiver that can also create archives asynchronously by pumping files into a channel as they are discovered. EXPERIMENTAL: Subject to change or removal.
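The channel protocol (send jobs as files are discovered, read each job's result, close the channel when done) can be sketched without the package. The `job` type below only mirrors the shape of ArchiveAsyncJob with a plain name instead of a FileInfo, and `writeAll` is a stand-in for a real ArchiveAsync implementation.

```go
package main

import "fmt"

// job mirrors the shape of ArchiveAsyncJob for this sketch.
type job struct {
	Name   string
	Result chan<- error
}

// writeAll handles jobs until the channel is closed, reporting a result for
// each one. It stands in for an ArchiveAsync implementation.
func writeAll(jobs <-chan job, archived *[]string) error {
	for j := range jobs {
		*archived = append(*archived, j.Name) // pretend to write the entry
		j.Result <- nil
	}
	return nil
}

// produce sends one job per name, waits for each result, then closes the
// jobs channel so that writeAll can return.
func produce(names []string) ([]string, error) {
	jobs := make(chan job)
	done := make(chan error, 1)
	var archived []string
	go func() { done <- writeAll(jobs, &archived) }()

	for _, n := range names {
		result := make(chan error, 1)
		jobs <- job{Name: n, Result: result}
		if err := <-result; err != nil {
			close(jobs)
			<-done
			return archived, err
		}
	}
	close(jobs)
	err := <-done
	return archived, err
}

func main() {
	archived, err := produce([]string{"a.txt", "b.txt"})
	fmt.Println(archived, err)
}
```

Forgetting to close the jobs channel would leave the consumer blocked forever, which is why the interface comment stresses it.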

type Brotli

type Brotli struct {
	Quality int
}

Brotli facilitates brotli compression.

func (Brotli) Extension

func (Brotli) Extension() string

func (Brotli) Match

func (br Brotli) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Brotli) MediaType

func (Brotli) MediaType() string

func (Brotli) OpenReader

func (Brotli) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Brotli) OpenWriter

func (br Brotli) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Bz2

type Bz2 struct {
	CompressionLevel int
}

Bz2 facilitates bzip2 compression.

func (Bz2) Extension

func (Bz2) Extension() string

func (Bz2) Match

func (bz Bz2) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Bz2) MediaType

func (Bz2) MediaType() string

func (Bz2) OpenReader

func (Bz2) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Bz2) OpenWriter

func (bz Bz2) OpenWriter(w io.Writer) (io.WriteCloser, error)

type CompressedArchive

type CompressedArchive struct {
	Archival
	Extraction
	Compression
}

CompressedArchive represents an archive which is compressed externally (for example, a gzipped tar file, .tar.gz). It layers a compression format on top of an archival/extraction format and provides both functionalities in a single type, allowing archival and extraction operations transparently through compression and decompression. However, compressed archives have some limitations; for example, files cannot be inserted/appended because of complexities with modifying existing compression state (perhaps this could be overcome, but I'm not about to try it).

func (CompressedArchive) Archive

func (ca CompressedArchive) Archive(ctx context.Context, output io.Writer, files []FileInfo) error

Archive writes an archive to the output stream while compressing the result.

func (CompressedArchive) ArchiveAsync

func (ca CompressedArchive) ArchiveAsync(ctx context.Context, output io.Writer, jobs <-chan ArchiveAsyncJob) error

ArchiveAsync adds files to the output archive while compressing the result asynchronously.

func (CompressedArchive) Extension

func (ca CompressedArchive) Extension() string

Extension returns a concatenation of the archive and compression format extensions.

func (CompressedArchive) Extract

func (ca CompressedArchive) Extract(ctx context.Context, sourceArchive io.Reader, handleFile FileHandler) error

Extract reads files out of a compressed archive while decompressing the results.

func (CompressedArchive) Match

func (ca CompressedArchive) Match(ctx context.Context, filename string, stream io.Reader) (MatchResult, error)

Match matches if the input matches both the compression and archival/extraction format.

func (CompressedArchive) MediaType

func (ca CompressedArchive) MediaType() string

MediaType returns the compression format's MIME type, since a compressed archive is fundamentally a compressed file.

type Compression

type Compression interface {
	Format
	Compressor
	Decompressor
}

Compression is a compression format with both compress and decompress methods.

type Compressor

type Compressor interface {
	// OpenWriter wraps w with a new writer that compresses what is written.
	// The writer must be closed when writing is finished.
	OpenWriter(w io.Writer) (io.WriteCloser, error)
}

Compressor can compress data by wrapping a writer.

type Decompressor

type Decompressor interface {
	// OpenReader wraps r with a new reader that decompresses what is read.
	// The reader must be closed when reading is finished.
	OpenReader(r io.Reader) (io.ReadCloser, error)
}

Decompressor can decompress data by wrapping a reader.

type DeepFS

type DeepFS struct {
	// The root filepath on disk.
	Root string

	// An optional context, mainly for cancellation.
	Context context.Context
	// contains filtered or unexported fields
}

DeepFS is a fs.FS that represents the real file system, but also has the ability to traverse into archive files as if they were part of the regular file system. If a filename component ends with an archive extension (e.g. .zip, .tar, .tar.gz, etc.), then the remainder of the filepath will be considered to be inside that archive.

This allows treating archive files transparently as if they were part of the regular file system during a walk, which can be extremely useful for accessing data in an "ordinary" walk of the disk, without needing to first extract all the archives and use more disk space.

The listing of archive entries is retained for the lifetime of the DeepFS value for efficiency, but this can use more memory if archives contain a lot of files.
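The path-splitting rule from the first paragraph can be sketched as follows. The extension list and the `splitAtArchive` helper are assumptions made for this example; how the package decides which extensions count is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// archiveExts is an illustrative list chosen for this sketch.
var archiveExts = []string{".zip", ".tar", ".tar.gz", ".7z", ".rar"}

// splitAtArchive finds the first path component that ends with an archive
// extension and splits the path into the archive file and the path inside it.
func splitAtArchive(p string) (archivePath, inner string, ok bool) {
	parts := strings.Split(p, "/")
	for i, part := range parts {
		for _, ext := range archiveExts {
			if strings.HasSuffix(part, ext) {
				return strings.Join(parts[:i+1], "/"), strings.Join(parts[i+1:], "/"), true
			}
		}
	}
	return p, "", false
}

func main() {
	fmt.Println(splitAtArchive("backups/site.tar.gz/var/www/index.html"))
	fmt.Println(splitAtArchive("notes/todo.txt"))
}
```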

func (*DeepFS) Open

func (fsys *DeepFS) Open(name string) (fs.File, error)

func (*DeepFS) ReadDir

func (fsys *DeepFS) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir returns the directory listing for the given directory name, but for any entries that appear by their file extension to be archive files, they are slightly modified to always return true for IsDir(), since we have the unique ability to list the contents of archives as if they were directories.

func (*DeepFS) Stat

func (fsys *DeepFS) Stat(name string) (fs.FileInfo, error)

type DirFS

type DirFS struct{ fs.FS }

DirFS is returned by FileSystem() if the input is a real directory on disk. It merely wraps the return value of os.DirFS(), which is (unfortunately) unexported, making it impossible to use with type assertions to determine which kind of FS was returned. Because this wrapper type is exported, it can be type-asserted against. If this type is used manually and the embedded type does not implement the same interfaces os.dirFS does, errors will occur.

func (DirFS) ReadDir

func (d DirFS) ReadDir(name string) ([]fs.DirEntry, error)

func (DirFS) ReadFile

func (d DirFS) ReadFile(name string) ([]byte, error)

func (DirFS) Stat

func (d DirFS) Stat(name string) (fs.FileInfo, error)

type Extraction

type Extraction interface {
	Format
	Extractor
}

Extraction is an archival format that can extract from (read) archives.

type Extractor

type Extractor interface {
	// Extract walks entries in the archive and calls handleFile for each
	// entry in the archive.
	//
	// Any files opened in the FileHandler should be closed when it returns,
	// as there is no guarantee the files can be read outside the handler
	// or after the walk has proceeded to the next file.
	//
	// Context cancellation must be honored.
	Extract(ctx context.Context, archive io.Reader, handleFile FileHandler) error
}

Extractor can extract files from an archive.

type FileFS

type FileFS struct {
	// The path to the file on disk.
	Path string

	// If file is compressed, setting this field will
	// transparently decompress reads.
	Compression Decompressor
}

FileFS allows accessing a file on disk using a consistent file system interface. The value should be the path to a regular file, not a directory. This file will be the only entry in the file system and will be at its root. It can be accessed within the file system by the name of "." or the filename.

If the file is compressed, set the Compression field so that reads from the file will be transparently decompressed.

func (FileFS) Open

func (f FileFS) Open(name string) (fs.File, error)

Open opens the named file, which must be the file used to create the file system.

func (FileFS) ReadDir

func (f FileFS) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir returns a directory listing with the file as the singular entry.

func (FileFS) Stat

func (f FileFS) Stat(name string) (fs.FileInfo, error)

Stat stats the named file, which must be the file used to create the file system.

type FileHandler

type FileHandler func(ctx context.Context, info FileInfo) error

FileHandler is a callback function that is used to handle files as they are read from an archive; it is kind of like fs.WalkDirFunc. Handler functions that open their files must not overlap or run concurrently, as files may be read from the same sequential stream; always close the file before returning.

If the special error value fs.SkipDir is returned, the directory of the file (or the file itself if it is a directory) will not be walked. Note that because archive contents are not necessarily ordered, skipping directories requires memory, and skipping lots of directories may run up your memory bill.

Any other returned error will terminate a walk and be returned to the caller.
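Why skipping costs memory can be seen in a small model: because entries under a skipped directory may show up much later in the stream, the walker has to remember every skipped directory. This is a sketch of that idea only; `entry`, `walk`, `isSkipped`, and `visitedSkipping` are hypothetical and much simpler than a real extractor.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"strings"
)

// entry is a minimal stand-in for an archive entry.
type entry struct {
	Name  string
	IsDir bool
}

// isSkipped reports whether name is a skipped directory or lies beneath one.
func isSkipped(name string, skipped map[string]bool) bool {
	for dir := range skipped {
		if name == dir || strings.HasPrefix(name, dir+"/") {
			return true
		}
	}
	return false
}

// walk calls handle for each entry in archive order. When handle returns
// fs.SkipDir, the directory is remembered so later entries under it are skipped.
func walk(entries []entry, handle func(entry) error) error {
	skipped := map[string]bool{} // grows with every skipped directory
	for _, e := range entries {
		if isSkipped(e.Name, skipped) {
			continue
		}
		err := handle(e)
		switch {
		case errors.Is(err, fs.SkipDir):
			dir := e.Name
			if !e.IsDir {
				dir = path.Dir(e.Name)
			}
			if dir == "." {
				return nil // skipping the root ends the walk
			}
			skipped[dir] = true
		case err != nil:
			return err
		}
	}
	return nil
}

// visitedSkipping walks a fixed, unordered entry list and returns the names
// the handler accepted, with SkipDir returned for the entry named skip.
func visitedSkipping(skip string) (string, error) {
	entries := []entry{
		{"src/main.go", false}, {".git", true}, {"README.md", false},
		{".git/config", false}, {"src", true}, {".git/objects/ab", false},
	}
	var seen []string
	err := walk(entries, func(e entry) error {
		if e.Name == skip {
			return fs.SkipDir
		}
		seen = append(seen, e.Name)
		return nil
	})
	return strings.Join(seen, ","), err
}

func main() {
	fmt.Println(visitedSkipping(".git"))
}
```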

type FileInfo

type FileInfo struct {
	fs.FileInfo

	// The file header as used/provided by the archive format.
	// Typically, you do not need to set this field when creating
	// an archive.
	Header any

	// The path of the file as it appears in the archive.
	// This is equivalent to Header.Name (for most Header
	// types). We require it to be specified here because
	// it is such a common field and we want to preserve
	// format-agnosticism (no type assertions) for basic
	// operations.
	//
	// When extracting, this name or path may not have
	// been sanitized; it should not be trusted at face
	// value. Consider using path.Clean() before using.
	//
	// If this is blank when inserting a file into an
	// archive, the filename's base may be assumed
	// by default to be the name in the archive.
	NameInArchive string

	// For symbolic and hard links, the target of the link.
	// Not supported by all archive formats.
	LinkTarget string

	// A callback function that opens the file to read its
	// contents. The file must be closed when reading is
	// complete.
	Open func() (fs.File, error)
}

FileInfo is a virtualized, generalized file abstraction for interacting with archives.

func FilesFromDisk

func FilesFromDisk(ctx context.Context, options *FromDiskOptions, filenames map[string]string) ([]FileInfo, error)

FilesFromDisk is an opinionated function that returns a list of FileInfos by walking the directories in the filenames map. The keys are the names on disk, and the values become their associated names in the archive.

Map keys that specify directories on disk will be walked and added to the archive recursively, rooted at the named directory. They should use the platform's path separator (backslash on Windows; slash on everything else). For convenience, map keys that end in a separator ('/', or '\' on Windows) will enumerate contents only without adding the folder itself to the archive.

Map values should typically use slash ('/') as the separator regardless of the platform, as most archive formats standardize on that rune as the directory separator for filenames within an archive. For convenience, map values that are empty string are interpreted as the base name of the file (sans path) in the root of the archive; and map values that end in a slash will use the base name of the file in that folder of the archive.
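The conventions for map values can be summarized in a small function. This sketch covers single files only (not the recursive handling of directories) and is written from the description above rather than from the package source; `nameInArchive` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// nameInArchive applies the documented conventions for one map entry:
// an empty value means "base name in the archive root", a value ending in a
// slash means "base name inside that folder", anything else is used as-is.
func nameInArchive(diskPath, value string) string {
	base := filepath.Base(diskPath)
	switch {
	case value == "":
		return base
	case strings.HasSuffix(value, "/"):
		return path.Join(value, base)
	default:
		return value
	}
}

func main() {
	fmt.Println(nameInArchive("/path/on/disk/file1.txt", "file1.txt"))
	fmt.Println(nameInArchive("/path/on/disk/file3.txt", ""))
	fmt.Println(nameInArchive("/path/on/disk/file4.txt", "subfolder/"))
}
```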

File gathering will adhere to the settings specified in options.

This function is used primarily when preparing a list of files to add to an archive.

func (FileInfo) Stat

func (f FileInfo) Stat() (fs.FileInfo, error)

type Format

type Format interface {
	// Extension returns the conventional file extension for this
	// format.
	Extension() string

	// MediaType returns the MIME type ("content type") of this
	// format (see RFC 2046).
	MediaType() string

	// Match returns true if the given name/stream is recognized.
	// One of the arguments is optional: filename might be empty
	// if working with an unnamed stream, or stream might be empty
	// if only working with a file on disk; but both may also be
	// specified. The filename should consist only of the base name,
	// not path components, and is typically used for matching by
	// file extension. However, matching by reading the stream is
	// preferred as it is more accurate. Match reads only as many
	// bytes as needed to determine a match.
	Match(ctx context.Context, filename string, stream io.Reader) (MatchResult, error)
}

Format represents a way of getting data out of something else. A format usually represents compression or an archive (or both).

func Identify

func Identify(ctx context.Context, filename string, stream io.Reader) (Format, io.Reader, error)

Identify iterates the registered formats and returns the one that matches the given filename and/or stream. It is capable of identifying compressed files (.gz, .xz...), archive files (.tar, .zip...), and compressed archive files (tar.gz, tar.bz2...). The returned Format value can be type-asserted to ascertain its capabilities.

If no matching formats were found, special error NoMatch is returned.

If stream is nil then it will only match on file name and the returned io.Reader will be nil.

If stream is non-nil, it will be returned in the same read position as it was before Identify() was called, by virtue of buffering the peeked bytes. However, if the stream is an io.Seeker, Seek() must work, no extra buffering will be performed, and the original input value will be returned at the original position by seeking.
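
The two ways of preserving the read position described above can be sketched with the standard library. This is not the package's implementation: peekHeader and roundTrip are hypothetical names, and the sketch assumes the requested header fits in bufio's default buffer.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// peekHeader returns up to n leading bytes of r together with a reader
// that is still positioned where r was before the call.
// Hypothetical helper; n must fit in bufio's default buffer.
func peekHeader(r io.Reader, n int) ([]byte, io.Reader, error) {
	if s, ok := r.(io.Seeker); ok {
		// Seekable input: read the header, then seek back. No buffering,
		// and the original value is returned.
		start, err := s.Seek(0, io.SeekCurrent)
		if err != nil {
			return nil, r, err
		}
		buf := make([]byte, n)
		got, err := io.ReadFull(r, buf)
		if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
			return nil, r, err
		}
		if _, err := s.Seek(start, io.SeekStart); err != nil {
			return nil, r, err
		}
		return buf[:got], r, nil
	}
	// Non-seekable input: buffer the peeked bytes so they can be re-read.
	br := bufio.NewReader(r)
	head, err := br.Peek(n)
	if err != nil && err != io.EOF {
		return nil, br, err
	}
	return append([]byte(nil), head...), br, nil
}

// roundTrip peeks two bytes, then reads everything from the returned reader.
func roundTrip(data string, seekable bool) (string, string, error) {
	var r io.Reader = strings.NewReader(data)
	if !seekable {
		r = struct{ io.Reader }{r} // hide the Seek method
	}
	head, rest, err := peekHeader(r, 2)
	if err != nil {
		return "", "", err
	}
	body, err := io.ReadAll(rest)
	return string(head), string(body), err
}

func main() {
	for _, seekable := range []bool{true, false} {
		head, body, err := roundTrip("\x1f\x8bpayload", seekable)
		fmt.Printf("%q %q %v\n", head, body, err)
	}
}
```

In both branches the caller must continue reading from the returned reader rather than the one it passed in, which mirrors why Identify returns an io.Reader.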

type FromDiskOptions

type FromDiskOptions struct {
	// If true, symbolic links will be dereferenced, meaning that
	// the link will not be added as a link, but what the link
	// points to will be added as a file.
	FollowSymlinks bool

	// If true, some file attributes will not be preserved.
	// Name, size, type, and permissions will still be preserved.
	ClearAttributes bool
}

FromDiskOptions specifies various options for gathering files from disk.

type Gz

type Gz struct {
	// Gzip compression level. See https://pkg.go.dev/compress/flate#pkg-constants
	// for some predefined constants. If 0, DefaultCompression is assumed rather
	// than no compression.
	CompressionLevel int

	// DisableMultistream controls whether the reader supports multistream files.
	// See https://pkg.go.dev/compress/gzip#example-Reader.Multistream
	DisableMultistream bool

	// Use a fast parallel Gzip implementation. This is only
	// effective for large streams (about 1 MB or greater).
	Multithreaded bool
}

Gz facilitates gzip compression.
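
The DisableMultistream field refers to the multistream behavior of the standard library's gzip reader. The sketch below uses only compress/gzip to show what that behavior means for concatenated gzip streams; how Gz wires the field to the reader is an assumption not shown here, and gzipBytes and readGzip are hypothetical helper names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes compresses s into a single, complete gzip stream.
func gzipBytes(s string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readGzip decompresses data with multistream mode switched on or off.
func readGzip(data []byte, multistream bool) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	// With multistream on (the standard library default), concatenated
	// gzip streams are read as one; with it off, reading stops after
	// the first stream.
	zr.Multistream(multistream)
	out, err := io.ReadAll(zr)
	return string(out), err
}

func main() {
	a, err := gzipBytes("hello ")
	if err != nil {
		panic(err)
	}
	b, err := gzipBytes("world")
	if err != nil {
		panic(err)
	}
	data := append(a, b...)
	for _, multi := range []bool{true, false} {
		s, err := readGzip(data, multi)
		fmt.Printf("multistream=%v: %q %v\n", multi, s, err)
	}
}
```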

func (Gz) Extension

func (Gz) Extension() string

func (Gz) Match

func (gz Gz) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Gz) MediaType

func (Gz) MediaType() string

func (Gz) OpenReader

func (gz Gz) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Gz) OpenWriter

func (gz Gz) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Inserter

type Inserter interface {
	// Insert inserts the files into archive.
	//
	// Context cancellation must be honored.
	Insert(ctx context.Context, archive io.ReadWriteSeeker, files []FileInfo) error
}

Inserter can insert files into an existing archive. EXPERIMENTAL: Subject to change.

type Lz4

type Lz4 struct {
	CompressionLevel int
}

Lz4 facilitates LZ4 compression.

func (Lz4) Extension

func (Lz4) Extension() string

func (Lz4) Match

func (lz Lz4) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Lz4) MediaType

func (Lz4) MediaType() string

func (Lz4) OpenReader

func (Lz4) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Lz4) OpenWriter

func (lz Lz4) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Lzip

type Lzip struct{}

Lzip facilitates lzip compression.

func (Lzip) Extension

func (Lzip) Extension() string

func (Lzip) Match

func (lz Lzip) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Lzip) MediaType

func (Lzip) MediaType() string

func (Lzip) OpenReader

func (Lzip) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Lzip) OpenWriter

func (Lzip) OpenWriter(w io.Writer) (io.WriteCloser, error)

type MatchResult

type MatchResult struct {
	ByName, ByStream bool
}

MatchResult reports whether a format was matched by name, by stream, or both. Name usually refers to matching by file extension, and stream usually refers to reading the first few bytes of the stream (its header). A stream match is generally stronger, as filenames are not always indicative of their contents, if they exist at all.

func (MatchResult) Matched

func (mr MatchResult) Matched() bool

Matched returns true if a match was made by either name or stream.
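
As a rough illustration of how the two kinds of match combine, the sketch below mirrors the documented fields in a local type and checks for gzip by extension and by magic number. The names matchResult and matchGzip are hypothetical, and the package's real matching logic may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// matchResult is a local stand-in that mirrors the documented fields.
type matchResult struct {
	ByName, ByStream bool
}

// Matched reports whether either kind of match was made.
func (mr matchResult) Matched() bool { return mr.ByName || mr.ByStream }

// matchGzip is a hypothetical matcher: by name it checks the ".gz"
// extension, by stream it checks the two-byte gzip magic number.
func matchGzip(filename string, header []byte) matchResult {
	var mr matchResult
	if strings.HasSuffix(strings.ToLower(filename), ".gz") {
		mr.ByName = true
	}
	if bytes.HasPrefix(header, []byte{0x1f, 0x8b}) {
		mr.ByStream = true
	}
	return mr
}

func main() {
	fmt.Println(matchGzip("backup.gz", nil))              // name only
	fmt.Println(matchGzip("", []byte{0x1f, 0x8b, 0x08}))   // stream only
	fmt.Println(matchGzip("notes.txt", []byte("plain")))   // neither
}
```

A caller that has both results available would typically give more weight to ByStream, for the reason stated above.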

func (MatchResult) String

func (mr MatchResult) String() string

type Rar

type Rar struct {
	// If true, errors encountered during reading or writing
	// a file within an archive will be logged and the
	// operation will continue on remaining files.
	ContinueOnError bool

	// Password to open archives.
	Password string
}

func (Rar) Extension

func (Rar) Extension() string

func (Rar) Extract

func (r Rar) Extract(ctx context.Context, sourceArchive io.Reader, handleFile FileHandler) error

func (Rar) Match

func (r Rar) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Rar) MediaType

func (Rar) MediaType() string

type ReaderAtSeeker

type ReaderAtSeeker interface {
	io.Reader
	io.ReaderAt
	io.Seeker
}

ReaderAtSeeker is a type that can read, read at, and seek. os.File and io.SectionReader both implement this interface.

type S2

type S2 struct {
	// reader options
	MaxBlockSize           int
	AllocBlock             int
	IgnoreStreamIdentifier bool
	IgnoreCRC              bool

	// writer options
	AddIndex           bool
	Compression        S2Level
	BlockSize          int
	Concurrency        int
	FlushOnWrite       bool
	Padding            int
	SnappyIncompatible bool
}

S2 is an extension of Snappy that can read Snappy streams and write Snappy-compatible streams, but can also be configured to write Snappy-incompatible streams for greater gains. See https://pkg.go.dev/github.com/klauspost/compress/s2 for details and the documentation for each option.

type S2Level

type S2Level int

Compression level for S2 (Snappy/Sz extension). EXPERIMENTAL: May be changed or removed without a major version bump.

const (
	S2LevelNone   S2Level = 0
	S2LevelFast   S2Level = 1
	S2LevelBetter S2Level = 2
	S2LevelBest   S2Level = 3
)

Compression levels for S2. EXPERIMENTAL: May be changed or removed without a major version bump.

type SevenZip

type SevenZip struct {
	// If true, errors encountered during reading or writing
	// a file within an archive will be logged and the
	// operation will continue on remaining files.
	ContinueOnError bool

	// The password, if dealing with an encrypted archive.
	Password string
}

func (SevenZip) Extension

func (SevenZip) Extension() string

func (SevenZip) Extract

func (z SevenZip) Extract(ctx context.Context, sourceArchive io.Reader, handleFile FileHandler) error

Extract extracts files from z, implementing the Extractor interface. Uniquely, however, sourceArchive must be an io.ReaderAt and io.Seeker, which are oddly disjoint interfaces from io.Reader, which is what the method signature requires. We chose this signature for the interface because we figure you can Read() from anything you can ReadAt() or Seek() with. Due to the nature of the 7z archive format, if sourceArchive is not an io.Seeker and io.ReaderAt, an error is returned.

func (SevenZip) Match

func (z SevenZip) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (SevenZip) MediaType

func (SevenZip) MediaType() string

type Sz

type Sz struct {
	// Configurable S2 extension.
	S2 S2
}

Sz facilitates Snappy compression. It uses S2 for reading and writing, but by default will write Snappy-compatible data.

func (Sz) Extension

func (Sz) Extension() string

func (Sz) Match

func (sz Sz) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Sz) MediaType

func (Sz) MediaType() string

func (Sz) OpenReader

func (sz Sz) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Sz) OpenWriter

func (sz Sz) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Tar

type Tar struct {
	// If true, preserve only numeric user and group id
	NumericUIDGID bool

	// If true, errors encountered during reading or writing
	// a file within an archive will be logged and the
	// operation will continue on remaining files.
	ContinueOnError bool
}

func (Tar) Archive

func (t Tar) Archive(ctx context.Context, output io.Writer, files []FileInfo) error

func (Tar) ArchiveAsync

func (t Tar) ArchiveAsync(ctx context.Context, output io.Writer, jobs <-chan ArchiveAsyncJob) error

func (Tar) Extension

func (Tar) Extension() string

func (Tar) Extract

func (t Tar) Extract(ctx context.Context, sourceArchive io.Reader, handleFile FileHandler) error

func (Tar) Insert

func (t Tar) Insert(ctx context.Context, into io.ReadWriteSeeker, files []FileInfo) error

func (Tar) Match

func (t Tar) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Tar) MediaType

func (Tar) MediaType() string

type Xz

type Xz struct{}

Xz facilitates xz compression.

func (Xz) Extension

func (Xz) Extension() string

func (Xz) Match

func (x Xz) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Xz) MediaType

func (Xz) MediaType() string

func (Xz) OpenReader

func (Xz) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Xz) OpenWriter

func (Xz) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Zip

type Zip struct {
	// Only compress files which are not already in a
	// compressed format (determined simply by examining
	// file extension).
	SelectiveCompression bool

	// The method or algorithm for compressing stored files.
	Compression uint16

	// If true, errors encountered during reading or writing
	// a file within an archive will be logged and the
	// operation will continue on remaining files.
	ContinueOnError bool

	// For files in zip archives that do not have UTF-8
	// encoded filenames and comments, specify the character
	// encoding here.
	TextEncoding encoding.Encoding
}

func (Zip) Archive

func (z Zip) Archive(ctx context.Context, output io.Writer, files []FileInfo) error

func (Zip) ArchiveAsync

func (z Zip) ArchiveAsync(ctx context.Context, output io.Writer, jobs <-chan ArchiveAsyncJob) error

func (Zip) Extension

func (Zip) Extension() string

func (Zip) Extract

func (z Zip) Extract(ctx context.Context, sourceArchive io.Reader, handleFile FileHandler) error

Extract extracts files from z, implementing the Extractor interface. Uniquely, however, sourceArchive must be an io.ReaderAt and io.Seeker, which are oddly disjoint interfaces from io.Reader which is what the method signature requires. We chose this signature for the interface because we figure you can Read() from anything you can ReadAt() or Seek() with. Due to the nature of the zip archive format, if sourceArchive is not an io.Seeker and io.ReaderAt, an error is returned.
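
The requirement above follows from the zip format keeping its central directory at the end of the file, so a reader needs random access and the total size. The sketch below shows the same pattern with the standard library's archive/zip: accept an io.Reader, type-assert it, and find the size by seeking. It is not this package's code, and buildZip, listZip, and listZipBytes are hypothetical helper names.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// buildZip creates a small in-memory zip archive with the given entries.
func buildZip(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, name := range names {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte("contents of " + name)); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listZip accepts a plain io.Reader but requires that it also implement
// io.ReaderAt and io.Seeker, returning an error otherwise.
func listZip(r io.Reader) ([]string, error) {
	ras, ok := r.(interface {
		io.ReaderAt
		io.Seeker
	})
	if !ok {
		return nil, errors.New("input must implement io.ReaderAt and io.Seeker")
	}
	// Seeking to the end yields the size that zip.NewReader needs.
	size, err := ras.Seek(0, io.SeekEnd)
	if err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(ras, size)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

// listZipBytes wraps data in a reader, optionally hiding random access.
func listZipBytes(data []byte, seekable bool) ([]string, error) {
	var r io.Reader = bytes.NewReader(data)
	if !seekable {
		r = struct{ io.Reader }{r}
	}
	return listZip(r)
}

func main() {
	data, err := buildZip("a.txt", "dir/b.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(listZipBytes(data, true))
	fmt.Println(listZipBytes(data, false))
}
```

In practice this means passing an *os.File, a *bytes.Reader, or an *io.SectionReader rather than a network stream or pipe.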

func (Zip) Insert

func (z Zip) Insert(ctx context.Context, into io.ReadWriteSeeker, files []FileInfo) error

Insert appends the listed files into the provided Zip archive stream. If the filename already exists in the archive, it will be replaced.

func (Zip) Match

func (z Zip) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Zip) MediaType

func (Zip) MediaType() string

type Zlib

type Zlib struct {
	CompressionLevel int
}

Zlib facilitates zlib compression.

func (Zlib) Extension

func (Zlib) Extension() string

func (Zlib) Match

func (zz Zlib) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Zlib) MediaType

func (Zlib) MediaType() string

func (Zlib) OpenReader

func (Zlib) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Zlib) OpenWriter

func (zz Zlib) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Zstd

type Zstd struct {
	EncoderOptions []zstd.EOption
	DecoderOptions []zstd.DOption
}

Zstd facilitates Zstandard compression.

func (Zstd) Extension

func (Zstd) Extension() string

func (Zstd) Match

func (zs Zstd) Match(_ context.Context, filename string, stream io.Reader) (MatchResult, error)

func (Zstd) MediaType

func (Zstd) MediaType() string

func (Zstd) OpenReader

func (zs Zstd) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Zstd) OpenWriter

func (zs Zstd) OpenWriter(w io.Writer) (io.WriteCloser, error)
