sevenzip

Version: v1.6.0 (not the latest version of this module)
Published: Nov 17, 2024 License: BSD-3-Clause Imports: 36 Imported by: 49

README

sevenzip

A reader for 7-zip archives inspired by archive/zip.

Current status:

  • Pure Go, no external libraries or binaries needed.
  • Handles uncompressed headers (7za a -mhc=off test.7z ...).
  • Handles compressed headers (7za a -mhc=on test.7z ...).
  • Handles password-protected versions of both of the above (7za a -mhc=on|off -mhe=on -ppassword test.7z ...).
  • Handles archives split into multiple volumes (7za a -v100m test.7z ...).
  • Handles self-extracting archives (7za a -sfx archive.exe ...).
  • Validates CRC values as it parses the file.
  • Supports ARM, BCJ, BCJ2, Brotli, Bzip2, Copy, Deflate, Delta, LZ4, LZMA, LZMA2, PPC, SPARC and Zstandard methods.
  • Implements the fs.FS interface so you can treat an opened 7-zip archive like a filesystem.

More examples of 7-zip archives are needed to test all of the different combinations/algorithms possible.

Frequently Asked Questions

Why is my code running so slow?

Someone might write the following simple code:

func extractArchive(archive string) error {
        r, err := sevenzip.OpenReader(archive)
        if err != nil {
                return err
        }
        defer r.Close()

        for _, f := range r.File {
                rc, err := f.Open()
                if err != nil {
                        return err
                }
                defer rc.Close() // does not run until extractArchive returns

                // Extract the file
        }

        return nil
}

Unlike a zip archive, where every file is compressed individually, a 7-zip archive can have all of its files compressed together in one long stream, typically to achieve a better compression ratio. In a naive random access implementation, to read the first file you start at the beginning of the compressed stream and read out that file's worth of bytes. To read the second file you have to start at the beginning of the stream again, read and discard the first file's worth of bytes to reach the correct offset, then read out the second file's worth of bytes. For an archive that contains hundreds of files, extraction gets progressively slower because you have to read and discard more and more data just to reach the right offset in the stream.
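To put rough numbers on that, here is a minimal sketch that counts how many uncompressed bytes each approach has to decode. It is only a cost model: the function names are made up for this illustration, they are not part of this package, and the model ignores everything except the file sizes.

```go
package main

// naiveBytesDecoded returns how many uncompressed bytes must be decoded to
// extract every file when each extraction restarts at the beginning of the
// shared stream: file i costs the sum of the sizes of files 0..i.
func naiveBytesDecoded(sizes []uint64) uint64 {
	var total, offset uint64
	for _, size := range sizes {
		offset += size  // distance from the stream start to the end of this file
		total += offset // discard everything before the file, then read it
	}
	return total
}

// sequentialBytesDecoded returns the cost when the stream is decoded once,
// front to back, and each file is read as it goes past.
func sequentialBytesDecoded(sizes []uint64) uint64 {
	var total uint64
	for _, size := range sizes {
		total += size
	}
	return total
}
```

For three files of 10, 20 and 30 bytes the naive approach decodes 10 + 30 + 60 = 100 bytes, against 60 bytes for a single pass. The gap grows roughly with the square of the number of files in a stream.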

This package contains an optimisation that caches and reuses the underlying compressed stream reader so you don't have to keep starting from the beginning for each file, but it does require you to call rc.Close() before extracting the next file. So write your code similar to this:

func extractFile(f *sevenzip.File) error {
        rc, err := f.Open()
        if err != nil {
                return err
        }
        defer rc.Close()

        // Extract the file

        return nil
}

func extractArchive(archive string) error {
        r, err := sevenzip.OpenReader(archive)
        if err != nil {
                return err
        }
        defer r.Close()

        for _, f := range r.File {
                if err = extractFile(f); err != nil {
                        return err
                }
        }

        return nil
}

The main difference is that each rc.Close() call runs as soon as its file has been extracted, rather than all of them being deferred until the end of extractArchive().

There is a set of benchmarks in this package that demonstrates the performance boost that the optimisation provides, amongst other techniques:

$ go test -v -run='^$' -bench='Reader$' -benchtime=60s
goos: darwin
goarch: amd64
pkg: github.com/bodgit/sevenzip
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkNaiveReader
BenchmarkNaiveReader-12                  	       2	31077542628 ns/op
BenchmarkOptimisedReader
BenchmarkOptimisedReader-12              	     434	 164854747 ns/op
BenchmarkNaiveParallelReader
BenchmarkNaiveParallelReader-12          	     240	 361869339 ns/op
BenchmarkNaiveSingleParallelReader
BenchmarkNaiveSingleParallelReader-12    	     412	 171027895 ns/op
BenchmarkParallelReader
BenchmarkParallelReader-12               	     636	 112551812 ns/op
PASS
ok  	github.com/bodgit/sevenzip	472.251s

The archive used here is just the reference LZMA SDK archive, which is only 1 MiB in size but does contain 630+ files split across three compression streams. The only difference between BenchmarkNaiveReader and the rest is the lack of a call to rc.Close() between files so the stream reuse optimisation doesn't take effect.

Don't blindly throw goroutines at the problem either, as this can also undo the optimisation. A naive implementation that uses a pool of multiple goroutines to extract each file takes more than twice as long as the optimised sequential reader in the benchmark above (about 362 ms versus 165 ms per operation), and even a pool of one goroutine ends up slightly less efficient (about 171 ms). The optimal way to employ goroutines is to make use of the sevenzip.FileHeader.Stream field: extract files with the same value using the same goroutine. With the LZMA SDK archive this brings the time down to about 113 ms per operation, roughly a third less than the optimised sequential reader, but the gain very much depends on how many streams there are in the archive.

In general, don't try and extract the files in a different order compared to the natural order within the archive as that will also undo the optimisation. The worst scenario would likely be to extract the archive in reverse order.

Detecting the wrong password

It's virtually impossible to reliably distinguish the wrong password from some other corruption in a password-protected archive. This is partly due to how CBC decryption works: with the wrong password you don't get any sort of decryption error, you just get a stream of bytes that aren't the correct ones. This manifests itself when the file has been both compressed and encrypted. During extraction the file is decrypted and then decompressed, so with the wrong password the decompression algorithm is handed a stream that isn't valid, and that's the error you see.

A sevenzip.ReadError error type can be returned for certain operations. If sevenzip.ReadError.Encrypted is true then encryption is involved and you can use that as a hint to either set a password or try a different one. Use errors.As() to check like this:

r, err := sevenzip.OpenReaderWithPassword(archive, password)
if err != nil {
        var e *sevenzip.ReadError
        if errors.As(err, &e) && e.Encrypted {
                // Encryption involved, retry with a different password
        }

        return err
}

Be aware that if the archive does not have its headers encrypted (7za a -mhe=off -ppassword test.7z ...), then you can always open the archive and the password is only used when extracting the files.

If files are added to the archive encrypted but not compressed (7za a -m0=copy -ppassword test.7z ...), then you will never get an error when extracting with the wrong password, as the only consumer of the decrypted content will be your own code. To detect a potentially wrong password, calculate the CRC value of the extracted content and check that it matches the value in sevenzip.FileHeader.CRC32.

Documentation

Overview

Package sevenzip provides read access to 7-zip archives.

Functions

func RegisterDecompressor

func RegisterDecompressor(method []byte, dcomp Decompressor)

RegisterDecompressor allows custom decompressors for a specified method ID.

Types

type CryptoReadCloser added in v1.2.0

type CryptoReadCloser interface {
	Password(password string) error
}

CryptoReadCloser adds a Password method to decompressors.

type Decompressor

type Decompressor func([]byte, uint64, []io.ReadCloser) (io.ReadCloser, error)

Decompressor describes the function signature that decompression/decryption methods must implement to return a new instance of themselves. They are passed any property bytes, the size of the stream and a slice of at least one io.ReadCloser providing the stream(s) of bytes.

type File

type File struct {
	FileHeader
	// contains filtered or unexported fields
}

A File is a single file in a 7-Zip archive. The file information is in the embedded FileHeader. The file content can be accessed by calling File.Open.

func (*File) Open

func (f *File) Open() (io.ReadCloser, error)

Open returns an io.ReadCloser that provides access to the File's contents. Multiple files may be read concurrently.

type FileHeader

type FileHeader struct {
	Name             string
	Created          time.Time
	Accessed         time.Time
	Modified         time.Time
	Attributes       uint32
	CRC32            uint32
	UncompressedSize uint64

	// Stream is an opaque identifier representing the compressed stream
	// that contains the file. Any File with the same value can be assumed
	// to be stored within the same stream.
	Stream int
	// contains filtered or unexported fields
}

FileHeader describes a file within a 7-zip file.

func (*FileHeader) FileInfo

func (h *FileHeader) FileInfo() fs.FileInfo

FileInfo returns an fs.FileInfo for the FileHeader.

func (*FileHeader) Mode

func (h *FileHeader) Mode() (mode fs.FileMode)

Mode returns the permission and mode bits for the FileHeader.

type ReadCloser

type ReadCloser struct {
	Reader
	// contains filtered or unexported fields
}

A ReadCloser is a Reader that must be closed when no longer needed.

func OpenReader

func OpenReader(name string) (*ReadCloser, error)

OpenReader will open the 7-zip file specified by name and return a *ReadCloser. If name has a ".001" suffix it is assumed there are multiple volumes and each sequential volume will be opened.

Example
package main

import (
	"fmt"
	"path/filepath"

	"github.com/bodgit/sevenzip"
)

func main() {
	r, err := sevenzip.OpenReader(filepath.Join("testdata", "multi.7z.001"))
	if err != nil {
		panic(err)
	}

	defer func() {
		if err := r.Close(); err != nil {
			panic(err)
		}
	}()

	for _, file := range r.File {
		fmt.Println(file.Name)
	}
}
Output:

01
02
03
04
05
06
07
08
09
10

func OpenReaderWithPassword

func OpenReaderWithPassword(name, password string) (*ReadCloser, error)

OpenReaderWithPassword will open the 7-zip file specified by name using password as the basis of the decryption key and return a *ReadCloser. If name has a ".001" suffix it is assumed there are multiple volumes and each sequential volume will be opened.

func (*ReadCloser) Close

func (rc *ReadCloser) Close() (err error)

Close closes the 7-zip file or volumes, rendering them unusable for I/O.

func (*ReadCloser) Volumes added in v1.4.0

func (rc *ReadCloser) Volumes() []string

Volumes returns the list of volumes that have been opened as part of the current archive.

type ReadError added in v1.6.0

type ReadError struct {
	// Encrypted is a hint that there is encryption involved.
	Encrypted bool
	Err       error
}

ReadError is used to wrap read I/O errors.

func (ReadError) Error added in v1.6.0

func (e ReadError) Error() string

func (ReadError) Unwrap added in v1.6.0

func (e ReadError) Unwrap() error

type Reader

type Reader struct {
	File []*File
	// contains filtered or unexported fields
}

A Reader serves content from a 7-Zip archive.

func NewReader

func NewReader(r io.ReaderAt, size int64) (*Reader, error)

NewReader returns a new *Reader reading from r, which is assumed to have the given size in bytes.

func NewReaderWithPassword

func NewReaderWithPassword(r io.ReaderAt, size int64, password string) (*Reader, error)

NewReaderWithPassword returns a new *Reader reading from r, which is assumed to have the given size in bytes, using password as the basis of the decryption key.

func (*Reader) Open added in v1.3.0

func (z *Reader) Open(name string) (fs.File, error)

Open opens the named file in the 7-zip archive, using the semantics of fs.FS.Open: paths are always slash separated, with no leading / or ../ elements.
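Because Open follows the fs.FS semantics, helpers written against fs.FS should in principle work on an opened archive. The sketch below lists the regular files in any fs.FS with fs.WalkDir. It uses an in-memory fstest.MapFS as a stand-in for an archive, and it assumes that the archive's directories can be listed the way fs.WalkDir requires, which is not spelled out above.

```go
package main

import (
	"io/fs"
	"testing/fstest"
)

// listFiles walks any fs.FS and returns the slash-separated paths of the
// regular files it contains, in the lexical order fs.WalkDir visits them.
func listFiles(fsys fs.FS) ([]string, error) {
	var names []string
	err := fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			names = append(names, path)
		}
		return nil
	})
	return names, err
}

// demoFS is an in-memory stand-in for an opened archive; real code would
// pass the *sevenzip.Reader (or the Reader embedded in a *ReadCloser).
func demoFS() fs.FS {
	return fstest.MapFS{
		"docs/readme.txt": {Data: []byte("hello")},
		"main.go":         {Data: []byte("package main")},
	}
}
```

Note that walking a directory tree visits files in lexical order, which may differ from the natural order within the archive, so this suits listing better than bulk extraction.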

Directories

Path              Synopsis
internal/aes7z    Package aes7z implements the 7-zip AES decryption.
internal/bcj2     Package bcj2 implements the BCJ2 filter for x86 binaries.
internal/bra      Package bra implements the branch rewriting filter for binaries.
internal/brotli   Package brotli implements the Brotli decompressor.
internal/bzip2    Package bzip2 implements the Bzip2 decompressor.
internal/deflate  Package deflate implements the Deflate decompressor.
internal/delta    Package delta implements the Delta filter.
internal/lz4      Package lz4 implements the LZ4 decompressor.
internal/lzma     Package lzma implements the LZMA decompressor.
internal/lzma2    Package lzma2 implements the LZMA2 decompressor.
internal/pool     Package pool implements the reader pooling.
internal/util     Package util implements various utility types and interfaces.
internal/zstd     Package zstd implements the Zstandard decompressor.
