pzip

v0.2.2
Published: Dec 10, 2023 License: Apache-2.0 Imports: 17 Imported by: 0

README

pzip

pzip, short for parallel-zip, is a blazing fast concurrent zip archiver and extractor.

Features

  • Archives files and directories into a valid zip archive, using DEFLATE.
  • Preserves modification times of files.
  • Files are read and compressed concurrently.

Installation

Command Line

For command-line usage, we provide two binaries which can be installed separately:

  • pzip: concurrent zip archiving
  • punzip: concurrent zip extraction

To install, run:

macOS

For zip archiving: brew install ybirader/pzip/pzip

For zip extraction: brew install ybirader/pzip/punzip

Debian, Ubuntu, Raspbian

For the latest stable release:

For zip archiving:

curl -1sLf 'https://dl.cloudsmith.io/public/pzip/stable/setup.deb.sh' | sudo -E bash
sudo apt update
sudo apt install pzip

For zip extraction:

curl -1sLf 'https://dl.cloudsmith.io/public/pzip/stable/setup.deb.sh' | sudo -E bash
sudo apt update
sudo apt install punzip

Go

Alternatively, if you have Go installed:

go install github.com/ybirader/pzip/cmd/pzip@latest
go install github.com/ybirader/pzip/cmd/punzip@latest

Build from source

To build from source, we require Go 1.21 or newer.

  1. Clone the repository by running git clone "https://github.com/ybirader/pzip.git"
  2. Build both pzip and punzip by running make build, or build them separately via cd cmd/pzip && go build and cd cmd/punzip && go build

Usage

Archiving

pzip's API is similar to that of the standard zip utility found on most *nix systems.

pzip /path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ... path/to/file_or_directoryN

Alternatively, pzip can be imported as a package:

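// Create the file that will hold the archive.
archive, err := os.Create("archive.zip")
if err != nil {
  log.Fatal(err)
}

// Wrap the file in an archiver. Close must be called when done.
archiver, err := pzip.NewArchiver(archive)
if err != nil {
  log.Fatal(err)
}
defer archiver.Close()

// Files and directories to add to the archive.
files := []string{"./hello", "./hello.txt", "./bye.md"}

if err := archiver.Archive(context.Background(), files); err != nil {
  log.Fatal(err)
}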

The concurrency of the archiver can be configured using the corresponding flag:

pzip --concurrency 2 /path/to/compressed.zip path/to/file_or_directory1 path/to/file_or_directory2 ... path/to/file_or_directoryN

or by passing the ArchiverConcurrency option:

archiver, err := pzip.NewArchiver(archive, pzip.ArchiverConcurrency(2))

Extraction

punzip's API is similar to that of the standard unzip utility found on most *nix systems.

punzip /path/to/compressed.zip

By default, punzip extracts into the current directory. To extract to a different directory, pass the -d flag:

punzip -d /path/to/output /path/to/compressed.zip

Using the Go package, we have:

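outputDirPath := "./output"
archivePath := "./archive.zip"

// Create an extractor that writes into outputDirPath. Close must be called when done.
extractor, err := pzip.NewExtractor(outputDirPath)
if err != nil {
  log.Fatal(err)
}
defer extractor.Close()

// Extract every entry of the archive into the output directory.
if err := extractor.Extract(context.Background(), archivePath); err != nil {
  log.Fatal(err)
}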

As with pzip, we can configure the concurrency of the extractor using:

punzip --concurrency 2 /path/to/compressed.zip

Similarly, with the Go package, we pass in the ExtractorConcurrency option:

extractor, err := pzip.NewExtractor(outputDirPath, pzip.ExtractorConcurrency(2))

Benchmarks

pzip was benchmarked using Matt Mahoney's sample directory.

Using the standard zip utility, we get the following time to archive:

real    14m31.809s
user    13m12.833s
sys     0m24.193s

Running the same benchmark with pzip, we find that:

real    0m56.851s
user    3m32.619s
sys     1m25.040s

Contributing

To contribute to pzip, first submit or comment in an issue to discuss your contribution, then open a pull request (PR).

License

pzip is released under the Apache 2.0 license.

Acknowledgements

Many thanks to the folks at Cloudsmith for graciously providing Debian package hosting. Cloudsmith is the only fully hosted, cloud-native, universal package management solution that enables your organization to create, store and share packages in any format, to any place, with total confidence.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ArchiverConcurrency

func ArchiverConcurrency(n int) archiverOption

ArchiverConcurrency sets the number of goroutines used during archiving. An error is returned if n is less than 1.

func ExtractorConcurrency

func ExtractorConcurrency(n int) extractorOption

ExtractorConcurrency sets the number of goroutines used during extraction. An error is returned if n is less than 1.
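
Both concurrency functions follow the shape of Go's functional options pattern. The sketch below shows how such an option can validate its argument and surface an error from the constructor. All names here (worker, option, concurrency, newWorker) are hypothetical stand-ins, and the GOMAXPROCS default is an assumption. pzip's unexported types and defaults may differ.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

// worker is a hypothetical stand-in for an unexported archiver or extractor.
type worker struct {
	concurrency int
}

// option mutates the value under construction and may reject bad input.
type option func(*worker) error

// concurrency sets the number of goroutines and rejects values below 1.
func concurrency(n int) option {
	return func(w *worker) error {
		if n < 1 {
			return errors.New("concurrency must be at least 1")
		}
		w.concurrency = n
		return nil
	}
}

// newWorker applies each option in order and stops at the first error.
func newWorker(opts ...option) (*worker, error) {
	w := &worker{concurrency: runtime.GOMAXPROCS(0)} // assumed default
	for _, opt := range opts {
		if err := opt(w); err != nil {
			return nil, err
		}
	}
	return w, nil
}

func main() {
	w, err := newWorker(concurrency(2))
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println("concurrency:", w.concurrency)

	if _, err := newWorker(concurrency(0)); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Because options are variadic, callers who are happy with the defaults can omit them entirely.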

func NewArchiver

func NewArchiver(archive *os.File, options ...archiverOption) (*archiver, error)

NewArchiver returns a new pzip archiver. The archiver can be configured by passing in a number of options. Available options include ArchiverConcurrency(n int). It returns an error if the archiver can't be created. Close() should be called on the returned archiver when done.

func NewExtractor

func NewExtractor(outputDir string, options ...extractorOption) (*extractor, error)

NewExtractor returns a new pzip extractor. The extractor can be configured by passing in a number of options. Available options include ExtractorConcurrency(n int). It returns an error if the extractor can't be created. Close() should be called on the returned extractor when done.

Types

type ArchiverCLI

type ArchiverCLI struct {
	ArchivePath string
	Files       []string
	Concurrency int
}

func (*ArchiverCLI) Archive

func (a *ArchiverCLI) Archive(ctx context.Context) error

type ExtendedTimestampExtraField

type ExtendedTimestampExtraField struct {
	// contains filtered or unexported fields
}

ExtendedTimestampExtraField is the extended timestamp field, as defined in the zip specification (see 4.5.3 in https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT).

func NewExtendedTimestampExtraField

func NewExtendedTimestampExtraField(modified time.Time) *ExtendedTimestampExtraField

func (*ExtendedTimestampExtraField) Encode

func (e *ExtendedTimestampExtraField) Encode() []byte

Encode returns the modified time of the associated ExtendedTimestampExtraField as a slice of bytes.

type ExtractorCLI

type ExtractorCLI struct {
	ArchivePath string
	OutputDir   string
	Concurrency int
}

func (*ExtractorCLI) Extract

func (e *ExtractorCLI) Extract(ctx context.Context) error

Directories

Path Synopsis
adapters
cli
cmd
internal
