pack

package module

v0.0.0-...-77e5a9a Latest Latest Go to latest Published: Dec 27, 2023 License: BSD-2-Clause Imports: 5 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/andybalholm/pack

Links

Open Source Insights

README ¶

Pack

Interfaces for LZ77-based data compression.

Introduction

Many compression libraries have two main parts:

Something that looks for repeated sequences of bytes
An encoder for the compressed data format (often an entropy coder)

Although these are logically two separate steps, the implementations are usually closely tied together. You can't use flate's matcher with snappy's encoder, for example. Pack defines interfaces and an intermediate representation to allow mixing and matching compression components.

Interfaces

Pack defines two interfaces, MatchFinder and Encoder, and a Writer type to tie them together and implement io.Writer. Both of the interfaces use an append-like API, where a destination slice is be passed as the first parameter, and the output is appended to it. If you pass nil as the destination, you will get a newly-allocated slice.

// A MatchFinder performs the LZ77 stage of compression, looking for matches.
type MatchFinder interface {
	// FindMatches looks for matches in src, appends them to dst, and returns dst.
	FindMatches(dst []Match, src []byte) []Match

	// Reset clears any internal state, preparing the MatchFinder to be used with
	// a new stream.
	Reset()
}

// An Encoder encodes the data in its final format.
type Encoder interface {
	// Encode appends the encoded format of src to dst, using the match
	// information from matches.
	Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

	// Reset clears any internal state, preparing the Encoder to be used with
	// a new stream.
	Reset()
}

// A Writer uses MatchFinder and Encoder to write compressed data to Dest.
type Writer struct {
	Dest        io.Writer
	MatchFinder MatchFinder
	Encoder     Encoder

	// BlockSize is the number of bytes to compress at a time. If it is zero,
	// each Write operation will be treated as one block.
	BlockSize int
}

Example

Here is an example program that finds repititions in the Go Proverbs, and prints them with human-readable backreferences.

package main

import (
	"fmt"

	"github.com/andybalholm/pack"
	"github.com/andybalholm/pack/flate"
)

const proverbs = `Don't communicate by sharing memory, share memory by communicating.
Concurrency is not parallelism.
Channels orchestrate; mutexes serialize.
The bigger the interface, the weaker the abstraction.
Make the zero value useful.
interface{} says nothing.
Gofmt's style is no one's favorite, yet gofmt is everyone's favorite.
A little copying is better than a little dependency.
Syscall must always be guarded with build tags.
Cgo must always be guarded with build tags.
Cgo is not Go.
With the unsafe package there are no guarantees.
Clear is better than clever.
Reflection is never clear.
Errors are values.
Don't just check errors, handle them gracefully.
Design the architecture, name the components, document the details.
Documentation is for users.
Don't panic.`

func main() {
	var mf flate.DualHash
	var e pack.TextEncoder

	b := []byte(proverbs)
	matches := mf.FindMatches(nil, b)
	out := e.Encode(nil, b, matches, true)

	fmt.Printf("%s\n", out)
}

Implementations

The brotli, flate, and snappy directories contain implementations of Encoder and MatchFinder based on github.com/andybalholm/brotli, compress/flate, and github.com/golang/snappy.

The flate implementation, with BestSpeed, is faster than compress/flate. This demonstrates that although there is some overhead due to using the Pack interfaces, it isn’t so much that you can’t get reasonable compression performance.

Documentation ¶

Overview ¶

The pack package defines interfaces for LZ77-based data compression.

Many compression libraries have two main parts:

Something that looks for repeated sequences of bytes
An encoder for the compressed data format (often an entropy coder)

Although these are logically two separate steps, the implementations are usually closely tied together. You can't use flate's matcher with snappy's encoder, for example. This package defines interfaces and an intermediate representation to allow mixing and matching compression components.

Index ¶

type AbsoluteMatch
type AutoReset
- func (a AutoReset) FindMatches(dst []Match, src []byte) []Match
type DualHash
- func (q *DualHash) FindMatches(dst []Match, src []byte) []Match
- func (q *DualHash) Reset()
- func (q *DualHash) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch
type DualHashAdvancedParsing
- func (q *DualHashAdvancedParsing) FindMatches(dst []Match, src []byte) []Match
- func (q *DualHashAdvancedParsing) Reset()
type Encoder
type GreedyParser
- func (p *GreedyParser) Parse(dst []Match, src Searcher, start, end int) []Match
type HashChain
- func (q *HashChain) FindMatches(dst []Match, src []byte) []Match
- func (q *HashChain) Reset()
- func (q *HashChain) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch
type LazyParser
- func (p *LazyParser) Parse(dst []Match, src Searcher, start, end int) []Match
type Match
type MatchFinder
type NoMatchFinder
- func (n NoMatchFinder) FindMatches(dst []Match, src []byte) []Match
- func (n NoMatchFinder) Reset()
type O2
- func (q *O2) FindMatches(dst []Match, src []byte) []Match
- func (q *O2) Parse(dst []Match, start, end int) []Match
- func (q *O2) Reset()
type O3
- func (q *O3) FindMatches(dst []Match, src []byte) []Match
- func (q *O3) Parse(dst []Match, start, end int) []Match
- func (q *O3) Reset()
type OverlapParser
- func (p *OverlapParser) Parse(dst []Match, src Searcher, start, end int) []Match
type Parser
type Searcher
type SimpleSearchAdvancedParsing
- func (q *SimpleSearchAdvancedParsing) FindMatches(dst []Match, src []byte) []Match
- func (q *SimpleSearchAdvancedParsing) Reset()
type SingleHash
- func (q *SingleHash) FindMatches(dst []Match, src []byte) []Match
- func (q *SingleHash) Reset()
- func (q *SingleHash) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch
type SingleHashGreedy
- func (q *SingleHashGreedy) FindMatches(dst []Match, src []byte) []Match
- func (q *SingleHashGreedy) Parse(dst []Match, start, end int) []Match
- func (q *SingleHashGreedy) Reset()
type SingleHashOverlap
- func (q *SingleHashOverlap) FindMatches(dst []Match, src []byte) []Match
- func (q *SingleHashOverlap) Parse(dst []Match, start, end int) []Match
- func (q *SingleHashOverlap) Reset()
type TextEncoder
- func (t TextEncoder) Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte
- func (t TextEncoder) Reset()
type Writer
- func (w *Writer) Close() error
- func (w *Writer) Reset(newDest io.Writer)
- func (w *Writer) Write(p []byte) (n int, err error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type AbsoluteMatch ¶

type AbsoluteMatch struct {
	// Start is the index of the first byte.
	Start int

	// End is the index of the byte after the last byte
	// (so that End - Start = Length).
	End int

	// Match is the index of the previous data that matches
	// (Start - Match = Distance).
	Match int
}

An AbsoluteMatch is like a Match, but it stores indexes into the byte stream instead of lengths.

type AutoReset ¶

type AutoReset struct {
	MatchFinder
}

AutoReset wraps a MatchFinder that can return references to data in previous blocks, and calls Reset before each block. It is useful for (e.g.) using a snappy Encoder with a MatchFinder designed for flate. (Snappy doesn't support references between blocks.)

func (AutoReset) FindMatches ¶

func (a AutoReset) FindMatches(dst []Match, src []byte) []Match

type DualHash ¶

type DualHash struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	Parser Parser
	// contains filtered or unexported fields
}

DualHash is an implementation of the MatchFinder interface that uses two hash tables (4-byte and 8-byte).

func (*DualHash) FindMatches ¶

func (q *DualHash) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*DualHash) Reset ¶

func (q *DualHash) Reset()

func (*DualHash) Search ¶

func (q *DualHash) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch

type DualHashAdvancedParsing ¶

type DualHashAdvancedParsing struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	// MinLength is the length of the shortest match to return.
	// The default is 4.
	MinLength int
	// contains filtered or unexported fields
}

DualHashAdvancedParsing is an implementation of the MatchFinder interface that uses two hash tables to find matches (with two different hash lengths), and the advanced parsing technique from https://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html, except that it looks for matches at every input position.

func (*DualHashAdvancedParsing) FindMatches ¶

func (q *DualHashAdvancedParsing) FindMatches(dst []Match, src []byte) []Match

func (*DualHashAdvancedParsing) Reset ¶

func (q *DualHashAdvancedParsing) Reset()

type Encoder ¶

type Encoder interface {
	// Encode appends the encoded format of src to dst, using the match
	// information from matches.
	Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

	// Reset clears any internal state, preparing the Encoder to be used with
	// a new stream.
	Reset()
}

An Encoder encodes the data in its final format.

type GreedyParser ¶

type GreedyParser struct {
	// contains filtered or unexported fields
}

A GreedyParser implements the greedy matching strategy: It goes from start to end, choosing the longest match at each position.

func (*GreedyParser) Parse ¶

func (p *GreedyParser) Parse(dst []Match, src Searcher, start, end int) []Match

type HashChain ¶

type HashChain struct {
	// SearchLen is how many entries to examine on the hash chain.
	// The default is 1.
	SearchLen int

	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	Parser Parser
	// contains filtered or unexported fields
}

HashChain is an implementation of the MatchFinder interface that uses hash chaining to find longer matches.

func (*HashChain) FindMatches ¶

func (q *HashChain) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*HashChain) Reset ¶

func (q *HashChain) Reset()

func (*HashChain) Search ¶

func (q *HashChain) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch

type LazyParser ¶

type LazyParser struct {
	// contains filtered or unexported fields
}

A LazyParser implements the lazy matching strategy: Before choosing a match, it checks for a longer match starting at the next byte.

func (*LazyParser) Parse ¶

func (p *LazyParser) Parse(dst []Match, src Searcher, start, end int) []Match

type Match ¶

type Match struct {
	Unmatched int // the number of unmatched bytes since the previous match
	Length    int // the number of bytes in the matched string; it may be 0 at the end of the input
	Distance  int // how far back in the stream to copy from
}

A Match is the basic unit of LZ77 compression.

type MatchFinder ¶

type MatchFinder interface {
	// FindMatches looks for matches in src, appends them to dst, and returns dst.
	FindMatches(dst []Match, src []byte) []Match

	// Reset clears any internal state, preparing the MatchFinder to be used with
	// a new stream.
	Reset()
}

A MatchFinder performs the LZ77 stage of compression, looking for matches.

type NoMatchFinder ¶

type NoMatchFinder struct{}

A NoMatchFinder implements MatchFinder, but doesn't find any matches. It can be used to implement the equivalent of the standard library flate package's HuffmanOnly setting.

func (NoMatchFinder) FindMatches ¶

func (n NoMatchFinder) FindMatches(dst []Match, src []byte) []Match

func (NoMatchFinder) Reset ¶

func (n NoMatchFinder) Reset()

type O2 ¶

type O2 struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int
	// contains filtered or unexported fields
}

O2 is an implementation of the MatchFinder interface that is comparable to Brotli level 2, but uses the overlap parsing strategy.

func (*O2) FindMatches ¶

func (q *O2) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*O2) Parse ¶

func (q *O2) Parse(dst []Match, start, end int) []Match

func (*O2) Reset ¶

func (q *O2) Reset()

type O3 ¶

type O3 struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int
	// contains filtered or unexported fields
}

O3 is an implementation of the MatchFinder interface that is comparable to Brotli level 3, but uses the overlap parsing strategy.

func (*O3) FindMatches ¶

func (q *O3) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*O3) Parse ¶

func (q *O3) Parse(dst []Match, start, end int) []Match

func (*O3) Reset ¶

func (q *O3) Reset()

type OverlapParser ¶

type OverlapParser struct {
	// Score is used to choose the best match. If it is nil,
	// the length of the match is used as its score.
	Score func(AbsoluteMatch) int
	// contains filtered or unexported fields
}

An OverlapParser looks for overlapping matches and chooses the best ones, using an algorithm based on https://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html

func (*OverlapParser) Parse ¶

func (p *OverlapParser) Parse(dst []Match, src Searcher, start, end int) []Match

type Parser ¶

type Parser interface {
	// Parse gets matches from src, chooses which ones to use, and appends
	// them to dst. The matches cover the range of bytes from start to end.
	Parse(dst []Match, src Searcher, start, end int) []Match
}

A Parser chooses which matches to use to compress the data.

type Searcher ¶

type Searcher interface {
	// Search looks for matches at pos and appends them to dst.
	// In each match, Start and End must fall within the interval [min,max),
	// and Match < Start < End.
	Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch
}

A Searcher is the source of matches for a Parser. It is a lower-level interface than MatchFinder, only looking for matches at one position at a time. A type that uses a Parser to implement MatchFinder can implement Searcher as well, and pass itself to the Parser.

type SimpleSearchAdvancedParsing ¶

type SimpleSearchAdvancedParsing struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	// MinLength is the length of the shortest match to return.
	// The default is 4.
	MinLength int

	// HashLen is the number of bytes to use to calculate the hashes.
	// The maximum is 8 and the default is 6.
	HashLen int
	// contains filtered or unexported fields
}

SimpleSearchAdvancedParsing is an implementation of the MatchFinder interface that uses a simple hash table to find matches, but the advanced parsing technique from https://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html, except that it looks for matches at every input position.

func (*SimpleSearchAdvancedParsing) FindMatches ¶

func (q *SimpleSearchAdvancedParsing) FindMatches(dst []Match, src []byte) []Match

func (*SimpleSearchAdvancedParsing) Reset ¶

func (q *SimpleSearchAdvancedParsing) Reset()

type SingleHash ¶

type SingleHash struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	Parser Parser
	// contains filtered or unexported fields
}

SingleHash is an implementation of the MatchFinder interface that uses a simple 4-byte hash to find matches.

func (*SingleHash) FindMatches ¶

func (q *SingleHash) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*SingleHash) Reset ¶

func (q *SingleHash) Reset()

func (*SingleHash) Search ¶

func (q *SingleHash) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch

type SingleHashGreedy ¶

type SingleHashGreedy struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int
	// contains filtered or unexported fields
}

SingleHashGreedy is an implementation of the MatchFinder interface that is like SingleHash with an GreedyParser inlined.

func (*SingleHashGreedy) FindMatches ¶

func (q *SingleHashGreedy) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*SingleHashGreedy) Parse ¶

func (q *SingleHashGreedy) Parse(dst []Match, start, end int) []Match

func (*SingleHashGreedy) Reset ¶

func (q *SingleHashGreedy) Reset()

type SingleHashOverlap ¶

type SingleHashOverlap struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int
	// contains filtered or unexported fields
}

SingleHashOverlap is an implementation of the MatchFinder interface that is like SingleHash with an OverlapParser inlined.

func (*SingleHashOverlap) FindMatches ¶

func (q *SingleHashOverlap) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*SingleHashOverlap) Parse ¶

func (q *SingleHashOverlap) Parse(dst []Match, start, end int) []Match

func (*SingleHashOverlap) Reset ¶

func (q *SingleHashOverlap) Reset()

type TextEncoder ¶

type TextEncoder struct{}

A TextEncoder is an Encoder that produces a human-readable representation of the LZ77 compression. Matches are replaced with <Length,Distance> symbols.

func (TextEncoder) Encode ¶

func (t TextEncoder) Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

func (TextEncoder) Reset ¶

func (t TextEncoder) Reset()

type Writer ¶

type Writer struct {
	Dest        io.Writer
	MatchFinder MatchFinder
	Encoder     Encoder

	// BlockSize is the number of bytes to compress at a time. If it is zero,
	// each Write operation will be treated as one block.
	BlockSize int
	// contains filtered or unexported fields
}

A Writer uses MatchFinder and Encoder to write compressed data to Dest.

func (*Writer) Close ¶

func (w *Writer) Close() error

func (*Writer) Reset ¶

func (w *Writer) Reset(newDest io.Writer)

func (*Writer) Write ¶

func (w *Writer) Write(p []byte) (n int, err error)

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
brotli
flate
lz4
snappy
zstd

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL