pack

package module v0.0.0-...-77e5a9a
Published: Dec 27, 2023 License: BSD-2-Clause Imports: 5 Imported by: 0

README

Pack

Interfaces for LZ77-based data compression.

Introduction

Many compression libraries have two main parts:

  • Something that looks for repeated sequences of bytes
  • An encoder for the compressed data format (often an entropy coder)

Although these are logically two separate steps, the implementations are usually closely tied together. You can't use flate's matcher with snappy's encoder, for example. Pack defines interfaces and an intermediate representation to allow mixing and matching compression components.

Interfaces

Pack defines two interfaces, MatchFinder and Encoder, and a Writer type to tie them together and implement io.Writer. Both of the interfaces use an append-like API, where a destination slice is passed as the first parameter and the output is appended to it. If you pass nil as the destination, you will get a newly-allocated slice.

// A MatchFinder performs the LZ77 stage of compression, looking for matches.
type MatchFinder interface {
	// FindMatches looks for matches in src, appends them to dst, and returns dst.
	FindMatches(dst []Match, src []byte) []Match

	// Reset clears any internal state, preparing the MatchFinder to be used with
	// a new stream.
	Reset()
}

// An Encoder encodes the data in its final format.
type Encoder interface {
	// Encode appends the encoded format of src to dst, using the match
	// information from matches.
	Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

	// Reset clears any internal state, preparing the Encoder to be used with
	// a new stream.
	Reset()
}

// A Writer uses MatchFinder and Encoder to write compressed data to Dest.
type Writer struct {
	Dest        io.Writer
	MatchFinder MatchFinder
	Encoder     Encoder

	// BlockSize is the number of bytes to compress at a time. If it is zero,
	// each Write operation will be treated as one block.
	BlockSize int
}

Example

Here is an example program that finds repetitions in the Go Proverbs, and prints them with human-readable backreferences.

package main

import (
	"fmt"

	"github.com/andybalholm/pack"
	"github.com/andybalholm/pack/flate"
)

const proverbs = `Don't communicate by sharing memory, share memory by communicating.
Concurrency is not parallelism.
Channels orchestrate; mutexes serialize.
The bigger the interface, the weaker the abstraction.
Make the zero value useful.
interface{} says nothing.
Gofmt's style is no one's favorite, yet gofmt is everyone's favorite.
A little copying is better than a little dependency.
Syscall must always be guarded with build tags.
Cgo must always be guarded with build tags.
Cgo is not Go.
With the unsafe package there are no guarantees.
Clear is better than clever.
Reflection is never clear.
Errors are values.
Don't just check errors, handle them gracefully.
Design the architecture, name the components, document the details.
Documentation is for users.
Don't panic.`

func main() {
	var mf flate.DualHash
	var e pack.TextEncoder

	b := []byte(proverbs)
	matches := mf.FindMatches(nil, b)
	out := e.Encode(nil, b, matches, true)

	fmt.Printf("%s\n", out)
}

Implementations

The brotli, flate, and snappy directories contain implementations of Encoder and MatchFinder based on github.com/andybalholm/brotli, compress/flate, and github.com/golang/snappy.

The flate implementation, with BestSpeed, is faster than compress/flate. This demonstrates that although the Pack interfaces add some overhead, it is small enough that competitive compression performance is still achievable.

Documentation

Overview

The pack package defines interfaces for LZ77-based data compression.

Many compression libraries have two main parts:

  • Something that looks for repeated sequences of bytes
  • An encoder for the compressed data format (often an entropy coder)

Although these are logically two separate steps, the implementations are usually closely tied together. You can't use flate's matcher with snappy's encoder, for example. This package defines interfaces and an intermediate representation to allow mixing and matching compression components.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AbsoluteMatch

type AbsoluteMatch struct {
	// Start is the index of the first byte.
	Start int

	// End is the index of the byte after the last byte
	// (so that End - Start = Length).
	End int

	// Match is the index of the previous data that matches
	// (Start - Match = Distance).
	Match int
}

An AbsoluteMatch is like a Match, but it stores indexes into the byte stream instead of lengths.

type AutoReset

type AutoReset struct {
	MatchFinder
}

AutoReset wraps a MatchFinder that can return references to data in previous blocks, and calls Reset before each block. It is useful for (e.g.) using a snappy Encoder with a MatchFinder designed for flate. (Snappy doesn't support references between blocks.)
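Based on that description, the wrapper's behavior can be sketched as follows. This is a plausible self-contained reconstruction, not the package's actual source; countingFinder is a made-up MatchFinder used only to observe the Reset calls:

```go
package main

import "fmt"

type Match struct{ Unmatched, Length, Distance int }

// MatchFinder mirrors the documented interface.
type MatchFinder interface {
	FindMatches(dst []Match, src []byte) []Match
	Reset()
}

// AutoReset resets the wrapped MatchFinder before each block, so
// no match can refer back to data from a previous block.
type AutoReset struct {
	MatchFinder
}

func (a AutoReset) FindMatches(dst []Match, src []byte) []Match {
	a.MatchFinder.Reset()
	return a.MatchFinder.FindMatches(dst, src)
}

// countingFinder records how many times Reset is called.
type countingFinder struct{ resets int }

func (c *countingFinder) FindMatches(dst []Match, src []byte) []Match { return dst }
func (c *countingFinder) Reset()                                      { c.resets++ }

func main() {
	cf := &countingFinder{}
	ar := AutoReset{cf}
	ar.FindMatches(nil, []byte("block one"))
	ar.FindMatches(nil, []byte("block two"))
	fmt.Println(cf.resets) // 2
}
```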

func (AutoReset) FindMatches

func (a AutoReset) FindMatches(dst []Match, src []byte) []Match

type DualHash

type DualHash struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	Parser Parser
	// contains filtered or unexported fields
}

DualHash is an implementation of the MatchFinder interface that uses two hash tables (4-byte and 8-byte).

func (*DualHash) FindMatches

func (q *DualHash) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*DualHash) Reset

func (q *DualHash) Reset()

func (*DualHash) Search

func (q *DualHash) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch

type DualHashAdvancedParsing

type DualHashAdvancedParsing struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	// MinLength is the length of the shortest match to return.
	// The default is 4.
	MinLength int
	// contains filtered or unexported fields
}

DualHashAdvancedParsing is an implementation of the MatchFinder interface that uses two hash tables to find matches (with two different hash lengths), and the advanced parsing technique from https://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html, except that it looks for matches at every input position.

func (*DualHashAdvancedParsing) FindMatches

func (q *DualHashAdvancedParsing) FindMatches(dst []Match, src []byte) []Match

func (*DualHashAdvancedParsing) Reset

func (q *DualHashAdvancedParsing) Reset()

type Encoder

type Encoder interface {
	// Encode appends the encoded format of src to dst, using the match
	// information from matches.
	Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

	// Reset clears any internal state, preparing the Encoder to be used with
	// a new stream.
	Reset()
}

An Encoder encodes the data in its final format.

type GreedyParser

type GreedyParser struct {
	// contains filtered or unexported fields
}

A GreedyParser implements the greedy matching strategy: It goes from start to end, choosing the longest match at each position.

func (*GreedyParser) Parse

func (p *GreedyParser) Parse(dst []Match, src Searcher, start, end int) []Match

type HashChain

type HashChain struct {
	// SearchLen is how many entries to examine on the hash chain.
	// The default is 1.
	SearchLen int

	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	Parser Parser
	// contains filtered or unexported fields
}

HashChain is an implementation of the MatchFinder interface that uses hash chaining to find longer matches.

func (*HashChain) FindMatches

func (q *HashChain) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*HashChain) Reset

func (q *HashChain) Reset()

func (*HashChain) Search

func (q *HashChain) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch

type LazyParser

type LazyParser struct {
	// contains filtered or unexported fields
}

A LazyParser implements the lazy matching strategy: Before choosing a match, it checks for a longer match starting at the next byte.

func (*LazyParser) Parse

func (p *LazyParser) Parse(dst []Match, src Searcher, start, end int) []Match

type Match

type Match struct {
	Unmatched int // the number of unmatched bytes since the previous match
	Length    int // the number of bytes in the matched string; it may be 0 at the end of the input
	Distance  int // how far back in the stream to copy from
}

A Match is the basic unit of LZ77 compression.
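To make the representation concrete, here is a hypothetical decoder (not part of this package) that reconstructs the original data, under the assumption that each Match consumes Unmatched literal bytes and then copies Length bytes starting Distance back in the output:

```go
package main

import "fmt"

type Match struct{ Unmatched, Length, Distance int }

// decode reconstructs data from the unmatched literal bytes and a
// list of LZ77 matches.
func decode(literals []byte, matches []Match) []byte {
	var out []byte
	for _, m := range matches {
		// Copy the literal bytes that precede this match.
		out = append(out, literals[:m.Unmatched]...)
		literals = literals[m.Unmatched:]
		// Copy Length bytes starting Distance back in the output.
		// Byte-by-byte copying handles overlapping matches
		// (Distance < Length), which repeat recent data.
		for i := 0; i < m.Length; i++ {
			out = append(out, out[len(out)-m.Distance])
		}
	}
	return append(out, literals...)
}

func main() {
	// 3 literal bytes "abc", then copy 6 bytes from 3 back: "abcabcabc".
	matches := []Match{{Unmatched: 3, Length: 6, Distance: 3}}
	fmt.Printf("%s\n", decode([]byte("abc"), matches)) // abcabcabc
}
```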

type MatchFinder

type MatchFinder interface {
	// FindMatches looks for matches in src, appends them to dst, and returns dst.
	FindMatches(dst []Match, src []byte) []Match

	// Reset clears any internal state, preparing the MatchFinder to be used with
	// a new stream.
	Reset()
}

A MatchFinder performs the LZ77 stage of compression, looking for matches.

type NoMatchFinder

type NoMatchFinder struct{}

A NoMatchFinder implements MatchFinder, but doesn't find any matches. It can be used to implement the equivalent of the standard library flate package's HuffmanOnly setting.

func (NoMatchFinder) FindMatches

func (n NoMatchFinder) FindMatches(dst []Match, src []byte) []Match

func (NoMatchFinder) Reset

func (n NoMatchFinder) Reset()

type O2

type O2 struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int
	// contains filtered or unexported fields
}

O2 is an implementation of the MatchFinder interface that is comparable to Brotli level 2, but uses the overlap parsing strategy.

func (*O2) FindMatches

func (q *O2) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*O2) Parse

func (q *O2) Parse(dst []Match, start, end int) []Match

func (*O2) Reset

func (q *O2) Reset()

type O3

type O3 struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int
	// contains filtered or unexported fields
}

O3 is an implementation of the MatchFinder interface that is comparable to Brotli level 3, but uses the overlap parsing strategy.

func (*O3) FindMatches

func (q *O3) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*O3) Parse

func (q *O3) Parse(dst []Match, start, end int) []Match

func (*O3) Reset

func (q *O3) Reset()

type OverlapParser

type OverlapParser struct {
	// Score is used to choose the best match. If it is nil,
	// the length of the match is used as its score.
	Score func(AbsoluteMatch) int
	// contains filtered or unexported fields
}

An OverlapParser looks for overlapping matches and chooses the best ones, using an algorithm based on https://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html.

func (*OverlapParser) Parse

func (p *OverlapParser) Parse(dst []Match, src Searcher, start, end int) []Match

type Parser

type Parser interface {
	// Parse gets matches from src, chooses which ones to use, and appends
	// them to dst. The matches cover the range of bytes from start to end.
	Parse(dst []Match, src Searcher, start, end int) []Match
}

A Parser chooses which matches to use to compress the data.

type Searcher

type Searcher interface {
	// Search looks for matches at pos and appends them to dst.
	// In each match, Start and End must fall within the interval [min,max),
	// and Match < Start < End.
	Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch
}

A Searcher is the source of matches for a Parser. It is a lower-level interface than MatchFinder, only looking for matches at one position at a time. A type that uses a Parser to implement MatchFinder can implement Searcher as well, and pass itself to the Parser.

type SimpleSearchAdvancedParsing

type SimpleSearchAdvancedParsing struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	// MinLength is the length of the shortest match to return.
	// The default is 4.
	MinLength int

	// HashLen is the number of bytes to use to calculate the hashes.
	// The maximum is 8 and the default is 6.
	HashLen int
	// contains filtered or unexported fields
}

SimpleSearchAdvancedParsing is an implementation of the MatchFinder interface that uses a simple hash table to find matches, combined with the advanced parsing technique from https://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html, except that it looks for matches at every input position.

func (*SimpleSearchAdvancedParsing) FindMatches

func (q *SimpleSearchAdvancedParsing) FindMatches(dst []Match, src []byte) []Match

func (*SimpleSearchAdvancedParsing) Reset

func (q *SimpleSearchAdvancedParsing) Reset()

type SingleHash

type SingleHash struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	Parser Parser
	// contains filtered or unexported fields
}

SingleHash is an implementation of the MatchFinder interface that uses a simple 4-byte hash to find matches.

func (*SingleHash) FindMatches

func (q *SingleHash) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*SingleHash) Reset

func (q *SingleHash) Reset()

func (*SingleHash) Search

func (q *SingleHash) Search(dst []AbsoluteMatch, pos, min, max int) []AbsoluteMatch

type SingleHashGreedy

type SingleHashGreedy struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int
	// contains filtered or unexported fields
}

SingleHashGreedy is an implementation of the MatchFinder interface that is like SingleHash with a GreedyParser inlined.

func (*SingleHashGreedy) FindMatches

func (q *SingleHashGreedy) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*SingleHashGreedy) Parse

func (q *SingleHashGreedy) Parse(dst []Match, start, end int) []Match

func (*SingleHashGreedy) Reset

func (q *SingleHashGreedy) Reset()

type SingleHashOverlap

type SingleHashOverlap struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int
	// contains filtered or unexported fields
}

SingleHashOverlap is an implementation of the MatchFinder interface that is like SingleHash with an OverlapParser inlined.

func (*SingleHashOverlap) FindMatches

func (q *SingleHashOverlap) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst.

func (*SingleHashOverlap) Parse

func (q *SingleHashOverlap) Parse(dst []Match, start, end int) []Match

func (*SingleHashOverlap) Reset

func (q *SingleHashOverlap) Reset()

type TextEncoder

type TextEncoder struct{}

A TextEncoder is an Encoder that produces a human-readable representation of the LZ77 compression. Matches are replaced with <Length,Distance> symbols.
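The output format described above can be sketched with a self-contained function (an illustration of the documented format, not the package's implementation):

```go
package main

import "fmt"

type Match struct{ Unmatched, Length, Distance int }

// textEncode passes literal bytes through unchanged and replaces each
// match with a human-readable <Length,Distance> symbol, mimicking the
// documented TextEncoder output format.
func textEncode(dst []byte, src []byte, matches []Match) []byte {
	pos := 0
	for _, m := range matches {
		dst = append(dst, src[pos:pos+m.Unmatched]...)
		pos += m.Unmatched
		if m.Length > 0 {
			dst = append(dst, fmt.Sprintf("<%d,%d>", m.Length, m.Distance)...)
			pos += m.Length
		}
	}
	return append(dst, src[pos:]...)
}

func main() {
	src := []byte("abcabcabc")
	matches := []Match{{Unmatched: 3, Length: 6, Distance: 3}}
	fmt.Printf("%s\n", textEncode(nil, src, matches)) // abc<6,3>
}
```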

func (TextEncoder) Encode

func (t TextEncoder) Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

func (TextEncoder) Reset

func (t TextEncoder) Reset()

type Writer

type Writer struct {
	Dest        io.Writer
	MatchFinder MatchFinder
	Encoder     Encoder

	// BlockSize is the number of bytes to compress at a time. If it is zero,
	// each Write operation will be treated as one block.
	BlockSize int
	// contains filtered or unexported fields
}

A Writer uses MatchFinder and Encoder to write compressed data to Dest.

func (*Writer) Close

func (w *Writer) Close() error

func (*Writer) Reset

func (w *Writer) Reset(newDest io.Writer)

func (*Writer) Write

func (w *Writer) Write(p []byte) (n int, err error)

