matchfinder

package
v0.0.0-...-89700a6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 19, 2024 License: MIT Imports: 5 Imported by: 1

Documentation

Overview

The matchfinder package defines reusable components for data compression.

Many compression libraries have two main parts:

  • Something that looks for repeated sequences of bytes
  • An encoder for the compressed data format (often an entropy coder)

Although these are logically two separate steps, the implementations are usually closely tied together. You can't use flate's matcher with snappy's encoder, for example. This package defines interfaces and an intermediate representation to allow mixing and matching compression components.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AutoReset

type AutoReset struct {
	MatchFinder
}

AutoReset wraps a MatchFinder that can return references to data in previous blocks, and calls Reset before each block. It is useful for (e.g.) using a snappy Encoder with a MatchFinder designed for flate. (Snappy doesn't support references between blocks.)

func (AutoReset) FindMatches

func (a AutoReset) FindMatches(dst []Match, src []byte) []Match

type Encoder

type Encoder interface {
	// Encode appends the encoded format of src to dst, using the match
	// information from matches.
	Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

	// Reset clears any internal state, preparing the Encoder to be used with
	// a new stream.
	Reset()
}

An Encoder encodes the data in its final format.

type M0

type M0 struct {
	// Lazy turns on "lazy matching," for higher compression but less speed.
	Lazy bool

	MaxDistance int
	MaxLength   int
}

M0 is an implementation of the MatchFinder interface based on the algorithm used by snappy, but modified to be more like the algorithm used by compression level 0 of the brotli reference implementation.

It has a maximum block size of 65536 bytes.

func (M0) FindMatches

func (m M0) FindMatches(dst []Match, src []byte) []Match

FindMatches looks for matches in src, appends them to dst, and returns dst. src must not be longer than 65536 bytes.

func (M0) Reset

func (M0) Reset()

type M4

type M4 struct {
	// MaxDistance is the maximum distance (in bytes) to look back for
	// a match. The default is 65535.
	MaxDistance int

	// MinLength is the length of the shortest match to return.
	// The default is 4.
	MinLength int

	// HashLen is the number of bytes to use to calculate the hashes.
	// The maximum is 8 and the default is 6.
	HashLen int

	// TableBits is the number of bits in the hash table indexes.
	// The default is 17 (128K entries).
	TableBits int

	// ChainLength is how many entries to search on the "match chain" of older
	// locations with the same hash as the current location.
	ChainLength int

	// DistanceBitCost is used when comparing two matches to see
	// which is better. The comparison is primarily based on the length
	// of the matches, but it can also take the distance into account,
	// in terms of the number of bits needed to represent the distance.
	// One byte of length is given a score of 256, so 32 (256/8) would
	// be a reasonable first guess for the value of one bit.
	// (The default is 0, which bases the comparison solely on length.)
	DistanceBitCost int
	// contains filtered or unexported fields
}

M4 is an implementation of the MatchFinder interface that uses a hash table to find matches, optional match chains, and the advanced parsing technique from https://fastcompression.blogspot.com/2011/12/advanced-parsing-strategies.html.

func (*M4) FindMatches

func (q *M4) FindMatches(dst []Match, src []byte) []Match

func (*M4) Reset

func (q *M4) Reset()

type Match

type Match struct {
	Unmatched int // the number of unmatched bytes since the previous match
	Length    int // the number of bytes in the matched string; it may be 0 at the end of the input
	Distance  int // how far back in the stream to copy from
}

A Match is the basic unit of LZ77 compression.

type MatchFinder

type MatchFinder interface {
	// FindMatches looks for matches in src, appends them to dst, and returns dst.
	FindMatches(dst []Match, src []byte) []Match

	// Reset clears any internal state, preparing the MatchFinder to be used with
	// a new stream.
	Reset()
}

A MatchFinder performs the LZ77 stage of compression, looking for matches.

type NoMatchFinder

type NoMatchFinder struct{}

A NoMatchFinder implements MatchFinder, but doesn't find any matches. It can be used to implement the equivalent of the standard library flate package's HuffmanOnly setting.

func (NoMatchFinder) FindMatches

func (n NoMatchFinder) FindMatches(dst []Match, src []byte) []Match

func (NoMatchFinder) Reset

func (n NoMatchFinder) Reset()

type TextEncoder

type TextEncoder struct{}

A TextEncoder is an Encoder that produces a human-readable representation of the LZ77 compression. Matches are replaced with <Length,Distance> symbols.

func (TextEncoder) Encode

func (t TextEncoder) Encode(dst []byte, src []byte, matches []Match, lastBlock bool) []byte

func (TextEncoder) Reset

func (t TextEncoder) Reset()

type Writer

type Writer struct {
	Dest        io.Writer
	MatchFinder MatchFinder
	Encoder     Encoder

	// BlockSize is the number of bytes to compress at a time. If it is zero,
	// each Write operation will be treated as one block.
	BlockSize int
	// contains filtered or unexported fields
}

A Writer uses MatchFinder and Encoder to write compressed data to Dest.

func (*Writer) Close

func (w *Writer) Close() error

func (*Writer) Flush

func (w *Writer) Flush() error

func (*Writer) Reset

func (w *Writer) Reset(newDest io.Writer)

func (*Writer) Write

func (w *Writer) Write(p []byte) (n int, err error)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL