huff0

package
v0.0.6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 4, 2023 License: Apache-2.0, BSD-3-Clause, MIT Imports: 10 Imported by: 0

README

Huff0 entropy compression

This package provides Huff0 encoding and decoding as used in zstd.

Huff0, a Huffman codec designed for modern CPU, featuring OoO (Out of Order) operations on multiple ALU (Arithmetic Logic Unit), achieving extremely fast compression and decompression speeds.

This can be used for compressing input with a lot of similar input values to the smallest number of bytes. This does not perform any multi-byte dictionary coding as LZ coders, but it can be used as a secondary step to compressors (like Snappy) that does not do entropy encoding.

News

This is used as part of the zstandard compression and decompression package.

This ensures that most functionality is well tested.

Usage

This package provides a low level interface that allows to compress single independent blocks.

Each block is separate, and there is no built in integrity checks. This means that the caller should keep track of block sizes and also do checksums if needed.

Compressing a block is done via the Compress1X and Compress4X functions. You must provide input and will receive the output and maybe an error.

These error values can be returned:

Error Description
<nil> Everything ok, output is returned
ErrIncompressible Returned when input is judged to be too hard to compress
ErrUseRLE Returned from the compressor when the input is a single byte value repeated
ErrTooBig Returned if the input block exceeds the maximum allowed size (128 Kib)
(error) An internal error occurred.

As can be seen above some of there are errors that will be returned even under normal operation so it is important to handle these.

To reduce allocations you can provide a Scratch object that can be re-used for successive calls. Both compression and decompression accepts a Scratch object, and the same object can be used for both.

Be aware, that when re-using a Scratch object that the output buffer is also re-used, so if you are still using this you must set the Out field in the scratch to nil. The same buffer is used for compression and decompression output.

The Scratch object will retain state that allows to re-use previous tables for encoding and decoding.

Tables and re-use

Huff0 allows for reusing tables from the previous block to save space if that is expected to give better/faster results.

The Scratch object allows you to set a ReusePolicy that controls this behaviour. See the documentation for details. This can be altered between each block.

Do however note that this information is not stored in the output block and it is up to the users of the package to record whether ReadTable should be called, based on the boolean reported back from the CompressXX call.

If you want to store the table separate from the data, you can access them as OutData and OutTable on the Scratch object.

Decompressing

The first part of decoding is to initialize the decoding table through ReadTable. This will initialize the decoding tables. You can supply the complete block to ReadTable and it will return the data part of the block which can be given to the decompressor.

Decompressing is done by calling the Decompress1X or Decompress4X function.

For concurrently decompressing content with a fixed table a stateless Decoder can be requested which will remain correct as long as the scratch is unchanged. The capacity of the provided slice indicates the expected output size.

You must provide the output from the compression stage, at exactly the size you got back. If you receive an error back your input was likely corrupted.

It is important to note that a successful decoding does not mean your output matches your original input. There are no integrity checks, so relying on errors from the decompressor does not assure your data is valid.

Contributing

Contributions are always welcome. Be aware that adding public functions will require good justification and breaking changes will likely not be accepted. If in doubt open an issue before writing the PR.

Documentation

Overview

This file contains the specialisation of Decoder.Decompress4X and Decoder.Decompress1X that use an asm implementation of thir main loops.

Package huff0 provides fast huffman encoding as used in zstd.

See README.md at https://github.com/klauspost/compress/tree/master/huff0 for details.

Index

Constants

View Source
const (

	// BlockSizeMax is maximum input size for a single block uncompressed.
	BlockSizeMax = 1<<18 - 1
)

Variables

View Source
var (
	// ErrIncompressible is returned when input is judged to be too hard to compress.
	ErrIncompressible = errors.New("input is not compressible")

	// ErrUseRLE is returned from the compressor when the input is a single byte value repeated.
	ErrUseRLE = errors.New("input is single value repeated")

	// ErrTooBig is return if input is too large for a single block.
	ErrTooBig = errors.New("input too big")

	// ErrMaxDecodedSizeExceeded is return if input is too large for a single block.
	ErrMaxDecodedSizeExceeded = errors.New("maximum output size exceeded")
)

Functions

func Compress1X

func Compress1X(in []byte, s *Scratch) (out []byte, reUsed bool, err error)

Compress1X will compress the input. The output can be decoded using Decompress1X. Supply a Scratch object. The scratch object contains state about re-use, So when sharing across independent encodes, be sure to set the re-use policy.

func Compress4X

func Compress4X(in []byte, s *Scratch) (out []byte, reUsed bool, err error)

Compress4X will compress the input. The input is split into 4 independent blocks and compressed similar to Compress1X. The output can be decoded using Decompress4X. Supply a Scratch object. The scratch object contains state about re-use, So when sharing across independent encodes, be sure to set the re-use policy.

func EstimateSizes

func EstimateSizes(in []byte, s *Scratch) (tableSz, dataSz, reuseSz int, err error)

EstimateSizes will estimate the data sizes

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder provides stateless decoding.

func (*Decoder) Decompress1X

func (d *Decoder) Decompress1X(dst, src []byte) ([]byte, error)

Decompress1X will decompress a 1X encoded stream. The cap of the output buffer will be the maximum decompressed size. The length of the supplied input must match the end of a block exactly.

func (*Decoder) Decompress4X

func (d *Decoder) Decompress4X(dst, src []byte) ([]byte, error)

Decompress4X will decompress a 4X encoded stream. The length of the supplied input must match the end of a block exactly. The *capacity* of the dst slice must match the destination size of the uncompressed data exactly.

type ReusePolicy

type ReusePolicy uint8
const (
	// ReusePolicyAllow will allow reuse if it produces smaller output.
	ReusePolicyAllow ReusePolicy = iota

	// ReusePolicyPrefer will re-use aggressively if possible.
	// This will not check if a new table will produce smaller output,
	// except if the current table is impossible to use or
	// compressed output is bigger than input.
	ReusePolicyPrefer

	// ReusePolicyNone will disable re-use of tables.
	// This is slightly faster than ReusePolicyAllow but may produce larger output.
	ReusePolicyNone

	// ReusePolicyMust must allow reuse and produce smaller output.
	ReusePolicyMust
)

type Scratch

type Scratch struct {

	// Out is output buffer.
	// If the scratch is re-used before the caller is done processing the output,
	// set this field to nil.
	// Otherwise the output buffer will be re-used for next Compression/Decompression step
	// and allocation will be avoided.
	Out []byte

	// OutTable will contain the table data only, if a new table has been generated.
	// Slice of the returned data.
	OutTable []byte

	// OutData will contain the compressed data.
	// Slice of the returned data.
	OutData []byte

	// MaxDecodedSize will set the maximum allowed output size.
	// This value will automatically be set to BlockSizeMax if not set.
	// Decoders will return ErrMaxDecodedSizeExceeded is this limit is exceeded.
	MaxDecodedSize int

	// MaxSymbolValue will override the maximum symbol value of the next block.
	MaxSymbolValue uint8

	// TableLog will attempt to override the tablelog for the next block.
	// Must be <= 11 and >= 5.
	TableLog uint8

	// Reuse will specify the reuse policy
	Reuse ReusePolicy

	// WantLogLess allows to specify a log 2 reduction that should at least be achieved,
	// otherwise the block will be returned as incompressible.
	// The reduction should then at least be (input size >> WantLogLess)
	// If WantLogLess == 0 any improvement will do.
	WantLogLess uint8
	// contains filtered or unexported fields
}

func ReadTable

func ReadTable(in []byte, s *Scratch) (s2 *Scratch, remain []byte, err error)

ReadTable will read a table from the input. The size of the input may be larger than the table definition. Any content remaining after the table definition will be returned. If no Scratch is provided a new one is allocated. The returned Scratch can be used for encoding or decoding input using this table.

func (*Scratch) Decoder

func (s *Scratch) Decoder() *Decoder

Decoder will return a stateless decoder that can be used by multiple decompressors concurrently. Before this is called, the table must be initialized with ReadTable. The Decoder is still linked to the scratch buffer so that cannot be reused. However, it is safe to discard the scratch.

func (*Scratch) Decompress1X

func (s *Scratch) Decompress1X(in []byte) (out []byte, err error)

Decompress1X will decompress a 1X encoded stream. The length of the supplied input must match the end of a block exactly. Before this is called, the table must be initialized with ReadTable unless the encoder re-used the table. deprecated: Use the stateless Decoder() to get a concurrent version.

func (*Scratch) Decompress4X

func (s *Scratch) Decompress4X(in []byte, dstSize int) (out []byte, err error)

Decompress4X will decompress a 4X encoded stream. Before this is called, the table must be initialized with ReadTable unless the encoder re-used the table. The length of the supplied input must match the end of a block exactly. The destination size of the uncompressed data must be known and provided. deprecated: Use the stateless Decoder() to get a concurrent version.

func (*Scratch) TransferCTable

func (s *Scratch) TransferCTable(src *Scratch)

TransferCTable will transfer the previously used compression table.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL