libdeflate

v2.0.2+incompatible
Published: Nov 7, 2020 License: MIT Imports: 2 Imported by: 0

README

libdeflate for go

This library wraps the libdeflate zlib-, gzip- and deflate-(de)compression library for Go, using cgo.

It is significantly faster than Go's standard compress/zlib/gzip/flate libraries (see benchmarks) at the expense of not being able to stream data. Therefore, this library is optimal for the special use case of (de)compressing whole-buffered in-memory data: If it fits into your RAM, this library can (de)compress it much faster than the standard libraries can.

Features

  • Super fast zlib, gzip, and raw deflate compression / decompression
  • Convenience functions for quicker one-time compression / decompression
  • More (zlib/gzip/flate compatible) compression levels for better compression ratios than with standard zlib/gzip/flate
  • Simple and clean API
Availability of the original libdeflate API:
  • zlib/gzip/deflate compression
  • zlib/gzip/deflate decompression
  • Definite upper bound of compressed size
  • Decompression w/ info about number of consumed bytes
  • adler32 and crc32 checksums
  • Custom memory allocator: no implementation planned, as it has too little relevance for a high-level Go API

Installation

Prerequisites (working cgo)

In order to use this library with your Go source code, you must be able to use the Go tool cgo, which, in turn, requires a GCC compiler.

If you are on Linux, there is a good chance you already have GCC installed, otherwise just get it with your favorite package manager.

If you are on MacOS, Xcode - for instance - supplies the required tools.

If you are on Windows, you will need to install GCC. I can recommend tdm-gcc, which is based on MinGW. Please note that cgo requires the 64-bit version (as stated here).

For any other operating system, the procedure should be about the same. A quick web search should turn up the details.

TL;DR: Get cgo working.

Download and Installation

If you want to build for GOARCH=amd64 on Windows, Linux or MacOS, just go get this library and everything will work right away.

(You may also use Go modules (available since Go 1.11) to get the version of a specific branch or tag if you want to try out or use experimental features. However, beware that these versions are not necessarily guaranteed to be stable or thoroughly tested.)

Instructions for building for different GOOS,GOARCH

First of all, building for non-64-bit architectures is discouraged, as this library works best on 64-bit systems.

A list of possible GOOS,GOARCH combinations can be viewed here.

Instructions:

  1. You will need to compile and build the C libdeflate library for your target system by cloning the repository and executing the Makefile (specifying the build). You should always use GCC for compilation as this produces the fastest libraries.

  2. Step 1 will yield compiled library files. You are going to want to use the static library (usually ending with .a [in case of windows, rename .lib to .a]), give it an adequate name (like libdeflate_GOOS_GOARCH.a) and copy it to the native/libs folder of this library.

  3. Go to the native/cgo.go file, which should roughly look like this:

package native

/*
#cgo CFLAGS: -I${SRCDIR}/libs/
#cgo windows,amd64 LDFLAGS: ${SRCDIR}/libs/libdeflate_windows_amd64.a 
#cgo linux,amd64 LDFLAGS: ${SRCDIR}/libs/libdeflate_linux_amd64.a
#cgo darwin,amd64 LDFLAGS: ${SRCDIR}/libs/libdeflate_darwin_amd64.a
*/
import "C"

Now you want to add your build of libdeflate to the cgo directives, more specifically, to the linker flags, like this (omit the '+'):

package native

/*
#cgo CFLAGS: -I${SRCDIR}/libs/
+#cgo GOOS,GOARCH LDFLAGS: ${SRCDIR}/libs/libdeflate_GOOS_GOARCH.a
#cgo windows,amd64 LDFLAGS: ${SRCDIR}/libs/libdeflate_windows_amd64.a
#cgo linux,amd64 LDFLAGS: ${SRCDIR}/libs/libdeflate_linux_amd64.a
#cgo darwin,amd64 LDFLAGS: ${SRCDIR}/libs/libdeflate_darwin_amd64.a
*/
import "C"

That's it! It should work now!

Usage

Compress

First, you need to create a compressor that can be used for any type of compression.

You can also specify a compression level: the higher the level, the better the compression ratio, at the expense of speed. In short: lower level = fast, weaker compression; higher level = slow, stronger compression. Test what works best for your application, but the DefaultCompressionLevel is fine most of the time.

// Compressor with default compression level. Errors if out of memory
c, err := libdeflate.NewCompressor()

// Compressor with custom compression level. Errors if out of memory or if an illegal level was passed. 
c, err = libdeflate.NewCompressorLevel(2)

Then you can compress the actual data with a given mode of compression (currently supported: zlib, gzip, raw deflate):

decomp := []byte(`Some data to compress: May be anything,  
    but it might be a good idea to only compress data that exceeds a certain threshold in size, 
    as compressed data can become larger (due to overhead)`)
comp := make([]byte, len(decomp)) // supplying a fitting buffer is in all cases the fastest approach

n, _, err := c.Compress(decomp, comp, libdeflate.ModeZlib) // Errors if buffer was too short
comp = comp[:n]

You can also pass nil for out, and the function will allocate a fitting buffer by itself:

_, comp, err = c.Compress(decomp, nil, libdeflate.ModeZlib)

After you are done with the compressor, do not forget to close it to free the memory allocated in C:

c.Close()

Decompress

As with compression, you need to create a decompressor which can also be used for any type of decompression at any compression level:

// Doesn't need a compression level; works universally. Errors if out of memory.
dc, err := libdeflate.NewDecompressor() 

Then you can decompress the actual data with a given mode of compression (currently supported: zlib, gzip, raw deflate):

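// The buffer must be exactly the size of the decompressed data. If that size is unknown, pass nil for out (see below).
decompressed := make([]byte, len(decomp))

// Decompress returns (consumed bytes, output, error).
_, _, err = dc.Decompress(comp, decompressed, libdeflate.ModeZlib)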

Just like with compress you can also pass nil and get a fitting buffer:

_, decompressed, err = dc.Decompress(comp, nil, libdeflate.ModeZlib)

After you are done with the decompressor, do not forget to close it to free the memory allocated in C:

dc.Close()

There are also convenience functions that allow one-time compression to be easier, as well as functions to directly compress to zlib format.

Notes

  • Do NOT use the same Compressor / Decompressor across multiple threads simultaneously. However, you can create as many of them as you like, so if you want to parallelize your application, just create a compressor / decompressor for each thread. (See Memory Usage down below for more info)

  • Always Close() your Compressor / Decompressor when you are done with it - especially if you create a new compressor/decompressor for each compression/decompression you undertake (which is generally discouraged anyway). As the C-part of this library is not subject to the Go garbage collector, the memory allocated by it must be released manually (by a call to Close()) to avoid memory leakage.

  • Memory Usage: Compressing and decompressing each require at least ~32 KiB of additional memory during execution.

Benchmarks

These benchmarks were conducted with "real-life-type data" to ensure that the tests are representative of an actual use case in a practical production environment. As the zlib standard has traditionally been used for compressing smaller chunks of data, I have decided to follow suit by opting for Minecraft client-server communication packets, as they represent the optimal use case for this library.

To that end, I have recorded 930 individual Minecraft packets, totalling 11,445,993 bytes in uncompressed data and 1,564,159 bytes in compressed data. These packets represent actual client-server communication and were recorded using this software.

The benchmarks were executed on different hardware and operating systems, including AMD and Intel processors, as well as all the out-of-the-box supported operating systems (Windows, Linux, MacOS). All the benchmarked functions/methods were executed hundreds of times, and the numbers you are about to see are the averages over all these executions.

The data was compressed using compression level 6 (current default of zlib).

These benchmarks compare this library (blue) to the Go standard library (yellow) and show that this library performs way better in all cases.

  • (A note regarding testing on your machine)

    Please note that you will need an Internet connection for some benchmarks to function. This is because these benchmarks download the Minecraft packets from here and temporarily store them in memory for the duration of the benchmark tests, so that this repository does not have to include the data. This saves space on your machine and keeps the library lightweight.

Compression

[Chart: compression total]

This chart shows how long it took for the methods of this library (blue), and the standard library (yellow) to compress all of the 930 packets (~11.5 MB) on different systems in milliseconds. Note that the two rightmost data points were tested on exactly the same hardware in a dual-boot setup and that Linux seems to generally perform better than Windows.

[Chart: compression relative]

This chart shows the time it took for this library's Compress (blue) to compress the data in nanoseconds, as well as the time it took for the standard library's Write (WriteStd, yellow) to compress the data in nanoseconds. The vertical axis shows percentages relative to the time needed by the standard library, thus you can see how much faster this library is.

For example: This library only needed ~29% of the time required by the standard library to compress the packets on an Intel Core i5-6600K on Windows. That makes the standard library roughly 244.8% slower than this library.

Decompression

[Chart: decompression total]

This chart shows how long it took for the methods of this library (blue), and the standard library (yellow) to decompress all of the 930 packets (~1.5 MB) on different systems in milliseconds. Note that the two rightmost data points were tested on exactly the same hardware in a dual-boot setup and that Linux seems to generally perform better than Windows.

[Chart: decompression relative]

This chart shows the time it took for this library's Decompress (blue) to decompress the data in nanoseconds, as well as the time it took for the standard library's Read (ReadStd, Yellow) to decompress the data in nanoseconds. The vertical axis shows percentages relative to the time needed by the standard library, thus you can see how much faster this library is.

For example: This library only needed ~34% of the time required by the standard library to decompress the packets on an Intel Core i5-6600K on Windows. That makes the standard library roughly 194.1% slower than this library.

Compression Ratio

Across all the benchmarks on all the different hardware / operating systems the compression ratios were consistent: This library had a compression ratio of 5.77 while the standard library had a compression ratio of 5.75, which is a negligible difference.

The compression ratio r is calculated as r = ucs / cs, where ucs = uncompressed size and cs = compressed size.
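
As a minimal sketch, the formula translates directly into Go. The function name compressionRatio and the sizes in main are hypothetical and only illustrate the calculation; they are not benchmark data.

```go
package main

import "fmt"

// compressionRatio returns r = ucs / cs, where ucs is the uncompressed size
// and cs is the compressed size, both in bytes.
func compressionRatio(ucs, cs int) float64 {
	return float64(ucs) / float64(cs)
}

func main() {
	// Hypothetical sizes, chosen only to illustrate the formula.
	fmt.Printf("%.2f\n", compressionRatio(1000, 250)) // 4.00
}
```

A ratio above 1 means the data shrank; a ratio below 1 means the compressed form is larger than the input.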

License

MIT License

Copyright (c) 2020 Dominik Ochs 

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Attribution

This library heavily depends on the C library libdeflate, so everything in the folder native/libs is licensed under:

MIT License 
[for the license text of the MIT License, see LICENSE]

Copyright 2016 Eric Biggers

(See also: native/libs/LICENSE)

Documentation

Constants

const (
	MinCompressionLevel        = native.MinCompressionLevel
	MaxStdZlibCompressionLevel = native.MaxStdZlibCompressionLevel
	MaxCompressionLevel        = native.MaxCompressionLevel
	DefaultCompressionLevel    = native.DefaultCompressionLevel
)

These constants specify several special compression levels

Functions

func Adler32

func Adler32(adler32 uint32, in []byte) uint32

Adler32 updates the running adler32 checksum by the contents of the slice in. This function returns the updated checksum. A new adler32-checksum requires 1 as initial value. This value is also returned if in == nil.

func Compress

func Compress(in, out []byte, m Mode) (int, []byte, error)

Compress compresses the data from in to out and returns the number of bytes written to out, out (sliced to written) or an error if the out buffer was too short. If you pass nil for out, this function will allocate a fitting buffer and return it (not preferred though).

m specifies which compression format should be used (e.g. ModeZlib). Uses default compression level.

IF YOU WANT TO COMPRESS MORE THAN ONCE, PLEASE REFER TO NewCompressor(), as this function creates a new Compressor (alloc 32KiB) which is then closed at the end of the function.

Note that for extremely small or already highly compressed data, the compressed data can be larger than the uncompressed data. If out == nil and the discrepancy is too large (len(out) > 1000 + 2 * len(in)), Compress returns an error.

func CompressLevel

func CompressLevel(in, out []byte, m Mode, level int) (int, []byte, error)

CompressLevel compresses the data from in to out using the specified compression level and returns the number of bytes written to out, out (sliced to written) or an error if the out buffer was too short. If you pass nil for out, this function will allocate a fitting buffer and return it (not preferred though).

m specifies which compression format should be used (e.g. ModeZlib). Level defines the compression level.

IF YOU WANT TO COMPRESS MORE THAN ONCE, PLEASE REFER TO NewCompressorLevel(), as this function creates a new Compressor (alloc 32KiB) which is then closed at the end of the function.

Note that for extremely small or already highly compressed data, the compressed data can be larger than the uncompressed data. If out == nil and the discrepancy is too large (len(out) > 1000 + 2 * len(in)), CompressLevel returns an error.

func CompressZlib

func CompressZlib(in, out []byte) (int, []byte, error)

CompressZlib compresses the data from in to out (in zlib format) and returns the number of bytes written to out, out (sliced to written) or an error if the out buffer was too short. If you pass nil for out, this function will allocate a fitting buffer and return it (not preferred though).

IF YOU WANT TO COMPRESS MORE THAN ONCE, PLEASE REFER TO NewCompressor(), as this function creates a new Compressor (alloc 32KiB) which is then closed at the end of the function.

See Compress for further information.

func CompressZlibLevel

func CompressZlibLevel(in, out []byte, level int) (int, []byte, error)

CompressZlibLevel compresses the data from in to out (in zlib format) at the specified level and returns the number of bytes written to out, out (sliced to written) or an error if the out buffer was too short. If you pass nil for out, this function will allocate a fitting buffer and return it (not preferred though).

IF YOU WANT TO COMPRESS MORE THAN ONCE, PLEASE REFER TO NewCompressorLevel(), as this function creates a new Compressor (alloc 32KiB) which is then closed at the end of the function.

See CompressLevel for further information.

func Crc32

func Crc32(crc32 uint32, in []byte) uint32

Crc32 updates the running crc32 checksum by the contents of the slice in. This function returns the updated checksum. A new crc32-checksum requires 0 as initial value. This value is also returned if in == nil.

func Decompress

func Decompress(in, out []byte, m Mode) (int, []byte, error)

Decompress decompresses the given data from in to out and returns the number of consumed bytes c from 'in' and 'out' or an error if something went wrong. Mode m specifies the format (e.g. zlib) of the data within in.

c is the number of bytes that were read before the BFINAL flag was encountered, which indicates the end of the compressed data.

If you pass a buffer to out, the size of this buffer must exactly match the length of the decompressed data. If you pass nil to out, this function will allocate a sufficient buffer and return it.

IF YOU WANT TO DECOMPRESS MORE THAN ONCE, PLEASE REFER TO NewDecompressor(), as this function creates a new Decompressor (alloc 32KiB) which is then closed at the end of the function.

If error != nil, the data in out is undefined.

func DecompressZlib

func DecompressZlib(in, out []byte) (int, []byte, error)

DecompressZlib decompresses the given zlib data from in to out and returns the number of consumed bytes c from 'in' and 'out' or an error if something went wrong.

c is the number of bytes that were read before the BFINAL flag was encountered, which indicates the end of the compressed data.

If you pass a buffer to out, the size of this buffer must exactly match the length of the decompressed data. If you pass nil to out, this function will allocate a sufficient buffer and return it.

IF YOU WANT TO DECOMPRESS MORE THAN ONCE, PLEASE REFER TO NewDecompressor(), as this function creates a new Decompressor (alloc 32KiB) which is then closed at the end of the function.

If error != nil, the data in out is undefined.

Types

type Compressor

type Compressor struct {
	// contains filtered or unexported fields
}

Compressor compresses data at the specified compression level.

A single compressor must not be used across multiple threads concurrently. If you want to compress concurrently, create a compressor for each thread.

Always Close() the compressor to free C memory. One Compressor allocates at least 32 KiB.

func NewCompressor

func NewCompressor() (Compressor, error)

NewCompressor returns a new Compressor used to compress data with compression level DefaultCompressionLevel. Errors if out of memory. Allocates 32KiB. See NewCompressorLevel for custom compression level

func NewCompressorLevel

func NewCompressorLevel(level int) (Compressor, error)

NewCompressorLevel returns a new Compressor used to compress data. Errors if out of memory or if an invalid compression level was passed. Allocates 32KiB.

The compression level is legal if and only if: MinCompressionLevel <= level <= MaxCompressionLevel

func (Compressor) Close

func (c Compressor) Close()

Close closes the compressor and releases all occupied resources. It is the user's responsibility to close compressors in order to free resources, as the underlying C objects are not subject to the Go garbage collector. They have to be freed manually.

After closing, the compressor must not be used anymore, as the methods will panic (except for the c.Level() method).

func (Compressor) Compress

func (c Compressor) Compress(in, out []byte, m Mode) (int, []byte, error)

Compress compresses the data from in to out and returns the number of bytes written to out, out (sliced to written) or an error if the out buffer was too short. If you pass nil for out, this function will allocate a fitting buffer and return it (not preferred though).

m specifies which compression format should be used (e.g. ModeZlib)

Note that for extremely small or already highly compressed data, the compressed data can be larger than the uncompressed data. If out == nil and the discrepancy is too large (len(out) > 1000 + 2 * len(in)), Compress returns an error.

func (Compressor) CompressZlib

func (c Compressor) CompressZlib(in, out []byte) (int, []byte, error)

CompressZlib compresses the data from in to out (in zlib format) and returns the number of bytes written to out, out (sliced to written) or an error if the out buffer was too short. If you pass nil for out, this function will allocate a fitting buffer and return it (not preferred though).

See c.Compress for further information.

func (Compressor) Level

func (c Compressor) Level() int

Level returns the compression level at which this Compressor compresses. May be called after having closed a Compressor.

func (Compressor) WorstCaseCompressedSize

func (c Compressor) WorstCaseCompressedSize(size int, m Mode) (max int)

WorstCaseCompressedSize returns the theoretical maximum size of the data after compressing data of length 'size' with the given mode of compression. In most cases this prediction is a wild overestimate, and max >= size always holds. However, it is a hard upper bound on the size of the compressed data for the given mode at the compression level of this compressor, independent of the actual data. This method always returns the same max size for the same compressor, input size and mode.

type Decompressor

type Decompressor struct {
	// contains filtered or unexported fields
}

Decompressor decompresses any DEFLATE, zlib or gzip compressed data at any level

A single decompressor must not be used across multiple threads concurrently. If you want to decompress concurrently, create a decompressor for each thread.

Always Close() the decompressor to free C memory. One Decompressor allocates at least 32 KiB.

func NewDecompressor

func NewDecompressor() (Decompressor, error)

NewDecompressor returns a new Decompressor used to decompress data at any compression level and with any Mode. Errors if out of memory. Allocates 32KiB.

func (Decompressor) Close

func (dc Decompressor) Close()

Close closes the decompressor and releases all occupied resources. It is the user's responsibility to close decompressors in order to free resources, as the underlying C objects are not subject to the Go garbage collector. They have to be freed manually.

After closing, the decompressor must not be used anymore, as the methods will panic.

func (Decompressor) Decompress

func (dc Decompressor) Decompress(in, out []byte, m Mode) (int, []byte, error)

Decompress decompresses the given data from in to out and returns the number of consumed bytes c from 'in' and 'out' or an error if something went wrong. Mode m specifies the format (e.g. zlib) of the data within in.

c is the number of bytes that were read before the BFINAL flag was encountered, which indicates the end of the compressed data.

If you pass a buffer to out, the size of this buffer must exactly match the length of the decompressed data. If you pass nil to out, this function will allocate a sufficient buffer and return it.

If error != nil, the data in out is undefined.

func (Decompressor) DecompressZlib

func (dc Decompressor) DecompressZlib(in, out []byte) (int, []byte, error)

DecompressZlib decompresses the given zlib data from in to out and returns the number of consumed bytes c from 'in' and 'out' or an error if something went wrong.

c is the number of bytes that were read before the BFINAL flag was encountered, which indicates the end of the compressed data.

If you pass a buffer to out, the size of this buffer must exactly match the length of the decompressed data. If you pass nil to out, this function will allocate a sufficient buffer and return it.

If error != nil, the data in out is undefined.

type Mode

type Mode int

Mode specifies the type of compression/decompression such as zlib, gzip and raw DEFLATE

const (
	ModeDEFLATE Mode = iota
	ModeZlib
	ModeGzip
)

The constants that specify a certain mode of compression/decompression
