czlib

Version: v0.0.0-...-86a9592
Warning

This package is not in the latest version of its module.

Published: Aug 14, 2024 | License: BSD-3-Clause | Imports: 8 | Imported by: 1

README

Disclaimer

This repository is considered stable but is no longer actively maintained. It is still in use in many places and is safe for production use; because the zlib protocol is stable, we have not needed to make changes recently. Response times on issues and PRs may be slower than on other Datadog repositories.

czlib


czlib started as a fork of the vitess project’s cgzip package. Our primary data pipeline uses zlib compressed messages, but the standard library’s pure Go implementation can be significantly slower than the C zlib library. In order to address this gap, we modified a few flags in cgzip to make it encode and decode with zlib wrapping rather than with gzip headers.
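To make the difference between zlib wrapping and gzip headers concrete, here is a minimal sketch that uses only the Go standard library (not czlib itself) to compress the same message both ways and print the leading bytes. The helper names zlibWrap and gzipWrap are hypothetical. A zlib stream (RFC 1950) starts with a 2-byte header and ends with a 4-byte Adler-32 checksum, while a gzip stream (RFC 1952) starts with the magic bytes 1f 8b in a 10-byte header and ends with an 8-byte trailer.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"compress/zlib"
	"fmt"
	"log"
)

// zlibWrap compresses data with a zlib (RFC 1950) wrapper.
func zlibWrap(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gzipWrap compresses data with a gzip (RFC 1952) wrapper.
func gzipWrap(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	msg := []byte("hello, hello, hello")
	z, err := zlibWrap(msg)
	if err != nil {
		log.Fatal(err)
	}
	g, err := gzipWrap(msg)
	if err != nil {
		log.Fatal(err)
	}
	// Print the first two bytes of each stream and the total sizes.
	fmt.Printf("zlib starts with % x (%d bytes total)\n", z[:2], len(z))
	fmt.Printf("gzip starts with % x (%d bytes total)\n", g[:2], len(g))
}
```

A consumer that expects one wrapper will reject the other, which is why a gzip-oriented package needed modified flags before it could serve a zlib-based pipeline.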

We detailed some of the more novel design decisions in czlib, including its batch interfaces, in our general blog post on performance in Go a couple of years ago. Performance varies considerably among the interfaces, so it pays to benchmark with a message that is typical for your system. To do so, run the czlib benchmark suite with:

PAYLOAD=path_to_message go test -run=NONE -bench .

Here are some benchmark results for compression and decompression of czlib compared to the standard library:

go version go1.22.6 darwin/arm64
pkg: github.com/DataDog/czlib

# 2KiB file
     │ CompressStdZlib │               Compress               │
     │     sec/op      │    sec/op     vs base                │
*-10      75.20µ ± 12%   39.84µ ± 31%  -47.02% (p=0.000 n=10)
     │ CompressStdZlib │               Compress                │
     │       B/s       │      B/s       vs base                │
*-10     27.71Mi ± 11%   52.30Mi ± 24%  +88.73% (p=0.000 n=10)

     │ DecompressStdZlib │             Decompress              │
     │      sec/op       │   sec/op     vs base                │
*-10        18.353µ ± 5%   4.993µ ± 4%  -72.80% (p=0.000 n=10)
     │ DecompressStdZlib │              Decompress               │
     │        B/s        │     B/s       vs base                 │
*-10        113.5Mi ± 5%   417.4Mi ± 3%  +267.60% (p=0.000 n=10)

# Silesia compression corpus - mr (~10MB)
     │ CompressStdZlib │              Compress               │
     │     sec/op      │   sec/op     vs base                │
*-10       327.1m ± 1%   381.0m ± 1%  +16.46% (p=0.000 n=10)

     │ CompressStdZlib │               Compress               │
     │       B/s       │     B/s       vs base                │
*-10      29.07Mi ± 1%   24.96Mi ± 1%  -14.14% (p=0.000 n=10)

     │ DecompressStdZlib │             Decompress              │
     │      sec/op       │   sec/op     vs base                │
*-10         51.20m ± 1%   13.96m ± 2%  -72.74% (p=0.000 n=10)
     │ DecompressStdZlib │              Decompress               │
     │        B/s        │     B/s       vs base                 │
*-10        185.7Mi ± 1%   681.2Mi ± 2%  +266.81% (p=0.000 n=10)

See more on the blog post

Documentation


Constants

const (
	NoCompression      = flate.NoCompression
	BestSpeed          = flate.BestSpeed
	BestCompression    = flate.BestCompression
	DefaultCompression = flate.DefaultCompression
)

Constants copied from the flate package, so that code that imports czlib does not also have to import "compress/flate".
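As a rough illustration of what these levels mean, the sketch below compresses the same repetitive input at each level. It assumes the standard library's compress/zlib and compress/flate as stand-ins, since the czlib constants are defined as the flate values; the helper compressedSize is hypothetical, and czlib's C implementation may produce different sizes at the same level.

```go
package main

import (
	"bytes"
	"compress/flate"
	"compress/zlib"
	"fmt"
	"log"
)

// compressedSize returns the number of bytes produced by zlib-compressing
// data at the given level. It returns an error for an invalid level.
func compressedSize(data []byte, level int) (int, error) {
	var buf bytes.Buffer
	w, err := zlib.NewWriterLevel(&buf, level)
	if err != nil {
		return 0, err
	}
	if _, err := w.Write(data); err != nil {
		return 0, err
	}
	if err := w.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	data := bytes.Repeat([]byte("datadog "), 1024)
	levels := []int{
		flate.NoCompression,      // 0: store only
		flate.BestSpeed,          // 1
		flate.DefaultCompression, // -1: library default
		flate.BestCompression,    // 9
	}
	for _, level := range levels {
		n, err := compressedSize(data, level)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("level %2d: %d -> %d bytes\n", level, len(data), n)
	}
}
```

With NoCompression the output is slightly larger than the input, because the data is stored verbatim inside the zlib framing.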

const (
	Z_NO_FLUSH      = 0
	Z_PARTIAL_FLUSH = 1
	Z_SYNC_FLUSH    = 2
	Z_FULL_FLUSH    = 3
	Z_FINISH        = 4
	Z_BLOCK         = 5
	Z_TREES         = 6
)

Allowed flush values

const (
	Z_OK            = 0
	Z_STREAM_END    = 1
	Z_NEED_DICT     = 2
	Z_ERRNO         = -1
	Z_STREAM_ERROR  = -2
	Z_DATA_ERROR    = -3
	Z_MEM_ERROR     = -4
	Z_BUF_ERROR     = -5
	Z_VERSION_ERROR = -6
)

Return codes
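When logging a failure it can help to turn a numeric return code back into its name. The sketch below redeclares the values listed above so that it compiles on its own; codeName and isError are hypothetical helpers, not part of czlib. The sign convention (negative values are errors, positive values are normal events) follows zlib.h.

```go
package main

import "fmt"

// Return codes, redeclared here with the values listed above.
const (
	Z_OK            = 0
	Z_STREAM_END    = 1
	Z_NEED_DICT     = 2
	Z_ERRNO         = -1
	Z_STREAM_ERROR  = -2
	Z_DATA_ERROR    = -3
	Z_MEM_ERROR     = -4
	Z_BUF_ERROR     = -5
	Z_VERSION_ERROR = -6
)

// codeName maps a zlib return code to its symbolic name.
func codeName(code int) string {
	switch code {
	case Z_OK:
		return "Z_OK"
	case Z_STREAM_END:
		return "Z_STREAM_END"
	case Z_NEED_DICT:
		return "Z_NEED_DICT"
	case Z_ERRNO:
		return "Z_ERRNO"
	case Z_STREAM_ERROR:
		return "Z_STREAM_ERROR"
	case Z_DATA_ERROR:
		return "Z_DATA_ERROR"
	case Z_MEM_ERROR:
		return "Z_MEM_ERROR"
	case Z_BUF_ERROR:
		return "Z_BUF_ERROR"
	case Z_VERSION_ERROR:
		return "Z_VERSION_ERROR"
	default:
		return fmt.Sprintf("unknown(%d)", code)
	}
}

// isError reports whether code is in the negative (error) range.
func isError(code int) bool {
	return code < 0
}

func main() {
	for _, c := range []int{Z_OK, Z_STREAM_END, Z_DATA_ERROR, 42} {
		fmt.Printf("%3d %-16s error=%v\n", c, codeName(c), isError(c))
	}
}
```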

const (
	DEFAULT_COMPRESSED_BUFFER_SIZE = 32 * 1024
)

DEFAULT_COMPRESSED_BUFFER_SIZE is the default buffer size. Most Go io functions use a 32KB buffer, so 32KB works well here for the compressed data buffer.

Variables

var (
	// ErrChecksum is returned when reading ZLIB data that has an invalid checksum.
	ErrChecksum = zlib.ErrChecksum
	// ErrDictionary is returned when reading ZLIB data that has an invalid dictionary.
	ErrDictionary = zlib.ErrDictionary
	// ErrHeader is returned when reading ZLIB data that has an invalid header.
	ErrHeader = zlib.ErrHeader
)
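Because these variables are assigned directly from compress/zlib, an error check written against the standard library sentinels compares against the same values. The sketch below shows such checks using the standard library reader as a stand-in; encode, decode and describe are hypothetical helpers, and whether czlib's reader returns these errors in exactly the same situations is an assumption that is not verified here.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
	"log"
)

// encode returns data as a zlib stream.
func encode(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reads a whole zlib stream and returns the decompressed bytes.
func decode(stream []byte) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(stream))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

// describe classifies a decode error using the sentinel values.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, zlib.ErrChecksum):
		return "checksum"
	case errors.Is(err, zlib.ErrHeader):
		return "header"
	case errors.Is(err, zlib.ErrDictionary):
		return "dictionary"
	default:
		return "other"
	}
}

func main() {
	good, err := encode([]byte("payload"))
	if err != nil {
		log.Fatal(err)
	}

	// Flip the last byte, which belongs to the trailing Adler-32 checksum.
	bad := append([]byte(nil), good...)
	bad[len(bad)-1] ^= 0xff

	_, err = decode(good)
	fmt.Println("intact stream:", describe(err))
	_, err = decode(bad)
	fmt.Println("corrupt trailer:", describe(err))
	_, err = decode([]byte{0x00, 0x00})
	fmt.Println("bogus header:", describe(err))
}
```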

Functions

func Compress

func Compress(input []byte) ([]byte, error)

Compress returns the input compressed using zlib, or an error if encountered.

func Decompress

func Decompress(input []byte) ([]byte, error)

Decompress returns the input decompressed using zlib, or an error if encountered.
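Both functions share the shape func([]byte) ([]byte, error), so a round-trip check can be written once and applied to any implementation. The sketch below is a minimal example under that assumption: codec, stdCompress, stdDecompress and roundTrip are hypothetical names, and the standard library stands in for czlib so that the example builds without cgo. In a project that imports czlib, passing czlib.Compress and czlib.Decompress to roundTrip should work the same way.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"log"
)

// codec matches the shape of the Compress and Decompress signatures above.
type codec func(input []byte) ([]byte, error)

// stdCompress is a pure-Go stand-in with the same signature as Compress.
func stdCompress(input []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(input); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// stdDecompress is a pure-Go stand-in with the same signature as Decompress.
func stdDecompress(input []byte) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(input))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

// roundTrip compresses msg, decompresses the result, and verifies that the
// output matches the original input.
func roundTrip(compress, decompress codec, msg []byte) error {
	packed, err := compress(msg)
	if err != nil {
		return fmt.Errorf("compress: %w", err)
	}
	unpacked, err := decompress(packed)
	if err != nil {
		return fmt.Errorf("decompress: %w", err)
	}
	if !bytes.Equal(msg, unpacked) {
		return fmt.Errorf("round trip mismatch: got %d bytes, want %d", len(unpacked), len(msg))
	}
	return nil
}

func main() {
	msg := bytes.Repeat([]byte("a typical message "), 100)
	if err := roundTrip(stdCompress, stdDecompress, msg); err != nil {
		log.Fatal(err)
	}
	fmt.Println("round trip ok")
}
```

Since both implementations emit zlib-wrapped streams, data compressed by one should be readable by the other, although the compressed bytes themselves need not be identical.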

func NewReader

func NewReader(r io.Reader) (io.ReadCloser, error)

NewReader creates a new io.ReadCloser. Reads from the returned io.ReadCloser read and decompress data from r. The implementation buffers input and may read more data than necessary from r. It is the caller's responsibility to call Close on the ReadCloser when done.

func NewReaderBuffer

func NewReaderBuffer(r io.Reader, bufferSize int) (io.ReadCloser, error)

NewReaderBuffer has the same behavior as NewReader, but lets the caller provide a custom buffer size.
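The sketch below shows one way to honor the Close requirement while still reporting read errors. NewReader has the same signature as the standard library's zlib.NewReader, so the example takes the constructor as a parameter and uses the standard library one to stay buildable without cgo. The names newReaderFunc, stdReader, compress and decodeWith are hypothetical.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"log"
)

// newReaderFunc matches the NewReader signature documented above.
type newReaderFunc func(r io.Reader) (io.ReadCloser, error)

// stdReader is the standard library constructor, used here as a stand-in.
var stdReader newReaderFunc = zlib.NewReader

// compress produces a zlib stream to feed the reader in this example.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeWith decompresses a stream through a reader built by newReader.
// The reader is always closed, and a read error takes precedence over a
// close error.
func decodeWith(newReader newReaderFunc, compressed []byte) ([]byte, error) {
	zr, err := newReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, err
	}
	data, readErr := io.ReadAll(zr)
	closeErr := zr.Close()
	if readErr != nil {
		return nil, readErr
	}
	if closeErr != nil {
		return nil, closeErr
	}
	return data, nil
}

func main() {
	compressed, err := compress([]byte("streamed payload"))
	if err != nil {
		log.Fatal(err)
	}
	plain, err := decodeWith(stdReader, compressed)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", plain)
}
```

Because the implementation may read ahead, the position of the source reader after decompression should not be relied on when other data follows the zlib stream.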

Types

type UnsafeByte

type UnsafeByte []byte

An UnsafeByte is a []byte whose backing array has been allocated in C and thus is not subject to the Go garbage collector. The Unsafe versions of Compress and Decompress return this in order to prevent copying the unsafe memory into collected memory.

func NewUnsafeByte

func NewUnsafeByte(p *C.char, length int) UnsafeByte

NewUnsafeByte creates a []byte from the unsafe pointer without a copy, using the method outlined in this mailing list post:

https://groups.google.com/forum/#!topic/golang-nuts/KyXR0fDp0HA

but amended to use the three-index slices from go1.2 to set the capacity of b correctly:

https://tip.golang.org/doc/go1.2#three_index

This means this code only works in go1.2+.

This does not copy the underlying array; it only casts it. Afterwards, reflect is used to fix the cap and len of the slice.
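The three-index slice expression mentioned above can be illustrated in ordinary Go memory, without any C pointer. This sketch covers only the capacity-limiting step, not the cgo pointer conversion that NewUnsafeByte performs; capped is a hypothetical helper. With the capacity pinned to the length, an append has to allocate a fresh array instead of writing past the end of the view, which matters when the bytes beyond the view are not owned by the slice.

```go
package main

import "fmt"

// capped returns the first n bytes of backing with the capacity limited to n,
// using a three-index slice expression.
func capped(backing []byte, n int) []byte {
	return backing[:n:n]
}

func main() {
	backing := []byte("abcdefgh")

	// Two-index slice: len 4, cap 8. Append reuses the spare capacity and
	// overwrites backing[4].
	loose := backing[:4]
	_ = append(loose, 'X')
	fmt.Println(string(backing))

	// Three-index slice: len 4, cap 4. Append must allocate a new array,
	// so the original backing array is left untouched.
	backing = []byte("abcdefgh")
	tight := capped(backing, 4)
	grown := append(tight, 'X')
	fmt.Println(string(backing), string(grown), cap(tight))
}
```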

func UnsafeCompress

func UnsafeCompress(input []byte) (UnsafeByte, error)

UnsafeCompress compresses input into an UnsafeByte without copying the result, which is malloced in C. The returned UnsafeByte can be used as a normal []byte but must be freed manually with UnsafeByte.Free().

func UnsafeDecompress

func UnsafeDecompress(input []byte) (UnsafeByte, error)

UnsafeDecompress decompresses input into an UnsafeByte without copying the result, which is malloced in C. The returned UnsafeByte can be used as a normal []byte but must be freed manually with UnsafeByte.Free().

func (UnsafeByte) Free

func (b UnsafeByte) Free()

Free releases the underlying byte array. Call it exactly once per UnsafeByte; doing this twice would be bad.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer implements io.WriteCloser. deflateEnd is called when err is set to a value, which is either:
- whatever error is returned by the underlying writer, or
- io.EOF if Close was called.

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter returns a new zlib writer that writes to the underlying writer.

func NewWriterLevel

func NewWriterLevel(w io.Writer, level int) (*Writer, error)

NewWriterLevel lets the caller provide a compression level.

func NewWriterLevelBuffer

func NewWriterLevelBuffer(w io.Writer, level, bufferSize int) (*Writer, error)

NewWriterLevelBuffer lets the caller provide both a compression level and a buffer size.

func (*Writer) Close

func (z *Writer) Close() error

Close closes the zlib buffer but does not close the wrapped io.Writer originally passed to NewWriterX.

func (*Writer) Flush

func (z *Writer) Flush() error

Flush flushes the zlib buffer to the underlying writer.

func (*Writer) Write

func (z *Writer) Write(p []byte) (n int, err error)

Write implements the io.Writer interface.
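Taken together, Write, Flush and Close support a chunked streaming pattern: write a chunk, flush it so the receiver can make progress, and close at the end to finish the stream. The sketch below expresses that pattern against a small interface built from the three method signatures documented above. flushWriteCloser, writeChunks and roundTripChunks are hypothetical names, and the standard library's *zlib.Writer, which has the same three methods, stands in for *czlib.Writer so the example builds without cgo.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"log"
)

// flushWriteCloser is the method set documented above for Writer.
type flushWriteCloser interface {
	io.WriteCloser
	Flush() error
}

// writeChunks writes each chunk, flushing after every one, and then closes
// the compressor. The wrapped io.Writer is left open.
func writeChunks(zw flushWriteCloser, chunks [][]byte) error {
	for _, chunk := range chunks {
		if _, err := zw.Write(chunk); err != nil {
			return err
		}
		if err := zw.Flush(); err != nil {
			return err
		}
	}
	return zw.Close()
}

// roundTripChunks compresses chunks into a buffer and decompresses the
// result, returning the recovered text.
func roundTripChunks(chunks [][]byte) (string, error) {
	var buf bytes.Buffer
	if err := writeChunks(zlib.NewWriter(&buf), chunks); err != nil {
		return "", err
	}
	// The buffer is still usable here: closing the compressor does not
	// close the writer it wraps.
	zr, err := zlib.NewReader(&buf)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	plain, err := io.ReadAll(zr)
	if err != nil {
		return "", err
	}
	return string(plain), nil
}

func main() {
	out, err := roundTripChunks([][]byte{[]byte("hello "), []byte("world")})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```

Flushing after every chunk trades compression ratio for latency, so batching several writes per flush is usually preferable when latency is not a concern.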
