Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression / speed trade-offs, while being backed by a very fast decoder. This package implements a high-performance compressor which, for now, is focused on speed.

This package provides compression to, and decompression of, Zstandard content. Note that custom dictionaries are not supported yet, so if your code relies on them, you cannot use the package as-is.

This package is pure Go and does not use "unsafe". If a significant speedup can be achieved using "unsafe", it may be added as an option later.

The zstd package is provided as open source software using a Go standard license.

Currently the package is heavily optimized for 64 bit processors and will be significantly slower on 32 bit processors.


Install using go get -u github.com/klauspost/compress. The package is located in github.com/klauspost/compress/zstd.

Godoc Documentation: https://godoc.org/github.com/klauspost/compress/zstd

You will also need the github.com/cespare/xxhash package.



BETA - there may still be subtle bugs, but a wide variety of content has been tested. There may still be implementation-specific behavior in error handling that could lead to edge cases.

For now, a high speed compressor has been implemented. The compression ratio is roughly equivalent to zstd level 2.

In terms of speed, it is typically 2x as fast as the stdlib deflate/gzip in its fastest mode. The compression ratio is comparable to stdlib level 3, while compression is usually 3x as fast.

Compared to cgo zstd, the speed is around that of level 3 (default), but compression is slightly worse, between levels 1 and 2.


An Encoder can be used for either compressing a stream via the io.WriteCloser interface supported by the Encoder or as multiple independent tasks via the EncodeAll function. Smaller encodes are encouraged to use the EncodeAll function. Use NewWriter to create a new instance that can be used for both.

To create a writer with default options:

// Compress input to output.
func Compress(in io.Reader, out io.Writer) error {
	enc, err := zstd.NewWriter(out)
	if err != nil {
		return err
	}
	_, err = io.Copy(enc, in)
	if err != nil {
		// Release resources even though the encode failed.
		enc.Close()
		return err
	}
	return enc.Close()
}

Now you can encode by writing data to enc. The output will be finished writing when Close() is called. Even if your encode fails, you should still call Close() to release any resources that may be held up.

The above is fine for big encodes. However, whenever possible try to reuse the writer.

To reuse the encoder, you can use the Reset(io.Writer) function to change to another output. This will allow the encoder to reuse all resources and avoid wasteful allocations.

Currently stream encoding has 'light' concurrency, meaning up to 2 goroutines can be working on part of a stream. This is independent of the WithEncoderConcurrency(n) option, but that is likely to change in the future. So if you want to limit concurrency in future versions, specify the concurrency you would like.

Future Compatibility Guarantees

This will be an evolving project. When using this package it is important to note that both the compression efficiency and speed may change.

The goal will be to keep the default efficiency at the default zstd (level 3). However the encoding should never be assumed to remain the same, and you should not use hashes of compressed output for similarity checks.

The Encoder can be assumed to produce the same output from the exact same code version. However, there may be modes in the future that break this, although they will not be enabled without an explicit option.

This encoder is not designed to (and will probably never) output the exact same bitstream as the reference encoder.

Also note, that the cgo decompressor currently does not report all errors on invalid input, omits error checks, ignores checksums and seems to ignore concatenated streams, even though it is part of the spec.


For compressing small blocks, the returned encoder has a function called EncodeAll(src, dst []byte) []byte.

EncodeAll will encode all input in src and append it to dst. This function can be called concurrently, but each call will only run on a single goroutine.

Encoded blocks can be concatenated and the result will be the combined input stream. Data compressed with EncodeAll can be decoded with the Decoder, using either a stream or DecodeAll.

Especially when encoding blocks you should take special care to reuse the encoder. This will effectively make it run without allocations after a warmup period. To make it run completely without allocations, supply a destination buffer with space for all content.

import "github.com/klauspost/compress/zstd"

// Create a writer that caches compressors.
// For this operation type we supply a nil Writer.
var encoder, _ = zstd.NewWriter(nil)

// Compress a buffer.
// If you have a destination buffer, the allocation in the call can also be eliminated.
func Compress(src []byte) []byte {
	return encoder.EncodeAll(src, make([]byte, 0, len(src)))
}

You can control the maximum number of concurrent encodes using the WithEncoderConcurrency(n) option when creating the writer.

Using the Encoder for both a stream and individual blocks concurrently is safe.


I have collected some speed examples to compare speed and compression against other compressors.

  • file is the input file.
  • out is the compressor used. zskp is this package. gzstd is gzip from the Go standard library. zstd is the Datadog cgo library.
  • level is the compression level used.
  • insize/outsize is the input/output size.
  • millis is the number of milliseconds used for compression.
  • mb/s is megabytes (2^20 bytes) per second.
The test data for the Large Text Compression Benchmark is the first 10^9 bytes of the English Wikipedia dump on Mar. 3, 2006.

file    out    level  insize      outsize    millis  mb/s
enwik9  zskp   1      1000000000  348027537  7537    126.53
enwik9  gzstd  1      1000000000  382578136  13627   69.98
enwik9  gzstd  3      1000000000  349139651  22344   42.68
enwik9  zstd   1      1000000000  357416379  4838    197.12
enwik9  zstd   2      1000000000  329056536  5899    161.64
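The mb/s column can be reproduced from insize and millis. The following is a small sketch of that arithmetic using the first row of the table above; the function names are illustrative only.

```go
package main

import "fmt"

// throughput returns megabytes (2^20 bytes) per second for insize bytes
// processed in millis milliseconds, matching the mb/s column definition.
func throughput(insize, millis int64) float64 {
	megabytes := float64(insize) / (1 << 20)
	seconds := float64(millis) / 1000
	return megabytes / seconds
}

// ratioPercent returns the compressed size as a percentage of the input size.
func ratioPercent(insize, outsize int64) float64 {
	return float64(outsize) / float64(insize) * 100
}

func main() {
	// enwik9, zskp, level 1.
	fmt.Printf("%.2f mb/s\n", throughput(1000000000, 7537)) // prints "126.53 mb/s"
	fmt.Printf("%.2f%% of input\n", ratioPercent(1000000000, 348027537))
}
```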

GOB stream of binary data. Highly compressible.

file        out    level  insize      outsize    millis  mb/s
gob-stream  zskp   1      1911399616  272529084  7047    258.67
gob-stream  gzstd  1      1911399616  357382641  14727   123.78
gob-stream  gzstd  3      1911399616  327835097  17005   107.19
gob-stream  zstd   1      1911399616  250787165  4345    419.43
gob-stream  zstd   2      1911399616  225853438  4599    396.36

Highly compressible JSON file. Similar to logs in a lot of ways.

file          out    level  insize      outsize   millis  mb/s
adresser.001  zskp   1      1073741824  18235536  1220    839.34
adresser.001  gzstd  1      1073741824  47755503  3079    332.47
adresser.001  gzstd  3      1073741824  40052381  3051    335.63
adresser.001  zstd   1      1073741824  16135896  903     1133.99
adresser.001  zstd   2      1073741824  16340655  916     1117.90

VM Image, Linux mint with a few installed applications:

file                  out    level  insize      outsize     millis  mb/s
rawstudio-mint14.tar  zskp   1      8558382592  3800966951  43201   188.93
rawstudio-mint14.tar  gzstd  1      8558382592  3926257486  84712   96.35
rawstudio-mint14.tar  gzstd  3      8558382592  3740711978  176344  46.28
rawstudio-mint14.tar  zstd   1      8558382592  3607859705  27613   295.58
rawstudio-mint14.tar  zstd   2      8558382592  3457698070  31781   256.81

The test data is designed to test archivers in realistic backup scenarios.

file      out    level  insize       outsize     millis  mb/s
10gb.tar  zskp   1      10065157632  5001038195  59349   161.74
10gb.tar  gzstd  1      10065157632  5198296126  97769   98.18
10gb.tar  gzstd  3      10065157632  4932665487  313427  30.63
10gb.tar  zstd   1      10065157632  4940796535  40391   237.65
10gb.tar  zstd   2      10065157632  4778612089  43680   219.75

Silesia Corpus:

file         out    level  insize     outsize   millis  mb/s
silesia.tar  zskp   1      211947520  73712964  1369    147.65
silesia.tar  gzstd  1      211947520  80007735  2515    80.37
silesia.tar  gzstd  3      211947520  73133380  4259    47.45
silesia.tar  zstd   1      211947520  73513991  946     213.44
silesia.tar  zstd   2      211947520  69595464  1097    184.09


As part of the development process a Snappy -> Zstandard converter was also built.

This can convert a framed Snappy Stream to a zstd stream. Note that a single block is not framed.

Conversion is done by converting the stream directly from Snappy without intermediate full decoding. Therefore the compression ratio is much lower than what can be achieved by a full decompression and compression, and a faulty Snappy stream may lead to a faulty Zstandard stream without any errors being generated. No CRC value is being generated and not all CRC values of the Snappy stream are checked. However, it provides really fast re-compression of Snappy streams.

BenchmarkSnappy_ConvertSilesia-8           1  1156001600 ns/op   183.35 MB/s
Snappy len 103008711 -> zstd len 82687318

BenchmarkSnappy_Enwik9-8           1  6472998400 ns/op   154.49 MB/s
Snappy len 508028601 -> zstd len 390921079

	s := zstd.SnappyConverter{}
	n, err := s.Convert(input, output)
	if err != nil {
		fmt.Println("Conversion failed:", err)
	} else {
		fmt.Println("Re-compressed stream to", n, "bytes")
	}

The converter s can be reused to avoid allocations, even after errors.


STATUS: Release Candidate - there may still be subtle bugs, but a wide variety of content has been tested.


The package has been designed for two main usages: big streams of data and smaller in-memory buffers. Both are accessed by creating a Decoder.

For streaming use a simple setup could look like this:

import "github.com/klauspost/compress/zstd"

func Decompress(in io.Reader, out io.Writer) error {
	d, err := zstd.NewReader(in)
	if err != nil {
		return err
	}
	defer d.Close()

	// Copy content...
	_, err = io.Copy(out, d)
	return err
}

It is important to call "Close" when you no longer need the Reader, so that its running goroutines are stopped. See "Allocation-less operation" below.

For decoding buffers, it could look something like this:

import "github.com/klauspost/compress/zstd"

// Create a reader that caches decompressors.
// For this operation type we supply a nil Reader.
var decoder, _ = zstd.NewReader(nil)

// Decompress a buffer. We don't supply a destination buffer,
// so it will be allocated by the decoder.
func Decompress(src []byte) ([]byte, error) {
	return decoder.DecodeAll(src, nil)
}

Both of these cases should provide the functionality needed. The decoder can be used for concurrent decompression of multiple buffers. It will only allow a certain number of concurrent operations to run. To tweak that yourself use the WithDecoderConcurrency(n) option when creating the decoder.

Allocation-less operation

The decoder has been designed to operate without allocations after a warmup.

This means that you should store the decoder for best performance. To re-use a stream decoder, use the Reset(r io.Reader) error function to switch to another stream. A decoder can safely be re-used even if the previous stream failed.

To release the resources, you must call the Close() function on a decoder. After this it can no longer be reused, but all running goroutines will be stopped. So you must use this if you will no longer need the Reader.

For decompressing smaller buffers a single decoder can be used. When decoding buffers, you can supply a destination slice with length 0 and your expected capacity. In this case no unneeded allocations should be made.


The buffer decoder does everything on the same goroutine and does nothing concurrently. It can however decode several buffers concurrently. Use WithDecoderConcurrency(n) to limit that.

The stream decoder operates on several goroutines:

  • One goroutine reads input and splits the input to several block decoders.
  • A number of decoders will decode blocks.
  • A goroutine coordinates these blocks and sends history from one to the next.

So effectively this also means the decoder will "read ahead" and prepare data to always be available for output.

Since "blocks" are quite dependent on the output of the previous block, stream decoding will only have limited concurrency.

In practice this means that concurrency is often limited to utilizing about 2 cores effectively.


These are some examples of performance compared to the Datadog cgo library.

The first two are streaming decodes and the last are smaller inputs.

BenchmarkDecoderSilesia-8             20       642550210 ns/op   329.85 MB/s      3101 B/op        8 allocs/op
BenchmarkDecoderSilesiaCgo-8         100       384930000 ns/op   550.61 MB/s    451878 B/op     9713 allocs/op

BenchmarkDecoderEnwik9-2              10        3146000080 ns/op         317.86 MB/s        2649 B/op          9 allocs/op
BenchmarkDecoderEnwik9Cgo-2           20        1905900000 ns/op         524.69 MB/s     1125120 B/op      45785 allocs/op

BenchmarkDecoder_DecodeAll/z000000.zst-8               200     7049994 ns/op   138.26 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000001.zst-8            100000       19560 ns/op    97.49 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000002.zst-8              5000      297599 ns/op   236.99 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000003.zst-8              2000      725502 ns/op   141.17 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000004.zst-8            200000        9314 ns/op    54.54 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000005.zst-8             10000      137500 ns/op   104.72 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000006.zst-8               500     2316009 ns/op   206.06 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000007.zst-8             20000       64499 ns/op   344.90 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000008.zst-8             50000       24900 ns/op   219.56 MB/s        40 B/op        2 allocs/op
BenchmarkDecoder_DecodeAll/z000009.zst-8              1000     2348999 ns/op   154.01 MB/s        40 B/op        2 allocs/op

BenchmarkDecoder_DecodeAllCgo/z000000.zst-8            500     4268005 ns/op   228.38 MB/s   1228849 B/op        3 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000001.zst-8         100000       15250 ns/op   125.05 MB/s      2096 B/op        3 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000002.zst-8          10000      147399 ns/op   478.49 MB/s     73776 B/op        3 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000003.zst-8           5000      320798 ns/op   319.27 MB/s    139312 B/op        3 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000004.zst-8         200000       10004 ns/op    50.77 MB/s       560 B/op        3 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000005.zst-8          20000       73599 ns/op   195.64 MB/s     19120 B/op        3 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000006.zst-8           1000     1119003 ns/op   426.48 MB/s    557104 B/op        3 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000007.zst-8          20000      103450 ns/op   215.04 MB/s     71296 B/op        9 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000008.zst-8         100000       20130 ns/op   271.58 MB/s      6192 B/op        3 allocs/op
BenchmarkDecoder_DecodeAllCgo/z000009.zst-8           2000     1123500 ns/op   322.00 MB/s    368688 B/op        3 allocs/op

This reflects the performance around May 2019, but this may be out of date.
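When reading these numbers, note that Go's benchmark output reports MB/s in units of 10^6 bytes, unlike the 2^20-byte mb/s used in the compression tables. The sketch below reproduces the first Silesia figure; it assumes the benchmark decodes the 211947520-byte silesia.tar listed earlier, which is an inference from the numbers rather than something stated in the text.

```go
package main

import "fmt"

// benchMBps mirrors how Go benchmark output expresses throughput:
// bytes processed per operation, in units of 10^6 bytes, per second.
func benchMBps(bytesPerOp, nsPerOp int64) float64 {
	seconds := float64(nsPerOp) / 1e9
	return float64(bytesPerOp) / 1e6 / seconds
}

func main() {
	// BenchmarkDecoderSilesia-8: 642550210 ns/op.
	// The 211947520-byte input size is an assumption (see above).
	fmt.Printf("%.2f MB/s\n", benchMBps(211947520, 642550210)) // prints "329.85 MB/s"
}
```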


Contributions are always welcome. For new features/fixes, remember to add tests and for performance enhancements include benchmarks.

For sending files for reproducing errors use a service like goobox or similar to share your files.

For general feedback and experience reports, feel free to open an issue or write me on Twitter.



Package zstd provides decompression of zstandard files.

For advanced usage and examples, go to the README: https://github.com/klauspost/compress/tree/master/zstd#zstd



var (
	// ErrSnappyCorrupt reports that the input is invalid.
	ErrSnappyCorrupt = errors.New("snappy: corrupt input")
	// ErrSnappyTooLarge reports that the uncompressed length is too large.
	ErrSnappyTooLarge = errors.New("snappy: decoded block is too large")
	// ErrSnappyUnsupported reports that the input isn't supported.
	ErrSnappyUnsupported = errors.New("snappy: unsupported input")
)

var (
	// ErrReservedBlockType is returned when a reserved block type is found.
	// Typically this indicates wrong or corrupted input.
	ErrReservedBlockType = errors.New("invalid input: reserved block type encountered")

	// ErrCompressedSizeTooBig is returned when a block is bigger than allowed.
	// Typically this indicates wrong or corrupted input.
	ErrCompressedSizeTooBig = errors.New("invalid input: compressed size too big")

	// ErrBlockTooSmall is returned when a block is too small to be decoded.
	// Typically returned on invalid input.
	ErrBlockTooSmall = errors.New("block too small")

	// ErrMagicMismatch is returned when a "magic" number isn't what is expected.
	// Typically this indicates wrong or corrupted input.
	ErrMagicMismatch = errors.New("invalid input: magic number mismatch")

	// ErrWindowSizeExceeded is returned when a reference exceeds the valid window size.
	// Typically this indicates wrong or corrupted input.
	ErrWindowSizeExceeded = errors.New("window size exceeded")

	// ErrWindowSizeTooSmall is returned when no window size is specified.
	// Typically this indicates wrong or corrupted input.
	ErrWindowSizeTooSmall = errors.New("invalid input: window size was too small")

	// ErrDecoderSizeExceeded is returned if decompressed size exceeds the configured limit.
	ErrDecoderSizeExceeded = errors.New("decompressed size exceeds configured limit")

	// ErrUnknownDictionary is returned if the dictionary ID is unknown.
	// For the time being dictionaries are not supported.
	ErrUnknownDictionary = errors.New("unknown dictionary")

	// ErrCRCMismatch is returned if CRC mismatches.
	ErrCRCMismatch = errors.New("CRC check failed")

	// ErrDecoderClosed will be returned if the Decoder was used after
	// Close has been called.
	ErrDecoderClosed = errors.New("decoder used after Close")
)


type DOption

type DOption func(*decoderOptions) error

DOption is an option for creating a decoder.

func WithDecoderConcurrency

func WithDecoderConcurrency(n int) DOption

WithDecoderConcurrency will set the concurrency, meaning the maximum number of decoders to run concurrently. The value supplied must be at least 1. By default this will be set to GOMAXPROCS.

func WithDecoderLowmem

func WithDecoderLowmem(b bool) DOption

WithDecoderLowmem will set whether to use a lower amount of memory, but possibly have to allocate more while running.

func WithDecoderMaxMemory

func WithDecoderMaxMemory(n uint64) DOption

WithDecoderMaxMemory sets a maximum decoded size for in-memory (non-streaming) operations. Maximum and default is 1 << 63 bytes.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder provides decoding of zstandard streams. The decoder has been designed to operate without allocations after a warmup. This means that you should store the decoder for best performance. To re-use a stream decoder, use the Reset(r io.Reader) error to switch to another stream. A decoder can safely be re-used even if the previous stream failed. To release the resources, you must call the Close() function on a decoder.

func NewReader

func NewReader(r io.Reader, opts ...DOption) (*Decoder, error)

NewReader creates a new decoder. A nil Reader can be provided in which case Reset can be used to start a decode.

A Decoder can be used in two modes:

1) As a stream, or 2) For stateless decoding using DecodeAll.

Only a single stream can be decoded concurrently, but the same decoder can run multiple concurrent stateless decodes. It is even possible to use stateless decodes while a stream is being decoded.

The Reset function can be used to initiate a new stream, which will considerably reduce the allocations normally caused by NewReader.

func (*Decoder) Close

func (d *Decoder) Close()

Close will release all resources. It is NOT possible to reuse the decoder after this.

func (*Decoder) DecodeAll

func (d *Decoder) DecodeAll(input, dst []byte) ([]byte, error)

DecodeAll allows stateless decoding of a blob of bytes. Output will be appended to dst, so if the destination size is known you can pre-allocate the destination slice to avoid allocations. DecodeAll can be used concurrently. The Decoder concurrency limits will be respected.

func (*Decoder) Read

func (d *Decoder) Read(p []byte) (int, error)

Read bytes from the decompressed stream into p. Returns the number of bytes written and any error that occurred. When the stream is done, io.EOF will be returned.

func (*Decoder) Reset

func (d *Decoder) Reset(r io.Reader) error

Reset will reset the decoder to the supplied stream after the current one has finished processing. Note that this functionality cannot be used after Close has been called.

func (*Decoder) WriteTo

func (d *Decoder) WriteTo(w io.Writer) (int64, error)

WriteTo writes data to w until there's no more data to write or when an error occurs. The return value n is the number of bytes written. Any error encountered during the write is also returned.

type EOption added in v1.6.0

type EOption func(*encoderOptions) error

EOption is an option for creating an encoder.

func WithEncoderCRC added in v1.6.0

func WithEncoderCRC(b bool) EOption

WithEncoderCRC will add CRC value to output. Output will be 4 bytes larger.

func WithEncoderConcurrency added in v1.6.0

func WithEncoderConcurrency(n int) EOption

WithEncoderConcurrency will set the concurrency, meaning the maximum number of encoders to run concurrently. The value supplied must be at least 1. By default this will be set to GOMAXPROCS.

func WithSingleSegment added in v1.6.0

func WithSingleSegment(b bool) EOption

WithSingleSegment will set the "single segment" flag when EncodeAll is used. If this flag is set, data must be regenerated within a single continuous memory segment. In this case, Window_Descriptor byte is skipped, but Frame_Content_Size is necessarily present. As a consequence, the decoder must allocate a memory segment of size equal or larger than size of your content. In order to preserve the decoder from unreasonable memory requirements, a decoder is allowed to reject a compressed frame which requests a memory size beyond decoder's authorized range. For broader compatibility, decoders are recommended to support memory sizes of at least 8 MB. This is only a recommendation, each decoder is free to support higher or lower limits, depending on local limitations. This setting has no effect on streamed encodes.

type Encoder added in v1.6.0

type Encoder struct {
	// contains filtered or unexported fields
}

Encoder provides encoding to Zstandard. An Encoder can be used for either compressing a stream via the io.WriteCloser interface supported by the Encoder or as multiple independent tasks via the EncodeAll function. Smaller encodes are encouraged to use the EncodeAll function. Use NewWriter to create a new instance.

func NewWriter added in v1.6.0

func NewWriter(w io.Writer, opts ...EOption) (*Encoder, error)

NewWriter will create a new Zstandard encoder. If the encoder will be used for encoding blocks a nil writer can be used.

func (*Encoder) Close added in v1.6.0

func (e *Encoder) Close() error

Close will flush the final output and close the stream. The function will block until everything has been written.

func (*Encoder) EncodeAll added in v1.6.0

func (e *Encoder) EncodeAll(src, dst []byte) []byte

EncodeAll will encode all input in src and append it to dst. This function can be called concurrently, but each call will only run on a single goroutine. If empty input is given, nothing is returned. Encoded blocks can be concatenated and the result will be the combined input stream. Data compressed with EncodeAll can be decoded with the Decoder, using either a stream or DecodeAll.

func (*Encoder) Flush added in v1.6.0

func (e *Encoder) Flush() error

Flush will send the currently written data to output and block until everything has been written. This should only be used on rare occasions where pushing the currently queued data is critical.

func (*Encoder) ReadFrom added in v1.6.0

func (e *Encoder) ReadFrom(r io.Reader) (n int64, err error)

ReadFrom reads data from r until EOF or error. The return value n is the number of bytes read. Any error except io.EOF encountered during the read is also returned.

The Copy function uses ReaderFrom if available.

func (*Encoder) Reset added in v1.6.0

func (e *Encoder) Reset(w io.Writer)

Reset will re-initialize the writer and new writes will encode to the supplied writer as a new, independent stream.

func (*Encoder) Write added in v1.6.0

func (e *Encoder) Write(p []byte) (n int, err error)

Write data to the encoder. Input data will be buffered and as the buffer fills up content will be compressed and written to the output. When done writing, use Close to flush the remaining output and write CRC if requested.

type SnappyConverter added in v1.6.0

type SnappyConverter struct {
	// contains filtered or unexported fields
}

SnappyConverter can read Snappy-compressed streams and convert them to zstd. Conversion is done by converting the stream directly from Snappy without intermediate full decoding. Therefore the compression ratio is much lower than what can be achieved by a full decompression and compression, and a faulty Snappy stream may lead to a faulty Zstandard stream without any errors being generated. No CRC value is being generated and not all CRC values of the Snappy stream are checked. However, it provides really fast recompression of Snappy streams. The converter can be reused to avoid allocations, even after errors.

func (*SnappyConverter) Convert added in v1.6.0

func (r *SnappyConverter) Convert(in io.Reader, w io.Writer) (int64, error)

Convert the Snappy stream supplied in 'in' and write the Zstandard stream to 'w'. If any error is detected on the Snappy stream it is returned. The number of bytes written is returned.

