Documentation ¶
Overview ¶
Package s2 implements the S2 compression format.
S2 is an extension of Snappy. Like Snappy, S2 is aimed at high throughput, which is why it features concurrent compression for bigger payloads.
Decoding is compatible with Snappy compressed content, but content compressed with S2 cannot be decompressed by Snappy.
For more information on Snappy/S2 differences see README in: https://github.com/klauspost/compress/tree/master/s2
There are actually two S2 formats: block and stream. They are related, but different: trying to decompress block-compressed data as an S2 stream will fail, and vice versa. The block format is handled by the Decode and Encode functions and the stream format by the Reader and Writer types.
A "better" compression option is available. This will trade some compression speed for a better compression ratio.
The block format, the more common case, is used when the complete size (the number of bytes) of the original data is known upfront, at the time compression starts. The stream format, also known as the framing format, is for when that isn't always true.
Blocks do not offer much data protection, so it is up to you to add data validation of decompressed blocks.
Streams perform CRC validation of the decompressed data. Stream compression will also be performed on multiple CPU cores concurrently, significantly improving throughput.
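Because blocks carry no checksum, a caller who needs integrity checking has to store one next to the block. The following is a minimal sketch of one way to do that, not something the package provides: the helper names (`seal`, `split`, `verify`), the 4-byte trailer layout, and the choice of CRC-32C are all assumptions made for illustration, and the compressed block is represented by placeholder bytes.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// castagnoli is the CRC-32C table. The polynomial is an assumption of this
// sketch; any strong checksum would serve the same purpose.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// seal appends a 4-byte little-endian checksum of the uncompressed data
// to an already compressed block (hypothetical layout).
func seal(block, uncompressed []byte) []byte {
	var sum [4]byte
	binary.LittleEndian.PutUint32(sum[:], crc32.Checksum(uncompressed, castagnoli))
	out := make([]byte, 0, len(block)+len(sum))
	out = append(out, block...)
	return append(out, sum[:]...)
}

// split separates a sealed block into the compressed block and the checksum.
func split(sealed []byte) (block []byte, sum uint32, err error) {
	if len(sealed) < 4 {
		return nil, 0, errors.New("sealed block too short")
	}
	n := len(sealed) - 4
	return sealed[:n], binary.LittleEndian.Uint32(sealed[n:]), nil
}

// verify checks decoded data against the stored checksum.
func verify(decoded []byte, sum uint32) error {
	if crc32.Checksum(decoded, castagnoli) != sum {
		return errors.New("checksum mismatch")
	}
	return nil
}

func main() {
	original := []byte("hello, hello, hello")
	block := []byte{0x01, 0x02, 0x03} // placeholder for a compressed block
	sealed := seal(block, original)

	gotBlock, sum, err := split(sealed)
	if err != nil {
		panic(err)
	}
	// A real caller would decode gotBlock here; the sketch reuses original.
	fmt.Println(len(gotBlock), verify(original, sum) == nil)
	fmt.Println(verify([]byte("hello, hellp, hello"), sum) == nil)
}
```

The checksum covers the uncompressed data, so it is verified after decoding, which mirrors what the stream format is described as doing.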
Index ¶
- Constants
- Variables
- func ConcatBlocks(dst []byte, blocks ...[]byte) ([]byte, error)
- func Decode(dst, src []byte) ([]byte, error)
- func DecodedLen(src []byte) (int, error)
- func Encode(dst, src []byte) []byte
- func EncodeBest(dst, src []byte) []byte
- func EncodeBetter(dst, src []byte) []byte
- func EncodeSnappy(dst, src []byte) []byte
- func EncodeSnappyBest(dst, src []byte) []byte
- func EncodeSnappyBetter(dst, src []byte) []byte
- func EstimateBlockSize(src []byte) (d int)
- func IndexStream(r io.Reader) ([]byte, error)
- func MaxEncodedLen(srcLen int) int
- func RemoveIndexHeaders(b []byte) []byte
- func RestoreIndexHeaders(in []byte) []byte
- type Dict
- type ErrCantSeek
- type Index
- type LZ4Converter
- type LZ4sConverter
- type ReadSeeker
- type Reader
- func (r *Reader) DecodeConcurrent(w io.Writer, concurrent int) (written int64, err error)
- func (r *Reader) Read(p []byte) (int, error)
- func (r *Reader) ReadByte() (byte, error)
- func (r *Reader) ReadSeeker(random bool, index []byte) (*ReadSeeker, error)
- func (r *Reader) Reset(reader io.Reader)
- func (r *Reader) Skip(n int64) error
- func (r *Reader) SkippableCB(id uint8, fn func(r io.Reader) error) error
- type ReaderOption
- type Writer
- func (w *Writer) AddSkippableBlock(id uint8, data []byte) (err error)
- func (w *Writer) Close() error
- func (w *Writer) CloseIndex() ([]byte, error)
- func (w *Writer) EncodeBuffer(buf []byte) (err error)
- func (w *Writer) Flush() error
- func (w *Writer) ReadFrom(r io.Reader) (n int64, err error)
- func (w *Writer) Reset(writer io.Writer)
- func (w *Writer) Write(p []byte) (nRet int, errRet error)
- type WriterOption
- func WriterAddIndex() WriterOption
- func WriterBestCompression() WriterOption
- func WriterBetterCompression() WriterOption
- func WriterBlockSize(n int) WriterOption
- func WriterConcurrency(n int) WriterOption
- func WriterCustomEncoder(fn func(dst, src []byte) int) WriterOption
- func WriterFlushOnWrite() WriterOption
- func WriterPadding(n int) WriterOption
- func WriterPaddingSrc(reader io.Reader) WriterOption
- func WriterSnappyCompat() WriterOption
- func WriterUncompressed() WriterOption
Constants ¶
const (
	// MinDictSize is the minimum dictionary size when repeat has been read.
	MinDictSize = 16

	// MaxDictSize is the maximum dictionary size when repeat has been read.
	MaxDictSize = 65536

	// MaxDictSrcOffset is the maximum offset where a dictionary entry can start.
	MaxDictSrcOffset = 65535
)
const (
	S2IndexHeader  = "s2idx\x00"
	S2IndexTrailer = "\x00xdi2s"
)
const (
ChunkTypeIndex = 0x99
)
const MaxBlockSize = math.MaxUint32 - binary.MaxVarintLen32 - 5
MaxBlockSize is the maximum value where MaxEncodedLen will return a valid block size. Blocks this big are highly discouraged, though.
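As a worked check of the constant expression above: `math.MaxUint32` is 4294967295 and `binary.MaxVarintLen32` is 5, so the limit works out to 4294967295 - 5 - 5 = 4294967285 bytes. The sketch below recomputes it with the standard library only; `maxBlockSize` and `fitsInBlock` are local names invented for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// maxBlockSize repeats the constant expression from the documentation.
// It is typed uint64 so the sketch also compiles on 32-bit platforms.
const maxBlockSize uint64 = math.MaxUint32 - binary.MaxVarintLen32 - 5

// fitsInBlock reports whether n uncompressed bytes stay within the limit.
func fitsInBlock(n uint64) bool { return n <= maxBlockSize }

func main() {
	fmt.Println(maxBlockSize)                // 4294967285
	fmt.Println(fitsInBlock(4 << 20))        // true
	fmt.Println(fitsInBlock(math.MaxUint32)) // false
}
```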
Variables ¶
var (
	// ErrCorrupt reports that the input is invalid.
	ErrCorrupt = errors.New("s2: corrupt input")

	// ErrCRC reports that the input failed CRC validation (streams only)
	ErrCRC = errors.New("s2: corrupt input, crc mismatch")

	// ErrTooLarge reports that the uncompressed length is too large.
	ErrTooLarge = errors.New("s2: decoded block is too large")

	// ErrUnsupported reports that the input isn't supported.
	ErrUnsupported = errors.New("s2: unsupported input")
)
var ErrDstTooSmall = errors.New("s2: destination too small")
ErrDstTooSmall is returned when provided destination is too small.
Functions ¶
func ConcatBlocks ¶
ConcatBlocks will concatenate the supplied blocks and append them to the supplied destination. If the destination is nil or too small, a new one will be allocated. The blocks are not validated, so garbage in = garbage out. dst may not overlap block data. Any data in dst is preserved as is, so it will not be considered a block.
func Decode ¶
Decode returns the decoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire decoded block. Otherwise, a newly allocated slice will be returned.
The dst and src must not overlap. It is valid to pass a nil dst.
func DecodedLen ¶
DecodedLen returns the length of the decoded block.
func Encode ¶
Encode returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
func EncodeBest ¶ added in v1.11.7
EncodeBest returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
EncodeBest compresses as well as reasonably possible, but with a big speed decrease.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
func EncodeBetter ¶
EncodeBetter returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
EncodeBetter compresses better than Encode but typically with a 10-40% speed decrease on both compression and decompression.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
func EncodeSnappy ¶ added in v1.10.0
EncodeSnappy returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
The output is Snappy compatible and will likely decompress faster.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
func EncodeSnappyBest ¶ added in v1.13.1
EncodeSnappyBest returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
The output is Snappy compatible and will likely decompress faster.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
func EncodeSnappyBetter ¶ added in v1.13.1
EncodeSnappyBetter returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
The output is Snappy compatible and will likely decompress faster.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
func EstimateBlockSize ¶ added in v1.16.0
EstimateBlockSize will perform a very fast compression without outputting the result and return the compressed output size. The function returns -1 if no improvement could be achieved. Using actual compression will most often produce better compression than the estimate.
func IndexStream ¶ added in v1.14.0
IndexStream will return an index for a stream. The stream structure will be checked, but data within blocks is not verified. The returned index can either be appended to the end of the stream or stored separately.
Example ¶
ExampleIndexStream shows an example of indexing a stream after it has been written. The index can either be appended to the stream or stored separately.
package main

import (
	"bytes"
	"fmt"
	"io"
	"math/rand"
	"os"

	"github.com/klauspost/compress/s2"
)

func main() {
	fatalErr := func(err error) {
		if err != nil {
			panic(err)
		}
	}

	// Create a test stream without index
	var streamName = ""
	tmp := make([]byte, 5<<20)
	{
		rng := rand.New(rand.NewSource(0xbeefcafe))
		rng.Read(tmp)
		// Make it compressible...
		for i, v := range tmp {
			tmp[i] = '0' + v&3
		}
		// Compress it...
		output, err := os.CreateTemp("", "IndexStream")
		// Check the error before touching output.
		fatalErr(err)
		streamName = output.Name()

		// We use smaller blocks just for the example...
		enc := s2.NewWriter(output, s2.WriterSnappyCompat())
		err = enc.EncodeBuffer(tmp)
		fatalErr(err)

		// Close the encoder and the file. No index is written here.
		err = enc.Close()
		fatalErr(err)
		err = output.Close()
		fatalErr(err)
	}

	// Open our compressed stream without an index...
	stream, err := os.Open(streamName)
	fatalErr(err)
	defer stream.Close()

	var indexInput = io.Reader(stream)
	var indexOutput io.Writer
	var indexedName string

	// Should index be combined with stream by appending?
	// This could also be done by appending to an os.File
	// If not it will be written to a separate file.
	const combineOutput = false

	// Function to easier use defer.
	func() {
		if combineOutput {
			output, err := os.CreateTemp("", "IndexStream-Combined")
			fatalErr(err)
			defer func() {
				fatalErr(output.Close())
				if false {
					fi, err := os.Stat(output.Name())
					fatalErr(err)
					fmt.Println("Combined:", fi.Size(), "bytes")
				} else {
					fmt.Println("Index saved")
				}
			}()

			// Everything read from stream will also be written to output.
			indexedName = output.Name()
			indexInput = io.TeeReader(stream, output)
			indexOutput = output
		} else {
			output, err := os.CreateTemp("", "IndexStream-Index")
			fatalErr(err)
			defer func() {
				fatalErr(output.Close())
				fi, err := os.Stat(output.Name())
				fatalErr(err)
				if false {
					fmt.Println("Index:", fi.Size(), "bytes")
				} else {
					fmt.Println("Index saved")
				}
			}()
			indexedName = output.Name()
			indexOutput = output
		}

		// Index the input
		idx, err := s2.IndexStream(indexInput)
		fatalErr(err)

		// Write the index
		_, err = indexOutput.Write(idx)
		fatalErr(err)
	}()

	if combineOutput {
		// Read from combined stream only.
		stream, err := os.Open(indexedName)
		fatalErr(err)
		defer stream.Close()
		// Create a reader with the input.
		// We assert that the stream is an io.ReadSeeker.
		r := s2.NewReader(io.ReadSeeker(stream))

		// Request a ReadSeeker with random access.
		// This will load the index from the stream.
		rs, err := r.ReadSeeker(true, nil)
		fatalErr(err)

		_, err = rs.Seek(-10, io.SeekEnd)
		fatalErr(err)

		b, err := io.ReadAll(rs)
		fatalErr(err)
		if want := tmp[len(tmp)-10:]; !bytes.Equal(b, want) {
			fatalErr(fmt.Errorf("wanted %v, got %v", want, b))
		}
		fmt.Println("last 10 bytes read")

		_, err = rs.Seek(10, io.SeekStart)
		fatalErr(err)
		_, err = io.ReadFull(rs, b)
		fatalErr(err)
		if want := tmp[10:20]; !bytes.Equal(b, want) {
			fatalErr(fmt.Errorf("wanted %v, got %v", want, b))
		}
		fmt.Println("10 bytes at offset 10 read")
	} else {
		// Read from separate stream and index.
		stream, err := os.Open(streamName)
		fatalErr(err)
		defer stream.Close()
		// Create a reader with the input.
		// We assert that the stream is an io.ReadSeeker.
		r := s2.NewReader(io.ReadSeeker(stream))

		// Read the separate index.
		index, err := os.ReadFile(indexedName)
		fatalErr(err)

		// Request a ReadSeeker with random access.
		// The provided index will be used.
		rs, err := r.ReadSeeker(true, index)
		fatalErr(err)

		_, err = rs.Seek(-10, io.SeekEnd)
		fatalErr(err)

		b, err := io.ReadAll(rs)
		fatalErr(err)
		if want := tmp[len(tmp)-10:]; !bytes.Equal(b, want) {
			fatalErr(fmt.Errorf("wanted %v, got %v", want, b))
		}
		fmt.Println("last 10 bytes read")

		_, err = rs.Seek(10, io.SeekStart)
		fatalErr(err)
		_, err = io.ReadFull(rs, b)
		fatalErr(err)
		if want := tmp[10:20]; !bytes.Equal(b, want) {
			fatalErr(fmt.Errorf("wanted %v, got %v", want, b))
		}
		fmt.Println("10 bytes at offset 10 read")
	}
}
Output:
Index saved
last 10 bytes read
10 bytes at offset 10 read
func MaxEncodedLen ¶
MaxEncodedLen returns the maximum length of a snappy block, given its uncompressed length.
It will return a negative value if srcLen is too large to encode. 32 bit platforms will have lower thresholds for rejecting big content.
func RemoveIndexHeaders ¶ added in v1.15.8
RemoveIndexHeaders will trim all headers and trailers from a given index. This is expected to save 20 bytes. These can be restored using RestoreIndexHeaders. This removes a layer of security, but is the most compact representation. Returns nil if the headers contain errors. The returned slice references the provided slice.
func RestoreIndexHeaders ¶ added in v1.15.8
RestoreIndexHeaders will restore index headers removed by RemoveIndexHeaders. No error checking is performed on the input. If a 0 length slice is sent, it is returned without modification.
Types ¶
type Dict ¶ added in v1.16.0
type Dict struct {
// contains filtered or unexported fields
}
Dict contains a dictionary that can be used for encoding and decoding S2 blocks.
func MakeDict ¶ added in v1.16.0
MakeDict will create a dictionary. 'data' must be at least MinDictSize. If data is longer than MaxDictSize only the last MaxDictSize bytes will be used. If searchStart is set the start repeat value will be set to the last match of this content. If no matches are found, it will attempt to find shorter matches. This content should match the typical start of a block. If at least 4 bytes cannot be matched, repeat is set to start of block.
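The size rules above can be sketched with the standard library alone. The helper `dictWindow` is hypothetical and only mirrors the prose (reject data shorter than MinDictSize, keep the last MaxDictSize bytes of longer data); it does not reproduce what MakeDict does internally, and it ignores searchStart entirely.

```go
package main

import "fmt"

const (
	minDictSize = 16    // same value as MinDictSize
	maxDictSize = 65536 // same value as MaxDictSize
)

// dictWindow returns the part of data that would be kept under the size
// rules described above, or false if data is too short to be used.
func dictWindow(data []byte) ([]byte, bool) {
	if len(data) < minDictSize {
		return nil, false
	}
	if len(data) > maxDictSize {
		// Only the tail is kept.
		data = data[len(data)-maxDictSize:]
	}
	return data, true
}

func main() {
	_, ok := dictWindow(make([]byte, 8))
	fmt.Println(ok) // false

	w, _ := dictWindow(make([]byte, 100000))
	fmt.Println(len(w)) // 65536
}
```

Since only the tail survives, content that should be available to the encoder belongs at the end of the sample when the sample exceeds MaxDictSize.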
Example ¶
package main

import (
	"bytes"
	"fmt"
	"os"

	"github.com/klauspost/compress/s2"
)

func main() {
	// Read a sample
	sample, err := os.ReadFile("../testdata/gettysburg.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println("Input size:", len(sample))

	// Create a dictionary.
	dict := s2.MakeDict(sample, nil)
	fmt.Println("Dict size:", len(dict.Bytes()))

	encoded := dict.Encode(nil, sample)
	if len(encoded) < 20 {
		fmt.Println("Encoded size was less than 20 bytes!")
	}

	// To decode:
	decoded, err := dict.Decode(nil, encoded)
	if err != nil {
		panic(err)
	}
	if bytes.Equal(decoded, sample) {
		fmt.Println("They match!")
	}
}
Output:
Input size: 1548
Dict size: 1549
Encoded size was less than 20 bytes!
They match!
Example (Zstd) ¶
package main import ( "bytes" "fmt" "os" "github.com/klauspost/compress/s2" "github.com/klauspost/compress/zstd" ) func main() { // Read dictionary generated by zStandard using the command line // λ zstd -r --train-fastcover -o zstd.dict --maxdict=2048 gosrc\* // With gosrc containing all the standard library source files. zdict := []byte("7\xa40콶\xc1\x1bB\x10\x982\xc4\xe9\xc0\xc0\xc0\xc0\xc0\xc0\xc0\xc0\xc0\xc0\xc0\xc0@\xf5<\xda#\"{\xb7\xb6\xdd\xdd\xda\x17\x1b\t\x9b\xbd\x13n{U\xc1k\x11\xc3\x1b\x8b\xfbX\xee\xfe\xcb1\xcai\f\xf6meE\x97\x19\x83\\f\x14\x00\\\tS\x01\x00\x18 \x18\x8f\aT\x1a\xf5\x00\x00\x04\x80O\xd3MIJH\x03q\x98$I\n\xa3\x10B\xc6\x18B\b\x01\x00\x00D\x00\x04\x04\x00\xc0\x00\x00\x004\xcdieĩ@Β \xc7\x14B\n͌\b\x00\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x00\b\x00\x00\x00kage types2\n\nimport (\n\t\"cmd/compile/internal/syntax\"\n\t\"strings\"\n\t\"unicode\"\n)\n\n// funcInst type-checks a func\")\n\tif err != nil {\n\t\tt.Fatalf(\"Prepare: %v\", err)\n\t}\n\tdefer stmt.Close()\n\n\tconst n = 10\n\tch := make(chan error, n)\n\tfor i := 0; i < n; i++ {\n\t\tgo func() {\n\t\t\tvar age int\n\t\t\terr := stmt.QueryRowool { return c != nil && c.fd != nil }\n\n// Implementation of the Conn interface.\n\n// Read implements the Conn Read method.\nfunc (c *conn) Read(b []byte) (int, error) {\n\tif !c.ok() {\n\t\treturn 0, t\n\t\t} else {\n\t\t\treturn nil, &FormatError{0, \"invalid magic number\", nil}\n\t\t}\n\t}\n\toffset := int64(4)\n\n\t// Read the number of FatArchHeaders that come after the fat_header.\n\tvar narch uint32\n\terr log.Fatal(err)\n\t\t}\n\t\tf := strings.Fields(line)\n\t\tif len(f) == 0 {\n\t\t\tcontinue\n\t\t}\n\t\tswitch f[0] {\n\t\tdefault:\n\t\t\tfmt.Fprintf(os.Stderr, \"?unknown command\\n\")\n\t\t\tcontinue\n\t\tcase \"tags\":\n\t\t\tprefix 00\\x00\\x00\", true},\n\t}\n\n\tfor _, v := range vectors {\n\t\tvar f formatter\n\t\tgot := make([]byte, len(v.want))\n\t\tf.formatNumeric(got, v.in)\n\t\tok := (f.err == nil)\n\t\tif ok != v.ok {\n\t\t\tif 
v.ok {\n\t\t\t\ttturn true\n\t}\n\treturn false\n}\nfunc rewriteValueARM_OpARMBICconst(v *Value) bool {\n\tv_0 := v.Args[0]\n\t// match: (BICconst [0] x)\n\t// result: x\n\tfor {\n\t\tif auxIntToInt32(v.AuxInt) != 0 {\n\t\t\tbreak\n\tnt) {\n\t\t\t\t\tt.Errorf(\"%5g %s %5g = %5s; want %5s\", x, op, y, got, want)\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n}\n\nfunc TestFloatArithmeticOverflow(t *testing.T) {\n\tfor _, test := range []struct {\n\t\tprec uint\n\t\t)\n\t\t\t}\n\t\t\treturn\n\t\t}\n\t}\n}\n// Copyright 2017 The Go Authors. All rights reserved.\n// Use of this source code is governed by a BSD-style\n// license that can be found in the LICENSE file.\n\npackage ), uintptr(unsafe.Pointer(_p1)), 0)\n\tif e1 != 0 {\n\t\terr = errnoErr(e1)\n\t}\n\treturn\n}\n\n// THIS FILE IS GENERATED BY THE COMMAND AT THE TOP; DO NOT EDIT\n\nfunc Sync() (err error) {\n\t_, _, e1 := SyscDLINK = 0x10\n\tMOVEFILE_FAIL_IF_NOT") // Decode the zstandard dictionary. insp, err := zstd.InspectDictionary(zdict) if err != nil { panic(err) } // We are only interested in the contents. fmt.Println("Dictionary content length:", len(insp.Content())) // Create a dictionary. // Assume that files start with "// Copyright (c) 2023". // Search for the longest match for that. // This may save a few bytes. dict := s2.MakeDict(insp.Content(), []byte("// Copyright (c) 2023")) // b := d.Bytes() will provide a dictionary that can be saved // and reloaded with s2.NewDict(b). fmt.Println("Dict size:", len(dict.Bytes())) // Read a sample. Use this file. sample, err := os.ReadFile("examples_test.go") if err != nil { panic(err) } encodedWithDict := dict.Encode(nil, sample) encodedNoDict := s2.Encode(nil, sample) // Print a less accurate output that is less likely to change. // Since we include the (encoded) dictionary itself that will create better than expected compression. 
if len(encodedWithDict) < len(encodedNoDict)-1000 { fmt.Println("Saved more than 1000 bytes") } // To decode the content: decoded, err := dict.Decode(nil, encodedWithDict) if err != nil { panic(err) } if bytes.Equal(decoded, sample) { fmt.Println("They match!") } }
Output:
Dictionary content length: 1894
Dict size: 1896
Saved more than 1000 bytes
They match!
func NewDict ¶ added in v1.16.0
NewDict will read a dictionary. It will return nil if the dictionary is invalid.
func (*Dict) Bytes ¶ added in v1.16.0
Bytes will return a serialized version of the dictionary. The output can be sent to NewDict.
func (*Dict) Decode ¶ added in v1.16.0
Decode returns the decoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire decoded block. Otherwise, a newly allocated slice will be returned.
The dst and src must not overlap. It is valid to pass a nil dst.
func (*Dict) Encode ¶ added in v1.16.0
Encode returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
func (*Dict) EncodeBest ¶ added in v1.16.0
EncodeBest returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
EncodeBest compresses as well as reasonably possible, but with a big speed decrease.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
func (*Dict) EncodeBetter ¶ added in v1.16.0
EncodeBetter returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned.
EncodeBetter compresses better than Encode but typically with a 10-40% speed decrease on both compression and decompression.
The dst and src must not overlap. It is valid to pass a nil dst.
The blocks will require the same amount of memory to decode as to encode, and do not allow for concurrent decoding. Also note that blocks do not contain CRC information, so corruption may go undetected.
If you need to encode larger amounts of data, consider using the streaming interface which gives all of these features.
type ErrCantSeek ¶ added in v1.14.0
type ErrCantSeek struct {
Reason string
}
ErrCantSeek is returned if the stream cannot be seeked.
func (ErrCantSeek) Error ¶ added in v1.14.0
func (e ErrCantSeek) Error() string
Error returns the error as string.
type Index ¶ added in v1.14.0
type Index struct {
	TotalUncompressed int64 // Total Uncompressed size if known. Will be -1 if unknown.
	TotalCompressed   int64 // Total Compressed size if known. Will be -1 if unknown.
	// contains filtered or unexported fields
}
Index represents an S2/Snappy index.
func (*Index) Find ¶ added in v1.14.0
Find the offset at or before the wanted (uncompressed) offset. If offset is 0 or positive it is the offset from the beginning of the file. If the uncompressed size is known, the offset must be within the file. If an offset outside the file is requested io.ErrUnexpectedEOF is returned. If the offset is negative, it is interpreted as the distance from the end of the file, where -1 represents the last byte. If offset from the end of the file is requested, but size is unknown, ErrUnsupported will be returned.
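The offset rules above can be written out as a small function. This is a sketch of the documented behavior only, not the library's implementation: `resolveOffset` and `errUnsupported` are hypothetical stand-ins, and treating an offset equal to the total size as still inside the file is an assumption the prose does not settle.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// errUnsupported is a local stand-in for the package's ErrUnsupported.
var errUnsupported = errors.New("unsupported input")

// resolveOffset turns a requested offset into an absolute uncompressed
// offset. total is the uncompressed size, or -1 if it is unknown.
func resolveOffset(offset, total int64) (int64, error) {
	if offset < 0 {
		if total < 0 {
			// Distance from the end needs a known size.
			return 0, errUnsupported
		}
		offset += total // -1 becomes total-1, the last byte
		if offset < 0 {
			return 0, io.ErrUnexpectedEOF
		}
		return offset, nil
	}
	// Assumption: offset == total is accepted as "at end of file".
	if total >= 0 && offset > total {
		return 0, io.ErrUnexpectedEOF
	}
	return offset, nil
}

func main() {
	fmt.Println(resolveOffset(-1, 100))  // 99 <nil>
	fmt.Println(resolveOffset(10, 100))  // 10 <nil>
	fmt.Println(resolveOffset(101, 100)) // 0 unexpected EOF
	fmt.Println(resolveOffset(-5, -1))   // 0 unsupported input
}
```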
func (*Index) Load ¶ added in v1.14.0
Load a binary index. A zero value Index can be used or a previous one can be reused.
Example ¶
package main

import (
	"bytes"
	"fmt"
	"io"
	"math/rand"
	"sync"

	"github.com/klauspost/compress/s2"
)

func main() {
	fatalErr := func(err error) {
		if err != nil {
			panic(err)
		}
	}

	// Create a test corpus
	tmp := make([]byte, 5<<20)
	rng := rand.New(rand.NewSource(0xbeefcafe))
	rng.Read(tmp)
	// Make it compressible...
	for i, v := range tmp {
		tmp[i] = '0' + v&3
	}
	// Compress it...
	var buf bytes.Buffer
	// We use smaller blocks just for the example...
	enc := s2.NewWriter(&buf, s2.WriterBlockSize(100<<10))
	err := enc.EncodeBuffer(tmp)
	fatalErr(err)

	// Close and get index...
	idxBytes, err := enc.CloseIndex()
	fatalErr(err)

	// This is our compressed stream...
	compressed := buf.Bytes()

	var once sync.Once
	for wantOffset := int64(0); wantOffset < int64(len(tmp)); wantOffset += 555555 {
		// Let's assume we want to read from uncompressed offset 'wantOffset'
		// and we cannot seek in input, but we have the index.
		want := tmp[wantOffset:]

		// Load the index.
		var index s2.Index
		_, err = index.Load(idxBytes)
		fatalErr(err)

		// Find offset in file:
		compressedOffset, uncompressedOffset, err := index.Find(wantOffset)
		fatalErr(err)

		// Offset the input to the compressed offset.
		// Notice how we do not provide any bytes before the offset.
		input := io.Reader(bytes.NewBuffer(compressed[compressedOffset:]))
		if _, ok := input.(io.Seeker); !ok {
			// Notice how the input cannot be seeked...
			once.Do(func() {
				fmt.Println("Input does not support seeking...")
			})
		} else {
			panic("did you implement seeking on bytes.Buffer?")
		}

		// When creating the decoder we must specify that it should not
		// expect a stream identifier at the beginning of the frame.
		dec := s2.NewReader(input, s2.ReaderIgnoreStreamIdentifier())

		// We now have a reader, but it will start outputting at uncompressedOffset,
		// and not the actual offset we want, so skip forward to that.
		toSkip := wantOffset - uncompressedOffset
		err = dec.Skip(toSkip)
		fatalErr(err)

		// Read the rest of the stream...
		got, err := io.ReadAll(dec)
		fatalErr(err)
		if bytes.Equal(got, want) {
			fmt.Println("Successfully skipped forward to", wantOffset)
		} else {
			fmt.Println("Failed to skip forward to", wantOffset)
		}
	}
}
Output:
Input does not support seeking...
Successfully skipped forward to 0
Successfully skipped forward to 555555
Successfully skipped forward to 1111110
Successfully skipped forward to 1666665
Successfully skipped forward to 2222220
Successfully skipped forward to 2777775
Successfully skipped forward to 3333330
Successfully skipped forward to 3888885
Successfully skipped forward to 4444440
Successfully skipped forward to 4999995
func (*Index) LoadStream ¶ added in v1.14.0
func (i *Index) LoadStream(rs io.ReadSeeker) error
LoadStream will load an index from the end of the supplied stream. ErrUnsupported will be returned if the signature cannot be found. ErrCorrupt will be returned if unexpected values are found. io.ErrUnexpectedEOF is returned if there are too few bytes. IO errors are returned as-is.
type LZ4Converter ¶ added in v1.16.0
type LZ4Converter struct { }
LZ4Converter provides conversion from LZ4 blocks as defined here: https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md
func (*LZ4Converter) ConvertBlock ¶ added in v1.16.0
func (l *LZ4Converter) ConvertBlock(dst, src []byte) ([]byte, int, error)
ConvertBlock will convert an LZ4 block and append it as an S2 block without block length to dst. The uncompressed size is returned as well. dst must have capacity to contain the entire compressed block.
func (*LZ4Converter) ConvertBlockSnappy ¶ added in v1.16.0
func (l *LZ4Converter) ConvertBlockSnappy(dst, src []byte) ([]byte, int, error)
ConvertBlockSnappy will convert an LZ4 block and append it as a Snappy block without block length to dst. The uncompressed size is returned as well. dst must have capacity to contain the entire compressed block.
type LZ4sConverter ¶ added in v1.16.1
type LZ4sConverter struct { }
LZ4sConverter provides conversion from LZ4s (Intel modified LZ4 blocks). https://cdrdv2-public.intel.com/743912/743912-qat-programmers-guide-v2.0.pdf
LZ4s is a variant of LZ4 block format. LZ4s should be considered as an intermediate compressed block format. The LZ4s format is selected when the application sets the compType to CPA_DC_LZ4S in CpaDcSessionSetupData. The LZ4s block returned by the Intel® QAT hardware can be used by an external software post-processing to generate other compressed data formats.
The following table lists the differences between LZ4 and LZ4s block format. LZ4s block format uses the same high-level formatting as LZ4 block format with the following encoding changes: For Min Match of 4 bytes, Copy length value 1-15 means length 4-18 with 18 bytes adding an extra byte.
ONLY "Min match of 4 bytes" is supported.
func (*LZ4sConverter) ConvertBlock ¶ added in v1.16.1
func (l *LZ4sConverter) ConvertBlock(dst, src []byte) ([]byte, int, error)
ConvertBlock will convert an LZ4s block and append it as an S2 block without block length to dst. The uncompressed size is returned as well. dst must have capacity to contain the entire compressed block.
func (*LZ4sConverter) ConvertBlockSnappy ¶ added in v1.16.1
func (l *LZ4sConverter) ConvertBlockSnappy(dst, src []byte) ([]byte, int, error)
ConvertBlockSnappy will convert an LZ4s block and append it as a Snappy block without block length to dst. The uncompressed size is returned as well. dst must have capacity to contain the entire compressed block.
type ReadSeeker ¶ added in v1.14.0
type ReadSeeker struct {
	*Reader
	// contains filtered or unexported fields
}
ReadSeeker provides random or forward seeking in compressed content. See Reader.ReadSeeker
func (*ReadSeeker) ReadAt ¶ added in v1.16.0
func (r *ReadSeeker) ReadAt(p []byte, offset int64) (int, error)
ReadAt reads len(p) bytes into p starting at the given offset in the underlying input source. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered.
When ReadAt returns n < len(p), it returns a non-nil error explaining why more bytes were not returned. In this respect, ReadAt is stricter than Read.
Even if ReadAt returns n < len(p), it may use all of p as scratch space during the call. If some data is available but not len(p) bytes, ReadAt blocks until either all the data is available or an error occurs. In this respect ReadAt is different from Read.
If the n = len(p) bytes returned by ReadAt are at the end of the input source, ReadAt may return either err == EOF or err == nil.
If ReadAt is reading from an input source with a seek offset, ReadAt should not affect nor be affected by the underlying seek offset.
Clients of ReadAt can execute parallel ReadAt calls on the same input source. This is however not recommended.
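The short-read rule above is the general io.ReaderAt contract, so it can be observed with any standard-library implementation. The sketch below uses strings.Reader as a stand-in source rather than a ReadSeeker from this package; `newSource` and `readAtMost` are helper names invented for the example.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// newSource returns a stand-in io.ReaderAt backed by an in-memory string.
func newSource(s string) io.ReaderAt { return strings.NewReader(s) }

// readAtMost asks for n bytes at off and returns what was actually read
// together with the error explaining any shortfall.
func readAtMost(ra io.ReaderAt, off int64, n int) ([]byte, error) {
	p := make([]byte, n)
	got, err := ra.ReadAt(p, off)
	return p[:got], err
}

func main() {
	src := newSource("0123456789")

	b, err := readAtMost(src, 2, 4)
	fmt.Printf("%q %v\n", b, err) // "2345" <nil>

	// Only two bytes remain at offset 8, so the short read comes with io.EOF.
	b, err = readAtMost(src, 8, 4)
	fmt.Printf("%q %v\n", b, err) // "89" EOF
}
```

Callers should therefore use the returned bytes before inspecting the error, since a short read at the end of the data is reported as io.EOF rather than as a failure to read anything.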
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader is an io.Reader that can read Snappy-compressed bytes.
func NewReader ¶
func NewReader(r io.Reader, opts ...ReaderOption) *Reader
NewReader returns a new Reader that decompresses from r, using the framing format described at https://github.com/google/snappy/blob/master/framing_format.txt with S2 changes.
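In the Snappy framing format referenced above, every chunk starts with a 4-byte header: one byte of chunk type followed by a three-byte little-endian length of the chunk data. The sketch below parses that header using only the standard library; `chunkHeader` and `parseChunkHeader` are hypothetical names, and the S2-specific changes to the format are not covered here.

```go
package main

import (
	"errors"
	"fmt"
)

// chunkHeader is the 4-byte header in front of every chunk.
type chunkHeader struct {
	Type   uint8 // chunk type identifier
	Length int   // length of the chunk data that follows the header
}

// parseChunkHeader decodes the type byte and the 24-bit little-endian length.
func parseChunkHeader(b []byte) (chunkHeader, error) {
	if len(b) < 4 {
		return chunkHeader{}, errors.New("short chunk header")
	}
	return chunkHeader{
		Type:   b[0],
		Length: int(b[1]) | int(b[2])<<8 | int(b[3])<<16,
	}, nil
}

func main() {
	// Snappy stream identifier chunk: type 0xff, length 6, payload "sNaPpY".
	stream := []byte("\xff\x06\x00\x00sNaPpY")

	h, err := parseChunkHeader(stream)
	if err != nil {
		panic(err)
	}
	fmt.Printf("type=%#x length=%d payload=%q\n", h.Type, h.Length, stream[4:4+h.Length])
}
```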
func (*Reader) DecodeConcurrent ¶ added in v1.15.5
DecodeConcurrent will decode the full stream to w. This function should not be combined with reading, seeking or other operations. Up to 'concurrent' goroutines will be used. If <= 0, runtime.NumCPU will be used. On success the number of bytes decompressed and a nil error are returned. This is mainly intended for bigger streams.
func (*Reader) ReadSeeker ¶ added in v1.14.0
func (r *Reader) ReadSeeker(random bool, index []byte) (*ReadSeeker, error)
ReadSeeker will return an io.ReadSeeker and io.ReaderAt compatible version of the reader. If 'random' is specified, the returned io.Seeker can be used for random seeking; otherwise only forward seeking is supported. Enabling random seeking requires the original input to support the io.Seeker interface. A custom index can be specified, which will be used if supplied. When using a custom index, it will not be read from the input stream. The ReadAt position will affect regular reads and the current position of Seek, so using Read after ReadAt will continue from where the ReadAt stopped. No functions should be used concurrently. The returned ReadSeeker contains a shallow reference to the existing Reader, meaning changes performed to one are reflected in the other.
func (*Reader) Reset ¶
Reset discards any buffered data, resets all state, and switches the Snappy reader to read from r. This permits reusing a Reader rather than allocating a new one.
func (*Reader) Skip ¶
Skip will skip n bytes forward in the decompressed output. For larger skips this consumes less CPU and is faster than reading output and discarding it. CRC is not checked on skipped blocks. io.ErrUnexpectedEOF is returned if the stream ends before all bytes have been skipped. If a decoding error is encountered, subsequent calls to Read will also fail.
func (*Reader) SkippableCB ¶ added in v1.14.0
SkippableCB will register a callback for chunks with the specified ID. ID must be a Reserved skippable chunks ID, 0x80-0xfe (inclusive). For each chunk with the ID, the callback is called with the content. Any returned non-nil error will abort decompression. Only one callback per ID is supported, latest sent will be used. Sending a nil function will disable previous callbacks.
type ReaderOption ¶ added in v1.11.7
ReaderOption is an option for creating a decoder.
func ReaderAllocBlock ¶ added in v1.11.8
func ReaderAllocBlock(blockSize int) ReaderOption
ReaderAllocBlock controls upfront stream allocations, so that the reader does not initially allocate for frames bigger than this. If frames bigger than this are seen, a bigger buffer will be allocated.
Default is 1MB, which is the default output size.
func ReaderIgnoreCRC ¶ added in v1.15.6
func ReaderIgnoreCRC() ReaderOption
ReaderIgnoreCRC will make the reader skip CRC calculation and checks.
func ReaderIgnoreStreamIdentifier ¶ added in v1.14.0
func ReaderIgnoreStreamIdentifier() ReaderOption
ReaderIgnoreStreamIdentifier will make the reader skip the expected stream identifier at the beginning of the stream. This can be used when serving a stream that has been forwarded to a specific point.
func ReaderMaxBlockSize ¶ added in v1.11.7
func ReaderMaxBlockSize(blockSize int) ReaderOption
ReaderMaxBlockSize controls allocations if the stream has been compressed with a smaller WriterBlockSize, or with the default 1MB. Blocks must be this size or smaller to decompress; otherwise the decoder will return ErrUnsupported.
For streams compressed with Snappy this can safely be set to 64KB (64 << 10).
Default is the maximum limit of 4MB.
func ReaderSkippableCB ¶ added in v1.14.0
func ReaderSkippableCB(id uint8, fn func(r io.Reader) error) ReaderOption
ReaderSkippableCB will register a callback for chunks with the specified ID. ID must be a Reserved skippable chunks ID, 0x80-0xfd (inclusive). For each chunk with the ID, the callback is called with the content. Any returned non-nil error will abort decompression. Only one callback per ID is supported, latest sent will be used.
type Writer ¶
type Writer struct {
// contains filtered or unexported fields
}
Writer is an io.Writer that can write Snappy-compressed bytes.
func NewWriter ¶
func NewWriter(w io.Writer, opts ...WriterOption) *Writer
NewWriter returns a new Writer that compresses to w, using the framing format described at https://github.com/google/snappy/blob/master/framing_format.txt
Users must call Close to guarantee all data has been forwarded to the underlying io.Writer and that resources are released. They may also call Flush zero or more times before calling Close.
func (*Writer) AddSkippableBlock ¶ added in v1.14.0
AddSkippableBlock will add a skippable block to the stream. The ID must be 0x80-0xfe (inclusive). Length of the skippable block must be <= 16777215 bytes.
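The 16777215-byte limit matches the largest value (2^24 - 1) that fits in the 3-byte little-endian length field of a chunk header in the Snappy framing format. As a rough sketch of the constraints stated above, the hypothetical helper below validates the ID and length and builds the 4-byte chunk header. The name skippableHeader is an assumption made for illustration and is not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// maxChunkLen is the largest length a 3-byte length field can hold.
const maxChunkLen = 1<<24 - 1 // 16777215

// skippableHeader is a hypothetical helper that builds the 4-byte header
// of a skippable chunk: 1 byte of chunk ID, then the length as 3 bytes
// in little-endian order.
func skippableHeader(id uint8, n int) ([4]byte, error) {
	var hdr [4]byte
	if id < 0x80 || id > 0xfe {
		return hdr, errors.New("id must be 0x80-0xfe (inclusive)")
	}
	if n < 0 || n > maxChunkLen {
		return hdr, errors.New("length must be <= 16777215 bytes")
	}
	hdr[0] = id
	hdr[1] = byte(n)
	hdr[2] = byte(n >> 8)
	hdr[3] = byte(n >> 16)
	return hdr, nil
}

func main() {
	hdr, err := skippableHeader(0x80, 300)
	fmt.Printf("% x %v\n", hdr[:], err) // prints 80 2c 01 00 <nil>

	_, err = skippableHeader(0x80, maxChunkLen+1)
	fmt.Println(err) // prints length must be <= 16777215 bytes
}
```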
func (*Writer) Close ¶
Close calls Flush and then closes the Writer. Calling Close multiple times is ok, but calling CloseIndex after this will make it not return the index.
func (*Writer) CloseIndex ¶ added in v1.14.0
CloseIndex calls Close and returns an index on first call. This is not required if you are only adding an index to a stream.
func (*Writer) EncodeBuffer ¶ added in v1.10.0
EncodeBuffer will add a buffer to the stream. This is the fastest way to encode a stream, but the input buffer cannot be written to by the caller until Flush or Close has been called when concurrency != 1.
If you cannot control that, use the regular Write function.
Note that input is not buffered. This means that each write will result in discrete blocks being created. For buffered writes, use the regular Write function.
func (*Writer) Flush ¶
Flush flushes the Writer to its underlying io.Writer. This does not apply padding.
func (*Writer) ReadFrom ¶
ReadFrom implements the io.ReaderFrom interface. Using this is typically more efficient since it avoids a memory copy. ReadFrom reads data from r until EOF or error. The return value n is the number of bytes read. Any error except io.EOF encountered during the read is also returned.
type WriterOption ¶
WriterOption is an option for creating an encoder.
func WriterAddIndex ¶ added in v1.14.0
func WriterAddIndex() WriterOption
WriterAddIndex will append an index to the end of a stream when it is closed.
func WriterBestCompression ¶ added in v1.11.7
func WriterBestCompression() WriterOption
WriterBestCompression will enable the best compression mode. It compresses better than the default, but typically with a big speed decrease on compression.
func WriterBetterCompression ¶
func WriterBetterCompression() WriterOption
WriterBetterCompression will enable better compression. EncodeBetter compresses better than Encode but typically with a 10-40% speed decrease on both compression and decompression.
func WriterBlockSize ¶
func WriterBlockSize(n int) WriterOption
WriterBlockSize overrides the default block size. Blocks will be this size or smaller. Minimum size is 4KB and maximum size is 4MB.
Bigger blocks may give bigger throughput on systems with many cores, and will increase compression slightly, but it will limit the possible concurrency for smaller payloads for both encoding and decoding. Default block size is 1MB.
When writing Snappy compatible output using WriterSnappyCompat, the maximum block size is 64KB.
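To make the trade-off between block size and concurrency concrete, here is a back-of-the-envelope sketch. It assumes a simplified model in which each block is compressed by at most one goroutine at a time, so the number of blocks caps the useful concurrency. The real scheduling inside the Writer may differ, and blockCount and usableConcurrency are hypothetical helpers, not part of this package.

```go
package main

import "fmt"

// blockCount returns how many blocks a payload splits into,
// rounding up for a final partial block.
func blockCount(payload, blockSize int) int {
	if payload <= 0 || blockSize <= 0 {
		return 0
	}
	return (payload + blockSize - 1) / blockSize
}

// usableConcurrency returns how many workers can be busy at once under
// the simplified one-goroutine-per-block model.
func usableConcurrency(payload, blockSize, workers int) int {
	n := blockCount(payload, blockSize)
	if n < workers {
		return n
	}
	return workers
}

func main() {
	const payload = 3 << 20 // 3MB payload
	const workers = 8

	fmt.Println(usableConcurrency(payload, 4<<20, workers))  // prints 1
	fmt.Println(usableConcurrency(payload, 1<<20, workers))  // prints 3
	fmt.Println(usableConcurrency(payload, 64<<10, workers)) // prints 8
}
```

Under this model a 3MB payload keeps only one worker busy with 4MB blocks, three with the default 1MB, and all eight with 64KB blocks, at the cost of slightly lower compression.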
func WriterConcurrency ¶
func WriterConcurrency(n int) WriterOption
WriterConcurrency will set the concurrency, meaning the maximum number of encoders to run concurrently. The value supplied must be at least 1. By default this will be set to GOMAXPROCS.
func WriterCustomEncoder ¶ added in v1.16.0
func WriterCustomEncoder(fn func(dst, src []byte) int) WriterOption
WriterCustomEncoder overrides the encoder for blocks on the stream. The function must compress 'src' into 'dst' and return the bytes used in dst as an integer. Block size (initial varint) should not be added by the encoder. Returning value 0 indicates the block could not be compressed. The function should expect to be called concurrently.
func WriterFlushOnWrite ¶ added in v1.13.4
func WriterFlushOnWrite() WriterOption
WriterFlushOnWrite will compress blocks on each call to the Write function.
This is quite inefficient, as block size will depend on the write size.
Use WriterConcurrency(1) to also make sure that output is flushed when Write calls return; otherwise it will be written when compression is done.
func WriterPadding ¶
func WriterPadding(n int) WriterOption
WriterPadding will add padding to all output so the size will be a multiple of n. This can be used to obfuscate the exact output size or make blocks of a certain size. The contents will be a skippable frame, so it will be invisible to the decoder. n must be > 0 and <= 4MB. The padded area will be filled with data from crypto/rand.Reader. The padding will be applied whenever Close is called on the writer.
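The arithmetic behind padding to a multiple of n can be sketched as follows. Because the padding is itself a skippable frame, it needs room for a 4-byte chunk header. This sketch assumes that a gap smaller than the header is extended by another full multiple of n, which is one plausible way to handle that case and not a confirmed description of the library's behavior. paddingNeeded is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// chunkHeaderLen is the size of a framing-format chunk header:
// 1 byte of chunk type plus 3 bytes of length.
const chunkHeaderLen = 4

// paddingNeeded returns how many bytes to append so that written plus the
// result is a multiple of multiple. Assumption: the padding frame must be
// at least chunkHeaderLen bytes, so tiny gaps grow by a full multiple.
func paddingNeeded(written, multiple int64) int64 {
	if multiple <= 1 {
		return 0
	}
	left := written % multiple
	if left == 0 {
		return 0 // already aligned
	}
	add := multiple - left
	for add < chunkHeaderLen {
		add += multiple
	}
	return add
}

func main() {
	fmt.Println(paddingNeeded(1000, 1024)) // prints 24
	fmt.Println(paddingNeeded(1022, 1024)) // prints 1026
	fmt.Println(paddingNeeded(2048, 1024)) // prints 0
}
```

In the second case the 2-byte gap cannot hold a chunk header, so the output grows to 2048 bytes instead of 1024.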
func WriterPaddingSrc ¶ added in v1.11.4
func WriterPaddingSrc(reader io.Reader) WriterOption
WriterPaddingSrc will get random data for padding from the supplied source. By default crypto/rand is used.
func WriterSnappyCompat ¶ added in v1.13.1
func WriterSnappyCompat() WriterOption
WriterSnappyCompat will write snappy compatible output. The output can be decompressed using either snappy or s2. If block size is more than 64KB it is set to that.
func WriterUncompressed ¶ added in v1.11.4
func WriterUncompressed() WriterOption
WriterUncompressed will bypass compression. The stream will be written as uncompressed blocks only. If concurrency is > 1, CRC and output will still be done async.
Source Files ¶
Directories ¶
Path | Synopsis |
---|---|
_generate (module) | |
cmd | |
internal/filepathx | Package filepathx adds double-star globbing support to the Glob function from the core path/filepath package. |
internal/readahead | Package readahead will do asynchronous read-ahead from an input io.Reader and make the data available as an io.Reader. |