Documentation ¶
Index ¶
- Constants
- func CalculateEntropy(count []float64) (bitLengths []float64)
- func Compress(options *Options, outputType int, in []byte, out io.Writer) error
- func DeflateCompress(options *Options, in []byte, out io.Writer) error
- func GzipCompress(options *Options, in []byte, out io.Writer) error
- func ZlibCompress(options *Options, in []byte, out io.Writer) error
- type BlockState
- type Deflator
- type LZ77Store
- type Options
Constants ¶
const ( HASH_SHIFT = 5 HASH_MASK = 32767 )
const ( // Minimum and maximum length that can be encoded in deflate. MAX_MATCH = 258 MIN_MATCH = 3 // The window size for deflate. Must be a power of two. This should be // 32768, the maximum possible by the deflate spec. Anything less hurts // compression more than speed. WINDOW_SIZE = 32768 // The window mask used to wrap indices into the window. This is why the // window size must be a power of two. WINDOW_MASK = (WINDOW_SIZE - 1) // A block structure of huge, non-smart, blocks to divide the input into, to allow // operating on huge files without exceeding memory, such as the 1GB wiki9 corpus. // The whole compression algorithm, including the smarter block splitting, will // be executed independently on each huge block. // Dividing into huge blocks hurts compression, but not much relative to the size. // Set this to, for example, 20MB (20000000). Set it to 0 to disable master blocks. MASTER_BLOCK_SIZE = 20000000 // For longest match cache. max 256. Uses huge amounts of memory but makes it // faster. Uses this many times three bytes per single byte of the input data. // This is so because longest match finding has to find the exact distance // that belongs to each length for the best lz77 strategy. // Good values: e.g. 5, 8. CACHE_LENGTH = 8 // limit the max hash chain hits for this hash value. This has an effect only // on files where the hash value is the same very often. On these files, this // gives worse compression (the value should ideally be 32768, which is the // WINDOW_SIZE, while zlib uses 4096 even for best level), but makes it // faster on some specific files. // Good value: e.g. 8192. MAX_CHAIN_HITS = 8192 // Whether to use the longest match cache for FindLongestMatch. This cache // consumes a lot of memory but speeds it up. No effect on compression size. LONGEST_MATCH_CACHE = true // Enable to remember amount of successive identical bytes in the hash chain for // finding longest match // required for HASH_SAME_HASH and SHORTCUT_LONG_REPETITIONS // This has no effect on the compression result, and enabling it increases speed. HASH_SAME = true // Switch to a faster hash based on the info from HASH_SAME once the // best length so far is long enough. This is way faster for files with lots of // identical bytes, on which the compressor is otherwise too slow. Regular files // are unaffected or maybe a tiny bit slower. // This has no effect on the compression result, only on speed. HASH_SAME_HASH = true // Enable this, to avoid slowness for files which are a repetition of the same // character more than a multiple of MAX_MATCH times. This should not affect // the compression result. SHORTCUT_LONG_REPETITIONS = true // Whether to use lazy matching in the greedy LZ77 implementation. This gives a // better result of LZ77Greedy, but the effect this has on the optimal LZ77 // varies from file to file. LAZY_MATCHING = true )
const ( FORMAT_GZIP = iota FORMAT_ZLIB FORMAT_DEFLATE )
Output format
const ( UNCOMPRESSED_BLOCK = iota FIXED_BLOCK DYNAMIC_BLOCK )
Block type
Variables ¶
This section is empty.
Functions ¶
func CalculateEntropy ¶
Calculates the entropy of each symbol, based on the counts of each symbol. The result is similar to the result of CalculateBitLengths, but with the actual theoritical bit lengths according to the entropy. Since the resulting values are fractional, they cannot be used to encode the tree specified by DEFLATE.
func GzipCompress ¶
Compresses according to the gzip specification and writes the compressed result to the output.
options: global program options out: writer to which the result is appended
Types ¶
type BlockState ¶
type BlockState struct {
// contains filtered or unexported fields
}
Some state information for compressing a block. This is currently a bit under-used (with mainly only the longest match cache), but is kept for easy future expansion.
func NewBlockState ¶
func NewBlockState(options *Options, in []byte, inStart, inEnd int) (s BlockState)
func (*BlockState) LZ77Greedy ¶
func (s *BlockState) LZ77Greedy(inStart, inEnd int) (store LZ77Store)
Does LZ77 using an algorithm similar to gzip, with lazy matching, rather than with the slow but better "squeeze" implementation. The result is placed in the LZ77Store. If inStart is larger than 0, it uses values before inStart as starting dictionary.
func (*BlockState) LZ77Optimal ¶
func (s *BlockState) LZ77Optimal(inStart, inEnd int) LZ77Store
Calculates lit/len and dist pairs for given data. If instart is larger than 0, it uses values before instart as starting dictionary.
func (*BlockState) LZ77OptimalFixed ¶
func (s *BlockState) LZ77OptimalFixed(inStart, inEnd int) LZ77Store
Does the same as LZ77Optimal, but optimized for the fixed tree of the deflate standard. The fixed tree never gives the best compression. But this gives the best possible LZ77 encoding possible with the fixed tree. This does not create or output any fixed tree, only LZ77 data optimized for using with a fixed tree. If inStart is larger than 0, it uses values before inStart as starting dictionary.
type Deflator ¶
type Deflator struct {
// contains filtered or unexported fields
}
func (*Deflator) Deflate ¶
Compresses according to the deflate specification and append the compressed result to the output. This function will usually output multiple deflate blocks. If final is 1, then the final bit will be set on the last block.
final: whether this is the last section of the input, sets the final bit to the
last deflate block.
in: the input bytes
func (*Deflator) DeflatePart ¶
Deflate a part, to allow Deflate() to use multiple master blocks if needed.
Like Deflate, but allows to specify start and end byte with inStart and inEnd. Only that part is compressed, but earlier bytes are still used for the back window.
It is possible to call this function multiple times in a row, shifting inStart and inEnd to next bytes of the data. If inStart is larger than 0, then previous bytes are used as the initial dictionary for LZ77. This function will usually output multiple deflate blocks. If final is 1, then the final bit will be set on the last block.
func (*Deflator) WriteLZ77Block ¶
func (z *Deflator) WriteLZ77Block(blockType byte, final bool, store LZ77Store, expectedDataSize int)
Adds a deflate block with the given LZ77 data to the output. z: the stream to write to blockType: the block type, must be 1 or 2 final: whether to set the "final" bit on this block, must be the last block store: literal/length/distance array of the LZ77 data expectedDataSize: the uncompressed block size, used for panic, but you can
set it to 0 to not do the assertion.
type LZ77Store ¶
type LZ77Store []lz77Pair
Stores lit/length and dist pairs for LZ77.
func (LZ77Store) CalculateBlockSize ¶
Calculates block size in bits. litLens: lz77 lit/lengths dists: ll77 distances
type Options ¶
type Options struct { // Whether to print output Verbose bool // Whether to print more detailed output VerboseMore bool // Maximum amount of times to rerun forward and backward pass to optimize // LZ77 compression cost. Good values: 10, 15 for small files, 5 for files // over several MB in size or it will be too slow. NumIterations int // If true, splits the data in multiple deflate blocks with optimal choice // for the block boundaries. Block splitting gives better compression. Default: // true. BlockSplitting bool // If true, chooses the optimal block split points only after doing the iterative // LZ77 compression. If false, chooses the block split points first, then does // iterative LZ77 on each individual block. Depending on the file, either first // or last gives the best compression. Default: false. BlockSplittingLast bool // Maximum amount of blocks to split into (0 for unlimited, but this can give // extreme results that hurt compression on some files). Default value: 15. BlockSplittingMax int // The deflate block type. Use 2 for best compression. // -0: non compressed blocks (00) // -1: blocks with fixed tree (01) // -2: blocks with dynamic tree (10) BlockType byte }
Options used throughout the program.
func DefaultOptions ¶
func DefaultOptions() (options Options)