ace

package module
v0.2.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 30, 2024 License: BSD-3-Clause Imports: 1 Imported by: 0

README

ace

Package ace is an Abstract Compression Engine providing special-purpose lossless entropy coding/decoding.

The algorithm used is a simplified variation of Re-Pair, a grammar-based compression algorithm.

No wire/transport format is provided, hence the "abstract" part of the name.

The engine is not practical for very long input sequences/huge files as both encoding and decoding consumes several bytes of memory per every sequence symbol.

Symbols are 32 bit unsigned numbers, the length of the output sequence is limited to 2^32 symbols. However, many machines will run out of memory long before handling output sequences of such length.

It is usually possible to encode and later decode a 10^9 symbol sequence on a 16 GB RAM machine.

Compression ratios

me@3900x:~/src/modernc.org/ace$ make test
=== RUN   TestMemory0
    all_test.go:79: abcabdabcabd   : syms in=       12 syms out=      8 k=66.67% te=         8.62µs td=         1.37µs
    all_test.go:79: enwik3         : syms in=    1,000 syms out=    370 k=37.00% te=       332.58µs td=        11.94µs
    all_test.go:79: enwik4         : syms in=   10,000 syms out=  3,552 k=35.52% te=     2.933338ms td=       109.45µs
    all_test.go:79: enwik5         : syms in=  100,000 syms out= 25,913 k=25.91% te=    22.427729ms td=     1.467629ms
    all_test.go:79: enwik6         : syms in=1,000,000 syms out=195,100 k=19.51% te=    246.72995ms td=    10.718449ms
    all_test.go:79: gettysburg     : syms in=    1,463 syms out=    737 k=50.38% te=       530.71µs td=        14.45µs
    all_test.go:79: gettysburgx10  : syms in=   14,630 syms out=  1,497 k=10.23% te=     1.734398ms td=       116.87µs
    all_test.go:79: gettysburgx100 : syms in=  146,300 syms out=  1,510 k= 1.03% te=     8.984032ms td=      921.319µs
    all_test.go:79: zero3          : syms in=    1,000 syms out=     22 k= 2.20% te=        51.54µs td=         6.49µs
    all_test.go:79: zero4          : syms in=   10,000 syms out=     29 k= 0.29% te=       486.01µs td=       55.889µs
    all_test.go:79: zero5          : syms in=  100,000 syms out=     36 k= 0.04% te=     4.719957ms td=       526.45µs
    all_test.go:79: zero6          : syms in=1,000,000 syms out=     43 k= 0.00% te=    47.401958ms td=     9.755601ms
--- PASS: TestMemory0 (0.36s)
PASS
ok      modernc.org/ace 0.364s
me@3900x:~/src/modernc.org/ace$

Performance

me@3900x:~/src/modernc.org/ace$ make benchmark
goos: linux
goarch: amd64
pkg: modernc.org/ace
cpu: AMD Ryzen 9 3900X 12-Core Processor            
BenchmarkEncodeEnwik6-24               6     255495376 ns/op       3.91 MB/s    13102778 B/op       7572 allocs/op
BenchmarkDecodeEnwik6-24             100      19666232 ns/op      50.85 MB/s     9355974 B/op         64 allocs/op
PASS
ok      modernc.org/ace 4.264s
me@3900x:~/src/modernc.org/ace$

Documentation

Overview

Package ace is an Abstract Compression Engine providing special-purpose lossless entropy coding/decoding.

The algorithm used is a simplified variation of Re-Pair, a grammar-based compression algorithm.

No wire/transport format is provided, hence the "abstract" part of the name.

The engine is not practical for very long input sequences/huge files as both encoding and decoding consumes several bytes of memory per every sequence symbol.

Symbols are 32 bit unsigned numbers, the length of the output sequence is limited to 2^32 symbols. However, many machines will run out of memory long before handling output sequences of such length.

It is usually possible to encode and later decode a 10^9 symbol sequence on a 16 GB RAM machine.

Compression ratios

me@3900x:~/src/modernc.org/ace$ make test
=== RUN   TestMemory0
    all_test.go:79: abcabdabcabd   : syms in=       12 syms out=      8 k=66.67% te=         8.62µs td=         1.37µs
    all_test.go:79: enwik3         : syms in=    1,000 syms out=    370 k=37.00% te=       332.58µs td=        11.94µs
    all_test.go:79: enwik4         : syms in=   10,000 syms out=  3,552 k=35.52% te=     2.933338ms td=       109.45µs
    all_test.go:79: enwik5         : syms in=  100,000 syms out= 25,913 k=25.91% te=    22.427729ms td=     1.467629ms
    all_test.go:79: enwik6         : syms in=1,000,000 syms out=195,100 k=19.51% te=    246.72995ms td=    10.718449ms
    all_test.go:79: gettysburg     : syms in=    1,463 syms out=    737 k=50.38% te=       530.71µs td=        14.45µs
    all_test.go:79: gettysburgx10  : syms in=   14,630 syms out=  1,497 k=10.23% te=     1.734398ms td=       116.87µs
    all_test.go:79: gettysburgx100 : syms in=  146,300 syms out=  1,510 k= 1.03% te=     8.984032ms td=      921.319µs
    all_test.go:79: zero3          : syms in=    1,000 syms out=     22 k= 2.20% te=        51.54µs td=         6.49µs
    all_test.go:79: zero4          : syms in=   10,000 syms out=     29 k= 0.29% te=       486.01µs td=       55.889µs
    all_test.go:79: zero5          : syms in=  100,000 syms out=     36 k= 0.04% te=     4.719957ms td=       526.45µs
    all_test.go:79: zero6          : syms in=1,000,000 syms out=     43 k= 0.00% te=    47.401958ms td=     9.755601ms
--- PASS: TestMemory0 (0.36s)
PASS
ok      modernc.org/ace 0.364s
me@3900x:~/src/modernc.org/ace$

Performance

me@3900x:~/src/modernc.org/ace$ make benchmark
goos: linux
goarch: amd64
pkg: modernc.org/ace
cpu: AMD Ryzen 9 3900X 12-Core Processor
BenchmarkEncodeEnwik6-24               6     255495376 ns/op       3.91 MB/s    13102778 B/op       7572 allocs/op
BenchmarkDecodeEnwik6-24             100      19666232 ns/op      50.85 MB/s     9355974 B/op         64 allocs/op
PASS
ok      modernc.org/ace 4.264s
me@3900x:~/src/modernc.org/ace$

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Decoder added in v0.1.0

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder represents the decoding state.

func NewDecoder

func NewDecoder(firstMeta Symbol, write func(Symbol)) (*Decoder, error)

NewDecoder returns a newly created Decoder. The value of the 'firstMeta' argument must be the same one used to encode the sequence. See NewEncoder for more details.

The 'firstMeta' argument must be the same value used to encode the sequence.

The 'write' function is called to add symbols to the decoded sequence.

func (*Decoder) Decode added in v0.1.0

func (d *Decoder) Decode(s Symbol)

Decode adds the expansion of 's' to the decoded sequence.

type Encoder added in v0.1.0

type Encoder struct {
	// contains filtered or unexported fields
}

Encoder represents the encoding state.

func NewEncoder

func NewEncoder(window int, firstMeta Symbol, write func(Symbol)) (*Encoder, error)

NewEncoder returns a newly created Encoder.

The 'window' parameter sets the size of the compression window. The compression ratio may improve with larger values, but the improvements diminish quickly. A usable value for textual input sequences is ~32.

The 'write' function is called to add symbols to the encoded sequence.

The 'firstMeta' argument declares the limit of the values in the input sequence. If the input consists of, say ASCII-only characters, the value is 128 (or higher). For a byte sequence pass 256. If the symbols are Unicode runes then unicode.MaxRune+1 is a safe value of the 'firstMeta' argument. A possible way how to share this value without prior knowledge of the decoder could be, for example, to prepend the 'firstMeta' value to the output sequence, so the decoder can read it back.

func (*Encoder) Encode added in v0.1.0

func (e *Encoder) Encode(s Symbol)

Encode adds the encoding of 's' to the encoded sequence.

The 's' argument must be in [0, firstMeta).

func (*Encoder) Flush added in v0.1.0

func (e *Encoder) Flush()

Flush flushes the output sequence.

type Symbol

type Symbol uint32

Symbol represents a sequence item.

Directories

Path Synopsis
Command ace demonstrates a concrete implementation of the ACE algorithm for byte sequences.
Command ace demonstrates a concrete implementation of the ACE algorithm for byte sequences.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL