minimus

v0.0.0-...-28f2576
This package is not in the latest version of its module.
Published: Oct 11, 2020 License: Apache-2.0 Imports: 8 Imported by: 0

README

minimus-encoder

Project description

Data stream encoder/decoder for compressing sequences of float/integer values into a compact bit stream.

It focuses on encoding vectors of elements, i.e. interleaved series.

Heavily inspired by Facebook's Gorilla TSDB paper.
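Gorilla compresses a stream of float64 values by XOR-ing each value with the previous one: identical values cost a single bit, and otherwise only the bits between the leading and trailing zeros of the XOR need to be stored. The sketch below measures that window in Go. It illustrates the technique described in the paper, not minimus's actual bit format, and `xorWindow` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// xorWindow returns the leading-zero count, trailing-zero count and the
// number of "meaningful" bits in the XOR of two consecutive float64 values,
// which is what a Gorilla-style encoder has to store.
// Identical values XOR to zero, which Gorilla encodes with a single bit.
func xorWindow(prev, cur float64) (leading, trailing, meaningful int) {
	x := math.Float64bits(prev) ^ math.Float64bits(cur)
	if x == 0 {
		return 64, 0, 0
	}
	leading = bits.LeadingZeros64(x)
	trailing = bits.TrailingZeros64(x)
	return leading, trailing, 64 - leading - trailing
}

func main() {
	series := []float64{12.0, 12.0, 24.0, 15.5}
	for i := 1; i < len(series); i++ {
		l, t, m := xorWindow(series[i-1], series[i])
		fmt.Printf("%v -> %v: leading=%d trailing=%d meaningful=%d\n",
			series[i-1], series[i], l, t, m)
	}
}
```

Going from 12 to 24 only flips the lowest exponent bit, so the window holds 1 meaningful bit; going from 24 to 15.5 it holds 5.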

Ships with a lossy float64 transform function (LossyFloat64) that trades precision for more compact storage. Without it, the encoding is lossless, so the module can also be used for lossless compression.

Thanks to icza for his bitio library, which this project uses heavily.

Current status

The project is quite young and could definitely use more testing and, eventually, some optimization. I am currently using it to store large amounts of fixed-interval time series in the cloud.

There is currently an issue with values that rapidly oscillate around zero (i.e. values that flip the float64 sign bit too often). If you want to use this project, it is best to add a bias to your inputs so that the sign bit does not flip too often.
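The following minimal Go sketch (not part of minimus) shows why sign flips are costly for XOR-based encoding, assuming minimus uses Gorilla-style XOR encoding as its reference to the paper suggests. It compares two small values of opposite sign with the same values shifted by a constant bias of 100, an arbitrary value chosen for this example. With opposite signs, the top bit of the XOR is always set, so there are no leading zeros to exploit; after the shift, both values share the same sign and binary exponent, so at least the 12 sign and exponent bits cancel. The decoding side has to subtract the same bias.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// bias is an illustrative offset, assumed large enough to keep every input
// positive; it is not a value defined by minimus.
const bias = 100.0

// leadingZerosOfXor counts the identical high-order bits of two float64
// values, i.e. the leading zeros of their XOR.
func leadingZerosOfXor(a, b float64) int {
	return bits.LeadingZeros64(math.Float64bits(a) ^ math.Float64bits(b))
}

func main() {
	a, b := 0.013, -0.021

	// Opposite signs: the sign bit differs, so the XOR has no leading zeros.
	fmt.Println("raw:     ", leadingZerosOfXor(a, b))

	// Both biased values lie in [64, 128): same sign and exponent,
	// so at least the top 12 bits of the XOR are zero.
	fmt.Println("biased:  ", leadingZerosOfXor(a+bias, b+bias))

	// Decoding side: subtract the bias again. Adding and removing the bias
	// can lose a few low-order bits, so this is only approximately a.
	fmt.Println("restored:", (a+bias)-bias)
}
```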

Pull requests and suggestions are welcome; feel free to open an issue.

Example

See the example directory. The program compresses and decompresses an arbitrary sequence of vectors at several loss levels (the maximum absolute error |e|), plus one lossless run. For each level, it prints the input and output vectors side by side, followed by the mean number of compressed bits per sample.

go run ./example

        0: 60.4700  0.0163  1.0000  1.0000  => 60.3750  0.0156  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0000  0.0078  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.3750  0.0117  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0000  0.0156 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.0625  2.1250  3.2500  4.3750
|e|=1e-01:  6.999 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0162  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0499  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0699  0.0216 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-04: 11.838 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-07: 16.819 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-10: 21.762 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-13: 26.543 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
|e|=1e-16: 29.764 b/sample

        0: 60.4700  0.0163  1.0000  1.0000  => 60.4700  0.0163  1.0000  1.0000
        1: 94.0500  0.0105  1.0000  1.0000  => 94.0500  0.0105  1.0000  1.0000
        2: 66.4600  0.0148  1.0000  1.0000  => 66.4600  0.0148  1.0000  1.0000
      999: 45.0700  0.0217 32.0000 77.0000  => 45.0700  0.0217 32.0000 77.0000
     1000:  1.1111  2.2222  3.3333  4.4444  =>  1.1111  2.2222  3.3333  4.4444
(lossless)    30.078 b/sample

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BitHint

type BitHint uint8

func LossyFloat64

func LossyFloat64(n float64, maxAbsError float64, hint BitHint) (float64, BitHint)

LossyFloat64 transforms a float64 into a compress-friendly approximation.

The function guarantees that abs(result - n) < maxAbsError.

Under the hood, the function zeroes out as many least significant bits as possible while still satisfying that bound.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder reads a compressed data stream and lets callers iterate over the resulting sequence of decoded Vec64 values.

func NewDecoder

func NewDecoder(src io.Reader, span int) *Decoder

NewDecoder allocates and initializes a new Decoder.

func (*Decoder) Current

func (dec *Decoder) Current() Vec64

Current returns the Vec64 decoded by the last call to Next. It is only valid if that call returned true.

func (*Decoder) EnumBorrow

func (dec *Decoder) EnumBorrow(ctx context.Context, out chan Vec64, pool *VecPool) error

EnumBorrow is a helper method for enumerating decoded Vec64 values into a Go channel, using vectors borrowed from the given pool.

Channel elements should be returned to the pool when no longer needed.

func (*Decoder) Err

func (dec *Decoder) Err() error

Err returns the error, if any, that made the last call to Next return false. At the end of the stream, Err returns nil rather than io.EOF.

func (*Decoder) Next

func (dec *Decoder) Next() bool

Next decodes the next Vec64 element from the stream. It returns false if an error occurs or the end of the stream is reached.
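Together, Next, Current and Err follow the same iteration contract as bufio.Scanner. The sketch below shows the intended consumption loop. Because it has to run without the package, it uses `sliceDecoder`, a hypothetical in-memory stand-in that mimics the documented contract (including an optional simulated failure); with a real Decoder the loop body would be the same.

```go
package main

import (
	"errors"
	"fmt"
)

// Vec64 mirrors the documented type: a vector of 64-bit values.
type Vec64 []uint64

// sliceDecoder is a hypothetical stand-in that follows the documented
// Decoder contract: Next advances, Current is valid only after Next
// returned true, and Err is nil at a clean end of stream.
type sliceDecoder struct {
	items []Vec64
	fail  int // index at which to simulate a decoding error (-1: never)
	pos   int
	cur   Vec64
	err   error
}

func (d *sliceDecoder) Next() bool {
	if d.err != nil || d.pos >= len(d.items) {
		return false
	}
	if d.pos == d.fail {
		d.err = errors.New("corrupt stream")
		return false
	}
	d.cur = d.items[d.pos]
	d.pos++
	return true
}

func (d *sliceDecoder) Current() Vec64 { return d.cur }
func (d *sliceDecoder) Err() error     { return d.err }

// sumRows shows the intended loop: consume vectors while Next returns true,
// then check Err once to tell a clean end of stream from a failure.
func sumRows(d *sliceDecoder) (uint64, error) {
	var total uint64
	for d.Next() {
		for _, x := range d.Current() {
			total += x
		}
	}
	return total, d.Err()
}

func main() {
	ok := &sliceDecoder{items: []Vec64{{1, 2}, {3, 4}}, fail: -1}
	total, err := sumRows(ok)
	fmt.Println(total, err)

	bad := &sliceDecoder{items: []Vec64{{1, 2}, {3, 4}}, fail: 1}
	total, err = sumRows(bad)
	fmt.Println(total, err)
}
```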

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

Encoder holds the encoding context for compressing a sequence of Vec64 values into an io.Writer.

func NewEncoder

func NewEncoder(dst io.Writer, span int) *Encoder

NewEncoder allocates and initializes a new Encoder to compress sequences of Vec64 values.

func (*Encoder) Close

func (enc *Encoder) Close() error

Close appends an EOF (end-of-stream) bit sequence and flushes any remaining buffered data to the underlying stream.

func (*Encoder) Put

func (enc *Encoder) Put(vec Vec64) error

Put encodes a Vec64 into a compressed, variable-length bit sequence and writes the result to the underlying stream.

Put will panic if the given Vec64 has a different span than the encoder.

func (*Encoder) PutFloat64

func (enc *Encoder) PutFloat64(vec []float64) error

PutFloat64 is a shorthand for appending a vector of float64 to the encoder.

func (*Encoder) PutUint64

func (enc *Encoder) PutUint64(vec []uint64) error

PutUint64 is a shorthand for appending a vector of uint64 to the encoder.

func (*Encoder) Reset

func (enc *Encoder) Reset(dst io.Writer)

Reset resets the internal state of the encoder so that it writes to the given new stream.

type Vec64

type Vec64 []uint64

Vec64 represents a vector of N 64-bit primitive values.

func (Vec64) Float64

func (v Vec64) Float64() []float64

Float64 reinterprets a Vec64 as a float64 slice without copying it.

func (Vec64) Uint64

func (v Vec64) Uint64() []uint64

Uint64 reinterprets a Vec64 as a uint64 slice without copying it.

type VecPool

type VecPool struct {
	// contains filtered or unexported fields
}

VecPool is a pool of Vec64 that is safe for concurrent use.

func NewVecPool

func NewVecPool(span int) *VecPool

NewVecPool creates a new, empty VecPool.

func (*VecPool) Get

func (p *VecPool) Get() Vec64

Get takes a Vec64 from the pool, or allocates a new one if none is available.

func (*VecPool) Put

func (p *VecPool) Put(v Vec64)

Put returns an existing Vec64 to the pool.
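One common way to build a concurrency-safe pool of fixed-span vectors in Go is sync.Pool. The sketch below is an assumption about the general pattern, not VecPool's actual implementation; `vecPool` and `newVecPool` are hypothetical names, and this version silently drops vectors whose length does not match the pool's span.

```go
package main

import (
	"fmt"
	"sync"
)

// Vec64 mirrors the documented type.
type Vec64 []uint64

// vecPool is a hypothetical fixed-span pool built on sync.Pool, which is
// safe for concurrent use. It illustrates the Get/Put contract only.
type vecPool struct {
	span int
	pool sync.Pool
}

func newVecPool(span int) *vecPool {
	p := &vecPool{span: span}
	p.pool.New = func() any { return make(Vec64, span) }
	return p
}

// Get returns a recycled vector if one is available, else a fresh one.
// Recycled vectors keep their old contents, so callers must overwrite them.
func (p *vecPool) Get() Vec64 { return p.pool.Get().(Vec64) }

// Put hands a vector back; vectors of the wrong span are dropped.
// Storing a slice header in sync.Pool costs a small allocation per Put;
// code that needs to avoid it often pools pointers instead.
func (p *vecPool) Put(v Vec64) {
	if len(v) != p.span {
		return
	}
	p.pool.Put(v)
}

func main() {
	p := newVecPool(4)
	v := p.Get()
	v[0] = 42
	p.Put(v)
	w := p.Get() // may or may not be the same vector as v
	fmt.Println(len(w))
}
```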
