package serialization
v0.0.0-...-e8aa44e
Published: Jan 16, 2025 License: MIT Imports: 14 Imported by: 1

README

I/O Performance

The IO package contains the code to read and write the measurement results. To optimize tool performance we need to minimize CPU usage, since the CPU is one of the common bottlenecks (assuming a sufficiently provisioned network link, of course).

Unfortunately, there are many performance benchmarks and just as many different results.

Benchmark of many serialization libs

Sonic promises to be very fast, but only supports specific CPU architectures (amd64, with arm64 support added later).

Protobuf

How to use
  • Install the protobuf compiler (protoc) and the Go tooling (protoc-gen-go)
  • Update model.proto
  • Run protoc in folder dnsmonitor/collector/io to generate go code
    • cd resolve/serialization

    • protoc --go_out=. protobuf/protobuf_model.proto

Pros

  • Over 50% performance increase compared to the standard library (measured on 03.05.2023)
  • Smaller file sizes as well

Cons

  • Less convenient to use (need to maintain .proto file)
  • Not human-readable anymore
  • No built-in support for writing multiple messages into a file; we need to implement framing ourselves.

JSON

  • The default serializer is not very performant because it relies on reflection
  • easyjson or ffjson can generate marshal/unmarshal methods that perform better
  • Install and run
    • go get -u github.com/mailru/easyjson/...

    • easyjson -all .go

  • Around 30% performance increase compared to the standard library (measured on 03.05.2023)

Zipping

  • For zipping we use klauspost/compress as it provides many different implementations and seems to be well maintained.
  • There is a unit test TestProto_Benchmark_Zip which can be used to do some basic benchmarking
  • In general, ZSTD outperforms GZIP, which outperforms ZIP (Deflate); ZSTD is recommended, and GZIP or ZIP should only be used for compatibility reasons
  • A super rough (and non-representative) test on 2000 domains of actual output (100 domains per file):
Algo           Params        writeParallelism  outputSize  writeDuration
ZSTD           FastestSpeed  1                 527MB       9.5s
DEFLATE (zip)  -             1                 830MB       26.7s
DEFLATE (zip)  -             2                 830MB       18.3s
DEFLATE (zip)  -             5                 830MB       15.0s
GZIP           FastestSpeed  1                 906MB       14.9s
GZIP           FastestSpeed  2                 906MB       10.3s
GZIP           DefaultCompr  2                 830MB       16.4s

Future Ideas

  • Avro: small file sizes, but might be CPU-intensive according to the benchmark
  • SQL: could perform well and would be convenient for later evaluation

Documentation

Index

Constants

This section is empty.

Variables

var ErrPoolClosed = errors.New("pool closed")

var InnerWriterFactory = newBufferedFileWriter

InnerWriterFactory lets you override writer creation for testing purposes.

Functions

func InitZipWriter

func InitZipWriter(writer io.Writer, zipAlgorithm ZipAlgorithm, compression CompressionLevel) (io.Writer, func() error, error)

func OpenReader

func OpenReader(fileName string, zipAlgorithm ZipAlgorithm) (io.Reader, func() error, error)

func ParseZip

func ParseZip(zipSetting string) (ZipAlgorithm, CompressionLevel, error)

ParseZip parses a string with format 'algo' or 'algo:level', e.g. 'zstd' or 'zstd:fastest'.
Allowed algorithms: "" (none), "gzip", "default" (zstd), "zstd", "deflate".
Allowed compression levels: "default" (fastest), "fastest", "fast", "better", "best".
Note that for 'deflate', the compression level does not make any difference.

Types

type CompressionLevel

type CompressionLevel int

CompressionLevel controls the level of compression to use.

const (
	// CompressionFastest provides the fastest compression speed with the given ZipAlgorithm
	// When changing compression levels, always make sure that your machine can
	// write the data as fast as it is collected. Otherwise memory will overflow.
	CompressionFastest CompressionLevel = iota

	// CompressionFast provides a fast compression speed with the given ZipAlgorithm
	// But smaller file sizes than CompressionFastest.
	// When changing compression levels, always make sure that your machine can
	// write the data as fast as it is collected. Otherwise memory will overflow.
	CompressionFast

	// CompressionBetter provides a smaller file size than CompressionFast and CompressionFastest
	// using the given ZipAlgorithm. CompressionBetter can lead to memory overflow on many machines,
	// as data cannot be written as fast as it is collected. When changing compression levels,
	// always make sure that data can be written fast enough.
	CompressionBetter

	// CompressionBest provides the smallest file size using the given ZipAlgorithm.
	// The same memory-overflow caveat as for CompressionBetter applies.
	CompressionBest
)

type FileWriterBase

type FileWriterBase struct {
	OutDir           string
	FilePrefix       string
	FileExtension    string
	OutputFileSize   uint
	ZipAlgorithm     ZipAlgorithm
	CompressionLevel CompressionLevel

	// RandomFileSuffix avoids that subsequent runs in the same directory overwrite files
	// Just a small safeguard against data loss.
	RandomFileSuffix string
	// contains filtered or unexported fields
}

func NewFileWriterBase

func NewFileWriterBase(outDir string,
	filePrefix string,
	fileExtension string,
	outputFileSize uint,
	parallelFiles uint32,
	renameFiles bool,
	zipAlgo ZipAlgorithm,
	compression CompressionLevel) *FileWriterBase

func (*FileWriterBase) CloseAll

func (j *FileWriterBase) CloseAll() error

CloseAll flushes and closes all writers in the pool. Not safe to use concurrently with GetWriter() or writing.

func (*FileWriterBase) GetWriter

func (j *FileWriterBase) GetWriter() (io.WriteCloser, error)

type Read

type Read struct {
	Result resolver.Result
	Error  error
}

type ZipAlgorithm

type ZipAlgorithm int

ZipAlgorithm is the type of the zip algorithm to use.

const (
	// ZipNone will not compress the output
	ZipNone ZipAlgorithm = iota

	// ZipDefault will choose the default compression algorithm,
	// which is currently ZipZSTD.
	ZipDefault

	// ZipDeflate will use the Deflate compression algorithm.
	// It will produce a .zip archive.
	// When using Deflate, consider adding some writeParallelism,
	// as it does not parallelize inherently. However, too much
	// writeParallelism comes with other downsides.
	ZipDeflate

	// ZipZSTD will use the ZSTD compression algorithm.
	// It will produce a .zst file
	// It provides the fastest and best compression.
	ZipZSTD

	// ZipGZIP will use the GZIP compression algorithm.
	// It will produce a .gz file
	// When using GZIP, consider adding some writeParallelism,
	// as it does not parallelize inherently. However, too much
	// writeParallelism comes with other downsides.
	ZipGZIP
)

func GetZipAlgoFromExtensions

func GetZipAlgoFromExtensions(fileName string) ZipAlgorithm
