package serialization
v0.0.0-...-e8aa44e
Published: Jan 16, 2025 License: MIT Imports: 14 Imported by: 1

README

I/O Performance

The IO package contains the code to read and write the measurement results. To optimize tool performance we need to minimize CPU usage, since the CPU is one of the common bottlenecks (assuming a sufficiently provisioned network link, of course).

Unfortunately, there are many performance benchmarks and just as many different results.

Benchmark of many serialization libs

Sonic promises to be very fast, but only supports specific CPU architectures (amd64, with arm64 support added later).

Protobuf

How to use
  • Install the protobuf compiler (protoc) and the Go tooling (protoc-gen-go)
  • Update model.proto
  • Run protoc in folder dnsmonitor/collector/io to generate go code
    • cd resolve/serialization

    • protoc --go_out=. protobuf/protobuf_model.proto

Pros

  • Over 50% performance increase compared to the standard library (measured on 03.05.2023)
  • Smaller file sizes as well

Cons

  • Less convenient to use (need to maintain .proto file)
  • Not human-readable anymore
  • No built-in support for writing multiple messages into a file; we need to implement framing ourselves.

JSON

  • The default serializer is not very performant because it relies on reflection
  • easyjson or ffjson can generate marshal/unmarshal methods that perform better
  • Install and run
    • go get -u github.com/mailru/easyjson/...

    • easyjson -all .go

  • Around 30% performance increase compared to the standard library (measured on 03.05.2023)

Zipping

  • For zipping we use klauspost/compress as it provides many different implementations and seems to be well maintained.
  • There is a unit test TestProto_Benchmark_Zip which can be used to do some basic benchmarking
  • In general, ZSTD outperforms GZIP, which outperforms ZIP (Deflate); ZSTD is recommended, and GZIP or ZIP should only be used for compatibility reasons
  • A super rough (and non-representative) test on 2000 domains of actual output (100 domains per file):
Algo           Params        writeParallelism  outputSize  writeDuration
ZSTD           FastestSpeed  1                 527MB       9.5s
DEFLATE (zip)  -             1                 830MB       26.7s
DEFLATE (zip)  -             2                 830MB       18.3s
DEFLATE (zip)  -             5                 830MB       15.0s
GZIP           FastestSpeed  1                 906MB       14.9s
GZIP           FastestSpeed  2                 906MB       10.3s
GZIP           DefaultCompr  2                 830MB       16.4s

Future Ideas

  • Avro: small file sizes, but might be CPU-intensive according to the benchmark
  • SQL: could perform well and would be convenient for later evaluation

Documentation

Index

Constants

This section is empty.

Variables

var ErrPoolClosed = errors.New("pool closed")

var InnerWriterFactory = newBufferedFileWriter

InnerWriterFactory lets you override writer creation for testing purposes.

Functions

func InitZipWriter

func InitZipWriter(writer io.Writer, zipAlgorithm ZipAlgorithm, compression CompressionLevel) (io.Writer, func() error, error)

func OpenReader

func OpenReader(fileName string, zipAlgorithm ZipAlgorithm) (io.Reader, func() error, error)

func ParseZip

func ParseZip(zipSetting string) (ZipAlgorithm, CompressionLevel, error)

ParseZip parses a string with format 'algo' or 'algo:level', e.g. 'zstd' or 'zstd:fastest'.
Allowed algorithms: "" (none), "gzip", "default" (zstd), "zstd", "deflate".
Allowed compression levels: "default" (fastest), "fastest", "fast", "better", "best".
Note that for 'deflate', the compression level does not make any difference.

Types

type CompressionLevel

type CompressionLevel int

CompressionLevel controls the level of compression to use.

const (
	// CompressionFastest provides the fastest compression speed with the given ZipAlgorithm
	// When changing compression levels, always make sure that your machine can
	// write the data as fast as it is collected. Otherwise memory will overflow.
	CompressionFastest CompressionLevel = iota

	// CompressionFast provides a fast compression speed with the given ZipAlgorithm
	// But smaller file sizes than CompressionFastest.
	// When changing compression levels, always make sure that your machine can
	// write the data as fast as it is collected. Otherwise memory will overflow.
	CompressionFast

	// CompressionBetter provides a smaller file size than CompressionFast and CompressionFastest
	// using the given ZipAlgorithm. CompressionBetter can lead to memory overflow on many machines,
	// as data cannot be written as fast as it is collected. When changing compression levels,
	// always make sure that data can be written fast enough.
	CompressionBetter

	// CompressionBest provides the smallest file size using the given ZipAlgorithm.
	// The same memory-overflow caveat as for CompressionBetter applies.
	CompressionBest
)

type FileWriterBase

type FileWriterBase struct {
	OutDir           string
	FilePrefix       string
	FileExtension    string
	OutputFileSize   uint
	ZipAlgorithm     ZipAlgorithm
	CompressionLevel CompressionLevel

	// RandomFileSuffix avoids that subsequent runs in the same directory overwrite files
	// Just a small safeguard against data loss.
	RandomFileSuffix string
	// contains filtered or unexported fields
}

func NewFileWriterBase

func NewFileWriterBase(outDir string,
	filePrefix string,
	fileExtension string,
	outputFileSize uint,
	parallelFiles uint32,
	renameFiles bool,
	zipAlgo ZipAlgorithm,
	compression CompressionLevel) *FileWriterBase

func (*FileWriterBase) CloseAll

func (j *FileWriterBase) CloseAll() error

CloseAll flushes and closes all writers in the pool. Not safe to use concurrently with GetWriter() or writing.

func (*FileWriterBase) GetWriter

func (j *FileWriterBase) GetWriter() (io.WriteCloser, error)

type Read

type Read struct {
	Result resolver.Result
	Error  error
}

type ZipAlgorithm

type ZipAlgorithm int

ZipAlgorithm is the type of the zip algorithm to use.

const (
	// ZipNone will not compress the output
	ZipNone ZipAlgorithm = iota

	// ZipDefault will choose the default compression algorithm,
	// which is currently ZipZSTD.
	ZipDefault

	// ZipDeflate will use the Deflate compression algorithm.
	// It will produce a .zip archive.
	// When using Deflate, consider adding some writeParallelism,
	// as it does not parallelize inherently. However, too much
	// writeParallelism comes with other downsides.
	ZipDeflate

	// ZipZSTD will use the ZSTD compression algorithm.
	// It will produce a .zst file
	// It provides the fastest and best compression.
	ZipZSTD

	// ZipGZIP will use the GZIP compression algorithm.
	// It will produce a .gz file
	// When using GZIP, consider adding some writeParallelism,
	// as it does not parallelize inherently. However, too much
	// writeParallelism comes with other downsides.
	ZipGZIP
)

func GetZipAlgoFromExtensions

func GetZipAlgoFromExtensions(fileName string) ZipAlgorithm
