README
I/O Performance
The IO package contains the code to read and write the measurement results. To optimize tool performance we need to optimize CPU usage, since it is one of the common bottlenecks (assuming a sufficiently provisioned network link, of course).
Unfortunately, there are many performance benchmarks and just as many different results.
Benchmark of many serialization libs
Sonic promises to be very fast, but is limited to the AMD64 (x86-64) architecture.
Protobuf
How to use
- Install the protobuf compiler (protoc) and the Go tooling (protoc-gen-go)
- Update model.proto
- Run protoc in the folder dnsmonitor/collector/io to generate the Go code:

```sh
cd resolve/serialization
protoc --go_out=. protobuf/protobuf_model.proto
```
Pros
- Over 50% performance increase compared to standard lib (Date: 03.05.2023)
- Also smaller file sizes
Cons
- Less convenient to use (need to maintain .proto file)
- Not human-readable anymore
- No built-in support for writing multiple messages into one file; we need to implement the framing ourselves (see the sketch below)
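A minimal sketch of how such framing could look, using the google.golang.org/protobuf module; the helper names writeDelimited/readDelimited and the uvarint length prefix are our own convention, not something the library provides:

```go
package serialization

import (
	"bufio"
	"encoding/binary"
	"io"

	"google.golang.org/protobuf/proto"
)

// writeDelimited writes one message, prefixed with its size as a uvarint,
// so that multiple messages can be appended to the same file.
func writeDelimited(w io.Writer, msg proto.Message) error {
	data, err := proto.Marshal(msg)
	if err != nil {
		return err
	}
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(data)))
	if _, err := w.Write(lenBuf[:n]); err != nil {
		return err
	}
	_, err = w.Write(data)
	return err
}

// readDelimited reads the next length-prefixed message from r into msg.
func readDelimited(r *bufio.Reader, msg proto.Message) error {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return err
	}
	data := make([]byte, size)
	if _, err := io.ReadFull(r, data); err != nil {
		return err
	}
	return proto.Unmarshal(data, msg)
}
```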
JSON
- The default serializer is not very performant, because it uses reflection
- easyjson or ffjson can generate (un)marshalling methods that perform better (see the sketch after this list)
- Install and run:

```sh
go get -u github.com/mailru/easyjson/...
easyjson -all <file>.go
```

- Around 30% performance increase compared to standard lib (Date: 03.05.2023)
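As an illustration of the easyjson workflow, a hedged sketch; the Result struct and its fields are hypothetical, and the MarshalJSON method only exists after the generator has been run on this file:

```go
package serialization

//easyjson:json
type Result struct {
	Domain string `json:"domain"`
	RTTMs  int64  `json:"rtt_ms"`
}

// encodeResult uses the MarshalJSON method that `easyjson -all` generates
// for this file; it avoids encoding/json's reflection entirely.
func encodeResult(r *Result) ([]byte, error) {
	return r.MarshalJSON()
}
```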
Zipping
- For zipping we use klauspost/compress, as it provides many different implementations and seems to be well maintained (see the sketch after the table below)
- There is a unit test TestProto_Benchmark_Zip which can be used to do some basic benchmarking
- In general ZSTD > GZIP > ZIP (Deflate), so it is recommended to use GZIP or ZIP only for compatibility reasons
- A super rough (and non-representative) test on 2000 domains of actual output (100 domains per file):
| Algo | Params | writeParallelism | outputSize | writeDuration |
|---|---|---|---|---|
| ZSTD | FastestSpeed | 1 | 527MB | 9.5s |
| DEFLATE (zip) | - | 1 | 830MB | 26.7s |
| DEFLATE (zip) | - | 2 | 830MB | 18.3s |
| DEFLATE (zip) | - | 5 | 830MB | 15.0s |
| GZIP | FastestSpeed | 1 | 906MB | 14.9s |
| GZIP | FastestSpeed | 2 | 906MB | 10.3s |
| GZIP | DefaultCompr | 2 | 830MB | 16.4s |
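For reference, a minimal sketch of the ZSTD/FastestSpeed configuration from the first table row, assuming the klauspost/compress/zstd package (file name and payload are illustrative):

```go
package main

import (
	"os"

	"github.com/klauspost/compress/zstd"
)

func main() {
	f, err := os.Create("results.pb.zst")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Wrap the file in a zstd writer at the fastest encoder level.
	zw, err := zstd.NewWriter(f, zstd.WithEncoderLevel(zstd.SpeedFastest))
	if err != nil {
		panic(err)
	}
	defer zw.Close() // Close flushes the remaining compressed data.

	if _, err := zw.Write([]byte("measurement results...")); err != nil {
		panic(err)
	}
}
```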
Future Ideas
- Avro: small file sizes, but according to the benchmark it might be CPU-intensive
- SQL: could perform well; would be nice for a later evaluation
Documentation
Index
- Variables
- func InitZipWriter(writer io.Writer, zipAlgorithm ZipAlgorithm, compression CompressionLevel) (io.Writer, func() error, error)
- func OpenReader(fileName string, zipAlgorithm ZipAlgorithm) (io.Reader, func() error, error)
- func ParseZip(zipSetting string) (ZipAlgorithm, CompressionLevel, error)
- type CompressionLevel
- type FileWriterBase
- type Read
- type ZipAlgorithm
Constants
This section is empty.
Variables
var ErrPoolClosed = errors.New("pool closed")
var InnerWriterFactory = newBufferedFileWriter
InnerWriterFactory lets you override writer creation for testing purposes.
Functions
func InitZipWriter
func InitZipWriter(writer io.Writer, zipAlgorithm ZipAlgorithm, compression CompressionLevel) (io.Writer, func() error, error)
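A hedged usage sketch based on this signature; the file name and the assumption that the returned close function flushes the compression stream are ours:

```go
func writeCompressed(data []byte) error {
	f, err := os.Create("results.pb.zst") // illustrative file name
	if err != nil {
		return err
	}
	defer f.Close()

	w, closeZip, err := InitZipWriter(f, ZipZSTD, CompressionFastest)
	if err != nil {
		return err
	}
	if _, err := w.Write(data); err != nil {
		return err
	}
	// Call the returned function before closing the underlying file,
	// presumably to flush any buffered compressed data.
	return closeZip()
}
```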
func OpenReader

func OpenReader(fileName string, zipAlgorithm ZipAlgorithm) (io.Reader, func() error, error)
func ParseZip
func ParseZip(zipSetting string) (ZipAlgorithm, CompressionLevel, error)
ParseZip parses a string with format 'algo' or 'algo:level', e.g. 'zstd' or 'zstd:fastest'. Allowed algorithms: "" (none), "gzip", "default" (zstd), "zstd", "deflate". Allowed compression levels: "default" (fastest), "fastest", "fast", "better", "best". Note that for 'deflate', the compression level does not make any difference.
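For example (that 'zstd:fastest' maps to ZipZSTD and CompressionFastest is our reading of the doc comment above):

```go
algo, level, err := ParseZip("zstd:fastest")
if err != nil {
	return err
}
// Presumably algo == ZipZSTD and level == CompressionFastest.
w, closeZip, err := InitZipWriter(out, algo, level) // out is any io.Writer
```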
Types
type CompressionLevel
type CompressionLevel int
CompressionLevel controls the level of compression to use.
```go
const (
	// CompressionFastest provides the fastest compression speed with the given ZipAlgorithm.
	// When changing compression levels, always make sure that your machine can
	// write the data as fast as it is collected. Otherwise memory will overflow.
	CompressionFastest CompressionLevel = iota
	// CompressionFast provides a fast compression speed with the given ZipAlgorithm,
	// but smaller file sizes than CompressionFastest.
	// When changing compression levels, always make sure that your machine can
	// write the data as fast as it is collected. Otherwise memory will overflow.
	CompressionFast
	// CompressionBetter provides a smaller file size than CompressionFast and CompressionFastest
	// using the given ZipAlgorithm. CompressionBetter can lead to memory overflow on many machines,
	// as data cannot be written as fast as it is collected. When changing compression levels,
	// always make sure that data can be written fast enough.
	CompressionBetter
	// CompressionBest provides a smaller file size than CompressionFast and CompressionFastest
	// using the given ZipAlgorithm.
	CompressionBest
)
```
type FileWriterBase
```go
type FileWriterBase struct {
	OutDir           string
	FilePrefix       string
	FileExtension    string
	OutputFileSize   uint
	ZipAlgorithm     ZipAlgorithm
	CompressionLevel CompressionLevel
	// RandomFileSuffix avoids that subsequent runs in the same directory overwrite files.
	// Just a small safeguard against data loss.
	RandomFileSuffix string
	// contains filtered or unexported fields
}
```
func NewFileWriterBase
func NewFileWriterBase(outDir string, filePrefix string, fileExtension string, outputFileSize uint, parallelFiles uint32, renameFiles bool, zipAlgo ZipAlgorithm, compression CompressionLevel) *FileWriterBase
func (*FileWriterBase) CloseAll
func (j *FileWriterBase) CloseAll() error
CloseAll flushes and closes all writers in the pool. Not safe to use concurrently with GetWriter() or writing.
func (*FileWriterBase) GetWriter
func (j *FileWriterBase) GetWriter() (io.WriteCloser, error)
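A hedged sketch of the intended pool usage, combining the constructor and methods above; all parameter values are illustrative assumptions:

```go
// outDir, filePrefix, fileExtension, outputFileSize, parallelFiles,
// renameFiles, zipAlgo, compression -- values chosen for illustration.
fw := NewFileWriterBase("out", "results", "pb", 100, 2, false, ZipZSTD, CompressionFastest)

w, err := fw.GetWriter()
if err != nil {
	return err
}
if _, err := w.Write(payload); err != nil { // payload: one batch of results
	return err
}
if err := w.Close(); err != nil {
	return err
}

// At shutdown, flush and close everything remaining in the pool.
// Not safe to call concurrently with GetWriter() or writing.
if err := fw.CloseAll(); err != nil {
	return err
}
```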
type ZipAlgorithm
type ZipAlgorithm int
ZipAlgorithm is the type of the zip algorithm to use.
```go
const (
	// ZipNone will not compress the output.
	ZipNone ZipAlgorithm = iota
	// ZipDefault will choose the default compression algorithm,
	// which is currently ZipZSTD.
	ZipDefault
	// ZipDeflate will use the Deflate compression algorithm.
	// It will produce a .zip archive.
	// When using Deflate, consider adding some writeParallelism,
	// as it does not parallelize inherently. However, too much
	// writeParallelism comes with other downsides.
	ZipDeflate
	// ZipZSTD will use the ZSTD compression algorithm.
	// It will produce a .zst file.
	// It provides the fastest and best compression.
	ZipZSTD
	// ZipGZIP will use the GZIP compression algorithm.
	// It will produce a .gz file.
	// When using GZIP, consider adding some writeParallelism,
	// as it does not parallelize inherently. However, too much
	// writeParallelism comes with other downsides.
	ZipGZIP
)
```
func GetZipAlgoFromExtensions
func GetZipAlgoFromExtensions(fileName string) ZipAlgorithm
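E.g., picking the algorithm for an existing file from its extension (that .zst maps to ZipZSTD follows from the constant docs above):

```go
algo := GetZipAlgoFromExtensions("results.pb.zst") // presumably ZipZSTD
r, closeReader, err := OpenReader("results.pb.zst", algo)
```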