Benchmark
Datasets and results are described at http://shenwei356.github.io/seqkit/benchmark
The benchmark needs be performed in Linux-like operating systems.
Install softwares
Softwares
- seqkit. (Go).
Version v0.3.1.1.
- fasta_utilities. (Perl).
Version 3dcc0bc.
Lots of dependencies to install_.
- fastx_toolkit. (Perl).
Version 0.0.13.
Can't handle multi-line FASTA files_.
- seqmagick. (Python).
Version 0.6.1
- seqtk. (C).
Version 1.1-r92-dirty.
A Python script memusg was used
to computate running time and peak memory usage of a process.
Attention: the fasta_utilities
uses Perl module Term-ProgressBar
which makes it failed to run when using benchmark script run_benchmark_00_all.pl
.
Please change the source code of ProgressBar.pm (for me, the path is
/usr/share/perl5/vendor_perl/Term/ProgressBar.pm
). Add the code below after line 535
:
$config{bar_width} = 1 if $config{bar_width} < 1;
The edited code is
} else {
$config{bar_width} = $target;
$config{bar_width} = 1 if $config{bar_width} < 1; # new line
die "configured bar_width $config{bar_width} < 1"
if $config{bar_width} < 1;
}
Clone this repository
git clone https://github.com/shenwei356/seqkit
cd seqkit/benchmark
Data preparation
http://shenwei356.github.io/seqkit/benchmark/#datasets
Or download all test data seqkit-benchmark-data.tar.gz
(2.2G) and uncompress it, and then move them into directory seqkit/benchmark
wget http://app.shenwei.me/data/seqkit/seqkit-benchmark-data.tar.gz
tar -zxvf seqkit-benchmark-data.tar.gz
mv seqkit-benchmark-data/* seqkit/benchmark
Run tests
A Perl scripts
run.pl
is used to automatically running tests and generate data for plotting.
$ perl run.pl -h
Usage:
1. Run all tests:
perl run.pl run_benchmark*.sh --outfile benchmark.5test.tsv
2. Run one test:
perl run.pl run_benchmark_04_remove_duplicated_seqs_by_name.sh -o benchmark.rmdup.tsv
3. Custom repeate times:
perl run.pl -n 3 run_benchmark_04_remove_duplicated_seqs_by_name.sh -o benchmark.rmdup.tsv
To compare performance between different softwares, run:
./run.pl run_benchmark*.sh -n 3 -o benchmark.5tests.tsv
To test performance of other functions in seqkit, run:
./run.pl run_test*.sh -n 1 -o benchmark.seqkit.tsv
Plot result
R libraries dplyr
, ggplot2
, scales
, ggthemes
, ggrepel
are needed.
Plot for result of the five tests:
./plot.R -i benchmark.5tests.tsv
Plot for result of the tests of other functions in seqkit:
./plot.R -i benchmark.seqkit.tsv --width 5 --height 3
./plot.R -i benchmark.5tests.tsv --width 8 --height 3 --lx 0.75 --ly 0.3