doppelmark is a high-performance tool for marking PCR and optical
(pad-hopping) duplicate sequencing reads. It is
functionally equivalent to the Picard and sambamba duplicate-marking
tools, but runs much more efficiently and takes advantage of
multi-core hardware. For some workloads and hardware, doppelmark is
100x faster than Picard and 7x faster than sambamba.

doppelmark achieves its speedup by dividing the input into shards and
processing the shards in parallel. Processing a shard consists of
decompressing its input, marking duplicates, and compressing the
resulting output data. Duplicates are detected without sorting all
records. For a detailed description of the algorithm and design,
see doc.go.
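
The shard-and-parallelize idea above can be illustrated with a minimal
Go sketch. It is not doppelmark's implementation (see doc.go for that):
the Read type, the key fields, and the functions markShard and
markParallel are hypothetical names invented for this example, and the
sketch assumes that no duplicate set spans two shards. Each shard is
handled by its own goroutine, and duplicates are found with a hash map
keyed on alignment position, so no sort is required.

```go
package main

import (
	"fmt"
	"sync"
)

// Read is a hypothetical, simplified aligned read. Real tools operate
// on BAM records and use more fields (mate position, library, etc.).
type Read struct {
	Name    string
	Ref     int  // reference sequence index
	Pos     int  // 5' alignment position
	Reverse bool // true if aligned to the reverse strand
	Qual    int  // summed base quality, used to pick the primary read
	Dup     bool // set to true when the read is marked as a duplicate
}

// key groups reads that are candidates for being duplicates of each other.
type key struct {
	ref     int
	pos     int
	reverse bool
}

// markShard marks duplicates within a single shard. A map from key to
// the best read seen so far replaces a global sort of the records.
func markShard(reads []*Read) {
	best := make(map[key]*Read)
	for _, r := range reads {
		k := key{r.Ref, r.Pos, r.Reverse}
		cur, seen := best[k]
		switch {
		case !seen:
			best[k] = r
		case r.Qual > cur.Qual:
			cur.Dup = true // the previous best becomes a duplicate
			best[k] = r
		default:
			r.Dup = true
		}
	}
}

// markParallel runs markShard on every shard concurrently and waits
// for all of them to finish. Shards share no reads, so no locking is
// needed.
func markParallel(shards [][]*Read) {
	var wg sync.WaitGroup
	for _, shard := range shards {
		wg.Add(1)
		go func(s []*Read) {
			defer wg.Done()
			markShard(s)
		}(shard)
	}
	wg.Wait()
}

func main() {
	shards := [][]*Read{
		{
			{Name: "a", Ref: 0, Pos: 100, Qual: 30},
			{Name: "b", Ref: 0, Pos: 100, Qual: 40},
			{Name: "c", Ref: 0, Pos: 250, Qual: 35},
		},
		{
			{Name: "d", Ref: 1, Pos: 100, Qual: 20},
		},
	}
	markParallel(shards)
	for _, shard := range shards {
		for _, r := range shard {
			fmt.Printf("%s duplicate=%v\n", r.Name, r.Dup)
		}
	}
}
```

In this sketch the shards are independent, which is what makes the
parallel step safe without locks. A real tool also has to handle reads
whose duplicates or mates fall near a shard boundary, which the sketch
deliberately leaves out.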

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.