sorter

package
v0.0.0-...-9ba24aa
Warning

This package is not in the latest version of its module.

Published: Mar 1, 2024 License: Apache-2.0 Imports: 26 Imported by: 0

Documentation

Constants

const DefaultParallelism = 2

DefaultParallelism is the default value for SortOptions.Parallelism.

const DefaultSortBatchSize = 1 << 20

DefaultSortBatchSize is the default number of records to keep in memory before resorting to external sorting.
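The batching idea behind this constant can be sketched as follows. This is only an illustration, not the package's implementation: `batchSorter` and `sortedRuns` are hypothetical names, plain ints stand in for sam.Records, and sorted runs are kept in memory where a real external sorter would spill each one to a temporary file.

```go
package main

import "sort"

// batchSorter is a hypothetical stand-in for an external sorter. It buffers
// up to batchSize values, then sorts the buffer and emits it as one run.
type batchSorter struct {
	batchSize int
	buf       []int
	runs      [][]int // assumption: a real sorter would write each run to disk
}

// add buffers one value and flushes a sorted run once the batch is full.
func (b *batchSorter) add(v int) {
	b.buf = append(b.buf, v)
	if len(b.buf) >= b.batchSize {
		b.flush()
	}
}

// flush sorts the buffered values and stores them as a completed run.
func (b *batchSorter) flush() {
	if len(b.buf) == 0 {
		return
	}
	sort.Ints(b.buf)
	b.runs = append(b.runs, b.buf)
	b.buf = nil
}

// sortedRuns feeds all values through a batchSorter and returns the runs,
// each of which is sorted and holds at most batchSize values.
func sortedRuns(values []int, batchSize int) [][]int {
	b := &batchSorter{batchSize: batchSize}
	for _, v := range values {
		b.add(v)
	}
	b.flush()
	return b.runs
}
```

With a batch size of 2, five input values yield three runs, the last one holding a single value. The runs would then be merged in a separate pass.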

Functions

func BAMFromSortShards

func BAMFromSortShards(paths []string, bamPath string) error

BAMFromSortShards merges a set of sortshard files into a single BAM file.
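Merging several already-sorted shards is commonly done with a k-way merge over a min-heap. The sketch below shows that general technique on in-memory int slices. It is an assumption about the approach, not the package's actual code, and `cursor`, `cursorHeap`, and `mergeShards` are hypothetical names.

```go
package main

import "container/heap"

// cursor points at the next unread element of one sorted shard.
type cursor struct {
	shard int // shard index; also breaks ties so the merge is deterministic
	pos   int // position of val within the shard
	val   int
}

// cursorHeap is a min-heap of cursors ordered by (val, shard).
type cursorHeap []cursor

func (h cursorHeap) Len() int { return len(h) }
func (h cursorHeap) Less(i, j int) bool {
	if h[i].val != h[j].val {
		return h[i].val < h[j].val
	}
	return h[i].shard < h[j].shard
}
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(cursor)) }
func (h *cursorHeap) Pop() interface{} {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeShards merges already-sorted shards into one sorted slice.
func mergeShards(shards [][]int) []int {
	h := &cursorHeap{}
	total := 0
	for i, s := range shards {
		total += len(s)
		if len(s) > 0 {
			heap.Push(h, cursor{shard: i, pos: 0, val: s[0]})
		}
	}
	out := make([]int, 0, total)
	for h.Len() > 0 {
		// Take the smallest head element, then advance that shard's cursor.
		c := heap.Pop(h).(cursor)
		out = append(out, c.val)
		if next := c.pos + 1; next < len(shards[c.shard]) {
			heap.Push(h, cursor{shard: c.shard, pos: next, val: shards[c.shard][next]})
		}
	}
	return out
}
```

Breaking ties on the shard index mirrors the role the documentation gives to ShardIndex: it fixes the relative order of equal keys that come from different shards.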

func PAMFromSortShards

func PAMFromSortShards(paths []string, pamPath string, recordsPerShard int64, parallelism int) error

PAMFromSortShards merges a set of sortshard files into a single PAM file. recordsPerShard is the target number of reads to store in each rowshard.

Types

type SortOptions

type SortOptions struct {
	// ShardIndex must be a number unique to this sorter, across all sorters for
	// shards that are eventually merged into one BAM or PAM file.
	//
	// ShardIndex defines the sort order of reads at the same (ref,pos), but on
	// different Sorters. If ShardIndex==0, it is set to sha(sortshardpath).
	ShardIndex uint32

	// SortBatchSize is the number of sam.Records to keep in memory before
	// resorting to external sorting.  Not for general use; the default value
	// should suffice for most applications.
	SortBatchSize int

	// Parallelism limits the number of background sorts. Max memory
	// consumption of the sorter grows linearly with this value. If <= 0,
	// DefaultParallelism is used.
	Parallelism int

	// If NoCompressTmpFiles is false (the default), sortshards are compressed
	// using snappy. Compression is a big win on EC2 EBS volumes, but it slows
	// the sort slightly on fast NVMe disks.
	NoCompressTmpFiles bool

	// TmpDir defines the directory to store temp files created during merge.  ""
	// means the system default, usually /tmp.
	TmpDir string
}

SortOptions controls options passed to NewSorter.
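A minimal sketch of how an optional trailing options argument with zero-value defaults might be resolved. `sortOptions` and `resolveOptions` are hypothetical local names. Using only the first element of the variadic list and applying the `<= 0` rule to SortBatchSize are assumptions: the field comments state that rule only for Parallelism.

```go
package main

// Local mirrors of the documented defaults. The real package exports them as
// DefaultParallelism and DefaultSortBatchSize.
const (
	defaultParallelism   = 2
	defaultSortBatchSize = 1 << 20
)

// sortOptions is a hypothetical, trimmed-down copy of SortOptions.
type sortOptions struct {
	SortBatchSize int
	Parallelism   int
	TmpDir        string
}

// resolveOptions accepts an optional trailing argument, in the style of
// NewSorter's optList parameter, and fills in defaults for unset fields.
func resolveOptions(optList ...sortOptions) sortOptions {
	var opts sortOptions
	if len(optList) > 0 {
		opts = optList[0] // assumption: only the first element is consulted
	}
	if opts.Parallelism <= 0 {
		opts.Parallelism = defaultParallelism
	}
	if opts.SortBatchSize <= 0 { // assumption: same rule as Parallelism
		opts.SortBatchSize = defaultSortBatchSize
	}
	return opts
}
```

Calling `resolveOptions()` with no argument yields the defaults, while `resolveOptions(sortOptions{Parallelism: 8})` overrides only the parallelism and leaves the batch size at its default.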

type Sorter

type Sorter struct {
	// contains filtered or unexported fields
}

Sorter sorts a list of sam.Records and produces a sortshard file at "outPath". BAMFromSortShards can later be used to merge multiple sortshard files into a BAM file. "header" must contain all the references used by the records added later.

Sorter orders records in the following way:

- Increasing reference sequence IDs, then
- increasing alignment positions, then
- a forward read before a reverse read.
- All else equal, records are sorted in their order of appearance in the input (i.e., the sort is stable).

These criteria are the same as "samtools sort" and "sambamba sort".
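The ordering described above can be sketched as a comparator applied with a stable sort. The `read` struct is a hypothetical, minimal stand-in for sam.Record, and `less` and `sortReads` are illustrative names. The sketch does not treat unmapped reads specially.

```go
package main

import "sort"

// read is a hypothetical, minimal stand-in for sam.Record.
type read struct {
	Name    string
	RefID   int  // reference sequence ID
	Pos     int  // alignment position
	Reverse bool // true for a reverse-strand read
}

// less reports whether a sorts strictly before b: by reference ID, then by
// position, then forward before reverse.
func less(a, b read) bool {
	if a.RefID != b.RefID {
		return a.RefID < b.RefID
	}
	if a.Pos != b.Pos {
		return a.Pos < b.Pos
	}
	return !a.Reverse && b.Reverse
}

// sortReads sorts in place. SliceStable keeps records that compare equal in
// their input order, which gives the "order of appearance" tie-break.
func sortReads(reads []read) {
	sort.SliceStable(reads, func(i, j int) bool {
		return less(reads[i], reads[j])
	})
}
```

Two reverse reads at the same reference and position compare as equal under `less`, so the stable sort leaves them in the order they were added.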

Example:

sorter := NewSorter("tmp0.sort", header)
for _, rec := range recordlist {
  sorter.AddRecord(rec)
}
err := sorter.Close()

// Similarly, produce tmp1.sort, .., tmpN.sort, possibly on
// different processes or machines.

// Merge all the sorted shards into one BAM file.
err = BAMFromSortShards([]string{"tmp0.sort", ..., "tmpN.sort"}, "foo.bam")

func NewSorter

func NewSorter(outPath string, header *sam.Header, optList ...SortOptions) *Sorter

NewSorter creates a Sorter object.

func (*Sorter) AddRecord

func (s *Sorter) AddRecord(rec *sam.Record)

AddRecord adds a record to the sorter. The sorter takes ownership of "rec". The caller shall not read or write "rec" after the call.

func (*Sorter) Close

func (s *Sorter) Close() error

Close must be called after adding all the records. It blocks the caller until the shard file is generated. After Close, Sorter becomes invalid.
