bampair

package
v0.0.0-...-ad47f17
Published: Oct 10, 2024 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

Package bampair provides a way to get the mate of each read when
reading a BAM/PAM file that is sorted by position.  Package bampair
assumes that the user is reading the BAM/PAM file in a sharded way,
and that while reading a shard, it is possible to store all the
shard's records in memory while processing the shard.  Package
bampair makes it possible for a user to process each record and its
mate in file order, making each shard's processing deterministic.

To use bampair, the user first calls GetDistantMates() with a
bamprovider and a list of shards, which returns a DistantMateTable.
The DistantMateTable contains the mate for each read whose mate is
*not* in the same shard.  For example, if R1 and R2 are mates, and
R1 is in shard2 and R2 is in shard4, then the DistantMateTable will
contain both R1 and R2.  On the other hand, if R1 and R2 are both in
shard3, then the DistantMateTable will contain neither R1 nor R2.
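
The membership rule above can be illustrated with a small, self-contained
sketch.  The read type and the distantReads function are hypothetical
stand-ins invented for this illustration and are not part of bampair; the
real package works on sam.Record and bam.Shard values.

```go
package main

import "fmt"

// read is a hypothetical stand-in for a record: the pair name, which mate
// it is, and the index of the shard that contains it.
type read struct {
	name     string // name shared by both mates of a pair
	mate     int    // 1 for R1, 2 for R2
	shardIdx int
}

// distantReads returns the reads whose mate lies in a different shard,
// i.e. the reads that would be held in a distant mate table.
func distantReads(reads []read) []read {
	// Record the shard of every read, keyed by pair name and mate number.
	shardOf := make(map[string]map[int]int)
	for _, r := range reads {
		if shardOf[r.name] == nil {
			shardOf[r.name] = make(map[int]int)
		}
		shardOf[r.name][r.mate] = r.shardIdx
	}
	var distant []read
	for _, r := range reads {
		other := 3 - r.mate // maps 1 to 2 and 2 to 1
		mateShard, ok := shardOf[r.name][other]
		if ok && mateShard != r.shardIdx {
			distant = append(distant, r)
		}
	}
	return distant
}

func main() {
	reads := []read{
		{"pairA", 1, 2}, {"pairA", 2, 4}, // mates in different shards
		{"pairB", 1, 3}, {"pairB", 2, 3}, // mates in the same shard
	}
	for _, r := range distantReads(reads) {
		fmt.Printf("%s/R%d (shard %d)\n", r.name, r.mate, r.shardIdx)
	}
}
```

In this sketch both reads of pairA are reported and neither read of pairB
is, matching the R1/R2 example in the text.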

After calling GetDistantMates(), the user can then open each shard
and find the mate for each record in the shard using the following
procedure: For a record whose mate is in the same shard, the user
must store the record in memory and continue reading the shard until
the user encounters the mate.  For a record whose mate is not in the
same shard, the user can call DistantMateTable.GetMate() right away
to retrieve the mate.  For a usage example, see
ExampleResolvePairs() in distant_mates_test.go.
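
The per-shard procedure can be sketched as follows.  This is a minimal
illustration under stated assumptions, not the code from
distant_mates_test.go: the record and pair types are hypothetical, and the
lookupDistant callback stands in for DistantMateTable.GetMate().

```go
package main

import "fmt"

// record is a hypothetical stand-in for *sam.Record.
type record struct {
	name      string
	pos       int
	mateShard int // index of the shard that holds the mate
}

// pair holds a record together with its mate.
type pair struct{ first, second record }

// resolveShard pairs up the records of one shard, given in file order.
// lookupDistant is a stand-in for DistantMateTable.GetMate.
func resolveShard(shardIdx int, records []record, lookupDistant func(record) record) []pair {
	pending := make(map[string]record) // records waiting for a same-shard mate
	var pairs []pair
	for _, r := range records {
		if r.mateShard != shardIdx {
			// The mate lives in another shard: fetch it right away.
			pairs = append(pairs, pair{r, lookupDistant(r)})
			continue
		}
		if m, ok := pending[r.name]; ok {
			// Second mate encountered: the pair is now complete.
			delete(pending, r.name)
			pairs = append(pairs, pair{m, r})
			continue
		}
		// First mate encountered: hold it until its mate shows up.
		pending[r.name] = r
	}
	return pairs
}

func main() {
	// Hypothetical distant mates, keyed by read name.
	distant := map[string]record{"b": {name: "b", pos: 500, mateShard: 0}}
	lookup := func(r record) record { return distant[r.name] }

	shard0 := []record{
		{name: "a", pos: 10, mateShard: 0},
		{name: "b", pos: 20, mateShard: 1},
		{name: "a", pos: 30, mateShard: 0},
	}
	for _, p := range resolveShard(0, shard0, lookup) {
		fmt.Printf("%s: %d <-> %d\n", p.first.name, p.first.pos, p.second.pos)
	}
}
```

Because the loop follows file order and the pending map is only consulted
by name, the sequence of emitted pairs depends only on the shard's
contents, which is the determinism property described in the overview.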

Some applications may need to add padding to the beginning and end of
each shard.  In this case, if R1 and R2 are in the same padded
shard, then neither will be in distant mates.  If R1 is in the
padded shard, and R2 is not, then R2 will be in distant mates.
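
A simplified sketch of the padding rule follows.  It assumes a single
reference sequence and half-open coordinates; the shard type and its
fields are hypothetical stand-ins for bam.Shard, introduced only for
illustration.

```go
package main

import "fmt"

// shard is a hypothetical, single-reference stand-in for bam.Shard.
type shard struct {
	start, end int // half-open range [start, end)
	padding    int // bases of padding added on each side
}

// inPadded reports whether pos falls inside the shard once padding is added.
func (s shard) inPadded(pos int) bool {
	return pos >= s.start-s.padding && pos < s.end+s.padding
}

// mateIsDistant reports whether a mate at matePos would have to come from
// the distant mates while processing the padded shard s.
func mateIsDistant(s shard, matePos int) bool {
	return !s.inPadded(matePos)
}

func main() {
	s := shard{start: 1000, end: 2000, padding: 100}
	fmt.Println(mateIsDistant(s, 2050)) // mate inside the end padding
	fmt.Println(mateIsDistant(s, 2500)) // mate beyond the padded shard
}
```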

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetDistantMates

func GetDistantMates(provider bamprovider.Provider, shardList []bam.Shard, opts *Opts,
	createProcessors []func() RecordProcessor) (distantMates *DistantMateTable, shardInfo *ShardInfo, returnErr error)

GetDistantMates scans the BAM/PAM file given by provider, and then returns a DistantMateTable. When finished with the DistantMateTable, the caller must call DistantMateTable.Close() to release the resources used by the DistantMateTable. GetDistantMates also returns a ShardInfo object that includes information such as the number of records in each shard. While scanning through the input file, GetDistantMates also feeds each record to a set of RecordProcessors. createProcessors is a slice of functions that return the RecordProcessors to be used. For an example of how to use GetDistantMates, see ExampleResolvePairs() in distant_mates_test.go.

func IsLeftMost

func IsLeftMost(r *sam.Record) bool

IsLeftMost returns true for exactly one read of a pair. The leftmost read is the one with the smaller reference ID or, when the reference IDs match, the smaller alignment position. If both the reference ID and the position are the same, R1 is considered the leftmost.
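
The ordering described above can be sketched with a hypothetical read struct. This is an illustration of the stated rule only, not the package's implementation, and it ignores special cases such as unmapped reads.

```go
package main

import "fmt"

// read is a hypothetical stand-in for the fields of *sam.Record that the
// rule depends on.
type read struct {
	refID     int
	pos       int
	mateRefID int
	matePos   int
	isRead1   bool
}

// isLeftMost applies the documented tie-breaking order:
// reference ID first, then alignment position, then R1 over R2.
func isLeftMost(r read) bool {
	if r.refID != r.mateRefID {
		return r.refID < r.mateRefID
	}
	if r.pos != r.matePos {
		return r.pos < r.matePos
	}
	return r.isRead1
}

func main() {
	// Both mates share a reference ID and position, so R1 wins the tie.
	r1 := read{refID: 0, pos: 100, mateRefID: 0, matePos: 100, isRead1: true}
	r2 := read{refID: 0, pos: 100, mateRefID: 0, matePos: 100, isRead1: false}
	fmt.Println(isLeftMost(r1), isLeftMost(r2))
}
```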

Types

type DistantMateTable

type DistantMateTable struct {
	// contains filtered or unexported fields
}

DistantMateTable provides access to sam records. It is indexed by shardIdx and read name. The interface is designed so that the table can store mates either in memory or on disk. Its intended use is to store read pair mates.

Calls to addDistantMate() can occur concurrently, but the call to finishedAdding() should occur after all calls to addDistantMate() complete. After calling finishedAdding(), any number of threads can call len(), openShard(), getMate(), and closeShard().

func (*DistantMateTable) Close

func (d *DistantMateTable) Close() error

Close frees the resources taken by a DistantMateTable. A user must call this after finishing with the DistantMateTable and after all shards have been closed with CloseShard().

func (*DistantMateTable) CloseShard

func (d *DistantMateTable) CloseShard(shardIdx int)

CloseShard closes the given shard so that further calls to GetMate() with the given shardIdx will fail. CloseShard() frees resources that OpenShard() allocates.

func (*DistantMateTable) GetMate

func (d *DistantMateTable) GetMate(shardIdx int, r *sam.Record) (*sam.Record, uint64)

GetMate returns the mate of r, and also the mate's FileIdx (as computed using shardInfo and the mate's shard-relative FileIdx). The shardIdx argument is equal to the shardIdx of the shard where r resides.

func (*DistantMateTable) OpenShard

func (d *DistantMateTable) OpenShard(shardIdx int) error

OpenShard prepares the shard, with the given shardIdx, to be queried with GetMate().

type Opts

type Opts struct {
	Parallelism int
	DiskShards  int
	ScratchDir  string
}

Opts contains the options for GetDistantMates.

type RecordProcessor

type RecordProcessor interface {
	Process(shard bam.Shard, r *sam.Record) error
	Close(shard bam.Shard)
}

RecordProcessor is a way for GetDistantMates to run Process() on every record in the bam file. After a given shard invokes Process() on all the records in the shard, including the padding, the shard will invoke Close().
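
As a rough illustration of the Process/Close contract, the sketch below implements a per-shard read counter. To stay self-contained it uses hypothetical stand-ins (shard, record, recordProcessor) in place of bam.Shard, sam.Record, and bampair.RecordProcessor, and it drives the processor from a single goroutine; the readCounter type and its totals map are assumptions made for this example.

```go
package main

import "fmt"

// shard and record are hypothetical stand-ins for bam.Shard and sam.Record.
type shard struct{ idx int }
type record struct{ name string }

// recordProcessor mirrors the shape of bampair.RecordProcessor.
type recordProcessor interface {
	Process(s shard, r *record) error
	Close(s shard)
}

// readCounter counts the records of a shard and publishes the total on Close.
type readCounter struct {
	current int
	totals  map[int]int // shard index -> number of records seen
}

func (c *readCounter) Process(s shard, r *record) error {
	if r == nil {
		return fmt.Errorf("shard %d: nil record", s.idx)
	}
	c.current++
	return nil
}

func (c *readCounter) Close(s shard) {
	c.totals[s.idx] = c.current
	c.current = 0 // ready for the next shard
}

func main() {
	totals := make(map[int]int)
	// Factory slice in the style of the createProcessors argument.
	factories := []func() recordProcessor{
		func() recordProcessor { return &readCounter{totals: totals} },
	}
	p := factories[0]()
	s := shard{idx: 0}
	for _, name := range []string{"r1", "r2", "r3"} {
		if err := p.Process(s, &record{name: name}); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	p.Close(s)
	fmt.Println(totals[0])
}
```

If processors run concurrently, a map shared between them as in this sketch would need synchronization, for example a mutex around the write in Close.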

type ShardInfo

type ShardInfo struct {
	// contains filtered or unexported fields
}

ShardInfo contains handy information about all shards, and is indexed by both key object and shardIdx.

func (*ShardInfo) GetInfoByIdx

func (i *ShardInfo) GetInfoByIdx(shardIdx int) *ShardInfoEntry

GetInfoByIdx returns the info for the given shard index.

func (*ShardInfo) GetInfoByShard

func (i *ShardInfo) GetInfoByShard(shard *bam.Shard) *ShardInfoEntry

GetInfoByShard returns the info for the given shard.

func (*ShardInfo) GetMateShard

func (i *ShardInfo) GetMateShard(r *sam.Record) bam.Shard

func (*ShardInfo) Len

func (i *ShardInfo) Len() int

Len returns the number of shards in i.

type ShardInfoEntry

type ShardInfoEntry struct {
	Shard               bam.Shard // Shard is the bam.Shard object.
	NumStartPadding     uint64    // NumStartPadding is the number of reads in the start padding.
	NumReads            uint64    // NumReads is the number of reads in the actual shard.
	PaddingStartFileIdx uint64    // PaddingStartFileIdx is the FileIdx of the first read in the start padding.
	ShardStartFileIdx   uint64    // ShardStartFileIdx is the FileIdx of the first read in the shard (excluding the padding).
}

ShardInfoEntry contains handy information about a particular shard.
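
The field comments suggest a simple relationship between the index fields. The sketch below assumes that FileIdx values are assigned consecutively in file order, so that ShardStartFileIdx equals PaddingStartFileIdx plus NumStartPadding; that assumption, the entry type, and both helper functions are illustrative and not taken from the package.

```go
package main

import "fmt"

// entry copies the numeric fields of ShardInfoEntry for illustration.
type entry struct {
	NumStartPadding     uint64
	NumReads            uint64
	PaddingStartFileIdx uint64
	ShardStartFileIdx   uint64
}

// fileIdx returns the FileIdx of the record at the given zero-based offset
// within the padded shard, assuming consecutive FileIdx values.
func fileIdx(e entry, offsetInPaddedShard uint64) uint64 {
	return e.PaddingStartFileIdx + offsetInPaddedShard
}

// inStartPadding reports whether that record belongs to the start padding.
func inStartPadding(e entry, offsetInPaddedShard uint64) bool {
	return offsetInPaddedShard < e.NumStartPadding
}

func main() {
	e := entry{
		NumStartPadding:     5,
		NumReads:            100,
		PaddingStartFileIdx: 995,
		ShardStartFileIdx:   1000,
	}
	// The first read after the start padding lands on ShardStartFileIdx.
	fmt.Println(fileIdx(e, e.NumStartPadding) == e.ShardStartFileIdx)
	fmt.Println(inStartPadding(e, 4), inStartPadding(e, 5))
}
```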
