Overview
Package bampair provides a way to get the mate of each read when reading a BAM/PAM file that is sorted by position. Package bampair assumes that the user is reading the BAM/PAM file in a sharded way, and that while processing a shard, it is possible to hold all of that shard's records in memory. Package bampair makes it possible for a user to process each record and its mate in file order, which makes each shard's processing deterministic.
To use bampair, the user first calls GetDistantMates() with a bamprovider and a list of shards, which returns a DistantMateTable. The DistantMateTable contains the mate for each read whose mate is *not* in the same shard. For example, if R1 and R2 are mates, and R1 is in shard2 and R2 is in shard4, then the DistantMateTable will contain both R1 and R2. On the other hand, if R1 and R2 are both in shard3, then the DistantMateTable will contain neither R1 nor R2.
After calling GetDistantMates(), the user can open each shard and find the mate for each record in the shard using the following procedure. For a record whose mate is in the same shard, the user must store the record in memory and continue reading the shard until the user encounters the mate. For a record whose mate is not in the same shard, the user can call DistantMateTable.GetMate() right away to retrieve the mate. For a usage example, see ExampleResolvePairs() in distant_mates_test.go.
Some applications may need to add padding to the beginning and end of each shard. In this case, if R1 and R2 are in the same padded shard, then neither will be in the DistantMateTable. If R1 is in the padded shard and R2 is not, then R2 will be in the DistantMateTable.
Constants
This section is empty.
Variables
This section is empty.
Functions
func GetDistantMates
func GetDistantMates(provider bamprovider.Provider, shardList []bam.Shard, opts *Opts, createProcessors []func() RecordProcessor) (distantMates *DistantMateTable, shardInfo *ShardInfo, returnErr error)
GetDistantMates scans the BAM/PAM file given by provider and returns a DistantMateTable. When finished with the DistantMateTable, the caller must call DistantMateTable.Close() to release the resources it uses. GetDistantMates also returns a ShardInfo object that includes information such as the number of records in each shard. While scanning the input file, GetDistantMates also feeds each record to a set of RecordProcessors; createProcessors is a slice of functions that return the RecordProcessors to use. For an example of how to use GetDistantMates, see ExampleResolvePairs() in distant_mates_test.go.
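To make the "which reads end up in the table" rule concrete, including the padding case from the overview, here is a brute-force sketch. It is an illustration under stated assumptions, not how GetDistantMates works: the real function scans the file, whereas this loops over every shard and pair. `Read`, `PaddedShard`, `key`, and `distantMates` are hypothetical; shards cover one reference with `Pad` positions of padding on each side, and the `(shardIdx, name)` key mirrors the table being indexed by shardIdx and read name.

```go
package main

import "fmt"

// Read is a hypothetical simplified read on a single reference.
type Read struct {
	Name string
	Pos  int
}

// PaddedShard is a hypothetical shard covering [Start, End) plus Pad positions
// of padding on each side.
type PaddedShard struct {
	Start, End, Pad int
}

func (s PaddedShard) covers(pos int) bool {
	return pos >= s.Start-s.Pad && pos < s.End+s.Pad
}

// key mirrors the description of the table as indexed by shardIdx and read name.
type key struct {
	ShardIdx int
	Name     string
}

// distantMates returns, for each shard, the mate of every read inside the padded
// range whose mate lies outside it, stored under (shardIdx, read name).
func distantMates(shards []PaddedShard, pairs [][2]Read) map[key]Read {
	table := map[key]Read{}
	for idx, s := range shards {
		for _, p := range pairs {
			for i, r := range p {
				mate := p[1-i]
				if s.covers(r.Pos) && !s.covers(mate.Pos) {
					table[key{idx, r.Name}] = mate
				}
			}
		}
	}
	return table
}

func main() {
	shards := []PaddedShard{{0, 100, 10}, {100, 200, 10}}
	pairs := [][2]Read{
		{{"a", 10}, {"a", 50}},  // same shard: not stored
		{{"b", 20}, {"b", 150}}, // different shards: stored for both shards
		{{"c", 95}, {"c", 105}}, // same padded shard from either side: not stored
	}
	for k, mate := range distantMates(shards, pairs) {
		fmt.Printf("shard %d, read %s -> mate at %d\n", k.ShardIdx, k.Name, mate.Pos)
	}
}
```

Pair `c` straddles the shard boundary, but because each read falls inside the other shard's padding, neither read needs a table entry.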
func IsLeftMost
IsLeftMost returns true for exactly one read of a pair: the leftmost one. The leftmost read is the one on the smaller reference ID; if both reads are on the same reference, it is the one with the smaller alignment position; and if both the reference ID and the position are the same, R1 is considered the leftmost.
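The three-level rule above can be written as a short comparison. This is a sketch of the rule as described, not the package's implementation: `Read` and `isLeftMost` are hypothetical, the struct carries only the fields the rule needs in place of a *sam.Record, and special cases such as unmapped reads are not modeled.

```go
package main

import "fmt"

// Read is a hypothetical stand-in for *sam.Record carrying only the fields
// the leftmost rule needs.
type Read struct {
	RefID, Pos         int
	MateRefID, MatePos int
	IsRead1            bool
}

// isLeftMost applies the described rule: smaller reference ID first, then
// smaller position, and R1 breaks a full tie.
func isLeftMost(r Read) bool {
	if r.RefID != r.MateRefID {
		return r.RefID < r.MateRefID
	}
	if r.Pos != r.MatePos {
		return r.Pos < r.MatePos
	}
	return r.IsRead1
}

func main() {
	r1 := Read{RefID: 0, Pos: 100, MateRefID: 0, MatePos: 100, IsRead1: true}
	r2 := Read{RefID: 0, Pos: 100, MateRefID: 0, MatePos: 100, IsRead1: false}
	fmt.Println(isLeftMost(r1), isLeftMost(r2)) // true false
}
```

Because each read is compared against its own mate's coordinates, the two reads of a pair always get opposite answers, which is what makes the rule usable for picking one representative per pair.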
Types
type DistantMateTable
type DistantMateTable struct {
// contains filtered or unexported fields
}
DistantMateTable provides access to SAM records, indexed by shardIdx and read name. The interface is designed so that the table can store mates either in memory or on disk. Its intended use is to store read-pair mates.
Calls to addDistantMate() can occur concurrently, but the call to finishedAdding() should occur only after all calls to addDistantMate() have completed. After finishedAdding() is called, any number of goroutines can call len(), openShard(), getMate(), and closeShard().
func (*DistantMateTable) Close
func (d *DistantMateTable) Close() error
Close frees the resources held by a DistantMateTable. The caller must call Close after it has finished with the DistantMateTable and after all shards have been closed with CloseShard().
func (*DistantMateTable) CloseShard
func (d *DistantMateTable) CloseShard(shardIdx int)
CloseShard closes the given shard so that further calls to GetMate() with the given shardIdx will fail. CloseShard() frees resources that OpenShard() allocates.
func (*DistantMateTable) GetMate
GetMate returns the mate of r, along with the mate's FileIdx (computed from shardInfo and the mate's shard-relative FileIdx). The shardIdx argument must be the shardIdx of the shard that contains r.
func (*DistantMateTable) OpenShard
func (d *DistantMateTable) OpenShard(shardIdx int) error
OpenShard prepares the shard, with the given shardIdx, to be queried with GetMate().
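The required call order (OpenShard, then GetMate, then CloseShard for each shard, and Close only after every shard is closed) can be checked with a small stand-in. Everything here is hypothetical: `lifecycleTable` reuses the method names but with simplified signatures (the real GetMate takes a record and also returns a FileIdx), and the real Close is not documented to return an error for open shards; this one does so only to make the ordering rule visible. The `defer` in `processShard` guarantees CloseShard runs even when a lookup fails.

```go
package main

import (
	"errors"
	"fmt"
)

// lifecycleTable is a hypothetical stand-in that enforces the documented order.
type lifecycleTable struct {
	open  map[int]bool
	mates map[int]map[string]string // shardIdx -> read name -> mate
}

func (t *lifecycleTable) OpenShard(shardIdx int) error {
	if _, ok := t.mates[shardIdx]; !ok {
		return fmt.Errorf("unknown shard %d", shardIdx)
	}
	t.open[shardIdx] = true
	return nil
}

// GetMate fails for shards that are not open, as described for CloseShard.
func (t *lifecycleTable) GetMate(shardIdx int, name string) (string, error) {
	if !t.open[shardIdx] {
		return "", fmt.Errorf("shard %d is not open", shardIdx)
	}
	return t.mates[shardIdx][name], nil
}

func (t *lifecycleTable) CloseShard(shardIdx int) { delete(t.open, shardIdx) }

func (t *lifecycleTable) Close() error {
	if len(t.open) != 0 {
		return errors.New("close called with shards still open")
	}
	return nil
}

// processShard shows the per-shard pattern: open, query, and always close.
func processShard(t *lifecycleTable, shardIdx int, names []string) error {
	if err := t.OpenShard(shardIdx); err != nil {
		return err
	}
	defer t.CloseShard(shardIdx)
	for _, n := range names {
		mate, err := t.GetMate(shardIdx, n)
		if err != nil {
			return err
		}
		fmt.Printf("shard %d: %s -> %s\n", shardIdx, n, mate)
	}
	return nil
}

func main() {
	tbl := &lifecycleTable{
		open:  map[int]bool{},
		mates: map[int]map[string]string{0: {"a": "a/2"}, 1: {"b": "b/1"}},
	}
	for idx, names := range [][]string{{"a"}, {"b"}} {
		if err := processShard(tbl, idx, names); err != nil {
			fmt.Println("error:", err)
		}
	}
	// Close only after every shard has been closed.
	if err := tbl.Close(); err != nil {
		fmt.Println("error:", err)
	}
}
```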
type RecordProcessor
type RecordProcessor interface {
	Process(shard bam.Shard, r *sam.Record) error
	Close(shard bam.Shard)
}
RecordProcessor lets GetDistantMates run Process() on every record in the BAM file. After Process() has been called on every record in a shard, including the padding, Close() is called for that shard.
type ShardInfo
type ShardInfo struct {
// contains filtered or unexported fields
}
ShardInfo contains handy information about all shards, and can be looked up either by shard (bam.Shard) or by shardIdx.
func (*ShardInfo) GetInfoByIdx
func (i *ShardInfo) GetInfoByIdx(shardIdx int) *ShardInfoEntry
GetInfoByIdx returns the info for the given shard index.
func (*ShardInfo) GetInfoByShard
func (i *ShardInfo) GetInfoByShard(shard *bam.Shard) *ShardInfoEntry
GetInfoByShard returns the info for the given shard.
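A structure that answers both GetInfoByIdx and GetInfoByShard can be as simple as a slice plus a map from shard to slice index. This is a hypothetical sketch, not the package's internals: `Shard`, `shardInfo`, and `newShardInfo` are invented names, `ShardInfoEntry` keeps only two fields, and it assumes the shard value is comparable so it can serve as a map key.

```go
package main

import "fmt"

// Shard is a hypothetical comparable stand-in for bam.Shard.
type Shard struct {
	Start, End int
	ShardIdx   int
}

// ShardInfoEntry is trimmed to two fields for this sketch.
type ShardInfoEntry struct {
	Shard    Shard
	NumReads uint64
}

// shardInfo keeps entries in a slice (lookup by index) plus a map from Shard to index.
type shardInfo struct {
	entries []*ShardInfoEntry
	byShard map[Shard]int
}

func newShardInfo(entries []*ShardInfoEntry) *shardInfo {
	info := &shardInfo{entries: entries, byShard: map[Shard]int{}}
	for i, e := range entries {
		info.byShard[e.Shard] = i
	}
	return info
}

func (i *shardInfo) GetInfoByIdx(shardIdx int) *ShardInfoEntry { return i.entries[shardIdx] }

// GetInfoByShard returns nil for a shard that was never registered.
func (i *shardInfo) GetInfoByShard(shard *Shard) *ShardInfoEntry {
	idx, ok := i.byShard[*shard]
	if !ok {
		return nil
	}
	return i.entries[idx]
}

func main() {
	s0 := Shard{Start: 0, End: 100, ShardIdx: 0}
	s1 := Shard{Start: 100, End: 200, ShardIdx: 1}
	info := newShardInfo([]*ShardInfoEntry{{Shard: s0, NumReads: 7}, {Shard: s1, NumReads: 3}})
	fmt.Println(info.GetInfoByIdx(1).NumReads, info.GetInfoByShard(&s1).NumReads) // 3 3
}
```

Both lookups return the same *ShardInfoEntry, so callers can use whichever key they happen to have.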
type ShardInfoEntry
type ShardInfoEntry struct {
	Shard               bam.Shard // Shard is the bam.Shard object.
	NumStartPadding     uint64    // NumStartPadding is the number of reads in the start padding.
	NumReads            uint64    // NumReads is the number of reads in the actual shard.
	PaddingStartFileIdx uint64    // PaddingStartFileIdx is the FileIdx of the first read in the start padding.
	ShardStartFileIdx   uint64    // ShardStartFileIdx is the FileIdx of the first read in the shard (excluding the padding).
}
ShardInfoEntry contains handy information about a particular shard.
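The FileIdx fields can be related with simple arithmetic, under two loudly stated assumptions that this page does not confirm: FileIdx numbers records consecutively in file order, and the start padding immediately precedes the shard, so ShardStartFileIdx equals PaddingStartFileIdx + NumStartPadding. The helpers `consistent`, `absoluteFileIdx`, and `shardEndFileIdx` are hypothetical, and `absoluteFileIdx` further assumes a shard-relative index counted from the first read of the start padding.

```go
package main

import "fmt"

// ShardInfoEntry copies the numeric fields of the documented struct (Shard omitted).
type ShardInfoEntry struct {
	NumStartPadding     uint64
	NumReads            uint64
	PaddingStartFileIdx uint64
	ShardStartFileIdx   uint64
}

// consistent checks the assumed relation: the shard begins NumStartPadding
// reads after the first padding read.
func consistent(e ShardInfoEntry) bool {
	return e.ShardStartFileIdx == e.PaddingStartFileIdx+e.NumStartPadding
}

// absoluteFileIdx converts a hypothetical shard-relative index, counted from
// the first read of the start padding, into a file-wide FileIdx.
func absoluteFileIdx(e ShardInfoEntry, relIdx uint64) uint64 {
	return e.PaddingStartFileIdx + relIdx
}

// shardEndFileIdx is one past the FileIdx of the last read in the shard proper
// (end padding excluded).
func shardEndFileIdx(e ShardInfoEntry) uint64 {
	return e.ShardStartFileIdx + e.NumReads
}

func main() {
	e := ShardInfoEntry{NumStartPadding: 5, NumReads: 100, PaddingStartFileIdx: 995, ShardStartFileIdx: 1000}
	fmt.Println(consistent(e), absoluteFileIdx(e, 7), shardEndFileIdx(e)) // true 1002 1100
}
```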