Documentation ¶
Index ¶
- Constants
- func BlockIntersectsRange(startAddr, endAddr biopb.Coord, userRange biopb.CoordRange) bool
- func CoordPathString(r biopb.Coord) string
- func CoordRangePathString(r biopb.CoordRange) string
- func FieldDataPath(dir string, recRange biopb.CoordRange, field string) string
- func GenerateReadShards(opts GenerateReadShardsOpts, indexes []ShardIndex) ([]biopb.CoordRange, error)
- func NewShardIndex(shardRange biopb.CoordRange, h *sam.Header) biopb.PAMShardIndex
- func ReadShardIndex(ctx context.Context, dir string, recRange biopb.CoordRange) (index biopb.PAMShardIndex, err error)
- func Remove(dir string) error
- func ShardIndexPath(dir string, recRange biopb.CoordRange) string
- func ValidateCoordRange(r *biopb.CoordRange) error
- func WriteShardIndex(ctx context.Context, dir string, coordRange biopb.CoordRange, ...) error
- type FileInfo
- func ChooseIndexFilesInRange(allIndexFiles []FileInfo, recRange biopb.CoordRange) ([]FileInfo, error)
- func FindIndexFilesInRange(ctx context.Context, dir string, recRange biopb.CoordRange) ([]FileInfo, error)
- func ListIndexes(ctx context.Context, dir string) ([]FileInfo, error)
- func ParsePath(path string) (FileInfo, error)
- type FileType
- type GenerateReadShardsOpts
- type ShardIndex
Constants ¶
const DefaultVersion = "PAM2"
DefaultVersion is the string embedded in ShardIndex.version.
const ShardIndexMagic = uint64(0x725c7226be794c60)
ShardIndexMagic is the value of ShardIndex.Magic.
Functions ¶
func BlockIntersectsRange ¶
func BlockIntersectsRange(startAddr, endAddr biopb.Coord, userRange biopb.CoordRange) bool
BlockIntersectsRange checks if userRange and [startAddr, endAddr] intersect.
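The intersection test can be sketched with standard-library Go only. The `coord` and `coordRange` types below are simplified, hypothetical stand-ins for biopb.Coord and biopb.CoordRange, and the sketch assumes that a range is half-open ([start, limit)) while the block is closed ([startAddr, endAddr]). It illustrates the idea and is not the package's implementation.

```go
package main

import "fmt"

// coord is a simplified, hypothetical stand-in for biopb.Coord.
type coord struct {
	refID int32
	pos   int32
}

// less reports whether a sorts strictly before b (by refID, then pos).
func (a coord) less(b coord) bool {
	if a.refID != b.refID {
		return a.refID < b.refID
	}
	return a.pos < b.pos
}

// coordRange is a hypothetical stand-in for biopb.CoordRange.
// It is assumed to be half-open: [start, limit).
type coordRange struct {
	start, limit coord
}

// blockIntersects reports whether the closed block [startAddr, endAddr]
// shares at least one coordinate with the half-open range r.
func blockIntersects(startAddr, endAddr coord, r coordRange) bool {
	// The block must begin before the range ends,
	// and it must not end before the range begins.
	return startAddr.less(r.limit) && !endAddr.less(r.start)
}

func main() {
	r := coordRange{coord{1, 100}, coord{1, 200}}
	fmt.Println(blockIntersects(coord{1, 150}, coord{1, 250}, r)) // true
	fmt.Println(blockIntersects(coord{1, 200}, coord{1, 300}, r)) // false
	fmt.Println(blockIntersects(coord{0, 0}, coord{1, 100}, r))   // true
}
```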
func CoordPathString ¶
func CoordPathString(r biopb.Coord) string
CoordPathString generates a string that can be embedded in a pathname. Use ParsePath() to parse such a string.
func CoordRangePathString ¶
func CoordRangePathString(r biopb.CoordRange) string
CoordRangePathString returns a string that can be used as part of a pathname.
func FieldDataPath ¶
func FieldDataPath(dir string, recRange biopb.CoordRange, field string) string
FieldDataPath returns the path of the file storing data for the given record range and the field.
func GenerateReadShards ¶
func GenerateReadShards(opts GenerateReadShardsOpts, indexes []ShardIndex) ([]biopb.CoordRange, error)
GenerateReadShards returns a list of biopb.CoordRanges. The biopb.CoordRanges can be passed to NewReader for parallel, sharded record reads. The returned list satisfies the following conditions.
1. The ranges in the list fill opts.Range (or the UniversalRange if not set) exactly, without an overlap or a gap.
2. The length of the list is at least the requested number of shards (opts.NumShards). The length may exceed that number because this function tries to split a range at a rowshard boundary.
3. The byte size of the file region(s) that cover each biopb.CoordRange is roughly the same.
4. The ranges are sorted in increasing order of biopb.Coord.
opts.NumShards specifies the number of shards. It should generally be zero, in which case the function picks an appropriate default.
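Conditions 1 and 4 above (exact tiling and sorted order) can be checked mechanically. The following is a minimal sketch using standard-library Go only; `coord`, `coordRange`, and `tilesExactly` are hypothetical names introduced for illustration, and ranges are assumed to be half-open ([start, limit)).

```go
package main

import "fmt"

// coord is a simplified, hypothetical stand-in for biopb.Coord.
type coord struct {
	refID int32
	pos   int32
}

// less reports whether a sorts strictly before b (by refID, then pos).
func (a coord) less(b coord) bool {
	if a.refID != b.refID {
		return a.refID < b.refID
	}
	return a.pos < b.pos
}

// coordRange is a hypothetical stand-in for biopb.CoordRange,
// assumed to be half-open: [start, limit).
type coordRange struct {
	start, limit coord
}

// tilesExactly reports whether shards, taken in order, cover full with no
// gap and no overlap. Because every shard must start where the previous one
// ended and must be non-empty, a true result also implies sorted order.
func tilesExactly(full coordRange, shards []coordRange) bool {
	next := full.start
	for _, s := range shards {
		if s.start != next || !s.start.less(s.limit) {
			return false
		}
		next = s.limit
	}
	return next == full.limit
}

func main() {
	full := coordRange{coord{0, 0}, coord{2, 0}}
	good := []coordRange{
		{coord{0, 0}, coord{1, 0}},
		{coord{1, 0}, coord{2, 0}},
	}
	gap := []coordRange{
		{coord{0, 0}, coord{1, 0}},
		{coord{1, 5}, coord{2, 0}},
	}
	fmt.Println(tilesExactly(full, good)) // true
	fmt.Println(tilesExactly(full, gap))  // false
}
```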
func NewShardIndex ¶
func NewShardIndex(shardRange biopb.CoordRange, h *sam.Header) biopb.PAMShardIndex
NewShardIndex creates a new PAMShardIndex object with the given arguments.
func ReadShardIndex ¶
func ReadShardIndex(ctx context.Context, dir string, recRange biopb.CoordRange) (index biopb.PAMShardIndex, err error)
ReadShardIndex reads the index file, "dir/<recRange>.index".
func Remove ¶
func Remove(dir string) error
Remove deletes the files in the given PAM directory. It returns an error if any of the existing files cannot be deleted.
func ShardIndexPath ¶
func ShardIndexPath(dir string, recRange biopb.CoordRange) string
ShardIndexPath returns the path of the shard index file.
func ValidateCoordRange ¶
func ValidateCoordRange(r *biopb.CoordRange) error
ValidateCoordRange validates "r" and normalizes its fields, if necessary. In particular, if the range fields are all zeros, the range is replaced by UniversalRange.
func WriteShardIndex ¶
func WriteShardIndex(ctx context.Context, dir string, coordRange biopb.CoordRange, msg *biopb.PAMShardIndex) error
WriteShardIndex serializes "msg" into a single-block recordio file "dir/<coordRange>.index". Any existing contents of the file are overwritten.
Types ¶
type FileInfo ¶
type FileInfo struct {
    // Path is the value passed to ParsePath.
    Path string
    // Type is the type of the file. For "dir/0:0,46:1653469.mapq", the type
    // is FileTypeFieldData. For "dir/0:0,46:1653469.index", the type is
    // FileTypeShardIndex.
    Type FileType
    // Field stores the field part of the filename. Field=="mapq" if the pathname
    // is "dir/0:0,46:1653469.mapq". It is meaningful iff Type ==
    // FileTypeFieldData.
    Field string
    // Dir is the directory under which the file is stored. Dir="dir" if the
    // pathname is "dir/0:0,46:1653469.mapq".
    Dir string
    // Range is the record range that the file stores. Range={Start:{0,0},
    // Limit:{46,1653469}} if the pathname is "dir/0:0,46:1653469.mapq".
    Range biopb.CoordRange
}
FileInfo is the result of parsing a pathname.
A PAM pathname looks like "dir/0:0,46:1653469.mapq" or "dir/0:0,46:1653469.index".
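To make the pathname layout concrete, here is a minimal sketch that splits such a path into directory, range, and suffix using only the standard library. It is not the package's ParsePath: `parsedPath`, `parseCoord`, and `parsePAMPath` are hypothetical names, and the sketch assumes that each coordinate has exactly the numeric `refid:pos` form shown in the examples above. Real PAM paths may use other coordinate spellings that this sketch rejects.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// parsedPath is a hypothetical, simplified analogue of FileInfo.
type parsedPath struct {
	dir     string
	field   string // set only for field data files
	isIndex bool   // true for "*.index" files
	start   [2]int // {refid, pos}
	limit   [2]int // {refid, pos}
}

// parseCoord parses "refid:pos" where both parts are decimal integers.
func parseCoord(s string) ([2]int, error) {
	var c [2]int
	parts := strings.Split(s, ":")
	if len(parts) != 2 {
		return c, fmt.Errorf("coord %q: want refid:pos", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return c, fmt.Errorf("coord %q: %v", s, err)
		}
		c[i] = n
	}
	return c, nil
}

// parsePAMPath parses paths shaped like "dir/0:0,46:1653469.mapq".
func parsePAMPath(p string) (parsedPath, error) {
	var out parsedPath
	dir, base := path.Split(p)
	out.dir = path.Clean(dir)
	dot := strings.LastIndex(base, ".")
	if dot < 0 {
		return out, fmt.Errorf("path %q: missing suffix", p)
	}
	if suffix := base[dot+1:]; suffix == "index" {
		out.isIndex = true
	} else {
		out.field = suffix
	}
	bounds := strings.Split(base[:dot], ",")
	if len(bounds) != 2 {
		return out, fmt.Errorf("path %q: want start,limit", p)
	}
	var err error
	if out.start, err = parseCoord(bounds[0]); err != nil {
		return out, err
	}
	if out.limit, err = parseCoord(bounds[1]); err != nil {
		return out, err
	}
	return out, nil
}

func main() {
	info, err := parsePAMPath("dir/0:0,46:1653469.mapq")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", info)
}
```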
func ChooseIndexFilesInRange ¶
func ChooseIndexFilesInRange(allIndexFiles []FileInfo, recRange biopb.CoordRange) ([]FileInfo, error)
ChooseIndexFilesInRange returns the subset of allIndexFiles that overlap recRange. REQUIRES: allIndexFiles[i].Type == FileTypeShardIndex for all i.
func FindIndexFilesInRange ¶
func FindIndexFilesInRange(ctx context.Context, dir string, recRange biopb.CoordRange) ([]FileInfo, error)
FindIndexFilesInRange lists all *.index files that store a record that intersects "recRange".
func ListIndexes ¶
func ListIndexes(ctx context.Context, dir string) ([]FileInfo, error)
ListIndexes lists the shard index files found for the given PAM files. The returned list is sorted by position.
type GenerateReadShardsOpts ¶
type GenerateReadShardsOpts struct {
    // Range defines an optional row shard range. Only records in this range will
    // be returned by Scan() and Read(). If Range is unset, the universal range is
    // assumed. See also ReadOpts.Range.
    Range biopb.CoordRange
    // SplitMappedCoords allows GenerateReadShards to split mapped reads of
    // the same <refid, alignment position> into multiple shards. Setting
    // this flag true will cause shard size to be more even, but the caller
    // must be able to handle split reads.
    SplitMappedCoords bool
    // SplitUnmappedCoords allows GenerateReadShards to split unmapped
    // reads into multiple shards. Setting this flag true will cause shard
    // size to be more even, but the caller must be able to handle split
    // unmapped reads.
    SplitUnmappedCoords bool
    // AlwaysSplitMappedAndUnmappedCoords forces shards to be split at the
    // start of unmapped reads. If this flag is false, a shard may contain
    // both mapped and unmapped reads.
    AlwaysSplitMappedAndUnmappedCoords bool
    // BytesPerShard is the target shard size, in bytes across all fields. If
    // this field is set, NumShards is ignored.
    BytesPerShard int64
    // NumShards specifies the number of shards to create. This field is ignored
    // if BytesPerShard>0. If neither BytesPerShard nor NumShards is set,
    // runtime.NumCPU()*4 shards will be created.
    NumShards int
}
GenerateReadShardsOpts defines options to GenerateReadShards.
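The precedence between BytesPerShard and NumShards described in the field comments can be sketched as follows. `shardCount` is a hypothetical helper, not part of the package; in particular, deriving the count from BytesPerShard by ceiling division over a known total size is an assumption made for illustration, since the documentation only states that BytesPerShard takes priority.

```go
package main

import (
	"fmt"
	"runtime"
)

// shardCount is a hypothetical helper that mirrors the documented
// precedence: BytesPerShard first, then NumShards, then a CPU-based default.
func shardCount(bytesPerShard, totalBytes int64, numShards int) int {
	if bytesPerShard > 0 {
		// Assumption: round up so that no shard exceeds the target size.
		n := int((totalBytes + bytesPerShard - 1) / bytesPerShard)
		if n < 1 {
			n = 1
		}
		return n
	}
	if numShards > 0 {
		return numShards
	}
	// Neither option is set: use the documented default.
	return runtime.NumCPU() * 4
}

func main() {
	fmt.Println(shardCount(100, 1000, 8)) // 10: BytesPerShard wins over NumShards
	fmt.Println(shardCount(0, 1000, 8))   // 8: NumShards is used
	fmt.Println(shardCount(0, 1000, 0))   // runtime.NumCPU()*4, machine dependent
}
```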
type ShardIndex ¶
type ShardIndex struct {
    // Range is the coordinate range that this object represents. Records and indexes from the
    // source PAM that don't intersect this range were ignored.
    Range biopb.CoordRange
    // ApproxFileBytes is an estimate of the total file size of records in Range (in the
    // underlying PAM)
    ApproxFileBytes int64
    // Blocks is a sequence of index entries from one PAM field that span Range.
    Blocks []biopb.PAMBlockIndexEntry
}
ShardIndex holds data derived from the index information of one PAM file. It is used by the sharder.
func ReadIndexes ¶
func ReadIndexes(ctx context.Context, path string, rng biopb.CoordRange, fields []string) ([]ShardIndex, error)
ReadIndexes reads the ShardIndexes for the PAM file at path, within rng. If the PAM contains no records in rng, returns an empty slice.