Documentation ¶
Overview ¶
Package interval implements interval-union operations in a manner optimized
for sets of genomic coordinates represented by BED files. (Note the 'union'. Overlapping intervals are merged, not tracked separately; it is currently necessary to use another package when that is not the desired behavior.) It assumes every position fits in a PosType, which is currently defined as int32 since that's what BAM files are limited to.
Index ¶
- Constants
- type BEDUnion
- func (u *BEDUnion) Clone() (bedUnion BEDUnion)
- func (u *BEDUnion) ContainsByID(refID int, pos PosType) bool
- func (u *BEDUnion) ContainsByName(refName string, pos PosType) bool
- func (u *BEDUnion) EndpointsByID(refID int) []PosType
- func (u *BEDUnion) EndpointsByName(refName string) []PosType
- func (u *BEDUnion) IntersectionByID(refID int, startPos, limitPos PosType) []PosType
- func (u *BEDUnion) Intersects(startRefID int, startPos PosType, limitRefID int, limitPos PosType) bool
- func (u *BEDUnion) IntersectsByID(refID int, startPos PosType, limitPos PosType) bool
- func (u *BEDUnion) OverlapByID(refID int, startPos, limitPos PosType) []PosType
- func (u *BEDUnion) RefNameSet() map[string]bool
- func (u *BEDUnion) Subset(startRefID int, startPos PosType, limitRefID int, limitPos PosType) (bedUnion BEDUnion)
- type EndpointIndex
- type Entry
- type NewBEDOpts
- type PosType
- type UnionScanner
Constants ¶
const PosTypeMax = math.MaxInt32
PosTypeMax is the maximum value that can be represented by a PosType.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BEDUnion ¶
type BEDUnion struct { // RefNames maps reference IDs to names, when idMap is initialized. RefNames []string // contains filtered or unexported fields }
BEDUnion represents the set of positions covered by the intervals in a BED file, in a manner that supports efficient iteration and intersection queries. (It does not keep track of interval names, etc.; just the final position set.)
func NewBEDUnion ¶
func NewBEDUnion(reader io.Reader, opts NewBEDOpts) (bedUnion BEDUnion, err error)
NewBEDUnion loads just the intervals from a sorted (by first coordinate) interval-BED, merging touching/overlapping intervals and eliminating empty ones in the process. A BEDUnion is returned.
func NewBEDUnionFromEntries ¶
func NewBEDUnionFromEntries(entries []Entry, opts NewBEDOpts) (bedUnion BEDUnion, err error)
NewBEDUnionFromEntries initializes a BEDUnion from a sorted []Entry. This ignores opts.OneBasedInput, since start0 is defined to be zero-based.
func NewBEDUnionFromPath ¶
func NewBEDUnionFromPath(path string, opts NewBEDOpts) (bedUnion BEDUnion, err error)
NewBEDUnionFromPath is a wrapper for NewBEDUnion that takes a path instead of an io.Reader.
func (*BEDUnion) Clone ¶
Clone returns a new BEDUnion which shares the interval set, but has its own search state.
func (*BEDUnion) ContainsByID ¶
ContainsByID checks whether the (0-based) interval [pos, pos+1) is contained within the BEDUnion, where reference is specified by sam.Header ID.
func (*BEDUnion) ContainsByName ¶
ContainsByName checks whether the (0-based) interval [pos, pos+1) is contained within the BEDUnion, where reference is specified by name.
func (*BEDUnion) EndpointsByID ¶
EndpointsByID returns the sorted interval-endpoint slice for the reference with the given ID. This must be treated as read-only; it may alias an internal data structure.
func (*BEDUnion) EndpointsByName ¶
EndpointsByName has the same functionality as EndpointsByID, without requiring a *sam.Header at BEDUnion construction time.
func (*BEDUnion) IntersectionByID ¶
IntersectionByID is similar to OverlapByID, except that the intersection between [startPos, limitPos) and the BEDUnion is returned (so the first and/or the last interval may be truncated). The return value must be treated as read-only.
func (*BEDUnion) Intersects ¶
func (u *BEDUnion) Intersects(startRefID int, startPos PosType, limitRefID int, limitPos PosType) bool
Intersects checks whether the given contiguous possibly-multi-reference region intersects the interval set. References must be specified by ID. It panics if limitRefID:limitPos isn't after startRefID:startPos.
func (*BEDUnion) IntersectsByID ¶
IntersectsByID checks whether the given nonempty single-reference [startPos, limitPos) interval intersects the BEDUnion, where the reference is specified by ID.
func (*BEDUnion) OverlapByID ¶
OverlapByID is a low-level function which returns a []PosType describing all the BED intervals overlapping [startPos, limitPos) on the given reference, where the reference is specified by ID. For example, assuming positive startPos, the BED interval [0, startPos) would not be returned, while [0, startPos + 1) would. IMPORTANT: the return value must be treated as read-only; it may alias an internal data structure. The return slice is arranged such that the value at index 2n is the start of overlapping interval n, and the value at index (2n+1) is the end of that overlapping interval.
func (*BEDUnion) RefNameSet ¶
RefNameSet returns the set of reference names. This works even when the BEDUnion was constructed without a *sam.Header.
func (*BEDUnion) Subset ¶
func (u *BEDUnion) Subset(startRefID int, startPos PosType, limitRefID int, limitPos PosType) (bedUnion BEDUnion)
Subset returns a new BEDUnion which describes the intersection between the original interval set and the given (possibly multi-reference) interval. The original BEDUnion must support ID-based lookup. Parameters are handled in the same way as Intersects().
type EndpointIndex ¶
type EndpointIndex uint32
EndpointIndex is intended to represent the result of SearchPosTypes(endpoints, pos+1). NOTE THE "+1"! This is necessary to get SearchPosTypes to line up with our usual left-closed right-open intervals.
func ExpsearchPosType ¶
func ExpsearchPosType(a []PosType, x PosType, idx EndpointIndex) EndpointIndex
ExpsearchPosType performs "exponential search" (https://en.wikipedia.org/wiki/Exponential_search ), checking a[idx], then a[idx + 1], then a[idx + 3], then a[idx + 7], etc., and finishing with binary search once it's either found an element larger than the target or has hit the end of the slice. It's usually a better choice than SearchPosTypes when iterating. (However, an inlined simple linear search may be better in practice. Can benchmark later if it matters.)
func NewEndpointIndex ¶
func NewEndpointIndex(pos PosType, endpoints []PosType) EndpointIndex
NewEndpointIndex returns an EndpointIndex initialized to SearchPosTypes(endpoints, pos+1).
func SearchPosTypes ¶
func SearchPosTypes(a []PosType, x PosType) EndpointIndex
SearchPosTypes returns the index of x in a[], or the position where x would be inserted if x isn't in a (this could be len(a)). It's exactly the same as sort.SearchInts(), except for PosType.
func (EndpointIndex) Begin ¶
func (ei EndpointIndex) Begin() EndpointIndex
Begin returns:
- the index for the beginning of the current interval, if we're inside an interval
- otherwise, the index for the beginning of the next interval
func (EndpointIndex) Contained ¶
func (ei EndpointIndex) Contained() bool
Contained returns whether we're inside an interval.
func (EndpointIndex) Finished ¶
func (ei EndpointIndex) Finished(endpoints []PosType) bool
Finished returns whether we're past all the intervals.
func (*EndpointIndex) Update ¶
func (ei *EndpointIndex) Update(newPos PosType, endpoints []PosType)
Update updates the EndpointIndex to refer to newPos, which cannot be smaller than the previous position referred to by this EndpointIndex. It is substantially faster than NewEndpointIndex when the position is increasing slowly.
type Entry ¶
Entry represents a single interval, with 0-based coordinates.
func ParseRegionString ¶
ParseRegionString parses a region string of one of the forms
[contig ID]:[1-based first pos]-[last pos] [contig ID]:[1-based pos] [contig ID]
returning a contig ID and 0-based interval boundaries. The interval [0, PosTypeMax - 1] is returned if there is no positional restriction.
type NewBEDOpts ¶
type NewBEDOpts struct { // SAMHeader enables ID-based lookup. (This is more convenient than // string-based lookup when using gbam.Shard.) SAMHeader *sam.Header // Invert causes the complement of the interval-union to be returned. The // complement extends down to position -1 at the beginning of each // reference, and currently 2^31 - 2 inclusive at the end. If SAMHeader is // provided, any reference mentioned in the SAMHeader but entirely absent // from the BED will be fully included. Otherwise, only the references // mentioned in the BED file are included. (A single empty interval // qualifies as a "mention" for the latter purpose.) Invert bool // OneBasedInput interprets the BED interval boundaries as one-based [start, // end] instead of the usual zero-based [start, end). OneBasedInput bool }
NewBEDOpts defines behavior of this package's BED-loading function(s).
type PosType ¶
type PosType int32
PosType is the type used to represent interval coordinates. int32 should be wide enough for some time to come, since that's what BAM is limited to.
(This, and PosTypeMax, should move to a more central package once an appropriate one exists. And then, when generics finally become part of the language *crosses fingers*, we can allow some applications to redefine this as uint32 or a 64-bit type as appropriate.)
type UnionScanner ¶
type UnionScanner struct {
// contains filtered or unexported fields
}
UnionScanner supports iteration over an interval-union. Invariants:
endpointIdx == SearchPosTypes(endpoints, pos+1) Pos is either contained in an interval, or is PosTypeMax
func NewUnionScanner ¶
func NewUnionScanner(endpoints []PosType) UnionScanner
NewUnionScanner returns a UnionScanner initialized to the first interval.
func (*UnionScanner) Pos ¶
func (us *UnionScanner) Pos() PosType
Pos returns the next position to be iterated over, or PosTypeMax if there aren't any.
func (*UnionScanner) Scan ¶
func (us *UnionScanner) Scan(start *PosType, end *PosType, limit PosType) bool
Scan is written so that the following loop can be used to iterate over all within-interval positions up to (and not including) limit:
for us.Scan(&start, &end, limit) { for pos := start; pos < end; pos++ { // ...do stuff with pos... } }