interval

package

v0.0.0-...-ad47f17 Latest Latest Go to latest Published: Oct 10, 2024 License: Apache-2.0 Imports: 14 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/will-rowe/bio

Links

Open Source Insights

Documentation ¶

Overview ¶

Package interval implements interval-union operations in a manner optimized

for sets of genomic coordinates represented by BED files.
(Note the 'union'.  Overlapping intervals are merged, not tracked
separately; it is currently necessary to use another package when that is not
the desired behavior.)
It assumes every position fits in a PosType, which is currently defined as
int32 since that's what BAM files are limited to.

Index ¶

Constants
type BEDUnion
type EndpointIndex
type Entry
- func ParseRegionString(region string) (result Entry, err error)
type NewBEDOpts
type PosType
type UnionScanner
- func NewUnionScanner(endpoints []PosType) UnionScanner
- func (us *UnionScanner) Pos() PosType
- func (us *UnionScanner) Scan(start *PosType, end *PosType, limit PosType) bool

Constants ¶

View Source

const PosTypeMax = math.MaxInt32

PosTypeMax is the maximum value that can be represented by a PosType.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type BEDUnion ¶

type BEDUnion struct {

	// RefNames maps reference IDs to names, when idMap is initialized.
	RefNames []string
	// contains filtered or unexported fields
}

BEDUnion represents the set of positions covered by the intervals in a BED file, in a manner that supports efficient iteration and intersection queries. (It does not keep track of interval names, etc.; just the final position set.)

func NewBEDUnion ¶

func NewBEDUnion(reader io.Reader, opts NewBEDOpts) (bedUnion BEDUnion, err error)

NewBEDUnion loads just the intervals from a sorted (by first coordinate) interval-BED, merging touching/overlapping intervals and eliminating empty ones in the process. A BEDUnion is returned.

func NewBEDUnionFromEntries ¶

func NewBEDUnionFromEntries(entries []Entry, opts NewBEDOpts) (bedUnion BEDUnion, err error)

NewBEDUnionFromEntries initializes a BEDUnion from a sorted []Entry. This ignores opts.OneBasedInput, since start0 is defined to be zero-based.

func NewBEDUnionFromPath ¶

func NewBEDUnionFromPath(path string, opts NewBEDOpts) (bedUnion BEDUnion, err error)

NewBEDUnionFromPath is a wrapper for NewBEDUnion that takes a path instead of an io.Reader.

func (*BEDUnion) Clone ¶

func (u *BEDUnion) Clone() (bedUnion BEDUnion)

Clone returns a new BEDUnion which shares the interval set, but has its own search state.

func (*BEDUnion) ContainsByID ¶

func (u *BEDUnion) ContainsByID(refID int, pos PosType) bool

ContainsByID checks whether the (0-based) interval [pos, pos+1) is contained within the BEDUnion, where reference is specified by sam.Header ID.

func (*BEDUnion) ContainsByName ¶

func (u *BEDUnion) ContainsByName(refName string, pos PosType) bool

ContainsByName checks whether the (0-based) interval [pos, pos+1) is contained within the BEDUnion, where reference is specified by name.

func (*BEDUnion) EndpointsByID ¶

func (u *BEDUnion) EndpointsByID(refID int) []PosType

EndpointsByID returns the sorted interval-endpoint slice for the reference with the given ID. This must be treated as read-only; it may alias an internal data structure.

func (*BEDUnion) EndpointsByName ¶

func (u *BEDUnion) EndpointsByName(refName string) []PosType

EndpointsByName has the same functionality as EndpointsByID, without requiring a *sam.Header at BEDUnion construction time.

func (*BEDUnion) IntersectionByID ¶

func (u *BEDUnion) IntersectionByID(refID int, startPos, limitPos PosType) []PosType

IntersectionByID is similar to OverlapByID, except that the intersection between [startPos, limitPos) and the BEDUnion is returned (so the first and/or the last interval may be truncated). The return value must be treated as read-only.

func (*BEDUnion) Intersects ¶

func (u *BEDUnion) Intersects(startRefID int, startPos PosType, limitRefID int, limitPos PosType) bool

Intersects checks whether the given contiguous possibly-multi-reference region intersects the interval set. References must be specified by ID. It panics if limitRefID:limitPos isn't after startRefID:startPos.

func (*BEDUnion) IntersectsByID ¶

func (u *BEDUnion) IntersectsByID(refID int, startPos PosType, limitPos PosType) bool

IntersectsByID checks whether the given nonempty single-reference [startPos, limitPos) interval intersects the BEDUnion, where the reference is specified by ID.

func (*BEDUnion) OverlapByID ¶

func (u *BEDUnion) OverlapByID(refID int, startPos, limitPos PosType) []PosType

OverlapByID is a low-level function which returns a []PosType describing all the BED intervals overlapping [startPos, limitPos) on the given reference, where the reference is specified by ID. For example, assuming positive startPos, the BED interval [0, startPos) would not be returned, while [0, startPos + 1) would. IMPORTANT: the return value must be treated as read-only; it may alias an internal data structure. The return slice is arranged such that the value at index 2n is the start of overlapping interval n, and the value at index (2n+1) is the end of that overlapping interval.

func (*BEDUnion) RefNameSet ¶

func (u *BEDUnion) RefNameSet() map[string]bool

RefNameSet returns the set of reference names. This works even when the BEDUnion was constructed without a *sam.Header.

func (*BEDUnion) Subset ¶

func (u *BEDUnion) Subset(startRefID int, startPos PosType, limitRefID int, limitPos PosType) (bedUnion BEDUnion)

Subset returns a new BEDUnion which describes the intersection between the original interval set and the given (possibly multi-reference) interval. The original BEDUnion must support ID-based lookup. Parameters are handled in the same way as Intersects().

type EndpointIndex ¶

type EndpointIndex uint32

EndpointIndex is intended to represent the result of SearchPosTypes(endpoints, pos+1). NOTE THE "+1"! This is necessary to get SearchPosTypes to line up with our usual left-closed right-open intervals.

func ExpsearchPosType ¶

func ExpsearchPosType(a []PosType, x PosType, idx EndpointIndex) EndpointIndex

ExpsearchPosType performs "exponential search" (https://en.wikipedia.org/wiki/Exponential_search ), checking a[idx], then a[idx + 1], then a[idx + 3], then a[idx + 7], etc., and finishing with binary search once it's either found an element larger than the target or has hit the end of the slice. It's usually a better choice than SearchPosTypes when iterating. (However, an inlined simple linear search may be better in practice. Can benchmark later if it matters.)

func NewEndpointIndex ¶

func NewEndpointIndex(pos PosType, endpoints []PosType) EndpointIndex

NewEndpointIndex returns an EndpointIndex initialized to SearchPosTypes(endpoints, pos+1).

func SearchPosTypes ¶

func SearchPosTypes(a []PosType, x PosType) EndpointIndex

SearchPosTypes returns the index of x in a[], or the position where x would be inserted if x isn't in a (this could be len(a)). It's exactly the same as sort.SearchInts(), except for PosType.

func (EndpointIndex) Begin ¶

func (ei EndpointIndex) Begin() EndpointIndex

Begin returns:

the index for the beginning of the current interval, if we're inside an interval
otherwise, the index for the beginning of the next interval

func (EndpointIndex) Contained ¶

func (ei EndpointIndex) Contained() bool

Contained returns whether we're inside an interval.

func (EndpointIndex) Finished ¶

func (ei EndpointIndex) Finished(endpoints []PosType) bool

Finished returns whether we're past all the intervals.

func (*EndpointIndex) Update ¶

func (ei *EndpointIndex) Update(newPos PosType, endpoints []PosType)

Update updates the EndpointIndex to refer to newPos, which cannot be smaller than the previous position referred to by this EndpointIndex. It is substantially faster than NewEndpointIndex when the position is increasing slowly.

type Entry ¶

type Entry struct {
	RefName string
	Start0  PosType
	End     PosType
}

Entry represents a single interval, with 0-based coordinates.

func ParseRegionString ¶

func ParseRegionString(region string) (result Entry, err error)

ParseRegionString parses a region string of one of the forms

[contig ID]:[1-based first pos]-[last pos]
[contig ID]:[1-based pos]
[contig ID]

returning a contig ID and 0-based interval boundaries. The interval [0, PosTypeMax - 1] is returned if there is no positional restriction.

type NewBEDOpts ¶

type NewBEDOpts struct {
	// SAMHeader enables ID-based lookup.  (This is more convenient than
	// string-based lookup when using gbam.Shard.)
	SAMHeader *sam.Header
	// Invert causes the complement of the interval-union to be returned.  The
	// complement extends down to position -1 at the beginning of each
	// reference, and currently 2^31 - 2 inclusive at the end.  If SAMHeader is
	// provided, any reference mentioned in the SAMHeader but entirely absent
	// from the BED will be fully included.  Otherwise, only the references
	// mentioned in the BED file are included.  (A single empty interval
	// qualifies as a "mention" for the latter purpose.)
	Invert bool
	// OneBasedInput interprets the BED interval boundaries as one-based [start,
	// end] instead of the usual zero-based [start, end).
	OneBasedInput bool
}

NewBEDOpts defines behavior of this package's BED-loading function(s).

type PosType ¶

type PosType int32

PosType is the type used to represent interval coordinates. int32 should be wide enough for some time to come, since that's what BAM is limited to.

(This, and PosTypeMax, should move to a more central package once an appropriate one exists. And then, when generics finally become part of the language *crosses fingers*, we can allow some applications to redefine this as uint32 or a 64-bit type as appropriate.)

type UnionScanner ¶

type UnionScanner struct {
	// contains filtered or unexported fields
}

UnionScanner supports iteration over an interval-union. Invariants:

endpointIdx == SearchPosTypes(endpoints, pos+1)
Pos is either contained in an interval, or is PosTypeMax

func NewUnionScanner ¶

func NewUnionScanner(endpoints []PosType) UnionScanner

NewUnionScanner returns a UnionScanner initialized to the first interval.

func (*UnionScanner) Pos ¶

func (us *UnionScanner) Pos() PosType

Pos returns the next position to be iterated over, or PosTypeMax if there aren't any.

func (*UnionScanner) Scan ¶

func (us *UnionScanner) Scan(start *PosType, end *PosType, limit PosType) bool

Scan is written so that the following loop can be used to iterate over all within-interval positions up to (and not including) limit:

for us.Scan(&start, &end, limit) {
  for pos := start; pos < end; pos++ {
    // ...do stuff with pos...
  }
}

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL