Documentation ¶
Overview ¶
Package fasta contains code for parsing (optionally indexed) FASTA files. See http://www.htslib.org/doc/faidx.html. Briefly, FASTA files consist of a number of named sequences that may be interrupted by newlines. For example:
    >chr7
    ACGTAC
    GAGGAC
    GCG
    >chr8
    ACGT
Note: A sequence name is the run of non-space characters immediately after '>'. Any text that appears after the first space is ignored. For example, '>chr1 A viral sequence' becomes 'chr1'.
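The naming rule above can be sketched in a few lines. This is a minimal illustration, not the package's own code; the helper name seqName is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// seqName is a hypothetical helper (not part of package fasta) that applies
// the rule described above: drop the leading '>' and keep everything up to,
// but not including, the first space.
func seqName(header string) string {
	name := strings.TrimPrefix(header, ">")
	if i := strings.IndexByte(name, ' '); i >= 0 {
		name = name[:i]
	}
	return name
}

func main() {
	fmt.Println(seqName(">chr1 A viral sequence")) // prints "chr1"
	fmt.Println(seqName(">chr7"))                  // prints "chr7"
}
```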
Functions ¶
func FaiToReferenceLengths ¶
FaiToReferenceLengths reads a FASTA index (.fai) file and returns a map of reference name to reference length. It does not need to read the FASTA file itself.
func GenerateIndex ¶
GenerateIndex generates an index (*.fai) from FASTA. The index can be later passed to NewIndexed() to random-access the FASTA file quickly.
The index format is defined by "samtools faidx" (http://www.htslib.org/doc/faidx.html).
Types ¶
type Encoding ¶
type Encoding byte
const (
	// RawASCII encoding preserves the original bytes, including case.
	RawASCII Encoding = iota
	// CleanASCII encoding capitalizes all lowercase 'a'/'c'/'g'/'t', and
	// converts all non-ACGT characters to 'N'.
	CleanASCII
	// Seq8 encoding is 'A'/'a' = 1, 'C'/'c' = 2, 'G'/'g' = 4, 'T'/'t' = 8,
	// anything else = 15. This plays well with BAM/PAM files.
	Seq8
	// TODO(cchang): Add 'Base5' encoding, where 'A'/'a' = 0, 'C'/'c' = 1,
	// 'G'/'g' = 2, 'T'/'t' = 3, anything else = 4.
	EncodingLimit
)
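The byte mappings described in the comments for CleanASCII and Seq8 can be written out directly. The functions below (cleanASCII, seq8, encode) are hypothetical names used only to illustrate the documented mappings; they are not the package's internals.

```go
package main

import "fmt"

// cleanASCII follows the CleanASCII description: upper-case a/c/g/t and
// replace every other byte with 'N'.
func cleanASCII(b byte) byte {
	switch b {
	case 'A', 'a':
		return 'A'
	case 'C', 'c':
		return 'C'
	case 'G', 'g':
		return 'G'
	case 'T', 't':
		return 'T'
	default:
		return 'N'
	}
}

// seq8 follows the Seq8 description: A=1, C=2, G=4, T=8, anything else=15.
func seq8(b byte) byte {
	switch b {
	case 'A', 'a':
		return 1
	case 'C', 'c':
		return 2
	case 'G', 'g':
		return 4
	case 'T', 't':
		return 8
	default:
		return 15
	}
}

// encode applies a per-byte mapping to a whole sequence.
func encode(seq string, f func(byte) byte) []byte {
	out := make([]byte, len(seq))
	for i := 0; i < len(seq); i++ {
		out[i] = f(seq[i])
	}
	return out
}

func main() {
	fmt.Println(string(encode("acgtRyN-", cleanASCII))) // prints "ACGTNNNN"
	fmt.Println(encode("AcGtN", seq8))                  // prints "[1 2 4 8 15]"
}
```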
type Fasta ¶
type Fasta interface {
	// Get returns a substring of the given sequence name at the given
	// coordinates, which are treated as a 0-based half-open interval
	// [start, end). Get is thread-safe.
	Get(seqName string, start, end uint64) (string, error)
	// Len returns the length of the given sequence.
	Len(seqName string) (uint64, error)
	// SeqNames returns the names of all sequences, in the order of appearance in
	// the FASTA file.
	SeqNames() []string
}
Fasta represents FASTA-formatted data, consisting of a set of named sequences.
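The 0-based half-open convention used by Get is easiest to see on a toy in-memory type. The memFasta type below is hypothetical and covers only Get and Len. Its handling of edge cases (for example, allowing an empty range where start equals end) is an assumption of this sketch, not documented behavior of the package.

```go
package main

import "fmt"

// memFasta is a hypothetical in-memory store mapping sequence name to bases.
type memFasta map[string]string

// Get returns bases [start, end) of the named sequence, 0-based.
func (f memFasta) Get(seqName string, start, end uint64) (string, error) {
	seq, ok := f[seqName]
	if !ok {
		return "", fmt.Errorf("sequence not found: %s", seqName)
	}
	if start > end || end > uint64(len(seq)) {
		return "", fmt.Errorf("invalid range [%d, %d) for %s", start, end, seqName)
	}
	return seq[start:end], nil
}

// Len returns the number of bases in the named sequence.
func (f memFasta) Len(seqName string) (uint64, error) {
	seq, ok := f[seqName]
	if !ok {
		return 0, fmt.Errorf("sequence not found: %s", seqName)
	}
	return uint64(len(seq)), nil
}

func main() {
	f := memFasta{"chr7": "ACGTACGAGGACGCG", "chr8": "ACGT"}
	s, err := f.Get("chr7", 2, 6) // bases at indices 2, 3, 4, 5
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints "GTAC"
}
```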
func New ¶
New creates a new Fasta that holds all the FASTA data from the given reader in memory. If an index is available, pass OptIndex to make reading much faster.
func NewIndexed ¶
NewIndexed creates a new Fasta that can perform efficient random lookups using the provided index, without reading the data into memory.
Note: Callers that expect to read many or all of the FASTA file sequences should use New(..., OptIndex(...)) instead.
type Opt ¶
type Opt func(*opts)
Opt is an optional argument to New and NewIndexed.
func OptEncoding ¶
OptEncoding specifies the encoding of the in-memory FASTA sequences.
func OptIndex ¶
OptIndex makes New read the FASTA file using the provided index, like NewIndexed. Unlike NewIndexed, New with OptIndex is optimized for reading all sequences in the FASTA file rather than a small, random subset. Callers that plan to read many or all FASTA sequences should use this (though, as always, profile in your application).