Documentation ¶
Overview ¶
Package fasta provides functions for reading, writing, and manipulating fasta files.
Index ¶
- Variables
- func AllAreEqual(alpha []Fasta, beta []Fasta) bool
- func AllAreEqualIgnoreOrder(alpha []Fasta, beta []Fasta) bool
- func AllToUpper(records []Fasta)
- func AlnPosToRefPos(record Fasta, AlnPos int) int
- func AlnPosToRefPosCounter(record Fasta, AlnPos int, refStart int, alnStart int) int
- func AlnPosToRefPosCounterSeq(record []dna.Base, AlnPos int, refStart int, alnStart int) int
- func AssemblyStats(infile string, countLowerAsGaps bool) (int, int, int, int, int)
- func BinFasta(genome []Fasta, binNum int) map[int][]Fasta
- func BinGenomeNoBreaks(genome []Fasta, binNum int, minSize int) map[int][]Fasta
- func CalculateN50(contigList []int, halfGenome int) int
- func GoReadToChan(filename string) <-chan Fasta
- func IsEqual(alpha Fasta, beta Fasta) bool
- func IsFasta(filename string) bool
- func MakeContigList(records []Fasta, countLowerAsGaps bool) []int
- func NumSegregatingSites(aln []Fasta) int
- func PairwiseMutationDistanceInRange(seq1 Fasta, seq2 Fasta, alnStart int, alnEnd int) int
- func PairwiseMutationDistanceReferenceWindow(seq1 Fasta, seq2 Fasta, alnStart int, windowSize int) (int, bool, int)
- func ReadToChan(file *fileio.EasyReader, data chan<- Fasta, wg *sync.WaitGroup)
- func ReadToString(filename string) map[string]string
- func RefPosToAlnPos(record Fasta, RefPos int) int
- func RefPosToAlnPosCounter(record Fasta, RefPos int, refStart int, alnStart int) int
- func ReverseComplement(record Fasta)
- func ReverseComplementAll(records []Fasta)
- func ScanN(aln []Fasta, queryName string) [][]int
- func SeekByIndex(sr *Seeker, chr, start, end int) ([]dna.Base, error)
- func SeekByName(sr *Seeker, chr string, start, end int) ([]dna.Base, error)
- func SortByName(seqs []Fasta)
- func SortBySeq(seqs []Fasta)
- func ToChromInfo(records []Fasta) []chromInfo.ChromInfo
- func ToMap(ref []Fasta) map[string][]dna.Base
- func ToUpper(fa Fasta)
- func Write(filename string, records []Fasta)
- func WriteAssemblyStats(assemblyName string, outfile string, N50 int, halfGenome int, genomeLength int, ...)
- func WriteFasta(file io.Writer, rec Fasta, lineLength int)
- func WriteToFileHandle(file io.Writer, records []Fasta, lineLength int)
- type Fasta
- func Copy(f Fasta) Fasta
- func CopyAll(f []Fasta) []Fasta
- func CopySubset(records []Fasta, start int, end int) []Fasta
- func CreateAllGaps(name string, numGaps int) Fasta
- func CreateAllNs(name string, numN int) Fasta
- func DistColumn(records []Fasta) []Fasta
- func Extract(f Fasta, start int, end int, name string) Fasta
- func ExtractMulti(records []Fasta, start int, end int) []Fasta
- func NextFasta(file *fileio.EasyReader) (Fasta, bool)
- func NextFastaForced(file *fileio.EasyReader) (Fasta, bool)
- func Read(filename string) []Fasta
- func ReadForced(filename string) []Fasta
- func Remove(slice []Fasta, i int) []Fasta
- func RemoveGaps(records []Fasta) []Fasta
- func RemoveMissingMult(records []Fasta) []Fasta
- func SegregatingSites(aln []Fasta) []Fasta
- func TrimName(fa Fasta) Fasta
- type FastaMap
- type Index
- type Seeker
Constants ¶
This section is empty.
Variables ¶
var (
	ErrSeekStartOutsideChr = errors.New("requested start position greater than requested chromosome length, nil output")
	ErrSeekEndOutsideChr   = errors.New("requested bases past end of chr, output truncated")
)
Functions ¶
func AllAreEqual ¶
AllAreEqual returns true if each Fasta record in alpha passes IsEqual with the record at the same index in beta. The comparison is sensitive to the order of records in each slice.
func AllAreEqualIgnoreOrder ¶
AllAreEqualIgnoreOrder returns true if every Fasta record in alpha passes IsEqual with a record in beta. The order of records within each slice does not matter.
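As a rough illustration of the two comparison modes, here is a standalone sketch that does not import gonomics. Record, isEqual, allAreEqual, and allAreEqualIgnoreOrder are hypothetical stand-ins (bases modeled as bytes), and the sketch assumes IsEqual compares both name and sequence and that record names are unique, as Read requires.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// Record is a hypothetical stand-in for fasta.Fasta; bases are bytes here.
type Record struct {
	Name string
	Seq  []byte
}

// isEqual reports whether two records share a name and an identical sequence.
func isEqual(a, b Record) bool {
	return a.Name == b.Name && bytes.Equal(a.Seq, b.Seq)
}

// allAreEqual compares records pairwise by index, so order matters.
func allAreEqual(alpha, beta []Record) bool {
	if len(alpha) != len(beta) {
		return false
	}
	for i := range alpha {
		if !isEqual(alpha[i], beta[i]) {
			return false
		}
	}
	return true
}

// allAreEqualIgnoreOrder sorts copies of both slices by name and then
// compares them pairwise, leaving the caller's slices untouched.
func allAreEqualIgnoreOrder(alpha, beta []Record) bool {
	a := append([]Record(nil), alpha...)
	b := append([]Record(nil), beta...)
	sort.Slice(a, func(i, j int) bool { return a[i].Name < a[j].Name })
	sort.Slice(b, func(i, j int) bool { return b[i].Name < b[j].Name })
	return allAreEqual(a, b)
}

func main() {
	x := []Record{{"chr1", []byte("ACGT")}, {"chr2", []byte("GG")}}
	y := []Record{{"chr2", []byte("GG")}, {"chr1", []byte("ACGT")}}
	fmt.Println(allAreEqual(x, y))            // false: same records, different order
	fmt.Println(allAreEqualIgnoreOrder(x, y)) // true
}
```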
func AllToUpper ¶
func AllToUpper(records []Fasta)
AllToUpper converts all bases to uppercase in all sequences in a slice of fasta records.
func AlnPosToRefPos ¶
AlnPosToRefPos returns the 0-based reference position associated with alignment position AlnPos in an input Fasta. If AlnPos corresponds to a gap, it returns the preceding reference position. Consider using AlnPosToRefPosCounter instead when tracking refStart and alnStart is beneficial, e.g. when working through entire chromosomes.
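A minimal standalone sketch of the documented mapping, assuming '-' is the gap character and bases are bytes. alnPosToRefPos is a hypothetical helper, and the package's handling of leading gaps may differ from the -1 returned here.

```go
package main

import "fmt"

// gap is the alignment gap character assumed by this sketch.
const gap = '-'

// alnPosToRefPos returns the 0-based reference position of alignment column
// alnPos by counting non-gap bases up to and including that column. A gap
// column maps to the preceding reference base (-1 if no base precedes it).
func alnPosToRefPos(aln []byte, alnPos int) int {
	ref := -1
	for i := 0; i <= alnPos; i++ {
		if aln[i] != gap {
			ref++
		}
	}
	return ref
}

func main() {
	aln := []byte("AC--GT")
	for col := range aln {
		fmt.Println(col, alnPosToRefPos(aln, col)) // 0 0, 1 1, 2 1, 3 1, 4 2, 5 3
	}
}
```

Each call rescans from column 0, so the cost grows with alnPos; carrying a known refStart/alnStart pair forward, as AlnPosToRefPosCounter does, avoids repeating that scan for every query along a chromosome.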
func AlnPosToRefPosCounter ¶
AlnPosToRefPosCounter is like AlnPosToRefPos, but can begin midway through a chromosome at a refPosition/alnPosition pair, defined with the input variables refStart and alnStart.
func AlnPosToRefPosCounterSeq ¶ added in v1.0.1
AlnPosToRefPosCounterSeq behaves like AlnPosToRefPosCounter, but its input record is only the sequence ([]dna.Base) of a Fasta struct rather than the struct itself.
func AssemblyStats ¶
AssemblyStats takes the path to a fasta file and a flag for whether lower case letters should count as assembly gaps. Five ints are returned, which encode: the N50 size, half the size of the genome, size of the genome, size of the largest contig, and the number of contigs.
func BinFasta ¶
BinFasta takes a slice of fastas and breaks it into binNum bins, each holding a roughly equal amount of sequence.
func BinGenomeNoBreaks ¶
BinGenomeNoBreaks takes an entire genome, sorted from largest to smallest contig, and bins it so that smaller contigs are combined into a single fasta record while large contigs become records of their own. The user specifies the number of bins, binNum. The genome must have more contigs than bins for any contigs to be combined; if the number of contigs equals binNum, each contig gets its own record. Each empty bin is first filled with the next contig encountered; starting with contig binNum+1, each contig is added to the bin that currently holds the least sequence. The minSize option lets the user specify a minimum length of sequence for each bin; in that case, the number of bins returned depends on minSize and binNum is ignored.
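The greedy assignment described above can be sketched as follows. This is a hypothetical standalone version: it works on name/length pairs rather than Fasta records, returns groups of contigs instead of combined records, and does not model minSize.

```go
package main

import "fmt"

// Contig is a hypothetical name/length pair; the real function bins Fasta records.
type Contig struct {
	Name string
	Len  int
}

// binNoBreaks assigns contigs (sorted largest first) to binNum bins without
// splitting any contig: the first binNum contigs each seed an empty bin, and
// every later contig joins the bin currently holding the least sequence.
func binNoBreaks(contigs []Contig, binNum int) map[int][]Contig {
	bins := make(map[int][]Contig)
	if binNum <= 0 {
		return bins
	}
	totals := make([]int, binNum)
	for i, c := range contigs {
		target := i
		if i >= binNum {
			// every bin is occupied: pick the one with the smallest total
			target = 0
			for b := 1; b < binNum; b++ {
				if totals[b] < totals[target] {
					target = b
				}
			}
		}
		bins[target] = append(bins[target], c)
		totals[target] += c.Len
	}
	return bins
}

func main() {
	contigs := []Contig{{"A", 100}, {"B", 60}, {"C", 30}, {"D", 20}, {"E", 10}}
	bins := binNoBreaks(contigs, 3)
	for b := 0; b < 3; b++ {
		fmt.Println(b, bins[b]) // 0 [{A 100}], 1 [{B 60}], 2 [{C 30} {D 20} {E 10}]
	}
}
```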
func CalculateN50 ¶
CalculateN50 takes a slice of contig lengths and the size of half the genome. It returns the N50 size.
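A standalone sketch of the N50 calculation, assuming the usual definition: sort contigs from largest to smallest and report the length at which the running total first reaches half the genome. calculateN50 is hypothetical; it sorts its own copy of the lengths, whereas the package function's expectations about input order are not stated above.

```go
package main

import (
	"fmt"
	"sort"
)

// calculateN50 returns the length of the contig at which the running sum of
// contig lengths, taken from largest to smallest, first reaches halfGenome.
// It returns -1 if the lengths never reach halfGenome.
func calculateN50(contigLengths []int, halfGenome int) int {
	sorted := append([]int(nil), contigLengths...) // leave the caller's slice untouched
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	sum := 0
	for _, l := range sorted {
		sum += l
		if sum >= halfGenome {
			return l
		}
	}
	return -1
}

func main() {
	lengths := []int{2, 3, 4, 5, 6}
	total := 0
	for _, l := range lengths {
		total += l
	}
	fmt.Println(calculateN50(lengths, total/2)) // 6 + 5 = 11 >= 10, so N50 is 5
}
```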
func GoReadToChan ¶
GoReadToChan reads fasta records from an input filename and returns a channel of Fasta structs.
func IsFasta ¶
IsFasta returns true if the input filename has a fasta file extension. Input filename may have a .gz suffix.
func MakeContigList ¶
MakeContigList takes a slice of fasta sequences and a flag for whether lower case letters should count as gaps. A slice of contig sizes is the return value.
func NumSegregatingSites ¶
NumSegregatingSites returns the number of sites in an alignment block that are segregating.
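A standalone sketch, assuming a column is segregating when any sequence differs from the first sequence at that column and that all sequences have equal length. Gaps are treated as ordinary characters here; whether the package treats gaps, N, or lowercase bases specially is not stated above. numSegregatingSites is hypothetical.

```go
package main

import "fmt"

// numSegregatingSites counts alignment columns in which at least one
// sequence differs from the first sequence. All sequences are assumed to
// have the same length; gaps count as ordinary characters here.
func numSegregatingSites(aln [][]byte) int {
	if len(aln) == 0 {
		return 0
	}
	count := 0
	for col := range aln[0] {
		for _, seq := range aln[1:] {
			if seq[col] != aln[0][col] {
				count++
				break // one difference is enough for this column
			}
		}
	}
	return count
}

func main() {
	aln := [][]byte{
		[]byte("ACGTA"),
		[]byte("ACTTA"),
		[]byte("AC-TG"),
	}
	fmt.Println(numSegregatingSites(aln)) // columns 2 and 4 differ: 2
}
```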
func PairwiseMutationDistanceInRange ¶
PairwiseMutationDistanceInRange calculates the number of mutations between two Fasta sequences between the specified start and end alignment columns. Segregating sites are counted as 1, as are INDELs regardless of length.
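One way to implement this counting rule, as a hypothetical sketch: it assumes '-' gaps, uppercase bases, and a half-open range [alnStart, alnEnd), none of which is stated above. A column in which both sequences have a gap is skipped without closing an open indel. pairwiseDistance is not the package's function.

```go
package main

import "fmt"

// gap is the alignment gap character assumed by this sketch.
const gap = '-'

// pairwiseDistance counts mutations between two aligned sequences over the
// half-open column range [alnStart, alnEnd). A mismatch between two bases
// counts 1, and each indel counts 1 no matter how many columns it spans.
func pairwiseDistance(s1, s2 []byte, alnStart, alnEnd int) int {
	dist := 0
	inGap1, inGap2 := false, false // is an indel currently open in s1 / s2?
	for i := alnStart; i < alnEnd; i++ {
		a, b := s1[i], s2[i]
		switch {
		case a == gap && b == gap:
			// column absent from both sequences: no event, open indels stay open
		case a == gap:
			if !inGap1 {
				dist++ // a new indel starts in s1
			}
			inGap1, inGap2 = true, false
		case b == gap:
			if !inGap2 {
				dist++ // a new indel starts in s2
			}
			inGap1, inGap2 = false, true
		default:
			if a != b {
				dist++ // substitution
			}
			inGap1, inGap2 = false, false
		}
	}
	return dist
}

func main() {
	s1 := []byte("ACGT--AT")
	s2 := []byte("ATGTCCA-")
	fmt.Println(pairwiseDistance(s1, s2, 0, len(s1))) // 1 substitution + 2 indels = 3
}
```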
func PairwiseMutationDistanceReferenceWindow ¶
func PairwiseMutationDistanceReferenceWindow(seq1 Fasta, seq2 Fasta, alnStart int, windowSize int) (int, bool, int)
PairwiseMutationDistanceReferenceWindow takes two input fasta sequences and calculates the number of mutations in a reference window of a given size. Segregating sites are counted as 1, as are INDELs regardless of length. alnStart is the alignment column where evaluation begins, and windowSize is the number of reference bases to compare. There are three return values: the pairwise mutation distance; reachedEnd, a bool that is true for incomplete windows; and alignmentEnd, the last alignment column evaluated.
func ReadToChan ¶
func ReadToChan(file *fileio.EasyReader, data chan<- Fasta, wg *sync.WaitGroup)
ReadToChan is a helper function of GoReadToChan.
func ReadToString ¶
ReadToString reads a fasta file to a map of sequence strings keyed by the record name.
func RefPosToAlnPos ¶
RefPosToAlnPos returns the 0-based alignment position associated with a given reference position in an input aligned Fasta record.
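The inverse mapping can be sketched the same way, assuming '-' gaps and bases stored as bytes. refPosToAlnPos is hypothetical, and returning -1 for a reference position past the end of the sequence is a choice made for this sketch only.

```go
package main

import "fmt"

// gap is the alignment gap character assumed by this sketch.
const gap = '-'

// refPosToAlnPos returns the alignment column holding the refPos-th (0-based)
// non-gap base, or -1 if the sequence has fewer than refPos+1 bases.
func refPosToAlnPos(aln []byte, refPos int) int {
	ref := -1
	for col, b := range aln {
		if b != gap {
			ref++
			if ref == refPos {
				return col
			}
		}
	}
	return -1
}

func main() {
	aln := []byte("AC--GT")
	for refPos := 0; refPos < 4; refPos++ {
		fmt.Println(refPos, refPosToAlnPos(aln, refPos)) // 0 0, 1 1, 2 4, 3 5
	}
}
```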
func RefPosToAlnPosCounter ¶
RefPosToAlnPosCounter is like RefPosToAlnPos, but can begin midway through a chromosome at a refPosition/alnPosition pair, defined by the input variables refStart and alnStart.
func ReverseComplement ¶
func ReverseComplement(record Fasta)
ReverseComplement reverse-complements the sequence of a fasta record in place.
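A minimal sketch of in-place reverse complementation using a hypothetical Record type. It complements only A, C, G, and T (in either case) and leaves every other symbol unchanged; how the package treats other symbols is not described above. The in-place behavior works because a Go slice header shares its backing array, so mutating rec.Seq is visible to the caller even though rec is passed by value.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for fasta.Fasta with bases as bytes.
type Record struct {
	Name string
	Seq  []byte
}

// pairs maps each base to its Watson-Crick complement, preserving case.
var pairs = map[byte]byte{
	'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C',
	'a': 't', 't': 'a', 'c': 'g', 'g': 'c',
}

func complement(b byte) byte {
	if c, ok := pairs[b]; ok {
		return c
	}
	return b // N, gaps, and other symbols are left unchanged
}

// reverseComplement swaps bases from both ends toward the middle,
// complementing each one, so rec.Seq is modified in place.
func reverseComplement(rec Record) {
	s := rec.Seq
	for i, j := 0, len(s)-1; i <= j; i, j = i+1, j-1 {
		s[i], s[j] = complement(s[j]), complement(s[i])
	}
}

func main() {
	rec := Record{Name: "chr1", Seq: []byte("AACGTN")}
	reverseComplement(rec)
	fmt.Println(string(rec.Seq)) // NACGTT
}
```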
func ReverseComplementAll ¶
func ReverseComplementAll(records []Fasta)
ReverseComplementAll reverse-complements, in place, every sequence in a slice of fasta records.
func ScanN ¶ added in v1.0.1
ScanN takes a multiFa alignment, scans the sequence named queryName for a pattern (currently only N), and returns the matching positions in reference sequence coordinates.
func SeekByIndex ¶
SeekByIndex returns a portion of a fasta sequence identified by chromosome index (order in file). Input start and end should be 0-based start-closed end-open.
func SeekByName ¶
SeekByName returns a portion of a fasta sequence identified by chromosome name. Input start and end should be 0-based start-closed end-open.
func ToChromInfo ¶
ToChromInfo converts a []Fasta into a []ChromInfo. Useful for applications that do not require the entire fasta sequence to be kept in memory, but just the name, size, and order of fasta records.
func ToMap ¶
ToMap converts a slice of fasta records (e.g. the output of the Read function) to a map of sequences keyed by sequence name.
func WriteAssemblyStats ¶
func WriteAssemblyStats(assemblyName string, outfile string, N50 int, halfGenome int, genomeLength int, largestContig int, numContigs int)
WriteAssemblyStats takes the name of an assembly, a path to an output file, and stats for: the N50 size, half the size of the genome, size of the genome, size of the largest contig, and the number of contigs. The stats, with some human-readable labels, are written to the output file.
func WriteFasta ¶
WriteFasta writes a single fasta record to an io.Writer.
Types ¶
type Fasta ¶
Fasta stores the name and sequence of each '>' delimited record in a fasta file.
func CopySubset ¶
CopySubset returns a copy of a multiFa from a specified start and end position.
func CreateAllGaps ¶
CreateAllGaps creates a fasta record where the sequence is all gaps of length numGaps.
func CreateAllNs ¶
CreateAllNs creates a fasta record where the sequence is all Ns of length numN.
func DistColumn ¶
DistColumn returns the alignment columns that contain no gaps or lowercase letters.
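A standalone sketch of the column filter, using a hypothetical Record type and assuming '-' is the gap character and that "lowercase" means ASCII a-z. distColumn and keepColumn are illustrative only.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for fasta.Fasta with bases as bytes.
type Record struct {
	Name string
	Seq  []byte
}

// keepColumn reports whether no record has a gap or a lowercase letter at col.
func keepColumn(records []Record, col int) bool {
	for _, r := range records {
		b := r.Seq[col]
		if b == '-' || (b >= 'a' && b <= 'z') {
			return false
		}
	}
	return true
}

// distColumn returns new records containing only the columns kept by keepColumn.
func distColumn(records []Record) []Record {
	out := make([]Record, len(records))
	if len(records) == 0 {
		return out
	}
	for i, r := range records {
		out[i].Name = r.Name
	}
	for col := range records[0].Seq {
		if keepColumn(records, col) {
			for i, r := range records {
				out[i].Seq = append(out[i].Seq, r.Seq[col])
			}
		}
	}
	return out
}

func main() {
	aln := []Record{
		{"human", []byte("ACg-T")},
		{"chimp", []byte("A-GCT")},
	}
	for _, r := range distColumn(aln) {
		fmt.Println(r.Name, string(r.Seq)) // human AT, then chimp AT
	}
}
```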
func Extract ¶
Extract subsets the sequence of a fasta record and returns a new fasta record, named name, containing the subsequence from start to end. Input start and end are left-closed right-open.
func ExtractMulti ¶
ExtractMulti extracts a subsequence from a fasta file for every entry in a multiFa alignment.
func NextFasta ¶
func NextFasta(file *fileio.EasyReader) (Fasta, bool)
NextFasta reads a single fasta record from an input EasyReader. Returns true when the file is fully read.
func NextFastaForced ¶
func NextFastaForced(file *fileio.EasyReader) (Fasta, bool)
NextFastaForced functions identically to NextFasta, but any invalid characters in the sequence will be masked to N.
func Read ¶
Read reads a fasta file into a []Fasta. All sequence records must be preceded by a name line starting with '>', and each record must have a unique sequence name.
func ReadForced ¶
ReadForced functions identically to Read, but any invalid characters in the sequence will be masked to N.
func RemoveGaps ¶
RemoveGaps removes gaps from all fasta records in a slice.
func RemoveMissingMult ¶
RemoveMissingMult removes any entries consisting only of gaps from a multiple alignment block.
func SegregatingSites ¶
SegregatingSites takes in a multiFa alignment and returns a new alignment containing only the columns with segregating sites.
type FastaMap ¶
FastaMap stores fasta sequences as a map keyed by the sequence name instead of a slice. This allows for easy lookups of chromosomes named in other files (e.g. BED files). A FastaMap can be generated using the ToMap function (e.g. fasta.ToMap(fasta.Read("filename"))).
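A sketch of a BED-style lookup against a map keyed by sequence name. FastaMap here is a local stand-in holding byte sequences, and Region and lookup are hypothetical. BED intervals are 0-based and half-open, so they can slice a sequence directly; note that the returned slice shares memory with the map entry.

```go
package main

import "fmt"

// FastaMap is a local stand-in: sequences keyed by record name, bases as bytes.
type FastaMap map[string][]byte

// Region is a hypothetical BED-style interval: 0-based, half-open.
type Region struct {
	Chrom      string
	Start, End int
}

// lookup returns the bases covered by r, or false if the chromosome is
// missing from m or the interval does not fit inside it.
func lookup(m FastaMap, r Region) ([]byte, bool) {
	seq, ok := m[r.Chrom]
	if !ok || r.Start < 0 || r.End > len(seq) || r.Start > r.End {
		return nil, false
	}
	return seq[r.Start:r.End], true
}

func main() {
	genome := FastaMap{"chr1": []byte("ACGTACGT"), "chr2": []byte("GGCC")}
	for _, r := range []Region{{"chr1", 2, 6}, {"chrX", 0, 1}} {
		seq, ok := lookup(genome, r)
		fmt.Println(r.Chrom, string(seq), ok) // chr1 GTAC true; chrX has no sequence and false
	}
}
```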
type Index ¶
type Index struct {
// contains filtered or unexported fields
}
Index stores the byte offset of each fasta sequence, allowing for efficient random access.
func CreateIndex ¶
CreateIndex creates an index for a fasta file, enabling efficient random access.
type Seeker ¶
type Seeker struct {
// contains filtered or unexported fields
}
Seeker enables random access of fasta sequences using a pre-computed index.