Documentation ¶
Overview ¶
A set of utilities to parse FASTQ and FASTA files.
Index ¶
- func CreateFile(o string) (*bufio.Writer, *os.File, error)
- func CreateFileGzip(o string) (*gzip.Writer, *os.File, error)
- func ExtractFna(file, prefix, taxid string, heads map[string]bool) error
- func GetFileType(inf *os.File) (string, error)
- func MatchFastaId(header, id, substr string) bool
- func MatchFastqId(header, id, substr string) bool
- func OpenFastq(file string) (*bufio.Reader, *os.File, error)
- type Association
- type Converter
- type Fasta
- type Fastq
- type Fna
- func (f *Fna) AddFasta(seq *Fasta)
- func (f *Fna) AddHeaderPrefix(h, d string) (Converter, error)
- func (f *Fna) AddHeaderSuffix(h, d string) (Converter, error)
- func (f *Fna) FilterLength(min, max int)
- func (f *Fna) FilterSequences(ids []string, idx SeqIndex, substr string, exact, exclude, warn bool) error
- func (f *Fna) Get(r string, fIdx SeqIndex) (*Fasta, error)
- func (f *Fna) HeaderFromConverter(c MapConverter) error
- func (f *Fna) Index() (SeqIndex, error)
- func (f *Fna) ReplaceHeader(h string) (Converter, error)
- func (f *Fna) Swap(i, j int, sIdx SeqIndex)
- func (f Fna) Write(o string) error
- type Fsq
- func (f *Fsq) AddFastq(seq *Fastq)
- func (f *Fsq) FilterLength(min, max int)
- func (f *Fsq) FilterSequences(ids []string, idx SeqIndex, substr string, exact, exclude, warn bool) error
- func (f *Fsq) Get(r string, fIdx SeqIndex) (*Fastq, error)
- func (f *Fsq) GetAvgQuality(phred bool) float64
- func (f *Fsq) Index() (SeqIndex, error)
- func (f *Fsq) SearchIndex(i string, b, e, m int) error
- func (f *Fsq) Swap(i, j int, sIdx SeqIndex)
- func (f Fsq) Write(o string) error
- func (f Fsq) WriteAppend(w *bufio.Writer) error
- func (f Fsq) WriteAppendGzip(w *gzip.Writer) error
- func (f Fsq) WriteGzip(o string) error
- type MapConverter
- type SeqIndex
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CreateFileGzip ¶ added in v0.17.0
func ExtractFna ¶
Could be merged or removed, because the same result can be achieved by combining simpler functions and methods: Load, Index, NewFasta, AddSequence, Write.
func GetFileType ¶
Reads the start of the file to determine whether it is gzip compressed. Returns an error if there is a problem with the open file.
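The check described above can be sketched as follows. This is a minimal illustration, not the package's implementation: it assumes detection relies on the two-byte gzip magic number (0x1f 0x8b), and the names `isGzip` and `sniff` as well as the returned labels "gzip" and "plain" are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// isGzip reports whether head starts with the gzip magic number 0x1f 0x8b.
func isGzip(head []byte) bool {
	return len(head) >= 2 && head[0] == 0x1f && head[1] == 0x8b
}

// sniff peeks at the first two bytes of r without consuming them, so the
// returned reader can still be parsed from the beginning.
// The labels "gzip" and "plain" are placeholders chosen for this sketch.
func sniff(r io.Reader) (string, *bufio.Reader, error) {
	br := bufio.NewReader(r)
	head, err := br.Peek(2)
	if err != nil && err != io.EOF {
		return "", br, err
	}
	if isGzip(head) {
		return "gzip", br, nil
	}
	return "plain", br, nil
}

func main() {
	kind, _, err := sniff(strings.NewReader("@read1\nACGT\n+\nIIII\n"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(kind)
}
```

Peeking rather than reading keeps the stream position unchanged, so the same reader can be handed to a parser or wrapped in a gzip reader afterwards.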
func MatchFastaId ¶ added in v0.14.0
Utility to determine if an id+substr is found in a sequence header
func MatchFastqId ¶ added in v0.14.0
Utility to determine if an id+substr is found in a sequence header
Types ¶
type Association ¶ added in v0.9.0
Type used to keep track of old and new header names. For use with ReplaceHeader, AddHeaderPrefix and AddHeaderSuffix.
type Converter ¶ added in v0.9.0
type Converter []*Association
func NewConverter ¶ added in v0.9.0
Convenience function to create a new Converter with enough space allocated for `nb` Associations.
func (Converter) AddAssociation ¶ added in v0.9.0
Adds an `Association` to the converter, replacing any existing one at position `idx`. Returns an error if `idx >= len(c)`.
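The bounds rule above can be illustrated with a small sketch. The lowercase types, the `Old` and `New` field names, and the method signature are assumptions made for the example; it also assumes the constructor allocates a slice of length `nb`, which is what makes the `idx >= len(c)` check meaningful.

```go
package main

import "fmt"

// association pairs an old header with its replacement.
// The field names are hypothetical.
type association struct {
	Old, New string
}

type converter []*association

// newConverter allocates a converter with room for nb associations.
func newConverter(nb int) converter {
	return make(converter, nb)
}

// addAssociation stores a at position idx, replacing whatever was there.
// It returns an error when idx falls outside the allocated slots.
func (c converter) addAssociation(idx int, a *association) error {
	if idx < 0 || idx >= len(c) {
		return fmt.Errorf("index %d out of range for converter of length %d", idx, len(c))
	}
	c[idx] = a
	return nil
}

func main() {
	c := newConverter(2)
	if err := c.addAssociation(0, &association{Old: "contig_1", New: "sampleA_1"}); err != nil {
		fmt.Println("error:", err)
	}
	// Position 5 does not exist in a converter of length 2.
	if err := c.addAssociation(5, &association{Old: "x", New: "y"}); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(c[0].Old, "->", c[0].New)
}
```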
type Fasta ¶
type Fastq ¶
func NewFastq ¶ added in v0.10.0
Creates a Fastq struct from three strings. Takes three arguments: 1. h: the header to use (the starting '@' is added automatically); 2. s: the sequence itself; and 3. q: the quality of the sequence.
func (*Fastq) GetQuality ¶ added in v0.10.0
type Fna ¶
type Fna []*Fasta
func (*Fna) AddHeaderPrefix ¶ added in v0.9.0
Adds prefix `h` to the sequences' headers, separating the two with delimiter `d`. Returns the associated Converter and an error.
func (*Fna) AddHeaderSuffix ¶ added in v0.9.0
Adds suffix `h` to the sequences' headers, separating the two with delimiter `d`. Returns the associated Converter and an error.
func (*Fna) FilterLength ¶
Simple utility to filter fasta sequences by length. Iterates over a slice of sequences (Fna) and filters out the sequences whose lengths are lower than the specified minimum or greater than the specified maximum. If maximum <= 0, max is set to the length of the sequence, so no upper limit applies. The method has no return value; the receiver holds the filtered Fna.
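A minimal sketch of this length rule, assuming inclusive bounds as the description suggests. The `record` type and the `filterLength` function are hypothetical stand-ins, and the sketch returns a new slice rather than modifying a receiver.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a sequence entry.
type record struct {
	Header, Seq string
}

// filterLength keeps records whose sequence length lies in [minLen, maxLen].
// A maxLen <= 0 disables the upper bound.
func filterLength(recs []*record, minLen, maxLen int) []*record {
	kept := make([]*record, 0, len(recs))
	for _, r := range recs {
		n := len(r.Seq)
		if n < minLen || (maxLen > 0 && n > maxLen) {
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	recs := []*record{
		{Header: ">a", Seq: "ACG"},
		{Header: ">b", Seq: "ACGTACGT"},
		{Header: ">c", Seq: "ACGTACGTACGTACGT"},
	}
	for _, r := range filterLength(recs, 4, 10) {
		fmt.Println(r.Header, len(r.Seq))
	}
}
```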
func (*Fna) FilterSequences ¶ added in v0.11.3
func (f *Fna) FilterSequences(ids []string, idx SeqIndex, substr string, exact, exclude, warn bool) error
Utility to filter fasta sequences given a list of ids. Iterates over a slice of sequences (Fna) and, by default, keeps only the sequences whose headers have been provided.
There are two modes: exact match or substring match. If exact is true, the index is used to search for the headers. This requires the provided headers to be an exact match to those found in the fasta file. If exact is false, the method iterates over the index keys and looks for a partial match in the headers. This may return multiple results per provided id.
When the headers aren't exact, it is possible to provide a pattern to limit the number of results reported by the filter. For instance, if the header is composed of two space-separated strings, you could provide `\s+` to the regexp parser using the `substr` argument.
It is also possible to reverse the result, meaning the provided sequences are removed instead of being the only ones kept. Set exclude to true if that is the desired output.
By default, an error is returned if any of the provided headers is not found. Setting warn to true changes this behavior: the user is told that an id was not found instead.
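The interaction of the exact and exclude flags can be sketched on plain header strings. This is a simplified illustration under stated assumptions: `selectHeaders` is a hypothetical helper, partial matching is done with `strings.Contains`, and the `substr` pattern and `warn` flag are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// selectHeaders returns the headers matched by ids, or, when exclude is
// true, every header that was not matched. With exact set, ids are looked
// up in an index; otherwise every header is scanned for a partial match.
func selectHeaders(headers, ids []string, exact, exclude bool) ([]string, error) {
	index := make(map[string]int, len(headers))
	for i, h := range headers {
		index[h] = i
	}
	matched := make(map[int]bool)
	for _, id := range ids {
		found := false
		if exact {
			if i, ok := index[id]; ok {
				matched[i] = true
				found = true
			}
		} else {
			// A partial match may hit several headers for a single id.
			for i, h := range headers {
				if strings.Contains(h, id) {
					matched[i] = true
					found = true
				}
			}
		}
		if !found {
			return nil, fmt.Errorf("id %q not found", id)
		}
	}
	var out []string
	for i, h := range headers {
		// Keep matched headers normally, unmatched ones when excluding.
		if matched[i] != exclude {
			out = append(out, h)
		}
	}
	return out, nil
}

func main() {
	headers := []string{"seq1 sampleA", "seq2 sampleA", "seq3 sampleB"}
	kept, err := selectHeaders(headers, []string{"sampleA"}, false, false)
	fmt.Println(kept, err)
	rest, err := selectHeaders(headers, []string{"sampleA"}, false, true)
	fmt.Println(rest, err)
}
```

Building the index once makes exact lookups cheap, while the substring mode has to scan every header for every id.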
func (*Fna) Get ¶ added in v0.9.15
Method to get a sequence from a Fna using a SeqIndex. Requires that the fasta be indexed first. Returns the desired sequence and an error. The error should be NotFound, if any.
func (*Fna) HeaderFromConverter ¶ added in v0.15.0
func (f *Fna) HeaderFromConverter(c MapConverter) error
TODO
func (*Fna) Index ¶ added in v0.7.0
Indexes the loaded fasta file using the fasta headers. Assumes that all headers are unique and returns an error if they are not.
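The uniqueness requirement can be sketched as a map from header to position that rejects duplicates. The layout of SeqIndex is not shown on this page, so the `map[string]int` used here is an assumption, and `buildIndex` is a hypothetical helper.

```go
package main

import "fmt"

// buildIndex maps each header to its position in the slice and fails on
// the first header that appears twice.
func buildIndex(headers []string) (map[string]int, error) {
	idx := make(map[string]int, len(headers))
	for i, h := range headers {
		if prev, dup := idx[h]; dup {
			return nil, fmt.Errorf("duplicate header %q at positions %d and %d", h, prev, i)
		}
		idx[h] = i
	}
	return idx, nil
}

func main() {
	idx, err := buildIndex([]string{">seq1", ">seq2", ">seq3"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(idx[">seq2"])

	if _, err := buildIndex([]string{">seq1", ">seq1"}); err != nil {
		fmt.Println("error:", err)
	}
}
```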
func (*Fna) ReplaceHeader ¶ added in v0.9.0
Replaces all existing headers with `h` followed by a sequential number. Returns the associated Converter and an error.
type Fsq ¶
type Fsq []*Fastq
func LoadFastq ¶ added in v0.10.0
Function to read a fastq file, compressed or not, and load it into a type Fsq ([]*Fastq). Returns an error if the file can't be opened or if there is an error while reading.
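As an illustration of what loading involves, the sketch below parses the common four-line FASTQ layout (header, sequence, '+' separator, quality) from any reader. It is not the package's loader: `fastqRecord`, `parseFastq` and `parseFastqString` are hypothetical names, multi-line records are not handled, and compressed input is assumed to be wrapped in a reader from `compress/gzip` before parsing.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// fastqRecord is a hypothetical stand-in for one FASTQ entry.
type fastqRecord struct {
	Header, Seq, Qual string
}

// parseFastq reads four-line records until the input is exhausted.
// bufio.Scanner limits line length by default, so very long reads would
// need a larger buffer set with Scanner.Buffer.
func parseFastq(r io.Reader) ([]*fastqRecord, error) {
	sc := bufio.NewScanner(r)
	var recs []*fastqRecord
	for {
		var lines [4]string
		n := 0
		for n < 4 && sc.Scan() {
			lines[n] = sc.Text()
			n++
		}
		if err := sc.Err(); err != nil {
			return nil, err
		}
		if n == 0 {
			return recs, nil // clean end of input
		}
		if n < 4 {
			return nil, fmt.Errorf("truncated record after %d complete records", len(recs))
		}
		if !strings.HasPrefix(lines[0], "@") || !strings.HasPrefix(lines[2], "+") {
			return nil, fmt.Errorf("malformed record %d", len(recs)+1)
		}
		if len(lines[1]) != len(lines[3]) {
			return nil, fmt.Errorf("record %d: sequence and quality lengths differ", len(recs)+1)
		}
		recs = append(recs, &fastqRecord{Header: lines[0], Seq: lines[1], Qual: lines[3]})
	}
}

// parseFastqString is a small convenience wrapper for in-memory data.
func parseFastqString(s string) ([]*fastqRecord, error) {
	return parseFastq(strings.NewReader(s))
}

func main() {
	recs, err := parseFastqString("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\n!!\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range recs {
		fmt.Println(r.Header, r.Seq, r.Qual)
	}
}
```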
func LoadNFastq ¶ added in v0.17.0
Function to read a fastq file N sequences at a time, compressed or not, and load it into a type Fsq ([]*Fastq). Returns an error if the file can't be opened or if there is an error while reading.
func (*Fsq) FilterLength ¶ added in v0.10.0
Simple utility to filter fastq sequences by length. Iterates over a slice of sequences (Fsq) and filters out the sequences whose lengths are lower than the specified minimum or greater than the specified maximum. If maximum <= 0, max is set to the length of the sequence, so no upper limit applies. The method has no return value; the receiver holds the filtered Fsq.
func (*Fsq) FilterSequences ¶ added in v0.12.0
func (f *Fsq) FilterSequences(ids []string, idx SeqIndex, substr string, exact, exclude, warn bool) error
Utility to filter fastq sequences given a list of ids. Iterates over a slice of sequences (Fsq) and, by default, keeps only the sequences whose headers have been provided.
There are two modes: exact match or substring match. If exact is true, the index is used to search for the headers. This requires the provided headers to be an exact match to those found in the fastq file. If exact is false, the method iterates over the index keys and looks for a partial match in the headers. This may return multiple results per provided id.
When the headers aren't exact, it is possible to provide a pattern to limit the number of results reported by the filter. For instance, if the header is composed of two space-separated strings, you could provide `\s+` to the regexp parser using the `substr` argument.
It is also possible to reverse the result, meaning the provided sequences are removed instead of being the only ones kept. Set exclude to true if that is the desired output.
By default, an error is returned if any of the provided headers is not found. Setting warn to true changes this behavior: the user is told that an id was not found instead.
func (*Fsq) Get ¶ added in v0.10.0
Method to get a sequence from a Fsq using a SeqIndex. Requires that the fastq be indexed first. Returns the desired sequence and an error. The error should be NotFound, if any.
func (*Fsq) GetAvgQuality ¶ added in v0.10.0
func (*Fsq) Index ¶ added in v0.10.0
Indexes the loaded fastq file using the fastq headers. Assumes that all headers are unique and returns an error if they are not.
type MapConverter ¶ added in v0.15.0
New type that will replace Converter. TODO. Maps old header names to new ones (map[old]new).
func LoadConverter ¶ added in v0.15.0
func LoadConverter(f string, flip bool) (MapConverter, error)
func NewMapConverter ¶ added in v0.15.0
func NewMapConverter() MapConverter
func (MapConverter) AddAssociation ¶ added in v0.15.0
func (c MapConverter) AddAssociation(o, n string) error