Documentation
Overview
Package pileup contains pileup parsers and writers.
The pileup format is a text-based bioinformatics format to summarize aligned reads against a reference sequence. In comparison to simply getting a consensus sequence from sequencing data, pileup files can contain more context about the mutations in a sequencing run, which is especially useful when analyzing plasmid sequencing data from Nanopore sequencing runs.
Pileup files are tab-separated (TSV) files with 6 columns: Sequence Identifier, Position, Reference Base, Read Count, Read Results, and Quality. An example from Wikipedia (https://en.wikipedia.org/wiki/Pileup_format) is shown below:
```
seq1	272	T	24	,.$.....,,.,.,...,,,.,..^+.	<<<+;<<<<<<<<<<<=<;<;7<&
seq1	273	T	23	,.....,,.,.,...,,,.,..A	<<<;<<<<<<<<<3<=<<<;<<+
seq1	274	T	23	,.$....,,.,.,...,,,.,...	7<7;<;<<<<<<<<<=<;<;<<6
seq1	275	A	23	,$....,,.,.,...,,,.,...^l.	<+;9*<<<<<<<<<=<<:;<<<<
seq1	276	G	22	...T,,.,.,...,,,.,....	33;+<<7=7<<7<&<<1;<<6<
seq1	277	T	22	....,,.,.,.C.,,,.,..G.	+7<;<<<<<<<&<=<<:;<<&<
seq1	278	G	23	....,,.,.,...,,,.,....^k.	%38*<<;<7<<7<=<<<;<<<<<
seq1	279	C	23	A..T,,.,.,...,,,.,.....	75&<<<<<<<<<=<<<9<<:<<<
```

1. Sequence Identifier: The sequence identifier of the reference sequence
2. Position: Position of row in the reference sequence (indexed at 1)
3. Reference Base: Base pair in reference sequence
4. Read Count: Number of aligned reads to this particular base pair
5. Read Results: The resultant alignments
6. Quality: Phred quality scores associated with each base
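The six-column layout described above can be sketched as a small line splitter. This is a minimal illustration only, not the package's parser: the `row` type and the `splitRow` function are hypothetical names introduced here, and the sketch keeps Read Results as a single raw string rather than interpreting it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// row is a hypothetical type for this sketch; it is not the package's Pileup type.
type row struct {
	Sequence      string
	Position      uint
	ReferenceBase string
	ReadCount     uint
	ReadResults   string
	Quality       string
}

// splitRow splits one tab-separated pileup line into its six columns and
// converts the two numeric columns (Position and Read Count).
func splitRow(line string) (row, error) {
	fields := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(fields) != 6 {
		return row{}, fmt.Errorf("expected 6 columns, got %d", len(fields))
	}
	// bitSize 0 makes ParseUint check the value against the size of uint.
	pos, err := strconv.ParseUint(fields[1], 10, 0)
	if err != nil {
		return row{}, fmt.Errorf("invalid position %q: %w", fields[1], err)
	}
	count, err := strconv.ParseUint(fields[3], 10, 0)
	if err != nil {
		return row{}, fmt.Errorf("invalid read count %q: %w", fields[3], err)
	}
	return row{
		Sequence:      fields[0],
		Position:      uint(pos),
		ReferenceBase: fields[2],
		ReadCount:     uint(count),
		ReadResults:   fields[4],
		Quality:       fields[5],
	}, nil
}

func main() {
	r, err := splitRow("seq1\t276\tG\t22\t...T,,.,.,...,,,.,....\t33;+<<7=7<<7<&<<1;<<6<")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s:%d ref=%s reads=%d\n", r.Sequence, r.Position, r.ReferenceBase, r.ReadCount)
}
```

Splitting on tabs rather than on arbitrary whitespace keeps the Read Results and Quality columns intact regardless of which printable characters they contain.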
This package provides a parser and writer for working with pileup files.
Types
type Parser
type Parser struct {
// contains filtered or unexported fields
}
Parser is a pileup parser.
func (*Parser) ParseAll
ParseAll parses all sequences in the underlying reader and returns only non-EOF errors. If an error is encountered, it returns the error together with all valid pileup sequences parsed up to that point.
func (*Parser) ParseN
ParseN parses up to maxRows pileup sequences from the Parser's underlying reader. ParseN does not return EOF if encountered. If a non-EOF error is encountered, it returns the error together with all sequences correctly parsed up to that point.
type Pileup
type Pileup struct {
	Sequence      string   `json:"sequence"`
	Position      uint     `json:"position"`
	ReferenceBase string   `json:"reference_base"`
	ReadCount     uint     `json:"read_count"`
	ReadResults   []string `json:"read_results"`
	Quality       string   `json:"quality"`
}
Pileup represents a single position in a pileup file. Pileup files "pile" many separate BAM/SAM alignments into a more readable per-base-pair summary, so an individual Pileup is mainly useful as part of a group of rows.