fastq

package

Version: v0.31.2 (latest) Published: Oct 21, 2024 License: MIT Imports: 9 Imported by: 0

Repository

github.com/bebop/poly

Documentation ¶

Overview ¶

Package fastq contains fastq parsers and writers.

Fastq is a flat text file format developed around 2000 to store nucleotide sequencing data. It is similar to fasta, with two differences: the sequence identifier line begins with @ instead of >, and each record includes quality values for its sequence.

This package provides a parser and writer for working with Fastq formatted sequencing data.

Index ¶

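func Build(fastqs []Fastq) ([]byte, error)
func Write(fastqs []Fastq, path string) error
type Fastq
- func Parse(r io.Reader) ([]Fastq, error)
- func Read(path string) ([]Fastq, error)
- func ReadGz(path string) ([]Fastq, error)
type Parser
- func NewParser(r io.Reader, maxLineSize int) *Parser
- func (parser *Parser) ParseAll() ([]Fastq, error)
- func (parser *Parser) ParseN(maxSequences int) (fastqs []Fastq, err error)
- func (parser *Parser) ParseNext() (Fastq, int64, error)
- func (parser *Parser) Reset(r io.Reader)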

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Build ¶

func Build(fastqs []Fastq) ([]byte, error)

Build converts a slice of Fastq structs into a byte slice that can be written to a file.
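
To make the output format concrete, here is a minimal sketch of serializing records into the four-line fastq layout using only the standard library. It is not the package's implementation: the record type and buildRecords function are hypothetical stand-ins, optional header fields are left out, and the length check is this sketch's own choice rather than documented behavior of Build.

```go
package main

import (
	"bytes"
	"fmt"
)

// record is a hypothetical stand-in for the package's Fastq type.
type record struct {
	Identifier string
	Sequence   string
	Quality    string
}

// buildRecords serializes records in the four-line fastq layout:
// "@" + identifier, the sequence, a "+" separator, and the quality string.
func buildRecords(records []record) ([]byte, error) {
	var buf bytes.Buffer
	for _, r := range records {
		// Each base needs exactly one quality character.
		if len(r.Sequence) != len(r.Quality) {
			return nil, fmt.Errorf("record %q: sequence length %d != quality length %d",
				r.Identifier, len(r.Sequence), len(r.Quality))
		}
		fmt.Fprintf(&buf, "@%s\n%s\n+\n%s\n", r.Identifier, r.Sequence, r.Quality)
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := buildRecords([]record{
		{Identifier: "read1", Sequence: "GATTACA", Quality: "IIIIIII"},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(string(out))
}
```

With the real package, the equivalent call would be fastq.Build on a []Fastq, and the returned bytes can be written wherever they are needed.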

func Write ¶

func Write(fastqs []Fastq, path string) error

Write writes a slice of Fastq structs to a file.

Example ¶

ExampleWrite shows basic usage of the writer.

package main

import (
	"fmt"
	"log"
	"os"

	_ "embed"
	"github.com/bebop/poly/io/fastq"
)

func main() {
	fastqs, err := fastq.Read("data/nanosavseq.fastq") // get example data
	if err != nil {
		log.Fatal(err)
	}
	if err := fastq.Write(fastqs, "data/test.fastq"); err != nil { // write it out again
		log.Fatal(err)
	}
	testSequence, err := fastq.Read("data/test.fastq") // read it in again
	os.Remove("data/test.fastq")                       // getting rid of test file
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(testSequence[0].Identifier)
	fmt.Println(testSequence[0].Sequence)
	fmt.Println(testSequence[0].Quality)
}

Output:

e3cc70d5-90ef-49b6-bbe1-cfef99537d73
GATGTGCGCCGTTCCAGTTGCGACGTACTATAATCCCCGGCAACACGGTGCTGATTCTCTTCCTGTTCCAGAAAGCATAAACAGATGCAAGTCTGGTGTGATTAACTTCACCAAAGGGCTGGTTGTAATATTAGGAAATCTAACAATAGATTCTGTTGGTTGGACTCTAAAATTAGAAATTTGATAGATTCCTTTTCCCAAATGAAAGTTTAACGTACACTTTGTTTCTAAAGGAAGGTCAAATTACAGTCTACAGCATCGTAATGGTTCATTTTCATTTATATTTTAATACTAGAAAAGTCCTAGGTTGAAGATAACCACATAATAAGCTGCAACTTCAGCTGTCCCAACCTGAAGAAGAATCGCAGGAGTCGAAATAACTTCTGTAAAGCAAGTAGTTTGAACCTATTGATGTTTCAACATGAGCAATACGTAACT
$$&%&%#$)*59;/767C378411,***,('11<;:,0039/0&()&'2(/*((4.1.09751).601+'#&&&,-**/0-+3558,/)+&)'&&%&$$'%'%'&*/5978<9;**'3*'&&A?99:;:97:278?=9B?CLJHGG=9<@AC@@=>?=>D>=3<>=>3362$%/((+/%&+//.-,%-4:+..000,&$#%$$%+*)&*0%.//*?<<;>DE>.8942&&//074&$033)*&&&%**)%)962133-%'&*99><<=1144??6.027639.011/-)($#$(/422*4;:=122>?@6964:.5'8:52)*675=:4@;323&&##'.-57*4597)+0&:7<7-550REGB21/0+*79/&/6538())+)+23665+(''$$$'-2(&&*-.-#$&%%$$,-)&$$#$'&,);;<C<@454)#'

Types ¶

type Fastq ¶

type Fastq struct {
	Identifier string            `json:"identifier"`
	Optionals  map[string]string `json:"optionals"` // Nanopore, for example, carries along data like: read=13956 ch=53 start_time=2020-11-11T01:49:01Z
	Sequence   string            `json:"sequence"`
	Quality    string            `json:"quality"`
}

Fastq is a struct representing a single Fastq file element with an Identifier, its corresponding sequence, its quality score, and any optional pieces of data.
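
The relationship between a header line and the Identifier and Optionals fields can be sketched as follows. This assumes the header consists of the identifier followed by space-separated key=value pairs, as the Nanopore example in the struct comment suggests. The parseHeader function and the identifier read-0001 are hypothetical, and the package's own parsing rules may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeader splits a fastq header line into its identifier and any
// key=value pairs that follow it. Fields without "=" are ignored.
func parseHeader(line string) (string, map[string]string) {
	fields := strings.Fields(strings.TrimPrefix(line, "@"))
	optionals := make(map[string]string)
	if len(fields) == 0 {
		return "", optionals
	}
	for _, field := range fields[1:] {
		kv := strings.SplitN(field, "=", 2)
		if len(kv) == 2 {
			optionals[kv[0]] = kv[1]
		}
	}
	return fields[0], optionals
}

func main() {
	// Hypothetical header built from the optional fields in the struct comment.
	id, opts := parseHeader("@read-0001 read=13956 ch=53 start_time=2020-11-11T01:49:01Z")
	fmt.Println(id)
	fmt.Println(opts["read"], opts["ch"], opts["start_time"])
}
```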

func Parse ¶

func Parse(r io.Reader) ([]Fastq, error)

Parse parses a given Fastq file into a slice of Fastq structs. Internally, it uses ParseFastqConcurrent.

func Read ¶

func Read(path string) ([]Fastq, error)

Read reads a file into a slice of Fastq structs.

Example ¶

ExampleRead shows basic usage for Read.

package main

import (
	"fmt"
	"log"

	_ "embed"
	"github.com/bebop/poly/io/fastq"
)

func main() {
	fastqs, err := fastq.Read("data/nanosavseq.fastq")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(fastqs[0].Identifier)
}

Output:

e3cc70d5-90ef-49b6-bbe1-cfef99537d73

func ReadGz ¶

func ReadGz(path string) ([]Fastq, error)

ReadGz reads a gzipped file into a slice of Fastq structs.

Example ¶

ExampleReadGz shows basic usage for ReadGz.

package main

import (
	"fmt"
	"log"

	_ "embed"
	"github.com/bebop/poly/io/fastq"
)

func main() {
	fastqs, err := fastq.ReadGz("data/nanosavseq.fastq.gz")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(fastqs[0].Identifier)
}

Output:

e3cc70d5-90ef-49b6-bbe1-cfef99537d73

type Parser ¶

type Parser struct {
	// contains filtered or unexported fields
}

Parser is a flexible parser that provides ample control over reading fastq-formatted sequences. It is initialized with NewParser.

Example ¶

package main

import (
	"fmt"
	"strings"

	_ "embed"
	"github.com/bebop/poly/io/fastq"
)

//go:embed data/nanosavseq.fastq
var baseFastq string

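func main() {
	parser := fastq.NewParser(strings.NewReader(baseFastq), 2*32*1024)
	for {
		// Named record rather than fastq so the package name is not shadowed.
		record, _, err := parser.ParseNext()
		if err != nil {
			// Once the reader is exhausted the error is EOF, which is printed before exiting.
			fmt.Println(err)
			break
		}
		fmt.Println(record.Identifier)
	}
}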

Output:

e3cc70d5-90ef-49b6-bbe1-cfef99537d73
92728f25-b658-426c-8cd7-d82dc70dbf71
60907b6b-5e38-498e-9c07-f036ebd8c658
990e110e-5e50-41a2-8ad5-92044d4465b8
EOF

func NewParser ¶

func NewParser(r io.Reader, maxLineSize int) *Parser

NewParser returns a Parser that uses r as the source from which to parse fastq formatted sequences.

func (*Parser) ParseAll ¶

func (parser *Parser) ParseAll() ([]Fastq, error)

ParseAll parses all sequences in the underlying reader and returns only non-EOF errors. If an error is encountered, it returns all valid fastq sequences parsed up to that point.

func (*Parser) ParseN ¶

func (parser *Parser) ParseN(maxSequences int) (fastqs []Fastq, err error)

ParseN parses up to maxSequences fastq sequences from the Parser's underlying reader. ParseN does not return EOF if encountered. If a non-EOF error is encountered, it returns the error along with all sequences parsed correctly up to that point.

func (*Parser) ParseNext ¶

func (parser *Parser) ParseNext() (Fastq, int64, error)

ParseNext reads the next fastq sequence from the underlying reader and returns the result and the number of bytes read during the call. ParseNext returns an error only if:

It attempts to read and fails to find a valid fastq sequence.
It is called after the reader has been exhausted, in which case it returns the reader's EOF.
An EOF is encountered immediately after a sequence with no trailing newline. In this case the Fastq read up to that point is returned together with an EOF error.

Note that the number of bytes read always extends to just before the start of the next fastq, so this function can be used to index where fastqs start in a file or string.

ParseNext is simpler for fastq files than its fasta counterpart. A fasta sequence can span a variable number of lines, conventionally wrapped at 80 characters, whereas a fastq record always consists of 4 consecutive lines. So instead of looping until the next record begins, the parser can read 4 lines at once.

func (*Parser) Reset ¶

func (parser *Parser) Reset(r io.Reader)

Reset discards all data in buffer and resets state.

Source Files ¶

fastq.go
