sam

Version: v0.0.0-...-f005bc5 (Latest)
Warning

This package is not in the latest version of its module.

Published: Dec 14, 2024 License: MIT Imports: 7 Imported by: 2

Documentation

Overview

Package sam implements a SAM file parser and writer.

SAM is a tab-delimited text format for storing DNA/RNA sequence alignment data. It is the most widely used alignment format, complementing its binary equivalent, BAM, which stores the same data in a compressed format.

DNA sequencing works in the following way:

  • The sequencing machine reads DNA and records it in some raw signal format.
  • Basecalling software converts the raw signal into FASTQ reads.
  • The FASTQ reads are aligned to a target template, producing SAM files.
  • SAM files are used to answer bioinformatic queries.

This package parses and writes SAM files in Go. Unlike other SAM parsers written in Go, we aim to stay as close to the underlying data types as possible, with the goal of being as simple as possible, and no simpler.

Paper: https://doi.org/10.1093%2Fbioinformatics%2Fbtp352
Spec: http://samtools.github.io/hts-specs/SAMv1.pdf
Spec (local copy): `dnadesign/lib/bio/sam/SAMv1.pdf`

Constants

const DefaultMaxLineSize int = 1024 * 32 * 2 // 32kB is a magic number often used by the Go stdlib for parsing. We multiply it by two (64 KiB).

Functions

func NewParser

func NewParser(r io.Reader, maxLineSize int) (*Parser, Header, error)

NewParser creates a parser that reads SAM data from an io.Reader and returns it together with the file's Header. For alignments with very long lines (for example, long reads), increase maxLineSize beyond DefaultMaxLineSize.

Example
file := strings.NewReader(`@HD	VN:1.6	SO:unsorted	GO:query
@SQ	SN:pOpen_V3_amplified	LN:2482
@PG	ID:minimap2	PN:minimap2	VN:2.24-r1155-dirty	CL:minimap2 -acLx map-ont - APX814_pass_barcode17_e229f2c8_109f9b91_0.fastq.gz
ae9a66f5-bf71-4572-8106-f6f8dbd3b799	16	pOpen_V3_amplified	1	60	8S54M1D3M1D108M1D1M1D62M226S	*	0	0	AGCATGCCGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGTGCTGAGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGAAAACTCTCGACGTTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTTACTGTTGATGTTCATGTAGGTGCTGATCAGAGGTACTTTCCTGGAGGGTTTAACCTTAGCAATACGTAACGGAACGAAGTACAGGGCAT	%,<??@@{O{HS{{MOG{EHD@@=)))'&%%%%'(((6::::=?=;:7)'''/33387-)(*025557CBBDDFDECD;1+'(&&')(,-('))35@>AFDCBD{LNKKGIL{{JLKI{{IFG>==86668789=<><;056<;>=87:840/++1,++)-,-0{{&&%%&&),-13;<{HGVKCGFI{J{L{G{INJHEA@C540/3568;>EOI{{{I0000HHRJ{{{{{{{RH{N@@?AKLQEEC?==<433345588==FTA??A@G?@@@EC?==;10//2333?AB?<<<--(++*''&&-(((+@DBJQHJHGGPJH{.---@B?<''-++'--&%%&,,,FC:999IEGJ{HJHIGIFEGIFMDEF;8878{KJGFIJHIHDCAA=<<<<;DDB>:::EK{{@{E<==HM{{{KF{{{MDEQM{ECA?=>9--,.3))'')*++.-,**()%%	NM:i:8	ms:i:408	AS:i:408	nn:i:0	tp:A:P	cm:i:29	s1:i:195	s2:i:0	de:f:0.0345	SA:Z:pOpen_V3_amplified,2348,-,236S134M1D92S,60,1;	rl:i:0`)
parser, _, _ := NewParser(file, DefaultMaxLineSize)
samLine, _ := parser.Next()

fmt.Println(samLine.CIGAR)
Output:

8S54M1D3M1D108M1D1M1D62M226S

func Primary

func Primary(a Alignment) bool

Primary reports whether the Alignment is the primary line of its read. This is useful for determining whether a particular alignment is the read's best alignment to a given fragment.
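The SAM specification defines the primary line of a read as the one line for which FLAG & 0x900 == 0, that is, neither the secondary (0x100) nor the supplementary (0x800) bit is set. A minimal sketch of that check on a raw FLAG value follows; isPrimary is a hypothetical helper, and we have not confirmed that Primary is implemented exactly this way.

```go
package main

import "fmt"

// FLAG bits from the SAM specification.
const (
	flagSecondary     uint16 = 0x100 // secondary alignment
	flagSupplementary uint16 = 0x800 // supplementary alignment
)

// isPrimary is a hypothetical helper: a line is primary when neither the
// secondary nor the supplementary bit is set (FLAG & 0x900 == 0).
func isPrimary(flag uint16) bool {
	return flag&(flagSecondary|flagSupplementary) == 0
}

func main() {
	fmt.Println(isPrimary(16))   // true: reverse strand, primary
	fmt.Println(isPrimary(256))  // false: secondary
	fmt.Println(isPrimary(2064)) // false: 2048 (supplementary) + 16
}
```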

Types

type Alignment

type Alignment struct {
	QNAME     string     // Query template NAME
	FLAG      uint16     // bitwise FLAG
	RNAME     string     // Reference sequence NAME
	POS       int32      // 1-based leftmost mapping POSition
	MAPQ      byte       // MAPping Quality
	CIGAR     string     // CIGAR string
	RNEXT     string     // Ref. name of the mate/next read
	PNEXT     int32      // Position of the mate/next read
	TLEN      int32      // observed Template LENgth
	SEQ       string     // segment SEQuence
	QUAL      string     // ASCII of Phred-scaled base QUALity+33
	Optionals []Optional // Optional fields, each of the form TAG:TYPE:DATA
}

Each alignment is a single line of a SAM file, representing a linear alignment of a segment, and consists of 11 or more tab-delimited fields. The 11 mandatory fields (QNAME through QUAL) are always present; when the data is unavailable, a placeholder ('0' or '*') is used instead. Optional fields follow the mandatory ones.

For more information, check section 1.4 of the reference document.
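To make the field layout concrete, here is a minimal sketch that splits one alignment line into the 11 mandatory fields and keeps any optional fields as raw TAG:TYPE:DATA strings. The alignmentFields type and parseAlignmentLine function are hypothetical and only mirror the documented Alignment fields; the package's own parser may differ, for example in how it reports malformed input.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// alignmentFields is a hypothetical stand-in mirroring sam.Alignment,
// except that optional fields are kept as raw strings.
type alignmentFields struct {
	QNAME, RNAME, CIGAR, RNEXT, SEQ, QUAL string
	FLAG                                  uint16
	MAPQ                                  byte
	POS, PNEXT, TLEN                      int32
	Optionals                             []string // raw TAG:TYPE:DATA fields
}

// parseAlignmentLine splits one tab-delimited alignment line into its
// 11 mandatory fields plus any optional fields.
func parseAlignmentLine(line string) (alignmentFields, error) {
	var a alignmentFields
	f := strings.Split(line, "\t")
	if len(f) < 11 {
		return a, fmt.Errorf("expected at least 11 tab-delimited fields, got %d", len(f))
	}
	flag, err := strconv.ParseUint(f[1], 10, 16)
	if err != nil {
		return a, fmt.Errorf("FLAG: %w", err)
	}
	mapq, err := strconv.ParseUint(f[4], 10, 8)
	if err != nil {
		return a, fmt.Errorf("MAPQ: %w", err)
	}
	// POS (column 4), PNEXT (column 8) and TLEN (column 9) are signed 32-bit integers.
	var ints [3]int64
	for i, idx := range []int{3, 7, 8} {
		if ints[i], err = strconv.ParseInt(f[idx], 10, 32); err != nil {
			return a, fmt.Errorf("column %d: %w", idx+1, err)
		}
	}
	a = alignmentFields{
		QNAME: f[0], FLAG: uint16(flag), RNAME: f[2], POS: int32(ints[0]),
		MAPQ: byte(mapq), CIGAR: f[5], RNEXT: f[6], PNEXT: int32(ints[1]),
		TLEN: int32(ints[2]), SEQ: f[9], QUAL: f[10], Optionals: f[11:],
	}
	return a, nil
}

func main() {
	a, err := parseAlignmentLine("r001\t99\tref\t7\t30\t8M\t=\t37\t39\tTTAGATAA\t*\tNM:i:0")
	fmt.Println(a.QNAME, a.FLAG, a.POS, a.CIGAR, a.Optionals, err)
}
```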

func (*Alignment) Validate

func (alignment *Alignment) Validate() error

Validate checks whether an alignment is valid according to the regular expressions and value ranges defined in the SAM specification. Not implemented yet.
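Since Validate is not implemented yet, the following is only a sketch of what a partial check could look like. It applies the SAM specification's patterns for QNAME, CIGAR, SEQ, and QUAL, plus the rule that a QUAL other than '*' requires a SEQ of the same length. validateCore is a hypothetical helper covering a subset of fields, not the package's method.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns for a few mandatory fields, from section 1.4 of the SAM
// specification. This is a partial check, not a full validator.
var (
	qnameRe = regexp.MustCompile(`^[!-?A-~]{1,254}$`)
	cigarRe = regexp.MustCompile(`^(\*|([0-9]+[MIDNSHPX=])+)$`)
	seqRe   = regexp.MustCompile(`^(\*|[A-Za-z=.]+)$`)
	qualRe  = regexp.MustCompile(`^[!-~]+$`)
)

// validateCore is a hypothetical helper that checks QNAME, CIGAR, SEQ and QUAL.
func validateCore(qname, cigar, seq, qual string) error {
	switch {
	case !qnameRe.MatchString(qname):
		return fmt.Errorf("invalid QNAME %q", qname)
	case !cigarRe.MatchString(cigar):
		return fmt.Errorf("invalid CIGAR %q", cigar)
	case !seqRe.MatchString(seq):
		return fmt.Errorf("invalid SEQ %q", seq)
	case !qualRe.MatchString(qual):
		return fmt.Errorf("invalid QUAL %q", qual)
	case qual != "*" && seq == "*":
		return fmt.Errorf("QUAL is present but SEQ is '*'")
	case qual != "*" && len(qual) != len(seq):
		return fmt.Errorf("QUAL length %d does not match SEQ length %d", len(qual), len(seq))
	}
	return nil
}

func main() {
	fmt.Println(validateCore("r001", "8M", "TTAGATAA", "*"))  // <nil>
	fmt.Println(validateCore("r001", "8M", "TTAGATAA", "!!")) // QUAL length 2 does not match SEQ length 8
}
```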

func (*Alignment) WriteTo

func (alignment *Alignment) WriteTo(w io.Writer) (int64, error)

WriteTo implements the io.WriterTo interface. It writes the alignment as a single SAM line.

type Header

type Header struct {
	HD map[string]string   // File-level metadata. Optional. If present, there must be only one @HD line and it must be the first line of the file.
	SQ []map[string]string // Reference sequence dictionary. The order of @SQ lines defines the alignment sorting order.
	RG []map[string]string // Read group. Unordered multiple @RG lines are allowed.
	PG []map[string]string // Program.
	CO []string            // One-line text comment. Unordered multiple @CO lines are allowed. UTF-8 encoding may be used.
}

Each header line in a SAM file begins with @ followed by a two-letter record type code. Each line is tab-delimited and contains TAG:VALUE pairs. @HD, if present, occurs only once and must be the first line, while @SQ, @RG, and @PG can appear multiple times. Finally, @CO lines contain user-generated comments.

For more information, check section 1.3 of the reference document.
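A minimal sketch of splitting one header line into its record type and TAG:VALUE pairs. strings.Cut splits at the first colon only, so values that themselves contain colons (such as a @PG CL command line) stay intact. @CO lines carry free text, so the sketch returns it separately. parseHeaderLine is a hypothetical helper, not the package's API.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeaderLine is a hypothetical helper that splits a header line such as
// "@SQ\tSN:chr1\tLN:100" into its record type and a TAG:VALUE map.
// For @CO lines it returns the comment text instead of a map.
func parseHeaderLine(line string) (recordType string, tags map[string]string, comment string, err error) {
	if len(line) < 3 || line[0] != '@' {
		return "", nil, "", fmt.Errorf("not a header line: %q", line)
	}
	recordType = line[1:3]
	if recordType == "CO" {
		return recordType, nil, strings.TrimPrefix(line[3:], "\t"), nil
	}
	tags = make(map[string]string)
	for _, field := range strings.Split(line, "\t")[1:] {
		tag, value, ok := strings.Cut(field, ":")
		if !ok || len(tag) != 2 {
			return "", nil, "", fmt.Errorf("malformed TAG:VALUE field %q", field)
		}
		tags[tag] = value
	}
	return recordType, tags, "", nil
}

func main() {
	rt, tags, _, err := parseHeaderLine("@SQ\tSN:pOpen_V3_amplified\tLN:2482")
	fmt.Println(rt, tags["SN"], tags["LN"], err) // SQ pOpen_V3_amplified 2482 <nil>
}
```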

func (*Header) Validate

func (header *Header) Validate() error

Validate checks that the header contains all required information, as described in the SAMv1 specification document. Not implemented yet.
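The SAM specification marks a few header tags as required: VN on @HD, SN and LN on each @SQ line, and ID on each @RG and @PG line. As a sketch of part of what the unimplemented Validate could check, the hypothetical checkRequiredTags below tests only for those tags; it does not check value formats or the uniqueness of ID values.

```go
package main

import "fmt"

// header is a hypothetical stand-in with the same field layout as sam.Header.
type header struct {
	HD map[string]string
	SQ []map[string]string
	RG []map[string]string
	PG []map[string]string
	CO []string
}

// checkRequiredTags reports the first missing required tag, if any.
// An empty HD map is treated as "no @HD line", which is allowed.
func checkRequiredTags(h header) error {
	if len(h.HD) > 0 {
		if _, ok := h.HD["VN"]; !ok {
			return fmt.Errorf("@HD is missing required tag VN")
		}
	}
	for i, sq := range h.SQ {
		for _, tag := range []string{"SN", "LN"} {
			if _, ok := sq[tag]; !ok {
				return fmt.Errorf("@SQ line %d is missing required tag %s", i+1, tag)
			}
		}
	}
	groups := []struct {
		code  string
		lines []map[string]string
	}{{"RG", h.RG}, {"PG", h.PG}}
	for _, g := range groups {
		for i, line := range g.lines {
			if _, ok := line["ID"]; !ok {
				return fmt.Errorf("@%s line %d is missing required tag ID", g.code, i+1)
			}
		}
	}
	return nil
}

func main() {
	h := header{SQ: []map[string]string{{"SN": "chr1"}}}
	fmt.Println(checkRequiredTags(h)) // @SQ line 1 is missing required tag LN
}
```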

func (*Header) WriteTo

func (header *Header) WriteTo(w io.Writer) (int64, error)

WriteTo writes a SAM header to an io.Writer.

type Optional

type Optional struct {
	Tag  string // Tag is typically a two letter tag corresponding to what the optional represents.
	Type rune   // The type may be one of A (character), B (general array), f (real number), H (hexadecimal array), i (integer), or Z (string).
	Data string // Optional data
}

Optional fields in SAM alignments are structured as TAG:TYPE:DATA, where TYPE identifies the type of DATA.

For more information, check section 1.5 of http://samtools.github.io/hts-specs/SAMv1.pdf.
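A minimal sketch of splitting an optional field into Tag, Type, and Data, and converting scalar values by type. strings.SplitN with a limit of 3 keeps any colons inside DATA intact, which matters for Z strings. The parseOptional function and value method are hypothetical; B and H arrays, like A and Z, are simply returned as raw strings here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// optional mirrors the documented sam.Optional fields.
type optional struct {
	Tag  string
	Type rune
	Data string
}

// parseOptional is a hypothetical helper that splits one TAG:TYPE:DATA field.
func parseOptional(field string) (optional, error) {
	parts := strings.SplitN(field, ":", 3)
	if len(parts) != 3 || len(parts[1]) != 1 {
		return optional{}, fmt.Errorf("malformed optional field %q", field)
	}
	return optional{Tag: parts[0], Type: rune(parts[1][0]), Data: parts[2]}, nil
}

// value converts Data to int64 for type i and float64 for type f;
// all other types are returned unchanged as strings in this sketch.
func (o optional) value() (any, error) {
	switch o.Type {
	case 'i':
		return strconv.ParseInt(o.Data, 10, 64)
	case 'f':
		return strconv.ParseFloat(o.Data, 64)
	default:
		return o.Data, nil
	}
}

func main() {
	for _, field := range []string{"NM:i:8", "de:f:0.0345", "tp:A:P"} {
		o, err := parseOptional(field)
		if err != nil {
			fmt.Println(err)
			continue
		}
		v, err := o.value()
		fmt.Printf("%s %c %v %v\n", o.Tag, o.Type, v, err)
	}
}
```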

type Parser

type Parser struct {
	FileHeader Header
	// contains filtered or unexported fields
}

Parser is a SAM file parser that provides simple control over reading SAM alignments. It should be initialized with NewParser.

func (*Parser) Header

func (p *Parser) Header() (Header, error)

Header returns the parsed sam header.

func (*Parser) Next

func (p *Parser) Next() (Alignment, error)

Next parses the next alignment from the parser. It returns `io.EOF` when the end of the input is reached.
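Because Next signals the end of input with io.EOF, reading a whole file is a loop that stops on that error. The sketch below writes the loop against a small hypothetical nexter interface so it runs without the package; since Next on *Parser has the shape Next() (Alignment, error), readAll[Alignment] should accept a *Parser too. It assumes Next returns io.EOF only after the last alignment, not together with it.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// nexter is a hypothetical interface matching methods shaped like
// func (p *Parser) Next() (Alignment, error).
type nexter[T any] interface {
	Next() (T, error)
}

// readAll calls Next until it returns io.EOF, collecting every item.
// Any other error is returned together with the items read so far.
func readAll[T any](p nexter[T]) ([]T, error) {
	var items []T
	for {
		item, err := p.Next()
		if errors.Is(err, io.EOF) {
			return items, nil
		}
		if err != nil {
			return items, err
		}
		items = append(items, item)
	}
}

// sliceSource is a hypothetical stand-in for *sam.Parser, used only to run readAll.
type sliceSource struct{ items []string }

func (s *sliceSource) Next() (string, error) {
	if len(s.items) == 0 {
		return "", io.EOF
	}
	item := s.items[0]
	s.items = s.items[1:]
	return item, nil
}

func main() {
	got, err := readAll[string](&sliceSource{items: []string{"read1", "read2"}})
	fmt.Println(got, err) // [read1 read2] <nil>
}
```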
