Documentation ¶
Overview ¶
Package sam implements SAM file format reading and writing. The SAM format is described in the SAM specification.
Index ¶
- Constants
- func IsValidRecord(r *Record) bool
- type ASCII
- type Aux
- type AuxFields
- type Cigar
- type CigarOp
- type CigarOpType
- type Consume
- type Doublet
- type Flags
- type GroupOrder
- type Header
- func (bh *Header) AddProgram(p *Program) error
- func (bh *Header) AddReadGroup(rg *ReadGroup) error
- func (bh *Header) AddReference(r *Reference) error
- func (bh *Header) Clone() *Header
- func (bh *Header) DecodeBinary(r io.Reader) error
- func (bh *Header) EncodeBinary(w io.Writer) error
- func (bh *Header) Get(t Tag) string
- func (bh *Header) MarshalBinary() ([]byte, error)
- func (bh *Header) MarshalText() ([]byte, error)
- func (bh *Header) Progs() []*Program
- func (bh *Header) RGs() []*ReadGroup
- func (bh *Header) Refs() []*Reference
- func (bh *Header) RemoveProgram(p *Program) error
- func (bh *Header) RemoveReadGroup(rg *ReadGroup) error
- func (bh *Header) RemoveReference(r *Reference) error
- func (bh *Header) Set(t Tag, value string) error
- func (bh *Header) Tags(fn func(t Tag, value string))
- func (bh *Header) UnmarshalBinary(b []byte) error
- func (bh *Header) UnmarshalText(text []byte) error
- func (bh *Header) Validate(r *Record) error
- type Hex
- type Iterator
- type Program
- func (p *Program) Clone() *Program
- func (p *Program) Command() string
- func (p *Program) Get(t Tag) string
- func (p *Program) ID() int
- func (p *Program) Name() string
- func (p *Program) Previous() string
- func (p *Program) Set(t Tag, value string) error
- func (r *Program) SetUID(uid string) error
- func (p *Program) String() string
- func (p *Program) Tags(fn func(t Tag, value string))
- func (p *Program) UID() string
- func (p *Program) Version() string
- type ReadGroup
- func (r *ReadGroup) Clone() *ReadGroup
- func (r *ReadGroup) Get(t Tag) string
- func (r *ReadGroup) ID() int
- func (r *ReadGroup) Library() string
- func (r *ReadGroup) Name() string
- func (r *ReadGroup) PlatformUnit() string
- func (r *ReadGroup) Set(t Tag, value string) error
- func (r *ReadGroup) SetName(n string) error
- func (r *ReadGroup) String() string
- func (r *ReadGroup) Tags(fn func(t Tag, value string))
- func (r *ReadGroup) Time() time.Time
- type Reader
- type Record
- func (r *Record) Bin() int
- func (r *Record) End() int
- func (r *Record) Len() int
- func (r *Record) LessByCoordinate(other *Record) bool
- func (r *Record) LessByName(other *Record) bool
- func (r *Record) MarshalSAM(flags int) ([]byte, error)
- func (r *Record) MarshalText() ([]byte, error)
- func (r *Record) RefID() int
- func (r *Record) Start() int
- func (r *Record) Strand() int8
- func (r *Record) String() string
- func (r *Record) Tag(tag []byte) (v Aux, ok bool)
- func (r *Record) UnmarshalSAM(h *Header, b []byte) error
- func (r *Record) UnmarshalText(b []byte) error
- type RecordReader
- type Reference
- func (r *Reference) AssemblyID() string
- func (r *Reference) Clone() *Reference
- func (r *Reference) Get(t Tag) string
- func (r *Reference) ID() int
- func (r *Reference) Len() int
- func (r *Reference) MD5() []byte
- func (r *Reference) Name() string
- func (r *Reference) Set(t Tag, value string) error
- func (r *Reference) SetLen(l int) error
- func (r *Reference) SetName(n string) error
- func (r *Reference) Species() string
- func (r *Reference) String() string
- func (r *Reference) Tags(fn func(t Tag, value string))
- func (r *Reference) URI() string
- type Seq
- type SortOrder
- type Tag
- type Text
- type Writer
Constants ¶
const (
	FlagDecimal = iota
	FlagHex
	FlagString
)
Flag format constants.
Functions ¶
func IsValidRecord ¶
IsValidRecord returns whether the record satisfies the following conditions: that it has the Unmapped flag set if it is not placed; that the MateUnmapped flag is set if it is paired and its mate is unplaced; that the CIGAR length matches the sequence and quality string lengths if they are non-zero; and that the Paired, ProperPair, Unmapped and MateUnmapped flags are consistent.
Types ¶
type Aux ¶
type Aux []byte
An Aux represents an auxiliary data field from a SAM alignment record.
func NewAux ¶
NewAux returns a new Aux with the given tag, type and value. Acceptable value types and their corresponding SAM type are:
A - ASCII
c - int8
C - uint8
s - int16
S - uint16
i - int, uint or int32
I - int, uint or uint32
f - float32
Z - Text or string
H - Hex
B - []int8, []int16, []int32, []uint8, []uint16, []uint32 or []float32
The handling of int and uint types is provided as a convenience - values must fit within either int32 or uint32 and are converted to the smallest possible representation.
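The narrowing described above can be sketched as follows. This is a minimal illustration, not the package's implementation: smallestIntKind is a hypothetical helper, and preferring the unsigned code for non-negative values is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// smallestIntKind returns the SAM integer type code of the narrowest type
// that can hold v, or false if v fits neither int32 nor uint32.
// Choosing unsigned codes for non-negative values is an assumption of
// this sketch.
func smallestIntKind(v int64) (byte, bool) {
	switch {
	case v >= 0 && v <= math.MaxUint8:
		return 'C', true
	case v < 0 && v >= math.MinInt8:
		return 'c', true
	case v >= 0 && v <= math.MaxUint16:
		return 'S', true
	case v < 0 && v >= math.MinInt16:
		return 's', true
	case v >= 0 && v <= math.MaxUint32:
		return 'I', true
	case v < 0 && v >= math.MinInt32:
		return 'i', true
	}
	return 0, false
}

func main() {
	for _, v := range []int64{7, -7, 300, -300, 70000, -70000, 1 << 40} {
		kind, ok := smallestIntKind(v)
		fmt.Printf("%d -> %q (fits: %t)\n", v, kind, ok)
	}
}
```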
func (Aux) Kind ¶
Kind returns a byte corresponding to the kind of the auxiliary tag. Returned values are in {'A', 'i', 'f', 'Z', 'H', 'B'}.
type Cigar ¶
type Cigar []CigarOp
Cigar is a set of CIGAR operations.
func ParseCigar ¶
ParseCigar returns a Cigar parsed from the provided byte slice. ParseCigar will break CIGAR operations longer than 2^28-1 into multiple operations summing to the same length.
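A minimal sketch of how an over-long operation length could be broken into pieces that sum to the same total. splitLen is a hypothetical helper; the order in which the package emits the pieces is an assumption.

```go
package main

import "fmt"

// maxOpLen is the largest length a single CIGAR operation can carry.
const maxOpLen = 1<<28 - 1

// splitLen breaks n into chunks no longer than maxOpLen that sum to n.
// Full-size chunks come first; that ordering is an assumption.
func splitLen(n int) []int {
	var parts []int
	for n > maxOpLen {
		parts = append(parts, maxOpLen)
		n -= maxOpLen
	}
	if n > 0 {
		parts = append(parts, n)
	}
	return parts
}

func main() {
	fmt.Println(splitLen(10))
	fmt.Println(splitLen(maxOpLen + 5))
}
```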
func (Cigar) IsValid ¶
IsValid returns whether the CIGAR string is valid for a record of the given sequence length. Validity is defined by the sum of query consuming operations matching the given length, clipping operations only being present at the ends of alignments, and that CigarBack operations only result in query-consuming positions at or right of the start of the alignment.
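Two of these conditions can be sketched with local stand-in types. The op type and isValid function below are hypothetical, the CigarBack condition is left out, and the exact clipping rule (at most one hard clip outside at most one soft clip at each end) is an assumption based on the SAM specification.

```go
package main

import "fmt"

// op is a hypothetical stand-in for a CIGAR operation: a SAM operation
// letter and a length.
type op struct {
	kind byte
	n    int
}

// consumesQuery reports whether the operation advances along the read.
func consumesQuery(kind byte) bool {
	switch kind {
	case 'M', 'I', 'S', '=', 'X':
		return true
	}
	return false
}

// isValid checks that the query-consuming lengths sum to seqLen and that
// clipping operations appear only at the ends of the alignment.
func isValid(cigar []op, seqLen int) bool {
	var qlen int
	for _, o := range cigar {
		if consumesQuery(o.kind) {
			qlen += o.n
		}
	}
	if qlen != seqLen {
		return false
	}
	// Peel at most one hard clip, then one soft clip, from each end.
	i, j := 0, len(cigar)
	for _, clip := range []byte{'H', 'S'} {
		if i < j && cigar[i].kind == clip {
			i++
		}
		if i < j && cigar[j-1].kind == clip {
			j--
		}
	}
	// Any clip left in the interior is misplaced.
	for _, o := range cigar[i:j] {
		if o.kind == 'H' || o.kind == 'S' {
			return false
		}
	}
	return true
}

func main() {
	r002 := []op{{'S', 3}, {'M', 6}, {'P', 1}, {'I', 1}, {'M', 4}}
	fmt.Println(isValid(r002, 14))

	interiorClip := []op{{'M', 6}, {'S', 2}, {'M', 5}}
	fmt.Println(isValid(interiorClip, 13))
}
```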
type CigarOp ¶
type CigarOp uint32
CigarOp is a single CIGAR operation including the operation type and the length of the operation.
func NewCigarOp ¶
func NewCigarOp(t CigarOpType, n int) CigarOp
NewCigarOp returns a CIGAR operation of the specified type with length n. Due to a limitation of the BAM format, CIGAR operation lengths are limited to 2^28-1, and NewCigarOp will panic if n is above this or negative.
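The 2^28-1 limit follows from the BAM encoding, which stores the length in the upper 28 bits of a uint32 and the operation code in the lower 4 bits. A minimal sketch of that packing, using hypothetical helpers and assuming the package uses the same layout in memory; the sketch returns an error where NewCigarOp panics:

```go
package main

import "fmt"

// maxOpLen is the largest length that fits in 28 bits.
const maxOpLen = 1<<28 - 1

// packOp stores n in the upper 28 bits and typ in the lower 4 bits.
func packOp(typ uint8, n int) (uint32, error) {
	if typ > 0xf {
		return 0, fmt.Errorf("operation type %d does not fit in 4 bits", typ)
	}
	if n < 0 || n > maxOpLen {
		return 0, fmt.Errorf("length %d outside [0, %d]", n, maxOpLen)
	}
	return uint32(n)<<4 | uint32(typ), nil
}

// opType extracts the operation code from a packed operation.
func opType(co uint32) uint8 { return uint8(co & 0xf) }

// opLen extracts the length from a packed operation.
func opLen(co uint32) int { return int(co >> 4) }

func main() {
	co, err := packOp(0, 8) // 8M, with 0 as the match code
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%#x type=%d len=%d\n", co, opType(co), opLen(co))

	_, err = packOp(0, maxOpLen+1)
	fmt.Println(err)
}
```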
func (CigarOp) Type ¶
func (co CigarOp) Type() CigarOpType
Type returns the type of the CIGAR operation for the CigarOp.
type CigarOpType ¶
type CigarOpType byte
A CigarOpType represents the type of operation described by a CigarOp.
const (
	CigarMatch       CigarOpType = iota // Alignment match (can be a sequence match or mismatch).
	CigarInsertion                      // Insertion to the reference.
	CigarDeletion                       // Deletion from the reference.
	CigarSkipped                        // Skipped region from the reference.
	CigarSoftClipped                    // Soft clipping (clipped sequences present in SEQ).
	CigarHardClipped                    // Hard clipping (clipped sequences NOT present in SEQ).
	CigarPadded                         // Padding (silent deletion from padded reference).
	CigarEqual                          // Sequence match.
	CigarMismatch                       // Sequence mismatch.
	CigarBack                           // Skip backwards.
)
func (CigarOpType) Consumes ¶
func (ct CigarOpType) Consumes() Consume
Consumes returns the CIGAR operation alignment consumption characteristics for the CigarOpType.
The Consume values for each of the CigarOpTypes are as follows:
                  Query  Reference
CigarMatch          1        1
CigarInsertion      1        0
CigarDeletion       0        1
CigarSkipped        0        1
CigarSoftClipped    1        0
CigarHardClipped    0        0
CigarPadded         0        0
CigarEqual          1        1
CigarMismatch       1        1
CigarBack           0       -1
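Multiplying each operation length by its Query and Reference values gives the read and reference lengths covered by an alignment. The sketch below reproduces the table with local stand-in types keyed by SAM operation letter; the type names and the use of 'B' for CigarBack are assumptions of the sketch.

```go
package main

import "fmt"

// consume mirrors the Query and Reference columns of the table.
type consume struct{ query, reference int }

// consumes reproduces the table, keyed by SAM operation letter.
var consumes = map[byte]consume{
	'M': {1, 1}, 'I': {1, 0}, 'D': {0, 1}, 'N': {0, 1}, 'S': {1, 0},
	'H': {0, 0}, 'P': {0, 0}, '=': {1, 1}, 'X': {1, 1}, 'B': {0, -1},
}

// op is a hypothetical stand-in for a CIGAR operation.
type op struct {
	kind byte
	n    int
}

// lengths returns the number of query and reference positions consumed.
func lengths(cigar []op) (query, reference int) {
	for _, o := range cigar {
		c := consumes[o.kind]
		query += o.n * c.query
		reference += o.n * c.reference
	}
	return query, reference
}

func main() {
	// 8M2I4M1D3M, the CIGAR of r001/1 in the SAM specification example.
	r001 := []op{{'M', 8}, {'I', 2}, {'M', 4}, {'D', 1}, {'M', 3}}
	q, r := lengths(r001)
	fmt.Printf("query=%d reference=%d\n", q, r)
}
```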
func (CigarOpType) String ¶
func (ct CigarOpType) String() string
String returns the string representation of a CigarOpType.
type Consume ¶
type Consume struct {
Query, Reference int
}
Consume describes how CIGAR operations consume alignment bases.
Example ¶
package main

import (
	"fmt"

	"github.com/biogo/hts/sam"
)

func min(a, b int) int {
	if a > b {
		return b
	}
	return a
}

func max(a, b int) int {
	if a < b {
		return b
	}
	return a
}

// Overlap returns the length of the overlap between the alignment
// of the SAM record and the interval specified.
//
// Note that this will count repeated matches to the same reference
// location if CigarBack operations are used.
func Overlap(r *sam.Record, start, end int) int {
	var overlap int
	pos := r.Pos
	for _, co := range r.Cigar {
		t := co.Type()
		con := t.Consumes()
		lr := co.Len() * con.Reference
		if con.Query == con.Reference {
			o := min(pos+lr, end) - max(pos, start)
			if o > 0 {
				overlap += o
			}
		}
		pos += lr
	}
	return overlap
}

func main() {
	// Example alignments from the SAM specification:
	//
	// @HD VN:1.5 SO:coordinate
	// @SQ SN:ref LN:45
	// @CO --------------------------------------------------------
	// @CO Coor     12345678901234  5678901234567890123456789012345
	// @CO ref      AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT
	// @CO --------------------------------------------------------
	// @CO +r001/1        TTAGATAAAGGATA*CTG
	// @CO +r002         aaaAGATAA*GGATA
	// @CO +r003       gcctaAGCTAA
	// @CO +r004                     ATAGCT..............TCAGC
	// @CO -r003                            ttagctTAGGC
	// @CO -r001/2                                        CAGCGGCAT
	// @CO --------------------------------------------------------
	// r001 99 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *
	// r002 0 ref 9 30 3S6M1P1I4M * 0 0 AAAAGATAAGGATA *
	// r003 0 ref 9 30 5S6M * 0 0 GCCTAAGCTAA * SA:Z:ref,29,-,6H5M,17,0;
	// r004 0 ref 16 30 6M14N5M * 0 0 ATAGCTTCAGC *
	// r003 2064 ref 29 17 6H5M * 0 0 TAGGC * SA:Z:ref,9,+,5S6M,30,1;
	// r001 147 ref 37 30 9M = 7 -39 CAGCGGCAT * NM:i:1

	const (
		refStart = 0
		refEnd   = 45
	)

	records := []*sam.Record{
		{Name: "r001/1", Pos: 6, Cigar: []sam.CigarOp{
			sam.NewCigarOp(sam.CigarMatch, 8),
			sam.NewCigarOp(sam.CigarInsertion, 2),
			sam.NewCigarOp(sam.CigarMatch, 4),
			sam.NewCigarOp(sam.CigarDeletion, 1),
			sam.NewCigarOp(sam.CigarMatch, 3),
		}},
		{Name: "r002", Pos: 8, Cigar: []sam.CigarOp{
			sam.NewCigarOp(sam.CigarSoftClipped, 3),
			sam.NewCigarOp(sam.CigarMatch, 6),
			sam.NewCigarOp(sam.CigarPadded, 1),
			sam.NewCigarOp(sam.CigarInsertion, 1),
			sam.NewCigarOp(sam.CigarMatch, 4),
		}},
		{Name: "r003", Pos: 8, Cigar: []sam.CigarOp{
			sam.NewCigarOp(sam.CigarSoftClipped, 5),
			sam.NewCigarOp(sam.CigarMatch, 6),
		}},
		{Name: "r004", Pos: 15, Cigar: []sam.CigarOp{
			sam.NewCigarOp(sam.CigarMatch, 6),
			sam.NewCigarOp(sam.CigarSkipped, 14),
			sam.NewCigarOp(sam.CigarMatch, 5),
		}},
		{Name: "r003", Pos: 28, Cigar: []sam.CigarOp{
			sam.NewCigarOp(sam.CigarHardClipped, 6),
			sam.NewCigarOp(sam.CigarMatch, 5),
		}},
		{Name: "r001/2", Pos: 36, Cigar: []sam.CigarOp{
			sam.NewCigarOp(sam.CigarMatch, 9),
		}},
	}

	for _, r := range records {
		fmt.Printf("%q overlaps reference by %d letters\n", r.Name, Overlap(r, refStart, refEnd))
	}
}

Output:
"r001/1" overlaps reference by 15 letters
"r002" overlaps reference by 10 letters
"r003" overlaps reference by 6 letters
"r004" overlaps reference by 11 letters
"r003" overlaps reference by 5 letters
"r001/2" overlaps reference by 9 letters
type Flags ¶
type Flags uint16
A Flags represents a BAM record's alignment FLAG field.
const (
	Paired        Flags = 1 << iota // The read is paired in sequencing, no matter whether it is mapped in a pair.
	ProperPair                      // The read is mapped in a proper pair.
	Unmapped                        // The read itself is unmapped; conflictive with ProperPair.
	MateUnmapped                    // The mate is unmapped.
	Reverse                         // The read is mapped to the reverse strand.
	MateReverse                     // The mate is mapped to the reverse strand.
	Read1                           // This is read1.
	Read2                           // This is read2.
	Secondary                       // Not primary alignment.
	QCFail                          // QC failure.
	Duplicate                       // Optical or PCR duplicate.
	Supplementary                   // Supplementary alignment, indicates alignment is part of a chimeric alignment.
)
func (Flags) String ¶
String representation of BAM alignment flags:
0x001 - p - Paired
0x002 - P - ProperPair
0x004 - u - Unmapped
0x008 - U - MateUnmapped
0x010 - r - Reverse
0x020 - R - MateReverse
0x040 - 1 - Read1
0x080 - 2 - Read2
0x100 - s - Secondary
0x200 - f - QCFail
0x400 - d - Duplicate
0x800 - S - Supplementary
Note that flag bits are represented high order to the right.
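A minimal sketch of this rendering, with the lowest bit written first so that high-order bits end up on the right. flagString is a hypothetical helper, and writing '-' for a clear bit is an assumption; the table above only defines the letters for set bits.

```go
package main

import "fmt"

// flagLetters lists one letter per flag bit, lowest bit first, following
// the table above.
const flagLetters = "pPuUrR12sfdS"

// flagString writes the letter of each set bit and '-' for each clear bit.
// The '-' placeholder is an assumption of this sketch.
func flagString(f uint16) string {
	b := make([]byte, len(flagLetters))
	for i := range b {
		if f&(uint16(1)<<i) != 0 {
			b[i] = flagLetters[i]
		} else {
			b[i] = '-'
		}
	}
	return string(b)
}

func main() {
	// Flag values taken from the SAM specification example records.
	for _, f := range []uint16{99, 147, 2064} {
		fmt.Printf("%4d %#05x %s\n", f, f, flagString(f))
	}
}
```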
type GroupOrder ¶
type GroupOrder int
GroupOrder indicates the grouping order of a SAM or BAM file.
const (
	GroupUnspecified GroupOrder = iota
	GroupNone
	GroupQuery
	GroupReference
)
func (GroupOrder) String ¶
func (g GroupOrder) String() string
String returns the string representation of a GroupOrder.
type Header ¶
type Header struct {
	Version    string
	SortOrder  SortOrder
	GroupOrder GroupOrder
	Comments   []string
	// contains filtered or unexported fields
}
Header is a SAM or BAM header.
func MergeHeaders ¶ added in v1.1.0
MergeHeaders returns a new Header resulting from the merge of the source Headers, and a mapping between the references in the source and the References in the returned Header. Sort order is set to unknown and group order is set to none. If a single Header is passed to MergeHeaders, the mapping between source and destination headers, reflink, is returned as nil. The returned Header contains the read groups and programs of the first Header in src.
func NewHeader ¶
NewHeader returns a new Header based on the given text and list of References. If there is a conflict between the text and the given References NewHeader will return a non-nil error.
func (*Header) AddProgram ¶
AddProgram adds p to the Header.
func (*Header) AddReadGroup ¶
AddReadGroup adds rg to the Header.
func (*Header) AddReference ¶
AddReference adds r to the Header.
func (*Header) DecodeBinary ¶
DecodeBinary unmarshals a Header from the given io.Reader. The byte stream must be in the format described in the SAM specification, section 4.2.
func (*Header) EncodeBinary ¶
EncodeBinary writes a binary encoding of the Header to the given io.Writer. The format of the encoding is defined in the SAM specification, section 4.2.
func (*Header) Get ¶
Get returns the string representation of the value associated with the given header line tag. If the tag is not present the empty string is returned.
func (*Header) MarshalBinary ¶
MarshalBinary implements the encoding.BinaryMarshaler interface.
func (*Header) MarshalText ¶
MarshalText implements the encoding.TextMarshaler interface.
func (*Header) Progs ¶
Progs returns the Header's list of Programs. The returned slice should not be altered.
func (*Header) RGs ¶
RGs returns the Header's list of ReadGroups. The returned slice should not be altered.
func (*Header) Refs ¶
Refs returns the Header's list of References. The returned slice should not be altered.
func (*Header) RemoveProgram ¶ added in v1.1.0
RemoveProgram removes p from the Header and makes it available to add to another Header.
func (*Header) RemoveReadGroup ¶ added in v1.1.0
RemoveReadGroup removes rg from the Header and makes it available to add to another Header.
func (*Header) RemoveReference ¶ added in v1.1.0
RemoveReference removes r from the Header and makes it available to add to another Header.
func (*Header) Set ¶
Set sets the value associated with the given header line tag to the specified value. If value is the empty string and the tag may be absent, it is deleted or set to a meaningful default (SO:UnknownOrder and GO:GroupUnspecified), otherwise an error is returned.
func (*Header) Tags ¶ added in v1.1.0
Tags applies the function fn to each of the tag-value pairs of the Header. The SO and GO tags are only used if they are set to the non-default values. The function fn must not add or delete tags held by the receiver during iteration.
func (*Header) UnmarshalBinary ¶
UnmarshalBinary implements the encoding.BinaryUnmarshaler interface.
func (*Header) UnmarshalText ¶
UnmarshalText implements the encoding.TextUnmarshaler interface.
func (*Header) Validate ¶
Validate checks r against the Header for record validity according to the SAM specification:
- a program auxiliary field must refer to a program listed in the header
- a read group auxiliary field must refer to a read group listed in the header and these must agree on platform unit and library.
type Iterator ¶
type Iterator struct {
// contains filtered or unexported fields
}
Iterator wraps a Reader to provide a convenient loop interface for reading SAM/BAM data. Successive calls to the Next method will step through the features of the provided Reader. Iteration stops unrecoverably at EOF or the first error.
func NewIterator ¶
func NewIterator(r RecordReader) *Iterator
NewIterator returns an Iterator to read from r.
i := NewIterator(r)
for i.Next() {
	fn(i.Record())
}
return i.Error()
func (*Iterator) Error ¶
Error returns the first non-EOF error that was encountered by the Iterator.
func (*Iterator) Next ¶
Next advances the Iterator past the next record, which will then be available through the Record method. It returns false when the iteration stops, either by reaching the end of the input or an error. After Next returns false, the Error method will return any error that occurred during iteration, except that if it was io.EOF, Error will return nil.
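The contract of Next, Record and Error can be illustrated with a self-contained sketch. The recordReader interface, the iterator type and sliceReader below are hypothetical stand-ins that use strings in place of records; they are not the package's types.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// recordReader is a hypothetical stand-in for a source of records.
type recordReader interface {
	Read() (string, error)
}

// iterator stops permanently at the first error, including io.EOF.
type iterator struct {
	r   recordReader
	rec string
	err error
}

// Next reads the next record and reports whether one is available.
func (i *iterator) Next() bool {
	if i.err != nil {
		return false
	}
	i.rec, i.err = i.r.Read()
	return i.err == nil
}

// Record returns the record read by the last successful call to Next.
func (i *iterator) Record() string { return i.rec }

// Error returns the error that stopped iteration, or nil if it was io.EOF.
func (i *iterator) Error() error {
	if errors.Is(i.err, io.EOF) {
		return nil
	}
	return i.err
}

// sliceReader yields its elements in order, then io.EOF.
type sliceReader []string

func (s *sliceReader) Read() (string, error) {
	if len(*s) == 0 {
		return "", io.EOF
	}
	rec := (*s)[0]
	*s = (*s)[1:]
	return rec, nil
}

func main() {
	it := &iterator{r: &sliceReader{"r001", "r002"}}
	for it.Next() {
		fmt.Println(it.Record())
	}
	fmt.Println("error:", it.Error())
}
```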
type Program ¶
type Program struct {
// contains filtered or unexported fields
}
Program represents a SAM program.
func NewProgram ¶
NewProgram returns a Program with the given unique ID, name, command, previous program ID in the pipeline and version.
func (*Program) Get ¶
Get returns the string representation of the value associated with the given program line tag. If the tag is not present the empty string is returned.
func (*Program) Set ¶
Set sets the value associated with the given program line tag to the specified value. If value is the empty string and the tag may be absent, it is deleted.
func (*Program) String ¶
String returns a string representation of the program according to the SAM specification section 1.3.
func (*Program) Tags ¶ added in v1.1.0
Tags applies the function fn to each of the tag-value pairs of the Program. The function fn must not add or delete tags held by the receiver during iteration.
type ReadGroup ¶
type ReadGroup struct {
// contains filtered or unexported fields
}
ReadGroup represents a sequencing read group.
func NewReadGroup ¶
func NewReadGroup(name, center, desc, lib, prog, plat, unit, sample, flow, key string, date time.Time, size int) (*ReadGroup, error)
NewReadGroup returns a ReadGroup with the given name, center, description, library, program, platform, unique platform unit, sample name, flow order, key sequence, date of read group production, and predicted median insert size.
func (*ReadGroup) Get ¶
Get returns the string representation of the value associated with the given read group line tag. If the tag is not present the empty string is returned.
func (*ReadGroup) PlatformUnit ¶
PlatformUnit returns the unique platform unit for the read group.
func (*ReadGroup) Set ¶
Set sets the value associated with the given read group line tag to the specified value. If value is the empty string and the tag may be absent, it is deleted.
func (*ReadGroup) String ¶
String returns a string representation of the read group according to the SAM specification section 1.3.
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader implements SAM format reading.
type Record ¶
type Record struct {
	Name      string
	Ref       *Reference
	Pos       int
	MapQ      byte
	Cigar     Cigar
	Flags     Flags
	MateRef   *Reference
	MatePos   int
	TempLen   int
	Seq       Seq
	Qual      []byte
	AuxFields AuxFields
}
Record represents a SAM/BAM record.
func NewRecord ¶
func NewRecord(name string, ref, mRef *Reference, p, mPos, tLen int, mapQ byte, co []CigarOp, seq, qual []byte, aux []Aux) (*Record, error)
NewRecord returns a Record, checking for consistency of the provided attributes.
func (*Record) End ¶
End returns the highest query-consuming coordinate end of the alignment. The position returned by End is not valid if r.Cigar.IsValid(r.Seq.Length) is false.
func (*Record) LessByCoordinate ¶ added in v1.1.0
LessByCoordinate returns true if the receiver sorts by coordinate before other according to the SAM specification.
func (*Record) LessByName ¶ added in v1.1.0
LessByName returns true if the receiver sorts by record name before other.
func (*Record) MarshalSAM ¶
MarshalSAM formats a Record as SAM using the specified flag format. Acceptable formats are FlagDecimal, FlagHex and FlagString.
func (*Record) MarshalText ¶
MarshalText implements encoding.TextMarshaler. It calls MarshalSAM with FlagDecimal.
func (*Record) Strand ¶
Strand returns an int8 indicating the strand of the alignment. A positive return indicates alignment in the forward orientation; a negative return indicates alignment in the reverse orientation.
func (*Record) Tag ¶
Tag returns an Aux tag whose tag ID matches the first two bytes of tag and true. If no tag matches, nil and false are returned.
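The lookup can be sketched as a linear scan comparing the first two bytes of each field. The aux type and findTag function are hypothetical, and the assumed layout (two tag bytes, one type byte, then the value) follows the BAM encoding rather than anything stated here. Treating a tag shorter than two bytes as no match is also an assumption.

```go
package main

import "fmt"

// aux is a hypothetical stand-in for an auxiliary field. The assumed
// layout is two tag bytes, one type byte, then the value.
type aux []byte

// findTag returns the first field whose tag ID matches the first two
// bytes of tag.
func findTag(fields []aux, tag []byte) (aux, bool) {
	if len(tag) < 2 {
		return nil, false
	}
	for _, f := range fields {
		if len(f) >= 2 && f[0] == tag[0] && f[1] == tag[1] {
			return f, true
		}
	}
	return nil, false
}

func main() {
	fields := []aux{
		aux("NMC\x01"),    // NM, type C, value 1
		aux("RGZgroup1"), // RG, type Z, value "group1"
	}
	if f, ok := findTag(fields, []byte("RG")); ok {
		fmt.Printf("tag=%s type=%c value=%s\n", f[:2], f[2], f[3:])
	}
	_, ok := findTag(fields, []byte("XX"))
	fmt.Println("XX present:", ok)
}
```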
func (*Record) UnmarshalSAM ¶
UnmarshalSAM parses a SAM format alignment line in the provided []byte, using references from the provided Header. If a nil Header is passed to UnmarshalSAM and the SAM data include non-empty reference and mate reference names, fake references with zero length and an ID of -1 are created to hold the reference names.
func (*Record) UnmarshalText ¶
UnmarshalText implements the encoding.TextUnmarshaler interface. It calls UnmarshalSAM with a nil Header.
type RecordReader ¶
RecordReader wraps types that can read SAM Records.
type Reference ¶
type Reference struct {
// contains filtered or unexported fields
}
Reference is a mapping reference.
func NewReference ¶
func NewReference(name, assemID, species string, length int, md5 []byte, uri *url.URL) (*Reference, error)
NewReference returns a new Reference based on the given parameters. Only name and length are mandatory and length must be a valid reference length according to the SAM specification, [1, 1<<31).
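The half-open interval [1, 1<<31) includes 1 and excludes 1<<31. A minimal sketch of that bounds check, using a hypothetical validLen helper and int64 so that the upper bound is representable on every platform:

```go
package main

import "fmt"

// validLen reports whether n lies in the half-open interval [1, 1<<31).
func validLen(n int64) bool {
	return n >= 1 && n < 1<<31
}

func main() {
	for _, n := range []int64{0, 1, 45, 1<<31 - 1, 1 << 31} {
		fmt.Printf("%d valid: %t\n", n, validLen(n))
	}
}
```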
func (*Reference) AssemblyID ¶
AssemblyID returns the assembly ID of the reference.
func (*Reference) Get ¶
Get returns the string representation of the value associated with the given reference line tag. If the tag is not present the empty string is returned.
func (*Reference) Set ¶
Set sets the value associated with the given reference line tag to the specified value. If value is the empty string and the tag may be absent, it is deleted.
func (*Reference) SetLen ¶
SetLen sets the length of the reference sequence to l. The given length must be a valid SAM reference length.
func (*Reference) String ¶
String returns a string representation of the Reference according to the SAM specification section 1.3.
type Seq ¶
Seq is a nucleotide sequence stored as nybble-encoded pairs of bases.
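Nybble encoding stores two bases per byte using 4-bit codes. The sketch below follows the BAM encoding, where the code table is "=ACMGRSVTWYHKDBN" and the first base of each pair occupies the high nybble. The pack and unpack helpers are hypothetical, and it is an assumption that Seq uses the same layout in memory.

```go
package main

import (
	"fmt"
	"strings"
)

// codes maps each 4-bit value to its base letter, as in the BAM format.
const codes = "=ACMGRSVTWYHKDBN"

// pack stores two bases per byte, the first of each pair in the high
// nybble. Letters not found in codes are stored as N (15).
func pack(seq string) []byte {
	out := make([]byte, (len(seq)+1)/2)
	for i := 0; i < len(seq); i++ {
		c := strings.IndexByte(codes, seq[i])
		if c < 0 {
			c = 15
		}
		if i%2 == 0 {
			out[i/2] = byte(c) << 4
		} else {
			out[i/2] |= byte(c)
		}
	}
	return out
}

// unpack expands the first n bases held in packed.
func unpack(packed []byte, n int) string {
	b := make([]byte, n)
	for i := range b {
		v := packed[i/2]
		if i%2 == 0 {
			v >>= 4
		}
		b[i] = codes[v&0xf]
	}
	return string(b)
}

func main() {
	p := pack("TAGGC")
	fmt.Printf("% x\n", p)
	fmt.Println(unpack(p, 5))
}
```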
type Tag ¶
type Tag [2]byte
A Tag represents an auxiliary or header tag label.