dna

v1.0.1
Published: May 1, 2024 License: BSD-3-Clause Imports: 7 Imported by: 19

Documentation

Overview

Package dna implements a data structure for storage and manipulation of sequences of DNA.

Variables

var (
	ErrLenInputSeqNotDivThree = errors.New("length of input sequence is not a factor of three. remaining bases were ignored")
)
var GeneticCode = map[Codon]AminoAcid{
	{T, G, A}: Stop, {T, A, A}: Stop, {T, A, G}: Stop,
	{G, T, A}: Val, {G, T, C}: Val, {G, T, G}: Val, {G, T, T}: Val,
	{T, A, T}: Tyr, {T, A, C}: Tyr,
	{T, G, G}: Trp,
	{A, C, A}: Thr, {A, C, G}: Thr, {A, C, T}: Thr, {A, C, C}: Thr,
	{T, C, A}: Ser, {T, C, C}: Ser, {T, C, G}: Ser, {T, C, T}: Ser, {A, G, T}: Ser, {A, G, C}: Ser,
	{C, C, C}: Pro, {C, C, T}: Pro, {C, C, A}: Pro, {C, C, G}: Pro,
	{T, T, T}: Phe, {T, T, C}: Phe,
	{A, T, G}: Met,
	{A, A, A}: Lys, {A, A, G}: Lys,
	{T, T, A}: Leu, {T, T, G}: Leu, {C, T, C}: Leu, {C, T, G}: Leu, {C, T, A}: Leu, {C, T, T}: Leu,
	{A, T, T}: Ile, {A, T, C}: Ile, {A, T, A}: Ile,
	{C, A, T}: His, {C, A, C}: His,
	{G, G, G}: Gly, {G, G, A}: Gly, {G, G, T}: Gly, {G, G, C}: Gly,
	{G, A, A}: Glu, {G, A, G}: Glu,
	{C, A, A}: Gln, {C, A, G}: Gln,
	{T, G, T}: Cys, {T, G, C}: Cys,
	{G, A, T}: Asp, {G, A, C}: Asp,
	{A, A, T}: Asn, {A, A, C}: Asn,
	{A, G, A}: Arg, {A, G, G}: Arg, {C, G, C}: Arg, {C, G, G}: Arg, {C, G, A}: Arg, {C, G, T}: Arg,
	{G, C, A}: Ala, {G, C, G}: Ala, {G, C, T}: Ala, {G, C, C}: Ala,
}

GeneticCode maps codon arrays to amino acids. It is used for translating coding sequences to protein sequences.
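
Because Go arrays are comparable, a fixed-size array type such as Codon can serve directly as a map key. The sketch below illustrates that lookup pattern with local stand-in types and a hypothetical three-entry excerpt (miniCode), not the package's full table. The helper name lookup is also an assumption made for illustration.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented types; nothing is imported
// from the dna package itself.
type Base byte
type AminoAcid byte
type Codon [3]Base

const (
	A Base = iota
	C
	G
	T
)

const (
	Met  AminoAcid = 12
	Trp  AminoAcid = 17
	Stop AminoAcid = 20
)

// miniCode is a hypothetical three-entry excerpt of a codon table.
var miniCode = map[Codon]AminoAcid{
	{A, T, G}: Met,
	{T, G, G}: Trp,
	{T, A, A}: Stop,
}

// lookup reports the amino acid for a codon and whether the codon
// was present in the excerpt.
func lookup(c Codon) (AminoAcid, bool) {
	aa, ok := miniCode[c]
	return aa, ok
}

func main() {
	aa, ok := lookup(Codon{A, T, G})
	fmt.Println(aa, ok) // prints "12 true"

	_, ok = lookup(Codon{C, C, C})
	fmt.Println(ok) // prints "false"
}
```

Checking the second return value matters with a partial table, since a missing key would otherwise yield the zero value of AminoAcid.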

Functions

func AllToLower

func AllToLower(bases []Base)

AllToLower changes all bases in a sequence to lowercase.

func AllToUpper

func AllToUpper(bases []Base)

AllToUpper changes all bases in a sequence to uppercase.

func AminoAcidToShortString

func AminoAcidToShortString(a AminoAcid) string

AminoAcidToShortString converts type AminoAcid into single character amino acid symbols.

func AminoAcidToString

func AminoAcidToString(a AminoAcid) string

AminoAcidToString converts type AminoAcid into three letter amino acid symbols.

func BaseToRune

func BaseToRune(base Base) rune

BaseToRune converts a dna.Base type into a rune.

func BaseToString

func BaseToString(b Base) string

BaseToString converts a DNA base to a string by casting a BaseToRune result to a string.

func BasesToString

func BasesToString(bases []Base) string

BasesToString converts a slice of DNA bases into a string. Useful for writing to files.

Example
var baseSeq []Base
baseSeq = []Base{A, C, G, T}
fmt.Println(baseSeq)

var stringSeq string
stringSeq = BasesToString(baseSeq)
fmt.Println(stringSeq)
Output:

[0 1 2 3]
ACGT

func CompareSeqsCaseSensitive

func CompareSeqsCaseSensitive(alpha []Base, beta []Base) int

CompareSeqsCaseSensitive returns an integer defining the relationship between two input sequences. 1 if alpha > beta, -1 if beta > alpha, 0 if the sequences are equal. Case sensitive.
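
The documentation does not state how "greater" is defined for sequences. As one plausible reading, the sketch below performs a three-way lexicographic comparison on the numeric Base values and falls back to length when one sequence is a prefix of the other. The function name compare and this ordering rule are assumptions, not the package's confirmed behavior.

```go
package main

import "fmt"

// Base is a local stand-in for the documented type.
type Base byte

const (
	A Base = iota
	C
	G
	T
)

// compare walks both slices position by position and returns 1, -1, or 0.
// When one slice is a prefix of the other, the longer slice is "greater".
func compare(alpha, beta []Base) int {
	n := len(alpha)
	if len(beta) < n {
		n = len(beta)
	}
	for i := 0; i < n; i++ {
		if alpha[i] > beta[i] {
			return 1
		}
		if alpha[i] < beta[i] {
			return -1
		}
	}
	switch {
	case len(alpha) > len(beta):
		return 1
	case len(alpha) < len(beta):
		return -1
	}
	return 0
}

func main() {
	fmt.Println(compare([]Base{A, C, G}, []Base{A, C, T})) // prints "-1"
	fmt.Println(compare([]Base{A, C, G}, []Base{A, C, G})) // prints "0"
	fmt.Println(compare([]Base{A, C, G}, []Base{A, C}))    // prints "1"
}
```

A three-way result of this shape is convenient as the basis of a sort comparator.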

func CompareSeqsCaseSensitiveIgnoreGaps

func CompareSeqsCaseSensitiveIgnoreGaps(alpha []Base, beta []Base) int

CompareSeqsCaseSensitiveIgnoreGaps returns an integer defining the relationship between two input sequences. 1 if alpha > beta, -1 if beta > alpha, 0 if the sequences are equal. Case sensitive. Ignores gaps.

func CompareSeqsIgnoreCase

func CompareSeqsIgnoreCase(alpha []Base, beta []Base) int

CompareSeqsIgnoreCase returns an integer defining the relationship between two input sequences. 1 if alpha > beta, -1 if beta > alpha, 0 if the sequences are equal. Case insensitive.

func CompareSeqsIgnoreCaseAndGaps

func CompareSeqsIgnoreCaseAndGaps(alpha []Base, beta []Base) int

CompareSeqsIgnoreCaseAndGaps returns an integer defining the relationship between two input sequences. 1 if alpha > beta, -1 if beta > alpha, 0 if the sequences are equal. Case insensitive. Ignores gaps.

func CompareTwoDSeqsCaseSensitive

func CompareTwoDSeqsCaseSensitive(alpha [][]Base, beta [][]Base) int

CompareTwoDSeqsCaseSensitive returns an integer defining the relationship between two input lists of sequences. 1 if alpha > beta, -1 if beta > alpha, 0 if the sequences are equal. Case sensitive.

func CompareTwoDSeqsCaseSensitiveIgnoreGaps

func CompareTwoDSeqsCaseSensitiveIgnoreGaps(alpha [][]Base, beta [][]Base) int

CompareTwoDSeqsCaseSensitiveIgnoreGaps returns an integer defining the relationship between two input lists of sequences. 1 if alpha > beta, -1 if beta > alpha, 0 if the sequences are equal. Case sensitive. Ignores gaps.

func CompareTwoDSeqsIgnoreCase

func CompareTwoDSeqsIgnoreCase(alpha [][]Base, beta [][]Base) int

CompareTwoDSeqsIgnoreCase returns an integer defining the relationship between two input lists of sequences. 1 if alpha > beta, -1 if beta > alpha, 0 if the sequences are equal. Case insensitive.

func CompareTwoDSeqsIgnoreCaseAndGaps

func CompareTwoDSeqsIgnoreCaseAndGaps(alpha [][]Base, beta [][]Base) int

CompareTwoDSeqsIgnoreCaseAndGaps returns an integer defining the relationship between two input lists of sequences. 1 if alpha > beta, -1 if beta > alpha, 0 if the sequences are equal. Case insensitive. Ignores gaps.

func Complement

func Complement(bases []Base)

Complement replaces each base in a sequence with its complementary base. The input slice is modified in place.

Example
var baseSeq []Base
baseSeq = []Base{A, T, G}
fmt.Println(BasesToString(baseSeq))

// Complement modifies the slice in place so no return value
Complement(baseSeq)

fmt.Println(BasesToString(baseSeq))
Output:

ATG
TAC

func Count

func Count(seq []Base) (ACount int, CCount int, GCount int, TCount int, NCount int, aCount int, cCount int, gCount int, tCount int, nCount int, gapCount int)

Count returns the number of each base present in the input sequence.

func CountBase

func CountBase(seq []Base, b Base) int

CountBase returns the number of the designated base present in the input sequence.

Example
var seq []Base
seq = []Base{A, A, C, T, T, T}

fmt.Println(CountBase(seq, A))
fmt.Println(CountBase(seq, C))
fmt.Println(CountBase(seq, G))
fmt.Println(CountBase(seq, T))
fmt.Println(CountBase(seq, N))
Output:

2
1
0
3
0

func CountBaseInterval

func CountBaseInterval(seq []Base, b Base, start int, end int) int

CountBaseInterval returns the number of the designated base present in the input range of the sequence.

func CountGaps

func CountGaps(seq []Base) int

CountGaps returns the number of gaps present in the input sequence.

func CountMask

func CountMask(seq []Base) (unmaskedCount int, maskedCount int, gapCount int)

CountMask returns the number of unmasked (uppercase) bases, masked (lowercase) bases, and gaps in the input sequence.

func DefineBase

func DefineBase(b Base) bool

DefineBase returns false if the input base is an N, Gap, Dot, or Nil.

func Dist

func Dist(a []Base, b []Base) (dist int)

Dist returns the number of bases that do not match between the input sequences. Input sequences must be the same length.
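
The mismatch count described here is a Hamming distance. The sketch below shows the idea with a local Base type. The helper name mismatches is hypothetical, and returning an error for unequal lengths is a choice made for this sketch, since the documentation does not say how Dist itself reacts to that case.

```go
package main

import (
	"errors"
	"fmt"
)

// Base is a local stand-in for the documented type.
type Base byte

const (
	A Base = iota
	C
	G
	T
)

// mismatches counts the positions at which a and b differ.
// It rejects inputs of different lengths.
func mismatches(a, b []Base) (int, error) {
	if len(a) != len(b) {
		return 0, errors.New("sequences must be the same length")
	}
	count := 0
	for i := range a {
		if a[i] != b[i] {
			count++
		}
	}
	return count, nil
}

func main() {
	d, err := mismatches([]Base{A, C, G, T}, []Base{A, G, G, A})
	fmt.Println(d, err) // prints "2 <nil>"
}
```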

func GCContent added in v1.0.1

func GCContent(seq []Base) (gcContent float64)

GCContent returns the GC content for the input sequence. Note that n/Ns are ignored.
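
As a rough illustration of a GC calculation that skips N, the sketch below divides the G and C count by the count of A, C, G, and T. Several details are assumptions of this sketch rather than documented behavior: the result is a fraction in the range 0 to 1 (not a percentage), only uppercase bases are counted, and an input with no countable bases yields 0. The name gcFraction is hypothetical.

```go
package main

import "fmt"

// Base is a local stand-in for the documented type.
type Base byte

const (
	A Base = iota
	C
	G
	T
	N
)

// gcFraction returns (G+C)/(A+C+G+T). Any other value, including N,
// contributes to neither the numerator nor the denominator.
func gcFraction(seq []Base) float64 {
	gc, total := 0, 0
	for _, b := range seq {
		switch b {
		case G, C:
			gc++
			total++
		case A, T:
			total++
		}
	}
	if total == 0 {
		return 0 // avoid dividing by zero on empty or all-N input
	}
	return float64(gc) / float64(total)
}

func main() {
	fmt.Println(gcFraction([]Base{G, C, A, T, N, N})) // prints "0.5"
}
```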

func IsEqual

func IsEqual(c1 Codon, c2 Codon) bool

IsEqual compares two Codons and returns true if the underlying sequences are identical.

func IsLower

func IsLower(b Base) bool

IsLower returns true if the input base is lowercase.

func IsSeqOfACGT

func IsSeqOfACGT(seq []Base) bool

IsSeqOfACGT returns true if the input sequence contains only uppercase A/C/G/T.

func MeltingTemp added in v1.0.1

func MeltingTemp(seq []Base) float64

MeltingTemp calculates the melting temperature of a slice of Base, in Celsius, using the nearest-neighbor algorithm. Assumes 500 nM of both oligo and template, and 50 mM Na+.

func NonSynonymous

func NonSynonymous(c1 Codon, c2 Codon) bool

NonSynonymous compares two Codons and returns true if they encode different AminoAcids.

func PeptideToShortString

func PeptideToShortString(a []AminoAcid) string

PeptideToShortString converts a slice of AminoAcids into a string of one character amino acid symbols.

func PeptideToString

func PeptideToString(a []AminoAcid) string

PeptideToString converts a slice of AminoAcids into a string of three character amino acid symbols.

func RangeToLower

func RangeToLower(bases []Base, start int, end int)

RangeToLower changes the bases in a set range to lowercase. start is closed, end is open, both are zero-based.

func RangeToUpper

func RangeToUpper(bases []Base, start int, end int)

RangeToUpper changes the bases in a set range to uppercase. start is closed, end is open, both are zero-based.
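
The half-open, zero-based interval used by RangeToLower and RangeToUpper matches Go's own slice indexing: start is included and end is excluded. The sketch below demonstrates the convention with local definitions. The helper names are hypothetical, and the offset-based case conversion is inferred from the Base constant values listed in this documentation rather than from the package source.

```go
package main

import "fmt"

// Base is a local stand-in for the documented type.
type Base byte

// Values follow the Base constants documented for this package.
const (
	A      Base = 0
	C      Base = 1
	G      Base = 2
	T      Base = 3
	N      Base = 4
	LowerA Base = 5
)

// toLower shifts an uppercase base (A through N) to its lowercase
// counterpart and leaves every other value untouched.
func toLower(b Base) Base {
	if b <= N {
		return b + LowerA
	}
	return b
}

// rangeToLower lowercases bases[start:end] in place.
// start is inclusive and end is exclusive.
func rangeToLower(bases []Base, start, end int) {
	for i := start; i < end; i++ {
		bases[i] = toLower(bases[i])
	}
}

func main() {
	seq := []Base{A, C, G, T, A}
	rangeToLower(seq, 1, 3) // touches indices 1 and 2 only
	fmt.Println(seq)        // prints "[0 6 7 3 0]"
}
```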

func ReverseComplement

func ReverseComplement(bases []Base)

ReverseComplement reverses a sequence of bases and complements each base. Used to switch strands and maintain 5' -> 3' orientation.

Example
var baseSeq []Base
baseSeq = []Base{A, T, G}
fmt.Println(BasesToString(baseSeq))

// Reverse complement modifies the slice in place so no return value
ReverseComplement(baseSeq)

fmt.Println(BasesToString(baseSeq))
Output:

ATG
CAT

func SeqsAreSimilar added in v1.0.1

func SeqsAreSimilar(a, b []Base, numAllowedMismatch int) bool

SeqsAreSimilar returns true if the number of mismatches between the two input sequences is less than or equal to the user-specified threshold. If the two sequences differ in length, the function returns false. Comparison is case-insensitive.
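
A threshold check of this kind can stop as soon as the mismatch budget is exceeded. The sketch below shows one way to write it with local types. The names similar and toUpper are hypothetical stand-ins, and the early exit is a design choice of this sketch, not a statement about how the package implements the function.

```go
package main

import "fmt"

// Base is a local stand-in for the documented type.
type Base byte

// Values follow the Base constants documented for this package.
const (
	A      Base = 0
	C      Base = 1
	G      Base = 2
	T      Base = 3
	N      Base = 4
	LowerA Base = 5
	LowerN Base = 9
)

// toUpper maps lowercase bases onto their uppercase counterparts.
func toUpper(b Base) Base {
	if b >= LowerA && b <= LowerN {
		return b - LowerA
	}
	return b
}

// similar reports whether a and b have the same length and differ at
// no more than numAllowedMismatch positions, ignoring case.
func similar(a, b []Base, numAllowedMismatch int) bool {
	if len(a) != len(b) {
		return false
	}
	mismatch := 0
	for i := range a {
		if toUpper(a[i]) != toUpper(b[i]) {
			mismatch++
			if mismatch > numAllowedMismatch {
				return false // budget exceeded, no need to scan further
			}
		}
	}
	return true
}

func main() {
	x := []Base{A, C, G, T}
	y := []Base{LowerA, C, G, G}
	fmt.Println(similar(x, y, 1)) // prints "true"
	fmt.Println(similar(x, y, 0)) // prints "false"
}
```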

func Synonymous

func Synonymous(c1 Codon, c2 Codon) bool

Synonymous compares two codons and returns true if the codons code for the same amino acid.

func TranslateToShortString

func TranslateToShortString(b []Base) string

TranslateToShortString converts a sequence of DNA bases into a string of one character amino acid symbols. Input expects bases to be in-frame. If the length of the input sequence is not a multiple of three, the function will panic.

func TranslateToString

func TranslateToString(b []Base) string

TranslateToString converts a sequence of DNA bases into a string of three character amino acid symbols. Input expects bases to be in-frame. If the length of the input sequence is not a multiple of three, the function will panic.

Types

type AminoAcid

type AminoAcid byte

AminoAcid encodes the twenty canonical amino acids and the stop codon as bytes.

const (
	Ala  AminoAcid = 0
	Arg  AminoAcid = 1
	Asn  AminoAcid = 2
	Asp  AminoAcid = 3
	Cys  AminoAcid = 4
	Gln  AminoAcid = 5
	Glu  AminoAcid = 6
	Gly  AminoAcid = 7
	His  AminoAcid = 8
	Ile  AminoAcid = 9
	Leu  AminoAcid = 10
	Lys  AminoAcid = 11
	Met  AminoAcid = 12
	Phe  AminoAcid = 13
	Pro  AminoAcid = 14
	Ser  AminoAcid = 15
	Thr  AminoAcid = 16
	Trp  AminoAcid = 17
	Tyr  AminoAcid = 18
	Val  AminoAcid = 19
	Stop AminoAcid = 20
)

func OneLetterToAminoAcid

func OneLetterToAminoAcid(b byte) AminoAcid

OneLetterToAminoAcid converts a one letter amino acid byte into an AminoAcid type.

func StringToAminoAcid

func StringToAminoAcid(s string, singleLetter bool) []AminoAcid

StringToAminoAcid converts a string into a slice of AminoAcids. If singleLetter is false, the input string is parsed using the three letter code.

func ThreeLetterToAminoAcid

func ThreeLetterToAminoAcid(s string) AminoAcid

ThreeLetterToAminoAcid converts a three letter amino acid string into an AminoAcid type.

func TranslateCodon

func TranslateCodon(c Codon) AminoAcid

TranslateCodon converts an individual Codon into the corresponding AminoAcid type.

func TranslateSeq

func TranslateSeq(b []Base) []AminoAcid

TranslateSeq takes a sequence of DNA bases and translates it into a slice of AminoAcids. Input expects bases to be in-frame. If the length of the input sequence is not a multiple of three, the function will panic.

func TranslateSeqToTer

func TranslateSeqToTer(b []Base) []AminoAcid

TranslateSeqToTer takes a sequence of DNA bases and translates it into a slice of AminoAcids. Translation ends after the first stop codon is reached, and the function returns the protein sequence including the trailing stop codon. Any bases beyond the stop codon, and any bases remaining after all 3-base codons have been made, are ignored.
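
The translate-until-stop behavior can be sketched as a loop over complete codons that breaks once a stop is appended. The code below uses local stand-in types and a hypothetical three-codon excerpt table (miniCode), so it returns an error for codons outside that excerpt. That error path is an artifact of the sketch and is not documented behavior of the package.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented types.
type Base byte
type AminoAcid byte
type Codon [3]Base

const (
	A Base = iota
	C
	G
	T
)

const (
	Met  AminoAcid = 12
	Trp  AminoAcid = 17
	Stop AminoAcid = 20
)

// miniCode is a hypothetical excerpt; a full table covers all 64 codons.
var miniCode = map[Codon]AminoAcid{
	{A, T, G}: Met,
	{T, G, G}: Trp,
	{T, A, A}: Stop,
}

// translateToTer translates complete codons until the first stop codon,
// which is included in the result. Leftover bases are ignored.
func translateToTer(seq []Base) ([]AminoAcid, error) {
	var peptide []AminoAcid
	for i := 0; i+3 <= len(seq); i += 3 {
		codon := Codon{seq[i], seq[i+1], seq[i+2]}
		aa, ok := miniCode[codon]
		if !ok {
			return nil, fmt.Errorf("codon %v is not in the excerpt table", codon)
		}
		peptide = append(peptide, aa)
		if aa == Stop {
			break
		}
	}
	return peptide, nil
}

func main() {
	// ATG TGG TAA ATG G: translation stops at TAA.
	seq := []Base{A, T, G, T, G, G, T, A, A, A, T, G, G}
	peptide, err := translateToTer(seq)
	fmt.Println(peptide, err) // prints "[12 17 20] <nil>"
}
```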

type Base

type Base byte

Base stores a single nucleotide as a byte.

const (
	A      Base = 0
	C      Base = 1
	G      Base = 2
	T      Base = 3
	N      Base = 4
	LowerA Base = 5
	LowerC Base = 6
	LowerG Base = 7
	LowerT Base = 8
	LowerN Base = 9
	Gap    Base = 10
	Dot    Base = 11
	Nil    Base = 12
)
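
Because Base is a byte with small integer values, printing a []Base directly shows numbers rather than letters. One way to turn those values back into characters is a lookup string indexed by the Base value, sketched below. The symbols chosen for Gap, Dot, and Nil ('-', '.', and '*') are assumptions of this sketch, as are the helper names toRune and toString.

```go
package main

import (
	"fmt"
	"strings"
)

// Base is a local stand-in for the documented type.
type Base byte

// Values follow the Base constants documented for this package.
const (
	A      Base = 0
	C      Base = 1
	G      Base = 2
	T      Base = 3
	N      Base = 4
	LowerA Base = 5
	LowerC Base = 6
	LowerG Base = 7
	LowerT Base = 8
	LowerN Base = 9
	Gap    Base = 10
	Dot    Base = 11
	Nil    Base = 12
)

// symbols is indexed by Base value. The last three characters are assumed.
const symbols = "ACGTNacgtn-.*"

// toRune returns the character for a base, or an error for an unknown value.
func toRune(b Base) (rune, error) {
	if int(b) >= len(symbols) {
		return 0, fmt.Errorf("unrecognized base value %d", b)
	}
	return rune(symbols[b]), nil
}

// toString converts a slice of bases into a string.
func toString(bases []Base) (string, error) {
	var sb strings.Builder
	for _, b := range bases {
		r, err := toRune(b)
		if err != nil {
			return "", err
		}
		sb.WriteRune(r)
	}
	return sb.String(), nil
}

func main() {
	s, err := toString([]Base{A, C, LowerG, LowerT, Gap})
	fmt.Println(s, err) // prints "ACgt- <nil>"
}
```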

func ByteSliceToDnaBases

func ByteSliceToDnaBases(b []byte) []Base

ByteSliceToDnaBases will convert a slice of bytes into a slice of Bases.

func ByteToBase

func ByteToBase(b byte) (Base, error)

ByteToBase converts a byte into a dna.Base if it matches one of the acceptable DNA characters. Note: it also masks lowercase values and returns them as uppercase bases. Note: '*', used by VCF to denote deleted alleles, becomes a Gap in DNA.

func CodonsToBases

func CodonsToBases(c []Codon) []Base

CodonsToBases converts a slice of Codons into a slice of DNA bases.

func ComplementSingleBase

func ComplementSingleBase(b Base) Base

ComplementSingleBase returns the nucleotide complementary to the input base.

func CreateAllGaps

func CreateAllGaps(numGaps int) []Base

CreateAllGaps creates a DNA sequence of Gap with length of numGaps.

func CreateAllNs

func CreateAllNs(numGaps int) []Base

CreateAllNs creates a DNA sequence of N with length of numGaps.

func Delete

func Delete(seq []Base, delStart int, delEnd int) []Base

Delete removes bases from a sequence of bases. All base positions are zero-based and left closed, right open.

func Extract

func Extract(rec []Base, start int, end int) []Base

Extract returns a subsequence of an input slice of DNA bases from an input start and end point.

func Insert

func Insert(seq []Base, insPos int, insSeq []Base) []Base

Insert adds bases to a sequence of bases. The insertion position is zero-based, and the insertion happens before the specified base. Giving the length of the sequence as the position puts the insertion at the end.

func RemoveBase

func RemoveBase(bases []Base, baseToRemove Base) []Base

RemoveBase returns a sequence of bases without any of the designated base.

func RemoveGaps

func RemoveGaps(bases []Base) []Base

RemoveGaps returns a sequence of bases with no gaps.

func Replace

func Replace(seq []Base, start int, end int, insSeq []Base) []Base

Replace performs both a deletion and an insertion, replacing the input interval with the input insSeq. All base positions are zero-based and left closed, right open.
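
The coordinate conventions shared by Delete, Insert, and Replace can be sketched with plain slice operations. The helpers below (deleteRange, insertAt, replaceRange) are hypothetical and always build a new slice. Whether the package functions copy or reuse the input's backing array is not stated in this documentation.

```go
package main

import "fmt"

// Base is a local stand-in for the documented type.
type Base byte

const (
	A Base = iota
	C
	G
	T
)

// deleteRange returns a copy of seq without seq[start:end].
func deleteRange(seq []Base, start, end int) []Base {
	out := make([]Base, 0, len(seq)-(end-start))
	out = append(out, seq[:start]...)
	return append(out, seq[end:]...)
}

// insertAt returns a copy of seq with ins placed before index pos.
// Passing pos == len(seq) appends ins at the end.
func insertAt(seq []Base, pos int, ins []Base) []Base {
	out := make([]Base, 0, len(seq)+len(ins))
	out = append(out, seq[:pos]...)
	out = append(out, ins...)
	return append(out, seq[pos:]...)
}

// replaceRange deletes seq[start:end] and inserts ins in its place.
func replaceRange(seq []Base, start, end int, ins []Base) []Base {
	return insertAt(deleteRange(seq, start, end), start, ins)
}

func main() {
	seq := []Base{A, C, G, T}
	fmt.Println(deleteRange(seq, 1, 3))                   // prints "[0 3]"
	fmt.Println(insertAt(seq, len(seq), []Base{A}))       // prints "[0 1 2 3 0]"
	fmt.Println(replaceRange(seq, 1, 3, []Base{T, T, T})) // prints "[0 3 3 3 3]"
}
```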

func ReverseComplementAndCopy added in v1.0.1

func ReverseComplementAndCopy(bases []Base) []Base

ReverseComplementAndCopy returns a reverse-complemented copy of a sequence of bases. Used to switch strands and maintain 5' -> 3' orientation.

func RuneToBase

func RuneToBase(r rune) (Base, error)

RuneToBase converts a rune into a dna.Base if it matches one of the acceptable DNA characters. Note: '*', used by VCF to denote deleted alleles, becomes Nil.

func StringToBase

func StringToBase(s string) Base

StringToBase parses a string into a single DNA base.

func StringToBases

func StringToBases(s string) []Base

StringToBases parses a string into a slice of DNA bases.

Example
var stringSeq string
stringSeq = "ACGT"
fmt.Println(stringSeq)

var baseSeq []Base
baseSeq = StringToBases(stringSeq)
fmt.Println(baseSeq)
Output:

ACGT
[0 1 2 3]

func StringToBasesForced

func StringToBasesForced(s string) []Base

StringToBasesForced parses a string into a slice of DNA bases and N-masks any invalid characters.
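
N-masking means that unrecognized characters are replaced with N instead of causing a failure. The sketch below illustrates the idea with a local lookup table. The set of accepted characters in the table and the name forcedBases are assumptions of this sketch, and the package may recognize additional symbols.

```go
package main

import "fmt"

// Base is a local stand-in for the documented type.
type Base byte

// Values follow the Base constants documented for this package.
const (
	A      Base = 0
	C      Base = 1
	G      Base = 2
	T      Base = 3
	N      Base = 4
	LowerA Base = 5
	LowerC Base = 6
	LowerG Base = 7
	LowerT Base = 8
	LowerN Base = 9
	Gap    Base = 10
)

// known lists the characters this sketch accepts (an assumed set).
var known = map[rune]Base{
	'A': A, 'C': C, 'G': G, 'T': T, 'N': N,
	'a': LowerA, 'c': LowerC, 'g': LowerG, 't': LowerT, 'n': LowerN,
	'-': Gap,
}

// forcedBases converts s into bases, substituting N for any character
// that is not in the known table.
func forcedBases(s string) []Base {
	out := make([]Base, 0, len(s))
	for _, r := range s {
		b, ok := known[r]
		if !ok {
			b = N
		}
		out = append(out, b)
	}
	return out
}

func main() {
	fmt.Println(forcedBases("ACxgT")) // prints "[0 1 4 7 3]"
}
```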

func ToLower

func ToLower(b Base) Base

ToLower changes the input base to lowercase.

func ToUpper

func ToUpper(b Base) Base

ToUpper changes the input base to uppercase.

type Codon

type Codon [3]Base

Codon is an array of three DNA bases for genetic analysis of proteins and amino acids.

func BasesToCodons

func BasesToCodons(b []Base) []Codon

BasesToCodons converts a slice of DNA bases into a slice of Codons. Input expects bases to be in-frame. If the length of the input sequence is not a multiple of three, the function will panic.

func BasesToCodonsIgnoreRemainder

func BasesToCodonsIgnoreRemainder(b []Base) []Codon

BasesToCodonsIgnoreRemainder converts a slice of DNA bases into a slice of Codons. Any bases remaining after all 3-base codons have been assembled will be ignored.
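
Grouping bases into codons while dropping the remainder amounts to stepping through the slice three at a time and stopping when fewer than three bases are left. A minimal sketch with local stand-in types follows. The name toCodons is hypothetical.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented types.
type Base byte
type Codon [3]Base

const (
	A Base = iota
	C
	G
	T
)

// toCodons groups b into consecutive triplets. The loop condition
// i+3 <= len(b) leaves out any trailing one or two bases.
func toCodons(b []Base) []Codon {
	codons := make([]Codon, 0, len(b)/3)
	for i := 0; i+3 <= len(b); i += 3 {
		codons = append(codons, Codon{b[i], b[i+1], b[i+2]})
	}
	return codons
}

func main() {
	// ATG TGG TA: the final two bases do not form a codon.
	seq := []Base{A, T, G, T, G, G, T, A}
	fmt.Println(toCodons(seq)) // prints "[[0 3 2] [3 2 2]]"
}
```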
