Documentation
Index
- func Build2dSlice(rows int, cols int) [][]float64
- func RandGCWeightSeq(seqlen int, bases []string, cumWeights []float64) string
- func RandSeqs(nseq int, seqlen int, bases []string, gc float64) []string
- func ScoreSeqs(seqs []string, genome *Genome) ([]float64, []float64)
- func SeqGC(seq string) int
- type Chain
- type Genome
- type SeqsAndScores
- type SortByScore
Functions
func Build2dSlice
func Build2dSlice(rows int, cols int) [][]float64
Build2dSlice builds a 2D slice of float64 with the given number of rows and columns.
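A minimal sketch of what Build2dSlice might look like, assuming the usual Go idiom of allocating each row separately. The zero-initialised result and the independent rows are properties of this sketch, not something the documentation states.

```go
package main

import "fmt"

// Build2dSlice (sketch) returns a rows x cols matrix of zeros. Each row gets
// its own backing array, so writing to one row never affects another.
func Build2dSlice(rows int, cols int) [][]float64 {
	matrix := make([][]float64, rows)
	for i := range matrix {
		matrix[i] = make([]float64, cols)
	}
	return matrix
}

func main() {
	m := Build2dSlice(2, 3)
	m[0][1] = 0.5
	fmt.Println(m) // [[0 0.5 0] [0 0 0]]
}
```

A single backing array sliced into rows would save allocations, but separate rows keep the code simple and match how a transition matrix is typically filled row by row.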
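func RandGCWeightSeq
func RandGCWeightSeq(seqlen int, bases []string, cumWeights []float64) string
RandGCWeightSeq generates a random sequence of length seqlen whose base composition is weighted by GC content. cumWeights must contain cumulative base weights, with 1 as the maximum value.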
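func RandSeqs
func RandSeqs(nseq int, seqlen int, bases []string, gc float64) []string
RandSeqs generates nseq random sequences of length seqlen with target GC content gc. NOTE: change randstring; it is currently not weighted.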
func ScoreSeqs
func ScoreSeqs(seqs []string, genome *Genome) ([]float64, []float64)
ScoreSeqs assigns two scores to each sequence in a list. Scores range from 0 to 1. The first score takes only k-mer frequency into account, while the second is also adjusted for the divergence between the sequence's GC content and that of the target genome. Rare k-mers increase the score, and deviation from the genome's GC content decreases it.
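The documentation states the properties of the scores but not the formula. The `scoreSeqs` function below is a hypothetical scheme that satisfies those properties: rarity is the mean of 1 - count/maxCount over a sequence's k-mers, and the GC-adjusted score multiplies rarity by 1 - |GC(seq) - genomeGC|. It takes a plain k-mer count map instead of a *Genome, and it should not be read as the package's actual method.

```go
package main

import "fmt"

// scoreSeqs is a HYPOTHETICAL scoring scheme, not the package's formula.
// raw: mean over the sequence's k-mers of 1 - count/maxCount, so rare or
// unseen k-mers push it toward 1. adj: raw * (1 - |GC(seq) - genomeGC|), so
// GC deviation lowers it. Both stay within [0, 1].
func scoreSeqs(seqs []string, kmers map[string]int, k int, genomeGC float64) ([]float64, []float64) {
	maxCount := 0
	for _, c := range kmers {
		if c > maxCount {
			maxCount = c
		}
	}
	raw := make([]float64, len(seqs))
	adj := make([]float64, len(seqs))
	for i, s := range seqs {
		n := len(s) - k + 1 // number of k-mer windows
		if n <= 0 || maxCount == 0 {
			continue
		}
		sum := 0.0
		for j := 0; j < n; j++ {
			sum += 1 - float64(kmers[s[j:j+k]])/float64(maxCount)
		}
		raw[i] = sum / float64(n)
		gcCount := 0
		for _, b := range s {
			if b == 'G' || b == 'C' {
				gcCount++
			}
		}
		dev := float64(gcCount)/float64(len(s)) - genomeGC
		if dev < 0 {
			dev = -dev
		}
		adj[i] = raw[i] * (1 - dev)
	}
	return raw, adj
}

func main() {
	kmers := map[string]int{"AA": 10, "AC": 1}
	raw, adj := scoreSeqs([]string{"AAA", "ACA"}, kmers, 2, 0.5)
	fmt.Printf("%.3f %.3f\n", raw[1], adj[1]) // 0.950 0.792
}
```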
Types
type Chain
type Chain struct {
	Matrix [][]float64    // Markov state transition matrix: lmers -> alphabet
	Lidx   map[string]int // Correspondence between lmers (l = k-1) and the Chain's rows
	Bidx   map[string]int // Correspondence between bases and the Chain's columns
}
Chain contains a Markov chain of order l, where l = k-1, giving transition probabilities for the next base. It also has two maps matching lmers and bases to the row and column indices of the chain.
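Looking up a transition probability goes through both maps: Lidx selects the row for the current lmer and Bidx selects the column for the candidate next base. The `transitionProb` method below is a hypothetical helper written for illustration; the struct fields are copied from the documentation.

```go
package main

import "fmt"

// Chain mirrors the documented struct.
type Chain struct {
	Matrix [][]float64    // rows: lmers, columns: bases
	Lidx   map[string]int // lmer -> row index
	Bidx   map[string]int // base -> column index
}

// transitionProb (hypothetical) returns P(base | lmer) and reports whether
// both the lmer and the base are known to the chain.
func (c Chain) transitionProb(lmer, base string) (float64, bool) {
	row, okRow := c.Lidx[lmer]
	col, okCol := c.Bidx[base]
	if !okRow || !okCol {
		return 0, false
	}
	return c.Matrix[row][col], true
}

func main() {
	// Order-1 chain (k = 2) over a two-letter alphabet, for brevity.
	c := Chain{
		Matrix: [][]float64{{0.9, 0.1}, {0.4, 0.6}},
		Lidx:   map[string]int{"A": 0, "C": 1},
		Bidx:   map[string]int{"A": 0, "C": 1},
	}
	p, _ := c.transitionProb("C", "A")
	fmt.Println(p) // 0.4
}
```

Returning an ok flag avoids silently reading row 0 when an lmer is missing, which is what a bare `c.Lidx[lmer]` lookup would do.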
type Genome
type Genome struct {
	GC       float64        // GC content between 0 and 1
	KmerSize int            // Length of k-mers to consider
	Kmers    map[string]int // All k-mers and their frequencies
	Bases    []string
	GCWeight float64 // Importance given to the GC content of simulated sequences
	Chain    Chain   // Struct containing a Markov chain
	Similar  bool    // Whether generated sequences should use frequent k-mers (instead of rare k-mers)
}
Genome holds k-mer information about a genome together with a Markov state transition matrix of order l = k-1, whose transition probabilities give the chance of moving to the next base B given the previous l bases.
func NewGenome
NewGenome constructs a Genome object from a FASTA file and a predefined k-mer size.
func (*Genome) FastaToProfile
FastaToProfile parses a FASTA file, fills the k-mer profile and Markov chain of a Genome struct, and sets its GC content.
func (*Genome) FillChain
func (g *Genome) FillChain()
FillChain populates the transition probabilities of the order-l Markov chain from the Genome's k-mer profile. Laplace smoothing is used so that no transition has zero probability, which keeps the chain from getting stuck in a state.
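With add-one (Laplace) smoothing, each transition probability is the k-mer count plus one, divided by the row total plus the alphabet size. The stand-alone `fillChain` below is a hypothetical sketch of that idea. It takes the k-mer counts, the list of lmers (rows), and the bases (columns) as explicit arguments instead of reading them from a Genome, and the pseudocount of 1 is an assumption.

```go
package main

import "fmt"

// fillChain is a hypothetical sketch of add-one (Laplace) smoothing, not the
// package's FillChain. Row i holds P(b | lmers[i]) for each base b:
//
//	(count(lmer+b) + 1) / (sum over b' of count(lmer+b') + len(bases))
//
// so every transition is strictly positive, even for lmers never observed.
func fillChain(kmers map[string]int, lmers, bases []string) [][]float64 {
	matrix := make([][]float64, len(lmers))
	for i, lmer := range lmers {
		row := make([]float64, len(bases))
		total := 0.0
		for j, b := range bases {
			row[j] = float64(kmers[lmer+b]) + 1 // observed count plus pseudocount
			total += row[j]
		}
		for j := range row {
			row[j] /= total // normalise the row into a probability distribution
		}
		matrix[i] = row
	}
	return matrix
}

func main() {
	// Order-1 chain (k = 2) over a two-letter alphabet; "C" never occurs as a prefix.
	kmers := map[string]int{"AA": 3, "AC": 1}
	m := fillChain(kmers, []string{"A", "C"}, []string{"A", "C"})
	fmt.Println(m[1]) // [0.5 0.5]
}
```

An unseen lmer falls back to a uniform row instead of an all-zero row, which is exactly the state the chain could otherwise never leave.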
func (*Genome) GenSeqs
GenSeqs uses the Markov chain of a Genome object to generate fixed-length sequences. It also adjusts transition probabilities according to the GC deviation of the sequence being generated and the weight given to GC content.
func (*Genome) GenerateKmers
GenerateKmers initializes a list of all k-mers in alphabetical order. It is implemented recursively.
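A recursive enumeration yields alphabetical order for free: every (k-1)-mer, already in order, is extended with each base in order. The stand-alone `generateKmers` below is a hypothetical sketch that takes the alphabet and k as arguments instead of reading them from a Genome, and it assumes the alphabet is sorted.

```go
package main

import "fmt"

// generateKmers (hypothetical) returns every string of length k over bases.
// If bases is sorted, the result is in alphabetical order, because each
// sorted prefix is extended with the bases in order.
func generateKmers(bases []string, k int) []string {
	if k == 0 {
		return []string{""} // base case: the single empty k-mer
	}
	var out []string
	for _, prefix := range generateKmers(bases, k-1) {
		for _, b := range bases {
			out = append(out, prefix+b)
		}
	}
	return out
}

func main() {
	fmt.Println(generateKmers([]string{"A", "C", "G", "T"}, 2))
	// [AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT]
}
```

The output has len(bases)^k entries, so memory grows quickly with k (4^10 is already about a million k-mers).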
func (*Genome) GetKmers
GetKmers adds occurrences of k-mers in input sequences to the k-mer profile of a Genome instance.
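Counting k-mers amounts to sliding a window of width k along each sequence. The `addKmers` sketch below is hypothetical. It assumes the profile was pre-populated with every valid k-mer (for example by an enumeration like GenerateKmers) and skips windows that are not in the profile, such as those containing N. Both of these choices are assumptions, not documented behaviour.

```go
package main

import "fmt"

// addKmers (hypothetical) increments the count of every length-k window in
// seqs. Windows missing from the profile (e.g. containing N) are skipped.
func addKmers(profile map[string]int, seqs []string, k int) {
	for _, s := range seqs {
		for i := 0; i+k <= len(s); i++ {
			kmer := s[i : i+k]
			if _, ok := profile[kmer]; ok {
				profile[kmer]++
			}
		}
	}
}

func main() {
	profile := map[string]int{"AA": 0, "AC": 0, "CA": 0, "CC": 0}
	addKmers(profile, []string{"AACA", "ANC"}, 2)
	fmt.Println(profile) // map[AA:1 AC:1 CA:1 CC:0]
}
```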
type SeqsAndScores
Define a sorting interface to sort sequences according to their (full) scores.
type SortByScore
type SortByScore SeqsAndScores
func (SortByScore) Len
func (sbs SortByScore) Len() int
func (SortByScore) Less
func (sbs SortByScore) Less(i, j int) bool
func (SortByScore) Swap
func (sbs SortByScore) Swap(i, j int)