kmers

package
v0.0.0-...-d3d09aa
Published: Mar 5, 2021 License: GPL-3.0 Imports: 9 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Build2dSlice

func Build2dSlice(rows int, cols int) [][]float64

Build2dSlice builds a 2d slice of float64 of target size
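As an illustration of what such a helper typically does, here is a minimal standalone sketch. The name build2dSlice is hypothetical, and zero-initialized rows with independent backing arrays are an assumption rather than something stated in the package documentation.

```go
package main

import "fmt"

// build2dSlice is a hypothetical stand-in for Build2dSlice: it allocates
// a rows x cols matrix of float64 values, all initialized to zero.
// Each row gets its own backing array, so writes to one row never
// affect another.
func build2dSlice(rows, cols int) [][]float64 {
	m := make([][]float64, rows)
	for i := range m {
		m[i] = make([]float64, cols)
	}
	return m
}

func main() {
	m := build2dSlice(4, 4)
	m[1][2] = 0.25
	fmt.Println(len(m), len(m[0]), m[1][2])
}
```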

func RandGCWeightSeq

func RandGCWeightSeq(seqlen int, bases []string, cumWeights []float64) string

RandGCWeightSeq generates a random sequence weighted by GC content. cumWeights must be cumulative base weights, with 1 as the maximum value.
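The cumulative-weight convention can be illustrated with a short sketch. The helper names pickBase and randWeightedSeq and the seed parameter are hypothetical, and the selection rule (first base whose cumulative weight exceeds a uniform draw) is an assumption about how such weights are usually consumed, not the package's confirmed implementation.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// pickBase returns the first base whose cumulative weight exceeds r,
// where r is expected to lie in [0, 1).
func pickBase(r float64, bases []string, cumWeights []float64) string {
	for i, w := range cumWeights {
		if r < w {
			return bases[i]
		}
	}
	// Guard against a last weight slightly below 1 due to rounding.
	return bases[len(bases)-1]
}

// randWeightedSeq (hypothetical) draws seqlen bases independently
// according to the cumulative weights, using a seeded generator.
func randWeightedSeq(seed int64, seqlen int, bases []string, cumWeights []float64) string {
	rng := rand.New(rand.NewSource(seed))
	var sb strings.Builder
	for i := 0; i < seqlen; i++ {
		sb.WriteString(pickBase(rng.Float64(), bases, cumWeights))
	}
	return sb.String()
}

func main() {
	bases := []string{"A", "C", "G", "T"}
	// 60% GC: A=0.2, C=0.3, G=0.3, T=0.2, accumulated left to right.
	cum := []float64{0.2, 0.5, 0.8, 1.0}
	fmt.Println(randWeightedSeq(1, 20, bases, cum))
}
```

For a target GC fraction gc, one plausible choice of per-base weights is (1-gc)/2 for A and T and gc/2 for G and C, accumulated in the order of bases.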

func RandSeqs

func RandSeqs(nseq int, seqlen int, bases []string, gc float64) []string

RandSeqs generates random sequences with a target GC content. NOTE: Change randstring, which is currently not weighted.

func ScoreSeqs

func ScoreSeqs(seqs []string, genome *Genome) ([]float64, []float64)

ScoreSeqs assigns two scores to each sequence in a list. Scores vary between 0 and 1. The first score only takes k-mer frequency into account, while the second score is adjusted for GC content divergence from the target genome. Rare k-mers increase the score, and deviation from the genome's GC content decreases it.

func SeqGC

func SeqGC(seq string) int

SeqGC returns the number of GC bases in a sequence. Does not handle IUPAC ambiguous bases.
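A minimal sketch of this kind of count follows. The name countGC is hypothetical, and counting both upper- and lowercase letters is an assumption, since the documentation does not say how case is handled. Ambiguity codes such as S (G or C) are deliberately ignored, matching the note above.

```go
package main

import "fmt"

// countGC (hypothetical) counts G and C bases in seq.
// IUPAC ambiguity codes such as S or N are not counted.
func countGC(seq string) int {
	n := 0
	for i := 0; i < len(seq); i++ {
		switch seq[i] {
		case 'G', 'C', 'g', 'c':
			n++
		}
	}
	return n
}

func main() {
	seq := "ATGCGCNS"
	gc := countGC(seq)
	// The count is an int; divide by the length to get a GC fraction.
	fmt.Printf("%d GC bases, fraction %.2f\n", gc, float64(gc)/float64(len(seq)))
}
```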

Types

type Chain

type Chain struct {
	Matrix [][]float64    // Markov state transition matrix Lmers -> alphabet
	Lidx   map[string]int // Correspondence between lmers (l=k-1) and Chain's rows
	Bidx   map[string]int // Correspondence between Bases and Chain's cols

}

Chain contains a Markov chain of order l, where l = k-1, giving transition probabilities for the next base. It also has two maps matching l-mers and bases to the row and column indices of the matrix.
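To show how the two index maps relate to the matrix, here is a small sketch for k = 2 (so l = 1). The local chain type copies the documented fields; the prob method, the exampleChain helper, and the probability values are all made up for illustration and are not part of the package.

```go
package main

import "fmt"

// chain mirrors the exported fields of the documented Chain struct.
type chain struct {
	Matrix [][]float64    // rows: l-mers, columns: bases
	Lidx   map[string]int // l-mer -> row index
	Bidx   map[string]int // base -> column index
}

// prob (hypothetical helper) returns P(base | lmer), or 0 when either
// key is missing from the index maps.
func (c chain) prob(lmer, base string) float64 {
	row, ok := c.Lidx[lmer]
	if !ok {
		return 0
	}
	col, ok := c.Bidx[base]
	if !ok {
		return 0
	}
	return c.Matrix[row][col]
}

// exampleChain builds a first-order chain with invented probabilities.
// Each row sums to 1.
func exampleChain() chain {
	idx := map[string]int{"A": 0, "C": 1, "G": 2, "T": 3}
	return chain{
		Matrix: [][]float64{
			{0.5, 0.25, 0.125, 0.125}, // after A
			{0.25, 0.25, 0.25, 0.25},  // after C
			{0.25, 0.25, 0.25, 0.25},  // after G
			{0.125, 0.125, 0.25, 0.5}, // after T
		},
		Lidx: idx,
		Bidx: idx,
	}
}

func main() {
	c := exampleChain()
	fmt.Println(c.prob("A", "A"), c.prob("T", "G"))
}
```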

type Genome

type Genome struct {
	GC       float64        // GC content between 0 and 1
	KmerSize int            // Length of kmers to consider
	Kmers    map[string]int // All kmers and their frequencies
	Bases    []string
	GCWeight float64 // Importance given to GC content of simulated sequences
	Chain    Chain   // Struct containing a Markov chain.
	Similar  bool    // Should generated sequences use frequent k-mers? (instead of rare k-mers)
}

Genome holds k-mer information about a genome and a Markov state transition matrix of order l = k-1. Each transition probability is the chance of moving to the next base B given the previous l bases.

func NewGenome

func NewGenome(path string, k int, gcWeight float64, similar bool, FixedGC float64) *Genome

NewGenome constructs a Genome object based on a FASTA file and predefined k-mer size.

func (*Genome) FastaToProfile

func (g *Genome) FastaToProfile(file string)

FastaToProfile parses a FASTA file, fills the k-mer profile and Markov chain of a Genome struct, and sets its GC content.

func (*Genome) FillChain

func (g *Genome) FillChain()

FillChain populates transition probabilities in the l-order Markov chain based on the Genome k-mer profile. Laplace (additive) smoothing is used to avoid getting stuck in a state.
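The smoothing step can be sketched as follows for a single row of the matrix. The function name transitionRow and the pseudocount of 1 are assumptions; the documentation does not state which pseudocount the package uses.

```go
package main

import "fmt"

// transitionRow (hypothetical) computes smoothed probabilities of each
// base following lmer, from k-mer counts where each k-mer is lmer+base:
//
//	P(b | lmer) = (count(lmer+b) + 1) / (sum over b' of count(lmer+b') + len(bases))
//
// The +1 pseudocount keeps every transition strictly positive, so an
// unseen l-mer still yields a usable (uniform) row.
func transitionRow(kmers map[string]int, lmer string, bases []string) []float64 {
	total := 0
	for _, b := range bases {
		total += kmers[lmer+b]
	}
	denom := float64(total + len(bases))
	row := make([]float64, len(bases))
	for i, b := range bases {
		row[i] = float64(kmers[lmer+b]+1) / denom
	}
	return row
}

func main() {
	bases := []string{"A", "C", "G", "T"}
	kmers := map[string]int{"AAA": 3, "AAC": 1}
	fmt.Println(transitionRow(kmers, "AA", bases)) // observed l-mer
	fmt.Println(transitionRow(kmers, "GG", bases)) // unseen l-mer
}
```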

func (*Genome) GenSeqs

func (g *Genome) GenSeqs(nseq int, seqlen int) []string

GenSeqs uses the Markov chain of a Genome object to generate fixed-length sequences. It also adjusts transition probabilities according to the sequence's GC deviation and the weight attributed to GC content.

func (*Genome) GenerateKmers

func (g *Genome) GenerateKmers(k int) []string

GenerateKmers initializes a list of all kmers in alphabetical order. Implemented using recursion.
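A recursive enumeration of this kind might look like the sketch below. The free function generateKmers and its explicit bases parameter are hypothetical (the documented method takes only k and presumably reads the alphabet from the Genome), and alphabetical output relies on bases being passed in sorted order.

```go
package main

import "fmt"

// generateKmers (hypothetical) returns every string of length k over
// bases. If bases is sorted, the result is in alphabetical order,
// because each base is prefixed to the full sorted list of (k-1)-mers.
func generateKmers(bases []string, k int) []string {
	if k <= 0 {
		return []string{""} // base case: the single empty string
	}
	shorter := generateKmers(bases, k-1)
	out := make([]string, 0, len(bases)*len(shorter))
	for _, b := range bases {
		for _, s := range shorter {
			out = append(out, b+s)
		}
	}
	return out
}

func main() {
	kmers := generateKmers([]string{"A", "C", "G", "T"}, 2)
	fmt.Println(len(kmers), kmers)
}
```

The output grows as len(bases)^k, so this is only practical for small k.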

func (*Genome) GetKmers

func (g *Genome) GetKmers(seq string)

GetKmers adds the occurrences of k-mers in an input sequence to the k-mer profile of a Genome instance.
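Counting k-mers into a profile is usually done with a sliding window, as in this sketch. The function countKmers and its explicit profile and k parameters are hypothetical; the documented method stores both on the Genome.

```go
package main

import "fmt"

// countKmers (hypothetical) slides a window of width k over seq and
// increments the count of each k-mer in profile. Calling it again with
// another sequence accumulates into the same profile.
func countKmers(profile map[string]int, seq string, k int) {
	if k <= 0 {
		return
	}
	for i := 0; i+k <= len(seq); i++ {
		profile[seq[i:i+k]]++
	}
}

func main() {
	profile := make(map[string]int)
	countKmers(profile, "ATATA", 2)
	countKmers(profile, "TAC", 2)
	fmt.Println(profile)
}
```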

func (*Genome) SeedSeq

func (g *Genome) SeedSeq() string

SeedSeq picks a k-mer using the k-mer frequencies as probability weights. It uses inverse frequencies if the Similar attribute of the receiver genome is set to false. Note that SeedSeq does not directly take GC content into account when picking a k-mer.
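The weighting can be sketched as below. The helper names seedWeights and pick are hypothetical, and using 1/count as the inverse frequency is an assumption about what "inverse" means here. The sketch also assumes every count is positive. The draw r is passed in explicitly to keep the example deterministic; in practice it would come from a random number generator.

```go
package main

import (
	"fmt"
	"sort"
)

// seedWeights (hypothetical) returns the k-mers in sorted order with one
// weight each: the count when similar is true, 1/count otherwise.
// All counts are assumed to be positive.
func seedWeights(kmers map[string]int, similar bool) ([]string, []float64) {
	keys := make([]string, 0, len(kmers))
	for k := range kmers {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so fix an order
	weights := make([]float64, len(keys))
	for i, k := range keys {
		if similar {
			weights[i] = float64(kmers[k])
		} else {
			weights[i] = 1 / float64(kmers[k])
		}
	}
	return keys, weights
}

// pick maps a uniform draw r in [0, 1) onto the weighted k-mers.
func pick(r float64, keys []string, weights []float64) string {
	total := 0.0
	for _, w := range weights {
		total += w
	}
	target := r * total
	acc := 0.0
	for i, w := range weights {
		acc += w
		if target < acc {
			return keys[i]
		}
	}
	return keys[len(keys)-1]
}

func main() {
	kmers := map[string]int{"AA": 4, "AC": 1}
	keys, w := seedWeights(kmers, true)
	fmt.Println("similar:", pick(0.5, keys, w))
	keys, w = seedWeights(kmers, false)
	fmt.Println("dissimilar:", pick(0.5, keys, w))
}
```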

type SeqsAndScores

type SeqsAndScores struct {
	Seqs       []string
	KmerScores []float64
	FullScores []float64
}

SeqsAndScores and SortByScore define a sorting interface for ordering sequences according to their (full) scores.

type SortByScore

type SortByScore SeqsAndScores

func (SortByScore) Len

func (sbs SortByScore) Len() int

func (SortByScore) Less

func (sbs SortByScore) Less(i, j int) bool

func (SortByScore) Swap

func (sbs SortByScore) Swap(i, j int)
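The three methods above are what the standard sort.Interface requires. The sketch below shows how such a type could be implemented and used. The lowercase type names and the sortBestFirst helper are hypothetical, and comparing FullScores in ascending order inside Less is an assumption; the documentation does not state the direction.

```go
package main

import (
	"fmt"
	"sort"
)

// seqsAndScores mirrors the documented SeqsAndScores struct.
type seqsAndScores struct {
	Seqs       []string
	KmerScores []float64
	FullScores []float64
}

// sortByScore implements sort.Interface over the parallel slices.
type sortByScore seqsAndScores

func (s sortByScore) Len() int { return len(s.Seqs) }

// Less orders by full score, ascending (an assumption).
func (s sortByScore) Less(i, j int) bool { return s.FullScores[i] < s.FullScores[j] }

// Swap must move all three slices together to keep them aligned.
func (s sortByScore) Swap(i, j int) {
	s.Seqs[i], s.Seqs[j] = s.Seqs[j], s.Seqs[i]
	s.KmerScores[i], s.KmerScores[j] = s.KmerScores[j], s.KmerScores[i]
	s.FullScores[i], s.FullScores[j] = s.FullScores[j], s.FullScores[i]
}

// sortBestFirst (hypothetical) sorts in place, highest full score first.
func sortBestFirst(s seqsAndScores) seqsAndScores {
	sort.Sort(sort.Reverse(sortByScore(s)))
	return s
}

func main() {
	s := seqsAndScores{
		Seqs:       []string{"ACGT", "GGCC", "ATAT"},
		KmerScores: []float64{0.5, 0.75, 0.25},
		FullScores: []float64{0.5, 0.25, 0.75},
	}
	s = sortBestFirst(s)
	fmt.Println(s.Seqs, s.FullScores)
}
```

Because the slices inside the struct share their backing arrays with the caller's value, the conversion to sortByScore sorts the original data in place.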
