Documentation ¶
Overview ¶
Package popgen contains functions for analyzing population genetic statistics from variation or alignment data.
Package popgen contains tools for population genetic analysis, specifically for selection and population structure.
Index ¶
- Constants
- func AfsDivergenceAscertainmentFixedAlpha(afs Afs, alpha float64, binomMap [][]float64, d int, integralError float64) float64
- func AfsDivergenceAscertainmentFixedAlphaClosure(afs Afs, binomMap [][]float64, d int, integralError float64) func(float64) float64
- func AfsDivergenceAscertainmentLikelihood(afs Afs, alpha []float64, binomMap [][]float64, d int, integralError float64) float64
- func AfsLikelihood(afs Afs, alpha []float64, binomMap [][]float64, integralError float64) float64
- func AfsLikelihoodFixedAlpha(afs Afs, alpha float64, binomMap [][]float64, integralError float64) float64
- func AfsLikelihoodFixedAlphaClosure(afs Afs, binomMap [][]float64, integralError float64) func(float64) float64
- func AfsSampleDensity(n int, k int, alpha float64, binomMap [][]float64, integralError float64) float64
- func AfsStationarity(p float64, alpha float64) float64
- func AfsStationarityClosure(alpha float64) func(float64) float64
- func AfsToFrequency(a Afs) []float64
- func AlleleFrequencyProbability(i int, n int, alpha float64, binomMap [][]float64, integralError float64) float64
- func AlleleFrequencyProbabilityAncestralAscertainment(alpha float64, i int, n int, d int, binomCache [][]float64, ...) float64
- func AlleleFrequencyProbabilityDerivedAscertainment(alpha float64, i int, n int, d int, binomCache [][]float64, ...) float64
- func AncestralAscertainmentDenominator(fCache []float64, fCacheSum float64, d int) float64
- func AncestralAscertainmentProbability(n int, i int, d int) float64
- func BuildBinomCache(allN []int) [][]float64
- func BuildFCache(n int, alpha float64, binomCache [][]float64, integralError float64) []float64
- func BuildLikelihoodCache(allN []int) [][]float64
- func DerivedAscertainmentDenominator(fCache []float64, fCacheSum float64, d int) float64
- func DerivedAscertainmentProbability(n int, i int, d int) float64
- func Dunn(b bed.Bed, aln []fasta.Fasta, g []*Group, realign bool) (float64, int, string)
- func FIntegralComponent(n int, k int, alpha float64, binomMap [][]float64) func(float64) float64
- func FilterMultByGroup(aln []fasta.Fasta, g []*Group) []fasta.Fasta
- func FindMissingGroupMembers(aln []fasta.Fasta, g []*Group) string
- func GetFCacheSum(fCache []float64) float64
- func GroupCompare(a *Group, b *Group) int
- func GroupContains(g *Group, s string) bool
- func GroupListsAreEqual(a []*Group, b []*Group) bool
- func GroupListsCompare(a []*Group, b []*Group) int
- func GroupsContains(g []*Group, s string) bool
- func InvertSegSite(s *SegSite)
- func MetropolisAccept(old Theta, thetaPrime Theta, s McmcSettings) bool
- func MetropolisHastings(data Afs, outFile string, s McmcSettings)
- func PlotAfsF(alpha float64, n int, outFile string, integralError float64)
- func PlotAfsLikelihood(afs Afs, outFile string, leftBound float64, rightBound float64, numPoints int, ...)
- func PlotAfsPmf(alpha float64, n int, outFile string, integralError float64, derived bool, ...)
- func PlotAncestralAscertainmentDenominator(outFile string, n int, d int, alpha float64, integralError float64)
- func PlotAncestralAscertainmentProbability(outFile string, n int, d int)
- func PlotDerivedAscertainmentDenominator(outFile string, n int, d int, alpha float64, integralError float64)
- func PlotDerivedAscertainmentProbability(outFile string, n int, d int)
- func PosteriorOdds(old Theta, thetaPrime Theta) float64
- func SegSiteToAlleleArray(s *SegSite) []int16
- func SelectionMaximumLikelihoodEstimate(data Afs, s MleSettings) float64
- func SimulateGenotype(alpha float64, n int, boundAlpha float64, boundBeta float64, ...) ([]vcf.Sample, bool)
- func StationaritySampler(alpha float64, samples int, maxSampleDepth int, bins int, xLeft float64, ...) []float64
- func VcfSampleDerivedAlleleFrequency(v vcf.Vcf) float64
- func WriteTSV(outFile string, wf WrightFisherPopData)
- type Afs
- type Group
- type LikelihoodFunction
- type McmcSettings
- type MleSettings
- type SegSite
- type Theta
- type WrightFisherPopData
- type WrightFisherSettings
Constants ¶
const IntegralBound float64 = 1e-12
Variables ¶
This section is empty.
Functions ¶
func AfsDivergenceAscertainmentFixedAlpha ¶
func AfsDivergenceAscertainmentFixedAlpha(afs Afs, alpha float64, binomMap [][]float64, d int, integralError float64) float64
AfsDivergenceAscertainmentFixedAlpha returns the likelihood of observing a particular derived allele frequency spectrum for a given selection parameter alpha. This is the special case where every segregating site has the same value for selection. Also applies a correction for divergence-based ascertainment bias.
func AfsDivergenceAscertainmentFixedAlphaClosure ¶
func AfsDivergenceAscertainmentFixedAlphaClosure(afs Afs, binomMap [][]float64, d int, integralError float64) func(float64) float64
AfsDivergenceAscertainmentFixedAlphaClosure returns a func(float64) float64 representing the likelihood function for a specific derived allele frequency spectrum with a single selection parameter alpha. Incorporates a site-by-site correction for divergence-based ascertainment bias.
func AfsDivergenceAscertainmentLikelihood ¶
func AfsDivergenceAscertainmentLikelihood(afs Afs, alpha []float64, binomMap [][]float64, d int, integralError float64) float64
AfsDivergenceAscertainmentLikelihood is like AfsLikelihood, but makes a correction for divergence-based ascertainment when variant sets were selected for divergence or identity between two groups of d individuals.
func AfsLikelihood ¶
AfsLikelihood returns P(Data|alpha), or the likelihood of observing a particular allele frequency spectrum given alpha, a vector of selection parameters.
func AfsLikelihoodFixedAlpha ¶
func AfsLikelihoodFixedAlpha(afs Afs, alpha float64, binomMap [][]float64, integralError float64) float64
AfsLikelihoodFixedAlpha calculates the likelihood of observing a particular frequency spectrum for a given alpha, a selection parameter. This represents the special case where every segregating site has the same value for selection, which enables faster, more simplified calculation.
func AfsLikelihoodFixedAlphaClosure ¶
func AfsLikelihoodFixedAlphaClosure(afs Afs, binomMap [][]float64, integralError float64) func(float64) float64
AfsLikelihoodFixedAlphaClosure returns a func(float64) float64 representing the likelihood function for a specific derived allele frequency spectrum with a single selection parameter alpha.
func AfsSampleDensity ¶
func AfsSampleDensity(n int, k int, alpha float64, binomMap [][]float64, integralError float64) float64
AfsSampleDensity (also referred to as the F function) is the product of the stationarity and binomial distributions integrated over p, the allele frequency.
func AfsStationarity ¶
AfsStationarity returns the function value from a stationarity distribution with selection parameter alpha from a particular input allele frequency p.
func AfsStationarityClosure ¶
AfsStationarityClosure returns a func(float64)float64 for a stationarity distribution with a fixed alpha value for subsequent integration.
func AfsToFrequency ¶
AfsToFrequency converts an allele frequency spectrum into allele frequencies. Useful for constructing subsequent AFS histograms.
func AlleleFrequencyProbability ¶
func AlleleFrequencyProbability(i int, n int, alpha float64, binomMap [][]float64, integralError float64) float64
AlleleFrequencyProbability returns the probability of observing i out of n alleles from a stationarity distribution with selection parameter alpha.
func AlleleFrequencyProbabilityAncestralAscertainment ¶
func AlleleFrequencyProbabilityAncestralAscertainment(alpha float64, i int, n int, d int, binomCache [][]float64, integralError float64) float64
AlleleFrequencyProbabilityAncestralAscertainment returns P(i | Asc, alpha) when the variant set has an ancestral allele ascertainment bias.
func AlleleFrequencyProbabilityDerivedAscertainment ¶
func AlleleFrequencyProbabilityDerivedAscertainment(alpha float64, i int, n int, d int, binomCache [][]float64, integralError float64) float64
AlleleFrequencyProbabilityDerivedAscertainment returns P(i | Asc, alpha) when the variant set has a derived allele ascertainment bias.
func AncestralAscertainmentDenominator ¶
AncestralAscertainmentDenominator is P(Asc | alpha), a constant normalizing factor for P(i | n, alpha) with the ancestral ascertainment correction.
func AncestralAscertainmentProbability ¶
AncestralAscertainmentProbability returns P(Asc | i) for ancestral allele ascertainment corrections.
func BuildBinomCache ¶
BuildBinomCache makes a 2D matrix where each entry binomCache[n][k] is equal to [n choose k] in logSpace.
func BuildFCache ¶
BuildFCache builds a slice of len(n) where each index i contains log(F(i | n, alpha)), where F is popgen.AFSSampleDensity.
func BuildLikelihoodCache ¶
BuildLikelihoodCache constructs a cache of likelihood values for MLE. likelihoodCache[n][i] stores the likelihood for a segregating site of n alleles with i in the derived state for a particular selection parameter alpha.
func DerivedAscertainmentDenominator ¶
DerivedAscertainmentDenominator is P(Asc | alpha), a constant normalizing factor for P(i | n, alpha) with the derived ascertainment correction.
func DerivedAscertainmentProbability ¶
DerivedAscertainmentProbability returns P(Asc | i) for derived allele ascertainment corrections.
func Dunn ¶
Dunn takes a region of the genome, a multiple alignment, a set of groups, and an option to realign the bed region of the alignment before calculating the Dunn Index. The returns are the dunn index as a float64, the number of segregating sites considered, and missing group members as a string. Mathematical details of the Dunn Index are described at https://en.wikipedia.org/wiki/Dunn_index.
func FIntegralComponent ¶
FIntegralComponent is a helper function of AfsSampleDensity and represents the component within the integral.
func FilterMultByGroup ¶
FilterMultByGroup takes in a multiFa alignment returns a multiFa containing only the entries that are contained in an input slice of Group structs.
func FindMissingGroupMembers ¶
FindMissingGroupMembers returns a string of all of the entries in a Group slice that are not contained in the names of a multiFa alignment.
func GetFCacheSum ¶
GetFCacheSum uses the fCAche built in BuildFCache and calculates the sum from j=1 to n-1.
func GroupCompare ¶
GroupCompare compares two Groups a and b for sorting or equality testing.
func GroupContains ¶
GroupContains returns true if an input string s is contained within the members of group g, false otherwise.
func GroupListsAreEqual ¶
GroupListsAreEqual returns true if all the groups in a list of groups are equal to another list of groups, false otherwise.
func GroupListsCompare ¶
GroupListsCompare compares two slices of Groups a and b for sorting and equality testing.
func GroupsContains ¶
GroupsContains returns true if any groups within a slice of groups g contains a string s, false otherwise.
func InvertSegSite ¶
func InvertSegSite(s *SegSite)
InvertSegSite reverses the polarity of a segregating site in place.
func MetropolisAccept ¶
func MetropolisAccept(old Theta, thetaPrime Theta, s McmcSettings) bool
MetropolisAccept is a helper function of MetropolisHastings that determines whether to accept or reject a candidate parameter set.
func MetropolisHastings ¶
func MetropolisHastings(data Afs, outFile string, s McmcSettings)
MetropolisHastings implements the MH algorithm for Markov Chain Monte Carlo approximation of the posterior distribution for selection based on an input allele frequency spectrum. muZero and sigmaZero represent the starting hyperparameter values.
func PlotAfsF ¶
PlotAfsF writes the Allele Frequency F function (AfsSampleDensity) to an output file for downstream visualization.
func PlotAfsLikelihood ¶
func PlotAfsLikelihood(afs Afs, outFile string, leftBound float64, rightBound float64, numPoints int, integralError float64, divergenceAscertainment bool, D int)
PlotAfsLikelihood plots the likelihood function from input Afs data. Specifies a left and right bound for plotting and a number of points to be plotted. Optiosn available for applying the divergence based ascertainment bias correction, changing the ascertainment subset size, and changing the integral error. This program will fatal if it is asked to plot the likelihood when alpha is equal to exactly 0.
func PlotAfsPmf ¶
func PlotAfsPmf(alpha float64, n int, outFile string, integralError float64, derived bool, ancestral bool)
PlotAfsPmf writes the allele frequency probability mass function (AlleleFrequencyProbability) to an output file for downstream visualization. Derived or ancestral flag enable visualization of the AfsPmf with the divergence-based ascertainment correction.
func PlotAncestralAscertainmentDenominator ¶
func PlotAncestralAscertainmentDenominator(outFile string, n int, d int, alpha float64, integralError float64)
PlotAncestralAscertainmentDenominator takes the path to an output file and prints the values of AncestralAscertainmentDenominator to the file for the parameters provided.
func PlotAncestralAscertainmentProbability ¶
PlotAncestralAscertainmentProbability takes the path to an output file and prints the values of AncestralAscertainmentProbability(n, i, d) for all values of i to the file.
func PlotDerivedAscertainmentDenominator ¶
func PlotDerivedAscertainmentDenominator(outFile string, n int, d int, alpha float64, integralError float64)
PlotDerivedAscertainmentDenominator takes the path to an output file and prints the values of DerivedAscertainmentDenominator to the file, given the input parameters.
func PlotDerivedAscertainmentProbability ¶
PlotDerivedAscertainmentProbability takes the path to an output file and prints the values of DerivedAscertainmentProbability(n, i, d) for all values of i to the file.
func PosteriorOdds ¶
PosteriorOdds is a helper function of MetropolisAccept that returns the Bayes Factor times the Prior Odds this should be the probability of accepting (can be greater than 1) if the Hastings Ratio is one.
func SegSiteToAlleleArray ¶
SegSiteToAlleleArray is a helper function of SimulateGenotype that takes a SegSite, constructs and array of values with i values set to 1 and n-i values set to 0. The array is then shuffled and returned.
func SelectionMaximumLikelihoodEstimate ¶
func SelectionMaximumLikelihoodEstimate(data Afs, s MleSettings) float64
SelectionMaximumLikelihoodEstimate performs MLE on an input allele frequency spectrum and writes the result to an output file.
func SimulateGenotype ¶
func SimulateGenotype(alpha float64, n int, boundAlpha float64, boundBeta float64, boundMultiplier float64) ([]vcf.Sample, bool)
SimulateGenotype returns a slice of type vcf.GenomeSample, representing a Sample field of a vcf struct, with an allele frequency drawn from a stationarity distribution with selection parameter alpha. Second return is true if the current genotype is a divergent base.
func StationaritySampler ¶
func StationaritySampler(alpha float64, samples int, maxSampleDepth int, bins int, xLeft float64, xRight float64) []float64
StationaritySampler returns an allele frequency i out of n individuals sampled from a stationarity distribution with selection parameter alpha.
func VcfSampleDerivedAlleleFrequency ¶
VcfSampleDerivedAlleleFrequency returns the derived allele frequency based on the sample columns of a VCF variant.
func WriteTSV ¶
func WriteTSV(outFile string, wf WrightFisherPopData)
WriteTSV writes a .tsv file based on WrightFisherPopData, including: Comments starts with # that include metadata about the parameters of the simulation Header line: Gen Site Freq.A Freq.C Freq.G Freq.T Ancestral Frequencies table.
Types ¶
type Afs ¶
type Afs struct {
Sites []*SegSite
}
Afs is a struct that contains an Allele Frequency Spectrum, or a set of segregating sites.
func MultiFaToAfs ¶
MultiFaToAfs constructs an allele frequency spectrum struct from a multiFa alignment block.
func VcfToAfs ¶
func VcfToAfs(filename string, UnPolarized bool, DivergenceAscertainment bool, IncludeRef bool) (*Afs, error)
VcfToAfs reads in a vcf file, parses the genotype information, and constructs an AFS struct. Polarized flag, when true, returns only variants with the ancestor annotated in terms of polarized, derived allele frequencies. IncludeRef, when true, uses the reference genome as an additional data point for the output AFS.
type Group ¶
Group is a struct with associated functions that provide a system to partition a list of samples or species for subsequent analysis of subsections of the overall data.
func ReadGroups ¶
ReadGroups parses a Group format file into a slice of Group structs.
type LikelihoodFunction ¶
type LikelihoodFunction byte
const ( Uncorrected LikelihoodFunction = iota Ancestral Derived )
type McmcSettings ¶
type McmcSettings struct { Iterations int MuStep float64 MuZero float64 SigmaStep float64 SigmaZero float64 RandSeed bool //pseudorandom number generation control: set the seed as current unix time SetSeed int64 //pseudorandom number generation control: set the seed to a user-specified int64. UnPolarized bool DivergenceAscertainment bool FixedSigma bool D int //D is the size of the ascertainment subset. IntegralError float64 Verbose int SigmaPriorAlpha float64 //defines the alpha parameter for the prior distribution for the Theta hyperparameter sigma. SigmaPriorBeta float64 //defines the beta parameter for the prior distribution for the Theta hyperparameter sigma. MuPriorMean float64 //defines the mean of the prior distribution for the Theta hyperparameter mu. MuPriorSigma float64 //defines the standard deviation of the prior distribution for the Theta hyperparameter mu. IncludeRef bool //If true, includes the reference genome allele state as a datapoint in the allele frequency spectrum. }
McmcSettings is a struct that stores settings for the various Mcmc helper functions.`.
type MleSettings ¶
type MleSettings struct { Left float64 //left bound of MLE search space Right float64 //right bound of MLE search space Error float64 //desired accuracy in output maximum likelihood estimate UnPolarized bool //disable the need for ancestor annotation in the vcf file. Assumes the reference allele is in the ancestral state. Use with caution. DivergenceAscertainment bool //if true, use the divergence-based ascertainment bias-corrected likelihood function for selection. D int //for DivergenceAscertainment, set the size of the ascertainment subset. IntegralError float64 //set the acceptable error in the internal integral calculations in the likelihood function. Verbose int //default 0. When set to 1, debug prints appear in standard output. IncludeRef bool //Includes the reference genome allele state in the derived allele frequency spectrum. }
MleSettings delineates the experimental parameters for running maximum likelihood estimation on a set of variants using selectionMLE.
type SegSite ¶
type SegSite struct { I int //individuals with the allele N int //total number of individuals L LikelihoodFunction //specifies with likelihood function to use. 1 for ancestral, 0 for uncorrected, 2 for derived. }
SegSite is the basic struct for segregating sites, used to construct allele frequency spectra.
func SimulateSegSite ¶
func SimulateSegSite(alpha float64, n int, boundAlpha float64, boundBeta float64, boundMultiplier float64) (*SegSite, bool)
SimulateSegSite returns a segregating site with a non-zero allele frequency sampled from a stationarity distribution with selection parameter alpha. the second returns true if the site is divergent.
func VcfSampleToSegSite ¶
func VcfSampleToSegSite(i vcf.Vcf, DivergenceAscertainment bool, UnPolarized bool, IncludeRef bool) (*SegSite, bool)
VcfSampleToSegSite returns a SegSite struct from an input Vcf entry. Enables flag for divergenceBasedAscertainment correction conditions and the unPolarized condition. Two returns: a pointer to the SegSite struct, and a bool that is true if the SegSite was made without issue, false for soft errors. If IncludeRef is true, adds the reference allele as an extra data point to the SegSite.
type Theta ¶
type Theta struct {
// contains filtered or unexported fields
}
Theta is a struct that stores parameter sets, including the alpha vector, mu, and sigma parameters, along with the likelihood of a particular parameter set for MCMC.
func GenerateCandidateThetaPrime ¶
func GenerateCandidateThetaPrime(t Theta, data Afs, binomCache [][]float64, s McmcSettings) Theta
GenerateCandidateThetaPrime is a helper function of Metropolis Hastings that picks a new set of parameters based on the state of the current parameter set t. TODO: We could avoid some memory allocations by passing in an "old" theta and overwriting the values.
func InitializeTheta ¶
func InitializeTheta(m float64, sig float64, data Afs, binomCache [][]float64, s McmcSettings) Theta
InitializeTheta is a helper function of Metropolis Hastings that generates the initial value of theta based on argument values.