ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Allele Frequencies
Revision as of 14:55, 18 June 2012
Allele Frequency estimation
- -doMaf [int]
INT=1: BFGS, known minor
INT=2: EM, known minor
INT=4: BFGS, unknown minor
INT=8: EM, unknown minor
INT=16: frequencies from genotype probabilities
Multiple estimators can be used simultaneously by summing the above numbers. Thus -doMaf 7 (1+2+4) will use the first three estimators. If the allele frequencies are estimated from the genotype likelihoods, you also need to infer the major and minor alleles (-doMajorMinor).
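Because each estimator has its own power-of-two value, a summed -doMaf value can be decoded bit by bit. The helper below is hypothetical (it is not part of ANGSD) and simply maps a value back to the estimators listed above; it assumes no values beyond those five flags.

```python
# Estimator values as listed for -doMaf on this page.
DOMAF_FLAGS = {
    1: "BFGS, known minor",
    2: "EM, known minor",
    4: "BFGS, unknown minor",
    8: "EM, unknown minor",
    16: "frequencies from genotype probabilities",
}


def decode_domaf(value):
    """Return the estimators selected by a summed -doMaf value (hypothetical helper)."""
    # The five flags sum to 31, so anything outside 0..31 cannot be a valid sum.
    if not 0 <= value < 32:
        raise ValueError(f"-doMaf value out of range: {value}")
    # An estimator is selected when its bit is set in the summed value.
    return [name for bit, name in DOMAF_FLAGS.items() if value & bit]


if __name__ == "__main__":
    for v in (7, 10):
        print(v, decode_domaf(v))
```

For example, 10 decodes to the two EM estimators (2+8), one with a known and one with an unknown minor allele.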
Allele frequencies from genotype likelihoods
The allele frequency estimators are described in citation. For testing purposes, two optimization methods are available: BFGS and the EM algorithm. The EM algorithm is much faster than BFGS. The allele frequencies are estimated assuming that the site is diallelic; either the major and minor alleles are inferred prior to the estimation, or the uncertainty of the minor allele is incorporated into the model.
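For reference, a standard form of the diallelic likelihood that a known-minor estimator can maximize is sketched below, assuming Hardy-Weinberg equilibrium. Here $N$ is the number of individuals, $X_i$ the sequencing data of individual $i$, $g$ the number of minor alleles in a genotype and $f$ the minor allele frequency; these symbols are chosen for illustration, and the exact expressions in the cited work may differ. BFGS would maximize $\log p(X \mid f)$ numerically, while EM iterates the update on the second line.

```latex
p(X \mid f) = \prod_{i=1}^{N} \sum_{g=0}^{2} p(X_i \mid G_i = g) \binom{2}{g} f^{g} (1-f)^{2-g}

f^{(t+1)} = \frac{1}{2N} \sum_{i=1}^{N}
  \frac{\sum_{g=0}^{2} g \, p(X_i \mid G_i = g) \binom{2}{g} \big(f^{(t)}\big)^{g} \big(1-f^{(t)}\big)^{2-g}}
       {\sum_{g=0}^{2} p(X_i \mid G_i = g) \binom{2}{g} \big(f^{(t)}\big)^{g} \big(1-f^{(t)}\big)^{2-g}}
```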
ML estimator with known minor
First infer the major and minor alleles, then use BFGS optimization (-doMaf 1) or the EM algorithm (-doMaf 2) to estimate the allele frequencies.
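A minimal Python sketch of the EM variant with a known minor allele, assuming Hardy-Weinberg proportions for the genotype prior and genotype likelihoods already reduced to the three diallelic genotypes. The function name, the starting value `f0` and the stopping rule are illustrative choices, not ANGSD's actual implementation.

```python
def em_allele_frequency(gls, f0=0.25, tol=1e-8, max_iter=1000):
    """EM estimate of the minor allele frequency at one diallelic site.

    gls: per-individual triples (p(X|MM), p(X|Mm), p(X|mm)), i.e. genotype
    likelihoods for 0, 1 and 2 copies of the known minor allele.
    The genotype prior is assumed to follow Hardy-Weinberg proportions.
    """
    if not 0.0 < f0 < 1.0:
        raise ValueError("f0 must lie strictly between 0 and 1")
    f, n = f0, len(gls)
    for _ in range(max_iter):
        expected_minor = 0.0
        for l0, l1, l2 in gls:
            # E-step: weight each genotype likelihood by its HWE prior under f.
            w0 = l0 * (1 - f) ** 2
            w1 = l1 * 2 * f * (1 - f)
            w2 = l2 * f ** 2
            # Posterior expected number of minor alleles for this individual.
            expected_minor += (w1 + 2 * w2) / (w0 + w1 + w2)
        # M-step: expected minor alleles divided by the chromosomes sampled.
        f_new = expected_minor / (2 * n)
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


if __name__ == "__main__":
    # Hypothetical, fully certain likelihoods for four individuals: MM, Mm, mm, MM.
    gls = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
    print(em_allele_frequency(gls))  # prints 0.375
```

With certain genotypes the estimate is just the minor allele count divided by the number of chromosomes (3/8); with uncertain likelihoods the E-step spreads each individual over the three genotypes. A BFGS-based estimator would target the same likelihood with a numerical optimizer instead.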
ML estimator with unknown minor
First infer the major allele, then use BFGS optimization (-doMaf 4) or the EM algorithm (-doMaf 8) to estimate the allele frequencies. Here only the major allele needs to be known, and the uncertainty in inferring the minor allele is modelled.
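The page does not spell out how the minor-allele uncertainty enters the model, so the sketch below assumes one simple scheme for illustration: estimate the frequency separately for each of the three candidate minor alleles, then average those estimates with weights proportional to each candidate's maximized likelihood. This weighting, and every function name below, is hypothetical and not necessarily what -doMaf 4/8 compute.

```python
import math

BASES = "ACGT"


def gkey(a, b):
    # Unordered genotype label, e.g. gkey("T", "A") == "AT".
    return "".join(sorted(a + b))


def site_loglik(trip, f):
    # Diallelic HWE log-likelihood for triples (p(X|MM), p(X|Mm), p(X|mm)).
    return sum(math.log(l0 * (1 - f) ** 2 + l1 * 2 * f * (1 - f) + l2 * f ** 2)
               for l0, l1, l2 in trip)


def ml_freq(trip, f=0.25, iters=200):
    # Fixed number of EM steps for the minor allele frequency (known-minor case).
    for _ in range(iters):
        acc = 0.0
        for l0, l1, l2 in trip:
            w0, w1, w2 = l0 * (1 - f) ** 2, l1 * 2 * f * (1 - f), l2 * f ** 2
            acc += (w1 + 2 * w2) / (w0 + w1 + w2)
        f = acc / (2 * len(trip))
    return f


def freq_unknown_minor(gls, major):
    """gls: per-individual dicts mapping the 10 unordered genotypes (e.g. 'AC') to likelihoods."""
    estimates, logliks = [], []
    for minor in BASES:
        if minor == major:
            continue
        # Reduce the 10 genotype likelihoods to the diallelic triple for this candidate.
        trip = [(g[gkey(major, major)], g[gkey(major, minor)], g[gkey(minor, minor)])
                for g in gls]
        f = ml_freq(trip)
        estimates.append(f)
        logliks.append(site_loglik(trip, f))
    # Turn maximized log-likelihoods into normalized weights (log-sum-exp style).
    top = max(logliks)
    weights = [math.exp(ll - top) for ll in logliks]
    return sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
```

When the data clearly support one minor allele, its weight approaches 1 and the result matches the known-minor estimate; when several candidates are plausible, the estimate is pulled toward a compromise between them.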
Assuming a diallelic site, the major and minor allele pair is estimated by maximum likelihood, using a likelihood function computed from the genotype likelihoods.
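One way to pick such a pair is to score each of the six unordered allele pairs by how well its three genotypes explain every individual's genotype likelihoods, and keep the best. The sketch below assumes equal weights for the three genotypes. The function name and that weighting are illustrative assumptions, not the exact likelihood used by -doMajorMinor 1. Deciding which allele of the winning pair is the major one is a separate step that is not shown.

```python
import math
from itertools import combinations

BASES = "ACGT"


def gkey(a, b):
    # Unordered genotype label, e.g. gkey("C", "A") == "AC".
    return "".join(sorted(a + b))


def infer_allele_pair(gls):
    """Return the unordered allele pair (a, b) maximizing a diallelic likelihood.

    gls: per-individual dicts mapping the 10 unordered genotypes to likelihoods.
    Each pair is scored by sum_i log(p(X_i|aa) + p(X_i|ab) + p(X_i|bb)),
    i.e. the three genotypes are weighted equally (an assumption of this sketch).
    """
    best_pair, best_ll = None, -math.inf
    for a, b in combinations(BASES, 2):
        ll = sum(math.log(g[gkey(a, a)] + g[gkey(a, b)] + g[gkey(b, b)]) for g in gls)
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    return best_pair
```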
Example
Example of estimating the allele frequencies both assuming known major and minor alleles and accounting for the uncertainty of the minor allele inference; -doMaf 10 (2+8) selects the EM algorithm for both. The major and minor alleles are inferred directly from the genotype likelihoods (-doMajorMinor 1).
./angsd -outfiles out -doMajorMinor 1 -doMaf 10 -bam bam.filelist
Estimator from genotype probabilities
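If the genotype probabilities are known, the frequencies can be estimated by summing the posterior probabilities $p(G_i = g \mid X_i)$, where $X_i$ is the sequencing data of individual $i$ and $g \in \{0, 1, 2\}$ is the allele count of the minor allele. For $N$ diploid individuals the frequency estimate is
$\hat{f} = \frac{1}{2N} \sum_{i=1}^{N} \sum_{g=0}^{2} g \, p(G_i = g \mid X_i)$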
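A minimal Python sketch of this estimator, assuming diploid individuals and, for each individual, a triple of posterior probabilities for 0, 1 and 2 copies of the minor allele. The function name is hypothetical.

```python
def freq_from_genotype_probs(post):
    """Minor allele frequency from posterior genotype probabilities.

    post: per-individual (p(G=0|X), p(G=1|X), p(G=2|X)), where G counts
    copies of the minor allele. Each triple is assumed to sum to 1.
    """
    if not post:
        raise ValueError("need at least one individual")
    # Expected number of minor alleles per individual is p1 + 2 * p2.
    expected = sum(p1 + 2 * p2 for _p0, p1, p2 in post)
    # Divide by the number of sampled chromosomes (two per diploid individual).
    return expected / (2 * len(post))


if __name__ == "__main__":
    post = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.0), (0.0, 0.0, 1.0)]
    print(freq_from_genotype_probs(post))  # prints 0.4375
```

Unlike the likelihood-based estimators, no optimization is needed: the posterior probabilities already encode the genotype uncertainty, so the estimate is a single weighted sum.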
Example
Example using a genotype probability file, for example the output from Beagle:
./angsd -outfiles out -doMaf 16 -beagle beagle.file.gz
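To show what such a file contains, the sketch below reads rows laid out like Beagle genotype-probability output and applies the estimator from the previous section per site. It assumes the common whitespace-separated layout (a header line, then marker, allele A, allele B and three probabilities per individual for AA, AB, BB); the function name is hypothetical, and it reports the frequency of the second listed allele, which need not be the minor allele ANGSD would report.

```python
import io


def beagle_allele_b_freqs(handle):
    """Yield (marker, frequency of the second listed allele) for each row.

    Assumes a whitespace-separated layout: one header line, then rows of
    marker, allele A, allele B and three genotype probabilities per
    individual (AA, AB, BB).
    """
    next(handle, None)  # skip the header line (tolerates an empty file)
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        marker, probs = fields[0], [float(x) for x in fields[3:]]
        if not probs or len(probs) % 3:
            raise ValueError(f"{marker}: probabilities not in complete triples")
        n = len(probs) // 3
        # Expected copies of allele B: p(AB) + 2 * p(BB), summed over individuals.
        expected_b = sum(probs[3 * i + 1] + 2 * probs[3 * i + 2] for i in range(n))
        yield marker, expected_b / (2 * n)


if __name__ == "__main__":
    example = io.StringIO(
        "marker alleleA alleleB id1 id1 id1 id2 id2 id2\n"
        "site1 A C 1 0 0 0 1 0\n"
    )
    for marker, f in beagle_allele_b_freqs(example):
        print(marker, f)  # prints: site1 0.25
```

For a gzipped file such as beagle.file.gz, the same generator can be fed a text-mode handle from `gzip.open(path, "rt")`.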
Estimator from sequencing data
The allele frequencies can be inferred directly from the sequencing data (citation).