ANGSD: Analysis of next generation Sequencing Data
Revision as of 16:16, 10 April 2016
Test for Hardy-Weinberg equilibrium (HWE) based on genotype likelihoods. This class both acts as a filter for all other classes and writes its results to a file.
If you want to estimate inbreeding for individuals or include inbreeding information in your analysis, try HWE_and_Inbreeding_estimates.
Brief Overview
./angsd -HWE_pval
    -> angsd version: 0.911-12-gddb6f5f-dirty (htslib: 1.3-1-gc72ae90) build(Apr 10 2016 16:36:30)
    -> Analysis helpbox/synopsis information:
    -> Command: ../angsd/angsd -HWE_pval
    -> Sun Apr 10 16:53:24 2016
-------------
abcHWE.cpp:
    -HWE_pval 0.000000
Use as a filter
- -HWE_pval [float]
p-value threshold. The value must be greater than 0 and at most 1. Sites with a p-value below this threshold will be removed.
Output
This function also prints the results for the selected sites. If you choose -HWE_pval 1, all sites that pass the other filters are written to the output.
Example of output *.hwe.gz
Chromo  Position  Major  Minor  hweFreq   Freq      F          LRT           p-value
1       14000873  G      A      0.282473  0.263594  0.674624   3.140936e+00  7.634997e-02
1       14015890  A      G      0.283119  0.300032  0.999762   8.207572e+00  4.171594e-03
1       14018430  A      C      0.276112  0.299817  0.675018   2.780118e+00  9.544113e-02
1       14033343  A      G      0.295368  0.299442  0.999762   6.473824e+00  1.094747e-02
1       14037881  T      A      0.306003  0.341598  -0.518384  3.178415e+00  7.461710e-02
1       14038946  T      C      0.329113  0.333424  0.999775   6.925424e+00  8.497884e-03
Chromo is the chromosome
Position is the position
Major is the major allele
Minor is the minor allele
hweFreq is the allele frequency assuming HWE (same as -doMaf 1)
Freq is the allele frequency without HWE assumption
F is the scale departure from HWE (inbreeding coefficient - see model)
LRT is the likelihood ratio statistic
p-value is the p-value based on a likelihood ratio test
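As a rough illustration of how these columns relate, the sketch below parses two rows copied from the example output and recomputes the p-value from the LRT column. It assumes the likelihood ratio statistic is compared against a chi-square distribution with 1 degree of freedom; the page does not state this, but the assumption reproduces the p-values in the example rows. The names `parse_hwe` and `chi2_sf_1df` are illustrative helpers, not part of ANGSD.

```python
import math

# Two rows copied from the example output above (whitespace-separated columns).
EXAMPLE = """\
Chromo Position Major Minor hweFreq Freq F LRT p-value
1 14000873 G A 0.282473 0.263594 0.674624 3.140936e+00 7.634997e-02
1 14015890 A G 0.283119 0.300032 0.999762 8.207572e+00 4.171594e-03
"""

def parse_hwe(text):
    """Parse whitespace-separated HWE output into a list of dicts (hypothetical helper)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        rec["Position"] = int(rec["Position"])
        for key in ("hweFreq", "Freq", "F", "LRT", "p-value"):
            rec[key] = float(rec[key])
        rows.append(rec)
    return rows

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Assumption: 1 degree of freedom is not stated on the page.
    """
    return math.erfc(math.sqrt(x / 2.0))

rows = parse_hwe(EXAMPLE)
for rec in rows:
    # Print the reported p-value next to the value recomputed from LRT.
    print(rec["Chromo"], rec["Position"], rec["p-value"], chi2_sf_1df(rec["LRT"]))
```

For a real *.hwe.gz file, the text could be read with `gzip.open(path, "rt").read()` and passed to the same parser, where `path` stands for whatever output prefix was used.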
Model
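Probability of genotypes without assumption of HWE
<math>
\begin{align}
p(G=0|f,F) &= (1-f)^2+f(1-f)F \\
p(G=1|f,F) &= 2f(1-f)-2f(1-f)F \\
p(G=2|f,F) &= f^2+f(1-f)F
\end{align}
</math>
- n: total number of individuals
- X: all sequencing data for a site
- f: allele frequency
- F: inbreeding coefficient*
- G: true unobserved genotype
Total likelihood
<math>
p(X|f,F)\sim\prod_{i=1}^{n}p(X_i|f,F)=\prod_{i=1}^{n}\sum_{G\in\{0,1,2\}}p(X_i|G)p(G|f,F)
</math>
* NB! We allow for negative values of F in order to be able to detect any deviation from HWE.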
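The model can be sketched numerically as follows. This is a minimal illustration, not the ANGSD implementation: `genotype_probs` evaluates p(G|f,F) for G in {0,1,2}, and `site_likelihood` multiplies, over individuals, the sum over genotypes of p(X_i|G)p(G|f,F). The function names and the genotype likelihood values are made up for the example.

```python
def genotype_probs(f, F):
    """Return [p(G=0), p(G=1), p(G=2)] for allele frequency f and inbreeding coefficient F."""
    het = f * (1.0 - f)
    return [
        (1.0 - f) ** 2 + het * F,   # p(G=0|f,F)
        2.0 * het - 2.0 * het * F,  # p(G=1|f,F)
        f ** 2 + het * F,           # p(G=2|f,F)
    ]

def site_likelihood(gls, f, F):
    """Product over individuals of sum_G p(X_i|G) p(G|f,F).

    gls holds one [p(X_i|G=0), p(X_i|G=1), p(X_i|G=2)] triple per individual.
    """
    pg = genotype_probs(f, F)
    total = 1.0
    for gl in gls:
        total *= sum(l * p for l, p in zip(gl, pg))
    return total

# Hypothetical genotype likelihoods for three individuals (made-up numbers).
gls = [[0.90, 0.09, 0.01], [0.05, 0.90, 0.05], [0.01, 0.09, 0.90]]

print(genotype_probs(0.3, 0.0))   # F = 0 reduces to the HWE proportions
print(genotype_probs(0.3, -0.2))  # negative F gives an excess of heterozygotes
print(site_likelihood(gls, 0.5, 0.0))
```

With many individuals the product of per-individual terms can underflow, so summing logarithms of those terms is the usual alternative.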