ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
HWE test: Difference between revisions
Latest revision as of 14:32, 23 October 2017
Test for Hardy-Weinberg equilibrium (HWE) based on genotype likelihoods. This class works both as a filter for all other classes and outputs the results in a file.
This function has been updated to allow for all kinds of deviations, not just F>0. This approach works from version 0.912 onwards and in the latest development version from github.
If you want to estimate inbreeding for individuals or include inbreeding information in your analysis try HWE_and_Inbreeding_estimates.
options
- -doHWE [int]
Estimate the deviation from HWE for each site
- -doMajorMinor [int]
The method only works for diallelic sites. Therefore, choose a method for selecting the major and minor allele (see Inferring_Major_and_Minor_alleles)
- -minHWEpval [float]
Remove sites with a p-value below this threshold, i.e. only use sites that do not violate HWE
- -maxHWEpval [float]
Remove sites with a p-value above this threshold. E.g. -maxHWEpval 1e-3 prints only sites that violate HWE with a p-value of less than 0.001
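The effect of the two thresholds can be sketched in a few lines of Python. This is only an illustration of the filter semantics described above, not ANGSD code: the function name filter_sites and the handling of p-values exactly equal to a threshold are assumptions, and the positions and p-values are copied from the example *.hwe.gz output on this page.

```python
# Sketch of the -minHWEpval / -maxHWEpval filter semantics.
# (position, p-value) pairs copied from the example *.hwe.gz output on this page.
# filter_sites is a hypothetical helper; the treatment of p-values exactly
# equal to a threshold is an assumption.
SITES = [
    (14000873, 7.634997e-02),
    (14015890, 4.171594e-03),
    (14018430, 9.544113e-02),
    (14033343, 1.094747e-02),
    (14037881, 7.461710e-02),
    (14038946, 8.497884e-03),
]


def filter_sites(sites, min_hwe_pval=None, max_hwe_pval=None):
    """Return the positions that survive the two p-value thresholds."""
    kept = []
    for pos, pval in sites:
        if min_hwe_pval is not None and pval < min_hwe_pval:
            continue  # below -minHWEpval: site violates HWE and is removed
        if max_hwe_pval is not None and pval > max_hwe_pval:
            continue  # above -maxHWEpval: site is consistent with HWE and is removed
        kept.append(pos)
    return kept


# Keep sites that do not violate HWE at the 0.01 level.
consistent = filter_sites(SITES, min_hwe_pval=1e-2)
# Keep only sites that do violate HWE at the 0.01 level.
violating = filter_sites(SITES, max_hwe_pval=1e-2)
print(consistent)
print(violating)
```

With a shared threshold the two options split the sites into complementary sets, which is why -maxHWEpval is the one to use when the goal is to list the violating sites.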
example
angsd -bam bam.filelist -doHWE 1 -domajorminor 1 -GL 1
Most of the time it only makes sense to run the test on variable sites, e.g.
angsd -bam bam.filelist -doHWE 1 -domajorminor 1 -GL 1 -doMaf 1 -SNP_pval 1e-6
Use as a filter
See snpFilters and the -minHWEpval and -maxHWEpval options.
Output
This function will also print the results of the selected sites.
Example of output *.hwe.gz
Chromo  Position  Major  Minor  hweFreq   Freq      F          LRT           p-value
1       14000873  G      A      0.282473  0.263594  0.674624   3.140936e+00  7.634997e-02
1       14015890  A      G      0.283119  0.300032  0.999762   8.207572e+00  4.171594e-03
1       14018430  A      C      0.276112  0.299817  0.675018   2.780118e+00  9.544113e-02
1       14033343  A      G      0.295368  0.299442  0.999762   6.473824e+00  1.094747e-02
1       14037881  T      A      0.306003  0.341598  -0.518384  3.178415e+00  7.461710e-02
1       14038946  T      C      0.329113  0.333424  0.999775   6.925424e+00  8.497884e-03
Chromo is the chromosome
Position is the position
Major is the major allele
Minor is the minor allele
hweFreq is the allele frequency assuming HWE (same as -doMaf 1)
Freq is the allele frequency without HWE assumption
F is the scale departure from HWE (inbreeding coefficient - see model)
LRT is the likelihood ratio statistic
p-value is the p-value based on a likelihood ratio test
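As a rough check on how the last two columns relate, the sketch below parses rows in this layout and recomputes the p-value from the LRT column. It assumes the statistic is compared against a chi-square distribution with one degree of freedom, which is consistent with the example values on this page but is not stated here. The helper names are hypothetical, and whitespace-delimited columns are assumed.

```python
import math

# Column names and two rows copied from the example output on this page.
# Real files are *.hwe.gz; whitespace-delimited columns are assumed here.
HEADER = "Chromo Position Major Minor hweFreq Freq F LRT p-value".split()
ROWS = [
    "1 14000873 G A 0.282473 0.263594 0.674624 3.140936e+00 7.634997e-02",
    "1 14015890 A G 0.283119 0.300032 0.999762 8.207572e+00 4.171594e-03",
]


def parse_row(line):
    """Turn one output line into a dict keyed by column name."""
    rec = dict(zip(HEADER, line.split()))
    rec["Position"] = int(rec["Position"])
    for key in ("hweFreq", "Freq", "F", "LRT", "p-value"):
        rec[key] = float(rec[key])
    return rec


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


records = [parse_row(row) for row in ROWS]
for rec in records:
    # Reported p-value next to the one recomputed from the LRT column.
    print(rec["Position"], rec["p-value"], chi2_sf_1df(rec["LRT"]))
```

For a real file, the same parse_row could be applied to the lines of gzip.open(path, "rt") after skipping the header line.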
Model
Probability of genotypes without assumption of HWE
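Under the usual inbreeding parametrisation, which is assumed here to be the one this model refers to, the genotype probabilities can be written as follows, with G counting the copies of the allele that has frequency f:

```latex
\begin{aligned}
P(G=0 \mid f, F) &= (1-f)^2 + f(1-f)F \\
P(G=1 \mid f, F) &= 2f(1-f)(1-F) \\
P(G=2 \mid f, F) &= f^2 + f(1-f)F
\end{aligned}
```

The three terms sum to one for any F, and setting F = 0 recovers the HWE proportions.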
- n: total number of individuals
- X: all sequencing data for a site
- f: allele frequency
- F: inbreeding coefficient*
- G: true unobserved genotype
total likelihood
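Assuming individuals are independent, and writing X_i for the sequencing data of individual i (notation introduced here for illustration), the total likelihood takes the form of a product over individuals of a sum over the unobserved genotype:

```latex
P(X \mid f, F) = \prod_{i=1}^{n} \sum_{G \in \{0,1,2\}} P(X_i \mid G)\, P(G \mid f, F)
```

Here P(X_i | G) is the genotype likelihood of individual i, and P(G | f, F) is the genotype probability without the HWE assumption.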
- NB! We allow for negative values of F in order to be able to detect any deviation from HWE.
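The role of a negative F can be illustrated with a minimal Python sketch. It assumes the usual inbreeding parametrisation of the genotype probabilities and independent individuals. The function names and the toy genotype likelihoods are hypothetical, and the parameter values are plugged in by hand rather than estimated by maximum likelihood as a real analysis would do.

```python
import math


def genotype_probs(f, F):
    """P(G=0), P(G=1), P(G=2) for allele frequency f and inbreeding coefficient F.

    Assumes the usual inbreeding parametrisation; F = 0 gives HWE proportions.
    """
    p0 = (1 - f) ** 2 + f * (1 - f) * F
    p1 = 2 * f * (1 - f) * (1 - F)
    p2 = f ** 2 + f * (1 - f) * F
    return (p0, p1, p2)


def log_likelihood(gls, f, F):
    """Sum over individuals of log( sum_G P(X_i | G) * P(G | f, F) ).

    gls holds one (L0, L1, L2) tuple of genotype likelihoods per individual.
    """
    probs = genotype_probs(f, F)
    return sum(math.log(sum(l * p for l, p in zip(gl, probs))) for gl in gls)


def lrt(gls, f_hwe, f_alt, F_alt):
    """Likelihood ratio statistic for a model with free F against F = 0."""
    return 2.0 * (log_likelihood(gls, f_alt, F_alt) - log_likelihood(gls, f_hwe, 0.0))


# Hypothetical genotype likelihoods: four individuals that all look heterozygous.
toy_gls = [(0.05, 0.90, 0.05)] * 4

print(genotype_probs(0.5, 0.0))   # HWE proportions
print(genotype_probs(0.5, -0.5))  # negative F shifts mass towards heterozygotes
print(lrt(toy_gls, f_hwe=0.5, f_alt=0.5, F_alt=-0.5))
```

In this toy case the data show an excess of heterozygotes, so a model restricted to F >= 0 could not fit better than HWE, while a negative F gives a higher likelihood and a positive test statistic.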