ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Genotype Distribution
Works from version 0.913 and above. The latest development version can be found on GitHub.
This method allows estimation of the expected genotype counts or fractions for one or two individuals based on genotype likelihoods. Examples of genotype fractions for a single individual:
all 10 possible genotypes
pAA | pAC | pAG | pAT | pCC | pCG | pCT | pGG | pGT | pTT |
---|---|---|---|---|---|---|---|---|---|
0.293 | 9.3e-05 | 0.000331 | 7.3e-05 | 0.2 | 7.7e-05 | 0.000411 | 0.204 | 7e-05 | 0.302 |
number of derived alleles
pAA | pAD | pDD |
---|---|---|
0.9986 | 0.0003168 | 0.001127 |
or homozygous vs. heterozygous
pHO | pHE |
---|---|
0.9987 | 0.0003168 |
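One way to see how the 10-genotype table relates to the two-class summary is to sum the homozygous and heterozygous entries. The following is a minimal sketch using the fractions from the 10-genotype table above; the function name `collapse_to_zygosity` is hypothetical and not part of ANGSD, and the result need not match the pHO/pHE table, which may come from a different data set.

```python
# Genotype fractions for one individual, copied from the 10-genotype table above.
genotype_probs = {
    "AA": 0.293, "AC": 9.3e-05, "AG": 0.000331, "AT": 7.3e-05,
    "CC": 0.2, "CG": 7.7e-05, "CT": 0.000411,
    "GG": 0.204, "GT": 7e-05, "TT": 0.302,
}

def collapse_to_zygosity(probs):
    """Sum genotype fractions into (homozygous, heterozygous) totals.

    Hypothetical helper for illustration; a genotype is homozygous
    when both allele letters are identical.
    """
    p_ho = sum(p for g, p in probs.items() if g[0] == g[1])
    p_he = sum(p for g, p in probs.items() if g[0] != g[1])
    return p_ho, p_he

p_ho, p_he = collapse_to_zygosity(genotype_probs)
print(f"pHO={p_ho:.6f} pHE={p_he:.6f}")
```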
For two individuals, the output can be the full 10x10 table of possible genotype combinations
Example of 10x10 genotype probability
| AA | AC | AG | AT | CC | CG | CT | GG | GT | TT |
---|---|---|---|---|---|---|---|---|---|---|
AA | 0.0420 | 0.0130 | 0.0200 | 0.0170 | 0.0160 | 0.0170 | 0.0150 | 0.0240 | 0.0042 | 0.0500 |
AC | 0.0030 | 0.0034 | 0.0071 | 0.0067 | 0.0074 | 0.0071 | 0.0065 | 0.0074 | 0.0032 | 0.0038 |
AG | 0.0030 | 0.0033 | 0.0068 | 0.0064 | 0.0070 | 0.0068 | 0.0061 | 0.0070 | 0.0028 | 0.0034 |
AT | 0.0071 | 0.0084 | 0.0110 | 0.0110 | 0.0110 | 0.0110 | 0.0100 | 0.0120 | 0.0072 | 0.0084 |
CC | 0.0180 | 0.0045 | 0.0110 | 0.0100 | 0.0092 | 0.0100 | 0.0089 | 0.0140 | 0.0016 | 0.0240 |
CG | 0.0015 | 0.0018 | 0.0061 | 0.0061 | 0.0067 | 0.0063 | 0.0060 | 0.0067 | 0.0019 | 0.0015 |
CT | 0.0029 | 0.0032 | 0.0068 | 0.0064 | 0.0070 | 0.0067 | 0.0060 | 0.0069 | 0.0027 | 0.0033 |
GG | 0.0180 | 0.0054 | 0.0110 | 0.0096 | 0.0088 | 0.0094 | 0.0085 | 0.0120 | 0.0012 | 0.0200 |
GT | 0.0029 | 0.0033 | 0.0069 | 0.0066 | 0.0072 | 0.0070 | 0.0062 | 0.0071 | 0.0027 | 0.0031 |
TT | 0.0400 | 0.0130 | 0.0200 | 0.0170 | 0.0150 | 0.0170 | 0.0150 | 0.0240 | 0.0038 | 0.0480 |
or the number of derived alleles
ind1 \ ind2 | pAA | pAD | pDD |
---|---|---|---|
pAA | 0.6561 | 0.1458 | 0.0081 |
pAD | 0.1458 | 0.0324 | 0.0018 |
pDD | 0.0081 | 0.0018 | 0.0001 |
or the heterozygous and homozygous states
HO HO | HO HE | HE HO | HE HE | HO altHO |
---|---|---|---|---|
0.6562 | 0.1476 | 0.1476 | 0.0324 | 0.0162 |
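The five-class summary can be obtained by collapsing the 3x3 derived-allele table. The sketch below assumes that rows index ind1 and columns index ind2, that the first label in each class refers to ind1, and that "HO altHO" means both individuals are homozygous for different alleles; these readings are inferred from the numbers and are not stated by ANGSD. The function name `collapse_pair` is hypothetical.

```python
# Joint probabilities for the number of derived alleles, copied from the
# 3x3 table above. Rows: ind1, columns: ind2. Index 0 = AA, 1 = AD, 2 = DD.
joint = [
    [0.6561, 0.1458, 0.0081],
    [0.1458, 0.0324, 0.0018],
    [0.0081, 0.0018, 0.0001],
]

HET = 1  # index of the heterozygous state AD


def collapse_pair(joint):
    """Collapse a 3x3 joint table into five zygosity classes (illustrative)."""
    out = {"HO HO": 0.0, "HO HE": 0.0, "HE HO": 0.0, "HE HE": 0.0, "HO altHO": 0.0}
    for i in range(3):          # state of ind1
        for j in range(3):      # state of ind2
            p = joint[i][j]
            if i == HET and j == HET:
                out["HE HE"] += p
            elif i == HET:
                out["HE HO"] += p
            elif j == HET:
                out["HO HE"] += p
            elif i == j:
                out["HO HO"] += p      # same homozygous genotype
            else:
                out["HO altHO"] += p   # opposite homozygous genotypes
    return out


summary = collapse_pair(joint)
for name, value in summary.items():
    print(name, round(value, 4))
```

With the table values above, the rounded sums are 0.6562, 0.1476, 0.1476, 0.0324 and 0.0162, matching the five-class table.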
Brief Overview
    ./angsd -HWE_pval
        -> angsd version: 0.911-12-gddb6f5f-dirty (htslib: 1.3-1-gc72ae90) build(Apr 10 2016 16:36:30)
        -> Analysis helpbox/synopsis information:
        -> Command:
    ../angsd/angsd -HWE_pval
        -> Sun Apr 10 16:53:24 2016
    -------------
    abcHWE.cpp:
        -HWE_pval 0.000000
Options
- -HWE_pval [float]
p-value threshold. The value must be greater than 0 and at most 1.
- -doMajorMinor [int]
The method only works for diallelic sites, so you must choose a method for selecting the major and minor allele (see Inferring_Major_and_Minor_alleles).
Use as a filter
Sites with a p-value below the p-value threshold will be removed.
Output
This function also prints the results for the retained sites. If you choose -HWE_pval 1, then all sites (that pass other filters) are written to the output.
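The filtering rule can be illustrated outside ANGSD. The sketch below applies the same rule (drop sites whose p-value is below the threshold) to whitespace-separated rows copied from the example output on this page; the helper `keep_site` is hypothetical, and it assumes the p-value is the last column.

```python
def keep_site(line, threshold):
    """Return True if the site's p-value (last column) is not below threshold.

    Hypothetical helper; mirrors the documented rule that sites with a
    p-value below the threshold are removed.
    """
    p_value = float(line.split()[-1])
    return p_value >= threshold


# Rows copied from the example *.hwe.gz output (header omitted).
rows = [
    "1 14000873 G A 0.282473 0.263594 0.674624 3.140936e+00 7.634997e-02",
    "1 14015890 A G 0.283119 0.300032 0.999762 8.207572e+00 4.171594e-03",
    "1 14018430 A C 0.276112 0.299817 0.675018 2.780118e+00 9.544113e-02",
    "1 14033343 A G 0.295368 0.299442 0.999762 6.473824e+00 1.094747e-02",
    "1 14037881 T A 0.306003 0.341598 -0.518384 3.178415e+00 7.461710e-02",
    "1 14038946 T C 0.329113 0.333424 0.999775 6.925424e+00 8.497884e-03",
]

kept = [r for r in rows if keep_site(r, 0.05)]
kept_positions = [int(r.split()[1]) for r in kept]
print(kept_positions)
```

With a threshold of 0.05, three of the six example sites are kept; with a threshold of 1, `keep_site` would only retain sites whose p-value is exactly 1, so the "all sites are output" behaviour of -HWE_pval 1 is specific to ANGSD and is not reproduced by this sketch.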
Example of output *.hwe.gz
Chromo | Position | Major | Minor | hweFreq | Freq | F | LRT | p-value |
---|---|---|---|---|---|---|---|---|
1 | 14000873 | G | A | 0.282473 | 0.263594 | 0.674624 | 3.140936e+00 | 7.634997e-02 |
1 | 14015890 | A | G | 0.283119 | 0.300032 | 0.999762 | 8.207572e+00 | 4.171594e-03 |
1 | 14018430 | A | C | 0.276112 | 0.299817 | 0.675018 | 2.780118e+00 | 9.544113e-02 |
1 | 14033343 | A | G | 0.295368 | 0.299442 | 0.999762 | 6.473824e+00 | 1.094747e-02 |
1 | 14037881 | T | A | 0.306003 | 0.341598 | -0.518384 | 3.178415e+00 | 7.461710e-02 |
1 | 14038946 | T | C | 0.329113 | 0.333424 | 0.999775 | 6.925424e+00 | 8.497884e-03 |
Chromo is the chromosome
Position is the position
Major is the major allele
Minor is the minor allele
hweFreq is the allele frequency assuming HWE (same as -doMaf 1)
Freq is the allele frequency without HWE assumption
F is the scale departure from HWE (inbreeding coefficient - see model)
LRT is the likelihood ratio statistic
p-value is the p-value based on a likelihood ratio test
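The relation between the LRT and p-value columns can be checked numerically. The sketch below assumes the likelihood ratio statistic is compared to a chi-square distribution with one degree of freedom; this is an inference from the example rows, not something stated on this page. The function name `lrt_to_pvalue` is hypothetical.

```python
import math


def lrt_to_pvalue(lrt):
    """Upper-tail probability of a chi-square(1 df) variable at `lrt`.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    The 1-df assumption is inferred from the example output.
    """
    return math.erfc(math.sqrt(lrt / 2.0))


# (LRT, reported p-value) pairs copied from the example output.
examples = [
    (3.140936e+00, 7.634997e-02),
    (8.207572e+00, 4.171594e-03),
]

for lrt, reported in examples:
    print(f"LRT={lrt:.6f} computed={lrt_to_pvalue(lrt):.6e} reported={reported:.6e}")
```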
Model
Probability of genotypes without assumption of HWE
- n: total number of individuals
- X: all sequencing data for a site
- f: allele frequency
- F: inbreeding coefficient*
- G: true unobserved genotype
total likelihood
- NB! We allow negative values of F (marked * above) in order to be able to detect any deviation from HWE.
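The symbols above fit the standard inbreeding parameterization of genotype probabilities. As a sketch under that assumption (the exact form used by ANGSD should be checked against its source), let G count the copies of the minor allele and let X_i denote the sequencing data for individual i; both conventions are notational assumptions introduced here. The genotype probabilities and the total likelihood could then be written as:

```latex
\begin{aligned}
P(G = 0 \mid f, F) &= (1-f)^2 + f(1-f)F \\
P(G = 1 \mid f, F) &= 2f(1-f)(1-F) \\
P(G = 2 \mid f, F) &= f^2 + f(1-f)F \\[6pt]
L(f, F) &= \prod_{i=1}^{n} \sum_{G \in \{0,1,2\}} P(X_i \mid G)\, P(G \mid f, F)
\end{aligned}
```

Setting F = 0 recovers the Hardy-Weinberg proportions, and a negative F corresponds to an excess of heterozygotes, which is why negative values are useful for detecting deviations in both directions.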