ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Thetas,Tajima,Neutrality tests
This method will estimate different thetas (population scaled mutation rate) and can based on these thetas calculate Tajima's D and various other neutrality test statistics. Method is described in Korneliussen2013.
- NB Information on this website is for version 0.551 or higher.
- NB The Korneliussen2013 covers two methods,
## using an ML method ## using the emperical bayes method. The information on this page relates to the EB method. For performing the ML method, you should the realSFS method and define the region under interest.
Example
Below is a chain of commands used for caculating statistics. Its a 3 step procedure
- Estimate an site frequency spectrum. Output is .em.ml file.
- Calculate per-site thetas. Output is a .thetas.gz file.
- Calculate neutrality tests statistics. Output is a .thetas.gz.pestPG file.
#(estimate an SFS) ../angsd0.551/angsd -bam pop1.list -out bingo -realSFS 1 ../angsd0.551/misc/emOptim2 bingo.sfs 40 -P 24 >bingo.em.ml #(calculate thetas) ../angsd0.551/angsd -bam pop1.list -out bongo -doThetas 1 -realSFS 1 -pest bingo.em.ml #(calculate Tajimas.) ../angsd0.551/misc/bgid make_bed taj/aout2.thetas.gz ../angsd0.551/misc/bgid do_stat taj/aout2.thetas.gz -nChr 40
Sliding Window example
We can easily do a sliding window analysis by adding -win/-step arguments to the last command
../angsd0.551/misc/bgid do_stat taj/aout2.thetas.gz -nChr 40 -win 50000 -step 10000
This will calculate the test statistic using a window size of 50kb and a step size of 10kb.
Example Output
.thetas.gz is
chr10 1 -7.041327 -7.362126 -6.399318 -8.788662 -7.840050 chr10 2 -8.337345 -8.921354 -7.358819 -11.052126 -9.502293 chr10 3 -9.283203 -9.945923 -8.207280 -12.281002 -10.546671 chr10 4 -1.105216 -0.617477 -35.602360 -0.730974 -0.672616 chr10 5 -7.705427 -8.374287 -6.622234 -10.726430 -8.976529 chr10 6 -11.683662 -12.369866 -10.578860 -14.766857 -12.975926 chr10 7 -11.688400 -12.374647 -10.583545 -14.771753 -12.980717 chr10 8 -12.989104 -13.675391 -11.884201 -16.072602 -14.281470 chr10 9 -13.682495 -14.368910 -12.577435 -16.766463 -14.975017 chr10 10 -1.105216 -0.655518 -37.649835 -0.982402 -0.805662 chr10 11 -14.433381 -15.119890 -13.328206 -17.517696 -15.726019 chr10 12 -14.410507 -15.097004 -13.305348 -17.494774 -15.703129 chr10 13 -14.963982 -15.650467 -13.858836 -18.048209 -16.256590 chr10 14 -15.657183 -16.343698 -14.552002 -18.741516 -16.949827 chr10 15 -15.762558 -16.449085 -14.657361 -18.846938 -17.055218 chr10 16 -1.105216 -0.766882 -12.373455 -1.539395 -1.080326 chr10 17 -15.876465 -16.563007 -14.771249 -18.960901 -17.169143 chr10 18 -17.256944 -17.943487 -16.151728 -20.341382 -18.549623
- 1. chromosome
- 2. position
- 3. ThetaWatterson
- 4. ThetaD (nucleotide diversity)
- 5. Theta? (singleton category)
- 6. ThetaL
- 7. ThetaH
.thetas.gz.pestPG
(59999,69999)(60000,70000)(60000,70000) chr1 65000 2349.039592 2008.865974 2791.401569 3817.828656 2913.347320 -0.545594 -0.626967 -0.486984 -0.617873 0.195337 10000 (69999,79999)(70000,80000)(70000,80000) chr1 75000 2349.113388 1993.792014 2764.051812 3979.987797 2986.889940 -0.569871 -0.617112 -0.456779 -0.678388 0.220762 10000 (79999,89999)(80000,90000)(80000,90000) chr1 85000 2349.154140 2035.577279 2649.132059 3902.254435 2968.915852 -0.502912 -0.491556 -0.330221 -0.637555 0.214522 10000 (89999,99999)(90000,100000)(90000,100000) chr1 95000 2349.462773 2048.143641 2533.193917 3881.554872 2964.849262 -0.483190 -0.388552 -0.202228 -0.626111 0.212980 10000 (99999,109999)(100000,110000)(100000,110000) chr1 105000 2349.306947 2103.402129 2608.611593 3738.658529 2921.030347 -0.394355 -0.404727 -0.285429 -0.558478 0.197881 10000 (109999,119999)(110000,120000)(110000,120000) chr1 115000 2348.965451 1867.325681 2725.815492 4491.310734 3179.318214 -0.772512 -0.687843 -0.414876 -0.896283 0.287438 10000 (119999,129999)(120000,130000)(120000,130000) chr1 125000 2349.437816 2077.636124 2623.517860 3755.631838 2916.633993 -0.435861 -0.437286 -0.301676 -0.573043 0.196304 10000
Format is:
(indexStart,indexStop)(posStart,posStop) chrname wincenter tW tP tF tH tL tajD fulif fuliD fayH zengsE