ANGSD: Analysis of next generation Sequencing Data
The latest tar.gz version is 0.938 (0.939 on GitHub); see Change_log for changes, and download it here.
Abbababa
Revision as of 17:28, 2 December 2013
Available from version 0.559+.
Performs the ABBA-BABA test, also called the D-statistic. It tests for ancient admixture (or a wrong tree topology).
<classdiagram type="dir:LR">
[BAM files{bg:orange}]->[Sequence data|Random base]
[sequence data]->[*.abbababa|ABBA and BABA counts file{bg:blue}] </classdiagram>
<classdiagram type="dir:LR">
[*.abbababa|ABBA and BABA counts file{bg:blue}]->jackKnife.R[D stat and Z scores{bg:blue}]
</classdiagram>
Brief Overview
> ./angsd -doAbbababa
--------------
analysisAbbababa.cpp:
    -doAbbababa    0
     1: use a random base
    -rmTrans       0        remove transitions
    -blockSize     5000000  number of based in a block
This function counts the number of ABBA and BABA sites.
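As a rough illustration of what "ABBA" and "BABA" mean, the sketch below classifies a single site from one sampled base per individual plus the outgroup base. The pattern is read in the order H1, H2, H3, outgroup, with A denoting the outgroup allele. This is a simplified sketch, not ANGSD's code: the function name classify_site is hypothetical, and sites with missing data (N) or more than two alleles are simply skipped.

```python
from typing import Optional


def classify_site(h1: str, h2: str, h3: str, out: str) -> Optional[str]:
    """Return "ABBA", "BABA" or None for one site (hypothetical helper).

    Each argument is a single upper-case base; 'N' marks missing data.
    The outgroup base defines the ancestral allele A.
    """
    bases = (h1, h2, h3, out)
    if "N" in bases:
        return None  # missing data in at least one sample
    if len(set(bases)) != 2:
        return None  # monomorphic site or more than two alleles
    derived_h1 = h1 != out
    derived_h2 = h2 != out
    derived_h3 = h3 != out
    if derived_h2 and derived_h3 and not derived_h1:
        return "ABBA"  # H2 and H3 share the derived allele
    if derived_h1 and derived_h3 and not derived_h2:
        return "BABA"  # H1 and H3 share the derived allele
    return None  # e.g. BBAA or a singleton derived allele


print(classify_site("A", "G", "G", "A"))  # H2 and H3 derived
```

Summing the ABBA and BABA calls over all sites in a block, for every (H1, H2, H3) combination, gives the kind of per-block counts that the .abbababa file reports.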
Options
- -doAbbababa 1
Sample a random base at each position.
- -rmTrans
Remove transitions (important for ancient DNA, where postmortem damage mainly produces C to T and G to A transitions).
- -blockSize [INT]
Size of each block in bases. Choose a block size larger than the extent of linkage disequilibrium (LD) in the populations. For humans, 5 Mb (5000000) is usually used.
- -anc [fileName.fa]
Outgroup sequence in FASTA format, used to define the ancestral (A) allele.
- -doCounts 1
Use -doCounts 1 to count the bases at each site after filtering.
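The effect of -rmTrans and -blockSize can be pictured with the hedged sketch below; it is not ANGSD's implementation. A transition is a purine to purine (A/G) or pyrimidine to pyrimidine (C/T) change, and a site is assigned to a block by integer division of its 1-based position by the block size. Both function names are hypothetical, and ANGSD's exact block boundaries may differ.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(b1: str, b2: str) -> bool:
    """True if b1 -> b2 is a transition (A<->G or C<->T); bases upper-case."""
    if b1 == b2:
        return False
    pair = {b1, b2}
    return pair <= PURINES or pair <= PYRIMIDINES


def block_index(pos: int, block_size: int = 5_000_000) -> int:
    """Block number of a 1-based position (hypothetical helper).

    Assumes blocks [1, size], [size + 1, 2 * size], ...
    """
    return (pos - 1) // block_size


print(is_transition("C", "T"), block_index(5_000_001))
```

Under this picture, a site whose two alleles form a transition would be skipped when -rmTrans 1 is set, and the remaining ABBA/BABA calls would be tallied per block index.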
Output
- .abbababa
Each line represents a block, with the chromosome name (column 1), the start position (column 2) and the end position (column 3). The remaining columns are the counts of ABBA and BABA sites: for each combination of three individuals (H1, H2, H3), two columns are printed. These numbers serve as input to the R script jackKnife.R.
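A line of the .abbababa file could be read as sketched below, assuming whitespace-separated columns with no header line, laid out as described above: chromosome, start, end, then one ABBA column and one BABA column per (H1, H2, H3) combination. The function name is hypothetical, and because the order of the combinations is not spelled out here, the sketch only returns the count pairs by column position.

```python
from typing import List, Tuple


def parse_block_line(line: str) -> Tuple[str, int, int, List[Tuple[float, float]]]:
    """Split one .abbababa line into (chrom, start, end, [(nABBA, nBABA), ...]).

    Hypothetical helper; counts are parsed as floats to be tolerant of
    either integer or decimal output.
    """
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    counts = [float(x) for x in fields[3:]]
    if len(counts) % 2 != 0:
        raise ValueError("expected an even number of count columns")
    # consecutive columns form one (ABBA, BABA) pair per combination
    pairs = [(counts[i], counts[i + 1]) for i in range(0, len(counts), 2)]
    return chrom, start, end, pairs


print(parse_block_line("1 1 5000000 10 12 3 4"))
```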
Example
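Count ABBA and BABA sites in the first five BAM files, using a random base sampled at each position, then compute the D-statistics and Z scores with jackKnife.R.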
head -n5 smallBam.filelist > smallerBam.filelist
./angsd -out out -doAbbababa 1 -bam smallerBam.filelist -doCounts 1 -anc /space/genomes/refgenomes/ancestral/hg19/fasta/hg19ancNoChr.fa
Rscript R/jackKnife.R file=out.abbababa indNames=smallerBam.filelist outfile=out
This produces the file out.txt with all the results.
output
H1       H2       H3       nABBA  nBABA  Dstat        jackEst      SE          Z
NA11830  NA12004  NA12763  269    322    -0.08967851  -0.08967851  0.09006086  -0.9957545
NA11830  NA06985  NA12763  267    298    -0.05486726  -0.05486726  0.122256    -0.4487898
NA12004  NA06985  NA12763  254    243    0.0221328    0.0221328    0.1386198   0.1596655
NA11830  NA11993  NA12763  225    336    -0.197861    -0.197861    0.08514797  -2.323731
NA12004  NA11993  NA12763  217    267    -0.1033058   -0.1033058   0.09471542  -1.090697
NA06985  NA11993  NA12763  242    302    -0.1102941   -0.1102941   0.1241554   -0.8883553
NA12763  NA12004  NA11830  237    322    -0.1520572   -0.1520572   0.1047361   -1.451813
NA12763  NA06985  NA11830  219    298    -0.1528046   -0.1528046   0.1115283   -1.370098
H1 H2 H3 are the three individuals in the tree that are not the outgroup. H1 and H2 form the ingroup (see tree).
nABBA is the total count of ABBA patterns.
nBABA is the total count of BABA patterns.
Dstat is the test statistic (nABBA-nBABA)/(nABBA+nBABA). A negative value means that H1 is closer to H3 than H2 is; a positive value means that H2 is closer to H3 than H1 is.
jackEst is another, bias-corrected (jackknife) estimate of the D-statistic. It is usually extremely close to the value in the Dstat column.
SE is the m-delete block jackknife standard error of the estimate, used to obtain the Z value.
Z is the Z score used to assess the significance of the test. As in Reich et al., an absolute Z score above 3 is often used as the critical value. Note, however, that this threshold does not take into account the fact that many tests are performed.
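The Dstat, jackEst, SE and Z columns can be approximated with a weighted (m-delete) block jackknife, sketched below. This is an assumption about the method, not a transcription of jackKnife.R: it takes per-block (nABBA, nBABA) counts for one (H1, H2, H3) combination and weights each block by its number of informative sites, m_j = nABBA_j + nBABA_j. With these weights the bias-corrected estimate reduces algebraically to D itself, which would be consistent with the identical Dstat and jackEst columns in the example output.

```python
import math
from typing import List, Tuple


def d_stat(n_abba: float, n_baba: float) -> float:
    """D = (nABBA - nBABA) / (nABBA + nBABA)."""
    return (n_abba - n_baba) / (n_abba + n_baba)


def block_jackknife(blocks: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (Dstat, jackEst, SE, Z) from per-block (nABBA, nBABA) counts.

    Assumption: block j is weighted by m_j = nABBA_j + nBABA_j;
    blocks without informative sites are dropped.
    """
    blocks = [(a, b) for a, b in blocks if a + b > 0]
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two non-empty blocks")
    tot_a = sum(a for a, _ in blocks)
    tot_b = sum(b for _, b in blocks)
    n = tot_a + tot_b
    d_all = d_stat(tot_a, tot_b)

    m = [a + b for a, b in blocks]
    # D recomputed with each block left out in turn
    d_minus = [d_stat(tot_a - a, tot_b - b) for a, b in blocks]

    # bias-corrected (weighted) jackknife estimate
    jack_est = g * d_all - sum((1 - m_j / n) * d_j for m_j, d_j in zip(m, d_minus))

    # weighted jackknife variance from the pseudovalues
    var = 0.0
    for m_j, d_j in zip(m, d_minus):
        h_j = n / m_j
        pseudo = h_j * d_all - (h_j - 1) * d_j
        var += (pseudo - jack_est) ** 2 / (h_j - 1)
    var /= g
    se = math.sqrt(var)
    z = d_all / se if se > 0 else float("nan")
    return d_all, jack_est, se, z


print(block_jackknife([(100, 120), (80, 95), (90, 70), (60, 80)]))
```

A combination would then be flagged when abs(Z) > 3, keeping in mind the multiple-testing caveat above.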