ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Abbababa
Revision as of 17:15, 2 December 2013

Available from version 0.559+.

Performs the ABBA-BABA test, also called the D-statistic. It tests for ancient admixture (or a wrong tree topology).
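The D-statistic compares the two discordant site patterns. Assume a tree (((H1,H2),H3),Outgroup) with no admixture. Under that tree, incomplete lineage sorting alone produces ABBA and BABA sites at equal rates, so D is expected to be close to 0. A minimal sketch of the standard formula is shown below. It assumes the total ABBA and BABA counts have already been summed over all blocks. The function name `d_statistic` is hypothetical and is not part of ANGSD.

```python
def d_statistic(abba: int, baba: int) -> float:
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        # D is undefined when there are no informative sites
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / total


# An excess of ABBA over BABA gives a positive D
print(d_statistic(120, 80))  # 0.2
```

Which of H1 and H2 receives a positive D depends on the ABBA/BABA convention. This sketch uses D = (ABBA - BABA) / (ABBA + BABA).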

<classdiagram type="dir:LR">

[BAM files{bg:orange}]->[Sequence data|Random base]

[Sequence data]->[*.abbababa|ABBA and BABA counts file{bg:blue}] </classdiagram>


<classdiagram type="dir:LR"> [*.abbababa|ABBA and BABA counts file{bg:blue}]->jackKnife.R[D stat and Z scores{bg:blue}] </classdiagram>

Brief Overview

> ./angsd -doAbbababa

--------------
analysisAbbababa.cpp:
	-doAbbababa	0
	1: use a random base
	-rmTrans		0	remove transitions
	-blockSize		5000000	number of based in a block

This function counts the number of ABBA and BABA sites.
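Assume one sampled base per individual (H1, H2, H3) plus the outgroup base from -anc. A site is ABBA when H1 carries the outgroup allele while H2 and H3 share the derived allele. It is BABA when H2 carries the outgroup allele while H1 and H3 share the derived one.

The sketch below implements that classification. The function `classify_site` is hypothetical. Restricting it to biallelic sites with no missing data is an assumption about which sites count, not a description of ANGSD's internal filters.

```python
from typing import Optional

VALID_BASES = frozenset("ACGT")


def classify_site(h1: str, h2: str, h3: str, outgroup: str) -> Optional[str]:
    """Return 'ABBA', 'BABA' or None for one site (hypothetical helper)."""
    # Upper-case so soft-masked (lower-case) reference bases are handled
    h1, h2, h3, anc = (b.upper() for b in (h1, h2, h3, outgroup))
    bases = (h1, h2, h3, anc)
    if any(b not in VALID_BASES for b in bases):
        return None  # missing data such as 'N'
    if len(set(bases)) != 2:
        return None  # assumption: only biallelic sites are informative
    # ABBA: H1 ancestral, H2 and H3 share the derived allele
    if h1 == anc and h2 == h3 != anc:
        return "ABBA"
    # BABA: H2 ancestral, H1 and H3 share the derived allele
    if h2 == anc and h1 == h3 != anc:
        return "BABA"
    return None  # e.g. BBAA or singleton patterns


print(classify_site("A", "G", "G", "A"))  # ABBA
print(classify_site("G", "A", "G", "A"))  # BABA
```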

Options

-doAbbababa 1
Sample a random base at each position.
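Sampling a random base at a position amounts to picking one read at random. Each base is therefore drawn with probability proportional to its count, as produced by -doCounts 1.

The sketch below works under that assumption. The A, C, G, T ordering of the counts and the function name are hypothetical.

```python
import random
from typing import Optional, Sequence

BASES = "ACGT"


def sample_random_base(counts: Sequence[int], rng: random.Random) -> Optional[str]:
    """Draw one base with probability proportional to its read count.

    `counts` is assumed to hold the filtered read counts for A, C, G, T.
    """
    if len(counts) != 4:
        raise ValueError("expected counts for A, C, G, T")
    if sum(counts) == 0:
        return None  # no reads left at this site after filtering
    return rng.choices(BASES, weights=counts, k=1)[0]


rng = random.Random(42)  # fixed seed for reproducible sampling
print(sample_random_base([3, 0, 7, 0], rng))  # 'A' or 'G'
```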
-rmTrans

Remove transitions (important for ancient DNA, where post-mortem damage mainly causes C→T and G→A transitions).
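Transitions are purine↔purine (A↔G) or pyrimidine↔pyrimidine (C↔T) changes. All other base changes are transversions.

The sketch below shows such a filter. It assumes the filter is applied to the set of alleles observed at a site; the helper names are hypothetical.

```python
from typing import Iterable

# The two transition classes: A<->G and C<->T
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def keep_site(bases: Iterable[str], rm_trans: bool) -> bool:
    """Return False if rm_trans is set and the site's two alleles form a transition."""
    alleles = frozenset(b.upper() for b in bases)
    return not (rm_trans and alleles in TRANSITIONS)


print(keep_site("AGGA", rm_trans=True))  # False: A/G is a transition
print(keep_site("ACCA", rm_trans=True))  # True: A/C is a transversion
```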

-blockSize [INT]

Size of each block in bases. Choose a block size larger than the extent of linkage disequilibrium (LD) in the populations, so that blocks are approximately independent. For humans, 5 Mb (5000000) is usually used.
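Counts are summed per block. Because each block is larger than the LD extent, the blocks can be treated as roughly independent units in the jackknife.

The sketch below assumes fixed, non-overlapping windows of blockSize bases, aligned at position 0 on each chromosome. ANGSD's exact block boundaries may differ, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def block_start(pos: int, block_size: int) -> int:
    """Start coordinate of the window containing pos (assumed 0-aligned)."""
    return (pos // block_size) * block_size


def count_per_block(
    sites: Iterable[Tuple[str, int, Optional[str]]], block_size: int = 5_000_000
) -> Dict[Tuple[str, int], List[int]]:
    """Sum ABBA and BABA sites per (chromosome, block start).

    Each site is (chrom, pos, pattern), where pattern is 'ABBA', 'BABA' or None.
    """
    blocks: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for chrom, pos, pattern in sites:
        key = (chrom, block_start(pos, block_size))
        if pattern == "ABBA":
            blocks[key][0] += 1
        elif pattern == "BABA":
            blocks[key][1] += 1
    return dict(blocks)


sites = [("1", 10, "ABBA"), ("1", 4_999_999, "BABA"), ("1", 5_000_000, "ABBA")]
print(count_per_block(sites))  # {('1', 0): [1, 1], ('1', 5000000): [1, 0]}
```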

-anc [fileName.fa]

The outgroup (ancestral) sequence in FASTA format. Its base at each site defines the ancestral (A) allele.

-doCounts 1

Use -doCounts 1 to count the bases at each site after filtering.

Output

  • .abbababa

Each line represents a block, given by a chromosome name (column 1), a start position (column 2) and an end position (column 3). The remaining columns are the counts of ABBA and BABA sites: for each combination of three individuals (H1, H2, H3), two columns are printed. These numbers serve as input to the R script jackKnife.R.
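The sketch below reads one line of the layout described above, for use in your own downstream analysis. It rests on four assumptions:

- fields are whitespace-separated;
- there is no header line;
- each (H1, H2, H3) combination contributes an ABBA column followed by a BABA column;
- pairs are kept in file order, because the ordering of combinations is not documented here.

Counts are parsed as floats to tolerate decimal output. The function name is hypothetical.

```python
from typing import List, Tuple


def parse_abbababa_line(line: str) -> Tuple[str, int, int, List[Tuple[float, float]]]:
    """Split a .abbababa line into chrom, start, end and (ABBA, BABA) pairs."""
    fields = line.split()
    if len(fields) < 3 or (len(fields) - 3) % 2 != 0:
        raise ValueError("expected chrom, start, end, then ABBA/BABA pairs")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    counts = [float(x) for x in fields[3:]]
    # Group the remaining columns two at a time: (ABBA, BABA) per combination
    pairs = [(counts[i], counts[i + 1]) for i in range(0, len(counts), 2)]
    return chrom, start, end, pairs


print(parse_abbababa_line("1\t0\t5000000\t12\t8\t3\t4"))
```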

Example

Count ABBA and BABA sites using a random base from each of the first five BAM files, then compute the D-statistics with jackKnife.R.

head -n5 smallBam.filelist > smallerBam.filelist
./angsd -out out -doAbbababa 1 -bam smallerBam.filelist -doCounts 1 -anc /space/genomes/refgenomes/ancestral/hg19/fasta/hg19ancNoChr.fa
Rscript R/jackKnife.R file=out.abbababa indNames=smallerBam.filelist outfile=out

This produces the file out.txt containing all the results.
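The jackKnife.R step turns the per-block counts into a D-statistic with a standard error and a Z-score (Z = D / SE). The sketch below uses the plain delete-one-block jackknife for a single (H1, H2, H3) combination. This is a standard technique, but it is not necessarily the exact (possibly weighted) estimator implemented in jackKnife.R. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def d_from_counts(abba: float, baba: float) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / total


def jackknife_d(blocks: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return (D, standard error, Z) from per-block (ABBA, BABA) counts."""
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    abba = sum(a for a, _ in blocks)
    baba = sum(b for _, b in blocks)
    d = d_from_counts(abba, baba)
    # D recomputed with each block left out in turn
    loo: List[float] = [d_from_counts(abba - a, baba - b) for a, b in blocks]
    mean_loo = sum(loo) / n
    # Delete-one jackknife variance: (n - 1) / n * sum of squared deviations
    var = (n - 1) / n * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(var)
    if se == 0:
        raise ValueError("zero standard error; Z is undefined")
    return d, se, d / se


d, se, z = jackknife_d([(10, 5), (8, 6), (12, 4), (9, 9)])
print(f"D={d:.3f} SE={se:.3f} Z={z:.2f}")
```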