ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Abbababa

Revision as of 17:13, 2 December 2013

Available from version 0.559+.

Performs the abbababa (ABBA-BABA) test, also called the D-statistic. This tests for ancient admixture (or a wrong tree topology).
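As a rough illustration of what the test counts, the sketch below classifies a single site as ABBA or BABA from one sampled base per individual (H1, H2, H3) plus the outgroup base, and forms D = (nABBA - nBABA) / (nABBA + nBABA). The function names `classify_site` and `d_statistic` are hypothetical and are not part of ANGSD. The handling of missing data and the sign convention are assumptions made for this example, not a description of the ANGSD source.

```python
def classify_site(h1, h2, h3, outgroup):
    """Return 'ABBA', 'BABA', or None for one site.

    The outgroup base defines the ancestral allele A; the other
    allele seen at the site is treated as the derived allele B.
    """
    h1, h2, h3, anc = (b.upper() for b in (h1, h2, h3, outgroup))
    bases = {h1, h2, h3, anc}
    # Assumption: only biallelic sites with A/C/G/T everywhere are informative.
    if len(bases) != 2 or not bases <= set("ACGT"):
        return None
    if h3 == anc:
        return None  # H3 must carry the derived allele
    if h1 == anc and h2 == h3:
        return "ABBA"  # H2 shares the derived allele with H3
    if h2 == anc and h1 == h3:
        return "BABA"  # H1 shares the derived allele with H3
    return None


def d_statistic(n_abba, n_baba):
    """D = (nABBA - nBABA) / (nABBA + nBABA)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites")
    return (n_abba - n_baba) / total


# Made-up sites, each given as (H1, H2, H3, outgroup).
sites = [("A", "G", "G", "A"), ("G", "A", "G", "A"),
         ("A", "G", "G", "A"), ("A", "A", "A", "A")]
patterns = [classify_site(*s) for s in sites]
n_abba = patterns.count("ABBA")
n_baba = patterns.count("BABA")
print(n_abba, n_baba, d_statistic(n_abba, n_baba))
```

With this convention a positive D means an excess of ABBA sites, that is, H2 shares more derived alleles with H3 than H1 does.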

<classdiagram type="dir:LR">

[BAM files{bg:orange}]->[Sequence data|Random base]

[sequence data]->[*.abbababa|ABBA and BABA counts file{bg:blue}] </classdiagram>


<classdiagram type="dir:LR"> [*.abbababa|ABBA and BABA counts file{bg:blue}]->jackKnife.R[D stat and Z scores{bg:blue}] </classdiagram>

Brief Overview

> ./angsd -doAbbababa

--------------
analysisAbbababa.cpp:
	-doAbbababa	0
	1: use a random base
	-rmTrans		0	remove transitions
	-blockSize		5000000	number of based in a block

This function will counts the number of ABBA and BABA sites

Options

-doAbbababa 1
Sample a random base at each position.
-rmTrans

Remove transitions (important for ancient DNA)

-blockSize [INT]

Size of each block. Choose a block size larger than the extent of linkage disequilibrium (LD) in the populations. For humans, 5 Mb (5000000) is usually used.

-anc [fileName.fa]

Include an outgroup in fasta format.

-doCounts 1

Use -doCounts 1 to count the bases at each site after filters.
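To make -rmTrans and -blockSize more concrete, here is a minimal sketch of what they imply for a single site. Transitions are the A<->G and C<->T changes, which ancient DNA damage tends to inflate. Blocks are assumed here to be fixed-width windows along a chromosome. Both helper names are hypothetical, and the 0-based integer binning is an assumption for this example rather than something taken from the ANGSD source.

```python
# The two transition pairs: purine<->purine and pyrimidine<->pyrimidine.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(allele1, allele2):
    """True if the two alleles differ by a transition (A<->G or C<->T)."""
    return frozenset((allele1.upper(), allele2.upper())) in TRANSITIONS


def block_index(position, block_size=5_000_000):
    """Index of the fixed-width block holding a 0-based position (assumed binning)."""
    return position // block_size


print(is_transition("C", "T"))   # a transition; would be dropped when removing transitions
print(is_transition("A", "C"))   # a transversion; would be kept
print(block_index(12_345_678))   # block number with the default 5 Mb block size
```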

Output

  • .abbababa

Output: Each line represents a block, with a chromosome name (Column 1), a start position (Column 2), and an end position (Column 3). The following columns are the counts of ABBA and BABA sites. For each combination of 3 individuals (H1,H2,H3), two columns are printed. These numbers serve as input to the R script called jackKnife.R.
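The column layout described above can be read with a few lines of Python. The sketch below parses block lines and then computes D, a standard error, and a Z score for one (H1,H2,H3) combination, using a simple unweighted delete-one block jackknife. This is an illustration only and is not claimed to reproduce jackKnife.R, which may weight blocks differently. Whitespace-separated fields, the skipping of any non-numeric header line, and the example counts are all assumptions.

```python
import math


def read_blocks(lines):
    """Parse lines into (chrom, start, end, [(abba, baba), ...]) tuples."""
    blocks = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or incomplete line
        try:
            start, end = int(fields[1]), int(fields[2])
            counts = [float(x) for x in fields[3:]]
        except ValueError:
            continue  # assumed to be a header line
        # Columns come in (ABBA, BABA) pairs, one pair per combination.
        pairs = list(zip(counts[0::2], counts[1::2]))
        blocks.append((fields[0], start, end, pairs))
    return blocks


def jackknife_d(blocks, combo=0):
    """Return (D, SE, Z) for one combination via a delete-one block jackknife."""
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    abba = [b[3][combo][0] for b in blocks]
    baba = [b[3][combo][1] for b in blocks]
    tot_a, tot_b = sum(abba), sum(baba)
    d = (tot_a - tot_b) / (tot_a + tot_b)
    # Recompute D with each block left out in turn.
    loo = [((tot_a - a) - (tot_b - b)) / ((tot_a - a) + (tot_b - b))
           for a, b in zip(abba, baba)]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((x - mean_loo) ** 2 for x in loo))
    return d, se, d / se


# Made-up example: three blocks, one (H1,H2,H3) combination.
example = ["chr1 0 5000000 30 20",
           "chr1 5000000 10000000 25 25",
           "chr2 0 5000000 35 15"]
blocks = read_blocks(example)
d, se, z = jackknife_d(blocks)
print(d, se, z)
```

The sketch does not guard against blocks with zero informative sites or a zero standard error; real data would need those checks.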

Example

Count ABBA and BABA sites for the first five BAM files in the file list, sampling a random base at each position.

head -n5 smallBam.filelist > smallerBam.filelist
./angsd -out out -doAbbababa 1 -bam smallerBam.filelist -doCounts 1 -anc /space/genomes/refgenomes/ancestral/hg19/fasta/hg19ancNoChr.fa
Rscript jackKnife.R file=out.abbababa indNames=smallerBam.filelist