ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Abbababa: Difference between revisions

(Created page with "Available from version 0.559+. performs the abbababa test also called the D-statistic. This tests for ancient admixture (or wrong tree topology) <classdiagram type="dir:LR">...")
 
No edit summary
Line 7: Line 7:
[sequence data]->doAbbababa[ABBA and BABA couts file{bg:blue}]
[sequence data]->doAbbababa[ABBA and BABA couts file{bg:blue}]
[ABBA and BABA couts file{bg:blue}]->jackKnife.R[D stat and Z scores{bg:blue}]
[ABBA and BABA couts file{bg:blue}]->jackKnife.R[D stat and Z scores{bg:blue}]
 
</classdiagram>
</classdiagram>
 
This can be used as input for the ANGSD analysis:
# [[Error estimation]]
# [[ABBA-BABA]]
 


=Brief Overview=
=Brief Overview=
<pre>
<pre>
> ./angsd -doFasta
> ./angsd -doAbbababa
 
--------------
--------------
nalysisFasta.cpp:
analysisAbbababa.cpp:
-doFasta 0
-doAbbababa 0
1: use a random base
1: use a random base
2: use the most common base (needs -doCounts 1)
-rmTrans 0 remove transitions
3: use the base with highest ebd (under development)
-blockSize 5000000 number of based in a block
-minQ 13 (remove bases with qscore<minQ)
 
-basesPerLine 50 (Number of bases perline in output file)
</pre>
</pre>


This function will dump a fasta file, the full header information from the SAM/BAM file will be used. This means that a fasta will be generated for ALL entries in the header even if '-r/-rf -filter' is used.
This function will counts the number of ABBA and BABA sites
 
The EBD is the effective base depth, as defined by [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3638139/]:
 
<math>
EBD_A = \sum_{b_i=A}^N (phred(mapq_i)*phred(qscore_i)),\qquad phred(q) =10^{-q/10} 
</math>
 
For four bases we have 4 different EBD, each EBD is the product of the mapping quality and scores for the base under consideration.


=Options=
=Options=
;-doFasta 1: sample a random base at each position.
;-doFasta 1: sample a random base at each position.
;-doFasta 2: use the most common base. In the case of ties a random base is chosen among the bases with the same maximum counts. The "-doCounts 1" options for [[Alleles_counts|allele counts]] is needed in order to determine the most common base.


;-minQ [INT]  
;-minQ [INT]  

Revision as of 15:11, 2 December 2013

Available from version 0.559+.

Performs the ABBA-BABA test, also called the D-statistic. This tests for ancient admixture (or a wrong tree topology).
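To make the site patterns concrete, here is a minimal sketch of how a single site could be classified under the conventional ABBA-BABA definition. It assumes four sampled bases in the order H1, H2, H3, outgroup, with the outgroup base taken as the ancestral allele "A". The function name `classify_site` and the argument order are illustrative assumptions, not ANGSD's actual implementation.

```python
def classify_site(h1, h2, h3, outgroup):
    """Classify one site as 'ABBA', 'BABA', or None (uninformative).

    Assumption: the outgroup base defines the ancestral allele A, and
    H3 must carry a different (derived) allele B for the site to count.
    """
    h1, h2, h3, outgroup = (b.upper() for b in (h1, h2, h3, outgroup))
    valid = set("ACGT")
    if not {h1, h2, h3, outgroup} <= valid:
        return None  # skip missing or ambiguous bases such as 'N'
    if h3 == outgroup:
        return None  # H3 carries the ancestral allele: not informative
    if h1 == outgroup and h2 == h3:
        return "ABBA"  # H2 shares the derived allele with H3
    if h2 == outgroup and h1 == h3:
        return "BABA"  # H1 shares the derived allele with H3
    return None  # any other pattern, including sites with 3+ alleles


print(classify_site("A", "G", "G", "A"))
print(classify_site("G", "A", "G", "A"))
```

Under this convention, an excess of one pattern over the other across many sites is what the D-statistic measures.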

<classdiagram type="dir:LR">

[Single BAM file{bg:orange}]->[Sequence data|Random base (-doAbbababa 1)]

[sequence data]->doAbbababa[ABBA and BABA counts file{bg:blue}]

[ABBA and BABA counts file{bg:blue}]->jackKnife.R[D stat and Z scores{bg:blue}]

</classdiagram>

Brief Overview

> ./angsd -doAbbababa

--------------
analysisAbbababa.cpp:
	-doAbbababa	0
	1: use a random base
	-rmTrans		0	remove transitions
	-blockSize		5000000	number of based in a block

This function counts the number of ABBA and BABA sites.
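As an illustration of how ABBA and BABA counts can be turned into a D-statistic and a Z score, the sketch below uses the standard definition D = (nABBA - nBABA) / (nABBA + nBABA) together with a plain, unweighted delete-one block jackknife. The per-block input layout and the function names are assumptions made for this example; the jackKnife.R script shipped with ANGSD may use a different (for example weighted) estimator.

```python
import math


def d_statistic(abba, baba):
    """Return D = (ABBA - BABA) / (ABBA + BABA) for total counts."""
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / total


def block_jackknife(blocks):
    """Unweighted delete-one block jackknife.

    blocks: list of (abba, baba) counts, one pair per genomic block
    (hypothetical layout). Returns (D, standard error, Z score).
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    tot_abba = sum(a for a, _ in blocks)
    tot_baba = sum(b for _, b in blocks)
    d_all = d_statistic(tot_abba, tot_baba)
    # Recompute D with each block left out in turn.
    leave_one_out = [
        d_statistic(tot_abba - a, tot_baba - b) for a, b in blocks
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    z = d_all / se if se > 0 else float("nan")
    return d_all, se, z


# Made-up counts for four blocks, purely for illustration.
d, se, z = block_jackknife([(10, 5), (8, 6), (12, 4), (9, 7)])
print(d, se, z)
```

Blocks here would correspond to the windows set by -blockSize. Larger blocks reduce the correlation between neighbouring blocks but leave fewer of them for estimating the standard error.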

Options

-doAbbababa 1
sample a random base at each position.
-minQ [INT]

minimum base quality score.
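The combination of random base sampling and a minimum base quality can be sketched as follows. This is a simplified illustration, not ANGSD's code: the function name `sample_random_base`, the representation of a pileup column as parallel sequences of bases and quality scores, and the default cutoff of 13 are assumptions for the example.

```python
import random


def sample_random_base(bases, quals, min_q=13, rng=None):
    """Pick one base uniformly at random among reads passing the filter.

    bases: observed bases at one position, one per read.
    quals: base quality scores, same length as bases.
    Bases with quality below min_q are discarded first.
    Returns None if no base survives the filter.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    rng = rng or random.Random()
    kept = [b for b, q in zip(bases, quals) if q >= min_q]
    if not kept:
        return None
    return rng.choice(kept)


# Only the first read passes the quality filter, so 'A' is returned.
print(sample_random_base("AC", [30, 5], min_q=13))
```

Because each read is equally likely to be chosen, a base supported by more reads is proportionally more likely to be sampled, and the result does not depend on calling a genotype.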

Output

The output is an ordinary fasta file. With -doFasta 1, bases appear in a mix of uppercase and lowercase letters because they are copied directly from the sequencing data, where the case indicates the strand of the original read. In the consensus fasta, all letters are uppercase.

Example

Create a fasta file from randomly sampled bases.

./angsd -i smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1