ANGSD: Analysis of next generation Sequencing Data
The latest tar.gz version is 0.938 (0.939 on GitHub); see Change_log for changes, and download it here.
Abbababa
Available from version 0.559 onward.
Performs the ABBA-BABA test, also called the D-statistic. This tests for ancient admixture (or a wrong tree topology).
<classdiagram type="dir:LR">
[Single BAM file{bg:orange}]->[Sequence data|Random base (-doAbbababa 1)]
[Sequence data]->doAbbababa[ABBA and BABA counts file{bg:blue}] [ABBA and BABA counts file{bg:blue}]->jackKnife.R[D stat and Z scores{bg:blue}]
</classdiagram>
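The D-statistic is conventionally computed from summed site-pattern counts as D = (ABBA - BABA) / (ABBA + BABA), and a block jackknife gives the standard error used for the Z score. The minimal sketch below assumes the counts have already been split into per-block ABBA and BABA lists (the inputs and function names are hypothetical). It uses a simple unweighted delete-one jackknife; it is not a reimplementation of jackKnife.R, which may weight blocks differently.

```python
import math


def d_statistic(abba, baba):
    """D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA)."""
    total_abba, total_baba = sum(abba), sum(baba)
    return (total_abba - total_baba) / (total_abba + total_baba)


def jackknife_z(abba, baba):
    """Return (D, Z) using an unweighted delete-one block jackknife.

    abba, baba: hypothetical per-block counts (one entry per genomic block).
    """
    n = len(abba)
    d_full = d_statistic(abba, baba)
    # Recompute D with each block left out in turn.
    pseudo = [
        d_statistic(abba[:i] + abba[i + 1:], baba[:i] + baba[i + 1:])
        for i in range(n)
    ]
    mean = sum(pseudo) / n
    # Delete-one jackknife variance: (n - 1) / n * sum of squared deviations.
    var = (n - 1) / n * sum((d - mean) ** 2 for d in pseudo)
    return d_full, d_full / math.sqrt(var)


# Example with four made-up blocks.
d, z = jackknife_z([12, 9, 15, 11], [8, 10, 7, 9])
```

Here D = 13/81, and a positive Z indicates an excess of ABBA over BABA patterns relative to the block-to-block variation.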
A fasta file that can be used as input for the ANGSD analysis can be generated with -doFasta:
Brief Overview
> ./angsd -doFasta
--------------
analysisFasta.cpp:
    -doFasta 0
        1: use a random base
        2: use the most common base (needs -doCounts 1)
        3: use the base with highest ebd (under development)
    -minQ 13 (remove bases with qscore<minQ)
    -basesPerLine 50 (Number of bases perline in output file)
This function dumps a fasta file using the full header information from the SAM/BAM file. This means that the fasta file will contain a sequence for ALL entries in the header, even if -r, -rf or -filter is used.
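To illustrate the header-driven behaviour, the sketch below writes one record per header entry whether or not any bases were called on it, and wraps sequence lines the way -basesPerLine does. The function name, the input structures and the use of 'N' for positions without data are assumptions for illustration, not ANGSD internals.

```python
def write_fasta(header, called, bases_per_line=50):
    """Return fasta text with one record per header entry.

    header: list of (name, length) pairs, as in the BAM header (hypothetical input).
    called: dict mapping (name, 0-based position) -> base for positions with data.
    Positions without a called base are written as 'N' (assumption).
    """
    lines = []
    for name, length in header:
        seq = "".join(called.get((name, pos), "N") for pos in range(length))
        lines.append(">" + name)
        # Wrap the sequence at bases_per_line characters (cf. -basesPerLine).
        for start in range(0, length, bases_per_line):
            lines.append(seq[start:start + bases_per_line])
    return "\n".join(lines) + "\n"


# chr2 has no called bases (e.g. it was outside -r) but still gets a record.
text = write_fasta([("chr1", 5), ("chr2", 3)],
                   {("chr1", 1): "A", ("chr1", 2): "c"},
                   bases_per_line=4)
```

With these inputs, `text` is `">chr1\nNAcN\nN\n>chr2\nNNN\n"`.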
The EBD is the effective base depth, as defined in [1].
There is one EBD for each of the four bases; each EBD is based on the product of the mapping quality and the base quality score for the base under consideration.
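A minimal sketch of the EBD calculation follows. It assumes Phred-scaled qualities are converted to the probability of a correct call, and that each read contributes the product of its mapping and base probabilities to the EBD of the base it shows; summing these products over reads is an assumption, not something stated above. `highest_ebd_base` mirrors the idea behind -doFasta 3, and all names are hypothetical.

```python
def phred_to_prob_correct(q):
    """Convert a Phred-scaled quality to the probability that the call is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)


def effective_base_depth(reads):
    """Compute one EBD value per base (A, C, G, T) at a single site.

    reads: list of (base, base_quality, mapping_quality) tuples.
    Assumption: EBD(b) = sum over reads showing b of P(mapped ok) * P(base ok).
    """
    ebd = {b: 0.0 for b in "ACGT"}
    for base, bq, mq in reads:
        b = base.upper()
        if b in ebd:  # ignore N and other symbols
            ebd[b] += phred_to_prob_correct(mq) * phred_to_prob_correct(bq)
    return ebd


def highest_ebd_base(reads):
    """Pick the base with the largest EBD (ties resolved in A, C, G, T order)."""
    ebd = effective_base_depth(reads)
    return max(ebd, key=ebd.get)


reads = [("A", 30, 60), ("a", 20, 60), ("G", 40, 10)]
best = highest_ebd_base(reads)
```

Two well-mapped A reads outweigh one G read with mapping quality 10, so `best` is "A".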
Options
- -doFasta 1
- Sample a random base at each position.
- -doFasta 2
- Use the most common base. In case of ties, a random base is chosen among the bases with the same maximum count. The allele-counts option "-doCounts 1" is needed to determine the most common base.
- -minQ [INT]
- Minimum base quality score; bases with a quality score below this value are removed (default 13).
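The base-selection rules above can be sketched as follows. The helper names are hypothetical, and returning 'N' when no base passes the filter is an assumption. `random_base` keeps the original letter case, while `most_common_base` upper-cases before counting (an assumption that matches the all-capital consensus output).

```python
import random
from collections import Counter


def filter_min_q(bases, quals, min_q=13):
    """Drop bases whose quality score is below min_q (cf. -minQ)."""
    return [b for b, q in zip(bases, quals) if q >= min_q]


def random_base(bases, rng):
    """-doFasta 1 idea: sample one observed base uniformly; 'N' if none remain."""
    return rng.choice(bases) if bases else "N"


def most_common_base(bases, rng):
    """-doFasta 2 idea: most frequent base, ties broken at random; 'N' if none."""
    if not bases:
        return "N"
    counts = Counter(b.upper() for b in bases)
    top = max(counts.values())
    tied = sorted(b for b, c in counts.items() if c == top)
    return rng.choice(tied)


rng = random.Random(1)  # seeded only to make the example reproducible
bases = filter_min_q(["A", "a", "G", "T"], [30, 25, 10, 20])  # G has Q10 < 13
consensus = most_common_base(bases, rng)
sampled = random_base(bases, rng)
```

Here the G is removed by the quality filter, `consensus` is "A" (two counts versus one T), and `sampled` is one of "A", "a" or "T".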
Output
The output is an ordinary fasta file. For -doFasta 1, the file contains a mix of upper- and lower-case letters because the bases are copied directly from the sequencing data, where the letter case indicates the strand of the original read. For the consensus fasta (-doFasta 2), all letters are upper case.
Example
Create a fasta file from a random sample of bases:
./angsd -i smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1
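After running the example, the output can be inspected with a small fasta reader. This sketch assumes the output is gzip-compressed and named angsdput.fa.gz (an assumed default output name; adjust the path if yours differs). `summarize` reports, per sequence, its length, the fraction of 'N' positions and the fraction of lower-case bases, which for -doFasta 1 reflects the strand effect described under Output. All function names are hypothetical.

```python
import gzip


def parse_fasta(lines):
    """Parse fasta lines into {name: sequence}; name is the first word of the header."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_fasta(path):
    """Read a plain or gzip-compressed fasta file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_fasta(handle)


def summarize(records):
    """Return {name: (length, fraction of N, fraction of lower-case bases)}."""
    stats = {}
    for name, seq in records.items():
        n = len(seq)
        if n == 0:
            stats[name] = (0, 0.0, 0.0)
            continue
        frac_n = seq.upper().count("N") / n
        frac_lower = sum(c.islower() for c in seq) / n
        stats[name] = (n, frac_n, frac_lower)
    return stats


# Usage (assumed file name): summarize(read_fasta("angsdput.fa.gz"))
```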