ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is 0.938 (0.939 on GitHub); see Change_log for changes, and download it here.
Haploid calling
Simple haploid output based on sampling or consensus.
<classdiagram type="dir:LR">
[BAM files{bg:orange}]->[Sequence data|Random base;Consensus base]
[Sequence data]->[*.haplo.gz|single base file{bg:blue}] </classdiagram>
Brief Overview
    > ./angsd -doHaploCall
        -> angsd version: 0.910-45-g2b2b4f0-dirty (htslib: 1.2.1-192-ge7e2b3d) build(Jan 3 2016 14:45:41)
        -> Analysis helpbox/synopsis information:
        -> Command: ./angsd -doHaploCall
        -> Sun Jan 3 15:18:15 2016
    --------------
    abcHaploCall.cpp:
        -doHaploCall    0 (Sampling strategies)
            0: no haploid calling
            1: (Sample single base)
            2: (Concensus base)
        -doCounts       0   Must choose -doCount 1
    Optional
        -minMinor       0   Minimum observed minor alleles
        -maxMis         -1  Maximum missing bases (per site)
This function outputs one base per individual at each site.
Options
- -doHaploCall [int]
1: sample a random base. 2: use the most frequent base, choosing at random among tied bases.
- -doCounts 1
Required: use -doCounts 1 so that the bases at each site are counted after filtering.
- -minMinor [int]
Minimum number of observed minor alleles: only sites with at least minMinor sampled minor alleles (across individuals) are printed.
- -maxMis [int]
Maximum number of missing alleles allowed (across individuals). -maxMis 0 prints only sites with no missing alleles.
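The two calling strategies and the two site filters above can be illustrated with a small Python sketch. It is not ANGSD's implementation (that lives in abcHaploCall.cpp); it assumes per-individual base counts like those produced by -doCounts 1, and the names call_base and keep_site are hypothetical. It further assumes that "sample a random base" means drawing one read's base at random (so proportional to the counts), that the major allele is the most frequent called base with every other non-N call counted as minor, and that a negative max_mis disables the missing-data filter. ANGSD may define the major allele differently.

```python
import random
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def call_base(counts: Dict[str, int], strategy: int, rng: random.Random) -> str:
    """Call one haploid base for one individual at one site.

    counts: read count per base after filtering (as -doCounts 1 would give).
    strategy 1: pick one read's base at random, i.e. proportional to counts.
    strategy 2: most frequent base; ties are broken at random.
    Returns "N" when the individual has no reads at the site.
    """
    weights = [counts.get(b, 0) for b in BASES]
    if sum(weights) == 0:
        return "N"
    if strategy == 1:
        return rng.choices(BASES, weights=weights)[0]
    if strategy == 2:
        best = max(weights)
        tied = [b for b, w in zip(BASES, weights) if w == best]
        return rng.choice(tied)
    raise ValueError("strategy must be 1 or 2")


def keep_site(calls: List[str], min_minor: int = 0, max_mis: int = -1) -> bool:
    """Decide whether a site is printed, in the spirit of -minMinor/-maxMis.

    Assumptions: max_mis < 0 disables the missing-data filter (mirroring the
    default of -1); the major allele is the most common called base, and every
    other non-N call counts towards the minor-allele total.
    """
    observed = [c for c in calls if c != "N"]
    if not observed:
        return False  # assumption: sites with no data at all are dropped
    if max_mis >= 0 and len(calls) - len(observed) > max_mis:
        return False
    major_count = Counter(observed).most_common(1)[0][1]
    return len(observed) - major_count >= min_minor


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical read counts for three individuals at one site.
    site = [{"C": 4, "T": 1}, {}, {"C": 2, "T": 2}]
    calls = [call_base(c, 2, rng) for c in site]
    print(calls, keep_site(calls, min_minor=1))
```

Passing a seeded random.Random makes runs reproducible, which is convenient when comparing the two strategies on the same counts.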
Output
- .haplo.gz
Each line represents one site: chromosome name (column 1), position (column 2) and major allele (column 3), followed by one column per individual containing that individual's sampled allele (N when the allele is missing).
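For downstream scripting, each line of the .haplo.gz file can be split into these columns. This is a minimal Python sketch, assuming whitespace-separated columns as in the example output and that a header line, if present, can be recognised by a non-numeric position field. HaploSite, parse_haplo_line and read_haplo are hypothetical names, not part of ANGSD.

```python
import gzip
from typing import Iterator, List, NamedTuple


class HaploSite(NamedTuple):
    chrom: str
    pos: int
    major: str
    calls: List[str]  # one base per individual; "N" marks a missing base


def parse_haplo_line(line: str) -> HaploSite:
    """Split one whitespace-separated data line into its columns."""
    fields = line.split()
    return HaploSite(fields[0], int(fields[1]), fields[2], fields[3:])


def read_haplo(path: str) -> Iterator[HaploSite]:
    """Yield one HaploSite per data line of a gzipped .haplo.gz file."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            fields = line.split()
            # Assumption: a header line, if present, has a non-numeric 2nd column.
            if len(fields) < 3 or not fields[1].isdigit():
                continue
            yield parse_haplo_line(line)
```

For example, iterating over read_haplo on a hypothetical output file and counting "N" in each site's calls gives the number of missing alleles per site, the quantity that -maxMis filters on.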
Example
Sample a random base for each individual at every site on chromosome 1, printing only sites where at least one minor allele was sampled:
./angsd -bam bam.filelist -dohaplocall 1 -doCounts 1 -r 1: -minMinor 1
Output
    1 14094607 C C N N C C C T
    1 14094618 C C N N C G C N
    1 14094619 G C N N G N G G
    1 14094628 C G N N N C N G
    1 14094784 G G G T T N G G
    1 14095072 A A N A A A A C
    1 14095751 C C C C C N T C
    1 14095773 G G G G G G N T
    1 14095992 C C A N N C A C
    1 14096030 C C C N A N C N
    1 14096362 G T G G G G G G
    1 14096635 A T A A N A N N
    1 14096717 C C N C C C C N
    1 14097480 A A G A A A A A
    1 14097899 T T T G T T G T
    1 14098042 G T N G T T G T
    1 14098127 A A N C A N A A
    1 14098140 G G N G G G N G
    1 14098148 C A N C C C N C
    1 14098346 T T T T G T G G
    1 14098792 T T N T A N T N
    1 14099223 G G G T G G G G
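If per-individual sequences are wanted (for example as FASTA), the rows can be transposed so that each individual's calls are concatenated in row order. This is a minimal Python sketch, not an ANGSD feature; it assumes whitespace-separated columns as in the example output, and the function name haplo_rows_to_fasta and the default sample names ind0, ind1, ... are hypothetical. The resulting sequences contain only the printed sites, so sites removed by -minMinor or -maxMis are absent and the sequences are not contiguous genomic sequence.

```python
from typing import List, Optional


def haplo_rows_to_fasta(rows: List[str], names: Optional[List[str]] = None) -> str:
    """Turn .haplo rows (chr, pos, major, one call per individual) into one
    FASTA record per individual, concatenating the calls in row order."""
    calls = [row.split()[3:] for row in rows if row.strip()]
    if not calls:
        return ""
    n_ind = len(calls[0])
    if names is None:
        names = [f"ind{i}" for i in range(n_ind)]  # hypothetical sample names
    if len(names) != n_ind or any(len(site) != n_ind for site in calls):
        raise ValueError("rows and names disagree on the number of individuals")
    records = []
    for i, name in enumerate(names):
        records.append(f">{name}")
        records.append("".join(site[i] for site in calls))
    return "\n".join(records) + "\n"


# First three sites of the example output.
example = [
    "1 14094607 C C N N C C C T",
    "1 14094618 C C N N C G C N",
    "1 14094619 G C N N G N G G",
]
print(haplo_rows_to_fasta(example), end="")
# Records printed: ind0 CCC, ind1 NNN, ind2 NNN, ind3 CCG, ind4 CGN, ind5 CCG, ind6 TNG
```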