ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Genotype calling
The program can call genotypes either by picking the genotype with the highest likelihood or by using the allele frequency as a prior (recommended; see Kim2011).
options
- -doGeno [int]
1: print the major and minor alleles
2: print the called genotype as -1, 0, 1, 2 (-1 = not called)
4: print the called genotype as AA, AC, AG, ...
8: print all 3 posterior probabilities: (major,major), (major,minor), (minor,minor)
16: print the posterior probability of the called genotype
32: works somewhat differently: dumps the posterior probabilities for all samples in binary form, encoded as 3*nind doubles
Use the sum of the above to get the output you want. For example, -doGeno 5 (1+4) prints the major and minor allele followed by the genotype (AA, AC, ...) for each individual.
- -doPost [int]
1: estimate the posterior genotype probability based on the allele frequency as a prior
2: estimate the posterior genotype probability assuming a uniform prior
- -postCutoff [float]
Only call a genotype if its posterior probability is above this threshold.
NB: if the raw posterior dump is requested, -postCutoff is not used.
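The way these three options interact can be sketched in a few lines of Python. This is an illustration only, not ANGSD's own code: the function names are hypothetical, and the frequency prior is assumed here to follow Hardy-Weinberg proportions, which the text above does not state.

```python
from typing import List, Optional, Sequence

# Meaning of each -doGeno bit, paraphrased from the option list above.
DO_GENO_FLAGS = {
    1: "major and minor allele",
    2: "called genotype as a number",
    4: "called genotype as letters (AA, AC, AG, ...)",
    8: "all 3 posterior probabilities",
    16: "posterior of the called genotype",
    32: "binary dump of the posteriors",
}


def decode_do_geno(value: int) -> List[str]:
    """Split a -doGeno sum into the individual outputs it requests."""
    return [desc for bit, desc in sorted(DO_GENO_FLAGS.items()) if value & bit]


def posterior(likes: Sequence[float], freq: Optional[float] = None) -> List[float]:
    """Posterior for (major,major), (major,minor), (minor,minor).

    likes: the three genotype likelihoods in that order.
    freq:  minor allele frequency used as a prior (like -doPost 1);
           None means a uniform prior (like -doPost 2).
    ASSUMPTION: the frequency prior uses Hardy-Weinberg proportions.
    """
    if freq is None:
        prior = (1.0 / 3, 1.0 / 3, 1.0 / 3)
    else:
        prior = ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)
    weighted = [l * p for l, p in zip(likes, prior)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("all weighted likelihoods are zero")
    return [w / total for w in weighted]


def call_genotype(post: Sequence[float], cutoff: float = 0.0) -> int:
    """Return 0, 1 or 2 (number of minor alleles), or -1 if no call is made."""
    best = max(range(3), key=lambda g: post[g])
    return best if post[best] > cutoff else -1


if __name__ == "__main__":
    print(decode_do_geno(5))
    likes = [0.9, 0.1, 0.0]
    print(call_genotype(posterior(likes), cutoff=0.95))             # uniform prior
    print(call_genotype(posterior(likes, freq=0.01), cutoff=0.95))  # frequency prior
```

With these made-up likelihoods the uniform prior leaves the best genotype at 0.9, below the 0.95 cutoff, so no call is made. A low minor allele frequency shifts weight toward the major homozygote and the same site is called.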
example
./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doSNP 1 -doMajorMinor 1 -minLRT 24 -doGeno 5 -doPost 1 -postCutoff 0.95
gives output like this:
1 14000202 G A GG NN NN GA NN
1 14000873 G A GG GG GG AA GA
1 14001018 T C NN NN NN CC NN
1 14001867 A G NN AA AA NN NN
1 14002342 C T CC CC CC CC CC
1 14002422 A T AA NN NN NN NN
1 14002474 T C TC TT TT TT TT
1 14003581 C T CC CC NN NN CT
1 14004623 T C TT TT TT NN TC
1 14005069 A G AA AA AA AA AA
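Output in this layout is easy to post-process. The Python sketch below assumes one site per line with whitespace-separated fields (chromosome, position, major allele, minor allele, then one genotype per individual), and assumes that NN marks an individual whose genotype was not called. The names Site, parse_geno_line and call_rate are hypothetical helpers, not part of ANGSD.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int
    major: str
    minor: str
    genotypes: List[str]  # one entry per individual, e.g. "GA" or "NN"


def parse_geno_line(line: str) -> Site:
    """Parse one line of -doGeno 5 style output (assumed layout, see above)."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    chrom, pos, major, minor = fields[:4]
    return Site(chrom, int(pos), major, minor, fields[4:])


def call_rate(site: Site) -> float:
    """Fraction of individuals with a called genotype (assumes NN = no call)."""
    called = sum(1 for g in site.genotypes if g != "NN")
    return called / len(site.genotypes)


if __name__ == "__main__":
    # Two lines taken from the example output above.
    example = """1 14000202 G A GG NN NN GA NN
1 14000873 G A GG GG GG AA GA"""
    for line in example.splitlines():
        site = parse_geno_line(line)
        print(site.chrom, site.pos, site.major, site.minor, call_rate(site))
```

A per-site call rate like this gives a quick view of how many individuals passed the -postCutoff threshold at each position.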