ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Revision as of 20:09, 10 October 2012
Genotype calling
The program can do genotype calling either by picking the genotype with the highest likelihood or by using the allele frequency as a prior (recommended; see Kim2011).
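The two calling rules can be written out as follows. This is a sketch in our own notation, and the Hardy-Weinberg form of the frequency prior is an assumption, not something stated on this page. For one individual with read data D, genotype g counting copies of the minor allele, and estimated minor allele frequency f:

```latex
\hat{g}_{\text{ML}} = \arg\max_{g \in \{0,1,2\}} P(D \mid G = g)

P(G = g \mid D) = \frac{P(D \mid G = g)\,P(G = g)}{\sum_{h=0}^{2} P(D \mid G = h)\,P(G = h)},
\qquad P(G = 0) = (1-f)^2,\quad P(G = 1) = 2f(1-f),\quad P(G = 2) = f^2

\hat{g}_{\text{post}} = \arg\max_{g \in \{0,1,2\}} P(G = g \mid D)
```

With a uniform prior, P(G = g) = 1/3, the posterior is proportional to the likelihood, so both rules call the same genotype. The frequency prior tends to pull uncertain calls toward the genotypes that are common at that site.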
options
- -doGeno [int]
1: print the major and minor alleles
2: print the called genotype as 0,1,2
4: print the called genotype as AA, AC, AG, ...
8: print all three posterior probabilities: (major,major), (major,minor), (minor,minor)
16: print the posterior probability of the called genotype
32: somewhat different: dumps the posteriors for all samples in binary, encoded as 3*nind doubles
Use the sum of the above to get the output you want. For example, -doGeno 5 (1+4) prints the major and minor alleles followed by the genotype (AA, AC, ...) for each individual.
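Because every flag is a power of two, a -doGeno value works as a bitmask. A minimal Python sketch that splits a value into its components; the helper name decode_dogeno and the short flag descriptions are ours, not part of ANGSD:

```python
from typing import List

# Flag values as listed above; the short descriptions are our own wording.
DOGENO_FLAGS = {
    1: "major and minor alleles",
    2: "called genotype as 0,1,2",
    4: "called genotype as AA, AC, AG, ...",
    8: "all three genotype posteriors",
    16: "posterior of the called genotype",
    32: "binary dump of all posteriors",
}


def decode_dogeno(value: int) -> List[int]:
    """Return the flags (powers of two) whose sum is value."""
    if value < 0 or value >= 64:
        raise ValueError("value is outside the range of the listed flags")
    # A flag is present when its bit is set in value.
    return [flag for flag in sorted(DOGENO_FLAGS) if value & flag]


for flag in decode_dogeno(5):
    print(flag, DOGENO_FLAGS[flag])
```

Decoding 5 yields the flags 1 and 4, matching the example in the text.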
- -doPost [int]
1: estimate the posterior genotype probability based on the allele frequency as a prior
2: estimate the posterior genotype probability assuming a uniform prior
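A minimal sketch of what the two -doPost options compute for a single individual. It assumes the frequency prior takes Hardy-Weinberg proportions and uses plain rather than log likelihoods for readability; genotype_posteriors and the likelihood values are hypothetical and not ANGSD's internals:

```python
def genotype_posteriors(likelihoods, freq=None):
    """Posterior P(G=g | D) for g = 0, 1, 2 copies of the minor allele.

    likelihoods: P(D | G=g) in the order (major,major), (major,minor), (minor,minor).
    freq: minor allele frequency for a Hardy-Weinberg prior (the idea behind
          -doPost 1); None gives a uniform prior (the idea behind -doPost 2).
    """
    if freq is None:
        prior = [1.0 / 3.0] * 3
    else:
        prior = [(1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2]
    # Bayes' rule: multiply likelihood by prior, then normalise to sum to 1.
    joint = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical likelihoods for one individual at one site.
lik = [0.2, 0.3, 0.1]
for label, post in [("uniform", genotype_posteriors(lik)),
                    ("freq=0.1", genotype_posteriors(lik, freq=0.1))]:
    best = max(range(3), key=lambda g: post[g])
    print(label, best, round(post[best], 3))
```

With these numbers the uniform prior calls the heterozygote (posterior 0.5), while a minor allele frequency of 0.1 shifts the call to major/major.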
- -postCutoff [float]
Only call a genotype if its posterior probability is above this threshold.
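A sketch of how a cutoff turns posteriors into calls. It assumes, as the example output below suggests, that an uncalled genotype is printed as NN, and that a posterior exactly equal to the cutoff is not called; call_genotype is a hypothetical helper:

```python
def call_genotype(posteriors, major, minor, cutoff=0.95):
    """Return the called genotype as two letters, or "NN" if not called.

    posteriors: P(G=g | D) for (major,major), (major,minor), (minor,minor).
    """
    best = max(range(3), key=lambda g: posteriors[g])
    # Assumption: the best posterior must be strictly above the cutoff.
    if posteriors[best] <= cutoff:
        return "NN"
    return [major + major, major + minor, minor + minor][best]


print(call_genotype([0.02, 0.96, 0.02], "T", "C"))
print(call_genotype([0.50, 0.40, 0.10], "G", "A"))
```

Heterozygotes are written major allele first, which matches calls such as GA and TC in the example output.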
example
./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doSNP 1 -doMajorMinor 1 -minLRT 24 -doGeno 5 -doPost 1 -postCutoff 0.95
gives output like this, with chromosome, position, major allele and minor allele followed by one called genotype per individual (NN marks an individual whose genotype was not called):
1 14000202 G A GG NN NN GA NN
1 14000873 G A GG GG GG AA GA
1 14001018 T C NN NN NN CC NN
1 14001867 A G NN AA AA NN NN
1 14002342 C T CC CC CC CC CC
1 14002422 A T AA NN NN NN NN
1 14002474 T C TC TT TT TT TT
1 14003581 C T CC CC NN NN CT
1 14004623 T C TT TT TT NN TC
1 14005069 A G AA AA AA AA AA
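A small Python sketch for reading this kind of output. It assumes the columns are chromosome, position, major allele, minor allele and then one genotype per individual; parse_geno_line is a hypothetical helper that converts each genotype to its number of minor alleles, with None for NN:

```python
# The example output, one site per line.
EXAMPLE = """\
1 14000202 G A GG NN NN GA NN
1 14000873 G A GG GG GG AA GA
1 14001018 T C NN NN NN CC NN
1 14001867 A G NN AA AA NN NN
1 14002342 C T CC CC CC CC CC
1 14002422 A T AA NN NN NN NN
1 14002474 T C TC TT TT TT TT
1 14003581 C T CC CC NN NN CT
1 14004623 T C TT TT TT NN TC
1 14005069 A G AA AA AA AA AA
"""


def parse_geno_line(line):
    """Return (chrom, pos, major, minor, counts) for one output line.

    counts holds the number of minor alleles (0, 1 or 2) per individual,
    or None where the genotype is NN (not called).
    """
    chrom, pos, major, minor, *genotypes = line.split()
    counts = [None if g == "NN" else g.count(minor) for g in genotypes]
    return chrom, int(pos), major, minor, counts


sites = [parse_geno_line(line) for line in EXAMPLE.splitlines() if line.strip()]
called = sum(c is not None for site in sites for c in site[4])
total = sum(len(site[4]) for site in sites)
print(f"{called} of {total} genotypes called")
```

For the ten sites above this reports 33 of 50 genotypes called.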