ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Genotype calling: Difference between revisions


Revision as of 20:09, 10 October 2012

Genotype calling

The program can call genotypes either by choosing the genotype with the highest likelihood or by using the allele frequency as a prior (recommended; see Kim2011).

options

-doGeno [int]

1: print the major and minor alleles

2: print the called genotype as 0,1,2

4: print the called genotype as AA, AC, AG, ...

8: print all 3 posterior probabilities: (major,major), (major,minor), (minor,minor)

16: print the posterior of the called genotype

32: unlike the other values, dumps the posterior probabilities for all samples in binary, encoded as 3*nind doubles

Use the sum of the values above to get the output you want. For example, -doGeno 5 (1+4) prints the major and minor alleles followed by the genotype (AA, AC, ...) for each individual.
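Because the -doGeno values are powers of two, a summed value can be split back into its components with a bitwise test. The sketch below is only an illustration of that arithmetic; the helper name decode_dogeno and the description strings are made up here and are not part of angsd.

```python
# Descriptions of the -doGeno components, keyed by their value (from the option list).
# The dictionary and function names are hypothetical helpers, not angsd identifiers.
DOGENO_FLAGS = {
    1: "major and minor allele",
    2: "called genotype as 0,1,2",
    4: "called genotype as AA, AC, AG, ...",
    8: "all 3 posteriors",
    16: "posterior of the called genotype",
    32: "binary posteriors (3*nind doubles)",
}


def decode_dogeno(value):
    """Return the descriptions of the components that add up to a -doGeno value."""
    max_value = sum(DOGENO_FLAGS)  # sum of the keys: 1+2+4+8+16+32
    if value < 0 or value > max_value:
        raise ValueError(f"-doGeno value must be between 0 and {max_value}")
    # A component is selected when its bit is set in the summed value.
    return [desc for bit, desc in sorted(DOGENO_FLAGS.items()) if value & bit]


for part in decode_dogeno(5):
    print(part)
```

For -doGeno 5 this lists the two components named in the example above (1 and 4).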

-doPost [int]

1: estimate the posterior genotype probability based on the allele frequency as a prior

2: estimate the posterior genotype probability assuming a uniform prior

-postCutoff [float]

Call a genotype only if its posterior probability is above this threshold.
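The interaction of -doPost and -postCutoff can be sketched as follows. This is a minimal illustration, not angsd's implementation: it assumes the frequency-based prior takes the Hardy-Weinberg form, that the three genotype likelihoods are ordered (major,major), (major,minor), (minor,minor), and all function names are invented for the example.

```python
def hwe_prior(minor_freq):
    """Assumed Hardy-Weinberg prior for (major,major), (major,minor), (minor,minor)."""
    p = 1.0 - minor_freq
    q = minor_freq
    return [p * p, 2.0 * p * q, q * q]


def uniform_prior():
    """Equal prior weight on the three genotypes."""
    return [1.0 / 3.0] * 3


def posterior(likelihoods, prior):
    """Bayes' rule: multiply likelihood by prior and normalise to sum to 1."""
    weighted = [lik * pr for lik, pr in zip(likelihoods, prior)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("all weighted likelihoods are zero")
    return [w / total for w in weighted]


def call_genotype(post, cutoff):
    """Return the index of the most probable genotype, or None if it is not above the cutoff."""
    best = max(range(len(post)), key=lambda g: post[g])
    return best if post[best] > cutoff else None


likelihoods = [0.2, 0.7, 0.1]  # made-up genotype likelihoods for one individual
for label, prior in (("frequency prior", hwe_prior(0.1)), ("uniform prior", uniform_prior())):
    post = posterior(likelihoods, prior)
    print(label, [round(x, 3) for x in post], call_genotype(post, 0.95))
```

With a cutoff of 0.95 neither prior yields a confident call for these made-up likelihoods, which corresponds to an uncalled genotype in the output.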

example

./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doSNP 1 -doMajorMinor 1 -minLRT 24 -doGeno 5 -doPost 1 -postCutoff 0.95
gives output like this:
1       14000202        G       A       GG      NN      NN      GA      NN      
1       14000873        G       A       GG      GG      GG      AA      GA      
1       14001018        T       C       NN      NN      NN      CC      NN      
1       14001867        A       G       NN      AA      AA      NN      NN      
1       14002342        C       T       CC      CC      CC      CC      CC      
1       14002422        A       T       AA      NN      NN      NN      NN      
1       14002474        T       C       TC      TT      TT      TT      TT      
1       14003581        C       T       CC      CC      NN      NN      CT      
1       14004623        T       C       TT      TT      TT      NN      TC      
1       14005069        A       G       AA      AA      AA      AA      AA
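Output of this shape is easy to post-process. The sketch below parses a few of the lines shown above and reports the fraction of individuals with a called genotype per site. It assumes whitespace-separated columns in the order chromosome, position, major, minor, then one genotype per individual, and it assumes that NN marks a genotype that was not called; the function names are hypothetical.

```python
# Three of the example lines from above, embedded so the sketch is self-contained.
SAMPLE = """\
1\t14000202\tG\tA\tGG\tNN\tNN\tGA\tNN
1\t14000873\tG\tA\tGG\tGG\tGG\tAA\tGA
1\t14002342\tC\tT\tCC\tCC\tCC\tCC\tCC
"""


def parse_geno_line(line):
    """Split one -doGeno 5 style line into its columns (assumed layout, see lead-in)."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, pos, major, minor = fields[:4]
    return {
        "chr": chrom,
        "pos": int(pos),
        "major": major,
        "minor": minor,
        "genotypes": fields[4:],  # one entry per individual
    }


def call_rate(record):
    """Fraction of individuals whose genotype is not NN (assumed to mean 'not called')."""
    calls = record["genotypes"]
    return sum(g != "NN" for g in calls) / len(calls)


records = [parse_geno_line(line) for line in SAMPLE.splitlines() if line.strip()]
for rec in records:
    print(rec["chr"], rec["pos"], rec["major"], rec["minor"], f"{call_rate(rec):.2f}")
```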