Revision as of 20:09, 10 October 2012

Genotype calling

The program can call genotypes based either on the genotype with the highest likelihood or by using the allele frequency as a prior (recommended; see Kim2011).

options

-doGeno [int]

1: print out major minor

2: print the called genotype as 0,1,2

4: print the called genotype as AA, AC, AG, ...

8: print all 3 posterior probabilities (major,major), (major,minor), (minor,minor)

16: print the posterior of the called genotype

32: somewhat different: dumps the posteriors for all samples in binary, encoded as 3*nind doubles

Use the sum of the above to get the output you want. For example, -doGeno 5 (1+4) prints the major and minor allele followed by the genotype (AA, AC, ...) for each individual.
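The additive option values above behave like bit flags. The following is a minimal sketch of how a summed -doGeno value splits into its parts; the names `DOGENO_FLAGS` and `decode_dogeno` are made up for illustration and are not part of the program.

```python
# Flag values and meanings are copied from the -doGeno option list above.
# DOGENO_FLAGS and decode_dogeno are hypothetical helper names.
DOGENO_FLAGS = {
    1: "major and minor allele",
    2: "called genotype as 0,1,2",
    4: "called genotype as AA, AC, AG, ...",
    8: "all 3 posteriors",
    16: "posterior of the called genotype",
    32: "binary dump of the posteriors (3*nind doubles)",
}


def decode_dogeno(value):
    """Return the descriptions of all flags contained in a summed -doGeno value."""
    return [desc for bit, desc in sorted(DOGENO_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    # -doGeno 5 is 1 + 4: major/minor alleles plus genotypes as letters.
    for description in decode_dogeno(5):
        print(description)
```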

-doPost [int]

1: estimate the posterior genotype probability based on the allele frequency as a prior

2: estimate the posterior genotype probability assuming a uniform prior
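The two -doPost modes differ only in the prior that is combined with the genotype likelihoods. The sketch below illustrates the idea for one individual at one site. It assumes that the frequency-based prior follows Hardy-Weinberg proportions, which this page does not state explicitly, and the function name `genotype_posteriors` is hypothetical.

```python
def genotype_posteriors(likelihoods, minor_freq=None):
    """Combine genotype likelihoods with a prior and normalise.

    likelihoods: likelihoods for (major,major), (major,minor), (minor,minor).
    minor_freq:  minor allele frequency for the frequency-based prior
                 (as with -doPost 1), or None for a uniform prior
                 (as with -doPost 2).
    """
    if minor_freq is None:
        prior = (1.0 / 3, 1.0 / 3, 1.0 / 3)
    else:
        f = minor_freq
        # Assumption: Hardy-Weinberg proportions given the allele frequency.
        prior = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
    weighted = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("all weighted likelihoods are zero")
    return [w / total for w in weighted]


if __name__ == "__main__":
    likes = (0.2, 0.6, 0.2)
    print(genotype_posteriors(likes))        # uniform prior
    print(genotype_posteriors(likes, 0.05))  # rare minor allele favours (major,major)
```

With a low minor allele frequency the prior pulls the posterior towards the (major,major) genotype, which is why the two modes can give different calls for the same likelihoods.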

-postCutoff [float]

Call a genotype only if its posterior is above this threshold.
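The thresholding can be pictured as follows. This is a sketch only: the function name `call_genotype` is hypothetical, and writing uncalled genotypes as NN is an assumption based on the example output on this page.

```python
def call_genotype(posteriors, cutoff, major, minor):
    """Return the most probable genotype, or "NN" if it does not pass the cutoff.

    posteriors: posteriors for (major,major), (major,minor), (minor,minor).
    cutoff:     threshold in the sense of -postCutoff.
    """
    labels = (major + major, major + minor, minor + minor)
    best = max(range(3), key=lambda i: posteriors[i])
    # Assumption: genotypes below the threshold are reported as NN.
    return labels[best] if posteriors[best] > cutoff else "NN"


if __name__ == "__main__":
    print(call_genotype((0.01, 0.98, 0.01), 0.95, "G", "A"))
    print(call_genotype((0.60, 0.39, 0.01), 0.95, "G", "A"))
```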

example

./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doSNP 1 -doMajorMinor 1 -minLRT 24 -doGeno 5 -doPost 1 -postCutoff 0.95

gives output like this:

1       14000202        G       A       GG      NN      NN      GA      NN      
1       14000873        G       A       GG      GG      GG      AA      GA      
1       14001018        T       C       NN      NN      NN      CC      NN      
1       14001867        A       G       NN      AA      AA      NN      NN      
1       14002342        C       T       CC      CC      CC      CC      CC      
1       14002422        A       T       AA      NN      NN      NN      NN      
1       14002474        T       C       TC      TT      TT      TT      TT      
1       14003581        C       T       CC      CC      NN      NN      CT      
1       14004623        T       C       TT      TT      TT      NN      TC      
1       14005069        A       G       AA      AA      AA      AA      AA
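Output in this layout (chromosome, position, major allele, minor allele, then one genotype per individual) is easy to post-process. The following sketch parses one line and reports the fraction of uncalled genotypes. The helper names are hypothetical, and it assumes whitespace-separated columns and NN for uncalled genotypes, as in the example above.

```python
def parse_geno_line(line):
    """Split one -doGeno 5 output line into its columns."""
    fields = line.split()
    chrom, pos, major, minor = fields[0], int(fields[1]), fields[2], fields[3]
    genotypes = fields[4:]  # one entry per individual
    return chrom, pos, major, minor, genotypes


def missing_fraction(genotypes):
    """Fraction of individuals without a called genotype (assumed to be NN)."""
    return genotypes.count("NN") / len(genotypes)


if __name__ == "__main__":
    example = "1\t14000202\tG\tA\tGG\tNN\tNN\tGA\tNN"
    chrom, pos, major, minor, genotypes = parse_geno_line(example)
    print(chrom, pos, major, minor, missing_fraction(genotypes))
```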

@@ Line 1: / Line 1: @@
-==Genotype calling==
+=Genotype calling=
-The program can do genotype calling in different tempis.
+The program can call genotypes based either on the genotype with the highest likelihood or by using the allele frequency as a prior (recommended; see [[Kim2011]]).
-output file .geno
+==options==
+;-doGeno [int]
+: print out major minor
+: print the called genotype as 0,1,2
-;-doGeno 1:, print out major minor
+: print the called genotype as AA, AC, AG, ...
-;-doGeno 2:, print the called genotype as 0,1,2
+: print all 3 posterior probabilities (major,major), (major,minor), (minor,minor)
-;-doGeno 4:, print the called genotype as AA, AC, AG, ...
+: print the posterior of the called genotype
-;-doGeno 8:, print all 3 posts (major,major),(major,minor),(minor,minor)
+: somewhat different: dumps the posteriors for all samples in binary, encoded as 3*nind doubles
-;-doGeno 16:, print the posterior of the called genotype
+Use the sum of the above to get the output you want. For example, -doGeno 5 (1+4) prints the major and minor allele followed by the genotype (AA, AC, ...) for each individual.
-;-doGeno 32:, somewhat different dumps the binary posts for all samples, encoded as 3*nind double
+; -doPost [int]
+: estimate the posterior genotype probability based on the allele frequency as a prior
-The genotype are integers such that AA=0,AC=1,AG=2,AT=3,CC=4,CG=5,CT=6,GG=7,GT=8,TT=9
+: estimate the posterior genotype probability assuming a uniform prior
-output is (-doGeno NOT 64)
+; -postCutoff [float]
-chr, pos, numberof samples times[ the above]
+Call a genotype only if its posterior is above this threshold.
+==example==
-NB currently you also need to supply -doMaf to run this genotype calling
+<pre>
+./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doSNP 1 -doMajorMinor 1 -minLRT 24 -doGeno 5 -doPost 1 -postCutoff 0.95
+</pre>
+ gives output like this:
+<pre>
+       14000202        G       A       GG      NN      NN      GA      NN
+       14000873        G       A       GG      GG      GG      AA      GA
+       14001018        T       C       NN      NN      NN      CC      NN
+       14001867        A       G       NN      AA      AA      NN      NN
+       14002342        C       T       CC      CC      CC      CC      CC
+       14002422        A       T       AA      NN      NN      NN      NN
+       14002474        T       C       TC      TT      TT      TT      TT
+       14003581        C       T       CC      CC      NN      NN      CT
+       14004623        T       C       TT      TT      TT      NN      TC
+       14005069        A       G       AA      AA      AA      AA      AA
+</pre>
