ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Genotype Likelihoods: Difference between revisions
Revision as of 17:48, 19 September 2012
Analysis from sequencing data
<classdiagram>
// [input|bam files;SOAP files{bg:orange}]->[sequence data]
[sequence data]->[genotype likelihoods|samtools;GATK;soapSNP;kim et.al]
</classdiagram>
Genotype likelihoods from alignments
- -GL [int]
If your input is sequencing data, you can estimate genotype likelihoods from the mapped reads. Four different methods are available.
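To make the term "genotype likelihood" concrete, here is a minimal Python sketch of one simple, commonly described model. It assumes each read base comes from either allele of a diploid genotype with probability 1/2, and that a Phred base quality Q corresponds to an error probability of 10^(-Q/10) spread evenly over the three other bases. This is an illustration only and is not ANGSD's implementation; all function names are hypothetical.

```python
import itertools
import math

BASES = "ACGT"

def phred_to_error(q):
    """Convert a Phred-scaled base quality to an error probability."""
    return 10.0 ** (-q / 10.0)

def base_prob(observed, allele, err):
    """P(observed base | true allele); errors are split evenly over 3 bases."""
    return 1.0 - err if observed == allele else err / 3.0

def genotype_log_likelihoods(pileup):
    """pileup: list of (base, phred_quality) pairs observed at one site.

    Returns {genotype: log10 likelihood} for the 10 unordered diploid genotypes.
    """
    result = {}
    for a1, a2 in itertools.combinations_with_replacement(BASES, 2):
        loglik = 0.0
        for base, qual in pileup:
            err = phred_to_error(qual)
            # Each read is drawn from allele a1 or a2 with equal probability.
            p = 0.5 * base_prob(base, a1, err) + 0.5 * base_prob(base, a2, err)
            loglik += math.log10(p)
        result[a1 + a2] = loglik
    return result

# Hypothetical pileup: three A reads and one G read.
site = [("A", 30), ("A", 30), ("G", 20), ("A", 35)]
likelihoods = genotype_log_likelihoods(site)
print(max(likelihoods, key=likelihoods.get))
```

The four -GL methods differ in how the per-read probability is modelled; the sketch only shows the general shape of the calculation.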
Samtools
-GL 1
This method has a random component. Samtools itself includes a stochastic component, so to reproduce the samtools results exactly, use nThreads=1. The method is the same with multiple threads, but some sites will differ slightly from the samtools output because of the stochastic component.
options
- -minQ [int]
default 13. The minimum allowed base quality score.
- -minMapQ [int]
default 0. The minimum allowed mapping quality score.
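The effect of the two thresholds can be sketched in Python as a filter applied to the bases at a site before any likelihood is computed. The defaults are the ones listed above; treating the thresholds as inclusive (a value equal to the minimum is kept) is an assumption, and the function and tuple layout are hypothetical rather than ANGSD internals.

```python
def keep_base(base_qual, map_qual, min_q=13, min_map_q=0):
    """Return True if a base passes both quality thresholds.

    Assumption: thresholds are inclusive, i.e. a score equal to the
    minimum is still allowed.
    """
    return base_qual >= min_q and map_qual >= min_map_q

def filter_pileup(pileup, min_q=13, min_map_q=0):
    """pileup: list of (base, base_quality, mapping_quality) tuples."""
    return [
        (base, bq, mq)
        for base, bq, mq in pileup
        if keep_base(bq, mq, min_q, min_map_q)
    ]

# Hypothetical pileup; the second entry fails the default -minQ of 13.
pileup = [("A", 30, 20), ("C", 12, 60), ("G", 13, 0), ("T", 40, 5)]
print(filter_pileup(pileup))
```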
example
./angsd -bam bam.filelist -GL 1 -out outfile
GATK
-GL 2
options
- -minQ [int]
default 13. The minimum allowed base quality score.
- -minMapQ [int]
default 0. The minimum allowed mapping quality score.
example
./angsd -bam bam.filelist -GL 2 -out outfile
soapSNP
-GL 3
When estimating genotype likelihoods with soapSNP, a calibration matrix has to be generated first. This is done automatically if the calibration files do not exist. The calibration files are stored as angsd_tmpdir/basenameNUM.count and angsd_tmpdir/basenameNUM.qual.
options
- -minQ [int]
default 13. The minimum allowed base quality score.
- -minMapQ [int]
default 0. The minimum allowed mapping quality score.
- -maxq [int]
default 51. The maximum allowed base quality score.
- -L [int]
default 150. The maximum read length (choosing one that is too large is not a problem).
example
./angsd -bam bam.filelist -GL 3 -out outfile -minQ 0 -ref hg19.fa
This first run estimates nothing other than the calibration matrix. Now we can run the analysis we want:
./angsd -bam bam.filelist -GL 3 -out outfile -minQ 0 -ref hg19.fa
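As a rough illustration of what a calibration matrix can look like, the Python sketch below counts observed bases indexed by base quality, position in the read, reference base and observed base, and then turns the counts into empirical probabilities. The indexing scheme, the pseudocount and all names are assumptions made for this example; ANGSD's actual .count and .qual file layout may differ. The sketch does show why -maxq and -L matter: they bound the size of the table.

```python
BASES = "ACGT"

def empty_counts(max_q=51, max_len=150):
    """Zero-filled table counts[qual][pos][ref][obs].

    Qualities run from 0 to max_q inclusive, read positions from 0 to
    max_len - 1.
    """
    return [
        [[[0] * 4 for _ in range(4)] for _ in range(max_len)]
        for _ in range(max_q + 1)
    ]

def add_observation(counts, qual, pos, ref, obs):
    """Record one aligned base; return False if it falls outside the table."""
    if not (0 <= qual < len(counts) and 0 <= pos < len(counts[0])):
        return False
    if ref not in BASES or obs not in BASES:
        return False
    counts[qual][pos][BASES.index(ref)][BASES.index(obs)] += 1
    return True

def calibrated_prob(counts, qual, pos, ref, obs, pseudo=1.0):
    """Empirical P(obs | ref, qual, pos) with a pseudocount for empty cells."""
    row = counts[qual][pos][BASES.index(ref)]
    total = sum(row) + 4 * pseudo
    return (row[BASES.index(obs)] + pseudo) / total

# Hypothetical usage with a small table.
table = empty_counts(max_q=5, max_len=10)
for _ in range(9):
    add_observation(table, 3, 2, "A", "A")
add_observation(table, 3, 2, "A", "C")
print(calibrated_prob(table, 3, 2, "A", "A"))
```

Choosing a read length that is larger than needed only adds unused cells to the table, which is consistent with the note on -L above.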
Kim et al.
-GL 4
Citations: Kim10, Kim11
options
- -error [filename]
A file with the estimated type-specific error rates (see Error_estimation).
example
./angsd -bam bam.filelist -GL 4 -out outfile -error error.file
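The idea of type-specific error rates can be sketched in Python as follows: instead of one error probability per base, a 4x4 matrix gives the probability of observing each base given the true base, and that matrix replaces the single error term in the per-read probability. This is a hypothetical illustration and is not taken from the cited papers or from ANGSD's code; the in-memory matrix, its rates and the function names are made up for the example, and the format of the -error file is not modelled.

```python
import math

BASES = "ACGT"

def make_error_matrix(rates):
    """Build matrix[true][observed] from off-diagonal rates.

    rates: {(true, observed): rate} for true != observed. The diagonal is
    filled in so that each row sums to 1.
    """
    matrix = {}
    for true in BASES:
        row = {obs: rates.get((true, obs), 0.0) for obs in BASES if obs != true}
        row[true] = 1.0 - sum(row.values())
        matrix[true] = row
    return matrix

def site_log_likelihood(bases, genotype, matrix):
    """log10 P(observed bases | diploid genotype) under the error matrix."""
    a1, a2 = genotype
    total = 0.0
    for base in bases:
        p = 0.5 * matrix[a1][base] + 0.5 * matrix[a2][base]
        if p <= 0.0:
            return float("-inf")
        total += math.log10(p)
    return total

# Hypothetical rates: every substitution at 0.001 except C->T at 0.01.
rates = {(t, o): 0.001 for t in BASES for o in BASES if t != o}
rates[("C", "T")] = 0.01
errors = make_error_matrix(rates)
print(site_log_likelihood("CCT", "CT", errors))
```

With these made-up rates, a stray T among C reads is penalised less than a stray A, which is the kind of asymmetry a single per-base error rate cannot express.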