ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Genotype Likelihoods

Revision as of 16:48, 19 September 2012 by Albrecht

Analysis from sequencing data

<classdiagram> // [input|bam files;SOAP files{bg:orange}]->[sequence data]

[sequence data]->[genotype likelihoods|samtools;GATK;soapSNP;kim et.al]
</classdiagram>

Genotype likelihoods from alignments

-GL [int]

If your input is sequence data files, you can estimate genotype likelihoods from the mapped reads. Four different methods are available.
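The examples on this page pass a file named bam.filelist to -bam. A minimal sketch of building one, assuming the list is a plain text file with one BAM path per line; the helper name make_filelist and the directory name bams are hypothetical:

```shell
# make_filelist DIR OUTFILE
# Write the path of every *.bam file directly inside DIR to OUTFILE,
# one per line, sorted so the sample order is stable between runs.
# Assumption: the file list is plain text with one BAM path per line.
make_filelist() {
    dir=$1
    out=$2
    find "$dir" -maxdepth 1 -type f -name '*.bam' | sort > "$out"
}

# Only build the list if the (hypothetical) bams directory is present.
if [ -d bams ]; then
    make_filelist bams bam.filelist
fi
```

Sorting keeps the order of individuals the same from run to run, which makes outputs from different -GL methods easier to compare.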

Samtools

-GL 1

This method has a random component. Samtools has a stochastic component, so to get exactly the same results as samtools use nThreads=1. The method is still the same with multiple threads, but some sites will differ slightly from the samtools output because of the stochastic component.

options

-minQ [int]

default 13. The minimum allowed base quality score.

-minMapQ [int]

default 0; The minimum allowed mapping quality score.

example

./angsd -bam bam.filelist -GL 1 -out outfile
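For a run that should be comparable with samtools output, the thread count can be pinned to one. The sketch below only prints the command line rather than running it. It assumes the thread option is passed as -nThreads (the text above writes nThreads=1); the helper name samtools_gl_cmd is hypothetical:

```shell
# samtools_gl_cmd FILELIST OUTPREFIX [MINQ] [MINMAPQ]
# Print a single-threaded angsd command for the samtools model (-GL 1).
# Assumption: the thread option is spelled -nThreads on the command line.
samtools_gl_cmd() {
    filelist=$1
    out=$2
    minq=${3:-13}      # default base quality cutoff listed above
    minmapq=${4:-0}    # default mapping quality cutoff listed above
    printf './angsd -bam %s -GL 1 -out %s -minQ %s -minMapQ %s -nThreads 1\n' \
        "$filelist" "$out" "$minq" "$minmapq"
}

# Print the command with the default filters.
samtools_gl_cmd bam.filelist outfile
```

Printing the command first makes it easy to inspect the filters before starting a long run.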

GATK

-GL 2

options

-minQ [int]

default 13. The minimum allowed base quality score.

-minMapQ [int]

default 0; The minimum allowed mapping quality score.

example

./angsd -bam bam.filelist -GL 2 -out outfile

soapSNP

-GL 3

When estimating GL with soapSNP we need to generate a calibration matrix. This is done automatically if the calibration files don't exist. They are located in angsd_tmpdir/basenameNUM.count and angsd_tmpdir/basenameNUM.qual

options

-minQ [int]

default 13. The minimum allowed base quality score.

-minMapQ [int]

default 0; The minimum allowed mapping quality score.

-maxq [int]

default 51; The maximum allowed base quality score.

-L [int]

default 150; The maximum read length (choosing one that is too large is not a problem)

example

./angsd -bam bam.filelist -GL 3 -out outfile -minQ 0 -ref hg19.fa 

This first run estimates nothing but the calibration matrix. Now we can do the analysis we want:

./angsd -bam bam.filelist -GL 3 -out outfile -minQ 0 -ref hg19.fa
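Whether a given run will be the calibration pass or the analysis pass depends on whether the calibration files already exist. A small sketch of that check, assuming angsd_tmpdir is created in the current working directory and that the files end in .count and .qual as described above; the helper name has_calibration is hypothetical:

```shell
# has_calibration DIR
# Succeed if DIR holds at least one .count file and at least one .qual file.
# Assumption: calibration files carry these two suffixes, as described above.
has_calibration() {
    dir=$1
    [ -d "$dir" ] || return 1
    # ls fails when the glob matches nothing, so both kinds must be present
    ls "$dir"/*.count >/dev/null 2>&1 && ls "$dir"/*.qual >/dev/null 2>&1
}

if has_calibration angsd_tmpdir; then
    echo "calibration matrix found: the next run performs the analysis"
else
    echo "no calibration matrix yet: the next run only builds it"
fi
```

A check like this can guard a pipeline step so that the analysis output is only read after the second pass.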

Kim et al.

-GL 4

options

-error [filename]

A file with the estimated type-specific error rates (see Error_estimation).

example

./angsd -bam bam.filelist -GL 4 -out outfile -error error.file