ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Revision as of 14:27, 21 September 2012
SNP Calling
Likelihood ratio test
- -doSNP 1
SNPs are called based on their allele frequencies: if the minor allele frequency (MAF) at a site is significantly different from 0, the site is called as polymorphic. The MAF estimate(s) given by -doMaf (see Allele_Frequency_estimation) are used in a likelihood ratio test, and the test statistic -2log(likelihood ratio) is reported in the output.
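The test can be sketched in Python as a rough illustration. This is a minimal sketch under stated assumptions and is not ANGSD's implementation. It assumes each individual contributes three genotype likelihoods, for 0, 1 or 2 copies of the minor allele, and that genotype priors follow Hardy-Weinberg proportions. It also assumes the frequency is estimated by EM, an idea suggested only by the knownEM column name. All function names are hypothetical. The statistic compares the null f = 0 with the estimate: -2log(L(0)/L(f̂)) = 2[log L(f̂) - log L(0)].

```python
import math
from typing import Sequence, Tuple

# Hypothetical input: per-individual genotype likelihoods
# (P(data | 0 minor copies), P(data | 1), P(data | 2)).
GL = Tuple[float, float, float]


def site_loglik(gls: Sequence[GL], f: float) -> float:
    """Log-likelihood of minor allele frequency f, assuming Hardy-Weinberg genotype priors."""
    priors = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
    return sum(math.log(sum(p * l for p, l in zip(priors, gl))) for gl in gls)


def em_maf(gls: Sequence[GL], f: float = 0.2, iters: int = 100, tol: float = 1e-10) -> float:
    """EM estimate of the minor allele frequency from genotype likelihoods."""
    for _ in range(iters):
        priors = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        expected_minor = 0.0
        for gl in gls:
            w = [p * l for p, l in zip(priors, gl)]
            # Posterior expected number of minor alleles carried by this individual.
            expected_minor += (w[1] + 2 * w[2]) / sum(w)
        new_f = expected_minor / (2 * len(gls))
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


def lrt_statistic(gls: Sequence[GL]) -> Tuple[float, float]:
    """Return (MAF estimate, -2 log LR) for H0: f = 0 versus H1: f = EM estimate."""
    f_hat = em_maf(gls)
    stat = 2 * (site_loglik(gls, f_hat) - site_loglik(gls, 0.0))
    return f_hat, stat
```

At a monomorphic site the estimate converges towards 0. The statistic then ends up at or very slightly below 0 because of floating-point and convergence error. That is one plausible reason for the tiny negative values in the example output, though this sketch cannot confirm it for ANGSD.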
options
- -minLrt [float]
Default 0. The minimum likelihood ratio statistic a site needs to be used in further analysis and printed in the output. For example, -minLrt 24 would only print sites with a high likelihood ratio statistic, corresponding to a p-value of approximately $10^{-6}$.
See Allele_Frequency_estimation for additional options.
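The correspondence between 24 and a p-value of about $10^{-6}$ can be checked under one assumption: the statistic is treated as chi-square distributed with 1 degree of freedom. For that distribution the upper tail is P(X > x) = erfc(sqrt(x/2)). The helper names below are hypothetical, and the bisection is just one simple way to invert the tail for choosing a -minLrt value.

```python
import math


def lrt_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square (1 df) statistic: P(X > stat) = erfc(sqrt(stat / 2))."""
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))


def lrt_threshold(pvalue: float, lo: float = 0.0, hi: float = 200.0) -> float:
    """Statistic whose chi-square (1 df) p-value equals `pvalue`, found by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if lrt_pvalue(mid) > pvalue:
            lo = mid  # p-value still too large, so the threshold must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


print(f"{lrt_pvalue(24):.2e}")       # p-value implied by -minLrt 24
print(f"{lrt_threshold(1e-6):.2f}")  # statistic that gives p = 1e-6
```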
example
In this example we analyse data from the BAM files listed in bam.filelist (-bam bam.filelist) and calculate genotype likelihoods using the SAMtools method (-GL 1). We then infer the major and minor alleles (-doMajorMinor 1), estimate allele frequencies assuming a known minor allele (-doMaf 2) and calculate the likelihood ratio statistic (-doSNP 1). Output files are named with the prefix given by -out outfile.
./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doSNP 1 -doMajorMinor 1
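If the run is scripted, the command can be assembled programmatically. The sketch below rests on two assumptions. First, the file given to -bam is a plain-text list with one BAM path per line. Second, the ANGSD binary is at ./angsd, as in the example. The helper names are hypothetical, and only the flags from the example command are used.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def write_bam_filelist(bam_paths: Sequence[str], filelist: str = "bam.filelist") -> Path:
    """Write one BAM path per line (assumed format of the file given to -bam)."""
    path = Path(filelist)
    path.write_text("".join(f"{p}\n" for p in bam_paths))
    return path


def angsd_snp_command(filelist: str = "bam.filelist", out: str = "outfile",
                      angsd: str = "./angsd") -> List[str]:
    """Build the argument list of the example command, using only the flags shown there."""
    return [angsd, "-bam", filelist, "-GL", "1", "-out", out,
            "-doMaf", "2", "-doSNP", "1", "-doMajorMinor", "1"]


# Print the shell-quoted command; nothing is executed or written here.
print(shlex.join(angsd_snp_command()))
```

To actually run it, you could call `write_bam_filelist(sorted(str(p) for p in Path("bams").glob("*.bam")))` followed by `subprocess.run(angsd_snp_command(), check=True)`. Here `bams` is a hypothetical directory of BAM files.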
output
The results are written to the file outfile.mafs:
chromo position major minor knownEM pK-EM nInd
1 14008260 C A 0.000001 -0.000074 32
1 14008261 A C 0.000000 -0.000051 31
1 14008262 G T 0.000001 -0.000099 32
1 14008263 C A 0.000001 -0.000093 32
1 14008264 A T 0.000010 -0.000574 32
1 14008265 T C 0.000001 -0.000144 31
1 14008266 G A 0.000000 -0.000049 31
1 14008267 G A 0.000001 -0.000100 31
1 14008268 G A 0.000001 -0.000081 30
1 14008269 G A 0.168397 191.962068 31
1 14008270 G A 0.000001 -0.000075 31
1 14008271 A C 0.000001 -0.000109 31
1 14008272 A C 0.000001 -0.000081 29
1 14008273 A C 0.000002 -0.000156 30
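The columns are:
- chromosome (chromo)
- position (position)
- major allele (major)
- minor allele (minor)
- estimated minor allele frequency (knownEM)
- likelihood ratio statistic (pK-EM)
- number of individuals with data (nInd)

In this example only one site, position 14008269, has a high likelihood ratio statistic (191.96) and a clearly non-zero minor allele frequency estimate (0.168). All other sites have statistics close to 0.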
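To pick out the called sites after a run, the .mafs table can be parsed and filtered on the pK-EM column. This is a minimal sketch that assumes a plain-text, whitespace-separated file with the header shown in the example. Other ANGSD versions may compress the file or name the columns differently. The function names are hypothetical, and the `>=` comparison is a choice made here, not necessarily the one -minLrt uses.

```python
from typing import Dict, List


def read_mafs(text: str) -> List[Dict[str, object]]:
    """Parse whitespace-separated .mafs content (header line first) into row dictionaries."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    parsed = []
    for fields in rows:
        row: Dict[str, object] = dict(zip(header, fields))
        row["position"] = int(row["position"])
        row["knownEM"] = float(row["knownEM"])
        row["pK-EM"] = float(row["pK-EM"])
        row["nInd"] = int(row["nInd"])
        parsed.append(row)
    return parsed


def filter_by_lrt(rows: List[Dict[str, object]], min_lrt: float = 24.0) -> List[Dict[str, object]]:
    """Keep sites whose likelihood ratio statistic (pK-EM column) is at least min_lrt."""
    return [r for r in rows if r["pK-EM"] >= min_lrt]


# Three rows taken from the example output above.
example = """\
chromo position major minor knownEM pK-EM nInd
1 14008268 G A 0.000001 -0.000081 30
1 14008269 G A 0.168397 191.962068 31
1 14008270 G A 0.000001 -0.000075 31
"""

for site in filter_by_lrt(read_mafs(example)):
    print(site["chromo"], site["position"], site["major"], site["minor"], site["knownEM"])
# prints: 1 14008269 G A 0.168397
```

For a real run, `read_mafs(Path("outfile.mafs").read_text())` (with `from pathlib import Path`) would read the file written by the example command.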