ANGSD: Analysis of next generation Sequencing Data

The latest tar.gz version is 0.938 (0.939 on GitHub); see Change_log for changes, and download it here.

Revision as of 15:27, 21 September 2012

SNP Calling

Likelihood ratio test

-doSNP 1

SNPs are called based on their allele frequencies: a site is called polymorphic if its minor allele frequency (MAF) is significantly different from 0. The MAF estimate(s) from -doMaf (see Allele_Frequency_estimation) are used in a likelihood ratio test that compares the estimated MAF with a MAF of 0, and the test statistic, -2 log(likelihood ratio), is reported in the output.
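A minimal Python sketch of such a test follows. It assumes that per-individual genotype likelihoods for the known major/minor pair are already available and that genotype frequencies follow Hardy-Weinberg proportions. The MAF is estimated with a simple EM algorithm and its likelihood is compared against the likelihood at MAF = 0. This is an illustration under those assumptions, not ANGSD's own implementation, and all function names are hypothetical.

```python
import math

# Hypothetical input format (assumption, not ANGSD's): one tuple per individual
# holding P(data | g) for g = 0, 1, 2 copies of the minor allele.
GenotypeLikelihoods = tuple[float, float, float]


def genotype_priors(p: float) -> tuple[float, float, float]:
    """Genotype frequencies for minor allele frequency p, assuming Hardy-Weinberg."""
    return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)


def site_log_likelihood(gls: list[GenotypeLikelihoods], p: float) -> float:
    """log L(p) = sum over individuals of log sum_g P(data | g) P(g | p).

    Requires P(data | g=0) > 0 for every individual so that log L(0) is finite.
    """
    priors = genotype_priors(p)
    return sum(math.log(sum(gl * pr for gl, pr in zip(ind, priors))) for ind in gls)


def em_maf(gls: list[GenotypeLikelihoods], p: float = 0.2,
           max_iter: int = 100, tol: float = 1e-10) -> float:
    """Estimate the minor allele frequency by EM, starting from p."""
    for _ in range(max_iter):
        priors = genotype_priors(p)
        expected_minor = 0.0
        for ind in gls:
            joint = [gl * pr for gl, pr in zip(ind, priors)]
            # Posterior expected number of minor alleles carried by this individual.
            expected_minor += (joint[1] + 2 * joint[2]) / sum(joint)
        new_p = expected_minor / (2 * len(gls))
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p


def lrt_statistic(gls: list[GenotypeLikelihoods]) -> tuple[float, float]:
    """Return (MAF estimate, -2 log(L(0) / L(MAF estimate)))."""
    p_hat = em_maf(gls)
    lrt = 2 * (site_log_likelihood(gls, p_hat) - site_log_likelihood(gls, 0.0))
    return p_hat, lrt


# Hypothetical site: 6 major homozygotes, 3 heterozygotes, 1 minor homozygote.
site = [(1.0, 1e-3, 1e-6)] * 6 + [(1e-3, 1.0, 1e-3)] * 3 + [(1e-6, 1e-3, 1.0)]
print(lrt_statistic(site))
```

With this sketch, a site where every individual looks homozygous for the major allele gives an EM estimate just above 0 and a statistic that is essentially 0, possibly very slightly negative, similar to the many small negative values in the example output.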

options

-minLrt [float]

Default 0. The minimum likelihood ratio statistic a site must have to be used in further analysis and reported in the output. For example, -minLrt 24 only prints sites with a high likelihood ratio statistic, corresponding to a p-value of approximately 10^-6.

See Allele_Frequency_estimation for additional options.
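To see how a threshold relates to a p-value, the sketch below assumes that, when the true MAF is 0, the statistic follows a chi-square distribution with one degree of freedom; its upper tail can then be computed with math.erfc from the Python standard library. lrt_to_pvalue is a hypothetical helper, not part of ANGSD.

```python
import math


def lrt_to_pvalue(lrt: float) -> float:
    """Upper tail P(X > lrt) for X ~ chi-square with 1 degree of freedom (assumed null).

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    Negative statistics are treated as 0, which gives p = 1.
    """
    return math.erfc(math.sqrt(max(lrt, 0.0) / 2.0))


for threshold in (3.84, 24.0):
    print(f"-minLrt {threshold}: p ~ {lrt_to_pvalue(threshold):.2e}")
```

Under this assumption a threshold of 24 gives a p-value of about 9.6e-7, i.e. approximately 10^-6, while 3.84 gives roughly 0.05.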

example

In this example we analyse the BAM files listed in bam.filelist (-bam bam.filelist), calculate genotype likelihoods using the samtools method (-GL 1), infer the major and minor alleles (-doMajorMinor 1), estimate the minor allele frequency assuming the minor allele is known (-doMaf 2), calculate the likelihood ratio statistic (-doSNP 1) and write the results to files prefixed with outfile (-out outfile).

./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doSNP 1 -doMajorMinor 1

output

The results are written to the file outfile.mafs:

chromo  position        major   minor   knownEM pK-EM   nInd
1       14008260        C       A       0.000001        -0.000074       32
1       14008261        A       C       0.000000        -0.000051       31
1       14008262        G       T       0.000001        -0.000099       32
1       14008263        C       A       0.000001        -0.000093       32
1       14008264        A       T       0.000010        -0.000574       32
1       14008265        T       C       0.000001        -0.000144       31
1       14008266        G       A       0.000000        -0.000049       31
1       14008267        G       A       0.000001        -0.000100       31
1       14008268        G       A       0.000001        -0.000081       30
1       14008269        G       A       0.168397        191.962068      31
1       14008270        G       A       0.000001        -0.000075       31
1       14008271        A       C       0.000001        -0.000109       31
1       14008272        A       C       0.000001        -0.000081       29
1       14008273        A       C       0.000002        -0.000156       30

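The columns are the chromosome (chromo), the position, the major allele, the minor allele, the minor allele frequency estimate (knownEM), the likelihood ratio statistic (pK-EM) and the number of individuals with data at the site (nInd). In this example only one site, position 14008269, has a high likelihood ratio statistic (191.96), with an estimated minor allele frequency of 0.168.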
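The .mafs table can also be filtered after the fact. The sketch below reads a whitespace-separated .mafs file and keeps the sites whose likelihood ratio statistic (pK-EM) is at least a chosen threshold. It assumes the header matches the example output above; read_mafs and significant_sites are hypothetical helper names, and whether ANGSD's own -minLrt filter uses "at least" or "greater than" is not stated here.

```python
import io


def read_mafs(handle):
    """Yield one dict per site from a whitespace-separated .mafs table with a header line."""
    header = handle.readline().split()
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            yield dict(zip(header, fields))


def significant_sites(handle, min_lrt: float = 24.0):
    """Return (chromo, position, MAF estimate, LRT) for sites with pK-EM >= min_lrt."""
    hits = []
    for row in read_mafs(handle):
        lrt = float(row["pK-EM"])  # likelihood ratio statistic column (name assumed from the example)
        if lrt >= min_lrt:
            hits.append((row["chromo"], int(row["position"]), float(row["knownEM"]), lrt))
    return hits


# Three rows copied from the example output above.
EXAMPLE = (
    "chromo\tposition\tmajor\tminor\tknownEM\tpK-EM\tnInd\n"
    "1\t14008268\tG\tA\t0.000001\t-0.000081\t30\n"
    "1\t14008269\tG\tA\t0.168397\t191.962068\t31\n"
    "1\t14008270\tG\tA\t0.000001\t-0.000075\t31\n"
)
print(significant_sites(io.StringIO(EXAMPLE)))
# -> [('1', 14008269, 0.168397, 191.962068)]
# On a real file: with open("outfile.mafs") as fh: hits = significant_sites(fh)
```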