ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

SNP calling: Difference between revisions

From angsd
No edit summary
No edit summary
 
(5 intermediate revisions by 2 users not shown)
Line 2: Line 2:


==Likelihood ratio test==
==Likelihood ratio test==
; -doSNP 1
SNPs are called based on their allele frequencies. If a site has a minor allele frequency significantly different from 0, the site is called as polymorphic. The MAF estimate(s) given by -doMaf (see [[Allele_Frequency_estimation]]) are used in a likelihood ratio test; for -doMaf 1 and -doMaf 2 the test statistic is compared to a chi-square distribution with one degree of freedom.
SNPs are called based on their allele frequencies. If a site has a minor allele frequency significantly different from 0, the site is called as polymorphic. The MAF estimate(s) given by -doMaf (see [[Allele_Frequency_estimation]]) are used in a likelihood ratio test, and -2log(likelihood ratio) is the output.


===options===
===options===
; -minLrt [float]
; -SNP_pval [float]
Default 0. The minimum likelihood ratio statistic used in further analysis and in the output. -minLrt 24 would print only sites with a high likelihood ratio statistic, corresponding to a p-value of approximately <math>10^{-6}</math>.
The p-value threshold used for calling SNPs.
see [[Allele_Frequency_estimation]] for additional options
see [[Allele_Frequency_estimation]] for additional options


===example===
===example===
In this example we analyse data from BAM files (-bam bam.filelist), calculate the genotype likelihoods using the samtools method (-GL 1), infer the major and minor alleles (-doMajorMinor 1), estimate the allele frequencies assuming a known major and an unknown minor allele (-doMaf 2), and calculate the likelihood ratio statistic (-doSNP 1).
In this example we analyse data from BAM files (-bam bam.filelist), calculate the genotype likelihoods using the GATK method (-GL 2), infer the major and minor alleles (-doMajorMinor 1), estimate the allele frequencies assuming a known major and an unknown minor allele (-doMaf 2), and keep only those sites with a p-value below 1e-6 for being variable (-SNP_pval 1e-6).


<pre>
<pre>
./angsd -bam bam.filelist -GL 1 -out outfile -doMaf 2 -doSNP 1 -doMajorMinor 1
./angsd -bam bam.filelist -GL 2 -out outfile -doMaf 2 -SNP_pval 1e-6 -doMajorMinor 1
</pre>
</pre>


===output===
===output===
the results are given in the file outfile.mafs:
the results are given in the file outfile.mafs.gz:
<pre>
<pre>
chromo  position        major  minor  knownEM pK-EM  nInd
chromo  position        major  minor  unknownEM      pu-EM  nInd
1      14008260       C       A      0.000001       -0.000074      32
1      14000873       G       A      0.282476       0.000000e+00    10
1      14008261       A       C      0.000000       -0.000051      31
1      14001018       T       C      0.259890       7.494005e-14    9
1      14008262       G       T       0.000001       -0.000099      32
1      14001867       A      G      0.272099       6.361578e-14    10
1      14008263       C       A       0.000001       -0.000093      32
1      14002422       A       T       0.377890       0.000000e+00    9
1      14008264       A       T      0.000010       -0.000574      32
1      14003581       C       T      0.194393       5.551115e-16    9
1      14008265       T      C      0.000001       -0.000144      31
1      14004623       T      C      0.259172       2.424727e-13    10
1      14008266       G       A       0.000000       -0.000049      31
1      14007493       A      G      0.297176       5.114086e-07    9
1      14008267       G       A       0.000001       -0.000100      31
1      14007558       C       T       0.381770       0.000000e+00    8
1      14008268       G      A      0.000001       -0.000081      30
1      14007649       G      A      0.220547       1.054967e-11    9
1      14008269       G       A      0.168397       191.962068      31
1      14008734       T       A      0.242852       0.000000e+00    10
1      14008270       G      A       0.000001       -0.000075      31
1      14009723       G      C       0.255063       2.470836e-07    10
1      14008271       A       C       0.000001       -0.000109      31
1      14010597       G      A      0.315430       0.000000e+00    10
1      14008272       A       C       0.000001       -0.000081      29
1      14010851       C      A      0.276936       0.000000e+00    10
1      14008273       A       C       0.000002       -0.000156      30
1      14012240       C       T       0.297956       0.000000e+00    10
</pre>
</pre>
The columns are the chromosome, the position, the major allele, the minor allele, the estimated minor allele frequency, the likelihood ratio statistic and the number of individuals with information. In this example one site has a high likelihood ratio statistic.
The columns are the chromosome, the position, the major allele, the minor allele, the estimated minor allele frequency, the p-value and the number of individuals with information.

Latest revision as of 10:59, 28 September 2021

SNP Calling

Likelihood ratio test

SNPs are called based on their allele frequencies. If a site has a minor allele frequency significantly different from 0, the site is called as polymorphic. The MAF estimate(s) given by -doMaf (see Allele_Frequency_estimation) are used in a likelihood ratio test; for -doMaf 1 and -doMaf 2 the test statistic is compared to a chi-square distribution with one degree of freedom.
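
The decision rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ANGSD's own implementation: the function names lrt_to_pvalue and is_snp are hypothetical, and the strict "less than" comparison against the threshold is an assumption. For one degree of freedom the chi-square upper tail has the closed form erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def lrt_to_pvalue(lrt: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 d.f.

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    Statistics at or below zero are treated as "no evidence" (p = 1);
    that handling is an assumption made for this sketch.
    """
    if lrt <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(lrt / 2.0))


def is_snp(lrt: float, snp_pval: float = 1e-6) -> bool:
    """Hypothetical helper: call a site polymorphic if its p-value is below the threshold."""
    return lrt_to_pvalue(lrt) < snp_pval


# Print the p-value and the resulting call for a few test statistics.
for stat in (0.0, 3.84, 24.0, 30.0):
    print(f"LRT = {stat:6.2f}  p = {lrt_to_pvalue(stat):.3e}  SNP: {is_snp(stat)}")
```

A statistic of about 24 corresponds to a p-value just under 1e-6, which is why the two ways of expressing the cutoff (a minimum likelihood ratio statistic or a maximum p-value) are interchangeable for this test.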

options

-SNP_pval [float]

The p-value threshold used for calling SNPs. See Allele_Frequency_estimation for additional options.

example

In this example we analyse data from BAM files (-bam bam.filelist), calculate the genotype likelihoods using the GATK method (-GL 2), infer the major and minor alleles (-doMajorMinor 1), estimate the allele frequencies assuming a known major and an unknown minor allele (-doMaf 2), and keep only those sites with a p-value below 1e-6 for being variable (-SNP_pval 1e-6).

./angsd -bam bam.filelist -GL 2 -out outfile -doMaf 2 -SNP_pval 1e-6 -doMajorMinor 1

output

The results are given in the file outfile.mafs.gz:

chromo  position        major   minor   unknownEM       pu-EM   nInd
1       14000873        G       A       0.282476        0.000000e+00    10
1       14001018        T       C       0.259890        7.494005e-14    9
1       14001867        A       G       0.272099        6.361578e-14    10
1       14002422        A       T       0.377890        0.000000e+00    9
1       14003581        C       T       0.194393        5.551115e-16    9
1       14004623        T       C       0.259172        2.424727e-13    10
1       14007493        A       G       0.297176        5.114086e-07    9
1       14007558        C       T       0.381770        0.000000e+00    8
1       14007649        G       A       0.220547        1.054967e-11    9
1       14008734        T       A       0.242852        0.000000e+00    10
1       14009723        G       C       0.255063        2.470836e-07    10
1       14010597        G       A       0.315430        0.000000e+00    10
1       14010851        C       A       0.276936        0.000000e+00    10
1       14012240        C       T       0.297956        0.000000e+00    10

The columns are the chromosome (chromo), the position, the major allele, the minor allele, the estimated minor allele frequency (unknownEM), the p-value of the likelihood ratio test (pu-EM) and the number of individuals with information (nInd).
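
For downstream work it can be convenient to load this table and apply a stricter cutoff afterwards. The following is a minimal sketch, assuming the file is a gzip-compressed, whitespace-delimited table whose header matches the one shown above. The helper names read_mafs, read_mafs_gz and sites_below are hypothetical and not part of ANGSD, and the embedded rows are copied from the example output.

```python
import gzip


def read_mafs(lines):
    """Yield one dict per site from the lines of a .mafs table.

    The first non-empty line is taken as the header; values stay strings.
    """
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields
            continue
        yield dict(zip(header, fields))


def read_mafs_gz(path):
    """Read a gzip-compressed .mafs file, e.g. outfile.mafs.gz (assumed layout)."""
    with gzip.open(path, "rt") as handle:
        return list(read_mafs(handle))


def sites_below(records, threshold, pval_column="pu-EM"):
    """Keep the records whose p-value column is strictly below the threshold."""
    return [r for r in records if float(r[pval_column]) < threshold]


# Three rows taken from the example output above.
example = """\
chromo  position  major  minor  unknownEM  pu-EM         nInd
1       14000873  G      A      0.282476   0.000000e+00  10
1       14007493  A      G      0.297176   5.114086e-07  9
1       14009723  G      C      0.255063   2.470836e-07  10
"""

records = list(read_mafs(example.splitlines()))
strict = sites_below(records, threshold=1e-8)
for site in strict:
    print(site["chromo"], site["position"], site["unknownEM"], site["pu-EM"])
```

Keeping the parser separate from the gzip handling lets the same function run on an in-memory example, as here, or on a real file via read_mafs_gz("outfile.mafs.gz").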