ANGSD: Analysis of next generation Sequencing Data


Latest revision as of 10:59, 28 September 2021

SNP Calling

Likelihood ratio test

SNPs are called based on their allele frequencies. A site is called polymorphic if its minor allele frequency (MAF) is significantly different from 0. The MAF estimate(s) given by -doMaf (see Allele_Frequency_estimation) are used in a likelihood ratio test; for -doMaf 1 and -doMaf 2 the test statistic is compared to a chi-square distribution with one degree of freedom.
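As a rough illustration of the test described above, the sketch below converts a likelihood ratio statistic into a p-value using a chi-square distribution with one degree of freedom, then compares it to a threshold. The function names `lrt_pvalue` and `is_snp` are hypothetical, and the statistic is assumed to be the conventional twice the log-likelihood difference between the estimated frequency and a frequency of 0; ANGSD's internal computation may differ in detail.

```python
import math

def lrt_pvalue(lrt):
    """P-value of a likelihood ratio statistic under chi-square with 1 df.

    For one degree of freedom the upper tail is P(X > x) = erfc(sqrt(x / 2)).
    Small negative values (numerical noise) are clamped to 0; this clamping
    is an assumption of the sketch, not documented ANGSD behaviour.
    """
    lrt = max(lrt, 0.0)
    return math.erfc(math.sqrt(lrt / 2.0))

def is_snp(lrt, snp_pval=1e-6):
    """Call a site polymorphic when its p-value is below the threshold."""
    return lrt_pvalue(lrt) < snp_pval

if __name__ == "__main__":
    for statistic in (0.0, 3.84, 10.0, 30.0):
        print(statistic, lrt_pvalue(statistic), is_snp(statistic))
```

Using `math.erfc` keeps the sketch free of third-party dependencies; with SciPy installed, `scipy.stats.chi2.sf(lrt, 1)` should give the same tail probability.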

options

-SNP_pval [float]

The p-value threshold used for calling SNPs. See Allele_Frequency_estimation for additional options.

example

In this example we analyse data from BAM files (-bam bam.filelist), calculate the genotype likelihoods using the GATK method (-GL 2), infer the major and minor alleles (-doMajorMinor 1), estimate the allele frequencies assuming a known major and unknown minor allele (-doMaf 2), and keep only those sites whose p-value for being variable is less than 1e-6 (-SNP_pval 1e-6).

./angsd -bam bam.filelist -GL 2 -out outfile -doMaf 2 -SNP_pval 1e-6 -doMajorMinor 1
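When the command is launched from a script, it can help to build the argument list with one annotated flag per line. The sketch below only assembles and prints the command shown above; the helper name `build_angsd_command` and its parameters are hypothetical, and the flag meanings in the comments are taken from the example description.

```python
import shlex

def build_angsd_command(bam_list, out_prefix, snp_pval="1e-6", angsd="./angsd"):
    """Return the SNP-calling command from the example as an argument list."""
    return [
        angsd,
        "-bam", bam_list,        # text file listing the BAM files
        "-GL", "2",              # genotype likelihoods, GATK method
        "-out", out_prefix,      # prefix for output files
        "-doMaf", "2",           # allele frequency estimation
        "-SNP_pval", snp_pval,   # p-value threshold for calling SNPs
        "-doMajorMinor", "1",    # infer major and minor alleles
    ]

cmd = build_angsd_command("bam.filelist", "outfile")
print(shlex.join(cmd))
# To actually run it (requires an ANGSD binary at the given path):
# import subprocess; subprocess.run(cmd, check=True)
```

The threshold is passed as a string so that it reaches the command line exactly as written rather than as a reformatted float.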

output

The results are given in the file outfile.mafs.gz:

chromo  position        major   minor   unknownEM       pu-EM   nInd
1       14000873        G       A       0.282476        0.000000e+00    10
1       14001018        T       C       0.259890        7.494005e-14    9
1       14001867        A       G       0.272099        6.361578e-14    10
1       14002422        A       T       0.377890        0.000000e+00    9
1       14003581        C       T       0.194393        5.551115e-16    9
1       14004623        T       C       0.259172        2.424727e-13    10
1       14007493        A       G       0.297176        5.114086e-07    9
1       14007558        C       T       0.381770        0.000000e+00    8
1       14007649        G       A       0.220547        1.054967e-11    9
1       14008734        T       A       0.242852        0.000000e+00    10
1       14009723        G       C       0.255063        2.470836e-07    10
1       14010597        G       A       0.315430        0.000000e+00    10
1       14010851        C       A       0.276936        0.000000e+00    10
1       14012240        C       T       0.297956        0.000000e+00    10

The columns are the chromosome, the position, the major allele, the minor allele, the estimated minor allele frequency (unknownEM), the p-value of the likelihood ratio test (pu-EM), and the number of individuals with information (nInd).
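For downstream filtering, the output can be read with a few lines of Python. This is a minimal sketch that assumes the whitespace-separated layout and the column names shown above (`unknownEM`, `pu-EM`, `nInd`); the function names `parse_mafs` and `read_mafs_gz` are hypothetical, and the column names may differ for other -doMaf settings.

```python
import gzip
import io

def parse_mafs(lines):
    """Yield one dict per site from the lines of a .mafs file."""
    lines = iter(lines)
    header = next(lines).split()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        row["position"] = int(row["position"])
        row["unknownEM"] = float(row["unknownEM"])
        row["pu-EM"] = float(row["pu-EM"])
        row["nInd"] = int(row["nInd"])
        yield row

def read_mafs_gz(path):
    """Read a gzip-compressed .mafs.gz file into a list of site dicts."""
    with gzip.open(path, "rt") as handle:
        return list(parse_mafs(handle))

# Three rows copied from the output above, used as an in-memory example.
sample = """chromo position major minor unknownEM pu-EM nInd
1 14000873 G A 0.282476 0.000000e+00 10
1 14007493 A G 0.297176 5.114086e-07 9
1 14009723 G C 0.255063 2.470836e-07 10
"""
sites = list(parse_mafs(io.StringIO(sample)))
# Apply a stricter threshold than the one used when running ANGSD.
strict = [s for s in sites if s["pu-EM"] < 1e-8]
print(len(sites), len(strict))
```

On real data, `read_mafs_gz("outfile.mafs.gz")` would replace the in-memory sample.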