ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Latest revision as of 09:59, 28 September 2021
SNP Calling
Likelihood ratio test
SNPs are called based on their allele frequencies: if the minor allele frequency (MAF) at a site is significantly different from 0, the site is called polymorphic. For -doMaf 1 and -doMaf 2, the MAF estimates given by -doMaf (see Allele_Frequency_estimation) are used in a likelihood ratio test, and the test statistic is compared to a chi-square distribution with one degree of freedom.
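The test for a single site can be sketched in Python. This is a minimal illustration, not ANGSD's code. It assumes known major and minor alleles (the -doMaf 1 setting), genotype likelihoods given on the natural (non-log) scale, Hardy-Weinberg genotype priors, and a simple EM for the frequency. The function names are hypothetical.

```python
import math

# Minimal sketch of a per-site likelihood ratio test for a non-zero minor
# allele frequency. Assumptions (not taken from ANGSD's source): major and
# minor alleles are known (as with -doMaf 1), genotype likelihoods are given
# on the natural scale, and genotypes follow Hardy-Weinberg proportions.

def hwe_priors(f):
    """Genotype probabilities for 0, 1 and 2 copies of the minor allele."""
    return ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2)

def site_log_likelihood(gls, f):
    """log L(f) = sum over individuals of log sum_g p(data | g) * p(g | f)."""
    priors = hwe_priors(f)
    return sum(math.log(sum(l * p for l, p in zip(gl, priors))) for gl in gls)

def estimate_maf(gls, f=0.1, n_iter=50):
    """EM estimate of the minor allele frequency (hypothetical helper)."""
    for _ in range(n_iter):
        priors = hwe_priors(f)
        expected_minor = 0.0
        for gl in gls:
            weights = [l * p for l, p in zip(gl, priors)]
            # posterior expected number of minor alleles for this individual
            expected_minor += (weights[1] + 2.0 * weights[2]) / sum(weights)
        f = expected_minor / (2.0 * len(gls))
    return f

def snp_lrt(gls):
    """Return (f_hat, LRT statistic, p-value) for H0: f = 0 vs H1: f = f_hat."""
    f_hat = estimate_maf(gls)
    lrt = 2.0 * (site_log_likelihood(gls, f_hat) - site_log_likelihood(gls, 0.0))
    lrt = max(lrt, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of a chi-square with one degree of freedom: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(lrt / 2.0))
    return f_hat, lrt, p_value
```

For example, with six individuals whose likelihoods favour the homozygous major genotype and four whose likelihoods favour the heterozygote, the estimate lands near 0.2 and the p-value falls far below 1e-6. A site with only homozygous major individuals gives an estimate near 0 and a p-value near 1.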
options
- -SNP_pval [float]
The p-value cutoff used for calling SNPs. See Allele_Frequency_estimation for additional options.
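Because the statistic is compared to a chi-square distribution with one degree of freedom, a -SNP_pval cutoff corresponds to a minimum value of the likelihood ratio statistic. The sketch below assumes exactly that chi-square(1) comparison and finds the threshold by bisection. The function names are hypothetical and are not part of ANGSD.

```python
import math

def chi2_1df_sf(x):
    """P(X > x) for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lrt_threshold(p_cutoff, lo=0.0, hi=200.0, n_iter=100):
    """Smallest LRT statistic whose chi-square(1) p-value is <= p_cutoff.

    Uses bisection, relying on the survival function decreasing in x.
    """
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if chi2_1df_sf(mid) > p_cutoff:
            lo = mid  # not yet significant: threshold lies above mid
        else:
            hi = mid  # significant: threshold lies at or below mid
    return hi
```

Under this assumption, -SNP_pval 1e-6 corresponds to a statistic of roughly 23.9, and a cutoff of 0.05 to the familiar 3.84.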
example
In this example we analyse data from the BAM files listed in bam.filelist (-bam bam.filelist), calculate genotype likelihoods using the GATK model (-GL 2), infer the major and minor alleles (-doMajorMinor 1), estimate allele frequencies assuming a known major allele and an unknown minor allele (-doMaf 2), and keep only sites with a p-value below 1e-6 for being variable (-SNP_pval 1e-6).
./angsd -bam bam.filelist -GL 2 -out outfile -doMaf 2 -SNP_pval 1e-6 -doMajorMinor 1
output
The results are written to the gzip-compressed file outfile.mafs.gz:
chromo  position  major  minor  unknownEM  pu-EM         nInd
1       14000873  G      A      0.282476   0.000000e+00  10
1       14001018  T      C      0.259890   7.494005e-14  9
1       14001867  A      G      0.272099   6.361578e-14  10
1       14002422  A      T      0.377890   0.000000e+00  9
1       14003581  C      T      0.194393   5.551115e-16  9
1       14004623  T      C      0.259172   2.424727e-13  10
1       14007493  A      G      0.297176   5.114086e-07  9
1       14007558  C      T      0.381770   0.000000e+00  8
1       14007649  G      A      0.220547   1.054967e-11  9
1       14008734  T      A      0.242852   0.000000e+00  10
1       14009723  G      C      0.255063   2.470836e-07  10
1       14010597  G      A      0.315430   0.000000e+00  10
1       14010851  C      A      0.276936   0.000000e+00  10
1       14012240  C      T      0.297956   0.000000e+00  10
The columns are the chromosome (chromo), the position, the major allele, the minor allele, the estimated minor allele frequency (unknownEM), the p-value for the site being variable (pu-EM), and the number of individuals with data (nInd).
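For downstream filtering, the .mafs.gz file can be read with the Python standard library. This sketch assumes a single whitespace-separated header line followed by one line per site, as in the sample output. Columns are named from that header, chromo, major and minor are kept as text, position and nInd are parsed as integers, and any other column is parsed as a float when possible. The helper names are hypothetical.

```python
import gzip

# Assumed column handling; other columns are parsed as floats if possible.
TEXT_COLUMNS = {"chromo", "major", "minor"}
INT_COLUMNS = {"position", "nInd"}

def parse_mafs(lines):
    """Parse .mafs lines (header first) into a list of dicts keyed by column name."""
    lines = iter(lines)
    header = next(lines).split()
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = {}
        for name, value in zip(header, fields):
            if name in TEXT_COLUMNS:
                row[name] = value
            elif name in INT_COLUMNS:
                row[name] = int(value)
            else:
                try:
                    row[name] = float(value)  # frequencies and p-values
                except ValueError:
                    row[name] = value  # any other non-numeric column
        rows.append(row)
    return rows

def read_mafs(path):
    """Read a gzip-compressed .mafs.gz file, e.g. outfile.mafs.gz."""
    with gzip.open(path, "rt") as handle:
        return parse_mafs(handle)
```

For example, `[s for s in read_mafs("outfile.mafs.gz") if s["unknownEM"] >= 0.25]` keeps the sites with an estimated minor allele frequency of at least 0.25.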