ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Filters: Difference between revisions
Revision as of 12:07, 24 September 2012
In most analyses you are interested in only a subset of the sites rather than all of them. The following filter options are currently available.
Selected Sites
Allele frequencies
- -minMaf [float]
- only work with sites with a MAF above 'float'
Polymorphic sites
- -minLRT [float]
- only work with sites with an LRT > float
Major minor
Number of non missing individuals
- -minKeepInd [int]
- only work with sites with information from at least int individuals
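As a rough illustration of what these cutoffs mean, the following Python sketch applies the -minMaf and -minKeepInd conditions after the fact to rows of a .mafs table. It is not how ANGSD implements the filters internally. The helper names parse_mafs and keep_site are hypothetical, the rows are copied from the example outputs on this page, and reading "above" as a strict inequality is an assumption.

```python
# Rows copied from the example outputs on this page.
# Columns: chromo, position, major, minor, knownEM (estimated MAF), nInd.
MAFS_TEXT = """\
chromo position major minor knownEM nInd
1 13999919 A C 0.000008 1
1 13999950 T G 0.495291 2
1 13999973 C A 0.000005 5
1 14000019 G T 0.047247 9
"""


def parse_mafs(text):
    """Parse whitespace-separated .mafs text into a list of dicts (hypothetical helper)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        rec["position"] = int(rec["position"])
        rec["knownEM"] = float(rec["knownEM"])
        rec["nInd"] = int(rec["nInd"])
        rows.append(rec)
    return rows


def keep_site(rec, min_maf=None, min_keep_ind=None):
    """Return True if the site passes every cutoff that is set (hypothetical helper)."""
    # Assumption: "a MAF above float" is a strict inequality.
    if min_maf is not None and not rec["knownEM"] > min_maf:
        return False
    # "information from at least int individuals"
    if min_keep_ind is not None and rec["nInd"] < min_keep_ind:
        return False
    return True


sites = parse_mafs(MAFS_TEXT)
maf_pass = [s["position"] for s in sites if keep_site(s, min_maf=0.01)]
ind_pass = [s["position"] for s in sites if keep_site(s, min_keep_ind=5)]
print("pass -minMaf 0.01:", maf_pass)
print("pass -minKeepInd 5:", ind_pass)
```

In practice it is preferable to pass the cutoffs to angsd directly, since the program then skips the filtered sites instead of writing them out first.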
First we do a run with no filters
./angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1:
...
head TSK.mafs
chromo  position  major  minor  knownEM   nInd
1       13999919  A      C      0.000008  1
1       13999920  G      A      0.000008  1
1       13999921  G      A      0.000008  1
1       13999922  C      A      0.000008  1
1       13999923  A      C      0.000008  1
1       13999924  G      A      0.000008  1
1       13999925  G      A      0.000008  1
1       13999926  A      C      0.000008  1
1       13999927  G      A      0.000008  1
Now we apply a MAF cutoff of 1%
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minMaf 0.01
head TSK.mafs
chromo  position  major  minor  knownEM   nInd
1       13999950  T      G      0.495291  2
1       14000019  G      T      0.047247  9
1       14000056  C      T      0.055851  10
1       14000127  G      T      0.060760  10
1       14000170  C      T      0.052388  9
1       14000176  G      A      0.047928  10
1       14000202  G      A      0.279722  9
1       14000262  C      T      0.058555  9
1       14000322  A      G      0.040471  8
Similarly, if we only want sites with information from at least 5 samples
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minKeepInd 5
head TSK.mafs
chromo  position  major  minor  knownEM   nInd
1       13999971  T      A      0.000007  6
1       13999972  G      A      0.000007  6
1       13999973  C      A      0.000005  5
1       13999974  G      A      0.000006  6
1       13999975  C      A      0.000002  5
1       13999976  C      A      0.000004  7
1       13999977  A      C      0.000005  8
1       13999978  C      A      0.000005  8
1       13999979  T      A      0.000005  8
If we are only interested in sites that are variable with a p-value of about 10^(-6), we filter on the LRT
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minLRT 24 -doSNP 1
head TSK.mafs
chromo  position  major  minor  knownEM   pK-EM       nInd
1       14000202  G      A      0.279722  42.623150   9
1       14000873  G      A      0.212120  79.118476   10
1       14001018  T      C      0.333736  89.040311   8
1       14001867  A      G      0.200232  47.195423   10
1       14002422  A      T      0.167692  43.196259   9
1       14003581  C      T      0.207404  58.593208   9
1       14004623  T      C      0.219838  102.856433  10
1       14007493  A      G      0.453217  28.398647   9
1       14007558  C      T      0.395670  80.236777   7
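The link between the LRT cutoff of 24 and a p-value of roughly 10^(-6) can be checked with a short Python sketch. It assumes that, under the null hypothesis of a non-variable site, the LRT statistic follows a chi-square distribution with 1 degree of freedom. This page does not state that assumption, and the function name lrt_to_pvalue is hypothetical.

```python
import math


def lrt_to_pvalue(lrt):
    """Upper-tail probability P(X > lrt) for X ~ chi-square with 1 degree of freedom.

    Assumption: the LRT statistic is chi-square(1) under the null.
    For 1 df the tail probability has the closed form erfc(sqrt(lrt / 2)).
    """
    return math.erfc(math.sqrt(lrt / 2.0))


p = lrt_to_pvalue(24.0)
print(f"LRT = 24 corresponds to p = {p:.2e}")
```

Under that assumption an LRT of 24 gives a p-value just under 10^(-6), and a stricter threshold would call for a larger -minLRT value.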
Deprecated options
These options should either be included (as is) or be discarded
- -minDepth
- -maxDepth