ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Sites
Version notice
The information on this page relates to version 0.542 or above.
Main
In most analyses you are interested in only a subset of sites rather than all sites. The following filter options are currently available.
Selected Sites
We support two kinds of input files for filtering.
- Either the user supplies a file containing chromosome and position
- Or the user supplies a file containing chromosome, position, major and minor
Only sites contained in the filter file will be output. If you supply an augmented filter file in order to force the major and minor states, remember to also supply '-doMajorMinor 3'.
A filter file is supplied to ANGSD with the command
-filter filename
Example of a filter file. The file must be tab separated.
chr1	100001
chr1	2500000
chr1	347348
Example of a file containing major and minor information. The file must be tab separated.
1	728951	T	C
1	752721	A	G
1	754182	A	G
1	754334	T	C
1	760912	C	T
1	776546	G	A
1	779322	G	A
1	838555	A	C
The major and minor states can also be encoded as 0, 1, 2, 3, 4, with 0=A, 1=C, 2=G, 3=T, 4=N.
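The two allele encodings can be converted into each other with a small lookup. The following is a minimal Python sketch, not part of ANGSD; the function names `to_letter` and `to_int` are hypothetical and only the 0=A, 1=C, 2=G, 3=T, 4=N mapping comes from this page.

```python
# Mapping stated on this page: 0=A, 1=C, 2=G, 3=T, 4=N.
INT_TO_BASE = "ACGTN"
BASE_TO_INT = {base: code for code, base in enumerate(INT_TO_BASE)}


def to_letter(allele):
    """Return the letter form of an allele given as a letter or as a digit 0-4."""
    allele = allele.strip().upper()
    if allele in BASE_TO_INT:
        return allele
    if len(allele) == 1 and allele in "01234":
        return INT_TO_BASE[int(allele)]
    raise ValueError(f"unrecognised allele code: {allele!r}")


def to_int(allele):
    """Return the integer form (0-4) of an allele given in either encoding."""
    return BASE_TO_INT[to_letter(allele)]


if __name__ == "__main__":
    # The row "1 728951 T C" from the example above, in integer coding.
    print(to_int("T"), to_int("C"))
```

Accepting both encodings in one helper is a convenience choice for this sketch; whether a single filter file may mix the two encodings is not stated here.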
We do not require the positions to be sorted, but we do require that the file be grouped by chromosome name.
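The format rules above (tab separated, two or four columns, grouped by chromosome but not necessarily sorted) can be checked before a run. This is a rough Python sketch and not ANGSD's own parser; `check_filter_file` is a hypothetical helper, and skipping blank lines and requiring an integer position are assumptions of the sketch.

```python
import io


def check_filter_file(handle):
    """Check a sites filter: tab separated, 2 or 4 columns, grouped by chromosome.

    Returns the number of sites read and raises ValueError on the first problem.
    """
    finished = set()  # chromosomes whose block of lines has already ended
    current = None    # chromosome of the block being read
    n_sites = 0
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) not in (2, 4):
            raise ValueError(f"line {line_no}: expected 2 or 4 tab separated columns")
        chrom, pos = fields[0], fields[1]
        if not pos.isdigit():
            raise ValueError(f"line {line_no}: position {pos!r} is not an integer")
        if chrom != current:
            if chrom in finished:
                raise ValueError(f"line {line_no}: {chrom!r} is not grouped")
            if current is not None:
                finished.add(current)
            current = chrom
        n_sites += 1
    return n_sites


if __name__ == "__main__":
    # Unsorted positions within one chromosome are accepted.
    example = io.StringIO("chr1\t100001\nchr1\t2500000\nchr1\t347348\n")
    print(check_filter_file(example))
```

A file in which chr1 reappears after a chr2 block would be rejected by this check, while positions in any order within a block are accepted.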
Details
If a filter file has been supplied as '-filter filter.txt', ANGSD will parse the entire filter.txt file, generate binary representations of it, and write these to output files called
- filter.txt.bin
- filter.txt.idx
Therefore, remember to delete old versions of these files if you have updated the filter.txt file.
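One way to avoid stale binary files is to remove them whenever they are older than the text filter. The sketch below is a Python helper outside of ANGSD; `purge_stale_filter_files` is a hypothetical name, and comparing modification times is an assumption about how to detect an updated filter.txt. Only the '.bin' and '.idx' suffixes are taken from this page.

```python
from pathlib import Path


def purge_stale_filter_files(filter_path):
    """Delete <filter>.bin and <filter>.idx when they are older than the filter file.

    Returns the names of the files that were removed.
    """
    filter_path = Path(filter_path)
    source_mtime = filter_path.stat().st_mtime
    removed = []
    for suffix in (".bin", ".idx"):
        derived = Path(str(filter_path) + suffix)
        # Heuristic: a derived file older than the text filter is treated as stale.
        if derived.exists() and derived.stat().st_mtime < source_mtime:
            derived.unlink()
            removed.append(derived.name)
    return removed


if __name__ == "__main__":
    if Path("filter.txt").exists():
        print(purge_stale_filter_files("filter.txt"))
```

If modification times are unreliable on your file system, deleting both derived files unconditionally before each run is the simpler choice.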
Allele frequencies
- -minMaf [float]
- only work with sites with a MAF above 'float'
Polymorphic sites
- -minLRT [float]
- only work with sites with an LRT>float
Number of non missing individuals
- -minInd [int]
- only work with sites with information from at least int individuals, requires -doCounts 1
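When ANGSD is driven from a script, the filter options listed above can be assembled programmatically. The following Python sketch only builds an argument list; `angsd_filter_args` is a hypothetical helper, and the only behaviour taken from this page is that -minInd is accompanied by -doCounts 1.

```python
def angsd_filter_args(min_maf=None, min_lrt=None, min_ind=None):
    """Build the site filter part of an ANGSD command line as a list of strings."""
    args = []
    if min_maf is not None:
        args += ["-minMaf", str(min_maf)]
    if min_lrt is not None:
        args += ["-minLRT", str(min_lrt)]
    if min_ind is not None:
        # -minInd requires -doCounts 1, as stated in the option list.
        args += ["-minInd", str(min_ind), "-doCounts", "1"]
    return args


if __name__ == "__main__":
    base = ["./angsd", "-doMaf", "2", "-doMajorMinor", "1", "-out", "TSK",
            "-bam", "bam.filelist", "-GL", "1"]
    # Print the full command instead of running it.
    print(" ".join(base + angsd_filter_args(min_maf=0.01, min_ind=5)))
```

The resulting list could be passed to `subprocess.run`; the sketch only prints it so that it can be inspected first.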
First we do a run with no filters
./angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1:
...
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999919	A	C	0.000008	1
1	13999920	G	A	0.000008	1
1	13999921	G	A	0.000008	1
1	13999922	C	A	0.000008	1
1	13999923	A	C	0.000008	1
1	13999924	G	A	0.000008	1
1	13999925	G	A	0.000008	1
1	13999926	A	C	0.000008	1
1	13999927	G	A	0.000008	1
Now we apply a MAF cutoff of 1%
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minMaf 0.01
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999950	T	G	0.495291	2
1	14000019	G	T	0.047247	9
1	14000056	C	T	0.055851	10
1	14000127	G	T	0.060760	10
1	14000170	C	T	0.052388	9
1	14000176	G	A	0.047928	10
1	14000202	G	A	0.279722	9
1	14000262	C	T	0.058555	9
1	14000322	A	G	0.040471	8
Similarly, if we only want sites with information for at least 5 samples
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minKeepInd 5
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999971	T	A	0.000007	6
1	13999972	G	A	0.000007	6
1	13999973	C	A	0.000005	5
1	13999974	G	A	0.000006	6
1	13999975	C	A	0.000002	5
1	13999976	C	A	0.000004	7
1	13999977	A	C	0.000005	8
1	13999978	C	A	0.000005	8
1	13999979	T	A	0.000005	8
If we are interested in all sites with a p-value below 10^(-6) of being variable
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minLRT 24 -doSNP 1
head TSK.mafs
chromo	position	major	minor	knownEM	pK-EM	nInd
1	14000202	G	A	0.279722	42.623150	9
1	14000873	G	A	0.212120	79.118476	10
1	14001018	T	C	0.333736	89.040311	8
1	14001867	A	G	0.200232	47.195423	10
1	14002422	A	T	0.167692	43.196259	9
1	14003581	C	T	0.207404	58.593208	9
1	14004623	T	C	0.219838	102.856433	10
1	14007493	A	G	0.453217	28.398647	9
1	14007558	C	T	0.395670	80.236777	7
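The relation between the -minLRT value of 24 and a p-value of about 10^(-6) can be checked numerically. The Python sketch below assumes that the LRT statistic follows a chi-square distribution with one degree of freedom, which is not stated on this page; the function names `lrt_pvalue` and `lrt_threshold` are hypothetical.

```python
import math


def lrt_pvalue(lrt):
    """Upper tail probability of a chi-square variable with 1 df exceeding lrt.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(lrt / 2.0))


def lrt_threshold(pvalue, lo=0.0, hi=1000.0):
    """Approximate the LRT value whose p-value equals pvalue, by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if lrt_pvalue(mid) > pvalue:
            lo = mid  # p-value still too large, so the threshold is higher
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(lrt_pvalue(24))       # p-value implied by -minLRT 24
    print(lrt_threshold(1e-6))  # LRT cutoff implied by a p-value of 1e-6
```

Under this assumption a cutoff of 24 gives a p-value slightly below 10^(-6), which is consistent with the example above.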
Deprecated options
These options should either be included (as is) or be discarded
- -minDepth
- -maxDepth