ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Sites
This page describes the -sites filtering that angsd allows. It lets the user supply a list of sites to which the analysis is restricted. If you are interested in regions, consider using the -r/-rf options, as described in Filters. The -sites option loops over all input data, whereas -r/-rf uses the indexing of the BAM files. The -sites and -r/-rf options can be used in combination.
Brief overview
./angsd -sites
-> angsd version: 0.569 build(Dec 11 2013 13:38:47)
-> Analysis helpbox/synopsis information:
--------------
analysisKeepList.cpp:
	-sites	(null)	(File containing sites to keep (chr tab pos))
	-minInd	0	Only use site if atleast minInd of samples has data
	You can force major/minor by -doMajorMinor 3
	And make sure file contains 4 columns (chr tab pos tab major tab minor)
Details
- -sites filename
File containing the sites to include in the analysis. A site that does not exist in the sequencing data will not be included in the output.
- -minInd [int]
Only keep those sites where we have data for at least this number of individuals.
We support two kinds of input files for filtering.
- Either the user supplies a file containing chromosome and position
- Or the user supplies a file containing chromosome, position, major and minor
Only sites contained in the filter file will be output. If you supply an augmented filter file to force the major and minor states, remember to also supply '-doMajorMinor 3'.
A filter file is supplied to ANGSD with the option
-sites filename
Example of a filter file. The file must be tab-separated.
chr1	100001
chr1	2500000
chr1	347348
Example of a file containing information on major and minor. The file must be tab-separated.
1	728951	T	C
1	752721	A	G
1	754182	A	G
1	754334	T	C
1	760912	C	T
1	776546	G	A
1	779322	G	A
1	838555	A	C
The major and minor states can also be encoded as 0, 1, 2, 3, 4, with 0=A, 1=C, 2=G, 3=T, 4=N.
We do not require the positions to be sorted, but we require that the file is grouped by chromosome name.
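The format rules above (tab-separated, two or four columns, optional numeric allele codes, grouped by chromosome) can be checked before a file is handed to ANGSD. The following is a minimal sketch and not part of ANGSD: the function names `parse_sites` and `is_grouped_by_chromosome` are hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behaviour.

```python
from itertools import groupby

# Numeric allele encoding described above: 0=A, 1=C, 2=G, 3=T, 4=N.
ALLELE_CODES = {"0": "A", "1": "C", "2": "G", "3": "T", "4": "N"}
VALID_ALLELES = set("ACGTN")


def parse_sites(lines):
    """Parse sites-file lines into tuples; raise ValueError on a bad line."""
    sites = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # assumption of this sketch: blank lines are skipped
        fields = line.split("\t")
        if len(fields) not in (2, 4):
            raise ValueError(f"line {number}: expected 2 or 4 tab-separated columns")
        chrom, pos = fields[0], int(fields[1])  # int() rejects non-numeric positions
        # Translate numeric codes to letters; letters pass through unchanged.
        alleles = tuple(ALLELE_CODES.get(a, a.upper()) for a in fields[2:])
        if any(a not in VALID_ALLELES for a in alleles):
            raise ValueError(f"line {number}: unknown allele in {fields[2:]}")
        sites.append((chrom, pos) + alleles)
    return sites


def is_grouped_by_chromosome(sites):
    """Return True if every chromosome name forms one contiguous run."""
    runs = [chrom for chrom, _ in groupby(sites, key=lambda s: s[0])]
    return len(runs) == len(set(runs))


if __name__ == "__main__":
    example = ["1\t728951\tT\tC\n", "1\t752721\t0\t2\n"]
    parsed = parse_sites(example)
    print(parsed, is_grouped_by_chromosome(parsed))
```

Positions are deliberately not checked for sort order, since only grouping by chromosome name is required.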
Internal representation
If a filter file has been supplied as '-sites filter.txt', then ANGSD will parse the entire filter.txt file, generate binary representations, and dump these in output files called
- filter.txt.bin
- filter.txt.idx
Therefore remember to purge old versions of these files, if you have updated the filter.txt file.
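One way to avoid stale binary files is to remove them whenever the text file is newer. The sketch below is a hypothetical helper, not something ANGSD provides; it assumes the cache files sit next to the filter file and carry the .bin and .idx suffixes listed above, and it relies on file modification times being trustworthy.

```python
import os


def purge_stale_cache(sites_path):
    """Delete <sites_path>.bin and <sites_path>.idx if older than the text file.

    Returns the list of paths that were removed.
    """
    removed = []
    source_mtime = os.path.getmtime(sites_path)
    for suffix in (".bin", ".idx"):
        cache = sites_path + suffix
        # Only remove a cache file that exists and predates the last edit.
        if os.path.exists(cache) and os.path.getmtime(cache) < source_mtime:
            os.remove(cache)
            removed.append(cache)
    return removed
```

Calling `purge_stale_cache("filter.txt")` before a run leaves up-to-date cache files alone and deletes only those that predate the last edit of filter.txt.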
Allele frequencies
- -minMaf [float]
- only work with sites with a MAF above 'float'
Polymorphic sites
- -minLRT [float]
- only work with sites with an LRT>float
Number of non missing individuals
- -minInd [int]
- only work with sites with information from at least int individuals; requires -doCounts 1
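To make the cutoffs concrete, the sketch below applies the same two conditions to an already written allele frequency table. It is an illustration only, not ANGSD's implementation: the function name `filter_mafs` is hypothetical, the table is assumed to be tab-separated with the knownEM and nInd columns, and treating knownEM as the value compared against the MAF cutoff is an assumption.

```python
import csv
import io


def filter_mafs(text, min_maf=0.0, min_ind=0):
    """Keep rows with knownEM above min_maf and nInd of at least min_ind."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        row
        for row in reader
        # Strict ">" for the frequency ("above"), ">=" for individuals ("at least").
        if float(row["knownEM"]) > min_maf and int(row["nInd"]) >= min_ind
    ]


if __name__ == "__main__":
    table = (
        "chromo\tposition\tmajor\tminor\tknownEM\tnInd\n"
        "1\t13999950\tT\tG\t0.495291\t2\n"
        "1\t14000019\tG\tT\t0.047247\t9\n"
        "1\t14000056\tC\tT\t0.055851\t10\n"
    )
    for row in filter_mafs(table, min_maf=0.05, min_ind=5):
        print(row["chromo"], row["position"], row["knownEM"], row["nInd"])
```

Filtering inside ANGSD is preferable in practice, because filtered sites are then skipped by the downstream analyses as well.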
First we do a run with no filters
./angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1:
...
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999919	A	C	0.000008	1
1	13999920	G	A	0.000008	1
1	13999921	G	A	0.000008	1
1	13999922	C	A	0.000008	1
1	13999923	A	C	0.000008	1
1	13999924	G	A	0.000008	1
1	13999925	G	A	0.000008	1
1	13999926	A	C	0.000008	1
1	13999927	G	A	0.000008	1
Now we filter with a MAF cutoff of 1%
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minMaf 0.01
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999950	T	G	0.495291	2
1	14000019	G	T	0.047247	9
1	14000056	C	T	0.055851	10
1	14000127	G	T	0.060760	10
1	14000170	C	T	0.052388	9
1	14000176	G	A	0.047928	10
1	14000202	G	A	0.279722	9
1	14000262	C	T	0.058555	9
1	14000322	A	G	0.040471	8
Similarly, if we only want sites with information for at least 5 samples
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minKeepInd 5
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999971	T	A	0.000007	6
1	13999972	G	A	0.000007	6
1	13999973	C	A	0.000005	5
1	13999974	G	A	0.000006	6
1	13999975	C	A	0.000002	5
1	13999976	C	A	0.000004	7
1	13999977	A	C	0.000005	8
1	13999978	C	A	0.000005	8
1	13999979	T	A	0.000005	8
If we are interested in all sites that are variable at a p-value cutoff of 10^(-6)
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minLRT 24 -doSNP 1
head TSK.mafs
chromo	position	major	minor	knownEM	pK-EM	nInd
1	14000202	G	A	0.279722	42.623150	9
1	14000873	G	A	0.212120	79.118476	10
1	14001018	T	C	0.333736	89.040311	8
1	14001867	A	G	0.200232	47.195423	10
1	14002422	A	T	0.167692	43.196259	9
1	14003581	C	T	0.207404	58.593208	9
1	14004623	T	C	0.219838	102.856433	10
1	14007493	A	G	0.453217	28.398647	9
1	14007558	C	T	0.395670	80.236777	7
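The link between the p-value of 10^(-6) and the cutoff of 24 can be sketched as follows. This assumes that the LRT statistic follows a chi-square distribution with one degree of freedom under the null hypothesis of no variation, which is not stated on this page; the function names `lrt_pvalue` and `lrt_cutoff` are hypothetical.

```python
import math


def lrt_pvalue(lrt):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lrt / 2.0))


def lrt_cutoff(p_value, lo=0.0, hi=1000.0):
    """Find the statistic whose upper-tail p-value equals p_value, by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if lrt_pvalue(mid) > p_value:
            lo = mid  # p-value still too large: the cutoff lies higher
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(lrt_pvalue(24.0))
    print(lrt_cutoff(1e-6))
```

Under that assumption the exact cutoff for 10^(-6) is roughly 23.9, so -minLRT 24 is the rounded, slightly stricter value.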
Deprecated options
These options should either be included (as is) or be discarded
- -minDepth
- -maxDepth