ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Sites
Revision as of 13:55, 11 December 2013
This page describes the -sites filter in ANGSD, which lets the user supply a list of sites to which the analysis is limited. If you are interested in regions, consider the -r/-rf options instead, as described in Filters. With -sites, ANGSD loops over all input data, whereas -r/-rf use the index of the BAM files. -sites and -r/-rf can be used in combination.
Brief overview
./angsd -sites
	-> angsd version: 0.569	 build(Dec 11 2013 13:38:47)
	-> Analysis helpbox/synopsis information:
--------------
analysisKeepList.cpp:
	-sites		(null)	(File containing sites to keep (chr tab pos))
	-minInd		0	Only use site if atleast minInd of samples has data
	You can force major/minor by -doMajorMinor 3
	And make sure file contains 4 columns (chr tab pos tab major tab minor)
Selected Sites
We support two kinds of input files for filtering.
- Either the user supplies a file containing chromosome and position
- Or the user supplies a file containing chromosome, position, major and minor
Only sites contained in the filter file will be output. If you supply an augmented filter file in order to force the major and minor state, remember to also supply '-doMajorMinor 3'.
A filter file is supplied to ANGSD with the option
-sites filename
Example of a filter file. The file must be tab separated.
chr1	100001
chr1	2500000
chr1	347348
Example of a file containing major and minor information. The file must be tab separated.
1	728951	T	C
1	752721	A	G
1	754182	A	G
1	754334	T	C
1	760912	C	T
1	776546	G	A
1	779322	G	A
1	838555	A	C
The major and minor state can also be encoded as 0, 1, 2, 3, 4, with 0=A, 1=C, 2=G, 3=T, 4=N.
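The numeric encoding can be illustrated with a small sketch. The helper names below (normalize_allele, normalize_line) are hypothetical and not part of ANGSD; the sketch only assumes the 0=A, 1=C, 2=G, 3=T, 4=N mapping stated above and a four-column, tab-separated line.

```python
# Hypothetical helpers for illustration only; they are not part of ANGSD.
# Mapping taken from the text: 0=A, 1=C, 2=G, 3=T, 4=N.
CODE_TO_BASE = {"0": "A", "1": "C", "2": "G", "3": "T", "4": "N"}


def normalize_allele(field):
    """Return the base letter for a major/minor field given as a letter or as 0-4."""
    field = field.strip()
    if field in CODE_TO_BASE:
        return CODE_TO_BASE[field]
    if field in CODE_TO_BASE.values():
        return field
    raise ValueError(f"unrecognised allele field: {field!r}")


def normalize_line(line):
    """Rewrite one 4-column line (chr, pos, major, minor) so both alleles are letters."""
    chrom, pos, major, minor = line.rstrip("\n").split("\t")
    return "\t".join([chrom, pos, normalize_allele(major), normalize_allele(minor)])
```

For example, normalize_line("1\t728951\t3\t1") yields the same line as the first row of the example file above, with T and C written as letters.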
We do not require the positions to be sorted, but we require that the file is grouped by chromosome name.
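The layout rules above (tab separation, two or four columns, grouping by chromosome name) can be checked before a run. The following is a minimal sketch, assuming the file has already been read into a list of lines; check_sites_lines is a hypothetical helper, not an ANGSD function, and it does not require positions to be sorted.

```python
def check_sites_lines(lines):
    """Return a list of problems found; an empty list means the layout looks fine."""
    problems = []
    finished = set()  # chromosomes whose block of lines has already ended
    current = None    # chromosome of the block being read
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) not in (2, 4):
            problems.append(
                f"line {n}: expected 2 or 4 tab-separated columns, got {len(fields)}"
            )
            continue
        if not fields[1].isdigit():
            problems.append(f"line {n}: position {fields[1]!r} is not an integer")
        chrom = fields[0]
        if chrom != current:
            # A chromosome that reappears after its block ended is not grouped.
            if chrom in finished:
                problems.append(
                    f"line {n}: chromosome {chrom!r} appears in more than one block"
                )
            if current is not None:
                finished.add(current)
            current = chrom
    return problems
```

The three-line example file above passes this check even though its positions are unsorted, because all lines share the chromosome name chr1.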
Details
If a filter file has been supplied as '-sites filter.txt', ANGSD will parse the entire filter.txt file, generate binary representations of it, and write these to the output files called
- filter.txt.bin
- filter.txt.idx
Therefore, remember to delete old versions of these files whenever you have updated the filter.txt file.
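One way to avoid forgetting this step is to compare modification times before a run. The sketch below assumes only the naming scheme given above (the filter file name plus .bin and .idx); stale_sidecars and purge_stale_sidecars are hypothetical helpers, not ANGSD tools.

```python
import os


def stale_sidecars(sites_path):
    """Return existing .bin/.idx files that are older than the filter file."""
    source_mtime = os.path.getmtime(sites_path)
    stale = []
    for suffix in (".bin", ".idx"):
        sidecar = sites_path + suffix
        if os.path.exists(sidecar) and os.path.getmtime(sidecar) < source_mtime:
            stale.append(sidecar)
    return stale


def purge_stale_sidecars(sites_path):
    """Delete the stale .bin/.idx files and return the paths that were removed."""
    removed = stale_sidecars(sites_path)
    for path in removed:
        os.remove(path)
    return removed
```

Calling purge_stale_sidecars("filter.txt") before starting ANGSD would remove filter.txt.bin and filter.txt.idx only if they predate the last change to filter.txt.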
Allele frequencies
- -minMaf [float]
- only work with sites with a MAF above 'float'
Polymorphic sites
- -minLRT [float]
- only work with sites with an LRT>float
Number of non-missing individuals
- -minInd [int]
- only work with sites with information from at least int individuals; requires -doCounts 1
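The three thresholds can be read as simple per-site conditions. The sketch below is a conceptual illustration of the descriptions above, not ANGSD's implementation; passes_filters is a hypothetical function, and the handling of values exactly at a threshold ("above" for MAF and LRT, "at least" for individuals) is an assumption taken from the wording of the list.

```python
def passes_filters(maf, lrt, n_ind, min_maf=None, min_lrt=None, min_ind=None):
    """Return True if a site satisfies every threshold that is set.

    maf   : estimated allele frequency for the site
    lrt   : likelihood ratio test statistic for the site being variable
    n_ind : number of individuals with data at the site
    A threshold left as None is not applied.
    """
    if min_maf is not None and not maf > min_maf:
        return False
    if min_lrt is not None and not lrt > min_lrt:
        return False
    if min_ind is not None and n_ind < min_ind:
        return False
    return True
```

With min_maf=0.01, a site with frequency 0.000008 is dropped while one with frequency 0.047247 is kept.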
First we do a run with no filters
./angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1:
...
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999919	A	C	0.000008	1
1	13999920	G	A	0.000008	1
1	13999921	G	A	0.000008	1
1	13999922	C	A	0.000008	1
1	13999923	A	C	0.000008	1
1	13999924	G	A	0.000008	1
1	13999925	G	A	0.000008	1
1	13999926	A	C	0.000008	1
1	13999927	G	A	0.000008	1
Now we apply a MAF cutoff of 1%
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minMaf 0.01
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999950	T	G	0.495291	2
1	14000019	G	T	0.047247	9
1	14000056	C	T	0.055851	10
1	14000127	G	T	0.060760	10
1	14000170	C	T	0.052388	9
1	14000176	G	A	0.047928	10
1	14000202	G	A	0.279722	9
1	14000262	C	T	0.058555	9
1	14000322	A	G	0.040471	8
Similarly, if we only want sites with information for at least 5 samples. Note that the command below spells the option -minKeepInd, as recorded in the original run, whereas the help text above lists it as -minInd.
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minKeepInd 5
head TSK.mafs
chromo	position	major	minor	knownEM	nInd
1	13999971	T	A	0.000007	6
1	13999972	G	A	0.000007	6
1	13999973	C	A	0.000005	5
1	13999974	G	A	0.000006	6
1	13999975	C	A	0.000002	5
1	13999976	C	A	0.000004	7
1	13999977	A	C	0.000005	8
1	13999978	C	A	0.000005	8
1	13999979	T	A	0.000005	8
If we are interested only in sites that are variable with a p-value below 10^(-6), we filter on the likelihood ratio test statistic with -minLRT 24
../angsd0.3/angsd -doMaf 2 -doMajorMinor 1 -out TSK -bam bam.filelist -GL 1 -r 1: -minLRT 24 -doSNP 1
head TSK.mafs
chromo	position	major	minor	knownEM	pK-EM	nInd
1	14000202	G	A	0.279722	42.623150	9
1	14000873	G	A	0.212120	79.118476	10
1	14001018	T	C	0.333736	89.040311	8
1	14001867	A	G	0.200232	47.195423	10
1	14002422	A	T	0.167692	43.196259	9
1	14003581	C	T	0.207404	58.593208	9
1	14004623	T	C	0.219838	102.856433	10
1	14007493	A	G	0.453217	28.398647	9
1	14007558	C	T	0.395670	80.236777	7
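The link between an LRT cutoff of 24 and a p-value of about 10^(-6) can be checked numerically. The sketch below assumes the LRT statistic is compared against a chi-square distribution with one degree of freedom; that assumption is not stated on this page, and lrt_to_pvalue is a hypothetical helper rather than an ANGSD function.

```python
import math


def lrt_to_pvalue(lrt):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 degree of freedom the statistic is the square of a standard normal
    variable, so P(X > lrt) = erfc(sqrt(lrt / 2)).
    """
    if lrt < 0:
        raise ValueError("the LRT statistic cannot be negative")
    return math.erfc(math.sqrt(lrt / 2.0))
```

Under that assumption, lrt_to_pvalue(24) is a little below 10^(-6), which matches the cutoff used in the command above.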
Deprecated options
These options should either be included (as is) or be discarded
- -minDepth
- -maxDepth