ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
SFS Estimation
Revision as of 17:17, 10 October 2012
This method estimates the site frequency spectrum (SFS). It is described in Nielsen2012.
This is a two-step procedure: first generate a ".sfs" file, then optimize the .sfs file to estimate the site frequency spectrum. For the optimization we have implemented 2 different approaches, both found in the misc subdir of the root subdir. This is shown in the diagram below.
NB: the ancestral state needs to be supplied for this method.
<classdiagram type="dir:LR">
[sequence data]->GL[genotype likelihoods|SAMtools;GATK;SOAPsnp;Kim et.al]
[genotype likelihoods|SAMtools;GATK;SOAPsnp;Kim et.al]->realSFS[.sfs file]
[.sfs file]->optimize[.sfs.ml file]
</classdiagram>
- -realSFS 1
- a .sfs file will be generated.
- -realSFS 2
- SNP calling (not implemented in this angsd)
- -realSFS 4
- genotype calling (not implemented in this angsd)
Options
- -underFlowProtect [INT]
- a very basic underflow protection
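To illustrate why underflow protection matters when many small likelihoods are combined, here is a minimal sketch of one common approach: rescaling log-likelihoods by their maximum before exponentiating. This is an assumption made for illustration only; it is not taken from the angsd source and may differ from what -underFlowProtect actually does. The function name normalize_log_likes and the toy values are hypothetical.

```python
import math


def normalize_log_likes(log_likes):
    """Convert log-likelihoods to proportions without underflow.

    Hypothetical helper: subtracting the largest value first keeps at least
    one term equal to exp(0) = 1, so the sum can never collapse to zero.
    """
    peak = max(log_likes)
    scaled = [math.exp(x - peak) for x in log_likes]
    total = sum(scaled)
    return [s / total for s in scaled]


# Toy log-likelihoods that are far too small to exponentiate directly.
log_likes = [-1200.0, -1201.0, -1205.0]

# Naive exponentiation underflows to zero for every category.
naive = [math.exp(x) for x in log_likes]

# The rescaled version keeps the relative weights of the categories.
probs = normalize_log_likes(log_likes)
```

With the naive version all information about the relative support of the categories is lost, while the rescaled version still ranks them correctly.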
Example
A full example is shown below. Here we use GATK genotype likelihoods and our reference hg19.fa, which is supplied as the ancestral state via -anc.
<pre>
#first generate .sfs file
./angsd -bam smallBam.filelist -realSFS 1 -out small -anc hg19.fa -GL 2
#now try the EM optimization with 4 threads
./emOptim.g++ -binput small.sfs -nChr 20 -maxIter 100 -nThread 4
#let's also try the optimization that uses derivatives (bfgs)
./optimSFS.gcc small.sfs -nChr 20 -nThreads 4
</pre>
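The idea behind the EM optimization step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the emOptim implementation: it assumes the per-site likelihoods are already available in memory as a list of rows, where row s holds P(data at site s | j derived alleles) for j = 0..nChr, and it does not read the binary .sfs file. The names em_sfs and site_likes and the toy data are hypothetical, and every site is assumed to have a nonzero likelihood for at least one category.

```python
import math


def em_sfs(site_likes, max_iter=100, tol=1e-8):
    """Estimate SFS proportions from per-site likelihoods by EM.

    site_likes[s][j] is assumed to be P(data at site s | j derived alleles).
    Returns a list of proportions, one per category, summing to 1.
    """
    n_cat = len(site_likes[0])
    sfs = [1.0 / n_cat] * n_cat  # start from a flat spectrum
    prev_loglike = -math.inf
    for _ in range(max_iter):
        expected = [0.0] * n_cat
        loglike = 0.0
        for likes in site_likes:
            # E-step: posterior probability of each category at this site.
            weighted = [p * l for p, l in zip(sfs, likes)]
            total = sum(weighted)
            loglike += math.log(total)
            for j, w in enumerate(weighted):
                expected[j] += w / total
        # M-step: the new spectrum is the average posterior over sites.
        sfs = [e / len(site_likes) for e in expected]
        # Stop once the log-likelihood no longer improves noticeably.
        if loglike - prev_loglike < tol:
            break
        prev_loglike = loglike
    return sfs


# Toy input: 10 sites, 2 chromosomes (3 categories).
site_likes = (
    [[0.9, 0.1, 0.0]] * 6
    + [[0.05, 0.9, 0.05]] * 3
    + [[0.0, 0.1, 0.9]] * 1
)
sfs = em_sfs(site_likes, max_iter=100)
```

Here max_iter plays the same role as -maxIter in the command above. The derivative-based (bfgs) optimizer maximizes the same likelihood with a different search strategy.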