ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
RealSFS: Difference between revisions
No edit summary |
|||
Line 3: | Line 3: | ||
</pre> | </pre> | ||
This program will estimate the SFS based on a .saf file generated from the '''./angsd [options] -doSaf '''. | This program will estimate the (multi) SFS based on a .saf file generated from the '''./angsd [options] -doSaf '''. | ||
See also [[SFS Estimation]] and [[2d SFS Estimation]]. | See also [[SFS Estimation]] and [[2d SFS Estimation]]. | ||
Line 11: | Line 10: | ||
<pre> | <pre> | ||
#1d sfs | #1d sfs | ||
./realSFS afile.saf | ./realSFS afile.saf.idx [-start FNAME -P nThreads -tole tole -maxIter -nSites ] | ||
#2dsfs | #2dsfs | ||
./realSFS pop1.saf pop2.saf | ./realSFS pop1.saf.idx pop2.saf.idx [-start FNAME -P nThreads -tole tole -maxIter -nSites] | ||
#2dsfs | |||
./realSFS pop1.saf.idx pop2.saf.idx pop3.saf.idx [-start FNAME -P nThreads -tole tole -maxIter -nSites] | |||
</pre> | </pre> | ||
The saf files are generated using [[SFS estimation| ./angsd -doSaf]]. | The saf files are generated using [[SFS estimation| ./angsd -doSaf]]. | ||
;-start is a file containing a log scaled estimate of the SFS that can be used as the start point for the EM optimisation. | ;-start is a file containing a log scaled estimate of the SFS that can be used as the start point for the EM optimisation. | ||
;-tole When the difference in successive likelihood values in the EM algorithm gets below this value the optimisation will stop | ;-tole When the difference in successive likelihood values in the EM algorithm gets below this value the optimisation will stop | ||
Line 23: | Line 22: | ||
;-nSites Limit the optimisation to a region of this size. If nothing is supplied the program will use the entire saf file | ;-nSites Limit the optimisation to a region of this size. If nothing is supplied the program will use the entire saf file | ||
;-maxIter maximum number of iterations in the EM algorithm | ;-maxIter maximum number of iterations in the EM algorithm | ||
Revision as of 13:28, 11 May 2015
* The program was called emOptim2 in earlier versions, this has now been changed to the more appropiate: 'realSFS'
This program will estimate the (multi) SFS based on a .saf file generated from the ./angsd [options] -doSaf .
See also SFS Estimation and 2d SFS Estimation.
Brief overview
#1d sfs ./realSFS afile.saf.idx [-start FNAME -P nThreads -tole tole -maxIter -nSites ] #2dsfs ./realSFS pop1.saf.idx pop2.saf.idx [-start FNAME -P nThreads -tole tole -maxIter -nSites] #2dsfs ./realSFS pop1.saf.idx pop2.saf.idx pop3.saf.idx [-start FNAME -P nThreads -tole tole -maxIter -nSites]
The saf files are generated using ./angsd -doSaf.
- -start is a file containing a log scaled estimate of the SFS that can be used as the start point for the EM optimisation.
- -tole When the difference in successive likelihood values in the EM algorithm gets below this value the optimisation will stop
- -P number of threads to allocate to program
- -nSites Limit the optimisation to a region of this size. If nothing is supplied the program will use the entire saf file
- -maxIter maximum number of iterations in the EM algorithm
Program defaults to use the EM algorithm for the optimisation. See examples below for using the bfgs optimisation.
1d SFS
realSFS sfstest.saf 20 -P 4 >sfs.em realSFS fstest.saf 20 -P 4 -use-BFGS 1 >sfs.bfgs
The realSFS program will read in a block of the genome (from the .saf) file, and for this region it will estimate the SFS.
The size of the block can be choosen using -nSites argument, otherwise it will try to read in the entire saf file.
If you have .saf file larger than -nSites (you can check the number of sites in the .saf.pos file), then the program will loop over the genome and output the results for each block. So each line in your Whit.saf.ml, is an SFS for a region.
2dsfs
realSFS 2dsfs pop1.saf pop2.saf nChr_pop1 nChr_pop2 [-start FNAME -P nThreads -tole tole -maxIter -nSites ]
nChr_pop1 and nChr_pop2 are the number of chromosomes in population1 and population2 respectively see 2d SFS Estimation for full example.
Output
Main results are printed to the stdout. Values are in logspace.
NB
Use as many sites as possible, for more reliable estimates.