ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
2d SFS Estimation
Latest revision as of 02:26, 17 August 2019
ANGSD can estimate a 2d site frequency spectrum (SFS). This is an extension of the 1d site frequency spectrum method (see the SFS Estimation page).
- Newer versions of ANGSD can estimate even higher dimensions (up to 4).
- As of 17 August 2019, the program can do a proper folding of the 2d SFS. This is done by supplying it with the UNFOLDED saf.idx files generated by -doSaf 1.
This is best explained by a full example.
Example
- Assume you have 12 BAM files for population 1, listed in the file pop1.list
- Assume you have 14 BAM files for population 2, listed in the file pop2.list
- Assume you have a FASTA file containing the ancestral state in the file anc.fa
Let's start by computing the site allele frequency likelihoods (-doSaf 1) at the positions for which we have data in population 1 and population 2:
<pre>
# as always you can add -minMapQ 1 and -minQ 20 to only keep high quality data.
angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1 -doSaf 1
angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2 -doSaf 1
</pre>
1 dimensional frequency spectra
If we were interested in estimating the 1d SFS for each population, we could do so with the realSFS program (see the SFS Estimation page for more). Passing both saf.idx files to realSFS at once gives the 2d SFS instead:
<pre>
#sfs for pop1
realSFS pop1.saf.idx -P 24 >pop1.saf.sfs
#sfs for pop2
realSFS pop2.saf.idx -P 24 >pop2.saf.sfs
#2d sfs for pop1 and pop2
realSFS pop1.saf.idx pop2.saf.idx -P 24 >2dsfs.sfs
</pre>
The output is written to the file 2dsfs.sfs as a flattened 25x29 matrix, that is (2*12+1) x (2*14+1) entries for the 12 and 14 diploid individuals. Visualising it takes some work: some people use dadi, and we have been using heat maps in R.
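To make the flattened 25x29 layout concrete, here is a minimal Python sketch that turns the contents of 2dsfs.sfs back into a matrix. The function name `reshape_2dsfs` is hypothetical, and the sketch assumes that the values are whitespace-separated and stored in row-major order with pop1 indexing the rows; check this against your own output before relying on it.

```python
def reshape_2dsfs(text, n_ind_pop1, n_ind_pop2):
    """Reshape the flattened values of a 2d SFS into a list of rows.

    Assumptions (not confirmed by the page above): values are separated
    by whitespace and stored row by row, with pop1 indexing the rows.
    """
    n_rows = 2 * n_ind_pop1 + 1  # 25 for 12 diploid individuals
    n_cols = 2 * n_ind_pop2 + 1  # 29 for 14 diploid individuals
    values = [float(token) for token in text.split()]
    if len(values) != n_rows * n_cols:
        raise ValueError(
            "expected %d values, found %d" % (n_rows * n_cols, len(values))
        )
    # Cut the flat list into n_rows consecutive chunks of n_cols values.
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
```

For the example above it could be called as `reshape_2dsfs(open("2dsfs.sfs").read(), 12, 14)`. The resulting nested list can then be handed to whatever plotting tool you prefer for a heat map.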
2d sfs (folded)
<pre>
#2d sfs for pop1 and pop2 doing proper folding
realSFS pop1.saf.idx pop2.saf.idx -P 24 -fold 1 >2dsfs.sfs
</pre>
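For readers unsure what folding a 2d SFS means, the following Python sketch illustrates one common convention: each entry (i, j) is merged with its mirror entry (n1-i, n2-j), keeping the side on which the pooled allele count i+j is the minor one. This is an illustration only. The function name `fold_2dsfs` is hypothetical, and whether realSFS -fold 1 lays out its output in exactly this way is an assumption that the page above does not confirm.

```python
def fold_2dsfs(sfs):
    """Fold an unfolded 2d SFS given as a list of rows.

    Illustrative convention only (an assumption, not necessarily what
    realSFS does): mass at (i, j) is moved to the mirror cell
    (n1 - i, n2 - j) when the pooled count i + j exceeds half of all
    chromosomes. Cells exactly at the midpoint are averaged with
    their mirror so that the total mass is preserved.
    """
    n1 = len(sfs) - 1     # number of chromosomes in pop1
    n2 = len(sfs[0]) - 1  # number of chromosomes in pop2
    half = (n1 + n2) / 2
    folded = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, row in enumerate(sfs):
        for j, value in enumerate(row):
            pooled = i + j
            if pooled < half:
                folded[i][j] += value
            elif pooled > half:
                folded[n1 - i][n2 - j] += value
            else:
                # Ambiguous minor allele: split the mass evenly.
                folded[i][j] += value / 2
                folded[n1 - i][n2 - j] += value / 2
    return folded
```

On a toy 3x3 spectrum (one diploid individual per population) this moves all mass from the lower-right corner into the upper-left one while leaving the total unchanged.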