ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

2d SFS Estimation: Difference between revisions

From angsd
Jump to navigation Jump to search
No edit summary
 
(5 intermediate revisions by the same user not shown)
Line 1: Line 1:
Angsd can estimate a 2d site frequency spectrum. This is an extension of the 1d site frequency spectrum  [[SFS Estimation|method]]. Never versions of ANGSD can estimate even higher dimensions (upto 4)
Angsd can estimate a 2d site frequency spectrum. This is an extension of the 1d site frequency spectrum  [[SFS Estimation|method]].
* Newer versions of ANGSD can estimate even higher dimensions (upto 4).
* From august17 2019 the program can now do a proper folding of the 2dsfs, which is done by supplying it with the UNFOLDED saf.idx fiels generated by -dosaf 1


Below are some examples:
And is best explained by a full example.
And is best explained by a full example.
=Example=
==Example==
* Assume you have a 12 bamfiles for population in the file '''pop1.list'''
* Assume you have a 12 bamfiles for population in the file '''pop1.list'''
* Assume you have a 14 bamfiles for population in the file '''pop2.list'''
* Assume you have a 14 bamfiles for population in the file '''pop2.list'''
Line 14: Line 17:
</pre>
</pre>


==1 dimensional frequency spectra==
If we were interested in estimating the 1d sfs for each population we could do it like this using the [[realSFS]] program. (See more on [[SFS Estimation |page]] )
If we were interested in estimating the 1d sfs for each population we could do it like this using the [[realSFS]] program. (See more on [[SFS Estimation |page]] )
<pre>
<pre>
Line 21: Line 25:
realSFS pop2.saf.idx -P 24 >pop2.saf.sfs
realSFS pop2.saf.idx -P 24 >pop2.saf.sfs
#2d sfs for pop1 and pop2
#2d sfs for pop1 and pop2
#first argument is saf file,  -P 24 is the number of cores we want to use
realSFS pop1.saf.idx pop2.saf.idx -P 24 >2dsfs.sfs
realSFS pop1.saf.idx pop2.saf.idx -P 24
</pre>
And we now estimate the joint site frequency spectra by using the [[realSFS]] program
The output is then located in a nice flattened matrix format(25x29) in the file: '''2dsfs.sfs'''. Good luck visualising it, some people are using dadi, we have been using heat maps in R.
 
==2d sfs (folded)==
<pre>
<pre>
The output is then located in a nice matrix format(25x29) in the file: '''2dsfs.sfs'''. Good luck visualising it, some people are using dadi, we have been using heat maps in R.
#2d sfs for pop1 and pop2 doing proper folding
realSFS pop1.saf.idx pop2.saf.idx -P 24 -fold 1 >2dsfs.sfs
</pre>

Latest revision as of 02:26, 17 August 2019

Angsd can estimate a 2d site frequency spectrum. This is an extension of the 1d site frequency spectrum method.

  • Newer versions of ANGSD can estimate even higher dimensions (upto 4).
  • From august17 2019 the program can now do a proper folding of the 2dsfs, which is done by supplying it with the UNFOLDED saf.idx fiels generated by -dosaf 1

Below are some examples: And is best explained by a full example.

Example

  • Assume you have a 12 bamfiles for population in the file pop1.list
  • Assume you have a 14 bamfiles for population in the file pop2.list
  • Assume you have a fastafile containing the ancestral state in the anc.fa

Let's start by finding the positions for which we have data in population1 and population2

# as always you can add -minMapQ 1 and -minQ 20 to only keep high quality data.
angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1 -doSaf 1
angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2 -doSaf 1

1 dimensional frequency spectra

If we were interested in estimating the 1d sfs for each population we could do it like this using the realSFS program. (See more on page )

#sfs for pop1
realSFS pop1.saf.idx -P 24 >pop1.saf.sfs
#sfs for pop2
realSFS pop2.saf.idx -P 24 >pop2.saf.sfs
#2d sfs for pop1 and pop2
realSFS pop1.saf.idx pop2.saf.idx -P 24 >2dsfs.sfs

The output is then located in a nice flattened matrix format(25x29) in the file: 2dsfs.sfs. Good luck visualising it, some people are using dadi, we have been using heat maps in R.

2d sfs (folded)

#2d sfs for pop1 and pop2 doing proper folding
realSFS pop1.saf.idx pop2.saf.idx -P 24 -fold 1 >2dsfs.sfs