ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

2d SFS Estimation

Revision as of 03:23, 17 August 2019

ANGSD can estimate a two-dimensional (2d) site frequency spectrum (SFS). This is an extension of the 1d site frequency spectrum method. Newer versions of ANGSD can estimate spectra of even higher dimension (up to 4). Since 17 August 2019 the program can also do proper folding; this is done by supplying it with the UNFOLDED saf.idx files generated by -doSaf 1.
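To illustrate what "extension of the 1d method" means, here is a toy Python sketch of the expectation-maximisation (EM) approach commonly used for likelihood-based SFS estimation, applied in two dimensions. It assumes that, for every site shared by the two populations, we already have the likelihood of each derived-allele count in each population, and that the two populations' data are independent given those counts. The function name em_2dsfs and the input layout are hypothetical; this is not realSFS's implementation, and it ignores folding.

```python
"""Toy EM estimate of a 2d SFS from per-site likelihoods (illustration only)."""
from typing import List

Matrix = List[List[float]]


def em_2dsfs(lik1: Matrix, lik2: Matrix, n_iter: int = 100) -> Matrix:
    """Estimate a normalised 2d SFS by expectation-maximisation.

    lik1[s][i] is the (hypothetical) likelihood of site s having derived-allele
    count i in population 1 (i = 0..2*n1); lik2[s][j] likewise for population 2.
    Only sites present in both populations should be passed in.
    """
    n_sites = len(lik1)
    d1, d2 = len(lik1[0]), len(lik2[0])
    # Start from a flat spectrum over all (i, j) cells.
    sfs = [[1.0 / (d1 * d2)] * d2 for _ in range(d1)]
    for _ in range(n_iter):
        new = [[0.0] * d2 for _ in range(d1)]
        for s in range(n_sites):
            # E-step: posterior over (i, j) for this site under the current SFS,
            # assuming the two populations are independent given (i, j).
            post = [[sfs[i][j] * lik1[s][i] * lik2[s][j] for j in range(d2)]
                    for i in range(d1)]
            total = sum(map(sum, post))
            for i in range(d1):
                for j in range(d2):
                    new[i][j] += post[i][j] / total
        # M-step: the updated SFS is the average posterior across sites.
        sfs = [[v / n_sites for v in row] for row in new]
    return sfs
```

With one diploid individual per population (3 categories each), two sites whose counts are known with certainty, (0, 1) and (2, 1), give an estimate of 0.5 in each of those two cells and 0 elsewhere.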

The procedure is best explained by a full example.

Example

  • Assume you have 12 BAM files (one per individual) for population 1, listed in the file pop1.list
  • Assume you have 14 BAM files (one per individual) for population 2, listed in the file pop2.list
  • Assume you have a FASTA file, anc.fa, containing the ancestral states

Let's start by computing, separately for population 1 and population 2, the site allele frequency likelihoods (.saf files) at the positions for which we have data:

# as always you can add -minMapQ 1 and -minQ 20 to only keep high quality data.
angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1 -doSaf 1
angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2 -doSaf 1
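If you prefer to script the two runs, here is a minimal Python sketch that assembles the same angsd command lines and optionally appends the quality filters mentioned in the comment above. The helper name angsd_saf_cmd and its defaults are hypothetical; it only emits angsd options already used on this page, and it prints the commands rather than executing them.

```python
import shlex
from typing import List, Optional


def angsd_saf_cmd(bam_list: str, out_prefix: str, anc: str = "anc.fa",
                  region: str = "chr1:", threads: int = 10,
                  min_mapq: Optional[int] = None,
                  min_q: Optional[int] = None) -> List[str]:
    """Build the angsd -doSaf 1 command shown above for one population."""
    cmd = ["angsd", "-GL", "1", "-b", bam_list, "-anc", anc,
           "-r", region, "-P", str(threads), "-out", out_prefix,
           "-doSaf", "1"]
    # Optional quality filters (mapping quality and base quality).
    if min_mapq is not None:
        cmd += ["-minMapQ", str(min_mapq)]
    if min_q is not None:
        cmd += ["-minQ", str(min_q)]
    return cmd


if __name__ == "__main__":
    for pop in ("pop1", "pop2"):
        cmd = angsd_saf_cmd(f"{pop}.list", pop, min_mapq=1, min_q=20)
        # Print a shell-ready line; pass cmd to subprocess.run to execute it.
        print(shlex.join(cmd))
```

Building the command as a list (instead of one string) avoids shell-quoting problems if a file name contains spaces.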

The realSFS program estimates the SFS from the .saf.idx files: given one file it estimates the 1d SFS for that population, and given both files it estimates the 2d SFS. (See more on page )

#sfs for pop1
realSFS pop1.saf.idx -P 24 >pop1.saf.sfs
#sfs for pop2
realSFS pop2.saf.idx -P 24 >pop2.saf.sfs
#2d sfs for pop1 and pop2
realSFS pop1.saf.idx pop2.saf.idx -P 24 >2dsfs.sfs

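The output is written to the file 2dsfs.sfs as a flattened 25x29 matrix: with diploid individuals there are 2·12+1 = 25 derived-allele-count categories (0 to 24) for population 1 and 2·14+1 = 29 (0 to 28) for population 2. Good luck visualising it: some people use dadi, and we have been using heat maps in R.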
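A minimal Python sketch for reading the flattened output back into a matrix, e.g. before plotting it as a heat map. It assumes diploid samples, that 2dsfs.sfs holds a single set of (2·n1+1)·(2·n2+1) whitespace-separated values, and that they are stored row by row with population 1 (the first .saf.idx given to realSFS) on the rows; the function name load_2dsfs is hypothetical.

```python
from pathlib import Path
from typing import List


def load_2dsfs(path: str, n1: int, n2: int) -> List[List[float]]:
    """Read a flattened 2d SFS into a (2*n1+1) x (2*n2+1) matrix.

    Assumes diploid individuals and row-major storage with population 1
    indexing the rows (an assumption, check against your own output).
    """
    values = [float(x) for x in Path(path).read_text().split()]
    rows, cols = 2 * n1 + 1, 2 * n2 + 1
    if len(values) != rows * cols:
        raise ValueError(f"expected {rows * cols} values, found {len(values)}")
    # Slice the flat list into consecutive rows of length cols.
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]
```

For this example, load_2dsfs("2dsfs.sfs", 12, 14) should return 25 rows of 29 values each.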