2d SFS Estimation: Difference between revisions

Latest revision as of 03:26, 17 August 2019

Angsd can estimate a 2d site frequency spectrum. This is an extension of the 1d site frequency spectrum method.

Newer versions of ANGSD can estimate even higher dimensions (upto 4).
From august17 2019 the program can now do a proper folding of the 2dsfs, which is done by supplying it with the UNFOLDED saf.idx fiels generated by -dosaf 1

Below are some examples: And is best explained by a full example.

Example

Assume you have a 12 bamfiles for population in the file pop1.list
Assume you have a 14 bamfiles for population in the file pop2.list
Assume you have a fastafile containing the ancestral state in the anc.fa

Let's start by finding the positions for which we have data in population1 and population2

# as always you can add -minMapQ 1 and -minQ 20 to only keep high quality data.
angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1 -doSaf 1
angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2 -doSaf 1

1 dimensional frequency spectra

If we were interested in estimating the 1d sfs for each population we could do it like this using the realSFS program. (See more on page )

#sfs for pop1
realSFS pop1.saf.idx -P 24 >pop1.saf.sfs
#sfs for pop2
realSFS pop2.saf.idx -P 24 >pop2.saf.sfs
#2d sfs for pop1 and pop2
realSFS pop1.saf.idx pop2.saf.idx -P 24 >2dsfs.sfs

The output is then located in a nice flattened matrix format(25x29) in the file: 2dsfs.sfs. Good luck visualising it, some people are using dadi, we have been using heat maps in R.

2d sfs (folded)

#2d sfs for pop1 and pop2 doing proper folding
realSFS pop1.saf.idx pop2.saf.idx -P 24 -fold 1 >2dsfs.sfs

@@ Line 1: / Line 1: @@
 Angsd can estimate a 2d site frequency spectrum. This is an extension of the 1d site frequency spectrum  [[SFS Estimation|method]].
+* Newer versions of ANGSD can estimate even higher dimensions (upto 4).
+* From august17 2019 the program can now do a proper folding of the 2dsfs, which is done by supplying it with the UNFOLDED saf.idx fiels generated by -dosaf 1
-The method works by calculating population specific sample allele frequencies. A minor annoyance in the current implementation is that you will need to limit the analysis to the sites that has coverage in both population. This in effect means that you will need to do two passes for each population.
+Below are some examples:
 And is best explained by a full example.
-=Example=
+==Example==
 * Assume you have a 12 bamfiles for population in the file '''pop1.list'''
 * Assume you have a 14 bamfiles for population in the file '''pop2.list'''
 * Assume you have a fastafile containing the ancestral state in the '''anc.fa'''
-* Assume we are only interested in '''chr1'''
 Let's start by finding the positions for which we have data in population1 and population2
 <pre>
 # as always you can add -minMapQ 1 and -minQ 20 to only keep high quality data.
-angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1
+angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1 -doSaf 1
-angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2
+angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2 -doSaf 1
-</pre>
-Each run will generate 2 files of interest: '''pop1.saf,pop1.saf.pos''' and '''pop2.saf,pop2.saf.pos'''
-If we were interested in estimating the 1d sfs for each population we could do it like this using the [[EmOptim2]] program. (See more on [[SFS Estimation |page]] )
-<pre>
-emOptim2 pop1.saf 24 -P 24 >pop1.saf.sfs
-emOptim2 pop2.saf 28 -P 24 >pop2.saf.sfs
-</pre>
-Now we find the positions that occurs both in population1 and population2 using the '''uniq''' POSIX program.
-<pre>
-cat pop1.saf.pos pop2.saf.pos|uniq -d >intersect.txt
 </pre>
-And now we redo the angsd sample allelefrequence calculation by conditioning on the sites that occur in both populations
+==1 dimensional frequency spectra==
+If we were interested in estimating the 1d sfs for each population we could do it like this using the [[realSFS]] program. (See more on [[SFS Estimation |page]] )
 <pre>
-# as always you can add -minMapQ 1 and -minQ 20 to only keep high quality data.
+#sfs for pop1
-angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1 -sites intersect.txt
+realSFS pop1.saf.idx -P 24 >pop1.saf.sfs
-angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2 -sites intersect.txt
+#sfs for pop2
+realSFS pop2.saf.idx -P 24 >pop2.saf.sfs
+#2d sfs for pop1 and pop2
+realSFS pop1.saf.idx pop2.saf.idx -P 24 >2dsfs.sfs
 </pre>
-Notice that the last 2 commands will overwrite the: '''pop1.saf,pop1.saf.pos''' and '''pop2.saf,pop2.saf.pos''' files.
+The output is then located in a nice flattened matrix format(25x29) in the file: '''2dsfs.sfs'''. Good luck visualising it, some people are using dadi, we have been using heat maps in R.
-And we now estimate the joint site frequency spectra by using the [[emOptim2]] program
+==2d sfs (folded)==
 <pre>
-emOptim2 pop1.saf pop2.saf  24 28 -P 24 >2dsfs.sfs
+#2d sfs for pop1 and pop2 doing proper folding
+realSFS pop1.saf.idx pop2.saf.idx -P 24 -fold 1 >2dsfs.sfs
 </pre>

2d SFS Estimation: Difference between revisions

Latest revision as of 03:26, 17 August 2019

Example

1 dimensional frequency spectra

2d sfs (folded)

Navigation menu