ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

2d SFS Estimation: Difference between revisions


Revision as of 22:37, 12 March 2014

ANGSD can estimate a two-dimensional (2d) site frequency spectrum (SFS). This is an extension of the 1d site frequency spectrum method.

The method works by calculating population-specific sample allele frequencies. A minor annoyance in the current implementation is that you need to limit the analysis to the sites that have coverage in both populations. In effect, this means you need to do two passes for each population: a first pass to find the sites with data, and a second pass restricted to the sites shared by both populations.

This is best explained by a full example.

Example

  • Assume you have 12 BAM files for population 1, listed in the file pop1.list
  • Assume you have 14 BAM files for population 2, listed in the file pop2.list
  • Assume you have a FASTA file containing the ancestral state, anc.fa
  • Assume we are only interested in chr1

Let's start by finding the positions for which we have data in population 1 and population 2:

angsd -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1
angsd -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2
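
Once both first passes have finished, the analysis has to be limited to the sites covered in both populations. The snippet below is only a sketch of that intersection step. It assumes the covered positions of each population have already been exported to plain-text files with one chromosome and position per line. The file names pop1.pos.txt and pop2.pos.txt, and this format, are hypothetical stand-ins and not ANGSD's actual output. The toy data is made up so that the example runs on its own.

```shell
# Toy position lists (chromosome<TAB>position). Hypothetical stand-ins for
# the per-population site lists; not real ANGSD output files.
printf 'chr1\t100\nchr1\t101\nchr1\t105\n' > pop1.pos.txt
printf 'chr1\t101\nchr1\t105\nchr1\t110\n' > pop2.pos.txt

# A site present in both files occurs twice in the concatenated stream, so
# after sorting, uniq -d keeps exactly the shared sites. This assumes that
# neither input file lists the same site more than once.
cat pop1.pos.txt pop2.pos.txt | sort | uniq -d | sort -k1,1 -k2,2n > intersect.txt

# Show the sites shared by both populations.
cat intersect.txt
```

With the toy data, only chr1 positions 101 and 105 appear in both lists, so those are the two sites kept in intersect.txt. The second pass for each population would then be restricted to these sites.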