Revision as of 22:47, 12 March 2014

Angsd can estimate a 2d site frequency spectrum. This is an extension of the 1d site frequency spectrum method.

The method works by calculating population-specific sample allele frequencies. A minor annoyance in the current implementation is that you need to limit the analysis to the sites that have coverage in both populations. In effect, this means that you need to do two passes for each population.

This is best explained by a full example.

Example

Assume you have 12 BAM files for population 1, listed in the file pop1.list
Assume you have 14 BAM files for population 2, listed in the file pop2.list
Assume you have a FASTA file containing the ancestral state, anc.fa
Assume we are only interested in chr1

Let's start by finding the positions for which we have data in population1 and population2

# as always you can add -minMapQ 1 and -minQ 20 to only keep high quality data.
angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1
angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2

Each run will generate two files of interest: pop1.saf and pop1.saf.pos for the first run, pop2.saf and pop2.saf.pos for the second.
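Restricting the analysis to sites covered in both populations amounts to intersecting the two position lists. The sketch below illustrates that intersection only. It assumes each position list has already been exported to a plain-text file with two tab-separated columns (chromosome, position); that layout and the file names pop1.pos.txt, pop2.pos.txt and intersect.txt are hypothetical and not taken from the angsd output format, which may differ.

```shell
# Tiny stand-in position lists so the sketch runs without real data.
# ASSUMPTION: two tab-separated columns (chromosome, position) per line.
demo=$(mktemp -d)
printf 'chr1\t100\nchr1\t101\nchr1\t105\n' > "$demo/pop1.pos.txt"
printf 'chr1\t101\nchr1\t103\nchr1\t105\n' > "$demo/pop2.pos.txt"

# First file: remember every (chromosome, position) pair.
# Second file: print only the lines whose pair was seen in the first file.
awk 'NR == FNR { seen[$1 FS $2] = 1; next } ($1 FS $2) in seen' \
    "$demo/pop1.pos.txt" "$demo/pop2.pos.txt" > "$demo/intersect.txt"

# Number of sites with data in both populations.
shared=$(wc -l < "$demo/intersect.txt" | tr -d ' ')
echo "sites shared by both populations: $shared"
```

The awk approach does not require the inputs to be sorted, at the cost of holding the first list in memory.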

If we were interested in estimating the 1d SFS for each population, we could do so with the emOptim2 program. (See more on page )

emOptim2 pop1.saf 24 -P 24 >pop1.saf.sfs
emOptim2 pop2.saf 28 -P 24 >pop2.saf.sfs
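The values 24 and 28 in the commands above match twice the number of BAM files in each list (12 and 14), which is consistent with the argument being the number of chromosomes for diploid samples. As a minimal sketch under that assumption, with one BAM file per individual, the count can be derived from the list file rather than typed by hand. The helper name nchr is hypothetical, and the list files here are small stand-ins.

```shell
# Hypothetical helper: count the non-empty lines of a BAM list and double it.
# ASSUMPTION: diploid samples, one BAM file per individual.
nchr() {
    # grep -c . counts the lines that contain at least one character
    echo $(( 2 * $(grep -c . "$1") ))
}

# Stand-in lists so the sketch runs without real data.
demo=$(mktemp -d)
for i in $(seq 1 12); do echo "ind$i.bam"; done > "$demo/pop1.list"
for i in $(seq 1 14); do echo "ind$i.bam"; done > "$demo/pop2.list"

n1=$(nchr "$demo/pop1.list")
n2=$(nchr "$demo/pop2.list")
echo "pop1: $n1 chromosomes, pop2: $n2 chromosomes"
```

The resulting values could then be passed in place of the literal 24 and 28.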

2d SFS Estimation: Difference between revisions

@@ Line 12: / Line 12: @@
 Let's start by finding the positions for which we have data in population1 and population2
 <pre>
-angsd -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1
+# as always you can add -minMapQ 1 and -minQ 20 to only keep high quality data.
-angsd -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2
+angsd -GL 1 -b pop1.list -anc anc.fa -r chr1: -P 10 -out pop1
+angsd -GL 1 -b pop2.list -anc anc.fa -r chr1: -P 10 -out pop2
 </pre>
 Each run will generate 2 files of interest: '''pop1.saf,pop1.saf.pos''' and '''pop2.saf,pop2.saf.pos'''