ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Fst: Difference between revisions
Line 85: | Line 85: | ||
;-type 0 Split out the genome into blocks. And use the first window that have data for the entire window. Then we will have the same windowcenters across datasets. | ;-type 0 Split out the genome into blocks. And use the first window that have data for the entire window. Then we will have the same windowcenters across datasets. | ||
==realSFS fst print== |
Revision as of 21:01, 19 January 2016
Our program can estimate fst between populations. And has been generalized to give all pairwise fst estimates if you supply the command with multiple populations.
- if you supply 3 populations, the program will also output the pbs statistic.
- NB we have removed the very unusefull unweighted fst estimator in the output, and have included a header. The output example below will be updated at some point.
The procedure is
- Use angsd for calculating saf files for each population
- Use realSFS to calculate 2d sfs for each pair
- Use the above calculated 2dsfs as priors jointly with all safs from step1 to calculate fst binary files
- Use realSFS to extract the the fst values from the fst
Two Populations real data
#this is with 2pops #first calculate per pop saf for each populatoin ../angsd -b list1 -anc hg19ancNoChr.fa -out pop1 -dosaf 1 -gl 1 ../angsd -b list2 -anc hg19ancNoChr.fa -out pop2 -dosaf 1 -gl 1 #calculate the 2dsfs prior ../misc/realSFS pop1.saf.idx pop2.saf.idx >pop1.pop2.ml #prepare the fst for easy window analysis etc ../misc/realSFS fst index pop1.saf.idx pop2.saf.idx -sfs pop1.pop2.ml -fstout here #get the global estimate ../misc/realSFS fst stats here.fst.idx -> FST.Unweight:0.069395 Fst.Weight:0.042349 #below is not tested that much, but seems to work ../misc/realSFS fst stats2 here.fst.idx -win 50000 -step 10000 >slidingwindow
3 Populations real data
In commands below im using 24 threads, because this is what I have. Adjust accordingly
#this is with 2pops #first calculate per pop saf for each populatoin ./angsd -b list10 -anc hg19ancNoChr.fa -out pop1 -dosaf 1 -gl 1 ./angsd -b list11 -anc hg19ancNoChr.fa -out pop2 -dosaf 1 -gl 1 ./angsd -b list12 -anc hg19ancNoChr.fa -out pop3 -dosaf 1 -gl 1 #calculate all pairwise 2dsfs's ./misc/realSFS pop1.saf.idx pop2.saf.idx -P 24 >pop1.pop2.ml ./misc/realSFS pop1.saf.idx pop3.saf.idx -P 24 >pop1.pop3.ml ./misc/realSFS pop2.saf.idx pop3.saf.idx -P 24 >pop2.pop3.ml #prepare the fst for easy analysis etc ./misc/realSFS fst index pop1.saf.idx pop2.saf.idx pop3.saf.idx -sfs pop1.pop2.ml -sfs pop1.pop3.ml -sfs pop2.pop3.ml -fstout here #get the global estimate -> Assuming idxname:here.fst.idx -> Assuming .fst.gz file: here.fst.gz -> FST.Unweight[nObs:1666245]:0.022063 Fst.Weight:0.034513 0.022063 0.034513 -> FST.Unweight[nObs:1666245]:0.026867 Fst.Weight:0.031989 0.026867 0.031989 -> FST.Unweight[nObs:1666245]:0.025324 Fst.Weight:0.021118 0.025324 0.021118 -> pbs.pop1 0.023145 -> pbs.pop2 0.005088 -> pbs.pop3 0.009367 #below is not tested that much, but seems to work ../misc/realSFS fst stats2 here.fst.idx -win 50000 -step 10000 >slidingwindow
In the presence of 3 populations, the program will also calculate the population branch statistics
Sliding Window output
The sliding window seems to work so we have documented it here:
- Second column is chromosome, third is center of window followed by
- fst.unweight(pop1,pop2) fst.weight(pop1,pop2) fst.unweight(pop1,pop3) fst.weight(pop1,pop3) fst.unweight(pop2,pop3) fst.weight(pop2,pop3)
(9133,58895)(14010000,14059999)(14010000,14060000) 1 14035000 0.022099 0.016387 0.026686 0.027731 0.025311 0.047920 -0.002231 0.035045 0.030353 (19114,68881)(14020000,14069999)(14020000,14070000) 1 14045000 0.022096 0.019076 0.026777 0.024238 0.025290 0.052793 -0.005220 0.041969 0.029757 (28951,78655)(14030000,14079999)(14030000,14080000) 1 14055000 0.022043 0.021025 0.026915 0.023368 0.025342 0.056975 -0.006884 0.046840 0.030530 (38928,88632)(14040000,14089999)(14040000,14090000) 1 14065000 0.022083 0.016525 0.026846 0.029560 0.025345 0.053421 -0.004116 0.039898 0.034122 (48917,98170)(14050000,14099999)(14050000,14100000) 1 14075000 0.022132 0.022082 0.026742 0.025564 0.025262 0.037071 0.005226 0.024827 0.020671 (74,49191)(14000000,14049999)(14000000,14050000) 10 14025000 0.022704 0.101955 0.026479 0.095713 0.025102 0.001924 0.103108 -0.048378 -0.002500 (9734,58555)(14010000,14059999)(14010000,14060000) 10 14035000 0.022779 0.102670 0.026425 0.088015 0.025118 0.002721 0.098870 -0.043342 -0.006738
The last 3 columns are the populations branch statistic for population1, popultion2 and population3
Relative window positions?
We allow for 3 different ways of defining window positions, these are chosen with the -type argument in realSFS
- -type 2 Use pos=1 as the leftmost position of first window. Even though there isn't any data.
- -type 1 Use first position with data, as leftmost position for the first window.
- -type 0 Split out the genome into blocks. And use the first window that have data for the entire window. Then we will have the same windowcenters across datasets.