ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Safv3
Revision as of 09:48, 7 May 2015
We decided to replace the original saf format, a simple binary dump of doubles, with a more structured format that allows for random access. The format is described in doc/formats.pdf.
This page documents the impact of the new format on downstream analysis.
One population analysis
#old master
angsd version: 0.801-27-ga699b44 (htslib: 1.2.1-62-g35746af) build(May  5 2015 03:38:17)
#new saf
angsd version: 0.801-54-gcf1a12d-dirty (htslib: 1.2.1-62-g35746af) build(May  6 2015 23:34:27)
##old
../master/angsd -anc hg19ancNoChr.fa -dosaf 1 -b /space/genomes/1000g/lowC2014/filelists/ceu.ricco.list -gl 1 -P 5 -out oldceu -rf rf
../master/misc/realSFS oldceu.saf 36 -nSites 213376207 -P 20 >oldceu.saf.ml
##new
../angsd/angsd -anc hg19ancNoChr.fa -dosaf 1 -b /space/genomes/1000g/lowC2014/filelists/ceu.ricco.list -gl 1 -P 5 -out newceu -rf rf
../angsd/misc/realSFS ../nsfs/newceu.saf.idx -P 16 -r 1 >ceu.chr1
##comparison (exp() is applied because the .ml files are read as log-scale values)
a<-exp(scan("newceu.saf.idx.chr1.ml"))  #new format
b<-exp(as.numeric(read.table("oldceu.saf.ml")[1,]))  #old format
a-b
 [1]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
 [6]  0.000000e+00  0.000000e+00 -1.248518e-10  4.059244e-10 -3.843052e-10
[11]  4.952888e-10 -2.465176e-10  7.169737e-11  0.000000e+00  0.000000e+00
[16]  0.000000e+00  0.000000e+00 -4.288667e-11  0.000000e+00  0.000000e+00
[21]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
[26]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
[31]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
[36]  0.000000e+00  0.000000e+00
range(a-b)
[1] -3.843052e-10  4.952888e-10
#drop entries 1 and 37 (the two fixed categories) before plotting
barplot(rbind(a,b)[,-c(1,37)],beside=TRUE,legend.text=c("new","old"),col=1:2)
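The same agreement check can be scripted outside R. The following is a minimal sketch, assuming each estimate is available as a single line of whitespace-separated log-scale values. The two toy vectors and the tolerance of 1e-8 are made up for illustration and are not the real estimates.

```shell
# Hypothetical log-scale estimates, one vector per variable (toy data).
new_ml='-0.1 -2.5 -3.0'
old_ml='-0.1 -2.5 -3.0000001'

# Exponentiate both vectors and report the largest absolute difference.
maxdiff=$(printf '%s\n%s\n' "$new_ml" "$old_ml" | awk '
  NR == 1 { for (i = 1; i <= NF; i++) a[i] = exp($i) }
  NR == 2 { for (i = 1; i <= NF; i++) {
              d = exp($i) - a[i]; if (d < 0) d = -d
              if (d > m) m = d
            }
            printf "%.3e\n", m }')
echo "max |new - old| = $maxdiff"

# Treat the two estimates as equivalent if the difference is below the
# chosen tolerance (1e-8 is an arbitrary illustrative threshold).
awk -v d="$maxdiff" 'BEGIN { exit !(d < 1e-8) }' && status=ok || status=differ
echo "$status"
```

For the real comparison above, the reported range of a-b (about -3.8e-10 to 5.0e-10) would pass such a threshold.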
Two population analysis
- The old version required a run for each population to find the intersect, followed by a second run for each population limited to the intersect.
- Here are all 4 commands (oldceu and oldyri are the first runs; oldceu2 and oldyri2 are the second runs, using -sites intersect.txt):

==> oldceu2.arg <==
../master/angsd -anc hg19ancNoChr.fa -dosaf 1 -b /space/genomes/1000g/lowC2014/filelists/ceu.ricco.list -gl 1 -P 5 -out oldceu2 -rf rf -sites intersect.txt
==> oldceu.arg <==
../angsd/angsd -anc hg19ancNoChr.fa -dosaf 1 -b /space/genomes/1000g/lowC2014/filelists/ceu.ricco.list -gl 1 -P 5 -out oldceu -rf rf
==> oldyri2.arg <==
../master/angsd -anc hg19ancNoChr.fa -dosaf 1 -b /space/genomes/1000g/lowC2014/filelists/yri.ricco.list -gl 1 -P 5 -out oldyri2 -rf rf -sites intersect.txt
==> oldyri.arg <==
../angsd/angsd -anc hg19ancNoChr.fa -dosaf 1 -b /space/genomes/1000g/lowC2014/filelists/yri.ricco.list -gl 1 -P 5 -out oldyri -rf rf
- The intersect was found like this:

gunzip -c oldceu.saf.pos.gz oldyri.saf.pos.gz|sort -S 50%|uniq -d|sort -k1,1 -S 50% >intersect.txt
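The sort and uniq -d idiom works because each position file lists a site at most once, so any line that appears twice in the merged, sorted stream must be present in both populations. Here is a minimal sketch on toy data, assuming tab-separated chromosome and position lines. The file names and positions are hypothetical, and LC_ALL=C is added only to make the sort order deterministic.

```shell
# Toy position lists (hypothetical data); each site occurs at most once per file.
tmp=$(mktemp -d)
printf '1\t100\n1\t200\n2\t50\n' | gzip -c > "$tmp/popA.pos.gz"
printf '1\t200\n2\t50\n2\t75\n'  | gzip -c > "$tmp/popB.pos.gz"

# gunzip -c concatenates both lists; after sorting, uniq -d keeps only
# lines seen more than once, i.e. sites present in both populations.
intersect=$(gunzip -c "$tmp/popA.pos.gz" "$tmp/popB.pos.gz" \
  | LC_ALL=C sort | uniq -d | LC_ALL=C sort -k1,1)
printf '%s\n' "$intersect"
rm -r "$tmp"
```

Only the two shared sites (1 200 and 2 50) survive; sites seen in a single population are dropped.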
- The old saf files are very large, so we had to limit the analysis to 180 million sites:

../master/misc/realSFS 2dsfs oldceu2.saf oldyri2.saf 36 52 -nSites 180000000 -P 20 >oldceu2.oldyri2.ml
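The two numeric arguments (36 and 52) appear to be the number of chromosomes per population, that is, twice the number of diploid individuals in each file list. This reading is an assumption not stated on this page, although it agrees with the 37 entries (36 + 1) in the one population comparison. A minimal sketch of deriving such a count from a file list, using a hypothetical three-sample list:

```shell
# Hypothetical file list with one BAM path per line (3 toy entries).
list=$(mktemp)
printf 'ind1.bam\nind2.bam\nind3.bam\n' > "$list"

# Assuming diploid samples: chromosomes = 2 x number of individuals.
nind=$(wc -l < "$list" | tr -d ' ')
nchr=$((2 * nind))
echo "$nchr"
rm "$list"
```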
- The new format is much simpler; here we simply ran:

==> newceu.arg <==
../angsd/angsd -anc hg19ancNoChr.fa -dosaf 1 -b /space/genomes/1000g/lowC2014/filelists/ceu.ricco.list -gl 1 -P 5 -out newceu -rf rf
==> newyri.arg <==
../angsd/angsd -anc hg19ancNoChr.fa -dosaf 1 -b /space/genomes/1000g/lowC2014/filelists/yri.ricco.list -gl 1 -P 5 -out newyri -rf rf