ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Contamination
Revision as of 11:49, 27 June 2014
ANGSD can estimate contamination, but only for chromosomes that exist in a single copy (e.g. chrX in males). The method requires a list of HapMap sites together with their frequencies, and we also recommend discarding regions with low mappability.
We have included mappability and HapMap files for chrX; these are found in the RES subfolder of the ANGSD source package. So if you are working with humans and your sample is male, you can estimate the contamination with the following two commands.
- First, generate a binary count file for chrX from a single BAM file (the ANGSD C program).
- Then run a Fisher's exact test to obtain a p-value, and a jackknife to obtain an estimate of contamination (the R program).
An example is shown below:
./angsd -i my.bam -r X: -doCounts 1 -iCounts 1 -minMapQ 30 -minQ 20
Rscript ../R/contamination.R mapFile="../RES/map100.chrX.bz2" hapFile="../RES/hapMapCeuXlift.map.bz2" countFile="angsdput.icnts.gz" mc.cores=24
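When the two commands above need to be repeated for several BAM files, it can help to assemble them programmatically. The following is a minimal sketch, not part of ANGSD: the function name `build_contamination_commands` and its parameters are hypothetical, and it only uses the flags and paths that appear in the example. It builds the argument lists and prints them; nothing is executed.

```python
import shlex


def build_contamination_commands(bam, count_file="angsdput.icnts.gz", cores=24):
    """Return the two example command lines as argv lists (hypothetical helper).

    The flags and relative paths are copied from the example above; the
    default count file name is the one used in that example.
    """
    # Step 1: binary count file for chrX from a single BAM file.
    angsd_cmd = [
        "./angsd", "-i", bam, "-r", "X:",
        "-doCounts", "1", "-iCounts", "1",
        "-minMapQ", "30", "-minQ", "20",
    ]
    # Step 2: Fisher test and jackknife in the R script.
    r_cmd = [
        "Rscript", "../R/contamination.R",
        "mapFile=../RES/map100.chrX.bz2",
        "hapFile=../RES/hapMapCeuXlift.map.bz2",
        f"countFile={count_file}",
        f"mc.cores={cores}",
    ]
    return angsd_cmd, r_cmd


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for cmd in build_contamination_commands("my.bam"):
        print(shlex.join(cmd))
```

The argument lists could be handed to `subprocess.run(cmd, check=True)` once the paths have been checked against the local installation.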
Rscript ../R/contamination.R mapFile="../RES/map100.chrX.bz2" hapFile="../RES/hapMapCeuXlift.map.bz2" countFile="/space/anders/ida/idaSjov/kostenkitest/contamination/out/V1countKostinki.USER.bam.X.gz" mc.cores=24
Loading required package: multicore
-----------------------
Doing Fisher exact test for Method1:
      [,1]   [,2]
[1,]   246    157
[2,] 17700 143407
Fisher's Exact Test for Count Data
data: mat
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 10.34000 15.61672
sample estimates:
odds ratio
   12.6959
-----------------------
Doing Fisher exact test for Method2:
     [,1]  [,2]
[1,]   91    55
[2,] 7355 59513
Fisher's Exact Test for Count Data
data: mat2
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  9.466476 19.085589
sample estimates:
odds ratio
  13.38675
----------------------
Running jackknife for Method1 (could be slow)
Running jackknife for Method2 (could be slow)
$est
              Method1     Method2
Contamination 0.03837625  0.03380983
llh           1034.078    483.5145
SE            0.002630455 0.003900376
$err
[1] 0.01370779
$c
[1] 0.001093589
$est
              Method1     Method2
Contamination 0.03837625  0.03380983
llh           1034.078    483.5145
SE            0.002630455 0.003900376
$err
[1] 0.01370779
$c
[1] 0.001093589
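To get a feel for the numbers in the output above, the following sketch recomputes two quantities by hand. It rests on assumptions that are not part of contamination.R: the cross-product (sample) odds ratio is used, which is close to but not identical to the conditional estimate that R's `fisher.test` prints, and the interval around the contamination estimate is a plain normal approximation (estimate ± 1.96 × SE) that the script itself does not report. The function names are hypothetical.

```python
def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio of the 2x2 table [[a, b], [c, d]].

    This is an approximation: fisher.test in R reports a conditional
    maximum-likelihood estimate, which differs slightly.
    """
    return (a * d) / (b * c)


def normal_interval(estimate, se, z=1.96):
    """Approximate 95% interval, assuming the estimate is roughly normal."""
    return estimate - z * se, estimate + z * se


if __name__ == "__main__":
    # 2x2 tables copied from the Fisher test output above.
    tables = {
        "Method1": (246, 157, 17700, 143407),
        "Method2": (91, 55, 7355, 59513),
    }
    # Contamination estimates and jackknife SEs copied from $est above.
    estimates = {
        "Method1": (0.03837625, 0.002630455),
        "Method2": (0.03380983, 0.003900376),
    }
    for method, table in tables.items():
        odds = sample_odds_ratio(*table)
        low, high = normal_interval(*estimates[method])
        print(f"{method}: odds ratio ~ {odds:.3f}, "
              f"contamination interval ~ ({low:.4f}, {high:.4f})")
```

Under these assumptions both methods place the contamination at roughly 3 to 4 percent, and the hand-computed odds ratios land within rounding distance of the reported 12.6959 and 13.38675.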