ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version: 0.938/0.939 on GitHub; see Change_log for changes, and download it here.
Plink
Revision as of 19:08, 5 March 2014
From version 0.549, ANGSD supports plink output files. Currently only the transposed .tfam/.tped format is supported, but we are working on a native .bed/.bim/.fam dumper.
This method is essentially a wrapper around the existing genotype caller, so all genotype-calling options can also be used for the plink-formatted output files.
See Genotype calling for the options that control genotype calling. To write the plink files, supply:
- -doPlink 2
Brief Overview
------------------------
writePlink.cpp:
    -doPlink    0
    1: binary fam/bim/bed format (still beta, not really working)
    2: tfam/tped format
NB This is a wrapper around -doGeno see more information for that option
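Until the native binary dumper works, one workaround is to convert the transposed files with PLINK itself. The sketch below assumes a PLINK 1.x executable named `plink` on your PATH; `tped_to_bed` is a hypothetical helper name of our own, not part of ANGSD or PLINK.

```shell
#!/usr/bin/env bash
# Sketch: convert ANGSD's transposed plink output (PREFIX.tped + PREFIX.tfam)
# into a binary PREFIX.bed/.bim/.fam fileset.
# Assumption: a PLINK 1.x executable called `plink` is on PATH.
set -euo pipefail

# tped_to_bed is a hypothetical helper name, not part of ANGSD or PLINK.
tped_to_bed() {
  local prefix="$1" ext
  # Stop early if either transposed file is missing or empty.
  for ext in tped tfam; do
    if [[ ! -s "${prefix}.${ext}" ]]; then
      echo "tped_to_bed: missing or empty ${prefix}.${ext}" >&2
      return 1
    fi
  done
  # --tfile reads PREFIX.tped and PREFIX.tfam; --make-bed writes the binary files.
  plink --tfile "$prefix" --make-bed --out "$prefix"
}

# Usage, after running angsd with -out outnames:
#   tped_to_bed outnames
```

The existence check is there so that a failed or interrupted ANGSD run gives a clear message instead of a PLINK error about missing input.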
Example
A full example commandline is given below:
./angsd -bam bam.filelist -out outnames -doPlink 2 -doGeno -4 -doPost 1 -doMajorMinor 1 -GL 1 -nThreads 20 -doMaf 2 -postCutoff 0.9999 -SNP_pval 1e-6
Note the minus sign in the -doGeno -4 argument; it suppresses the -doGeno output.
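To make the example easier to adapt, here is the same command written as a bash array with one option per line. The comments are our reading of ANGSD's help text and may differ between versions, so check the relevant help pages before relying on them. The script only runs `./angsd` if it exists in the current directory; otherwise it prints the command it would run.

```shell
#!/usr/bin/env bash
# The example command, one option per line. Option descriptions are our
# reading of ANGSD's help text; verify them against your ANGSD version.
set -euo pipefail

args=(
  -bam bam.filelist     # text file listing the input BAM files, one per line
  -out outnames         # prefix for all output files (outnames.tped, outnames.tfam, ...)
  -doPlink 2            # write plink .tfam/.tped output
  -doGeno -4            # genotype calling used by -doPlink; the minus suppresses the -doGeno output
  -doPost 1             # genotype posteriors using the allele frequency as prior
  -doMajorMinor 1       # infer major/minor alleles from the genotype likelihoods
  -GL 1                 # SAMtools genotype likelihood model
  -nThreads 20          # number of threads
  -doMaf 2              # allele frequency with fixed major, unknown minor
  -postCutoff 0.9999    # genotypes with a lower posterior are not called
  -SNP_pval 1e-6        # keep only sites with a SNP p-value below this threshold
)

if [[ -x ./angsd ]]; then
  ./angsd "${args[@]}"
else
  # No angsd binary here: print the command that would be run.
  printf './angsd'
  printf ' %s' "${args[@]}"
  printf '\n'
fi
```

Keeping the options in an array lets you comment each one and change a single value (for example -postCutoff) without editing a long command line.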
Output files
The command above generates a .tfam file and a .tped file:
output.tfam
1 1 0 0 0 -9
2 1 0 0 0 -9
3 1 0 0 0 -9
4 1 0 0 0 -9
5 1 0 0 0 -9
6 1 0 0 0 -9
7 1 0 0 0 -9
8 1 0 0 0 -9
9 1 0 0 0 -9
10 1 0 0 0 -9
11 1 0 0 0 -9
12 1 0 0 0 -9
13 1 0 0 0 -9
14 1 0 0 0 -9
15 1 0 0 0 -9
[capped]
output.tped
1 1_14000202 0 14000202 G G G G G G G A G G G G G A G A G A G G G G G A G G G G G A G A G G G G G G G A G G G G G A G G G G G G G G G G G A G G G A G A G G
1 1_14000873 0 14000873 G G G G G G A A G A G G G G G G G A G G G A G G G A G G G G A A G G G A G G G A G A G G G A G G G G A A A A G G G A G G G A G G G G
1 1_14001018 0 14001018 T T T T T T C C T C T T T T T T T C T T T C T T T T T T T T C C T T T C T T T T T C T T T T T C T T T C T C T T T C T T T C T T T T
1 1_14001867 0 14001867 A A A A A A A G A G A A A A A A A G A A A G A A A G A A A A G G A A A G A A A G A G A A A G A G A A A G G G A A A G A A A G A A A A
Note that the family ID is simply an incrementing integer, and that the SNP ID is built from the chromosome and the genomic position (chromosome_position).
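A quick way to sanity-check the output is to confirm that every .tped row has four leading columns (chromosome, SNP ID, genetic distance, position) plus two allele columns per .tfam individual, and to tally the alleles at each site. The sketch below does this with awk. `summarise_tped` is a hypothetical helper, the demo files are toy data rather than real ANGSD output, and the SNP ID is split at its last underscore on the assumption that it follows the chromosome_position layout shown above.

```shell
#!/usr/bin/env bash
# Sketch: summarise a -doPlink 2 output pair (PREFIX.tped + PREFIX.tfam).
# .tped columns: chromosome, SNP ID, genetic distance, position, then two
# allele columns per individual, in the same order as the .tfam rows.
set -euo pipefail

summarise_tped() {
  local prefix="$1" n_ind
  # One .tfam row per individual.
  n_ind=$(awk 'END { print NR }' "${prefix}.tfam")
  awk -v n="$n_ind" '
    # Every row needs 4 leading columns plus 2 allele columns per individual.
    NF != 4 + 2 * n {
      printf "%s: expected %d columns, found %d\n", $2, 4 + 2 * n, NF > "/dev/stderr"
      bad = 1
      next
    }
    {
      chr = $2; sub(/_[^_]*$/, "", chr)   # text before the last underscore
      pos = $2; sub(/.*_/, "", pos)       # text after the last underscore
      split("", count)                    # portable way to empty the array
      for (i = 5; i <= NF; i++) count[$i]++
      out = $2 "\tchr=" chr "\tpos=" pos
      for (k = 1; k <= 4; k++) {          # fixed A, C, G, T order
        b = substr("ACGT", k, 1)
        out = out "\t" b "=" ((b in count) ? count[b] : 0)
      }
      print out
    }
    END { exit bad }
  ' "${prefix}.tped"
}

# Toy demo with 3 hypothetical individuals (not real ANGSD output).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '1 1 0 0 0 -9\n2 1 0 0 0 -9\n3 1 0 0 0 -9\n' > "$tmp/toy.tfam"
printf '1 1_14000202 0 14000202 G G G A G G\n1 1_14001018 0 14001018 T T C C T C\n' > "$tmp/toy.tped"
summarise_tped "$tmp/toy"
```

On real output, run `summarise_tped outnames` after the ANGSD command. Alleles other than A, C, G and T (for instance a missing-genotype code) are not reported by this sketch, and a row with the wrong number of columns makes the function print a message and return a non-zero status.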
NB
We strongly recommend that users do not perform analyses on called genotypes, since genotype calling is likely to bias downstream analyses.