ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Beagle input: Difference between revisions
Revision as of 13:22, 10 October 2012
Beagle haplotype imputation can be performed directly on genotype likelihoods. To generate a Beagle input file, use
- -doGlf 2
To make this file, the major and minor alleles have to be inferred (-doMajorMinor). It is also a good idea to use only the polymorphic sites; see Filters.
Example
In this example our input files are BAM files. We use the samtools genotype likelihood method and 10 threads. We infer the major and minor alleles from the likelihoods and estimate the allele frequencies. We test for polymorphic sites and output only those with a likelihood ratio test statistic of at least 24 (ca. p-value < 1e-6).
./angsd -GL 1 -out genolike -nThreads 10 -doGlf 2 -doMajorMinor 1 -minLRT 24 -doMaf 2 -doSNP 2 -bam bam.filelist
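The correspondence between the threshold of 24 and a p-value of about 1e-6 can be checked with a few lines of Python. This is only a sketch: it assumes the likelihood ratio test statistic follows a chi-square distribution with one degree of freedom, and the function name `lrt_to_pvalue` is made up for illustration and is not part of ANGSD.

```python
import math


def lrt_to_pvalue(lrt: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Assumption: the LRT statistic is chi-square distributed with 1 df.
    For 1 df the tail probability has the closed form erfc(sqrt(x / 2)).
    """
    if lrt < 0:
        raise ValueError("the LRT statistic must be non-negative")
    return math.erfc(math.sqrt(lrt / 2.0))


# Threshold used in the command above (-minLRT 24).
p = lrt_to_pvalue(24.0)
print(f"p-value for LRT = 24: {p:.2e}")
```

Under that assumption the printed value falls just below 1e-6, in line with the "ca." in the text.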
Output
The above command generates the file genolike.beagle.gz, which can be used as input for the Beagle software.
marker allele1 allele2 Ind0 Ind0 Ind0 Ind1 Ind1 Ind1 Ind2 Ind2 Ind2 Ind3 Ind3 Ind3
1_14000023 1 0 0.941177 0.058822 0.000001 0.799685 0.199918 0.000397 0.666316 0.333155 0.000529
1_14000072 2 3 0.709983 0.177493 0.112525 0.941178 0.058822 0.000000 0.665554 0.332774 0.001672
1_14000113 0 2 0.855993 0.106996 0.037010 0.333333 0.333333 0.333333 0.799971 0.199989 0.000040
1_14000202 2 0 0.835380 0.104420 0.060201 0.799685 0.199918 0.000397 0.333333 0.333333 0.333333
...
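To make the layout concrete, here is a minimal Python sketch that splits one data line of such a file into the marker, the two allele codes, and one triplet of values per individual. The function name `parse_beagle_line` is hypothetical. The reading of each triplet as (allele1/allele1, allele1/allele2, allele2/allele2) is an assumption based on the usual Beagle genotype likelihood format and is not stated on this page.

```python
from typing import List, Tuple

Triplet = Tuple[float, float, float]


def parse_beagle_line(line: str) -> Tuple[str, str, str, List[Triplet]]:
    """Split one whitespace-separated data line into its parts."""
    fields = line.split()
    marker, allele1, allele2 = fields[:3]
    values = [float(x) for x in fields[3:]]
    if len(values) % 3 != 0:
        raise ValueError("expected three values per individual")
    # Group the remaining columns three at a time, one group per individual.
    triplets = [
        (values[i], values[i + 1], values[i + 2])
        for i in range(0, len(values), 3)
    ]
    return marker, allele1, allele2, triplets


# Second data line of the output shown above.
line = (
    "1_14000072 2 3 0.709983 0.177493 0.112525 "
    "0.941178 0.058822 0.000000 0.665554 0.332774 0.001672"
)
marker, allele1, allele2, triplets = parse_beagle_line(line)
for index, triplet in enumerate(triplets):
    print(marker, f"Ind{index}", triplet)
```

For a real file, the lines could be read with `gzip.open("genolike.beagle.gz", "rt")`, skipping the header line first.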
Note that the above values sum to one per site for each individual. This is just a normalization of the genotype likelihoods to avoid underflow problems in the Beagle software; it does not mean that they are genotype probabilities.
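The kind of rescaling described here can be illustrated with a short Python sketch. It is not ANGSD's actual implementation: the function name `normalize_log_likelihoods` and the example log-likelihood values are invented for illustration, and the sketch assumes the likelihoods are held on a natural-log scale before being written out.

```python
import math
from typing import List, Sequence


def normalize_log_likelihoods(loglikes: Sequence[float]) -> List[float]:
    """Rescale log genotype likelihoods so the three values sum to one.

    Subtracting the maximum before exponentiating keeps the largest
    term at exp(0) = 1, so the result cannot underflow to all zeros.
    """
    largest = max(loglikes)
    scaled = [math.exp(value - largest) for value in loglikes]
    total = sum(scaled)
    return [value / total for value in scaled]


# Hypothetical log-likelihoods for one individual at one site.
loglikes = [-1000.0, -1002.0, -1010.0]

# Exponentiating directly underflows: every value becomes 0.0.
print([math.exp(value) for value in loglikes])

# After rescaling, the ratios between genotypes are preserved.
print(normalize_log_likelihoods(loglikes))

# An individual with no informative reads has three equal likelihoods,
# which rescale to one third each, as in the 0.333333 entries above.
print(normalize_log_likelihoods([-5.0, -5.0, -5.0]))
```

Because only the ratios between the three values are preserved, the rescaled numbers carry the same information as the original likelihoods; no prior on genotypes has been applied, which is why they are not genotype probabilities.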