FastNgsAdmixOld: Difference between revisions
Revision as of 16:06, 1 August 2016
This page describes FastNGSadmixPCA, a fast tool for estimating admixture proportions from the NGS data of a single individual, so that the individual can be incorporated into a PCA of NGS data. It is based on genotype likelihoods. The program is written in R.
Installation
wget http://popgen.dk/albrecht/kristian/tool_download.zip
unzip tool_download.zip
OR simply use SHINY: http://popgen.dk:443/kristian/admixpca_human/
Run example
tool.zip contains all files needed to run FastNGSadmixPCA. The sample is from the HapMap project. If more samples are needed, a few more are available at http://popgen.dk/albrecht/kristian/ The Rscript command below runs the tool. All output is written to an output_folder that is created in the process. To see the preset arguments, run the script without arguments:
Rscript FastNGSAdmixPCA.r
To run the example sample:
Rscript FastNGSAdmixPCA.r infile=NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz
All arguments can be altered. To change the reference populations, pass a comma-separated list of populations to the refpops argument, as shown below:
Rscript FastNGSAdmixPCA.r infile=NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz refpops=YRI,JPT,CHB,CEU
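When the tool is run for many samples, it can help to assemble the key=value arguments programmatically. The following is a minimal sketch, not part of the tool: the helper name build_command is hypothetical, and only the argument names infile and refpops come from the examples above.

```python
def build_command(script="FastNGSAdmixPCA.r", **arguments):
    """Assemble an Rscript call in the key=value argument style used on this page.

    build_command is a hypothetical helper, not something shipped with the tool.
    """
    command = ["Rscript", script]
    for name, value in arguments.items():
        # A list or tuple becomes one comma-separated value, e.g. refpops=YRI,JPT
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        command.append(f"{name}={value}")
    return command


command = build_command(
    infile="NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz",
    refpops=["YRI", "JPT", "CHB", "CEU"],
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the directory that contains the script.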
To get an overview of the available reference populations, make a dry run:
Rscript FastNGSAdmixPCA.r infile=TRUE dryrun=TRUE
Input Files
Input files contain genotype likelihoods in the beagle genotype likelihood file format [1]. We recommend [ANGSD] for easily converting next-generation sequencing data to beagle format.
The example below shows how to make a beagle genotype likelihood file using ANGSD.
HOME$ ./angsd0.594/angsd -i 'pathtoindi.bam' -GL 2 -sites 'SNP.sites' -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out indi_genotypelikelihood
Example of a beagle genotype likelihood input file for a single individual (one row per site, with three genotype likelihood columns for the individual).
marker allele1 allele2 Ind0 Ind0 Ind0
1_14000023 1 0 0.941 0.058 0.000
1_14000072 2 3 0.709 0.177 0.112
1_14000113 0 2 0.855 0.106 0.037
1_14000202 2 0 0.835 0.104 0.060
...
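The layout above can be read with a few lines of Python. This is a minimal sketch under stated assumptions, not code from the tool: the function name read_beagle is hypothetical, and it assumes a whitespace-separated file with a header line, three leading columns (marker, allele1, allele2), and three likelihood columns per individual.

```python
def read_beagle(lines):
    """Yield (marker, allele1, allele2, triples) for each site.

    triples holds one (AA, AB, BB) likelihood tuple per individual.
    read_beagle is a hypothetical helper for illustration only.
    """
    lines = iter(lines)
    header = next(lines).split()
    n_individuals = (len(header) - 3) // 3
    for line in lines:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or truncated lines
        values = [float(x) for x in fields[3:]]
        triples = [tuple(values[3 * i:3 * i + 3]) for i in range(n_individuals)]
        yield fields[0], fields[1], fields[2], triples


example = """marker allele1 allele2 Ind0 Ind0 Ind0
1_14000023 1 0 0.941 0.058 0.000
1_14000072 2 3 0.709 0.177 0.112
1_14000113 0 2 0.855 0.106 0.037
1_14000202 2 0 0.835 0.104 0.060
""".splitlines()

sites = list(read_beagle(example))
for marker, allele1, allele2, triples in sites:
    # Each triple should sum to roughly 1 (the values are rounded in the file)
    print(marker, allele1, allele2, triples[0], round(sum(triples[0]), 3))
```

For a compressed file such as the .beagle.gz input used above, the same function can be fed from gzip.open(path, "rt").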
version 2
This program needs genotype likelihoods in the beagle file format. It also needs allele frequencies from a reference panel containing the populations for which admixture proportions should be estimated, for instance from 1000 Genomes or HGDP, or from a custom-made reference panel. Note that the frequencies in the reference panel should be those of the major allele in the beagle file.
(So if the 3 columns with genotype likelihoods in the beagle file are coded as AA AB BB, then the frequencies should be of the A allele.)
Furthermore, a file with the number of individuals in each reference population should be supplied.
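To illustrate why the allele coding of the reference frequencies matters, the sketch below evaluates a single-site likelihood under the standard admixture model with Hardy-Weinberg genotype probabilities. This is an assumption made for illustration, not code taken from fastNGSadmix, and the names orient_frequency and site_likelihood are hypothetical.

```python
def orient_frequency(freq, freq_allele, allele1, allele2):
    """Return the frequency of allele1 (the A allele in the AA AB BB coding).

    Hypothetical helper: assumes a biallelic site where freq refers to
    either allele1 or allele2.
    """
    if freq_allele == allele1:
        return freq
    if freq_allele == allele2:
        return 1.0 - freq
    raise ValueError("frequency allele matches neither beagle allele")


def site_likelihood(gl, ref_freqs, proportions):
    """Likelihood of one site given admixture proportions (illustrative model).

    gl          -- (AA, AB, BB) genotype likelihoods from the beagle file
    ref_freqs   -- frequency of allele A in each reference population
    proportions -- admixture proportion for each population (sums to 1)
    """
    # Expected frequency of allele A in the admixed individual
    h = sum(q * f for q, f in zip(proportions, ref_freqs))
    # Genotype probabilities for AA, AB, BB under Hardy-Weinberg
    prior = (h * h, 2 * h * (1 - h), (1 - h) ** 2)
    return sum(l * p for l, p in zip(gl, prior))


gl = (0.941, 0.058, 0.000)   # first site of the beagle example on this page
correct = site_likelihood(gl, [0.9, 0.2], [1.0, 0.0])
flipped = site_likelihood(gl, [0.1, 0.8], [1.0, 0.0])  # frequencies of the wrong allele
print(correct, flipped)
```

With frequencies given for the wrong allele, the same data receive a much lower likelihood, which would distort the estimated proportions. The reference frequencies 0.9 and 0.2 are made-up values for the example.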
An example of a command:
./fastNGSadmix -likes Yoruba10Japanese65Han25_3000000_d10_N10_GL.txt -fname Yoruba10Japanese65Han25_3000000_d10_N10_Ref.txt -Nname sYoruba10Japanese65Han25_3000000_d10_N10_nInd.txt -outfiles Yoruba10Japanese65Han25_3000000_d10_N10
A number of additional options and filters can also be specified:
(TO BE CONTINUED...)