FastNgsAdmixOld


Revision as of 14:08, 4 February 2016

This page describes FastNGSadmixPCA, a very fast tool that estimates admixture proportions from the NGS data of a single individual so that they can be incorporated into a PCA of NGS data. It is based on genotype likelihoods and is written in R.

Installation

wget http://popgen.dk/albrecht/kristian/tool_download.zip
unzip tool_download.zip
Alternatively, use the Shiny web application:
http://popgen.dk:443/kristian/admixpca_human/

Run example

tool_download.zip contains all files needed to run FastNGSadmixPCA. The example sample is from the HapMap project. If more samples are needed, a few more are available at http://popgen.dk/albrecht/kristian/. The Rscript command below runs the tool. All output is written to an output_folder that is created in the process. To see the preset arguments, run: Rscript FastNGSAdmixPCA.r

Rscript FastNGSAdmixPCA.r infile=NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz

All arguments can be altered. To change the reference populations, pass a comma-separated list of populations to the refpops argument, as shown below:

Rscript FastNGSAdmixPCA.r infile=NA12763.mapped.ILLUMINA.bwa.CEU.low_coverage.20130502.bam.beagle.gz refpops=YRI,JPT,CHB,CEU
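When several individuals need the same reference populations, the call above can be generated once per input file. The following is a minimal sketch, not part of the tool: print_runs is a hypothetical helper, and sampleA.beagle.gz and sampleB.beagle.gz are placeholder file names. It only prints the commands so they can be inspected first. Whether successive runs share or overwrite output_folder is not stated on this page, so check that before running a batch.

```shell
#!/bin/sh
# print_runs REFPOPS FILE...
# Hypothetical helper (not shipped with FastNGSadmixPCA): prints one
# Rscript call per input file, using the same refpops list for each.
print_runs() {
    refpops=$1
    shift
    for infile in "$@"; do
        echo "Rscript FastNGSAdmixPCA.r infile=$infile refpops=$refpops"
    done
}

# Placeholder file names; replace with real beagle.gz files.
print_runs YRI,JPT,CHB,CEU sampleA.beagle.gz sampleB.beagle.gz
```

To execute the printed commands rather than just list them, the output can be piped into sh from the directory that contains FastNGSAdmixPCA.r.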

To get an overview of the available reference populations, do a dry run:

Rscript FastNGSAdmixPCA.r infile=TRUE dryrun=TRUE


Input Files

Input files contain genotype likelihoods in the Beagle genotype likelihood input file format [1]. We recommend ANGSD for easily converting next-generation sequencing data to Beagle format.

The example below shows how to make a Beagle file of genotype likelihoods using ANGSD.

HOME$ ./angsd0.594/angsd -i 'pathtoindi.bam' -GL 2 -sites 'SNP.sites' -doGlf 2 -doMajorMinor 3 -minMapQ 30 -minQ 20 -doDepth 1 -doCounts 1 -out indi_genotypelikelihood

Example of a Beagle genotype likelihood input file for a single individual. Each individual occupies three columns, one likelihood per possible genotype.

marker       allele1  allele2   Ind0      Ind0    Ind0
1_14000023      1       0       0.941    0.058    0.000
1_14000072      2       3       0.709    0.177    0.112
1_14000113      0       2       0.855    0.106    0.037
1_14000202      2       0       0.835    0.104    0.060
...
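
Before running the tool, it can be useful to check that a file has this layout. The following is a minimal sketch, not part of FastNGSadmixPCA: check_beagle is a hypothetical helper that assumes tab- or space-separated columns, one header line, three likelihood columns per individual, and (as in the example rows) likelihoods that sum to roughly 1 for each individual. The 0.99 to 1.01 tolerance is an arbitrary choice to allow for rounding.

```shell
#!/bin/sh
# Recreate the four example rows shown above as a tab-separated file.
beagle_file=$(mktemp)
{
    printf 'marker\tallele1\tallele2\tInd0\tInd0\tInd0\n'
    printf '1_14000023\t1\t0\t0.941\t0.058\t0.000\n'
    printf '1_14000072\t2\t3\t0.709\t0.177\t0.112\n'
    printf '1_14000113\t0\t2\t0.855\t0.106\t0.037\n'
    printf '1_14000202\t2\t0\t0.835\t0.104\t0.060\n'
} > "$beagle_file"

# check_beagle [FILE]
# Hypothetical helper: reads FILE (or standard input when no file is given),
# reports malformed lines, and exits non-zero if any were found.
check_beagle() {
    awk '
    NR == 1 { next }                     # skip the header line
    NF < 6 || (NF - 3) % 3 != 0 {        # need 3 likelihoods per individual
        printf "line %d: expected 3 likelihood columns per individual\n", NR
        bad = 1
        next
    }
    {
        for (i = 4; i <= NF; i += 3) {   # one triple per individual
            sum = $i + $(i + 1) + $(i + 2)
            if (sum < 0.99 || sum > 1.01) {
                printf "line %d: likelihoods sum to %.3f\n", NR, sum
                bad = 1
            }
        }
        sites++
    }
    END { printf "%d sites checked\n", sites; exit bad + 0 }
    ' "$@"
}

check_beagle "$beagle_file"
```

Because the helper also reads standard input, a compressed file can be checked without unpacking it to disk, for example with gunzip -c indi_genotypelikelihood.beagle.gz | check_beagle.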