EvalAdmix: Difference between revisions

From software
Jump to navigation Jump to search
Line 82: Line 82:
=== low depth sequencing data ===
=== low depth sequencing data ===
Download the input file containing genotype likelihoods in beagle format
Download the input file containing genotype likelihoods in beagle format
::<code>wget popgen.dk/software/download/NGSadmix/data/input.gz</code>
<pre>
wget popgen.dk/software/download/NGSadmix/data/Demo1input.gz
wget popgen.dk/software/download/NGSadmix/data/Demo1pop.info


Execute [NGSadmix] to obtain admixture proportions
</pre>
::<code>./NGSadmix -likes input.gz -K 3 -P 4 -o myoutfiles -minMaf 0.05</code>
 
Execute [[NgsAdmix |NGSadmix]] to obtain admixture proportions
<code>./NGSadmix -likes Demo1input.gz -K 3 -P 20 -o myoutfiles -minMaf 0.05</code>


::Input file = input.gz
::Input file = input.gz
::Ancestral Populations K=3
::Ancestral Populations K=3
::Computer cores = 4 (-P 4).  
::Computer cores = 20 (-P 20).  
::Output prefix = myoutfiles (-o myoutfiles)  
::Output prefix = myoutfiles (-o myoutfiles)  
::SNPs with MAF > 5%  (-minMaf 0.05)
::SNPs with MAF > 5%  (-minMaf 0.05)
Run evalAdmix
<code>./evalAdmix -beagle input.gz -fname myoutfiles.fopt.gz -qname myoutfiles.qopt -P 20 </code>
::Input file = input.gz
::Ancestral Populations frequency file myoutfiles.fopt.gz
::Computer cores = 20 (-P 20).
Plot results in R
# Get ID and pop info for each individual
pop<-scan("poplabel",what="theFuck")
# Read inferred admixture proportions file
q<-read.table("Demo1NGSadmix.qopt")
# Plot them (ordered by population)
ord = order(pop)
par(mar=c(7,4,1,1))
barplot(t(q)[,ord],col=c(2,1,3),names=pop[ord],las=2,ylab="Demo1 Admixture proportions",cex.names=0.75)


==Citation==
==Citation==

Revision as of 14:35, 18 July 2019

evalAdmix allows to evaluate the results of an admixture analysis. It takes as input the genotype data (either called genotypes in plink files or genotype likelihoods beagle files) used in the admixture analysis and the frequency and admixture propotions (P and Q files) generated.

The output is a pairwise correlation of residuals matrix between individuals The correlation will be 0 in case of a good fit of the data to the admixture model. When something is wrong, individuals from the same population will be positively correlated; and individuals from different populationts but that share one or more ancestral populations as admixture sources will have a negative correlation. Positive correlation between a pair of individuals might also be due to relatedness.


Error creating thumbnail: File missing


Download and Installation

evalAdmix can be installed from github

git clone https://github.com/GenisGE/evalAdmix.git cd evalAdmix make

Quick start

./NGSadmix -likes inputBeagleFile.gz -K 3 -o outFileName -P 10
  • -likes beagle file of genotype likelihoods
  • -K number of clusters
  • -o prefix of output file names
  • -P Number of threads used

Parameters

./evalAdmix  
Arguments:
	-plink path to binary plink file (excluding the .bed)
	-beagle path to beagle file containing genotype likelihoods (alternative to -plink)
	-fname path to ancestral population frequencies file
	-qname path to admixture proportions file
	-o name of the output file
Setup:
	-P 1 number of threads
	-autosomeMax 23	 autosome ends with this chromsome
	-nIts 5	 number of iterations to do for frequency correction; if set to 0 calculates correlation without correction (fast but biased)
	-useSites 1.0	 proportion of sites to use to calculate correlation of residuals
	-misTol 0.05 	 tolerance for considering site as missing when using genotype likelihoods. Use same value as used in NGSadmix to keep compatibility when using genotype likelihoods (-beagle)
	-minMaf 0.05 	 minimum minor allele frequency to keep site. Use same value as used in NGSadmix to keep compatibility when using genotype likelihoods (-beagle)

Input File

Plink

Beagle genotype likelhoods

The input file contains genotype likelihoods in a .beagle file format [1]. and can be compressed with gzip.

BAM files

If you have BAM files you can use ANGSD to produce genotype likelihoods in .beagle format. Please see Creation of Beagle files with ANGSD

VCF files

If you already have made a VCF file that contains genotype likehood information then it should be possible to convert .vcf files with genotype likelihoods to .beagle file via vcftools [2]

vcftools --vcf input.vcf --out test --BEAGLE-GL --chr 1,2

Chromosome has to be specified.

You can also use bcftools' [3] 'query' option for generating a .beagle file from a .vcf file.

Output Files

The analysis performed by NGSadmix produces 4 files:

  • Log likelihood of the estimates: a .log file that summarizes the run. The Command line used for running the program, what the likelihood is every 50 iterations, and finally how long it took to do the run.
  • Estimated allele frequency: a zipped .fopt file, that contains an estimate of the allele frequency in each of the 3 assumed ancestral populations. There is a line for each locus.
  • Estimated admixture proportions: a .qopt file, that contains an estimate of the individual's ancestry proportion (admixture) from each of the three assumed ancestral populations for all individuals. There is a line for each individual.

Run command example

low depth sequencing data

Download the input file containing genotype likelihoods in beagle format

wget popgen.dk/software/download/NGSadmix/data/Demo1input.gz
wget popgen.dk/software/download/NGSadmix/data/Demo1pop.info

Execute NGSadmix to obtain admixture proportions ./NGSadmix -likes Demo1input.gz -K 3 -P 20 -o myoutfiles -minMaf 0.05

Input file = input.gz
Ancestral Populations K=3
Computer cores = 20 (-P 20).
Output prefix = myoutfiles (-o myoutfiles)
SNPs with MAF > 5% (-minMaf 0.05)

Run evalAdmix ./evalAdmix -beagle input.gz -fname myoutfiles.fopt.gz -qname myoutfiles.qopt -P 20

Input file = input.gz
Ancestral Populations frequency file myoutfiles.fopt.gz
Computer cores = 20 (-P 20).

Plot results in R

  1. Get ID and pop info for each individual

pop<-scan("poplabel",what="theFuck")

  1. Read inferred admixture proportions file

q<-read.table("Demo1NGSadmix.qopt")

  1. Plot them (ordered by population)

ord = order(pop) par(mar=c(7,4,1,1)) barplot(t(q)[,ord],col=c(2,1,3),names=pop[ord],las=2,ylab="Demo1 Admixture proportions",cex.names=0.75)

Citation