NgsAdmix


This page describes NGSadmix, a tool for inferring admixture from genotype likelihoods. It is a multithreaded C/C++ program.


The latest version is 32, from June 25, 2013. It can be found here: [1]. Older versions can be found here: [2].

Installation

wget popgen.dk/software/NGSadmix/ngsadmix32.cpp 
g++ ngsadmix32.cpp -O3 -lpthread -lz -o NGSadmix

Run example

Assume we have an input file called input.gz, three ancestral populations (-K 3), and four computing cores to use (-P 4). The prefix of the output files is myoutfiles (-o).

./NGSadmix -likes input.gz -K 3 -P 4 -tol 0.00000001 -tolLike50 0.01 -o myoutfiles -minMaf 0.05 
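Because the EM algorithm starts from a seed-dependent initial guess (-seed), one might want to repeat the run for several seeds and several values of K. The following Python sketch only builds the command lines and does not execute anything. It uses flags from the Options list; the output prefix pattern (PREFIX.K<k>.s<seed>) and the function names are assumptions made for illustration.

```python
import shlex


def ngsadmix_command(likes, k, out_prefix, threads=4, seed=None,
                     min_maf=0.05, binary="./NGSadmix"):
    """Build the argument list for one NGSadmix run.

    Only flags listed in the Options section are used.
    """
    args = [binary, "-likes", likes, "-K", str(k), "-P", str(threads),
            "-o", out_prefix, "-minMaf", str(min_maf)]
    if seed is not None:
        args += ["-seed", str(seed)]
    return args


def sweep(likes, ks, seeds, prefix="myoutfiles"):
    """Yield one command per (K, seed) pair, each with its own output prefix.

    The prefix naming scheme is a hypothetical convention, not part of NGSadmix.
    """
    for k in ks:
        for seed in seeds:
            yield ngsadmix_command(likes, k, f"{prefix}.K{k}.s{seed}", seed=seed)


if __name__ == "__main__":
    for cmd in sweep("input.gz", ks=[2, 3], seeds=[1, 2]):
        # To actually run a command: subprocess.run(cmd, check=True)
        print(shlex.join(cmd))
```

Giving every run its own prefix keeps the output files of different runs from overwriting each other.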

Input Files

Input files contain genotype likelihoods in Beagle input file format [3]. We recommend ANGSD for easily converting next-generation sequencing data to Beagle format.
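As a quick sanity check before running the program, the input can be inspected with a few lines of Python. This is a minimal sketch that assumes the usual Beagle genotype-likelihood layout: a header line, then one line per site with three leading columns (marker, allele1, allele2) followed by three likelihood columns per individual. The function names are hypothetical.

```python
import gzip


def summarize_beagle(lines):
    """Return (n_individuals, n_sites) from the text lines of a Beagle GL file.

    Assumes a header line plus one line per site, with three likelihood
    columns per individual after the marker, allele1 and allele2 columns.
    """
    lines = iter(lines)
    header = next(lines).split()
    n_gl_columns = len(header) - 3  # skip marker, allele1, allele2
    if n_gl_columns <= 0 or n_gl_columns % 3 != 0:
        raise ValueError("expected three likelihood columns per individual")
    n_sites = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"site {fields[0]} has {len(fields)} columns, "
                f"expected {len(header)}")
        n_sites += 1
    return n_gl_columns // 3, n_sites


def summarize_beagle_file(path):
    """Same as summarize_beagle, but reads a gzipped file such as input.gz."""
    with gzip.open(path, "rt") as handle:
        return summarize_beagle(handle)
```

For example, summarize_beagle_file("input.gz") would report how many individuals and sites the file holds before any filtering by -minMaf, -minLrt, or -minInd.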

Options

./NGSadmix
Arguments:
	-likes Beagle likelihood filename
	-K Number of ancestral populations
Optional:
	-fname Ancestral population frequencies
	-qname Admixture proportions
	-o Prefix for output files
	-printInfo print ID and mean maf for the SNPs that were analysed
Setup:
	-seed Seed for initial guess in EM
	-P Number of threads
	-method If 0 no acceleration of EM algorithm
	-misTol Tolerance for considering site as missing
Stop chriteria:
	-tolLike50 Loglikelihood difference in 50 iterations
	-tol Tolerance for convergence
	-dymBound Use dymamic boundaries (1: yes (default) 0: no)
	-maxiter Maximum number of EM iterations
Filtering
	-minMaf Minimum minor allele frequency
	-minLrt Minimum likelihood ratio value for maf>0
	-minInd Minumum number of informative individuals

Output Files

The program outputs three files:

  1. PREFIX.log
  2. PREFIX.fopt.gz
  3. PREFIX.qopt
  • The log file contains information about the run: the command line used, the likelihood every 50 iterations, and the total run time.
  • The fopt.gz file is a compressed file containing the estimated frequency at each site for each ancestral population.
  • The qopt file contains the admixture proportions for all individuals.

Examples of the output files are found below.


Log file

Contents of the log file

fopt file

Contents of the fopt file

There is currently no way to recover the positions corresponding to the lines of the fopt file if some sites have been filtered from the analysis (-minMaf, -minInd, -minLrt, etc.).
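One defensive way to handle this is to attach site IDs to fopt rows only when the row count equals the number of input sites, and to refuse otherwise. The Python sketch below assumes that, when nothing was filtered, the fopt rows appear in the same order as the sites in the input file. That ordering is an assumption, and the function name and marker IDs are hypothetical.

```python
def label_fopt_rows(marker_ids, fopt_rows):
    """Pair each fopt row with a marker ID, but only when no site was filtered.

    marker_ids: site IDs from the first column of the Beagle input file.
    fopt_rows:  parsed rows of the fopt file (one list of K values per site).
    Assumes fopt rows follow input order when the counts match.
    """
    if len(marker_ids) != len(fopt_rows):
        raise ValueError(
            f"{len(marker_ids)} input sites but {len(fopt_rows)} fopt rows: "
            "sites were filtered, so rows cannot be mapped back to positions")
    return dict(zip(marker_ids, fopt_rows))
```

When the counts differ, the function raises an error instead of guessing which sites were dropped.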

qopt file

Contents of the qopt file # cat tsk48GL.beagle.gz.s1.qopt
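The qopt file can also be read outside R. The Python sketch below assumes the layout implied by the R plotting code on this page: one whitespace-separated row per individual and one column per ancestral population. The function names and the values in the usage note are invented for illustration.

```python
def read_qopt(lines):
    """Parse qopt text into a list of rows of admixture proportions.

    Each row is one individual; each column is one ancestral population.
    """
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def sums_to_one(row, tol=1e-3):
    """Check that an individual's admixture proportions add up to about 1."""
    return abs(sum(row) - 1.0) <= tol


def dominant_population(row):
    """0-based index of the population with the largest admixture proportion."""
    return max(range(len(row)), key=row.__getitem__)
```

With two made-up rows, read_qopt(["0.90 0.05 0.05", "0.10 0.60 0.30"]) gives two individuals whose dominant populations are 0 and 1. For a real file, pass an open file handle, for instance open("myoutfiles.qopt").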

Plot results

Use R

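# Read the admixture proportions (one row per individual) and transpose so that each column is an individual
admix<-t(as.matrix(read.table("tsk48GL.beagle.gz.s1.qopt")))
# Stacked bar plot with one colour per ancestral population (col=1:3 assumes K = 3)
barplot(admix,col=1:3,space=0,border=NA,xlab="Individuals",ylab="admixture")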

Citation

Change log

  • v32, June 25, 2013: modified the code so that it now compiles on OS X.
  • v31, June 24, 2013: first public version.