AsaMap
Download
The program can be downloaded from GitHub:
https://github.com/e-jorsboe/asaMap
git clone https://github.com/e-jorsboe/asaMap.git; cd asaMap; make
So far it has only been tested on Linux systems. Use curl if you are on a Mac.
Example
This is an example.
Input Files
The input files are called genotypes in the binary plink (*.bed) format [1], together with estimated admixture proportions and population specific allele frequencies. ADMIXTURE can be used to estimate both; its .Q file (admixture proportions) and .P file (allele frequencies) can be given directly to asaMap.
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:
-0.712027291121767
-0.158413122435864
-1.77167888612947
-0.800940619551485
0.3016297021294
...
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT contain a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168
...
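The row-count and intercept requirements above can be checked before launching an analysis. The following is a minimal sketch, not part of asaMap: the function names are hypothetical, and it assumes every non-empty line of each file corresponds to one individual and that covariates are whitespace-separated numbers.

```python
def count_rows(lines):
    """Count non-empty lines in an iterable of strings (e.g. an open file)."""
    return sum(1 for line in lines if line.strip())


def _is_one(text):
    """True if the field parses as the number 1."""
    try:
        return float(text) == 1.0
    except ValueError:
        return False


def check_inputs(fam_lines, pheno_lines, cov_lines=None):
    """Return a list of problems found; an empty list means the inputs agree.

    Hypothetical helper: it only checks what the text above requires
    (matching row counts, no all-1s covariate column).
    """
    problems = []
    n_fam = count_rows(fam_lines)

    n_pheno = count_rows(pheno_lines)
    if n_pheno != n_fam:
        problems.append(f"phenotype file has {n_pheno} rows, .fam has {n_fam}")

    if cov_lines is not None:
        rows = [line.split() for line in cov_lines if line.strip()]
        if len(rows) != n_fam:
            problems.append(f"covariate file has {len(rows)} rows, .fam has {n_fam}")
        # Assumes the first row has the full set of columns.
        n_cols = len(rows[0]) if rows else 0
        for j in range(n_cols):
            if all(len(r) > j and _is_one(r[j]) for r in rows):
                problems.append(f"covariate column {j + 1} is all 1s (intercept is added automatically)")
    return problems
```

With real files, the three arguments would be open file handles, for example `check_inputs(open("plinkFile.fam"), open("pheno.files"), open(cov_path))`, where `cov_path` stands for whatever covariate file is in use.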
Example of how to run asaMap with covariates included, after first running ADMIXTURE:
# run admixture
admixture plinkFile.bed 2
# run asaMap with admix proportions
./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P
This produces an out.log log file and an out.res file with results for each site (after filtering).
Running asaMap
The full list of options can be printed by running asaMap without any arguments:
./asaMap
Must be specified:
- -p <filename>
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.
- -o <filename>
Output filename - a .res file will be written with the results and a .log log file.
- -y <filename>
Phenotypes file, has to be plain text file - with as many rows as .fam file.
- -Q <filename> (either -a or -Q)
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.
- -a <filename> (either -a or -Q)
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.
- -f <filename>
Allele frequencies, .P file from ADMIXTURE.
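Since the -a input is described as the first column of the ADMIXTURE .Q file, it can be derived from that file. The sketch below is an illustration under assumptions: the helper name is hypothetical, and the one-value-per-line layout of the -a file is a guess based on the description above rather than a documented format.

```python
def first_q_column(q_lines):
    """Yield the first whitespace-separated field of each non-empty line.

    q_lines is any iterable of strings, e.g. an open .Q file.
    """
    for line in q_lines:
        fields = line.split()
        if fields:
            yield fields[0]


# Hypothetical usage; "plinkFile.2.a" is an invented output name:
# with open("plinkFile.2.Q") as src, open("plinkFile.2.a", "w") as dst:
#     for value in first_q_column(src):
#         dst.write(value + "\n")
```

Passing the .Q file directly with -Q avoids this step entirely when the full file is available.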
Optional:
- -c <filename>
Covariates, plain text file with one column for each covariate and the same number of rows as the .fam file. It should NOT have a column of 1s for the intercept; the intercept is added automatically.
- -m <INT>
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).
- -l <INT>
Regression, whether a linear or a logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).
- -b <filename>
Text file containing a starting guess of the estimated coefficients.
- -i <INT>
The maximum number of iterations to run for the EM algorithm (default: 80).
- -t <FLOAT>
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).
- -r <INT>
Seed for generating the starting values of the coefficients.
- -P <INT>
Number of threads to be used for the analysis. Each thread will write to a temporary file in the path specified by "-o".
- -e <INT>
Estimate standard error of coefficients (0: no, 1: yes - default: 0).
- -w <INT>
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)
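The option rules listed above (four required inputs, exactly one of -Q or -a, optional covariates and model switches) can be captured in a small wrapper when asaMap is driven from a script. This is a sketch only: the function and its parameter names are hypothetical, and it uses just the flags documented above.

```python
def build_asamap_command(plink_prefix, out_prefix, pheno, freqs,
                         q_file=None, a_file=None, cov=None,
                         model=0, regression=0, threads=None,
                         exe="./asaMap"):
    """Assemble an asaMap argument list from the documented options."""
    # -Q and -a are alternatives: exactly one must be given.
    if (q_file is None) == (a_file is None):
        raise ValueError("specify exactly one of q_file (-Q) or a_file (-a)")
    if model not in (0, 1):
        raise ValueError("model must be 0 (additive) or 1 (recessive)")
    if regression not in (0, 1):
        raise ValueError("regression must be 0 (linear) or 1 (logistic)")

    cmd = [exe, "-p", plink_prefix, "-o", out_prefix, "-y", pheno, "-f", freqs]
    cmd += ["-Q", q_file] if q_file is not None else ["-a", a_file]
    if cov is not None:
        cmd += ["-c", cov]
    cmd += ["-m", str(model), "-l", str(regression)]
    if threads is not None:
        cmd += ["-P", str(threads)]
    return cmd


# The resulting list can be passed to subprocess.run(cmd, check=True).
```

Building a list rather than a shell string keeps file names with spaces intact and makes the -Q/-a rule fail early, before any analysis starts.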
Outputs
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, shown here for the additive model:
Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
1 980552 2737 0.935997 0.937511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
1 1068883 2717 0.999990 0.809715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
1 1124663 2737 0.886692 0.388175 3234.025418 3241.930891 3242.902363 3242.561728 3242.820387 3243.028131 -0.048894 0.108007 0.045277 -0.030582 -0.016838
1 1171417 2736 0.999990 0.445701 nan nan nan 3239.320653 3239.524956 3239.641824 nan nan nan -0.033530 -0.015845
1 1366830 2735 0.999990 0.374078 nan nan nan 3241.698019 3241.675158 3241.696793 nan nan nan 0.002135 0.007140
1 1450947 2738 0.659605 0.906222 3240.054094 3243.544587 3243.770254 3243.708934 3243.777517 3243.800524 -0.026101 0.044039 0.016671 -0.014242 -0.005544
1 1995211 2737 0.856699 0.982350 3235.516404 3242.070487 3242.928680 3242.571223 3242.756177 3242.941750 0.074805 -0.142018 -0.020892 0.039110 0.021462
1 2004098 2738 0.443711 0.815725 3241.253250 3242.382033 3243.741660 3242.955646 3243.532476 3243.800524 0.058767 -0.055806 -0.016451 0.041228 0.016158
1 2040898 2738 0.676808 0.610463 3242.664546 3243.371593 3243.574375 3243.801527 3243.787426 3243.800524 -0.024109 0.081087 0.047793 -0.001765 0.004108
For the recessive model it looks like this:
Chromo Position nInd f1 f2 llh(R0) llh(R1) llh(R2) llh(R3) llh(R4) llh(R5) llh(R6) llh(R7) b1(R1) b2(R1) bm(R1) b1(R2) b2m(R2) b1m(R3) b2(R3) b1(R4) b2(R5) b(R6)
1 980552 2737 0.935997 0.937511 3236.442376 3241.191367 3242.235364 3241.191468 3243.112239 3241.188747 3242.691370 3243.115326 0.023373 -2.082935 -0.027433 0.016608 -0.582318 0.004700 -2.083112 -0.046849 -2.083275 -0.259338
1 1068883 2717 0.999990 0.809715 nan nan nan nan 3215.162291 3215.133559 3214.502575 3215.569371 nan nan nan nan nan nan nan -0.529999 -0.721649 -0.438317
1 1124663 2737 0.886692 0.388175 3235.030514 3242.807127 3242.809076 3242.836233 3242.818987 3243.028431 3242.907072 3243.028131 0.064419 -0.047597 -0.004021 0.068119 -0.019760 0.042905 -0.078669 0.060373 -0.018537 0.029227
1 1171417 2736 0.999990 0.445701 nan nan nan nan 3238.750760 3239.274351 3238.288964 3239.641824 nan nan nan nan nan nan nan -0.210643 -0.267111 -0.144645
1 1366830 2735 0.999990 0.374078 nan nan nan nan 3241.645871 3241.199416 3241.338290 3241.696793 nan nan nan nan nan nan nan -0.045970 -0.273382 -0.070305
1 1450947 2738 0.659605 0.906222 3240.883715 3242.545834 3243.515375 3243.627600 3243.713843 3243.659336 3243.802228 3243.800524 0.047735 0.291966 -0.216232 0.044591 -0.069851 -0.016796 0.170637 0.032325 0.146528 0.002457
1 1995211 2737 0.856699 0.982350 3234.731598 3241.839632 3241.919398 3241.997812 3242.204980 3242.750902 3242.000261 3242.941750 0.072845 0.113462 0.601882 0.114683 0.366807 0.175891 0.261334 0.209120 0.516155 0.181162
1 2004098 2738 0.443711 0.815725 3238.336234 3238.488951 3241.228881 3243.661958 3242.407555 3243.783839 3243.676693 3243.800524 0.133629 0.236260 -0.298383 0.122912 -0.100454 0.025324 -0.013486 0.097341 0.030391 0.019042
1 2040898 2738 0.676808 0.610463 3241.442146 3242.449918 3242.502684 3243.202847 3243.802047 3243.233496 3243.496321 3243.800524 -0.065485 0.095602 0.207722 -0.057787 0.165752 0.014559 0.205258 0.003543 0.221293 0.037588
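Both layouts shown above share the same shape: a header row followed by whitespace-separated values, with nan where a model could not be fitted. A minimal reader might look like the sketch below. The function names are hypothetical, and it assumes the first three columns are always Chromo, Position and nInd, as in the examples.

```python
import math


def parse_res(lines):
    """Parse .res text (an iterable of lines) into a list of dicts.

    Keys are taken from the header row, so the additive and recessive
    layouts are both handled without hard-coding column names.
    """
    rows = iter(lines)
    header = next(rows).split()
    records = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}")
        record = {header[0]: fields[0],
                  header[1]: int(fields[1]),
                  header[2]: int(fields[2])}
        for name, value in zip(header[3:], fields[3:]):
            record[name] = float(value)  # float() also accepts "nan"
        records.append(record)
    return records


def has_value(record, column):
    """True if the column holds a real number rather than nan."""
    return not math.isnan(record[column])
```

For example, `parse_res(open("out.res"))` returns one dict per site, and `has_value(rec, "llh(M0)")` can be used to skip sites where that model was not estimated.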
P-values can be obtained with a likelihood ratio test between the two models of interest.
An R script, "getPvalues.R", is provided that makes it easy to obtain P-values from the .res file:
Rscript R/getPvalues.R out.res
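For readers who want to see the arithmetic behind a likelihood ratio test, the sketch below works through it independently of getPvalues.R, which remains the supported route. It rests on two assumptions that are not stated in the text: that the llh columns hold negative log-likelihoods (the example values get smaller as coefficients are added), and that the degrees of freedom equal the difference in the number of estimated coefficients between the two models. The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    # Closed forms for df = 1 and df = 2, then the recurrence
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        q, k = math.exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lrt_pvalue(nll_null, nll_alt, df):
    """P-value for a likelihood ratio test from two negative log-likelihoods.

    Returns nan if either model was not fitted. A slightly negative
    statistic (possible when EM stops early) is treated as zero.
    """
    if math.isnan(nll_null) or math.isnan(nll_alt):
        return math.nan
    stat = 2.0 * (nll_null - nll_alt)
    return chi2_sf(max(stat, 0.0), df)


# First additive row above, M1 against M5, assuming M5 has no genotype
# coefficients and M1 has two (b1, b2), hence df = 2:
p_m1_vs_m5 = lrt_pvalue(3243.115326, 3242.214834, 2)
```

Which pairs of models are meaningful to compare, and with how many degrees of freedom, depends on the model definitions; the M1 versus M5 pairing here is only an illustration.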