asaMap
Latest revision as of 10:08, 23 March 2019
This page contains information about the program asaMap, a tool for ancestry-specific association mapping in large-scale genetic studies. It is based on called genotypes in the binary plink format (.bed). The program is written in C++.
Download
The program can be downloaded from github:
https://github.com/e-jorsboe/asaMap
git clone https://github.com/e-jorsboe/asaMap.git
cd asaMap
make
So far it has only been tested on Linux systems. Use curl if you are on a Mac.
Example
To be added...
Input Files
The input is called genotypes in the binary plink format (*.bed) [1], together with estimated admixture proportions and population-specific allele frequencies. ADMIXTURE can be used to estimate both; its .Q and .P files, respectively, can be given directly to asaMap.
A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:
-0.712027291121767
-0.158413122435864
-1.77167888612947
-0.800940619551485
0.3016297021294
...
A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT have a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.
0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168
...
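The row-count and intercept requirements above can be checked before running asaMap. The following is a minimal sketch and not part of asaMap: the names `read_rows` and `check_inputs` are hypothetical, and it assumes that all three files are whitespace-separated plain text with one individual per line.

```python
from pathlib import Path


def read_rows(path):
    """Return the non-empty lines of a text file, each split on whitespace."""
    lines = Path(path).read_text().splitlines()
    return [line.split() for line in lines if line.strip()]


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    n_fam = len(read_rows(fam_path))
    pheno = read_rows(pheno_path)
    if len(pheno) != n_fam:
        problems.append(f"phenotype file has {len(pheno)} rows, .fam has {n_fam}")
    if cov_path is not None:
        cov = read_rows(cov_path)
        if len(cov) != n_fam:
            problems.append(f"covariate file has {len(cov)} rows, .fam has {n_fam}")
        widths = {len(row) for row in cov}
        if len(widths) > 1:
            problems.append("covariate rows have differing numbers of columns")
        elif cov:
            # Flag any column that consists only of 1s (an explicit intercept).
            for j in range(widths.pop()):
                if all(float(row[j]) == 1.0 for row in cov):
                    problems.append(f"covariate column {j + 1} is all 1s (intercept)")
    return problems
```

Calling `check_inputs("plinkFile.fam", "pheno.files", "cov.txt")` (hypothetical covariate file name) returns an empty list when the files are consistent.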
Running asaMap
Example of how to run asaMap with covariates included, after first running ADMIXTURE:
#run admixture
admixture plinkFile.bed 2
#run asaMap with admix proportions
./asaMap -p plinkFile -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P
This produces a log file, out.log, and a results file, out.res, with results for each site (after filtering).
The full list of options is printed when asaMap is run without any input:
./asaMap
Must be specified:
- -p <filename>
Plink prefix filename of binary plink files - so without .bed/.fam/.bim suffixes.
- -o <filename>
Output filename - a .res file will be written with the results and a .log log file.
- -y <filename>
Phenotypes file, has to be plain text file - with as many rows as .fam file.
- -Q <filename> (either -a or -Q)
Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.
- -a <filename> (either -a or -Q)
Admixture proportions (for source pop1) - so first column from .Q file from ADMIXTURE. Either specify this or -Q.
- -f <filename>
Allele frequencies, .P file from ADMIXTURE.
Optional:
- -c <filename>
Covariates, plain text file with one column for each covariate, same number of rows as .fam file. SHOULD NOT HAVE A COLUMN OF 1s (for intercept), IT WILL BE ADDED AUTOMATICALLY!
- -m <INT>
Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).
- -l <INT>
Regression, whether a linear or a logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).
- -b <filename>
Text file containing a starting guess of the estimated coefficients.
- -i <INT>
The maximum number of iterations to run for the EM algorithm (default: 80).
- -t <FLOAT>
Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).
- -r <INT>
Give seed, for generation of starting values of coefficients.
- -P <INT>
Number of threads to be used for analysis. Each thread will write to temporary file in path specified by "-o".
- -e <INT>
Estimate standard error of coefficients (0: no, 1: yes - default: 0).
- -w <INT>
Run M0/R0 model that models effect of other allele. Analyses are faster without having to run M0/R0. (0: no, 1: yes - default: 1)
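When asaMap is called from a script, the rule that exactly one of -Q and -a is given can be enforced while the command is assembled. The sketch below is an illustration only: `build_asamap_command` and its parameter names are hypothetical, and it uses only flags from the list above.

```python
def build_asamap_command(plink_prefix, out, pheno, freqs, q_file=None,
                         a_file=None, cov=None, model=0, logistic=0,
                         threads=None, exe="./asaMap"):
    """Build the argument list for an asaMap run from the options listed above."""
    # The option list requires either -Q or -a, so reject none or both.
    if (q_file is None) == (a_file is None):
        raise ValueError("specify exactly one of q_file (-Q) or a_file (-a)")
    cmd = [exe, "-p", plink_prefix, "-o", out, "-y", pheno, "-f", freqs]
    cmd += ["-Q", q_file] if q_file is not None else ["-a", a_file]
    if cov is not None:
        cmd += ["-c", cov]
    # -m: 0 additive, 1 recessive; -l: 0 linear, 1 logistic.
    cmd += ["-m", str(model), "-l", str(logistic)]
    if threads is not None:
        cmd += ["-P", str(threads)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the Python standard library. Passing a list rather than a single string avoids shell quoting problems with file names.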
Outputs
A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive:
Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
1 9855422 1237 0.935997 0.537511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
1 10684283 1217 0.999990 0.509715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
1 11247763 1237 0.856692 0.78175 3234.025418 3241.930891 3242.902363 3242.561728 3242.820387 3243.028131 -0.048894 0.108007 0.045277 -0.030582 -0.016838
...
For the recessive model it looks like this:
Chromo Position nInd f1 f2 llh(R0) llh(R1) llh(R2) llh(R3) llh(R4) llh(R5) llh(R6) llh(R7) b1(R1) b2(R1) bm(R1) b1(R2) b2m(R2) b1m(R3) b2(R3) b1(R4) b2(R5) b(R6)
1 9855422 1237 0.935997 0.537511 3236.442376 3241.191367 3242.235364 3241.191468 3243.112239 3241.188747 3242.691370 3243.115326 0.023373 -2.082935 -0.027433 0.016608 -0.582318 0.004700 -2.083112 -0.046849 -2.083275 -0.259338
1 10684283 1217 0.999990 0.509715 nan nan nan nan 3215.162291 3215.133559 3214.502575 3215.569371 nan nan nan nan nan nan nan -0.529999 -0.721649 -0.438317
1 11247763 1237 0.856692 0.78175 3235.030514 3242.807127 3242.809076 3242.836233 3242.818987 3243.028431 3242.907072 3243.028131 0.064419 -0.047597 -0.004021 0.068119 -0.019760 0.042905 -0.078669 0.060373 -0.018537 0.029227
...
P-values can be generated with a likelihood ratio test between the two models of interest.
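For downstream analysis the .res output can be read into named columns. The following is a minimal sketch under the assumption that the file is whitespace-delimited with a single header line and that the first three columns are Chromo, Position and nInd, as in the examples above; `parse_res` is a hypothetical helper, not part of asaMap.

```python
def parse_res(text):
    """Parse whitespace-delimited .res content into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}")
        row = {"Chromo": fields[0], "Position": int(fields[1]), "nInd": int(fields[2])}
        for name, value in zip(header[3:], fields[3:]):
            row[name] = float(value)  # float("nan") covers the nan entries
        rows.append(row)
    return rows


# Header and first two sites of the additive example output shown above.
EXAMPLE = """\
Chromo Position nInd f1 f2 llh(M0) llh(M1) llh(M2) llh(M3) llh(M4) llh(M5) b1(M1) b2(M1) b1(M2) b2(M3) b(M4)
1 9855422 1237 0.935997 0.537511 3242.099033 3242.214834 3243.033924 3242.812740 3243.019888 3243.115326 0.093018 -0.166907 -0.053931 0.047357 0.020093
1 10684283 1217 0.999990 0.509715 nan nan nan 3214.598952 3214.974638 3215.569371 nan nan nan -0.110044 -0.054084
"""

rows = parse_res(EXAMPLE)
```

Entries reported as nan become floating-point NaN values, so they can be filtered with `math.isnan` before any model comparison.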
An Rscript getPvalues.R is provided that makes it easy to obtain P-values from the .res file:
Rscript R/getPvalues.R out.res
This produces a file with the suffix .Pvalues:
Chromo Position nInd f1 f2 M0vM1 M1vM5 M1vM2 M1vM3 M1vM4 M2vM5 M3vM5 M4vM5
1 9855422 1237 0.935997 0.537511 0.630338505521655 0.40636967666779 0.200575362363081 0.274160334109282 0.204476621296224 0.686587953953705 0.436611450245155 0.662188528285713
1 10684283 1217 0.99999 0.509715 NA NA NA NA NA NA 0.163577574260359 0.275437296874114
1 11247763 1237 0.856692 0.78175 6.99963946833027e-05 0.333791076895669 0.163349235419537 0.261334462945287 0.182273151757048 0.615995603296571 0.334134847663281 0.51919707427275
...
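The likelihood ratio test behind these P-values can be sketched in a few lines. This is an illustration, not the code of getPvalues.R. It rests on two assumptions: that the llh columns hold negative log-likelihoods (the example numbers above are consistent with this, but the page does not state it), and that the degrees of freedom equal the difference in the number of free effect parameters between the two models. The function names are hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if math.isnan(stat):
        return float("nan")
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    # Closed forms for df = 1 and df = 2, extended by the recurrence
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5
    else:
        q, a = math.exp(-half), 1.0
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt_pvalue(llh_full, llh_null, df):
    """Likelihood ratio test; llh values are assumed to be negative log-likelihoods."""
    return chi2_sf(2.0 * (llh_null - llh_full), df)


# First site of the additive example output above.
p_m1_v_m5 = lrt_pvalue(3242.214834, 3243.115326, df=2)  # M1 against M5
p_m1_v_m4 = lrt_pvalue(3242.214834, 3243.019888, df=1)  # M1 against M4
```

With these inputs the two values agree with the M1vM5 and M1vM4 entries for the first site in the .Pvalues example (about 0.4064 and 0.2045). Sites where a likelihood is nan give a NaN P-value, corresponding to the NA entries.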
Models
asaMap implements a range of linear models, making it possible to test specific hypotheses. For the additive model there are 6 different models:
Model | Parameters | Notes | Effect Parameters |
---|---|---|---|
M0 | (beta_1, beta_2, delta_1) in R^3 | effect of non-assumed effect allele | 1 |
M1 | (beta_1, beta_2) in R^2 | population specific effects | 2 |
M2 | beta_1=0, beta_2 in R | no effect in population 1 | 1 |
M3 | beta_1 in R, beta_2=0 | no effect in population 2 | 1 |
M4 | beta_1=beta_2 in R | same effect in both populations | 1 |
M5 | beta_1=beta_2=0 | no effect in any population | 0 |
For the recessive model there are 8 different models:
Model | Parameters | Notes | Effect Parameters |
---|---|---|---|
R0 | (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5 | recessive effect of non-assumed effect alleles | 2 |
R1 | (beta_1, beta_m, beta_2) in R^3 | population specific effects | 3 |
R2 | beta_1 in R, beta_m=beta_2 in R | same effect when one or both variant alleles are from pop 2 | 2 |
R3 | beta_1=beta_m in R, beta_2 in R | same effect when one or both variant alleles are from pop 1 | 2 |
R4 | beta_1 in R, beta_m=beta_2=0 | only an effect when both variant alleles are from pop 1 | 1 |
R5 | beta_1=beta_m=0, beta_2 in R | only an effect when both variant alleles are from pop 2 | 1 |
R6 | beta_1=beta_m=beta_2 in R | same effect regardless of ancestry | 1 |
R7 | beta_1=beta_m=beta_2=0 | no effect in any population | 0 |
beta_1 and beta_2 are the effects of the assumed effect allele in population 1 and 2 respectively. beta_m is the recessive effect of carrying two copies of the assumed effect allele, one from population 1 and one from population 2. delta_1 and delta_2 are the effects of the assumed non-effect allele in population 1 and 2 respectively.