AsaMap: Difference between revisions

Latest revision as of 10:08, 23 March 2019

This page contains information about the program asaMap, a tool for ancestry-specific association mapping in large-scale genetic studies. It is based on called genotypes in the binary plink format (.bed). The program is written in C++.

Download

The program can be downloaded from GitHub:

https://github.com/e-jorsboe/asaMap

git clone https://github.com/e-jorsboe/asaMap.git
cd asaMap
make

So far it has only been tested on Linux systems. Use curl if you are on a Mac.

Example

To be added...

Input Files

Input files are called genotypes in the binary plink format (*.bed) [https://www.cog-genomics.org/plink2], together with estimated admixture proportions and population-specific allele frequencies. ADMIXTURE [http://software.genetics.ucla.edu/admixture/] can be used to estimate both; its .Q and .P files, respectively, can be given directly to asaMap.


A phenotype file also has to be provided. This should be a plain text file with one line for each individual in the .fam file, sorted in the same order:

-0.712027291121767
-0.158413122435864
-1.77167888612947
-0.800940619551485
0.3016297021294
...

A covariate file can also be provided, where each column is a covariate and each row is an individual. It should NOT have a column of 1s for the intercept (the intercept is included automatically). This file must have the same number of rows as the phenotype file and the .fam file.

0.0127096117618385 -0.0181281029917176 -0.0616739439849275 -0.0304606694443973
0.0109944672768584 -0.0205785925514037 -0.0547523583405743 -0.0208813157640705
0.0128395346453956 -0.0142116856067135 -0.0471689997039534 -0.0266186436009881
0.00816783754598649 -0.0189271733933446 -0.0302259313905976 -0.0222247658768436
0.00695928218989132 -0.0089960963981644 -0.0384886176827146 -0.012649019770168
...
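
The row-count and intercept requirements above can be checked before running asaMap. The following is a minimal sketch, not part of asaMap itself: the function names read_rows and check_inputs are hypothetical, and it assumes all three files are whitespace-delimited plain text with no header line.

```python
from pathlib import Path


def read_rows(path):
    """Return the non-empty lines of a whitespace-delimited text file as lists of fields."""
    return [line.split() for line in Path(path).read_text().splitlines() if line.strip()]


def check_inputs(fam_path, pheno_path, cov_path=None):
    """Return a list of problems found; an empty list means the files look consistent."""
    problems = []
    n_fam = len(read_rows(fam_path))
    pheno = read_rows(pheno_path)
    if len(pheno) != n_fam:
        problems.append(f"phenotype file has {len(pheno)} rows, .fam has {n_fam}")
    if cov_path is not None:
        cov = read_rows(cov_path)
        if len(cov) != n_fam:
            problems.append(f"covariate file has {len(cov)} rows, .fam has {n_fam}")
        widths = {len(row) for row in cov}
        if len(widths) > 1:
            problems.append("covariate rows have differing numbers of columns")
        elif cov:
            # Flag any column that is constant 1, since asaMap adds the intercept itself.
            for j in range(widths.pop()):
                if all(float(row[j]) == 1.0 for row in cov):
                    problems.append(f"covariate column {j + 1} is all 1s (intercept)")
    return problems
```

Calling check_inputs("plinkFile.fam", "pheno.files", "cov.txt") (file names chosen for illustration) returns an empty list when the files are consistent.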

Running asaMap

Example commands for first running ADMIXTURE and then running asaMap with covariates included:

#run admixture
admixture plinkFile.bed 2

#run asaMap with admix proportions
./asaMap -p plinkFile  -o out -c $COV -y pheno.files -Q plinkFile.2.Q -f plinkFile.2.P

This produces an out.log log file and an out.res file with results for each site (after filtering).
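
The two commands can also be driven from a script. The sketch below uses only the flags shown on this page; the function names are hypothetical, and it assumes that ADMIXTURE writes its .Q and .P files into the current working directory, named after the base name of the .bed file, as in the example above.

```python
import os
import subprocess


def build_asamap_command(plink_prefix, out_prefix, pheno, q_file, p_file,
                         cov=None, binary="./asaMap"):
    """Assemble the asaMap argument list from the flags documented on this page."""
    cmd = [binary, "-p", plink_prefix, "-o", out_prefix,
           "-y", pheno, "-Q", q_file, "-f", p_file]
    if cov is not None:
        cmd += ["-c", cov]  # covariates are optional
    return cmd


def run_admixture_then_asamap(plink_prefix, out_prefix, pheno, cov=None, k=2):
    """Run ADMIXTURE with k source populations, then asaMap on its .Q and .P output."""
    subprocess.run(["admixture", plink_prefix + ".bed", str(k)], check=True)
    # Assumption: ADMIXTURE writes <basename>.<k>.Q and .P into the working directory.
    base = os.path.basename(plink_prefix)
    cmd = build_asamap_command(plink_prefix, out_prefix, pheno,
                               f"{base}.{k}.Q", f"{base}.{k}.P", cov=cov)
    subprocess.run(cmd, check=True)
    return cmd
```

Passing the arguments as a list avoids shell quoting problems with file names, and check=True stops the script if either program exits with an error.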


The full list of options can be shown by running asaMap without any arguments:

./asaMap


Must be specified:

-p <filename>

Prefix of the binary plink files, i.e. the filename without the .bed/.fam/.bim suffixes.

-o <filename>

Output filename prefix. A .res file with the results and a .log log file will be written.

-y <filename>

Phenotype file. It has to be a plain text file with as many rows as the .fam file.

-Q <filename> (either -a or -Q)

Admixture proportions, .Q file from ADMIXTURE. Either specify this or -a.

-a <filename> (either -a or -Q)

Admixture proportions for source population 1, i.e. the first column of the .Q file from ADMIXTURE. Either specify this or -Q.

-f <filename>

Allele frequencies, .P file from ADMIXTURE.


Optional:

-c <filename>

Covariates, a plain text file with one column for each covariate and the same number of rows as the .fam file. It should NOT have a column of 1s for the intercept; the intercept is added automatically.

-m <INT>

Model, whether an additive genotype model, or a recessive genotype model should be used (0: additive, 1: recessive - default: 0).

-l <INT>

Regression, whether linear or logistic regression should be used. Logistic regression is for binary phenotype data, linear regression is for quantitative phenotype data (0: linear regression, 1: logistic regression - default: 0).

-b <filename>

Text file containing a starting guess of the estimated coefficients.

-i <INT>

The maximum number of iterations to run for the EM algorithm (default: 80).

-t <FLOAT>

Tolerance for change in likelihood between EM iterations for finishing analysis (default: 0.0001).

-r <INT>

Seed for the generation of starting values of the coefficients.

-P <INT>

Number of threads to be used for the analysis. Each thread will write to a temporary file in the path specified by "-o".

-e <INT>

Estimate standard error of coefficients (0: no, 1: yes - default: 0).

-w <INT>

Run the M0/R0 model, which models the effect of the other allele. Analyses are faster without running M0/R0 (0: no, 1: yes - default: 1).
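
Since -a takes the first column of the ADMIXTURE .Q file, such a file can be derived from an existing .Q file. This is a minimal sketch with a hypothetical function name; it assumes that the -a file is plain text with one proportion per line, in the same order as the .fam file, which this page does not state explicitly.

```python
def write_first_q_column(q_path, out_path):
    """Copy the first whitespace-delimited column of a .Q file to out_path.

    Returns the number of individuals written.
    """
    n = 0
    with open(q_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            dst.write(fields[0] + "\n")
            n += 1
    return n
```

The returned count can be compared with the number of rows in the .fam file as a quick consistency check.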

Outputs

A .res file with the likelihoods of each model and the estimated coefficients in each model is produced, here for the additive model:


Chromo  Position  nInd  f1        f2        llh(M0)      llh(M1)      llh(M2)      llh(M3)      llh(M4)      llh(M5)      b1(M1)     b2(M1)     b1(M2)     b2(M3)     b(M4)
1       9855422    1237  0.935997  0.537511  3242.099033  3242.214834  3243.033924  3242.812740  3243.019888  3243.115326  0.093018   -0.166907  -0.053931  0.047357   0.020093
1       10684283   1217  0.999990  0.509715  nan          nan          nan          3214.598952  3214.974638  3215.569371  nan        nan        nan        -0.110044  -0.054084
1       11247763   1237  0.856692  0.78175   3234.025418  3241.930891  3242.902363  3242.561728  3242.820387  3243.028131  -0.048894  0.108007   0.045277   -0.030582  -0.016838
...


For the recessive model it looks like this:


Chromo  Position  nInd  f1        f2        llh(R0)      llh(R1)      llh(R2)      llh(R3)      llh(R4)      llh(R5)      llh(R6)      llh(R7)      b1(R1)     b2(R1)     bm(R1)     b1(R2)     b2m(R2)    b1m(R3)    b2(R3)     b1(R4)     b2(R5)     b(R6)
1       9855422    1237  0.935997  0.537511  3236.442376  3241.191367  3242.235364  3241.191468  3243.112239  3241.188747  3242.691370  3243.115326  0.023373   -2.082935  -0.027433  0.016608   -0.582318  0.004700   -2.083112  -0.046849  -2.083275  -0.259338
1       10684283   1217  0.999990  0.509715  nan          nan          nan          nan          3215.162291  3215.133559  3214.502575  3215.569371  nan        nan        nan        nan        nan        nan        nan        -0.529999  -0.721649  -0.438317
1       11247763   1237  0.856692  0.78175   3235.030514  3242.807127  3242.809076  3242.836233  3242.818987  3243.028431  3242.907072  3243.028131  0.064419   -0.047597  -0.004021  0.068119   -0.019760  0.042905   -0.078669  0.060373   -0.018537  0.029227
...
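
A .res file can be read with a few lines of code. The sketch below is an illustration only (the names parse_res and fitted_models are hypothetical): it assumes a whitespace-delimited file whose first line is the header shown above, and it keeps "nan" entries as floating-point NaN so that sites where a model was not fitted remain visible.

```python
import math


def parse_res(lines):
    """Parse the lines of a .res file into a list of dicts keyed by column name."""
    lines = [ln for ln in lines if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip truncated lines such as "..."
        row = {"Chromo": fields[0], "Position": int(fields[1]), "nInd": int(fields[2])}
        for name, value in zip(header[3:], fields[3:]):
            row[name] = float(value)  # float("nan") parses the nan entries
        rows.append(row)
    return rows


def fitted_models(row):
    """Return the llh(...) column names that hold a number rather than nan."""
    return [k for k, v in row.items() if k.startswith("llh(") and not math.isnan(v)]
```

For a real file, open it and pass the file object, for example parse_res(open("out.res")).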


P-values can be generated by doing a likelihood ratio test between the two desired models. An R script, getPvalues.R, is provided that makes it easy to obtain P-values from the .res file:


Rscript R/getPvalues.R out.res

This produces a file with the suffix .Pvalues:


Chromo  Position  nInd  f1        f2        M0vM1                 M1vM5              M1vM2              M1vM3              M1vM4              M2vM5              M3vM5              M4vM5
1       9855422    1237  0.935997  0.537511  0.630338505521655     0.40636967666779   0.200575362363081  0.274160334109282  0.204476621296224  0.686587953953705  0.436611450245155  0.662188528285713
1       10684283   1217  0.99999   0.509715  NA                    NA                 NA                 NA                 NA                 NA                 0.163577574260359  0.275437296874114
1       11247763   1237  0.856692  0.78175   6.99963946833027e-05  0.333791076895669  0.163349235419537  0.261334462945287  0.182273151757048  0.615995603296571  0.334134847663281  0.51919707427275
...
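
The likelihood ratio test can also be carried out by hand. The following sketch is not the getPvalues.R script, whose source is not shown here. It assumes that the llh columns hold negative log-likelihoods (in the example rows the most restricted model has the largest value), so the test statistic is twice the value for the smaller model minus the value for the larger model, compared with a chi-square distribution whose degrees of freedom equal the difference in free parameters. The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if math.isnan(x):
        return float("nan")  # a model was not fitted at this site
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return math.erfc(math.sqrt(half)) + total


def lrt_pvalue(llh_small, llh_large, df):
    """P-value for a smaller (restricted) model against a larger model.

    Both arguments are assumed to be negative log-likelihoods.
    """
    return chi2_sf(2.0 * (llh_small - llh_large), df)


# First example site, M1 (2 free effect parameters) against M5 (none).
p_m1_vs_m5 = lrt_pvalue(3243.115326, 3242.214834, df=2)
```

Under these assumptions, p_m1_vs_m5 agrees with the M1vM5 entry of about 0.406 for the first site in the .Pvalues example.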

Models

asaMap implements a range of linear models, making it possible to test specific hypotheses. For the additive model there are 6 different models:

Model  Parameters                        Notes                                Effect Parameters
M0     (beta_1, beta_2, delta_1) in R^3  effect of non-assumed effect allele  1
M1     (beta_1, beta_2) in R^2           population specific effects          2
M2     beta_1=0, beta_2 in R             no effect in population 1            1
M3     beta_1 in R, beta_2=0             no effect in population 2            1
M4     beta_1=beta_2 in R                same effect in both populations      1
M5     beta_1=beta_2=0                   no effect in any population          0

For the recessive model there are 8 different models:

Model  Parameters                                         Notes                                                        Effect Parameters
R0     (beta_1, beta_m, beta_2, delta_1, delta_2) in R^5  recessive effect of non-assumed effect alleles               2
R1     (beta_1, beta_m, beta_2) in R^3                    population specific effects                                  3
R2     beta_1 in R, beta_m=beta_2 in R                    same effect when one or both variant alleles are from pop 2  2
R3     beta_1=beta_m in R, beta_2 in R                    same effect when one or both variant alleles are from pop 1  2
R4     beta_1 in R, beta_m=beta_2=0                       only an effect when both variant alleles are from pop 1      1
R5     beta_1=beta_m=0, beta_2 in R                       only an effect when both variant alleles are from pop 2      1
R6     beta_1=beta_m=beta_2 in R                          same effect regardless of ancestry                           1
R7     beta_1=beta_m=beta_2=0                             no effect in any population                                  0

beta_1 and beta_2 are the effects of the assumed effect allele in population 1 and 2, respectively. beta_m is the recessive effect of carrying two copies of the assumed effect allele, one from population 1 and one from population 2. delta_1 and delta_2 are the effects of the assumed non-effect allele in population 1 and 2, respectively.
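
When comparing two nested models with a likelihood ratio test, the degrees of freedom are the difference in the number of free parameters. The sketch below counts free parameters from the Parameters column of the two tables above; the dictionary and function names are hypothetical, and the function does not check that the two models are actually nested, which remains the user's responsibility.

```python
# Free parameters per model, counted from the Parameters column.
ADDITIVE_FREE_PARAMS = {"M0": 3, "M1": 2, "M2": 1, "M3": 1, "M4": 1, "M5": 0}
RECESSIVE_FREE_PARAMS = {"R0": 5, "R1": 3, "R2": 2, "R3": 2,
                         "R4": 1, "R5": 1, "R6": 1, "R7": 0}


def lrt_degrees_of_freedom(larger, smaller, free_params):
    """Degrees of freedom for testing the smaller model against the larger one."""
    df = free_params[larger] - free_params[smaller]
    if df <= 0:
        raise ValueError(f"{smaller} does not have fewer free parameters than {larger}")
    return df
```

For example, M1 against M5 gives 2 degrees of freedom, while M2 against M3 raises an error because neither model has fewer free parameters than the other.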

Citation