Revision as of 16:39, 18 June 2012

Association

Association can be performed using two approaches. One is based on testing differences in allele frequencies between cases and controls, while the other is based on a generalized linear framework that allows additional covariates to be included. Both methods take the uncertainty of the genotypes into account.

Case control association using allele frequencies

To test for differences in allele frequency, genotype likelihoods need to be provided or estimated. If alignment files are your input, then -doLike must be invoked.

-doAsso [int]

1: The test is performed assuming the minor allele is known
3: The test is performed summing over all possible minor alleles
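As a rough illustration of what a frequency-based test on genotype likelihoods can look like, the sketch below estimates the minor allele frequency by EM separately in cases, in controls and in the pooled sample, and then forms a likelihood ratio statistic. It assumes the minor allele is known, Hardy-Weinberg proportions, and genotype likelihoods given as triples for 0, 1 and 2 copies of the minor allele. All function names and the toy data are hypothetical, and this is not necessarily how the program implements the test.

```python
import math

def hwe_priors(f):
    """Genotype probabilities for 0, 1, 2 minor alleles under Hardy-Weinberg."""
    return ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)

def log_likelihood(gls, f):
    """Log-likelihood of frequency f given per-individual genotype likelihoods."""
    priors = hwe_priors(f)
    return sum(math.log(sum(p * l for p, l in zip(priors, gl))) for gl in gls)

def em_frequency(gls, start=0.2, iterations=200, tol=1e-10):
    """EM estimate of the minor allele frequency from genotype likelihoods."""
    f = start
    for _ in range(iterations):
        priors = hwe_priors(f)
        expected = 0.0
        for gl in gls:
            weights = [p * l for p, l in zip(priors, gl)]
            # expected number of minor alleles carried by this individual
            expected += (weights[1] + 2 * weights[2]) / sum(weights)
        new_f = expected / (2 * len(gls))
        # keep the estimate strictly inside (0, 1) so the logs stay finite
        new_f = min(max(new_f, 1e-9), 1 - 1e-9)
        converged = abs(new_f - f) < tol
        f = new_f
        if converged:
            break
    return f

def allele_frequency_lrt(case_gls, control_gls):
    """2 * (logL with separate frequencies - logL with one shared frequency)."""
    pooled = list(case_gls) + list(control_gls)
    alt = (log_likelihood(case_gls, em_frequency(case_gls))
           + log_likelihood(control_gls, em_frequency(control_gls)))
    null = log_likelihood(pooled, em_frequency(pooled))
    return 2 * (alt - null)

# Toy data (made up): most cases look heterozygous, controls look homozygous major.
cases = [(0.05, 0.9, 0.05)] * 6 + [(0.9, 0.08, 0.02)] * 4
controls = [(0.9, 0.08, 0.02)] * 10
print(allele_frequency_lrt(cases, controls))
```

Because the genotype likelihoods enter the likelihood directly, no genotypes are ever called, which is how the uncertainty of the genotypes is carried into the test.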

-yBin [file]

A file containing the case-control status: 0 for controls, 1 for cases and -999 for missing phenotypes. The file should contain a single phenotype entry per line. Example of a case-control phenotype file
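A phenotype file in this layout can be checked before running the analysis. The following is a minimal sketch; the helper name `read_ybin` and the example values are made up for illustration, and the only rules enforced are the ones stated above (one entry per line, each 0, 1 or -999).

```python
from collections import Counter

MISSING = -999  # missing-phenotype code described above

def read_ybin(lines):
    """Parse case-control statuses, one entry per line."""
    statuses = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        try:
            value = int(text)
        except ValueError:
            raise ValueError(f"line {lineno}: not an integer: {text!r}")
        if value not in (0, 1, MISSING):
            raise ValueError(f"line {lineno}: expected 0, 1 or -999, got {value}")
        statuses.append(value)
    return statuses

# Made-up example: five individuals, one with a missing phenotype.
example = ["0", "1", "1", "-999", "0"]
statuses = read_ybin(example)
counts = Counter(statuses)
print("controls:", counts[0], "cases:", counts[1], "missing:", counts[MISSING])
```

For a file on disk, `read_ybin(open(path))` would work the same way, since iterating over a file yields one line at a time.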

cite kim et al.

Score statistic

To perform the test in a generalized linear framework, posterior genotype probabilities must be provided or estimated. If alignment files are your input, then -doLike, -doMaf and -doPost must be invoked. If the input files are genotype likelihoods, then -doMaf and -doPost must be invoked. Beagle output files can be used directly.

-doAsso [int]

2: The test is based on a score statistic from a generalized linear framework

-yBin [file]

A file containing the case-control status: 0 for controls, 1 for cases and -999 for missing phenotypes. The file should contain a single phenotype entry per line. Example of a case-control phenotype file

-yQuant [file]

A file containing the phenotype values, with -999 for missing phenotypes. The file should contain a single phenotype entry per line. Example of a quantitative phenotype file

-cov [file]

A file containing additional covariates for the analysis. Each line should contain the additional covariates for a single individual. Thus the number of lines should match the number of individuals, and the number of columns should match the number of additional covariates. Example of a covariate file
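The shape requirement on the covariate file (one line per individual, one column per covariate) is easy to verify up front. This is a small sketch with a hypothetical helper, `read_covariates`; it assumes whitespace-separated numeric columns, which the text above does not specify.

```python
def read_covariates(lines, n_individuals):
    """Parse a covariate table and check its shape.

    Assumes whitespace-separated numeric columns (an assumption made
    for this sketch, not something stated in the option description).
    """
    rows = [[float(field) for field in line.split()] for line in lines]
    if len(rows) != n_individuals:
        raise ValueError(
            f"expected {n_individuals} lines (one per individual), got {len(rows)}")
    widths = {len(row) for row in rows}
    if len(widths) != 1:
        raise ValueError(f"lines have differing numbers of columns: {sorted(widths)}")
    return rows

# Made-up example: three individuals, two covariates each.
example = ["1 34.5", "0 51.0", "1 47.2"]
covariates = read_covariates(example, n_individuals=3)
print(len(covariates), "individuals,", len(covariates[0]), "covariates")
```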

-minHigh [int]

default = 10
This approach needs a certain amount of variability in the genotype probabilities. -minHigh filters out sites that do not have at least [int] heterozygous and homozygous genotypes with a probability of at least 0.9. This filter avoids the scenario where all individuals are heterozygous with a high probability.

-minCount [int]

default = 10
The minimum expected number of minor alleles in the sample. This is the frequency multiplied by two times the number of individuals. Performing association on extremely low minor allele frequencies does not make sense.
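The arithmetic behind -minCount can be written out directly. In the sketch below the function names are hypothetical, and treating a site as passing when the expected count is greater than or equal to the threshold is an assumption; the text only says the threshold is a minimum.

```python
def expected_minor_alleles(frequency, n_individuals):
    """Expected minor allele count: frequency * 2 * number of individuals."""
    return frequency * 2 * n_individuals

def passes_min_count(frequency, n_individuals, min_count=10):
    """True if the site reaches the threshold (>= is assumed here)."""
    return expected_minor_alleles(frequency, n_individuals) >= min_count

# A frequency of 0.001 in 3200 diploid individuals gives 6.4 expected
# minor alleles, which is below the default threshold of 10.
print(expected_minor_alleles(0.001, 3200), passes_min_count(0.001, 3200))
```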

cite skotte et al.

output

Association

Score statistic (prefix lrt*)

Chromosome	Position	Frequency	N	LRT

Chromosome

The chromosome

Position

The physical position

Frequency

The frequency estimate. The estimation method is determined by the -doMaf option.

N

The number of individuals with non-missing data, that is, the individuals who have both sequencing data for the given site and phenotype data.

LRT

The likelihood ratio statistic. Under the null hypothesis of no association, this statistic is chi-square distributed with one degree of freedom. Sites that fail one of the filters are given the value -999.000000.

example:

 1	711153	0.012228		3200	-999.000000
 1	713682	0.047357		3200	0.133145
 1	713754	0.047357		3200	1.018738
 1	742429	0.096592		3200	0.174977
 1	743404	0.043796		3200	1.003485
 1	744055	0.097272		3200	2.334205
 1	751595	0.055826		3200	0.300824
 1	758311	0.054249		3200	1.242375
 1	765522	0.097715		3200	2.667515
 1	766409	0.345465		3200	0.162817
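Since the statistic is chi-square distributed with one degree of freedom, a p-value can be obtained from each output line. The sketch below uses the identity that the upper tail of a one-degree-of-freedom chi-square at x equals erfc(sqrt(x / 2)); the function names are hypothetical, and any negative statistic (such as the -999.000000 marker) is treated as a filtered site.

```python
import math

def chi2_1df_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

def parse_line(line):
    """Split one output line into its five columns and add a p-value.

    Splitting on any whitespace copes with the doubled tab in the example.
    Sites that failed a filter get None instead of a p-value.
    """
    chrom, pos, freq, n, lrt = line.split()
    lrt = float(lrt)
    pvalue = None if lrt < 0 else chi2_1df_pvalue(lrt)
    return chrom, int(pos), float(freq), int(n), lrt, pvalue

# Two lines taken from the example output above.
for line in ["1\t711153\t0.012228\t\t3200\t-999.000000",
             "1\t765522\t0.097715\t\t3200\t2.667515"]:
    print(parse_line(line))
```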

