runHmmld.ped {Relate} | R Documentation |
This function estimates the probability of sharing alleles identical by descent (IBD) across the genome. The method is based on a continuous-time Markov model with hidden states. The hidden states are the IBD states between two diploid chromosome pairs. We assume that the individuals are not inbred, so they can share 0, 1 or 2 alleles IBD. The SNPs are allowed to be in linkage disequilibrium (LD). To accommodate LD, the method needs SNP data for several individuals in order to estimate the allele frequencies and the pairwise LD. The method returns the posterior probabilities of the IBD states across the genome and the overall IBD sharing. This function uses a snpMatrix object consisting of chromosome, position and genotypes.
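To make the idea of posterior probabilities of hidden IBD states concrete, here is a minimal forward-backward sketch for a three-state hidden Markov model. It is not the Relate implementation: the function name forward_backward, the transition matrix and the emission values are made-up assumptions, and a single fixed transition matrix stands in for the distance-dependent continuous-time transitions described above.

```r
# Toy forward-backward for hidden states "0, 1 or 2 alleles IBD".
# All numbers below are invented for illustration only.
forward_backward <- function(init, trans, emis) {
  n <- nrow(emis)                 # number of SNPs
  k <- length(init)               # number of hidden states
  alpha <- matrix(0, n, k)
  beta  <- matrix(1, n, k)

  # Forward pass, rescaled at every SNP to avoid underflow
  a <- init * emis[1, ]
  alpha[1, ] <- a / sum(a)
  for (i in seq_len(n)[-1]) {
    a <- as.vector(alpha[i - 1, ] %*% trans) * emis[i, ]
    alpha[i, ] <- a / sum(a)
  }

  # Backward pass, also rescaled
  for (i in rev(seq_len(n - 1))) {
    b <- as.vector(trans %*% (emis[i + 1, ] * beta[i + 1, ]))
    beta[i, ] <- b / sum(b)
  }

  # Posterior per SNP: normalise alpha * beta row by row
  post <- alpha * beta
  post / rowSums(post)
}

init  <- c(0.25, 0.5, 0.25)       # assumed prior over IBD = 0, 1, 2
trans <- matrix(c(0.98, 0.02, 0.00,
                  0.01, 0.98, 0.01,
                  0.00, 0.02, 0.98), nrow = 3, byrow = TRUE)
emis  <- matrix(c(0.6, 0.3, 0.1,  # one row per SNP, one column per state
                  0.5, 0.4, 0.1,
                  0.1, 0.3, 0.6,
                  0.1, 0.2, 0.7), ncol = 3, byrow = TRUE)

post    <- forward_backward(init, trans, emis)
overall <- colMeans(post)         # rough summary: average posterior per state
```

Each row of post sums to one and plays the role of the per-SNP posterior; the column means are only a crude summary and are not how the package estimates the overall sharing.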
runHmmld.ped(pedfile_object,pair=c(1,2),par=NULL,min.maf=0,LD="rsq2",epsilon=0.01,back=5,alim=c(0.001,0.15),start=NULL,prune=NULL,ld_adj=TRUE,fix.a=NULL,fix.k2=NULL,calc.a=FALSE,phi=0.013,timesToRun=10,timesToConverge=5,giveCrap=FALSE,convTol=0.1)
pedfile_object |
Return value of the 'snpMatrix' function 'read.snps.pedfile'. This is a list of 3 elements: 'snp.matrix', 'subject.support' and 'snp.support'. The matrix should hold SNP genotypes where NA or 0 denotes missing data, 1 denotes AA, 2 denotes Aa and 3 denotes aa. The number of rows is the number of individuals and the number of columns is the number of SNPs. The object also provides the position of each SNP in centimorgans (or megabases) and a vector of chromosome numbers. If centimorgans are used then phi should be set to 1. |
pair |
Integer vector of length two with the row numbers of the two individuals whose relatedness is to be estimated |
par |
Optional numeric vector c(a,k2,k1,k0) of parameters used instead of optimization |
min.maf |
The minimum minor allele frequency allowed |
LD |
The measure used to select the previous SNP to condition on, ("D'","D" or "rsq2") |
epsilon |
The error rate |
back |
The number of previous SNPs that can be conditioned on (see details for recommendations) |
alim |
The allowed range for a |
start |
Optional starting point for the optimization |
prune |
The maximum value allowed for pairwise LD. If 0 then no pruning is performed |
ld_adj |
Logical. Use the pairwise emission probabilities to correct for LD |
fix.a |
Numeric. Fix the a parameter to this value |
fix.k2 |
Numeric. Fix the k2 parameter to this value |
calc.a |
Logical. Estimate the a parameter from the overall IBD sharing. Appropriate for distantly related individuals or individuals who are related through one or two paths of the same length |
phi |
Numeric. The recombination rate in Morgans per megabase (M/Mb) |
timesToRun |
Integer. The maximum number of times the optimization is run |
timesToConverge |
Integer. The number of times the optimization should reach the same optimum |
giveCrap |
Integer. If nonzero then the emission probabilities (given the unobserved state and allele phase) and the haplotype probabilities are returned. More runtime information is also given. |
convTol |
Numeric. The tolerance for deciding that two optimization runs have reached the same likelihood.
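The interplay of timesToRun, timesToConverge and convTol can be sketched with a generic multi-start optimizer. This is an illustration of the idea as described in the argument list, not the code used inside runHmmld.ped; the function multi_start and the toy objective are hypothetical, and the real stopping logic may differ.

```r
# Restart a bounded optimisation from random starting points until the same
# optimum (within convTol) has been found timesToConverge times, or until
# timesToRun runs have been used. Hypothetical helper, base R only.
multi_start <- function(fn, lower, upper,
                        timesToRun = 10, timesToConverge = 5, convTol = 0.1) {
  best <- Inf          # best (smallest) objective value so far
  hits <- 0            # runs that agree with the best value within convTol
  runs <- 0
  values <- numeric(0)
  while (runs < timesToRun && hits < timesToConverge) {
    runs  <- runs + 1
    start <- runif(length(lower), lower, upper)
    fit   <- optim(start, fn, method = "L-BFGS-B", lower = lower, upper = upper)
    values <- c(values, fit$value)
    if (fit$value < best - convTol) {
      best <- fit$value              # clearly better optimum: restart the count
      hits <- 1
    } else if (abs(fit$value - best) <= convTol) {
      best <- min(best, fit$value)   # same optimum reached again
      hits <- hits + 1
    }
  }
  list(best = best, timesRun = runs, timesConverged = hits, values = values)
}

# Toy objective standing in for a negative log-likelihood
toy_nll <- function(p) sum((p - 0.3)^2)

set.seed(1)
res <- multi_start(toy_nll, lower = c(0, 0), upper = c(1, 1))
```

With a well-behaved objective every run agrees, so the loop stops after timesToConverge runs; a multimodal likelihood would typically need more of the timesToRun budget.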
How to select the number of previous SNPs that can be conditioned on: first of all, if there is no LD in the data, choose back=0. If you want to prune SNPs away based on LD and/or you want to accommodate LD in the model, choose <back> as the number of SNPs over which you expect there to be LD. For example, if you have 500,000 SNPs you expect there to be LD between many SNPs in a region; here I use back=50. If you only have 10,000 SNPs I would use back=5.
If there is LD in the data but you want to remove the LD before the analysis, then set ld_adj to FALSE, prune to some numeric value larger than zero (e.g. 0.2) and back to some number.
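As a rough illustration of what conditioning on one of the <back> previous SNPs means, the sketch below picks, for each SNP, the preceding SNP with the highest LD inside a window of size back. It is a simplified stand-in and not the package's algorithm: choose_previous is a hypothetical helper, the data are simulated, and the squared genotype correlation is used as a proxy for the haplotype-based r-squared.

```r
# geno: individuals x SNPs matrix coded 1 = AA, 2 = Aa, 3 = aa, NA = missing.
# Returns, per SNP, the index of the best previous SNP and its LD value.
choose_previous <- function(geno, back = 5) {
  stopifnot(back >= 1, ncol(geno) >= 2)
  n_snp  <- ncol(geno)
  choose <- rep(NA_integer_, n_snp)   # first SNP has nothing to condition on
  mea    <- rep(NA_real_, n_snp)
  for (j in seq_len(n_snp)[-1]) {
    cand <- seq(max(1, j - back), j - 1)          # window of previous SNPs
    r2 <- vapply(cand, function(i) {
      cor(geno[, i], geno[, j], use = "pairwise.complete.obs")^2
    }, numeric(1))
    choose[j] <- cand[which.max(r2)]
    mea[j]    <- max(r2)
  }
  list(choose = choose, mea = mea)
}

# Simulated genotypes: SNP 3 is a copy of SNP 2, so they are in complete LD
set.seed(42)
g1 <- sample(1:3, 50, replace = TRUE)
g2 <- sample(1:3, 50, replace = TRUE)
g3 <- g2
g4 <- sample(1:3, 50, replace = TRUE)
geno <- cbind(g1, g2, g3, g4)

ld <- choose_previous(geno, back = 2)
```

A pruning step in the same spirit would drop a SNP whenever its best LD value exceeds the chosen threshold (for example 0.2); that step is left out here to keep the sketch short.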
kResult |
The estimated probabilities of sharing 2, 1 or 0 alleles IBD |
kLike |
The maximal -log likelihood |
kr |
The co-ancestry coefficient |
a |
The a parameter |
uLike |
The likelihood for being unrelated |
LD |
The LD measure |
t |
The distance between the used SNPs (some SNPs might have been discarded) |
snp |
The number of used SNPs (some SNPs might have been discarded) |
position |
The position of the used SNPs (some SNPs might have been discarded) |
double_recom |
If false then no instantaneous double recombination is allowed |
alim |
The allowed range of a |
poLike |
The likelihood for being parent-offspring |
back |
The number of previous SNPs that have been conditioned on (and/or used for pruning) |
post |
A matrix with posterior probabilities for the hidden IBD states |
timesRun |
The number of times the optimization algorithm has been run |
timesConverged |
The number of times the optimization algorithm has found the same maximum |
convergenceInfo |
Matrix with estimates of the overall IBD sharing and the likelihood for each of the optimizations |
usedSnps |
A vector indicating whether a SNP has been used in the analysis (1's) or discarded (0's) |
S |
The joint emission probabilities for the current SNP and the previous SNP |
choose |
Vector indicating which previous SNP has been used to condition on |
mea |
The LD measure for the <back> previous SNPs |
S1 |
The emission probabilities for each single SNP |
hap |
The haplotype probabilities for each of the previous SNPs |
maf |
The minor allele frequency for the SNPs |
path |
The Viterbi path
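The link between the IBD sharing probabilities and a co-ancestry coefficient can be shown with a short calculation. This assumes the standard definition, co-ancestry = k2/2 + k1/4, and the (k2, k1, k0) ordering given for kResult above; whether kr is computed exactly this way inside the package is an assumption, and the helper coancestry is hypothetical.

```r
# Expected IBD sharing for full siblings, in the order (k2, k1, k0)
k <- c(k2 = 0.25, k1 = 0.5, k0 = 0.25)
stopifnot(abs(sum(k) - 1) < 1e-12)    # the three probabilities must sum to 1

# Standard co-ancestry (kinship) coefficient from IBD sharing probabilities
coancestry <- function(k2, k1) k2 / 2 + k1 / 4

kr_sibs <- coancestry(k[["k2"]], k[["k1"]])   # full siblings
kr_po   <- coancestry(0, 1)                   # parent-offspring: k1 = 1
```

Full siblings and parent-offspring pairs both give 0.25, which is why the full vector of sharing probabilities, rather than kr alone, is needed to tell these relationships apart.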
Anders Albrechtsen
http://staff.pubhealth.ku.dk/~ande/web/software/relate.html
snpMatrix and read.snps.pedfile
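## SNP MATRIX MUST BE INSTALLED
# library("snpMatrix")
# locate the example pedigree file shipped with the Relate package
# path <- file.path(find.package("Relate"), "data", "500.ped")
# ped <- read.snps.pedfile(path)
# estimate relatedness for the default pair (individuals 1 and 2)
# pedRes <- runHmmld.ped(ped)
# plot(pedRes)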