|  | =Brief description= |  | = NEW VERSION =   | 
|  | This page contains information about the program NgsRelate, which can be used to infer relatedness coefficients for pairs of individuals from low-coverage NGS data by using genotype likelihoods. To infer relatedness you need to know the population allele frequencies and have genotype likelihoods. Both can be obtained, e.g., using the program ANGSD, as shown in the example.
 |  | For the NEW version of NgsRelate, which co-estimates relatedness and inbreeding, go to https://github.com/ANGSD/NgsRelate | 
|  | 
 |  | 
 | 
|  | =Installation=
 |  | 
|  | The primary repository is on GitHub.
 |  | 
|  | == Download and installation of the C program ==
 |  | 
|  | <pre>
 |  | 
|  | curl https://raw.githubusercontent.com/ANGSD/fastlate/master/fastlate.cpp >fastlate.cpp
 |  | 
|  | g++ fastlate.cpp -O3 -lz -o fastlate
 |  | 
|  | </pre>
 |  | 
|  | 
 |  | 
 | 
|  | = Run example using C = |  | = OLD VERSION =   | 
|  | Assume we have a file containing paths to 100 BAM/CRAM files. We can then use ANGSD to estimate allele frequencies and calculate genotype likelihoods while doing SNP calling, and to dump the input files needed by NgsRelate:
 |  | For the old version please use this link: http://www.popgen.dk/software/index.php?title=NgsRelate&oldid=694 | 
|  | <pre>
 |  | 
|  | ./angsd -b filelist -gl 1 -domajorminor 1 -snp_pval 1e-6 -domaf 1 -minmaf 0.05 -doGlf 3
 |  | 
|  | #this generates angsdput.mafs.gz and angsdput.glf.gz
 |  | 
|  | #we will need to extract the frequency column from the mafs file and remove the header
 |  | 
|  | zcat angsdput.mafs.gz | cut -f5 | sed 1d >freq
 |  | 
|  | ./ngsrelate -g angsdput.glf.gz -n 100 -f freq -a 0 -b 1 >gl.res
 |  | 
|  | </pre>
 |  | 
|  | Here we specify that our binary genotype likelihood file contains 100 samples (-n 100), and that we want to run the analysis for the first two samples (-a 0 -b 1; sample indices start at 0).
 |  | 
|  | If -a and -b are not specified, the program loops through all pairs.
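To get a feel for the size of an all-pairs run, the number of pairs can be worked out before starting. The sketch below assumes that "all pairs" means unordered pairs of distinct samples (a < b), which this page does not state explicitly. The commented-out command is simply the example above with -a and -b left out.

```shell
n=100                          # number of samples, as in the example above
pairs=$(( n * (n - 1) / 2 ))   # unordered pairs (a,b) with a < b (assumption)
echo "$pairs"

# All-pairs run: the example command without -a and -b
# ./ngsrelate -g angsdput.glf.gz -n 100 -f freq >gl.res
```

With 100 samples the output file can therefore be expected to contain a few thousand result lines rather than one.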
 |  | 
|  |   |  | 
|  | == Output file format ==
 |  | 
|  | Example of output
 |  | 
|  | <pre>
 |  | 
|  | Pair	k0	k1	k2	loglh	nIter	coverage
 |  | 
|  | (0,1)	0.673213	0.326774	0.000013	-1710940.769941	19	0.814658
 |  | 
|  | </pre>
 |  | 
|  |   |  | 
|  |   |  | 
|  | The first column identifies the pair of individuals by their zero-based indices. The next three columns are the estimated relatedness coefficients (k0, k1, k2), followed by the log-likelihood (loglh), the number of iterations used (nIter), and the coverage.
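The three coefficients are often summarised as a single coefficient of relatedness, r = k1/2 + k2. This is a standard definition and not something the program prints itself. A minimal sketch with awk, using the example line above and assuming the output is tab-separated with a single header line:

```shell
# Example output copied from above (tab-separated, one header line).
example_output=$(printf 'Pair\tk0\tk1\tk2\tloglh\tnIter\tcoverage\n(0,1)\t0.673213\t0.326774\t0.000013\t-1710940.769941\t19\t0.814658\n')

# Skip the header; column 3 is k1 and column 4 is k2, so r = k1/2 + k2.
result=$(printf '%s\n' "$example_output" | awk -F'\t' 'NR > 1 { printf "%s\t%.6f\n", $1, $3 / 2 + $4 }')
echo "$result"
```

For a real run, the same awk command can be applied directly to the results file (gl.res in the example above).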
 |  | 
|  |   |  | 
|  |   |  | 
|  | === Input file format ===
 |  | 
|  |   |  | 
|  | The input consists of three files describing the genotype data, a file with admixture proportions for each individual, and a file with allele frequencies for each SNP for each source population. The genotype data files are PLINK bed/bim/fam files, and the remaining two files are in the output format of the program [http://www.genetics.ucla.edu/software/admixture/ ADMIXTURE].
 |  | 
|  |   |  | 
|  | Example of the content of an admixture proportion file (for 3 populations)
 |  | 
|  | <pre>
 |  | 
|  | 0.531631 0.468359 0.000010
 |  | 
|  | 0.564461 0.435529 0.000010
 |  | 
|  | 0.850660 0.149330 0.000010
 |  | 
|  | 0.630527 0.369463 0.000010
 |  | 
|  | 0.747429 0.219346 0.033225
 |  | 
|  | 0.999980 0.000010 0.000010
 |  | 
|  | 0.999980 0.000010 0.000010
 |  | 
|  | 0.682072 0.317918 0.000010
 |  | 
|  | 0.000010 0.999980 0.000010
 |  | 
|  | 0.793133 0.206857 0.000010
 |  | 
|  | </pre>
 |  | 
|  | Each row is an individual and each column is a population. The admixture proportions for each individual must sum to 1.
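The sum-to-1 requirement can be checked before running the analysis. The sketch below uses a hypothetical file name (example.Q) and a tolerance of 1e-4 chosen for illustration, since the values are printed with six decimals. The rows are copied from the example above.

```shell
# Hypothetical file name; three rows copied from the example above.
cat > example.Q <<'EOF'
0.531631 0.468359 0.000010
0.747429 0.219346 0.033225
0.999980 0.000010 0.000010
EOF

# Count rows whose proportions differ from 1 by more than the tolerance.
bad_rows=$(awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i
                  d = s - 1; if (d < 0) d = -d
                  if (d > 1e-4) n++ }
                END { print n + 0 }' example.Q)
echo "rows not summing to 1: $bad_rows"
```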
 |  | 
|  |   |  | 
|  | Example of the allele frequency file (for 3 populations)
 |  | 
|  | <pre>
 |  | 
|  | 0.312722 0.208605 0.999990
 |  | 
|  | 0.881352 0.999990 0.966966
 |  | 
|  | 0.708206 0.838869 0.932119
 |  | 
|  | 0.427789 0.620694 0.532966
 |  | 
|  | 0.411998 0.622253 0.534072
 |  | 
|  | 0.427789 0.620694 0.532966
 |  | 
|  | 0.440817 0.581630 0.618751
 |  | 
|  | 0.733733 0.985281 0.953523
 |  | 
|  | 0.724083 0.451452 0.784607
 |  | 
|  | 0.811161 0.578612 0.787782
 |  | 
|  | </pre>
 |  | 
|  | Each row is a SNP and each column is a population. When using PLINK files, the allele frequency is the MAJOR allele frequency.
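To illustrate how the two files fit together: under the usual admixture model (an assumption here, the formula is not given on this page), the allele frequency expected for individual i at SNP j is the admixture-weighted average of the population frequencies, h_ij = sum over populations k of q_ik * f_jk. A small worked example with the first row of each file shown above:

```shell
# First row of the admixture proportion example (individual 1) and
# first row of the allele frequency example (SNP 1), for 3 populations.
q="0.531631 0.468359 0.000010"
f="0.312722 0.208605 0.999990"

# Assumed model (not stated on this page): h = sum over k of q[k] * f[k].
h=$(echo "$q $f" | awk '{ K = NF / 2; s = 0
                          for (k = 1; k <= K; k++) s += $k * $(k + K)
                          printf "%.3f", s }')
echo "$h"
```

This also shows why the number of columns must agree between the two files: each column in one file is paired with the same population's column in the other.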
 |  | 
|  |   |  | 
|  | = Citing and references =
 |  | 
|  | === relateAdmix ===
 |  | 
|  | Moltke, I, Albrechtsen, A (2013). RelateAdmix: a software tool for estimating relatedness between admixed individuals. Bioinformatics.
 |  | 
|  | [http://www.ncbi.nlm.nih.gov/entrez?Db=pubmed&Cmd=ShowDetailView&uid=24215025 pubmed]
 |  | 
|  | [http://www.bioinformatics.org/texmed/cgi-bin/list.cgi?PMID=24215025 bibtex]
 |  | 
|  |   |  | 
|  | === ADMIXTURE ===
 |  | 
|  | D.H. Alexander, J. Novembre, and K. Lange. Fast model-based estimation of ancestry in unrelated individuals. Genome Research, 19:1655–1664, 2009. 
 |  | 
|  |   |  | 
|  |   |  | 
|  | = Change log =
 |  | 
|  | * 0.14 made more usable on Mac (I think). Thanks to Paul Lott for reporting the problem and for suggestions, and to Thorfinn Sand for making the change
 |  | 
|  | * 0.13 added an extra check that input files exist, to give instant errors; changed all printf to fprintf(stderr,
 |  | 
|  | * 0.11 changed threading to a fixed pool of threads
 |  | 
|  | * 0.10 optimized code
 |  | 
|  | * 0.09 added an error for when the number of sites or individuals does not match between files
 |  | 
|  | * 0.08 fixed a bug that would sometimes print an extra line when running with multiple threads
 |  | 
|  | * 0.07 fixed a small leak
 |  |