ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Genotype Distribution

Revision as of 10:16, 14 July 2016 by Albrecht

Works from version 0.913 and above. The latest development version is available on GitHub.


This method allows estimation of the expected genotype counts or fractions for one or two individuals based on genotype likelihoods. This is useful for a number of population genetic statistics, including relatedness and heterozygosity.


Examples of genotype fractions for a single individual

all 10 possible genotypes

pAA pAC pAG pAT pCC pCG pCT pGG pGT pTT
0.293 9.3e-05 0.000331 7.3e-05 0.2 7.7e-05 0.000411 0.204 7e-05 0.302

number of derived alleles (use the SFS method for this)

pAA pAD pDD
0.9986 0.0003168 0.001127


or homozygous vs. heterozygous

pHO pHE
0.9987 0.0003168
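
To illustrate how the 10 genotype categories relate to the HO/HE categories, the following minimal sketch sums the homozygous and heterozygous fractions of the 10-genotype example above. It only shows the relationship between the categories and is not how misc/ibs estimates them; the function name collapse_to_ho_he is hypothetical. The input values are rounded, so the two sums need not add up to exactly 1, and they are not expected to reproduce the pHO/pHE example shown above.

```python
# Genotype order as printed in the 10-genotype example.
GENOTYPES = ["AA", "AC", "AG", "AT", "CC", "CG", "CT", "GG", "GT", "TT"]


def collapse_to_ho_he(fractions):
    """Sum 10 genotype fractions into (homozygous, heterozygous) totals.

    Hypothetical helper for illustration; not part of ANGSD.
    """
    if len(fractions) != len(GENOTYPES):
        raise ValueError("expected 10 genotype fractions")
    # A genotype is homozygous when both allele letters are the same.
    p_ho = sum(p for g, p in zip(GENOTYPES, fractions) if g[0] == g[1])
    p_he = sum(p for g, p in zip(GENOTYPES, fractions) if g[0] != g[1])
    return p_ho, p_he


# Values copied from the 10-genotype example (rounded as displayed).
example = [0.293, 9.3e-05, 0.000331, 7.3e-05, 0.2,
           7.7e-05, 0.000411, 0.204, 7e-05, 0.302]

p_ho, p_he = collapse_to_ho_he(example)
print(f"pHO={p_ho:.6f} pHE={p_he:.6f}")
```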


For two individuals, the result can be the full 10x10 matrix of possible genotype combinations

Example of 10x10 genotype probability

       AA     AC     AG     AT     CC     CG     CT     GG     GT     TT
AA 0.0420 0.0130 0.0200 0.0170 0.0160 0.0170 0.0150 0.0240 0.0042 0.0500
AC 0.0030 0.0034 0.0071 0.0067 0.0074 0.0071 0.0065 0.0074 0.0032 0.0038
AG 0.0030 0.0033 0.0068 0.0064 0.0070 0.0068 0.0061 0.0070 0.0028 0.0034
AT 0.0071 0.0084 0.0110 0.0110 0.0110 0.0110 0.0100 0.0120 0.0072 0.0084
CC 0.0180 0.0045 0.0110 0.0100 0.0092 0.0100 0.0089 0.0140 0.0016 0.0240
CG 0.0015 0.0018 0.0061 0.0061 0.0067 0.0063 0.0060 0.0067 0.0019 0.0015
CT 0.0029 0.0032 0.0068 0.0064 0.0070 0.0067 0.0060 0.0069 0.0027 0.0033
GG 0.0180 0.0054 0.0110 0.0096 0.0088 0.0094 0.0085 0.0120 0.0012 0.0200
GT 0.0029 0.0033 0.0069 0.0066 0.0072 0.0070 0.0062 0.0071 0.0027 0.0031
TT 0.0400 0.0130 0.0200 0.0170 0.0150 0.0170 0.0150 0.0240 0.0038 0.0480


or the number of derived alleles (use 2D SFS method for this)

            ind2
ind1   pAA     pAD     pDD
pAA    0.6561  0.1458  0.0081
pAD    0.1458  0.0324  0.0018
pDD    0.0081  0.0018  0.0001

or the heterozygous and homozygous combinations

HO HO   HO HE   HE HO   HE HE   HO altHO
0.6562  0.1476  0.1476  0.0324  0.0162
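
The five heterozygous/homozygous categories can be obtained by summing cells of the 3x3 derived-allele table above. The sketch below assumes the following reading of the column labels, which the page does not spell out: "HO HO" means both individuals are homozygous for the same allele, "HO altHO" means they are homozygous for different alleles, and in the mixed labels the first word refers to ind1 and the second to ind2. Under that assumption the sums match the five values listed above.

```python
# 3x3 joint distribution from the derived-allele example.
# Rows are ind1, columns are ind2, in the order AA, AD, DD.
joint = [
    [0.6561, 0.1458, 0.0081],
    [0.1458, 0.0324, 0.0018],
    [0.0081, 0.0018, 0.0001],
]

HET = 1  # index of the heterozygous state AD


def collapse_pair(table):
    """Sum a 3x3 joint table into five HO/HE categories (assumed meaning)."""
    out = {"HO HO": 0.0, "HO HE": 0.0, "HE HO": 0.0,
           "HE HE": 0.0, "HO altHO": 0.0}
    for i, row in enumerate(table):
        for j, p in enumerate(row):
            if i == HET and j == HET:
                key = "HE HE"
            elif i == HET:
                key = "HE HO"      # ind1 heterozygous, ind2 homozygous
            elif j == HET:
                key = "HO HE"      # ind1 homozygous, ind2 heterozygous
            elif i == j:
                key = "HO HO"      # both homozygous, same allele
            else:
                key = "HO altHO"   # both homozygous, different alleles
            out[key] += p
    return out


categories = collapse_pair(joint)
for name, value in categories.items():
    print(f"{name}\t{value:.4f}")
```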




Brief Overview

misc/ibs 
Needed arguments:
	-glf/-f		input GLF filename:
Optional arguments:
	-outFileName/-o	output filename(prefix):
	-nInd/-n	nubmer of individuals in GLF file:
	-ind1/i1	individuals 1:
	-ind2/i2	individuals 2:
	-allpairs/-a	analyse all pairs:
	-maxSites/-m	maximum sites to analyze:
	-model		ibs model:0 all 10 genotypes, 1 HO/HE


Options

-glf [fileName]

A binary GLF file that contains 10 genotype likelihoods per site per individual, as produced by ANGSD with -doGlf 1.

-outFileName [fileName]

Prefix for the output file names. Defaults to the GLF input filename.

-nInd

Number of individuals in the GLF file. This is needed if the GLF file contains more than one individual.

-ind1 [int]

If you don't want to analyse all individuals, you can specify a single individual to analyse as an integer from 0 to nInd-1. The first individual is individual 0.

-ind2 [int]

If you only want to analyse a single pair of individuals, specify both -ind1 and -ind2, each as an integer from 0 to nInd-1. The first individual is individual 0.

-allpairs [int]

Use -allpairs 1 to analyse all pairs of individuals.
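
To get a feel for how many pairs an all-pairs run covers, the sketch below enumerates the zero-based index pairs for a given number of individuals. It assumes that each unordered pair is analysed once with ind1 < ind2, which is consistent with the *.ibspair example on this page but is not stated explicitly; the helper names are hypothetical.

```python
from itertools import combinations


def check_index(index, n_ind):
    """Validate a zero-based individual index as used by -ind1 and -ind2."""
    if not 0 <= index < n_ind:
        raise ValueError(f"index {index} is outside 0..{n_ind - 1}")
    return index


def all_pairs(n_ind):
    """Return every unordered pair (i, j) with i < j (assumed ordering)."""
    return list(combinations(range(n_ind), 2))


pairs = all_pairs(10)
print(len(pairs))   # n * (n - 1) / 2 pairs for n individuals
print(pairs[:3])    # first few pairs
```

The number of pairs grows quadratically with the number of individuals, which is one reason the all-pairs analysis is slow.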

-maxSites [int]

Maximum number of sites to analyse. This is useful if you don't have enough RAM for the whole genome.

-model [int]

The model to use: 0 for all 10 genotypes, 1 for HO/HE (homozygous/heterozygous).


Output

If you analyse each individual separately (-allpairs 0)

Example of output *.ibs

ind	nSites	Llike	pAA	pAC	pAG	pAT	pCC	pCG	pCT	pGG	pGT	pTT
0	2713190	-3743970.428653	0.299576	0.000081	0.000300	0.000057	0.206840	0.000069	0.000319	0.200658	0.000080	0.292020
1	2696847	-3745104.294527	0.293158	0.000093	0.000331	0.000073	0.199560	0.000077	0.000411	0.203912	0.000070	0.302315
2	2708392	-3744487.856004	0.292558	0.000074	0.000304	0.000054	0.210749	0.000072	0.000347	0.205672	0.000064	0.290105
3	2703572	-3747132.052095	0.320657	0.000073	0.000269	0.000070	0.197760	0.000057	0.000296	0.191058	0.000076	0.289685
4	2645152	-3745343.670384	0.304946	0.000063	0.000248	0.000041	0.196731	0.000058	0.000242	0.196616	0.000047	0.301008
5	2697327	-3757413.590019	0.318968	0.000098	0.000323	0.000074	0.177954	0.000076	0.000367	0.182372	0.000097	0.319671
6	2712223	-3745037.550278	0.309327	0.000040	0.000222	0.000048	0.200149	0.000056	0.000238	0.195224	0.000053	0.294644
7	2671708	-3751258.005057	0.323357	0.000066	0.000318	0.000066	0.185810	0.000079	0.000336	0.187431	0.000072	0.302463
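
The *.ibs file is tab-separated with a header line, so it can be read with standard tools. The following minimal sketch parses the first two rows of the example above and sums the six heterozygous genotype fractions per individual as a simple heterozygosity summary. The helper names read_ibs and heterozygosity are hypothetical, and the sketch assumes the column layout shown in the example.

```python
import csv
import io

# Header and the first two rows of the example output (tab-joined below).
HEADER = "ind nSites Llike pAA pAC pAG pAT pCC pCG pCT pGG pGT pTT".split()
ROWS = [
    ("0 2713190 -3743970.428653 0.299576 0.000081 0.000300 0.000057 "
     "0.206840 0.000069 0.000319 0.200658 0.000080 0.292020").split(),
    ("1 2696847 -3745104.294527 0.293158 0.000093 0.000331 0.000073 "
     "0.199560 0.000077 0.000411 0.203912 0.000070 0.302315").split(),
]
sample_text = "\n".join("\t".join(f) for f in [HEADER] + ROWS) + "\n"


def read_ibs(handle):
    """Parse a *.ibs table into a list of per-individual records."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        records.append({
            "ind": int(row["ind"]),
            "nSites": int(row["nSites"]),
            "Llike": float(row["Llike"]),
            # Strip the leading "p" so keys are genotypes such as "AC".
            "fractions": {k[1:]: float(v) for k, v in row.items()
                          if k.startswith("p")},
        })
    return records


def heterozygosity(record):
    """Sum the fractions of genotypes whose two alleles differ."""
    return sum(p for g, p in record["fractions"].items() if g[0] != g[1])


records = read_ibs(io.StringIO(sample_text))
for rec in records:
    print(rec["ind"], f"{heterozygosity(rec):.6f}")
```

For a real file, replace the io.StringIO object with an opened file handle, for example open("all.ibs", newline="").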

Example of output *.ibspair

ind1	ind2	nSites	Llike	pAA_AA	pAC_AA	pAG_AA	pAT_AA	pCC_AA	pCG_AA	pCT_AA	pGG_AA	pGT_AA	pTT_AA	pAA_AC	pAC_AC	pAG_AC	pAT_AC	pCC_AC	pCG_AC	pCT_AC	pGG_AC	pGT_AC	pTT_AC	pAA_AG	pAC_AG	pAG_AG	pAT_AG	pCC_AG	pCG_AG	pCT_AG	pGG_AG	pGT_AG	pTT_AG	pAA_AT	pAC_AT	pAG_AT	pAT_AT	pCC_AT	pCG_AT	pCT_AT	pGG_AT	pGT_AT	pTT_AT	pAA_CC	pAC_CC	pAG_CC	pAT_CC	pCC_CC	pCG_CC	pCT_CC	pGG_CC	pGT_CC	pTT_CC	pAA_CG	pAC_CG	pAG_CG	pAT_CG	pCC_CG	pCG_CG	pCT_CG	pGG_CG	pGT_CG	pTT_CG	pAA_CT	pAC_CT	pAG_CT	pAT_CT	pCC_CT	pCG_CT	pCT_CT	pGG_CT	pGT_CT	pTT_CT	pAA_GG	pAC_GG	pAG_GG	pAT_GG	pCC_GG	pCG_GG	pCT_GG	pGG_GG	pGT_GG	pTT_GG	pAA_GT	pAC_GT	pAG_GT	pAT_GT	pCC_GT	pCG_GT	pCT_GT	pGG_GT	pGT_GT	pTT_GT	pAA_TT	pAC_TT	pAG_TT	pAT_TT	pCC_TT	pCG_TT	pCT_TT	pGG_TT	pGT_TT	pTT_TT
0	1	2666273	-15101833.893403	0.044284	0.002556	0.002840	0.007315	0.017972	0.001189	0.002824	0.019690	0.003079	0.044594	0.011683	0.003200	0.003507	0.008936	0.003465	0.001633	0.003465	0.004289	0.003717	0.011984	0.016918	0.007269	0.007371	0.012387	0.009642	0.005872	0.007471	0.009886	0.007592	0.017011	0.017168	0.005425	0.005240	0.009444	0.009141	0.005412	0.005458	0.009405	0.005514	0.017127	0.017411	0.007036	0.006751	0.011750	0.009794	0.007228	0.006963	0.009600	0.007101	0.017301	0.016278	0.006881	0.006509	0.011501	0.009320	0.007174	0.006715	0.009397	0.006866	0.016429	0.018595	0.006579	0.006201	0.011048	0.009953	0.006773	0.006366	0.009590	0.006492	0.018422	0.019068	0.007246	0.006854	0.012420	0.010073	0.006641	0.007062	0.009515	0.007063	0.018884	0.002373	0.002853	0.002495	0.007785	0.001071	0.001662	0.002712	0.000920	0.002506	0.002561	0.053818	0.003643	0.003223	0.009266	0.023981	0.001400	0.003529	0.021892	0.003049	0.053435
0	2	2676526	-15816810.609543	0.041516	0.002951	0.002951	0.007122	0.018048	0.001474	0.002893	0.017909	0.002886	0.040033	0.013051	0.003357	0.003251	0.008372	0.004535	0.001808	0.003196	0.005441	0.003253	0.012761	0.020086	0.007098	0.006762	0.011401	0.011445	0.006130	0.006762	0.011151	0.006894	0.019695	0.017164	0.006746	0.006427	0.010837	0.010155	0.006090	0.006382	0.009635	0.006587	0.016748	0.015683	0.007385	0.007046	0.011432	0.009218	0.006675	0.007040	0.008797	0.007213	0.015285	0.017100	0.007088	0.006779	0.011182	0.009968	0.006274	0.006734	0.009407	0.006952	0.016626	0.014913	0.006453	0.006062	0.010040	0.008931	0.005985	0.006022	0.008548	0.006229	0.014648	0.024341	0.007421	0.006985	0.011662	0.013634	0.006674	0.006895	0.012352	0.007094	0.023660	0.004206	0.003167	0.002798	0.007235	0.001649	0.001867	0.002732	0.001216	0.002748	0.003767	0.049895	0.003815	0.003394	0.008391	0.023567	0.001494	0.003347	0.020295	0.003070	0.047911
0	3	2671296	-15660554.911695	0.037567	0.002986	0.003026	0.007970	0.012805	0.001148	0.002770	0.012966	0.002881	0.032981	0.022781	0.003080	0.003141	0.008308	0.008247	0.001530	0.002872	0.009005	0.002993	0.020261	0.021752	0.006992	0.006880	0.011773	0.010969	0.005788	0.006579	0.010640	0.006706	0.019734	0.018656	0.006450	0.006254	0.010667	0.009591	0.005672	0.006007	0.009326	0.006165	0.016988	0.017648	0.006789	0.006528	0.010714	0.009206	0.005808	0.006312	0.008878	0.006328	0.016119	0.017684	0.007068	0.006723	0.011350	0.009147	0.005854	0.006497	0.008710	0.006443	0.016055	0.018377	0.007038	0.006607	0.011320	0.009632	0.005736	0.006385	0.009067	0.006311	0.016671	0.021570	0.006779	0.006205	0.011137	0.010963	0.005538	0.006004	0.010191	0.005906	0.019442	0.017537	0.003402	0.002811	0.008049	0.007456	0.001552	0.002743	0.006172	0.002401	0.015417	0.045183	0.003889	0.003270	0.009095	0.017183	0.001220	0.003156	0.014249	0.002672	0.038899
0	4	2614962	-14819561.589144	0.043463	0.002407	0.002645	0.007394	0.016497	0.001107	0.002488	0.017405	0.002799	0.042466	0.015502	0.002873	0.003116	0.008777	0.003788	0.001351	0.002937	0.005238	0.003325	0.014871	0.018295	0.006551	0.006519	0.011845	0.009660	0.005401	0.006393	0.009730	0.006746	0.017307	0.017008	0.006034	0.005870	0.011057	0.009467	0.005539	0.005898	0.009462	0.006205	0.016197	0.018715	0.005621	0.005303	0.009626	0.010008	0.006151	0.005368	0.010072	0.005683	0.018173	0.023104	0.006546	0.006151	0.011009	0.012025	0.007019	0.006153	0.011398	0.006609	0.022200	0.016942	0.006526	0.006154	0.010938	0.009400	0.006624	0.006164	0.008933	0.006549	0.016012	0.020140	0.006080	0.005790	0.010608	0.010473	0.005667	0.005731	0.009818	0.006086	0.019118	0.001466	0.002945	0.002556	0.007448	0.000595	0.001660	0.002631	0.000536	0.002503	0.001708	0.058903	0.003621	0.003087	0.009001	0.024629	0.001300	0.003310	0.021219	0.002857	0.055710
0	5	2666711	-15227151.893570	0.036899	0.002553	0.002824	0.008822	0.010771	0.000729	0.002624	0.011793	0.003056	0.036172	0.022875	0.002763	0.002929	0.009225	0.007305	0.001029	0.002750	0.008607	0.003189	0.022559	0.021971	0.006600	0.006468	0.012324	0.010420	0.005076	0.006412	0.010733	0.006755	0.021399	0.018968	0.006145	0.005983	0.011149	0.009179	0.005045	0.005937	0.009421	0.006261	0.018479	0.017877	0.006457	0.006200	0.011085	0.008798	0.005187	0.006214	0.008926	0.006420	0.017399	0.017960	0.006619	0.006314	0.011399	0.008783	0.005235	0.006347	0.008791	0.006406	0.017389	0.018607	0.006561	0.006204	0.011295	0.009180	0.005133	0.006269	0.009132	0.006206	0.018034	0.021943	0.006268	0.005817	0.011020	0.010518	0.004960	0.005942	0.010337	0.005788	0.021241	0.017231	0.003055	0.002554	0.008172	0.006245	0.001119	0.002750	0.005446	0.002342	0.016849	0.045028	0.003483	0.002981	0.009449	0.015572	0.000810	0.003183	0.013559	0.002600	0.043110
0	6	2680960	-15533429.999386	0.044062	0.002891	0.002915	0.007490	0.016390	0.001064	0.002766	0.016855	0.002865	0.040301	0.013862	0.003288	0.003270	0.008757	0.003943	0.001378	0.003085	0.005044	0.003250	0.013000	0.021137	0.007026	0.006745	0.011713	0.011035	0.005662	0.006700	0.010895	0.006848	0.019762	0.018062	0.006638	0.006310	0.011037	0.009917	0.005632	0.006290	0.009454	0.006471	0.016941	0.016423	0.007178	0.006822	0.011603	0.008910	0.006190	0.006877	0.008591	0.006994	0.015524	0.018048	0.006978	0.006650	0.011437	0.009826	0.005892	0.006664	0.009334	0.006775	0.016896	0.015552	0.006321	0.005853	0.010214	0.008575	0.005558	0.006003	0.008188	0.005974	0.014624	0.025792	0.007423	0.006845	0.012090	0.013320	0.006362	0.006991	0.011994	0.006949	0.023881	0.004151	0.003212	0.002610	0.007736	0.001377	0.001592	0.002795	0.001003	0.002508	0.003837	0.053188	0.003932	0.003208	0.008975	0.022252	0.001276	0.003427	0.019009	0.002949	0.048090
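
Each *.ibspair row holds 100 fractions named pX_Y, with the first genotype label X changing fastest. The sketch below reshapes such a row into a nested 10x10 mapping indexed as matrix[X][Y], which appears to match the layout of the 10x10 table shown earlier on this page (rows follow the first label). Which label belongs to ind1 and which to ind2 is not stated here, so the code stays neutral on that. The record used is a placeholder with made-up values, not real output, and to_matrix is a hypothetical helper.

```python
GENOTYPES = ["AA", "AC", "AG", "AT", "CC", "CG", "CT", "GG", "GT", "TT"]

# Column names in file order: pAA_AA, pAC_AA, ..., pTT_AA, pAA_AC, ...
COLUMNS = [f"p{first}_{second}"
           for second in GENOTYPES
           for first in GENOTYPES]


def to_matrix(record):
    """Reshape a mapping of pX_Y columns into matrix[X][Y]."""
    matrix = {g: {} for g in GENOTYPES}
    for name in COLUMNS:
        first, second = name[1:].split("_")
        matrix[first][second] = float(record[name])
    return matrix


# Placeholder record (NOT real output): distinct values that sum to 1,
# so the reshaping can be checked by eye.
placeholder = {name: k / 4950 for k, name in enumerate(COLUMNS)}

matrix = to_matrix(placeholder)
total = sum(sum(row.values()) for row in matrix.values())
print(COLUMNS[:3], COLUMNS[10])
print(f"sum of all cells: {total:.6f}")
```

Looking columns up by name rather than by position keeps the reshaping correct even if the column order in the file were to differ from the one assumed here.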


Example

First generate a genotype likelihood file for chromosome 1

./angsd -GL 1 -out genolike -doGlf 1 -bam bam.filelist -r 1:

Estimate all 10 genotype fractions for the second individual (-ind1 1); individuals are in the same order as in bam.filelist

misc/ibs -f genolike.glf.gz -nInd 10 -ind1 1

The output file is genolike.glf.gz.ibs

Estimate all 10 genotype fractions for each of the 10 individuals

misc/ibs -f genolike.glf.gz -nInd 10 -o all

The output file is all.ibs


Estimate the 10x10 genotype fraction matrix for the first (-ind1 0) and the fourth (-ind2 3) individual (same order as in bam.filelist)

misc/ibs -f genolike.glf.gz -nInd 10 -ind1 0 -ind2 3

The output file is genolike.glf.gz.ibspair


Estimate the 10x10 genotype fraction matrix for all pairs (very slow)

misc/ibs -f genolike.glf.gz -nInd 10 -allpairs 1 -o all

The output file is all.ibspair