ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Haploid calling

Latest revision as of 10:21, 27 October 2020

Simple haploid output based on sampling or consensus. The latest GitHub version of ANGSD includes a small utility program in the misc folder that converts this output to PLINK format (tfam/tped).



<classdiagram type="dir:LR">

[BAM files{bg:orange}]->[Sequence data|Random base;Consensus base]

[sequence data]->[*.haplo.gz|single base file{bg:blue}] </classdiagram>

Brief Overview

> ./angsd -doHaploCall
	-> angsd version: 0.910-45-g2b2b4f0-dirty (htslib: 1.2.1-192-ge7e2b3d) build(Jan  3 2016 14:45:41)
	-> Analysis helpbox/synopsis information:
	-> Command: 
./angsd -doHaploCall 	-> Sun Jan  3 15:18:15 2016
--------------
abcHaploCall.cpp:
	-doHaploCall	0
	(Sampling strategies)
	 0:	 no haploid calling 
	 1:	 (Sample single base)
	 2:	 (Concensus base)
	-doCounts	0	Must choose -doCount 1
Optional
	-minMinor	0	Minimum observed minor alleles
	-maxMis	-1	Maximum missing bases (per site)


This function outputs one base for each individual at each site.

Options

-doHaploCall [int]

1: sample a random base. 2: use the most frequent (consensus) base; ties are broken by picking a random base.

-doCounts 1

Required. Use -doCounts 1 to count the bases at each site after filters.

-minMinor [int]

Minimum number of observed minor alleles; only sites with at least minMinor sampled minor alleles (across individuals) are printed.

-maxMis [int]

Maximum allowed number of missing alleles (across individuals). -maxMis 0 means only sites without missing alleles are printed.
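
As a rough illustration of the options above, the following Python sketch mimics the two calling strategies and the two site filters. It is not ANGSD's implementation: the function names `call_base` and `keep_site` are hypothetical, the per-individual base counts are assumed to be available as a dictionary, random sampling is assumed to be proportional to the read counts, and the minor-allele count is assumed to be the number of called bases that differ from the most common one.

```python
import random
from collections import Counter

BASES = ("A", "C", "G", "T")


def call_base(counts, mode, rng=random):
    """Return one base for one individual at one site, or 'N' if no reads.

    counts: dict mapping base -> read count (hypothetical input format).
    mode:   1 = sample a random base, 2 = most frequent base (random on ties).
    """
    weights = [counts.get(b, 0) for b in BASES]
    if sum(weights) == 0:
        return "N"  # no data for this individual
    if mode == 1:
        # Assumption: sampling a random read, so bases are weighted by count.
        return rng.choices(BASES, weights=weights, k=1)[0]
    if mode == 2:
        best = max(weights)
        # Ties between equally frequent bases are broken at random.
        return rng.choice([b for b, w in zip(BASES, weights) if w == best])
    raise ValueError("mode must be 1 or 2")


def keep_site(calls, min_minor=0, max_mis=-1):
    """Apply -minMinor / -maxMis style filters to the calls of one site."""
    missing = sum(1 for c in calls if c == "N")
    if max_mis >= 0 and missing > max_mis:
        return False  # too many individuals without a call
    observed = Counter(c for c in calls if c != "N")
    major_count = max(observed.values(), default=0)
    # Assumption: every call that is not the major allele counts as minor.
    minor_count = sum(observed.values()) - major_count
    return minor_count >= min_minor
```

Under these assumptions, `keep_site(list("GGGGGNA"), min_minor=1)` is true, while adding `max_mis=0` rejects the same site because one individual has no call.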


Output

  • .haplo.gz

Output: Each line represents a site: chromosome name (column 1), position (column 2), major allele (column 3), followed by one column per individual with the sampled allele.
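
A minimal sketch of reading this file in Python, assuming the layout shown in the example on this page: a header line starting with `chr pos major`, whitespace-separated columns, and `N` for individuals without a call. The helper names `read_haplo` and `missing_per_individual` are hypothetical and are not part of ANGSD.

```python
import gzip


def read_haplo(path):
    """Yield (chrom, pos, major, calls) for each site in a .haplo.gz file."""
    with gzip.open(path, "rt") as handle:
        header = handle.readline().split()
        # Assumption: the first line is the header shown in the example output.
        if header[:3] != ["chr", "pos", "major"]:
            raise ValueError("unexpected header: %r" % header)
        for line in handle:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or truncated lines
            yield fields[0], int(fields[1]), fields[2], fields[3:]


def missing_per_individual(path):
    """Count, for each individual column, the sites where the call is 'N'."""
    totals = None
    for _chrom, _pos, _major, calls in read_haplo(path):
        if totals is None:
            totals = [0] * len(calls)
        for i, base in enumerate(calls):
            if base == "N":
                totals[i] += 1
    return totals or []
```

A per-individual missingness count like this can help in choosing a sensible -maxMis value before rerunning the analysis.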

Example

Sample a random base per individual for every site on chromosome 1, printing only sites with at least one observed minor allele.

./angsd -bam bam.filelist -dohaplocall 1 -doCounts 1 -r 1: -minMinor 1

Output

chr	pos	major	ind0	ind1	ind2	ind3	ind4	ind5	ind6
1	14000170	C	T	T	C	N	C	C	C
1	14000202	A	A	N	G	A	N	N	G
1	14000457	G	G	G	G	G	G	N	A
1	14000459	G	G	G	G	G	A	N	N
1	14000774	G	T	G	G	G	G	G	T
1	14002083	C	G	N	C	C	C	C	C
1	14002351	A	A	C	C	A	C	N	A
1	14002950	A	T	A	A	A	T	N	T
1	14004832	G	G	G	A	G	G	A	G
1	14006543	G	T	G	G	G	G	G	G
1	14006631	A	C	N	A	N	A	N	A
1	14007068	G	T	T	T	G	G	G	N
1	14009284	A	A	C	C	C	N	A	N
1	14009775	G	G	G	G	G	C	G	C
1	14009787	T	T	T	G	T	G	T	T
1	14009791	A	G	G	A	G	A	G	A
1	14009794	A	A	A	A	N	N	A	A
1	14009800	A	G	A	A	G	N	G	A
1	14010748	A	G	N	A	G	A	A	A

Columns are:

  • chr: chromosome
  • pos: position
  • major: major allele (most common of the sampled alleles)
  • ind0: first individual; individuals appear in the same order as in the input files
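
The conversion utility in the misc folder is not documented on this page, and the sketch below does not reproduce it. It only illustrates, in Python, how one row of haploid calls might map onto the PLINK transposed format, under these assumptions: each haploid call is written as a homozygous genotype, `N` becomes the PLINK missing code `0`, the genetic distance is set to 0, and the variant ID `chr_pos` and sample names `ind0`, `ind1`, ... are made up for illustration. The function names `haplo_row_to_tped` and `tfam_lines` are hypothetical.

```python
def haplo_row_to_tped(chrom, pos, calls):
    """Format one site as a .tped line: chrom, variant ID, cM, bp, allele pairs."""
    variant_id = "%s_%d" % (chrom, pos)  # hypothetical ID scheme
    genotypes = []
    for base in calls:
        allele = "0" if base == "N" else base  # 0 marks a missing allele
        # Assumption: a haploid call is written twice, as a homozygote.
        genotypes.extend([allele, allele])
    return " ".join([str(chrom), variant_id, "0", str(pos)] + genotypes)


def tfam_lines(n_individuals):
    """Build .tfam lines: family ID, individual ID, father, mother, sex, phenotype."""
    # Unknown parents and sex are coded 0, a missing phenotype is coded -9.
    return ["ind%d ind%d 0 0 0 -9" % (i, i) for i in range(n_individuals)]


# First site of the example output above.
print(haplo_row_to_tped("1", 14000170, list("TTCNCCC")))
# prints: 1 1_14000170 0 14000170 T T T T C C 0 0 C C C C C C
```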