ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.

Fasta: Difference between revisions

Revision as of 17:50, 27 November 2013

Available from version 0.559+

This option creates a fasta file from a sequencing data file (bam file). The function uses the genome information in the bam header to determine the chromosome names and lengths. For sites without data, an "N" is written.
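
As a rough illustration of the behaviour described above (not ANGSD's actual implementation), the following Python sketch builds one sequence per chromosome from header-style name/length pairs, pre-fills it with "N", and overwrites only the positions where a base is available. The function names `build_fasta_records` and `format_fasta`, the dictionary inputs, and the line width of 60 are all hypothetical choices made for this example.

```python
def build_fasta_records(chrom_lengths, called_bases):
    """Return {chromosome name: sequence string}.

    chrom_lengths: dict mapping chromosome name -> length, standing in for
                   the information read from a bam header (assumed input).
    called_bases:  dict mapping chromosome name -> {0-based position: base}.
    """
    records = {}
    for name, length in chrom_lengths.items():
        # Every site starts as "N", i.e. "no data".
        seq = ["N"] * length
        for pos, base in called_bases.get(name, {}).items():
            # Ignore positions that fall outside the header-declared length.
            if 0 <= pos < length:
                seq[pos] = base
        records[name] = "".join(seq)
    return records


def format_fasta(records, width=60):
    """Format the records as fasta text with lines of at most `width` bases."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"
```

A chromosome that is listed in the header but has no data at all still appears in the output, as a run of "N" of the declared length.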

<classdiagram type="dir:LR">

[One bam file{bg:orange}]->[sequencing data|random base (-doFasta 1);consensus base (-doFasta 2)]

[sequencing data]->doFasta[fasta file{bg:blue}]

</classdiagram>

Brief Overview

> ./angsd -doFasta
--------------
analysisFasta.cpp:
	-doFasta	0
	1: use a random base
	2: use the most common base (needs -doCounts 1)
	-minQ		13	(remove bases with qscore<minQ)

options

-doFasta 1
sample a random base at each position
-doFasta 2
use the most common base. In the case of ties, a random base is chosen among the bases that share the maximum count. The "-doCounts 1" option for allele counts is needed in order to determine the most common base
-minQ [INT]

minimum base quality score
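
The per-site choice described by these options can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not ANGSD's code: the function name `pick_base` is hypothetical, "random base" is assumed to mean a uniform draw over the bases observed at the site after quality filtering, and the default threshold of 13 is taken from the `-minQ` default shown in the overview above.

```python
import random
from collections import Counter


def pick_base(bases, quals, mode, min_q=13, rng=random):
    """Choose one base for a single site.

    bases: observed bases at the site, e.g. "AAC" or ["A", "A", "C"].
    quals: base quality scores, one per observed base.
    mode:  1 = random base, 2 = most common base (mirrors -doFasta 1/2).
    min_q: bases with quality below this value are discarded (mirrors -minQ).
    rng:   object with a .choice() method; pass random.Random(seed) to
           make the draw reproducible.
    """
    # Remove bases with qscore < min_q.
    kept = [b for b, q in zip(bases, quals) if q >= min_q]
    if not kept:
        # No usable data at this site.
        return "N"
    if mode == 1:
        # Assumption: uniform draw over the retained observations.
        return rng.choice(kept)
    if mode == 2:
        counts = Counter(kept)
        top = max(counts.values())
        # All bases sharing the maximum count; ties are broken at random.
        tied = sorted(b for b, c in counts.items() if c == top)
        return rng.choice(tied)
    raise ValueError("mode must be 1 or 2")
```

For example, `pick_base("AAC", [30, 30, 30], 2)` returns "A", while a site whose bases all fall below `min_q` yields "N".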


Example

Create a fasta file from a random sample of bases:

./angsd -i smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1