ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Fasta: Difference between revisions
Revision as of 17:50, 27 November 2013
Available from version 0.559+
This option creates a FASTA file from a sequencing data file (BAM file). The function uses the genome information in the BAM header to determine the chromosome names and lengths. For sites without data, an "N" is written.
<classdiagram type="dir:LR">
[One bam file{bg:orange}]->[sequencing data|random base (-doFasta 1);consensus base (-doFasta 2)]
[sequencing data]->doFasta[fasta file{bg:blue}]
</classdiagram>
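The "N" padding described above can be illustrated with a small Python sketch. This is not ANGSD's implementation: the function name `build_fasta`, its arguments, and the 60-character line width are all assumptions made for illustration. It only shows the idea of taking chromosome names and lengths (as a BAM header would supply them) and writing "N" wherever no base was called.

```python
def build_fasta(chrom_lengths, called_bases, line_width=60):
    """Return FASTA text for the given chromosomes (illustrative sketch).

    chrom_lengths: dict mapping chromosome name -> length,
                   standing in for what a BAM header would list.
    called_bases:  dict mapping (chromosome, 0-based position) -> base.
    Positions with no entry in called_bases are written as "N".
    """
    lines = []
    for name, length in chrom_lengths.items():
        # Look up every position; fall back to "N" when there is no data.
        seq = "".join(called_bases.get((name, pos), "N") for pos in range(length))
        lines.append(f">{name}")
        # Wrap the sequence at line_width characters (an assumed width).
        for start in range(0, length, line_width):
            lines.append(seq[start:start + line_width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_fasta({"chr1": 5}, {("chr1", 1): "A", ("chr1", 3): "G"}), end="")
```

In this toy input only positions 1 and 3 of a five-base chromosome have data, so the remaining three positions come out as "N".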
Brief Overview
> ./angsd -doFasta
--------------
analysisFasta.cpp:
	-doFasta	0
	1: use a random base
	2: use the most common base (needs -doCounts 1)
	-minQ		13	(remove bases with qscore<minQ)
Options
- -doFasta 1: sample a random base at each position.
- -doFasta 2: use the most common base. In the case of ties, a random base is chosen among the bases that share the maximum count. The "-doCounts 1" option for allele counts is required to determine the most common base.
- -minQ [INT]: minimum base quality score; bases with a lower quality score are removed.
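The difference between the two modes can be sketched in Python. This is only an illustration of the behaviour described in the options, not ANGSD's code: the function names are hypothetical, uniform sampling over the observed bases is an assumption, and the default threshold of 13 is taken from the help output in the brief overview.

```python
import random
from collections import Counter


def filter_by_quality(bases, quals, min_q=13):
    """Drop bases whose quality score is below min_q (mirrors the -minQ description)."""
    return [b for b, q in zip(bases, quals) if q >= min_q]


def random_base(bases, rng=random):
    """Mode 1 sketch: pick one observed base at random (assumed uniform); "N" if no data."""
    return rng.choice(bases) if bases else "N"


def most_common_base(bases, rng=random):
    """Mode 2 sketch: the most frequent base, ties broken at random among the maxima."""
    if not bases:
        return "N"
    counts = Counter(bases)          # per-base counts at this site
    top = max(counts.values())       # the maximum count
    tied = sorted(b for b, c in counts.items() if c == top)
    return rng.choice(tied)


if __name__ == "__main__":
    site = filter_by_quality(list("AACCG"), [30, 30, 30, 30, 5])
    print(random_base(site), most_common_base(site))
```

Here the low-quality "G" is filtered out first, leaving "A" and "C" tied with two observations each, so the second mode returns one of the two at random.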
Example
Create a FASTA file by sampling a random base at each position:
./angsd -i smallNA07056.mapped.ILLUMINA.bwa.CEU.low_coverage.20111114.bam -doFasta 1