This page contains some random examples that shows some aspect of the ANGSD program. There is also an old Tutorial, but this is somewhat outdated. We assume you will have SAMtools installed.

Many of the examples in the individual subpages are based on this test data set. The examples below are just some random examples.

Download and prepare

First download some test data of random small BAM files which contains some regions from different chromosomes for 10 samples from the 1000genomes project. The file size is around 100megabytes.

wget http://popgen.dk/software/download/angsd/bams.tar.gz
tar xf bams.tar.gz

This has made a folder called bams/, which contains our 10 samples. Now download and install angsd you can follow the guidelines at the Installation page which basicly says:

wget http://popgen.dk/software/download/angsd/angsd0.570.tar.gz
tar xfz angsd0.570.tar.gz
cd angsd0.570
make
cd ..

We will also index the BAM files in case we need to do random access, for this we will use SAMtools.

for i in bams/*.bam;do samtools index $i;done

We make a file containing a list of the locations of the 10 bamfiles

ls bams/*.bam > bam.filelist

Examples

Calculate Allele frequencies

Assuming you have a list of bamfiles in in file: 'bam.filelist' and you want the MAF using all reads and inferring the major and minor from the GL, we will use SAMtools genotype likelihoods, and will allow for 5 threads: See details on Allele Frequencies, Major Minor and Genotype Likelihoods.

./angsd0.570/angsd -b list -GL 1 -doMajorMinor 1 -doMaf 2 -P 5

	Command:
./angsd0.574/angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 
	-> angsd version: 0.574	 build(Jan 10 2014 17:44:41)
	-> No '-out' argument given, output files will be called 'angsdput'
	-> Parsing 10 number of samples 
	-> Printing at chr: 20 pos:14095816 chunknumber 3500
	-> Done reading data waiting for calculations to finish
	-> Calling destroy
	-> Done waiting for threads
	-> Output filenames:
		->"angsdput.arg"
		->"angsdput.mafs.gz"
	-> Fri Jan 10 17:46:15 2014
	-> Arguments and parameters for all analysis are located in .arg file
	[ALL done] cpu-time used =  130.67 sec
	[ALL done] walltime used =  55.00 sec

The output is then located on angsdput.mafs.gz. We could have specified an output file name with -out. Lets remove those reads that has a mapping quality below 30, and only use the bases with a score above 19. And to simply output we only print those sites with an allele frequency above 0.05.

./angsd0.574/angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 -minMapQ 30 -minBaseQ 20 -minMaf 0.05

Command:
./angsd0.574/angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 -minMapQ 30 -minQ 20 -minMaf 0.05 
	-> angsd version: 0.574	 build(Jan 10 2014 17:44:41)
	-> No '-out' argument given, output files will be called 'angsdput'
	-> Parsing 10 number of samples 
	-> Printing at chr: 20 pos:14085533 chunknumber 2800
	-> Done reading data waiting for calculations to finish
	-> Calling destroy
	-> Done waiting for threads
	-> Output filenames:
		->"angsdput.arg"
		->"angsdput.mafs.gz"
	-> Fri Jan 10 17:57:55 2014
	-> Arguments and parameters for all analysis are located in .arg file
	[ALL done] cpu-time used =  123.48 sec
	[ALL done] walltime used =  51.00 sec

And lets look at the output:

gunzip -c angsdput.mafs.gz |head

chromo	position	major	minor	unknownEM	nInd
1	14000023	C	A	0.076211	4
1	14000176	G	A	0.117885	6
1	14000202	G	A	0.052565	6
1	14000873	G	A	0.295131	9
1	14001018	T	C	0.269244	9
1	14001202	G	T	0.065666	8
1	14001501	G	A	0.062746	10
1	14001867	A	G	0.272099	10
1	14002093	T	C	0.058891	10

We have 10 samples, lets only look at the sites where we have an effective sample size of at least 8.

./angsd0.574/angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 -minMapQ 30 -minQ 20 -minMaf 0.05 -minInd 8

Command:
./angsd0.574/angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 -minMapQ 30 -minQ 20 -minMaf 0.05 -minInd 8
	-> angsd version: 0.574	 build(Jan 10 2014 17:44:41)
	-> No '-out' argument given, output files will be called 'angsdput'
	-> Parsing 10 number of samples 
	-> Printing at chr: 20 pos:14085533 chunknumber 2800
	-> Done reading data waiting for calculations to finish
	-> Calling destroy
	-> Done waiting for threads
	-> Output filenames:
		->"angsdput.arg"
		->"angsdput.mafs.gz"
	-> Fri Jan 10 18:11:09 2014
	-> Arguments and parameters for all analysis are located in .arg file
	[ALL done] cpu-time used =  36.64 sec
	[ALL done] walltime used =  30.00 sec

And look at the output:

gunzip -c angsdput.mafs.gz |head

chromo	position	major	minor	unknownEM	nInd
1	14000873	G	A	0.295131	9
1	14001018	T	C	0.269244	9
1	14001202	G	T	0.065666	8
1	14001501	G	A	0.062746	10
1	14001867	A	G	0.272099	10
1	14002093	T	C	0.058891	10
1	14002342	C	T	0.078385	9
1	14002422	A	T	0.434411	9
1	14002474	T	C	0.072898	8

Generate Beagle Likelihood files

For diploid samples we have 10 possible genotypes. but we would only expect to observe to different alleles at a site. So we need to infer the major and minor allele for dumping the 3 possible genotypes for the beagle format.

Quick Start

Contents

Download and prepare

Examples

Calculate Allele frequencies

Generate Beagle Likelihood files

Navigation menu