ANGSD: Analysis of next generation Sequencing Data

Latest tar.gz version is (0.936/0.937 on github), see Change_log for changes, and download it here.

Information on this page is deprecated and should be removed.

Run Examples

Using glfv3

../dirty -samglf ceu.glf.list -outfiles test.glf -doMaf 2 -fai numSort.Fai -nLines 50000 -chunkSize 500 -nThreads 16

Using mpileup

./dirty.g++ -chunkSize 200000 -outfiles ASDF -doMaf 2 -nThreads 10 mpileup -g -r 21:1-20000000 -I ~/sample/*.chr21 >bcfoutput

The first item is the program name, followed by the arguments used by dirty, then mpileup and the arguments that will be passed directly to SAMtools.
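The layout of that command line can be illustrated with a small sketch. It assumes the split happens at the first literal `mpileup` token; the function name `split_at_mpileup` is hypothetical and is not taken from the program's source. The example uses a shortened version of the command above.

```python
def split_at_mpileup(argv):
    """Split a command line into (program, dirty_args, samtools_args).

    Assumption: everything after the first literal "mpileup" token is
    meant for SAMtools. This mirrors the description in the text and is
    not a copy of the program's own argument parser.
    """
    program, rest = argv[0], argv[1:]
    if "mpileup" not in rest:
        # No mpileup token: all arguments belong to dirty.
        return program, rest, []
    i = rest.index("mpileup")
    return program, rest[:i], rest[i + 1:]


argv = ("./dirty.g++ -chunkSize 200000 -outfiles ASDF -doMaf 2 -nThreads 10 "
        "mpileup -g -r 21:1-20000000").split()
program, dirty_args, samtools_args = split_at_mpileup(argv)
print(program)        # the program name
print(dirty_args)     # arguments consumed by dirty
print(samtools_args)  # arguments handed on to SAMtools
```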

From version 0.25, we can get the nucleotide counts for every site, for every sample. This is done by omitting the -g parameter.

Using tglf files

cd into the teststuff subfolder

../dirty.g++ -nThreads 1 -tglf lct.list -posfile lct.pos -nLines 100000 -outfiles GG -doMaf 15 -SNP_pval 1

If we want to estimate the SFS

../dirty.g++ -nThreads 5 -tglf lct.list -posfile lct.pos -nLines 100000 -outfiles KK -realSFS 1

Using soapfiles

../dirty.g++ -soap tsk.sub10.list -doMaf 2 -outfiles NEW10 -chunkSize 1000 -nLines 10000 -nThreads 4

-soap is a file list containing the SOAP files. Each SOAP file must be sorted by chromosome name (lexical ordering) and then by position.
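The sort requirement can be checked before running the program. The following is a minimal sketch, assuming the chromosome name and position have already been extracted from each line of a SOAP file as `(chromosome, position)` pairs; which columns hold them is not stated here, so the parsing step is left out. The function name `first_unsorted` is hypothetical.

```python
def first_unsorted(records):
    """Return the index of the first record that breaks the order, or None.

    records: iterable of (chromosome_name, position) pairs.
    Chromosome names are compared as plain strings (lexical ordering),
    and positions are compared as integers within a chromosome.
    """
    previous = None
    for index, (chrom, pos) in enumerate(records):
        key = (chrom, int(pos))
        if previous is not None and key < previous:
            return index
        previous = key
    return None


# Lexical ordering puts "chr10" before "chr2".
good = [("chr1", 5), ("chr10", 7), ("chr2", 3)]
bad = [("chr1", 5), ("chr2", 3), ("chr10", 7)]
print(first_unsorted(good))
print(first_unsorted(bad))
```

Note that lexical ordering differs from numeric chromosome ordering, so a file sorted as chr1, chr2, ..., chr10 would fail this check.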

Using simulated files

These are .glf.gz files generated by simnextgen in the misc subfolder. NB: remember to supply the -nInd argument, since the number of individuals can't be inferred from the binary file.

./dirty.g++ -sim1 misc/small.glf.gz -nInd 15 -outfiles results -doMaf 2
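Why -nInd cannot be inferred can be seen with a small sketch. It assumes, purely for illustration, that the uncompressed file is a headerless array holding 10 genotype likelihoods per individual per site, each stored as an 8-byte double; this layout is an assumption and is not stated on this page. Under that assumption, the same file size is consistent with several different numbers of individuals.

```python
# Assumed layout, for illustration only (not confirmed by this page):
VALUES_PER_INDIVIDUAL = 10  # genotype likelihoods per individual per site
BYTES_PER_VALUE = 8         # each value stored as a double


def sites_in_file(uncompressed_bytes, n_ind):
    """Number of sites implied by a file size for a given number of individuals."""
    record_bytes = n_ind * VALUES_PER_INDIVIDUAL * BYTES_PER_VALUE
    if uncompressed_bytes % record_bytes != 0:
        raise ValueError("file size is not a whole number of sites for this n_ind")
    return uncompressed_bytes // record_bytes


size = 120000  # hypothetical uncompressed size in bytes
print(sites_in_file(size, 15))  # one consistent reading
print(sites_in_file(size, 10))  # another, equally consistent reading
```

Both calls succeed, so the file size alone does not determine the number of individuals; the value has to come from the command line.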