ANGSD: Analysis of next generation Sequencing Data
Latest tar.gz version is (0.938/0.939 on github), see Change_log for changes, and download it here.
Misc example
Information on this page is deprecated and should be removed DRAGON.
Run Examples
Using glfv3
../dirty -samglf ceu.glf.list -outfiles test.glf -doMaf 2 -fai numSort.Fai -nLines 50000 -chunkSize 500 -nThreads 16
Using mpileup
./dirty.g++ -chunkSize 200000 -outfiles ASDF -doMaf 2 -nThreads 10 mpileup -g -r 21:1-20000000 -I ~/sample/*.chr21 >bcfoutput
The first token is the program name, followed by the arguments for dirty, then the word mpileup and the arguments that are passed directly to SAMtools.
From version 0.25 onward, the nucleotide counts for every site and every sample can be obtained by omitting the -g parameter.
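The split described above can be pictured with a small shell sketch. The helper split_at_mpileup is hypothetical and is not part of dirty or SAMtools; it only labels each argument by the side of the mpileup keyword it falls on, and it makes no claim about how dirty parses its command line internally.

```shell
# Hypothetical helper (an illustration only, not part of dirty or SAMtools):
# label every argument by which program would receive it.
split_at_mpileup() {
    side=dirty
    for arg in "$@"; do
        # The first literal "mpileup" switches sides and is not printed itself.
        if [ "$side" = dirty ] && [ "$arg" = mpileup ]; then
            side=samtools
            continue
        fi
        printf '%s %s\n' "$side" "$arg"
    done
}

# Arguments taken from the mpileup example above (input files omitted).
split_at_mpileup -chunkSize 200000 -outfiles ASDF -doMaf 2 -nThreads 10 \
    mpileup -g -r 21:1-20000000
```

Everything printed with the samtools label is what would be handed on unchanged.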
Using tglf files
cd into the teststuff subfolder
../dirty.g++ -nThreads 1 -tglf lct.list -posfile lct.pos -nLines 100000 -outfiles GG -doMaf 15 -SNP_pval 1
To estimate the site frequency spectrum (SFS):
../dirty.g++ -nThreads 5 -tglf lct.list -posfile lct.pos -nLines 100000 -outfiles KK -realSFS 1
Using soapfiles
../dirty.g++ -soap tsk.sub10.list -doMaf 2 -outfiles NEW10 -chunkSize 1000 -nLines 10000 -nThreads 4
-soap takes a file list containing the paths of the soapfiles. Each soapfile must be sorted by chromosome name (lexical ordering) and then by position.
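A minimal sketch of how a soapfile could be brought into that order, and checked, with standard sort. It assumes tab-separated records with the chromosome name in column 8 and the position in column 9; that column layout is an assumption about the SOAP alignment output and should be adjusted to the files at hand. The function names are hypothetical.

```shell
# Assumption: tab-separated records, chromosome in column 8, position in column 9.
TAB=$(printf '\t')

# Print the file sorted by chromosome name (byte-wise, i.e. lexical, ordering
# under LC_ALL=C) and then numerically by position.
sort_soap() {
    LC_ALL=C sort -s -t "$TAB" -k8,8 -k9,9n "$1"
}

# Succeed (exit status 0) only if the file is already in that order.
is_soap_sorted() {
    LC_ALL=C sort -c -s -t "$TAB" -k8,8 -k9,9n "$1" 2>/dev/null
}
```

For example, `sort_soap sample.soap > sample.sorted.soap` (file names are placeholders) writes a sorted copy. Note that lexical ordering places chr10 before chr2.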
Using simulated files
These are .glf.gz files generated by simnextgen in the misc subfolder. NB: remember to supply the -nInd argument, since the number of individuals can't be inferred from the binary file.
./dirty.g++ -sim1 misc/small.glf.gz -nInd 15 -outfiles results -doMaf 2