IBSrelate: Difference between revisions

From software
Jump to navigation Jump to search
No edit summary
Line 16: Line 16:
The shell commands given below are available in a text file here: https://github.com/rwaples/freqfree_suppl/blob/master/example_data.sh .
The shell commands given below are available in a text file here: https://github.com/rwaples/freqfree_suppl/blob/master/example_data.sh .


They are available in a Jupyter notebook here: https://nbviewer.jupyter.org/github/rwaples/freqfree_suppl/blob/master/example_data.ipynb .
They are available in a Jupyter notebook here (slightly older version): https://nbviewer.jupyter.org/github/rwaples/freqfree_suppl/blob/master/example_data.ipynb .


== Setup ==
== Setup ==
Line 154: Line 154:
cat all.filelist
cat all.filelist
</pre>
</pre>


=Citation=
=Citation=

Revision as of 09:27, 24 September 2019

This page contains information about the method IBSrelate, a method to identify pairs of related individuals without requiring population allele frequencies.

On this page we will show you how to estimate the R0, R1 and KING-robust kinship statistics for a pair (or more!) of individuals from aligned sequencing data. These statistics are informative about relatedness, but can also be useful for quality-control (QC). For further details, please see our paper in Molecular Ecology at: https://doi.org/10.1111/mec.14954


Calculating statistics from the output of IBS and realSFS

IBS and realSFS are two methods implemented in ANGSD [1] that can be used to estimate the allele sharing genotype distribution for a pair of individuals. The paper describes and examines the differences between the two methods, but we expect they both will perform comparably well for most applications. Below are links to two R scripts that can be used to load the output of IBS and realSFS and produce estimates of R0, R1 and KING-robust kinship.

https://github.com/rwaples/freqfree_suppl/blob/master/read_IBS.R

https://github.com/rwaples/freqfree_suppl/blob/master/read_realSFS.R


Application to the ANGSD example data

The shell commands given below are available in a text file here: https://github.com/rwaples/freqfree_suppl/blob/master/example_data.sh .

They are available in a Jupyter notebook here (slightly older version): https://nbviewer.jupyter.org/github/rwaples/freqfree_suppl/blob/master/example_data.ipynb .

Setup

You will need installations and both ANGSD and samtools, as well as Rscript (part of R).

Set up shell variables

# set paths to the analysis programs,
# may need to be replaced your local installation(s)
ANGSD="$HOME/programs/angsd/angsd"
realSFS="$HOME/programs/angsd/misc/realSFS"
IBS="$HOME/programs/angsd/misc/ibs"
SAMTOOLS="samtools"

Get the example data

The example data set has small bam files from ten individuals.
# download the example data
wget http://popgen.dk/software/download/angsd/bams.tar.gz

# unzip/untar and index the bam files
tar xf bams.tar.gz
for i in bams/*.bam;do samtools index $i;done


realSFS method

Setup

# make a directory for the results
mkdir results_realsfs

# get the R script to parse the realSFS output
wget https://raw.githubusercontent.com/rwaples/freqfree_suppl/master/read_realSFS.R

Specify an allele at each site

For the realSFS method, one of the alleles at each site must be specified. Here we will use an ancestral state file.

# download and index the ancestral state fasta file
wget http://popgen.dk/software/download/angsd/hg19ancNoChr.fa.gz
$SAMTOOLS faidx hg19ancNoChr.fa.gz

Generate a saf (site allele frequency likelihood) file for each individual

# make a separate bam filelist for each individual
# also create a SAMPLES array for use below
BAMS=./bams/*.bam
SAMPLES=()
for b in $BAMS; do
  # parse out the sample name
  base="$(basename -- $b)"
  sample="${base%%.mapped.*}"
  SAMPLES+=("$sample")
  echo $sample
  echo $b > ${sample}.filelist.ind
done

# run doSAF on each individual
for s in "${SAMPLES[@]}"; do
  $ANGSD -b ${s}.filelist.ind \
  -anc hg19ancNoChr.fa.gz \
  -minMapQ 30 -minQ 20 -GL 2 \
  -doSaf 1 -doDepth 1 -doCounts 1 \
  -out ${s}
done

run realSFS on each pair of indiviudals

Here we have 10 individuals, and want to consider each pair just once. We index the SAMPLES array created above.

for i in {0..9}; do
  for j in {0..9}; do
    if (( i < j)); then
      sample1=${SAMPLES[i]}
      sample2=${SAMPLES[j]}
      $realSFS ${sample1}.saf.idx ${sample2}.saf.idx > ./results_realsfs/${sample1}_${sample2}.2dsfs
    fi
  done
done

Parse the results for a single pair of individuals

Below shows how to use the read_realSFS() function in R to parse the output 2dsfs file generated by realSFS.

Rscript \
  -e "source('./read_realSFS.R')" \
  -e "res = read_realSFS('results_realsfs/smallNA06985_smallNA11830.2dsfs')" \
  -e "res['sample1'] = 'smallNA06985'; res['sample2'] = 'smallNA11830'" \
  -e "print(res[,c('sample1', 'sample2', 'nSites', 'Kin', 'R0', 'R1') ])"


IBS Method

Setup

# make a directory for the results
mkdir results_IBS

# get the R script to parse the IBS output
wget https://raw.githubusercontent.com/rwaples/freqfree_suppl/master/read_IBS.R

## make a bam filelists containing all individuals
ls bams/*.bam > all.filelist


make a genotype likelihood file (glf) containing all individuals

$ANGSD -b all.filelist \
  -minMapQ 30 -minQ 20 -GL 2 \
  -doGlf 1 \
  -out example

run IBS, this will analyse each pair of individuals

$IBS -glf example.glf.gz \
  -model 0 \
  -nInd 10 -allpairs 1 \
  -outFileName results_IBS/ibs.model0.results

Examine the results

Rscript \
  -e "source('./read_IBS.R')" \
  -e "res = do_derived_stats(read_ibspair_model0('results_IBS/ibs.model0.results.ibspair'))" \
  -e "print(res[6,c('ind1', 'ind2', 'nSites', 'Kin', 'R0', 'R1') ])"

# the IBS method in ANGSD indexes individuals as they appear in the filelist
# (zero-indexed)
cat all.filelist

Citation

Waples, R. K., Albrechtsen, A. and Moltke, I. (2018), Allele frequency‐free inference of close familial relationships from genotypes or low depth sequencing data. Mol Ecol. doi:10.1111/mec.14954


Bibtex

@article{doi:10.1111/mec.14954,
author = {Waples, Ryan K and Albrechtsen, Anders and Moltke, Ida},
title = {Allele frequency-free inference of close familial relationships from genotypes or low depth sequencing data},
journal = {Molecular Ecology},
volume = {0},
number = {ja},
pages = {},
doi = {10.1111/mec.14954},
url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.14954},
eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1111/mec.14954},
}