This repository contains Python code for automatically fitting admixture graphs (with qpGraph), using a heuristic algorithm to iteratively fit increasingly complex models, and R code for calculating Bayes factors (with admixturegraph) to compare the fitted models.
The heuristic search algorithm was first described in the paper "The evolutionary history of dogs in the Americas". The code was subsequently refactored into a standalone tool, which includes Bayes factor calculations, for the paper "Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion".
Given an outgroup with which to root the graph, a stepwise addition order algorithm is used to add leaf nodes to the graph. At each step, insertion of a new node is tested at all branches of the graph, except the outgroup branch. Where a node cannot be inserted without producing f4 outliers (i.e. |Z| ≥ 3), all possible admixture combinations are also attempted. If a node cannot be inserted via either approach, that sub-graph is discarded. If the node is successfully inserted, the remaining nodes are recursively inserted into that graph. All possible starting node orders are attempted to ensure maximal coverage of the graph space.
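The recursion described above can be illustrated with a small, tree-only sketch. It is not the qpBrute implementation: the admixture step is omitted, the `fits` callback is a hypothetical stand-in for running qpGraph and checking for f4 outliers, and all function names are made up for illustration. The tree holds only the ingroup populations as nested tuples, so the outgroup branch is implicitly never an insertion target.

```python
from itertools import permutations


def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one branch of `tree`."""
    # Attach on the branch directly above this subtree.
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        # Attach somewhere inside the left or the right subtree.
        for sub in insertions(left, leaf):
            yield (sub, right)
        for sub in insertions(right, leaf):
            yield (left, sub)


def canonical(tree):
    """Sort children so that mirror-image trees compare equal."""
    if isinstance(tree, tuple):
        a, b = sorted((canonical(t) for t in tree), key=repr)
        return (a, b)
    return tree


def search(tree, remaining, fits):
    """Recursively insert the remaining leaves, discarding trees that do not fit."""
    if not remaining:
        yield tree
        return
    leaf, rest = remaining[0], remaining[1:]
    for candidate in insertions(tree, leaf):
        # Hypothetical check; qpBrute would also try admixture before discarding.
        if fits(candidate):
            yield from search(candidate, rest, fits)


def brute_force(pops, fits=lambda tree: True):
    """Try every starting order and collect the distinct trees that fit."""
    found = set()
    for order in permutations(pops):
        for tree in search(order[0], order[1:], fits):
            found.add(canonical(tree))
    return found
```

With the default `fits` that accepts everything, `brute_force(["A", "B", "C", "X"])` enumerates every rooted topology of the four ingroup populations, which shows why a permissive fit criterion makes the search expensive.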
The resulting list of fitted graphs is then passed to the MCMC algorithm implemented in the admixturegraph R package, to compute the marginal likelihood of each model and the Bayes factors (BF) between them.
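A Bayes factor between two models is the ratio of their marginal likelihoods, so on the log scale it is a simple difference. The sketch below assumes the log marginal likelihoods have already been estimated. It does not reproduce the admixturegraph API, the function name is hypothetical, and the numbers are made up purely for illustration.

```python
from itertools import permutations


def log_bayes_factors(log_ml):
    """Return ln(BF) for every ordered pair of models.

    `log_ml` maps a model name to its log marginal likelihood.
    A positive value for (a, b) means the data favour model a over model b.
    """
    return {(a, b): log_ml[a] - log_ml[b] for a, b in permutations(log_ml, 2)}


# Made-up log marginal likelihoods for three hypothetical fitted graphs.
example = {"graph1": -1204.0, "graph2": -1207.0, "graph3": -1215.5}

log_bf = log_bayes_factors(example)

# The model with the highest marginal likelihood is favoured over all others.
best = max(example, key=example.get)
```

Working with log values avoids numerical underflow, since the marginal likelihoods themselves are usually far too small to represent directly.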
If you reuse any of this code then please cite the papers:
Leathlobhair, M.N.*, Perri, A.R.*, Irving-Pease, E.K.*, Witt, K.E.*, Linderholm, A.*, [...], Murchison, E.P., Larson, G., Frantz, L.A.F., 2018. The evolutionary history of dogs in the Americas. Science 361, 81–85. https://doi.org/10.1126/science.aao4776
Liu, L., Bosse, M., Megens, H.-J., Frantz, L.A.F., Lee, Y.-L., Irving-Pease, E.K., Narayan, G., Groenen, M.A.M., Madsen, O., 2019. Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion. Nature Communications 10, 1992. https://doi.org/10.1038/s41467-019-10017-2
To use this software you will need to install various dependencies.
The easiest way to install qpBrute and all the dependencies is via the conda package manager.
To install miniconda3 for MacOSX:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh
or for Linux:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh
If you already have conda installed, you may need to update it to the latest version before installing qpBrute:
conda update -n base -c defaults conda
Then install qpBrute and all its dependencies in one step:
conda env create --name qpbrute --file https://raw.githubusercontent.com/ekirving/qpbrute/master/environment.yaml
And lastly, activate the new environment:
conda activate qpbrute
Alternatively, you can install all the dependencies manually via pip and CRAN.
Python ≥ 3.6 and pip:
pip install git+https://github.com/ekirving/qpbrute.git
The full list of Python modules installed in the project environment can be found in the requirements.txt file.
R ≥ 3.4 with the following packages:
install.packages(c("admixturegraph", "coda", "data.table", "ggplot2", "gtools", "raster", "reshape2", "scales", "stringr", "viridis"))
# install_github() is provided by the devtools package, which must already be installed
devtools::install_github("sbfnk/fitR")
Note: the build location of the AdmixTools binary files needs to be added to your path.
echo 'export PATH="/path/to/AdmixTools/bin:$PATH"' >> ~/.bash_profile
Then reload your bash profile:
source ~/.bash_profile
Note: The size of the graph space grows super-exponentially with each additional population, so the maximum number of populations supported by qpBrute in a full search is 7. However, you can use the --no-admix and --qpgraph parameters to reduce the size of the search space and add many more populations in an iterative fashion.
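To give a feel for this growth, the sketch below counts only rooted binary trees, using the standard double-factorial formula (2n - 3)!! for n leaves. This is an illustration rather than qpBrute's own accounting: admixture graphs add many more possibilities on top of each tree, so the count is a lower bound on the graph space. The function name is hypothetical.

```python
def rooted_tree_count(n_leaves):
    """Number of rooted binary tree topologies with `n_leaves` labelled leaves.

    Uses the double factorial (2n - 3)!! = 3 * 5 * ... * (2n - 3).
    """
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


# Tree counts for 2 to 8 leaves; admixture events are not counted here.
counts = {n: rooted_tree_count(n) for n in range(2, 9)}

for n, count in counts.items():
    print(f"{n} leaves: {count} rooted trees")
```

Eight leaves already give 135,135 distinct trees before any admixture is considered, which is why restricting the search with a known skeleton model helps so much.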
The pipeline is broken into two steps: fitting graphs with qpBrute, then comparing the fitted models with qpBayes.
qpBrute \
--par test/sim1.par \
--prefix sim1 \
--pops A B C X \
--out Out
Sometimes you already have a base model to which you just want to add extra populations. In that case, use --pops to specify only the new populations and --qpgraph to supply the existing model.
qpBrute \
--par test/sim1.par \
--prefix sim1 \
--pops Y Z \
--out Out \
--qpgraph path/to/model
You can also use the --no-admix flag to create a skeleton tree containing populations you know are not admixed, and use this model as input with the --qpgraph parameter. This allows you to create large models with many more populations than can be fully explored via a brute force approach.
qpBayes \
--geno test/sim1.geno \
--ind test/sim1.ind \
--snp test/sim1.snp \
--prefix sim1 \
--pops A B C X \
--out Out
Evan K. Irving-Pease, PalaeoBARN, University of Oxford
This project is licensed under the MIT License; see the LICENSE.md file for details.