An IgBLAST wrapper and parser
PyIR is a high-speed wrapper with minimal dependencies for the IgBLAST immunoglobulin and T-cell analyzer. It achieves its speed by splitting the input data set into chunks and running single-core IgBLAST instances on them in parallel, which makes better use of modern multi-core and hyperthreaded processors.
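The chunk-and-parallelize strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PyIR's actual implementation: `read_fasta`, `chunk_records` and `run_parallel` are hypothetical helpers, and the `worker` argument stands in for a function that would write its chunk to a temporary FASTA file and run single-core IgBLAST on it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# A FASTA record as (header, sequence); a hypothetical representation for this sketch.
Record = Tuple[str, str]


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records: List[Record] = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def chunk_records(records: List[Record], n_chunks: int) -> List[List[Record]]:
    """Split records into at most n_chunks contiguous chunks of near-equal size."""
    if not records:
        return []
    size = -(-len(records) // max(1, n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def run_parallel(records: List[Record],
                 worker: Callable[[List[Record]], list],
                 n_workers: int = 4) -> list:
    """Run `worker` on each chunk concurrently and concatenate results in input order."""
    chunks = chunk_records(records, n_workers)
    # Threads suffice in this sketch because a real worker would mostly wait on an
    # external IgBLAST process, and each external process can run on its own core.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(worker, chunks))  # map() preserves chunk order
    return [item for part in results for item in part]
```

With a toy worker such as `lambda chunk: [(h, len(s)) for h, s in chunk]`, `run_parallel` returns one result per sequence in the original input order, which is the property a real IgBLAST-backed worker would also need to preserve.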
PyIR has become an essential part of the Vanderbilt Vaccine Center workflow, and the requirements of the past few years have led to the development of new features, including:
- Refactored parsing algorithm
- AIRR naming compliance
- Updated IgBLAST binary
- Multiple output formats (including Python dictionary)
- Built-in sequence filtering
- Simplified command-line interface
Requirements:
- Linux
- Python 3
- pip >= 10.0.1 and the following Python package: tqdm
- Any requirements for IgBLAST (including glibc >= 2.14)
- wget, gawk
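The requirements listed above can be checked from Python before installing. This is a convenience sketch, not part of PyIR: `check_requirements` and `parse_version` are hypothetical names, glibc detection relies on the standard library's `platform.libc_ver()`, and the script itself assumes Python 3.8 or later (for `importlib.metadata`).

```python
import importlib.metadata
import importlib.util
import platform
import re
import shutil
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.17' into (2, 17); trailing non-numeric parts such as 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def check_requirements() -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if platform.system() != "Linux":
        problems.append(f"PyIR targets Linux, found {platform.system()}")

    # IgBLAST needs glibc >= 2.14; libc_ver() returns ('', '') if glibc is not found.
    libc, libc_version = platform.libc_ver()
    if libc != "glibc":
        problems.append("glibc not detected (IgBLAST requires glibc >= 2.14)")
    elif parse_version(libc_version) < (2, 14):
        problems.append(f"glibc {libc_version} found, IgBLAST requires >= 2.14")

    # pip must belong to the interpreter running this check.
    try:
        pip_version = importlib.metadata.version("pip")
        if parse_version(pip_version) < (10, 0, 1):
            problems.append(f"pip {pip_version} found, PyIR requires >= 10.0.1")
    except importlib.metadata.PackageNotFoundError:
        problems.append("pip is not installed for this Python interpreter")

    if importlib.util.find_spec("tqdm") is None:
        problems.append("Python package tqdm is missing")

    for tool in ("wget", "gawk"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Running `print(check_requirements())` with the same `python3` you plan to install PyIR into prints an empty list when every check passes.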
PyIR is installed with the pip package manager but is not currently part of the PyPI index. It can be downloaded and installed manually as follows:
Download this repository by selecting "Download ZIP" from the "Clone or download" menu at the top right of this GitHub page, or use git from the command line:
git clone https://github.com/crowelab/PyIR
cd PyIR/
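sudo pip3 install .
Alternatively, to install for the current user only (no sudo required), run this instead of the sudo command:
pip3 install --user .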
Locate the local bin directory that contains the pyir executable and add it to your PATH variable; ~/.local/bin and /usr/local/bin are good places to start. If you use scl or other virtual environments (such as conda), account for them when searching your directories.
Double-check that you have met all prerequisites for IgBLAST, including glibc >= 2.14 (which has caused issues on CentOS 6). If unsure
Ensure that the pip used to install pyir-plus belongs to the same Python version you are running. Virtual environments can also cause this mismatch.
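The PATH and pip/Python mismatches described above can be diagnosed from the interpreter you intend to use. This is a hypothetical helper, not part of PyIR; it assumes a Linux `pip3 install --user` layout, where scripts land in the `bin` directory under `site.getuserbase()` (typically `~/.local/bin`).

```python
import os
import shutil
import site
import subprocess
import sys
from typing import Dict, Optional


def path_contains(directory: str, path_value: Optional[str] = None) -> bool:
    """True if `directory` (after resolving ~ and symlinks) is an entry of PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.realpath(os.path.expanduser(directory))
    return any(os.path.realpath(os.path.expanduser(entry)) == target
               for entry in path_value.split(os.pathsep) if entry)


def diagnose() -> Dict[str, object]:
    """Report which Python is running, which pip belongs to it, and where pyir is."""
    user_bin = os.path.join(site.getuserbase(), "bin")  # target of `pip3 install --user`
    pip = subprocess.run([sys.executable, "-m", "pip", "--version"],
                         capture_output=True, text=True)
    return {
        "python": sys.executable,
        "pip_for_this_python": pip.stdout.strip() or pip.stderr.strip(),
        "pyir_on_path": shutil.which("pyir"),  # None if pyir is not on PATH
        "user_bin": user_bin,
        "user_bin_on_path": path_contains(user_bin),
    }
```

If `pyir_on_path` is `None` while `user_bin_on_path` is `False`, adding `user_bin` to PATH is the likely fix; if the pip line names a different Python than `python`, reinstall with `python3 -m pip` instead of a bare `pip3`.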
PyIR-Plus requires a set of BLAST germline databases to assign the VDJ germlines.
A snapshot of the IMGT/GENE-DB human immunome repertoire is included with PyIR-Plus, but we recommend that users build their own databases to keep up with the newest germline definitions. Full instructions are available from NCBI, or you can use PyIR's setup script to build the databases automatically:
#Builds databases in pyir library directory
pyir setup
#Builds databases in specified path
pyir setup -o path/
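To confirm that a setup run produced usable databases, a directory can be scanned for BLAST nucleotide databases. This sketch assumes that `pyir setup` writes standard `makeblastdb` output, where each nucleotide database is a set of files sharing one prefix, including `.nhr`, `.nin` and `.nsq`; `find_blast_dbs` and `db_prefixes` are hypothetical names, and databases split into numbered volumes are not handled.

```python
import os
from typing import Iterable, List

# Files every makeblastdb nucleotide database has (assumption stated in the lead-in).
NUCLEOTIDE_DB_EXTS = (".nhr", ".nin", ".nsq")


def db_prefixes(filenames: Iterable[str]) -> List[str]:
    """Return prefixes for which all required nucleotide database files are present."""
    names = set(filenames)
    prefixes = set()
    for name in names:
        if name.endswith(".nsq"):
            prefix = name[: -len(".nsq")]
            if all(prefix + ext in names for ext in NUCLEOTIDE_DB_EXTS):
                prefixes.add(prefix)
    return sorted(prefixes)


def find_blast_dbs(root: str) -> List[str]:
    """Walk `root` and return the full path prefix of every complete database found."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        found.extend(os.path.join(dirpath, p) for p in db_prefixes(filenames))
    return sorted(found)
```

For example, `print(find_blast_dbs("path/"))` after `pyir setup -o path/` should list one prefix per germline database; an empty list suggests the setup did not complete.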
#Default PyIR
pyir example.fasta
#PyIR with filtering
pyir example.fasta --enable_filter
#PyIR with custom BLAST database
pyir example.fasta -d [path_to_DB]
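When many FASTA files need processing, the same command lines can be driven from a script. The sketch below uses only the arguments shown above (`--enable_filter` and `-d`); `build_pyir_command` and `run_batch` are hypothetical helpers, and it assumes the `pyir` executable is on your PATH.

```python
import subprocess
from typing import Iterable, List, Optional


def build_pyir_command(fasta: str,
                       enable_filter: bool = False,
                       database: Optional[str] = None) -> List[str]:
    """Assemble a pyir argument list from the documented command-line options."""
    cmd = ["pyir", fasta]
    if enable_filter:
        cmd.append("--enable_filter")
    if database is not None:
        cmd += ["-d", database]
    return cmd


def run_batch(fastas: Iterable[str], **options) -> None:
    """Run pyir on each file in turn; check=True stops at the first failing run."""
    for fasta in fastas:
        subprocess.run(build_pyir_command(fasta, **options), check=True)
```

For example, `run_batch(["sample1.fasta", "sample2.fasta"], enable_filter=True)` (hypothetical file names) is equivalent to typing the filtered command once per file. Passing an argument list rather than a shell string avoids quoting problems with unusual file names.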
import PyIR.factory as factory
FILE = 'example.fasta'
pyir = factory.PyIR(query=FILE, args={'outfmt': 'dict'})
result = pyir.run()
#Prints result as Python dictionary
print(result)
pyirfile = factory.PyIR(query=FILE)
result = pyirfile.run()
#Prints the output file
print(result)
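Because PyIR follows AIRR naming, a dictionary result can be summarized with AIRR field names such as `v_call`. The sketch below assumes that `result` maps each sequence ID to a dictionary of AIRR fields; that layout is an assumption, so inspect your own `result` before relying on it. `v_gene_usage` is a hypothetical helper that counts the first V call per sequence with the allele suffix (for example `*01`) removed.

```python
from collections import Counter
from typing import Any, Mapping


def v_gene_usage(result: Mapping[str, Mapping[str, Any]]) -> Counter:
    """Count V genes across sequences, using the first call and dropping the allele."""
    counts: Counter = Counter()
    for record in result.values():  # assumed layout: {sequence_id: {airr_field: value}}
        v_call = record.get("v_call") or ""
        if not v_call:
            continue  # skip sequences without a V assignment
        first_call = v_call.split(",")[0].strip()  # ambiguous calls are comma-separated
        counts[first_call.split("*")[0]] += 1
    return counts
```

Applied to the dictionary returned by the `'outfmt': 'dict'` example, `v_gene_usage(result).most_common(10)` would list the ten most frequent V genes, provided the assumed layout holds.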
Test files used for the BMC Bioinformatics paper can be found at: https://clonomatch.accre.vanderbilt.edu/pyirfiles/