PyIR

Files pertaining to the manuscript High frequency of shared clonotypes in human B cell receptor repertoires

Immunoglobulin and T-Cell receptor rearrangement software

A Python wrapper for IgBLAST that scales to allow for the parallel processing of millions of reads on shared-memory computers. All output is stored in a convenient JSON format.

Requires

Python 3.6
pip version 10.0.1 or greater (for Python 3.6)
macOS or Linux
wget, which is installed by default on many Linux distributions and is available for macOS through the Homebrew package manager
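
The requirements above can be checked before installing. The following is a minimal sketch, not part of PyIR: the function name `check_requirements` is hypothetical, it treats Python 3.6 as a minimum version (an assumption), and it does not check the pip version.

```python
import platform
import shutil
import sys


def check_requirements():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Assumption: Python 3.6 is treated as the minimum supported version.
    if sys.version_info < (3, 6):
        problems.append("Python 3.6 or newer is required, found " + platform.python_version())
    # platform.system() reports "Darwin" on macOS and "Linux" on Linux.
    if platform.system() not in ("Darwin", "Linux"):
        problems.append("Unsupported operating system: " + platform.system())
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("wget") is None:
        problems.append("wget was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```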

Installation

Download the repository

This repository can be downloaded by selecting "Download ZIP" from the "Clone and Download" menu at the top right of this GitHub page, or by using git from the command line.

If you have git installed, you can use the following command to place a copy of the GitHub repo in your current directory.

git clone https://github.com/crowelab/PyIR

Global Installation

Ensure that pip is associated with the correct version of Python. If pip --version reports that it is associated with Python 2, use pip3 explicitly instead of pip.

cd PyIR/
sudo pip install .

Local Installation

Ensure that pip is associated with the correct version of Python. If pip --version reports that it is associated with Python 2, use pip3 explicitly instead of pip.

cd PyIR/
pip install --user .

If installing locally, confirm that Python's local bin directory is in your PATH. If it is not, you can add the following to your ~/.bashrc

export PY_USER_BIN=$(python -c 'import site; print(site.USER_BASE + "/bin")')
export PATH=$PY_USER_BIN:$PATH
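
To confirm that the user bin directory is actually on your PATH, a check along these lines may help. This is a minimal sketch, not part of PyIR: the function name `user_bin_on_path` is hypothetical, and it uses `site.getuserbase()`, which returns the same directory as `site.USER_BASE` in the export line above.

```python
import os
import site


def user_bin_on_path(path_value=None):
    """Return (user_bin, is_on_path) for the interpreter running this code."""
    # Same location the export line computes: <user base>/bin
    user_bin = os.path.join(site.getuserbase(), "bin")
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize entries so trailing slashes do not cause false negatives.
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return user_bin, os.path.normpath(user_bin) in entries


if __name__ == "__main__":
    directory, found = user_bin_on_path()
    print(directory, "is on PATH" if found else "is NOT on PATH")
```

Run it with the same interpreter that pip installs into (for example python3), since the user base directory can differ between Python versions.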

Building the Database

A shell script is included in this repository that builds the databases and checks that your installation is functioning properly. Run the included "SetupGermlineLibrary.sh" script to build the germline library and load test data. If the setup is successful, a gzipped JSON file is created containing the output of PyIR for the setup script's test case.

bash SetupGermlineLibrary.sh
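
To inspect a gzipped JSON output file such as the one the setup script produces, something like the following could be used. This is a sketch under stated assumptions: the helper name `read_pyir_output` is hypothetical, the path you pass in is whatever file was created on your system, and the file is assumed to hold one JSON record per line (which may not be true when --pretty is used).

```python
import gzip
import json


def read_pyir_output(path):
    """Yield one parsed record per non-empty line of a gzipped JSON file.

    Assumption: each line of the file is a complete JSON object.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `sum(1 for _ in read_pyir_output(path))` counts the records in the file without loading them all into memory.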

Usage

usage: pyir [-h] -d DATABASE [-r {Ig,TCR}] [-s {human,mouse}]
            [-nV NUM_V_ALIGNMENTS] [-nD NUM_D_ALIGNMENTS]
            [-nJ NUM_J_ALIGNMENTS] [-mD MIND] [-cz CHUNK_SIZE] [-x EXECUTABLE]
            [-m MULTI] [-o inputfile.json.gz] [--debug]
            [--additional_field ADDITIONAL_FIELD] [-f json] [--pretty]
            [--silent]
            query.fasta

A Python wrapper for IgBLAST that scales to allow for the parallel processing
of millions of reads on shared memory computers. All output is stored in a
convenient JSON format. Authors - Andre Branchizio, Jordan Willis, Jessica
Finn

optional arguments:
  -h, --help            show this help message and exit

Necessary Arguments:
  Arguments that must be included

  query.fasta           The fasta or fastq file to be run through the protocol

File paths and types:
  Database paths, search types

  -d DATABASE, --database DATABASE
                        Path to your blast database directory
  -r {Ig,TCR}, --receptor {Ig,TCR}
                        The receptor you are analyzing, immunoglobulin or t
                        cell receptor
  -s {human,mouse}, --species {human,mouse}
                        The Species you are analyzing
  -cz CHUNK_SIZE, --chunk_size CHUNK_SIZE
                        How many sequences to work on at once. The higher the
                        number the more memory needed. If none specified chunk
                        size will be determined based on input file size

BLAST Specific Arguments:
  Arguments Specific to IgBlast

  -nV NUM_V_ALIGNMENTS, --num_V_alignments NUM_V_ALIGNMENTS
                        How many V genes do you want to match?
  -nD NUM_D_ALIGNMENTS, --num_D_alignments NUM_D_ALIGNMENTS
                        How many D genes do you want to match?, does not apply
                        for kappa and lambda
  -nJ NUM_J_ALIGNMENTS, --num_J_alignments NUM_J_ALIGNMENTS
                        How many J genes do you want to match?
  -mD MIND, --minD MIND
                        The amount of nucleotide matches needed for a D gene
                        match. >= 5 right now
  -x EXECUTABLE, --executable EXECUTABLE
                        The location of IGBlastn binary, the default location
                        is determined based on the OS and uses the igblast
                        binaries included in this application.

General Arguments:
  Output and Miscellaneous Arguments

  -m MULTI, --multi MULTI
                        Multiprocess by the amount of CPUs you have. Or you
                        can enter a number or type 0 to turn it off
  -o inputfile.json.gz, --out inputfile.json.gz
                        Output_file_name, defaults to inputfile.json.gz
  --debug               Debug mode, this will not delete the temporary blast
                        files and will print some other useful things, like
                        which regions did not parse
  --additional_field ADDITIONAL_FIELD
                        A comma key,value pair for an additional field you
                        want to add to the output json. Example '--
                        additional_field=donor,10` adds a donor field with
                        value 10.
  -f json, --out-format json
                        Output file format, only json currently supported
  --pretty              Pretty json output
  --silent              Silence stdout
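
When pyir is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch: the helper name `build_pyir_command` is hypothetical, it only uses flags that appear in the help text above, and it omits any option you do not pass rather than assuming a default.

```python
import shlex


def build_pyir_command(query, database, receptor=None, species=None,
                       multi=None, out=None, additional_field=None):
    """Assemble an argument list for the pyir command-line tool."""
    command = ["pyir", "-d", database]
    if receptor is not None:
        command += ["-r", receptor]          # Ig or TCR
    if species is not None:
        command += ["-s", species]           # human or mouse
    if multi is not None:
        command += ["-m", str(multi)]        # number of processes, 0 disables
    if out is not None:
        command += ["-o", out]
    if additional_field is not None:
        key, value = additional_field        # e.g. ("donor", 10)
        command.append("--additional_field={},{}".format(key, value))
    command.append(query)                    # the positional FASTA/FASTQ file
    return command


if __name__ == "__main__":
    # Placeholder paths; replace them with your own database and input file.
    cmd = build_pyir_command("query.fasta", "/path/to/database",
                             receptor="Ig", species="human", multi=4)
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list instead of a single string avoids shell quoting problems with file paths.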

Using PyIR as an API

import json
import tempfile

import pyir.factory

# Open the input FASTA file and a temporary file to receive PyIR's output.
input_file = open('1K_Seqs.fasta', 'r')
out_file = tempfile.NamedTemporaryFile(delete=True)
num_procs = 4

# Keys correspond to the long option names shown under Usage.
argument_overrides = {
    'silent': True,
    'database': 'Path/To/Database',
    'query': input_file,
    'out': out_file.name,
    'multi': num_procs
}

py_ir = pyir.factory.PyIr(argument_overrides)
result = py_ir.run()
for line in result:
    # Each line of the result is one JSON-encoded sequence record.
    seq = json.loads(line)
    # Do whatever you need with the resulting sequence
    print(seq['Sequence ID'], seq.get('Top V gene match', 'No match'))

# Release the input file and delete the temporary output file.
input_file.close()
out_file.close()
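
As one example of what to do with the resulting sequences, the loop above could be extended to tally V gene usage. This is a minimal sketch: the helper name `count_top_v_genes` is hypothetical, it relies only on the 'Top V gene match' key used above, and it assumes that the value of that key is a string.

```python
import json
from collections import Counter


def count_top_v_genes(lines):
    """Count how often each top V gene appears in an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)
        # Records without a V gene hit are grouped under 'No match'.
        # Assumption: the value is a string and can be used as a dict key.
        counts[record.get('Top V gene match', 'No match')] += 1
    return counts
```

Calling `count_top_v_genes(py_ir.run())` in place of the for loop would then return a Counter whose `most_common()` method lists the most frequent genes first.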

Files associated with the manuscript High frequency of shared clonotypes in human B cell receptor repertoires

Synthetic clonotype data sets:

Recombinator & additional scripts:

Recombinator & other scripts
