P2Rank

Ligand-binding site prediction based on machine learning.

Description

P2Rank is a stand-alone command line program that predicts ligand-binding pockets from a protein structure. It achieves high prediction success rates without relying on external software for the computation of complex features or on a database of known protein-ligand templates.

Requirements

Java 8 to 15
PyMOL 1.7 (or newer) for viewing visualizations (optional)

The program is tested on Linux, macOS and Windows. On Windows, running the program from a bash console is recommended instead of cmd or PowerShell.

Setup

P2Rank requires no installation. Binary packages are available as GitHub Releases.

Download: https://github.com/rdk/p2rank/releases
Source code: https://github.com/rdk/p2rank
Datasets: https://github.com/rdk/p2rank-datasets

Usage

prank predict -f test_data/1fbl.pdb         # predict pockets on a single pdb file

See more usage examples below...

Algorithm

P2Rank makes predictions by scoring and clustering points on the protein's solvent accessible surface (SAS). The ligandability score of individual points is determined by a machine learning model trained on a dataset of known protein-ligand complexes. For more details, see the slides and publications.
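The scoring-and-clustering idea can be illustrated with a small self-contained sketch. It is not P2Rank's implementation: the point scores are taken as given (in P2Rank they come from the trained model), and the score threshold, the linkage distance and the sum-of-squares pocket score are illustrative assumptions, not P2Rank's actual parameters.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster_ligandable_points(
    points: Sequence[Point],
    scores: Sequence[float],
    score_threshold: float = 0.5,  # hypothetical cutoff, not P2Rank's default
    link_distance: float = 3.0,    # hypothetical linkage distance (angstroms)
) -> List[Tuple[float, List[int]]]:
    """Group high-scoring SAS points into pockets and rank them.

    Returns a list of (pocket_score, point_indices), best pocket first.
    """
    # 1. keep only points whose predicted ligandability passes the threshold
    kept = [i for i, s in enumerate(scores) if s >= score_threshold]

    # 2. single-linkage clustering via union-find: kept points end up in the
    #    same cluster if a chain of neighbours closer than link_distance joins them
    parent = {i: i for i in kept}

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, a in enumerate(kept):
        for b in kept[pos + 1:]:
            if math.dist(points[a], points[b]) <= link_distance:
                parent[find(a)] = find(b)

    clusters: Dict[int, List[int]] = {}
    for i in kept:
        clusters.setdefault(find(i), []).append(i)

    # 3. score each pocket by aggregating its point scores (sum of squares,
    #    an illustrative choice) and rank pockets by that score
    ranked = [(sum(scores[i] ** 2 for i in members), sorted(members))
              for members in clusters.values()]
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

Single linkage is used here because it lets pockets of arbitrary shape grow along the surface, which a centroid-based method would not.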

Presentation slides introducing the original version of the algorithm: Slides (pdf)

Publications

If you use P2Rank, please cite the relevant papers:

Software article in JChem about P2Rank pocket prediction tool
Krivak R, Hoksza D. P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure. Journal of Cheminformatics. 2018 Aug.
Web-server article in NAR about the web interface accessible at prankweb.cz
Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. PrankWeb: a web server for ligand binding site prediction and visualization. Nucleic Acids Research, Volume 47, Issue W1, 02 July 2019, Pages W345-W349
Conference paper introducing the P2Rank prediction algorithm
Krivak R, Hoksza D. P2RANK: Knowledge-Based Ligand Binding Site Prediction Using Aggregated Local Features. In: International Conference on Algorithms for Computational Biology 2015 Aug 4 (pp. 41-52). Springer
Research article in JChem about PRANK rescoring algorithm
Krivak R, Hoksza D. Improving protein-ligand binding site prediction accuracy by classification of inner pocket points using local features. Journal of Cheminformatics. 2015 Dec.

Build from sources

This project uses the Gradle build system via the included Gradle wrapper. On Windows, use bash to execute build commands (bash is installed as part of Git for Windows).

git clone https://github.com/rdk/p2rank.git && cd p2rank
./make.sh

./unit-tests.sh    # optional: run unit tests to check that everything works on your machine
./tests.sh quick   # runs further tests

Now you can run the program via:

distro/prank       # standard mode that logs to distro/log/prank.log
./prank.sh         # development mode that logs to console

Usage Examples

The following commands can be executed in the installation directory.

Print help

prank help

Predict ligand binding sites (P2Rank algorithm)

prank predict test.ds                             # run on a whole dataset (a file containing a list of pdb files)

prank predict -f test_data/1fbl.pdb               # run on single pdb file
prank predict -f test_data/1fbl.pdb.gz            # run on single gzipped pdb file

prank predict -threads 8              test.ds     # specify the number of worker threads for parallel processing
prank predict -o output_here          test.ds     # explicitly specify the output directory
prank predict -c conservation.groovy  test.ds     # specify a configuration file (conservation.groovy
                                                  # uses a different prediction model and parameters)
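A dataset file such as test.ds is a plain-text list of structure files. The helper below is a hypothetical sketch that generates one for a directory of structures; it assumes the simplest format (one path per line, relative to the directory containing the dataset file), so compare the result with the .ds files shipped in test_data/ before relying on it.

```python
import os
from pathlib import Path
from typing import List


def write_dataset_file(structure_dir: str, ds_path: str) -> List[str]:
    """Write a dataset file listing every .pdb and .pdb.gz file in structure_dir.

    Assumed format: one structure path per line, relative to the directory
    that contains the dataset file. Returns the written entries.
    """
    ds_file = Path(ds_path)
    base = ds_file.resolve().parent
    files = sorted(
        p for p in Path(structure_dir).resolve().iterdir()
        if p.is_file() and (p.name.endswith(".pdb") or p.name.endswith(".pdb.gz"))
    )
    # forward slashes keep the file portable between Linux, macOS and Windows
    entries = [Path(os.path.relpath(p, base)).as_posix() for p in files]
    ds_file.write_text("\n".join(entries) + "\n")
    return entries
```

The generated file is then passed to prank predict in the same way as test.ds.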

Evaluate prediction model

...on a file or a dataset with known ligands.

prank eval-predict -f test_data/1fbl.pdb
prank eval-predict test.ds
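Evaluation compares predicted pockets with the known ligands. A criterion commonly used in the binding site prediction literature, including P2Rank's publications, is DCA: a ligand counts as found if the center of one of the top-ranked pockets lies within 4 Å of any ligand atom. The function below is a sketch of such a success rate, not the code behind eval-predict; its input layout is an assumption made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def dca_success_rate(
    proteins: Sequence[Tuple[List[Coord], List[List[Coord]]]],
    cutoff: float = 4.0,
    extra: int = 0,
) -> float:
    """Fraction of ligands found by the DCA criterion.

    Each protein is (ranked_pocket_centers, ligands), where each ligand is a
    list of atom coordinates. For a protein with n ligands, the top n + extra
    pockets are considered ("Top-n" for extra=0, "Top-(n+2)" for extra=2).
    """
    found = total = 0
    for centers, ligands in proteins:
        top = centers[: len(ligands) + extra]
        for atoms in ligands:
            total += 1
            # found if any considered pocket center is close to any ligand atom
            if any(math.dist(c, a) <= cutoff for c in top for a in atoms):
                found += 1
    return found / total if total else 0.0
```

Tying the cutoff to the number of ligands n keeps the metric fair across proteins with different numbers of binding sites.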

Prediction output

For each file in the dataset, P2Rank produces several output files:

<pdb_file_name>_predictions.csv: an ordered list of predicted pockets with their scores, the coordinates of their centers, a list of adjacent residues and a list of adjacent protein surface atoms
<pdb_file_name>_residues.csv: a list of all residues of the input protein with their scores, their mapping to predicted pockets and the calibrated probability of being a ligand-binding residue
PyMOL visualization (.pml script with data files)
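Both CSV files can be loaded with a few lines of Python. This is a sketch: it keys each row by the header names found in the file and strips surrounding whitespace in case the columns are padded for readability. Column names used below (such as name, rank, score, center_x) are assumptions; check the header of the files produced by your version.

```python
import csv
import io
from typing import Dict, List


def read_p2rank_csv(text: str) -> List[Dict[str, str]]:
    """Parse a P2Rank output CSV into dicts keyed by the header names.

    Whitespace around header names and values is stripped, so padded
    columns such as " score" are available under the key "score".
    """
    reader = csv.reader(io.StringIO(text))
    rows = [[cell.strip() for cell in row] for row in reader if row]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]
```

For example, read_p2rank_csv(Path("1fbl.pdb_predictions.csv").read_text()) (hypothetical path inside the output directory) returns the pockets in rank order, so the first row is the top-ranked pocket.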

If the coordinates of the SAS points that belong to predicted pockets are needed, they can be found in visualizations/data/<pdb_file_name>_points.pdb. In that file, the "Residue sequence number" field (columns 23-26) of each HETATM record holds the rank of the pocket the point belongs to (points with value 0 do not belong to any pocket).
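Because PDB records use fixed columns, the points can be grouped by pocket with plain string slicing. The sketch below relies only on the standard PDB layout (residue sequence number in columns 23-26, x/y/z in columns 31-38, 39-46 and 47-54) and on the rank convention described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def points_by_pocket(pdb_text: str) -> Dict[int, List[Coord]]:
    """Group SAS points from a *_points.pdb file by pocket rank.

    Columns are 1-based and inclusive in the PDB specification, hence the
    0-based Python slices below. Points with rank 0 belong to no pocket
    and are skipped.
    """
    pockets: Dict[int, List[Coord]] = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        rank = int(line[22:26])  # "Residue sequence number" = pocket rank
        if rank == 0:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        pockets[rank].append(xyz)
    return dict(pockets)
```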

Configuration

You can override the default params with a custom config file:

prank predict -c config/example.groovy  test.ds
prank predict -c example.groovy         test.ds

It is also possible to override the default params on the command line using their full names.

prank predict                   -seed 151 -threads 8  test.ds   #  set random seed and number of threads, overriding defaults
prank predict -c example.groovy -seed 151 -threads 8  test.ds   #  override defaults as well as values from example.groovy

P2Rank has numerous configurable parameters. To see the list of standard params, look into config/default.groovy and the other example config files in the config directory. For the complete commented list of all params (including undocumented ones), see Params.groovy in the source code.

Rescoring (PRANK algorithm)

In addition to predicting new ligand binding sites, P2Rank is also able to rescore pockets predicted by other methods (Fpocket, ConCavity, SiteHound, MetaPocket2, LISE and DeepSite are supported at the moment).

prank rescore test_data/fpocket.ds
prank rescore fpocket.ds                 # test_data/ is default 'dataset_base_dir'
prank rescore fpocket.ds -o output_dir   # test_output/ is default 'output_base_dir'       
prank eval-rescore fpocket.ds            # evaluate rescoring model
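Rescoring keeps the pockets found by another method but changes their order. Assuming the points inside each pocket have already been scored by a classifier (hypothetical input here), the re-ranking step alone could look like the sketch below; the sum-of-squares aggregation is an illustrative choice, not necessarily the one PRANK uses.

```python
from typing import Dict, List, Sequence


def rescore_pockets(point_scores: Dict[int, Sequence[float]]) -> List[int]:
    """Re-rank pockets found by another tool.

    point_scores maps each pocket's original rank (e.g. its Fpocket rank) to
    the predicted ligandability scores of the points inside it. Returns the
    original ranks in the new order; ties keep their original order.
    """
    def aggregated(rank: int) -> float:
        return sum(s * s for s in point_scores[rank])

    # Python's sort is stable even with reverse=True, so ties stay in input order
    return sorted(sorted(point_scores), key=aggregated, reverse=True)
```

Only the order changes: pocket shapes and members come from the original method, which is what distinguishes rescoring from P2Rank's own prediction mode.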

Comparison with Fpocket

Fpocket is a widely used open-source ligand binding site prediction program. It is fast, easy to use and well documented, and as such it was a great inspiration for this project. Fpocket is written in C and is based on a different geometric algorithm.

Some practical differences:

Fpocket
- has a much smaller memory footprint
- runs faster when executed on a single protein
- produces a high number of less relevant pockets (and since the default scoring function isn't very effective, the most relevant pockets often don't get to the top)
- contains the MDpocket algorithm for pocket prediction from molecular dynamics trajectories
- is still better documented
P2Rank
- achieves significantly higher identification success rates when considering top-ranked pockets
- produces a smaller number of more relevant pockets
- speed:
  - slower when running on a single protein (due to JVM startup cost)
  - approximately as fast on average when running on a large dataset on a single core
  - potentially much faster on multi-core machines due to its parallel implementation
- has a higher memory footprint (~1 GB, but it doesn't grow much with more parallel threads)

Both Fpocket and P2Rank have many configurable parameters that influence behaviour of the algorithm and can be tweaked to achieve better results for particular requirements.

Thanks

This program builds upon software written by other people, either through library dependencies or through code included in its source tree (where no library builds were available). Notably:

FastRandomForest by Fran Supek (https://code.google.com/archive/p/fast-random-forest/)
KDTree by Rednaxela (http://robowiki.net/wiki/User:Rednaxela/kD-Tree)
BioJava (https://github.com/biojava)
Chemistry Development Kit (https://github.com/cdk)
Weka (http://www.cs.waikato.ac.nz/ml/weka/)

Contributing

We welcome any bug reports, enhancement requests, and other contributions. To submit a bug report or enhancement request, please use the GitHub issues tracker. For more substantial contributions, please fork this repo, push your changes to your fork, and submit a pull request with a good commit message.
