PoseX: AI Defeats Physics-based Methods on Protein Ligand Cross-Docking

PoseX is a comprehensive benchmark dataset for evaluating how well molecular docking algorithms predict protein-ligand binding poses. This repository contains the construction process for the Self-Docking and Cross-Docking datasets, as well as the complete evaluation code for the different docking tools.

Online Leaderboard

Dataset Repository

PoseX Self-Docking Result

PoseX Cross-Docking Result

Installation

Install PoseX directly from GitHub to get the latest updates.

git clone https://github.com/CataAI/PoseX.git
cd PoseX

We recommend using mamba to manage the Python environment. For more information on how to install mamba, see Miniforge. Once mamba is installed, we can run the following command to install the basic environment.

mamba create -f environments/base.yaml
mamba activate posex

For a specific molecular docking tool, we can use the corresponding environment file in the environments folder. Take Chai-1 as an example:

pip install -r environments/chai-1.txt

Benchmark Data

Download posex_set.zip and posex_cif.zip from the dataset repository on Hugging Face, then unpack them into a data folder:

mkdir data
unzip posex_set.zip -d data 
unzip posex_cif.zip -d data
mv data/posex_set data/dataset
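
The same unpacking steps can also be scripted in Python, which may help on systems without `unzip`. The following is a minimal sketch using only the standard library. The function name `prepare_data` is illustrative and not part of PoseX, and the sketch assumes that posex_set.zip unpacks to a top-level `posex_set` folder, as the `mv` command above implies.

```python
import shutil
import zipfile
from pathlib import Path


def prepare_data(zip_dir: str = ".", data_dir: str = "data") -> Path:
    """Unpack both archives into data_dir and rename posex_set to dataset."""
    zip_root, data_root = Path(zip_dir), Path(data_dir)
    data_root.mkdir(parents=True, exist_ok=True)

    for name in ("posex_set.zip", "posex_cif.zip"):
        with zipfile.ZipFile(zip_root / name) as archive:
            archive.extractall(data_root)

    # Mirrors `mv data/posex_set data/dataset`; assumes the archive
    # contains a top-level posex_set/ folder.
    target = data_root / "dataset"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(data_root / "posex_set"), str(target))
    return target
```

Calling `prepare_data()` from the repository root, with both archives in the current directory, should leave `data/dataset` and the extracted CIF files in place for the pipeline below.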

For information about creating a dataset from scratch, please refer to the data construction README.

Benchmark Pipeline

This project provides a complete pipeline for:

Generating the CSV files required for the benchmark from the PoseX dataset
Converting docking data into model-specific input formats
Running different docking models or tools
Extracting predicted structures from model outputs
Minimizing the energy of the docking results
Aligning predicted structures to the reference structures
Calculating evaluation metrics using PoseBusters

1. Generate Benchmark CSV Data

Generate benchmark CSV files containing protein sequences, ligand SMILES, and other metadata:

bash ./scripts/generate_docking_benchmark.sh <dataset>

# --------------- Example --------------- #
# Astex
bash ./scripts/generate_docking_benchmark.sh astex
# PoseX Self-Docking
bash ./scripts/generate_docking_benchmark.sh posex_self_dock
# PoseX Cross-Docking
bash ./scripts/generate_docking_benchmark.sh posex_cross_dock
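
To generate the CSV files for several datasets in one go, the calls above can be wrapped in a small loop. This is only a sketch: the helper names are hypothetical, and the dataset list contains just the three names shown in the examples (the script may accept others).

```python
import subprocess

# Dataset names taken from the examples above; the script may accept others.
EXAMPLE_DATASETS = ("astex", "posex_self_dock", "posex_cross_dock")


def generate_commands(datasets=EXAMPLE_DATASETS):
    """Return one argv list per dataset for generate_docking_benchmark.sh."""
    return [
        ["bash", "./scripts/generate_docking_benchmark.sh", name]
        for name in datasets
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in the current directory."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure
```

With the default `dry_run=True`, `run_all(generate_commands())` only prints the commands, which makes it easy to check them before running anything from the repository root.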

2. Convert to Model Inputs

Convert benchmark CSV files to model-specific input formats:

bash ./scripts/convert_to_model_input.sh <dataset> <model_type>

# --------------- Example --------------- #
# PoseX Self-Docking (AlphaFold3)
bash ./scripts/convert_to_model_input.sh posex_self_dock alphafold3

3. Run Docking Models

Run different docking models:

bash ./scripts/run_<model_type>/run_<model_type>.sh <dataset>

# --------------- Example --------------- #
# PoseX Self-Docking (AlphaFold3)
bash ./scripts/run_alphafold3/run_alphafold3.sh posex_self_dock
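
Because the run script path is derived from the model type, a typo in `<model_type>` only surfaces when bash fails to find the file. The sketch below fills in the path template shown above and checks that the script exists before building the command. Both function names are hypothetical helpers, not part of PoseX.

```python
from pathlib import Path


def run_script_path(model_type: str, repo_root: str = ".") -> Path:
    """Fill in the scripts/run_<model_type>/run_<model_type>.sh template."""
    return Path(repo_root) / "scripts" / f"run_{model_type}" / f"run_{model_type}.sh"


def run_command(dataset: str, model_type: str, repo_root: str = ".") -> list[str]:
    """Build the argv for one docking run, failing early if the script is missing."""
    script = run_script_path(model_type, repo_root)
    if not script.is_file():
        raise FileNotFoundError(f"no run script for model type {model_type!r}: {script}")
    return ["bash", str(script), dataset]
```

The returned list can be passed to `subprocess.run(..., check=True)` once the environment for the chosen model is active.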

4. Extract Model Outputs

Extract predicted structures from model outputs:

bash ./scripts/extract_model_output.sh <dataset> <model_type>

# --------------- Example --------------- #
# PoseX Self-Docking (AlphaFold3)
bash ./scripts/extract_model_output.sh posex_self_dock alphafold3

5. Energy Minimization

Run energy minimization on the docking results:

python -m scripts.relax_model_outputs --input_dir <input_dir> --cif_dir data/posex_cif

# --------------- Example --------------- #
# PoseX Self-Docking (AlphaFold3)
python -m scripts.relax_model_outputs --input_dir data/benchmark/posex_self_dock/alphafold3/output --cif_dir data/posex_cif
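
When several models have been run on the same dataset, the relax command can be built once per output folder. The sketch below is hedged on one assumption: that outputs live under `data/benchmark/<dataset>/<model_type>/output`, a layout inferred from the single example above rather than documented. The function name `relax_commands` is hypothetical.

```python
import sys
from pathlib import Path


def relax_commands(dataset: str, benchmark_root: str = "data/benchmark",
                   cif_dir: str = "data/posex_cif") -> list[list[str]]:
    """Build one relax command per model that already has an output folder."""
    # Assumed layout, inferred from the example: <root>/<dataset>/<model_type>/output
    output_dirs = sorted(Path(benchmark_root, dataset).glob("*/output"))
    return [
        [sys.executable, "-m", "scripts.relax_model_outputs",
         "--input_dir", str(path), "--cif_dir", cif_dir]
        for path in output_dirs if path.is_dir()
    ]
```

Each returned command uses the current Python interpreter, so it should be built and run inside the activated environment from the repository root.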

6. Align Predicted Structures

Align predicted structures to the reference structures:

bash ./scripts/complex_structure_alignment.sh <dataset> <model_type> <relax_mode>

# --------------- Example --------------- #
# PoseX Self-Docking (AlphaFold3) (Using Relax)
bash ./scripts/complex_structure_alignment.sh posex_self_dock alphafold3 true

7. Calculate Benchmark Result

Calculate evaluation metrics using PoseBusters:

bash ./scripts/calculate_benchmark_result.sh <dataset> <model_type> <relax_mode>

# --------------- Example --------------- #
# PoseX Self-Docking (AlphaFold3) (Using Relax)
bash ./scripts/calculate_benchmark_result.sh posex_self_dock alphafold3 true

We provide the results here.
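
The seven steps above can be chained for one dataset and one model. The following is a sketch rather than an official driver: `pipeline_commands` and `run_pipeline` are hypothetical names, the relax input folder is generalized from the step 5 example, and `false` as the non-relax value of `<relax_mode>` is an assumption (only `true` appears above).

```python
import subprocess
import sys


def pipeline_commands(dataset: str, model_type: str, relax: bool = True) -> list[list[str]]:
    """Return the commands for steps 1-7 in order, as argv lists."""
    relax_mode = "true" if relax else "false"  # "false" is an assumed value
    # Assumed output location, generalized from the step 5 example.
    output_dir = f"data/benchmark/{dataset}/{model_type}/output"

    steps = [
        ["bash", "./scripts/generate_docking_benchmark.sh", dataset],
        ["bash", "./scripts/convert_to_model_input.sh", dataset, model_type],
        ["bash", f"./scripts/run_{model_type}/run_{model_type}.sh", dataset],
        ["bash", "./scripts/extract_model_output.sh", dataset, model_type],
    ]
    if relax:
        # Step 5 is skipped entirely when relaxation is not requested.
        steps.append([sys.executable, "-m", "scripts.relax_model_outputs",
                      "--input_dir", output_dir, "--cif_dir", "data/posex_cif"])
    steps += [
        ["bash", "./scripts/complex_structure_alignment.sh", dataset, model_type, relax_mode],
        ["bash", "./scripts/calculate_benchmark_result.sh", dataset, model_type, relax_mode],
    ]
    return steps


def run_pipeline(dataset: str, model_type: str, relax: bool = True) -> None:
    """Run every step in the current directory, stopping at the first failure."""
    for argv in pipeline_commands(dataset, model_type, relax):
        subprocess.run(argv, check=True)
```

For example, `run_pipeline("posex_self_dock", "alphafold3")` would reproduce the example commands in order when called from the repository root. Since individual models may need their own environment, the model run step may have to be launched separately in practice.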

Acknowledgements

Method	Pub. Year	License	Code
Physics-based methods
Discovery Studio	late 1990s	Commercial	-
Schrödinger Glide	2004	Commercial	-
MOE	2008	Commercial	-
AutoDock Vina	2010, 2021	Apache-2.0
GNINA	2021	Apache-2.0
AI docking methods
DeepDock	2021	MIT
EquiBind	2022	MIT
TankBind	2022	MIT
DiffDock	2022	MIT
Uni-Mol	2024	MIT
FABind	2023	MIT
DiffDock-L	2024	MIT
DiffDock-Pocket	2024	MIT
DynamicBind	2024	MIT
Interformer	2024	Apache-2.0
SurfDock	2024	MIT
AI co-folding methods
NeuralPLexer	2024	BSD
RoseTTAFold-All-Atom	2023	BSD
AlphaFold3	2024	CC-BY-NC-SA 4.0
Chai-1	2024	Apache-2.0
Boltz-1	2024	MIT
Boltz-1x	2025	MIT
Protenix	2025	Apache-2.0

License

Code: Licensed under the MIT License.
Dataset: Licensed under Creative Commons Attribution 4.0 International (CC-BY 4.0). See PoseX Dataset for details.

Citations

If you are interested in our work or use our data and code, please cite the following article:

@misc{jiang2025posexaidefeatsphysics,
      title={PoseX: AI Defeats Physics Approaches on Protein-Ligand Cross Docking}, 
      author={Yize Jiang and Xinze Li and Yuanyuan Zhang and Jin Han and Youjun Xu and Ayush Pandit and Zaixi Zhang and Mengdi Wang and Mengyang Wang and Chong Liu and Guang Yang and Yejin Choi and Wu-Jun Li and Tianfan Fu and Fang Wu and Junhong Liu},
      year={2025},
      eprint={2505.01700},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.01700}, 
}
