HongxinXiang/EDBench


EDBench: Large-Scale Electron Density Data for Molecular Modeling

Python 3.7+



TODO

  • Release the full ED dataset (>3M molecules)
  • Release model checkpoints
  • Release model code for quantum property prediction task
  • Release model code for retrieval between molecule and ED task
  • Release model code for ED generation task
  • Release benchmark dataset

Note: The full dataset has not been released yet. Because of its size, we are coordinating with data storage providers to host it efficiently. Once this is complete, the full dataset will be made publicly available.

📢 News

  • [2025/09/19] 🎉 The paper was accepted to the NeurIPS 2025 Datasets and Benchmarks Track!

  • [2025/05/13] Uploaded code for the prediction tasks using X-3D and PointVector.

  • [2025/05/10] Repository initialized!


🧪 1. Summary

Most existing molecular machine learning force fields (MLFFs) focus on atom- or molecule-level properties like energy and forces, while overlooking the foundational role of electron density (ED), denoted as $\rho(r)$. According to the Hohenberg–Kohn theorem, ED uniquely determines all ground-state properties of many-body quantum systems. However, ED is expensive to compute via first-principles methods such as Density Functional Theory (DFT), limiting its large-scale use in MLFFs.

EDBench addresses this gap by providing a large-scale, high-quality dataset of electron densities for over 3.3 million molecules, built on the PCQM4Mv2 dataset. To benchmark electronic-scale learning, we introduce a suite of ED-centric tasks covering:

  • Prediction of quantum chemical properties
  • Retrieval across structure and ED modalities
  • Generation of ED from molecular structures

We demonstrate that ML models can learn from ED with high accuracy and can also generate high-quality ED, substantially reducing the need for costly DFT calculations. All data and benchmarks will be made publicly available to support ED-driven research in drug discovery and materials science.

📄 Citation

@inproceedings{xiang2025edbench,
  title         = {EDBench: Large-Scale Electron Density Data for Molecular Modeling},
  author        = {Hongxin Xiang and Ke Li and Mingquan Liu and Zhixiang Cheng and Bin Yao and Wenjie Du and Jun Xia and Li Zeng and Xin Jin and Xiangxiang Zeng},
  booktitle     = {The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year          = {2025},
  url           = {https://openreview.net/forum?id=pAd7qVrYPG}
}

@misc{xiang2025edbenchlargescaleelectrondensity,
  title         = {EDBench: Large-Scale Electron Density Data for Molecular Modeling},
  author        = {Hongxin Xiang and Ke Li and Mingquan Liu and Zhixiang Cheng and Bin Yao and Wenjie Du and Jun Xia and Li Zeng and Xin Jin and Xiangxiang Zeng},
  year          = {2025},
  eprint        = {2505.09262},
  archivePrefix = {arXiv},
  primaryClass  = {physics.chem-ph},
  url           = {https://arxiv.org/abs/2505.09262}
}


🧬 2. EDBench Database

Built on PCQM4Mv2, the EDBench dataset contains accurate DFT-computed EDs for 3.3M+ molecules, enabling deep learning at the electronic scale.

| Component | Description | Link |
|---|---|---|
| 📂 Full data | DFT-computed electron density data for 3.3M+ molecules | [Baidu Drive] |

🧪 3. Benchmark Tasks

We design a suite of benchmark tasks centered on electron density (ED):

🔮 3.1 Prediction Tasks

Predict quantum chemical properties from ED representations.

📂 Directory structure
{benchmark_root}
├── ed_energy_5w
│   ├── raw
│   │   ├── ed_energy_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   │       └── {mol_index}
│   │           ├── Mol1_Dt.cube
│   │           ├── timer.dat
│   │           ├── Mol1.sdf
│   │           ├── Mol1_ESP.cube
│   │           └── {mol_index}_Psi4.out
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
├── ed_homo_lumo_5w
│   ├── raw
│   │   ├── ed_homo_lumo_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
├── ed_multipole_moments_5w
│   ├── raw
│   │   ├── ed_multipole_moments_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
└── ed_open_shell_5w
    ├── raw
    │   ├── ed_open_shell_5w.csv
    │   ├── readme.md
    │   └── psi4_grid0.4_cubes
    └── processed
        └── mol_EDthresh{thresh}_data.pkl
| Dataset | Directory | Link | Description |
|---|---|---|---|
| ED5-EC | ed_energy_5w | Dataverse | 6 energy components (DF-RKS Final Energy, Nuclear Repulsion Energy, One-Electron Energy, Two-Electron Energy, DFT Exchange-Correlation Energy, Total Energy) |
| ED5-OE | ed_homo_lumo_5w | Dataverse | 7 orbital energies (HOMO-2, HOMO-1, HOMO-0, LUMO+0, LUMO+1, LUMO+2, LUMO+3) |
| ED5-MM | ed_multipole_moments_5w | Dataverse | 4 multipole moments (Dipole X, Dipole Y, Dipole Z, Magnitude) |
| ED5-OCS | ed_open_shell_5w | Dataverse | Binary classification of open-/closed-shell systems |
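The raw folders store Psi4 output as Gaussian cube files. A quick sanity check on a downloaded molecule is to parse a density cube and integrate it: since the integral of ρ(r) over space equals the number of electrons N, the Riemann sum over the grid should land near N (only approximately, because a coarse grid under-resolves the sharp core densities and the box truncates the tails). The sketch below assumes that `Mol1_Dt.cube` holds the total electron density in the standard cube layout with positive voxel counts (Bohr units); `parse_cube`, `read_cube` and `electron_count` are hypothetical helpers, not part of the EDBench code.

```python
import numpy as np


def parse_cube(text):
    """Parse the text of a Gaussian cube file.

    Returns (atoms, origin, axes, grid): atoms is a list of
    (atomic_number, [x, y, z]) tuples, origin is a (3,) array, axes is a
    (3, 3) array whose rows are the voxel step vectors, and grid has shape
    (n1, n2, n3) with the last index varying fastest.
    """
    lines = text.splitlines()
    # Line 3: number of atoms followed by the grid origin.
    header = lines[2].split()
    natoms = int(header[0])
    origin = np.array([float(v) for v in header[1:4]])
    # Lines 4-6: voxel count and step vector along each axis.
    # (A negative count would mean Angstrom units; steps are kept as given.)
    counts, axes = [], []
    for line in lines[3:6]:
        parts = line.split()
        counts.append(abs(int(parts[0])))
        axes.append([float(v) for v in parts[1:4]])
    axes = np.array(axes)
    # One line per atom: atomic number, nuclear charge, x, y, z.
    n_atoms = abs(natoms)
    atoms = []
    for line in lines[6:6 + n_atoms]:
        parts = line.split()
        atoms.append((int(parts[0]), [float(v) for v in parts[2:5]]))
    # A negative atom count means one extra line listing orbital indices.
    start = 6 + n_atoms + (1 if natoms < 0 else 0)
    values = [float(v) for line in lines[start:] for v in line.split()]
    grid = np.array(values).reshape(counts)
    return atoms, origin, axes, grid


def read_cube(path):
    """Read a cube file from disk, e.g. .../{mol_index}/Mol1_Dt.cube."""
    with open(path) as f:
        return parse_cube(f.read())


def electron_count(axes, grid):
    """Riemann-sum estimate of N, the integral of rho(r) over the box."""
    voxel_volume = abs(np.linalg.det(axes))
    return float(grid.sum() * voxel_volume)
```

Typical use would be `atoms, origin, axes, rho = read_cube(path)` followed by `electron_count(axes, rho)`, compared against the sum of atomic numbers in `atoms` for a neutral molecule.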

🔍 3.2 Retrieval Task

Cross-modal retrieval between molecular structures (MS) and electron densities (ED).

📂 Directory structure
{benchmark_root}
└── ed_retrieval_5w/
    ├── raw/
    │   ├── ed_retrieval_5w.csv
    │   ├── readme.md
    │   └── psi4_grid0.4_cubes/
    └── processed/
        └── mol_EDthresh{thresh}_data.pkl
| Dataset | Directory | Link | Description |
|---|---|---|---|
| ED5-MER | ed_retrieval_5w | Dataverse | Cross-modal retrieval: MS ↔ ED |

🧬 3.3 Generation Task

Generate ED representations from molecular structures.

📂 Directory structure
{benchmark_root}
└── ed_prediction_5w/
    ├── raw/
    │   ├── ed_prediction_5w.csv
    │   ├── readme.md
    │   └── psi4_grid0.4_cubes/
    └── processed/
        └── mol_EDthresh{thresh}_data.pkl
| Dataset | Directory | Link | Description |
|---|---|---|---|
| ED5-EDP | ed_prediction_5w | Dataverse | Predict ED from molecular structures |
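A metric commonly used in ML electron-density work to score a generated density against the DFT reference is the normalized mean absolute error (NMAE) over grid points. The sketch below assumes the prediction and the reference are sampled on the same grid (for example, the Psi4 grid implied by `psi4_grid0.4_cubes`); whether ED5-EDP is evaluated with exactly this metric is an assumption, and `density_nmae` is a hypothetical helper.

```python
import numpy as np


def density_nmae(rho_pred, rho_ref):
    """Normalized mean absolute error (in %) between two density grids.

    NMAE = 100 * sum|rho_pred - rho_ref| / sum|rho_ref|.
    On a uniform grid the voxel volume multiplies the numerator and the
    denominator alike, so it cancels and is omitted here.
    """
    rho_pred = np.asarray(rho_pred, dtype=float)
    rho_ref = np.asarray(rho_ref, dtype=float)
    if rho_pred.shape != rho_ref.shape:
        raise ValueError("predicted and reference grids must share a shape")
    return 100.0 * np.abs(rho_pred - rho_ref).sum() / np.abs(rho_ref).sum()
```

For instance, a prediction that overestimates a uniform reference by 10% everywhere gives an NMAE of about 10.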

📂 3.4 Dataset File Format

Each raw/ directory includes a .csv summary file describing each molecule.

📌 Common Columns

  • index: Molecule index
  • smiles: Original SMILES
  • canonical_smiles: Canonicalized SMILES
  • scaffold_split: Scaffold-based split (80% train / 10% valid / 10% test)
  • random_split: Random split (80% train / 10% valid / 10% test)

🧾 Task-Specific Columns

  • Prediction:
    • label: Ground-truth values (space-separated if multi-task)
  • Retrieval:
    • negative_index: Space-separated indices of 10 negative samples
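A minimal standard-library sketch for reading one of these CSV files and selecting a split. It assumes the split columns hold the strings `train`, `valid` and `test` and that `index` values are integers; `load_split` is a hypothetical helper, not part of the EDBench code.

```python
import csv
import io


def load_split(csv_file, split_column="scaffold_split", split="train"):
    """Return the rows of an EDBench summary CSV that belong to one split.

    Space-separated `label` values become a list of floats and
    space-separated `negative_index` values a list of ints, when present.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        if row[split_column] != split:
            continue
        parsed = {"index": int(row["index"]), "smiles": row["canonical_smiles"]}
        if "label" in row:  # prediction tasks
            parsed["label"] = [float(x) for x in row["label"].split()]
        if "negative_index" in row:  # retrieval task
            parsed["negative_index"] = [int(x) for x in row["negative_index"].split()]
        rows.append(parsed)
    return rows


if __name__ == "__main__":
    sample = (
        "index,smiles,canonical_smiles,scaffold_split,random_split,label\n"
        "0,C,C,train,test,1.0 2.5\n"
    )
    print(load_split(io.StringIO(sample)))
```

With a real file, pass an open file handle, e.g. `load_split(open("ed_energy_5w.csv", newline=""), "random_split", "test")`.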

🚀 4. Running Benchmarks

⚛️ 4.1 Quantum Prediction Tasks

Description:

This task focuses on predicting quantum properties of molecules from point-cloud representations of their electron density. It is designed to evaluate how accurately models capture molecular electronic behavior and properties.
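This README does not describe the preprocessing behind `mol_EDthresh{thresh}_data.pkl`. One plausible reading of the file name, sketched below purely as an assumption, is that grid points whose density exceeds a threshold are kept as a point cloud of (x, y, z, ρ) rows. `density_to_point_cloud` is a hypothetical helper that takes the origin, axis vectors and grid of a parsed cube file.

```python
import numpy as np


def density_to_point_cloud(origin, axes, grid, thresh):
    """Convert a volumetric density grid into an (N, 4) point cloud.

    Each row is (x, y, z, rho) for a voxel whose density exceeds `thresh`.
    Voxel (i, j, k) sits at origin + i*axes[0] + j*axes[1] + k*axes[2],
    following the Gaussian cube convention.
    """
    grid = np.asarray(grid, dtype=float)
    idx = np.argwhere(grid > thresh)                        # (N, 3) voxel indices
    coords = np.asarray(origin) + idx @ np.asarray(axes)    # (N, 3) Cartesian positions
    rho = grid[idx[:, 0], idx[:, 1], idx[:, 2]]             # (N,) density values
    return np.column_stack([coords, rho])
```

Raising `thresh` keeps fewer, higher-density points near the nuclei and bonds; lowering it keeps more of the diffuse tails at the cost of a larger cloud.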

| Task | Datasets | Code | Checkpoint |
|---|---|---|---|
| Quantum Prediction Tasks | ED5-EC: 6 energy components; ED5-OE: 7 orbital energies; ED5-MM: 4 multipole moments; ED5-OCS: open-/closed-shell classification | Code | 📂 Download |

⚡ 4.2 Retrieval and Generation Tasks

Description:

These tasks have two objectives:

  1. Retrieval between molecular structures and their electron densities.
  2. Prediction of electron densities from molecular conformations.

These tasks assess a model's ability to connect molecular structures with their electron distributions and to generate accurate electron density predictions.
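For retrieval, each query can be scored against its true match plus the 10 negatives listed in `negative_index`. The sketch below assumes both modalities are already embedded into a shared vector space and ranked by cosine similarity; the encoders and the exact metric EDBench reports are not specified here, and `retrieval_hits` is a hypothetical helper.

```python
import numpy as np


def retrieval_hits(query, positive, negatives, ks=(1, 5)):
    """Check whether the true match ranks in the top k among its candidates.

    query:     (d,) embedding of the query in one modality (e.g. a structure).
    positive:  (d,) embedding of its true match in the other modality (its ED).
    negatives: (n, d) embeddings of the negatives (e.g. the 10 in negative_index).
    Returns a dict {k: bool} for each k in ks.
    """
    query = np.asarray(query, dtype=float)
    candidates = np.vstack([np.asarray(positive, dtype=float)[None, :],
                            np.asarray(negatives, dtype=float)])
    # Cosine similarity between the query and every candidate.
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    # 0-based rank of the positive: how many negatives score strictly higher.
    rank = int((scores[1:] > scores[0]).sum())
    return {k: rank < k for k in ks}
```

Averaging these booleans over all queries gives hit@k accuracy. Ties are resolved in the positive's favor here, which is a design choice rather than something the benchmark prescribes.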

| Task | Description | Code | Checkpoint |
|---|---|---|---|
| Retrieval and Generation Tasks | ED5-MER: Retrieval between molecular structures and ED; ED5-EDP: Electron density prediction | Code | 📂 Download |

📬 Contact

Feel free to open an issue or pull request for questions or contributions. For academic inquiries, please contact the authors.


📘 License

Released for research use under the open-source MIT License.
