- News
- 1. Summary
- 2. EDBench Database
- 3. Benchmark Tasks
- 4. Running Benchmarks
- Contact
- License
- Release the full ED dataset (>3M)
- Release model checkpoints
- Release model code for quantum property prediction task
- Release model code for retrieval between molecule and ED task
- Release model code for ED generation task
- Release benchmark dataset
Note: The full dataset has not been released yet. Due to the increased data volume, we are currently coordinating with data storage providers to handle and host the large-scale data efficiently. Once this is completed, the full dataset will be made publicly available.
- [2025/09/19] Paper was accepted by NeurIPS 2025!
- [2025/05/13] Uploaded code for the prediction tasks with X-3D and PointVector.
- [2025/05/10] Repository initialized!
Most existing molecular machine learning force fields (MLFFs) focus on atom- or molecule-level properties like energy and forces, while overlooking the foundational role of electron density (ED), denoted as
EDBench addresses this gap by providing a large-scale, high-quality dataset of electron densities for over 3.3 million molecules, based on the PCQM4Mv2 standard. To benchmark electronic-scale learning, we introduce a suite of ED-centric tasks covering:
- Prediction of quantum chemical properties
- Retrieval across structure and ED modalities
- Generation of ED from molecular structures
We demonstrate that ML models can learn from ED with high accuracy and also generate high-quality ED, dramatically reducing DFT costs. All data and benchmarks will be made publicly available to support ED-driven research in drug discovery and materials science.
Citation

@inproceedings{xiang2025edbench,
  title = {EDBench: Large-Scale Electron Density Data for Molecular Modeling},
  author = {Hongxin Xiang and Ke Li and Mingquan Liu and Zhixiang Cheng and Bin Yao and Wenjie Du and Jun Xia and Li Zeng and Xin Jin and Xiangxiang Zeng},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year = {2025},
  url = {https://openreview.net/forum?id=pAd7qVrYPG}
}

@misc{xiang2025edbenchlargescaleelectrondensity,
  title = {EDBench: Large-Scale Electron Density Data for Molecular Modeling},
  author = {Hongxin Xiang and Ke Li and Mingquan Liu and Zhixiang Cheng and Bin Yao and Wenjie Du and Jun Xia and Li Zeng and Xin Jin and Xiangxiang Zeng},
  year = {2025},
  eprint = {2505.09262},
  archivePrefix = {arXiv},
  primaryClass = {physics.chem-ph},
  url = {https://arxiv.org/abs/2505.09262}
}
Built on PCQM4Mv2, the EDBench dataset contains accurate DFT-computed EDs for 3.3M+ molecules, enabling deep learning at the electronic scale.
| Component | Description | Link |
|---|---|---|
| Full data | DFT-computed electron density data for 3.3M+ molecules | [Baidu Drive] |
We design a suite of benchmark tasks centered on electron density (ED):
Predict quantum chemical properties from ED representations.
Directory structure:
{benchmark_root}
├── ed_energy_5w
│   ├── raw
│   │   ├── ed_energy_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   │       └── {mol_index}
│   │           ├── Mol1_Dt.cube
│   │           ├── timer.dat
│   │           ├── Mol1.sdf
│   │           ├── Mol1_ESP.cube
│   │           └── {mol_index}_Psi4.out
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
├── ed_homo_lumo_5w
│   ├── raw
│   │   ├── ed_homo_lumo_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
├── ed_multipole_moments_5w
│   ├── raw
│   │   ├── ed_multipole_moments_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes
│   └── processed
│       └── mol_EDthresh{thresh}_data.pkl
└── ed_open_shell_5w
    ├── raw
    │   ├── ed_open_shell_5w.csv
    │   ├── readme.md
    │   └── psi4_grid0.4_cubes
    └── processed
        └── mol_EDthresh{thresh}_data.pkl

| Dataset | Dir Name | Link | Description |
|---|---|---|---|
| ED5-EC | ed_energy_5w | Dataverse | 6 energy components (DF-RKS Final Energy, Nuclear Repulsion Energy, One-Electron Energy, Two-Electron Energy, DFT Exchange-Correlation Energy, Total Energy) |
| ED5-OE | ed_homo_lumo_5w | Dataverse | 7 orbital energies (HOMO-2, HOMO-1, HOMO-0, LUMO+0, LUMO+1, LUMO+2, LUMO+3) |
| ED5-MM | ed_multipole_moments_5w | Dataverse | 4 multipole moments (Dipole X, Dipole Y, Dipole Z, Magnitude) |
| ED5-OCS | ed_open_shell_5w | Dataverse | Binary classification of open-/closed-shell systems |
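The raw directories store densities as `.cube` files. As a rough illustration of how such a file could be read, here is a minimal sketch that assumes the standard Gaussian cube layout (two comment lines, an atom-count/origin line, three axis lines, one line per atom, then the volumetric values with the z index varying fastest). The helper names `parse_cube` and `value_at` are hypothetical, the sample text is synthetic, and the actual EDBench files may differ in details such as units.

```python
# Minimal sketch of a Gaussian-style cube reader (assumed layout, not the
# repository's own loader). The sample below is synthetic, not EDBench data.
SAMPLE_CUBE = "\n".join([
    "synthetic cube for illustration",
    "values are made up",
    "1 0.0 0.0 0.0",        # atom count, grid origin (x, y, z)
    "2 0.4 0.0 0.0",        # points along axis 1 and its step vector
    "2 0.0 0.4 0.0",        # points along axis 2 and its step vector
    "2 0.0 0.0 0.4",        # points along axis 3 and its step vector
    "1 0.0 0.2 0.2 0.2",    # atomic number, charge, atom position
    "0.10 0.20 0.30 0.40 0.50 0.60",
    "0.70 0.80",
])


def parse_cube(text):
    """Parse cube text into origin, grid shape, axis vectors, atoms, and values."""
    lines = text.splitlines()
    head = lines[2].split()
    n_atoms = int(head[0])
    if n_atoms < 0:
        raise ValueError("orbital-style cubes (negative atom count) are not handled")
    origin = tuple(float(v) for v in head[1:4])
    shape, axes = [], []
    for line in lines[3:6]:
        parts = line.split()
        shape.append(int(parts[0]))
        axes.append(tuple(float(v) for v in parts[1:4]))
    atoms = []
    for line in lines[6:6 + n_atoms]:
        parts = line.split()
        atoms.append((int(parts[0]), tuple(float(v) for v in parts[2:5])))
    values = [float(v) for line in lines[6 + n_atoms:] for v in line.split()]
    if len(values) != shape[0] * shape[1] * shape[2]:
        raise ValueError("value count does not match the grid shape")
    return {"origin": origin, "shape": tuple(shape), "axes": axes,
            "atoms": atoms, "values": values}


def value_at(cube, ix, iy, iz):
    """Look up one grid value; the z index varies fastest in the flat list."""
    _, ny, nz = cube["shape"]
    return cube["values"][(ix * ny + iy) * nz + iz]


cube = parse_cube(SAMPLE_CUBE)
print(cube["shape"], len(cube["values"]))
```

For a real file, the text would come from something like `open(path).read()`, with `path` pointing at a `Mol1_Dt.cube` file under `psi4_grid0.4_cubes/{mol_index}`.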
Cross-modal retrieval between molecular structures (MS) and electron densities (ED).
Directory structure:
{benchmark_root}
├── ed_retrieval_5w/
│   ├── raw/
│   │   ├── ed_retrieval_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes/
│   └── processed/
│       └── mol_EDthresh{thresh}_data.pkl

| Dataset | Dir Name | Link | Description |
|---|---|---|---|
| ED5-MER | ed_retrieval_5w | Dataverse | Cross-modal retrieval: MS ↔ ED |
Generate ED representations from molecular structures.
Directory structure:
{benchmark_root}
├── ed_prediction_5w/
│   ├── raw/
│   │   ├── ed_prediction_5w.csv
│   │   ├── readme.md
│   │   └── psi4_grid0.4_cubes/
│   └── processed/
│       └── mol_EDthresh{thresh}_data.pkl

| Dataset | Dir Name | Link | Description |
|---|---|---|---|
| ED5-EDP | ed_prediction_5w | Dataverse | Predict ED from molecular structures |
Each raw/ directory includes a .csv summary file describing each molecule.
- index: Molecule index
- smiles: Original SMILES
- canonical_smiles: Canonicalized SMILES
- scaffold_split: Scaffold-based split (80% train / 10% valid / 10% test)
- random_split: Random split (80% train / 10% valid / 10% test)
- Prediction:
  - label: Ground-truth values (space-separated if multi-task)
- Retrieval:
  - negative_index: Space-separated indices of 10 negative samples
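The column layout above could be consumed roughly as follows. This is a sketch only: the sample rows are made up, the split values `train`/`valid`/`test` are an assumption about how the split columns are encoded, and `load_by_split` is a hypothetical helper rather than part of the released code. Real files carry either `label` (prediction) or `negative_index` (retrieval); the sample includes both columns only to exercise both parsers.

```python
import csv
import io

# Made-up rows that only mimic the documented columns; not real EDBench data.
# The split values "train"/"valid"/"test" are an assumed encoding.
SAMPLE_CSV = """index,smiles,canonical_smiles,scaffold_split,random_split,label,negative_index
0,C,C,train,test,0.1 0.2 0.3,1 2
1,CC,CC,valid,train,0.4 0.5 0.6,0 2
2,OCC,CCO,test,train,0.7 0.8 0.9,0 1
"""


def load_by_split(handle, split_column="scaffold_split"):
    """Group CSV rows by split name and parse the space-separated fields."""
    splits = {}
    for row in csv.DictReader(handle):
        if row.get("label"):
            # Multi-task labels are space-separated numbers.
            row["label"] = [float(v) for v in row["label"].split()]
        if row.get("negative_index"):
            # Negative samples are space-separated molecule indices.
            row["negative_index"] = [int(v) for v in row["negative_index"].split()]
        splits.setdefault(row[split_column], []).append(row)
    return splits


splits = load_by_split(io.StringIO(SAMPLE_CSV))
print({name: len(rows) for name, rows in splits.items()})
```

Passing `split_column="random_split"` groups the same rows by the random split instead.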
Description:
This task focuses on predicting quantum properties of molecules using point-cloud representations of electronic density. It is designed to evaluate how accurately models can capture molecular electronic behavior and properties.
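One plausible way to obtain a point cloud from a density grid is to keep only the grid points whose density reaches a threshold. The sketch below illustrates that idea under stated assumptions: an axis-aligned grid with uniform spacing, values stored with the z index varying fastest, and a threshold semantics guessed from the `mol_EDthresh{thresh}_data.pkl` file name. It is not the repository's preprocessing code, and `grid_to_point_cloud` is a hypothetical name.

```python
def grid_to_point_cloud(values, shape, origin, spacing, thresh):
    """Return (x, y, z, density) tuples for grid points with density >= thresh.

    Assumes an axis-aligned grid with uniform spacing and a flat value list
    in which the z index varies fastest (an assumption, see the text above).
    """
    nx, ny, nz = shape
    points = []
    for ix in range(nx):
        for iy in range(ny):
            for iz in range(nz):
                density = values[(ix * ny + iy) * nz + iz]
                if density >= thresh:
                    points.append((
                        origin[0] + ix * spacing,
                        origin[1] + iy * spacing,
                        origin[2] + iz * spacing,
                        density,
                    ))
    return points


# Synthetic 2x2x2 grid; the numbers are made up for illustration.
demo_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
demo_points = grid_to_point_cloud(
    demo_values, shape=(2, 2, 2), origin=(0.0, 0.0, 0.0), spacing=0.4, thresh=0.5
)
print(len(demo_points))
```

A higher threshold yields fewer points concentrated in high-density regions, so the threshold trades point-cloud size against coverage of the low-density tails.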
| Task | Datasets | Code | Checkpoint |
|---|---|---|---|
| Quantum Prediction Tasks | ED5-EC: 6 energy components; ED5-OE: 7 orbital energies; ED5-MM: 4 multipole moments; ED5-OCS: open-/closed-shell classification | Code | Download |
Description:
This task involves two objectives:
- Retrieval between molecular structures and their electronic densities.
- Predicting electronic densities from molecular conformations.
These tasks assess the model's ability to connect molecular structures with their electron distributions and to generate accurate electronic density predictions.
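For the retrieval objective, one common way to score a model is to rank the true match against a set of negative candidates and report top-k accuracy. The sketch below assumes that setup; the choice of metric, the helper names, and the similarity scores are illustrative assumptions, not the benchmark's documented evaluation protocol.

```python
def positive_rank(positive_score, negative_scores):
    """1-based rank of the positive candidate; ties count against the positive."""
    return 1 + sum(1 for score in negative_scores if score >= positive_score)


def top_k_accuracy(queries, k):
    """Fraction of queries whose positive candidate ranks within the top k."""
    hits = sum(1 for pos, negs in queries if positive_rank(pos, negs) <= k)
    return hits / len(queries)


# Made-up similarity scores: each query has one positive and 10 negatives.
demo_queries = [
    (0.9, [0.1] * 10),               # positive ranks first
    (0.5, [0.6, 0.7] + [0.1] * 8),   # positive ranks third
]
print(top_k_accuracy(demo_queries, k=1), top_k_accuracy(demo_queries, k=3))
```

Counting ties against the positive is a conservative choice: a model that outputs the same score for every candidate gets no credit.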
| Task | Description | Code | Checkpoint |
|---|---|---|---|
| Retrieval and Generation Tasks | ED5-MER: Retrieval between molecular structures and ED; ED5-EDP: Electronic density prediction | Code | Download |
Feel free to open an issue or pull request for questions or contributions. For academic inquiries, contact the authors upon paper publication.
Released for research use under an open-source MIT license.