Generative modeling for molecular design and discovery
Generative models have gained widespread attention in recent years due to their inverse design capabilities and their potential to accelerate the molecular design and discovery processes. This Collection includes manuscripts published by Nature Computational Science that apply and develop generative modeling tools for small molecule design and discovery. The Collection features both primary research articles and non-primary content and will be updated as new content is published. Content appears in reverse chronological order.
A method that combines conformal prediction with molecular docking is developed to efficiently screen multi-billion-scale libraries, enabling the discovery of a dual-target ligand that modulates the A2A adenosine and D2 dopamine receptors.
A physics-based training pipeline is developed to help tackle the challenges of data scarcity. The framework aligns large language models to a physically consistent initial state that is fine-tuned for learning polymer properties.
This work applies diffusion models to conditional molecule generation and shows how they can be used to tackle various structure-based drug design problems.
DeepBlock is a deep learning framework for ligand generation, inspired by the DNA-encoded compound library technique, that enhances ligand design with building blocks and a rule-based reconstruction algorithm, achieving better drug properties.
The downselection of compounds for synthesis is a key challenge in molecular design cycles that typically relies on expert chemist intuition. Fromer and Coley propose a cost-aware method to automatically select compounds and synthetic routes.
A method is developed for the directional optimization of multiple properties without prior knowledge of their nature. Using a large ligand dataset, diverse metal complexes are found along the Pareto front of vast chemical spaces.
An optimization algorithm is used to discover guest molecules based on knowing only the structure of the host. The molecules are represented as 3D volumes, optimized to improve host–guest interaction and converted into SMILES using a transformer model.
A diffusion model that generates chemical reactions in 3D with all desired symmetries preserved is established and shown to reduce transition state search from days to seconds and complement intuition-based reaction exploration with generative AI.
SurfGen is a structure-based drug design approach that uses topological and geometric deep learning techniques for interaction learning, echoing the classical lock-and-key model.
GaUDI is a guided diffusion method for the design of molecular structures that features a flexible and scalable target function and that achieves high validity of generated molecules.
A generative deep learning model of molecular structure is combined with supervised deep learning models of molecular properties to achieve high-throughput (multi-)property-driven design of organic molecules.
The Absolut! framework can generate synthetic three-dimensional antibody–antigen structures to assist machine learning and dataset construction for antibody design. Most importantly, the relative machine learning performance learnt on Absolut! datasets is shown to transfer to experimental datasets.
The application of machine learning techniques to small-molecule drug discovery has not yet yielded a true leap forward in the field. This Perspective discusses how a renewed focus on data and validation could help unlock machine learning’s potential.
Autoencoders are versatile tools for molecular informatics with the opportunity for advancing molecule and drug design. In this Review, the authors highlight the active areas of development in the field and explore the challenges that need to be addressed moving forward.
As artificial intelligence (AI) proliferates, synthetic chemistry stands to benefit from its progress. Despite hidden variables and ‘unknown unknowns’ in datasets that may impede the realization of a digital twin for the laboratory flask, there are many opportunities to leverage AI and large datasets to advance synthesis science.