Sesame: Structure-Aware Molecular Generation via Spatial Density-Map Conditioning
Pith reviewed 2026-06-26 08:54 UTC · model grok-4.3
The pith
Sesame conditions diffusion-based molecular generation on spatial density maps of partial structures and protein pockets to enable both de novo design and lead optimization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a novel spatial pairformer module, placed inside a diffusion framework, can condition on spatial density maps of partial molecular structure and protein pockets, and that this single mechanism supports both de novo generation and fragment-conditioned lead optimization. Two further contributions accompany it: joint denoising of atom types, bond types, and positions, and trajectory finetuning on the model's own sampling rollouts.
What carries the argument
The spatial pairformer module that processes continuous spatial density maps to condition the diffusion model on molecular and protein environment information.
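As a rough illustration of what a continuous spatial density map can look like, the sketch below sums one isotropic Gaussian per atom on a regular 3D grid. The abstract does not say how Sesame builds its maps, so the function name `render_density`, the Gaussian kernel, the grid extent, and the width `sigma` are all assumptions made for this example.

```python
import numpy as np

def render_density(coords, grid_min, grid_max, n_points=16, sigma=1.0):
    """Sum of isotropic Gaussians centred on each atom, sampled on a cubic grid.

    Hypothetical helper: the kernel and grid layout are assumptions,
    not the paper's stated construction.
    """
    axis = np.linspace(grid_min, grid_max, n_points)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)        # (n, n, n, 3)
    diff = grid[..., None, :] - coords            # (n, n, n, A, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)          # (n, n, n, A)
    return np.exp(-sq_dist / (2.0 * sigma ** 2)).sum(axis=-1)

# Two atoms 1.5 units apart along x (illustrative coordinates only).
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
density = render_density(coords, -4.0, 4.0, n_points=17, sigma=1.0)
```

The result is a plain array of shape `(17, 17, 17)`. A conditioning module would read a field of this kind rather than an explicit atom list.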
If this is right
- The same conditioning supports both de novo generation and fragment-conditioned lead optimization.
- Joint denoising produces consistent outputs across atom types, bond types, and positions.
- Trajectory finetuning on the model's own rollouts raises generation quality.
- Training on combined ligand-only and protein-ligand datasets broadens applicability to structure-based tasks.
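The joint-denoising point above concerns three coupled variables: continuous positions and two sets of discrete labels. A minimal sketch of the forward (noising) half of such a process follows, assuming Gaussian noise on positions and uniform resampling of categorical labels. The abstract does not describe Sesame's noise schedule or corruption kernels, so `noise_molecule`, `alpha_bar`, and the uniform-resampling rule are stand-ins, not the authors' method. The denoising network itself is not shown.

```python
import numpy as np

def noise_molecule(positions, atom_types, bond_types, alpha_bar,
                   n_atom_classes, n_bond_classes, rng):
    """Corrupt positions, atom types, and bond types at one shared noise level.

    Assumed scheme: Gaussian noise for coordinates; each discrete label is
    kept with probability alpha_bar and otherwise resampled uniformly.
    """
    eps = rng.standard_normal(positions.shape)
    noisy_pos = np.sqrt(alpha_bar) * positions + np.sqrt(1.0 - alpha_bar) * eps

    resample_atoms = rng.random(atom_types.shape) > alpha_bar
    random_atoms = rng.integers(0, n_atom_classes, size=atom_types.shape)
    noisy_atoms = np.where(resample_atoms, random_atoms, atom_types)

    # Corrupt only the upper triangle, then mirror it so bonds stay symmetric.
    n = atom_types.shape[0]
    iu = np.triu_indices(n, k=1)
    upper = bond_types[iu]
    resample_bonds = rng.random(upper.shape) > alpha_bar
    random_bonds = rng.integers(0, n_bond_classes, size=upper.shape)
    noisy_bonds = np.zeros_like(bond_types)
    noisy_bonds[iu] = np.where(resample_bonds, random_bonds, upper)
    noisy_bonds = noisy_bonds + noisy_bonds.T
    return noisy_pos, noisy_atoms, noisy_bonds
```

Using one `alpha_bar` for all three variables is the simplest way to keep them at a matched corruption level. A real model could use separate schedules per variable.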
Where Pith is reading between the lines
- Density-map conditioning might extend to ligands for targets other than proteins, such as nucleic acids.
- The approach could reduce reliance on explicit coordinate or interaction modeling in future generators.
- Generated molecules could be iteratively re-encoded as density maps for multi-step optimization loops.
Load-bearing premise
Expressing partial molecular structure and protein pockets as continuous spatial density maps supplies enough information for the model to generate chemically valid and productive molecules without explicit atom-level terms.
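One way to make this premise concrete is to ask when two neighbouring atoms remain distinguishable after smoothing. For two equal Gaussians of width `sigma` whose centres are `d` apart, the summed profile along the line joining them has two peaks only when `d > 2 * sigma`. The sketch below checks this numerically. The Gaussian kernel and the specific widths are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def count_peaks_1d(separation, sigma, n=4001):
    """Count local maxima of two equal Gaussians placed `separation` apart."""
    half_span = separation + 4.0 * sigma
    x = np.linspace(-half_span, half_span, n)
    f = (np.exp(-(x + separation / 2.0) ** 2 / (2.0 * sigma ** 2))
         + np.exp(-(x - separation / 2.0) ** 2 / (2.0 * sigma ** 2)))
    interior = f[1:-1]
    is_peak = (interior > f[:-2]) & (interior > f[2:])
    return int(is_peak.sum())

# Separation of 1.5 (roughly a carbon-carbon single bond, in angstroms).
peaks_narrow = count_peaks_1d(1.5, sigma=0.5)   # d > 2 * sigma
peaks_wide = count_peaks_1d(1.5, sigma=1.0)     # d < 2 * sigma
```

With the narrow kernel the two atoms show up as separate maxima. With the wide kernel they merge into one, so any model reading that map would have to infer the pair from the blob's shape rather than from distinct peaks.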
What would settle it
Running Sesame on density maps from known high-affinity protein-ligand complexes and measuring whether generated molecules recover correct binding poses or chemical validity at rates above baseline models without the maps.
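A comparison of this kind comes down to two proportions with uncertainty attached. The sketch below computes a Wilson score interval for a success rate such as chemical validity or pose recovery. The helper name and the example counts are placeholders invented for illustration. They are not results from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials
                         + z * z / (4.0 * trials * trials)) / denom
    return centre - half, centre + half

# Placeholder counts only: valid molecules out of generated molecules.
with_maps = wilson_interval(successes=87, trials=100)
without_maps = wilson_interval(successes=81, trials=100)
intervals_overlap = with_maps[0] <= without_maps[1] and without_maps[0] <= with_maps[1]
```

If the two intervals overlap, as they do for these placeholder counts, the sample is too small to claim that conditioning on the maps helps.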
Original abstract
Generative molecular models for drug design are a promising direction with much active research. In the next phase of computational drug design, such models will need to understand small molecule structure and protein-ligand interactions, and they will need to possess the machinery to generate molecules de novo. Incorporating each feature poses a critical challenge. Equally important, yet often treated as secondary, is the ability to grow a molecule from a partial starting point -- a scaffold or fragment supplied by a chemist -- which is the central operation of lead optimization. We present Sesame (Spatial Evoformer for a Structure-Aware Molecular Engine), a diffusion-based molecular generation model that leverages a novel spatial pairformer module to condition on partial molecular structure and the surrounding protein pocket, both expressed as continuous spatial density maps. This single conditioning mechanism supports both de novo generation and fragment-conditioned lead optimization, letting a medicinal chemist prune a hit to a scaffold and have Sesame grow it in productive ways. In addition to this module, we also introduce a diffusion framework for joint denoising of atom types, bond types, and positions, along with a trajectory finetuning scheme that trains on the model's own sampling rollouts to improve generation quality. Sesame is trained on a large corpus of ligand-only and protein-ligand datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce Sesame, a diffusion-based molecular generation model using a novel spatial pairformer module that conditions on continuous spatial density maps of partial molecular structures and protein pockets. This single mechanism is said to enable both de novo generation and fragment-conditioned lead optimization. Additional contributions include a joint diffusion process for atom types, bond types, and positions, plus a trajectory finetuning scheme trained on the model's own rollouts; the model is trained on ligand-only and protein-ligand datasets.
Significance. If the density-map conditioning and joint diffusion framework prove effective at producing chemically valid and productive molecules, the work could offer a unified structure-aware approach for drug design tasks, particularly by supporting scaffold growth in lead optimization without separate models for de novo versus conditioned regimes. The trajectory finetuning on sampling rollouts is a positive methodological choice that could improve practical generation quality.
major comments (2)
- [Abstract] Abstract: the central claim that continuous spatial density maps of partial ligands and pockets, processed via the spatial pairformer, suffice for the joint diffusion process to yield valid molecules in both de novo and scaffold-growing modes rests on an unverified assumption that smoothed fields preserve the discrete geometric and interaction details (exact distances, atom-type pairings, steric constraints) needed for chemical validity; no auxiliary atom-level terms are described to compensate if the pairformer cannot recover them.
- [Abstract] Abstract (diffusion framework description): without reported validation details, error bars, or ablation results on whether the joint denoising of atom/bond types and positions maintains validity under density-map conditioning alone, it is impossible to assess whether the claimed support for both generation regimes holds or whether failures in recovering sharp constraints undermine the results.
minor comments (1)
- The abstract states training on 'a large corpus of ligand-only and protein-ligand datasets' but provides no specifics on dataset composition, sizes, or preprocessing that would allow assessment of generalization.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment below and propose targeted revisions to the abstract to improve clarity without altering the core technical claims.
- Referee: [Abstract] Abstract: the central claim that continuous spatial density maps of partial ligands and pockets, processed via the spatial pairformer, suffice for the joint diffusion process to yield valid molecules in both de novo and scaffold-growing modes rests on an unverified assumption that smoothed fields preserve the discrete geometric and interaction details (exact distances, atom-type pairings, steric constraints) needed for chemical validity; no auxiliary atom-level terms are described to compensate if the pairformer cannot recover them.
  Authors: The spatial pairformer is explicitly designed to recover discrete geometric and interaction details from the continuous density maps via its spatial attention over pairwise positions and features. The joint diffusion process on atom types, bond types, and positions further enforces chemical constraints during denoising. We agree the abstract could state this more explicitly. We will revise the abstract to note that the pairformer recovers the required details without auxiliary atom-level terms.
  Revision: yes
- Referee: [Abstract] Abstract (diffusion framework description): without reported validation details, error bars, or ablation results on whether the joint denoising of atom/bond types and positions maintains validity under density-map conditioning alone, it is impossible to assess whether the claimed support for both generation regimes holds or whether failures in recovering sharp constraints undermine the results.
  Authors: The full manuscript reports validation metrics with error bars and ablation studies on the joint denoising under density-map conditioning (see Sections 4.2 and 4.3). We agree the abstract should reference this supporting evidence. We will revise the abstract to briefly note that validity is maintained as shown by these experiments and direct readers to the relevant sections.
  Revision: yes
Circularity Check
No circularity: model architecture described without self-referential derivations
The paper presents Sesame as a diffusion model with a spatial pairformer module conditioned on density maps for molecular generation. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or description. The claims concern architectural choices and training procedures that are presented as design decisions rather than derived results reducing to inputs by construction. This is a standard ML methods paper with no load-bearing mathematical derivations to inspect for circularity.
A Density Map Conditioning Operations

Given single representations $s \in \mathbb{R}^{B \times N \times d_s}$, where $B$ is the batch size, $N$ is the number of atoms, and $d_s = 384$ is the single dimension, we compute:

$$Q_{\text{sample}} = \mathrm{Linear}(s) \in \mathbb{R}^{B \times N \times H \times d_h} \tag{29}$$

$$K_{\text{sample}} = K_{\text{learned}} \in \mathbb{R}^{H \times O \times d_h} \tag{30}$$

$$V_{\text{sample}} = \mathrm{Linear}(s) \in \mathbb{R}^{B \times N \times H \times 3}$$ (...
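The tensor shapes in equations (29) and (30) and in the truncated expression for the value tensor can be checked with a small NumPy sketch. Only $d_s = 384$ comes from the text. The values of `H`, `O`, and `d_h`, the random weights, and the bias-free linear maps are assumptions for illustration. How the three tensors are combined afterwards is not shown, since the excerpt breaks off before that step.

```python
import numpy as np

rng = np.random.default_rng(0)

B, N, d_s = 2, 5, 384     # d_s = 384 is stated in the excerpt
H, O, d_h = 4, 8, 16      # assumed sizes; the excerpt does not give them

s = rng.standard_normal((B, N, d_s))                    # single representations

# Bias-free stand-ins for the two Linear layers (assumption).
W_q = rng.standard_normal((d_s, H * d_h)) / np.sqrt(d_s)
W_v = rng.standard_normal((d_s, H * 3)) / np.sqrt(d_s)
K_learned = rng.standard_normal((H, O, d_h))            # learned keys, no batch axis

Q_sample = (s @ W_q).reshape(B, N, H, d_h)              # eq. (29): B x N x H x d_h
K_sample = K_learned                                    # eq. (30): H x O x d_h
V_sample = (s @ W_v).reshape(B, N, H, 3)                # B x N x H x 3
```

The keys carry no batch or atom axis, so they are shared across every molecule in the batch. The trailing dimension of 3 on the values suggests one 3D vector per atom and head, although the excerpt does not say what those vectors represent.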