Teaching Language Models Mechanistic Explainability Through MechSMILES
Pith reviewed 2026-05-17 00:29 UTC · model grok-4.3
The pith
Language models can predict complete reaction mechanisms from reactants and products using MechSMILES encoding
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MechSMILES is a compact textual format that encodes molecular structure together with electron flow using three arrow types inside a Python environment that automatically enforces conservation of mass and charge. Training language models on four mechanism-prediction tasks demonstrates that they can reconstruct physically plausible pathways, perform complete atom-to-atom mapping including hydrogens, and extract catalyst-aware templates. On the task of predicting mechanisms given only reactants, conditions, and desired product, the models reach 93.2 percent pathway retrieval on FlowER and 73.3 percent on mech-USPTO-31k, with top-3 retrieval of 97.6 percent and 86.5 percent respectively, and to
What carries the argument
MechSMILES, a Python-enforced textual encoding of molecular structure and electron flow via three arrow types that prevents atom hallucination while enforcing conservation laws
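The conservation guarantee can be illustrated with a minimal electron-bookkeeping sketch. It does not reproduce MechSMILES syntax, since no concrete MechSMILES string is given here. For illustration only, it assumes that the three arrow types are lone pair to bond, bond to atom, and bond to bond, and that every atom, hydrogens included, keeps a persistent integer id. The names `State`, `push` and the arrow tuples are hypothetical. Arrows only move electron pairs between atoms that already exist, so an arrow that points at an unknown atom is rejected, and the summed formal charge cannot change.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

# Valence electrons for a few main-group elements (illustrative subset only).
VALENCE = {"H": 1, "C": 4, "N": 5, "O": 6, "F": 7, "Cl": 7, "Br": 7}


@dataclass
class State:
    """Hypothetical state: atoms keep persistent ids for the whole mechanism."""
    elements: Dict[int, str]  # atom id -> element symbol
    lone: Dict[int, int]      # atom id -> number of non-bonding electrons
    bonds: Dict[FrozenSet[int], int] = field(default_factory=dict)  # atom pair -> order

    def bond_order_sum(self, atom: int) -> int:
        return sum(order for pair, order in self.bonds.items() if atom in pair)

    def formal_charge(self, atom: int) -> int:
        # FC = valence electrons - non-bonding electrons - shared electron pairs
        return VALENCE[self.elements[atom]] - self.lone[atom] - self.bond_order_sum(atom)

    def total_charge(self) -> int:
        return sum(self.formal_charge(a) for a in self.elements)


def _as_tuple(x) -> tuple:
    return x if isinstance(x, tuple) else (x,)


def _shift_bond(state: State, a: int, b: int, delta: int) -> None:
    key = frozenset((a, b))
    order = state.bonds.get(key, 0) + delta
    if order < 0:
        raise ValueError(f"bond {a}-{b} has no electron pair to move")
    if order == 0:
        state.bonds.pop(key, None)
    else:
        state.bonds[key] = order


def push(state: State, arrow: Tuple) -> State:
    """Apply one arrow in place. Hypothetical arrow encodings:
    ("lp_to_bond", src_atom, dst_atom)  lone pair on src forms a src-dst bond
    ("bond_to_atom", (a, b), dst_atom)  a-b bond pair collapses onto dst (a or b)
    ("bond_to_bond", (a, b), (c, d))    a-b bond pair moves into the c-d bond
    """
    kind, src, dst = arrow
    for atom in (*_as_tuple(src), *_as_tuple(dst)):
        if atom not in state.elements:  # an arrow may not invent atoms
            raise ValueError(f"arrow references unknown atom {atom}")
    atoms_before, charge_before = dict(state.elements), state.total_charge()
    if kind == "lp_to_bond":
        if state.lone[src] < 2:
            raise ValueError(f"atom {src} has no lone pair")
        state.lone[src] -= 2
        _shift_bond(state, src, dst, +1)
    elif kind == "bond_to_atom":
        if dst not in src:
            raise ValueError("target atom must belong to the breaking bond")
        _shift_bond(state, src[0], src[1], -1)
        state.lone[dst] += 2
    elif kind == "bond_to_bond":
        _shift_bond(state, src[0], src[1], -1)
        _shift_bond(state, dst[0], dst[1], +1)
    else:
        raise ValueError(f"unknown arrow type {kind!r}")
    # Moving electron pairs can neither create atoms nor change the net charge.
    assert state.elements == atoms_before
    assert state.total_charge() == charge_before
    return state


# Usage: heterolysis of H-Cl, the bonding pair collapses onto chlorine.
hcl = State(elements={1: "H", 2: "Cl"}, lone={1: 0, 2: 6}, bonds={frozenset((1, 2)): 1})
push(hcl, ("bond_to_atom", (1, 2), 2))
print(hcl.formal_charge(1), hcl.formal_charge(2), hcl.total_charge())  # 1 -1 0
```

The key design choice in this sketch is that formal charges are derived from electron counts rather than stored, so conservation follows from the bookkeeping instead of being checked after the fact.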
If this is right
- Post-hoc validation of CASP proposals by reconstructing physically plausible electron pathways
- Holistic atom-to-atom mapping that tracks every atom including hydrogens
- Extraction of catalyst-aware reaction templates distinguishing recycled catalysts from spectator species
- Rapid acquisition of new reaction classes such as ozonolysis and Suzuki cross-coupling from as few as 40 examples
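Catalyst-aware template extraction hinges on separating recycled catalysts from spectators. A minimal sketch of one possible rule set, assuming each input species is given as a name, a canonical form (plain strings here; canonical SMILES in practice) and its atom ids, and that the mechanism records which atom ids any arrow touches. Regeneration is matched by canonical form rather than by atom ids, because a proton released at the end need not be the same atom that was added. `classify_species` and its inputs are hypothetical, not the paper's implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# (name, canonical_form, atom_ids); canonical_form would be a canonical SMILES in practice.
Species = Tuple[str, str, Iterable[int]]


def classify_species(
    reactants: List[Species],
    product_forms: List[str],
    touched_atoms: Set[int],
) -> Dict[str, str]:
    """Label each input species as 'spectator', 'catalyst' or 'reactant'.

    Hypothetical rules:
      spectator: no arrow in any step touches any of its atoms
      catalyst:  touched, but an identical species is present again at the end
      reactant:  touched and not regenerated, i.e. consumed
    """
    available = Counter(product_forms)  # multiset, so each regenerated copy counts once
    labels: Dict[str, str] = {}
    for name, form, atom_ids in reactants:
        if not touched_atoms.intersection(atom_ids):
            labels[name] = "spectator"
        elif available[form] > 0:
            available[form] -= 1
            labels[name] = "catalyst"
        else:
            labels[name] = "reactant"
    return labels


# Usage with an acid-catalyzed esterification in toluene (atom ids are illustrative).
reactants = [
    ("acetic acid", "CC(=O)O", {1, 2, 3, 4}),
    ("ethanol", "CCO", {5, 6, 7}),
    ("proton", "[H+]", {8}),
    ("toluene", "Cc1ccccc1", {9, 10}),
]
products = ["CCOC(C)=O", "O", "[H+]", "Cc1ccccc1"]
print(classify_species(reactants, products, touched_atoms={2, 4, 6, 8}))
# {'acetic acid': 'reactant', 'ethanol': 'reactant', 'proton': 'catalyst', 'toluene': 'spectator'}
```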
Where Pith is reading between the lines
- The same encoding could be paired with graph neural networks to improve accuracy on larger or more complex molecules
- Testing on industrial reaction logs not seen during training would show whether the conservation rules transfer to noisy real-world data
- Mechanistic outputs might be used to generate entirely novel reaction hypotheses by exploring unseen but conservation-compliant arrow sequences
Load-bearing premise
That the MechSMILES textual encoding and arrow-pushing formalism faithfully capture all relevant mechanistic details without introducing artifacts or missing important pathways that would appear in real experimental conditions
What would settle it
A collection of reactions outside the training data where the model-generated mechanisms either violate observed experimental outcomes or break conservation of mass and charge
Original abstract
Chemical reaction mechanisms are the foundation of how chemists evaluate reactivity and feasibility, yet current Computer-Assisted Synthesis Planning (CASP) systems operate without this mechanistic reasoning. We introduce a computational framework that teaches language models to predict reaction mechanisms through arrow-pushing formalism, a century-old notation that tracks electron flow while enforcing conservation of mass and charge. This mechanistic understanding enables three capabilities that are difficult or impossible with current methods: post-hoc validation of CASP proposals by reconstructing physically plausible electron pathways, holistic atom-to-atom mapping that tracks all atoms including hydrogens, and extraction of catalyst-aware reaction templates that distinguish recycled catalysts from spectator species. Central to our approach is MechSMILES, a compact textual format encoding molecular structure and electron flow through three arrow types, designed within a Python-based environment that enforces conservation laws and eliminates the possibility of atom hallucination. We trained and benchmarked models on four mechanism prediction tasks of increasing complexity using the main mechanistic datasets in the literature. On our most challenging task, predicting complete mechanisms given only reactants, conditions, and the desired product, our models achieve 93.2% and 73.3% pathway retrieval on the FlowER and mech-USPTO-31k datasets respectively, with top-3 retrieval reaching 97.6% and 86.5%. Furthermore, the framework rapidly learns new reaction classes, with strong mechanistic predictions for ozonolysis and Suzuki cross-coupling emerging from as few as 40 training examples each. By grounding predictions in physically meaningful electron movements, this work provides an architecture-agnostic, open-source foundation for more explainable and chemically valid CASP.
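Pathway retrieval and top-3 retrieval, as quoted in the abstract, are instances of top-k exact-match accuracy. A minimal sketch, assuming each model output is a ranked list of candidate mechanism strings and that a hypothetical `canonicalize` function maps a mechanism to a canonical form before comparison. The exact matching criterion used in the paper is not stated here; the identity function below is only a placeholder.

```python
from typing import Callable, Sequence


def top_k_retrieval(
    ranked_predictions: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
    canonicalize: Callable[[str], str] = lambda s: s,  # placeholder canonicalizer
) -> float:
    """Fraction of examples whose reference appears among the first k candidates."""
    if len(ranked_predictions) != len(references):
        raise ValueError("need one ranked candidate list per reference")
    if not references:
        return 0.0
    hits = 0
    for candidates, ref in zip(ranked_predictions, references):
        target = canonicalize(ref)
        if any(canonicalize(c) == target for c in candidates[:k]):
            hits += 1
    return hits / len(references)


# Usage with toy strings standing in for mechanism encodings.
preds = [["A", "B", "C"], ["X", "Y", "Z"], ["Q", "R", "S"], ["M", "N", "O"]]
refs = ["A", "Y", "S", "P"]
print(top_k_retrieval(preds, refs, 1), top_k_retrieval(preds, refs, 3))  # 0.25 0.75
```

Because a hit within the first k candidates is also a hit within the first k + 1, the score can only grow with k, which is why top-3 figures are never below top-1 figures.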
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MechSMILES, a compact textual encoding of molecular structures and electron flows via three arrow types in an arrow-pushing formalism, together with a Python environment that enforces mass/charge conservation and forbids atom hallucination. Language models are trained on four mechanism-prediction tasks of increasing difficulty drawn from existing literature datasets; the central empirical claim is that, on the hardest task (complete mechanism prediction from reactants, conditions, and desired product), the models reach 93.2 % top-1 and 97.6 % top-3 pathway retrieval on FlowER and 73.3 % / 86.5 % on mech-USPTO-31k, while also enabling post-hoc validation of CASP proposals, holistic atom mapping, and extraction of catalyst-aware templates. The work further reports rapid adaptation to new reaction classes (e.g., ozonolysis, Suzuki) from as few as 40 examples.
Significance. If the reported retrieval rates reflect genuine internalization of electron-flow rules rather than sequence memorization, the framework would supply an architecture-agnostic, open-source substrate for chemically grounded CASP that can validate proposals, produce interpretable templates, and track all atoms including hydrogens. The few-shot adaptation results would additionally indicate practical utility in low-data mechanistic regimes.
major comments (2)
- [Results, complete-mechanism-prediction task] The central claim that the models acquire mechanistic reasoning rests on pathway-retrieval accuracy (93.2 % top-1 on FlowER). Because MechSMILES is a deterministic textual serialization of structures plus three arrow symbols and the only constraints are conservation laws, the metric reduces to exact or near-exact string reproduction; no ablation is reported that tests whether performance survives removal of training-set co-occurrence statistics or substitution of unseen arrow sequences.
- [Experimental setup and dataset description] The evaluation uses externally curated datasets (FlowER, mech-USPTO-31k) whose train/test splits are not described with respect to reaction-class novelty or mechanistic diversity. Without explicit hold-out of entire mechanistic families or perturbation experiments (e.g., altering a single arrow while preserving stoichiometry), it remains possible that high retrieval simply reproduces statistical patterns present in the training distribution.
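The class-level hold-out requested here can be run as a group split: every example of a held-out reaction class goes to the test set, so no mechanistic family is seen during training. A minimal sketch, assuming each example carries a `reaction_class` label (for instance a NameRxn-style class); the function name and record layout are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def class_holdout_split(
    examples: Sequence[Dict[str, str]],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Hold out whole reaction classes; test_fraction is a fraction of classes, not examples."""
    classes = sorted({ex["reaction_class"] for ex in examples})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(classes)
    n_test = max(1, round(test_fraction * len(classes)))
    test_classes = set(classes[:n_test])
    train = [ex for ex in examples if ex["reaction_class"] not in test_classes]
    test = [ex for ex in examples if ex["reaction_class"] in test_classes]
    return train, test


# Usage: five toy classes with three examples each.
examples = [{"id": f"{c}-{i}", "reaction_class": c} for c in "ABCDE" for i in range(3)]
train, test = class_holdout_split(examples, test_fraction=0.2, seed=0)
print(len(train), len(test))  # 12 3
```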
minor comments (2)
- [Abstract] The abstract states that four tasks of increasing complexity were evaluated but does not enumerate them; a one-sentence list would orient readers before the detailed results.
- [MechSMILES definition] Provide at least one concrete MechSMILES example string together with its corresponding arrow-pushing diagram so that readers can verify the encoding of the three arrow types.
Simulated Author's Rebuttal
We are grateful to the referee for their insightful comments, which highlight important aspects of our evaluation. We respond to each major comment below and indicate planned revisions.
- Referee: [Results, complete-mechanism-prediction task] The central claim that the models acquire mechanistic reasoning rests on pathway-retrieval accuracy (93.2 % top-1 on FlowER). Because MechSMILES is a deterministic textual serialization of structures plus three arrow symbols and the only constraints are conservation laws, the metric reduces to exact or near-exact string reproduction; no ablation is reported that tests whether performance survives removal of training-set co-occurrence statistics or substitution of unseen arrow sequences.
  Authors: Pathway retrieval requires the model to output a complete MechSMILES string whose arrow sequence encodes a chemically valid electron flow; the accompanying Python environment rejects any output that violates mass/charge conservation or introduces atom hallucination. This constraint set is stricter than unconstrained string matching. We did not include explicit ablations that remove co-occurrence statistics or substitute unseen arrow sequences. The few-shot results on ozonolysis and Suzuki (strong performance from 40 examples) supply indirect evidence of generalization, but we acknowledge the referee's point and will add targeted ablations in the revision, including performance on held-out arrow motifs and perturbed sequences that preserve stoichiometry. Revision: partial.
- Referee: [Experimental setup and dataset description] The evaluation uses externally curated datasets (FlowER, mech-USPTO-31k) whose train/test splits are not described with respect to reaction-class novelty or mechanistic diversity. Without explicit hold-out of entire mechanistic families or perturbation experiments (e.g., altering a single arrow while preserving stoichiometry), it remains possible that high retrieval simply reproduces statistical patterns present in the training distribution.
  Authors: We followed the train/test splits published with FlowER and mech-USPTO-31k. We will expand the revised manuscript with an explicit breakdown of reaction classes and mechanistic families appearing in each split. In addition, we will report results from perturbation experiments in which individual arrows are altered while stoichiometry is held fixed, thereby testing whether retrieval depends on exact training-distribution matches or on the underlying electron-flow rules. Revision: yes.
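The promised perturbation experiment can be sketched as enumerating every variant of a mechanism that differs from it by exactly one arrow, with that arrow's target retargeted to another atom already in the system. No atom is added or removed, so stoichiometry is unchanged. The arrow layout `(arrow_type, source_atom, target_atom)` is a hypothetical simplification, and each variant would still have to pass the environment's conservation and valence checks before being used as a probe.

```python
from typing import Iterator, List, Sequence, Tuple

Arrow = Tuple[str, int, int]  # hypothetical: (arrow_type, source_atom, target_atom)


def single_arrow_perturbations(
    arrows: Sequence[Arrow], atom_ids: Sequence[int]
) -> Iterator[Tuple[int, List[Arrow]]]:
    """Yield (changed_index, variant) for every retargeting of exactly one arrow."""
    for i, (kind, src, dst) in enumerate(arrows):
        for new_dst in atom_ids:
            if new_dst in (src, dst):
                continue  # skip the original target and self-loops
            variant = list(arrows)
            variant[i] = (kind, src, new_dst)
            yield i, variant


# Usage: two arrows over four atoms give two alternative targets per arrow.
mechanism = [("lp_to_bond", 1, 2), ("lp_to_bond", 3, 4)]
variants = list(single_arrow_perturbations(mechanism, atom_ids=[1, 2, 3, 4]))
print(len(variants))  # 4
```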
Circularity Check
No significant circularity; empirical results from external datasets with no self-referential derivations
full rationale
The paper defines MechSMILES as a new textual encoding and reports empirical pathway retrieval accuracies (93.2% top-1 on FlowER, etc.) obtained by training language models on literature-derived mechanistic datasets. No equations, uniqueness theorems, or predictions are shown to reduce by construction to quantities fitted inside the paper; the evaluation metric is standard sequence retrieval on held-out splits rather than a tautological fit. The derivation chain consists of standard ML training and benchmarking steps that remain independent of the reported performance numbers.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: arrow-pushing formalism can be represented in a compact textual format that automatically enforces conservation of mass and charge.
invented entities (1)
- MechSMILES (no independent evidence)
Reference graph
Works this paper leans on
[1] Bartosz A. Grzybowski, Tomasz Badowski, Karol Molga, and Sara Szymkuć. Network search algorithms and scoring functions for advanced-level computerized synthesis planning. WIREs Comput. Mol. Sci., 13(1):e1630, 2023.
[2] Reaxys database, 2024. URL https://www.reaxys.com. (Accessed Jul 29, 2021)
[3] Marwin HS Segler, Mike Preuss, and Mark P Waller. Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555(7698):604–610, 2018.
[4] Philippe Schwaller, Riccardo Petraglia, Valerio Zullo, Vishnu H Nair, Rico Andreas Haeuselmann, Riccardo Pisoni, Costas Bekas, Anna Iuliano, and Teodoro Laino. Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy. Chemical Science, 11(12):3316–3325, 2020.
[5] Samuel Genheden, Amol Thakkar, Veronika Chadimová, Jean-Louis Reymond, Ola Engkvist, and Esben Bjerrum. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminf., 12:1–9, 2020.
[6] Philippe Schwaller, Alain C Vaucher, Ruben Laplaza, Charlotte Bunne, Andreas Krause, Clemence Corminboeuf, and Teodoro Laino. Machine intelligence for chemical reaction space. Wiley Interdisciplinary Reviews: Computational Molecular Science, 12(5):e1604, 2022.
[7] Lakshidaa Saigiridharan, Alan Kai Hassen, Helen Lai, Paula Torren-Peraire, Ola Engkvist, and Samuel Genheden. AiZynthFinder 4.0: developments based on learnings from 3 years of industrial application. Journal of Cheminformatics, 16(1):57, 2024.
[8] Zhengkai Tu, Sourabh J Choure, Mun Hong Fong, Jihye Roh, Itai Levin, Kevin Yu, Joonyoung F Joung, Nathan Morgan, Shih-Cheng Li, Xiaoqi Sun, et al. ASKCOS: Open-source, data-driven synthesis planning. Accounts of Chemical Research, 58(11):1764–1775, 2025.
[9] Andres M Bran, Theo A Neukomm, Daniel P Armstrong, Zlatko Jončev, and Philippe Schwaller. Chemical reasoning in LLMs unlocks steerable synthesis planning and reaction mechanism elucidation. arXiv preprint arXiv:2503.08537, 2025.
[10] Peter Ertl and Ansgar Schuffenhauer. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1(1):8, 2009.
[11] Connor W Coley, Luke Rogers, William H Green, and Klavs F Jensen. SCScore: synthetic complexity learned from a reaction corpus. Journal of Chemical Information and Modeling, 58(2):252–261, 2018.
[12] Rebecca M Neeser, Bruno Correia, and Philippe Schwaller. FSscore: A personalized machine learning-based synthetic feasibility score. Chemistry-Methods, 4(11):e202400024, 2024.
[13] Sara Szymkuć, Ewa P Gajewska, Tomasz Klucznik, Karol Molga, Piotr Dittwald, Michał Startek, Michał Bajczyk, and Bartosz A Grzybowski. Computer-assisted synthetic planning: the end of the beginning. Angewandte Chemie International Edition, 55(20):5904–5937, 2016.
[14] Hitesh Patel, Wolf-Dietrich Ihlenfeldt, Philip N Judson, Yurii S Moroz, Yuri Pevzner, Megan L Peach, Victorien Delannée, Nadya I Tarasova, and Marc C Nicklaus. SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules. Scientific Data, 7(1):384, 2020.
[15] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988.
[16] John Bradshaw, Matt J. Kusner, Brooks Paige, Marwin H. S. Segler, and José Miguel Hernández-Lobato. A generative model for electron paths. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=r1x4BnCqKX.
[17] Ryan J Miller, Alexander E Dashuta, Brayden Rudisill, David Van Vranken, and Pierre Baldi. Mechanism-aware deep learning for polar reaction prediction. Journal of the American Chemical Society, 2025.
[18] Shuan Chen, Kye Sung Park, Taewan Kim, Sunkyu Han, and Yousung Jung. Predicting chemical reaction outcomes based on electron movements using machine learning. arXiv preprint arXiv:2503.10197, 2025.
[19] Joonyoung F Joung, Mun Hong Fong, Nicholas Casetti, Jordan P Liles, Ne S Dassanayake, and Connor W Coley. Electron flow matching for generative reaction mechanism prediction. Nature, pages 1–9, 2025.
[20] Daylight Theory: SMIRKS. URL https://www.daylight.com/dayhtml/doc/theory/theory.smirks.html. (Accessed Nov 15, 2021)
[21] Mohammadamin Tavakoli, Ryan J Miller, Mirana Claire Angel, Michael A Pfeiffer, Eugene S Gutman, Aaron D Mood, David Van Vranken, and Pierre Baldi. PMechDB: A public database of elementary polar reaction steps. Journal of Chemical Information and Modeling, 64(6):1975–1983, 2024.
[22] Shuan Chen, Ramil Babazade, Taewan Kim, Sunkyu Han, and Yousung Jung. A large-scale reaction dataset of mechanistic pathways of organic reactions. Scientific Data, 11(1):863, 2024.
[23] Andrew D White. The future of chemistry is language. Nature Reviews Chemistry, 7(7):457–458, 2023.
[24] Andres M Bran and Philippe Schwaller. Transformers and large language models for chemistry and drug discovery. In Drug Development Supported by Informatics, pages 143–163. Springer, 2024.
[25] Mayk Caldas Ramos, Christopher J Collison, and Andrew D White. A review of large language models and autonomous agents in chemistry. Chemical Science, 2025.
[26] Robert MacKnight, Daniil A Boiko, Jose Emilio Regio, Liliana C Gallegos, Théo A Neukomm, and Gabe Gomes. Rethinking chemical research in the age of large language models. Nature Computational Science, pages 1–12, 2025.
[27] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67, 2020.
[28] Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Christopher A Hunter, Costas Bekas, and Alpha A Lee. Molecular transformer: A model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 5(9):1572–1583, 2019.
[29] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[30] Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on Gemini research and technology. arXiv preprint arXiv:2403.08295, 2024.
[31] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[32] Wengong Jin, Connor Coley, Regina Barzilay, and Tommi Jaakkola. Predicting organic reaction outcomes with Weisfeiler-Lehman network. Advances in Neural Information Processing Systems, 30, 2017.
[33] Samuel Genheden and Esben Bjerrum. PaRoutes: towards a framework for benchmarking retrosynthesis route predictions. Digital Discovery, 1(4):527–539, 2022.
[34] Daniel M Lowe, Peter T Corbett, Peter Murray-Rust, and Robert C Glen. Chemical name to structure: OPSIN, an open source solution, 2011.
[35] Philippe Schwaller, Benjamin Hoover, Jean-Louis Reymond, Hendrik Strobelt, and Teodoro Laino. Extraction of organic chemistry grammar from unsupervised learning of chemical reactions. Science Advances, 7(15):eabe4166, 2021.
[36] Shuan Chen, Sunggi An, Ramil Babazade, and Yousung Jung. Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning. Nature Communications, 15(1):2250, 2024.
[37] Christos Kannas, Amol Thakkar, Esben Bjerrum, and Samuel Genheden. rxnutils–a cheminformatics Python library for manipulating chemical reaction data. 2022.
[38] NextMove Software NameRxn. URL https://www.nextmovesoftware.com/namerxn.html. (Accessed Nov 30, 2025)
Supplementary Information, A: Additional concrete example of CASP validation via mechanism prediction. Figure S1: Example of a CASP validation of the multistep reaction visible in figure 1 of the PaRoutes paper (33). Each step of this retrosynthesis (numb...
[39] Fully mapped elementary steps (similar to the FlowER (19) format)
[40] From a reactant and the set of all arrows (similar to the mech-USPTO-31k (22) format)
[41] SMIRKS accompanied by an arrow-code (similar to the PMechDB (21) format)
Figure S2: Character length distribution to encode the same mechanistic data using the different formats mentioned in this work. The main difference between equilibrated and minimal MechSMILES is that the latter will not explicitly rewrite species that do not interact in the specific el...