arxiv: 2512.05722 · v2 · submitted 2025-12-05 · 💻 cs.LG · physics.chem-ph

Teaching Language Models Mechanistic Explainability Through MechSMILES

Théo A. Neukomm, Zlatko Jončev, Philippe Schwaller

Pith reviewed 2026-05-17 00:29 UTC · model grok-4.3

keywords chemical reaction mechanisms · MechSMILES · arrow-pushing formalism · language models · computer-assisted synthesis planning · reaction pathway prediction · electron flow

The pith

Language models can predict complete reaction mechanisms from reactants and products using MechSMILES encoding

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces MechSMILES as a textual way to represent how electrons move during chemical reactions through arrow-pushing notation. It trains language models to output these representations on tasks of growing difficulty, including the hardest case of generating full mechanisms when given only starting materials, conditions, and the target product. A sympathetic reader would care because current synthesis-planning tools suggest reactions without showing the underlying electron steps that determine whether the reaction is physically possible. The approach yields high retrieval rates on established datasets and allows models to pick up new reaction types after seeing only a small number of examples.

Core claim

MechSMILES is a compact textual format that encodes molecular structure together with electron flow using three arrow types inside a Python environment that automatically enforces conservation of mass and charge. Training language models on four mechanism-prediction tasks demonstrates that they can reconstruct physically plausible pathways, perform complete atom-to-atom mapping including hydrogens, and extract catalyst-aware templates. On the task of predicting mechanisms given only reactants, conditions, and desired product, the models reach 93.2 percent pathway retrieval on FlowER and 73.3 percent on mech-USPTO-31k, with top-3 retrieval of 97.6 percent and 86.5 percent respectively, and to

What carries the argument

MechSMILES, a Python-enforced textual encoding of molecular structure and electron flow via three arrow types that prevents atom hallucination while enforcing conservation laws
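
The conservation idea can be illustrated with a minimal sketch. It does not use the MechSMILES syntax or the authors' environment; the species representation (a dictionary of element counts plus a formal charge) and the function names are assumptions made only for this example.

```python
from collections import Counter

def totals(species_list):
    """Sum element counts and formal charges over a list of species.

    Each species is a hypothetical (element_counts, formal_charge) pair,
    not a MechSMILES string.
    """
    atoms = Counter()
    charge = 0
    for counts, formal_charge in species_list:
        atoms.update(counts)
        charge += formal_charge
    return atoms, charge

def conserves_mass_and_charge(before, after):
    """Return True if both sides have identical atom counts and net charge."""
    return totals(before) == totals(after)

# Proton transfer between two water molecules: 2 H2O -> H3O+ + OH-
water = ({"H": 2, "O": 1}, 0)
hydronium = ({"H": 3, "O": 1}, +1)
hydroxide = ({"H": 1, "O": 1}, -1)

balanced = conserves_mass_and_charge([water, water], [hydronium, hydroxide])

# A product side with an invented extra oxygen fails the same check.
invented = ({"H": 1, "O": 2}, -1)
unbalanced = conserves_mass_and_charge([water, water], [hydronium, invented])
```

A check of this kind rejects outputs that create or lose atoms, but it says nothing about whether a balanced step is chemically reasonable, which is the gap the referee report below concentrates on.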

If this is right

Post-hoc validation of CASP proposals by reconstructing physically plausible electron pathways
Holistic atom-to-atom mapping that tracks every atom including hydrogens
Extraction of catalyst-aware reaction templates distinguishing recycled catalysts from spectator species
Rapid acquisition of new reaction classes such as ozonolysis and Suzuki cross-coupling from as few as 40 examples
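
To make the second point concrete, here is a small sketch of what "tracking every atom including hydrogens" demands of a mapping. The dictionaries of map numbers and the helper names are hypothetical and are not taken from the paper; the example only shows why a heavy-atom-only mapping is incomplete.

```python
def mapping_report(reactant_atoms, product_atoms):
    """Compare two {map_number: element} dicts and list every inconsistency."""
    shared = set(reactant_atoms) & set(product_atoms)
    return {
        # Atoms numbered on the reactant side that never reappear.
        "missing": sorted(set(reactant_atoms) - set(product_atoms)),
        # Atoms that appear on the product side from nowhere.
        "extra": sorted(set(product_atoms) - set(reactant_atoms)),
        # Map numbers whose element changed, which is never allowed.
        "changed": sorted(i for i in shared if reactant_atoms[i] != product_atoms[i]),
    }

def is_complete_mapping(reactant_atoms, product_atoms):
    """True when every atom, hydrogens included, is accounted for on both sides."""
    return not any(mapping_report(reactant_atoms, product_atoms).values())

# HCl + NH3 -> Cl- + NH4+, with every hydrogen carrying its own map number.
reactants = {1: "H", 2: "Cl", 3: "N", 4: "H", 5: "H", 6: "H"}
products_full = dict(reactants)       # same atoms, only the bonding differs
products_heavy = {2: "Cl", 3: "N"}    # a mapping that ignores hydrogens

full_ok = is_complete_mapping(reactants, products_full)
heavy_report = mapping_report(reactants, products_heavy)
```

In the heavy-atom-only case the report lists all four hydrogens as missing, so the proton that actually moves cannot be followed.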

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same encoding could be paired with graph neural networks to improve accuracy on larger or more complex molecules
Testing on industrial reaction logs not seen during training would show whether the conservation rules transfer to noisy real-world data
Mechanistic outputs might be used to generate entirely novel reaction hypotheses by exploring unseen but conservation-compliant arrow sequences

Load-bearing premise

That the MechSMILES textual encoding and arrow-pushing formalism faithfully capture all relevant mechanistic details without introducing artifacts or missing important pathways that would appear in real experimental conditions

What would settle it

A collection of reactions outside the training data where the model-generated mechanisms either violate observed experimental outcomes or break conservation of mass and charge

Figures

Figures reproduced from arXiv: 2512.05722 by Philippe Schwaller, Théo A. Neukomm, Zlatko Jončev.

**Figure 2.** Reaction mechanism prediction framework. (a) Progressive task difficulty showing

**Figure 3.** Transfer learning results showing important improvement after fine-tuning on small curated

**Figure 4.** a) Example of a CASP validation of the multistep reaction visible in figure S2 of the

**Figure 5.** Few example reactions mapped both with SOTA tools, and with mechanistic mapping using

**Figure 6.** Suzuki coupling reaction taken from the test set of the FlowER dataset

Original abstract

Chemical reaction mechanisms are the foundation of how chemists evaluate reactivity and feasibility, yet current Computer-Assisted Synthesis Planning (CASP) systems operate without this mechanistic reasoning. We introduce a computational framework that teaches language models to predict reaction mechanisms through arrow-pushing formalism, a century-old notation that tracks electron flow while enforcing conservation of mass and charge. This mechanistic understanding enables three capabilities that are difficult or impossible with current methods: post-hoc validation of CASP proposals by reconstructing physically plausible electron pathways, holistic atom-to-atom mapping that tracks all atoms including hydrogens, and extraction of catalyst-aware reaction templates that distinguish recycled catalysts from spectator species. Central to our approach is MechSMILES, a compact textual format encoding molecular structure and electron flow through three arrow types, designed within a Python-based environment that enforces conservation laws and eliminates the possibility of atom hallucination. We trained and benchmarked models on four mechanism prediction tasks of increasing complexity using the main mechanistic datasets in the literature. On our most challenging task, predicting complete mechanisms given only reactants, conditions, and the desired product, our models achieve 93.2% and 73.3% pathway retrieval on the FlowER and mech-USPTO-31k datasets respectively, with top-3 retrieval reaching 97.6% and 86.5%. Furthermore, the framework rapidly learns new reaction classes, with strong mechanistic predictions for ozonolysis and Suzuki cross-coupling emerging from as few as 40 training examples each. By grounding predictions in physically meaningful electron movements, this work provides an architecture-agnostic, open-source foundation for more explainable and chemically valid CASP.
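
The retrieval figures quoted in the abstract can be read as top-k rates over a test set. The sketch below assumes an exact-match criterion on a canonicalised pathway string, which may differ from the paper's actual scoring; the function name and the placeholder strings are illustrative only.

```python
def top_k_retrieval(ranked_predictions, references, k):
    """Fraction of examples whose reference is among the first k predictions.

    ranked_predictions: one list of candidate pathways per example, best first.
    references: the reference pathway for each example.
    Matching is plain string equality, an assumption made for this sketch.
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("need one ranked prediction list per reference")
    if not references:
        return 0.0
    hits = sum(ref in preds[:k] for preds, ref in zip(ranked_predictions, references))
    return hits / len(references)

# Placeholder strings stand in for encoded pathways.
predictions = [
    ["path_a", "path_b", "path_c"],  # correct at rank 1
    ["path_x", "path_d", "path_y"],  # correct at rank 2
    ["path_q", "path_r", "path_s"],  # reference never retrieved
    ["path_e", "path_z", "path_w"],  # correct at rank 1
]
references = ["path_a", "path_d", "path_t", "path_e"]

top1 = top_k_retrieval(predictions, references, k=1)
top3 = top_k_retrieval(predictions, references, k=3)
```

Because the score only asks whether the reference string appears among the candidates, a high value does not by itself distinguish rule learning from recall of frequent sequences.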

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MechSMILES gives LMs a compact way to output arrow-pushing steps and the reported retrieval numbers are high, but the setup leaves open whether the models are learning electron-flow rules or just reproducing common sequences.

The core contribution is MechSMILES, a textual encoding that folds structure and three kinds of arrows into a single string. They wrap it in an environment that enforces mass and charge balance and blocks atom invention, then train language models on four tasks that range from simple arrow prediction up to full mechanism generation given only reactants, conditions, and product. On the hardest task they report 93.2 % top-1 and 97.6 % top-3 pathway retrieval on FlowER, with lower but still usable numbers on mech-USPTO-31k. They also show that new classes such as ozonolysis can be learned from roughly 40 examples.

That combination of format, constraint enforcement, and few-shot behavior is the part that is actually new relative to prior CASP work on mechanisms. The conservation checks and the explicit arrow types are practical engineering choices that make the outputs more chemically plausible on paper. The rapid adaptation to new reaction classes is a concrete result worth noting.

The main soft spot is the evaluation itself. Because the target is an exact or near-exact MechSMILES string and the only hard constraints are conservation rules, strong retrieval can be achieved by memorizing frequent arrow sequences that co-occur with particular reactant-product pairs. Nothing in the abstract or the stress-test description shows an analysis that separates sequence matching from rule learning, such as performance on held-out combinations that require novel electron-flow logic or error breakdowns by reaction type. The datasets are external literature collections, so there is no circularity in the numbers, but the lack of visible ablation on data splits or generalization tests keeps the mechanistic claim provisional.

This is work for people building or evaluating CASP systems who care about adding some form of mechanistic trace. A reader who already works with arrow-pushing formalisms or who needs atom-mapping that includes hydrogens will see immediate utility in the format.

The idea is coherent and the engineering is reproducible in principle, so it clears the bar for a serious referee even if the current evidence does not yet prove that the models have internalized general electron-flow reasoning. I would send it to review and ask specifically for those generalization checks.

Referee Report

2 major / 2 minor

Summary. The paper introduces MechSMILES, a compact textual encoding of molecular structures and electron flows via three arrow types in an arrow-pushing formalism, together with a Python environment that enforces mass/charge conservation and forbids atom hallucination. Language models are trained on four mechanism-prediction tasks of increasing difficulty drawn from existing literature datasets; the central empirical claim is that, on the hardest task (complete mechanism prediction from reactants, conditions, and desired product), the models reach 93.2 % top-1 and 97.6 % top-3 pathway retrieval on FlowER and 73.3 % / 86.5 % on mech-USPTO-31k, while also enabling post-hoc validation of CASP proposals, holistic atom mapping, and extraction of catalyst-aware templates. The work further reports rapid adaptation to new reaction classes (e.g., ozonolysis, Suzuki) from as few as 40 examples.

Significance. If the reported retrieval rates reflect genuine internalization of electron-flow rules rather than sequence memorization, the framework would supply an architecture-agnostic, open-source substrate for chemically grounded CASP that can validate proposals, produce interpretable templates, and track all atoms including hydrogens. The few-shot adaptation results would additionally indicate practical utility in low-data mechanistic regimes.

major comments (2)

[Results, complete-mechanism-prediction task] The central claim that the models acquire mechanistic reasoning rests on pathway-retrieval accuracy (93.2 % top-1 on FlowER). Because MechSMILES is a deterministic textual serialization of structures plus three arrow symbols and the only constraints are conservation laws, the metric reduces to exact or near-exact string reproduction; no ablation is reported that tests whether performance survives removal of training-set co-occurrence statistics or substitution of unseen arrow sequences.
[Experimental setup and dataset description] The evaluation uses externally curated datasets (FlowER, mech-USPTO-31k) whose train/test splits are not described with respect to reaction-class novelty or mechanistic diversity. Without explicit hold-out of entire mechanistic families or perturbation experiments (e.g., altering a single arrow while preserving stoichiometry), it remains possible that high retrieval simply reproduces statistical patterns present in the training distribution.
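
The hold-out of entire mechanistic families requested above could be set up along the following lines. This is a sketch under stated assumptions: the `family` label on each record is hypothetical (neither dataset is claimed to ship such a field), and the function is not part of the paper's code.

```python
import random

def family_holdout_split(records, test_fraction=0.25, seed=0):
    """Split records so that no mechanistic family appears in both train and test.

    Each record is a dict with a hypothetical "family" key. Whole families are
    assigned to the test side, rather than individual reactions.
    """
    families = sorted({rec["family"] for rec in records})
    rng = random.Random(seed)
    rng.shuffle(families)
    n_test = max(1, round(test_fraction * len(families)))
    held_out = set(families[:n_test])
    train = [rec for rec in records if rec["family"] not in held_out]
    test = [rec for rec in records if rec["family"] in held_out]
    return train, test, held_out

records = [
    {"id": 0, "family": "SN2"},
    {"id": 1, "family": "SN2"},
    {"id": 2, "family": "E2"},
    {"id": 3, "family": "aldol"},
    {"id": 4, "family": "aldol"},
    {"id": 5, "family": "ozonolysis"},
]

train, test, held_out = family_holdout_split(records)
train_families = {rec["family"] for rec in train}
test_families = {rec["family"] for rec in test}
```

Scoring retrieval on a split like this, next to the published random splits, would show how much of the headline accuracy survives when the test mechanisms belong to families never seen in training.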

minor comments (2)

[Abstract] The abstract states that four tasks of increasing complexity were evaluated but does not enumerate them; a one-sentence list would orient readers before the detailed results.
[MechSMILES definition] Provide at least one concrete MechSMILES example string together with its corresponding arrow-pushing diagram so that readers can verify the encoding of the three arrow types.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their insightful comments, which highlight important aspects of our evaluation. We respond to each major comment below and indicate planned revisions.

Referee: [Results, complete-mechanism-prediction task] The central claim that the models acquire mechanistic reasoning rests on pathway-retrieval accuracy (93.2 % top-1 on FlowER). Because MechSMILES is a deterministic textual serialization of structures plus three arrow symbols and the only constraints are conservation laws, the metric reduces to exact or near-exact string reproduction; no ablation is reported that tests whether performance survives removal of training-set co-occurrence statistics or substitution of unseen arrow sequences.

Authors: Pathway retrieval requires the model to output a complete MechSMILES string whose arrow sequence encodes a chemically valid electron flow; the accompanying Python environment rejects any output that violates mass/charge conservation or introduces atom hallucination. This constraint set is stricter than unconstrained string matching. We did not include explicit ablations that remove co-occurrence statistics or substitute unseen arrow sequences. The few-shot results on ozonolysis and Suzuki (strong performance from 40 examples) supply indirect evidence of generalization, but we acknowledge the referee's point and will add targeted ablations in the revision, including performance on held-out arrow motifs and perturbed sequences that preserve stoichiometry. revision: partial
Referee: [Experimental setup and dataset description] The evaluation uses externally curated datasets (FlowER, mech-USPTO-31k) whose train/test splits are not described with respect to reaction-class novelty or mechanistic diversity. Without explicit hold-out of entire mechanistic families or perturbation experiments (e.g., altering a single arrow while preserving stoichiometry), it remains possible that high retrieval simply reproduces statistical patterns present in the training distribution.

Authors: We followed the train/test splits published with FlowER and mech-USPTO-31k. We will expand the revised manuscript with an explicit breakdown of reaction classes and mechanistic families appearing in each split. In addition, we will report results from perturbation experiments in which individual arrows are altered while stoichiometry is held fixed, thereby testing whether retrieval depends on exact training-distribution matches or on the underlying electron-flow rules. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results from external datasets with no self-referential derivations

full rationale

The paper defines MechSMILES as a new textual encoding and reports empirical pathway retrieval accuracies (93.2% top-1 on FlowER, etc.) obtained by training language models on literature-derived mechanistic datasets. No equations, uniqueness theorems, or predictions are shown to reduce by construction to quantities fitted inside the paper; the evaluation metric is standard sequence retrieval on held-out splits rather than a tautological fit. The derivation chain consists of standard ML training and benchmarking steps that remain independent of the reported performance numbers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that arrow-pushing can be losslessly encoded in text while enforcing conservation, plus the representativeness of the literature mechanism datasets used for training.

axioms (1)

domain assumption: Arrow-pushing formalism can be represented in a compact textual format that automatically enforces conservation of mass and charge.
This is the foundational premise stated in the abstract for the MechSMILES design.

invented entities (1)

MechSMILES · no independent evidence
purpose: Textual encoding of molecular structure plus three types of electron-flow arrows for language-model training.
Newly introduced representation whose validity is central to the reported performance.

pith-pipeline@v0.9.0 · 5606 in / 1395 out tokens · 68657 ms · 2026-05-17T00:29:51.790244+00:00 · methodology
