pith. machine review for the scientific record.

arxiv: 2512.05722 · v2 · submitted 2025-12-05 · 💻 cs.LG · physics.chem-ph

Teaching Language Models Mechanistic Explainability Through MechSMILES

Pith reviewed 2026-05-17 00:29 UTC · model grok-4.3

classification 💻 cs.LG physics.chem-ph
keywords chemical reaction mechanisms · MechSMILES · arrow-pushing formalism · language models · computer-assisted synthesis planning · reaction pathway prediction · electron flow

The pith

Language models can predict complete reaction mechanisms from reactants and products using MechSMILES encoding

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces MechSMILES as a textual way to represent how electrons move during chemical reactions through arrow-pushing notation. It trains language models to output these representations on tasks of growing difficulty, including the hardest case of generating full mechanisms when given only starting materials, conditions, and the target product. A sympathetic reader would care because current synthesis-planning tools suggest reactions without showing the underlying electron steps that determine whether the reaction is physically possible. The approach yields high retrieval rates on established datasets and allows models to pick up new reaction types after seeing only a small number of examples.

Core claim

MechSMILES is a compact textual format that encodes molecular structure together with electron flow using three arrow types inside a Python environment that automatically enforces conservation of mass and charge. Training language models on four mechanism-prediction tasks demonstrates that they can reconstruct physically plausible pathways, perform complete atom-to-atom mapping including hydrogens, and extract catalyst-aware templates. On the task of predicting mechanisms given only reactants, conditions, and desired product, the models reach 93.2 percent pathway retrieval on FlowER and 73.3 percent on mech-USPTO-31k, with top-3 retrieval of 97.6 percent and 86.5 percent respectively, and to

What carries the argument

MechSMILES, a Python-enforced textual encoding of molecular structure and electron flow via three arrow types that prevents atom hallucination while enforcing conservation laws
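
The conservation constraint can be illustrated with a minimal sketch. This is not the paper's Python environment, whose API is not described on this page; the `Species` type, the `totals` and `is_balanced` helpers, and the element-count representation are all assumptions made for illustration. The check compares element counts and net charge on both sides of a single elementary step, so a product containing an atom that was never in the input is rejected.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """Hypothetical stand-in for a molecule: element counts plus net charge."""
    atoms: tuple  # e.g. (("C", 1), ("H", 3), ("Br", 1))
    charge: int = 0


def totals(side):
    """Sum element counts and net charge over one side of an elementary step."""
    counts = Counter()
    charge = 0
    for species in side:
        counts.update(dict(species.atoms))  # Counter.update adds counts
        charge += species.charge
    return counts, charge


def is_balanced(reactants, products):
    """True when both mass (element counts) and charge are conserved."""
    return totals(reactants) == totals(products)


# SN2 step: hydroxide + bromomethane -> methanol + bromide
hydroxide = Species((("O", 1), ("H", 1)), charge=-1)
bromomethane = Species((("C", 1), ("H", 3), ("Br", 1)))
methanol = Species((("C", 1), ("H", 4), ("O", 1)))
bromide = Species((("Br", 1),), charge=-1)
balanced = is_balanced([hydroxide, bromomethane], [methanol, bromide])

# A product with an extra carbon (an "atom hallucination") fails the check.
ethanol = Species((("C", 2), ("H", 6), ("O", 1)))
hallucinated = is_balanced([hydroxide, bromomethane], [ethanol, bromide])
```

A check of this kind only rules out unbalanced steps; it says nothing about whether a balanced step is chemically reasonable.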

If this is right

  • Post-hoc validation of CASP proposals by reconstructing physically plausible electron pathways
  • Holistic atom-to-atom mapping that tracks every atom including hydrogens
  • Extraction of catalyst-aware reaction templates distinguishing recycled catalysts from spectator species
  • Rapid acquisition of new reaction classes such as ozonolysis and Suzuki cross-coupling from as few as 40 examples
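
The distinction between recycled catalysts and spectator species in the list above can be sketched as follows. The function `classify_species` and the representation of a step as a pair of name sets are hypothetical and are not taken from the paper. The sketch also ignores stoichiometric multiplicity. A species counts as a catalyst if some step consumes it and it is present again at the end, and as a spectator if no step ever consumes it.

```python
def classify_species(initial, steps):
    """Label each initial species as 'reactant', 'catalyst', or 'spectator'.

    initial: iterable of species names present at the start.
    steps:   ordered list of (consumed, produced) sets, one per elementary step.
    """
    present = set(initial)
    touched = set()
    for consumed, produced in steps:
        touched |= consumed
        present = (present - consumed) | produced

    labels = {}
    for name in initial:
        if name not in touched:
            labels[name] = "spectator"  # never takes part in any step
        elif name in present:
            labels[name] = "catalyst"   # consumed at some point, then regenerated
        else:
            labels[name] = "reactant"   # consumed and not regenerated
    return labels


# Toy two-step pathway with placeholder species names.
steps = [
    ({"A", "H+"}, {"AH+"}),
    ({"AH+", "B"}, {"AB", "H+"}),
]
labels = classify_species(["A", "B", "H+", "Na+"], steps)
```

Here `H+` is labelled a catalyst because it is consumed in the first step and released in the second, while `Na+` is a spectator because no step touches it.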

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same encoding could be paired with graph neural networks to improve accuracy on larger or more complex molecules
  • Testing on industrial reaction logs not seen during training would show whether the conservation rules transfer to noisy real-world data
  • Mechanistic outputs might be used to generate entirely novel reaction hypotheses by exploring unseen but conservation-compliant arrow sequences

Load-bearing premise

That the MechSMILES textual encoding and arrow-pushing formalism faithfully capture all relevant mechanistic details without introducing artifacts or missing important pathways that would appear in real experimental conditions

What would settle it

A collection of reactions outside the training data where the model-generated mechanisms either violate observed experimental outcomes or break conservation of mass and charge

Figures

Figures reproduced from arXiv: 2512.05722 by Philippe Schwaller, Théo A. Neukomm, Zlatko Jončev.

Figure 1: Two examples of MechSMILES, with the colors illustrating the purpose of each part of the […]
Figure 2: Reaction mechanism prediction framework. (a) Progressive task difficulty showing […]
Figure 3: Transfer learning results showing important improvement after fine-tuning on small curated […]
Figure 4: a) Example of a CASP validation of the multistep reaction visible in figure S2 of the […]
Figure 5: Few example reactions mapped both with SOTA tools, and with mechanistic mapping using […]
Figure 6: Suzuki coupling reaction taken from the test set of the FlowER dataset
Original abstract

Chemical reaction mechanisms are the foundation of how chemists evaluate reactivity and feasibility, yet current Computer-Assisted Synthesis Planning (CASP) systems operate without this mechanistic reasoning. We introduce a computational framework that teaches language models to predict reaction mechanisms through arrow-pushing formalism, a century-old notation that tracks electron flow while enforcing conservation of mass and charge. This mechanistic understanding enables three capabilities that are difficult or impossible with current methods: post-hoc validation of CASP proposals by reconstructing physically plausible electron pathways, holistic atom-to-atom mapping that tracks all atoms including hydrogens, and extraction of catalyst-aware reaction templates that distinguish recycled catalysts from spectator species. Central to our approach is MechSMILES, a compact textual format encoding molecular structure and electron flow through three arrow types, designed within a Python-based environment that enforces conservation laws and eliminates the possibility of atom hallucination. We trained and benchmarked models on four mechanism prediction tasks of increasing complexity using the main mechanistic datasets in the literature. On our most challenging task, predicting complete mechanisms given only reactants, conditions, and the desired product, our models achieve 93.2% and 73.3% pathway retrieval on the FlowER and mech-USPTO-31k datasets respectively, with top-3 retrieval reaching 97.6% and 86.5%. Furthermore, the framework rapidly learns new reaction classes, with strong mechanistic predictions for ozonolysis and Suzuki cross-coupling emerging from as few as 40 training examples each. By grounding predictions in physically meaningful electron movements, this work provides an architecture-agnostic, open-source foundation for more explainable and chemically valid CASP.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MechSMILES, a compact textual encoding of molecular structures and electron flows via three arrow types in an arrow-pushing formalism, together with a Python environment that enforces mass/charge conservation and forbids atom hallucination. Language models are trained on four mechanism-prediction tasks of increasing difficulty drawn from existing literature datasets; the central empirical claim is that, on the hardest task (complete mechanism prediction from reactants, conditions, and desired product), the models reach 93.2 % top-1 and 97.6 % top-3 pathway retrieval on FlowER and 73.3 % / 86.5 % on mech-USPTO-31k, while also enabling post-hoc validation of CASP proposals, holistic atom mapping, and extraction of catalyst-aware templates. The work further reports rapid adaptation to new reaction classes (e.g., ozonolysis, Suzuki) from as few as 40 examples.
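
For readers unfamiliar with the metric, top-k pathway retrieval can be sketched as below. The function name `top_k_retrieval` and the use of exact string equality are assumptions; the paper's actual matching criterion is not described on this page, and the pathway strings are placeholders, not MechSMILES.

```python
def top_k_retrieval(ranked_predictions, references, k):
    """Fraction of examples whose reference pathway is among the first k predictions.

    ranked_predictions: one list of candidate pathways per example, best first.
    references:         one reference pathway per example.
    """
    if len(ranked_predictions) != len(references):
        raise ValueError("one ranked list is needed per reference")
    hits = sum(ref in preds[:k] for preds, ref in zip(ranked_predictions, references))
    return hits / len(references)


# Placeholder pathway identifiers, used only to exercise the function.
references = ["p1", "p2", "p3", "p4"]
ranked = [
    ["p1", "x", "y"],  # correct at rank 1
    ["x", "p2", "y"],  # correct at rank 2
    ["x", "y", "p3"],  # correct at rank 3
    ["x", "y", "z"],   # never retrieved
]
top1 = top_k_retrieval(ranked, references, k=1)
top3 = top_k_retrieval(ranked, references, k=3)
```

Under exact matching, the first major comment below amounts to asking whether a high value of this number reflects learned electron-flow rules or reproduction of strings seen in training.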

Significance. If the reported retrieval rates reflect genuine internalization of electron-flow rules rather than sequence memorization, the framework would supply an architecture-agnostic, open-source substrate for chemically grounded CASP that can validate proposals, produce interpretable templates, and track all atoms including hydrogens. The few-shot adaptation results would additionally indicate practical utility in low-data mechanistic regimes.

major comments (2)
  1. [Results, complete-mechanism-prediction task] The central claim that the models acquire mechanistic reasoning rests on pathway-retrieval accuracy (93.2 % top-1 on FlowER). Because MechSMILES is a deterministic textual serialization of structures plus three arrow symbols and the only constraints are conservation laws, the metric reduces to exact or near-exact string reproduction; no ablation is reported that tests whether performance survives removal of training-set co-occurrence statistics or substitution of unseen arrow sequences.
  2. [Experimental setup and dataset description] The evaluation uses externally curated datasets (FlowER, mech-USPTO-31k) whose train/test splits are not described with respect to reaction-class novelty or mechanistic diversity. Without explicit hold-out of entire mechanistic families or perturbation experiments (e.g., altering a single arrow while preserving stoichiometry), it remains possible that high retrieval simply reproduces statistical patterns present in the training distribution.
minor comments (2)
  1. [Abstract] The abstract states that four tasks of increasing complexity were evaluated but does not enumerate them; a one-sentence list would orient readers before the detailed results.
  2. [MechSMILES definition] Provide at least one concrete MechSMILES example string together with its corresponding arrow-pushing diagram so that readers can verify the encoding of the three arrow types.
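
The family-level hold-out requested in major comment 2 could be set up along these lines. This is a sketch under stated assumptions: the `family` key is a hypothetical label, and neither FlowER nor mech-USPTO-31k is known from this page to ship such a field.

```python
import random
from collections import defaultdict


def family_holdout_split(records, test_fraction=0.2, seed=0):
    """Split records so that each mechanistic family is entirely in train or in test."""
    by_family = defaultdict(list)
    for record in records:
        by_family[record["family"]].append(record)

    families = sorted(by_family)               # fixed order before shuffling
    random.Random(seed).shuffle(families)      # reproducible shuffle
    n_test = max(1, round(test_fraction * len(families)))

    test = [r for f in families[:n_test] for r in by_family[f]]
    train = [r for f in families[n_test:] for r in by_family[f]]
    return train, test


# Toy data: five hypothetical families with three reactions each.
records = [{"family": f"family_{i}", "id": (i, j)} for i in range(5) for j in range(3)]
train, test = family_holdout_split(records)
train_families = {r["family"] for r in train}
test_families = {r["family"] for r in test}
```

Because whole families are withheld, a model evaluated on `test` has seen no training example from the same family, which a random per-reaction split does not guarantee.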

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their insightful comments, which highlight important aspects of our evaluation. We respond to each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Results, complete-mechanism-prediction task] The central claim that the models acquire mechanistic reasoning rests on pathway-retrieval accuracy (93.2 % top-1 on FlowER). Because MechSMILES is a deterministic textual serialization of structures plus three arrow symbols and the only constraints are conservation laws, the metric reduces to exact or near-exact string reproduction; no ablation is reported that tests whether performance survives removal of training-set co-occurrence statistics or substitution of unseen arrow sequences.

    Authors: Pathway retrieval requires the model to output a complete MechSMILES string whose arrow sequence encodes a chemically valid electron flow; the accompanying Python environment rejects any output that violates mass/charge conservation or introduces atom hallucination. This constraint set is stricter than unconstrained string matching. We did not include explicit ablations that remove co-occurrence statistics or substitute unseen arrow sequences. The few-shot results on ozonolysis and Suzuki (strong performance from 40 examples) supply indirect evidence of generalization, but we acknowledge the referee's point and will add targeted ablations in the revision, including performance on held-out arrow motifs and perturbed sequences that preserve stoichiometry. revision: partial

  2. Referee: [Experimental setup and dataset description] The evaluation uses externally curated datasets (FlowER, mech-USPTO-31k) whose train/test splits are not described with respect to reaction-class novelty or mechanistic diversity. Without explicit hold-out of entire mechanistic families or perturbation experiments (e.g., altering a single arrow while preserving stoichiometry), it remains possible that high retrieval simply reproduces statistical patterns present in the training distribution.

    Authors: We followed the train/test splits published with FlowER and mech-USPTO-31k. We will expand the revised manuscript with an explicit breakdown of reaction classes and mechanistic families appearing in each split. In addition, we will report results from perturbation experiments in which individual arrows are altered while stoichiometry is held fixed, thereby testing whether retrieval depends on exact training-distribution matches or on the underlying electron-flow rules. revision: yes
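
The single-arrow perturbation test mentioned in both responses can be sketched as follows. Arrows are represented here as hypothetical `(source, sink)` pairs of atom indices, which is not MechSMILES syntax, and `single_arrow_perturbations` is an illustrative name. Each variant redirects one arrow to a different atom while leaving the set of atoms untouched, so the input stoichiometry is unchanged.

```python
def single_arrow_perturbations(arrows, atom_indices):
    """Yield every arrow sequence that differs from `arrows` in exactly one sink atom."""
    for position, (source, sink) in enumerate(arrows):
        for new_sink in atom_indices:
            if new_sink in (source, sink):
                continue  # skip self-loops and the unperturbed arrow
            perturbed = list(arrows)
            perturbed[position] = (source, new_sink)
            yield perturbed


# Toy two-arrow sequence over four atoms.
original = [(1, 2), (2, 3)]
variants = list(single_arrow_perturbations(original, atom_indices=[1, 2, 3, 4]))
```

Whether a given variant is still chemically valid would have to be decided by a separate conservation or valence check; the sketch only enumerates candidates.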

Circularity Check

0 steps flagged

No significant circularity; empirical results from external datasets with no self-referential derivations

Full rationale

The paper defines MechSMILES as a new textual encoding and reports empirical pathway retrieval accuracies (93.2% top-1 on FlowER, etc.) obtained by training language models on literature-derived mechanistic datasets. No equations, uniqueness theorems, or predictions are shown to reduce by construction to quantities fitted inside the paper; the evaluation metric is standard sequence retrieval on held-out splits rather than a tautological fit. The derivation chain consists of standard ML training and benchmarking steps that remain independent of the reported performance numbers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that arrow-pushing can be losslessly encoded in text while enforcing conservation, plus the representativeness of the literature mechanism datasets used for training.

axioms (1)
  • domain assumption: Arrow-pushing formalism can be represented in a compact textual format that automatically enforces conservation of mass and charge.
    This is the foundational premise stated in the abstract for the MechSMILES design.
invented entities (1)
  • MechSMILES (no independent evidence)
    purpose: Textual encoding of molecular structure plus three types of electron-flow arrows for language-model training.
    Newly introduced representation whose validity is central to the reported performance.

pith-pipeline@v0.9.0 · 5606 in / 1395 out tokens · 68657 ms · 2026-05-17T00:29:51.790244+00:00 · methodology

