Agentic generation of verifiable rules for deterministic, self-expanding reaction classification

Daniel Armstrong; Helena Avila; Jérôme Waser; Maarten Dobbelaere; Octavian Susanu; Philippe Schwaller; Valentas Olikauskas

arxiv: 2607.01061 · v1 · pith:Q3UNYMKFnew · submitted 2026-07-01 · 💻 cs.AI · cs.CL


Pith reviewed 2026-07-02 12:17 UTC · model grok-4.3

classification 💻 cs.AI cs.CL

keywords: reaction classification · multi-agent LLMs · chemical taxonomy · patent reactions · rule generation · deterministic classifiers · synthesis planning


The pith

A multi-agent LLM pipeline generates verifiable reaction rules for 14,073 classes from patent reactions without human curation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an automated pipeline in which LLM agents classify 665,901 reactions from US patents and write deterministic rules for each class under a verification loop that tests each rule against the corpus. This process expands a standard taxonomy from 68 classes to 14,073 classes. A lightweight fingerprint classifier trained on the resulting rules then labels 97.7 percent of unseen reactions, matching a leading proprietary classifier while drawing finer chemical distinctions and extending on demand to chemistry outside its training distribution. The outcome is positioned as a living reactivity database and a general route to turning generative models into self-expanding symbolic systems for synthesis planning.

Core claim

The multi-agent framework classifies reactions across the patent corpus and writes deterministic, verifiable rules for each class under an automated verification loop, expanding the taxonomy from 68 to 14,073 classes and supporting a fingerprint classifier that covers 97.7 percent of unseen reactions with greater resolution than fixed taxonomies.

What carries the argument

The multi-agent verification loop that generates each rule and tests it against the full reaction corpus to ensure determinism and coverage.
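
As a rough illustration of what such a loop could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration, since the abstract does not describe the implementation: the toy string reactions, the `verify` and `generate_verified_rule` helpers, and the acceptance criterion (a rule must match every corpus reaction of its class and no reaction of any other class) are hypothetical, and a hard-coded list of candidates stands in for successive LLM proposals.

```python
from typing import Callable, Iterable, Optional

# Toy corpus of (reaction, class label) pairs. A real system would likely use
# atom-mapped reaction SMILES; plain strings keep this sketch self-contained.
CORPUS = [
    ("acid + amine -> amide", "amide_coupling"),
    ("acyl_chloride + amine -> amide", "amide_coupling"),
    ("aryl_halide + boronic_acid -> biaryl", "suzuki_coupling"),
    ("alkene + HBr -> alkyl_bromide", "hydrohalogenation"),
]

Rule = Callable[[str], bool]  # a rule is a pure, deterministic predicate


def verify(rule: Rule, target: str, corpus) -> tuple[int, int]:
    """Return (false_negatives, false_positives) for a rule on the corpus."""
    fn = sum(1 for rxn, label in corpus if label == target and not rule(rxn))
    fp = sum(1 for rxn, label in corpus if label != target and rule(rxn))
    return fn, fp


def generate_verified_rule(
    candidates: Iterable[Rule], target: str, corpus
) -> Optional[Rule]:
    """Try proposals in order and keep the first one that passes verification."""
    for rule in candidates:
        if verify(rule, target, corpus) == (0, 0):
            return rule
    return None  # no proposal survived; a real loop might ask for more


# Hypothetical stand-ins for successive agent proposals.
candidates = [
    lambda rxn: "->" in rxn,              # too broad: matches other classes
    lambda rxn: rxn.startswith("acid"),   # too narrow: misses acyl chlorides
    lambda rxn: rxn.endswith("-> amide"), # matches the class exactly
]

accepted = generate_verified_rule(candidates, "amide_coupling", CORPUS)
print("accepted proposal index:", candidates.index(accepted))
```

The point of the sketch is the division of labour: proposals may come from a non-deterministic generator, but only rules that pass a deterministic corpus check are kept, so the accepted rule set can be re-verified at any time.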

If this is right

The expanded set of rules supports finer-grained reaction classification than existing fixed taxonomies.
The classifier matches proprietary performance on unseen reactions while remaining extendable on demand.
The rules stay deterministic and interpretable, directly usable in computer-assisted synthesis planning.
The database can incorporate new reactions without manual re-curation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Continuous addition of new patent data could keep the taxonomy current without repeated human oversight.
The same agentic loop might be tested on non-patent reaction sources to check whether the verification step still prevents drift.
Integration of the rule set into existing synthesis planners could be measured by whether route prediction success rates increase with the added granularity.

Load-bearing premise

The verification loop produces rules that stay deterministic, free of LLM hallucinations or biases, and generalize to reactions outside the patent corpus.

What would settle it

Running the generated rules on a held-out set of reactions drawn from sources other than the 665,901-reaction patent corpus and finding that the classifier assigns inconsistent labels or covers substantially fewer than 90 percent of them would falsify the generalizability and accuracy claims.
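
This test reduces to two measurable quantities: coverage (the share of held-out reactions that receive any label) and consistency (whether repeated runs give each reaction the same label). A minimal sketch follows, assuming a hypothetical `classify` function that returns a class code or `None`. The 90 percent threshold comes from the sentence above; the toy classifier and the reactions are illustrative only.

```python
from typing import Callable, Optional, Sequence

Classifier = Callable[[str], Optional[str]]


def coverage(classify: Classifier, reactions: Sequence[str]) -> float:
    """Fraction of reactions that receive some label (anything but None)."""
    if not reactions:
        raise ValueError("need at least one reaction")
    labelled = sum(1 for rxn in reactions if classify(rxn) is not None)
    return labelled / len(reactions)


def is_consistent(
    classify: Classifier, reactions: Sequence[str], repeats: int = 3
) -> bool:
    """True if every reaction gets the same label on each repeated call."""
    return all(
        len({classify(rxn) for _ in range(repeats)}) == 1 for rxn in reactions
    )


def claim_survives(
    classify: Classifier, reactions: Sequence[str], min_coverage: float = 0.90
) -> bool:
    """The claim survives only if labels are stable and coverage is high enough."""
    return (
        is_consistent(classify, reactions)
        and coverage(classify, reactions) >= min_coverage
    )


def toy_classify(rxn: str) -> Optional[str]:
    """Hypothetical stand-in for the rule-based classifier."""
    if rxn.endswith("-> amide"):
        return "amide_coupling"
    if "boronic_acid" in rxn:
        return "suzuki_coupling"
    return None


# Hypothetical out-of-corpus reactions; the last one matches no rule.
held_out = [
    "acid + amine -> amide",
    "aryl_halide + boronic_acid -> biaryl",
    "acyl_chloride + amine -> amide",
    "ketone + NaBH4 -> alcohol",
]

print(coverage(toy_classify, held_out))        # 0.75
print(claim_survives(toy_classify, held_out))  # False
```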

Figures

Figures reproduced from arXiv: 2607.01061 by Daniel Armstrong, Helena Avila, Jérôme Waser, Maarten Dobbelaere, Octavian Susanu, Philippe Schwaller, Valentas Olikauskas.

**Figure 2.** In a. we show an example of a set of reactions classified into a single bucket by NameRXN, alongside distinct classes proposed by our LLM based methodology. In part b. we show how the taxonomy is adapted by the LLM to observed chemical data. … to the 0.59% obtained for NameRXN by the same method. Thus indicating that the fully automated pipeline may achieve label reliability on par with human expert curation…

**Figure 3.** In a. we demonstrate a worked example of the template generalisation approach for pyrazolo[1,5-a]pyrimidine synthesis. The coloured molecules after the bottom arrows are the result of applying the generated template to the reactants. In b. we highlight a scheme demonstrating the template ordering. In (i) the value n assigned to an edge from A to B indicates that n templates of Class A produce a false posit…

**Figure 4.** Distribution of extrapolated reactions per class at three hierarchy levels. L3 (Type): 1,545 …

**Figure 5.** Label diversity across L3 classes. Left: scatter plot of template count vs. unique class codes per L3 class, coloured by diversity ratio (green = diverse, red = uniform). The dashed line indicates maximal diversity (y = x). Right: histogram of diversity ratios across all 1,545 L3 classes.

**Figure 7.** Representative reactions for each subtype in class 3.10.3 (Aromatic Formylation).

**Figure 8.** Representative reactions for each subtype in class 9.1.1 (Hydrohalogenation of Alkenes).

**Figure 9.** Per-template coverage on the NNNS-2025 single-reaction-centre subset (…

Original abstract

Computer-assisted synthesis planning breaks target molecules into accessible precursors using large libraries of reaction rules that assign each transformation a deterministic, interpretable label. But chemistry is long-tailed, making manual encoding intractable, and existing tools rely on fixed rulesets that cannot adapt to new chemistries. Here we present a fully automated pipeline in which a multi-agent framework of large language models (LLMs) classifies reactions and writes the rules themselves across 665,901 US patent reactions, generating each rule under a verification loop that tests it against the corpus. It expands a standard taxonomy from 68 to 14,073 classes without human curation. With a lightweight fingerprint classifier, it classifies 97.7% of unseen reactions, matching a leading proprietary classifier while resolving chemistry more finely and extending on demand to chemistry outside its training distribution. The result is a living reactivity database and a general route to turning generative models into reliable, self-expanding symbolic systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Multi-agent LLMs generate and internally verify rules for about 14k reaction classes from patent reactions, expanding the taxonomy roughly 200x (68 to 14,073 classes) with 97.7% accuracy on held-out patent data, but the verification never leaves that corpus.


The main takeaway is that the authors built a multi-agent LLM system that proposes reaction classification rules, tests them against 665k US patent reactions, and produces a taxonomy of 14,073 classes instead of the usual 68, then trains a fingerprint classifier that reaches 97.7% on unseen reactions held out from the same corpus.

What is actually new is the combination of scale and the closed verification loop that lets the system write and check its own rules without human curation. That part is useful for anyone who has watched reaction databases lag behind new chemistry because manual encoding is too slow.

The work does well at demonstrating a concrete pipeline that turns generative models into something that outputs deterministic labels at this volume. The numbers on taxonomy growth and classification accuracy are specific, and the claim that it matches a leading proprietary tool while being finer-grained is worth checking.

The soft spot is that the entire verification loop runs inside the patent corpus. The 97.7% figure is only on reactions from the same 665k set, so we have no data on whether the rules carry over to journal articles or reactions with different distributions. The abstract gives no detail on observed failure modes, how patent biases were addressed, or what the loop actually rejects, which leaves the determinism claim hard to evaluate.

This is for people working on synthesis planning software who need broader, updatable reaction coverage. A reader who builds or maintains route design tools would get practical value from the method, even if they treat the rules as a starting point that needs external checks.

I would send it to peer review. The approach tackles a genuine bottleneck with a workable pipeline and concrete results, and referees can press on the generalization and implementation details.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a multi-agent LLM pipeline that generates and verifies reaction classification rules from 665,901 US patent reactions. It expands a standard taxonomy from 68 to 14,073 classes without human curation. A lightweight fingerprint classifier achieves 97.7% accuracy on unseen reactions from the corpus, matching a proprietary baseline while offering finer resolution and claiming the ability to extend on demand to chemistries outside the training distribution, yielding a self-expanding symbolic reactivity database.

Significance. If the verification loop produces deterministic, bias-free rules that generalize beyond the patent corpus, the work would be significant for computer-assisted synthesis planning by addressing the long-tailed nature of reactions through scalable, interpretable, and adaptive rule generation. The automated expansion to over 14,000 classes at this scale, combined with reported performance parity to proprietary tools, represents a technical contribution toward turning generative models into reliable symbolic systems.

major comments (2)

[Abstract and Results] The central claim that the system 'extends on demand to chemistry outside its training distribution' is not supported by the reported experiments. The 97.7% accuracy applies only to unseen reactions held out from the same 665,901-reaction patent corpus; no evaluation on independent sources (journal articles or non-patent databases) is described, so corpus-specific biases cannot be ruled out and OOD generalizability remains unshown.
[Methods/Verification Loop] The abstract supplies concrete performance numbers (97.7%, 14,073 classes) but no information on the verification loop implementation, observed failure modes, handling of patent data biases, or whether the accuracy includes error bars or strict hold-out protocols; these omissions prevent assessment of whether the rules are deterministic and free of LLM-induced artifacts.

minor comments (1)

[Methods] The manuscript would benefit from an explicit definition or pseudocode for the 'lightweight fingerprint classifier' and how it interfaces with the generated rules.
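
The abstract does not say how the fingerprint classifier works or how it consumes the rule-derived labels. Purely to illustrate the level of detail this comment asks for, the following Python sketch shows one hypothetical design: a hashed token fingerprint with a nearest-neighbour lookup under Tanimoto similarity, trained on labels that the generated rules would have assigned. The tokenisation, the bit width `N_BITS`, the similarity threshold, and the class `NearestNeighbourClassifier` are all assumptions, not the authors' method.

```python
import zlib
from typing import Optional, Sequence

N_BITS = 2048  # assumed fingerprint width


def fingerprint(reaction: str, n_bits: int = N_BITS) -> frozenset[int]:
    """Hash whitespace-separated tokens into a set of 'on' bit positions.

    zlib.crc32 is used instead of the built-in hash() so that the result
    is identical across interpreter runs.
    """
    return frozenset(zlib.crc32(tok.encode()) % n_bits for tok in reaction.split())


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Shared bits divided by total distinct bits (0.0 for two empty sets)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class NearestNeighbourClassifier:
    """Label a reaction with the class of its most similar training example."""

    def __init__(self, min_similarity: float = 0.3):
        self.min_similarity = min_similarity
        self.examples: list[tuple[frozenset[int], str]] = []

    def fit(self, reactions: Sequence[str], labels: Sequence[str]):
        # Labels are assumed to come from the verified rules.
        self.examples = [(fingerprint(r), l) for r, l in zip(reactions, labels)]
        return self

    def predict(self, reaction: str) -> Optional[str]:
        if not self.examples:
            return None
        query = fingerprint(reaction)
        best_fp, best_label = max(
            self.examples, key=lambda ex: tanimoto(query, ex[0])
        )
        # Abstain rather than force a label on dissimilar chemistry.
        if tanimoto(query, best_fp) < self.min_similarity:
            return None
        return best_label


train = ["acid + amine -> amide", "aryl_halide + boronic_acid -> biaryl"]
labels = ["amide_coupling", "suzuki_coupling"]
clf = NearestNeighbourClassifier().fit(train, labels)
print(clf.predict("acyl_chloride + amine -> amide"))
```

The abstention threshold is one way to model the gap between the 97.7% of reactions that receive a label and those that do not; whether the paper's classifier abstains in this way is not stated in the abstract.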

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address the two major comments point by point, indicating the revisions we will make.


Referee: [Abstract and Results] The central claim that the system 'extends on demand to chemistry outside its training distribution' is not supported by the reported experiments. The 97.7% accuracy applies only to unseen reactions held out from the same 665,901-reaction patent corpus; no evaluation on independent sources (journal articles or non-patent databases) is described, so corpus-specific biases cannot be ruled out and OOD generalizability remains unshown.

Authors: We agree that the 97.7% accuracy is measured on a hold-out split drawn from the same 665,901-reaction patent corpus and does not constitute an external test on journal articles or other independent databases. The statement that the system 'extends on demand to chemistry outside its training distribution' refers to the architectural property of the multi-agent pipeline: new rules can be generated and verified for any reaction presented to the system without retraining the downstream fingerprint classifier. Nevertheless, we accept that this architectural capability has not been demonstrated on data sources outside the patent corpus. In the revised manuscript we will qualify the claim in the abstract, results, and discussion, explicitly distinguishing the demonstrated intra-corpus self-expansion from untested cross-corpus generalization and noting external validation as future work.
revision: partial

Referee: [Methods/Verification Loop] The abstract supplies concrete performance numbers (97.7%, 14,073 classes) but no information on the verification loop implementation, observed failure modes, handling of patent data biases, or whether the accuracy includes error bars or strict hold-out protocols; these omissions prevent assessment of whether the rules are deterministic and free of LLM-induced artifacts.

Authors: The full manuscript contains a Methods section that describes the verification loop, but we acknowledge that the abstract and high-level results summary omit the requested implementation details. We will expand the main text (and, if necessary, the supplementary information) to include: (i) the concrete prompts and consensus rules used in the multi-agent verification loop, (ii) the failure modes observed during rule generation (e.g., ambiguous patent language or conflicting agent outputs), (iii) the steps taken to mitigate patent-specific biases such as duplicate or noisy entries, and (iv) confirmation of the strict temporal or random hold-out protocol together with any error bars or confidence intervals on the reported accuracy. These additions will allow readers to evaluate determinism and the absence of LLM-induced artifacts.
revision: yes
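
For item (iv), one standard way to attach an interval to a reported accuracy is the Wilson score interval for a binomial proportion. The sketch below is illustrative only: the size of the held-out set is not given in the abstract, so the `n_test` values are hypothetical and serve to show how the interval around 97.7% narrows as the test set grows.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical test-set sizes; the true size is not stated in the abstract.
for n_test in (1_000, 10_000, 100_000):
    correct = round(0.977 * n_test)
    low, high = wilson_interval(correct, n_test)
    print(f"n={n_test:>7}: {low:.4f} to {high:.4f}")
```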

Circularity Check

0 steps flagged

No circularity in derivation chain


The pipeline generates rules via LLM agents under a verification loop against the 665,901-reaction patent corpus and reports 97.7% accuracy on held-out reactions from the same corpus. This constitutes a standard train/test split with no reduction of the reported taxonomy size or accuracy metric to a quantity defined by construction from the inputs. No self-definitional steps, fitted parameters presented as predictions, load-bearing self-citations, or ansatz smuggling appear in the described chain. The result is self-contained against the internal corpus benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; the central claim rests on the unexamined premise that LLM agents can produce chemically correct, deterministic rules at scale.

axioms (1)

domain assumption: LLM agents operating in a verification loop can generate chemically accurate and generalizable reaction classification rules without human oversight or systematic bias.
This assumption is required for the pipeline to produce the claimed 14,073 classes and 97.7% accuracy on unseen data.





2016

[14] [14]

ACS Central Science , volume=

Unbiasing retrosynthesis language models with disconnection prompts , author=. ACS Central Science , volume=. 2023 , publisher=

2023

[15] [15]

Chemistry of Materials , volume=

Fast customization of chemical language models to out-of-distribution data sets , author=. Chemistry of Materials , volume=. 2023 , publisher=

2023

[16] [16]

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Outrageously large neural networks: The sparsely-gated mixture-of-experts layer , author=. arXiv preprint arXiv:1701.06538 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

ChemMedChem , volume=

On the art of compiling and using'drug-like'chemical fragment spaces , author=. ChemMedChem , volume=

[18] [18]

Briefings in Bioinformatics , volume =

Xie, Ailin and Zhang, Ziqiao and Guan, Jihong and Zhou, Shuigeng , title = ". Briefings in Bioinformatics , volume =. 2023 , month =. doi:10.1093/bib/bbad296 , url =

work page doi:10.1093/bib/bbad296 2023

[19] [19]

The Llama 3 Herd of Models

The llama 3 herd of models , author=. arXiv preprint arXiv:2407.21783 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[20] [20]

arXiv preprint arXiv:2405.06682 , year=

Self-Reflection in LLM Agents: Effects on Problem-Solving Performance , author=. arXiv preprint arXiv:2405.06682 , year=

work page arXiv

[21] [21]

arXiv preprint arXiv:2311.10776 , year=

Chemist-X: Large language model-empowered agent for reaction condition recommendation in chemical synthesis, arXiv, 2023 , author=. arXiv preprint arXiv:2311.10776 , year=

work page arXiv 2023

[22] [22]

Journal of Cheminformatics , volume=

Rxn-INSIGHT: fast chemical reaction analysis using bond-electron matrices , author=. Journal of Cheminformatics , volume=. 2024 , publisher=

2024

[23] [23]

arXiv preprint arXiv:2407.16867 , year=

From text to insight: large language models for materials science data extraction , author=. arXiv preprint arXiv:2407.16867 , year=

work page arXiv

[24] [24]

arXiv preprint arXiv:2307.07443 , year=

Can large language models empower molecular property prediction? , author=. arXiv preprint arXiv:2307.07443 , year=

work page arXiv

[25] [25]

Briefings in Bioinformatics , volume=

Drugassist: A large language model for molecule optimization , author=. Briefings in Bioinformatics , volume=. 2025 , publisher=

2025

[26] [26]

Advances in neural information processing systems , volume=

Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=

[27] [27]

arXiv preprint arXiv:2404.01475 , year=

Are large language models superhuman chemists? , author=. arXiv preprint arXiv:2404.01475 , year=

work page arXiv

[28] [28]

Scaling Laws for Neural Language Models

Scaling laws for neural language models , author=. arXiv preprint arXiv:2001.08361 , year=

work page internal anchor Pith review Pith/arXiv arXiv 2001

[29] [29]

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Scaling llm test-time compute optimally can be more effective than scaling model parameters , author=. arXiv preprint arXiv:2408.03314 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[30] [30]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning , author=. arXiv preprint arXiv:2501.12948 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[31] [31]

IEEE Transactions on Computational Intelligence and AI in games , volume=

A survey of monte carlo tree search methods , author=. IEEE Transactions on Computational Intelligence and AI in games , volume=. 2012 , publisher=

2012

[32] [32]

LiteLLM , howpublished =

[33] [33]

GPT-4 Technical Report

GPT-4 Technical Report , author=. arXiv preprint arXiv:2303.08774 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[34] [34]

2023 , howpublished =

Model Card and Evaluations for Claude Models , author =. 2023 , howpublished =

2023

[35] [35]

Journal of Medicinal Chemistry , author =

Design and. Journal of Medicinal Chemistry , author =. 2024 , note =. doi:10.1021/acs.jmedchem.4c00743 , number =

work page doi:10.1021/acs.jmedchem.4c00743 2024

[36] [36]

ChemMedChem , author =

Development of. ChemMedChem , author =. 2018 , note =. doi:10.1002/cmdc.201800188 , abstract =

work page doi:10.1002/cmdc.201800188 2018

[37] [37]

and Hinshaw, Stephen M

Zhu, Xijun and Byun, Woong Sub and Pieńkowska, Dominika Ewa and Nguyen, Kha The and Gerhartz, Jan and Geng, Qixiang and Qiu, Tian and Zhong, Jianing and Jiang, Zixuan and Wang, Mengxiong and Sarott, Roman C. and Hinshaw, Stephen M. and Zhang, Tinghu and Attardi, Laura D. and Nowak, Radosław P. and Gray, Nathanael S. , month = oct, year =. Activating. doi:...

work page doi:10.1101/2024.10.23.619961 2024

[38] [38]

LLaMA: Open and Efficient Foundation Language Models

LLaMA: Open and Efficient Foundation Language Models , author=. arXiv preprint arXiv:2302.13971 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[39] [39]

2025 , howpublished =

From DeepSeek LLM to DeepSeek R1 , author =. 2025 , howpublished =

2025

[40] [40]

Qwen2.5 Technical Report

Qwen2.5 Technical Report , author=. arXiv preprint arXiv:2412.15115 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[41] [41]

IEEE transactions on Systems Science and Cybernetics , volume=

A formal basis for the heuristic determination of minimum cost paths , author=. IEEE transactions on Systems Science and Cybernetics , volume=. 1968 , publisher=

1968

[42] [42]

arXiv preprint arXiv:2310.19796 , year=

Re-evaluating Retrosynthesis Algorithms with Syntheseus , author=. arXiv preprint arXiv:2310.19796 , year=

work page arXiv

[43] [43]

doi:10.6084/m9.figshare.30978826.v1 , url =

van der Lingen, Riky , title =. doi:10.6084/m9.figshare.30978826.v1 , url =

work page doi:10.6084/m9.figshare.30978826.v1

[44] [44]

Advanced Synthesis & Catalysis , volume=

Iridium-Catalysed Reductive Deoxygenation of Ketones with Formic Acid as Traceless Hydride Donor , author=. Advanced Synthesis & Catalysis , volume=. 2020 , publisher=

2020

[45] [45]

International Conference on Machine Learning , pages=

Retrosynthetic planning with dual value networks , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023

[46] [46]

ACS central science , volume=

Learning retrosynthetic planning through simulated experience , author=. ACS central science , volume=. 2019 , publisher=

2019

[47] [47]

Communications Chemistry , volume=

Retrosynthetic planning with experience-guided Monte Carlo tree search , author=. Communications Chemistry , volume=. 2023 , publisher=

2023

[48] [48]

Nature Communications , volume=

Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing , author=. Nature Communications , volume=. 2023 , publisher=

2023

[49] [49]

Communications Chemistry , volume=

G 2 Retro as a two-step graph generative models for retrosynthesis prediction , author=. Communications Chemistry , volume=. 2023 , publisher=

2023

[50] [50]

Molecular Systems Design & Engineering , volume=

Application of automated network generation for retrosynthetic planning of potential corrosion inhibitors , author=. Molecular Systems Design & Engineering , volume=. 2024 , publisher=

2024

[51] [51]

Nature , volume=

Computer-designed repurposing of chemical wastes into drugs , author=. Nature , volume=. 2022 , publisher=

2022

[52] [52]

Nature Synthesis , pages=

Computational synthesis design for controlled degradation and revalorization , author=. Nature Synthesis , pages=. 2024 , publisher=

2024

[53] [53]

Tetrahedron , volume=

New and efficient approaches to the semisynthesis of taxol and its C-13 side chain analogs by means of -lactam synthon method , author=. Tetrahedron , volume=. 1992 , publisher=

1992

[54] [54]

Chemical reviews , volume=

Navigating the chiral pool in the total synthesis of complex terpene natural products , author=. Chemical reviews , volume=. 2017 , publisher=

2017

[55] [55]

2023 , eprint=

Predictive Chemistry Augmented with Text Retrieval , author=. 2023 , eprint=

2023

[56] [56]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Mamba: Linear-time sequence modeling with selective state spaces , author=. arXiv preprint arXiv:2312.00752 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[57] [57]

Jamba: A Hybrid Transformer-Mamba Language Model

Jamba: A Hybrid Transformer-Mamba Language Model , author=. arXiv preprint arXiv:2403.19887 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[58] [58]

Nature Communications , author =

Extracting medicinal chemistry intuition via preference machine learning , volume =. Nature Communications , author =. 2023 , note =. doi:10.1038/s41467-023-42242-1 , abstract =

work page doi:10.1038/s41467-023-42242-1 2023

[59] [59]

Digital Discovery , volume=

Enhancing diversity in language based models for single-step retrosynthesis , author=. Digital Discovery , volume=. 2023 , publisher=

2023

[60] [60]

2024 , eprint=

Mastering Board Games by External and Internal Planning with Language Models , author=. 2024 , eprint=

2024

[61] [61]

Nature Machine Intelligence , pages=

Augmenting large language models with chemistry tools , author=. Nature Machine Intelligence , pages=. 2024 , publisher=

2024

[62] [62]

Journal of the American Chemical Society , volume=

Synthesis of some substituted benzimidazolones , author=. Journal of the American Chemical Society , volume=. 1958 , publisher=

1958

[63] [63]

Studies in Chemotherapy. IX. Ureylenebenzene and Cyclohexane Derivatives as Biotin Antagonists1 , author=. Journal of the American Chemical Society , volume=. 1945 , publisher=

1945

[64] [64]

Bioorganic & medicinal chemistry , volume=

Synthesis and biological evaluation of santacruzamate A analogues for anti-proliferative and immunomodulatory activity , author=. Bioorganic & medicinal chemistry , volume=. 2016 , publisher=

2016

[65] [65]

Nature Chemistry , author =

Probing the chemical ‘reactome’ with high-throughput experimentation data , copyright =. Nature Chemistry , author =. 2024 , note =. doi:10.1038/s41557-023-01393-w , language =

work page doi:10.1038/s41557-023-01393-w 2024

[66] [66]

EROS A computer program for generating sequences of reactions , pages =

Gasteiger, Johann and Jochum, Clemens , booktitle =. EROS A computer program for generating sequences of reactions , pages =

[67] [67]

Krenn, Mario and H. Mach. Learn.: Sci. Technol. , publisher =

[68] [68]

Sequence to sequence learning with neural networks , pages =

Sutskever, Ilya and Vinyals, Oriol and Le, Quoc V , booktitle =. Sequence to sequence learning with neural networks , pages =

[69] [69]

2019 , eprint=

A Generative Model For Electron Paths , author=. 2019 , eprint=

2019

[70] [70]

Thakkar, Amol and Selmi, Nidhal and Reymond, Jean-Louis and Engkvist, Ola and Bjerrum, Esben Jannik , title =. J. Med. Chem. , publisher =

[71] [71]

Predicting reaction performance in C--N cross-coupling using machine learning

Response to Comment on “Predicting reaction performance in C--N cross-coupling using machine learning” , author=. Science , volume=. 2018 , publisher=

2018

[72] [72]

33rd Conference on Neural Information Processing Systems (NeurIPS 2019) , title =

Bradshaw, J and Paige, B and Kusner, MJ and Segler, MHS and Hern. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019) , title =

2019

[73] [73]

Computer-assisted synthetic planning: the end of the beginning , pages =

Szymku. Computer-assisted synthetic planning: the end of the beginning , pages =. Angew. Chem. - Int. Ed. , publisher =

[74] [74]

Guido Falk von Rudorff and Stefan N Heinen and Marco Bragato and O Anatole von Lilienfeld , title =. Mach. Learn.: Sci. Technol. , month = oct, publisher =

[75] [75]

Atom-to-atom Mapping: A Benchmarking Study of Popular Mapping Algorithms and Consensus Strategies , author=. Mol. Inf. , pages=. 2020 , publisher=

2020

[76] [76]

Chemical science , volume=

Selection of cost-effective yet chemically diverse pathways from the networks of computer-generated retrosynthetic plans , author=. Chemical science , volume=. 2019 , publisher=

2019

[77] [77]

and Badowski, Tomasz and Grzybowski, Bartosz A

Beker, Wiktor and Gajewska, Ewa P. and Badowski, Tomasz and Grzybowski, Bartosz A. , title =. Angew. Chem. - Int. Ed. , keywords =. https://onlinelibrary.wiley.com/doi/pdf/10.1002/anie.201806920 , pages =

work page doi:10.1002/anie.201806920

[78] [78]

Accounts of Chemical Research , volume=

ASKCOS: Open-Source, Data-Driven Synthesis Planning , author=. Accounts of Chemical Research , volume=. 2025 , publisher=

2025

[79] [79]

Li, Xin and Zhang, Shuo-Qing and Xu, Li-Cheng and Hong, Xin , title =. Angew. Chem. , publisher =

[80] [80]

and Saigiridharan, Lakshidaa and Genheden, Samuel , month = may, year =

Westerlund, Annie M. and Saigiridharan, Lakshidaa and Genheden, Samuel , month = may, year =. Constrained synthesis planning with disconnection-aware transformer and multi-objective search , url =. doi:10.26434/chemrxiv-2024-c77p4 , abstract =

work page doi:10.26434/chemrxiv-2024-c77p4 2024