Towards Generalizable and Evidential Nuclear Magnetic Resonance-Based Molecular Structure Elucidation via Large Language Model Agent

Chen Yang; Fanjie Xu; Hanyu Gao; Hanyu Sun; Hongxin Xiang; Jun Xia; Wenjie Du; Xiaojian Wang; Yunpeng Zhao; Yuqiang Li

arxiv: 2606.29776 · v1 · pith:VKM7FUP7new · submitted 2026-06-29 · 💻 cs.LG · cs.AI

Zheng Fang, Chen Yang, Yusen Tan, Yunpeng Zhao, Fanjie Xu, Hongxin Xiang, Hanyu Sun, Hanyu Gao, Xiaojian Wang, Wenjie Du, Yuqiang Li, Jun Xia

Pith reviewed 2026-06-30 07:26 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords NMR spectroscopy · molecular structure elucidation · large language model agent · evidential reasoning · natural products · chemical knowledge graphs · scaffold split benchmark

The pith

NMRAgent is an LLM-powered agent that plans, proposes, verifies, and refines molecular structures from NMR spectra and formulas to outperform prior methods on novel scaffolds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NMRAgent as a way to interpret NMR spectra for molecules whose structures are not in existing databases. It shows that an agent using large language models can break the task into planning the elucidation, proposing candidates, checking consistency between peaks and atoms, and optimizing fragments using the molecular formula. This produces both higher accuracy on benchmarks that test generalization to new scaffolds and transparent reasoning steps that chemists can inspect. The approach is demonstrated on two real unknown natural products, showing it can handle cases where database lookup fails and black-box models lack interpretability.

Core claim

NMRAgent takes experimental NMR spectra and molecular formula as input, plans the elucidation process, proposes candidate structures, verifies peak-atom consistency, and refines misaligned substructures through formula-aware fragment optimization. Enabled by its evidential reasoning, NMRAgent outperforms state-of-the-art methods, improving top-1 accuracy by 46.5% and Tanimoto similarity by 0.502 on a scaffold-split benchmark with novel scaffolds in the test set. It also elucidates the structures of two previously unknown natural products isolated from Hydrangea davidii and Vitex trifolia and corrects structural misassignments in established literature.
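
One ingredient of any formula-aware refinement step is a check that a candidate's element counts equal those of the query formula. The sketch below is only an illustration of that check, not the paper's implementation: the function names `parse_formula` and `satisfies_formula` are hypothetical, and the parser assumes flat formulas such as `C9H8O4` without parentheses, charges, or isotope labels.

```python
import re
from collections import Counter

# One element symbol (capital letter, optional lowercase) plus an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat molecular formula into element counts.

    Assumption: no parentheses, charges, or isotopes (hypothetical helper).
    """
    if not formula:
        raise ValueError("empty formula")
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # A character was skipped, so the string is not a flat formula.
            raise ValueError(f"unexpected character at position {pos}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"unexpected character at position {pos}")
    return counts


def satisfies_formula(candidate_formula: str, query_formula: str) -> bool:
    """Return True only if both formulas have identical element counts."""
    return parse_formula(candidate_formula) == parse_formula(query_formula)


# Element order does not matter, but every count must agree.
print(satisfies_formula("C9H8O4", "O4H8C9"))   # True
print(satisfies_formula("C9H8O4", "C9H10O4"))  # False
```

Treating the comparison as exact equality makes the formula a filter rather than a score, so a candidate that violates it is rejected no matter how well its predicted spectrum matches.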

What carries the argument

NMRAgent, an evidential reasoning agent that integrates specialized spectral analysis tools with chemical knowledge graphs to mimic human expert deductive reasoning through planning, proposal, verification, and optimization steps.

If this is right

Top-1 accuracy improves by 46.5 percent and Tanimoto similarity by 0.502 over prior methods on scaffold-split tests with novel structures.
The agent can determine structures of previously unknown natural products from plant isolates.
It can identify and correct structural misassignments reported in the chemical literature.
Reasoning steps remain inspectable at the level of peak-atom assignments and fragment optimizations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same planning-and-verification loop could be adapted to other spectroscopic data such as mass spectra if the corresponding analysis tools are added.
Performance on very large or flexible molecules may still depend on the completeness of the underlying chemical knowledge graph.
The approach points toward hybrid systems in which LLM agents coordinate multiple analytical instruments rather than relying on any single data type.

Load-bearing premise

The large language model can reliably generate chemically valid plans, structure proposals, and consistency checks without persistent hallucinations that evade the verification and optimization steps.

What would settle it

The central claim would fail if the agent, run on NMR spectra of molecules with novel scaffolds whose true structures are known, output structures that pass every internal consistency check yet are chemically incorrect.

Original abstract

Nuclear Magnetic Resonance (NMR) spectroscopy is the gold standard for molecular structure elucidation, yet interpreting complex spectra for unknown molecules remains a bottleneck reliant on human expertise. While artificial intelligence has advanced this field, current methods face a critical trade-off: database retrieval cannot identify novel scaffolds, while de novo molecular structure elucidation models operate as black boxes, lacking the atom-level interpretability required for rigorous scientific validation. Here, we present NMRAgent, an evidential reasoning agent powered by large language models (LLMs) that bridges this gap by integrating specialized spectral analysis tools with chemical knowledge graphs. Unlike previous approaches, NMRAgent mimics the deductive reasoning of human experts: it takes experimental NMR spectra and molecular formula as input, plans the elucidation process, proposes candidate structures, verifies peak-atom consistency, and refines misaligned substructures through formula-aware fragment optimization. Enabled by its evidential reasoning, NMRAgent outperforms state-of-the-art methods, improving top-1 accuracy by 46.5% and Tanimoto similarity by 0.502 on a scaffold-split benchmark with novel scaffolds in the test set. In addition, we demonstrate the agent's practical utility by elucidating the structures of two previously unknown natural products isolated from Hydrangea davidii and Vitex trifolia, and by correcting structural misassignments in established literature. By combining high-accuracy prediction with transparent and evidence-based reasoning, NMRAgent establishes a new paradigm for interpretable AI in analytical chemistry.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

NMRAgent combines LLM planning with peak verification and fragment optimization for NMR elucidation and reports gains on scaffold splits plus two natural-product cases, but the abstract supplies almost no benchmark or validation details.

The core of this paper is NMRAgent, an LLM agent that takes NMR spectra and formula, plans steps, proposes candidates, checks peak-atom consistency, and refines via formula-aware optimization. It claims a 46.5% top-1 accuracy lift and 0.502 Tanimoto gain on a scaffold-split benchmark with novel test scaffolds, plus successful elucidation of two unknown natural products.

The approach is new in its specific mix of evidential workflow, tool use, and knowledge-graph grounding to produce traceable reasoning rather than a single black-box output. That directly targets the interpretability problem in current de novo models.

The main weakness is that the abstract gives no information on how the benchmark was assembled, what the exact baselines were, whether the gains are statistically significant, or how the two real cases were independently confirmed. The concern that verification could miss chemically plausible but incorrect structures on rare scaffolds is credible given the reliance on LLM-generated plans; nothing in the provided text shows the checks are tight enough to rule that out.

This is for groups working on AI tools for analytical chemistry who want to see an agent-style pipeline in action. A reader focused on reproducible methods would want the full methods, data splits, and code before treating the numbers as settled.

The work is coherent enough on its own terms to merit peer review so the community can check the missing pieces.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces NMRAgent, an LLM-based evidential reasoning agent for NMR molecular structure elucidation. It integrates spectral analysis tools and chemical knowledge graphs to plan elucidation, propose candidates, verify peak-atom consistency, and refine via formula-aware fragment optimization. The central claims are a 46.5% improvement in top-1 accuracy and 0.502 gain in Tanimoto similarity over SOTA on a scaffold-split benchmark containing novel scaffolds, plus successful structure elucidation for two previously unknown natural products from Hydrangea davidii and Vitex trifolia, and correction of literature misassignments.

Significance. If the empirical results and real-world cases hold under rigorous validation, the work would represent a meaningful advance in interpretable AI for analytical chemistry by addressing the trade-off between database retrieval (limited to known scaffolds) and black-box de novo models. The explicit planning-verification-optimization loop and evidential reasoning provide a human-mimetic workflow that could improve trust and adoption in structure elucidation tasks. The scaffold-split evaluation and natural-product demonstrations are positive steps toward generalizability claims.

major comments (3)

[Results/Methods] Benchmark construction (Results or Methods section): The scaffold-split benchmark is load-bearing for the generalizability claim (46.5% top-1 accuracy gain), yet no details are supplied on the total number of molecules, scaffold selection criteria, how novelty of test scaffolds was verified against the training distribution, or any statistical significance testing of the reported improvements. This omission prevents assessment of whether the performance edge is robust or artifactual.
[Application/Results] Natural-product validation (Application or Results section): The elucidation of structures for the two unknown natural products is presented as practical evidence, but the manuscript provides no information on independent validation methods (e.g., comparison to synthetic standards, additional spectroscopic data, or cross-validation by human experts), which is required to substantiate that the agent's output survived verification steps for truly novel scaffolds.
[Agent architecture/Experiments] Verification robustness (Agent architecture or Experiments section): The central assumption that peak-atom consistency checks plus formula-aware optimization will reliably reject chemically plausible but incorrect LLM proposals is not supported by any error analysis, failure-case reporting, or ablation on ambiguous peak assignments. For novel scaffolds this is load-bearing, as the skeptic's concern about hallucinations evading tolerance-based checks remains unaddressed by the presented evidence.
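
The first major comment turns on how a scaffold split is built and how scaffold novelty is verified. The sketch below shows one generic way to do both, under loud assumptions: each molecule already carries a precomputed scaffold key (for example a scaffold SMILES from a cheminformatics toolkit), and the names `scaffold_split` and `leaked_scaffolds` are hypothetical. The paper's actual split procedure is not described in the text reviewed here.

```python
import random
from collections import defaultdict


def scaffold_split(records, test_fraction=0.2, seed=0):
    """Split molecules so that no scaffold appears in both train and test.

    records: list of (molecule_id, scaffold_key) pairs; the scaffold keys are
    assumed to be precomputed elsewhere (hypothetical input format).
    """
    groups = defaultdict(list)
    for mol_id, scaffold in records:
        groups[scaffold].append(mol_id)

    # Shuffle whole scaffold groups, never individual molecules.
    scaffolds = sorted(groups)
    random.Random(seed).shuffle(scaffolds)

    target = test_fraction * len(records)
    train, test = [], []
    for scaffold in scaffolds:
        # Fill the test set first, then send every remaining group to train.
        bucket = test if len(test) < target else train
        bucket.extend(groups[scaffold])
    return train, test


def leaked_scaffolds(records, train, test):
    """Return scaffold keys present in both splits (empty set means no leak)."""
    scaffold_of = dict(records)
    train_scaffolds = {scaffold_of[m] for m in train}
    test_scaffolds = {scaffold_of[m] for m in test}
    return train_scaffolds & test_scaffolds


# Toy data: ten molecules spread over four hypothetical scaffold keys.
records = [(f"m{i}", f"s{i % 4}") for i in range(10)]
train, test = scaffold_split(records, test_fraction=0.2, seed=1)
print(leaked_scaffolds(records, train, test))  # set()
```

Because whole groups are assigned at once, the realized test fraction can overshoot the target; reporting the realized sizes alongside the leak check is the kind of detail the referee asks for.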

minor comments (2)

[Abstract/Methods] The abstract and main text should explicitly define the Tanimoto similarity metric used and the exact top-k settings for the accuracy metric to allow direct comparison with prior work.
[Methods] Notation for evidential reasoning components (e.g., how evidence scores are aggregated) should be introduced with a clear equation or pseudocode early in the methods to improve readability.
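
To make the first minor comment concrete, the sketch below spells out the usual set-based Tanimoto coefficient and a top-k accuracy. These are common definitions and are assumptions here: the reviewed text does not state which fingerprint, which k, or which string-matching rule the paper uses, and the names `tanimoto` and `top_k_accuracy` are hypothetical.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B| over fingerprint on-bit sets."""
    if not bits_a and not bits_b:
        return 1.0  # convention chosen for this sketch: two empty sets are identical
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def top_k_accuracy(ranked_predictions, truths, k=1):
    """Fraction of queries whose true structure is among the first k predictions.

    ranked_predictions: one ranked list of canonical structure strings per query.
    truths: the matching list of ground-truth canonical strings.
    """
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


print(tanimoto({1, 2, 3}, {2, 3, 4}))                          # 0.5
print(top_k_accuracy([["A", "B"], ["C", "D"]], ["A", "D"]))    # 0.5
print(top_k_accuracy([["A", "B"], ["C", "D"]], ["A", "D"], 2)) # 1.0
```

Both numbers depend on upstream choices (fingerprint type and radius, canonicalization, stereochemistry handling), which is why a stated definition is needed before the reported 0.502 and 46.5% gains can be compared with prior work.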

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for strengthening the manuscript's clarity and rigor. We address each major comment below and will revise the manuscript to incorporate the requested details.

Referee: [Results/Methods] Benchmark construction (Results or Methods section): The scaffold-split benchmark is load-bearing for the generalizability claim (46.5% top-1 accuracy gain), yet no details are supplied on the total number of molecules, scaffold selection criteria, how novelty of test scaffolds was verified against the training distribution, or any statistical significance testing of the reported improvements. This omission prevents assessment of whether the performance edge is robust or artifactual.

Authors: We agree that additional details on benchmark construction are needed to fully support the generalizability claims. In the revised manuscript, we will expand the Methods section to specify the total number of molecules in the benchmark, the scaffold selection criteria employed, the procedure used to verify novelty of test scaffolds relative to the training distribution, and the results of statistical significance testing on the reported accuracy and similarity improvements.
revision: yes
Referee: [Application/Results] Natural-product validation (Application or Results section): The elucidation of structures for the two unknown natural products is presented as practical evidence, but the manuscript provides no information on independent validation methods (e.g., comparison to synthetic standards, additional spectroscopic data, or cross-validation by human experts), which is required to substantiate that the agent's output survived verification steps for truly novel scaffolds.

Authors: We acknowledge this point. In the revised manuscript, we will add details in the Application section on the independent validation methods used for the two natural products (from Hydrangea davidii and Vitex trifolia), including any comparisons to additional spectroscopic data or expert confirmation that supported the agent's structure assignments.
revision: yes
Referee: [Agent architecture/Experiments] Verification robustness (Agent architecture or Experiments section): The central assumption that peak-atom consistency checks plus formula-aware optimization will reliably reject chemically plausible but incorrect LLM proposals is not supported by any error analysis, failure-case reporting, or ablation on ambiguous peak assignments. For novel scaffolds this is load-bearing, as the skeptic's concern about hallucinations evading tolerance-based checks remains unaddressed by the presented evidence.

Authors: We agree that explicit analysis of verification robustness is important. In the revised manuscript, we will add an error analysis subsection in the Experiments, including selected failure cases, and ablations isolating the contributions of peak-atom consistency checks and formula-aware fragment optimization, to demonstrate their role in rejecting incorrect proposals on novel scaffolds.
revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on external benchmarks

The paper's central claims are empirical: NMRAgent is evaluated on a scaffold-split benchmark (top-1 accuracy +46.5%, Tanimoto +0.502) and demonstrated on two previously unknown natural products. These outcomes are measured against held-out ground-truth structures and experimental spectra; no equations, fitted parameters, or self-citations are invoked to derive the reported metrics. The method description (planning, peak-atom consistency, fragment optimization) is presented as an engineering pipeline whose validity is tested externally rather than defined into existence. No load-bearing step reduces to a self-referential definition or prior author result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that NMR spectra plus molecular formula contain enough information for the agent workflow to succeed, plus standard assumptions about LLM tool-use reliability. No free parameters or new physical entities are introduced.

axioms (1)

domain assumption · NMR spectra together with molecular formula provide sufficient information for structure elucidation when processed by the described agent workflow
The method takes experimental NMR spectra and molecular formula as input and assumes these inputs are adequate for the planning-verification-refinement loop.

invented entities (1)

NMRAgent · no independent evidence
purpose: Evidential reasoning agent that integrates LLM planning with spectral tools and knowledge graphs for NMR structure elucidation
The agent is the novel system proposed in the paper; no independent evidence for its existence or performance is provided outside the paper itself.

pith-pipeline@v0.9.1-grok · 5834 in / 1313 out tokens · 50500 ms · 2026-06-30T07:26:45.398294+00:00 · methodology

Reference graph

Works this paper leans on

73 extracted references · 5 canonical work pages · 1 internal anchor

[1]

Synthesis-driven structural revision of c5-hydroxy-cyclo (l-pro-l-leu) using electrochemical oxidation

Blessing B Akinlabi, Jared Balsz-Diaz, Mark C Walker, and Matthew R Aronoff. Synthesis-driven structural revision of c5-hydroxy-cyclo (l-pro-l-leu) using electrochemical oxidation. 2026

2026
[2]

Learning the language of nmr: structure elucidation from nmr spectra using transformer models

Marvin Alberts, Federico Zipoli, and Alain Vaucher. Learning the language of nmr: structure elucidation from nmr spectra using transformer models. In AI for Accelerated Materials Design - NeurIPS 2023 Workshop, 2023

2023
[3]

Building a knowledge graph to enable precision medicine

Payal Chandak, Kexin Huang, and Marinka Zitnik. Building a knowledge graph to enable precision medicine. Scientific data, 10(1):67, 2023

2023
[4]

Chemberta: Large-scale self-supervised pretraining for molecular property prediction

Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. Chemberta: Large-scale self-supervised pretraining for molecular property prediction. arXiv preprint arXiv:2010.09885, 2020

work page arXiv 2020
[5]

Confirmation of the revised structure of samoquasine a and a proposed structural revision of cherimoline

Francis Dhoro, Jesse Parkin-Gibbs, Matthew McIldowie, Brian W Skelton, and Matthew J Piggott. Confirmation of the revised structure of samoquasine a and a proposed structural revision of cherimoline. Journal of natural products, 81(7):1658–1665, 2018

2018
[6]

Self-consistent perturbation theory of diamagnetism: I. a gauge-invariant lcao method for nmr chemical shifts

Robert Ditchfield. Self-consistent perturbation theory of diamagnetism: I. a gauge-invariant lcao method for nmr chemical shifts. Molecular Physics, 27(4):789–807, 1974

1974
[7]

The faiss library

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The faiss library. IEEE Transactions on Big Data, 2025

2025
[8]

Nmr as a “gold standard” method in drug design and discovery

Abdul-Hamid Emwas, Kacper Szczepski, Benjamin Gabriel Poulson, Kousik Chandra, Ryan T McKay, Manel Dhahri, Fatimah Alahmari, Lukasz Jaremko, Joanna Izabela Lachowicz, and Mariusz Jaremko. Nmr as a “gold standard” method in drug design and discovery. Molecules, 25(20):4597, 2020

2020
[9]

Nmrgym: A comprehensive benchmark for nuclear magnetic resonance based molecular structure elucidation

Zheng Fang, Chen Yang, Hai-tao Yu, Haoming Luo, Haitao He, Jiaqing Xie, Zhuo Yang, and Jun Xia. Nmrgym: A comprehensive benchmark for nuclear magnetic resonance based molecular structure elucidation. arXiv preprint arXiv:2601.15763, 2026

work page arXiv 2026
[10]

Organic Structures from Spectra

Leslie D Field, Sev Sternhell, and John R Kalman. Organic Structures from Spectra. John Wiley & Sons, 2012

2012
[11]

M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goin...

2016
[12]

Chembl: a large-scale bioactivity database for drug discovery

Anna Gaulton, Louisa J Bellis, A Patricia Bento, Jon Chambers, Mark Davies, Anne Hersey, Yvonne Light, Shaun McGlinchey, David Michalovich, Bissan Al-Lazikani, et al. Chembl: a large-scale bioactivity database for drug discovery. Nucleic acids research, 40(D1):D1100–D1107, 2012

2012
[13]

Equivariant diffusion for molecule generation in 3d

Emiel Hoogeboom, Víctor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3d. In International conference on machine learning, pages 8867–8887. PMLR, 2022

2022
[14]

Accurate and efficient structure elucidation from routine one-dimensional nmr spectra using multitask machine learning

Frank Hu, Michael S Chen, Grant M Rotskoff, Matthew W Kanan, and Thomas E Markland. Accurate and efficient structure elucidation from routine one-dimensional nmr spectra using multitask machine learning. ACS Central Science, 10(11):2162–2170, 2024

2024
[15]

Nmr-solver: automated structure elucidation via large-scale spectral matching and physics-guided fragment optimization

Yongqi Jin, Jun-Jie Wang, Fanjie Xu, Xiaohong Ji, Zhifeng Gao, Linfeng Zhang, Guolin Ke, Rong Zhu, and Weinan E. Nmr-solver: automated structure elucidation via large-scale spectral matching and physics-guided fragment optimization. Nature Communications, 2026

2026
[16]

Prediction of chemical shift in nmr: A review

Eric Jonas, Stefan Kuhn, and Nils Schlörer. Prediction of chemical shift in nmr: A review. Magnetic Resonance in Chemistry, 60(11):1021–1031, 2022

2022
[17]

Deepsat: learning molecular structures from nuclear magnetic resonance data

Hyun Woo Kim, Chen Zhang, Raphael Reher, Mingxun Wang, Kelsey L Alexander, Louis-Félix Nothias, Yoo Kyong Han, Hyeji Shin, Ki Yong Lee, Kyu Hyeong Lee, et al. Deepsat: learning molecular structures from nuclear magnetic resonance data. Journal of Cheminformatics, 15(1):71, 2023

2023
[18]

Altechromones a and b, new plant growth regulators produced by the fungus, alternaria sp.

Yasuo Kimura, Takashi Mizuno, Hiromitsu Nakajima, and Takashi Hamasaki. Altechromones a and b, new plant growth regulators produced by the fungus, alternaria sp. Bioscience, biotechnology, and biochemistry, 56(10):1664–1665, 1992

1992
[19]

Structural revision and synthesis of altechromone a

P Konigs, B Rinker, L Maus, M Nieger, J Rheinheimer, and SR Waldvogel. Structural revision and synthesis of altechromone a. Journal of natural products, 73(12):2064–2066, 2010

2010
[20]

Geometry-complete diffusion for 3d molecule generation and optimization

Alex Morehead and Jianlin Cheng. Geometry-complete diffusion for 3d molecule generation and optimization. Communications Chemistry, 7(1):150, 2024

2024
[21]

Samoquasine a, a benzoquinazoline alkaloid from the seeds of annona squamosa

Hiroshi Morita, Yumiko Sato, Kit-Lam Chan, Chee-Yan Choo, Hideji Itokawa, Koichi Takeya, and Jun’ichi Kobayashi. Samoquasine a, a benzoquinazoline alkaloid from the seeds of annona squamosa. Journal of natural products, 63(12):1707–1708, 2000

2000
[22]

Laxman D Nandawadekar, Hiba K P, Teena P George, Choppari Thirupathi, Aparna Sahoo, Sidharth Chopra, and D Srinivasa Reddy. Structural revision of a natural tetrahydroquinoxaline-6-carboxylic acid isolated from caulis sinomenii through total synthesis of both the regioisomers. Journal of Natural Products, 88(12):2978–2986, 2025

2025
[23]

Interpretation of Organic Spectra

Yong-Cheng Ning. Interpretation of Organic Spectra. John Wiley & Sons, 2011

2011
[24]

The natural products atlas 3.0: extending the database of microbially derived natural products

Ella F Poynton, Jeffrey A van Santen, Matthew Pin, Marla Macias Contreras, Emily McMann, Jonathan Parra, Brandon Showalter, Liana Zaroubi, Katherine R Duncan, and Roger G Linington. The natural products atlas 3.0: extending the database of microbially derived natural products. Nucleic Acids Research, 53(D1):D691–D699, 2025

2025
[25]

Coumarin dimers from hydrangea davidii and their antimalarial activities

Wan-Xian Qi, Wei-Juan Zu, Gang Li, Jing Wang, Li-Juan Lang, Rong Liu, Bei Jiang, and Chao-Jiang Xiao. Coumarin dimers from hydrangea davidii and their antimalarial activities. Magnetic Resonance in Chemistry, 2026

2026
[26]

The lotus initiative for open knowledge management in natural products research

Adriano Rutz, Maria Sorokina, Jakub Galgonek, Daniel Mietchen, Egon Willighagen, Arnaud Gaudry, James G Graham, Ralf Stephan, Roderic Page, Jiří Vondrášek, et al. The lotus initiative for open knowledge management in natural products research. elife, 11:e70780, 2022

2022
[27]

Nmr-challenge.com: An interactive website with exercises in solving structures from nmr spectra

Ondrej Socha, Zuzana Osifová, and Martin Dracinsky. Nmr-challenge.com: An interactive website with exercises in solving structures from nmr spectra, 2023

2023
[28]

Coconut online: collection of open natural products database

Maria Sorokina, Peter Merseburger, Kohulan Rajan, Mehmet Aziz Yirik, and Christoph Steinbeck. Coconut online: collection of open natural products database. Journal of Cheminformatics, 13(1):2, 2021

2021
[29]

Nmrshiftdb constructing a free chemical information system with open-source components

Christoph Steinbeck, Stefan Krause, and Stefan Kuhn. Nmrshiftdb constructing a free chemical information system with open-source components. Journal of chemical information and computer sciences, 43(6):1733–1739, 2003

2003
[30]

A transformer based generative chemical language ai model for structural elucidation of organic compounds

Xiaofeng Tan. A transformer based generative chemical language ai model for structural elucidation of organic compounds. Journal of cheminformatics, 17(1):103, 2025

2025
[31]

Cody Timmons and Peter Wipf. Density functional theory calculation of 13c nmr shifts of diazaphenanthrene alkaloids: reinvestigation of the structure of samoquasine a. The Journal of Organic Chemistry, 73(22):9168–9170, 2008

2008
[32]

Smiles, a chemical language and information system

David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988

1988
[33]

Efficient implementation of the gauge-independent atomic orbital method for nmr chemical shift calculations

Krzysztof Wolinski, James F Hinton, and Peter Pulay. Efficient implementation of the gauge-independent atomic orbital method for nmr chemical shift calculations. Journal of the American Chemical Society, 112(23):8251–8260, 1990

1990
[34]

Atomic diffusion models for small molecule structure elucidation from nmr spectra

Ziyu Xiong, Yichi Zhang, Foyez Alauddin, Chu Xin Cheng, Joon Soo An, Mohammad R Seyedsayamdost, and Ellen D Zhong. Atomic diffusion models for small molecule structure elucidation from nmr spectra. arXiv preprint arXiv:2512.03127, 2025

work page arXiv 2025
[35]

Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance chemical shifts

Fanjie Xu, Wentao Guo, Feng Wang, Lin Yao, Hongshuai Wang, Fujie Tang, Zhifeng Gao, Linfeng Zhang, Weinan E, Zhong-Qun Tian, et al. Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance chemical shifts. Nature Computational Science, 5(4):292–300, 2025

2025
[36]

Nmrmind: A transformer-based model enabling the elucidation from multidimensional nmr to structures

Xi Xue, Hanyu Sun, Jingying Sun, Luc Patiny, Xiangying Liu, Kai Chen, Jingjie Yan, Liangning Li, Xue Liu, Shu Xu, et al. Nmrmind: A transformer-based model enabling the elucidation from multidimensional nmr to structures. Analytical Chemistry, 97(41):22603–22614, 2025

2025
[37]

A large-scale foundation model enables simulation-to-real adaptation for nuclear magnetic resonance-based molecular structure analysis

Chen Yang, Zheng Fang, Hanyu Sun, Fanjie Xu, Hongxin Xiang, Hanyu Gao, Xiangxiang Zeng, Yuqiang Li, Xiaojian Wang, and Jun Xia. A large-scale foundation model enables simulation-to-real adaptation for nuclear magnetic resonance-based molecular structure analysis, 2026. URL https://arxiv.org/abs/2606.20756

work page internal anchor Pith review Pith/arXiv arXiv 2026
[38]

Diffnmr: diffusion models for nuclear magnetic resonance spectra elucidation

Qingsong Yang, Binglan Wu, Xuwei Liu, Bo Chen, Wei Li, Gen Long, Xin Chen, and Mingjun Xiao. Diffnmr: diffusion models for nuclear magnetic resonance spectra elucidation. Materials Futures, 5(1):015601, 2026

2026
[39]

A deep learning model for predicting selected organic molecular spectra

Zihan Zou, Yujin Zhang, Lijun Liang, Mingzhi Wei, Jiancai Leng, Jun Jiang, Yi Luo, and Wei Hu. A deep learning model for predicting selected organic molecular spectra. Nature Computational Science, 3(11):957–964, 2023. Appendix A Chemical Preliminaries: Formally, we model the spectrum as a continuous function x(δ): R → R over th...

2023
[40]

Do not plan any candidate search or structural modification that violates the molecular formula

Treat the molecular formula as a hard constraint. Do not plan any candidate search or structural modification that violates the molecular formula
[41]

If the input contains expanded repeated shifts, do not collapse repeated peaks unless explicitly instructed

Use the 1H NMR spectrum to reason about proton environments, integration, multiplicity when available, and repeated or equivalent signals. If the input contains expanded repeated shifts, do not collapse repeated peaks unless explicitly instructed
[42]

Use the 13C NMR spectrum as strong evidence for carbon environments, including carbonyl-like carbons, aromatic or alkene carbons, oxygen-bearing carbons, and saturated aliphatic carbons
[43]

Use optional experimental metadata only as contextual evidence. Metadata such as reaction precursors, reagents, catalysts, biological source, or isolation conditions can guide hypotheses, but cannot override the molecular formula or spectral evidence
[44]

A memory cannot override the query spectra, molecular formula, or verifier evidence

Use recalled memories only as analogical evidence from previously confirmed cases. A memory cannot override the query spectra, molecular formula, or verifier evidence
[45]

Retrieval rank alone should not suppress plausible de novo candidates before peak–atom verification

Preserve both retrieval and de novo candidates when both are useful. Retrieval rank alone should not suppress plausible de novo candidates before peak–atom verification
[46]

Prefer a larger or more diverse candidate pool when the spectra suggest compact natural-product-like scaffolds, fused rings, lactones, enones, unusual oxygenation patterns, or other rare scaffold types
[47]

If previous verifier feedback identifies unmatched peaks, inconsistent local assignments, or unresolved mismatch regions, plan targeted optimization rather than blind regeneration
[48]

analysis

Return JSON only. Do not include free-form text outside the JSON object. The Planner returns the following JSON schema: { "analysis": "short evidence-grounded reasoning about formula and NMR signals", "use_retrieval": "<true_or_false>", "use_denovo": "<true_or_false>", "retrieval_top_k": "<integer>", "denovo_top_k": "<integer>", "save_pool_file": "<true_o...
[49]

Execute retrieval, de novo generation, pool merging, reranking, optimization, and molecular editing only when requested by the Planner or by verifier feedback
[50]

Preserve candidate provenance whenever possible, including SMILES string, source label, source rank, source score, molecular formula, spectrum metadata, and candidate-pool path
[51]

Keep retrieval-derived candidates and de novo candidates visible for downstream verification. Do not discard a candidate solely because it comes from a lower-volume source
[52]

Deduplicate candidates by canonical non-isomeric SMILES while retaining all available source provenance
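The deduplication rule can be sketched as a merge keyed on a canonical string that keeps every provenance record. This is an illustration under stated assumptions, not the paper's implementation: `merge_candidates` and the record layout are hypothetical, and the canonicalizer is passed in as `canonical_key` because the source does not name the toolkit used. In practice the key would come from a cheminformatics toolkit such as RDKit, configured to emit non-isomeric canonical SMILES.

```python
from typing import Callable, Dict, Iterable, List


def merge_candidates(
    candidates: Iterable[dict],
    canonical_key: Callable[[str], str],
) -> List[dict]:
    """Deduplicate candidates by canonical key, keeping all provenance records."""
    merged: Dict[str, dict] = {}
    for cand in candidates:
        # canonical_key is assumed to map a SMILES string to its canonical
        # non-isomeric form, so stereoisomers collapse onto one entry.
        key = canonical_key(cand["smiles"])
        entry = merged.setdefault(key, {"smiles": key, "provenance": []})
        # Store the full original record (source label, rank, score, ...)
        # so that no source information is lost when duplicates merge.
        entry["provenance"].append(dict(cand))
    # dicts preserve insertion order, so first-seen candidate order is kept.
    return list(merged.values())
```

Because duplicates extend a provenance list rather than overwrite one another, a structure proposed by both retrieval and de novo generation stays attributable to both sources.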
[53]

Do not silently invoke retrieval or de novo generation inside an optimization step. Candidate construction, candidate merging, optimization, and verification must remain auditable as separate operations
[54]

Optimization tools should operate only on an existing candidate pool
[55]

Verifier-guided in-place molecular editing may be invoked only when the verifier localizes a high-confidence mismatch to a specific atom or small chemical environment
[56]

After any local edit, retain the unedited parent candidate for comparison
[57]

If a tool fails, returns invalid output, or produces an empty candidate pool, report the failure explicitly in the structured execution output
[58]

Return structured execution results, including generated files, candidate counts, source composition, tool status, and any warnings needed by the verifier. You must not provide the final molecular structure unless the Peak–Atom Verifier has supplied sufficient evidence.

B.2.3 Peak–Atom Verifier System Prompt

You are an exp...
[59]

Molecular formula consistency between the query and each candidate
[60]

Overall NMR similarity score from the reranking tool
[61]

Matched query peaks and their assigned candidate atoms
[62]

Unmatched query peaks, especially diagnostic peaks that indicate missing functional groups or unresolved local environments
[63]

Unused predicted peaks, especially predicted signals that have no reasonable support in the experimental spectrum
[64]

1H and 13C residuals, including whether the largest errors occur in chemically diagnostic regions
[65]

Atom-level assignment summaries, including which parts of the molecule are well supported and which regions remain uncertain
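The spectral evidence items listed above (matched query peaks with their assigned atoms, unmatched query peaks, unused predicted peaks, and residuals) can be illustrated with a small sketch. It assumes a greedy nearest-shift assignment within a fixed tolerance, which is a stand-in chosen for brevity and not necessarily what the paper's tools do. The function name `match_peaks`, the `(atom_index, shift)` input layout, and the tolerance value are all hypothetical.

```python
from typing import Dict, List, Tuple


def match_peaks(
    query: List[float],
    predicted: List[Tuple[int, float]],
    tol: float,
) -> Dict[str, list]:
    """Greedy one-to-one matching of query shifts (ppm) to predicted atom shifts."""
    # Enumerate every query/predicted pair within tolerance, smallest residual first.
    pairs = sorted(
        (abs(q - p_shift), qi, pi)
        for qi, q in enumerate(query)
        for pi, (_, p_shift) in enumerate(predicted)
        if abs(q - p_shift) <= tol
    )
    used_q, used_p, matched = set(), set(), []
    for residual, qi, pi in pairs:
        if qi in used_q or pi in used_p:
            continue  # each peak and each atom is assigned at most once
        used_q.add(qi)
        used_p.add(pi)
        matched.append(
            {"query_ppm": query[qi], "atom": predicted[pi][0], "residual": residual}
        )
    return {
        "matched": matched,
        # Query peaks that no candidate atom explains.
        "unmatched_query": [q for i, q in enumerate(query) if i not in used_q],
        # Predicted signals with no support in the experimental spectrum.
        "unused_predicted": [p for i, p in enumerate(predicted) if i not in used_p],
    }
```

A summary of this shape separates the well-supported parts of a candidate from the regions that still need explanation, which is the distinction the verifier is asked to report.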
[66]

16 SpectraAI Brief Communication

Candidate provenance, including retrieval, de novo, optimized, merged, edited, or seed sources.
[67]

Previous verifier outputs or recalled confirmed memories, if available. These may provide supporting context but cannot override the current query evidence.

Decision instructions:
[68]

Return "accept" only when both 1H and 13C evidence are coherent and no major diagnostic query peaks remain unexplained
[69]

A de novo candidate may be accepted over retrieval candidates if its formula consistency, spectral similarity, and peak–atom assignments provide stronger evidence
[70]

Return "need_opt" when the best candidate is globally reasonable but contains local mismatches that can be targeted by fragment optimization or in-place editing
[71]

Return "need_bigger_pool" when the candidate pool lacks sufficiently diverse or formula-compatible alternatives
[72]

Return "need_retry" when tool outputs are invalid, incomplete, inconsistent, or insufficient for evidence-based verification
[73]

When optimization is needed, provide concrete mismatch descriptions, including the relevant unmatched peaks, poorly matched atoms, or local structural regions. Return JSON only using the required schema. The Peak–Atom Verifier returns the following JSON schema: { "verdict": "<accept | need_opt | need_bigger_pool | need_retry>", "analysis": "evidence-gro...
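The four verdict values imply a small control loop around the verifier. The sketch below shows one way a verdict could be routed to the next step. The action labels and the function name `route_verdict` are hypothetical, since the source describes the behaviour of each verdict but not how the orchestration is coded. Only the `verdict` field is inspected, because the remaining schema fields are truncated in the source.

```python
import json

# The four verdict values listed in the Verifier schema.
VERDICTS = ("accept", "need_opt", "need_bigger_pool", "need_retry")

# Hypothetical action labels, chosen for illustration only.
NEXT_ACTION = {
    "accept": "finalize",
    "need_opt": "run_targeted_optimization",
    "need_bigger_pool": "enlarge_candidate_pool",
    "need_retry": "rerun_tools",
}


def route_verdict(raw: str) -> str:
    """Map a Verifier JSON reply to the next step of the elucidation loop."""
    report = json.loads(raw)  # rejects replies that are not pure JSON
    if not isinstance(report, dict):
        raise ValueError("Verifier output must be a single JSON object")
    verdict = report.get("verdict")
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return NEXT_ACTION[verdict]
```

Treating an unknown verdict as an error rather than defaulting to acceptance keeps the final structure gated on explicit verifier evidence.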


Appendix A Chemical Preliminaries

Formally, we model the spectrum as a continuous function x(δ): R → R over th...
