Towards Generalizable and Evidential Nuclear Magnetic Resonance-Based Molecular Structure Elucidation via Large Language Model Agent
Pith reviewed 2026-06-30 07:26 UTC · model grok-4.3
The pith
NMRAgent is an LLM-powered agent that plans, proposes, verifies, and refines molecular structures from NMR spectra and formulas to outperform prior methods on novel scaffolds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NMRAgent takes experimental NMR spectra and molecular formula as input, plans the elucidation process, proposes candidate structures, verifies peak-atom consistency, and refines misaligned substructures through formula-aware fragment optimization. Enabled by its evidential reasoning, NMRAgent outperforms state-of-the-art methods, improving top-1 accuracy by 46.5% and Tanimoto similarity by 0.502 on a scaffold-split benchmark with novel scaffolds in the test set. It also elucidates the structures of two previously unknown natural products isolated from Hydrangea davidii and Vitex trifolia and corrects structural misassignments in established literature.
What carries the argument
NMRAgent, an evidential reasoning agent that integrates specialized spectral analysis tools with chemical knowledge graphs to mimic human expert deductive reasoning through planning, proposal, verification, and optimization steps.
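The propose, verify, refine cycle described above can be pictured as a small control loop. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function name `elucidate`, the three callables, the `max_rounds` budget, and the toy SMILES strings are all hypothetical; only the status strings `accept` and `need_opt` echo the verifier verdicts quoted further down the page.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical type aliases; none of these names come from the paper.
Candidate = str             # e.g. a SMILES string
Verdict = Tuple[str, str]   # (status, best candidate so far)


def elucidate(
    propose: Callable[[], List[Candidate]],
    verify: Callable[[List[Candidate]], Verdict],
    refine: Callable[[Candidate], List[Candidate]],
    max_rounds: int = 3,
) -> Optional[Candidate]:
    """Propose candidates, verify them, and refine until accepted or out of budget."""
    pool = propose()
    for _ in range(max_rounds):
        status, best = verify(pool)
        if status == "accept":
            return best
        # Keep the parent candidate so refined versions can be compared against it.
        pool = [best] + refine(best)
    return None  # nothing survived verification within the budget


# Toy stand-ins for the real tools (assumptions for illustration only).
target = "CCO"
result = elucidate(
    propose=lambda: ["CCC"],
    verify=lambda pool: ("accept", target) if target in pool else ("need_opt", pool[0]),
    refine=lambda cand: ["CCO"],
)
```

Returning `None` instead of the best unverified candidate mirrors the stated rule that no final structure is reported without sufficient verifier evidence.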
If this is right
- Top-1 accuracy improves by 46.5 percent and Tanimoto similarity by 0.502 over prior methods on scaffold-split tests with novel structures.
- The agent can determine structures of previously unknown natural products from plant isolates.
- It can identify and correct structural misassignments reported in the chemical literature.
- Reasoning steps remain inspectable at the level of peak-atom assignments and fragment optimizations.
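The two headline metrics can be made concrete with a toy calculation. This is a sketch assuming set-based fingerprints and exact string matching for top-1 hits; the summary does not state which fingerprint, canonicalization, or matching rule the paper uses, nor whether 46.5% is an absolute or relative gain, so the function names and data below are illustrative only.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint 'on' bits."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def top1_accuracy(predicted: List[str], truth: List[str]) -> float:
    """Fraction of queries whose top-ranked structure string equals the ground truth."""
    hits = sum(p == t for p, t in zip(predicted, truth))
    return hits / len(truth)


# Toy fingerprints: 3 shared bits out of 5 distinct bits.
fp_pred = {1, 4, 7, 9}
fp_true = {1, 4, 8, 9}
sim = tanimoto(fp_pred, fp_true)

# Toy predictions: 2 of 3 top-ranked strings match the reference.
acc = top1_accuracy(["CCO", "CCN", "c1ccccc1"], ["CCO", "CCC", "c1ccccc1"])
```

In practice the structure strings would need to be canonicalized before comparison, otherwise two spellings of the same molecule would count as a miss.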
Where Pith is reading between the lines
- The same planning-and-verification loop could be adapted to other spectroscopic data such as mass spectra if the corresponding analysis tools are added.
- Performance on very large or flexible molecules may still depend on the completeness of the underlying chemical knowledge graph.
- The approach points toward hybrid systems in which LLM agents coordinate multiple analytical instruments rather than relying on any single data type.
Load-bearing premise
The large language model can reliably generate chemically valid plans, structure proposals, and consistency checks without persistent hallucinations that evade the verification and optimization steps.
What would settle it
Run the agent on NMR spectra of molecules with novel scaffolds whose true structures are known. If it outputs structures that pass all internal consistency checks but are incorrect, the central claim does not hold.
read the original abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is the gold standard for molecular structure elucidation, yet interpreting complex spectra for unknown molecules remains a bottleneck reliant on human expertise. While artificial intelligence has advanced this field, current methods face a critical trade-off: database retrieval cannot identify novel scaffolds, while de novo molecular structure elucidation models operate as black boxes, lacking the atom-level interpretability required for rigorous scientific validation. Here, we present NMRAgent, an evidential reasoning agent powered by large language models (LLMs) that bridges this gap by integrating specialized spectral analysis tools with chemical knowledge graphs. Unlike previous approaches, NMRAgent mimics the deductive reasoning of human experts: it takes experimental NMR spectra and molecular formula as input, plans the elucidation process, proposes candidate structures, verifies peak-atom consistency, and refines misaligned substructure through formula-aware fragment optimization. Enabled by its evidential reasoning, NMRAgent outperforms state-of-the-art methods, improving top-1 accuracy by 46.5% and Tanimoto similarity by 0.502 on a scaffold-split benchmark with novel scaffolds in the test set. Besides, we demonstrate the agent's practical utility by elucidating the structures of two previously unknown natural products isolated from Hydrangea davidii and Vitex trifolia, and by correcting structural misassignments in established literature. By combining high-accuracy prediction with transparent and evidence-based reasoning, NMRAgent establishes a new paradigm for interpretable AI in analytical chemistry.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces NMRAgent, an LLM-based evidential reasoning agent for NMR molecular structure elucidation. It integrates spectral analysis tools and chemical knowledge graphs to plan elucidation, propose candidates, verify peak-atom consistency, and refine via formula-aware fragment optimization. The central claims are a 46.5% improvement in top-1 accuracy and 0.502 gain in Tanimoto similarity over SOTA on a scaffold-split benchmark containing novel scaffolds, plus successful structure elucidation for two previously unknown natural products from Hydrangea davidii and Vitex trifolia, and correction of literature misassignments.
Significance. If the empirical results and real-world cases hold under rigorous validation, the work would represent a meaningful advance in interpretable AI for analytical chemistry by addressing the trade-off between database retrieval (limited to known scaffolds) and black-box de novo models. The explicit planning-verification-optimization loop and evidential reasoning provide a human-mimetic workflow that could improve trust and adoption in structure elucidation tasks. The scaffold-split evaluation and natural-product demonstrations are positive steps toward generalizability claims.
major comments (3)
- [Results/Methods] Benchmark construction (Results or Methods section): The scaffold-split benchmark is load-bearing for the generalizability claim (46.5% top-1 accuracy gain), yet no details are supplied on the total number of molecules, scaffold selection criteria, how novelty of test scaffolds was verified against the training distribution, or any statistical significance testing of the reported improvements. This omission prevents assessment of whether the performance edge is robust or artifactual.
- [Application/Results] Natural-product validation (Application or Results section): The elucidation of structures for the two unknown natural products is presented as practical evidence, but the manuscript provides no information on independent validation methods (e.g., comparison to synthetic standards, additional spectroscopic data, or cross-validation by human experts), which is required to substantiate that the agent's output survived verification steps for truly novel scaffolds.
- [Agent architecture/Experiments] Verification robustness (Agent architecture or Experiments section): The central assumption that peak-atom consistency checks plus formula-aware optimization will reliably reject chemically plausible but incorrect LLM proposals is not supported by any error analysis, failure-case reporting, or ablation on ambiguous peak assignments. For novel scaffolds this is load-bearing, as the skeptic's concern about hallucinations evading tolerance-based checks remains unaddressed by the presented evidence.
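The first major comment turns on how a scaffold split is built and how scaffold novelty is checked. A minimal sketch of one common way to do this follows, assuming each molecule already carries a scaffold label (for example a Murcko scaffold string computed elsewhere); the function name, the `test_fraction` parameter, and the toy labels are hypothetical and say nothing about the paper's actual procedure.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def scaffold_split(
    scaffold_of: Dict[str, str], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Assign whole scaffold groups to train or test so no scaffold spans both."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for mol, scaffold in scaffold_of.items():
        groups[scaffold].append(mol)
    scaffolds = sorted(groups)
    random.Random(seed).shuffle(scaffolds)
    target = test_fraction * len(scaffold_of)
    train: List[str] = []
    test: List[str] = []
    for s in scaffolds:
        # Fill the test set first, one whole scaffold group at a time.
        (test if len(test) < target else train).extend(groups[s])
    return train, test


# Toy molecule -> scaffold mapping (hypothetical labels, not real data).
toy = {"m1": "A", "m2": "A", "m3": "B", "m4": "C", "m5": "C", "m6": "D"}
train, test = scaffold_split(toy, test_fraction=0.3)

# Leakage check: scaffolds that appear on both sides of the split.
shared = {toy[m] for m in train} & {toy[m] for m in test}
```

An empty `shared` set is the kind of novelty evidence the referee asks the authors to report alongside benchmark size and significance tests.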
minor comments (2)
- [Abstract/Methods] The abstract and main text should explicitly define the Tanimoto similarity metric used and the exact top-k settings for the accuracy metric to allow direct comparison with prior work.
- [Methods] Notation for evidential reasoning components (e.g., how evidence scores are aggregated) should be introduced with a clear equation or pseudocode early in the methods to improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important areas for strengthening the manuscript's clarity and rigor. We address each major comment below and will revise the manuscript to incorporate the requested details.
read point-by-point responses
-
Referee: [Results/Methods] Benchmark construction (Results or Methods section): The scaffold-split benchmark is load-bearing for the generalizability claim (46.5% top-1 accuracy gain), yet no details are supplied on the total number of molecules, scaffold selection criteria, how novelty of test scaffolds was verified against the training distribution, or any statistical significance testing of the reported improvements. This omission prevents assessment of whether the performance edge is robust or artifactual.
Authors: We agree that additional details on benchmark construction are needed to fully support the generalizability claims. In the revised manuscript, we will expand the Methods section to specify the total number of molecules in the benchmark, the scaffold selection criteria employed, the procedure used to verify novelty of test scaffolds relative to the training distribution, and the results of statistical significance testing on the reported accuracy and similarity improvements. revision: yes
-
Referee: [Application/Results] Natural-product validation (Application or Results section): The elucidation of structures for the two unknown natural products is presented as practical evidence, but the manuscript provides no information on independent validation methods (e.g., comparison to synthetic standards, additional spectroscopic data, or cross-validation by human experts), which is required to substantiate that the agent's output survived verification steps for truly novel scaffolds.
Authors: We acknowledge this point. In the revised manuscript, we will add details in the Application section on the independent validation methods used for the two natural products (from Hydrangea davidii and Vitex trifolia), including any comparisons to additional spectroscopic data or expert confirmation that supported the agent's structure assignments. revision: yes
-
Referee: [Agent architecture/Experiments] Verification robustness (Agent architecture or Experiments section): The central assumption that peak-atom consistency checks plus formula-aware optimization will reliably reject chemically plausible but incorrect LLM proposals is not supported by any error analysis, failure-case reporting, or ablation on ambiguous peak assignments. For novel scaffolds this is load-bearing, as the skeptic's concern about hallucinations evading tolerance-based checks remains unaddressed by the presented evidence.
Authors: We agree that explicit analysis of verification robustness is important. In the revised manuscript, we will add an error analysis subsection in the Experiments, including selected failure cases, and ablations isolating the contributions of peak-atom consistency checks and formula-aware fragment optimization, to demonstrate their role in rejecting incorrect proposals on novel scaffolds. revision: yes
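The concern about tolerance-based checks can be illustrated with a toy peak matcher. This is a sketch assuming a simple greedy pairing of sorted chemical shifts; the paper's verifier is not described at this level in the summary, and the function name, tolerance values, and shift lists are invented for illustration.

```python
from typing import List, Tuple


def match_peaks(
    query: List[float], predicted: List[float], tol: float
) -> Tuple[List[Tuple[float, float]], List[float], List[float]]:
    """Greedily pair sorted query and predicted shifts (ppm) within a tolerance."""
    q, p = sorted(query), sorted(predicted)
    matched: List[Tuple[float, float]] = []
    unmatched: List[float] = []  # query peaks with no predicted partner
    unused: List[float] = []     # predicted peaks with no query partner
    i = j = 0
    while i < len(q) and j < len(p):
        if abs(q[i] - p[j]) <= tol:
            matched.append((q[i], p[j]))
            i += 1
            j += 1
        elif q[i] < p[j]:
            unmatched.append(q[i])
            i += 1
        else:
            unused.append(p[j])
            j += 1
    unmatched.extend(q[i:])
    unused.extend(p[j:])
    return matched, unmatched, unused


# Invented 13C shifts: the candidate predicts 21.0 ppm where the query shows 14.3 ppm.
query_13c = [170.2, 128.5, 60.1, 14.3]
pred_13c = [171.0, 129.9, 61.0, 21.0]

matched, unmatched, unused = match_peaks(query_13c, pred_13c, tol=2.0)
loose_matched, loose_unmatched, _ = match_peaks(query_13c, pred_13c, tol=7.0)
```

With the tighter tolerance the mismatch surfaces as one unmatched query peak and one unused predicted peak; with the looser one every peak pairs up, which is how an incorrect candidate could slip through.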
Circularity Check
No circularity: empirical performance claims rest on external benchmarks
full rationale
The paper's central claims are empirical: NMRAgent is evaluated on a scaffold-split benchmark (top-1 accuracy +46.5%, Tanimoto +0.502) and demonstrated on two previously unknown natural products. These outcomes are measured against held-out ground-truth structures and experimental spectra; no equations, fitted parameters, or self-citations are invoked to derive the reported metrics. The method description (planning, peak-atom consistency, fragment optimization) is presented as an engineering pipeline whose validity is tested externally rather than defined into existence. No load-bearing step reduces to a self-referential definition or prior author result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: NMR spectra together with molecular formula provide sufficient information for structure elucidation when processed by the described agent workflow
invented entities (1)
- NMRAgent: no independent evidence
Reference graph
Works this paper leans on
-
[1]
Synthesis-driven structural revision of c5-hydroxy-cyclo (l-pro-l-leu) using electrochemical oxidation
Blessing B Akinlabi, Jared Balsz-Diaz, Mark C Walker, and Matthew R Aronoff. Synthesis-driven structural revision of c5-hydroxy-cyclo (l-pro-l-leu) using electrochemical oxidation. 2026
2026
-
[2]
Learning the language of nmr: structure elucidation from nmr spectra using transformer models
Marvin Alberts, Federico Zipoli, and Alain Vaucher. Learning the language of nmr: structure elucidation from nmr spectra using transformer models. In AI for Accelerated Materials Design-NeurIPS 2023 Workshop, 2023
2023
-
[3]
Building a knowledge graph to enable precision medicine
Payal Chandak, Kexin Huang, and Marinka Zitnik. Building a knowledge graph to enable precision medicine. Scientific data, 10(1):67, 2023
2023
-
[4]
Chemberta: Large-scale self-supervised pretraining for molecular property prediction
Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. Chemberta: Large-scale self-supervised pretraining for molecular property prediction. arXiv preprint arXiv:2010.09885, 2020
-
[5]
Confirmation of the revised structure of samoquasine a and a proposed structural revision of cherimoline
Francis Dhoro, Jesse Parkin-Gibbs, Matthew McIldowie, Brian W Skelton, and Matthew J Piggott. Confirmation of the revised structure of samoquasine a and a proposed structural revision of cherimoline. Journal of natural products, 81(7):1658–1665, 2018
2018
-
[6]
Self-consistent perturbation theory of diamagnetism: I. a gauge-invariant lcao method for nmr chemical shifts
Robert Ditchfield. Self-consistent perturbation theory of diamagnetism: I. a gauge-invariant lcao method for nmr chemical shifts. Molecular Physics, 27(4):789–807, 1974
1974
-
[7]
The faiss library
Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The faiss library. IEEE Transactions on Big Data, 2025
2025
-
[8]
Nmr as a “gold standard” method in drug design and discovery
Abdul-Hamid Emwas, Kacper Szczepski, Benjamin Gabriel Poulson, Kousik Chandra, Ryan T McKay, Manel Dhahri, Fatimah Alahmari, Lukasz Jaremko, Joanna Izabela Lachowicz, and Mariusz Jaremko. Nmr as a “gold standard” method in drug design and discovery. Molecules, 25(20):4597, 2020
2020
-
[9]
Zheng Fang, Chen Yang, Hai-tao Yu, Haoming Luo, Haitao He, Jiaqing Xie, Zhuo Yang, and Jun Xia. Nmrgym: A comprehensive benchmark for nuclear magnetic resonance based molecular structure elucidation. arXiv preprint arXiv:2601.15763, 2026
-
[10]
Organic Structures from Spectra
Leslie D Field, Sev Sternhell, and John R Kalman. Organic Structures from Spectra. John Wiley & Sons, 2012
2012
-
[11]
M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goin...
2016
-
[12]
Chembl: a large-scale bioactivity database for drug discovery
Anna Gaulton, Louisa J Bellis, A Patricia Bento, Jon Chambers, Mark Davies, Anne Hersey, Yvonne Light, Shaun McGlinchey, David Michalovich, Bissan Al-Lazikani, et al. Chembl: a large-scale bioactivity database for drug discovery. Nucleic acids research, 40(D1):D1100–D1107, 2012
2012
-
[13]
Equivariant diffusion for molecule generation in 3d
Emiel Hoogeboom, Víctor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3d. In International conference on machine learning, pages 8867–8887. PMLR, 2022
2022
-
[14]
Accurate and efficient structure elucidation from routine one-dimensional nmr spectra using multitask machine learning
Frank Hu, Michael S Chen, Grant M Rotskoff, Matthew W Kanan, and Thomas E Markland. Accurate and efficient structure elucidation from routine one-dimensional nmr spectra using multitask machine learning. ACS Central Science, 10(11):2162–2170, 2024
2024
-
[15]
Nmr-solver: automated structure elucidation via large-scale spectral matching and physics-guided fragment optimization
Yongqi Jin, Jun-Jie Wang, Fanjie Xu, Xiaohong Ji, Zhifeng Gao, Linfeng Zhang, Guolin Ke, Rong Zhu, and Weinan E. Nmr-solver: automated structure elucidation via large-scale spectral matching and physics-guided fragment optimization. Nature Communications, 2026
2026
-
[16]
Prediction of chemical shift in nmr: A review
Eric Jonas, Stefan Kuhn, and Nils Schlörer. Prediction of chemical shift in nmr: A review. Magnetic Resonance in Chemistry, 60(11):1021–1031, 2022
2022
-
[17]
Deepsat: learning molecular structures from nuclear magnetic resonance data
Hyun Woo Kim, Chen Zhang, Raphael Reher, Mingxun Wang, Kelsey L Alexander, Louis-Félix Nothias, Yoo Kyong Han, Hyeji Shin, Ki Yong Lee, Kyu Hyeong Lee, et al. Deepsat: learning molecular structures from nuclear magnetic resonance data. Journal of Cheminformatics, 15(1):71, 2023
2023
-
[18]
Altechromones a and b, new plant growth regulators produced by the fungus, alternaria sp.
Yasuo Kimura, Takashi Mizuno, Hiromitsu Nakajima, and Takashi Hamasaki. Altechromones a and b, new plant growth regulators produced by the fungus, alternaria sp. Bioscience, biotechnology, and biochemistry, 56(10):1664–1665, 1992
1992
-
[19]
Structural revision and synthesis of altechromone a
P Konigs, B Rinker, L Maus, M Nieger, J Rheinheimer, and SR Waldvogel. Structural revision and synthesis of altechromone a. Journal of natural products, 73(12):2064–2066, 2010
2010
-
[20]
Geometry-complete diffusion for 3d molecule generation and optimization
Alex Morehead and Jianlin Cheng. Geometry-complete diffusion for 3d molecule generation and optimization. Communications Chemistry, 7(1):150, 2024
2024
-
[21]
Samoquasine a, a benzoquinazoline alkaloid from the seeds of annona squamosa
Hiroshi Morita, Yumiko Sato, Kit-Lam Chan, Chee-Yan Choo, Hideji Itokawa, Koichi Takeya, and Jun’ichi Kobayashi. Samoquasine a, a benzoquinazoline alkaloid from the seeds of annona squamosa. Journal of natural products, 63(12):1707–1708, 2000
2000
-
[22]
Laxman D Nandawadekar, Hiba K P, Teena P George, Choppari Thirupathi, Aparna Sahoo, Sidharth Chopra, and D Srinivasa Reddy. Structural revision of a natural tetrahydroquinoxaline-6-carboxylic acid isolated from caulis sinomenii through total synthesis of both the regioisomers. Journal of Natural Products, 88(12):2978–2986, 2025
2025
-
[23]
Interpretation of Organic Spectra
Yong-Cheng Ning. Interpretation of Organic Spectra. John Wiley & Sons, 2011
2011
-
[24]
The natural products atlas 3.0: extending the database of microbially derived natural products
Ella F Poynton, Jeffrey A van Santen, Matthew Pin, Marla Macias Contreras, Emily McMann, Jonathan Parra, Brandon Showalter, Liana Zaroubi, Katherine R Duncan, and Roger G Linington. The natural products atlas 3.0: extending the database of microbially derived natural products. Nucleic Acids Research, 53(D1):D691–D699, 2025
2025
-
[25]
Coumarin dimers from hydrangea davidii and their antimalarial activities
Wan-Xian Qi, Wei-Juan Zu, Gang Li, Jing Wang, Li-Juan Lang, Rong Liu, Bei Jiang, and Chao-Jiang Xiao. Coumarin dimers from hydrangea davidii and their antimalarial activities. Magnetic Resonance in Chemistry, 2026
2026
-
[26]
The lotus initiative for open knowledge management in natural products research
Adriano Rutz, Maria Sorokina, Jakub Galgonek, Daniel Mietchen, Egon Willighagen, Arnaud Gaudry, James G Graham, Ralf Stephan, Roderic Page, Jiří Vondrášek, et al. The lotus initiative for open knowledge management in natural products research. elife, 11:e70780, 2022
2022
-
[27]
Nmr-challenge.com: An interactive website with exercises in solving structures from nmr spectra
Ondrej Socha, Zuzana Osifová, and Martin Dracinsky. Nmr-challenge.com: An interactive website with exercises in solving structures from nmr spectra, 2023
2023
-
[28]
Coconut online: collection of open natural products database
Maria Sorokina, Peter Merseburger, Kohulan Rajan, Mehmet Aziz Yirik, and Christoph Steinbeck. Coconut online: collection of open natural products database. Journal of Cheminformatics, 13(1):2, 2021
2021
-
[29]
Nmrshiftdb constructing a free chemical information system with open-source components
Christoph Steinbeck, Stefan Krause, and Stefan Kuhn. Nmrshiftdb constructing a free chemical information system with open-source components. Journal of chemical information and computer sciences, 43(6):1733–1739, 2003
2003
-
[30]
A transformer based generative chemical language ai model for structural elucidation of organic compounds
Xiaofeng Tan. A transformer based generative chemical language ai model for structural elucidation of organic compounds. Journal of cheminformatics, 17(1):103, 2025
2025
-
[31]
Cody Timmons and Peter Wipf. Density functional theory calculation of 13c nmr shifts of diazaphenanthrene alkaloids: reinvestigation of the structure of samoquasine a. The Journal of Organic Chemistry, 73(22):9168–9170, 2008
2008
-
[32]
Smiles, a chemical language and information system
David Weininger. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988
1988
-
[33]
Efficient implementation of the gauge-independent atomic orbital method for nmr chemical shift calculations
Krzysztof Wolinski, James F Hinton, and Peter Pulay. Efficient implementation of the gauge-independent atomic orbital method for nmr chemical shift calculations. Journal of the American Chemical Society, 112(23):8251–8260, 1990
1990
-
[34]
Ziyu Xiong, Yichi Zhang, Foyez Alauddin, Chu Xin Cheng, Joon Soo An, Mohammad R Seyedsayamdost, and Ellen D Zhong. Atomic diffusion models for small molecule structure elucidation from nmr spectra. arXiv preprint arXiv:2512.03127, 2025
-
[35]
Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance chemical shifts
Fanjie Xu, Wentao Guo, Feng Wang, Lin Yao, Hongshuai Wang, Fujie Tang, Zhifeng Gao, Linfeng Zhang, Weinan E, Zhong-Qun Tian, et al. Toward a unified benchmark and framework for deep learning-based prediction of nuclear magnetic resonance chemical shifts. Nature Computational Science, 5(4):292–300, 2025
2025
-
[36]
Nmrmind: A transformer-based model enabling the elucidation from multidimensional nmr to structures
Xi Xue, Hanyu Sun, Jingying Sun, Luc Patiny, Xiangying Liu, Kai Chen, Jingjie Yan, Liangning Li, Xue Liu, Shu Xu, et al. Nmrmind: A transformer-based model enabling the elucidation from multidimensional nmr to structures. Analytical Chemistry, 97(41):22603–22614, 2025
2025
-
[37]
Chen Yang, Zheng Fang, Hanyu Sun, Fanjie Xu, Hongxin Xiang, Hanyu Gao, Xiangxiang Zeng, Yuqiang Li, Xiaojian Wang, and Jun Xia. A large-scale foundation model enables simulation-to-real adaptation for nuclear magnetic resonance-based molecular structure analysis, 2026. URL https://arxiv.org/abs/2606.20756
-
[38]
Diffnmr: diffusion models for nuclear magnetic resonance spectra elucidation
Qingsong Yang, Binglan Wu, Xuwei Liu, Bo Chen, Wei Li, Gen Long, Xin Chen, and Mingjun Xiao. Diffnmr: diffusion models for nuclear magnetic resonance spectra elucidation. Materials Futures, 5(1):015601, 2026
2026
-
[39]
A deep learning model for predicting selected organic molecular spectra
Zihan Zou, Yujin Zhang, Lijun Liang, Mingzhi Wei, Jiancai Leng, Jun Jiang, Yi Luo, and Wei Hu. A deep learning model for predicting selected organic molecular spectra. Nature Computational Science, 3(11):957–964, 2023
2023
-
[40]
Do not plan any candidate search or structural modification that violates the molecular formula
Treat the molecular formula as a hard constraint. Do not plan any candidate search or structural modification that violates the molecular formula
-
[41]
If the input contains expanded repeated shifts, do not collapse repeated peaks unless explicitly instructed
Use the 1H NMR spectrum to reason about proton environments, integration, multiplicity when available, and repeated or equivalent signals. If the input contains expanded repeated shifts, do not collapse repeated peaks unless explicitly instructed
-
[42]
Use the 13C NMR spectrum as strong evidence for carbon environments, including carbonyl-like carbons, aromatic or alkene carbons, oxygen-bearing carbons, and saturated aliphatic carbons
-
[43]
Use optional experimental metadata only as contextual evidence. Metadata such as reaction precursors, reagents, catalysts, biological source, or isolation conditions can guide hypotheses, but cannot override the molecular formula or spectral evidence
-
[44]
A memory cannot override the query spectra, molecular formula, or verifier evidence
Use recalled memories only as analogical evidence from previously confirmed cases. A memory cannot override the query spectra, molecular formula, or verifier evidence
-
[45]
Retrieval rank alone should not suppress plausible de novo candidates before peak–atom verification
Preserve both retrieval and de novo candidates when both are useful. Retrieval rank alone should not suppress plausible de novo candidates before peak–atom verification
-
[46]
Prefer a larger or more diverse candidate pool when the spectra suggest compact natural-product-like scaffolds, fused rings, lactones, enones, unusual oxygenation patterns, or other rare scaffold types
-
[47]
If previous verifier feedback identifies unmatched peaks, inconsistent local assignments, or unresolved mismatch regions, plan targeted optimization rather than blind regeneration
-
[48]
analysis
Return JSON only. Do not include free-form text outside the JSON object. The Planner returns the following JSON schema: { "analysis": "short evidence-grounded reasoning about formula and NMR signals", "use_retrieval": "<true_or_false>", "use_denovo": "<true_or_false>", "retrieval_top_k": "<integer>", "denovo_top_k": "<integer>", "save_pool_file": "<true_o...
-
[49]
Execute retrieval, de novo generation, pool merging, reranking, optimization, and molecular editing only when requested by the Planner or by verifier feedback
-
[50]
Preserve candidate provenance whenever possible, including SMILES string, source label, source rank, source score, molecular formula, spectrum metadata, and candidate-pool path
-
[51]
Do not discard a candidate solely because it comes from a lower-volume source
Keep retrieval-derived candidates and de novo candidates visible for downstream verification. Do not discard a candidate solely because it comes from a lower-volume source
-
[52]
Deduplicate candidates by canonical non-isomeric SMILES while retaining all available source provenance
-
[53]
Candidate construction, candidate merging, optimization, and verification must remain auditable as separate operations
Do not silently invoke retrieval or de novo generation inside an optimization step. Candidate construction, candidate merging, optimization, and verification must remain auditable as separate operations
-
[54]
Optimization tools should operate only on an existing candidate pool
-
[55]
Verifier-guided in-place molecular editing may be invoked only when the verifier localizes a high-confidence mismatch to a specific atom or small chemical environment
-
[56]
After any local edit, retain the unedited parent candidate for comparison
-
[57]
If a tool fails, returns invalid output, or produces an empty candidate pool, report the failure explicitly in the structured execution output
-
[58]
You must not provide the final molecular structure unless the Peak–Atom Verifier has supplied sufficient evidence
Return structured execution results, including generated files, candidate counts, source composition, tool status, and any warnings needed by the verifier. You must not provide the final molecular structure unless the Peak–Atom Verifier has supplied sufficient evidence. B.2.3 Peak–Atom Verifier System Prompt Peak–Atom Verifier system prompt You are an exp...
-
[59]
Molecular formula consistency between the query and each candidate
-
[60]
Overall NMR similarity score from the reranking tool
-
[61]
Matched query peaks and their assigned candidate atoms
-
[62]
Unmatched query peaks, especially diagnostic peaks that indicate missing functional groups or unresolved local environments
-
[63]
Unused predicted peaks, especially predicted signals that have no reasonable support in the experimental spectrum
-
[64]
1H and 13C residuals, including whether the largest errors occur in chemically diagnostic regions
-
[65]
Atom-level assignment summaries, including which parts of the molecule are well supported and which regions remain uncertain
-
[66]
Candidate provenance, including retrieval, de novo, optimized, merged, edited, or seed sources.
-
[67]
These may provide supporting context but cannot override the current query evidence
Previous verifier outputs or recalled confirmed memories, if available. These may provide supporting context but cannot override the current query evidence. Decision instructions:
-
[68]
Return accept only when both 1H and 13C evidence are coherent and no major diagnostic query peaks remain unexplained
-
[69]
A de novo candidate may be accepted over retrieval candidates if its formula consistency, spectral similarity, and peak–atom assignments provide stronger evidence
-
[70]
Return need_opt when the best candidate is globally reasonable but contains local mismatches that can be targeted by fragment optimization or in-place editing
-
[71]
Return need_bigger_pool when the candidate pool lacks sufficiently diverse or formula-compatible alternatives
-
[72]
Return need_retry when tool outputs are invalid, incomplete, inconsistent, or insufficient for evidence-based verification
-
[73]
When optimization is needed, provide concrete mismatch descriptions, including the relevant unmatched peaks, poorly matched atoms, or local structural regions. Return JSON only using the required schema. The Peak–Atom Verifier returns the following JSON schema: { "verdict": "<accept | need_opt | need_bigger_pool | need_retry>", "analysis": "evidence-gro...
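The verifier's JSON-only contract lends itself to strict parsing on the receiving side. The sketch below assumes only the `verdict` field and the four verdict values visible in the excerpt above; the remaining schema fields are truncated in the source and are not guessed here. The step names in `NEXT_STEP` and the function `route_verdict` are hypothetical.

```python
import json

# The four verdict values quoted in the excerpt above.
ALLOWED_VERDICTS = {"accept", "need_opt", "need_bigger_pool", "need_retry"}

# Hypothetical mapping from verdict to the next pipeline step (names are assumptions).
NEXT_STEP = {
    "accept": "report_structure",
    "need_opt": "run_fragment_optimization",
    "need_bigger_pool": "expand_candidate_pool",
    "need_retry": "rerun_tools",
}


def route_verdict(raw: str) -> str:
    """Parse verifier output and return the next step; malformed output is retried."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return NEXT_STEP["need_retry"]
    verdict = payload.get("verdict") if isinstance(payload, dict) else None
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        return NEXT_STEP["need_retry"]
    return NEXT_STEP[verdict]


step = route_verdict('{"verdict": "need_opt", "analysis": "one unmatched 13C peak"}')
```

Treating anything outside the allowed set as a retry keeps a free-text or hallucinated verdict from being mistaken for acceptance.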