ProvMind: Provenance-grounded reasoning for materials synthesis

Koji Tsuda; Ryo Tamura; Yiming Zhang

arxiv: 2605.28487 · v1 · pith:75FTMG6Vnew · submitted 2026-05-27 · 💻 cs.AI · cs.LG


Pith reviewed 2026-06-29 12:34 UTC · model grok-4.3


keywords: materials synthesis, provenance reasoning, process optimization, benchmark evaluation, language model reasoning, out-of-distribution testing, causal consistency, retrieval augmentation


The pith

ProvMind retrieves analogous synthesis processes from literature graphs to score options and guide language-model decisions, reaching 52.84% accuracy on dual-OOD splits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs MatProcBench from literature-mined MatPROV graphs to test seven process-reasoning tasks that require tracking route continuity, inferring step variables, and checking causal consistency. It presents ProvMind as a framework that pulls similar past processes, turns provenance relations into option-level compatibility scores, and passes those scores to a language model for final constrained choices. This method records 52.84% accuracy on a strict dual-OOD split that mixes temporal and material-class shifts, beating prompting, retrieval-augmented, and fine-tuning baselines. A reader would care because synthesis procedures involve many interdependent choices whose errors waste laboratory effort. The evaluation design isolates whether provenance information helps when test cases lie outside the training distribution.

Core claim

ProvMind retrieves analogous training processes from MatPROV graphs, converts them into provenance-aware option-level compatibility scores, and uses a language model for constrained final decision making, achieving 52.84% accuracy on the dual-OOD split that combines temporal and material-class shift while outperforming prompting, retrieval-augmented and supervised fine-tuning baselines.

What carries the argument

The argument rests on ProvMind, a process-memory reasoning framework that retrieves analogous processes and produces provenance-aware option-level compatibility scores for language-model decisions.
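The retrieve-then-score pattern described here can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each process is a set of hypothetical (source, relation, target) provenance triples, uses Jaccard overlap as a stand-in similarity measure, and replaces the language-model decision with a plain argmax. The names `retrieve`, `compatibility_scores`, and `decide`, and all example data, are invented for illustration.

```python
from collections import defaultdict

# Assumption: a process is a set of (source, relation, target) provenance triples.
# Jaccard overlap is a stand-in similarity, not the measure used in the paper.

def jaccard(a, b):
    """Jaccard similarity between two sets of triples."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def retrieve(query, memory, k=2):
    """Return the k (similarity, process) pairs most similar to the query."""
    scored = [(jaccard(query, proc["triples"]), proc) for proc in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def compatibility_scores(query, options, memory, k=2):
    """Similarity-weighted vote per option, normalized to sum to 1 when possible."""
    votes = defaultdict(float)
    for sim, proc in retrieve(query, memory, k):
        if proc["answer"] in options:
            votes[proc["answer"]] += sim
    total = sum(votes.values())
    return {opt: (votes[opt] / total if total else 0.0) for opt in options}

def decide(scores):
    """Placeholder for the language-model step: pick the top-scoring option."""
    return max(scores, key=scores.get)

# Hypothetical training memory and query (illustrative data only).
memory = [
    {"triples": {("TiO2 powder", "used_by", "ball milling"),
                 ("ball milling", "generated", "mixed powder")},
     "answer": "calcination"},
    {"triples": {("TiO2 powder", "used_by", "ball milling"),
                 ("mixed powder", "used_by", "pressing")},
     "answer": "sintering"},
    {"triples": {("precursor solution", "used_by", "spin coating")},
     "answer": "annealing"},
]
query = {("TiO2 powder", "used_by", "ball milling"),
         ("ball milling", "generated", "mixed powder")}
options = ["calcination", "sintering", "annealing"]

scores = compatibility_scores(query, options, memory, k=2)
choice = decide(scores)
```

In a fuller system the scores would presumably be passed to the language model as part of a constrained prompt rather than resolved by argmax. The sketch only shows how retrieved neighbors can be turned into per-option evidence.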

If this is right

The approach improves performance across route-continuity, step-variable inference, and global causal-consistency tasks.
ProvMind maintains higher accuracy than baselines under both same-distribution and shift-aware evaluation settings.
Provenance relations from literature graphs supply usable signals for constrained decision making in process reasoning.
The dual-OOD split demonstrates robustness when both time period and material class differ from training data.
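One way to picture a dual-OOD partition is the sketch below. The page does not state the paper's temporal cutoff or material-class partition, so `cutoff_year`, `held_out_classes`, the `Process` record, and the choice to discard single-shift items are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Process:
    pid: str
    year: int
    material_class: str

def dual_ood_split(processes, cutoff_year, held_out_classes):
    """Split so that test items are shifted in BOTH time and material class.

    Assumption: items shifted along only one axis are set aside rather than
    assigned to train or test; the paper may treat them differently.
    """
    train, test, discarded = [], [], []
    for p in processes:
        late = p.year > cutoff_year                     # temporal shift
        unseen = p.material_class in held_out_classes   # material-class shift
        if late and unseen:
            test.append(p)
        elif not late and not unseen:
            train.append(p)
        else:
            discarded.append(p)
    return train, test, discarded

# Hypothetical records (illustrative only).
processes = [
    Process("p1", 2015, "oxide"),
    Process("p2", 2018, "perovskite"),
    Process("p3", 2023, "oxide"),
    Process("p4", 2024, "perovskite"),
]
train, test, discarded = dual_ood_split(
    processes, cutoff_year=2020, held_out_classes={"perovskite"}
)
```

Because the partition depends only on the year and class fields, it is deterministic, which matches the rebuttal's description of the split as a fixed partition.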

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the mined graphs prove reliable, the same retrieval-plus-scoring pattern could apply to other procedural domains that produce step-wise causal records.
The framework might be extended by replacing static literature graphs with live experimental logs to keep compatibility scores current.
Performance gains on dual-OOD splits suggest the method could reduce invalid route proposals in automated materials-planning systems.

Load-bearing premise

The literature-mined MatPROV graphs faithfully capture the true causal dependencies, step variables, and provenance relations present in actual laboratory synthesis procedures.

What would settle it

An experiment that replaces the mined MatPROV graphs with manually verified causal graphs for the same procedures and measures whether ProvMind accuracy falls below the reported baseline levels.
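A lighter-weight check in the same spirit would compare mined edges against manually verified edges for a sample of procedures. The sketch below computes edge-level precision, recall, and F1. The triple representation and the example edges are hypothetical, and no such validation is reported in the paper.

```python
def edge_agreement(mined, verified):
    """Precision, recall, and F1 of mined provenance edges against verified ones.

    Assumption: each edge is a hashable (source, relation, target) triple and
    exact matching is an adequate notion of agreement.
    """
    mined, verified = set(mined), set(verified)
    true_pos = len(mined & verified)
    precision = true_pos / len(mined) if mined else 0.0
    recall = true_pos / len(verified) if verified else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical edges for a single procedure (illustrative only).
mined_edges = [
    ("precursor A", "used_by", "mixing"),
    ("mixing", "generated", "slurry"),
    ("slurry", "used_by", "drying"),
]
verified_edges = [
    ("precursor A", "used_by", "mixing"),
    ("mixing", "generated", "slurry"),
    ("slurry", "used_by", "filtration"),
    ("filtration", "generated", "filter cake"),
]
precision, recall, f1 = edge_agreement(mined_edges, verified_edges)
```

Low recall on such a sample would indicate that the mined graphs omit dependencies, which is the failure mode the load-bearing premise is exposed to.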

Original abstract

Materials process optimization requires reasoning over routes, conditions, tools and causal dependencies, yet most computational formulations flatten synthesis procedures into text or ordered steps. We introduce MatProcBench, a provenance-grounded benchmark constructed from literature-mined MatPROV graphs, to evaluate seven process-reasoning tasks spanning route continuity, step-level variable inference and global causal consistency under both same-split and shift-aware evaluation, including a strict dual-OOD split that combines temporal and material-class shift. We further introduce ProvMind, a process-memory reasoning framework that retrieves analogous training processes, converts them into provenance-aware option-level compatibility scores, and uses a language model for constrained final decision making. ProvMind achieves 52.84% accuracy on the dual-OOD split, outperforming prompting, retrieval-augmented and supervised fine-tuning baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper introduces MatProcBench with a dual-OOD split and a retrieval-plus-LM framework called ProvMind that scores 52.84% on the hard split, but the unvalidated literature-mined graphs make the ground truth shaky.


The core contribution is a benchmark built from MatPROV graphs that defines seven process-reasoning tasks and tests them under a dual-OOD protocol combining temporal and material-class shifts. ProvMind retrieves analogous training processes, turns them into provenance-aware compatibility scores, and hands the final choice to a language model. That setup beats the prompting, RAG, and fine-tuning baselines they report.

The dual-OOD split and the explicit use of provenance structure are the parts that feel fresh. Most prior work on synthesis planning flattens routes into sequences or text, so adding step-level variables and causal consistency checks is a reasonable step forward for this subfield.

The main weakness is that the ground-truth labels come straight from literature-mined graphs with no reported expert validation, inter-annotator checks, or comparison to lab records. If the extraction step systematically misses or mislabels dependencies, every task and both OOD splits inherit that error. The abstract gives a single accuracy number without error bars or ablation numbers, which makes it hard to judge how stable the 52.84% result actually is.

The work is aimed at people already working on AI for materials process optimization who want concrete benchmarks with shift-aware evaluation. It is coherent enough on its own terms to deserve a full referee process rather than a desk reject, though any review would need to press hard on the graph construction pipeline and the missing statistical details.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces MatProcBench, a provenance-grounded benchmark derived from literature-mined MatPROV graphs, to evaluate seven process-reasoning tasks (route continuity, step-level variable inference, global causal consistency) under same-split and shift-aware settings including a strict dual-OOD split combining temporal and material-class shifts. It proposes ProvMind, a framework that retrieves analogous training processes, converts them into provenance-aware option-level compatibility scores, and employs a language model for constrained final decisions, reporting 52.84% accuracy on the dual-OOD split that outperforms prompting, retrieval-augmented, and supervised fine-tuning baselines.

Significance. If the MatPROV graphs faithfully encode real causal dependencies and provenance relations, the work would advance AI for materials synthesis by moving beyond flat text or ordered-step representations toward explicit process reasoning. The dual-OOD evaluation protocol is a clear strength for testing generalization, and the retrieval-plus-constrained-LM design of ProvMind offers a concrete, reproducible approach. The absence of machine-checked proofs or parameter-free derivations is noted, but the falsifiable task definitions on explicit OOD splits provide a useful testbed if the underlying graphs are validated.

major comments (2)

[Abstract and benchmark construction] The 52.84% dual-OOD accuracy and all baseline comparisons rest on ground-truth labels taken directly from the literature-mined MatPROV graphs. No expert validation, inter-annotator agreement scores, or comparison against laboratory records is reported for the extracted causal dependencies, step variables, or provenance relations. This assumption is load-bearing for interpreting the result as evidence of genuine process-reasoning capability rather than extraction artifacts.
[Results and evaluation] The headline accuracy is presented without error bars, ablation studies on the retrieval or scoring components, or basic dataset statistics (number of processes, material-class distribution, task balance). This prevents assessment of whether the reported outperformance is robust or sensitive to post-hoc choices in the dual-OOD split.
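To make the error-bar concern concrete, the sketch below computes a Wilson score interval for a reported accuracy. The size of the dual-OOD test set is not given on this page, so the values of `n` are hypothetical and serve only to show how the interval width depends on sample size.

```python
import math

def wilson_interval(accuracy, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    center = (accuracy + z * z / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical test-set sizes; the true n is not reported here.
for n in (500, 2000, 10000):
    low, high = wilson_interval(0.5284, n)
    print(f"n={n}: [{low:.4f}, {high:.4f}]")
```

This covers sampling uncertainty only. Variance from model seeds or retrieval configurations, which the authors mention in their rebuttal, would need repeated runs.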

minor comments (2)

[Benchmark description] The definition of the dual-OOD split (temporal + material-class) should be stated explicitly with concrete criteria for the temporal cutoff and material-class partitioning in the main text rather than only in the abstract.
[Task definitions] Figure or table captions for the seven tasks could more clearly indicate which tasks involve step-level variables versus global causal consistency to aid reader navigation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and for highlighting the strengths of the dual-OOD protocol and ProvMind framework. We respond to each major comment below, acknowledging where the manuscript is limited and outlining planned revisions.


Referee: [Abstract and benchmark construction] The 52.84% dual-OOD accuracy and all baseline comparisons rest on ground-truth labels taken directly from the literature-mined MatPROV graphs. No expert validation, inter-annotator agreement scores, or comparison against laboratory records is reported for the extracted causal dependencies, step variables, or provenance relations. This assumption is load-bearing for interpreting the result as evidence of genuine process-reasoning capability rather than extraction artifacts.

Authors: We agree that the fidelity of the literature-mined MatPROV graphs to real causal dependencies is essential for interpreting results as evidence of process reasoning rather than extraction artifacts. The current manuscript does not report expert validation, inter-annotator agreement, or laboratory-record comparisons. In the revised version we will expand the benchmark construction section to explicitly state this reliance and add a limitations paragraph noting the absence of such validation. We will continue to emphasize that the tasks are defined on explicit, falsifiable provenance relations within the graphs, providing a reproducible testbed, while avoiding any claim of laboratory validation.

Revision: yes

Referee: [Results and evaluation] The headline accuracy is presented without error bars, ablation studies on the retrieval or scoring components, or basic dataset statistics (number of processes, material-class distribution, task balance). This prevents assessment of whether the reported outperformance is robust or sensitive to post-hoc choices in the dual-OOD split.

Authors: We acknowledge that the results section omits error bars, ablations on retrieval and scoring, and basic dataset statistics, which limits evaluation of robustness. In the revision we will add dataset statistics (number of processes, material-class distribution, task balance) and ablation studies on the retrieval and scoring components of ProvMind. The dual-OOD split is a fixed, deterministic partition; we will clarify this and report any variance arising from retrieval configurations or model seeds where applicable.

Revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical benchmark evaluation is self-contained


The paper reports an empirical accuracy (52.84% on dual-OOD) measured against ground-truth labels taken from literature-mined MatPROV graphs on explicitly constructed OOD splits. The ProvMind framework performs retrieval of training processes followed by LM-based constrained decision making; this performance number is not obtained by fitting a parameter to a subset and then re-predicting a closely related quantity, nor by any self-definitional equation or self-citation chain that reduces the result to its inputs by construction. The benchmark construction step is described at the level of literature mining without equations or load-bearing self-citations that would make the accuracy tautological. This is a standard held-out evaluation setup whose central claim remains independent of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that mined graphs are faithful and that the seven tasks adequately represent real synthesis reasoning needs; no free parameters or invented physical entities are mentioned.

axioms (1)

domain assumption: Literature-mined MatPROV graphs accurately reflect real synthesis provenance and causal structure
Benchmark is constructed directly from these graphs

invented entities (1)

ProvMind framework (no independent evidence)
purpose: Provenance-grounded process reasoning via retrieval and compatibility scoring
New system introduced to solve the stated tasks

pith-pipeline@v0.9.1-grok · 5665 in / 1187 out tokens · 44156 ms · 2026-06-29T12:34:28.196258+00:00 · methodology


Data availability

The benchmarks introduced in this study are available at https://github.com/ZHymLumine/MatProcBench. The source corpora from which these resources were derived are publicly available from their original providers: MatPROV a...
