PDE-Agents: An LLM-Orchestrated Multi-Agent Framework for Automated Finite Element Simulations with Knowledge Graph-Augmented Reasoning

Gulshan Noorsumar; Øyvind Jensen; Sayan Adhikari

arxiv: 2606.07850 · v1 · pith:QE6YUAM5new · submitted 2026-06-05 · ⚛️ physics.comp-ph · math-ph · math.MP

Pith reviewed 2026-06-27 19:56 UTC · model grok-4.3

keywords: multi-agent LLM systems, GraphRAG, finite element method, automated PDE simulation, knowledge graph augmentation, LangGraph orchestration, material property fidelity, simulation verification

The pith

An adaptive knowledge-graph mode lets LLM agents reach 100% success on finite-element simulations including novel materials.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PDE-Agents, a multi-agent framework that uses large language models to automate the full cycle of setting up, running, and analyzing finite element simulations from natural-language prompts. Three specialist agents handle simulation, analytics, and database tasks under a supervisor, drawing on a GraphRAG knowledge base of material properties and failure patterns. Experiments compare three retrieval modes across fifty tasks and a separate novel-material test set: the smart adaptive mode achieves complete success and perfect material-property fidelity, while the no-graph baseline falls to 34 percent fidelity. The authors conclude that the pattern of knowledge-graph integration, rather than the raw content, decides whether augmentation improves or harms agent reliability. This result matters because it offers a concrete path toward reliable, hands-off simulation tools for engineering problems where material data may be incomplete or new.

Core claim

PDE-Agents orchestrates Simulation, Analytics, and Database LLM agents via a LangGraph supervisor, augmented by a Neo4j GraphRAG store of material properties, failure patterns, and run lineage. In a three-way ablation, the KG Smart mode attains 100% task success and the highest output quality scores, including material property fidelity of 0.926 versus 0.796 without the graph; on three fictional materials known only to the graph, KG Smart reaches fidelity of 1.00 while the KG-free baseline reaches only 0.34. Across 1,369 production runs the system records 97.8% overall success, with warm-start injection identified as the dominant reliability factor and integration pattern shown to govern whether GraphRAG augmentation helps or hinders the agents.

What carries the argument

The LangGraph supervisor that dynamically selects among KG On, KG Off, and KG Smart retrieval modes for each task while the three specialist agents execute the simulation lifecycle.
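
The supervisor pattern can be pictured with a library-free sketch. This is not the paper's LangGraph implementation: the `RunState` structure, the agent stubs, and the routing rule (skip the database lookup when the graph is off, then simulate, then analyse) are all hypothetical placeholders chosen only to show how a supervisor loop hands control to specialist agents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RunState:
    """Shared state passed between agents (hypothetical structure)."""
    prompt: str
    kg_mode: str = "smart"  # one of "on", "off", "smart"
    trace: List[str] = field(default_factory=list)


def database_agent(state: RunState) -> None:
    # Placeholder: a real agent would query the knowledge graph here.
    state.trace.append("database")


def simulation_agent(state: RunState) -> None:
    # Placeholder: a real agent would generate and run FEM code here.
    state.trace.append("simulation")


def analytics_agent(state: RunState) -> None:
    # Placeholder: a real agent would post-process solver output here.
    state.trace.append("analytics")


AGENTS: Dict[str, Callable[[RunState], None]] = {
    "database": database_agent,
    "simulation": simulation_agent,
    "analytics": analytics_agent,
}


def supervisor(state: RunState) -> Optional[str]:
    """Return the next agent to call, or None when the run is finished.

    Assumed rule (not from the paper): skip the database lookup when the
    graph is disabled, then simulate, then analyse.
    """
    plan = ["simulation", "analytics"]
    if state.kg_mode != "off":
        plan.insert(0, "database")
    for name in plan:
        if name not in state.trace:
            return name
    return None


def run(prompt: str, kg_mode: str = "smart") -> RunState:
    """Drive the supervisor loop until no agent is left to call."""
    state = RunState(prompt=prompt, kg_mode=kg_mode)
    while (name := supervisor(state)) is not None:
        AGENTS[name](state)
    return state


print(run("steady heat conduction in a steel plate").trace)
```

In a graph-based orchestrator the `supervisor` function would typically become a conditional edge and each stub a node; the loop above only mimics that control flow.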

If this is right

KG Smart reaches 100% success and highest physics quality (0.933) across the fifty-task ablation.
On novel materials the adaptive mode attains material property fidelity of 1.00 versus 0.34 for the no-graph baseline.
KG growth produces an 8.8% MPF gain on hard tasks while easy and novel tasks remain at ceiling.
Warm-start injection from prior runs is the main driver of the 97.8% overall success rate.
An adaptive framework can choose the optimal retrieval mode per task without manual intervention.
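
The success rates above rest on very different sample sizes (50 ablation tasks versus 1,369 production runs). A minimal sketch of a Wilson score interval shows how much uncertainty each figure carries. The function name is illustrative, and the count of 1,339 successes is an assumption obtained by rounding 0.978 × 1,369; the exact count is not stated here.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


# Assumed counts: 50 of 50 ablation tasks, and 1,339 of 1,369 production runs.
for label, k, n in [("ablation", 50, 50), ("production", 1339, 1369)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} -> [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions, a perfect 50 out of 50 still leaves a lower bound of about 0.93, which is a useful caution when reading "100% success" on a small task set.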

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same adaptive-injection pattern could be tested on other PDE classes or multiphysics problems where material data is sparse.
Real-time graph updates during a run might further reduce the three observed budget-exhaustion failures.
The 57.6% first-try success rate suggests that production deployment would still require fallback mechanisms for the remaining cases.
Difficulty-dependent gains imply that the framework's value grows with task complexity rather than remaining uniform.

Load-bearing premise

The curated knowledge graph supplies accurate, complete, and non-conflicting material properties and failure patterns that the agents can apply without introducing setup errors.

What would settle it

A controlled run in which the knowledge graph is seeded with deliberately incorrect material values and the agents are observed to produce or avoid erroneous simulation setups.
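
One way to picture such a run is a small harness that corrupts part of the graph and measures how often the wrong value reaches the simulation setup. Everything here is hypothetical: the material names, the conductivity values, and the `trusting_agent` stub (which copies graph values unchecked) are placeholders, not the paper's data or agents.

```python
import random
from typing import Callable, Dict, Set, Tuple

# Placeholder conductivities in W/(m*K); illustrative values, not from the paper.
REFERENCE_KG: Dict[str, float] = {
    "material_a": 50.0,
    "material_b": 200.0,
    "material_c": 400.0,
}


def corrupt(
    kg: Dict[str, float], fraction: float, factor: float, seed: int = 0
) -> Tuple[Dict[str, float], Set[str]]:
    """Return a copy of kg with a fraction of entries scaled by factor."""
    rng = random.Random(seed)
    names = sorted(kg)
    n_bad = round(fraction * len(names))
    bad = set(rng.sample(names, n_bad))
    corrupted = {k: (v * factor if k in bad else v) for k, v in kg.items()}
    return corrupted, bad


def trusting_agent(material: str, kg: Dict[str, float]) -> float:
    """Stub agent that copies the graph value into the setup unchecked."""
    return kg[material]


def propagated_error_rate(
    agent: Callable[[str, Dict[str, float]], float],
    reference: Dict[str, float],
    corrupted: Dict[str, float],
    rel_tol: float = 0.05,
) -> float:
    """Fraction of materials whose setup value deviates from the reference."""
    wrong = 0
    for name, true_value in reference.items():
        used = agent(name, corrupted)
        if abs(used - true_value) > rel_tol * abs(true_value):
            wrong += 1
    return wrong / len(reference)


corrupted_kg, bad_names = corrupt(REFERENCE_KG, fraction=1 / 3, factor=10.0)
print(bad_names, propagated_error_rate(trusting_agent, REFERENCE_KG, corrupted_kg))
```

Replacing `trusting_agent` with a real agent call would turn the propagated error rate into the quantity the proposed experiment is meant to observe.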

Figures

Figures reproduced from arXiv: 2606.07850 by Gulshan Noorsumar, Øyvind Jensen, Sayan Adhikari.

**Figure 2.** Knowledge graph visualisation (Neo4j-style).

**Figure 3.** Spatial convergence study. Cases 2 and 3

**Figure 4.** Representative temperature fields produced by PDE-Agents (six cases, all in Kelvin).

**Figure 5.** Agent workflow under the three KG integration modes.

**Figure 6.** Success rate by difficulty level across three

**Figure 7.** Material Property Fidelity (MPF) per fictional

**Figure 8.** Error magnification: mean material property

**Figure 9.** Paired KG growth comparison: success-only

Original abstract

We present PDE-Agents, a multi-agent ecosystem that automates the full lifecycle of partial differential equation (PDE) / finite element method (FEM) simulations through natural-language interaction. Three specialist large language model (LLM) agents (Simulation, Analytics, Database) are orchestrated via a LangGraph supervisor, with a local open-source LLM stack (Qwen3-Coder-Next, Llama 4 Scout) on dual NVIDIA RTX PRO 6000 GPUs. The architecture is model-agnostic, validated across two LLM generations. A GraphRAG knowledge base (Neo4j, 768-d vector embeddings) encodes curated material properties, known failure patterns, and prior run lineage. We report seven contributions: (i) a verification and validation (V&V) study confirming second-order spatial convergence (O(h^2)) on the heat-equation solver; (ii) a three-way ablation over 50 tasks with a frozen KG (KG On, KG Off, KG Smart), where KG Smart reaches 100% success and the highest output quality (physics 0.933 vs. 0.853 for KG Off; MPF 0.926 vs. 0.796); (iii) a novel-material experiment with three fictional materials known only to the KG, where KG Smart attains near-perfect material property fidelity (MPF = 1.00) versus 0.34 for the KG-free baseline; (iv) a failure analysis tracing KG On's three failures to budget exhaustion and timeout, establishing warm-start injection as the dominant reliability factor; (v) an adaptive framework selecting the optimal retrieval mode per task; (vi) production metrics from 1,369 runs (97.8% success, 57.6% first-try); and (vii) a 100-task KG growth experiment showing a difficulty-dependent gain, with hard-task MPF improving 8.8% while easy/novel tasks stay at ceiling. All code, models, and evaluation artifacts are released openly. Our findings show that integration pattern, not knowledge content, determines whether GraphRAG augmentation helps or hinders LLM agents.
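
The second-order claim in contribution (i) can be illustrated with a small mesh-refinement check. The sketch below is not the paper's solver or test cases: it assumes a 1D steady conduction problem, -u'' = f on (0, 1) with u(0) = u(1) = 0 and exact solution u(x) = sin(πx), discretised with piecewise-linear elements and a lumped load vector, and it estimates the observed order from errors on successively refined meshes.

```python
import math
from typing import List


def thomas(sub: List[float], diag: List[float], sup: List[float], rhs: List[float]) -> List[float]:
    """Solve a tridiagonal system (sub[0] and sup[-1] are unused)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / denom
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def max_nodal_error(cells: int) -> float:
    """Maximum nodal error on a uniform mesh with the given number of cells."""
    h = 1.0 / cells
    nodes = [i * h for i in range(1, cells)]  # interior nodes only
    m = len(nodes)
    # Rows of the P1 stiffness matrix multiplied by h: (-1, 2, -1).
    sub, diag, sup = [-1.0] * m, [2.0] * m, [-1.0] * m
    # Lumped load: h * f(x_i), multiplied by h again to match the scaling above.
    rhs = [h * h * math.pi ** 2 * math.sin(math.pi * x) for x in nodes]
    u = thomas(sub, diag, sup, rhs)
    return max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, nodes))


def observed_orders(cell_counts: List[int]) -> List[float]:
    """Observed order p between consecutive meshes: e ~ C h^p."""
    errors = [max_nodal_error(n) for n in cell_counts]
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(cell_counts[i + 1] / cell_counts[i])
        for i in range(len(errors) - 1)
    ]


print(observed_orders([8, 16, 32, 64]))
```

Orders close to 2 across refinements are what an O(h^2) verification study looks for; a drift toward 1 or a plateau in the error would point to a setup or implementation problem.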

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PDE-Agents gives a working multi-agent FEM automation setup with controlled ablations and open code showing smart GraphRAG lifts success and fidelity on novel materials.

The main takeaway is that this paper builds a LangGraph-orchestrated system with three specialist agents and tests three GraphRAG modes on 50 tasks plus a novel-material experiment. KG Smart hits 100% success, physics score 0.933, MPF 0.926, and perfect fidelity on fictional materials the base model has never seen, while the no-KG baseline drops to 0.34 MPF.

What stands out is the verification study confirming second-order convergence on the heat solver, the failure analysis that pins the three KG-On misses on budget and timeout rather than retrieval errors, the 1,369-run production metrics at 97.8% success, and the 100-task KG growth test that shows gains mainly on hard tasks. The authors release all code, models, and artifacts, which lets anyone reproduce the comparisons.

Soft spots are minor and expected for a tooling paper. The tasks center on heat transfer and material lookup, so results may shift with messier graphs or other PDEs, but the authors do not overclaim generality. The KG is curated, which is a real requirement; the failure tracing and open release make that dependency testable instead of hidden. No circular success metrics or unfalsifiable claims appear.

This is for groups working on LLM agents for engineering simulation or testing RAG patterns in technical domains. A reader who wants concrete numbers on orchestration modes and reproducible artifacts will get value.

It deserves peer review. The experiments are controlled, the claims are tied to traceable data, and the open release supports verification.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces PDE-Agents, a multi-agent LLM framework orchestrated via LangGraph for end-to-end automation of PDE/FEM simulations. Specialist agents (Simulation, Analytics, Database) are augmented by a GraphRAG knowledge graph (Neo4j) encoding material properties and failure patterns. Reported contributions include a V&V study confirming O(h^2) spatial convergence on the heat equation, a 50-task three-way ablation (KG On/Off/Smart) with KG Smart reaching 100% success and superior scores (physics 0.933, MPF 0.926), a novel-material experiment yielding MPF=1.00 for KG Smart versus 0.34 for the baseline, failure analysis attributing the three KG-On failures to budget/timeout rather than retrieval errors, production metrics from 1,369 runs (97.8% success), and open release of all code, models, and artifacts. The central claim is that integration pattern, not knowledge content per se, governs whether GraphRAG helps or hinders performance.

Significance. If the empirical results hold, the work supplies reproducible evidence that curated knowledge-graph augmentation can raise reliability and material-property fidelity of LLM agents on complex engineering tasks, including extrapolation to fictional materials absent from base training data. The combination of controlled ablations, explicit failure tracing, V&V convergence checks, and full artifact release constitutes a concrete, testable advance for automated scientific computing and multi-agent systems.

minor comments (3)

[Abstract] The abstract lists seven contributions in a single dense sentence; splitting the quantitative highlights (success rates, MPF values, run counts) into a short bulleted list would improve immediate readability.
[Methods] The precise operational definitions of the physics quality score and MPF metric should be stated explicitly in the methods section (with formulas or pseudocode) rather than only in the results, to allow independent replication.
[Results] Figure captions for the ablation and novel-material plots should include the exact task counts, LLM versions, and retrieval-mode selection rule used in each condition.
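
To make the second comment concrete, one possible operational form of a material-property fidelity score is sketched below. This is purely illustrative and is not the paper's MPF definition: it assumes MPF is the fraction of required properties whose value in the generated setup lies within a relative tolerance of the reference value, and the property names and numbers are invented for the example.

```python
from typing import Dict


def material_property_fidelity(
    used: Dict[str, float],
    reference: Dict[str, float],
    rel_tol: float = 0.01,
) -> float:
    """Fraction of reference properties reproduced within rel_tol.

    Hypothetical definition for illustration; missing properties count
    as misses.
    """
    if not reference:
        raise ValueError("reference must contain at least one property")
    hits = 0
    for name, ref_value in reference.items():
        value = used.get(name)
        if value is None:
            continue
        scale = max(abs(ref_value), 1e-12)  # guard against division by zero
        if abs(value - ref_value) / scale <= rel_tol:
            hits += 1
    return hits / len(reference)


# Invented reference values for a fictional material (placeholders only).
reference = {"conductivity": 123.0, "density": 4567.0, "specific_heat": 890.0}
setup_from_graph = dict(reference)
setup_guessed = {"conductivity": 45.0, "density": 4567.0, "specific_heat": 890.0}

print(material_property_fidelity(setup_from_graph, reference))
print(material_property_fidelity(setup_guessed, reference))
```

A definition of this kind, stated once in the methods, would let readers recompute the reported scores from the released artifacts.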

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the detailed and positive summary of our manuscript, the assessment of its significance, and the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity

The manuscript is an empirical engineering paper whose central claims rest on controlled ablations (KG On/Off/Smart), a V&V convergence study, success-rate statistics, and a novel-material test with external benchmarks (O(h^2) order, MPF scores, 97.8% success). No derivation chain, fitted parameter renamed as prediction, or self-referential definition is present; all reported quantities are measured against independent oracles (exact solutions, curated KG ground truth, timeout logs). Open release of code and artifacts further removes any load-bearing dependence on internal definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

No free parameters are introduced; the work relies on standard assumptions about LLM capabilities and the accuracy of curated domain data rather than new physical or mathematical postulates.

axioms (2)

Domain assumption: LLM agents can be reliably prompted and orchestrated to perform multi-step technical tasks such as simulation setup and result interpretation without systematic hallucination.
Underpins the entire multi-agent architecture and reported success rates.
Domain assumption: The Neo4j knowledge graph contains accurate material properties and failure patterns that improve agent outputs when retrieved appropriately.
Central to the KG Smart ablation results and novel-material experiment.


[1] [1]

Brown, Benjamin Mann, Nick Ryder, et al

Tom B. Brown, Benjamin Mann, Nick Ryder, et al. Language models are few-shot learners.Advances in Neural Information Processing Systems, 33:1877– 1901, 2020

1901

[2] [2]

Chain-of-thought prompting elicits reasoning in large language models.Advances in Neural Infor- mation Processing Systems, 35, 2022

Jason Wei, Xuezhi Wang, Dale Schuurmans, et al. Chain-of-thought prompting elicits reasoning in large language models.Advances in Neural Infor- mation Processing Systems, 35, 2022

2022

[3] [3]

ReAct: Synergizing reasoning and acting in language mod- els.Proceedings of the International Conference on Learning Representations (ICLR), 2023

Shunyu Yao, Jeffrey Zhao, Dian Yu, et al. ReAct: Synergizing reasoning and acting in language mod- els.Proceedings of the International Conference on Learning Representations (ICLR), 2023

2023

[4] [4]

Lagaris, Aristidis Likas, and Dimitrios I

Isaac E. Lagaris, Aristidis Likas, and Dimitrios I. Fo- tiadis. Artificial neural networks for solving ordinary and partial differential equations.IEEE Transac- tions on Neural Networks, 9(5):987–1000, 1998

1998

[5] [5]

Raissi, P

Maziar Raissi, Paris Perdikaris, and George E. Kar- niadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differ- ential equations.Journal of Computational Physics, 378:686–707, 2019. doi: 10.1016/j.jcp.2018.10.045

work page doi:10.1016/j.jcp.2018.10.045 2019

[6] [6]

Kevrekidis, Lu Lu, et al

George Em Karniadakis, Ioannis G. Kevrekidis, Lu Lu, et al. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021

2021

[7] [7]

Fourier neural operator for parametric partial differential equations.Proceedings of the In- ternational Conference on Learning Representations (ICLR), 2021

Zongyi Li, Nikola Kovachki, Kamyar Azizzade- nesheli, et al. Fourier neural operator for parametric partial differential equations.Proceedings of the In- ternational Conference on Learning Representations (ICLR), 2021

2021

[8] [8]

Nature , author =

John Jumper, Richard Evans, Alexander Pritzel, et al. Highly accurate protein structure prediction with AlphaFold.Nature, 596(7873):583–589, 2021. doi: 10.1038/s41586-021-03819-2

work page doi:10.1038/s41586-021-03819-2 2021

[9] [9]

Retrieval-augmented generation for knowledge- intensive NLP tasks.Advances in Neural Informa- tion Processing Systems, 33:9459–9474, 2020

Patrick Lewis, Ethan Perez, Aleksandra Piktus, et al. Retrieval-augmented generation for knowledge- intensive NLP tasks.Advances in Neural Informa- tion Processing Systems, 33:9459–9474, 2020

2020

[10] [10]

From local to global: A graph RAG approach to query-focused summarization.arXiv preprint arXiv:2404.16130, 2024

Darren Edge, Ha Trinh, Newman Cheng, et al. From local to global: A graph RAG approach to query-focused summarization.arXiv preprint arXiv:2404.16130, 2024

Pith/arXiv arXiv 2024

[11] [11]

Yu. A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs.IEEE Transactions on Pattern Analysis and Machine Intel- ligence, 42(4):824–836, 2020. doi: 10.1109/TPAMI. 2018.2889473

work page doi:10.1109/tpami 2020

[12] [12]

Bran, Sam Cox, Oliver Schilter, et al

Andres M. Bran, Sam Cox, Oliver Schilter, et al. ChemCrow: Augmenting large-language models with chemistry tools. InAdvances in Neural In- formation Processing Systems, volume 36, 2023

2023

[13] [13]

SciAgent: Tool-augmented language models for sci- entific reasoning.arXiv preprint arXiv:2402.11451, 2024

Yubo Ma, Zhibin Liu, Liangming Pan Liang, et al. SciAgent: Tool-augmented language models for sci- entific reasoning.arXiv preprint arXiv:2402.11451, 2024

arXiv 2024

[14] [14]

Wells, et al.Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book

Anders Logg, Kent-Andre Mardal, Garth N. Wells, et al.Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book. Springer, 2012. doi: 10.1007/978-3-642-23099-8. 17

work page doi:10.1007/978-3-642-23099-8 2012

[15] [15]

Barrata, Joseph P

Igor A. Barrata, Joseph P. Dean, Jørgen S. Dokken, et al. DOLFINx: The next generation FEniCS problem solving environment.Zenodo, 2023. doi: 10.5281/zenodo.10447666

work page doi:10.5281/zenodo.10447666 2023

[16] [16]

Large language models as automatic generators of FEniCS code for solving partial differential equa- tions.arXiv preprint arXiv:2312.09801, 2023

Philipp Bauer, Patrick Henning, and Janna Schae- fers. Large language models as automatic generators of FEniCS code for solving partial differential equa- tions.arXiv preprint arXiv:2312.09801, 2023

arXiv 2023

[17] [17]

LLM4FEM: Leveraging large language models for finite element method.arXiv preprint arXiv:2405.03719, 2024

Wei Jiang, Keyi Chen, Minghan Wang, et al. LLM4FEM: Leveraging large language models for finite element method.arXiv preprint arXiv:2405.03719, 2024

arXiv 2024

[18] [18]

ALL-FEM: Agentic large language models fine- tuned for finite element methods.arXiv preprint arXiv:2603.21011, 2026

Rushikesh Deotale, Adithya Srinivasan, Yuan Tian, Tianyi Zhang, Pavlos Vlachos, and Hector Gomez. ALL-FEM: Agentic large language models fine- tuned for finite element methods.arXiv preprint arXiv:2603.21011, 2026

Pith/arXiv arXiv 2026

[19] [19]

Brenner, and Peter Norgaard

Nayantara Mudur, Hao Cui, Subhashini Venu- gopalan, Paul Raccuglia, Michael P. Brenner, and Peter Norgaard. FEABench: Evaluating language models on multiphysics reasoning ability.arXiv preprint arXiv:2504.06260, 2025

arXiv 2025

[20] [20]

LangGraph: Build stateful, multi- actor applications with LLMs, 2024

LangChain AI. LangGraph: Build stateful, multi- actor applications with LLMs, 2024. URLhttps: //github.com/langchain-ai/langgraph

2024

[21] Qingyun Wu, Gagan Bansal, Jieyu Zhang, et al. AutoGen: Enabling next-generation LLM applications via multi-agent conversation. In Proceedings of EMNLP Industry Track, 2023

[22] João Moura. CrewAI: Framework for orchestrating role-playing, autonomous AI agents, 2024. URL https://github.com/joaomdmoura/crewai

[23] Xinyi Liao, Hao Zhang, and Yutao Chen. Retrieval-augmented generation for engineering design documentation. arXiv preprint arXiv:2307.04512, 2023

[24] Yujia Gao, Shang Liu, Peng Shi, and Jimmy Lin. Retrieval-augmented code generation for universal information extraction. arXiv preprint arXiv:2311.02555, 2023

[25] Zheng Yang, Wenyan Li, and Peng Zhang. Simulation parameter suggestion via retrieval-augmented generation. arXiv preprint arXiv:2403.09512, 2024

[26] Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, and Zhen-Hua Ling. Corrective retrieval augmented generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. arXiv:2401.15884

[27] Petr Anokhin, Nikita Kornaev, Andrey Babkin, and Aleksandr I. Panov. AriGraph: Learning knowledge graph world models with episodic memory for LLM agents. In Advances in Neural Information Processing Systems (NeurIPS), 2024. arXiv:2407.04363

[28] Vineeth Venugopal, Soumya Sahoo, Gurinder Agastya, et al. MatKG: The largest knowledge graph in applied materials science. arXiv preprint arXiv:2209.11632, 2022

[29] Casper W. Andersen, Rickard Armiento, Evgeny Blokhin, et al. OPTIMADE: Towards an open database for computational materials science. Scientific Data, 8(1):217, 2021. doi: 10.1038/s41597-021-00974-z

[30] Markus J. Buehler. Generative retrieval-augmented ontologic graph and multiagent strategies for interpretive large language model-based materials design. ACS Engineering Au, 4(2):241–277, 2024. doi: 10.1021/acsengineeringau.3c00058

[31] Christophe Geuzaine and Jean-François Remacle. Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities. International Journal for Numerical Methods in Engineering, 79(11):1309–1331, 2009. doi: 10.1002/nme.2579

[32] Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor. Cypher: An evolving query language for property graphs. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD), pages 1433–1445, 2018. doi: 10.1145/3183713.3190657

[33] Zach Nussbaum, John X. Morris, Brandon Duderstadt, and Andriy Mulyar. Nomic embed: Training a reproducible long context text embedder. arXiv preprint arXiv:2402.01613, 2024

[34] IBM Research. Docling: Document processing for AI, 2024. URL https://github.com/DS4SD/docling

[35] ASME. Guide for verification and validation in computational solid mechanics. Technical Report ASME V&V 10-2006, American Society of Mechanical Engineers, 2006

[36] Edwin B. Wilson. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158):209–212, 1927. doi: 10.1080/01621459.1927.10502953

[38] Jacob Cohen. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates, 2nd edition, 1988. ISBN 978-0-8058-0283-2

[39] Hernan Chen, Luca Mangani, and Gabriel Casas. OpenFOAMGPT 2.0: End-to-end, trustworthy automation for computational fluid dynamics. arXiv preprint arXiv:2504.19338, 2025

[40] Yuxuan Chen, Xu Zuo, Yifei Yang, et al. MetaOpenFOAM: An LLM-based multi-agent framework for CFD. arXiv preprint arXiv:2407.21320, 2024

[41] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. arXiv preprint arXiv:2308.00352, 2024

[42] Ruisheng Cao, Mouxiang Chen, Jiawei Chen, Zeyu Cui, Yunlong Feng, Binyuan Hui, Yuheng Jing, Kaixin Li, Mingze Li, Junyang Lin, Zeyao Ma, Kashun Shum, Xuwu Wang, Jinxi Wei, Jiaxi Yang, Jiajun Zhang, Lei Zhang, Zongmeng Zhang, Wenting Zhao, and Fan Zhou. Qwen3-coder-next technical report. arXiv preprint arXiv:2603.00729, 2026

[43] Meta AI. The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation. https://ai.meta.com/blog/llama-4-multimodal-intelligence/, 2025. Accessed 2026-04-15