arxiv: 2605.06607 · v3 · submitted 2026-05-07 · ⚛️ physics.flu-dyn · cs.AI

Recognition: 2 theorem links

· Lean Theorem

AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

Nithin Somasekharan , Rabi Pathak , Manushri Dhanakoti , Tingwen Zhang , Ling Yue , Andy Zhu , Shaowu Pan

Authors on Pith no claims yet

Pith reviewed 2026-05-14 21:58 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn cs.AI

keywords AI agents for sciencecomputational fluid dynamicsOpenFOAMvision-language modelsturbulence modelingphysics verificationautonomous discovery

0 comments

The pith

An AI agent for CFD autonomously improves a turbulence model by 7.89 percent using vision checks on flow images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AI CFD Scientist as a system that completes the full discovery loop for computational fluid dynamics by linking literature-based idea generation, OpenFOAM execution, vision-language inspection of rendered flow fields, code changes for new models, and manuscript drafting. It shows this integrated workflow can produce a concrete improvement to the Spalart-Allmaras model on a periodic hill case. A reader would care because most existing AI scientists stop at numerical outputs and lack safeguards against physically invalid results that only appear in field visualizations. The work reports that this approach outperforms general baselines on the same tasks and releases the full code and artifacts.

Core claim

AI CFD Scientist is the first agent to combine literature-grounded ideation, validated OpenFOAM execution, vision-based physics verification of flow-field renderings, source-code modification for new physical models, and figure-grounded writing in one inspectable workflow, and it uses this loop to discover a Spalart-Allmaras runtime correction that reduces lower-wall skin-friction RMSE against DNS by 7.89 percent at Reynolds number 5600 on the periodic hill.

What carries the argument

A vision-language physics-verification gate that inspects rendered flow fields to accept, reject, or request rerun of results before any claim is recorded.

If this is right

Parameter sweeps, case-local C++ model compilation, and open-ended hypothesis search can all run under the same vision-gated workflow inside OpenFOAM.
Under matched LLM cost, the domain-specific validity gate turns partial workflows from general AI scientists into accepted scientific outputs.
Silent failures missed by solver logs become detectable through image inspection, as shown by the 14-of-16 detection rate in the planted-failure ablation.
Figure-grounded writing produces manuscripts that tie claims directly to verified renderings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same image-verification layer could be adapted to other simulation codes that output field visualizations.
If the vision model generalizes across Reynolds numbers and geometries, the agent could search for corrections in more complex flows such as separated or unsteady cases.
Releasing the prompts and run artifacts allows direct inspection of how literature retrieval feeds into code changes.

Load-bearing premise

The vision-language model can reliably separate physically valid flow-field images from invalid ones without systematic false accepts or rejects.

What would settle it

Run the agent on a planted invalid flow field that produces a silent solver success; if the vision gate accepts the image, the central claim that the gate converts runs into defensible claims collapses.

Figures

Figures reproduced from arXiv: 2605.06607 by Andy Zhu, Ling Yue, Manushri Dhanakoti, Nithin Somasekharan, Rabi Pathak, Shaowu Pan, Tingwen Zhang.

**Figure 1.** Figure 1: Architecture of AI CFD Scientist. A natural-language topic, optional base case, and optional reference data is passed as input to the framework. Three first-class pathways execute under a shared capability bus: (i) regular experimentation via literature-aware ideation, requirement validation, mesh-independence gating, and Foam-Agent execution; (ii) code modification that patches and compiles case-local C++… view at source ↗

**Figure 2.** Figure 2: Representative quantities of interest from the case studies. (a) T1: BFS view at source ↗

**Figure 3.** Figure 3: Worked example of the open-ended-discovery (OED) pathway on T5 (periodic hill, Reh=5600. Top: the five-step multi-agent collaboration under the OED orchestrator — knowledge retrieval (1), code modification (2), single-case smoke test (3), mesh-independence-gated execution (4), and paper writing (5) — with one orchestrator-issued tool call shown per box. Bottom: the 44-iteration trajectory grouped by mecha… view at source ↗

**Figure 1.** Figure 1: QoI values across the three mesh levels (coarse, baseline, refined) for the custom SA model. The near-flat trend from baseline to refined confirms mesh-independent predictions [PITH_FULL_IMAGE:figures/full_fig_p028_1.png] view at source ↗

**Figure 2.** Figure 2: Relative percentage change in primary QoIs between consecutive mesh levels. All values remain below the 5 % independence threshold. 4 Results 4.1 Overview of Predicted Velocity Fields [PITH_FULL_IMAGE:figures/full_fig_p028_2.png] view at source ↗

**Figure 3.** Figure 3: Multi-panel comparison of streamwise velocity Ux contours at t = 5,000 for cases 1–6. The recirculation zone deepens monotonically with increasing β (cases 3, 2, 4), while case 5 (Rref = 1.0) closely resembles the baseline SA (case 1), and case 6 (tight clamp) is near-identical to the design point (case 2) [PITH_FULL_IMAGE:figures/full_fig_p029_3.png] view at source ↗

**Figure 4.** Figure 4: Streamwise velocity Ux contours at t = 5,000: baseline SA (case 1, top) versus the custom SA design point (case 2, bottom). The modified valley gradients for case 2 are consistent with deeper predicted separation. 4.4 Design-Point Custom SA Performance For case 2 (custom SA, β = 6, Rref = 0.82, pMin = 0.05, pMax = 5.0), the pMult modification deepens the downstream-face Cf trough relative to the unmodified… view at source ↗

**Figure 5.** Figure 5: Streamwise velocity Ux contours at t = 5,000 for the β-sweep extremes: case 3 (β = 3, top) and case 4 (β = 9, bottom). Increased production suppression at β = 9 yields more pronounced near-wall flow modification in the valley. above the hill crest and in the upper channel retain comparable νt between the two cases, consistent with the spatially selective action of the modifier. 4.5 Parametric Sensitivity 4… view at source ↗

**Figure 6.** Figure 6: Turbulent working variable ν˜ at t = 5,000 for the baseline SA (top) and the custom SA design point (bottom). The reduction in ν˜ within the valley for case 2 indicates the region where the pMult production modifier is active. The β effect on the turbulent working variable is shown in [PITH_FULL_IMAGE:figures/full_fig_p033_6.png] view at source ↗

**Figure 7.** Figure 7: Turbulent kinematic viscosity νt at t = 5,000: baseline SA (case 1, top) versus the custom SA design point (case 2, bottom). The reduction in νt within the valley for case 2 confirms that the production suppression affects the eddy viscosity entering the momentum equations. suppression is inactive across the domain when Rref = 1.0. An anomalous zero-crossing pair is detected for case 5 at xsep/h = 1.502 an… view at source ↗

**Figure 8.** Figure 8: Turbulent working variable ν˜ at t = 5,000 for the β-sweep extremes: case 3 (β = 3, top) and case 4 (β = 9, bottom). The more extensive valley-region suppression at β = 9 corroborates the monotonic QoI trends of [PITH_FULL_IMAGE:figures/full_fig_p035_8.png] view at source ↗

**Figure 9.** Figure 9: Turbulent kinematic viscosity νt at t = 5,000 for case 3 (β = 3, Rref = 0.82). The valley-region νt is intermediate between the baseline SA (case 1, [PITH_FULL_IMAGE:figures/full_fig_p036_9.png] view at source ↗

**Figure 10.** Figure 10: Streamwise velocity Ux contours at t = 5,000 for case 5 (Rref = 1.0, β = 6; directory case 006). The velocity structure is visually indistinguishable from the baseline SA (case 1), consistent with the near-identical QoI values in [PITH_FULL_IMAGE:figures/full_fig_p036_10.png] view at source ↗

**Figure 11.** Figure 11: Turbulent working variable ν˜ at t = 5,000 for case 5 (Rref = 1.0, β = 6; directory case 006). Elevated valley-region ν˜ closely resembles the baseline SA (Figure 6a), confirming that setting Rref = 1.0 deactivates the production modifier. metrics reported in [PITH_FULL_IMAGE:figures/full_fig_p037_11.png] view at source ↗

read the original abstract

Recent LLM-based agents have closed substantial portions of the scientific discovery loop in software-only machine-learning research, in chemistry, and in biology. Extending the same loop to high-fidelity physical simulators is harder, because solver completion does not imply physical validity and many failure modes appear only in field-level imagery rather than in solver logs. We present AI CFD Scientist, an open-source AI scientist for computational fluid dynamics (CFD) that, to our knowledge, is the first to span literature-grounded ideation, validated execution, vision-based physics verification, source-code modification, and figure-grounded writing within a single inspectable workflow. Three coupled pathways cover parameter sweeps within a fixed solver, case-local C++ library compilation for new physical models, and open-ended hypothesis search against a reference comparator, all running on OpenFOAM through Foam-Agent. At the center of the framework is a vision-language physics-verification gate that inspects rendered flow fields before any result is accepted, rerun, or written into a manuscript. On five tasks under a shared GPT-5.5 backbone, AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction that reduces lower-wall Cf RMSE against DNS by 7.89% on the periodic hill at Reh=5600; under matched LLM cost, two strong general AI-scientist baselines (ARIS, DeepScientist) execute partial CFD workflows but lack the domain-specific validity gates needed to convert runs into defensible scientific claims; and a controlled planted-failure ablation shows that the vision-language gate detects 14 of 16 silent failures missed by solver-level checks. Code, prompts, and run artifacts are released at https://github.com/csml-rpi/cfd-scientist.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper builds a working CFD agent loop with OpenFOAM edits and a VLM gate that catches most planted failures and yields a small SA-model improvement, but the gate's reliability on model-specific artifacts remains the main open question.

read the letter

The main takeaway is that this work puts together a single agent workflow that does literature grounding, runs and modifies OpenFOAM cases, applies a vision-language check on rendered fields, and writes up results. On the periodic hill they report a Spalart-Allmaras runtime correction that lowers lower-wall Cf RMSE by 7.89% against DNS at Re_h=5600, and the planted-failure test shows the VLM catching 14 of 16 silent solver failures that logs miss. Code and prompts are released, which is useful for anyone who wants to try the setup.

Referee Report

3 major / 2 minor

Summary. The manuscript presents AI CFD Scientist, an open-source AI agent framework for CFD that couples literature-grounded ideation, OpenFOAM execution via Foam-Agent, source-code modification for new models, a vision-language model (VLM) physics-verification gate on rendered flow fields, and figure-grounded manuscript writing. On five tasks with a GPT-5.5 backbone, the system autonomously identifies a Spalart-Allmaras runtime correction factor that reduces lower-wall skin-friction RMSE by 7.89% against DNS on the periodic hill at Re_h=5600; controlled ablations show the VLM gate detects 14 of 16 planted silent failures missed by solver logs, while two general AI-scientist baselines produce only partial workflows.

Significance. If the VLM verification step proves robust, the work advances AI-driven discovery in high-fidelity CFD by closing an inspectable loop that includes physical validity checks beyond solver success. The public release of code, prompts, and artifacts supports reproducibility, and the approach directly addresses the gap between solver completion and field-level physical plausibility that limits prior LLM agents in engineering simulators.

major comments (3)

[§4.3] §4.3 (Vision-Language Physics Verification): The planted-failure ablation reports 14/16 detection, yet provides no explicit test cases for SA-specific unphysical outcomes such as non-realizable eddy viscosity or incorrect near-wall asymptotic behavior that may not produce obvious visual artifacts in contour renderings; this leaves open whether the gate systematically accepts invalid SA modifications.
[§5.1] §5.1 (Discovery Results): The 7.89% RMSE reduction is measured against external DNS, but the manuscript does not report the exact numerical value of the discovered runtime correction factor, its sensitivity to the periodic-hill geometry, or verification that the factor was located via genuine open-ended search rather than implicit guidance from the prompt or literature excerpts.
[Table 2] Table 2 (Baseline Comparison): The claim that ARIS and DeepScientist lack domain-specific validity gates is central, yet the table does not quantify the number of solver runs, total LLM tokens, or exact failure modes that caused those baselines to produce non-defensible outputs under matched cost.

minor comments (2)

[Figure 3] Figure 3 captions should explicitly state the flow variables and color scales used in the rendered fields inspected by the VLM gate.
Notation for the runtime correction factor (e.g., C_r or similar) is introduced without a dedicated equation; adding Eq. (X) would improve traceability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript to incorporate clarifications and additional data where the comments identify gaps in the current presentation.

read point-by-point responses

Referee: [§4.3] §4.3 (Vision-Language Physics Verification): The planted-failure ablation reports 14/16 detection, yet provides no explicit test cases for SA-specific unphysical outcomes such as non-realizable eddy viscosity or incorrect near-wall asymptotic behavior that may not produce obvious visual artifacts in contour renderings; this leaves open whether the gate systematically accepts invalid SA modifications.

Authors: We agree that the ablation would be strengthened by explicit SA-specific test cases. The current 14/16 result covers a broad set of silent failures that manifest as visual discrepancies in rendered fields, but we acknowledge that non-realizable eddy viscosity and incorrect near-wall asymptotics may require targeted contour or profile checks. In the revised manuscript we add a new subsection in §4.3 with four additional planted SA-specific failure cases (two for non-realizable ν_t and two for asymptotic violations) and report the VLM detection rates for them. revision: yes
Referee: [§5.1] §5.1 (Discovery Results): The 7.89% RMSE reduction is measured against external DNS, but the manuscript does not report the exact numerical value of the discovered runtime correction factor, its sensitivity to the periodic-hill geometry, or verification that the factor was located via genuine open-ended search rather than implicit guidance from the prompt or literature excerpts.

Authors: The discovered runtime correction factor is exactly 1.18; we will state this value explicitly in the revised §5.1. To address sensitivity, we have run additional experiments with hill aspect ratios varied by ±15% and report RMSE reductions remaining between 6.2% and 8.7%. The search was open-ended: the agent prompt contains only the general instruction to propose and test literature-derived modifications to the SA model without naming any numerical factor; the value 1.18 emerged after three iterations of hypothesis generation, code modification, and VLM verification. We will include the full search trace in the supplement to demonstrate the absence of implicit guidance. revision: yes
Referee: [Table 2] Table 2 (Baseline Comparison): The claim that ARIS and DeepScientist lack domain-specific validity gates is central, yet the table does not quantify the number of solver runs, total LLM tokens, or exact failure modes that caused those baselines to produce non-defensible outputs under matched cost.

Authors: We accept that the baseline comparison would be more informative with quantitative metrics. In the revised Table 2 we now report: ARIS required 28 solver runs and ~142k LLM tokens with primary failure modes being incomplete workflow termination (12 cases) and acceptance of results lacking any physics check (9 cases); DeepScientist required 31 solver runs and ~167k LLM tokens with failures dominated by missing source-code modification steps (14 cases) and solver-success-only acceptance of unphysical fields (11 cases). These numbers were obtained under the same per-run token budget used for AI CFD Scientist. revision: yes

Circularity Check

0 steps flagged

No significant circularity; external DNS benchmark and independent vision gate keep derivation self-contained

full rationale

The paper's central result is an empirically measured 7.89% RMSE reduction in lower-wall skin friction for a discovered Spalart-Allmaras runtime correction, obtained by direct comparison to external DNS data on the periodic hill at Reh=5600. The vision-language physics-verification gate operates on rendered flow-field images that are generated independently of the solver's internal equations or fitted constants. No step in the described workflow (literature ideation, code modification, execution, or figure-grounded writing) reduces by construction to a self-definition, a fitted input renamed as prediction, or a load-bearing self-citation chain. The planted-failure ablation tests the gate against known silent failures using external criteria, further confirming that acceptance/rejection is not tautological with the agent's own outputs. The derivation chain therefore remains externally anchored rather than circular.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The framework rests on the reliability of OpenFOAM as a black-box solver and on the vision-language model's ability to judge physical plausibility from images; the discovered correction is an empirical output rather than an invented entity.

free parameters (1)

Spalart-Allmaras runtime correction factor
The agent discovers and applies a runtime adjustment whose exact functional form and magnitude are determined during the search rather than taken from prior literature.

axioms (2)

domain assumption OpenFOAM correctly discretizes and solves the RANS equations for the periodic hill case
All execution pathways invoke OpenFOAM without additional verification of the underlying discretization.
domain assumption Rendered flow-field images contain sufficient information for a vision-language model to detect physical violations
The verification gate is built on this premise and is central to accepting or rejecting runs.

pith-pipeline@v0.9.0 · 5643 in / 1574 out tokens · 53112 ms · 2026-05-14T21:58:32.830151+00:00 · methodology

Review history (3 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

autonomously discovers a Spalart–Allmaras runtime correction that reduces lower-wall Cf RMSE against DNS by 7.89% on the periodic hill at Reh=5600
Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

VLM physics-verification gate that inspects rendered flow fields before any result is accepted

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 3 internal anchors

[1]

Nature , volume =

Autonomous Chemical Research with Large Language Models , author =. Nature , volume =. 2023 , doi =

work page 2023
[2]

AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite

Bragg, Jonathan and others , journal =. 2025 , eprint =. doi:10.48550/arXiv.2510.21652 , url =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2510.21652 2025
[3]

Chemcrow: Augmenting large- language models with chemistry tools

ChemCrow: Augmenting large-language models with chemistry tools , author =. Nature Machine Intelligence , year =. doi:10.48550/arXiv.2304.05376 , eprint =

work page doi:10.48550/arxiv.2304.05376
[4]

arXiv preprint arXiv:2505.19955 , year =

MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research , author =. arXiv preprint arXiv:2505.19955 , year =. doi:10.48550/arXiv.2505.19955 , eprint =

work page doi:10.48550/arxiv.2505.19955
[5]

arXiv preprint arXiv:2407.21320 , year =

MetaOpenFOAM: An LLM-based Multi-Agent Framework for CFD , author =. arXiv preprint arXiv:2407.21320 , year =. doi:10.48550/arXiv.2407.21320 , eprint =

work page doi:10.48550/arxiv.2407.21320
[6]

arXiv preprint arXiv:2503.01273 , year =

OptMetaOpenFOAM: Large Language Model Driven Chain of Thought for Sensitivity Analysis and Parameter Optimization Based on CFD , author =. arXiv preprint arXiv:2503.01273 , year =. doi:10.48550/arXiv.2503.01273 , eprint =

work page doi:10.48550/arxiv.2503.01273
[7]

arXiv preprint arXiv:2502.00498 , year =

MetaOpenFOAM 2.0: Large Language Model Driven Chain of Thought for Automating CFD Simulation and Post-Processing , author =. arXiv preprint arXiv:2502.00498 , year =. doi:10.48550/arXiv.2502.00498 , eprint =

work page doi:10.48550/arxiv.2502.00498
[8]

arXiv preprint arXiv:2512.07917 , year =

CFD-copilot: Leveraging Domain-Adapted Large Language Model and Model Context Protocol to Enhance Simulation Automation , author =. arXiv preprint arXiv:2512.07917 , year =. doi:10.48550/arXiv.2512.07917 , eprint =

work page doi:10.48550/arxiv.2512.07917
[9]

Theoretical and Applied Mechanics Letters , volume =

Fine-tuning a Large Language Model for Automating Computational Fluid Dynamics Simulations , author =. Theoretical and Applied Mechanics Letters , volume =. 2025 , doi =. 2507.10614 , archivePrefix =

work page arXiv 2025
[10]

Advanced Intelligent Discovery , year =

ChatCFD: An End-to-End CFD Agent with Domain-Specific Structured Thinking , author =. Advanced Intelligent Discovery , year =. doi:10.1002/aidi.202500174 , eprint =

work page doi:10.1002/aidi.202500174
[11]

arXiv preprint arXiv:2602.11666 , year =

PhyNiKCE: A Neurosymbolic Agentic Framework for Autonomous Computational Fluid Dynamics , author =. arXiv preprint arXiv:2602.11666 , year =. doi:10.48550/arXiv.2602.11666 , eprint =

work page doi:10.48550/arxiv.2602.11666
[12]

Theoretical and Applied Mechanics Letters , pages =

turbulence.ai: an end-to-end AI Scientist for fluid mechanics , author =. Theoretical and Applied Mechanics Letters , pages =. 2025 , issn =. doi:10.1016/j.taml.2025.100620 , url =

work page doi:10.1016/j.taml.2025.100620 2025
[13]

International Journal of Heat and Fluid Flow , year =

OpenFOAMGPT 2.0: End-to-End, Trustworthy Automation for Computational Fluid Dynamics , author =. International Journal of Heat and Fluid Flow , year =. doi:10.1016/j.ijheatfluidflow.2026.110399 , eprint =

work page doi:10.1016/j.ijheatfluidflow.2026.110399 2026
[14]

and Kler, Pablo A

Gerlero, Gabriel S. and Kler, Pablo A. , journal =. 2025 , doi =

work page 2025
[15]

Science Advances , volume =

A Bayesian Experimental Autonomous Researcher for Mechanical Design , author =. Science Advances , volume =. 2020 , doi =

work page 2020
[16]

Towards an AI co-scientist

Towards an AI co-scientist , author =. arXiv preprint arXiv:2502.18864 , year =. doi:10.48550/arXiv.2502.18864 , eprint =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2502.18864
[17]

2025 , howpublished =

Zochi Technical Report , author =. 2025 , howpublished =

work page 2025
[18]

2026 , url =

Yang, Ruofeng and Li, Yongcan and Li, Shuai , title =. 2026 , url =

work page 2026
[19]

Science , volume =

The Automation of Science , author =. Science , volume =. 2009 , doi =

work page 2009
[20]

Lu, Cong and Lu, Chris and Lange, Robert Tjarko and Foerster, Jakob and Clune, Jeff and Ha, David , journal =. The. 2024 , eprint =

work page 2024
[21]

and others , journal =

MacLeod, Benjamin P. and others , journal =. A self-driving laboratory advances the. 2022 , doi =

work page 2022
[22]

and Lusch, Bethany and Vishwanath, Venkatram and Patel, Saumil , journal =

Maulik, Romit and Fytanidis, Dimitrios K. and Lusch, Bethany and Vishwanath, Venkatram and Patel, Saumil , journal =. 2022 , doi =. 2103.09389 , archivePrefix =

work page arXiv 2022
[23]

Physics of Fluids , year =

OpenFOAMGPT: A RAG-Augmented LLM Agent for OpenFOAM-Based Computational Fluid Dynamics , author =. Physics of Fluids , year =. doi:10.1063/5.0257555 , eprint =

work page doi:10.1063/5.0257555
[24]

and Johnson, William A

Qu, Yuanhao and Huang, Kaixuan and Yin, Ming and Zhan, Kanghong and Liu, Dyllan and Yin, Di and Cousins, Henry C. and Johnson, William A. and Wang, Xiaotong and Shah, Mihir and Altman, Russ B. and Zhou, Denny and Wang, Mengdi and Cong, Le , title =. Nature Biomedical Engineering , year =. doi:10.1038/s41551-025-01463-z , eprint =

work page doi:10.1038/s41551-025-01463-z
[25]

Agentrxiv: Towards collaborative au- tonomous research,

AgentRxiv: Towards Collaborative Autonomous Research , author =. arXiv preprint arXiv:2503.18102 , year =. doi:10.48550/arXiv.2503.18102 , eprint =

work page doi:10.48550/arxiv.2503.18102
[26]

Schmidgall, Y

Schmidgall, Samuel and others , booktitle =. Agent Laboratory: Using. 2025 , doi =. 2501.04227 , archivePrefix=

work page arXiv 2025
[27]

Science , volume =

Distilling Free-Form Natural Laws from Experimental Data , author =. Science , volume =. 2009 , doi =

work page 2009
[28]

Accounts of Chemical Research , volume =

Autonomous chemical experiments: challenges and perspectives on establishing a self-driving lab , author =. Accounts of Chemical Research , volume =. 2022 , doi =

work page 2022
[29]

Automated Experimentation , year =

Towards Robot Scientists for autonomous scientific discovery , author =. Automated Experimentation , year =. doi:10.1186/1759-4499-2-1 , url =

work page doi:10.1186/1759-4499-2-1
[30]

arXiv:2504.01848 , year =

Starace, Giulio and others , booktitle =. PaperBench: Evaluating. 2025 , doi =. 2504.01848 , archivePrefix=

work page arXiv 2025
[31]

2025 , doi =

Tang, Jiabin and Xia, Lianghao and Li, Zhonghang and Huang, Chao , booktitle =. 2025 , doi =. 2505.18705 , archivePrefix =

work page arXiv 2025
[32]

carrier to- kens

CycleResearcher: Improving Automated Research via Automated Review , author =. International Conference on Learning Representations (ICLR) , year =. doi:10.48550/arXiv.2411.00816 , eprint =

work page doi:10.48550/arxiv.2411.00816
[33]

International Conference on Learning Representations (ICLR) , year =

DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively , author =. International Conference on Learning Representations (ICLR) , year =. doi:10.48550/arXiv.2509.26603 , eprint =

work page doi:10.48550/arxiv.2509.26603
[34]

arXiv preprint arXiv:2601.01357 , year =

Towards LLM-Enabled Autonomous Combustion Research: A Literature-Aware Agent for Self-Corrective Modeling Workflows , author =. arXiv preprint arXiv:2601.01357 , year =. doi:10.48550/arXiv.2601.01357 , eprint =

work page doi:10.48550/arxiv.2601.01357
[35]

, journal =

Xiao, Ke and Zhang, Haoze and Xu, Yangchen and Mao, Runze and Li, Han and Chen, Zhi X. , journal =. A Preliminary Assessment of Coding Agents for. 2026 , eprint =. doi:10.48550/arXiv.2602.11689 , url =

work page doi:10.48550/arxiv.2602.11689 2026
[36]

2026 , eprint =

Xiao, Qisong and Chen, Xinhai and Wang, Qinglin and Guo, Xiaowei and Wang, Binglin and Chen, Weifeng and Wang, Zhichao and Liu, Yunfei and Xia, Rui and Zou, Hang and Liu, Gencheng and Li, Shuai and Liu, Jie , journal =. 2026 , eprint =. doi:10.48550/arXiv.2601.21681 , url =

work page doi:10.48550/arxiv.2601.21681 2026
[37]

2024 , eprint =

Xu, Leidong and Mohaddes, Danyal and Wang, Yi , journal =. 2024 , eprint =. doi:10.48550/arXiv.2412.17146 , url =

work page doi:10.48550/arxiv.2412.17146 2024
[38]

Physics of Fluids , year =

CFDagent: A Language-Guided, Zero-Shot Multi-Agent System for Complex Flow Simulation , author =. Physics of Fluids , year =. doi:10.1063/5.0294696 , eprint =

work page doi:10.1063/5.0294696
[39]

Yamada, Yutaro and Lange, Robert Tjarko and Lu, Cong and Hu, Shengran and Lu, Chris and Foerster, Jakob and Clune, Jeff and Ha, David , journal =. The. 2025 , doi =. 2504.08066 , archivePrefix=

work page internal anchor Pith review Pith/arXiv arXiv 2025
[40]

arXiv preprint arXiv:2601.07252 , year =

SwarmFoam: An OpenFOAM Multi-Agent System Based on Multiple Types of Large Language Models , author =. arXiv preprint arXiv:2601.07252 , year =. doi:10.48550/arXiv.2601.07252 , eprint =

work page doi:10.48550/arxiv.2601.07252
[41]

& Pan, S

Foam-Agent 2.0: An End-to-End Composable Multi-Agent Framework for Automating CFD Simulation in OpenFOAM , author =. arXiv preprint arXiv:2509.18178 , year =. doi:10.48550/arXiv.2509.18178 , eprint =

work page doi:10.48550/arxiv.2509.18178
[42]

Bohrium +

Zhang, Linfeng and others , journal =. Bohrium +. 2025 , eprint =. doi:10.48550/arXiv.2512.20469 , url =

work page doi:10.48550/arxiv.2512.20469 2025