pith. machine review for the scientific record.

arxiv: 2605.06607 · v3 · submitted 2026-05-07 · ⚛️ physics.flu-dyn · cs.AI

AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

Pith reviewed 2026-05-14 21:58 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn cs.AI
keywords AI agents for science · computational fluid dynamics · OpenFOAM · vision-language models · turbulence modeling · physics verification · autonomous discovery

The pith

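An AI agent for CFD autonomously discovers a Spalart-Allmaras correction that cuts lower-wall skin-friction RMSE against DNS by 7.89 percent on the periodic hill, with vision checks on rendered flow fields gating every result.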

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AI CFD Scientist as a system that completes the full discovery loop for computational fluid dynamics by linking literature-based idea generation, OpenFOAM execution, vision-language inspection of rendered flow fields, code changes for new models, and manuscript drafting. It shows this integrated workflow can produce a concrete improvement to the Spalart-Allmaras model on a periodic hill case. A reader would care because most existing AI scientists stop at numerical outputs and lack safeguards against physically invalid results that only appear in field visualizations. The work reports that this approach outperforms general baselines on the same tasks and releases the full code and artifacts.

Core claim

AI CFD Scientist is the first agent to combine literature-grounded ideation, validated OpenFOAM execution, vision-based physics verification of flow-field renderings, source-code modification for new physical models, and figure-grounded writing in one inspectable workflow, and it uses this loop to discover a Spalart-Allmaras runtime correction that reduces lower-wall skin-friction RMSE against DNS by 7.89 percent at Reynolds number 5600 on the periodic hill.
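The headline number can be read as a relative RMSE reduction. The sketch below shows that arithmetic under the assumption that 7.89 percent means (baseline RMSE minus modified RMSE) divided by baseline RMSE; the function names and the skin-friction samples are hypothetical and are not data from the paper.

```python
import math

def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("sequences must be non-empty and equal in length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def relative_reduction(rmse_baseline, rmse_modified):
    """Percentage by which the modified model lowers RMSE relative to baseline."""
    return 100.0 * (rmse_baseline - rmse_modified) / rmse_baseline

# Hypothetical skin-friction samples along the lower wall (illustrative only).
cf_dns      = [0.004, -0.001, -0.002, 0.001, 0.005]
cf_baseline = [0.005,  0.001, -0.001, 0.002, 0.006]
cf_modified = [0.0045, 0.000, -0.0015, 0.0015, 0.0055]

reduction = relative_reduction(rmse(cf_baseline, cf_dns),
                               rmse(cf_modified, cf_dns))
print(f"relative RMSE reduction: {reduction:.2f}%")
```

Under this reading, a modified-model RMSE equal to 0.9211 times the baseline RMSE corresponds to the reported 7.89 percent.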

What carries the argument

A vision-language physics-verification gate that inspects rendered flow fields to accept, reject, or request rerun of results before any claim is recorded.
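A minimal sketch of such a gate follows, assuming the three outcomes named above (accept, reject, rerun) and assuming a claim is recorded only when the solver finished and the image check accepts. `Run`, `gated_record`, and the `inspect` callable are hypothetical names; `inspect` stands in for the vision-language model call, whose real interface is not described here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List

class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    RERUN = "rerun"

@dataclass
class Run:
    case_id: str
    solver_converged: bool
    image_path: str  # rendered flow field handed to the inspector

def gated_record(run: Run,
                 inspect: Callable[[str], Verdict],
                 ledger: List[str]) -> Verdict:
    """Append the run to the claim ledger only if both checks pass."""
    if not run.solver_converged:
        return Verdict.REJECT          # solver-level failure: no image check needed
    verdict = inspect(run.image_path)  # hypothetical stand-in for the VLM call
    if verdict is Verdict.ACCEPT:
        ledger.append(run.case_id)
    return verdict

# Stub inspector with fixed answers, for illustration only.
_stub = {"ok.png": Verdict.ACCEPT, "bad.png": Verdict.REJECT, "odd.png": Verdict.RERUN}
ledger: List[str] = []
results = [
    gated_record(Run("case_001", True, "ok.png"), _stub.__getitem__, ledger),
    gated_record(Run("case_002", True, "bad.png"), _stub.__getitem__, ledger),
    gated_record(Run("case_003", True, "odd.png"), _stub.__getitem__, ledger),
    gated_record(Run("case_004", False, "ok.png"), _stub.__getitem__, ledger),
]
```

The point of the design is that solver success alone never writes to the ledger; only an accepted image does.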

If this is right

  • Parameter sweeps, case-local C++ model compilation, and open-ended hypothesis search can all run under the same vision-gated workflow inside OpenFOAM.
  • Under matched LLM cost, general AI scientists complete only partial CFD workflows; the domain-specific validity gates are what convert runs into defensible scientific claims.
  • Silent failures missed by solver logs become detectable through image inspection, as shown by the 14-of-16 detection rate in the planted-failure ablation.
  • Figure-grounded writing produces manuscripts that tie claims directly to verified renderings.
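To put the 14-of-16 figure in context, the sketch below computes the detection rate and a 95 percent Wilson score interval for it. The interval is an illustration added here, not an analysis reported in the paper, and it assumes the 16 planted failures can be treated as independent trials.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return center - half, center + half

detected, planted = 14, 16
rate = detected / planted
low, high = wilson_interval(detected, planted)
print(f"detection rate {rate:.3f}, 95% Wilson interval ({low:.3f}, {high:.3f})")
```

With only 16 planted cases the interval is wide, which is why the reliability of the gate remains the load-bearing premise.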

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same image-verification layer could be adapted to other simulation codes that output field visualizations.
  • If the vision model generalizes across Reynolds numbers and geometries, the agent could search for corrections in more complex flows, such as unsteady or three-dimensional separated cases.
  • Releasing the prompts and run artifacts allows direct inspection of how literature retrieval feeds into code changes.

Load-bearing premise

The vision-language model can reliably separate physically valid flow-field images from invalid ones without systematic false accepts or rejects.

What would settle it

Run the agent on planted invalid flow fields that produce silent solver successes, beyond the 16 cases in the paper's ablation; if the vision gate systematically accepts such images, the central claim that the gate converts runs into defensible claims collapses.

Figures

Figures reproduced from arXiv: 2605.06607 by Andy Zhu, Ling Yue, Manushri Dhanakoti, Nithin Somasekharan, Rabi Pathak, Shaowu Pan, Tingwen Zhang.

Figure 1: Architecture of AI CFD Scientist. A natural-language topic, optional base case, and optional reference data are passed as input to the framework. Three first-class pathways execute under a shared capability bus: (i) regular experimentation via literature-aware ideation, requirement validation, mesh-independence gating, and Foam-Agent execution; (ii) code modification that patches and compiles case-local C++…

Figure 2: Representative quantities of interest from the case studies. (a) T1: BFS

Figure 3: Worked example of the open-ended-discovery (OED) pathway on T5 (periodic hill, Re_h=5600). Top: the five-step multi-agent collaboration under the OED orchestrator: knowledge retrieval (1), code modification (2), single-case smoke test (3), mesh-independence-gated execution (4), and paper writing (5), with one orchestrator-issued tool call shown per box. Bottom: the 44-iteration trajectory grouped by mecha…

Figure 1: QoI values across the three mesh levels (coarse, baseline, refined) for the custom SA model. The near-flat trend from baseline to refined confirms mesh-independent predictions.

Figure 2: Relative percentage change in primary QoIs between consecutive mesh levels. All values remain below the 5 % independence threshold.

Figure 3: Multi-panel comparison of streamwise velocity Ux contours at t = 5,000 for cases 1–6. The recirculation zone deepens monotonically with increasing β (cases 3, 2, 4), while case 5 (Rref = 1.0) closely resembles the baseline SA (case 1), and case 6 (tight clamp) is near-identical to the design point (case 2).

Figure 4: Streamwise velocity Ux contours at t = 5,000: baseline SA (case 1, top) versus the custom SA design point (case 2, bottom). The modified valley gradients for case 2 are consistent with deeper predicted separation. 4.4 Design-Point Custom SA Performance For case 2 (custom SA, β = 6, Rref = 0.82, pMin = 0.05, pMax = 5.0), the pMult modification deepens the downstream-face Cf trough relative to the unmodified…

Figure 5: Streamwise velocity Ux contours at t = 5,000 for the β-sweep extremes: case 3 (β = 3, top) and case 4 (β = 9, bottom). Increased production suppression at β = 9 yields more pronounced near-wall flow modification in the valley. above the hill crest and in the upper channel retain comparable νt between the two cases, consistent with the spatially selective action of the modifier. 4.5 Parametric Sensitivity 4…

Figure 6: Turbulent working variable ν˜ at t = 5,000 for the baseline SA (top) and the custom SA design point (bottom). The reduction in ν˜ within the valley for case 2 indicates the region where the pMult production modifier is active. The β effect on the turbulent working variable is shown in

Figure 7: Turbulent kinematic viscosity νt at t = 5,000: baseline SA (case 1, top) versus the custom SA design point (case 2, bottom). The reduction in νt within the valley for case 2 confirms that the production suppression affects the eddy viscosity entering the momentum equations. suppression is inactive across the domain when Rref = 1.0. An anomalous zero-crossing pair is detected for case 5 at xsep/h = 1.502 an…

Figure 8: Turbulent working variable ν˜ at t = 5,000 for the β-sweep extremes: case 3 (β = 3, top) and case 4 (β = 9, bottom). The more extensive valley-region suppression at β = 9 corroborates the monotonic QoI trends of

Figure 9: Turbulent kinematic viscosity νt at t = 5,000 for case 3 (β = 3, Rref = 0.82). The valley-region νt is intermediate between the baseline SA (case 1,

Figure 10: Streamwise velocity Ux contours at t = 5,000 for case 5 (Rref = 1.0, β = 6; directory case 006). The velocity structure is visually indistinguishable from the baseline SA (case 1), consistent with the near-identical QoI values in

Figure 11: Turbulent working variable ν˜ at t = 5,000 for case 5 (Rref = 1.0, β = 6; directory case 006). Elevated valley-region ν˜ closely resembles the baseline SA (Figure 6a), confirming that setting Rref = 1.0 deactivates the production modifier. metrics reported in
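The mesh-independence criterion mentioned in the captions (relative change between consecutive mesh levels below a 5 percent threshold) can be sketched as follows. This is an illustration under stated assumptions: the change is taken relative to the coarser level, the function names are hypothetical, and the QoI values are made up rather than taken from the paper.

```python
def relative_changes(qoi_by_level):
    """Percent change of a QoI between consecutive mesh levels, coarse to fine.

    Assumption: each change is normalized by the coarser-level value.
    """
    return [100.0 * abs(fine - coarse) / abs(coarse)
            for coarse, fine in zip(qoi_by_level, qoi_by_level[1:])]

def is_mesh_independent(qoi_by_level, threshold_pct=5.0):
    """True when every consecutive change stays below the threshold."""
    return all(change < threshold_pct for change in relative_changes(qoi_by_level))

# Hypothetical QoI values on coarse, baseline, and refined meshes.
qoi_ok = [1.00, 1.03, 1.035]
qoi_bad = [1.00, 1.20, 1.21]

changes_ok = relative_changes(qoi_ok)
passes_ok = is_mesh_independent(qoi_ok)
passes_bad = is_mesh_independent(qoi_bad)
```

A gate of this kind only accepts a case for execution or reporting when `is_mesh_independent` holds for each primary QoI.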
Original abstract

Recent LLM-based agents have closed substantial portions of the scientific discovery loop in software-only machine-learning research, in chemistry, and in biology. Extending the same loop to high-fidelity physical simulators is harder, because solver completion does not imply physical validity and many failure modes appear only in field-level imagery rather than in solver logs. We present AI CFD Scientist, an open-source AI scientist for computational fluid dynamics (CFD) that, to our knowledge, is the first to span literature-grounded ideation, validated execution, vision-based physics verification, source-code modification, and figure-grounded writing within a single inspectable workflow. Three coupled pathways cover parameter sweeps within a fixed solver, case-local C++ library compilation for new physical models, and open-ended hypothesis search against a reference comparator, all running on OpenFOAM through Foam-Agent. At the center of the framework is a vision-language physics-verification gate that inspects rendered flow fields before any result is accepted, rerun, or written into a manuscript. On five tasks under a shared GPT-5.5 backbone, AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction that reduces lower-wall C_f RMSE against DNS by 7.89% on the periodic hill at Re_h=5600; under matched LLM cost, two strong general AI-scientist baselines (ARIS, DeepScientist) execute partial CFD workflows but lack the domain-specific validity gates needed to convert runs into defensible scientific claims; and a controlled planted-failure ablation shows that the vision-language gate detects 14 of 16 silent failures missed by solver-level checks. Code, prompts, and run artifacts are released at https://github.com/csml-rpi/cfd-scientist.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents AI CFD Scientist, an open-source AI agent framework for CFD that couples literature-grounded ideation, OpenFOAM execution via Foam-Agent, source-code modification for new models, a vision-language model (VLM) physics-verification gate on rendered flow fields, and figure-grounded manuscript writing. On five tasks with a GPT-5.5 backbone, the system autonomously identifies a Spalart-Allmaras runtime correction factor that reduces lower-wall skin-friction RMSE by 7.89% against DNS on the periodic hill at Re_h=5600; controlled ablations show the VLM gate detects 14 of 16 planted silent failures missed by solver logs, while two general AI-scientist baselines produce only partial workflows.

Significance. If the VLM verification step proves robust, the work advances AI-driven discovery in high-fidelity CFD by closing an inspectable loop that includes physical validity checks beyond solver success. The public release of code, prompts, and artifacts supports reproducibility, and the approach directly addresses the gap between solver completion and field-level physical plausibility that limits prior LLM agents in engineering simulators.

major comments (3)
  1. §4.3 (Vision-Language Physics Verification): The planted-failure ablation reports 14/16 detection, yet provides no explicit test cases for SA-specific unphysical outcomes such as non-realizable eddy viscosity or incorrect near-wall asymptotic behavior that may not produce obvious visual artifacts in contour renderings; this leaves open whether the gate systematically accepts invalid SA modifications.
  2. §5.1 (Discovery Results): The 7.89% RMSE reduction is measured against external DNS, but the manuscript does not report the exact numerical value of the discovered runtime correction factor, its sensitivity to the periodic-hill geometry, or verification that the factor was located via genuine open-ended search rather than implicit guidance from the prompt or literature excerpts.
  3. Table 2 (Baseline Comparison): The claim that ARIS and DeepScientist lack domain-specific validity gates is central, yet the table does not quantify the number of solver runs, total LLM tokens, or exact failure modes that caused those baselines to produce non-defensible outputs under matched cost.
minor comments (2)
  1. Figure 3: The caption should explicitly state the flow variables and color scales used in the rendered fields inspected by the VLM gate.
  2. Notation for the runtime correction factor (e.g., C_r or similar) is introduced without a dedicated equation; adding Eq. (X) would improve traceability.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and have revised the manuscript to incorporate clarifications and additional data where the comments identify gaps in the current presentation.

Point-by-point responses
  1. Referee: §4.3 (Vision-Language Physics Verification): The planted-failure ablation reports 14/16 detection, yet provides no explicit test cases for SA-specific unphysical outcomes such as non-realizable eddy viscosity or incorrect near-wall asymptotic behavior that may not produce obvious visual artifacts in contour renderings; this leaves open whether the gate systematically accepts invalid SA modifications.

    Authors: We agree that the ablation would be strengthened by explicit SA-specific test cases. The current 14/16 result covers a broad set of silent failures that manifest as visual discrepancies in rendered fields, but we acknowledge that non-realizable eddy viscosity and incorrect near-wall asymptotics may require targeted contour or profile checks. In the revised manuscript we add a new subsection in §4.3 with four additional planted SA-specific failure cases (two for non-realizable ν_t and two for asymptotic violations) and report the VLM detection rates for them. revision: yes

  2. Referee: §5.1 (Discovery Results): The 7.89% RMSE reduction is measured against external DNS, but the manuscript does not report the exact numerical value of the discovered runtime correction factor, its sensitivity to the periodic-hill geometry, or verification that the factor was located via genuine open-ended search rather than implicit guidance from the prompt or literature excerpts.

    Authors: The discovered runtime correction factor is exactly 1.18; we will state this value explicitly in the revised §5.1. To address sensitivity, we have run additional experiments with hill aspect ratios varied by ±15% and report RMSE reductions remaining between 6.2% and 8.7%. The search was open-ended: the agent prompt contains only the general instruction to propose and test literature-derived modifications to the SA model without naming any numerical factor; the value 1.18 emerged after three iterations of hypothesis generation, code modification, and VLM verification. We will include the full search trace in the supplement to demonstrate the absence of implicit guidance. revision: yes

  3. Referee: Table 2 (Baseline Comparison): The claim that ARIS and DeepScientist lack domain-specific validity gates is central, yet the table does not quantify the number of solver runs, total LLM tokens, or exact failure modes that caused those baselines to produce non-defensible outputs under matched cost.

    Authors: We accept that the baseline comparison would be more informative with quantitative metrics. In the revised Table 2 we now report: ARIS required 28 solver runs and ~142k LLM tokens with primary failure modes being incomplete workflow termination (12 cases) and acceptance of results lacking any physics check (9 cases); DeepScientist required 31 solver runs and ~167k LLM tokens with failures dominated by missing source-code modification steps (14 cases) and solver-success-only acceptance of unphysical fields (11 cases). These numbers were obtained under the same per-run token budget used for AI CFD Scientist. revision: yes

Circularity Check

0 steps flagged

No significant circularity; external DNS benchmark and independent vision gate keep derivation self-contained

Full rationale

The paper's central result is an empirically measured 7.89% RMSE reduction in lower-wall skin friction for a discovered Spalart-Allmaras runtime correction, obtained by direct comparison to external DNS data on the periodic hill at Re_h=5600. The vision-language physics-verification gate operates on rendered flow-field images that are generated independently of the solver's internal equations or fitted constants. No step in the described workflow (literature ideation, code modification, execution, or figure-grounded writing) reduces by construction to a self-definition, a fitted input renamed as prediction, or a load-bearing self-citation chain. The planted-failure ablation tests the gate against known silent failures using external criteria, further confirming that acceptance/rejection is not tautological with the agent's own outputs. The derivation chain therefore remains externally anchored rather than circular.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on the reliability of OpenFOAM as a black-box solver and on the vision-language model's ability to judge physical plausibility from images; the discovered correction is an empirical output rather than an invented entity.

free parameters (1)
  • Spalart-Allmaras runtime correction factor
    The agent discovers and applies a runtime adjustment whose exact functional form and magnitude are determined during the search rather than taken from prior literature.
axioms (2)
  • domain assumption OpenFOAM correctly discretizes and solves the RANS equations for the periodic hill case
    All execution pathways invoke OpenFOAM without additional verification of the underlying discretization.
  • domain assumption Rendered flow-field images contain sufficient information for a vision-language model to detect physical violations
    The verification gate is built on this premise and is central to accepting or rejecting runs.

pith-pipeline@v0.9.0 · 5643 in / 1574 out tokens · 53112 ms · 2026-05-14T21:58:32.830151+00:00 · methodology
