CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics
Pith reviewed 2026-05-18 15:01 UTC · model grok-4.3
The pith
A new benchmark suite with three components evaluates how well large language models perform on graduate-level computational fluid dynamics knowledge, reasoning, and code implementation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper presents CFDLLMBench as a benchmark suite that contains CFDQuery for graduate-level CFD knowledge, CFDCodeBench for numerical and physical reasoning, and FoamBench for context-dependent CFD workflow implementation, all supported by a task taxonomy and an evaluation framework that tracks code executability, solution accuracy, and numerical convergence behavior.
What carries the argument
The CFDLLMBench suite, whose three complementary components together test distinct LLM competencies using tasks drawn from real-world CFD practices and a consistent set of reproducibility-focused metrics.
If this is right
- LLM performance in automating CFD numerical experiments can now be measured systematically across knowledge, reasoning, and implementation stages.
- Developers gain concrete scores on code executability, solution accuracy, and convergence that can guide model improvement.
- A reusable foundation exists for building and checking LLM tools that assist with complex physical-system simulations.
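To make two of the metrics named above concrete, here is a minimal sketch of how code executability and solution accuracy could be scored for a generated Python script. It is not the paper's evaluation harness: the function names `is_executable` and `relative_l2_error`, the timeout value, and the choice of a relative L2 norm are all assumptions made for illustration.

```python
import math
import subprocess
import sys
import tempfile
from pathlib import Path


def is_executable(source: str, timeout_s: float = 10.0) -> bool:
    """Return True if the candidate script exits with status 0 before the timeout.

    Hypothetical helper: the script runs in a separate interpreter process so
    that a crash or hang in generated code cannot take down the evaluator.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def relative_l2_error(predicted, reference) -> float:
    """Relative L2 error between a predicted field and a reference field.

    Assumed accuracy metric; both inputs are flat sequences of floats
    sampled at the same grid points.
    """
    if len(predicted) != len(reference):
        raise ValueError("fields must have the same number of samples")
    denominator = math.sqrt(sum(r * r for r in reference))
    if denominator == 0.0:
        raise ValueError("reference field has zero norm")
    numerator = math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)))
    return numerator / denominator
```

A script that runs to completion but returns a field far from the reference would pass the first check and fail the second, which is why the two scores are worth reporting separately.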
Where Pith is reading between the lines
- The same three-competency structure could be adapted to create benchmarks for other simulation-heavy fields such as heat transfer or structural analysis.
- Strong benchmark results might indicate which models are ready for iterative, feedback-driven CFD workflows that combine LLM suggestions with live solver output.
- Extending the benchmark with time-dependent or multi-physics problems would test whether current LLM reasoning scales to more demanding CFD scenarios.
Load-bearing premise
The chosen tasks and metrics in CFDQuery, CFDCodeBench, and FoamBench faithfully represent the main difficulties and everyday practices of computational fluid dynamics without major biases or missing areas.
What would settle it
If models that score well on the benchmark produce code that fails to run, yields wrong answers, or diverges when applied to standard CFD test cases drawn from outside the benchmark, the claim that the suite measures genuine CFD capability would be weakened.
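One way to run such an out-of-benchmark check is a grid-refinement study on a problem with a known exact solution. The sketch below is an illustration under stated assumptions, not a test case from the paper: it uses the 1D heat equation u_t = u_xx on [0, 1] with u(x, 0) = sin(pi x) and zero boundary values, solved with the explicit forward-time, centred-space (FTCS) scheme, and the helper names `ftcs_heat_error` and `observed_orders` are hypothetical.

```python
import math


def ftcs_heat_error(n: int, r: float = 0.25, t_end: float = 0.1) -> float:
    """Max-norm error of FTCS for u_t = u_xx with u(x, 0) = sin(pi x).

    n is the number of grid intervals; r is the target ratio dt / dx**2,
    which must stay at or below 0.5 for the scheme to be stable.
    """
    dx = 1.0 / n
    steps = max(1, round(t_end / (r * dx * dx)))
    dt = t_end / steps  # adjust dt so the last step lands exactly on t_end
    lam = dt / (dx * dx)
    u = [math.sin(math.pi * i * dx) for i in range(n + 1)]
    for _ in range(steps):
        new = u[:]
        for i in range(1, n):
            new[i] = u[i] + lam * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        new[0] = new[n] = 0.0  # homogeneous Dirichlet boundaries
        u = new
    decay = math.exp(-math.pi ** 2 * t_end)  # exact solution decays by this factor
    return max(
        abs(u[i] - decay * math.sin(math.pi * i * dx)) for i in range(n + 1)
    )


def observed_orders(errors, ratio: float = 2.0):
    """Observed order of accuracy between successive refinement levels."""
    return [
        math.log(coarse / fine) / math.log(ratio)
        for coarse, fine in zip(errors, errors[1:])
    ]


# Halve the grid spacing twice and compare successive errors.
errors = [ftcs_heat_error(n) for n in (10, 20, 40)]
orders = observed_orders(errors)
```

With dt tied to dx squared, this scheme is second-order accurate in dx, so observed orders close to 2 indicate a correct implementation. Orders well below that, or errors that grow under refinement, would be the kind of failure the paragraph above describes.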
Original abstract
Large Language Models (LLMs) have demonstrated strong performance across general NLP tasks, but their utility in automating numerical experiments of complex physical systems -- a critical and labor-intensive component -- remains underexplored. As the major workhorse of computational science over the past decades, Computational Fluid Dynamics (CFD) offers a uniquely challenging testbed for evaluating the scientific capabilities of LLMs. We introduce CFDLLMBench, a benchmark suite comprising three complementary components -- CFDQuery, CFDCodeBench, and FoamBench -- designed to holistically evaluate LLM performance across three key competencies: graduate-level CFD knowledge, numerical and physical reasoning of CFD, and context-dependent implementation of CFD workflows. Grounded in real-world CFD practices, our benchmark combines a detailed task taxonomy with a rigorous evaluation framework to deliver reproducible results and quantify LLM performance across code executability, solution accuracy, and numerical convergence behavior. CFDLLMBench establishes a solid foundation for the development and evaluation of LLM-driven automation of numerical experiments for complex physical systems. Code and data are available at https://github.com/NREL-Theseus/cfdllmbench/.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CFDLLMBench, a benchmark suite comprising three complementary components—CFDQuery, CFDCodeBench, and FoamBench—designed to holistically evaluate LLM performance across graduate-level CFD knowledge, numerical and physical reasoning of CFD, and context-dependent implementation of CFD workflows. Grounded in real-world CFD practices, the benchmark combines a detailed task taxonomy with a rigorous evaluation framework to deliver reproducible results quantifying LLM performance on code executability, solution accuracy, and numerical convergence behavior.
Significance. If the tasks and metrics accurately capture core CFD challenges, this benchmark could provide a valuable standardized foundation for assessing and advancing LLM capabilities in automating numerical experiments for complex physical systems. The open release of code and data at the provided GitHub repository is a clear strength that supports reproducibility and community use.
major comments (1)
- FoamBench component: the OpenFOAM-centric design risks measuring familiarity with one specific package's syntax and case-file structure rather than transferable context-dependent implementation skills across CFD methods (e.g., finite-element or spectral approaches). This directly affects the central claim that the three components together provide a holistic evaluation of 'context-dependent implementation of CFD workflows' grounded in general real-world practices.
minor comments (1)
- Abstract: the scale of the benchmark (number of tasks or queries per component) is not quantified, which would help readers assess coverage and effort required for evaluation.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript introducing CFDLLMBench. We address the major comment below and have revised the manuscript to clarify the design and scope of the benchmark suite.
Point-by-point responses
Referee: FoamBench component: the OpenFOAM-centric design risks measuring familiarity with one specific package's syntax and case-file structure rather than transferable context-dependent implementation skills across CFD methods (e.g., finite-element or spectral approaches). This directly affects the central claim that the three components together provide a holistic evaluation of 'context-dependent implementation of CFD workflows' grounded in general real-world practices.
Authors: We appreciate the referee's point regarding the specificity of FoamBench. OpenFOAM was selected because it is a widely adopted, open-source finite-volume CFD platform used extensively in both academic research and industrial applications for simulating complex flows. The tasks in FoamBench focus on practical skills such as case configuration, boundary condition specification, solver parameter tuning, and ensuring numerical convergence, which reflect core elements of real-world CFD workflows. We acknowledge, however, that this design emphasizes implementation within one particular software ecosystem and does not directly assess transferable skills for alternative discretizations such as finite-element or spectral methods. To address this, we have revised the manuscript to explicitly state the rationale for the OpenFOAM focus, to temper the claim of full holism across all CFD paradigms, and to add a limitations paragraph noting that extensions to other frameworks would broaden coverage of context-dependent implementation skills.
revision: yes
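For readers unfamiliar with what "case configuration" and "solver parameter tuning" involve in OpenFOAM, the following sketch renders a small `system/controlDict` and computes a Courant number for a chosen time step. It is an illustration only and does not reproduce FoamBench's task format: the helpers `render_control_dict` and `courant_number` are hypothetical, and the entries shown are a subset of those in OpenFOAM's standard lid-driven cavity tutorial. A runnable case would also need further `controlDict` entries, the `fvSchemes` and `fvSolution` dictionaries, a mesh, and initial and boundary fields.

```python
def render_control_dict(application: str, end_time: float,
                        delta_t: float, write_interval: int) -> str:
    """Render a minimal OpenFOAM controlDict as text (hypothetical helper)."""
    header = (
        "FoamFile\n"
        "{\n"
        "    version     2.0;\n"
        "    format      ascii;\n"
        "    class       dictionary;\n"
        "    object      controlDict;\n"
        "}\n\n"
    )
    entries = {
        "application": application,
        "startFrom": "startTime",
        "startTime": 0,
        "stopAt": "endTime",
        "endTime": end_time,
        "deltaT": delta_t,
        "writeControl": "timeStep",
        "writeInterval": write_interval,
    }
    # One "keyword value;" pair per line, keywords padded for readability.
    body = "".join(f"{key:<16}{value};\n" for key, value in entries.items())
    return header + body


def courant_number(velocity: float, delta_t: float, cell_size: float) -> float:
    """Courant number Co = |U| * dt / dx for a single cell."""
    if cell_size <= 0.0:
        raise ValueError("cell_size must be positive")
    return abs(velocity) * delta_t / cell_size


# Values in the spirit of the cavity tutorial: lid speed 1 m/s,
# 0.005 m cells and a 0.005 s time step.
control_dict_text = render_control_dict("icoFoam", 0.5, 0.005, 20)
co = courant_number(1.0, 0.005, 0.005)
```

Checking the Courant number before writing `deltaT` is one example of the context-dependent judgment the authors describe: the same dictionary syntax can yield a stable or an unstable run depending on the mesh and flow speed.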
Circularity Check
Benchmark construction with no derivation chain or circular reductions
full rationale
The paper introduces CFDLLMBench as a new benchmark suite with three components (CFDQuery, CFDCodeBench, FoamBench) to evaluate LLM competencies in CFD knowledge, reasoning, and implementation. No mathematical derivations, equations, fitted parameters, or predictions are present that could reduce to inputs by construction. The design is presented as grounded in real-world practices with a task taxonomy and evaluation framework; claims about holistic evaluation are supported directly by the described components rather than self-citations, ansatzes, or uniqueness theorems. This is a standard benchmark paper whose central contribution is self-contained construction, warranting no circularity flags.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The tasks in CFDQuery, CFDCodeBench, and FoamBench accurately reflect real-world CFD practices and challenges.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking
  Relation between the paper passage and the cited Recognition theorem: unclear
  FoamBench: Configuring OpenFOAM case files for simulating realistic engineering scenarios such as incompressible flow over obstacles, supersonic flow with shockwaves, Rayleigh-Bénard convection
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science
  SCICONVBENCH is a new benchmark evaluating LLMs on multi-turn disambiguation and inconsistency resolution for task formulation in computational science, with frontier models reaching only 52.7% success on fluid mechan...
- SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?
  LLMs predict outcomes of real scientific experiments at 14-26% accuracy, comparable to human experts, but lack calibration on prediction reliability while humans demonstrate strong calibration.
- ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents
  Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.
A Dataset Curation
A.1 CFDQuery
This Question and Answer dataset spans a broad spectrum of PDEs, numerical methods and error-analysis topics. I...

Options:
1. $\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = \frac{a \Delta x^2}{6} \frac{\partial^3 u}{\partial x^3} + O(\Delta x^3)$
2. $\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = \frac{a \Delta x^2}{2} \frac{\partial^2 u}{\partial x^2} + O(\Delta x^3)$
3. $\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = -\frac{a \Delta x^2}{6} \frac{\partial^3 u}{\partial x^3} + \frac{a \Delta t^2}{6} \frac{\partial^3 u}{\partial t^3} + O(\Delta x^3)$
4. $\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = \frac{a \Delta x^2}{6} \frac{\partial^3 u}{\partial x^3} - \frac{a^3 \Delta t^2}{6} \frac{\partial^3 u}{\partial x^3} + O(\Delta x^3)$

Correct Answer: Option 4
Model Responses:
- Sonnet 3.5: Option 4 ✓
- o3-mini: Option 3 ✗
- Gemini 2.5 Flash: Option 4 ✓
- Haiku 3.5: Option 1 ✗
- GPT-4o: Option 3 ✗
- Gemma-2-9B-IT: Option 1 ✗

C.2 CFDCodeBench
The visual comparison of the model produced results and the ground truth solution at the final timestep for...