PDEAgent-Bench is the first multi-metric, multi-library benchmark for AI-generated PDE solvers, evaluating executability, numerical accuracy, and efficiency across DOLFINx, Firedrake, and deal.II.
arXiv preprint arXiv:2505.08783 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5representative citing papers
A Creator-Inspector multi-agent LLM pipeline for constitutive artificial neural networks increases the rate of models satisfying all nine physical constraints to 100% or 56% depending on the LLM backbone.
LLMs prompted with domain knowledge can generate runnable, numerically valid code for stiff and non-stiff ODEs on new diagnostic and 1000-task benchmarks.
AutoPDE maintains an explicit solver strategy through PDE analysis, numerical method selection, and adaptive tuning, achieving 54.5% pass rate on PDE Agent Bench, 14.2 points above the strongest baseline.
A survey of RLM use in 28 disciplines reveals uneven adoption and introduces a maturity assessment framework showing larger gaps when limited to public resources.
citing papers explorer
-
PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation
PDEAgent-Bench is the first multi-metric, multi-library benchmark for AI-generated PDE solvers, evaluating executability, numerical accuracy, and efficiency across DOLFINx, Firedrake, and deal.II.
-
LLM-driven design of physics-constrained constitutive models: two agents are better than one
A Creator-Inspector multi-agent LLM pipeline for constitutive artificial neural networks increases the rate of models satisfying all nine physical constraints to 100% or 56% depending on the LLM backbone.
-
SciML Agents: Write the Solver, Not the Solution
LLMs prompted with domain knowledge can generate runnable, numerically valid code for stiff and non-stiff ODEs on new diagnostic and 1000-task benchmarks.
-
AutoPDE: Reliable Agentic PDE Solving via Explicitly Represented Solver Strategies
AutoPDE maintains an explicit solver strategy through PDE analysis, numerical method selection, and adaptive tuning, achieving 54.5% pass rate on PDE Agent Bench, 14.2 points above the strongest baseline.
-
Reasoning4Sciences: Bridging Reasoning Language Models to All Scientific Branches
A survey of RLM use in 28 disciplines reveals uneven adoption and introduces a maturity assessment framework showing larger gaps when limited to public resources.