Decision-Aware Evaluation of Physics-Informed Surrogates
Pith reviewed 2026-06-27 22:14 UTC · model grok-4.3
The pith
Standard curve-error metrics frequently fail to identify useful lattice designs for engineering decisions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Low nRMSE on response curves is frequently insufficient to identify useful design selections. Physics-informed losses alter trade-offs among metrics rather than monotonically improving all of them, and dimensionless conditioning improves comparability without making transfer symmetric.
What carries the argument
The pinn-gym benchmark protocol that measures curve fidelity, physical admissibility, top-k retrieval accuracy, and mass regret on top of a reduced-order oracle for crush and impact.
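To make the distinction between curve fidelity and decision metrics concrete, here is a minimal sketch of three of the four metric families. The definitions are assumptions for illustration, not the paper's: nRMSE is normalized by the range of the reference values, top-k retrieval is the overlap between surrogate and oracle top-k sets, and regret is the relative excess true cost of the surrogate's pick. Each candidate's response is collapsed to a single scalar cost for brevity, and the function names and numbers are hypothetical. Physical admissibility is omitted because its definition is not given here.

```python
import math

def nrmse(pred, true):
    """RMSE divided by the range of the reference values (assumed normalization)."""
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))
    return rmse / (max(true) - min(true))

def top_k_overlap(pred_cost, true_cost, k):
    """Fraction of the oracle's k lowest-cost candidates that the surrogate also ranks in its top k."""
    def best(cost):
        return set(sorted(range(len(cost)), key=cost.__getitem__)[:k])
    return len(best(pred_cost) & best(true_cost)) / k

def relative_regret(pred_cost, true_cost):
    """Excess true cost of the surrogate's favourite candidate, relative to the true optimum."""
    chosen = min(range(len(pred_cost)), key=pred_cost.__getitem__)
    best = min(true_cost)
    return (true_cost[chosen] - best) / best

# Illustrative numbers only (not results from the paper).
true_cost = [1.00, 1.05, 1.10, 2.00, 3.00]
close_but_swapped = [1.04, 1.01, 1.12, 2.00, 3.00]   # small errors, wrong winner
biased_but_ordered = [c + 0.5 for c in true_cost]    # large bias, correct ranking

for name, pred in [("close_but_swapped", close_but_swapped),
                   ("biased_but_ordered", biased_but_ordered)]:
    print(name, nrmse(pred, true_cost),
          top_k_overlap(pred, true_cost, 1),
          relative_regret(pred, true_cost))
```

In this toy case the surrogate with the lower nRMSE picks the wrong best candidate, while the heavily biased surrogate preserves the ranking and incurs zero regret. This is one mechanism by which curve error and decision quality can decouple.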
Load-bearing premise
The transparent reduced-order crush-and-impact oracle, together with the five printable polymer cards and the defined protocol, serves as a valid proxy for real engineering decision outcomes in lattice design.
What would settle it
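Showing that nRMSE rankings reliably agree with rankings by top-k retrieval and regret, within the benchmark and in physical validation tests, would falsify the claim. Conversely, a surrogate with higher nRMSE that consistently produces better top-k designs or lower regret than a lower-nRMSE model would support it.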
Original abstract
Physics-informed machine learning is often assessed by curve error, although engineering use depends on downstream decisions: ranking candidates, avoiding infeasible designs and limiting regret. We introduce pinn-gym, an open benchmark for material-conditioned lattice design that couples a transparent reduced-order crush-and-impact oracle with five printable polymer cards, dimensionless force-response targets and a protocol spanning curve fidelity, physical admissibility, top-k retrieval and mass regret. Across per-material, pooled and cross-material settings, low nRMSE is frequently insufficient to identify useful design selections. Physics-informed losses alter trade-offs rather than monotonically improving all metrics, and dimensionless conditioning improves comparability without making transfer symmetric. The benchmark is not a certified material model; within the released oracle, candidate generator and material cards, pinn-gym provides a reproducible testbed for evaluating PIML surrogates as decision systems rather than curve predictors alone.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces pinn-gym, an open benchmark coupling a transparent reduced-order crush-and-impact oracle with five printable polymer material cards and dimensionless force-response targets. It evaluates physics-informed surrogates across per-material, pooled, and cross-material settings using protocols for curve fidelity (nRMSE), physical admissibility, top-k retrieval, and mass regret. The central empirical claims are that low nRMSE frequently fails to identify useful design selections, that physics-informed losses alter (rather than uniformly improve) decision metrics, and that dimensionless conditioning improves cross-material comparability without symmetric transfer.
Significance. If the reported mismatches between curve error and decision quality hold within the released components, the work supplies a reproducible, decision-oriented testbed that directly addresses a recognized gap in PIML evaluation. The explicit disclaimer that the oracle is not a certified material model, together with the open release of the oracle, candidate generator, and cards, constitutes a concrete strength that enables community scrutiny and extension. The findings provide falsifiable, quantitative evidence that standard error metrics are often insufficient proxies for engineering utility.
major comments (1)
- [§4 (experimental protocol) and appendix] The load-bearing assumption for the central claims is the reduced-order oracle's adequacy as a proxy for lattice design decisions. While the abstract and introduction correctly note that the benchmark is not a certified model, the manuscript would benefit from an explicit sensitivity study (e.g., in §4 or the appendix) showing how the reported gaps between nRMSE and top-k/admissibility/regret metrics respond to plausible variations in the oracle's contact or rate-dependent parameters.
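The requested sensitivity study could be organized along the following lines. This is a sketch under stated assumptions, not the benchmark's code: `toy_oracle` is a hypothetical stand-in for the reduced-order oracle, `contact` is a made-up parameter playing the role of a contact term, and the surrogate is simply the oracle frozen at its nominal setting. The loop perturbs the parameter and records how curve error and top-k agreement move together (or not).

```python
def toy_oracle(rho, contact):
    """Hypothetical response: a plateau-like term plus a contact term scaled by `contact`."""
    return rho ** 1.5 + contact * rho ** 3

def mismatch(values, target):
    """Design cost: distance of each candidate's response from the target response."""
    return [abs(v - target) for v in values]

def top_k(cost, k):
    """Indices of the k lowest-cost candidates."""
    return set(sorted(range(len(cost)), key=cost.__getitem__)[:k])

def sweep(densities, surrogate_pred, target, contacts, k):
    """For each perturbed oracle, return (contact, nRMSE, top-k overlap) for a fixed surrogate."""
    pred_top = top_k(mismatch(surrogate_pred, target), k)
    rows = []
    for c in contacts:
        truth = [toy_oracle(r, c) for r in densities]
        rmse = (sum((p - t) ** 2 for p, t in zip(surrogate_pred, truth)) / len(truth)) ** 0.5
        span = max(truth) - min(truth)          # range normalization (assumed)
        overlap = len(pred_top & top_k(mismatch(truth, target), k)) / k
        rows.append((c, rmse / span, overlap))
    return rows

densities = [0.1 * i for i in range(1, 10)]                # hypothetical candidate set
surrogate_pred = [toy_oracle(r, 1.0) for r in densities]   # surrogate exact at nominal contact = 1.0
rows = sweep(densities, surrogate_pred, target=0.3, contacts=[0.5, 1.0, 1.5], k=3)
for row in rows:
    print(row)
```

Reporting both columns side by side across the sweep would show whether the gap between nRMSE and the decision metrics is stable or an artifact of one oracle setting.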
minor comments (2)
- [§3] Notation for the five polymer cards and the exact definition of the dimensionless conditioning should be consolidated in a single table or subsection to improve readability for readers implementing the benchmark.
- [§5.3] The cross-material transfer results would be clearer if the symmetry (or lack thereof) were quantified with an explicit asymmetry metric rather than described qualitatively.
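One possible form for the asymmetry metric requested in §5.3 is sketched below. It is an illustrative choice, not a metric defined by the paper: given a matrix of transfer errors `err[a][b]` (train on material `a`, test on material `b`), it sums the absolute differences between the two directions of each pair and normalizes by their total, giving 0 for perfectly symmetric transfer and values approaching 1 when one direction dominates. The material labels and numbers are placeholders.

```python
def asymmetry_index(err):
    """Normalized directional gap over all unordered material pairs; in [0, 1] for non-negative errors."""
    names = sorted(err)
    num = den = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            num += abs(err[a][b] - err[b][a])   # gap between a->b and b->a
            den += err[a][b] + err[b][a]
    return num / den if den > 0 else 0.0

# Placeholder transfer errors (not results from the paper).
err = {
    "A": {"A": 0.0, "B": 0.10, "C": 0.30},
    "B": {"A": 0.20, "B": 0.0, "C": 0.10},
    "C": {"A": 0.30, "B": 0.10, "C": 0.0},
}
index = asymmetry_index(err)
print(index)
```

A per-pair version (dropping the outer sums) would additionally show which material pairs drive the asymmetry.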
Simulated Author's Rebuttal
We thank the referee for the constructive assessment and for recognizing the benchmark's value as a reproducible testbed. We address the major comment below.
- Referee: [§4 (experimental protocol) and appendix] The load-bearing assumption for the central claims is the reduced-order oracle's adequacy as a proxy for lattice design decisions. While the abstract and introduction correctly note that the benchmark is not a certified model, the manuscript would benefit from an explicit sensitivity study (e.g., in §4 or the appendix) showing how the reported gaps between nRMSE and top-k/admissibility/regret metrics respond to plausible variations in the oracle's contact or rate-dependent parameters.
  Authors: The central empirical claims are explicitly conditioned on the released reduced-order oracle, candidate generator, and material cards, as stated in the abstract and introduction. The benchmark is presented as a transparent, non-certified proxy to enable reproducible evaluation of decision metrics within these components; the open release is intended to support community extensions such as sensitivity analyses. A full sensitivity study on contact or rate-dependent parameters would require additional experiments and computational effort beyond the manuscript's scope of establishing the benchmark and protocols. We therefore do not view such a study as necessary to support the reported mismatches within the provided testbed.
  Revision: no
Circularity Check
No circularity: empirical benchmark with external oracle
The paper introduces pinn-gym as an open benchmark consisting of a reduced-order crush-and-impact oracle, five polymer material cards, dimensionless targets, and evaluation protocols for curve fidelity, admissibility, top-k retrieval and mass regret. All reported findings (low nRMSE insufficient for design selection, physics-informed losses altering trade-offs, dimensionless conditioning effects) are direct empirical observations on this externally defined testbed. No derivation, fitted parameter, uniqueness theorem, or ansatz is presented that reduces to the authors' own prior quantities or self-citations. The work is self-contained against the released oracle and cards; the skeptic's concern about oracle fidelity is a question of correctness, not of circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The reduced-order crush-and-impact oracle accurately represents the physical behaviors relevant to design decisions.
invented entities (1)
- pinn-gym benchmark: no independent evidence