Enhancing neural network extrapolation in thermo-fluid systems using steady-state solutions

Sanghyun Lee; Sanjeeb Poudel; Teeratorn Kadeethum

arxiv: 2606.18417 · v1 · pith:QPL7EA6Rnew · submitted 2026-06-16 · 💻 cs.CE

Enhancing neural network extrapolation in thermo-fluid systems using steady-state solutions

Sanjeeb Poudel , Teeratorn Kadeethum , Sanghyun Lee This is my paper

Pith reviewed 2026-06-26 21:37 UTC · model grok-4.3

classification 💻 cs.CE

keywords physics-informed neural networkstemporal extrapolationsteady-state solutionsdissipative PDEsthermo-fluid systemsNavier-Stokes equationsconjugate heat transfer

0 comments

The pith

A neural network ansatz that decomposes solutions into a prescribed steady state plus a decaying transient correction embeds long-time convergence by construction for dissipative PDEs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a representation for time-dependent PDE solutions in which the network output is written as the sum of a known steady-state field and a transient term multiplied by a decay function that goes to zero at large times. This structure guarantees that the solution approaches the steady state without requiring an extra loss term to enforce the asymptotics. The method is implemented inside a physics-informed neural network and tested on problems that range from the heat equation to lid-driven cavity flow, natural convection, and three-dimensional conjugate heat transfer. Results indicate that the built-in convergence improves accuracy when the network is asked to predict times beyond the interval used for training. A reader would care because standard networks commonly lose fidelity on long-time forecasts, and the new form addresses that limitation directly through the functional form rather than through added penalties.

Core claim

The proposed ansatz decomposes the solution into a steady-state component and a transient correction modulated by a time-dependent decay profile. When the decay profile vanishes at long time and the transient correction remains bounded, the representation embeds convergence to the prescribed steady state directly into the architecture rather than enforcing it through an additional penalty term. This allows the network to learn the transient dynamics while preserving the correct asymptotic behavior.

What carries the argument

The steady-state-informed ansatz that writes the solution as the sum of a prescribed steady-state field and a bounded transient correction multiplied by a decay profile that vanishes at large times.

If this is right

The network only needs to learn the transient correction; the long-time limit is satisfied automatically by the functional form.
Temporal extrapolation beyond the training interval improves compared with networks that lack the explicit asymptotic embedding.
The approach applies to dissipative systems ranging from the one-dimensional heat equation to incompressible Navier-Stokes flow and three-dimensional conjugate heat transfer.
Training uses the physics-informed loss inside a PINN framework without an additional steady-state penalty.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same decomposition could be tried on systems whose steady states are computed once by a separate solver and then frozen for the network.
If the decay profile itself is learned rather than prescribed, the method might adapt to cases where the relaxation rate is not known in advance.
The construction may reduce the amount of long-time data needed during training because the network is not free to drift away from the known equilibrium.

Load-bearing premise

The PDE solutions relax toward a stationary equilibrium that can be prescribed or computed independently and supplied to the network.

What would settle it

A numerical example in which the learned transient correction grows unbounded even though the decay profile vanishes would show that the architectural guarantee of convergence does not hold.

Figures

Figures reproduced from arXiv: 2606.18417 by Sanghyun Lee, Sanjeeb Poudel, Teeratorn Kadeethum.

**Figure 1.** Figure 1: Proposed network architecture and multi-phase training strategy. Phase 1 (Steady-State): Fully connected [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗

**Figure 2.** Figure 2: One-dimensional heat equation: Extrapolation performance and temporal function [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: One-dimensional heat equation: Extrapolated solution versus the reference solution for [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: One-dimensional heat equation: Extrapolation error as a function of training duration. The vertical axis shows [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

**Figure 5.** Figure 5: Lid-Driven Cavity Flow: L∞ error over time for the velocity components (a) u1 and (b) u2, evaluated across various temporal functions f(t), with the network trained on t ∈ [0, 0.5]. 4.2.1 Training on t ∈ [0, 0.5] [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

**Figure 6.** Figure 6: Lid-Driven Cavity Flow: L∞ error over time for the velocity components (a) u1 and (b) u2, evaluated across various temporal functions f(t), with the network trained on t ∈ [0, 1]. times the training horizon (t = 1.5 versus training up to t = 0.5), demonstrates that the steady-state-informed ansatz provides meaningful predictive capability well outside the training domain. (a) u h 1 (b) |u1 − u h 1 | (c) u … view at source ↗

**Figure 7.** Figure 7: Lid-Driven Cavity Flow: Velocity predictions and pointwise absolute errors at [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Lid-Driven Cavity Flow: Contour plots showing the absolute spatial differences in the velocity components [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

**Figure 9.** Figure 9: Natural convection in a square cavity: (a) [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗

**Figure 10.** Figure 10: Natural convection in a square cavity: Temporal evolution of temperature profiles (top row) and the [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗

**Figure 11.** Figure 11: Three-dimensional domain for conjugate heat transfer [PITH_FULL_IMAGE:figures/full_fig_p017_11.png] view at source ↗

**Figure 12.** Figure 12: Conjugate heat transfer in three dimensions: (a) Relative [PITH_FULL_IMAGE:figures/full_fig_p018_12.png] view at source ↗

**Figure 13.** Figure 13: Conjugate heat transfer in three dimensions: (a) Relative [PITH_FULL_IMAGE:figures/full_fig_p018_13.png] view at source ↗

**Figure 14.** Figure 14: Conjugate heat transfer in three dimensions: Temporal evolution of temperature profiles (top row) on a YZ [PITH_FULL_IMAGE:figures/full_fig_p019_14.png] view at source ↗

**Figure 15.** Figure 15: Conjugate heat transfer in three dimensions: The top row illustrates the temporal evolution of temperature [PITH_FULL_IMAGE:figures/full_fig_p020_15.png] view at source ↗

read the original abstract

Time-dependent partial differential equations (PDEs) arise in many engineering systems, including thermo-fluid applications. Classical numerical simulations of such systems can become computationally expensive for long-time dynamics because they typically require sequential time integration with time steps constrained by stability, accuracy, or nonlinear solvers. Although scientific machine learning provides an alternative for approximating PDE solutions, standard neural network approximations often degrade when extrapolated beyond the training time interval. In this work, we propose a steady-state-informed neural network representation for dissipative PDE systems whose solutions relax toward a stationary equilibrium. The proposed ansatz decomposes the solution into a steady-state component and a transient correction modulated by a time-dependent decay profile. When the decay profile vanishes at long time and the transient correction remains bounded, the representation embeds convergence to the prescribed steady state directly into the architecture, rather than enforcing it through an additional penalty term. This allows the network to learn the transient dynamics while preserving the correct asymptotic behavior. We implement the approach within a physics-informed neural network (PINN) framework and train the resulting model using the SOAP optimizer. The method is evaluated on a sequence of problems of increasing physical and geometric complexity, ranging from the one-dimensional heat equation to incompressible Navier-Stokes flow in a lid-driven cavity, natural convection in a square cavity, and a full three-dimensional conjugate heat transfer problem. The numerical results show that the steady-state-informed architecture substantially improves temporal extrapolation beyond the training interval compared with architectures that do not explicitly enforce the asymptotic condition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper embeds convergence to a known steady state into the PINN via a decay-modulated transient term and reports better extrapolation on several thermo-fluid cases, but the transient network has no built-in bound for times past the training window.

read the letter

The core idea is to write the solution as a prescribed steady state plus a neural network transient multiplied by a decay profile that goes to zero at long times. When the transient stays bounded, this puts the correct asymptote into the architecture itself instead of adding another term to the loss. They run this inside a standard PINN setup with the SOAP optimizer and test it on the 1D heat equation, lid-driven cavity flow, natural convection, and a 3D conjugate heat transfer problem. The reported gains in extrapolation past the training interval are the main empirical result.

The tests cover a useful range of complexity and the architectural choice is explicit rather than hidden in regularization. That is the concrete contribution.

The limitation that stands out is the one flagged in the stress test. The transient network is trained only on a finite interval, and nothing in the description enforces or checks that its output remains bounded afterward. If it grows even modestly for t larger than the training horizon, the embedded convergence property disappears and you are left with the usual extrapolation problems. The abstract states the assumption but does not describe verification steps or additional constraints that would make it hold.

You also need an independent steady-state solution to plug in, which shifts rather than removes the cost of finding the long-time behavior.

This is the kind of targeted architectural tweak that people working on PINNs for dissipative engineering PDEs should see. It is coherent on its own terms and the experiments are on the right problems, so it deserves a serious referee even though the boundedness question will need to be addressed in revision.

Referee Report

1 major / 0 minor

Summary. The paper proposes a steady-state-informed neural network ansatz for dissipative time-dependent PDEs whose solutions relax to a stationary equilibrium. The solution is decomposed as a prescribed steady-state component plus a neural-network-modeled transient correction modulated by a decay profile that vanishes at long times. When the transient remains bounded, this embeds asymptotic convergence directly into the architecture rather than via an added penalty. The ansatz is implemented in a PINN framework trained with the SOAP optimizer and evaluated on a sequence of problems of increasing complexity (1D heat equation, lid-driven cavity, natural convection, 3D conjugate heat transfer), with the claim that it substantially improves temporal extrapolation beyond the training interval relative to architectures without the explicit asymptotic constraint.

Significance. If the boundedness assumption holds and is validated, the approach offers a non-circular architectural mechanism for enforcing correct long-time behavior in neural approximations of relaxing thermo-fluid systems. This could reduce the need for penalty terms and improve reliability of extrapolation in engineering contexts where classical time-stepping is expensive. The explicit separation of steady state from transient distinguishes the method from standard PINNs and is a strength when the premise applies.

major comments (1)

[Abstract] Abstract: the central claim that the representation 'embeds convergence to the prescribed steady state directly into the architecture' (rather than through a penalty) requires that the NN-modeled transient correction remain bounded for t > T. The transient network is trained exclusively on a finite interval [0, T]; no architectural constraint, regularization, or post-training verification is described that would enforce or confirm boundedness outside the training window. If the transient grows (even slowly) for t > T, the embedded convergence property fails and the reported extrapolation benefit is lost. This assumption is load-bearing for the distinction from penalty-based methods and must be addressed with either a proof, a boundedness-enforcing modification, or explicit numerical verification on the test cases.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback, which highlights a critical assumption in our proposed ansatz. We address the major comment point-by-point below and commit to revisions that strengthen the manuscript without altering its core claims.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the representation 'embeds convergence to the prescribed steady state directly into the architecture' (rather than through a penalty) requires that the NN-modeled transient correction remain bounded for t > T. The transient network is trained exclusively on a finite interval [0, T]; no architectural constraint, regularization, or post-training verification is described that would enforce or confirm boundedness outside the training window. If the transient grows (even slowly) for t > T, the embedded convergence property fails and the reported extrapolation benefit is lost. This assumption is load-bearing for the distinction from penalty-based methods and must be addressed with either a proof, a boundedness-enforcing modification, or explicit numerical verification on the test cases.

Authors: We agree that boundedness of the transient correction for t > T is a load-bearing assumption for the architectural (rather than penalty-based) enforcement of asymptotic behavior, and that the manuscript states this condition explicitly but does not provide supporting verification. A general proof of boundedness is not available without further restrictions on the network architecture or activation functions, and we do not introduce an additional constraint or regularization term, as this would change the method. In the revised version we will add explicit numerical verification: for each test case we will plot or tabulate the L2 norm of the transient network output over an extended extrapolation interval (e.g., up to 2T or 3T) to confirm that it remains bounded, thereby substantiating the claim for the dissipative systems considered. This verification will be included in the results section and discussed in the abstract revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity; ansatz is an explicit architectural choice with external inputs.

full rationale

The paper's central step is the introduction of an explicit decomposition ansatz (solution = steady-state component + decay-modulated transient correction) whose convergence property holds by the algebraic form of the ansatz when the decay vanishes and the transient term is bounded. The steady-state is supplied externally or computed independently, not fitted from the target transient data. The transient correction is represented by a standard neural network trained on the usual residual loss; no claimed prediction is obtained by renaming or refitting a quantity already present in the inputs. No self-citation chain, uniqueness theorem, or smuggled ansatz is invoked to justify the form. Numerical results on multiple PDEs therefore constitute independent validation rather than a tautological restatement of fitted quantities.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that the target PDE systems are dissipative and converge to a known stationary state, plus the modeling choice of a bounded transient correction modulated by a decay profile whose functional form is not detailed in the abstract.

free parameters (1)

decay profile parameters
The time-dependent decay profile is introduced as part of the ansatz; its specific functional form and any tunable coefficients are not specified in the abstract.

axioms (1)

domain assumption Solutions of the considered dissipative PDE systems relax toward a stationary equilibrium
Invoked in the abstract to delimit the class of problems for which the ansatz is designed.

pith-pipeline@v0.9.1-grok · 5804 in / 1436 out tokens · 33645 ms · 2026-06-26T21:37:54.285426+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

54 extracted references · 5 canonical work pages · 3 internal anchors

[1]

SIAM, 2007

Randall J LeVeque.Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM, 2007

2007
[2]

LeVeque.Finite Volume Methods for Hyperbolic Problems

Randall J. LeVeque.Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, 2002

2002
[3]

Pearson Education, 2nd edition, 2007

Henk Kaarle Versteeg and Weeratunge Malalasekera.An Introduction to Computational Fluid Dynamics: The Finite Volume Method. Pearson Education, 2nd edition, 2007

2007
[4]

Thomas J. R. Hughes.The Finite Element Method: Linear Static and Dynamic Finite Element Analysis. Dover Publications, 2012. Originally published by Prentice-Hall, 1987. 20

2012
[5]

Zienkiewicz, Robert L

Olek C. Zienkiewicz, Robert L. Taylor, and Jian Z. Zhu.The Finite Element Method: Its Basis and Fundamentals. Elsevier Butterworth-Heinemann, 6th edition, 2005

2005
[6]

Springer, 2nd edition, 2009

Alfio Quarteroni, Riccardo Sacco, and Fausto Saleri.Numerical Mathematics. Springer, 2nd edition, 2009

2009
[7]

J. N. Reddy.Introduction to the Finite Element Method. McGraw-Hill Education, 4th edition, 2019

2019
[8]

Dynamic thermal modelling of power transformers.IEEE Transactions on Power Delivery, 20(1):197–204, 2005

Dejan Susa, Matti Lehtonen, and Hasse Nordman. Dynamic thermal modelling of power transformers.IEEE Transactions on Power Delivery, 20(1):197–204, 2005

2005
[9]

Seddik, Jehan Shazly, and Magdy B

Mohamed S. Seddik, Jehan Shazly, and Magdy B. Eteiba. Thermal analysis of power transformer using 2D and 3D finite element method.Energies, 17(13):3203, 2024

2024
[10]

Numerical investigation of oil flow and temperature distributions for ON transformer windings.Applied Thermal Engineering, 130:1–9, 2018

Xiang Zhang, Zhongdong Wang, and Qiang Liu. Numerical investigation of oil flow and temperature distributions for ON transformer windings.Applied Thermal Engineering, 130:1–9, 2018

2018
[11]

Enhancing heat transfer in power transformer radiators via thermo-fluid dynamic analysis with periodic thermal boundary conditions

Jonathan J Dorella, Bruno A Storti, Gustavo A Ríos Rodriguez, and Mario A Storti. Enhancing heat transfer in power transformer radiators via thermo-fluid dynamic analysis with periodic thermal boundary conditions. International Journal of Heat and Mass Transfer, 222:125142, 2024

2024
[12]

Impacts of startup, shutdown and load variation on transient temperature and thermal stress fields within blades of gas turbines.Journal of Thermal Science, 31(3):727–740, 2022

Liuxi Cai, Yanfang Hou, Fang Li, Yun Li, Shunsen Wang, and Jingru Mao. Impacts of startup, shutdown and load variation on transient temperature and thermal stress fields within blades of gas turbines.Journal of Thermal Science, 31(3):727–740, 2022

2022
[13]

W. Z. Tang, L. Yang, W. Zhu, et al. Numerical simulation of temperature distribution and thermal-stress field in a turbine blade with multilayer-structure TBCs by a fluid-solid coupling method.Journal of Materials Science & Technology, 32(5):452–458, 2016

2016
[14]

High-re solutions for incompressible flow using the navier-stokes equations and a multigrid method.Journal of computational physics, 48(3):387–411, 1982

UKNG Ghia, Kirti N Ghia, and CT Shin. High-re solutions for incompressible flow using the navier-stokes equations and a multigrid method.Journal of computational physics, 48(3):387–411, 1982

1982
[15]

P. N. Shankar and M. D. Deshpande. Fluid mechanics in the driven cavity.Annual Review of Fluid Mechanics, 32(1):93–136, 2000

2000
[16]

Wiley New York, 1996

Frank P Incropera, David P DeWitt, Theodore L Bergman, Adrienne S Lavine, et al.Fundamentals of heat and mass transfer, volume 6. Wiley New York, 1996

1996
[17]

Recent advances in non-intrusive polynomial chaos and stochastic collocation methods for uncertainty analysis and design

Michael Eldred. Recent advances in non-intrusive polynomial chaos and stochastic collocation methods for uncertainty analysis and design. In50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference 17th AIAA/ASME/AHS Adaptive Structures Conference 11th AIAA No, page 2274, 2009

2009
[18]

Princeton university press, 2010

Dongbin Xiu.Numerical methods for stochastic computations: a spectral method approach. Princeton university press, 2010

2010
[19]

Physics- informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021

George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics- informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021

2021
[20]

Brunton and J

Steven L. Brunton and J. Nathan Kutz.Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2nd edition, 2022

2022
[21]

Karniadakis

Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.Journal of Computational Physics, 378:686–707, 2019

2019
[22]

Jørgensen, and Hamidreza M

Teeratorn Kadeethum, Thomas M. Jørgensen, and Hamidreza M. Nick. Physics-informed neural networks for solving nonlinear diffusivity and Biot’s equations.PloS one, 15(5):e0232683, 2020

2020
[23]

Jagtap, Ehsan Kharazmi, and George Em Karniadakis

Ameya D. Jagtap, Ehsan Kharazmi, and George Em Karniadakis. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems.Computer Methods in Applied Mechanics and Engineering, 365:113028, 2020

2020
[24]

Jørgensen, and Hamidreza M

Teeratorn Kadeethum, Thomas M. Jørgensen, and Hamidreza M. Nick. Physics-informed neural networks for solv- ing inverse problems of nonlinear Biot’s equations: batch training. InARMA US Rock Mechanics/Geomechanics Symposium, pages ARMA–2020. ARMA, 2020

2020
[25]

DeepXDE: A deep learning library for solving differential equations.SIAM Review, 63(1):208–228, 2021

Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A deep learning library for solving differential equations.SIAM Review, 63(1):208–228, 2021

2021
[26]

Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators.Nature Machine Intelligence, 3(3):218–229, 2021

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators.Nature Machine Intelligence, 3(3):218–229, 2021. 21

2021
[27]

Fourier neural operator for parametric partial differential equations

Zongyi Li, Nikolas Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhatt, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations, 2021

2021
[28]

Learning the solution operator of parametric partial differential equations with physics-informed DeepONets.Science Advances, 7(40):eabi8605, 2021

Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets.Science Advances, 7(40):eabi8605, 2021

2021
[29]

Neural operator: Learning maps between function spaces with applications to PDEs.Journal of Machine Learning Research, 24(89):1–97, 2023

Nikolas Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhatt, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs.Journal of Machine Learning Research, 24(89):1–97, 2023

2023
[30]

When and why PINNs fail to train: A neural tangent kernel perspective.Journal of Computational Physics, 449:110768, 2022

Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why PINNs fail to train: A neural tangent kernel perspective.Journal of Computational Physics, 449:110768, 2022

2022
[31]

Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W. Mahoney. Characterizing possible failure modes in physics-informed neural networks. 34:26548–26560, 2021

2021
[32]

Revanth Mattey and Susanta Ghosh. A novel sequential method to train physics informed neural networks for Allen–Cahn and Cahn–Hilliard equations.Computer Methods in Applied Mechanics and Engineering, 390:114474, 2022

2022
[33]

Understanding and mitigating gradient flow pathologies in physics-informed neural networks.SIAM Journal on Scientific Computing, 43(5):A3055–A3081, 2021

Sifan Wang, Yujun Teng, and Paris Perdikaris. Understanding and mitigating gradient flow pathologies in physics-informed neural networks.SIAM Journal on Scientific Computing, 43(5):A3055–A3081, 2021

2021
[34]

Chenxi Wu, Min Zhu, Qinyang Tan, Yadhu Kartha, and Lu Lu. A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks.Computer Methods in Applied Mechanics and Engineering, 403:115671, 2023

2023
[35]

SOAP: Improving and Stabilizing Shampoo using Adam

Nikhil Vyas, Depen Morwani, Rosie Zhao, Mujin Kwun, Itai Shapira, David Brandfonbrener, Lucas Janson, and Sham Kakade. Soap: Improving and stabilizing shampoo using adam.arXiv preprint arXiv:2409.11321, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[36]

Gradient alignment in physics-informed neural networks: A second-order optimization perspective.arXiv preprint arXiv:2502.00604, 2025

Sifan Wang, Ananyae Kumar Bhartari, Bowen Li, and Paris Perdikaris. Gradient alignment in physics-informed neural networks: A second-order optimization perspective.arXiv preprint arXiv:2502.00604, 2025

work page arXiv 2025
[37]

MIT press Cambridge, 2016

Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio.Deep learning, volume 1. MIT press Cambridge, 2016

2016
[38]

Searching for Activation Functions

Prajit Ramachandran, Barret Zoph, and Quoc V Le. Searching for activation functions.arXiv preprint arXiv:1710.05941, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[39]

Multilayer feedforward networks are universal approxi- mators.Neural Networks, 2(5):359–366, 1989

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approxi- mators.Neural Networks, 2(5):359–366, 1989

1989
[40]

Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2(4):303–314, 1989

George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2(4):303–314, 1989

1989
[41]

The expressive power of neural networks: A view from the width.Advances in neural information processing systems, 30, 2017

Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. The expressive power of neural networks: A view from the width.Advances in neural information processing systems, 30, 2017

2017
[42]

Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind

Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey.Journal of Machine Learning Research, 18(153):1–43, 2018

2018
[43]

Respecting causality for training physics-informed neural networks.Computer Methods in Applied Mechanics and Engineering, 421:116813, 2024

Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality for training physics-informed neural networks.Computer Methods in Applied Mechanics and Engineering, 421:116813, 2024

2024
[44]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[45]

Shampoo: Preconditioned stochastic tensor optimization

Vineet Gupta, Tomer Koren, and Yoram Singer. Shampoo: Preconditioned stochastic tensor optimization. In International Conference on Machine Learning, pages 1842–1850. PMLR, 2018

2018
[46]

Adafactor: Adaptive learning rates with sublinear memory cost

Noam Shazeer and Mitchell Stern. Adafactor: Adaptive learning rates with sublinear memory cost. InInternational conference on machine learning, pages 4596–4604. PMLR, 2018

2018
[47]

Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J

Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, ˙Ilhan Polat, Yu Feng, Eric W. M...

2020
[48]

Harris, K

Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Shepp...

2020
[49]

Keras.https://keras.io, 2015

François Chollet et al. Keras.https://keras.io, 2015

2015
[50]

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murra...

2015
[51]

J. D. Hunter. Matplotlib: A 2d graphics environment.Computing in Science & Engineering, 9(3):90–95, 2007

2007
[52]

Pressure-robust enriched galerkin finite element methods for coupled navier-stokes and heat equations.arXiv preprint arXiv:2512.16716, 2025

Sanjeeb Poudel, Sanghyun Lee, and Lin Mu. Pressure-robust enriched galerkin finite element methods for coupled navier-stokes and heat equations.arXiv preprint arXiv:2512.16716, 2025

work page arXiv 2025
[53]

The deal

Daniel Arndt, Wolfgang Bangerth, Maximilian Bergbauer, Marco Feder, Marc Fehling, Johannes Heinz, Timo Heister, Luca Heltai, Martin Kronbichler, Matthias Maier, et al. The deal. ii library, version 9.5.Journal of Numerical Mathematics, 31(3):231–246, 2023

2023
[54]

Gauthier-Villars, 1897

Joseph Boussinesq.Théorie de l’écoulement tourbillonnant et tumultueux des liquides dans les lits rectilignes à grande section..., volume 1. Gauthier-Villars, 1897. 23

[1] [1]

SIAM, 2007

Randall J LeVeque.Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM, 2007

2007

[2] [2]

LeVeque.Finite Volume Methods for Hyperbolic Problems

Randall J. LeVeque.Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, 2002

2002

[3] [3]

Pearson Education, 2nd edition, 2007

Henk Kaarle Versteeg and Weeratunge Malalasekera.An Introduction to Computational Fluid Dynamics: The Finite Volume Method. Pearson Education, 2nd edition, 2007

2007

[4] [4]

Thomas J. R. Hughes.The Finite Element Method: Linear Static and Dynamic Finite Element Analysis. Dover Publications, 2012. Originally published by Prentice-Hall, 1987. 20

2012

[5] [5]

Zienkiewicz, Robert L

Olek C. Zienkiewicz, Robert L. Taylor, and Jian Z. Zhu.The Finite Element Method: Its Basis and Fundamentals. Elsevier Butterworth-Heinemann, 6th edition, 2005

2005

[6] [6]

Springer, 2nd edition, 2009

Alfio Quarteroni, Riccardo Sacco, and Fausto Saleri.Numerical Mathematics. Springer, 2nd edition, 2009

2009

[7] [7]

J. N. Reddy.Introduction to the Finite Element Method. McGraw-Hill Education, 4th edition, 2019

2019

[8] [8]

Dynamic thermal modelling of power transformers.IEEE Transactions on Power Delivery, 20(1):197–204, 2005

Dejan Susa, Matti Lehtonen, and Hasse Nordman. Dynamic thermal modelling of power transformers.IEEE Transactions on Power Delivery, 20(1):197–204, 2005

2005

[9] [9]

Seddik, Jehan Shazly, and Magdy B

Mohamed S. Seddik, Jehan Shazly, and Magdy B. Eteiba. Thermal analysis of power transformer using 2D and 3D finite element method.Energies, 17(13):3203, 2024

2024

[10] [10]

Numerical investigation of oil flow and temperature distributions for ON transformer windings.Applied Thermal Engineering, 130:1–9, 2018

Xiang Zhang, Zhongdong Wang, and Qiang Liu. Numerical investigation of oil flow and temperature distributions for ON transformer windings.Applied Thermal Engineering, 130:1–9, 2018

2018

[11] [11]

Enhancing heat transfer in power transformer radiators via thermo-fluid dynamic analysis with periodic thermal boundary conditions

Jonathan J Dorella, Bruno A Storti, Gustavo A Ríos Rodriguez, and Mario A Storti. Enhancing heat transfer in power transformer radiators via thermo-fluid dynamic analysis with periodic thermal boundary conditions. International Journal of Heat and Mass Transfer, 222:125142, 2024

2024

[12] [12]

Impacts of startup, shutdown and load variation on transient temperature and thermal stress fields within blades of gas turbines.Journal of Thermal Science, 31(3):727–740, 2022

Liuxi Cai, Yanfang Hou, Fang Li, Yun Li, Shunsen Wang, and Jingru Mao. Impacts of startup, shutdown and load variation on transient temperature and thermal stress fields within blades of gas turbines.Journal of Thermal Science, 31(3):727–740, 2022

2022

[13] [13]

W. Z. Tang, L. Yang, W. Zhu, et al. Numerical simulation of temperature distribution and thermal-stress field in a turbine blade with multilayer-structure TBCs by a fluid-solid coupling method.Journal of Materials Science & Technology, 32(5):452–458, 2016

2016

[14] [14]

High-re solutions for incompressible flow using the navier-stokes equations and a multigrid method.Journal of computational physics, 48(3):387–411, 1982

UKNG Ghia, Kirti N Ghia, and CT Shin. High-re solutions for incompressible flow using the navier-stokes equations and a multigrid method.Journal of computational physics, 48(3):387–411, 1982

1982

[15] [15]

P. N. Shankar and M. D. Deshpande. Fluid mechanics in the driven cavity.Annual Review of Fluid Mechanics, 32(1):93–136, 2000

2000

[16] [16]

Wiley New York, 1996

Frank P Incropera, David P DeWitt, Theodore L Bergman, Adrienne S Lavine, et al.Fundamentals of heat and mass transfer, volume 6. Wiley New York, 1996

1996

[17] [17]

Recent advances in non-intrusive polynomial chaos and stochastic collocation methods for uncertainty analysis and design

Michael Eldred. Recent advances in non-intrusive polynomial chaos and stochastic collocation methods for uncertainty analysis and design. In50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference 17th AIAA/ASME/AHS Adaptive Structures Conference 11th AIAA No, page 2274, 2009

2009

[18] [18]

Princeton university press, 2010

Dongbin Xiu.Numerical methods for stochastic computations: a spectral method approach. Princeton university press, 2010

2010

[19] [19]

Physics- informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021

George Em Karniadakis, Ioannis G Kevrekidis, Lu Lu, Paris Perdikaris, Sifan Wang, and Liu Yang. Physics- informed machine learning.Nature Reviews Physics, 3(6):422–440, 2021

2021

[20] [20]

Brunton and J

Steven L. Brunton and J. Nathan Kutz.Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2nd edition, 2022

2022

[21] [21]

Karniadakis

Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.Journal of Computational Physics, 378:686–707, 2019

2019

[22] [22]

Jørgensen, and Hamidreza M

Teeratorn Kadeethum, Thomas M. Jørgensen, and Hamidreza M. Nick. Physics-informed neural networks for solving nonlinear diffusivity and Biot’s equations.PloS one, 15(5):e0232683, 2020

2020

[23] [23]

Jagtap, Ehsan Kharazmi, and George Em Karniadakis

Ameya D. Jagtap, Ehsan Kharazmi, and George Em Karniadakis. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems.Computer Methods in Applied Mechanics and Engineering, 365:113028, 2020

2020

[24] [24]

Jørgensen, and Hamidreza M

Teeratorn Kadeethum, Thomas M. Jørgensen, and Hamidreza M. Nick. Physics-informed neural networks for solv- ing inverse problems of nonlinear Biot’s equations: batch training. InARMA US Rock Mechanics/Geomechanics Symposium, pages ARMA–2020. ARMA, 2020

2020

[25] [25]

DeepXDE: A deep learning library for solving differential equations.SIAM Review, 63(1):208–228, 2021

Lu Lu, Xuhui Meng, Zhiping Mao, and George Em Karniadakis. DeepXDE: A deep learning library for solving differential equations.SIAM Review, 63(1):208–228, 2021

2021

[26] [26]

Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators.Nature Machine Intelligence, 3(3):218–229, 2021

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators.Nature Machine Intelligence, 3(3):218–229, 2021. 21

2021

[27] [27]

Fourier neural operator for parametric partial differential equations

Zongyi Li, Nikolas Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhatt, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations, 2021

2021

[28] [28]

Learning the solution operator of parametric partial differential equations with physics-informed DeepONets.Science Advances, 7(40):eabi8605, 2021

Sifan Wang, Hanwen Wang, and Paris Perdikaris. Learning the solution operator of parametric partial differential equations with physics-informed DeepONets.Science Advances, 7(40):eabi8605, 2021

2021

[29] [29]

Neural operator: Learning maps between function spaces with applications to PDEs.Journal of Machine Learning Research, 24(89):1–97, 2023

Nikolas Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhatt, Andrew Stuart, and Anima Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs.Journal of Machine Learning Research, 24(89):1–97, 2023

2023

[30] [30]

When and why PINNs fail to train: A neural tangent kernel perspective.Journal of Computational Physics, 449:110768, 2022

Sifan Wang, Xinling Yu, and Paris Perdikaris. When and why PINNs fail to train: A neural tangent kernel perspective.Journal of Computational Physics, 449:110768, 2022

2022

[31] [31]

Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W. Mahoney. Characterizing possible failure modes in physics-informed neural networks. 34:26548–26560, 2021

2021

[32] [32]

Revanth Mattey and Susanta Ghosh. A novel sequential method to train physics informed neural networks for Allen–Cahn and Cahn–Hilliard equations.Computer Methods in Applied Mechanics and Engineering, 390:114474, 2022

2022

[33] [33]

Understanding and mitigating gradient flow pathologies in physics-informed neural networks.SIAM Journal on Scientific Computing, 43(5):A3055–A3081, 2021

Sifan Wang, Yujun Teng, and Paris Perdikaris. Understanding and mitigating gradient flow pathologies in physics-informed neural networks.SIAM Journal on Scientific Computing, 43(5):A3055–A3081, 2021

2021

[34] [34]

Chenxi Wu, Min Zhu, Qinyang Tan, Yadhu Kartha, and Lu Lu. A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks.Computer Methods in Applied Mechanics and Engineering, 403:115671, 2023

2023

[35] [35]

SOAP: Improving and Stabilizing Shampoo using Adam

Nikhil Vyas, Depen Morwani, Rosie Zhao, Mujin Kwun, Itai Shapira, David Brandfonbrener, Lucas Janson, and Sham Kakade. Soap: Improving and stabilizing shampoo using adam.arXiv preprint arXiv:2409.11321, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[36] [36]

Gradient alignment in physics-informed neural networks: A second-order optimization perspective.arXiv preprint arXiv:2502.00604, 2025

Sifan Wang, Ananyae Kumar Bhartari, Bowen Li, and Paris Perdikaris. Gradient alignment in physics-informed neural networks: A second-order optimization perspective.arXiv preprint arXiv:2502.00604, 2025

work page arXiv 2025

[37] [37]

MIT press Cambridge, 2016

Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio.Deep learning, volume 1. MIT press Cambridge, 2016

2016

[38] [38]

Searching for Activation Functions

Prajit Ramachandran, Barret Zoph, and Quoc V Le. Searching for activation functions.arXiv preprint arXiv:1710.05941, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[39] [39]

Multilayer feedforward networks are universal approxi- mators.Neural Networks, 2(5):359–366, 1989

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approxi- mators.Neural Networks, 2(5):359–366, 1989

1989

[40] [40]

Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2(4):303–314, 1989

George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2(4):303–314, 1989

1989

[41] [41]

The expressive power of neural networks: A view from the width.Advances in neural information processing systems, 30, 2017

Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, and Liwei Wang. The expressive power of neural networks: A view from the width.Advances in neural information processing systems, 30, 2017

2017

[42] [42]

Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind

Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey.Journal of Machine Learning Research, 18(153):1–43, 2018

2018

[43] [43]

Respecting causality for training physics-informed neural networks.Computer Methods in Applied Mechanics and Engineering, 421:116813, 2024

Sifan Wang, Shyam Sankaran, and Paris Perdikaris. Respecting causality for training physics-informed neural networks.Computer Methods in Applied Mechanics and Engineering, 421:116813, 2024

2024

[44] [44]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[45] [45]

Shampoo: Preconditioned stochastic tensor optimization

Vineet Gupta, Tomer Koren, and Yoram Singer. Shampoo: Preconditioned stochastic tensor optimization. In International Conference on Machine Learning, pages 1842–1850. PMLR, 2018

2018

[46] [46]

Adafactor: Adaptive learning rates with sublinear memory cost

Noam Shazeer and Mitchell Stern. Adafactor: Adaptive learning rates with sublinear memory cost. InInternational conference on machine learning, pages 4596–4604. PMLR, 2018

2018

[47] [47]

Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J

Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C J Carey, ˙Ilhan Polat, Yu Feng, Eric W. M...

2020

[48] [48]

Harris, K

Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Shepp...

2020

[49] [49]

Keras.https://keras.io, 2015

François Chollet et al. Keras.https://keras.io, 2015

2015

[50] [50]

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murra...

2015

[51] [51]

J. D. Hunter. Matplotlib: A 2d graphics environment.Computing in Science & Engineering, 9(3):90–95, 2007

2007

[52] [52]

Pressure-robust enriched galerkin finite element methods for coupled navier-stokes and heat equations.arXiv preprint arXiv:2512.16716, 2025

Sanjeeb Poudel, Sanghyun Lee, and Lin Mu. Pressure-robust enriched galerkin finite element methods for coupled navier-stokes and heat equations.arXiv preprint arXiv:2512.16716, 2025

work page arXiv 2025

[53] [53]

The deal

Daniel Arndt, Wolfgang Bangerth, Maximilian Bergbauer, Marco Feder, Marc Fehling, Johannes Heinz, Timo Heister, Luca Heltai, Martin Kronbichler, Matthias Maier, et al. The deal. ii library, version 9.5.Journal of Numerical Mathematics, 31(3):231–246, 2023

2023

[54] [54]

Gauthier-Villars, 1897

Joseph Boussinesq.Théorie de l’écoulement tourbillonnant et tumultueux des liquides dans les lits rectilignes à grande section..., volume 1. Gauthier-Villars, 1897. 23