
arxiv: 2606.22150 · v1 · pith:LQZWYVVRnew · submitted 2026-06-20 · 💻 cs.LG

Parameterized Representations via Implicit Stochastic Modulation for High-Dimensional and High-Order Neural PDE Solvers

Pith reviewed 2026-06-26 12:18 UTC · model grok-4.3

classification 💻 cs.LG
keywords neural PDE solvers · stochastic derivative estimators · parameterized PDEs · high-dimensional PDEs · implicit modulation · hyper-generator · automatic differentiation · zero-shot generalization

The pith

PRISM maps physical parameters to affine modulators on a spatial latent manifold to keep high-order derivative graphs free of parameter entanglement in neural PDE solvers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Directly conditioning stochastic neural PDE solvers on physical parameters entangles those parameters with the automatic-differentiation graph for high-order derivatives, which inflates memory and amplifies variance. PRISM introduces a hyper-generator that converts parameters into affine modulators applied only to a purely spatial latent representation. The modulators keep parameter branches value-connected yet spatially tangent-disconnected, so the original unbiased stochastic dimension and Taylor estimators remain valid. The framework supplies proofs of parameterized unbiasedness, error bounds, and convergence, and experiments demonstrate zero-shot generalization together with scaling to 100,000 dimensions on one GPU.

Core claim

PRISM uses a hyper-generator to map physical parameters to affine modulators that scale and shift a purely spatial latent manifold, while keeping parameter branches value-connected but spatial-tangent-disconnected. This design preserves unbiased stochastic dimension and Taylor estimators, removes the parameter encoder from high-order spatial AD, and provides a variance-aware Lipschitz envelope over the parameter space. The paper proves parameterized unbiasedness, estimation-error bounds, and convergence under bounded stochastic variance.

What carries the argument

Hyper-generator that produces affine modulators (scale and shift) applied to a purely spatial latent manifold, enforcing spatial-tangent-disconnection from parameter branches.
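
The value-connected but spatial-tangent-disconnected idea can be illustrated with a toy forward-mode sketch. Everything here is an assumption made for illustration, not the paper's implementation: the `Dual` class, the closed-form `hyper_generator`, and the one-dimensional `spatial_latent` are hypothetical stand-ins. The point is only that the scale and shift enter as plain values, so the spatial tangent flows through the latent alone.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode number: a value plus its tangent with respect to x."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def _lift(v):
    # Plain numbers are constants: their spatial tangent is zero.
    return v if isinstance(v, Dual) else Dual(float(v))


def dtanh(a: Dual) -> Dual:
    t = math.tanh(a.val)
    return Dual(t, (1.0 - t * t) * a.tan)


def hyper_generator(mu: float):
    """Hypothetical map from a parameter to (scale, shift).
    It returns plain floats, so it never carries a spatial tangent."""
    return 1.0 + 0.5 * mu, math.sin(mu)


def spatial_latent(x: Dual) -> Dual:
    """Hypothetical purely spatial latent feature z(x)."""
    return dtanh(0.7 * x + 0.1)


def model(x: Dual, mu: float) -> Dual:
    gamma, beta = hyper_generator(mu)
    return gamma * spatial_latent(x) + beta   # affine modulation


x0, mu0 = 0.3, 2.0
u = model(Dual(x0, 1.0), mu0)          # seed the tangent dx/dx = 1
z = spatial_latent(Dual(x0, 1.0))
gamma0, beta0 = hyper_generator(mu0)
print(u.tan, gamma0 * z.tan)           # the two spatial derivatives agree
```

In this sketch the shift drops out of the spatial derivative entirely and the scale multiplies it as a constant, which is the behaviour the summary attributes to the modulators.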

If this is right

  • Parameterized unbiasedness of stochastic derivative estimators holds after modulation.
  • Estimation-error bounds and convergence under bounded variance extend to the parameterized case.
  • Zero-shot generalization to unseen parameters occurs without retraining the spatial solver.
  • Memory growth is avoided because the parameter encoder is excluded from high-order spatial AD.
  • Low-rank SVD adaptation enables efficient handling of new parameters at 100,000 dimensions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same modulator construction could be inserted into other stochastic-gradient PDE methods that already rely on dimension or Taylor estimators.
  • The variance-aware Lipschitz envelope supplies a natural way to quantify how solution uncertainty grows with parameter distance.
  • Low-rank adaptation of the modulators suggests a route to rapid fine-tuning when only a few new parameter samples become available.
  • The separation of parameter and spatial graphs may reduce the cost of multi-query or inverse problems that repeatedly evaluate the PDE at different parameters.

Load-bearing premise

The spatial-tangent-disconnected property between parameter branches and the spatial manifold can be maintained in practice without introducing bias or extra variance into the stochastic derivative estimators.

What would settle it

A numerical check showing whether stochastic derivative estimates acquire measurable bias once the learned affine modulators are applied to a parameter value outside the training set.
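
A check of this kind could be sketched as follows, under loud assumptions: the learned modulators are replaced by a hypothetical closed-form `hyper_generator`, the spatial latent is a toy `sin(a . x)`, and the estimator is a single-index dimension estimator of the Laplacian. None of these names or choices come from the paper. The sketch compares the exact expectation of the estimator with the true Laplacian at a parameter value treated as unseen.

```python
import math
import random

D = 6
A = [0.3 + 0.1 * i for i in range(D)]   # hypothetical fixed spatial weights


def hyper_generator(mu):
    """Hypothetical stand-in for a learned map mu -> (scale, shift)."""
    return 1.0 + 0.5 * mu, math.sin(mu)


def second_partial(x, mu, i):
    """Exact d^2 u / dx_i^2 for u(x; mu) = gamma(mu) * sin(a . x) + beta(mu)."""
    gamma, _beta = hyper_generator(mu)
    s = sum(a * xi for a, xi in zip(A, x))
    return -gamma * A[i] ** 2 * math.sin(s)


def laplacian(x, mu):
    return sum(second_partial(x, mu, i) for i in range(D))


def one_sample_estimate(x, mu, i):
    """Single-index estimator: D times one chosen second partial."""
    return D * second_partial(x, mu, i)


def exact_expectation(x, mu):
    """Average the estimator over all D equally likely index draws."""
    return sum(one_sample_estimate(x, mu, i) for i in range(D)) / D


def monte_carlo_mean(x, mu, n, rng):
    draws = (one_sample_estimate(x, mu, rng.randrange(D)) for _ in range(n))
    return sum(draws) / n


x = [0.2 * (i + 1) for i in range(D)]
mu_unseen = 7.5   # treated here as lying outside the training range
bias = exact_expectation(x, mu_unseen) - laplacian(x, mu_unseen)
mc = monte_carlo_mean(x, mu_unseen, 20000, random.Random(0))
print(bias, mc, laplacian(x, mu_unseen))
```

In this toy setting the bias vanishes up to floating-point error for any parameter value, because the modulators only rescale and shift the function being differentiated. A real test would substitute the trained hyper-generator and spatial network and look for a systematic gap.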

Figures

Figures reproduced from arXiv: 2606.22150 by Huanhuan Gao, Zhangyong Liang.

Figure 1: PRISM enables zero-shot parameter generalization across high-dimensional PDE families.
Figure 2: Motivation results on the parameterized high-dimensional Sine-Gordon equation.
Figure 4: Convergence of PRISM on three parameterized PDE families across dimensions
Figure 5: Relative gradient-variance growth under increasing parameter magnitude
Figure 6: Relative L1 and L2 error decay over training epochs under different β values.
Figure 7: Interpolation and extrapolation over the reaction coefficient
Figure 8: High-dimensional parameterized convection visualization for PRISM on the
Figure 9: Time-space visualization of the high-dimensional parameterized convection problem on the
Figure 10: Multi-equation zero-shot extrapolation across the full parameter range.
Figure 11: Multi-equation per-step wall-clock time comparison across spatial dimensions.
Figure 12: Multi-equation final loss comparison across spatial dimensions.
Figure 13: Ablation study on the PRISM scale and shift modulation components.
Figure 14: Zero-shot testing convergence of the predicted minimum eigenvalue
Abstract

Solving high-dimensional and high-order PDEs is challenged by the coupled growth of spatial dimensionality and derivative order. Recent stochastic derivative estimators reduce this cost by replacing full derivative tensors with randomized dimension or Taylor estimators, but they are mostly designed for fixed physical parameters and require retraining for each new parameter. We show that direct conditional parameterization of such solvers entangles physical parameters with the high-order automatic differentiation graph, causing extra memory growth and parameter-induced variance amplification. We propose Parameterized Representations via Implicit Stochastic Modulation (PRISM), a plug-and-play framework for parameterized high-dimensional and high-order stochastic neural PDE solvers. PRISM uses a hyper-generator to map physical parameters to affine modulators that scale and shift a purely spatial latent manifold, while keeping parameter branches value-connected but spatial-tangent-disconnected. This design preserves unbiased stochastic dimension and Taylor estimators, removes the parameter encoder from high-order spatial AD, and provides a variance-aware Lipschitz envelope over the parameter space. We prove parameterized unbiasedness, estimation-error bounds, and convergence under bounded stochastic variance. Experiments with PRISM-STDE and PRISM-SDGD on nonlinear parameterized PDEs show stable zero-shot generalization, reduced memory usage, and scalability up to 100,000 dimensions on a single GPU, with efficient low-rank SVD adaptation for unseen parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper proposes PRISM, a plug-and-play framework for parameterized high-dimensional and high-order stochastic neural PDE solvers. It uses a hyper-generator to map physical parameters to affine modulators (scale and shift) applied to a purely spatial latent manifold, keeping parameter branches value-connected but spatial-tangent-disconnected. This preserves unbiased stochastic dimension and Taylor estimators, removes the parameter encoder from high-order spatial AD, and provides a variance-aware Lipschitz envelope. The authors prove parameterized unbiasedness, estimation-error bounds, and convergence under bounded stochastic variance. Experiments with PRISM-STDE and PRISM-SDGD on nonlinear parameterized PDEs demonstrate stable zero-shot generalization, reduced memory usage, and scalability up to 100,000 dimensions on a single GPU, with efficient low-rank SVD adaptation for unseen parameters.

Significance. If the central claims hold, PRISM addresses a key limitation in extending stochastic derivative estimators to parameterized settings by structurally decoupling parameter dependence from the spatial differentiation graph. This enables zero-shot generalization across parameter spaces without retraining or parameter-induced variance amplification, while maintaining unbiasedness and providing convergence guarantees. The design's explicit preservation of stochastic estimators and the experimental scalability to extreme dimensions represent a meaningful contribution to high-dimensional neural PDE solving.

minor comments (3)
  1. [Abstract] The claim of 'efficient low-rank SVD adaptation for unseen parameters' is stated without reference to the specific adaptation procedure or its integration with the hyper-generator; a brief description or forward reference would improve clarity.
  2. [Methods] The notation for the hyper-generator output (affine modulators) and the spatial latent manifold could be introduced with explicit equations in the methods section to make the value-connected vs. tangent-disconnected distinction immediately formal.
  3. [Experiments] While scalability to 100,000 dimensions is reported, the memory and variance comparisons would benefit from an explicit baseline table showing the entangled parameterization case to quantify the claimed reduction.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report accurately reflects the core contributions of PRISM in decoupling parameters from the differentiation graph while preserving unbiased stochastic estimators.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central design (hyper-generator mapping parameters to affine modulators on a spatial latent manifold, with value-connected but spatial-tangent-disconnected branches) is presented as an architectural choice that by definition removes the parameter encoder from high-order AD graphs. The abstract states that this preserves unbiased stochastic estimators and that proofs of parameterized unbiasedness, error bounds, and convergence exist under bounded variance. No equations, self-citations, or fitted inputs are quoted that reduce the claimed proofs or unbiasedness results to the inputs by construction. The derivation chain remains self-contained against external benchmarks, with the proofs and experiments providing independent content beyond the design description.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Based solely on the abstract, the central claim rests on the prior existence of unbiased stochastic dimension and Taylor estimators (domain assumption) and introduces a new hyper-generator component whose mapping to modulators is not independently verified outside the paper.

axioms (1)
  • [domain assumption] Stochastic dimension and Taylor estimators remain unbiased when the spatial manifold is modulated by affine transforms derived from a separate hyper-generator.
    Invoked in the description of how PRISM preserves estimator properties.

invented entities (2)
  • hyper-generator (no independent evidence)
    purpose: Maps physical parameters to affine modulators for the spatial latent manifold.
    New architectural component introduced to achieve parameterization without entanglement.
  • affine modulators (no independent evidence)
    purpose: Scale and shift the spatial latent manifold in a parameter-dependent but spatially tangent-disconnected manner.
    Core mechanism of the proposed framework.
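
The domain assumption listed above can be written out for one simple case. The model form and the symbols g, z, γ, β, and I are notation assumed here for illustration and are not taken from the paper. The identity covers only a single-index dimension estimator of the Laplacian with the index drawn independently of the parameter.

```latex
% Assumed model form: u_\mu(x) = g\bigl(\gamma(\mu) \odot z(x) + \beta(\mu)\bigr), \quad x \in \mathbb{R}^d.
% For a fixed parameter \mu, u_\mu is an ordinary function of x, so drawing an
% index I uniformly from \{1, \dots, d\} gives
\mathbb{E}_{I}\!\left[ d \, \frac{\partial^2 u_\mu}{\partial x_I^2}(x) \right]
  = \sum_{i=1}^{d} \frac{1}{d} \cdot d \, \frac{\partial^2 u_\mu}{\partial x_i^2}(x)
  = \Delta_x u_\mu(x).
```

The parameter enters only through the values of γ(μ) and β(μ), which is why no derivative with respect to μ appears on either side.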


