Physics-Informed Neural Network with Squeeze-Excitation-like Attention

Fu-Peng Li; Jun-Jie Zhang; Long-Gang Pang; Yun-Fei Song

arxiv: 2606.19853 · v1 · pith:DN4NQR7Nnew · submitted 2026-06-18 · 💻 cs.LG · physics.comp-ph

Physics-Informed Neural Network with Squeeze-Excitation-like Attention

Yun-Fei Song , Long-Gang Pang , Fu-Peng Li , Jun-Jie Zhang This is my paper

Pith reviewed 2026-06-26 18:00 UTC · model grok-4.3

classification 💻 cs.LG physics.comp-ph

keywords physics-informed neural networkssqueeze-excitation attentionstable initializationbenchmark problemshigh-frequency problemsneural network architecture

0 comments

The pith

A squeeze-excitation attention module added to physics-informed neural networks yields stable initialization and competitive accuracy on 17 of 20 benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SEA-PINN, which embeds a squeeze-excitation-like attention mechanism into physics-informed neural networks to recalibrate neuron importance across layers. This produces nearly negligible variance and lower initial loss on 17 out of 20 benchmark problems, creating a more deterministic starting point for optimization. The architecture reaches accuracy comparable to models designed for high-frequency cases even without Fourier feature embeddings or periodic activations. When combined with TSA-PINN, it improves performance by 42.49 percent, positioning the module as a lightweight addition that strengthens convergence reliability.

Core claim

SEA-PINN incorporates a Squeeze-Excitation-like attention mechanism into physics-informed neural networks to dynamically recalibrate the importance of neurons across layers. On 17 out of 20 benchmark problems, SEA-PINN exhibits nearly negligible variance and significantly reduced initial loss, establishing a quasi-deterministic and favorable starting point for optimization. Without employing Fourier feature embeddings or periodic activation functions, SEA-PINN attains competitive accuracy compared with TSA-PINN, and integrating SEA-PINN into TSA-PINN boosts performance by 42.49 percent.

What carries the argument

The squeeze-excitation-like attention mechanism that dynamically recalibrates neuron importance across layers in the PINN architecture.

If this is right

SEA-PINN functions as a lightweight plug-in module that enhances nonlinear representation power in physics-informed networks.
The architecture promotes more robust and efficient convergence during optimization.
SEA-PINN strengthens the overall reliability of physics-informed learning across the tested benchmarks.
Integration of the module into specialized models such as TSA-PINN produces measurable performance gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The stable initialization property could reduce the computational cost of repeated random restarts when applying PINNs to new problems.
The attention recalibration might interact differently with other common PINN enhancements such as adaptive weighting or domain decomposition.
Extending the benchmarks to include time-dependent or multi-scale problems would test whether the observed stability holds beyond the current set.

Load-bearing premise

The 20 benchmark problems and evaluation protocol represent the distribution of physics problems that practitioners solve, and the attention mechanism does not systematically bias the physics residual terms outside these cases.

What would settle it

Training SEA-PINN on a new high-frequency PDE problem outside the given 20 benchmarks and measuring high variance in initial loss or accuracy below that of TSA-PINN would falsify the stability and competitiveness claims.

Figures

Figures reproduced from arXiv: 2606.19853 by Fu-Peng Li, Jun-Jie Zhang, Long-Gang Pang, Yun-Fei Song.

**Figure 2.** Figure 2: FIG. 2: Difference from Ground Truth and training convergence. [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: FIG. 3: Comparison of loss components for NS2D_BackStep (Case 11) [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: FIG. 4: Jacobian-based sensitivity of FNN-PINN and SEA-PINN at initialization. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: FIG. 5: Heat2D_ Multiscale(ID=7):Heatmaps of FNN-PINN and SEA-PINN at [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: FIG. 6: Architecture of SEA-PINN within PINNs: The complete set of outputs from the [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

read the original abstract

We introduce SEA-PINN, a novel architecture that incorporates a Squeeze-Excitation-like attention mechanism into physics-informed neural networks to dynamically recalibrate the importance of neurons across layers. A key feature of SEA-PINN is its highly stable initialization. On 17 out of 20 benchmark problems, SEA-PINN exhibit nearly negligible variance and significantly reduced initial loss, establishing a quasi-deterministic and favorable starting point for optimization. Notably, without employing Fourier feature embeddings or periodic activation functions, SEA-PINN attained competitive accuracy (83\% vs. 90\% improvement relative to FNN-PINN on the high-frequency case 7) as compared with TSA-PINN-a model specifically engineered for high-frequency problems via learnable frequencies in sinusoidal activations. Furthermore, integrating SEA-PINN into TSA-PINN boosted performance by 42.49\%. These results underscore SEA-PINN as a lightweight plug-in module that enhances nonlinear representation power, promotes more robust and efficient convergence, and strengthens the overall reliability of physics-informed learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SEA-PINN adds a squeeze-excitation block to PINNs and reports much lower run-to-run variance plus competitive high-frequency accuracy on 17 of 20 benchmarks, but the benchmark set itself is the main open question.

read the letter

The new piece is the squeeze-excitation-like attention inserted into the PINN layers to recalibrate neuron weights dynamically. The abstract shows this produces nearly flat variance and lower initial loss on 17 of the 20 test problems, which would be a practical win if it survives scrutiny. They also get 83% of the accuracy lift of a specialized high-frequency model on case 7 without Fourier features or periodic activations, and a further 42% gain when the block is dropped into that model.

Those numbers are the clearest part of the work. The architecture is simple enough that anyone already running PINNs could try the block as a drop-in.

The soft spot is the same one flagged in the stress-test note. The claims rest on a fixed collection of 20 benchmarks with no visible argument that the problems or their collocation distributions match the range of PDEs people actually solve. Nothing in the abstract addresses whether the attention changes gradient flow through the physics residual on problems outside that set. Without ablations on the attention itself, details on network depth and loss weighting, or tests on harder or out-of-distribution cases, the "lightweight plug-in" conclusion stays provisional.

This paper is for groups already using or extending PINNs who want a low-cost way to reduce training variance. It is coherent on its own terms and the empirical claims are falsifiable, so it deserves a serious referee even though the generalization argument needs more support.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces SEA-PINN, a PINN variant that inserts a Squeeze-Excitation-like attention block to dynamically recalibrate channel-wise neuron importance across layers. The central empirical claims are that the architecture yields quasi-deterministic initialization (negligible variance and markedly lower initial loss on 17 of 20 benchmark PDEs), achieves competitive accuracy on high-frequency problems without Fourier features or periodic activations (83 % improvement relative to FNN-PINN on case 7, versus 90 % for TSA-PINN), and delivers an additional 42.49 % gain when the same attention module is grafted onto TSA-PINN.

Significance. If the reported stability and accuracy gains are reproducible under standard experimental controls, the work would supply a lightweight, architecture-agnostic module that improves training reliability for PINNs without requiring specialized activations or feature mappings. Such a module could be adopted as a default component in PINN pipelines, particularly for problems where initialization variance currently forces extensive hyper-parameter search.

major comments (2)

[Results paragraph] Results paragraph (and any accompanying tables/figures): the headline claims of “nearly negligible variance” and “significantly reduced initial loss” on 17/20 problems are presented without network widths, depth, loss-term weights, optimizer settings, number of independent runs, or statistical tests. These omissions make it impossible to assess whether the reported stability is an artifact of the chosen experimental protocol rather than a property of the attention mechanism.
[Abstract / Results paragraph] Abstract and results paragraph: the assertion that SEA-PINN is a “reliable plug-in” for arbitrary physics problems rests on a fixed suite of 20 benchmarks. No analysis is supplied showing that the collocation-point distributions or PDE types are representative of the broader distribution of problems encountered in practice, nor is there a test demonstrating that the channel-wise recalibration does not systematically alter gradient flow through the physics residual on problems outside this suite.

minor comments (1)

[Abstract] The sentence “SEA-PINN exhibit nearly negligible variance” contains a subject-verb agreement error.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

read point-by-point responses

Referee: [Results paragraph] Results paragraph (and any accompanying tables/figures): the headline claims of “nearly negligible variance” and “significantly reduced initial loss” on 17/20 problems are presented without network widths, depth, loss-term weights, optimizer settings, number of independent runs, or statistical tests. These omissions make it impossible to assess whether the reported stability is an artifact of the chosen experimental protocol rather than a property of the attention mechanism.

Authors: We agree that these details were insufficiently specified. In the revised manuscript we will add an experimental setup section reporting network widths and depths, loss-term weights, optimizer settings, the number of independent runs performed (five), and standard deviations with basic statistical comparisons. This will allow readers to evaluate whether the observed stability arises from the attention mechanism. revision: yes
Referee: [Abstract / Results paragraph] Abstract and results paragraph: the assertion that SEA-PINN is a “reliable plug-in” for arbitrary physics problems rests on a fixed suite of 20 benchmarks. No analysis is supplied showing that the collocation-point distributions or PDE types are representative of the broader distribution of problems encountered in practice, nor is there a test demonstrating that the channel-wise recalibration does not systematically alter gradient flow through the physics residual on problems outside this suite.

Authors: We accept that the original wording overstated generality. We will revise the abstract and results text to qualify the claims as applying to the evaluated benchmark suite. We will also add a short gradient-flow analysis for the attention module and a limitations paragraph discussing the scope of the 20 benchmarks. A full representativeness study across all possible PDEs lies outside the present scope. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical architecture proposal with external benchmarks

full rationale

The paper proposes SEA-PINN as an architectural modification (Squeeze-Excitation-like attention inserted into PINNs) and evaluates it empirically on a fixed suite of 20 benchmark PDE problems. No derivation chain, uniqueness theorem, ansatz, or prediction is presented that reduces by construction to quantities defined or fitted inside the paper itself. Claims rest on reported loss curves, variance statistics, and accuracy deltas versus external baselines (FNN-PINN, TSA-PINN), none of which are shown to be self-referential or forced by the authors' own prior definitions. The evaluation protocol is external to any internal fitting loop, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, mathematical axioms, or new physical entities; the architecture itself is the proposed addition.

pith-pipeline@v0.9.1-grok · 5712 in / 1106 out tokens · 37390 ms · 2026-06-26T18:00:24.878073+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

71 extracted references · 14 canonical work pages · 7 internal anchors

[1]

Linear Transformation: z(l) =W (l)h(l−1) +b (l) (3)
[2]

Nonlinear Activation: a(l) =σ (l)(z(l)) wherea (l) ∈R Nl
[3]

6: Architecture of SEA-PINN within PINNs: The complete set of outputs from thenth hidden layer is taken as input to the weight generator, producing a weight vector

Weight Generation (WG): The activationa (l) is processed by a lightweight squeeze-excitation block to 11 𝜕𝑡 𝑛 𝜕𝑥 𝑛 ℒ = ℒ𝑃𝐷𝐸 + ℒ𝐼𝐶 + ℒ𝐵𝐶 < 𝜖? End N Y 𝜕ℒ 𝜕𝛩Update 𝛩 AD Loss Squeeze-Excitation like Attention× n Output Input … … … … … FIG. 6: Architecture of SEA-PINN within PINNs: The complete set of outputs from thenth hidden layer is taken as input to the w...
[4]

Weighted Output: h(l) =λ (l) ⊙a (l) (7) This mechanism enables SEA-PINN to dynamically adjust the contribution of each neuron. If a neuron’s ac- tivation is less relevant for a specific input region in the current task, the weight generator can assign a smaller weight to suppress its influence; conversely, a larger weight can be assigned to enhance its ro...
[5]

reduces to the usual activation a(l) FNN =σ (l) z(l) ,(8) wherez (l) is given by Eq. (3). Since the nonlinearity is applied element-wise, the local Jacobian ofa (l) FNN with respect toz (l) is diagonal: D(l) FNN = ∂a(l) FNN ∂z(l) = diag σ(l)′ (z(l)) .(9) The output layer is purely linear, fFNN =W (L)a(L−1) +b (L),(10) so the Jacobian of the network output...
[6]

This ensures a compre- hensive assessment of model performance

Benchmark Suite and Evaluation Metrics We evaluated the proposed models on 20 challenging PDE problems selected from a recognized PINNs bench- mark suite, covering diverse domains including heat con- duction, fluid dynamics, biology, electromagnetics, and high-dimensional problems [41]. This ensures a compre- hensive assessment of model performance. As al...
[7]

Unless otherwise specified, all weights are initialized using He uniform initialization and biases are set to zero

Network Configuration and Training For a fair comparison, all neural network mod- els (FNN-PINN, SEA-PINN, TSA-PINN, TSA_SEA- PINN) share the same backbone architecture, consist- ing of 9 hidden layers with 32 neurons per layer. Unless otherwise specified, all weights are initialized using He uniform initialization and biases are set to zero. Activation F...
[8]

Advanced Detection and Artificial Intelligence at the Frontiers of Physics

for chaotic problems. These settings are kept consistent across all models to ensure that any performance difference arises from the architectural innovations rather than training hyperpa- rameters. Supplementary Information: detailed training loss analysis, total training loss comparison, individual loss component comparison, initial loss behavior of SEA...

2025
[9]

G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, Nature Reviews Physics3, 422 (2021)

2021
[10]

A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, Journal of machine learning research18, 1 (2018)

2018
[11]

Lee and I

H. Lee and I. S. Kang, Journal of Computational Physics 91, 110 (1990)

1990
[12]

A. J. Meade Jr and A. A. Fernandez, Mathematical and Computer Modelling20, 19 (1994)

1994
[13]

M. G. Dissanayake and N. Phan-Thien, communications in Numerical Methods in Engineering10, 195 (1994)

1994
[14]

Lagaris, A

I. Lagaris, A. Likas, and D. Fotiadis, IEEE Transactions on Neural Networks9, 987 (1998)

1998
[15]

G. E. Hinton and R. R. Salakhutdinov, science313, 504 (2006)

2006
[16]

Raissi, P

M. Raissi, P. Perdikaris, and G. Karniadakis, Journal of Computational Physics378, 686 (2019)

2019
[17]

L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis, SIAM review63, 208 (2021)

2021
[18]

X. Jin, S. Cai, H. Li, and G. E. Karniadakis, Journal of Computational Physics426, 109951 (2021)

2021
[19]

R. Qiu, R. Huang, Y. Xiao, J. Wang, Z. Zhang, J. Yue, Z. Zeng, and Y. Wang, Physics of Fluids34(2022)

2022
[20]

Y. Wang, J. Sun, W. Li, Z. Lu, and Y. Liu, Com- puter Methods in Applied Mechanics and Engineering 400, 115491 (2022)

2022
[21]

Z. Wu, H. Zhang, H. Ye, H. Zhang, Y. Zheng, and X. Guo, Acta Mechanica235, 4895 (2024)

2024
[22]

Y. Chen, L. Lu, G. E. Karniadakis, and L. Dal Negro, Optics express28, 11618 (2020)

2020
[23]

Fang and J

Z. Fang and J. Zhan, Ieee Access8, 24506 (2019)

2019
[24]

P.Ren, C.Rao, S.Chen, J.-X.Wang, H.Sun, andY.Liu, Computer Physics Communications295, 109010 (2024)

2024
[25]

L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, Nature machine intelligence3, 218 (2021)

2021
[26]

S. Wang, H. Wang, and P. Perdikaris, Science advances 7, eabi8605 (2021)

2021
[27]

L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, and G. E. Karniadakis, Computer Methods in Applied Mechanics and Engineering393, 114778 (2022)

2022
[28]

P. Jin, S. Meng, and L. Lu, SIAM Journal on Scientific Computing44, A3490 (2022)

2022
[29]

M. Zhu, S. Feng, Y. Lin, and L. Lu, Computer Meth- ods in Applied Mechanics and Engineering416, 116300 (2023)

2023
[30]

Jiang, M

Z. Jiang, M. Zhu, and L. Lu, Reliability Engineering & System Safety251, 110392 (2024)

2024
[31]

M.Zhu, H.Zhang, A.Jiao, G.E.Karniadakis, andL.Lu, Computer Methods in Applied Mechanics and Engineer- ing412, 116064 (2023)

2023
[32]

Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, in International Conference on Learning Representations (2021)

2021
[33]

A. Jiao, H. He, R. Ranade, J. Pathak, and L. Lu, Nature Communications16, 8386 (2025)

2025
[34]

Rahaman, A

N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville, inProceed- ings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 97, edited by K. Chaudhuri and R. Salakhutdinov (PMLR, 2019) pp. 5301–5310

2019
[35]

Tancik, P

M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich- Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. Bar- ron, and R. Ng, Advances in neural information process- ing systems33, 7537 (2020)

2020
[36]

A. D. Jagtap, K. Kawaguchi, and G. Em Karni- adakis,ProceedingsoftheRoyalSocietyA476,20200334 (2020)

2020
[37]

H. Gao, L. Sun, and J.-X. Wang, Journal of Computa- tional Physics428, 110079 (2021)

2021
[38]

Pfaff, M

T. Pfaff, M. Fortunato, A. Sanchez-Gonzalez, and P. Battaglia, inInternational conference on learning rep- 15 resentations(2020)

2020
[39]

Z. Zhao, X. Ding, and B. A. Prakash, arXiv preprint arXiv:2307.11833 (2023)

work page arXiv 2023
[40]

A. D. Jagtap, E. Kharazmi, and G. E. Karniadakis, Computer Methods in Applied Mechanics and Engineer- ing365, 113028 (2020)

2020
[41]

A. D. Jagtap and G. E. Karniadakis, Communications in Computational Physics28(2020)

2020
[42]

C. Wu, M. Zhu, Q. Tan, Y. Kartha, and L. Lu, Com- puter Methods in Applied Mechanics and Engineering 403, 115671 (2023)

2023
[43]

Respecting causality is all you need for training physics-informed neural networks,

S. Wang, S. Sankaran, and P. Perdikaris, “Respecting causality is all you need for training physics-informed neural networks,” (2022), arXiv:2203.07404 [cs.LG]

work page arXiv 2022
[44]

S. Wang, X. Yu, and P. Perdikaris, Journal of Compu- tational Physics449, 110768 (2022)

2022
[45]

L. D. McClenny and U. M. Braga-Neto, Journal of Com- putational Physics474, 111722 (2023)

2023
[46]

S. Wang, Y. Teng, and P. Perdikaris, SIAM Journal on Scientific Computing43, A3055 (2021), https://doi.org/10.1137/20M1318043

work page doi:10.1137/20m1318043 2021
[47]

Fp64 is all you need: rethinking failure modes in physics-informed neural networks.arXiv preprint arXiv:2505.10949, 2025

C. Xu, D. Liu, A. Nassereldine, and J. Xiong, “Fp64 is all you need: Rethinking failure modes in physics-informed neural networks,” (2025), arXiv:2505.10949 [cs.LG]

work page arXiv 2025
[48]

J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, arXiv e-prints , arXiv:1709.01507 (2017), arXiv:1709.01507 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2017
[49]

Zhang, S

H.Zhongkai, J.Yao, C.Su, H.Su, Z.Wang, F.Lu, Z.Xia, Y. Zhang, S. Liu, L. Lu,et al., Advances in Neural In- formation Processing Systems37, 76721 (2024)

2024
[50]

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Er- han, I. Goodfellow, and R. Fergus, arXiv preprint arXiv:1312.6199 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013
[51]

S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl- Dickstein, arXiv preprint arXiv:1611.01232 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[52]

Pennington, S

J. Pennington, S. Schoenholz, and S. Ganguli, Advances in neural information processing systems30(2017)

2017
[53]

Sensitivity and Generalization in Neural Networks: an Empirical Study

R. Novak, Y. Bahri, D. A. Abolafia, J. Pennington, and J. Sohl-Dickstein, arXiv preprint arXiv:1802.08760 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[54]

Ross and F

A. Ross and F. Doshi-Velez, inProceedings of the AAAI conference on artificial intelligence, Vol. 32 (2018)

2018
[55]

Pennington, S

J. Pennington, S. Schoenholz, and S. Ganguli, inInter- national Conference on Artificial Intelligence and Statis- tics(PMLR, 2018) pp. 1924–1932

2018
[56]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Advances in neural information processing systems30 (2017)

2017
[57]

Z. Qin, W. Sun, D. Li, X. Shen, W. Sun, and Y. Zhong, arXiv preprint arXiv:2401.04658 (2024)

work page arXiv 2024
[58]

Y. Sun, L. Dong, S. Huang, S. Ma, Y. Xia, J. Xue, J. Wang, and F. Wei, arXiv preprint arXiv:2307.08621 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[59]

Z. Qiu, Z. Wang, B. Zheng, Z. Huang, K. Wen, S. Yang, R. Men, L. Yu, F. Huang, S. Huang,et al., arXiv preprint arXiv:2505.06708 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[60]

Z. Lin, E. Nikishin, X. O. He, and A. Courville, arXiv preprint arXiv:2503.02130 (2025)

work page arXiv 2025
[61]

M. E. Larkum, W. Senn, and H.-R. Lüscher, Cerebral cortex14, 1059 (2004)

2004
[62]

Gidon, T

A. Gidon, T. A. Zolnik, P. Fidzinski, F. Bolduan, A. Pa- poutsi, P. Poirazi, M. Holtkamp, I. Vida, and M. E. Larkum, Science367, 83 (2020)

2020
[63]

D. P. Kingma, arXiv preprint arXiv:1412.6980 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014
[64]

Chebfun guide,

T. A. Driscoll, N. Hale, and L. N. Trefethen, “Chebfun guide,” (2014). 16 Supplementary Information S1. DETAILED TRAINING LOSS ANALYSIS This document provides supplementary results to support the claims made in the main text, demonstrating the consistentandsignificantadvantagesoftheSEA-PINNoverthestandardFNN-PINN.Wepresentdetailedtrainingloss comparisons ...

2014
[65]

In the top two plots, SEA-PINN’s average prediction (orange line) stays closer to the true function (dashed black line) in the extrapolation regions ( x<0 and x>1 ). Furthermore, its uncertainty band (orange shading) is significantly smaller than the FNN-PINN’s (blue shading), indicating that SEA-PINN’s predictions are not only more accurate but also more...
[66]

While both models learn the difficult features inside the training domain, the FNN-PINN produces highly unstable and incorrect predictions outside of it

The bottom two plots, which feature a sharp peak and a sudden jump, highlight this difference even more clearly. While both models learn the difficult features inside the training domain, the FNN-PINN produces highly unstable and incorrect predictions outside of it. In contrast, SEA-PINN defaults to a simple, flat prediction in these unknown regions, main...

work page arXiv 2031
[67]

a(k) N   =   ζ1 sin f (k) 1 z(k) 1 +ζ 2 cos f (k) 1 z(k) 1 ζ1 sin f (k) 2 z(k) 2 +ζ 2 cos f (k) 2 z(k) 2

Trainable Sine Activation (TSA) where the final output of thek-th layer is: a(k) =   a(k) 1 a(k) 2 ... a(k) N   =   ζ1 sin f (k) 1 z(k) 1 +ζ 2 cos f (k) 1 z(k) 1 ζ1 sin f (k) 2 z(k) 2 +ζ 2 cos f (k) 2 z(k) 2 ... ζ1 sin f (k) N z(k) N +ζ 2 cos f (k) N z(k) N   (S1) where: •z (k) i is the output of thei-th neuron in thek-th layer...
[68]

S(a) = 1 1 L−1 PL−1 k=1 exp 1 Nk PNk i=1 f (k) i (S2) where: •Lis the total number of layers, •N k is the number of neurons in thek-th layer

Slope Recovery Mechanism The slope recovery termS(a)dynamically adjusts the slope of the activation function, maintaining active and effective gradient propagation throughout the network. S(a) = 1 1 L−1 PL−1 k=1 exp 1 Nk PNk i=1 f (k) i (S2) where: •Lis the total number of layers, •N k is the number of neurons in thek-th layer. This slope recovery term is...
[69]

Compute the standard TSA-PINN activation outputa(l): a(l) i =ζ 1 sin(f (l) i z(l) i ) +ζ 2 cos(f (l) i z(l) i )(S4) wherez (l) i =w (l) i ·ˆ a(l−1) +b (l) i
[70]

Use the activation outputa(l) as input to the weight generator to compute the weight vectorλ(l): λ(l) =Sigmoid(W 2σ(W1a(l) +b 1) +b 2)(S5)
[71]

Element-wise multiply the weight vector with the activation output to obtain the final layer output: ˆ a(l) =λ (l) ⊙a (l) (S6) By combining TSA-PINN with our weighting module, the TSA_SEA-PINN model explores whether the adaptive weighting mechanism can serve as a general enhancement module, further improving the performance of existing advanced PINN archi...

2000

[1] [1]

Linear Transformation: z(l) =W (l)h(l−1) +b (l) (3)

[2] [2]

Nonlinear Activation: a(l) =σ (l)(z(l)) wherea (l) ∈R Nl

[3] [3]

6: Architecture of SEA-PINN within PINNs: The complete set of outputs from thenth hidden layer is taken as input to the weight generator, producing a weight vector

Weight Generation (WG): The activationa (l) is processed by a lightweight squeeze-excitation block to 11 𝜕𝑡 𝑛 𝜕𝑥 𝑛 ℒ = ℒ𝑃𝐷𝐸 + ℒ𝐼𝐶 + ℒ𝐵𝐶 < 𝜖? End N Y 𝜕ℒ 𝜕𝛩Update 𝛩 AD Loss Squeeze-Excitation like Attention× n Output Input … … … … … FIG. 6: Architecture of SEA-PINN within PINNs: The complete set of outputs from thenth hidden layer is taken as input to the w...

[4] [4]

Weighted Output: h(l) =λ (l) ⊙a (l) (7) This mechanism enables SEA-PINN to dynamically adjust the contribution of each neuron. If a neuron’s ac- tivation is less relevant for a specific input region in the current task, the weight generator can assign a smaller weight to suppress its influence; conversely, a larger weight can be assigned to enhance its ro...

[5] [5]

reduces to the usual activation a(l) FNN =σ (l) z(l) ,(8) wherez (l) is given by Eq. (3). Since the nonlinearity is applied element-wise, the local Jacobian ofa (l) FNN with respect toz (l) is diagonal: D(l) FNN = ∂a(l) FNN ∂z(l) = diag σ(l)′ (z(l)) .(9) The output layer is purely linear, fFNN =W (L)a(L−1) +b (L),(10) so the Jacobian of the network output...

[6] [6]

This ensures a compre- hensive assessment of model performance

Benchmark Suite and Evaluation Metrics We evaluated the proposed models on 20 challenging PDE problems selected from a recognized PINNs bench- mark suite, covering diverse domains including heat con- duction, fluid dynamics, biology, electromagnetics, and high-dimensional problems [41]. This ensures a compre- hensive assessment of model performance. As al...

[7] [7]

Unless otherwise specified, all weights are initialized using He uniform initialization and biases are set to zero

Network Configuration and Training For a fair comparison, all neural network mod- els (FNN-PINN, SEA-PINN, TSA-PINN, TSA_SEA- PINN) share the same backbone architecture, consist- ing of 9 hidden layers with 32 neurons per layer. Unless otherwise specified, all weights are initialized using He uniform initialization and biases are set to zero. Activation F...

[8] [8]

Advanced Detection and Artificial Intelligence at the Frontiers of Physics

for chaotic problems. These settings are kept consistent across all models to ensure that any performance difference arises from the architectural innovations rather than training hyperpa- rameters. Supplementary Information: detailed training loss analysis, total training loss comparison, individual loss component comparison, initial loss behavior of SEA...

2025

[9] [9]

G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, Nature Reviews Physics3, 422 (2021)

2021

[10] [10]

A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, Journal of machine learning research18, 1 (2018)

2018

[11] [11]

Lee and I

H. Lee and I. S. Kang, Journal of Computational Physics 91, 110 (1990)

1990

[12] [12]

A. J. Meade Jr and A. A. Fernandez, Mathematical and Computer Modelling20, 19 (1994)

1994

[13] [13]

M. G. Dissanayake and N. Phan-Thien, communications in Numerical Methods in Engineering10, 195 (1994)

1994

[14] [14]

Lagaris, A

I. Lagaris, A. Likas, and D. Fotiadis, IEEE Transactions on Neural Networks9, 987 (1998)

1998

[15] [15]

G. E. Hinton and R. R. Salakhutdinov, science313, 504 (2006)

2006

[16] [16]

Raissi, P

M. Raissi, P. Perdikaris, and G. Karniadakis, Journal of Computational Physics378, 686 (2019)

2019

[17] [17]

L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis, SIAM review63, 208 (2021)

2021

[18] [18]

X. Jin, S. Cai, H. Li, and G. E. Karniadakis, Journal of Computational Physics426, 109951 (2021)

2021

[19] [19]

R. Qiu, R. Huang, Y. Xiao, J. Wang, Z. Zhang, J. Yue, Z. Zeng, and Y. Wang, Physics of Fluids34(2022)

2022

[20] [20]

Y. Wang, J. Sun, W. Li, Z. Lu, and Y. Liu, Com- puter Methods in Applied Mechanics and Engineering 400, 115491 (2022)

2022

[21] [21]

Z. Wu, H. Zhang, H. Ye, H. Zhang, Y. Zheng, and X. Guo, Acta Mechanica235, 4895 (2024)

2024

[22] [22]

Y. Chen, L. Lu, G. E. Karniadakis, and L. Dal Negro, Optics express28, 11618 (2020)

2020

[23] [23]

Fang and J

Z. Fang and J. Zhan, Ieee Access8, 24506 (2019)

2019

[24] [24]

P.Ren, C.Rao, S.Chen, J.-X.Wang, H.Sun, andY.Liu, Computer Physics Communications295, 109010 (2024)

2024

[25] [25]

L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, Nature machine intelligence3, 218 (2021)

2021

[26] [26]

S. Wang, H. Wang, and P. Perdikaris, Science advances 7, eabi8605 (2021)

2021

[27] [27]

L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, and G. E. Karniadakis, Computer Methods in Applied Mechanics and Engineering393, 114778 (2022)

2022

[28] [28]

P. Jin, S. Meng, and L. Lu, SIAM Journal on Scientific Computing44, A3490 (2022)

2022

[29] [29]

M. Zhu, S. Feng, Y. Lin, and L. Lu, Computer Meth- ods in Applied Mechanics and Engineering416, 116300 (2023)

2023

[30] [30]

Jiang, M

Z. Jiang, M. Zhu, and L. Lu, Reliability Engineering & System Safety251, 110392 (2024)

2024

[31] [31]

M.Zhu, H.Zhang, A.Jiao, G.E.Karniadakis, andL.Lu, Computer Methods in Applied Mechanics and Engineer- ing412, 116064 (2023)

2023

[32] [32]

Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, in International Conference on Learning Representations (2021)

2021

[33] [33]

A. Jiao, H. He, R. Ranade, J. Pathak, and L. Lu, Nature Communications16, 8386 (2025)

2025

[34] [34]

Rahaman, A

N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville, inProceed- ings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 97, edited by K. Chaudhuri and R. Salakhutdinov (PMLR, 2019) pp. 5301–5310

2019

[35] [35]

Tancik, P

M. Tancik, P. Srinivasan, B. Mildenhall, S. Fridovich- Keil, N. Raghavan, U. Singhal, R. Ramamoorthi, J. Bar- ron, and R. Ng, Advances in neural information process- ing systems33, 7537 (2020)

2020

[36] [36]

A. D. Jagtap, K. Kawaguchi, and G. Em Karni- adakis,ProceedingsoftheRoyalSocietyA476,20200334 (2020)

2020

[37] [37]

H. Gao, L. Sun, and J.-X. Wang, Journal of Computa- tional Physics428, 110079 (2021)

2021

[38] [38]

Pfaff, M

T. Pfaff, M. Fortunato, A. Sanchez-Gonzalez, and P. Battaglia, inInternational conference on learning rep- 15 resentations(2020)

2020

[39] [39]

Z. Zhao, X. Ding, and B. A. Prakash, arXiv preprint arXiv:2307.11833 (2023)

work page arXiv 2023

[40] [40]

A. D. Jagtap, E. Kharazmi, and G. E. Karniadakis, Computer Methods in Applied Mechanics and Engineer- ing365, 113028 (2020)

2020

[41] [41]

A. D. Jagtap and G. E. Karniadakis, Communications in Computational Physics28(2020)

2020

[42] [42]

C. Wu, M. Zhu, Q. Tan, Y. Kartha, and L. Lu, Com- puter Methods in Applied Mechanics and Engineering 403, 115671 (2023)

2023

[43] [43]

Respecting causality is all you need for training physics-informed neural networks,

S. Wang, S. Sankaran, and P. Perdikaris, “Respecting causality is all you need for training physics-informed neural networks,” (2022), arXiv:2203.07404 [cs.LG]

work page arXiv 2022

[44] [44]

S. Wang, X. Yu, and P. Perdikaris, Journal of Compu- tational Physics449, 110768 (2022)

2022

[45] [45]

L. D. McClenny and U. M. Braga-Neto, Journal of Com- putational Physics474, 111722 (2023)

2023

[46] [46]

S. Wang, Y. Teng, and P. Perdikaris, SIAM Journal on Scientific Computing43, A3055 (2021), https://doi.org/10.1137/20M1318043

work page doi:10.1137/20m1318043 2021

[47] [47]

Fp64 is all you need: rethinking failure modes in physics-informed neural networks.arXiv preprint arXiv:2505.10949, 2025

C. Xu, D. Liu, A. Nassereldine, and J. Xiong, “Fp64 is all you need: Rethinking failure modes in physics-informed neural networks,” (2025), arXiv:2505.10949 [cs.LG]

work page arXiv 2025

[48] [48]

J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, arXiv e-prints , arXiv:1709.01507 (2017), arXiv:1709.01507 [cs.CV]

work page internal anchor Pith review Pith/arXiv arXiv 2017

[49] [49]

Zhang, S

H.Zhongkai, J.Yao, C.Su, H.Su, Z.Wang, F.Lu, Z.Xia, Y. Zhang, S. Liu, L. Lu,et al., Advances in Neural In- formation Processing Systems37, 76721 (2024)

2024

[50] [50]

Intriguing properties of neural networks

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Er- han, I. Goodfellow, and R. Fergus, arXiv preprint arXiv:1312.6199 (2013)

work page internal anchor Pith review Pith/arXiv arXiv 2013

[51] [51]

S. S. Schoenholz, J. Gilmer, S. Ganguli, and J. Sohl- Dickstein, arXiv preprint arXiv:1611.01232 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016

[52] [52]

Pennington, S

J. Pennington, S. Schoenholz, and S. Ganguli, Advances in neural information processing systems30(2017)

2017

[53] [53]

Sensitivity and Generalization in Neural Networks: an Empirical Study

R. Novak, Y. Bahri, D. A. Abolafia, J. Pennington, and J. Sohl-Dickstein, arXiv preprint arXiv:1802.08760 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[54] [54]

Ross and F

A. Ross and F. Doshi-Velez, inProceedings of the AAAI conference on artificial intelligence, Vol. 32 (2018)

2018

[55] [55]

Pennington, S

J. Pennington, S. Schoenholz, and S. Ganguli, inInter- national Conference on Artificial Intelligence and Statis- tics(PMLR, 2018) pp. 1924–1932

2018

[56] [56]

Vaswani, N

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Advances in neural information processing systems30 (2017)

2017

[57] [57]

Z. Qin, W. Sun, D. Li, X. Shen, W. Sun, and Y. Zhong, arXiv preprint arXiv:2401.04658 (2024)

work page arXiv 2024

[58] [58]

Y. Sun, L. Dong, S. Huang, S. Ma, Y. Xia, J. Xue, J. Wang, and F. Wei, arXiv preprint arXiv:2307.08621 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[59] [59]

Z. Qiu, Z. Wang, B. Zheng, Z. Huang, K. Wen, S. Yang, R. Men, L. Yu, F. Huang, S. Huang,et al., arXiv preprint arXiv:2505.06708 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025

[60] [60]

Z. Lin, E. Nikishin, X. O. He, and A. Courville, arXiv preprint arXiv:2503.02130 (2025)

work page arXiv 2025

[61] [61]

M. E. Larkum, W. Senn, and H.-R. Lüscher, Cerebral cortex14, 1059 (2004)

2004

[62] [62]

Gidon, T

A. Gidon, T. A. Zolnik, P. Fidzinski, F. Bolduan, A. Pa- poutsi, P. Poirazi, M. Holtkamp, I. Vida, and M. E. Larkum, Science367, 83 (2020)

2020

[63] [63]

D. P. Kingma, arXiv preprint arXiv:1412.6980 (2014)

work page internal anchor Pith review Pith/arXiv arXiv 2014

[64] [64]

Chebfun guide,

T. A. Driscoll, N. Hale, and L. N. Trefethen, “Chebfun guide,” (2014). 16 Supplementary Information S1. DETAILED TRAINING LOSS ANALYSIS This document provides supplementary results to support the claims made in the main text, demonstrating the consistentandsignificantadvantagesoftheSEA-PINNoverthestandardFNN-PINN.Wepresentdetailedtrainingloss comparisons ...

2014

[65] [65]

In the top two plots, SEA-PINN’s average prediction (orange line) stays closer to the true function (dashed black line) in the extrapolation regions ( x<0 and x>1 ). Furthermore, its uncertainty band (orange shading) is significantly smaller than the FNN-PINN’s (blue shading), indicating that SEA-PINN’s predictions are not only more accurate but also more...

[66] [66]

While both models learn the difficult features inside the training domain, the FNN-PINN produces highly unstable and incorrect predictions outside of it

The bottom two plots, which feature a sharp peak and a sudden jump, highlight this difference even more clearly. While both models learn the difficult features inside the training domain, the FNN-PINN produces highly unstable and incorrect predictions outside of it. In contrast, SEA-PINN defaults to a simple, flat prediction in these unknown regions, main...

work page arXiv 2031

[67] [67]

a(k) N   =   ζ1 sin f (k) 1 z(k) 1 +ζ 2 cos f (k) 1 z(k) 1 ζ1 sin f (k) 2 z(k) 2 +ζ 2 cos f (k) 2 z(k) 2

Trainable Sine Activation (TSA) where the final output of thek-th layer is: a(k) =   a(k) 1 a(k) 2 ... a(k) N   =   ζ1 sin f (k) 1 z(k) 1 +ζ 2 cos f (k) 1 z(k) 1 ζ1 sin f (k) 2 z(k) 2 +ζ 2 cos f (k) 2 z(k) 2 ... ζ1 sin f (k) N z(k) N +ζ 2 cos f (k) N z(k) N   (S1) where: •z (k) i is the output of thei-th neuron in thek-th layer...

[68] [68]

S(a) = 1 1 L−1 PL−1 k=1 exp 1 Nk PNk i=1 f (k) i (S2) where: •Lis the total number of layers, •N k is the number of neurons in thek-th layer

Slope Recovery Mechanism The slope recovery termS(a)dynamically adjusts the slope of the activation function, maintaining active and effective gradient propagation throughout the network. S(a) = 1 1 L−1 PL−1 k=1 exp 1 Nk PNk i=1 f (k) i (S2) where: •Lis the total number of layers, •N k is the number of neurons in thek-th layer. This slope recovery term is...

[69] [69]

Compute the standard TSA-PINN activation outputa(l): a(l) i =ζ 1 sin(f (l) i z(l) i ) +ζ 2 cos(f (l) i z(l) i )(S4) wherez (l) i =w (l) i ·ˆ a(l−1) +b (l) i

[70] [70]

Use the activation outputa(l) as input to the weight generator to compute the weight vectorλ(l): λ(l) =Sigmoid(W 2σ(W1a(l) +b 1) +b 2)(S5)

[71] [71]

Element-wise multiply the weight vector with the activation output to obtain the final layer output: ˆ a(l) =λ (l) ⊙a (l) (S6) By combining TSA-PINN with our weighting module, the TSA_SEA-PINN model explores whether the adaptive weighting mechanism can serve as a general enhancement module, further improving the performance of existing advanced PINN archi...

2000