Adaptive Randomized Neural Networks with Locally Activation Function: Theory and Algorithm for Solving PDEs
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 17:56 UTC · model grok-4.3
The pith
Randomized neural networks achieve optimal approximation when the hidden-parameter sampling domain is sized to match the target function's smoothness and the number of neurons.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that for networks of the form ∑_i W_i σ(A_i, b_i) with uniform sampling of (A_i, b_i) from a prescribed bounded domain, optimal approximation rates require the domain size to scale with the smoothness of the target function and the network width. They then combine these networks with a partition of unity whose subdomains are refined adaptively by a posteriori error indicators, producing the adaptive PIRaNN scheme that solves PDEs whose solutions have limited local regularity without introducing additional consistency errors.
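To make the ansatz concrete, here is a minimal Python sketch that reads σ(A_i, b_i) as the usual affine-then-activation form σ(A_i x + b_i); the tanh activation, the one-dimensional target, the least-squares solve, and the names (fit_rann, R) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fit_rann(x, y, n_neurons=200, R=5.0, seed=0):
    """Fit u(x) = sum_i W_i * tanh(A_i * x + b_i) with frozen hidden
    parameters: (A_i, b_i) drawn uniformly from [-R, R] and never
    trained; only the outer weights W are solved by least squares."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-R, R, size=n_neurons)
    b = rng.uniform(-R, R, size=n_neurons)
    Phi = np.tanh(np.outer(x, A) + b)  # feature matrix, shape (len(x), n_neurons)
    W, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return A, b, W

# Smooth 1-D target; R is the sampling-domain size that the theorem
# ties to the target's smoothness and to n_neurons.
x = np.linspace(-1, 1, 400)
y = np.sin(4 * np.pi * x)
A, b, W = fit_rann(x, y)
print("max error:", np.abs(np.tanh(np.outer(x, A) + b) @ W - y).max())
```

Only W is trained; the frozen (A_i, b_i) and the single knob R are exactly what the theorem's domain-size condition constrains.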
What carries the argument
The approximation theorem that relates the required size of the uniform sampling domain for hidden parameters in randomized neural networks to the smoothness of the target function and the number of neurons, together with a posteriori error-driven partition-of-unity refinement.
If this is right
- The adaptive PIRaNN method captures localized low-regularity features in PDE solutions by refining the partition of unity according to a posteriori indicators (see the sketch after this list).
- The method maintains consistency because the refinement strategy does not add new approximation errors beyond those already controlled by the randomized network.
- Numerical benchmarks confirm both the theoretical dependence of domain size on smoothness and the practical performance of the adaptive scheme on standard test problems.
- The approach extends the use of randomized networks from globally smooth to locally irregular PDE solutions while keeping the number of neurons moderate.
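The PoU blending behind these points can be sketched as follows, again under illustrative assumptions (bump-function weights, five fixed overlapping 1-D patches, per-patch least-squares fits); the paper's actual patch functions, indicator-driven refinement, and physics-informed loss are not reproduced here.

```python
import numpy as np

def pou_weights(x, centers, width):
    """Compactly supported bumps normalized to sum to one; one standard
    PoU construction (the paper's patch functions may differ)."""
    d = (x[:, None] - centers[None, :]) / width
    inside = np.abs(d) < 1
    bumps = np.where(inside, np.exp(-1.0 / np.maximum(1 - d**2, 1e-12)), 0.0)
    return bumps / bumps.sum(axis=1, keepdims=True)

def fit_local_rann(xs, ys, R, n_neurons=50, seed=0):
    """Per-patch RaNN: frozen uniform (A_i, b_i) in [-R, R], LS for W."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-R, R, n_neurons)
    b = rng.uniform(-R, R, n_neurons)
    W, *_ = np.linalg.lstsq(np.tanh(np.outer(xs, A) + b), ys, rcond=None)
    return lambda t: np.tanh(np.outer(t, A) + b) @ W

x = np.linspace(0, 1, 500)
y = np.abs(x - 0.5) ** 1.5                  # kink: limited local regularity
centers = np.linspace(0, 1, 5)              # overlapping patches
phi = pou_weights(x, centers, width=0.35)

# Global approximant u = sum_j phi_j * u_j, each u_j fitted on its patch.
u = np.zeros_like(x)
for j in range(phi.shape[1]):
    mask = phi[:, j] > 0
    u_j = fit_local_rann(x[mask], y[mask], R=8.0, seed=j)
    u[mask] += phi[mask, j] * u_j(x[mask])
print("max error:", np.abs(u - y).max())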
Where Pith is reading between the lines
- The same domain-size tuning principle could be tested on other randomized approximation schemes outside neural networks to see whether it yields similar rate improvements.
- Applying the adaptive partition-of-unity idea to time-dependent or high-dimensional PDEs would test whether the error-driven refinement remains computationally efficient as dimension grows.
- If the load-bearing assumption holds, one could replace the uniform sampling step with other simple distributions and still obtain the same link between domain size and smoothness.
Load-bearing premise
Uniform sampling of hidden-layer parameters from one fixed bounded domain plus error-driven partition-of-unity refinement is enough to resolve localized low-regularity features without creating new consistency errors.
What would settle it
A numerical test that measures whether the observed approximation or PDE-solution error stops improving at the predicted optimal rate once the sampling domain size is deliberately mismatched to the smoothness and neuron count, or once the adaptive refinement is removed.
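A hedged sketch of that test in the pure approximation setting: fix the target and the network width, sweep the sampling-domain size R, and check whether the error bottoms out near a matched R and degrades when R is deliberately too small or too large. The setup below (tanh features, least squares, the name rann_sup_error) is assumed for illustration.

```python
import numpy as np

def rann_sup_error(R, x, y, n_neurons=200, trials=5):
    """Sup-norm fit error of a RaNN with hidden parameters drawn
    uniformly from [-R, R], averaged over independent random draws."""
    errs = []
    for s in range(trials):
        rng = np.random.default_rng(s)
        A = rng.uniform(-R, R, n_neurons)
        b = rng.uniform(-R, R, n_neurons)
        Phi = np.tanh(np.outer(x, A) + b)
        W, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        errs.append(np.abs(Phi @ W - y).max())
    return float(np.mean(errs))

x = np.linspace(-1, 1, 400)
y = np.sin(8 * np.pi * x)          # fixed target smoothness, fixed width
for R in (0.5, 2.0, 8.0, 32.0, 128.0):
    print(f"R = {R:6.1f}   mean sup error = {rann_sup_error(R, x, y):.3e}")
```

If the theorem's prediction holds, the printed errors should trace a U-shape in R rather than improving monotonically.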
read the original abstract
This paper establishes an approximation theorem for randomized neural networks (RaNNs) whose hidden-layer parameters are uniformly sampled from a prescribed bounded domain. Our analysis shows that, for RaNNs of the form $\mathop{\sum}_i W_i \sigma(A_i, b_i)$, the size of the sampling domain required to achieve optimal approximation is intrinsically linked to the smoothness of the target function and the number of neurons. Motivated by this theoretical insight, we integrate a partition of unity (PoU) with RaNNs to develop an adaptive physics-informed randomized neural network (PIRaNN) method for solving partial differential equations with limited local regularity. The proposed adaptive strategy refines the PoU based on a posteriori error indicators, enabling the network to efficiently capture localized solution features. Numerical experiments validate the theoretical results and demonstrate the strong approximation capabilities of RaNNs, confirming the effectiveness of the adaptive PIRaNN method on a range of benchmark problems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript establishes an approximation theorem for randomized neural networks (RaNNs) with hidden-layer parameters uniformly sampled from a bounded domain, showing that the required sampling-domain size is linked to the target function's smoothness and the number of neurons. It then proposes an adaptive physics-informed RaNN (PIRaNN) method that integrates a partition of unity (PoU), refines patches via a posteriori error indicators, and solves PDEs with localized low regularity; numerical experiments on benchmark problems are used to support the claims.
Significance. If the approximation theorem holds and the adaptive PoU construction is shown to preserve optimal rates, the work would supply a theoretically motivated adaptive framework for neural solvers of PDEs with singularities or reduced regularity, with the domain-size/smoothness link offering practical guidance for parameter choice. The numerical validation on benchmarks is a positive indicator, but the absence of quantitative rate comparisons limits the assessed impact.
major comments (2)
- [§3] §3 (approximation theorem): the result is stated for a single global RaNN with fixed bounded sampling domain; the subsequent adaptive PIRaNN construction assigns independent RaNNs to a posteriori-refined patches with possibly different local sampling domains, yet no error-propagation argument is supplied showing that the sum of local approximation errors remains controlled by the same smoothness-dependent constants.
- [§4.2] §4.2 (adaptive PIRaNN algorithm): the claim that the method captures localized low-regularity features 'without introducing new consistency errors' rests on the assumption that PoU weighting and per-patch adaptive sampling commute with the randomization argument; a concrete global error bound or proof sketch verifying that each local RaNN satisfies the theorem hypotheses at every refinement step is required.
minor comments (2)
- [Abstract] Abstract: the statement that 'numerical experiments validate the theoretical results' is vague; a single sentence summarizing observed convergence rates or error magnitudes relative to theory would strengthen the claim.
- [Notation] Notation: the RaNN form ∑_i W_i σ(A_i, b_i) is introduced without an immediate reminder of the precise definitions of the random matrices A_i and vectors b_i; adding a short parenthetical or reference to the earlier definition would improve readability (one presumed reading is sketched below).
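For readers without the earlier definition at hand, one presumed reading of the notation, stated here as our assumption rather than the paper's text:

```latex
% Presumed reading (our assumption): \sigma(A_i, b_i) abbreviates the
% standard affine-then-activation form, with only the W_i trainable.
u_N(x) = \sum_{i=1}^{N} W_i \, \sigma(A_i x + b_i),
\qquad (A_i, b_i) \sim \mathrm{Unif}\!\left([-R, R]^{d} \times [-R, R]\right).
```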
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address the major comments point by point below, agreeing that the connection between the global approximation theorem and the adaptive construction requires explicit justification. We will strengthen the manuscript accordingly.
read point-by-point responses
- Referee: [§3] §3 (approximation theorem): the result is stated for a single global RaNN with fixed bounded sampling domain; the subsequent adaptive PIRaNN construction assigns independent RaNNs to a posteriori-refined patches with possibly different local sampling domains, yet no error-propagation argument is supplied showing that the sum of local approximation errors remains controlled by the same smoothness-dependent constants.
Authors: The referee is correct that Theorem 3.1 is stated for a single global RaNN. The adaptive PIRaNN employs a partition of unity to localize the approximation, with each patch using its own RaNN whose sampling domain is sized according to the local regularity. Because the PoU functions are smooth, non-negative, and sum to one, the global L2 error is bounded by a sum of the local errors (with a multiplicative constant depending only on the PoU). We will insert a new proposition after Theorem 3.1 that makes this propagation explicit, showing that the smoothness-dependent constants from the theorem carry over to each local approximant when the sampling domain is chosen adaptively per patch. This addition will confirm that the global error remains controlled without inflation. revision: yes
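A sketch of the propagation step such a proposition would make explicit, in our notation (ω_j the support of φ_j, u_j the local RaNN on patch j); this is the standard PoU argument, not the authors' proof:

```latex
% Uses \sum_j \varphi_j \equiv 1 for the first equality; C_{PoU} depends
% only on the patch overlap and the sup-norms of the \varphi_j.
\Bigl\| u - \sum_j \varphi_j u_j \Bigr\|_{L^2(\Omega)}
  = \Bigl\| \sum_j \varphi_j (u - u_j) \Bigr\|_{L^2(\Omega)}
  \le \sum_j \bigl\| \varphi_j (u - u_j) \bigr\|_{L^2(\omega_j)}
  \le C_{\mathrm{PoU}} \sum_j \| u - u_j \|_{L^2(\omega_j)}.
```

Applying the approximation theorem patchwise, with each sampling domain sized to the local regularity, would then bound every local term on the right.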
- Referee: [§4.2] §4.2 (adaptive PIRaNN algorithm): the claim that the method captures localized low-regularity features 'without introducing new consistency errors' rests on the assumption that PoU weighting and per-patch adaptive sampling commute with the randomization argument; a concrete global error bound or proof sketch verifying that each local RaNN satisfies the theorem hypotheses at every refinement step is required.
Authors: We acknowledge that the current text does not supply a self-contained verification that the local randomization hypotheses remain satisfied after each refinement. The PoU weights are independent of the random parameters and the adaptive choice of sampling domain is made from a posteriori indicators that estimate local smoothness; thus the local problems continue to meet the hypotheses of the theorem. We will add a short proof sketch in §4.2 that (i) confirms each local RaNN at every step satisfies the uniform-sampling assumption with a domain sized to the local regularity, and (ii) assembles the local bounds into a global a-priori error estimate that contains no extra consistency terms arising from the PoU or the adaptation process. This will rigorously justify the claim. revision: yes
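For concreteness, a generic solve, estimate, mark, refine loop of the kind the response describes, with Dörfler marking [10] as one standard marking rule; every function argument below is a placeholder, and the paper's actual indicators and refinement rule may differ.

```python
import numpy as np

def adaptive_pirann(solve_local, estimate_error, refine_patch, patches,
                    tol=1e-6, theta=0.5, max_iter=20):
    """Generic solve -> estimate -> mark -> refine loop.  All four
    function arguments are placeholders in this sketch; Dorfler marking
    with fraction theta [10] is one standard rule, not necessarily the
    paper's."""
    local_fits = [solve_local(p) for p in patches]        # per-patch RaNN fit
    for _ in range(max_iter):
        eta = np.array([estimate_error(p, f)              # a posteriori
                        for p, f in zip(patches, local_fits)])
        if np.sqrt((eta ** 2).sum()) < tol:
            break
        # Mark the smallest set of patches carrying a theta-fraction of
        # the total squared indicator.
        order = np.argsort(eta)[::-1]
        cum = np.cumsum(eta[order] ** 2)
        marked = order[: np.searchsorted(cum, theta * cum[-1]) + 1]
        for j in sorted(marked, reverse=True):            # splice safely
            patches[j:j + 1] = refine_patch(patches[j])   # split patch
        local_fits = [solve_local(p) for p in patches]    # re-solve
    return patches, local_fits
```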
Circularity Check
No circularity detected; approximation theorem and adaptive construction remain independent of self-referential inputs.
full rationale
The paper first states an approximation theorem for RaNNs that links the required sampling-domain diameter to the target function's smoothness and the number of neurons; this is presented as a derived result from analysis of the form ∑ W_i σ(A_i, b_i) with uniform sampling from a bounded domain. The subsequent adaptive PIRaNN construction with a posteriori PoU refinement is motivated by that theorem but does not redefine any quantity in terms of itself, fit a parameter and relabel it a prediction, or rely on a load-bearing self-citation whose content is unverified. No equation reduces the claimed global error bound to a fitted constant or to the adaptive choice itself by construction. The derivation chain is therefore self-contained and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Uniform sampling of hidden-layer parameters from a bounded domain yields approximation rates governed by the smoothness of the target and the number of neurons.
- domain assumption A posteriori error indicators computed from the current network solution can be used to refine the partition of unity without destroying consistency.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear): "the size of the sampling domain required to achieve optimal approximation is intrinsically linked to the smoothness of the target function and the number of neurons... M = O(N^{p/2[(p-1)(d+1)+pη]})"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear): "integrate a partition of unity (PoU) with RaNNs... adaptive strategy refines the PoU based on a posteriori error indicators"
Reference graph
Works this paper leans on
- [1] R. A. Adams and J. J. Fournier, Sobolev Spaces, vol. 140, Elsevier, 2003.
- [2] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory, 39 (1993), pp. 930–945.
- [3]
- [4] J. M. Cascon, C. Kreuzer, R. H. Nochetto, and K. G. Siebert, Quasi-optimal convergence rate for an adaptive finite element method, SIAM Journal on Numerical Analysis, 46 (2008), pp. 2524–2550.
- [5] J. Chen, X. Chi, W. E, and Z. Yang, Bridging traditional and machine learning-based algorithms for solving PDEs: the random feature method, J Mach Learn, 1 (2022), pp. 268–298.
- [6] S. M. Cox and P. C. Matthews, Exponential time differencing for stiff systems, Journal of Computational Physics, 176 (2002), pp. 430–455.
- [7] T. De Ryck, S. Lanthaler, and S. Mishra, On the approximation of functions by tanh neural networks, Neural Networks, 143 (2021), pp. 732–750.
- [8] T. De Ryck, S. Mishra, Y. Shang, and F. Wang, Approximation theory and applications of randomized neural networks for solving high-dimensional PDEs, arXiv preprint arXiv:2501.12145, (2025).
- [9] S. Dong and Z. Li, Local extreme learning machines and domain decomposition for solving linear and nonlinear partial differential equations, Computer Methods in Applied Mechanics and Engineering, 387 (2021), p. 114129.
- [10] W. Dörfler, A convergent adaptive algorithm for Poisson's equation, SIAM Journal on Numerical Analysis, 33 (1996), pp. 1106–1124.
- [11] T. A. Driscoll, N. Hale, and L. N. Trefethen, Chebfun Guide, 2014.
- [12] V. Dwivedi and B. Srinivasan, Physics informed extreme learning machine (PIELM)–a rapid method for the numerical solution of partial differential equations, Neurocomputing, 391 (2020), pp. 96–118.
- [13] S. Ellacott, Aspects of the numerical analysis of neural networks, Acta Numerica, 3 (1994), pp. 145–202.
- [14] L. C. Evans, Partial Differential Equations, vol. 19, American Mathematical Society, 2022.
- [15] G. B. Folland, Real Analysis: Modern Techniques and Their Applications, John Wiley & Sons, 1999.
- [16] L. Gonon, Random feature neural networks learn Black-Scholes type PDEs without curse of dimensionality, Journal of Machine Learning Research, 24 (2023), pp. 1–51.
- [17] I. Gühring and M. Raslan, Approximation rates for neural networks with encodable weights in smoothness spaces, Neural Networks, 134 (2021), pp. 107–130.
- [18] W.-F. Hu, T.-S. Lin, and M.-C. Lai, A discontinuity capturing shallow neural network for elliptic interface problems, Journal of Computational Physics, 469 (2022), p. 111576.
- [19] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing, 70 (2006), pp. 489–501.
- [20] A. D. Jagtap, K. Kawaguchi, and G. E. Karniadakis, Adaptive activation functions accelerate convergence in deep and physics-informed neural networks, Journal of Computational Physics, 404 (2020), p. 109136.
- [21] O. A. Karakashian and F. Pascal, Convergence of adaptive discontinuous Galerkin approximations of second-order elliptic problems, SIAM Journal on Numerical Analysis, 45 (2007), pp. 641–665.
- [22]
- [23]
- [24] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar, Fourier neural operator for parametric partial differential equations, arXiv preprint arXiv:2010.08895, (2020).
- [25]
- [26] J. Lu, Z. Shen, H. Yang, and S. Zhang, Deep network approximation for smooth functions, SIAM Journal on Mathematical Analysis, 53 (2021), pp. 5465–5506.
- [27] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nature Machine Intelligence, 3 (2021), pp. 218–229.
- [28] A. Neufeld and P. Schmocker, Universal approximation property of random neural networks, arXiv preprint arXiv:2312.08410, (2023).
- [29] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics, 378 (2019), pp. 686–707.
- [30] P. Rathore, W. Lei, Z. Frangella, L. Lu, and M. Udell, Challenges in training PINNs: A loss landscape perspective, arXiv preprint arXiv:2402.01868, (2024).
- [31] J. W. Siegel and J. Xu, Approximation rates for neural networks with general activation functions, Neural Networks, 128 (2020), pp. 313–321.
- [32] J. W. Siegel and J. Xu, High-order approximation rates for shallow neural networks with cosine and ReLU^k activation functions, Applied and Computational Harmonic Analysis, 58 (2022), pp. 1–26.
- [33] J. W. Siegel and J. Xu, Sharp bounds on the approximation rates, metric entropy, and n-widths of shallow neural networks, Foundations of Computational Mathematics, 24 (2024), pp. 481–537.
- [34] S. Wang, A. K. Bhartari, B. Li, and P. Perdikaris, Gradient alignment in physics-informed neural networks: A second-order optimization perspective, arXiv preprint arXiv:2502.00604, (2025).
- [35] S. Wang, Y. Teng, and P. Perdikaris, Understanding and mitigating gradient flow pathologies in physics-informed neural networks, SIAM Journal on Scientific Computing, 43 (2021), pp. A3055–A3081.
- [36] S. Wang, X. Yu, and P. Perdikaris, When and why PINNs fail to train: A neural tangent kernel perspective, Journal of Computational Physics, 449 (2022), p. 110768.
- [37]
- [38] W. E, C. Ma, and L. Wu, The Barron space and the flow-induced function spaces for neural network models, Constructive Approximation, 55 (2022), pp. 369–406.
- [39] J. Xu, The finite neuron method and convergence analysis, arXiv preprint arXiv:2010.01458, (2020).
- [40]
discussion (0)