Recognition: 1 theorem link · Lean Theorem
Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs
Pith reviewed 2026-05-12 01:14 UTC · model grok-4.3
The pith
Bayesian PINNs achieve posterior concentration around exact elliptic PDE solutions at near-minimax rates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Assuming the PDE has a strong solution in a Hölder space, and given a suitably constructed prior on the neural network weights, the posterior distribution concentrates around the exact solution at a near-minimax rate. The prior is rate-adaptive, so the posterior contracts at an almost optimal rate without knowledge of the smoothness level of the exact solution.
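The abstract never displays the rate itself. For an α-Hölder truth in d dimensions, the standard near-minimax form that both the pith and the referee's minor comment point to would read as follows; this is our rendering of the conventional nonparametric contraction statement, with the exponent γ of the log factor left unspecified:

```latex
% Conjectured form of the contraction statement (not displayed in the abstract):
% posterior mass outside an \epsilon_n-ball around the truth u^* vanishes,
% with \epsilon_n near-minimax up to a polylogarithmic factor.
\Pi\left( u : \|u - u^*\| > M \epsilon_n \,\middle|\, D^{(n)} \right) \longrightarrow 0,
\qquad
\epsilon_n = n^{-\frac{\alpha}{2\alpha + d}} (\log n)^{\gamma}.
```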
What carries the argument
The posterior contraction rate analysis for Bayesian PINNs, driven by a rate-adaptive prior on the neural network weights.
If this is right
- The Bayesian PINN provides a statistically valid way to quantify uncertainty in PDE solutions.
- The approach works without prior knowledge of the solution's smoothness.
- It handles noisy data from both interior points and the boundary.
- It extends to a general class of elliptic PDEs with non-homogeneous Dirichlet boundary conditions.
Where Pith is reading between the lines
- This suggests Bayesian neural networks could be effective for other PDE types if similar contraction results hold.
- Practical implementations might benefit from using such adaptive priors to improve reliability.
- Could connect to nonparametric Bayesian statistics where rate-adaptive priors are studied for regression.
- Testable by checking contraction rates on synthetic elliptic PDE problems with varying smoothness, as in the harness sketched below.
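On the last point, a minimal test harness is easy to state, assuming some Bayesian PINN posterior-mean estimator is available to plug in. The model problem, the noise level, and the name `fit_posterior_mean` below are all hypothetical; the sketch only fixes the data-generating process and the log-log slope check against the conjectured exponent -α/(2α+d):

```python
import numpy as np

# Hypothetical harness for the "testable on synthetic problems" point:
# noisy interior measurements of the source term and noisy boundary data
# for -u'' + u = f on [0,1], u(0) = u(1) = 0, with a known truth u*.
rng = np.random.default_rng(0)

def u_star(x):
    # Toy truth; sin is smooth, so this exercises the harness only,
    # not a genuine finite-smoothness Holder class.
    return np.sin(np.pi * x)

def f_star(x):
    # Source term consistent with -u'' + u = f for u = sin(pi x).
    return (np.pi ** 2 + 1.0) * np.sin(np.pi * x)

def make_data(n, sigma=0.1):
    x_int = rng.uniform(0.0, 1.0, size=n)               # interior design points
    y_int = f_star(x_int) + sigma * rng.normal(size=n)  # noisy interior data
    x_bdy = np.array([0.0, 1.0])
    y_bdy = u_star(x_bdy) + sigma * rng.normal(size=2)  # noisy boundary data
    return x_int, y_int, x_bdy, y_bdy

def l2_error(u_hat, n_grid=2048):
    # Discrete L2 distance between the fitted and true solutions.
    xs = np.linspace(0.0, 1.0, n_grid)
    return np.sqrt(np.mean((u_hat(xs) - u_star(xs)) ** 2))

def check_rate(fit_posterior_mean, ns=(100, 400, 1600, 6400), alpha=2.0, d=1):
    # fit_posterior_mean(x_int, y_int, x_bdy, y_bdy) -> callable u_hat.
    errs = [l2_error(fit_posterior_mean(*make_data(n))) for n in ns]
    slope = np.polyfit(np.log(np.array(ns)), np.log(np.array(errs)), 1)[0]
    print(f"empirical exponent {slope:.3f} vs conjectured {-alpha / (2 * alpha + d):.3f}")
```

Rerunning with truths of different (approximate) smoothness and checking that the slope tracks -α/(2α+d) without retuning the prior would probe the adaptivity claim directly.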
Load-bearing premise
The PDE admits a strong solution in a Hölder space and the prior on the neural network weights is suitably constructed.
What would settle it
If, for an elliptic PDE with a known strong Hölder solution, the posterior of the Bayesian PINN does not concentrate at a near-minimax rate under noisy data, the claim would be falsified.
read the original abstract
We study the posterior contraction rate of Bayesian Physics-Informed Neural Networks (PINNs) for solving a general class of elliptic partial differential equations (PDEs). We focus on learning of the elliptic equation with a non-homogeneous Dirichlet boundary condition from independent and noisy measurements collected both inside the domain and on the boundary. Assuming that the PDE admits a strong solution in a Hölder space and using a suitably constructed prior on the neural network weights, we prove that the posterior distribution concentrates around the exact solution at a near-minimax rate. Furthermore, the chosen prior is rate-adaptive: the posterior contracts at an (almost) optimal rate without prior knowledge of the smoothness level of the exact solution. Our results provide statistical guarantees for uncertainty quantification of PDEs via Bayesian PINNs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proves posterior contraction rates for Bayesian physics-informed neural networks (PINNs) solving elliptic PDEs from noisy interior and boundary measurements. Under the assumption that the PDE has a strong solution in a Hölder space, and using a suitably constructed prior on the neural network weights, the authors show that the posterior concentrates around the true solution at a near-minimax rate ε_n that is adaptive to the unknown Hölder smoothness index α without requiring knowledge of α in advance.
Significance. If the central claims hold, the result supplies rigorous statistical guarantees for uncertainty quantification with Bayesian PINNs, a practically popular method that has so far lacked nonparametric posterior contraction theory. The rate-adaptivity is a notable strength, as it aligns with general Bayesian nonparametric requirements (prior mass and entropy conditions) while remaining within the PINN architecture.
major comments (2)
- [§3, Theorem 3.1] Main contraction result: the statement invokes a 'suitably constructed prior' on the NN weights to achieve the small-ball probability lower bound Π{θ : ||f_θ - f_0|| ≤ ε_n} ≳ exp(-C n ε_n²) uniformly over α in a compact interval, but the explicit form of the prior (distribution on depth, width, weight variances, or mixture over architectures) is not exhibited. Without this construction, the adaptivity mechanism cannot be verified independently of the claim.
- [§4] Proof of the prior mass condition: the entropy and small-ball arguments appear to rely on the prior satisfying both (i) sufficient mass near the true solution and (ii) controlled covering numbers of the sieve (the two conditions are displayed after this list), yet the derivation does not supply the concrete parameter choices (e.g., variance scaling with n or depth growth) that would make these bounds hold simultaneously for unknown α.
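For orientation, the two requirements the major comments refer to are the standard sufficient conditions for contraction at rate ε_n from Bayesian nonparametrics (the Ghosal–Ghosh–van der Vaart template; textbook material, not quoted from the paper under review):

```latex
% Prior mass (small ball), sieve entropy, and sieve remainder conditions,
% jointly sufficient for posterior contraction at rate \epsilon_n:
\Pi\bigl(\theta : \|f_\theta - f_0\|_n \le \epsilon_n\bigr) \ge e^{-C_1 n \epsilon_n^2},
\qquad
\log N\bigl(\epsilon_n, \mathcal{F}_n, \|\cdot\|_n\bigr) \le C_2 n \epsilon_n^2,
\qquad
\Pi(\mathcal{F}_n^{c}) \le e^{-C_3 n \epsilon_n^2}.
```

Adaptivity then amounts to verifying all three simultaneously for every α in the compact interval, which is exactly what the missing prior construction would have to deliver.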
minor comments (2)
- [§2] Notation for the Hölder ball and the neural network function class f_θ should be introduced once in §2 and used consistently; several instances switch between C^α and the NN parameterization without cross-reference.
- [Abstract and §1] The abstract and introduction both state the rate is 'near-minimax' and 'almost optimal'; a precise statement of the logarithmic factors or the exact exponent (e.g., n^{-α/(2α+d)} up to logs) would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and for the positive evaluation of its potential significance. We address the two major comments below.
read point-by-point responses
- Referee: [§3, Theorem 3.1] Main contraction result: the statement invokes a 'suitably constructed prior' on the NN weights to achieve the small-ball probability lower bound Π{θ : ||f_θ - f_0|| ≤ ε_n} ≳ exp(-C n ε_n²) uniformly over α in a compact interval, but the explicit form of the prior (distribution on depth, width, weight variances, or mixture over architectures) is not exhibited. Without this construction, the adaptivity mechanism cannot be verified independently of the claim.
Authors: We agree that an explicit description of the prior is required for independent verification of the uniform small-ball probability and the resulting rate adaptivity. The manuscript states that a suitably constructed prior is used and sketches in the proof of Theorem 3.1 how the small-ball condition is obtained uniformly over α ∈ [α_min, α_max], but the concrete form (architecture distribution, variance schedule, or mixture weights) is not displayed in the main text. In the revision we will add an explicit hierarchical prior construction in Section 3: a discrete mixture over candidate depths and widths (with depth growing as O(log n) and width polynomial in n), combined with a Gaussian weight prior whose variance is scaled as n^{-2α/(2α+d)} for each candidate smoothness level, with the mixing measure chosen so that the prior mass condition holds simultaneously for all α in a compact interval without knowledge of the true α. This makes the adaptivity mechanism fully verifiable. revision: yes
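As a concreteness check on that promised construction, here is a minimal sketch of how such a hierarchical prior could be sampled. The schedules follow the rebuttal's description; the grid of candidate α values, the constants, and all function names are our assumptions, not the paper's verbatim prior:

```python
import numpy as np

# Sketch of the hierarchical, rate-adaptive prior described in the response:
# mix over candidate smoothness levels, set depth ~ O(log n) and width
# polynomial in n, then draw Gaussian weights with alpha-dependent variance.
rng = np.random.default_rng(1)

def sample_prior(n, d=1, alpha_grid=(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)):
    alpha = rng.choice(alpha_grid)                        # mixing over smoothness
    depth = max(2, int(np.ceil(np.log(n))))               # L_n ~ log n
    width = int(np.ceil(n ** (d / (2 * alpha + d))))      # W_n ~ n^{d/(2a+d)}
    sigma = n ** (-alpha / (2 * alpha + d))               # sigma_n^2 ~ n^{-2a/(2a+d)}
    layers, fan_in = [], d
    for _ in range(depth):
        W = rng.normal(0.0, sigma, size=(width, fan_in))  # hidden-layer weights
        b = rng.normal(0.0, sigma, size=width)            # hidden-layer biases
        layers.append((W, b))
        fan_in = width
    w_out = rng.normal(0.0, sigma, size=fan_in)           # linear readout
    return alpha, layers, w_out

def forward(x, layers, w_out):
    # Evaluate the sampled ReLU network at points x of shape (m, d).
    h = x
    for W, b in layers:
        h = np.maximum(h @ W.T + b, 0.0)
    return h @ w_out
```

Whether a uniform Gaussian scale across layers, rather than a depth-dependent schedule, actually satisfies the small-ball bound is exactly the question the referee raises; the sketch only pins down what "explicit" would have to mean.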
- Referee: [§4] Proof of the prior mass condition: the entropy and small-ball arguments appear to rely on the prior satisfying both (i) sufficient mass near the true solution and (ii) controlled covering numbers of the sieve, yet the derivation does not supply the concrete parameter choices (e.g., variance scaling with n or depth growth) that would make these bounds hold simultaneously for unknown α.
Authors: The referee correctly notes that the entropy and small-ball arguments in Section 4 are stated at a general level. While the proof invokes standard entropy bounds for neural-network sieves and derives the prior-mass lower bound from the construction, the specific scaling rules (variance decay rate, depth/width growth) that simultaneously satisfy both conditions uniformly over unknown α are not written out explicitly. In the revised manuscript we will insert the concrete parameter choices: depth L_n ∼ log n, width W_n ∼ n^{d/(2α+d)}, and weight variance σ_n² ∼ n^{-2α/(2α+d)}, together with a mixing distribution over a finite grid of candidate α values whose spacing is fine enough to preserve the near-minimax rate. These choices ensure the entropy integral and small-ball probability hold uniformly, and we will verify that they do not require prior knowledge of α. revision: yes
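Collected in one display, the scalings this response commits to (our transcription; the log-factor exponent and the constants are left implicit in the text):

```latex
% Promised parameter schedules for input dimension d and smoothness \alpha:
L_n \asymp \log n, \qquad
W_n \asymp n^{\frac{d}{2\alpha + d}}, \qquad
\sigma_n^{2} \asymp n^{-\frac{2\alpha}{2\alpha + d}},
\qquad\Longrightarrow\qquad
\epsilon_n \asymp n^{-\frac{\alpha}{2\alpha + d}} \ \text{(up to log factors)}.
```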
Circularity Check
No circularity: standard Bayesian nonparametric proof under explicit assumptions
full rationale
The derivation establishes posterior contraction for Bayesian PINNs by applying general nonparametric Bayesian theory to a suitably constructed prior on neural network weights, under the assumption that the elliptic PDE admits a strong Hölder solution. The rate-adaptive property follows from verifying the standard prior mass and entropy conditions for unknown smoothness, which the paper claims to satisfy via its prior construction. No step reduces by definition to the target rate, no fitted input is relabeled as prediction, and no load-bearing claim rests solely on unverified self-citation. The result is self-contained against the stated assumptions and external theory.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The PDE admits a strong solution in a Hölder space.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear · "we prove that the posterior distribution concentrates around the exact solution at a near-minimax rate... using a suitably constructed prior on the neural network weights"
Reference graph
Works this paper leans on
- [1]
- [2] Lee, K., Lin, L., Park, J., and Jeong, S. Posterior contraction for sparse neural networks in Besov spaces with intrinsic dimensionality. arXiv preprint arXiv:2506.19144, 2025.
- [3] Lu, Y., Chen, H., Lu, J., Ying, L., and Blanchet, J. Machine learning for elliptic PDEs: Fast rate generalization bound, neural scaling law and minimax optimality. arXiv preprint arXiv:2110.06897, 2021a. Lu, Y., Lu, J., and Wang, M. A priori generalization analysis of the deep Ritz method for solving high dimensional elliptic partial differential equations…
- [4] ISSN 0021-9991. doi: https://doi.org/10.1016/j.jcp.2018.10.045. Schmidt-Hieber, J. Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics, 48(4):1875, 2020.
- [5] Shukla, K., Zou, Z., Kaeufer, T., Triantafyllou, M., and Karniadakis, G. E. Uncertainty quantification in PINNs for turbulent flows: Bayesian inference and repulsive ensembles. arXiv preprint arXiv:2604.17156, 2026.
- [6] Sun, Y., Mukherjee, D., and Atchade, Y. On the estimation rate of Bayesian PINN for inverse problems. arXiv preprint arXiv:2406.14808, 2024.
- [7] ISBN 978-0-387-79051-0. doi: 10.1007/b13794. Vershynin, R. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge University Press, 2018.
- [8] (2007)
- [9] (2009)
- [10] Lu et al. (2021a); cited for Lemma C.6.
- [11] Giné & Nickl (2021); cited for Theorem 3.3.9.
- [12] Maurer (2016); cited for Corollary 4.