pith. machine review for the scientific record.

arxiv: 2605.08672 · v1 · submitted 2026-05-09 · 🧮 math.ST · cs.NA · math.NA · stat.ML · stat.TH

Recognition: 1 theorem link

· Lean Theorem

Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs

Yulong Lu, Yuxuan Zhao

Pith reviewed 2026-05-12 01:14 UTC · model grok-4.3

classification 🧮 math.ST · cs.NA · math.NA · stat.ML · stat.TH
keywords Bayesian PINNs · posterior contraction rates · elliptic partial differential equations · neural networks · statistical guarantees · rate-adaptive priors · uncertainty quantification

The pith

Bayesian PINNs achieve posterior concentration around exact elliptic PDE solutions at near-minimax rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that Bayesian physics-informed neural networks for elliptic PDEs have posteriors that concentrate around the true solution at a rate close to the minimax optimal rate. It uses a specially constructed prior on the neural network weights that adapts to the unknown smoothness of the solution. The result covers learning from independent noisy measurements collected both inside the domain and on the boundary, for equations with non-homogeneous Dirichlet boundary conditions. It gives statistical guarantees for solving PDEs with uncertainty quantification via these networks.

Core claim

Assuming the PDE has a strong solution in a Hölder space and with a suitably constructed prior on the neural network weights, the posterior distribution concentrates around the exact solution at a near-minimax rate. The prior is rate-adaptive, so the posterior contracts at an almost optimal rate without knowledge of the smoothness level of the exact solution.
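As background, the "near-minimax rate" for estimating a C^α function from n noisy observations in d dimensions takes the standard nonparametric form below; the exact logarithmic factor and norm are not pinned down in this summary, so γ and ‖·‖ are left generic here as assumptions.

```latex
% Benchmark nonparametric rate for a C^alpha truth in d dimensions;
% "near-minimax" means contraction at this rate up to polylog factors.
\epsilon_n \;\asymp\; n^{-\alpha/(2\alpha + d)} (\log n)^{\gamma}, \qquad \gamma \ge 0.
% Posterior contraction: vanishing posterior mass outside M\epsilon_n-balls
% around the exact solution u^* given the data D^{(n)}:
\Pi\bigl(u : \|u - u^\ast\| > M \epsilon_n \,\big|\, D^{(n)}\bigr) \;\longrightarrow\; 0
\quad \text{in probability as } n \to \infty.
```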

What carries the argument

The posterior contraction rate analysis for Bayesian PINNs, driven by a rate-adaptive prior on the neural network weights.

If this is right

  • The Bayesian PINN provides a statistically valid way to quantify uncertainty in PDE solutions.
  • The approach works without prior knowledge of the solution's smoothness.
  • It handles noisy data from both interior points and the boundary.
  • It extends to a general class of elliptic PDEs with non-homogeneous Dirichlet boundary conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • This suggests Bayesian neural networks could be effective for other PDE types if similar contraction results hold.
  • Practical implementations might benefit from using such adaptive priors to improve reliability.
  • Could connect to nonparametric Bayesian statistics where rate-adaptive priors are studied for regression.
  • Testable by checking contraction rates on synthetic elliptic PDE problems with varying smoothness.
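The last bullet can be made concrete. A minimal synthetic setup, assuming a 1D Poisson problem −u″ = f on (0, 1) with a manufactured solution; the solution, noise level, and sample sizes here are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Manufactured truth u*(x) = sin(pi x) + x/2, so the Dirichlet data are
# non-homogeneous: u*(0) = 0, u*(1) = 1/2. The PDE forcing is f = -u''.
u_star = lambda x: np.sin(np.pi * x) + 0.5 * x
f_rhs = lambda x: np.pi**2 * np.sin(np.pi * x)

n_int, sigma = 200, 0.05

# Independent noisy measurements in the paper's data setting:
# interior observations of the forcing, boundary observations of u itself.
x_int = rng.uniform(0.0, 1.0, size=n_int)
y_int = f_rhs(x_int) + sigma * rng.normal(size=n_int)
x_bdy = np.array([0.0, 1.0])
y_bdy = u_star(x_bdy) + sigma * rng.normal(size=2)

# A contraction check would fit a Bayesian PINN to (y_int, y_bdy) across a
# grid of n and smoothness levels, then track posterior mass outside an
# eps_n-ball around u*. Here we only sanity-check the manufactured pair.
assert np.isclose(f_rhs(0.5), np.pi**2)
```

Repeating this for truths of varying Hölder smoothness α and growing n would let one compare empirical contraction against the theoretical rate.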

Load-bearing premise

The PDE admits a strong solution in a Hölder space and the prior on the neural network weights is suitably constructed.

What would settle it

If, for an elliptic PDE with a known strong Hölder solution, the posterior of the Bayesian PINN failed to concentrate at the near-minimax rate under noisy data, the claim would be falsified.

read the original abstract

We study the posterior contraction rate of Bayesian Physics-Informed Neural Networks (PINNs) for solving a general class of elliptic partial differential equations (PDEs). We focus on learning of the elliptic equation with a non-homogeneous Dirichlet boundary condition from independent and noisy measurements collected both inside the domain and on the boundary. Assuming that the PDE admits a strong solution in a Hölder space and using a suitably constructed prior on the neural network weights, we prove that the posterior distribution concentrates around the exact solution at a near-minimax rate. Furthermore, the chosen prior is rate-adaptive: the posterior contracts at an (almost) optimal rate without prior knowledge of the smoothness level of the exact solution. Our results provide statistical guarantees for uncertainty quantification of PDEs via Bayesian PINNs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proves posterior contraction rates for Bayesian physics-informed neural networks (PINNs) solving elliptic PDEs from noisy interior and boundary measurements. Under the assumption that the PDE has a strong solution in a Hölder space, and using a suitably constructed prior on the neural network weights, the authors show that the posterior concentrates around the true solution at a near-minimax rate ε_n that is adaptive to the unknown Hölder smoothness index α without requiring knowledge of α in advance.

Significance. If the central claims hold, the result supplies rigorous statistical guarantees for uncertainty quantification with Bayesian PINNs, a practically popular method that has so far lacked nonparametric posterior contraction theory. The rate-adaptivity is a notable strength, as it aligns with general Bayesian nonparametric requirements (prior mass and entropy conditions) while remaining within the PINN architecture.

major comments (2)
  1. [§3, Theorem 3.1] Main contraction result: the statement invokes a 'suitably constructed prior' on the NN weights to achieve the small-ball probability lower bound Π{θ : ||f_θ - f_0|| ≤ ε_n} ≳ exp(-C n ε_n²) uniformly over α in a compact interval, but the explicit form of the prior (distribution on depth, width, weight variances, or mixture over architectures) is not exhibited. Without this construction, the adaptivity mechanism cannot be verified independently of the claim.
  2. [§4] Proof of the prior mass condition: the entropy and small-ball arguments appear to rely on the prior satisfying both (i) sufficient mass near the true solution and (ii) controlled covering numbers of the sieve, yet the derivation does not supply the concrete parameter choices (e.g., variance scaling with n or depth growth) that would make these bounds hold simultaneously for unknown α.
minor comments (2)
  1. [§2] Notation for the Hölder ball and the neural network function class f_θ should be introduced once in §2 and used consistently; several instances switch between C^α and the NN parameterization without cross-reference.
  2. [Abstract and §1] The abstract and introduction both state the rate is 'near-minimax' and 'almost optimal'; a precise statement of the logarithmic factors or the exact exponent (e.g., n^{-α/(2α+d)} up to logs) would improve clarity.
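For readers outside Bayesian nonparametrics: the "prior mass and entropy conditions" referenced above are, in generic Ghosal–van der Vaart form, the following. This is standard background, not text from the paper; the paper's exact metric d, sieve F_n, and constants may differ.

```latex
% Standard sufficient conditions for posterior contraction at rate eps_n,
% with prior Pi, true function f_0, and sieve F_n of network functions.
\begin{align*}
&\Pi\bigl(\theta : d(f_\theta, f_0) \le \epsilon_n\bigr) \;\ge\; e^{-C_1 n \epsilon_n^2}
  && \text{(prior mass / small ball)}\\
&\log N\bigl(\epsilon_n, \mathcal{F}_n, d\bigr) \;\le\; C_2\, n \epsilon_n^2
  && \text{(sieve entropy)}\\
&\Pi\bigl(\theta : f_\theta \notin \mathcal{F}_n\bigr) \;\le\; e^{-C_3 n \epsilon_n^2}
  && \text{(sieve remainder)}
\end{align*}
```

Adaptivity means exhibiting one prior for which all three hold simultaneously for every α in the compact interval, which is exactly what the referee asks to see verified.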

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and for the positive evaluation of its potential significance. We address the two major comments below.

read point-by-point responses
  1. Referee: [§3, Theorem 3.1] Main contraction result: the statement invokes a 'suitably constructed prior' on the NN weights to achieve the small-ball probability lower bound Π{θ : ||f_θ - f_0|| ≤ ε_n} ≳ exp(-C n ε_n²) uniformly over α in a compact interval, but the explicit form of the prior (distribution on depth, width, weight variances, or mixture over architectures) is not exhibited. Without this construction, the adaptivity mechanism cannot be verified independently of the claim.

    Authors: We agree that an explicit description of the prior is required for independent verification of the uniform small-ball probability and the resulting rate adaptivity. The manuscript states that a suitably constructed prior is used and sketches in the proof of Theorem 3.1 how the small-ball condition is obtained uniformly over α ∈ [α_min, α_max], but the concrete form (architecture distribution, variance schedule, or mixture weights) is not displayed in the main text. In the revision we will add an explicit hierarchical prior construction in Section 3: a discrete mixture over candidate depths and widths (with depth growing as O(log n) and width polynomial in n), combined with a Gaussian weight prior whose variance is scaled as n^{-2α/(2α+d)} for each candidate smoothness level, with the mixing measure chosen so that the prior mass condition holds simultaneously for all α in a compact interval without knowledge of the true α. This makes the adaptivity mechanism fully verifiable. revision: yes

  2. Referee: [§4] Proof of the prior mass condition: the entropy and small-ball arguments appear to rely on the prior satisfying both (i) sufficient mass near the true solution and (ii) controlled covering numbers of the sieve, yet the derivation does not supply the concrete parameter choices (e.g., variance scaling with n or depth growth) that would make these bounds hold simultaneously for unknown α.

    Authors: The referee correctly notes that the entropy and small-ball arguments in Section 4 are stated at a general level. While the proof invokes standard entropy bounds for neural-network sieves and derives the prior-mass lower bound from the construction, the specific scaling rules (variance decay rate, depth/width growth) that simultaneously satisfy both conditions uniformly over unknown α are not written out explicitly. In the revised manuscript we will insert the concrete parameter choices: depth L_n ∼ log n, width W_n ∼ n^{d/(2α+d)}, and weight variance σ_n² ∼ n^{-2α/(2α+d)}, together with a mixing distribution over a finite grid of candidate α values whose spacing is fine enough to preserve the near-minimax rate. These choices ensure the entropy integral and small-ball probability hold uniformly, and we will verify that they do not require prior knowledge of α. revision: yes
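The scalings proposed in this response are easy to tabulate. A small sketch, assuming the rebuttal's choices L_n ∼ log n, W_n ∼ n^{d/(2α+d)}, σ_n² ∼ n^{-2α/(2α+d)}, with all constants set to 1 for illustration; the function name and constants are ours, not the paper's:

```python
import math

def pinn_prior_scalings(n: int, alpha: float, d: int) -> dict:
    """Rebuttal's proposed prior hyperparameters for sample size n,
    Hölder smoothness alpha, spatial dimension d (all constants = 1)."""
    exponent = 2 * alpha / (2 * alpha + d)
    return {
        "depth_L": math.log(n),                 # L_n ~ log n
        "width_W": n ** (d / (2 * alpha + d)),  # W_n ~ n^{d/(2a+d)}
        "weight_var": n ** (-exponent),         # sigma_n^2 ~ n^{-2a/(2a+d)}
        "rate_eps": n ** (-exponent / 2),       # eps_n ~ n^{-a/(2a+d)}
    }

# Smoother truth (larger alpha) => faster variance decay and a contraction
# rate eps_n that improves toward the parametric n^{-1/2}.
s = pinn_prior_scalings(n=10_000, alpha=2.0, d=2)
```

Tabulating these over a grid of α values is one way to see that the hierarchical mixture must place mass on each candidate scaling without knowing which α is true.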

Circularity Check

0 steps flagged

No circularity: standard Bayesian nonparametric proof under explicit assumptions

full rationale

The derivation establishes posterior contraction for Bayesian PINNs by applying general nonparametric Bayesian theory to a suitably constructed prior on neural network weights, under the assumption that the elliptic PDE admits a strong Hölder solution. The rate-adaptive property follows from verifying the standard prior mass and entropy conditions for unknown smoothness, which the paper claims to satisfy via its prior construction. No step reduces by definition to the target rate, no fitted input is relabeled as prediction, and no load-bearing claim rests solely on unverified self-citation. The result is self-contained against the stated assumptions and external theory.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the PDE has a strong solution in a Hölder space and on the existence of a suitably constructed prior on neural network weights.

axioms (1)
  • domain assumption The PDE admits a strong solution in a Hölder space
    Explicitly stated as the setting in which the contraction result is proved.

pith-pipeline@v0.9.0 · 5437 in / 1123 out tokens · 43573 ms · 2026-05-12T01:14:43.389167+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 1 internal anchor

  1. [1]

    Jiao, Y., Lai, Y., Li, D., Lu, X., Wang, F., Wang, Y., and Yang, J. Z. A rate of convergence of physics informed neural networks for the linear second order elliptic PDEs. arXiv preprint arXiv:2109.01780.

  2. [2]

    Lee, K., Lin, L., Park, J., and Jeong, S. Posterior contraction for sparse neural networks in Besov spaces with intrinsic dimensionality. arXiv preprint arXiv:2506.19144.

  3. [3]

    Lu, Y., Chen, H., Lu, J., Ying, L., and Blanchet, J. Machine learning for elliptic PDEs: fast rate generalization bound, neural scaling law and minimax optimality. arXiv preprint arXiv:2110.06897, 2021a.

    Lu, Y., Lu, J., and Wang, M. A priori generalization analysis of the deep Ritz method for solving high dimensional elliptic partial differential equatio...

  4. [4]

    Raissi, M., Perdikaris, P., and Karniadakis, G. E. Journal of Computational Physics. ISSN 0021-9991. doi: 10.1016/j.jcp.2018.10.045.

    Schmidt-Hieber, J. Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics, 48(4):1875.

  5. [5]

    Shukla, K., Zou, Z., Kaeufer, T., Triantafyllou, M., and Karniadakis, G. E. Uncertainty quantification in PINNs for turbulent flows: Bayesian inference and repulsive ensembles. arXiv preprint arXiv:2604.17156.

  6. [6]

    Sun, Y., Mukherjee, D., and Atchade, Y. On the estimation rate of Bayesian PINN for inverse problems. arXiv preprint arXiv:2406.14808.

  7. [7]

    Vershynin, R. High-dimensional probability: An introduction with applications in data science.
