Recognition: 2 theorem links
Diffusion Model for Manifold Data: Score Decomposition, Curvature, and Statistical Complexity
Pith reviewed 2026-05-15 06:39 UTC · model grok-4.3
The pith
When data lie on a low-dimensional manifold, the statistical rates of diffusion models are governed by the intrinsic dimension and curvature rather than by the ambient dimension.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By modeling data as samples from a smooth Riemannian manifold, the analysis reveals crucial decompositions of score functions in diffusion models under different levels of injected noise, highlighting the interplay of manifold curvature with the structures in the score function. This enables an efficient neural network approximation to the score function and provides statistical rates for score estimation and distribution learning that are governed by the intrinsic dimension of data and the manifold curvature.
What carries the argument
Decomposition of score functions under varying noise levels on Riemannian manifolds, capturing the interaction between curvature and score structures.
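The decomposition can be written down concretely. In the notation of the paper's own appendix (reconstructed here from the extracted fragment around its equation (A.5), so treat this rendering as a best-effort paraphrase), with α_t the signal scale and h_t the noise variance at time t:

```latex
\nabla \log p_t(x)
  \;=\; -\,\frac{x-\Pi_{\mathcal M}(x,t)}{h_t}
  \;+\; g(x,t),
\qquad
g(x,t) \;=\; \nabla_x \log \int_{x_0\in\mathcal M}
  \exp\!\Big(-\tfrac{\lVert \Pi_{\mathcal M}(x,t)-\alpha_t x_0\rVert^2
      + 2\langle x-\Pi_{\mathcal M}(x,t),\,\Pi_{\mathcal M}(x,t)-\alpha_t x_0\rangle}{2h_t}\Big)\,
  \mathrm{d}P_{\mathrm{data}}(x_0).
```

The first term pulls noisy points back toward the manifold along the normal direction (using the standard gradient identity for the squared distance to a set with positive reach); the correction term g carries the on-manifold density and, through the projection Π_M, the curvature interaction.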
If this is right
- Score estimation achieves rates depending on intrinsic dimension instead of full ambient dimension.
- Distribution learning rates are similarly improved and controlled by manifold curvature.
- Neural network approximation of the score becomes efficient due to the decomposition.
- Curvature directly affects the statistical complexity of learning the data distribution.
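The first two bullets can be made concrete with a back-of-the-envelope comparison of classical nonparametric rate exponents n^{-β/(2β+dim)}. This is purely illustrative and not from the paper; the smoothness β = 2, ambient dimension D = 100, and intrinsic dimension d = 5 below are hypothetical values:

```python
# Illustrative only: classical nonparametric rates n^(-beta / (2*beta + dim)).
# beta, D, d are hypothetical values chosen for the example, not from the paper.
beta, D, d = 2.0, 100, 5

def rate(n: int, dim: int, b: float = beta) -> float:
    """Minimax-style error rate n^(-b / (2*b + dim))."""
    return n ** (-b / (2 * b + dim))

for n in (10**4, 10**6, 10**8):
    print(f"n={n:>9}: ambient-D rate {rate(n, D):.3f}, intrinsic-d rate {rate(n, d):.5f}")
```

Even at n = 10^6 samples the ambient-dimension rate has barely moved off a constant, while the intrinsic-dimension rate is already small, which is the practical content of "rates governed by intrinsic dimension."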
Where Pith is reading between the lines
- This suggests diffusion models naturally exploit manifold structure in real-world data without needing explicit manifold learning steps.
- The framework could be tested on synthetic data lying on manifolds with tunable curvature to isolate its effect.
- Similar score decompositions might apply to other generative approaches that add noise progressively.
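One way to run the synthetic test suggested above is to embed a d-sphere of radius r (sectional curvature 1/r²) in an ambient space of dimension D and add small Gaussian noise, giving a dataset whose intrinsic dimension and curvature are both tunable knobs. The construction below is a hypothetical sketch of such a testbed, not the paper's experimental setup:

```python
import numpy as np

def sphere_dataset(n, d=2, D=50, radius=1.0, noise=0.01, seed=0):
    """Sample n points near a d-sphere of radius `radius` (sectional curvature
    1/radius^2), isometrically embedded in the first d+1 coordinates of R^D."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d + 1))
    x *= radius / np.linalg.norm(x, axis=1, keepdims=True)  # uniform on the d-sphere
    data = np.zeros((n, D))
    data[:, : d + 1] = x                          # embed in ambient R^D
    data += noise * rng.standard_normal((n, D))   # diffusion-style perturbation
    return data

X = sphere_dataset(1000, d=2, D=50, radius=2.0, noise=0.01)
# points concentrate near radius 2 despite living in R^50
print(X.shape, float(np.linalg.norm(X, axis=1).mean()))
```

Sweeping `radius` isolates the curvature effect at fixed intrinsic dimension d, and sweeping `d` at fixed radius isolates the dimension effect, which is exactly the factorization the review suggests testing.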
Load-bearing premise
Data samples are drawn from a smooth Riemannian manifold whose curvature interacts with the score function in a decomposable way under varying noise levels.
What would settle it
The claim would be contradicted by empirical results showing that estimation error fails to improve as the intrinsic dimension shrinks, or that curvature has no measurable effect on the rates.
original abstract
Diffusion models have become a leading framework in generative modeling, yet their theoretical understanding -- especially for high-dimensional data concentrated on low-dimensional structures -- remains incomplete. This paper investigates how diffusion models learn such structured data, focusing on two key aspects: statistical complexity and influence of data geometric properties. By modeling data as samples from a smooth Riemannian manifold, our analysis reveals crucial decompositions of score functions in diffusion models under different levels of injected noise. We also highlight the interplay of manifold curvature with the structures in the score function. These analyses enable an efficient neural network approximation to the score function, built upon which we further provide statistical rates for score estimation and distribution learning. Remarkably, the obtained statistical rates are governed by the intrinsic dimension of data and the manifold curvature. These results advance the statistical foundations of diffusion models, bridging theory and practice for generative modeling on manifolds.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper models data as samples from a smooth Riemannian manifold and analyzes diffusion models on such data. It derives decompositions of the score function under different noise levels, examines the interplay between manifold curvature and score structure, constructs efficient neural network approximations to the score, and obtains statistical rates for score estimation and distribution learning. These rates are claimed to depend only on the intrinsic dimension d and curvature bounds K, rather than ambient dimension.
Significance. If the score decompositions hold with curvature corrections uniformly controlled in diffusion time t and the rates follow without extra factors from curvature derivatives, the work would provide valuable statistical foundations for diffusion models on manifold data, explaining their effectiveness on structured high-dimensional data via intrinsic geometry.
major comments (1)
- [§3 (score decomposition and curvature analysis)] The central claim that statistical rates depend only on intrinsic dimension d and curvature bounds K requires that the score decomposition (likely around the heat-kernel expansion in §3) absorbs all curvature-dependent remainders. However, short-time heat-kernel expansions include t^{3/2} terms involving derivatives of sectional curvature; if these are not explicitly bounded or absorbed into the neural approximation error, the final rates will depend on ||∇K|| or injectivity radius, undermining the stated dependence.
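For reference, the expansion at issue is the standard short-time (Minakshisundaram-Pleijel) parametrix for the heat kernel on a d-dimensional manifold; this is textbook Riemannian geometry, quoted here for context rather than taken from the paper:

```latex
p_t(x,y) \;\sim\; (4\pi t)^{-d/2}\, e^{-\operatorname{dist}(x,y)^2/(4t)}\,
  \sum_{k\ge 0} t^{k}\, u_k(x,y),
\qquad
u_1(x,x) \;=\; \tfrac{1}{6}\,\operatorname{Scal}(x).
```

The coefficients u_k for k ≥ 1 involve curvature and, off the diagonal, its covariant derivatives; this is where a remainder of order t^{3/2} depending on ‖∇K‖ can enter, as the comment notes.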
minor comments (2)
- [Theorem statements] Clarify the precise regularity assumptions on the manifold (e.g., bounds on injectivity radius, smoothness class of the metric) in the statement of the main theorems.
- [Introduction or §4] Add a brief comparison table or discussion contrasting the derived rates with existing Euclidean diffusion results to highlight the improvement from intrinsic dimension.
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying a key technical detail in the curvature analysis. We address the major comment below and outline the revisions we will make.
point-by-point responses
-
Referee: [§3 (score decomposition and curvature analysis)] The central claim that statistical rates depend only on intrinsic dimension d and curvature bounds K requires that the score decomposition (likely around the heat-kernel expansion in §3) absorbs all curvature-dependent remainders. However, short-time heat-kernel expansions include t^{3/2} terms involving derivatives of sectional curvature; if these are not explicitly bounded or absorbed into the neural approximation error, the final rates will depend on ||∇K|| or injectivity radius, undermining the stated dependence.
Authors: We agree that the parametrix expansion of the heat kernel contains higher-order terms whose coefficients involve derivatives of the sectional curvature. In §3 we retain the leading O(t) terms in the score decomposition and bound the remainder using the given curvature bound K together with a uniform lower bound on the injectivity radius (implicit in our smoothness assumptions). To close the argument rigorously for the t^{3/2} remainder, an explicit bound on ||∇K|| is required. We will therefore add the standing assumption that the manifold has bounded first covariant derivatives of the curvature (a standard hypothesis in Riemannian geometry that remains independent of the ambient dimension). With this addition the error terms are absorbed into the neural-network approximation budget, and the final statistical rates continue to depend only on the intrinsic dimension d and the curvature quantities (now including their first derivatives). We will revise the statement of Theorem 3.1, the surrounding discussion in §3, and the list of assumptions to make this explicit. This is a clarification rather than a change to the core results.
revision: partial
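A minimal way to state the proposed standing assumption (notation ours, not the paper's; R denotes the Riemann curvature tensor):

```latex
\sup_{\mathcal M}\lVert R\rVert \;\le\; K,
\qquad
\sup_{\mathcal M}\lVert \nabla R\rVert \;\le\; K_1,
\qquad
\operatorname{inj}(\mathcal M) \;\ge\; \iota_0 \;>\; 0,
```

under which the t^{3/2} remainder in the score decomposition would be bounded by C(d, K, K_1, ι_0) · t^{3/2}, with no dependence on the ambient dimension.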
Circularity Check
No significant circularity in derivation chain
full rationale
The paper's core claims rest on modeling data as samples from a smooth Riemannian manifold and deriving score decompositions under varying noise levels, followed by neural network approximation bounds and statistical rates. These steps rely on standard heat kernel expansions and manifold geometry assumptions that are external to the fitted quantities; no load-bearing step reduces a prediction to a parameter fit on the same data, nor does any central result collapse to a self-citation or self-definition by construction. The statistical rates are presented as consequences of the intrinsic dimension and curvature bounds via explicit approximation arguments, keeping the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: data is sampled from a smooth Riemannian manifold
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear?
unclear: relation between the paper passage and the cited Recognition theorem.
Passage: "the score function decomposes as a weighted sum of localized components... additional interaction term that reflects the influence of curvature" (Lemma 3.1, Lemma 3.2, E_2(t))
-
IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking · unclear?
unclear: relation between the paper passage and the cited Recognition theorem.
Passage: "statistical rates... governed by the intrinsic dimension d and the manifold curvature" (reach τ)
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Iskander Azangulov, George Deligiannidis, and Judith Rousseau. Convergence of diffusion models under the manifold hypothesis in high-dimensions. arXiv preprint arXiv:2409.18804.
- [2] Joe Benton, Valentin De Bortoli, Arnaud Doucet, and George Deligiannidis. Nearly d-linear convergence bounds for diffusion models via stochastic localization. arXiv preprint arXiv:2308.03686.
- [3] Adam Block, Youssef Mroueh, and Alexander Rakhlin. Generative modeling with denoising auto-encoders and Langevin sampling. arXiv preprint arXiv:2002.00107.
- [4] Saptarshi Chakraborty, Quentin Berthet, and Peter L. Bartlett. Generalization properties of score-matching diffusion models for intrinsically low-dimensional data. arXiv preprint arXiv:2603.03700.
- [5] Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru R. Zhang. Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. arXiv preprint arXiv:2209.11215, 2022. Sitan Chen, Sinho Chewi, Holden Lee, Yuanzhi Li, Jianfeng Lu, and Adil Salim. The probability flow ODE is provably fast. Advances in Neural Information Processing Systems.
- [6] Valentin De Bortoli. Convergence of denoising diffusion models under the manifold hypothesis. arXiv preprint arXiv:2208.05314.
- [7] Zehao Dou, Subhodh Kotekal, Zhehao Xu, and Harrison H. Zhou. From optimal score matching to optimal sampling. arXiv preprint arXiv:2409.07032.
- [8] Tyler Farghly, Peter Potaptchik, Samuel Howard, George Deligiannidis, and Jakiw Pidstrigach. Diffusion models and the manifold hypothesis: Log-domain smoothing is geometry adaptive. arXiv preprint arXiv:2510.02305.
- [9] Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, and Sam McCandlish. Scaling laws for autoregressive generative modeling. arXiv preprint arXiv:2010.14701.
- [10] Zhihan Huang, Yuting Wei, and Yuxin Chen. Denoising diffusion probabilistic models are optimally adaptive to unknown low dimensionality. arXiv preprint arXiv:2410.18784.
- [11] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.
- [12] Dongjun Kim, Seungjae Shin, Kyungwoo Song, Wanmo Kang, and Il-Chul Moon. Soft truncation: A universal training technique of score-based diffusion model for high precision score estimation. arXiv preprint arXiv:2106.05527.
- [13] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. ICLR 2014. arXiv preprint arXiv:1312.6114.
- [14] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. DiffWave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761, 2020.
- [15] Xiang Li, Zebang Shen, Ya-Ping Hsieh, and Niao He. When scores learn geometry: Rate separations under the manifold hypothesis. arXiv preprint arXiv:2509.24912.
- [16] Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. arXiv preprint arXiv:2310.16834.
- [17] Rui Lu, Runzhe Wang, Kaifeng Lyu, Xitai Jiang, Gao Huang, and Mengdi Wang. Towards understanding text hallucination of diffusion models via local generation bias. arXiv preprint arXiv:2503.03595.
- [18] Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models. arXiv preprint arXiv:2502.09992.
- [19] Wenpin Tang and Hanyang Zhao. Score-based diffusion models via stochastic differential equations: a technical tutorial. arXiv preprint arXiv:2402.07487.
- [20] Larry Wasserman. All of Nonparametric Statistics. Springer Science & Business Media.
- [21] Konstantin Yakovlev and Nikita Puchkin. Generalization error bound for denoising score matching under relaxed manifold assumption. arXiv preprint arXiv:2502.13662.
- [22]–[32] Entries recovered only as extraction fragments of the paper's appendix. The identifiable citations are: Leobacher and Steinicke [2020] (squared-distance identity for sets with positive reach, used around equation (A.5)); Oko et al. [2023] (ReLU-network implementation lemmas F.1–F.7 and the tensor-product and projection constructions of Lemmas D.17–D.18); Niyogi et al. [2008] (Proposition 6.1, bounding geodesic acceleration by the reach τ of the manifold); and Brenner [2008] (Bramble-Hilbert lemma, Chapter 4.1, for averaged Taylor polynomials). The remaining fragments carry no recoverable bibliographic metadata.
discussion (0)