Approximation and learning of anisotropic and mixed smooth functions by deep ReLU neural networks

Jun Fan; Yunfei Yang

arxiv: 2605.31152 · v1 · pith:LAK3ZVV2new · submitted 2026-05-29 · 📊 stat.ML · cs.LG· cs.NA· math.NA

Approximation and learning of anisotropic and mixed smooth functions by deep ReLU neural networks

Yunfei Yang , Jun Fan This is my paper

Pith reviewed 2026-06-28 21:08 UTC · model grok-4.3

classification 📊 stat.ML cs.LGcs.NAmath.NA

keywords ReLU neural networksanisotropic Besov spacesmixed smoothnessfunction approximationstatistical learningcurse of dimensionalityapproximation rates

0 comments

The pith

Deep ReLU neural networks approximate functions with direction-dependent smoothness at rate O((WL)^{-2 ilde s}) where ilde s is the harmonic mean of the smoothness values.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes approximation rates for deep ReLU networks on anisotropic Besov spaces that replace the usual d in the exponent with the mean smoothness ilde s. This extends prior work on isotropic Besov spaces to cases where smoothness differs along each coordinate. Similar rates hold for mixed smoothness spaces up to logs, and the results imply optimal learning rates for these classes. A reader would care if they want to understand how neural networks handle functions that are not equally smooth in all directions without suffering the full curse of dimensionality.

Core claim

For functions in the anisotropic Besov space B^{s}_{q,r}([0,1]^d) with mean smoothness ilde s > 1/q - 1/p, deep ReLU networks with total parameters WL achieve L^p approximation error of order (WL)^{-2 ilde s}. For mixed smooth Besov spaces the rate is (WL)^{-2s} up to logs. The paper also gives rates for compositions and shows these yield minimax optimal estimation rates.

What carries the argument

The mean smoothness ilde s = (sum_{i=1}^d s_i^{-1})^{-1} that governs the approximation rate O((WL)^{-2 ilde s}) for anisotropic Besov functions by ReLU networks of width W and depth L.

If this is right

Composition of anisotropic Besov functions can be approximated at rates derived from the individual rates.
Deep ReLU networks achieve minimax optimal rates up to logs for a wide range of smooth function classes including anisotropic and mixed ones.
The approximation rates overcome the curse of dimensionality when the mean smoothness is fixed.
Learning rates for nonparametric estimation using deep networks match the minimax rates for these spaces.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These results suggest testing the rates on functions like those arising in anisotropic PDEs or images with directional features.
Similar analysis might apply to other network types or activations if the proof relies on specific ReLU properties.
Extensions could consider adaptive networks that exploit the anisotropy automatically.

Load-bearing premise

The function to be approximated must lie in the anisotropic Besov space or mixed smooth Besov space and satisfy the given condition on the mean smoothness.

What would settle it

Construct a specific function in B^s_{q,r} with ilde s > 1/q-1/p and show that no ReLU network with given W and L can achieve error smaller than C (WL)^{-2 ilde s} for some C.

read the original abstract

This paper studies how efficiently deep ReLU neural networks can approximate and learn smooth functions. When the error is measured in $L^p([0,1]^d)$ norm and the approximator is a network with width $W$ and depth $L$, recent works have proven the supper approximation rate $\mathcal{O}((WL)^{-2s/d})$ for Besov space $\mathcal{B}^s_{q,r}([0,1]^d)$ under the Sobolev embedding condition $s/d>1/q-1/p$. In order to overcome the curse of dimensionality in this rate, we extent this result to anisotropic and mixed smooth function classes. We establish the approximation rate $\mathcal{O}((WL)^{-2\tilde{s}})$ for anisotropic Besov space $\mathcal{B}^{\boldsymbol{s}}_{q,r}([0,1]^d)$ with anisotropic smoothness $\boldsymbol{s}=(s_1,\dots,s_d)$ under the embedding condition $\tilde{s} > 1/q-1/p$, where the mean smoothness $\tilde{s} = (\sum_{i=1}^d s_i^{-1})^{-1}$. For mixed smooth Besov space $\mathcal{MB}^s_{q,r}([0,1]^d)$ with mixed smoothness $s>1/q-1/p$, we show that the approximation rate $\mathcal{O}((WL)^{-2s})$ holds up to logarithmic factors. Using these results, we also derive approximation bounds for the composition of anisotropic Besov functions. As an application, it is shown that deep ReLU neural networks can achieve minimax optimal rates up to logarithmic factors for a wide range of smooth function classes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Paper extends ReLU approximation rates to anisotropic and mixed Besov spaces via mean smoothness, delivering dimension-independent rates under explicit embedding conditions.

read the letter

The main thing to know is that this work takes the known isotropic Besov rate O((WL)^{-2s/d}) and generalizes it to anisotropic Besov spaces with rate O((WL)^{-2 ilde s}), where ilde s is the harmonic mean of the per-coordinate smoothness values, and to mixed-smoothness classes with the same rate up to logs. They also cover compositions and minimax optimality.

The extension is direct but non-trivial because it requires adapting the wavelet or spline constructions to the anisotropic norm and then emulating those with ReLU networks. The embedding condition ilde s > 1/q-1/p is stated clearly as necessary, which keeps the claim honest. The stress-test note confirms the proofs follow standard techniques without internal contradictions or hidden assumptions.

The soft spots are modest. This is a generalization rather than a new method, so the technical lift is moderate. The abstract supplies no derivation steps, but the full text apparently does via the usual route, and nothing in the stress-test flags a gap. If the log factors or the composition bounds turn out to have loose constants, that would be normal for this style of paper.

The paper is for readers working on neural network approximation theory or high-dimensional statistical estimation who need explicit rates for non-isotropic smoothness. It is not reshaping the field but supplies usable bounds.

I would send it to peer review. The claims are specific, the conditions are upfront, and the extension is verifiable.

Referee Report

0 major / 4 minor

Summary. The paper extends approximation theory for deep ReLU networks to anisotropic Besov spaces B^s_{q,r}([0,1]^d) and mixed-smoothness Besov spaces MB^s_{q,r}([0,1]^d). It claims that networks of width W and depth L achieve the rate O((WL)^{-2 ilde s}) in L^p norm for anisotropic smoothness vector s=(s1,...,sd) under the embedding condition ilde s > 1/q-1/p, where ilde s is the harmonic mean of the si; an analogous result holds for mixed smoothness up to logarithmic factors. The work also derives composition bounds and shows that the rates are minimax optimal (up to logs) for a range of smooth classes.

Significance. If the stated rates hold, the results meaningfully extend prior isotropic Besov approximation theorems by replacing the isotropic smoothness s/d with a mean smoothness that can be substantially larger than the minimum si when smoothness varies across coordinates. This directly addresses the curse of dimensionality for function classes that arise in applications with directional or product structure. The explicit embedding conditions and the derivation of minimax optimality via known lower bounds are strengths.

minor comments (4)

Abstract: 'supper approximation rate' is a typographical error and should read 'super approximation rate'.
Abstract: 'we extent this result' should be 'we extend this result'.
The abstract states the embedding condition ilde s > 1/q-1/p but does not indicate where in the manuscript the necessity of this condition is proved or whether it is only sufficient; a brief pointer to the relevant theorem would improve clarity.
Notation for the mean smoothness ilde s = (sum s_i^{-1})^{-1} is introduced in the abstract; the manuscript should confirm that this definition is used consistently in all subsequent statements of the rate (including the mixed-smoothness case).

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation extends isotropic Besov approximation rates O((WL)^{-2s/d}) to anisotropic B^s_{q,r} with rate O((WL)^{-2 ilde s}) where ilde s = (sum 1/s_i)^{-1} and to mixed smoothness cases, using standard wavelet/spline constructions adapted to the norms followed by ReLU emulation. The mean smoothness ilde s is the conventional harmonic-mean definition for anisotropic spaces, not a fitted or self-defined quantity. The embedding condition ilde s > 1/q-1/p is stated explicitly as necessary. No equations reduce by construction, no fitted parameters are renamed as predictions, and citations to prior isotropic results are external rather than self-referential load-bearing chains. The central claims remain independent of the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on standard functional-analytic properties of Besov spaces and Sobolev-type embeddings; no new free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

axioms (1)

standard math Standard embedding theorems for (anisotropic/mixed) Besov spaces into L^p
Invoked to obtain the condition ilde s > 1/q-1/p that enables the stated rates.

pith-pipeline@v0.9.1-grok · 5842 in / 1241 out tokens · 20062 ms · 2026-06-28T21:08:04.887141+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

2 extracted references · 2 linked inside Pith

[1]

Higher order approximation rates for ReLU CNNs in Korobov spaces.arXiv: 2501.11275,

Yuwen Li and Guozhi Zhang. Higher order approximation rates for ReLU CNNs in Korobov spaces.arXiv: 2501.11275,

Pith/arXiv arXiv
[2]

Marco Signoretto, Lieven De Lathauwer, and Johan A. K. Suykens. Learning tensors in reproducing kernel Hilbert spaces with multilinear spectral penalties.arXiv: 1310.4977,

Pith/arXiv arXiv

[1] [1]

Higher order approximation rates for ReLU CNNs in Korobov spaces.arXiv: 2501.11275,

Yuwen Li and Guozhi Zhang. Higher order approximation rates for ReLU CNNs in Korobov spaces.arXiv: 2501.11275,

Pith/arXiv arXiv

[2] [2]

Marco Signoretto, Lieven De Lathauwer, and Johan A. K. Suykens. Learning tensors in reproducing kernel Hilbert spaces with multilinear spectral penalties.arXiv: 1310.4977,

Pith/arXiv arXiv