A standard Transformer with O(ε^{-d0/α}) blocks can approximate any bounded d0-dimensional Hölder function of smoothness α to accuracy ε, but at least Ω(ε^{-d0/(4α)}) blocks are required.
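In symbols, the two depth bounds from the TLDR pair up as below; the unit-cube domain and sup-norm error metric are assumptions added here for concreteness.

```latex
% Depth bounds restated from the summary above. Assumed for concreteness:
% domain [0,1]^{d_0}, sup-norm approximation error.
\[
  f \in \mathcal{H}^{\alpha}\big([0,1]^{d_0}\big),\ \|f\|_\infty \le B
  \;\Longrightarrow\;
  \exists\,T\ \text{with}\ O\big(\varepsilon^{-d_0/\alpha}\big)\ \text{blocks s.t.}\
  \sup_x \lvert T(x)-f(x)\rvert \le \varepsilon,
\]
\[
  \text{while any } T\ \text{with}\ \sup_x \lvert T(x)-f(x)\rvert \le \varepsilon\
  \text{must have}\ \Omega\big(\varepsilon^{-d_0/(4\alpha)}\big)\ \text{blocks}.
\]
```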
7 Pith papers cite this work, alongside 203 external citations; polarity classification is still indexing, and all 7 verdicts are pending. The 7 representative citing papers are listed below.
citing papers explorer
-
Approximation Error Upper and Lower Bounds for Hölder Class with Transformers
A standard Transformer with O(ε^{-d0/α}) blocks can approximate any bounded d0-dimensional Hölder function of smoothness α to accuracy ε, but at least Ω(ε^{-d0/(4α)}) blocks are required.
-
A Deep Risk Estimator for Known Operator Learning
A per-layer risk estimator for hybrid deep networks shows that replacing learned layers with known operators shrinks the bound and scales sample requirements with the number of replaced parameters; the estimator is validated on CT reconstruction.
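A minimal sketch of the headline mechanism, using a generic parameter-counting surrogate for the per-layer bound; `risk_bound`, the layer sizes, and the sample count are all hypothetical stand-ins, not the paper's actual estimator.

```python
import math

def risk_bound(layer_params, n_samples):
    # Toy parameter-counting surrogate: only *learned* layers contribute;
    # a known (fixed) operator contributes zero learned parameters.
    # Illustrative stand-in, NOT the paper's estimator.
    p = sum(layer_params)
    return math.sqrt(p * math.log(n_samples) / n_samples)

# Hypothetical 5-layer CT-reconstruction network: learned params per layer.
all_learned = [20_000, 50_000, 50_000, 20_000, 5_000]
# Replace the two middle layers with a known operator (e.g. a fixed
# filtered back-projection step): their learned-parameter counts vanish.
hybrid = [20_000, 0, 0, 20_000, 5_000]

n = 100_000
print(f"all-learned bound: {risk_bound(all_learned, n):.3f}")
print(f"hybrid bound:      {risk_bound(hybrid, n):.3f}")  # strictly smaller
```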
-
Beyond the Independence Assumption: Finite-Sample Guarantees for Deep Q-Learning under τ-Mixing
Finite-sample risk bounds for DQN with ReLU networks are extended to τ-mixing data, showing an extra dimensionality penalty in the convergence rate due to dependence.
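For intuition about the dependence penalty, a sketch under stated assumptions: the classical i.i.d. nonparametric rate, degraded through an effective sample size. The blocking heuristic in `effective_n` and the rate form are generic illustrative devices, not the paper's τ-mixing analysis.

```python
import math

def nonparametric_rate(n, alpha, d):
    # Classical i.i.d. minimax rate n^(-alpha / (2*alpha + d)) for
    # alpha-smooth regression in d dimensions (baseline for comparison).
    return n ** (-alpha / (2 * alpha + d))

def effective_n(n, tau_coeffs):
    # Illustrative blocking heuristic: summable mixing coefficients shrink
    # the usable sample size. Generic device, NOT the paper's analysis.
    return n / (1.0 + 2.0 * sum(tau_coeffs))

n, alpha, d = 100_000, 1.0, 4
taus = [0.5 ** k for k in range(1, 20)]  # geometrically decaying dependence

print(f"i.i.d. rate:  {nonparametric_rate(n, alpha, d):.4f}")
print(f"mixing rate:  {nonparametric_rate(effective_n(n, taus), alpha, d):.4f}")
```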
-
Transformer Approximations from ReLUs
A recipe translates ReLU-network approximations into softmax attention, with economical, target-specific bounds for multiplication, reciprocal computation, and min/max primitives.
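One such primitive is easy to sketch: single-query softmax attention with sharpened scores recovers max (and min by negation), while multiplication reduces to squares via the polarization identity x·y = ((x+y)² − (x−y)²)/4. This is the generic smooth-max construction, assumed here for illustration rather than the paper's exact recipe.

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def attention_max(x, beta):
    # Single-query softmax attention: scores beta*x, values x.
    # As beta grows this converges to max(x) -- the smooth-max trick.
    return softmax(beta * x) @ x

x = np.array([0.2, 0.9, 0.5, 0.1])
for beta in (1.0, 10.0, 100.0):
    print(f"beta={beta:6.1f}  max~{attention_max(x, beta):.4f}  "
          f"min~{-attention_max(-x, beta):.4f}")   # min via negation
print("true max:", x.max(), " true min:", x.min())
```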
-
A Semi-Supervised Kernel Two-Sample Test
A semi-supervised kernel two-sample test integrates unlabeled covariate data to achieve asymptotic normality under the null, higher power than standard kernel tests, and consistency against fixed and local alternatives.
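For reference, a minimal sketch of the classical unbiased MMD statistic that kernel two-sample tests build on; the semi-supervised use of unlabeled covariates, which drives the power gains, is not reproduced here, and all data below are synthetic.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # RBF kernel matrix between the rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    # Unbiased estimate of squared MMD -- the classical two-sample
    # statistic that the semi-supervised variant extends.
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    n, m = len(x), len(y)
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    return kxx.sum() / (n * (n - 1)) + kyy.sum() / (m * (m - 1)) - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(0.3, 1.0, size=(200, 2))               # mean-shifted alternative
print(f"MMD^2 under H1: {mmd2_unbiased(x, y):.4f}")   # clearly positive
print(f"MMD^2 under H0: {mmd2_unbiased(x, rng.normal(0.0, 1.0, (200, 2))):.4f}")
```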
-
Scalable Gaussian process inference via neural feature maps
Neural feature maps create expressive kernels that enable fast, scalable, and consistent exact Gaussian process inference for regression and classification.
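A minimal sketch of the underlying algebra, with a random Fourier-style map standing in for a learned neural feature map: once k(x, x′) = φ(x)ᵀφ(x′) with m features, exact GP regression reduces to Bayesian linear regression in feature space, costing O(nm²) instead of O(n³).

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_map(x, W, b):
    # Random Fourier-style stand-in for a learned neural feature map;
    # the GP algebra below is identical whichever map is used.
    return np.cos(x @ W + b) * np.sqrt(2.0 / W.shape[1])

# Toy 1-D regression data.
x = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(2.0 * x[:, 0]) + 0.1 * rng.normal(size=200)

m, noise = 100, 0.1
W = rng.normal(scale=2.0, size=(1, m))
b = rng.uniform(0.0, 2.0 * np.pi, size=m)

Phi = feature_map(x, W, b)                    # (n, m) feature matrix
A = Phi.T @ Phi + noise**2 * np.eye(m)        # (m, m): O(n m^2), not O(n^3)
w_mean = np.linalg.solve(A, Phi.T @ y)        # posterior mean weights

x_test = np.linspace(-3.0, 3.0, 5).reshape(-1, 1)
Phi_t = feature_map(x_test, W, b)
pred_mean = Phi_t @ w_mean
pred_var = noise**2 * np.einsum('ij,jk,ik->i', Phi_t, np.linalg.inv(A), Phi_t)
print(np.c_[x_test[:, 0], pred_mean, np.sqrt(pred_var)])
```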
-
Exploring climate change effects on concurrent floods and concurrent droughts via statistical deep learning
The deep SPAR model shows concurrent floods and droughts becoming more likely in the Upper Danube by 2100 under high emissions, with changes in the dependence between catchments contributing substantially to the increase.