Nonparametric regression using deep neural networks with ReLU activation function

Johannes Schmidt-Hieber · 2020 · The Annals of Statistics · DOI 10.1214/19-aos1875

12 Pith papers cite this work, alongside 203 external citations. Polarity classification is still indexing.

203 external citations · Crossref

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Provable Data Scaling Law for Meta Learning via Complexity Minimization

stat.ML · 2026-06-01 · unverdicted · novelty 7.0

A novel complexity minimization meta-learning framework provably demonstrates that few-shot adaptation error decreases as meta-training data volume increases.

A Deep Risk Estimator for Known Operator Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

A per-layer risk estimator for hybrid deep networks shows that replacing learned layers with known operators shrinks the bound and scales sample needs with the number of replaced parameters, validated on CT reconstruction.

Approximation Error Upper and Lower Bounds for Hölder Class with Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

Transformers require Ω(ε^{-d0/(4α)}) to O(ε^{-d0/α}) blocks to approximate bounded d0-dimensional Hölder-α functions to accuracy ε.

Beyond the Independence Assumption: Finite-Sample Guarantees for Deep Q-Learning under τ-Mixing

stat.ML · 2026-05-07 · unverdicted · novelty 7.0

Finite-sample risk bounds for DQN with ReLU networks are extended to τ-mixing data, showing an extra dimensionality penalty in the convergence rate due to dependence.

Transformer Approximations from ReLUs

cs.LG · 2026-04-27 · unverdicted · novelty 7.0

A recipe translates ReLU approximations to softmax attention with target-specific economic bounds for multiplication, reciprocal computation, and min/max primitives.

Second-Order Path Kernel Interpolation Formulas in Machine Learning

cs.LG · 2026-06-05 · unverdicted · novelty 6.0

Derives second-order path-kernel interpolation formulas for gradient descent, SGD, and momentum training, adding curvature terms and a concentration estimate around the expected prediction.

A Semi-Supervised Kernel Two-Sample Test

stat.ML · 2026-05-03 · unverdicted · novelty 6.0

A semi-supervised kernel two-sample test integrates unlabeled covariate data to achieve asymptotic normality under the null, higher power than standard kernel tests, and consistency against fixed and local alternatives.

Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability

stat.ML · 2025-09-27 · unverdicted · novelty 6.0

SDAMI detects interactions in high-dimensional data via an Effect Footprint principle and models them using sparsity, group lasso, and dedicated deep subnetworks for improved interpretability.

Scalable Gaussian process inference via neural feature maps

stat.ML · 2026-05-11 · unverdicted · novelty 5.0

Neural feature maps create expressive kernels that enable fast, scalable, and consistent exact Gaussian process inference for regression and classification.

Exploring climate change effects on concurrent floods and concurrent droughts via statistical deep learning

stat.AP · 2026-04-23 · unverdicted · novelty 5.0

The deep SPAR model shows concurrent floods and droughts becoming more likely in the Upper Danube by 2100 under high emissions, with changes in the dependence between catchments contributing substantially to the increase.

Can Explanations Improve Recommendations? Evidence from Prediction-Informed Explanations

cs.IR · 2025-02-24 · unverdicted · novelty 5.0

RecPIE jointly optimizes recommendation predictions and LLM-generated natural-language explanations via alternating training and reinforcement learning, yielding 3-4% accuracy gains and higher human preference on Google Maps POI data.

Scalable Bayesian Spatial Mixture Modelling for Remote Sensing Image Segmentation

stat.ME · 2026-06-28 · unverdicted · novelty 4.0

POTTERS extends the Potts model with generalized spatial dependence and external priors for Bayesian remote sensing image segmentation via variational inference, without needing target-region labels.

