pith. machine review for the scientific record.

arxiv: 2605.09757 · v1 · submitted 2026-05-10 · 💻 cs.LG · stat.ML

Recognition: 2 theorem links · Lean Theorem

On Uniform Error Bounds for Kernel Regression under Non-Gaussian Noise

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:22 UTC · model grok-4.3

classification: 💻 cs.LG · stat.ML
keywords: kernel regression · uniform error bounds · non-Gaussian noise · probabilistic bounds · non-asymptotic analysis · safe control · uncertainty quantification

The pith

Kernel regression now has uniform error bounds for non-Gaussian and correlated noise

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops non-asymptotic probabilistic uniform error bounds for kernel regression that extend beyond the common restriction to independent sub-Gaussian noise. The new bounds cover sub-Gaussian, bounded, sub-exponential, and variance or moment-bounded noise distributions and remain valid when noise terms are correlated across samples. This matters for safety-critical uses because it supplies finite-sample guarantees on the maximum deviation between the estimated and true function without forcing unrealistic noise assumptions. The authors verify the bounds by comparing the size of the uncertainty sets they induce and by embedding them in a safe control task, where they produce less conservative results than earlier bounds.
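
To make the moving parts concrete, here is a minimal sketch of a kernel regression estimate together with the data-dependent quantities ϱ‖h_t(x)‖₂ and ϱ‖h_t(x)‖∞ that the paper's figures plot; the confidence scaling `beta` and the data-generating choices are illustrative assumptions, not the constants of Theorem 3.2.

```python
# Minimal sketch of the objects the bounds are built from, assuming a
# kernel ridge regression estimator with an RBF kernel. The constant
# `beta` standing in for the paper's confidence scaling is illustrative,
# NOT the bound from Theorem 3.2.
import numpy as np

def rbf_kernel(a, b):
    # k(x, x') = exp(-(x - x')^2), matching the kernel used in the figures
    return np.exp(-(a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(0)
t, lam, rho = 50, 1e-2, 0.1          # data points, regularization, noise scale
x_train = rng.uniform(0, 10, t)
f_true = np.sin
y = f_true(x_train) + rho * rng.standard_normal(t)  # noise class is the paper's knob

K = rbf_kernel(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(t), y)      # (K + lam I)^{-1} y

x_test = np.linspace(0, 10, 200)
k_star = rbf_kernel(x_train, x_test)                 # t x 200
mu = k_star.T @ alpha                                # kernel regression estimate

# Data-dependent terms appearing in the bounds (cf. Figure 1):
# h_t(x) = (K + lam I)^{-1} k_t(x), with norms ||h_t(x)||_2 and ||h_t(x)||_inf.
H = np.linalg.solve(K + lam * np.eye(t), k_star)     # columns are h_t(x)
term_l2 = rho * np.linalg.norm(H, axis=0)            # rho * ||h_t(x)||_2
term_linf = rho * np.abs(H).max(axis=0)              # rho * ||h_t(x)||_inf

beta = 3.0   # illustrative confidence scaling, not the paper's constant
upper, lower = mu + beta * term_l2, mu - beta * term_l2
print(float(np.mean((f_true(x_test) >= lower) & (f_true(x_test) <= upper))))
```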

Core claim

The central claim is that novel non-asymptotic probabilistic uniform error bounds can be established for kernel-based regression estimators under a broad class of non-Gaussian noise distributions, including sub-Gaussian, bounded, sub-exponential, and variance- or moment-bounded noise, and that these bounds continue to hold whether the noise is correlated or uncorrelated across samples. The bounds are shown to be tighter than prior results by direct comparison of the induced uncertainty regions and by their use in a safe control application.

What carries the argument

Novel non-asymptotic probabilistic uniform error bounds obtained via concentration arguments adapted to generalized noise classes in the kernel regression setting.
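
For orientation, these are the textbook tail bounds such concentration arguments are adapted from (cf. the Vershynin and Wainwright references below); the paper's actual statements bound the supremum of the regression error, not a single scalar, so these are the starting shapes rather than Theorem 3.2 itself.

```latex
% Textbook tail bounds of the kind such concentration arguments extend
% (standard forms, not the paper's Theorem 3.2).
% Sub-Gaussian noise: Gaussian-like tails at every scale,
\[
  \Pr\bigl(|\varepsilon| \ge s\bigr) \le 2\exp\!\left(-\frac{s^2}{2\sigma^2}\right).
\]
% Sub-exponential noise: a Bernstein-type bound whose linear regime
% dominates for large deviations,
\[
  \Pr\bigl(|\varepsilon| \ge s\bigr)
    \le 2\exp\!\left(-c\,\min\!\left(\frac{s^2}{\nu^2},\,\frac{s}{b}\right)\right).
\]
```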

If this is right

  • Uncertainty quantification for kernel regression estimates becomes possible under realistic non-Gaussian and dependent noise without asymptotic approximations.
  • The induced uncertainty regions are smaller than those produced by bounds limited to independent sub-Gaussian noise.
  • Safe control applications can enforce constraints with reduced conservatism when using these bounds (see the sketch after this list).
  • Finite-sample guarantees apply directly to both correlated and uncorrelated observation noise.
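
A minimal sketch of how a uniform error bound enters a safety check, assuming an already-computed estimate μ_t and bound η_t on a grid of candidate inputs; the constraint function and threshold are hypothetical stand-ins for the paper's control setup.

```python
# Sketch of a safety check driven by a uniform error bound, assuming
# mu_t and eta_t are already computed (e.g. beta * term_l2 from the
# sketch above). The limit and values here are hypothetical.
import numpy as np

def is_certified_safe(mu_t, eta_t, limit):
    """Certify inputs where even the worst-case estimate respects the limit.

    mu_t, eta_t: arrays of the estimate and its uniform error bound on a
    grid of candidate inputs; limit: scalar safety threshold on f(x).
    """
    return mu_t + eta_t <= limit   # f(x) <= mu_t(x) + eta_t(x) w.h.p.

mu = np.array([0.2, 0.8, 1.1])
eta_tight = np.array([0.1, 0.1, 0.1])    # tighter bound -> larger feasible set
eta_loose = np.array([0.5, 0.5, 0.5])    # conservative bound shrinks it
print(is_certified_safe(mu, eta_tight, 1.0))   # [ True  True False]
print(is_certified_safe(mu, eta_loose, 1.0))   # [ True False False]
```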

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same concentration techniques may transfer to other nonparametric estimators if analogous tail conditions on the noise can be verified.
  • In practice the tighter bounds could relax safety margins in learning-based controllers operating with heavy-tailed sensor noise.
  • The explicit comparison on safe control performance indicates that bound tightness translates into measurable improvements in closed-loop behavior.

Load-bearing premise

The noise process must belong to one of the listed classes such as sub-Gaussian or moment-bounded, and the kernel regression problem must satisfy the regularity conditions required for the uniform concentration arguments.
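
In standard textbook form, the listed noise classes correspond to tail conditions of roughly the following shape; the paper's exact parameterization may differ.

```latex
% Standard forms of the four noise classes (textbook versions; the
% paper's parameterization may differ).
\[
\begin{aligned}
  &\text{sub-Gaussian:}    && \mathbb{E}\,e^{\lambda\varepsilon} \le e^{\lambda^2\sigma^2/2} \;\; \forall \lambda \in \mathbb{R},\\
  &\text{bounded:}         && |\varepsilon| \le M \;\; \text{almost surely},\\
  &\text{sub-exponential:} && \mathbb{E}\,e^{\lambda\varepsilon} \le e^{\lambda^2\nu^2/2} \;\; \forall\, |\lambda| \le 1/b,\\
  &\text{moment-bounded:}  && \mathbb{E}\,|\varepsilon|^p \le m_p \;\; \text{for some fixed } p \ge 2.
\end{aligned}
\]
```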

What would settle it

A controlled simulation with sub-exponential or moment-bounded noise in which the observed supremum deviation of the kernel regressor from the true function exceeds the derived bound with probability larger than the claimed value.
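
A sketch of that falsification experiment, using centered chi-squared noise as the sub-exponential example and an illustrative stand-in for the bound; a genuine test would plug in the paper's η_t from (6).

```python
# Sketch of the falsification test described above: draw centered
# chi-squared (sub-exponential) noise, fit the kernel regressor, and
# count how often the sup-deviation beats the bound. `eta` here is an
# illustrative stand-in; a real test would use the paper's eta_t from (6).
import numpy as np

rng = np.random.default_rng(1)
t, lam, delta = 50, 1e-2, 0.001
x_grid = np.linspace(0, 10, 200)
f_true = np.sin

def kern(a, b):
    # k(x, x') = exp(-(x - x')^2), as in the paper's experiments
    return np.exp(-(a[:, None] - b[None, :]) ** 2)

violations, runs = 0, 200
for _ in range(runs):
    x = rng.uniform(0, 10, t)
    noise = 0.1 * (rng.chisquare(df=2, size=t) - 2)   # mean-zero, sub-exponential
    A = kern(x, x) + lam * np.eye(t)
    mu = kern(x, x_grid).T @ np.linalg.solve(A, f_true(x) + noise)
    H = np.linalg.solve(A, kern(x, x_grid))           # columns h_t(x)
    eta = 3.0 * 0.1 * np.linalg.norm(H, axis=0) + 0.5  # stand-in bound, NOT the paper's
    violations += np.any(np.abs(f_true(x_grid) - mu) > eta)

# The bound is falsified if the violation rate credibly exceeds delta.
print(violations / runs, "vs claimed", delta)
```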

Figures

Figures reproduced from arXiv: 2605.09757 by Armin Lederer, Johannes Teutsch, Marion Leibold, Oleksii Molodchyk, Timm Faulwasser.

Figure 1
Figure 1. Comparison of the posterior variance σ_t²(x) from (5) and the kernel- and data-dependent terms ϱ²‖h_t(x)‖₂² and ϱ²‖h_t(x)‖∞², leveraged in the proposed bounds from Theorem 3.2, using k(x, x′) = exp(−(x − x′)²) and ϱ = 0.1. Panels show t = 4, t = 10, and t = 100 over the input x ∈ [0, 10]; curves: input data x₁, …, x_t, σ_t²(x), ϱ²‖h_t(x)‖₂², and ϱ²‖h_t(x)‖∞².
Figure 2
Figure 2. Comparison of confidence regions and error bands in a regression example under sub-Gaussian noise using k(x, x′) = exp(−(x − x′)²), σ_M = ϱ = 0.1, δ = 0.001, and B = 5. The black crosses represent the collected data points (x₁, y₁), …, (x_t, y_t).
Figure 3
Figure 3. Comparison of the size of the uncertainty region via the integral of the probabilistic uniform error bound η_t (6) over the input domain X = [0, 10] for an increasing number of data points t, with input dimension n_x = 1, noise level m ∈ {0.01, 0.1, 1}, and kernel k(x, x′) = exp(−‖x − x′‖₂²). The shaded areas show the 5%–95% percentile range over 100 Monte Carlo data collection runs.
Figure 4
Figure 4. Comparison of the size of the uncertainty region via the integral of the probabilistic uniform error bound η_t (6) over the input domain X = [0, 10]^{n_x} for an increasing number of data points t, with input dimension n_x ∈ {1, 2, 3}, noise level m = 0.1, and kernel k(x, x′) = exp(−(x − x′)²). The shaded areas show the 5%–95% percentile range over 100 Monte Carlo data collection runs.
Figure 5
Figure 5. Comparison of the success rate of the safe optimal control algorithm over the number of data points under bounded noise. Specifically, when lots of data is available, the proposed bound from Theorem 3.2 outperforms the comparison methods. Compared methods: Abbasi-Yadkori (2013), Fiedler et al. (2021), and Theorem 3.2(a) (proposed, SG).
Figure 6
Figure 6. Comparison of the inferred cost over the state domain at the final learning instant (t = 1000), resulting from the optimal safe control u_t* based on various error bounds. The proposed method yields the largest feasible region and the lowest inferred costs.
Figure 9
Figure 9. Comparison of the size of the uncertainty region via the integral of the probabilistic uniform error bound η_t (6) over the input domain X ⊂ R^{n_x} for the kernel k(x, x′) = exp(−‖x − x′‖₂² / l_SE²) and an increasing number of data points t, with input dimension n_x ∈ {1, 2, 3} and lengthscale l_SE ∈ {1, 2, 3}. The shaded areas show the 5%–95% percentile range over 100 Monte Carlo data collection runs.
Figure 10
Figure 10. Comparison of the size of the uncertainty region via the integral of the probabilistic uniform error bound η_t (6) over the input domain X = [0, r] for the kernel k(x, x′) = exp(−(x − x′)²) and an increasing number of data points t, with varying size r of the input domain. The shaded areas show the 5%–95% percentile range over 100 Monte Carlo data collection runs.
Figure 11
Figure 11. Comparison of the size of the uncertainty region via the integral of the probabilistic uniform error bound η_t (6) over the input domain X = [0, r] and an increasing number of data points t, with varying kernel k(x, x′) from (70) with l_ν = 1 and n_x = 1. The shaded areas show the 5%–95% percentile range over 100 Monte Carlo data collection runs.
Figure 12
Figure 12. Comparison of the mean size of the uncertainty region via the integral of the probabilistic uniform error bound η_t = η_t^SG (see Theorem 3.2(a)) over the input domain X = [0, 10] for the kernel k(x, x′) = exp(−(x − x′)²) and an increasing number of data points t, with varying discretization terms Δ_t^SG (see …).
read the original abstract

Providing non-conservative uncertainty quantification for function estimates derived from noisy observations remains a fundamental challenge in statistical machine learning, particularly for applications in safety-critical domains. In this work, we propose novel non-asymptotic probabilistic uniform error bounds for kernel-based regression. Compared to related bounds in the literature that are restricted to (conditionally) independent sub-Gaussian noise, our bounds allow us to consider a broad class of non-Gaussian distributions, such as sub-Gaussian, bounded, sub-exponential, and variance/moment-bounded noise. Moreover, our results apply to correlated and uncorrelated noise. We compare our proposed error bounds with existing results in terms of the induced uncertainty region and their performance in safe control, demonstrating the tightness of the proposed bounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims to derive novel non-asymptotic probabilistic uniform error bounds for kernel-based regression. These bounds extend beyond the standard setting of conditionally independent sub-Gaussian noise to a broad class of non-Gaussian distributions (sub-Gaussian, bounded, sub-exponential, variance/moment-bounded) and apply to both correlated and uncorrelated noise. The work compares the induced uncertainty regions against existing bounds and demonstrates tightness via applications to safe control.

Significance. If the derivations hold, the results would provide a useful extension of uniform concentration tools for kernel regression to more general noise settings, which is relevant for uncertainty quantification in safety-critical domains. Handling correlated noise and multiple tail behaviors addresses a practical gap, though the strength depends on whether the dependence structures are fully characterized.

major comments (1)
  1. [Abstract and main theorem on correlated noise] The claim that the uniform bounds apply directly to correlated noise is load-bearing for the 'broad class' contribution, yet no explicit weak-dependence condition (e.g., summable alpha-mixing coefficients, martingale difference structure, or beta-mixing rate) is stated. Standard empirical-process arguments for the supremum over the kernel feature map require such conditions to guarantee that the deviation probability remains non-asymptotic and decays at the claimed rate; arbitrary correlation can inflate the variance term and invalidate the bound.
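
To make the referee's worry concrete: a standard calculation (not taken from the paper) shows how correlation inflates the variance term for stationary noise with autocorrelation ρ(k) and variance σ².

```latex
% How correlation inflates the variance term (standard calculation,
% not from the paper): for stationary noise with autocorrelation rho(k),
\[
  \operatorname{Var}\!\left(\frac{1}{t}\sum_{i=1}^{t}\varepsilon_i\right)
  = \frac{1}{t^2}\sum_{i,j}\operatorname{Cov}(\varepsilon_i,\varepsilon_j)
  \le \frac{\sigma^2}{t}\left(1 + 2\sum_{k \ge 1}|\rho(k)|\right),
\]
% which retains the O(1/t) rate only if the correlations are summable;
% precisely the kind of weak-dependence condition the referee requests.
```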

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment below and will incorporate the necessary clarifications in the revised version.

read point-by-point responses
  1. Referee: [Abstract and main theorem on correlated noise] The claim that the uniform bounds apply directly to correlated noise is load-bearing for the 'broad class' contribution, yet no explicit weak-dependence condition (e.g., summable alpha-mixing coefficients, martingale difference structure, or beta-mixing rate) is stated. Standard empirical-process arguments for the supremum over the kernel feature map require such conditions to guarantee that the deviation probability remains non-asymptotic and decays at the claimed rate; arbitrary correlation can inflate the variance term and invalidate the bound.

    Authors: We agree with the referee that the claim regarding correlated noise requires explicit weak-dependence conditions for the non-asymptotic uniform bounds to hold rigorously. Standard empirical process techniques indeed demand control on the dependence to ensure the supremum deviation concentrates at the stated rate. In the original manuscript, the noise classes (including moment-bounded and sub-exponential) were intended to encompass certain dependent structures, but we did not state the required mixing or martingale conditions explicitly. In the revision, we will add precise assumptions to the main theorem and assumptions section—for instance, requiring the noise process to be strongly mixing with summable alpha-mixing coefficients or to satisfy a martingale difference property with respect to a suitable filtration. These will be reflected in the abstract as well. This addresses the concern without altering the core technical contributions. revision: yes
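
For reference, these are textbook forms of the two weak-dependence conditions the rebuttal proposes; the revision's exact assumptions may differ in detail.

```latex
% Textbook statements of the two conditions offered in the rebuttal.
% Strong (alpha-)mixing with summable coefficients:
\[
  \alpha(k) = \sup_{i}\;
    \sup_{A \in \sigma(\varepsilon_1,\dots,\varepsilon_i),\;
          B \in \sigma(\varepsilon_{i+k},\varepsilon_{i+k+1},\dots)}
    \bigl|\Pr(A \cap B) - \Pr(A)\Pr(B)\bigr|,
  \qquad \sum_{k \ge 1}\alpha(k) < \infty.
\]
% Martingale difference noise with respect to a filtration (F_i):
\[
  \mathbb{E}\bigl[\varepsilon_i \mid \mathcal{F}_{i-1}\bigr] = 0
  \quad \text{for all } i.
\]
```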

Circularity Check

0 steps flagged

No circularity detected; bounds derived from standard concentration extensions

full rationale

The paper's central contribution is the derivation of non-asymptotic uniform error bounds for kernel regression under extended noise classes (sub-Gaussian, sub-exponential, moment-bounded, and correlated cases). No step in the provided abstract or described derivation reduces the target bound to a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain. The argument applies known tail inequalities and empirical process tools to new settings without importing uniqueness theorems or ansatzes from prior self-work that would collapse the claim. The derivation remains self-contained against external benchmarks such as standard sub-Gaussian concentration results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that the noise belongs to one of several standard tail classes; no new entities are introduced and no free parameters are mentioned in the abstract.

axioms (2)
  • domain assumption: The noise distribution belongs to one of the classes sub-Gaussian, bounded, sub-exponential, or variance/moment-bounded.
    This is the key modeling assumption that enables the uniform bounds to hold for non-Gaussian noise.
  • domain assumption: The kernel regression problem satisfies standard regularity conditions on the reproducing kernel Hilbert space and the sampling points.
    This is an implicit background assumption required for any uniform error bound in kernel regression.
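
In standard form, the second axiom is usually written as an RKHS-norm and kernel-boundedness condition; the B = 5 in Figure 2 suggests a bound of this kind, though the paper's precise conditions may include more.

```latex
% Standard form of the regularity axiom for uniform kernel-regression
% bounds (suggested by the B = 5 in Figure 2; the paper's exact
% conditions may add more):
\[
  f \in \mathcal{H}_k, \qquad \|f\|_{\mathcal{H}_k} \le B, \qquad
  \sup_{x \in \mathcal{X}} k(x, x) < \infty.
\]
```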

pith-pipeline@v0.9.0 · 5431 in / 1458 out tokens · 43076 ms · 2026-05-12T02:22:57.399457+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

  1. Abbasi-Yadkori, Y. Online Learning for Linearly Parametrized Control Problems. PhD thesis, 2013.
  2. Ao, Y., et al. Stochastic model predictive control for sub-… arXiv preprint arXiv:2503.08795.
  3. Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics. Machine Learning, 2023.
  4. On kernelized multi-armed bandits. In International Conference on Machine Learning.
  5. Bayesian optimization under heavy-tailed payoffs. In Advances in Neural Information Processing Systems.
  6. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In Advances in Neural Information Processing Systems.
  7. Efficient model-based reinforcement learning through optimistic policy search and planning. In Advances in Neural Information Processing Systems.
  8. Fan, J., Gu, Y., and Zhou, W.-X. How do noise tails impact on deep… 2024.
  9. Fiedler, C. (incomplete reference; journal article.)
  10. Fiedler, C., Scherer, C. W., and Trimpe, S. Practical and rigorous uncertainty bounds for Gaussian process regression. 2021.
  11. Fiedler, C., Menn, J., et al. On Safety in Safe Bayesian Optimization. Transactions on Machine Learning Research.
  12. Learning-based symbolic abstractions for nonlinear control systems. Automatica, 2022.
  13. Hodge, B.-M., Lew, D., Milligan, M., Gómez-Lázaro, E., Larsén, X. G., Giebel, G., Holttinen, H., Sillanpää, S., Scharff, R., Söder, L., and Flynn, D. Wind Power Forecasting Error Distributions: An International Comparison. In Proceedings of the 11th International Workshop…
  14. Time-uniform Chernoff bounds via nonnegative supermartingales. Probability Surveys.
  15. Hsu, D., Kakade, S., and Zhang, T. A tail inequality for quadratic forms of subgaussian random vectors.
  16. Huber, P. J. Robust Statistics. 1981.
  17. Johnson, J. B. Thermal agitation of electricity in conductors. 1928.
  18. Gaussian Processes and Reproducing Kernels: Connections and Equivalences. arXiv preprint arXiv:2506.17366.
  19. Outlier-Robust Linear System Identification Under Heavy-Tailed Noise. In 7th Annual Learning for Dynamics & Control Conference, 2025.
  20. Information directed sampling and bandits with heteroscedastic noise. In Conference on Learning Theory, 2018.
  21. Lahr, A., et al. Optimal kernel regression bounds under energy-bounded noise. In Advances in Neural Information Processing Systems.
  22. Lederer, A. Gaussian Processes in Control: Performance Guarantees through Efficient Learning. PhD thesis, 2023.
  23. Lederer, A., Umlauft, J., and Hirche, S. Uniform error bounds for Gaussian process regression with application to safe control.
  24. Risk bounds for robust deep learning. arXiv preprint arXiv:2009.06202.
  25. Deterministic error bounds for kernel-based learning techniques under bounded noise. Automatica, 2021.
  26. Mollenhauer, M., Muecke, N., Meunier, D., and Gretton, A. Regularized least squares learning with heavy-tailed noise is minimax optimal.
  27. Molodchyk, O., Teutsch, J., and Faulwasser, T. Towards safe… 2025.
  28. Identification of analytic nonlinear dynamical systems with non-asymptotic guarantees. In Advances in Neural Information Processing Systems.
  29. Omainska, M., Yamauchi, J., Lederer, A., Hirche, S., and Fujita, M. Rigid motion… 2023.
  30. Rasmussen, C. E., and Williams, C. K. I. Gaussian Processes for Machine Learning. MIT Press, 2006.
  31. Reed, R., Laurenti, L., and Lahijanian, M. Error bounds for…
  32. Scharnhorst, P., Maddalena, E. T., Jiang, Y., and Jones, C. N. Robust uncertainty bounds in reproducing kernel Hilbert spaces… 2022.
  33. Schölkopf, B., and Smola, A. J. Learning with Kernels. 2001.
  34. Shalev-Shwartz, S., and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. 2014.
  35. Learning without mixing: Towards a sharp analysis of linear system identification. In Conference on Learning Theory, 2018.
  36. Srinivas, N., Krause, A., Kakade, S. M., and Seeger, M. Gaussian process optimization in the bandit setting: no regret and experimental design.
  37. Sui, Y., Gotovos, A., Burdick, J., and Krause, A. Safe exploration for optimization with Gaussian processes. 2015.
  38. Sullivan, T. J. Introduction to Uncertainty Quantification. 2015.
  39. Gibrat's… Journal of Economic Literature, 1997.
  40. Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science. 2018.
  41. Wainwright, M. J. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. 2019.