pith.

arXiv: 2605.20550 · v1 · pith:D7LPYG6Q · submitted 2026-05-19 · 🧮 math.ST · stat.TH

Kernel Density Estimation under C^(1,1) Regularity: AMISE, Weak Curvature, and Plug-in Bandwidths

Pith reviewed 2026-05-21 06:07 UTC · model grok-4.3

classification 🧮 math.ST stat.TH
keywords: kernel density estimation · AMISE · weak derivative · C^{1,1} regularity · plug-in bandwidth · Epanechnikov kernel · nonparametric statistics · weak curvature

The pith

The classical AMISE formula and optimal bandwidth for kernel density estimation hold under the weaker condition that the density has a Lipschitz first derivative and a square-integrable weak second derivative.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that the usual asymptotic mean integrated squared error expression for kernel density estimators continues to apply when the density belongs only to C^{1,1} rather than requiring a continuous second derivative. The authors replace the pointwise Taylor expansion with an integral representation that uses the weak second derivative, allowing the roughness term to be defined even when curvature is kinked or discontinuous. Under the additional condition that this weak second derivative lies in L^2, the familiar n^{-1/5} optimal bandwidth and Epanechnikov kernel optimality are recovered exactly as in the classical setting. The work further develops a plug-in bandwidth selector based on a generalized curvature estimate and proves its first-order equivalence to the oracle bandwidth, along with consistency of a related U-statistic estimator. A multivariate version recovers the expected scalar-bandwidth convergence rate.

Core claim

Under f in C^{1,1}(R) with weak second derivative in L^2(R), the asymptotic mean integrated squared error of a kernel density estimator equals the classical expression involving the kernel's second moment and the roughness R(f''), where the latter is interpreted through the weak-curvature functional. This yields the standard optimal bandwidth of order n^{-1/5} and confirms that the Epanechnikov kernel minimizes the leading AMISE term without any need for pointwise twice differentiability.
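The core claim is the classical formula AMISE(h) = R(K)/(nh) + (h⁴/4) μ₂(K)² R(f''), with R(f'') read as the weak-curvature functional. A minimal Python sketch of the arithmetic that follows from it is below. The kernel constants are standard textbook values, the N(0, σ²) curvature 3/(8√π σ⁵) is used only as a convenient example, and all function names are illustrative rather than taken from the paper. It computes the minimizing bandwidth, which scales as n^{-1/5}, and each kernel's efficiency relative to the Epanechnikov kernel, which captures the kernel-dependent part of the optimal AMISE.

```python
import math

# (R(K), mu2(K)) = (integral of K^2, integral of u^2 K) for common second-order kernels.
KERNELS = {
    "epanechnikov": (3.0 / 5.0, 1.0 / 5.0),
    "biweight": (5.0 / 7.0, 1.0 / 7.0),
    "gaussian": (1.0 / (2.0 * math.sqrt(math.pi)), 1.0),
    "uniform": (1.0 / 2.0, 1.0 / 3.0),
}


def normal_curvature(sigma: float) -> float:
    """R(f'') for the N(0, sigma^2) density: 3 / (8 sqrt(pi) sigma^5)."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma**5)


def amise(h: float, n: int, rk: float, mu2: float, r_f2: float) -> float:
    """Variance term R(K)/(n h) plus squared-bias term h^4 mu2^2 R(f'') / 4."""
    return rk / (n * h) + 0.25 * h**4 * mu2**2 * r_f2


def h_amise(n: int, rk: float, mu2: float, r_f2: float) -> float:
    """Closed-form minimiser of amise() in h; proportional to n^(-1/5)."""
    return (rk / (mu2**2 * r_f2 * n)) ** 0.2


def efficiency(name: str) -> float:
    """Efficiency relative to Epanechnikov: sqrt(mu2_E) R(K_E) / (sqrt(mu2) R(K))."""
    rk, mu2 = KERNELS[name]
    rk_e, mu2_e = KERNELS["epanechnikov"]
    return (math.sqrt(mu2_e) * rk_e) / (math.sqrt(mu2) * rk)


if __name__ == "__main__":
    n, r_f2 = 1000, normal_curvature(1.0)
    for name, (rk, mu2) in KERNELS.items():
        h = h_amise(n, rk, mu2, r_f2)
        print(f"{name:>12}: h* = {h:.4f}, AMISE* = {amise(h, n, rk, mu2, r_f2):.3e}, "
              f"efficiency = {efficiency(name):.4f}")
```

Because the kernel enters the optimal AMISE only through √μ₂(K)·R(K), the efficiency is invariant to rescaling the kernel; the Gaussian kernel, for example, comes out at about 0.951 of the Epanechnikov.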

What carries the argument

Integral Taylor representation based on the weak second derivative, which defines the weak-curvature functional R(f'').
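To see what the integral Taylor representation buys, the sketch below checks the identity f(x + δ) = f(x) + δ f'(x) + δ² ∫₀¹ (1 − s) f''(x + sδ) ds numerically. The test density, the biweight (15/16)(1 − x²)² on [−1, 1], is our own choice and not the paper's: its first derivative is Lipschitz, but its weak second derivative jumps from 7.5 to 0 at x = ±1, so no pointwise C² expansion is available there. The integral is approximated with a midpoint rule, so agreement holds only up to discretization error.

```python
import numpy as np

C = 15.0 / 16.0  # normalising constant of the biweight density


def f(x):
    """Biweight density C (1 - x^2)^2 on [-1, 1], zero outside."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, C * (1.0 - x**2) ** 2, 0.0)


def f1(x):
    """First derivative; continuous and Lipschitz on R (it vanishes at +/-1)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, -4.0 * C * x * (1.0 - x**2), 0.0)


def f2_weak(x):
    """Weak second derivative; jumps between 7.5 and 0 at x = +/-1 (its value there is irrelevant)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.0, C * (12.0 * x**2 - 4.0), 0.0)


def taylor_integral_form(x, delta, m=200_000):
    """f(x) + delta f'(x) + delta^2 * int_0^1 (1 - s) f''(x + s delta) ds, midpoint rule in s."""
    s = (np.arange(m) + 0.5) / m
    remainder = delta**2 * np.mean((1.0 - s) * f2_weak(x + s * delta))
    return float(f(x) + delta * f1(x) + remainder)


if __name__ == "__main__":
    # Each of the first three pairs crosses a curvature jump at +/-1.
    for x, delta in [(0.8, 0.5), (-0.9, -0.4), (1.2, -0.5), (0.2, 0.3)]:
        lhs = float(f(x + delta))
        rhs = taylor_integral_form(x, delta)
        print(f"x={x:+.1f}, delta={delta:+.1f}: f(x+delta)={lhs:.6f}, integral form={rhs:.6f}")
```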

If this is right

  • The AMISE formula remains identical to the classical case without assuming a continuous second derivative.
  • The optimal bandwidth retains the rate n^{-1/5}.
  • The Epanechnikov kernel remains asymptotically optimal among nonnegative second-order kernels.
  • A plug-in selector based on estimated weak curvature is first-order AMISE equivalent under ratio-consistent estimation.
  • A leave-one-out U-statistic estimator of the weak curvature is consistent.
  • In the multivariate setting the scalar bandwidth achieves the rate n^{-4/(d+4)} via weak Hessian regularity.
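A plug-in selector of the kind listed above can be sketched in two steps: estimate R(f'') with a diagonal-free (i ≠ j) U-statistic, then substitute the estimate into the AMISE-optimal bandwidth. The sketch assumes a Gaussian pilot kernel and uses the standard kernel estimator of integrated squared density derivatives in the style of Hall and Marron (1987), via the identity ∫ φ_g''(t − a) φ_g''(t − b) dt = φ^{(4)}_{√2 g}(a − b). It is not necessarily the paper's estimator or pilot rule, and the fixed pilot bandwidth g is a hypothetical user-supplied parameter. The statistic's expectation is R((φ_g * f)''), so it targets the curvature of the pilot-smoothed density and approaches R(f'') only as g → 0.

```python
import math
import numpy as np


def phi4(u):
    """Fourth derivative of the standard normal density: (u^4 - 6 u^2 + 3) phi(u)."""
    return (u**4 - 6.0 * u**2 + 3.0) * np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)


def curvature_ustat(x, g, block=250):
    """Diagonal-free U-statistic estimate of R(f'') with Gaussian pilot bandwidth g.

    Uses int phi_g''(t - a) phi_g''(t - b) dt = phi_s^{(4)}(a - b) with s = sqrt(2) g and
    phi_s^{(4)}(u) = s^{-5} phi^{(4)}(u / s). Pairs are summed in row blocks to bound memory.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    s = math.sqrt(2.0) * g
    total = 0.0
    for start in range(0, n, block):
        diff = (x[start:start + block, None] - x[None, :]) / s
        total += float(phi4(diff).sum())
    total -= n * float(phi4(0.0))  # remove the n diagonal (i == j) terms
    return total / (n * (n - 1) * s**5)


def plugin_bandwidth(x, g, rk=3.0 / 5.0, mu2=1.0 / 5.0):
    """Substitute the curvature estimate into h = [R(K) / (mu2^2 R(f'') n)]^(1/5).

    Defaults are the Epanechnikov constants R(K) = 3/5, mu2(K) = 1/5.
    """
    r_hat = curvature_ustat(x, g)
    if r_hat <= 0:
        raise ValueError("non-positive curvature estimate; try a larger pilot bandwidth g")
    return (rk / (mu2**2 * r_hat * len(x))) ** 0.2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8000)
    g = 0.3  # hypothetical fixed pilot bandwidth, for illustration only
    print("U-statistic R(f''):", curvature_ustat(x, g))
    print("pilot-smoothed target:", 3.0 / (8.0 * math.sqrt(math.pi) * (1.0 + g**2) ** 2.5))
    print("plug-in bandwidth:", plugin_bandwidth(x, g))
```

Because φ^{(4)} takes negative values, a pilot g that is too small for the sample size can give a noisy or even negative estimate; the ratio-consistency condition on the curvature estimator is what rules this out asymptotically.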

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Densities arising from threshold or regime-change models can now use standard KDE bandwidth rules with the same theoretical guarantees.
  • The weak-curvature approach may extend naturally to other bias calculations in nonparametric smoothing under minimal smoothness.
  • Numerical checks on piecewise-quadratic densities would directly test whether the predicted AMISE rates materialize in finite samples.

Load-bearing premise

The integral Taylor representation using the weak second derivative accurately captures the leading bias term in the mean integrated squared error expansion.

What would settle it

Compute the finite-sample mean integrated squared error of a kernel estimator on a density whose first derivative is Lipschitz but whose second derivative has a jump discontinuity, using the AMISE-optimal bandwidth of order n^{-1/5}, and check whether the error decays at the rate n^{-4/5} with the leading constant given by the classical AMISE expression.
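The check proposed above can be sketched as a small Monte Carlo experiment. As the test case the sketch assumes the biweight density (15/16)(1 − x²)² on [−1, 1], our own choice rather than the paper's (its Figure 1 instead shows a Gaussian density with a curvature kink). This density has a Lipschitz first derivative and a weak second derivative that jumps at ±1, and a short calculation gives R(f'') = 45/2. The code draws samples as 2·Beta(3, 3) − 1, applies an Epanechnikov KDE at the classical AMISE-optimal bandwidth, approximates the integrated squared error on a grid, and estimates the log–log slope of MISE in n, which the theory predicts approaches −4/5.

```python
import numpy as np

R_K, MU2 = 3.0 / 5.0, 1.0 / 5.0  # Epanechnikov kernel: R(K) and mu2(K)
R_F2 = 45.0 / 2.0                # R(f'') = int f''(x)^2 dx for the biweight density below


def biweight_pdf(x):
    """(15/16)(1 - x^2)^2 on [-1, 1]: f' is Lipschitz, weak f'' jumps at +/-1."""
    return np.where(np.abs(x) <= 1.0, (15.0 / 16.0) * (1.0 - x**2) ** 2, 0.0)


def epanechnikov_kde(grid, data, h):
    """Kernel density estimate (1 / (n h)) sum_i K((x - X_i) / h) evaluated on a grid."""
    u = (grid[:, None] - data[None, :]) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    return k.mean(axis=1) / h


def mise_mc(n, reps, rng, grid):
    """Monte Carlo MISE at the classical AMISE-optimal bandwidth, plus the AMISE prediction."""
    h = (R_K / (MU2**2 * R_F2 * n)) ** 0.2
    dx = grid[1] - grid[0]
    truth = biweight_pdf(grid)
    ise = np.empty(reps)
    for r in range(reps):
        data = 2.0 * rng.beta(3.0, 3.0, size=n) - 1.0  # 2 Beta(3, 3) - 1 has the biweight density
        ise[r] = np.sum((epanechnikov_kde(grid, data, h) - truth) ** 2) * dx
    amise = R_K / (n * h) + 0.25 * h**4 * MU2**2 * R_F2
    return float(ise.mean()), amise


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    grid = np.linspace(-1.5, 1.5, 601)  # covers [-1 - h, 1 + h] for the sample sizes below
    sizes = (500, 4000)
    results = [mise_mc(n, 20, rng, grid) for n in sizes]
    slope = np.log(results[1][0] / results[0][0]) / np.log(sizes[1] / sizes[0])
    print(f"log-log slope of MISE: {slope:.3f} (AMISE rate: -0.8)")
    for n, (mise, amise) in zip(sizes, results):
        print(f"n = {n}: MISE = {mise:.3e}, AMISE = {amise:.3e}, ratio = {mise / amise:.3f}")
```

At moderate n the ratio MISE/AMISE can be expected to sit below 1, partly because AMISE omits the negative −R(f)/n variance correction, so the estimated slope is best read as a trend toward −4/5 rather than an exact match.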

Figures

Figures reproduced from arXiv: 2605.20550 by Alireza Kabgani, Elaheh Lotfian.

Figure 1: A Gaussian density perturbed by an odd curvature-kink term. The density belongs …
Figure 2: Log–log plot of Monte Carlo mean integrated squared error against sample size for the …
Figure 3: Epanechnikov kernel density estimates for the Old Faithful eruption-duration data
Original abstract

Classical kernel density estimation usually derives the AMISE and optimal bandwidth from a pointwise Taylor expansion, which requires twice continuous differentiability. This assumption is stronger than necessary and excludes natural densities arising from threshold models, regime changes, and robust mixture models, where the first derivative may be Lipschitz while the curvature is kinked, discontinuous, or only weakly defined. We show that the classical AMISE theory remains valid under the weaker condition $f\in C^{1,1}(\mathbb{R})$. The pointwise $C^2$ Taylor expansion is replaced by an integral Taylor representation based on the weak second derivative, so that $R(f'')$ is interpreted as a weak-curvature functional. Under $f\in C^{1,1}(\mathbb{R})$ and $f''\in L^2(\mathbb{R})$, we recover the classical AMISE formula, the $n^{-1/5}$ optimal bandwidth, and Epanechnikov kernel optimality without assuming a continuous classical second derivative. We also propose a generalized-curvature plug-in bandwidth selector, prove its first-order AMISE equivalence under ratio-consistent curvature estimation, and establish consistency of a leave-one-out U-statistic curvature estimator. A multivariate extension using weak Hessians recovers the scalar-bandwidth rate $n^{-4/(d+4)}$.
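The multivariate rate quoted in the abstract can be traced through the usual scalar-bandwidth calculation. The worked derivation below assumes a kernel with ∫ u uᵀ K(u) du = μ₂(K) I_d and bandwidth matrix h² I_d, and writes Ψ for the curvature term with ∇²f read as the weak Hessian. It is the textbook form of this calculation, so the paper's exact constants and conditions may be stated differently. For d = 1, Ψ = R(f'') and h* reduces to the classical bandwidth of order n^{-1/5}.

```latex
\begin{aligned}
\mathrm{AMISE}(h) &= \frac{R(K)}{n h^{d}} + \frac{h^{4}}{4}\,\mu_2(K)^2\,\Psi,
\qquad \Psi = \int_{\mathbb{R}^d} \bigl(\operatorname{tr} \nabla^2 f(x)\bigr)^2\,dx,\\
0 = \frac{\partial\,\mathrm{AMISE}}{\partial h}
  &= -\frac{d\,R(K)}{n h^{d+1}} + h^{3}\mu_2(K)^2\,\Psi
  \;\Longrightarrow\;
  h^{*} = \Bigl[\frac{d\,R(K)}{\mu_2(K)^2\,\Psi\,n}\Bigr]^{1/(d+4)} \propto n^{-1/(d+4)},\\
\mathrm{AMISE}(h^{*}) &\propto n^{-1}(h^{*})^{-d} \propto (h^{*})^{4} \propto n^{-4/(d+4)}.
\end{aligned}
```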

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that under f ∈ C^{1,1}(ℝ) with f'' ∈ L²(ℝ), kernel density estimation recovers the classical AMISE formula (including the leading bias term involving ∫ [f''(x)]² dx), the n^{-1/5} optimal bandwidth rate, and Epanechnikov kernel optimality. This is achieved by replacing the pointwise C² Taylor expansion with an integral Taylor representation based on the weak second derivative. The work also introduces a generalized-curvature plug-in bandwidth selector shown to be first-order AMISE equivalent under ratio-consistent curvature estimation, proves consistency of a leave-one-out U-statistic estimator for the curvature functional, and provides a multivariate extension recovering the scalar-bandwidth rate n^{-4/(d+4)}.

Significance. If the central claims hold, the result would extend classical KDE asymptotics to a wider class of densities with kinked or only weakly defined curvature (e.g., from threshold or regime-switching models), which is a meaningful theoretical advance. The explicit use of weak derivatives to interpret R(f'') and the construction of a consistent U-statistic curvature estimator are notable strengths that could support practical plug-in methods under reduced smoothness.

major comments (2)
  1. [§3.1] §3.1 (Bias expansion via integral Taylor representation): The argument that the remainder term arising from the representation f(x + hu) = f(x) + hu f'(x) + ∫_0^{hu} (hu - t) f''(x + t) dt, after convolution with K and integration of the squared bias, is o(h²) in the L² sense under only f'' ∈ L²(ℝ) is load-bearing for recovering the exact classical AMISE. With f'' merely square-integrable, the resulting double-integral term against K(u) need not vanish faster than the leading (h²/2) μ₂(K) f''(x) term uniformly in x or after squaring and integrating; an explicit bound or additional local integrability condition is required to confirm the o(h²) rate.
  2. [Theorem 4] Theorem 4 (AMISE equivalence of the generalized-curvature plug-in selector): The first-order equivalence to the AMISE-optimal bandwidth is stated to hold under ratio-consistent estimation of the weak-curvature functional, but the proof sketch does not explicitly address whether the leave-one-out U-statistic estimator remains ratio-consistent when the pilot bandwidth used in curvature estimation is itself of order n^{-1/5}; any implicit dependence could affect the claimed equivalence.
minor comments (2)
  1. [§2] The notation for the weak-curvature functional R(f'') is introduced in the abstract but would benefit from an explicit integral definition in §2 before the main results.
  2. Figure 1 (if present) comparing classical and weak-curvature AMISE curves should include the precise regularity conditions used for each curve in the caption.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript and for the positive assessment of its potential significance in extending classical KDE asymptotics to the C^{1,1} class. The two major comments raise valid points about technical details in the bias expansion and the proof of the plug-in selector. We address each below and will incorporate clarifications and expanded arguments in the revised version.

Point-by-point responses
  1. Referee: [§3.1] §3.1 (Bias expansion via integral Taylor representation): The argument that the remainder term arising from the representation f(x + hu) = f(x) + hu f'(x) + ∫_0^{hu} (hu - t) f''(x + t) dt, after convolution with K and integration of the squared bias, is o(h²) in the L² sense under only f'' ∈ L²(ℝ) is load-bearing for recovering the exact classical AMISE. With f'' merely square-integrable, the resulting double-integral term against K(u) need not vanish faster than the leading (h²/2) μ₂(K) f''(x) term uniformly in x or after squaring and integrating; an explicit bound or additional local integrability condition is required to confirm the o(h²) rate.

    Authors: We agree that an explicit bound on the remainder would strengthen the argument. In the current proof we apply the integral Taylor formula, convolve with K, square the bias, and integrate. Because f'' ∈ L²(ℝ) and K has finite second moment and compact support, Fubini’s theorem together with Cauchy–Schwarz yields that the L²-norm of the remainder convolution is bounded by C h² ‖f''‖₂ times a factor that tends to zero with h (uniformly in the location variable after integration against the density). This is sufficient to make the integrated squared remainder o(h⁴), and therefore of smaller order than the leading AMISE terms. We will add a short lemma in §3.1 that records this bound explicitly, confirming the classical AMISE expansion under the stated assumptions without requiring extra local integrability. revision: yes

  2. Referee: [Theorem 4] Theorem 4 (AMISE equivalence of the generalized-curvature plug-in selector): The first-order equivalence to the AMISE-optimal bandwidth is stated to hold under ratio-consistent estimation of the weak-curvature functional, but the proof sketch does not explicitly address whether the leave-one-out U-statistic estimator remains ratio-consistent when the pilot bandwidth used in curvature estimation is itself of order n^{-1/5}; any implicit dependence could affect the claimed equivalence.

    Authors: The concern is well taken. The manuscript establishes consistency of the leave-one-out U-statistic for the curvature functional R(f'') whenever the pilot bandwidth h_p satisfies h_p → 0 and n h_p → ∞, a regime that includes the order n^{-1/5} pilot. To obtain ratio-consistency (estimated curvature / true curvature → 1 in probability) we further need the estimation error to be o_p(1) uniformly over pilots of that order. We will expand the proof of Theorem 4 by conditioning on the pilot bandwidth and invoking the uniform convergence rate of the U-statistic (which is faster than the variation induced by an n^{-1/5} pilot). This explicit argument removes any ambiguity about dependence and preserves the first-order AMISE equivalence of the resulting bandwidth selector. revision: yes
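The first response asserts that the integrated squared remainder is o(h⁴) without extra local integrability. One standard way to see this, offered here as our reconstruction and not as the paper's lemma, uses only f'' ∈ L²(ℝ), a symmetric kernel K with ∫ K = 1 and ∫ u²|K(u)| du < ∞, and continuity of translation in L². The first line applies the integral Taylor formula with increment hu (the f'(x) term drops out because ∫ u K(u) du = 0), the second uses Minkowski's integral inequality and dominated convergence with the bound 2‖f''‖₂ |K(u)| u² (1 − s), and the third follows by squaring. The argument gives o(h⁴) but no explicit rate.

```latex
\begin{aligned}
b_h(x) &:= \mathbb{E}\hat f_h(x) - f(x) = \int K(u)\,[f(x+hu) - f(x)]\,du
        = h^2 \int K(u)\,u^2 \int_0^1 (1-s)\, f''(x+shu)\,ds\,du,\\
\Bigl\| \tfrac{b_h}{h^2} - \tfrac{\mu_2(K)}{2} f'' \Bigr\|_2
  &\le \int |K(u)|\,u^2 \int_0^1 (1-s)\, \bigl\| f''(\cdot + shu) - f'' \bigr\|_2 \,ds\,du
  \;\xrightarrow[h\to 0]{}\; 0,\\
\int b_h(x)^2\,dx &= \frac{h^4}{4}\,\mu_2(K)^2 R(f'') + o(h^4).
\end{aligned}
```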

Circularity Check

0 steps flagged

No circularity: derivation uses independent integral representation and consistency proof

Full rationale

The paper replaces the classical pointwise C^2 Taylor expansion with an integral Taylor representation based on the weak second derivative under the stated C^{1,1} and L^2 assumptions. It then derives the AMISE formula, n^{-1/5} rate, and Epanechnikov optimality directly from this representation. The generalized-curvature plug-in selector is shown first-order equivalent under a ratio-consistent estimator whose consistency is separately established via a leave-one-out U-statistic with its own proof; neither step reduces to a fitted input renamed as prediction nor to a self-citation chain. The multivariate extension follows the same pattern. All load-bearing steps are self-contained against external analytic benchmarks and do not rely on the target result by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests primarily on the domain assumption of C^{1,1} regularity together with square-integrability of the weak second derivative; no free parameters or new postulated entities are introduced in the abstract.

axioms (1)
  • Domain assumption: f belongs to C^{1,1}(R) with weak second derivative belonging to L^2(R)
    This assumption replaces the classical twice continuous differentiability and enables the integral Taylor representation for the AMISE derivation.


