Kernel Density Estimation under C^(1,1) Regularity: AMISE, Weak Curvature, and Plug-in Bandwidths
Pith reviewed 2026-05-21 06:07 UTC · model grok-4.3
The pith
The classical AMISE formula and optimal bandwidth for kernel density estimation hold under the weaker condition that the density has a Lipschitz first derivative.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under f in C^{1,1}(R) with weak second derivative in L^2(R), the asymptotic mean integrated squared error of a kernel density estimator equals the classical expression involving the kernel's second moment and the roughness R(f''), where the latter is interpreted through the weak-curvature functional. This yields the standard optimal bandwidth of order n^{-1/5} and confirms that the Epanechnikov kernel minimizes the leading AMISE term without any need for pointwise twice differentiability.
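For orientation, the classical expression referred to here is usually written as below. This is the textbook form, not a quotation from the paper; the notation $R(g)=\int g(x)^2\,dx$ and $\mu_2(K)=\int u^2 K(u)\,du$ is an assumption, since this summary does not fix symbols, and $f''$ is read as the weak second derivative.

```latex
\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{h^{4}}{4}\,\mu_2(K)^{2}\,R(f''),
\qquad
h^{*} = \left[\frac{R(K)}{\mu_2(K)^{2}\,R(f'')\,n}\right]^{1/5},
\qquad
\mathrm{AMISE}(h^{*}) = \frac{5}{4}\,\bigl[\mu_2(K)^{2}\,R(K)^{4}\,R(f'')\bigr]^{1/5}\,n^{-4/5}.
```

The first term is the integrated variance and the second the integrated squared bias; balancing them gives the $n^{-1/5}$ bandwidth and the $n^{-4/5}$ error rate.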
What carries the argument
Integral Taylor representation based on the weak second derivative, which defines the weak-curvature functional R(f'').
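A short worked sketch of how this representation produces the leading bias term, assuming a symmetric kernel $K$ with $\int K = 1$, $\int uK(u)\,du = 0$ and finite $\mu_2(K)=\int u^2K(u)\,du$ (these kernel conditions are the standard ones and are assumed here, not quoted from the paper):

```latex
% Integral Taylor formula, valid when f' is Lipschitz (hence absolutely continuous):
f(x+hu) = f(x) + hu\,f'(x) + \int_{0}^{hu} (hu-t)\,f''(x+t)\,dt .

% Substituting t = hus in the remainder:
\int_{0}^{hu} (hu-t)\,f''(x+t)\,dt = h^{2}u^{2}\int_{0}^{1} (1-s)\,f''(x+hus)\,ds .

% Bias of the estimator, using \int K = 1 and \int uK = 0:
\mathbb{E}\hat f_h(x) - f(x)
  = \int K(u)\,\bigl[f(x+hu)-f(x)\bigr]\,du
  = h^{2}\int K(u)\,u^{2}\int_{0}^{1} (1-s)\,f''(x+hus)\,ds\,du .

% Formal leading term, since \int_0^1 (1-s)\,ds = 1/2:
\mathbb{E}\hat f_h(x) - f(x) \approx \frac{h^{2}}{2}\,\mu_2(K)\,f''(x).
```

Squaring and integrating the leading term gives $(h^4/4)\,\mu_2(K)^2 R(f'')$, which is where the weak-curvature functional enters.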
If this is right
- The AMISE formula remains identical to the classical case without assuming a continuous second derivative.
- The optimal bandwidth retains the rate n^{-1/5}.
- The Epanechnikov kernel remains asymptotically optimal among kernels of fixed order.
- A plug-in selector based on estimated weak curvature is first-order AMISE equivalent under ratio-consistent estimation.
- A leave-one-out U-statistic estimator of the weak curvature is consistent.
- In the multivariate setting, a scalar bandwidth attains the AMISE rate n^{-4/(d+4)} under weak Hessian regularity.
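The plug-in items in this list can be illustrated with a small sketch. This summary does not specify the paper's curvature estimator, so the code substitutes a standard Hall–Marron-type leave-one-out U-statistic with a Gaussian pilot kernel, based on the identity R(f'') = ∫ f⁗(x) f(x) dx read in the distributional sense. The function names, the pilot bandwidth `g`, and the choice of Epanechnikov constants are all illustrative assumptions.

```python
import numpy as np

# Epanechnikov kernel constants: R(K) = 3/5, mu_2(K) = 1/5.
R_K, MU2_K = 0.6, 0.2


def loo_curvature(sample, g):
    """Leave-one-out U-statistic estimate of R(f'') with pilot bandwidth g.

    Assumption: Gaussian pilot kernel, whose fourth derivative is
    (u^4 - 6u^2 + 3) * phi(u). This is a stand-in, not the paper's estimator.
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    u = (x[:, None] - x[None, :]) / g
    phi4 = (u**4 - 6.0 * u**2 + 3.0) * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(phi4, 0.0)  # leave-one-out: drop the i == j terms
    return phi4.sum() / (n * (n - 1) * g**5)


def plugin_bandwidth(sample, g):
    """Insert the estimated curvature into the AMISE-optimal bandwidth formula."""
    n = len(sample)
    r_hat = loo_curvature(sample, g)
    if r_hat <= 0:
        raise ValueError("curvature estimate is not positive; try a larger pilot g")
    return (R_K / (MU2_K**2 * r_hat * n)) ** 0.2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.standard_normal(1500)
    print(plugin_bandwidth(data, g=0.5))
```

The pilot value `g=0.5` in the usage lines is arbitrary; the ratio-consistency condition discussed in this review is exactly a condition on how such a pilot is chosen.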
Where Pith is reading between the lines
- Densities arising from threshold or regime-change models can now use standard KDE bandwidth rules with the same theoretical guarantees.
- The weak-curvature approach may extend naturally to other bias calculations in nonparametric smoothing under minimal smoothness.
- Numerical checks on piecewise-quadratic densities would directly test whether the predicted AMISE rates materialize in finite samples.
Load-bearing premise
The integral Taylor representation using the weak second derivative accurately captures the leading bias term in the mean integrated squared error expansion.
What would settle it
Compute the finite-sample integrated squared error of a kernel estimator on a density whose first derivative is Lipschitz but whose second derivative has a jump discontinuity, using the n^{-1/5} bandwidth, and check whether the error scales exactly as predicted by the classical AMISE expression.
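A minimal sketch of such a check, under assumptions that are not taken from the paper: the test density is the Irwin–Hall(3) density (sum of three Uniform(0,1) variables), which is piecewise quadratic with a Lipschitz first derivative and a second derivative equal to 1, -2, 1 on (0,1), (1,2), (2,3), so that R(f'') = 6; the kernel is Epanechnikov; and the integrated squared error is approximated by a Riemann sum on a fixed grid. All function names are hypothetical.

```python
import numpy as np

R_K, MU2_K = 0.6, 0.2  # Epanechnikov: R(K) = 3/5, mu_2(K) = 1/5
R_F2 = 6.0             # weak curvature of the test density: 1^2 + (-2)^2 + 1^2


def irwin_hall3_pdf(x):
    """Density of U1 + U2 + U3; f' is Lipschitz, f'' jumps at 0, 1, 2, 3."""
    x = np.asarray(x, dtype=float)
    return np.select(
        [(x >= 0) & (x < 1), (x >= 1) & (x < 2), (x >= 2) & (x <= 3)],
        [0.5 * x**2, 0.5 * (-2.0 * x**2 + 6.0 * x - 3.0), 0.5 * (3.0 - x) ** 2],
        default=0.0,
    )


def epanechnikov_kde(grid, sample, h):
    """Kernel density estimate on `grid` with bandwidth h."""
    u = (grid[:, None] - sample[None, :]) / h
    k = 0.75 * np.clip(1.0 - u**2, 0.0, None)  # zero outside |u| <= 1
    return k.mean(axis=1) / h


def amise_bandwidth(n):
    """AMISE-optimal bandwidth using the known curvature R_F2."""
    return (R_K / (MU2_K**2 * R_F2 * n)) ** 0.2


def amise_minimum(n):
    """Classical minimal AMISE: (5/4) [mu_2^2 R(K)^4 R(f'')]^(1/5) n^(-4/5)."""
    return 1.25 * (MU2_K**2 * R_K**4 * R_F2) ** 0.2 * n ** (-0.8)


def mean_ise(n, reps, rng):
    """Monte Carlo average of the integrated squared error at the AMISE bandwidth."""
    grid = np.linspace(-1.0, 4.0, 501)
    dx = grid[1] - grid[0]
    truth = irwin_hall3_pdf(grid)
    h = amise_bandwidth(n)
    ises = []
    for _ in range(reps):
        sample = rng.uniform(size=(n, 3)).sum(axis=1)
        est = epanechnikov_kde(grid, sample, h)
        ises.append(np.sum((est - truth) ** 2) * dx)
    return float(np.mean(ises))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (500, 2000, 8000):
        print(n, mean_ise(n, 10, rng), amise_minimum(n))
```

If the claim holds, the ratio of the simulated mean ISE to `amise_minimum(n)` should approach 1 as n grows; a ratio that drifts away would point to a missing term in the expansion.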
Original abstract
Classical kernel density estimation usually derives the AMISE and optimal bandwidth from a pointwise Taylor expansion, which requires twice continuous differentiability. This assumption is stronger than necessary and excludes natural densities arising from threshold models, regime changes, and robust mixture models, where the first derivative may be Lipschitz while the curvature is kinked, discontinuous, or only weakly defined. We show that the classical AMISE theory remains valid under the weaker condition $f\in C^{1,1}(\mathbb{R})$. The pointwise $C^2$ Taylor expansion is replaced by an integral Taylor representation based on the weak second derivative, so that $R(f'')$ is interpreted as a weak-curvature functional. Under $f\in C^{1,1}(\mathbb{R})$ and $f''\in L^2(\mathbb{R})$, we recover the classical AMISE formula, the $n^{-1/5}$ optimal bandwidth, and Epanechnikov kernel optimality without assuming a continuous classical second derivative. We also propose a generalized-curvature plug-in bandwidth selector, prove its first-order AMISE equivalence under ratio-consistent curvature estimation, and establish consistency of a leave-one-out U-statistic curvature estimator. A multivariate extension using weak Hessians recovers the scalar-bandwidth rate $n^{-4/(d+4)}$.
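The multivariate rate quoted at the end of the abstract comes from the same bias-variance balance. As a hedged sketch in textbook notation (assumed here, not quoted from the paper): take a scalar bandwidth $h$, a kernel on $\mathbb{R}^d$ with $\int u u^{\top} K(u)\,du = \mu_2(K) I_d$, and write $\Delta f = \operatorname{tr}(\nabla^2 f)$ with $\nabla^2 f$ the weak Hessian.

```latex
\mathrm{AMISE}(h) = \frac{R(K)}{n h^{d}}
  + \frac{h^{4}}{4}\,\mu_2(K)^{2} \int_{\mathbb{R}^d} \bigl(\Delta f(x)\bigr)^{2}\,dx,
\qquad
h^{*} = \left[\frac{d\,R(K)}{\mu_2(K)^{2}\,\int (\Delta f)^{2}\; n}\right]^{1/(d+4)},
\qquad
\mathrm{AMISE}(h^{*}) = O\!\bigl(n^{-4/(d+4)}\bigr).
```

For $d = 1$ this reduces to the univariate formula, with bandwidth of order $n^{-1/5}$ and error of order $n^{-4/5}$.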
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that under f ∈ C^{1,1}(ℝ) with f'' ∈ L²(ℝ), kernel density estimation recovers the classical AMISE formula (including the leading bias term involving ∫ [f''(x)]² dx), the n^{-1/5} optimal bandwidth rate, and Epanechnikov kernel optimality. This is achieved by replacing the pointwise C² Taylor expansion with an integral Taylor representation based on the weak second derivative. The work also introduces a generalized-curvature plug-in bandwidth selector shown to be first-order AMISE equivalent under ratio-consistent curvature estimation, proves consistency of a leave-one-out U-statistic estimator for the curvature functional, and provides a multivariate extension recovering the scalar-bandwidth rate n^{-4/(d+4)}.
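The Epanechnikov optimality mentioned in this summary refers to the kernel-dependent factor of the minimal AMISE, C(K) = (μ₂(K)² R(K)⁴)^{1/5}. The sketch below evaluates that factor numerically for three common kernels; the helper name, the integration grid, and the choice of comparison kernels are illustrative assumptions rather than anything taken from the paper.

```python
import numpy as np


def kernel_constant(kernel, lo, hi, m=200001):
    """C(K) = (mu_2(K)^2 * R(K)^4)^(1/5), via the trapezoidal rule on [lo, hi]."""
    u = np.linspace(lo, hi, m)
    du = u[1] - u[0]

    def integrate(y):
        return float(np.sum(0.5 * (y[1:] + y[:-1])) * du)

    k = kernel(u)
    k = k / integrate(k)  # guard against small normalisation error
    mu2 = integrate(u**2 * k)
    roughness = integrate(k**2)
    return (mu2**2 * roughness**4) ** 0.2


KERNELS = {
    "epanechnikov": (lambda u: 0.75 * (1.0 - u**2), -1.0, 1.0),
    "gaussian": (lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi), -10.0, 10.0),
    "uniform": (lambda u: np.full_like(u, 0.5), -1.0, 1.0),
}

constants = {name: kernel_constant(k, lo, hi) for name, (k, lo, hi) in KERNELS.items()}

if __name__ == "__main__":
    for name, value in sorted(constants.items(), key=lambda item: item[1]):
        print(f"{name:>12s}  C(K) = {value:.4f}")
```

C(K) is invariant under rescaling of the kernel, so the comparison does not depend on how each kernel's support or variance is normalised.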
Significance. If the central claims hold, the result would extend classical KDE asymptotics to a wider class of densities with kinked or only weakly defined curvature (e.g., from threshold or regime-switching models), which is a meaningful theoretical advance. The explicit use of weak derivatives to interpret R(f'') and the construction of a consistent U-statistic curvature estimator are notable strengths that could support practical plug-in methods under reduced smoothness.
major comments (2)
- [§3.1] §3.1 (Bias expansion via integral Taylor representation): The argument that the remainder term arising from the representation f(x + hu) = f(x) + hu f'(x) + ∫_0^{hu} (hu - t) f''(x + t) dt, after convolution with K and integration of the squared bias, is o(h²) in the L² sense under only f'' ∈ L²(ℝ) is load-bearing for recovering the exact classical AMISE. With f'' merely square-integrable, the resulting double-integral term against K(u) need not vanish faster than the leading (h²/2) μ₂(K) f''(x) term uniformly in x or after squaring and integrating; an explicit bound or additional local integrability condition is required to confirm the o(h²) rate.
- [Theorem 4] Theorem 4 (AMISE equivalence of the generalized-curvature plug-in selector): The first-order equivalence to the AMISE-optimal bandwidth is stated to hold under ratio-consistent estimation of the weak-curvature functional, but the proof sketch does not explicitly address whether the leave-one-out U-statistic estimator remains ratio-consistent when the pilot bandwidth used in curvature estimation is itself of order n^{-1/5}; any implicit dependence could affect the claimed equivalence.
minor comments (2)
- [§2] The notation for the weak-curvature functional R(f'') is introduced in the abstract but would benefit from an explicit integral definition in §2 before the main results.
- Figure 1 (if present) comparing classical and weak-curvature AMISE curves should include the precise regularity conditions used for each curve in the caption.
Simulated Author's Rebuttal
We thank the referee for the careful reading of the manuscript and for the positive assessment of its potential significance in extending classical KDE asymptotics to the C^{1,1} class. The two major comments raise valid points about technical details in the bias expansion and the proof of the plug-in selector. We address each below and will incorporate clarifications and expanded arguments in the revised version.
-
Referee: [§3.1] §3.1 (Bias expansion via integral Taylor representation): The argument that the remainder term arising from the representation f(x + hu) = f(x) + hu f'(x) + ∫_0^{hu} (hu - t) f''(x + t) dt, after convolution with K and integration of the squared bias, is o(h²) in the L² sense under only f'' ∈ L²(ℝ) is load-bearing for recovering the exact classical AMISE. With f'' merely square-integrable, the resulting double-integral term against K(u) need not vanish faster than the leading (h²/2) μ₂(K) f''(x) term uniformly in x or after squaring and integrating; an explicit bound or additional local integrability condition is required to confirm the o(h²) rate.
Authors: We agree that an explicit bound on the remainder would strengthen the argument. In the current proof we apply the integral Taylor formula, convolve with K, square the bias, and integrate. Because f'' ∈ L²(ℝ) and K has finite second moment and compact support, Fubini’s theorem together with Cauchy–Schwarz yields that the L²-norm of the remainder convolution is bounded by C h² ‖f''‖₂ times a factor that tends to zero with h. This is sufficient to make the integrated squared remainder o(h⁴) and therefore negligible relative to the leading AMISE terms (the remainder is deterministic, so no stochastic order is involved). We will add a short lemma in §3.1 that records this bound explicitly, confirming the classical AMISE expansion under the stated assumptions without requiring extra local integrability. revision: yes
-
Referee: [Theorem 4] Theorem 4 (AMISE equivalence of the generalized-curvature plug-in selector): The first-order equivalence to the AMISE-optimal bandwidth is stated to hold under ratio-consistent estimation of the weak-curvature functional, but the proof sketch does not explicitly address whether the leave-one-out U-statistic estimator remains ratio-consistent when the pilot bandwidth used in curvature estimation is itself of order n^{-1/5}; any implicit dependence could affect the claimed equivalence.
Authors: The concern is well taken. The manuscript establishes consistency of the leave-one-out U-statistic for the curvature functional R(f'') whenever the pilot bandwidth h_p satisfies h_p → 0 and n h_p → ∞, a regime that includes the order n^{-1/5} pilot. To obtain ratio-consistency (estimated curvature / true curvature → 1 in probability) we further need the estimation error to be o_p(1) uniformly over pilots of that order. We will expand the proof of Theorem 4 by conditioning on the pilot bandwidth and invoking the uniform convergence rate of the U-statistic (which is faster than the variation induced by an n^{-1/5} pilot). This explicit argument removes any ambiguity about dependence and preserves the first-order AMISE equivalence of the resulting bandwidth selector. revision: yes
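For the remainder bound discussed in the response to the §3.1 comment, one way such a lemma could go is sketched below. This is a reconstruction under stated assumptions, not the authors' proof: $K \ge 0$ is symmetric and supported on $[-1,1]$, $b_h$ denotes the pointwise bias, and $\omega_2(g;h) = \sup_{|t|\le h} \|g(\cdot+t) - g\|_2$ is the $L^2$ modulus of continuity (notation introduced here).

```latex
% Remainder after removing the leading bias term:
r_h(x) := b_h(x) - \frac{h^{2}}{2}\,\mu_2(K)\,f''(x)
        = h^{2}\int K(u)\,u^{2}\int_{0}^{1} (1-s)\,\bigl[f''(x+hus) - f''(x)\bigr]\,ds\,du .

% Minkowski's integral inequality, then |hus| \le h on the support of K:
\|r_h\|_{2}
  \le h^{2}\int K(u)\,u^{2}\int_{0}^{1} (1-s)\,\bigl\|f''(\cdot+hus) - f''\bigr\|_{2}\,ds\,du
  \le \frac{h^{2}}{2}\,\mu_2(K)\,\omega_2(f'';h).

% Continuity of translation in L^2 gives \omega_2(f'';h) \to 0 as h \to 0, hence
\int r_h(x)^{2}\,dx = o(h^{4}),
\qquad
\int b_h(x)^{2}\,dx = \frac{h^{4}}{4}\,\mu_2(K)^{2}\,R(f'') + o(h^{4}).
```

The last identity uses Cauchy–Schwarz on the cross term between $r_h$ and the leading term. Under this reading, $f'' \in L^2(\mathbb{R})$ alone drives the $o(h^4)$ rate, through continuity of translation rather than any pointwise regularity of $f''$.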
Circularity Check
No circularity: derivation uses independent integral representation and consistency proof
The paper replaces the classical pointwise C^2 Taylor expansion with an integral Taylor representation based on the weak second derivative under the stated C^{1,1} and L^2 assumptions. It then derives the AMISE formula, n^{-1/5} rate, and Epanechnikov optimality directly from this representation. The generalized-curvature plug-in selector is shown first-order equivalent under a ratio-consistent estimator whose consistency is separately established via a leave-one-out U-statistic with its own proof; neither step reduces to a fitted input renamed as prediction nor to a self-citation chain. The multivariate extension follows the same pattern. All load-bearing steps are self-contained against external analytic benchmarks and do not rely on the target result by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption f belongs to C^{1,1}(R) with weak second derivative belonging to L^2(R)