Tweedie-based nonparametric estimation for semicontinuous mixed densities
Pith reviewed 2026-05-08 18:05 UTC · model grok-4.3
The pith
Tweedie asymmetric kernels unify nonparametric estimation of semicontinuous mixed densities on [0, ∞).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce an asymmetric kernel estimator for mixed densities on [0,∞) based on the Tweedie distribution. For a power parameter p∈(1,2), the Tweedie kernel itself has a point mass at zero and an absolutely continuous component on (0,∞), yielding a unified smoothing construction that preserves the atom at zero and smooths the positive component using the full semicontinuous sample. We establish pointwise bias and variance expansions, derive asymptotic formulae for the mean squared error and mean integrated squared error, obtain optimal bandwidth rates, and prove asymptotic normality. We propose a profile least-squares cross-validation procedure to jointly select the bandwidth and the power.
What carries the argument
The Tweedie kernel with power p in (1,2), an asymmetric kernel that carries both a built-in point mass at zero and a continuous density on (0,∞) to weight the entire semicontinuous sample in one step.
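The point mass comes from the compound Poisson-gamma representation of Tweedie laws with p in (1,2): a Poisson number of gamma jumps is summed, and zero jumps gives an exact zero. A minimal Python sketch follows, assuming the standard mean/dispersion parametrization (mu, phi), which may differ from the bandwidth parametrization used in the paper. All function names are illustrative and do not come from the paper.

```python
import math
import random

def tweedie_cp_params(mu, phi, p):
    """Map (mean mu, dispersion phi, power p) to compound Poisson-gamma parameters."""
    if not 1.0 < p < 2.0:
        raise ValueError("power p must lie strictly between 1 and 2")
    if mu <= 0.0 or phi <= 0.0:
        raise ValueError("mu and phi must be positive")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))    # Poisson rate of the jump count
    shape = (2.0 - p) / (p - 1.0)                # gamma shape of each jump
    scale = phi * (p - 1.0) * mu ** (p - 1.0)    # gamma scale of each jump
    return lam, shape, scale

def tweedie_zero_mass(mu, phi, p):
    """P(Y = 0) = exp(-lam): the atom at zero carried by the law."""
    lam, _, _ = tweedie_cp_params(mu, phi, p)
    return math.exp(-lam)

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def sample_tweedie(mu, phi, p, rng):
    """Draw one value: exactly 0.0 if no jumps occur, otherwise a gamma sum."""
    lam, shape, scale = tweedie_cp_params(mu, phi, p)
    n_jumps = sample_poisson(lam, rng)
    if n_jumps == 0:
        return 0.0
    # A sum of n_jumps iid Gamma(shape, scale) draws is Gamma(n_jumps * shape, scale).
    return rng.gammavariate(n_jumps * shape, scale)
```

With mu = 1, phi = 1 and p = 1.5 the Poisson rate is 2, so the atom at zero has mass exp(-2), and the product lam * shape * scale recovers the mean mu.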
If this is right
- The estimator attains standard optimal convergence rates for density estimation while respecting the mixed structure.
- Asymptotic normality at interior points supports construction of pointwise confidence bands.
- Profile least-squares cross-validation jointly tunes bandwidth and power parameter in finite samples.
- Performance advantages appear most clearly in boundary-spike and heavy-tailed positive components.
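As a rough illustration of where an optimal bandwidth rate of this kind comes from, the balance below assumes a pointwise bias of order h and a variance of order n^{-1} h^{-1/2}, which is the familiar pattern for gamma-type asymmetric kernels at interior points. These orders, and the placeholder constants B(x) and V(x), are assumptions made for the sketch and are not taken from the paper, whose parametrization and constants may differ.

```latex
% Assumed expansions (not from the paper): bias ~ B(x) h, variance ~ V(x) / (n h^{1/2}).
\[
  \mathrm{MSE}(x;h) \approx B(x)^2 h^2 + \frac{V(x)}{n\,h^{1/2}} .
\]
% Setting the derivative in h to zero:
\[
  2 B(x)^2 h - \frac{V(x)}{2\,n\,h^{3/2}} = 0
  \quad\Longrightarrow\quad
  h^{*}(x) = \left( \frac{V(x)}{4\,B(x)^2} \right)^{2/5} n^{-2/5},
  \qquad
  \mathrm{MSE}\bigl(x;h^{*}\bigr) = O\!\left(n^{-4/5}\right).
\]
```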
Where Pith is reading between the lines
- The construction may extend directly to regression settings for conditional semicontinuous densities.
- Similar Tweedie-based kernels could handle other nonnegative supports with atoms, such as count-inflated continuous data.
- Multivariate versions would allow joint modeling of several semicontinuous variables without separate marginal steps.
- The automatic mass preservation suggests the method can serve as a building block for more complex zero-inflated models in applied statistics.
Load-bearing premise
The Tweedie distribution with power p in (1,2) provides a valid kernel that correctly separates the discrete mass at zero from the continuous component on (0,∞) for the target density class.
What would settle it
A simulation or real-data comparison in which the Tweedie estimator yields higher mean integrated squared error than a two-part estimator (separate zero-probability estimate plus positive kernel) under heavy tails or strong boundary spikes would refute the unified advantage.
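For reference, the two-part competitor named above can be sketched as follows. The positive part is smoothed here with a gamma kernel of the kind used for nonnegative data, chosen only as a stand-in; the comparator, bandwidth choice, and error metric actually used in the paper are not specified on this page, and every function name below is hypothetical.

```python
import math

def gamma_kernel_pdf(t, x, b):
    """Gamma(shape = x/b + 1, scale = b) density evaluated at t > 0."""
    shape = x / b + 1.0
    log_pdf = ((shape - 1.0) * math.log(t) - t / b
               - shape * math.log(b) - math.lgamma(shape))
    return math.exp(log_pdf)

def two_part_estimate(sample, b):
    """Return (estimated mass at zero, density estimate for the positive part)."""
    n = len(sample)
    positives = [v for v in sample if v > 0]
    p0_hat = (n - len(positives)) / n          # empirical proportion of zeros

    def positive_density(x):
        # Kernel average over the positive observations only.
        if not positives:
            return 0.0
        return sum(gamma_kernel_pdf(v, x, b) for v in positives) / len(positives)

    return p0_hat, positive_density

def integrated_squared_error(estimate, truth, grid):
    """Trapezoid-rule approximation of the integral of (estimate - truth)^2 over grid."""
    errs = [(estimate(x) - truth(x)) ** 2 for x in grid]
    return sum(0.5 * (errs[i] + errs[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))
```

Averaging `integrated_squared_error` over simulated samples for both this baseline and the unified estimator gives the kind of comparison described above.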
Original abstract
Semicontinuous outcomes occur frequently in health services, insurance, and cost studies. Standard nonparametric density estimators are not well suited to such data because they do not naturally accommodate the mixed structure, the nonnegative support, or the pronounced boundary effects near zero. To address these limitations, we introduce an asymmetric kernel estimator for mixed densities on $[0,\infty)$ based on the Tweedie distribution. For a power parameter $p\in(1,2)$, the Tweedie kernel itself has a point mass at zero and an absolutely continuous component on $(0,\infty)$, yielding a unified smoothing construction that preserves the atom at zero and smooths the positive component using the full semicontinuous sample. We establish pointwise bias and variance expansions, derive asymptotic formulae for the mean squared error and mean integrated squared error, obtain optimal bandwidth rates, and prove asymptotic normality. We propose a profile least-squares cross-validation procedure to jointly select the bandwidth and the power parameter. Simulation results show competitive performance, particularly in challenging boundary-spike and heavy-tailed settings, and an application to emergency department length-of-stay data illustrates the practical value of the method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a nonparametric estimator for semicontinuous mixed densities supported on [0, ∞) that employs the Tweedie distribution with power parameter p ∈ (1,2) as an asymmetric kernel. This kernel inherently carries a point mass at zero together with a density on (0, ∞), allowing a single smoothing construction that preserves the atom while using the full sample for the continuous component. The authors derive pointwise bias and variance expansions, asymptotic MSE and MISE formulae, optimal bandwidth rates, and asymptotic normality; they also introduce a profile least-squares cross-validation procedure for joint selection of the bandwidth h and power p. Simulation experiments and an application to emergency-department length-of-stay data are reported.
Significance. If the stated derivations hold, the work supplies a theoretically grounded, unified kernel method for a data type that appears routinely in health-services, insurance, and cost studies. The approach exploits the compound-Poisson-gamma structure of the Tweedie family to avoid ad-hoc separation of the discrete and continuous parts and to mitigate boundary effects near zero. The claimed optimal rates, normality result, and joint CV selector constitute concrete methodological advances. The reported simulations indicate competitive performance in boundary-spike and heavy-tailed regimes, and the real-data illustration demonstrates practical utility.
major comments (2)
- [§3] §3 (asymptotic theory): the bias and variance expansions are stated to follow from standard kernel arguments applied to the Tweedie kernel, yet the precise regularity conditions on the target density (e.g., smoothness at zero, behavior of the continuous component) that justify interchanging limits and integrals are not spelled out; these conditions are load-bearing for the subsequent MISE and normality claims.
- [§4] §4 (profile least-squares CV): the procedure is defined and implemented, but no consistency or rate result for the selected (ĥ, p̂) is provided; without such a result the practical estimator lacks the theoretical backing given to the oracle bandwidth, which weakens the overall contribution.
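To make the object of the second comment concrete, a generic least-squares cross-validation criterion with a profile search over (h, p) can be sketched as below. This is a sketch under stated assumptions, not the paper's procedure: it scores only the positive part, uses a gamma kernel that ignores p as a stand-in for the Tweedie kernel, and approximates the integral on a grid. The names `gamma_kernel`, `lscv_score`, and `profile_lscv` are hypothetical.

```python
import math

def gamma_kernel(t, x, h, p):
    """Stand-in kernel: Gamma(shape = x/h + 1, scale = h) density at t > 0 (p is ignored)."""
    shape = x / h + 1.0
    return math.exp((shape - 1.0) * math.log(t) - t / h
                    - shape * math.log(h) - math.lgamma(shape))

def lscv_score(data, kernel, h, p, x_grid):
    """Integral of f_hat^2 minus twice the mean leave-one-out density at the data."""
    n = len(data)
    sq = [(sum(kernel(t, x, h, p) for t in data) / n) ** 2 for x in x_grid]
    integral = sum(0.5 * (sq[i] + sq[i + 1]) * (x_grid[i + 1] - x_grid[i])
                   for i in range(len(x_grid) - 1))
    loo = 0.0
    for i, xi in enumerate(data):
        others = sum(kernel(t, xi, h, p) for j, t in enumerate(data) if j != i)
        loo += others / (n - 1)
    return integral - 2.0 * loo / n

def profile_lscv(data, kernel, h_grid, p_grid, x_grid):
    """For each p, minimise over h; then keep the p with the smallest profiled score."""
    best = None
    for p in p_grid:
        score_p, h_p = min((lscv_score(data, kernel, h, p, x_grid), h) for h in h_grid)
        if best is None or score_p < best[0]:
            best = (score_p, h_p, p)
    return best  # (score, selected h, selected p)
```

A consistency result of the kind the referee asks for would concern the minimiser returned by such a search as the sample size grows; the code only shows what is being minimised.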
minor comments (3)
- [§2] The notation for the mixed density (atom plus continuous part) is introduced only in the abstract and should be restated explicitly in §2 with a clear decomposition f = π δ_0 + (1-π) f_c.
- [§5] Simulation tables would benefit from reporting the selected p̂ values alongside ĥ; this would illustrate how the joint selector behaves across the boundary-spike and heavy-tailed designs.
- [Introduction] A brief comparison with the existing literature on asymmetric kernels for nonnegative data (e.g., gamma or inverse-Gaussian kernels) would help situate the Tweedie choice.
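The decomposition requested in the first minor comment can be written out as follows. The symbols follow the referee's notation (π for the mass at zero, f_c for the density of the positive part); F_c and φ are introduced here for illustration, and the paper itself may use different symbols.

```latex
% Assumed notation: \delta_0 is the Dirac measure at zero, F_c the distribution
% function of the positive part, \varphi any integrable test function.
\[
  f = \pi\,\delta_0 + (1-\pi)\,f_c ,
  \qquad
  G(x) = \pi\,\mathbf{1}(x \ge 0) + (1-\pi)\,F_c(x), \quad x \in \mathbb{R},
\]
\[
  \mathbb{E}\,[\varphi(X)] = \pi\,\varphi(0) + (1-\pi)\int_0^\infty \varphi(y)\,f_c(y)\,dy .
\]
```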
Simulated Author's Rebuttal
We thank the referee for the thorough review and the recommendation for minor revision. The comments highlight important points regarding the presentation of assumptions and the theoretical properties of the cross-validation procedure. We address each comment below and indicate the changes we will implement in the revised manuscript.
-
Referee: [§3] §3 (asymptotic theory): the bias and variance expansions are stated to follow from standard kernel arguments applied to the Tweedie kernel, yet the precise regularity conditions on the target density (e.g., smoothness at zero, behavior of the continuous component) that justify interchanging limits and integrals are not spelled out; these conditions are load-bearing for the subsequent MISE and normality claims.
Authors: We agree that the regularity conditions must be stated explicitly. In the revised version, we will insert a new paragraph at the beginning of §3 that specifies the assumptions: the continuous part f_c of the mixed density f is twice continuously differentiable on (0, ∞) with bounded second derivative, f is continuous from the right at 0, and suitable moment conditions hold to allow differentiation under the integral sign in the bias and variance calculations. These conditions ensure the validity of the expansions and of the subsequent results on MISE and asymptotic normality. We believe this addition addresses the concern without altering the main results.
revision: yes
-
Referee: [§4] §4 (profile least-squares CV): the procedure is defined and implemented, but no consistency or rate result for the selected (ĥ, p̂) is provided; without such a result the practical estimator lacks the theoretical backing given to the oracle bandwidth, which weakens the overall contribution.
Authors: This is a fair observation. Deriving consistency and rates for the joint selector (ĥ, p̂) would require proving uniform convergence of the CV criterion to the MISE over a suitable range of h and p, which involves more advanced empirical process techniques and is not included in the present work. We will revise §4 to include a brief discussion acknowledging this gap and noting that the selector is supported by the simulation studies showing good finite-sample performance. The primary theoretical contribution remains the asymptotic properties of the Tweedie kernel estimator itself, with the CV serving as a data-driven implementation. If the editor deems it necessary, we can explore adding a heuristic argument, but a full proof is left for future research.
revision: partial
Circularity Check
No significant circularity; derivation uses standard Tweedie kernel properties and kernel asymptotics
The paper constructs the estimator directly from the known mixed structure of the Tweedie distribution (point mass at zero plus continuous component on (0,∞) for p ∈ (1,2)), then applies standard bias/variance expansions, MISE derivations, optimal bandwidth rates, asymptotic normality, and profile least-squares CV. No step reduces a claimed prediction or result to a fitted quantity defined by the same data, nor relies on self-citation chains for uniqueness or ansatz smuggling. The central construction and asymptotics remain independent of the target estimates.
Axiom & Free-Parameter Ledger
free parameters (1)
- power parameter p
axioms (1)
- domain assumption: Tweedie distributions with p ∈ (1,2) possess a point mass at zero and an absolutely continuous component on (0,∞) that can serve as a valid kernel.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
For a power parameter p in (1,2), the Tweedie kernel itself has a point mass at zero and an absolutely continuous component on (0,∞) ... We propose a profile least-squares cross-validation procedure to jointly select the bandwidth and the power parameter.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Tweedie-based nonparametric estimation for semicontinuous mixed densities
Introduction. Semicontinuous outcomes, characterized by a point mass at zero together with a continuously distributed positive component, arise naturally in many applied settings where no-event or nonuse observations coexist with positive, often strongly right-skewed realizations. Such data have long been recognized in statistics; an early treatment is...
arXiv 2026
-
[2]
Semicontinuous data and Tweedie kernel estimator. Formally, a random variable $X$ is semicontinuous if it admits the representation $X \overset{d}{=} \begin{cases} 0, & \text{with probability } p_0, \\ Y, & \text{with probability } 1-p_0, \end{cases}$ where $p_0\in(0,1)$ and $Y$ has distribution function $F$ supported on $(0,\infty)$ with density $f$ with respect to the Lebesgue measure. Equivalently, the cumulative distribution function $G$ of $X$ has the form $G(x) = p_0\,\mathbf{1}(x\ge 0) + (1-p_0)\,F(x)$, $x\in\mathbb{R}$, (2.1) where $\mathbf{1}(\cdot)$ is the indicator function...
-
[3]
Main results. Because $\widehat{g}_h(0) = \widehat{p}_0$, the only nontrivial asymptotic analysis concerns the case $x > 0$. The results in this section are therefore pointwise at a fixed interior location $x$, but they must still account for the influence of the boundary at zero through the support of the Tweedie kernel. The assumptions below separate these two aspects: local smoothnes...
-
[4]
Simulation study. This section investigates the finite-sample performance of the proposed Tweedie kernel estimator for semicontinuous data. The simulation study is designed to assess how well the estimator recovers the positive component of the target density across a range of settings, including the Tweedie case and several misspecified zero-inflated alte...
-
[5]
Real-data application. For the empirical illustration, we analyze 732 mental health-related emergency department visits recorded in March 2023. The data were obtained from Nova Scotia Health’s centralized Emergency Department Information System (EDIS), which uses standardized reporting procedures across hospitals. The data come from four principal emergenc...
-
[6]
Concluding remarks. In this paper, we studied nonparametric estimation for semicontinuous distributions by introducing a Tweedie kernel estimator for the full mixed density on $[0,\infty)$. The proposed construction exploits the compound Poisson–Gamma structure of Tweedie laws with power parameter $p\in(1,2)$, so that the point mass at zero and the continuous positive...