
arxiv: 2605.20604 · v1 · pith:BWNG6CK4new · submitted 2026-05-20 · 📊 stat.ME

Conditional regularized halfspace depth for sparse functional data and its applications

Pith reviewed 2026-05-21 03:22 UTC · model grok-4.3

classification 📊 stat.ME
keywords sparse functional data · halfspace depth · conditional depth · functional data analysis · data depth · ranking methods · growth curves

The pith

Conditional regularized halfspace depth ranks sparse functional data directly from observations without reconstructing trajectories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces conditional regularized halfspace depth to order functional data observed only at scattered irregular points. It sets this depth equal to the lowest conditional halfspace probability for the full underlying trajectory given the sparse points that were actually recorded. The construction works straight from the limited measurements and skips the usual step of estimating the unseen segments of each curve. The resulting orderings support rank-based tests and are demonstrated on infant growth data collected at irregular intervals.

Core claim

CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements, thereby enabling depth evaluation directly at sparse observations without requiring trajectory reconstruction. The paper establishes several basic theoretical properties that confirm its behavior as a depth measure and shows that the approach remains applicable even when observations are extremely sparse.
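
A schematic way to write this definition is given below. All notation is assumed here for illustration and is not taken from the paper: $X$ is the underlying trajectory, $t_1, \dots, t_k$ are the sparse observation times, $H_u$ is a halfspace indexed by a direction $u$, and $\mathcal{U}_\lambda$ is a regularized set of admissible directions with tuning parameter $\lambda$. The exact form of $H_u$, of $\mathcal{U}_\lambda$, and of the measurement model is left unspecified because the provided text does not state it.

```latex
\[
  D_\lambda\bigl(x_1, \dots, x_k\bigr)
  \;=\;
  \inf_{u \in \mathcal{U}_\lambda}
  P\bigl( X \in H_u \,\bigm|\, X(t_1) = x_1, \dots, X(t_k) = x_k \bigr)
\]
```

Read this way, the depth is a function of the recorded values only: the unobserved part of the trajectory enters through the conditional law rather than through a reconstructed curve.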

What carries the argument

The conditional regularized halfspace depth, defined as the infimum of conditional halfspace probabilities for the trajectory given the sparse measurements.

If this is right

  • CRHD applies to functional data observed at only a handful of irregular times.
  • It produces rankings that can be used directly in rank-based statistical procedures.
  • The method avoids bias that can arise from preliminary curve reconstruction steps.
  • It is illustrated on an infant growth dataset with irregular measurement times.
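
The rank-based use of depth values mentioned above can be sketched with a Kruskal-Wallis statistic computed on depths. This is a minimal illustration, assuming a depth value has already been computed for each curve relative to a common reference sample; the function names and the example depth values are hypothetical, and no tie correction is applied.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Hypothetical depth values for two groups of sparsely observed curves.
depths_a = [0.41, 0.35, 0.29]
depths_b = [0.12, 0.08, 0.22]
h_stat = kruskal_wallis_h([depths_a, depths_b])
```

Under the null hypothesis of identical groups, H is usually compared with a chi-square distribution with (number of groups − 1) degrees of freedom.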

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditional-probability construction could be examined for other forms of incomplete high-dimensional data where full reconstruction is unreliable.
  • It may offer a route to robust outlier detection in longitudinal studies that collect only a few observations per subject.
  • Performance under varying patterns of missingness, such as clustered observation times, remains a natural next check.

Load-bearing premise

Conditional halfspace probabilities for the infinite-dimensional trajectory can be defined, regularized, and estimated from sparse measurements without introducing substantial bias or instability.

What would settle it

Apply CRHD to simulated data generated from known full trajectories, subsample each trajectory to sparse points, compute the induced ordering, and compare it with the ordering obtained from full-data depth; systematic reversal of expected ranks would falsify the claim.
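
The check described above can be sketched in a Gaussian toy model. This is not the paper's CRHD estimator: it assumes a known finite-rank Gaussian process, for which the infimum of the conditional halfspace probability over all directions has a closed form, and it omits the regularization step entirely. The eigenvalues, sine basis, noise level, and function names are all illustrative assumptions.

```python
import math
import numpy as np


def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def spearman(a, b):
    """Spearman rank correlation for samples without ties."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])


def rank_agreement(n=200, m=4, K=3, noise_sd=0.1, seed=0):
    """Compare full-data and sparse-data depth orderings in a toy model."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 101)
    lam = 1.0 / np.arange(1, K + 1) ** 2                  # assumed eigenvalues
    # Assumed eigenfunctions sqrt(2) * sin(k * pi * t), evaluated on the grid.
    phi = np.sqrt(2.0) * np.sin(np.outer(np.arange(1, K + 1) * np.pi, grid))
    Lam = np.diag(lam)
    scores = rng.normal(size=(n, K)) * np.sqrt(lam)       # true score vectors
    full = np.empty(n)
    sparse = np.empty(n)
    for i in range(n):
        # Full-data Gaussian halfspace depth: Phi(-Mahalanobis norm).
        full[i] = normal_cdf(-math.sqrt(float(scores[i] @ (scores[i] / lam))))
        # Subsample m grid points and add measurement noise.
        idx = rng.choice(grid.size, size=m, replace=False)
        B = phi[:, idx].T                                  # m x K design
        y = B @ scores[i] + rng.normal(scale=noise_sd, size=m)
        # Gaussian conditional law of the scores given the sparse record.
        S = B @ Lam @ B.T + noise_sd ** 2 * np.eye(m)
        gain = Lam @ B.T @ np.linalg.inv(S)
        mean = gain @ y
        V = Lam - gain @ B @ Lam
        # Infimum over directions a of
        # P(a'(xi_new - xi_0) >= 0 | y) = Phi(-sqrt(mean' (Lam + V)^{-1} mean)).
        q = float(mean @ np.linalg.solve(Lam + V, mean))
        sparse[i] = normal_cdf(-math.sqrt(max(q, 0.0)))
    return full, sparse, spearman(full, sparse)


full_depth, sparse_depth, rho = rank_agreement()
```

A rank correlation near 1 would indicate that the sparse-data ordering tracks the full-data ordering in this toy setting; a value near zero or negative would be the kind of systematic disagreement the check is looking for. Varying `m` and `noise_sd` shows how quickly the agreement degrades as observations become sparser or noisier.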

Figures

Figures reproduced from arXiv: 2605.20604 by Hyemin Yeon, Sara Lopez-Pintado, Xiongtao Dai.

Figure 1: Empirical rank correlations of ACRHD, PCRHD, and TwoStageRHD values at [...]
Figure 2: Empirical power of the KW tests based on both averaged and plug-in CRHDs [...]
Figure 3: Filtered height trajectories from the infant growth data, obtained by retaining [...]

Original abstract

Many functional datasets are observed sparsely and irregularly. Ordering such data is challenging because only limited information is available from each observation, while the underlying trajectories remain infinite-dimensional. This paper develops a novel depth notion for sparse functional data, called the conditional regularized halfspace depth (CRHD). CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements, thereby enabling depth evaluation directly at sparse observations without requiring trajectory reconstruction. We study several basic theoretical properties of CRHD that clarify its behavior as a depth measure. The proposed depth is applicable even to extremely sparsely observed functional data, overcoming key limitations of existing sparse functional depths that often rely on reconstructed curves. In addition, CRHD induces meaningful rankings for complex functional data. Its numerical performance is demonstrated through rank-based tests, and its practical utility is illustrated using an infant growth dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes the conditional regularized halfspace depth (CRHD) for sparsely and irregularly observed functional data. CRHD is defined as the infimum over directions u of the conditional probability that the underlying infinite-dimensional trajectory satisfies a halfspace condition given the finite sparse point observations. The authors state that this construction permits direct depth evaluation on the observed data points without explicit trajectory reconstruction. They examine basic theoretical properties of CRHD as a depth measure, claim applicability to extremely sparse regimes, demonstrate that it produces meaningful rankings, and illustrate performance via rank-based tests together with an application to an infant growth dataset.

Significance. If the central construction can be shown to be well-defined and estimable without implicitly reintroducing reconstruction bias, the work would address a genuine methodological gap in functional data analysis. Existing sparse functional depths frequently rely on preliminary smoothing or basis expansion, which can distort ordering; a depth that operates directly on the conditional law would be a useful addition. The reported theoretical properties and the real-data illustration on growth curves provide a foundation for further development, though the magnitude of the advance hinges on whether the regularization step delivers the claimed separation from reconstruction-based approaches.

major comments (3)
  1. [Section 2] Definition of CRHD (Section 2): the claim that the depth is evaluated 'directly at sparse observations without requiring trajectory reconstruction' is not yet supported by an explicit argument showing that the conditional probability P(X in halfspace | observations at t1..tk) remains identifiable and stable under the chosen regularization without effectively constraining the trajectory to a finite-dimensional subspace. In nonparametric settings the conditional law of an infinite-dimensional process given finitely many point evaluations is typically non-identifiable; any practical estimator must therefore impose structure (basis truncation, kernel smoothing, or parametric covariance) whose effect on the resulting depth values needs to be quantified.
  2. [Section 3] Theoretical properties (Section 3): the manuscript states that several basic properties of CRHD are established, yet the provided text does not contain the derivations or the precise regularity conditions under which the infimum over directions is attained and the depth is monotone or convex. Without these details it is difficult to assess whether the properties survive the regularization step or whether they reduce to known properties of unconditional halfspace depth after the conditional law is approximated.
  3. [Section 4] Estimation procedure (Section 4): the numerical implementation must rely on an estimator of the conditional halfspace probability. The manuscript should clarify whether this estimator is constructed from a finite sample of sparsely observed curves and whether consistency or rates are proved; if the estimator implicitly reconstructs trajectories via the same regularization used in the definition, the advantage over existing reconstruction-based depths requires explicit comparison on the same sparse regimes.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a concise statement of the precise form of regularization (e.g., basis dimension, penalty parameter, or kernel bandwidth) employed in the definition and estimation of CRHD.
  2. [Section 2] Notation for the sparse observation times and the conditioning sigma-field should be introduced once and used consistently throughout the theoretical sections.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment point by point below, indicating revisions where the manuscript will be strengthened to provide additional clarity and rigor.

  1. Referee: [Section 2] Definition of CRHD (Section 2): the claim that the depth is evaluated 'directly at sparse observations without requiring trajectory reconstruction' is not yet supported by an explicit argument showing that the conditional probability P(X in halfspace | observations at t1..tk) remains identifiable and stable under the chosen regularization without effectively constraining the trajectory to a finite-dimensional subspace. In nonparametric settings the conditional law of an infinite-dimensional process given finitely many point evaluations is typically non-identifiable; any practical estimator must therefore impose structure (basis truncation, kernel smoothing, or parametric covariance) whose effect on the resulting depth values needs to be quantified.

    Authors: We appreciate the referee's emphasis on identifiability. The regularization in the definition of CRHD is introduced precisely to ensure the conditional probability is well-defined and depends only on the sparse observations and the regularization parameter, without performing explicit trajectory reconstruction or projecting onto a finite basis. We will revise Section 2 to include a detailed argument establishing identifiability under the chosen regularization, along with an analysis of how the regularization parameter affects depth values and preserves the infinite-dimensional character of the underlying process. revision: yes

  2. Referee: [Section 3] Theoretical properties (Section 3): the manuscript states that several basic properties of CRHD are established, yet the provided text does not contain the derivations or the precise regularity conditions under which the infimum over directions is attained and the depth is monotone or convex. Without these details it is difficult to assess whether the properties survive the regularization step or whether they reduce to known properties of unconditional halfspace depth after the conditional law is approximated.

    Authors: We agree that the derivations and regularity conditions should be presented more explicitly. In the revised manuscript we will supply complete proofs for the properties of CRHD, including conditions ensuring the infimum is attained and establishing monotonicity and convexity. These proofs will explicitly address the effect of regularization and demonstrate that the properties do not reduce to those of the unconditional halfspace depth. revision: yes

  3. Referee: [Section 4] Estimation procedure (Section 4): the numerical implementation must rely on an estimator of the conditional halfspace probability. The manuscript should clarify whether this estimator is constructed from a finite sample of sparsely observed curves and whether consistency or rates are proved; if the estimator implicitly reconstructs trajectories via the same regularization used in the definition, the advantage over existing reconstruction-based depths requires explicit comparison on the same sparse regimes.

    Authors: The estimator is constructed directly from finite samples of sparsely observed curves via empirical conditional probabilities under the regularization. We will clarify this construction in the revised Section 4, add consistency results with convergence rates, and include explicit numerical comparisons against reconstruction-based depths in the same sparse observation regimes to quantify the practical advantage. revision: yes

Circularity Check

0 steps flagged

No significant circularity: the CRHD definition is a direct construction from conditional probabilities, with no reduction to fitted inputs or self-citations.

full rationale

The paper defines CRHD explicitly as the infimum of conditional halfspace probabilities of the underlying trajectory given sparse measurements, and presents it as a novel depth notion enabling direct evaluation without reconstruction. Nothing in the provided abstract or description shows the definition reducing by construction to its own inputs, relabeling fitted parameters as predictions, or resting on load-bearing self-citations. The regularization is invoked to make the conditional law tractable, but the derivation chain remains independent of the target result itself. This is the usual non-finding for papers that introduce new depth measures via explicit probabilistic definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.0 · 5680 in / 935 out tokens · 35996 ms · 2026-05-21T03:22:08.733560+00:00 · methodology


