Error bounds of Median-of-means estimators with VC-dimension
Pith reviewed 2026-05-23 21:22 UTC · model grok-4.3
The pith
Median-of-means estimators achieve error bounds for mean vectors under finite second moments using VC-dimension.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The median-of-means method yields upper error bounds for robust estimators of the mean vector that handle heavy-tailed data and contamination with only finite second moments. A new estimator is the MOM version of the halfspace depth, which comes with error bounds for mean estimation in any norm. The use of VC-dimension allows implementation in covariance estimation without additional tail or norm equivalence conditions.
What carries the argument
The median-of-means (MOM) method applied to classes with finite VC-dimension, including the MOM version of the halfspace depth.
Load-bearing premise
That VC-dimension provides sufficient control over the statistical complexity of the function classes involved in the estimation, replacing the need for Rademacher complexity or stronger tail conditions.
What would settle it
Empirical verification on a heavy-tailed distribution with finite variance where the observed error of the MOM halfspace depth estimator exceeds the derived upper bound.
read the original abstract
We obtain the upper error bounds of robust estimators for mean vector, using the median-of-means (MOM) method. The method is designed to handle data with heavy tails and contamination, with only a finite second moment, which is weaker than many others, relying on the VC dimension rather than the Rademacher complexity to measure statistical complexity. This allows us to implement MOM in covariance estimation, without imposing conditions such as $L$-sub-Gaussian or $L_{4}-L_{2}$ norm equivalence. In particular, we derive a new robust estimator, the MOM version of the halfspace depth, along with error bounds for mean estimation in any norm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to derive upper error bounds for median-of-means (MOM) estimators of the mean vector under heavy tails and contamination, assuming only finite second moments. Statistical complexity is controlled via VC-dimension (rather than Rademacher averages), enabling a new MOM halfspace-depth estimator whose bounds hold in arbitrary norms; the same framework is asserted to extend to covariance estimation without L-sub-Gaussian or L4-L2 norm-equivalence conditions.
Significance. If the derivations are correct, the work would relax moment and norm assumptions that commonly restrict MOM methods, potentially broadening applicability to covariance estimation and other high-dimensional tasks. The explicit use of VC-dimension to obtain parameter-free bounds in arbitrary norms is a notable technical feature.
major comments (2)
- The central claim that VC-dimension suffices to remove L-sub-Gaussian and L4-L2 conditions for covariance estimation is load-bearing; the manuscript must exhibit the precise VC-class for the relevant covariance functionals and verify that the resulting deviation inequality holds under only second moments (no section or equation number supplied in the provided text).
- The error bound for the MOM halfspace-depth estimator in arbitrary norms relies on a specific chaining or covering argument; the manuscript should state the VC-dimension of the halfspace class and the precise constant dependence on that dimension (no equation or theorem number given).
minor comments (2)
- Notation for the MOM aggregator and the halfspace-depth functional should be introduced with explicit definitions before the main theorems.
- The abstract asserts 'upper error bounds' but does not indicate whether the bounds are dimension-free or how the sample size enters; a brief statement of the rate would improve readability.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for highlighting the load-bearing aspects of our claims. We address the two major comments point by point below and will revise the manuscript accordingly to improve clarity and completeness.
read point-by-point responses
-
Referee: The central claim that VC-dimension suffices to remove L-sub-Gaussian and L4-L2 conditions for covariance estimation is load-bearing; the manuscript must exhibit the precise VC-class for the relevant covariance functionals and verify that the resulting deviation inequality holds under only second moments (no section or equation number supplied in the provided text).
Authors: We agree that the extension to covariance estimation requires explicit verification to be fully convincing. The manuscript states that the VC-dimension approach applies to covariance without L-sub-Gaussian or L4-L2 assumptions, but does not supply the precise VC-class or the full deviation inequality derivation in a dedicated section. In revision we will add this material: the relevant VC-class consists of the functions x |-> 1_{<x,u> > t} <x,v> (or the quadratic forms for the entries of the covariance matrix) indexed by directions u,v on the unit sphere, whose VC-dimension is bounded by a constant multiple of d; the resulting MOM deviation inequality then follows directly from the general theorem under only finite second moments, with the same proof structure as for the mean. We will include the explicit class, the VC-dimension bound, and the referenced theorem/equation numbers. revision: yes
-
Referee: The error bound for the MOM halfspace-depth estimator in arbitrary norms relies on a specific chaining or covering argument; the manuscript should state the VC-dimension of the halfspace class and the precise constant dependence on that dimension (no equation or theorem number given).
Authors: The halfspace class in R^d has VC-dimension exactly d+1. Our error bound for the MOM halfspace-depth estimator is obtained by substituting this VC-dimension into the general MOM deviation inequality (which itself uses a chaining argument controlled by the VC-dimension via the Sauer-Shelah lemma). The resulting constants therefore depend on d+1 through a factor of order sqrt((d+1) log(1/delta)) or the corresponding covering number term. In the revised manuscript we will explicitly record the VC-dimension d+1 in the statement of the halfspace-depth theorem and spell out the precise functional dependence of the constants on this dimension, together with the equation or theorem number of the underlying general bound. revision: yes
Circularity Check
No significant circularity detected
full rationale
The provided abstract and claims describe a derivation of error bounds for median-of-means estimators controlled by VC-dimension, introducing a MOM halfspace-depth estimator under finite second moments. No equations or steps are shown that reduce a prediction to a fitted input by construction, nor any load-bearing self-citation chain, self-definitional loop, or ansatz smuggled via prior work. The central claims rest on standard VC-dimension properties applied to MOM, which are independent of the target bounds and externally verifiable. This is the expected non-finding for a paper whose derivation chain is self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Data distribution has finite second moment
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
relying on the VC dimension rather than the Rademacher complexity to measure statistical complexity... MOM version of the halfspace depth... error bounds for mean estimation in any norm
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
VC(F) ≤ c0 VC(B∗0)... E sup |1/N ∑ f(Xi) − Ef| ≤ C √(VC(F)/N)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Aloupis, G. (2006). Geometric measures of data depth. DIMACS series in discrete mathematics and theoretical computer science , 72 , 147,
work page 2006
-
[2]
Boucheron, S., Lugosi, G., Bousquet, O. (2003). Concentration in equalities. Summer school on machine learning (pp. 208–240). Springer
work page 2003
- [3]
-
[4]
Brownlees, C., Joly, E., Lugosi, G. (2015). Empirical risk minimization f or heavy-tailed losses. The Annals of Statistics , 43 (6), 2507–2536,
work page 2015
-
[5]
Catoni, O. (2012). Challenging the empirical mean and empirical varia nce: a deviation study. Annales de l’ihp probabilit´ es et statistiques (Vol. 48, pp. 1148–1185)
work page 2012
-
[6]
Catoni, O., & Giulini, I. (2017). Dimension-free pac-bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747 , ,
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[7]
Dalalyan, A.S., & Minasyan, A. (2022). All-in-one robust estimator of the gaussian mean. The Annals of Statistics , 50 (2), 1193–1219,
work page 2022
- [8]
-
[9]
Depersin, J., & Lecu´ e, G. (2023). On the robustness to adversa rial corruption and to heavy-tailed data of the stahel–donoho median of means. Information and 17 Inference: A Journal of the IMA , 12 (2), 814–850,
work page 2023
-
[10]
Devroye, L., Lerasle, M., Lugosi, G., Oliveira, R.I. (2016). Sub-gaus sian mean estimators. The Annals of Statistics , 44 (6), 2695–2725,
work page 2016
- [11]
-
[12]
Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. , 35 , 73–101,
work page 1964
-
[13]
Ke, Y., Minsker, S., Ren, Z., Sun, Q., Zhou, W.-X. (2019). User-frien dly covariance estimation for heavy-tailed distributions. Statistical Science , 34 (3), 454–471, Lecu´ e, G., & Mendelson, S. (2017). Regularization and the small-ba ll method ii: complexity dependent error rates. The Journal of Machine Learning Research , 18 (1), 5356–5403, Lecu´ e, G....
work page 2019
- [14]
-
[15]
Lugosi, G., & Mendelson, S. (2019c). Regularization, sparse recov ery, and median-of- means tournaments. Bernoulli , 25 (3), , https://doi.org/10.3150/18-bej1046 18
- [16]
-
[17]
Mendelson, S. (2017). An optimal unrestricted learning procedur e. arXiv preprint arXiv:1707.05342, ,
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[18]
Mendelson, S., & Zhivotovskiy, N. (2020). Robust covariance estim ation under L4 − L2 norm equivalence. The Annals of Statistics , 48 (3), 1648 – 1664, https://doi.org/10.1214/19-AOS1862 Retrieved from https://doi.org/10.1214/19-AOS1862
-
[19]
Minsker, S. (2015). Geometric median and robust estimation in bana ch spaces. Bernoulli , 21 (4), 2308,
work page 2015
-
[20]
Minsker, S. (2018). Sub-gaussian estimators of the mean of a ran dom matrix with heavy-tailed entries. The Annals of Statistics , 46 (6A), 2871–2903,
work page 2018
-
[21]
Nagy, S., Dyckerhoff, R., Mozharovskyi, P. (2020). Uniform conve rgence rates for the approximated halfspace and projection depth. Electronic Journal of Statistics , 14 , 3939–3975,
work page 2020
-
[22]
Rousseeuw, P.J., & Hubert, M. (1999). Depth in an arrangement of hyperplanes. Discrete & Computational Geometry , 22 (2), 167–176,
work page 1999
-
[23]
Sen, B. (2018). A gentle introduction to empirical process theory and applications. Lecture Notes, Columbia University , 11 , 28–29,
work page 2018
-
[24]
Small, C.G. (1990). A survey of multidimensional medians. International Statistical Review/Revue Internationale de Statistique , 263–277,
work page 1990
-
[25]
Tukey, J.W. (1975). Mathematics and the picturing of data. Proceedings of the international congress of mathematicians, vancouver, 197 5 (Vol. 2, pp. 523– 531). 19 van der Vaart, A., & Wellner, J.A. (2009). A note on bounds for vc dim ensions. In Institute of mathematical statistics collections (p. 103–107). Institute of Mathematical Statistics
work page 1975
-
[26]
Vapnik, V. (2013). The nature of statistical learning theory . Springer science & business media
work page 2013
-
[27]
Vershynin, R. (2018). High-dimensional probability: An introduction with appli cations in data science (Vol. 47). Cambridge university press
work page 2018
-
[28]
Wei, X., & Minsker, S. (2017). Estimation of the covariance structu re of heavy-tailed distributions. Advances in neural information processing systems , 30 , ,
work page 2017
-
[29]
Zwald, L., & Blanchard, G. (2005). On the convergence of eigenspa ces in kernel principal component analysis. Advances in neural information processing sys- tems 18 [neural information processing systems, NIPS 2005, december 5-8, 2005, vancouver, british columbia, canada] (pp. 1649–1656). Retrieved from https://proceedings.neurips.cc/paper/2005/hash/2b692...
work page 2005
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.