Error bounds of Median-of-means estimators with VC-dimension

Hanchao Wang; Lixin Zhang; Yiming Chen; Yuxuan Wang

arxiv: 2409.03410 · v2 · submitted 2024-09-05 · 🧮 math.ST · stat.TH

Error bounds of Median-of-means estimators with VC-dimension

Yuxuan Wang , Yiming Chen , Hanchao Wang , Lixin Zhang This is my paper

Pith reviewed 2026-05-23 21:22 UTC · model grok-4.3

classification 🧮 math.ST stat.TH

keywords median of meansVC dimensionrobust mean estimationhalfspace deptherror boundsheavy tailed distributionscovariance estimation

0 comments

The pith

Median-of-means estimators achieve error bounds for mean vectors under finite second moments using VC-dimension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper derives upper error bounds for robust mean estimation using the median-of-means method on data with heavy tails and possible contamination. It requires only finite second moments, a weaker condition than many existing approaches. By measuring statistical complexity with VC-dimension instead of Rademacher complexity, the method extends to covariance estimation without needing L-sub-Gaussian assumptions or L4-L2 norm equivalence. The authors introduce the median-of-means version of the halfspace depth as a new robust estimator that provides bounds for mean estimation in arbitrary norms.

Core claim

The median-of-means method yields upper error bounds for robust estimators of the mean vector that handle heavy-tailed data and contamination with only finite second moments. A new estimator is the MOM version of the halfspace depth, which comes with error bounds for mean estimation in any norm. The use of VC-dimension allows implementation in covariance estimation without additional tail or norm equivalence conditions.

What carries the argument

The median-of-means (MOM) method applied to classes with finite VC-dimension, including the MOM version of the halfspace depth.

Load-bearing premise

That VC-dimension provides sufficient control over the statistical complexity of the function classes involved in the estimation, replacing the need for Rademacher complexity or stronger tail conditions.

What would settle it

Empirical verification on a heavy-tailed distribution with finite variance where the observed error of the MOM halfspace depth estimator exceeds the derived upper bound.

read the original abstract

We obtain the upper error bounds of robust estimators for mean vector, using the median-of-means (MOM) method. The method is designed to handle data with heavy tails and contamination, with only a finite second moment, which is weaker than many others, relying on the VC dimension rather than the Rademacher complexity to measure statistical complexity. This allows us to implement MOM in covariance estimation, without imposing conditions such as $L$-sub-Gaussian or $L_{4}-L_{2}$ norm equivalence. In particular, we derive a new robust estimator, the MOM version of the halfspace depth, along with error bounds for mean estimation in any norm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

VC-dimension swap lets MOM reach covariance estimation under only second moments, which is the real technical step here.

read the letter

The paper uses VC-dimension to get median-of-means error bounds for the mean and covariance under only finite second moments. This avoids the sub-Gaussian and norm equivalence conditions that usually appear in this area. What is new is the MOM halfspace depth estimator and the application to covariance via VC-dimension. It gives bounds in any norm for the mean. This is a solid move because it directly addresses the limitation of stronger assumptions in robust high-dim work. The abstract presents it without obvious gaps in the logic. The main issue is that no actual bounds or proof ideas are given, so we can't see the dependence on dimension, sample size, or contamination level. The claims rest on the VC-dim property delivering the right control, but without the details it's hard to verify. This is for people working on robust estimation with heavy tails. A specialist in the subfield would find the relaxation of assumptions worth examining. Send it for peer review to let referees evaluate the derivations.

Referee Report

2 major / 2 minor

Summary. The manuscript claims to derive upper error bounds for median-of-means (MOM) estimators of the mean vector under heavy tails and contamination, assuming only finite second moments. Statistical complexity is controlled via VC-dimension (rather than Rademacher averages), enabling a new MOM halfspace-depth estimator whose bounds hold in arbitrary norms; the same framework is asserted to extend to covariance estimation without L-sub-Gaussian or L4-L2 norm-equivalence conditions.

Significance. If the derivations are correct, the work would relax moment and norm assumptions that commonly restrict MOM methods, potentially broadening applicability to covariance estimation and other high-dimensional tasks. The explicit use of VC-dimension to obtain parameter-free bounds in arbitrary norms is a notable technical feature.

major comments (2)

The central claim that VC-dimension suffices to remove L-sub-Gaussian and L4-L2 conditions for covariance estimation is load-bearing; the manuscript must exhibit the precise VC-class for the relevant covariance functionals and verify that the resulting deviation inequality holds under only second moments (no section or equation number supplied in the provided text).
The error bound for the MOM halfspace-depth estimator in arbitrary norms relies on a specific chaining or covering argument; the manuscript should state the VC-dimension of the halfspace class and the precise constant dependence on that dimension (no equation or theorem number given).

minor comments (2)

Notation for the MOM aggregator and the halfspace-depth functional should be introduced with explicit definitions before the main theorems.
The abstract asserts 'upper error bounds' but does not indicate whether the bounds are dimension-free or how the sample size enters; a brief statement of the rate would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting the load-bearing aspects of our claims. We address the two major comments point by point below and will revise the manuscript accordingly to improve clarity and completeness.

read point-by-point responses

Referee: The central claim that VC-dimension suffices to remove L-sub-Gaussian and L4-L2 conditions for covariance estimation is load-bearing; the manuscript must exhibit the precise VC-class for the relevant covariance functionals and verify that the resulting deviation inequality holds under only second moments (no section or equation number supplied in the provided text).

Authors: We agree that the extension to covariance estimation requires explicit verification to be fully convincing. The manuscript states that the VC-dimension approach applies to covariance without L-sub-Gaussian or L4-L2 assumptions, but does not supply the precise VC-class or the full deviation inequality derivation in a dedicated section. In revision we will add this material: the relevant VC-class consists of the functions x |-> 1_{<x,u> > t} <x,v> (or the quadratic forms for the entries of the covariance matrix) indexed by directions u,v on the unit sphere, whose VC-dimension is bounded by a constant multiple of d; the resulting MOM deviation inequality then follows directly from the general theorem under only finite second moments, with the same proof structure as for the mean. We will include the explicit class, the VC-dimension bound, and the referenced theorem/equation numbers. revision: yes
Referee: The error bound for the MOM halfspace-depth estimator in arbitrary norms relies on a specific chaining or covering argument; the manuscript should state the VC-dimension of the halfspace class and the precise constant dependence on that dimension (no equation or theorem number given).

Authors: The halfspace class in R^d has VC-dimension exactly d+1. Our error bound for the MOM halfspace-depth estimator is obtained by substituting this VC-dimension into the general MOM deviation inequality (which itself uses a chaining argument controlled by the VC-dimension via the Sauer-Shelah lemma). The resulting constants therefore depend on d+1 through a factor of order sqrt((d+1) log(1/delta)) or the corresponding covering number term. In the revised manuscript we will explicitly record the VC-dimension d+1 in the statement of the halfspace-depth theorem and spell out the precise functional dependence of the constants on this dimension, together with the equation or theorem number of the underlying general bound. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The provided abstract and claims describe a derivation of error bounds for median-of-means estimators controlled by VC-dimension, introducing a MOM halfspace-depth estimator under finite second moments. No equations or steps are shown that reduce a prediction to a fitted input by construction, nor any load-bearing self-citation chain, self-definitional loop, or ansatz smuggled via prior work. The central claims rest on standard VC-dimension properties applied to MOM, which are independent of the target bounds and externally verifiable. This is the expected non-finding for a paper whose derivation chain is self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, invented entities, or non-standard axioms are mentioned. The finite second moment condition is treated as a domain assumption.

axioms (1)

domain assumption Data distribution has finite second moment
Explicitly stated as the weaker condition enabling the bounds.

pith-pipeline@v0.9.0 · 5638 in / 1060 out tokens · 22677 ms · 2026-05-23T21:22:42.702862+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

relying on the VC dimension rather than the Rademacher complexity to measure statistical complexity... MOM version of the halfspace depth... error bounds for mean estimation in any norm
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

VC(F) ≤ c0 VC(B∗0)... E sup |1/N ∑ f(Xi) − Ef| ≤ C √(VC(F)/N)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 2 internal anchors

[1]

Aloupis, G. (2006). Geometric measures of data depth. DIMACS series in discrete mathematics and theoretical computer science , 72 , 147,

work page 2006
[2]

Boucheron, S., Lugosi, G., Bousquet, O. (2003). Concentration in equalities. Summer school on machine learning (pp. 208–240). Springer

work page 2003
[3]

Briend, S., Lugosi, G., Oliveira, R.I. (2023). On the quality of randomiz ed approximations of tukey’s depth. arXiv preprint arXiv:2309.05657 , ,

work page arXiv 2023
[4]

Brownlees, C., Joly, E., Lugosi, G. (2015). Empirical risk minimization f or heavy-tailed losses. The Annals of Statistics , 43 (6), 2507–2536,

work page 2015
[5]

Catoni, O. (2012). Challenging the empirical mean and empirical varia nce: a deviation study. Annales de l’ihp probabilit´ es et statistiques (Vol. 48, pp. 1148–1185)

work page 2012
[6]

Catoni, O., & Giulini, I. (2017). Dimension-free pac-bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747 , ,

work page internal anchor Pith review Pith/arXiv arXiv 2017
[7]

Dalalyan, A.S., & Minasyan, A. (2022). All-in-one robust estimator of the gaussian mean. The Annals of Statistics , 50 (2), 1193–1219,

work page 2022
[8]

Depersin, J. (2020b). A spectral algorithm for robust regressio n with subgaussian rates. arXiv preprint arXiv:2007.06072 , ,

work page arXiv 2007
[9]

Depersin, J., & Lecu´ e, G. (2023). On the robustness to adversa rial corruption and to heavy-tailed data of the stahel–donoho median of means. Information and 17 Inference: A Journal of the IMA , 12 (2), 814–850,

work page 2023
[10]

Devroye, L., Lerasle, M., Lugosi, G., Oliveira, R.I. (2016). Sub-gaus sian mean estimators. The Annals of Statistics , 44 (6), 2695–2725,

work page 2016
[11]

Donoho, D.L., & Gasko, M. (1992). Breakdown properties of locatio n estimates based on halfspace depth and projected outlyingness. The Annals of Statistics , 20 (4), 1803–1827, Retrieved 2024-01-29, from http://www.jstor.org/ stable/2242368

work page arXiv 1992
[12]

Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. , 35 , 73–101,

work page 1964
[13]

Ke, Y., Minsker, S., Ren, Z., Sun, Q., Zhou, W.-X. (2019). User-frien dly covariance estimation for heavy-tailed distributions. Statistical Science , 34 (3), 454–471, Lecu´ e, G., & Mendelson, S. (2017). Regularization and the small-ba ll method ii: complexity dependent error rates. The Journal of Machine Learning Research , 18 (1), 5356–5403, Lecu´ e, G....

work page 2019
[14]

Lerasle, M., & Oliveira, R.I. (2011). Robust empirical mean estimator s. arXiv preprint arXiv:1112.3914, ,

work page arXiv 2011
[15]

Lugosi, G., & Mendelson, S. (2019c). Regularization, sparse recov ery, and median-of- means tournaments. Bernoulli , 25 (3), , https://doi.org/10.3150/18-bej1046 18

work page doi:10.3150/18-bej1046
[16]

Lugosi, G., & Mendelson, S. (2020). Multivariate mean estimation with direction- dependent accuracy. arXiv preprint arXiv:2010.11921 , ,

work page arXiv 2020
[17]

Mendelson, S. (2017). An optimal unrestricted learning procedur e. arXiv preprint arXiv:1707.05342, ,

work page internal anchor Pith review Pith/arXiv arXiv 2017
[18]

Mendelson, S., & Zhivotovskiy, N. (2020). Robust covariance estim ation under L4 − L2 norm equivalence. The Annals of Statistics , 48 (3), 1648 – 1664, https://doi.org/10.1214/19-AOS1862 Retrieved from https://doi.org/10.1214/19-AOS1862

work page doi:10.1214/19-aos1862 2020
[19]

Minsker, S. (2015). Geometric median and robust estimation in bana ch spaces. Bernoulli , 21 (4), 2308,

work page 2015
[20]

Minsker, S. (2018). Sub-gaussian estimators of the mean of a ran dom matrix with heavy-tailed entries. The Annals of Statistics , 46 (6A), 2871–2903,

work page 2018
[21]

Nagy, S., Dyckerhoﬀ, R., Mozharovskyi, P. (2020). Uniform conve rgence rates for the approximated halfspace and projection depth. Electronic Journal of Statistics , 14 , 3939–3975,

work page 2020
[22]

Rousseeuw, P.J., & Hubert, M. (1999). Depth in an arrangement of hyperplanes. Discrete & Computational Geometry , 22 (2), 167–176,

work page 1999
[23]

Sen, B. (2018). A gentle introduction to empirical process theory and applications. Lecture Notes, Columbia University , 11 , 28–29,

work page 2018
[24]

Small, C.G. (1990). A survey of multidimensional medians. International Statistical Review/Revue Internationale de Statistique , 263–277,

work page 1990
[25]

Tukey, J.W. (1975). Mathematics and the picturing of data. Proceedings of the international congress of mathematicians, vancouver, 197 5 (Vol. 2, pp. 523– 531). 19 van der Vaart, A., & Wellner, J.A. (2009). A note on bounds for vc dim ensions. In Institute of mathematical statistics collections (p. 103–107). Institute of Mathematical Statistics

work page 1975
[26]

Vapnik, V. (2013). The nature of statistical learning theory . Springer science & business media

work page 2013
[27]

Vershynin, R. (2018). High-dimensional probability: An introduction with appli cations in data science (Vol. 47). Cambridge university press

work page 2018
[28]

Wei, X., & Minsker, S. (2017). Estimation of the covariance structu re of heavy-tailed distributions. Advances in neural information processing systems , 30 , ,

work page 2017
[29]

Zwald, L., & Blanchard, G. (2005). On the convergence of eigenspa ces in kernel principal component analysis. Advances in neural information processing sys- tems 18 [neural information processing systems, NIPS 2005, december 5-8, 2005, vancouver, british columbia, canada] (pp. 1649–1656). Retrieved from https://proceedings.neurips.cc/paper/2005/hash/2b692...

work page 2005

[1] [1]

Aloupis, G. (2006). Geometric measures of data depth. DIMACS series in discrete mathematics and theoretical computer science , 72 , 147,

work page 2006

[2] [2]

Boucheron, S., Lugosi, G., Bousquet, O. (2003). Concentration in equalities. Summer school on machine learning (pp. 208–240). Springer

work page 2003

[3] [3]

Briend, S., Lugosi, G., Oliveira, R.I. (2023). On the quality of randomiz ed approximations of tukey’s depth. arXiv preprint arXiv:2309.05657 , ,

work page arXiv 2023

[4] [4]

Brownlees, C., Joly, E., Lugosi, G. (2015). Empirical risk minimization f or heavy-tailed losses. The Annals of Statistics , 43 (6), 2507–2536,

work page 2015

[5] [5]

Catoni, O. (2012). Challenging the empirical mean and empirical varia nce: a deviation study. Annales de l’ihp probabilit´ es et statistiques (Vol. 48, pp. 1148–1185)

work page 2012

[6] [6]

Catoni, O., & Giulini, I. (2017). Dimension-free pac-bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747 , ,

work page internal anchor Pith review Pith/arXiv arXiv 2017

[7] [7]

Dalalyan, A.S., & Minasyan, A. (2022). All-in-one robust estimator of the gaussian mean. The Annals of Statistics , 50 (2), 1193–1219,

work page 2022

[8] [8]

Depersin, J. (2020b). A spectral algorithm for robust regressio n with subgaussian rates. arXiv preprint arXiv:2007.06072 , ,

work page arXiv 2007

[9] [9]

Depersin, J., & Lecu´ e, G. (2023). On the robustness to adversa rial corruption and to heavy-tailed data of the stahel–donoho median of means. Information and 17 Inference: A Journal of the IMA , 12 (2), 814–850,

work page 2023

[10] [10]

Devroye, L., Lerasle, M., Lugosi, G., Oliveira, R.I. (2016). Sub-gaus sian mean estimators. The Annals of Statistics , 44 (6), 2695–2725,

work page 2016

[11] [11]

Donoho, D.L., & Gasko, M. (1992). Breakdown properties of locatio n estimates based on halfspace depth and projected outlyingness. The Annals of Statistics , 20 (4), 1803–1827, Retrieved 2024-01-29, from http://www.jstor.org/ stable/2242368

work page arXiv 1992

[12] [12]

Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. , 35 , 73–101,

work page 1964

[13] [13]

Ke, Y., Minsker, S., Ren, Z., Sun, Q., Zhou, W.-X. (2019). User-frien dly covariance estimation for heavy-tailed distributions. Statistical Science , 34 (3), 454–471, Lecu´ e, G., & Mendelson, S. (2017). Regularization and the small-ba ll method ii: complexity dependent error rates. The Journal of Machine Learning Research , 18 (1), 5356–5403, Lecu´ e, G....

work page 2019

[14] [14]

Lerasle, M., & Oliveira, R.I. (2011). Robust empirical mean estimator s. arXiv preprint arXiv:1112.3914, ,

work page arXiv 2011

[15] [15]

Lugosi, G., & Mendelson, S. (2019c). Regularization, sparse recov ery, and median-of- means tournaments. Bernoulli , 25 (3), , https://doi.org/10.3150/18-bej1046 18

work page doi:10.3150/18-bej1046

[16] [16]

Lugosi, G., & Mendelson, S. (2020). Multivariate mean estimation with direction- dependent accuracy. arXiv preprint arXiv:2010.11921 , ,

work page arXiv 2020

[17] [17]

Mendelson, S. (2017). An optimal unrestricted learning procedur e. arXiv preprint arXiv:1707.05342, ,

work page internal anchor Pith review Pith/arXiv arXiv 2017

[18] [18]

Mendelson, S., & Zhivotovskiy, N. (2020). Robust covariance estim ation under L4 − L2 norm equivalence. The Annals of Statistics , 48 (3), 1648 – 1664, https://doi.org/10.1214/19-AOS1862 Retrieved from https://doi.org/10.1214/19-AOS1862

work page doi:10.1214/19-aos1862 2020

[19] [19]

Minsker, S. (2015). Geometric median and robust estimation in bana ch spaces. Bernoulli , 21 (4), 2308,

work page 2015

[20] [20]

Minsker, S. (2018). Sub-gaussian estimators of the mean of a ran dom matrix with heavy-tailed entries. The Annals of Statistics , 46 (6A), 2871–2903,

work page 2018

[21] [21]

Nagy, S., Dyckerhoﬀ, R., Mozharovskyi, P. (2020). Uniform conve rgence rates for the approximated halfspace and projection depth. Electronic Journal of Statistics , 14 , 3939–3975,

work page 2020

[22] [22]

Rousseeuw, P.J., & Hubert, M. (1999). Depth in an arrangement of hyperplanes. Discrete & Computational Geometry , 22 (2), 167–176,

work page 1999

[23] [23]

Sen, B. (2018). A gentle introduction to empirical process theory and applications. Lecture Notes, Columbia University , 11 , 28–29,

work page 2018

[24] [24]

Small, C.G. (1990). A survey of multidimensional medians. International Statistical Review/Revue Internationale de Statistique , 263–277,

work page 1990

[25] [25]

Tukey, J.W. (1975). Mathematics and the picturing of data. Proceedings of the international congress of mathematicians, vancouver, 197 5 (Vol. 2, pp. 523– 531). 19 van der Vaart, A., & Wellner, J.A. (2009). A note on bounds for vc dim ensions. In Institute of mathematical statistics collections (p. 103–107). Institute of Mathematical Statistics

work page 1975

[26] [26]

Vapnik, V. (2013). The nature of statistical learning theory . Springer science & business media

work page 2013

[27] [27]

Vershynin, R. (2018). High-dimensional probability: An introduction with appli cations in data science (Vol. 47). Cambridge university press

work page 2018

[28] [28]

Wei, X., & Minsker, S. (2017). Estimation of the covariance structu re of heavy-tailed distributions. Advances in neural information processing systems , 30 , ,

work page 2017

[29] [29]

Zwald, L., & Blanchard, G. (2005). On the convergence of eigenspa ces in kernel principal component analysis. Advances in neural information processing sys- tems 18 [neural information processing systems, NIPS 2005, december 5-8, 2005, vancouver, british columbia, canada] (pp. 1649–1656). Retrieved from https://proceedings.neurips.cc/paper/2005/hash/2b692...

work page 2005