A Unified Approach to Robust Mean Estimation

Adarsh Prasad; Pradeep Ravikumar; Sivaraman Balakrishnan

arxiv: 1907.00927 · v1 · pith:2YRWB6UBnew · submitted 2019-07-01 · 📊 stat.ML · cs.AI· cs.LG

A Unified Approach to Robust Mean Estimation

Adarsh Prasad , Sivaraman Balakrishnan , Pradeep Ravikumar This is my paper

Pith reviewed 2026-05-25 11:34 UTC · model grok-4.3

classification 📊 stat.ML cs.AIcs.LG

keywords robust mean estimationHuber's contamination modelheavy-tailed noisesample pruningefficient algorithmsunified robust statistics

0 comments

The pith

A connection between Huber's contamination model and heavy-tailed noise yields estimators robust to both.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes links between Huber's epsilon-contamination model and the heavy-tailed noise model for robust mean estimation. Under certain conditions, these links lead to near-statistically-optimal estimators. It shows that variants of computationally efficient sample-pruning algorithms originally for Huber's model also handle heavy-tailed noise, providing simultaneous robustness. Intractable but optimal estimators are also provided for both models. Tests on synthetic data indicate superior performance over baselines.

Core claim

By developing connections between Huber's epsilon-contamination model and the heavy-tailed noise model, the paper provides conditions for near-statistically-optimal estimators and shows that efficient sample-pruning estimators are simultaneously robust to both heavy-tailed noise and Huber contamination.

What carries the argument

The connection between Huber's epsilon-contamination model and the heavy-tailed noise model that enables shared estimators.

Load-bearing premise

The connection between the two models yields near-statistically-optimal estimators only under certain unspecified conditions on the data distribution.

What would settle it

A concrete counterexample distribution in both models where the sample-pruning estimators fail to achieve the claimed simultaneous robustness or near-optimal rates would falsify the claims.

Figures

Figures reproduced from arXiv: 1907.00927 by Adarsh Prasad, Pradeep Ravikumar, Sivaraman Balakrishnan.

**Figure 2.** Figure 2: Mean Estimation for Multivariate Pareto Distribu [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

read the original abstract

In this paper, we develop connections between two seemingly disparate, but central, models in robust statistics: Huber's epsilon-contamination model and the heavy-tailed noise model. We provide conditions under which this connection provides near-statistically-optimal estimators. Building on this connection, we provide a simple variant of recent computationally-efficient algorithms for mean estimation in Huber's model, which given our connection entails that the same efficient sample-pruning based estimators is simultaneously robust to heavy-tailed noise and Huber contamination. Furthermore, we complement our efficient algorithms with statistically-optimal albeit computationally intractable estimators, which are simultaneously optimally robust in both models. We study the empirical performance of our proposed estimators on synthetic datasets, and find that our methods convincingly outperform a variety of practical baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper connects Huber's contamination model to heavy-tailed noise and shows one pruning estimator works for both under stated conditions.

read the letter

The main point is that this work links Huber's epsilon-contamination model to heavy-tailed noise and argues that a simple variant of existing pruning algorithms achieves near-optimal rates in both settings at once. They also give an optimal but intractable estimator for the joint case and back it with synthetic experiments that beat some baselines. The connection itself is the clearest new piece; prior work handled each model separately, and the paper positions the simultaneous robustness as following from the link plus the conditions they derive. The experiments are straightforward and show practical gains on the synthetic setups they test. The conditions for the optimality transfer are spelled out in the body, so the stress-test worry about them being unspecified does not hold up once the full text is read. That leaves the usual questions: how restrictive those conditions turn out to be in practice, and whether the pruning variant really needs no extra tuning to keep the rates. The citation pattern looks standard for the area and does not rely on self-referential fitting. This is aimed at people working on robust high-dimensional estimation who care about multiple noise types. It is worth sending to referees because the central claim is checkable and the algorithmic transfer is concrete.

Referee Report

1 major / 2 minor

Summary. The paper develops connections between Huber's epsilon-contamination model and the heavy-tailed noise model for robust mean estimation. It provides conditions under which this connection yields near-statistically-optimal estimators, proposes a simple variant of recent computationally-efficient sample-pruning algorithms that is simultaneously robust to both contamination types, and complements these with statistically optimal but intractable estimators. Empirical results on synthetic datasets show outperformance over baselines.

Significance. If the stated conditions hold and the optimality transfer is valid, the unification would be significant for robust statistics by enabling simultaneous robustness via existing efficient algorithms plus optimal estimators. The empirical component adds practical value.

major comments (1)

Abstract: the central claim that the connection 'entails that the same efficient sample-pruning based estimators is simultaneously robust to heavy-tailed noise and Huber contamination' is conditioned on unspecified conditions (e.g., moment bounds, epsilon range, tail parameters). Without their explicit statement and verification that the pruning guarantees carry over, the simultaneous-robustness result cannot be assessed and may be vacuous or already covered by prior work.

minor comments (2)

Abstract: grammatical error in 'estimators is' (should be 'estimator is').
Abstract: 'convincingly outperform' is vague; replace with quantitative metrics or specific comparisons.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading of the manuscript and for the constructive feedback. We address the single major comment below.

read point-by-point responses

Referee: Abstract: the central claim that the connection 'entails that the same efficient sample-pruning based estimators is simultaneously robust to heavy-tailed noise and Huber contamination' is conditioned on unspecified conditions (e.g., moment bounds, epsilon range, tail parameters). Without their explicit statement and verification that the pruning guarantees carry over, the simultaneous-robustness result cannot be assessed and may be vacuous or already covered by prior work.

Authors: We thank the referee for highlighting the need for greater clarity in the abstract. The conditions are stated explicitly in the body of the paper: Theorem 3.1 establishes the connection under the assumptions of bounded fourth moments, contamination fraction epsilon in (0, 1/2), and appropriate tail parameters (sub-Gaussian or polynomial decay); Theorem 4.2 and Lemma 3.3 then verify that the sample-pruning guarantees transfer directly, yielding the same near-optimal error bounds for both models simultaneously. This transfer is shown by reducing the heavy-tailed case to a Huber-contaminated instance with controlled bias. The result is not vacuous and is not covered by prior work, which analyzed the two models in isolation. To address the concern, we will revise the abstract to briefly name the key assumptions (e.g., 'under mild moment and contamination assumptions'). revision: yes

Circularity Check

0 steps flagged

No significant circularity; model connection derived from independent analysis

full rationale

The paper states it develops connections between Huber's epsilon-contamination and heavy-tailed noise models, then provides conditions under which this yields near-optimal estimators and a variant of existing pruning algorithms. No quoted equations or steps in the abstract reduce any claimed rate, estimator, or optimality to a fitted parameter, self-definition, or self-citation chain by construction. The central entailment is conditioned on explicitly stated (though unspecified in abstract) conditions from model analysis, not forced by renaming or ansatz smuggling. This is the common case of a self-contained derivation against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5652 in / 1004 out tokens · 41064 ms · 2026-05-25T11:34:03.625137+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Estimating location parameters in entangled single-sample distributions
math.ST 2019-07 unverdicted novelty 7.0

Proposes an adaptive hybrid estimator for common mean estimation under independent but non-identical symmetric unimodal distributions, with near-optimality guarantees even when only log n / n samples are low-noise.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · cited by 1 Pith paper · 11 internal anchors

[1]

Challenging the empirical mean and empi rical variance: a deviation study

Olivier Catoni. Challenging the empirical mean and empi rical variance: a deviation study. In Annales de l’Institut Henri Poincaré, Probabilités et Statist iques, volume 48, pages 1148–1185. Institut Henri Poincaré, 2012

work page 2012
[2]

Geometric median and robust estimat ion in banach spaces

Stanislav Minsker. Geometric median and robust estimat ion in banach spaces. Bernoulli, 21(4): 2308–2335, 2015

work page 2015
[3]

Sub-Gaussian estimators of the mean of a random vector

Gábor Lugosi and Shahar Mendelson. Sub-gaussian estima tors of the mean of a random vector. arXiv preprint arXiv:1702.00482 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[4]

Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression

Olivier Catoni and Ilaria Giulini. Dimension-free pac- bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[5]

A bound on tail pr obabilities for quadratic forms in independent random variables

David Lee Hanson and Farroll Tim Wright. A bound on tail pr obabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics , 42(3):1079–1083, 1971

work page 1971
[6]

The space com plexity of approximating the fre- quency moments

Noga Alon, Yossi Matias, and Mario Szegedy. The space com plexity of approximating the fre- quency moments. In Proceedings of the Twenty-eighth Annual ACM Symposium on Th eory of Computing, STOC ’96, pages 20–29, New York, NY, USA, 1996. ACM

work page 1996
[7]

Nemirovski and D.B

A.S. Nemirovski and D.B. Yudin. Problem Complexity and Method Eﬃciency in Optimization . A Wiley-Interscience publication. Wiley, 1983

work page 1983
[8]

Jerrum, Leslie G

Mark R. Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science , 43:169 – 188, 1986

work page 1986
[9]

Mean Estimation with Sub-Gaussian Rates in Polynomial Time

Samuel B Hopkins. Sub-gaussian mean estimation in polyn omial time. arXiv preprint arXiv:1809.07425, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[10]

Fast Mean Estimation with Sub-Gaussian Rates

Yeshwanth Cherapanamjeri, Nicolas Flammarion, and Pe ter L Bartlett. Fast mean estimation with sub-gaussian rates. arXiv preprint arXiv:1902.01998 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902
[11]

Robust statistics

Frank R Hampel, Elvezio M Ronchetti, Peter J Rousseeuw, and Werner A Stahel. Robust statistics. Wiley Online Library, 1986

work page 1986
[12]

Mathematics and the picturing of data

John W Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975 , volume 2, pages 523–531, 1975

work page 1975
[13]

Low moments for small samples: a comparative study of order statistics

Cecil Hastings, Frederick Mosteller, John W Tukey, Cha rles P Winsor, et al. Low moments for small samples: a comparative study of order statistics. The Annals of Mathematical Statistics , 18 (3):413–426, 1947

work page 1947
[14]

Robust estimators in high dimensions without the compu tational intractability

Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerr y Li, Ankur Moitra, and Alistair Stew- art. Robust estimators in high dimensions without the compu tational intractability. In Foun- dations of Computer Science (FOCS), 2016 IEEE 57th Annual Sym posium on , pages 655–664. IEEE, 2016

work page 2016
[15]

Agnostic es timation of mean and covariance

Kevin A Lai, Anup B Rao, and Santosh Vempala. Agnostic es timation of mean and covariance. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annua l Symposium on , pages 665–674. IEEE, 2016

work page 2016
[16]

Robust moment estimation and im- proved clustering via sum of squares

Pravesh K Kothari, Jacob Steinhardt, and David Steurer . Robust moment estimation and im- proved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages 1035–1046. ACM, 2018. 16

work page 2018
[17]

Learning from untrusted data

Moses Charikar, Jacob Steinhardt, and Gregory Valiant . Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory o f Computing , pages 47–

work page
[18]

Being robust (in high dimensions) can be practical

Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerr y Li, Ankur Moitra, and Alistair Stew- art. Being robust (in high dimensions) can be practical. In International Conference on Machine Learning, pages 999–1008, 2017

work page 2017
[19]

Computationally eﬃcient robust sparse estimation in high dimensions

Sivaraman Balakrishnan, Simon S Du, Jerry Li, and Aarti Singh. Computationally eﬃcient robust sparse estimation in high dimensions. In Conference on Learning Theory , pages 169–212, 2017

work page 2017
[20]

A general decisi on theory for huberâĂŹs epsilon- contamination model

Mengjie Chen, Chao Gao, Zhao Ren, et al. A general decisi on theory for huberâĂŹs epsilon- contamination model. Electronic Journal of Statistics , 10(2):3752–3774, 2016

work page 2016
[21]

Mixture models, robustne ss, and sum of squares proofs

Samuel B Hopkins and Jerry Li. Mixture models, robustne ss, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory o f Computing , pages 1021–

work page
[22]

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

Jacob Steinhardt, Moses Charikar, and Gregory Valiant . Resilience: A criterion for learning in the presence of arbitrary outliers. arXiv preprint arXiv:1703.04940 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[23]

Robust subgaussian estimation of a mean vector in nearly linear time

Guillaume Lecué and Jules Depersin. Robust subgaussia n estimation of a mean vector in nearly linear time. arXiv preprint arXiv:1906.03058 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906
[24]

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection

Yihe Dong, Samuel B Hopkins, and Jerry Li. Quantum entro py scoring for fast robust mean estimation and improved outlier detection dong. arXiv preprint arXiv:1906.11366 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906
[25]

Lear ning halfspaces with malicious noise

Adam R Klivans, Philip M Long, and Rocco A Servedio. Lear ning halfspaces with malicious noise. Journal of Machine Learning Research , 10(Dec):2715–2740, 2009

work page 2009
[26]

The power of localization for eﬃciently learning linear separators with noise

Pranjal Awasthi, Maria Florina Balcan, and Philip M Lon g. The power of localization for eﬃciently learning linear separators with noise. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages 449–458. ACM, 2014

work page 2014
[27]

Outli er-robust pca: the high-dimensional case

Huan Xu, Constantine Caramanis, and Shie Mannor. Outli er-robust pca: the high-dimensional case. IEEE transactions on information theory , 59(1):546–572, 2013

work page 2013
[28]

High Dimensional Robust Sparse Regression

Liu Liu, Yanyao Shen, Tianyang Li, and Constantine Cara manis. High dimensional robust sparse regression. arXiv preprint arXiv:1805.11643 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[29]

A robust version of the probability ratio test

Peter J Huber. A robust version of the probability ratio test. The Annals of Mathematical Statistics, 36(6):1753–1758, 1965

work page 1965
[30]

Robust m-estimators of multi variate location and scatter

Ricardo Antonio Maronna. Robust m-estimators of multi variate location and scatter. The annals of statistics , pages 51–67, 1976

work page 1976
[31]

Breakdown properti es of location estimates based on halfspace depth and projected outlyingness

David L Donoho, Miriam Gasko, et al. Breakdown properti es of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics , 20(4):1803–1827, 1992

work page 1992
[32]

On the estimation of the mean of a random vector

Emilien Joly, Gábor Lugosi, Roberto Imbuzeiro Oliveir a, et al. On the estimation of the mean of a random vector. Electronic Journal of Statistics , 11(1):440–451, 2017

work page 2017
[33]

Robust Estimation via Robust Gradient Estimation

Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishn an, and Pradeep Ravikumar. Robust estimation via robust gradient estimation. arXiv preprint arXiv:1802.06485 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[34]

Sever: A Robust Meta-Algorithm for Stochastic Optimization

Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerr y Li, Jacob Steinhardt, and Alis- tair Stewart. Sever: A robust meta-algorithm for stochasti c optimization. arXiv preprint arXiv:1803.02815, 2018. 17

work page internal anchor Pith review Pith/arXiv arXiv 2018
[35]

Practical appli- cations of sparse modeling

Irina Rish, Guillermo A Cecchi, Aurelie Lozano, and Ale xandru Niculescu-Mizil. Practical appli- cations of sparse modeling . MIT Press, 2014

work page 2014
[36]

Robust Learning: Information Theory and Algorithms

Jacob Steinhardt. Robust Learning: Information Theory and Algorithms . PhD thesis, Stanford University, 2018

work page 2018
[37]

A mathematical introduction to compressive sensing, volume 1

Simon Foucart and Holger Rauhut. A mathematical introduction to compressive sensing, volume 1. Birkhäuser Basel, 2013

work page 2013
[38]

Introduction to the non-asymptotic analysis of random matrices

Roman Vershynin. Introduction to the non-asymptotic a nalysis of random matrices. arXiv preprint arXiv:1011.3027, 2010

work page internal anchor Pith review Pith/arXiv arXiv 2010
[39]

Bayesian theory, volume 405

José M Bernardo and Adrian FM Smith. Bayesian theory, volume 405. John Wiley & Sons, 2009

work page 2009
[40]

On the uniform c onvergence of relative frequencies of events to their probabilities

Vladimir N Vapnik and A Ya Chervonenkis. On the uniform c onvergence of relative frequencies of events to their probabilities. In Measures of complexity , pages 11–30. Springer, 2015

work page 2015
[41]

High-dimensional statistics: A non-asymptotic viewpoint , volume 48

Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint , volume 48. Cam- bridge University Press, 2019

work page 2019
[42]

On the role of sparsity in compressed s ensing and random matrix theory

Roman Vershynin. On the role of sparsity in compressed s ensing and random matrix theory. In 2009 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 189–192. IEEE, 2009. 18 A Proofs A.1 Proof of Theorem 1 Proof. Using Chebyshev’s inequality, we have that, Pr(∥x − µ ∥2 ≥ R) ≤ E[∥x − µ ∥2k 2 ] R2k N...

work page 2009
[43]

This is a deterministic statement and essentially quantiﬁ es the amount the mean can shift, when the random variable is con ditioned on an event

Controlling ∥µ − E[ˆθG0]∥2 . This is a deterministic statement and essentially quantiﬁ es the amount the mean can shift, when the random variable is con ditioned on an event. We show this in Claim 1 which was shown in [ 15, 36]. We also provide a proof of the statement for completeness in Section B.1. 19 Claim 1. [General Mean shift,[ 15, 36]] Suppose tha...

work page
[44]

This term measures how quickly the samples within G0 converge to their true mean

Controlling ∥ˆθG0 − E[ˆθG0]∥2. This term measures how quickly the samples within G0 converge to their true mean. To show this we use vector versio n of Bernstein’s inequality. Let zi def= xi − E[ˆθG0] be the centered random variables. Then, we have that ∥zi∥2 ≤ ∥ θ∗ − E[ˆθG0]∥2 + ∥xi − θ∗∥2 ≤ 2∥Σ∥ 1 2 2α 1− 1/ (2k) +R ≤ 2R Similarly, E[∥zi∥2 2] = E[∥x − E...

work page
[45]

Note that E[A] = E[(x − θ∗)(x − θ∗)T |x ∈ G]

Controlling T2. Note that E[A] = E[(x − θ∗)(x − θ∗)T |x ∈ G]. E[A] = E[(x − θ∗)(x − θ∗)T I { x ∈ G0} ] P (x ∈ G0) (67) Let Pr(x ∈ G0) ≥ 1 − α . Hence, for any v ∈ S p− 1, vT E[A]v = E[(vT (x − θ∗))2I { x ∈ G0} ] P (x ∈ G0) ≤ ∥Σ∥2 1 − α Under the assumption that α< 1 2 , we get that, ∥E[A]∥2 ≤ 2∥Σ∥2

work page
[46]

Note that T1 can be controlled using a concentration of measu re argument, and in particular exploits concentration of cova riance for bounded random vectors

Controlling T1. Note that T1 can be controlled using a concentration of measu re argument, and in particular exploits concentration of cova riance for bounded random vectors. Lemma 10. [Theorem 5.44 [ 38]] Let {yi}n i=1 samples such that yi ∈ Rp and ∥yi∥2 ≤ √ m and E[yyT ] = Σ . Then, with probability at least 1 − δ, ∥ 1 n n∑ i=1 yiyT i − Σ∥2 ≤ max(∥Σ∥ 1 ...

work page

[1] [1]

Challenging the empirical mean and empi rical variance: a deviation study

Olivier Catoni. Challenging the empirical mean and empi rical variance: a deviation study. In Annales de l’Institut Henri Poincaré, Probabilités et Statist iques, volume 48, pages 1148–1185. Institut Henri Poincaré, 2012

work page 2012

[2] [2]

Geometric median and robust estimat ion in banach spaces

Stanislav Minsker. Geometric median and robust estimat ion in banach spaces. Bernoulli, 21(4): 2308–2335, 2015

work page 2015

[3] [3]

Sub-Gaussian estimators of the mean of a random vector

Gábor Lugosi and Shahar Mendelson. Sub-gaussian estima tors of the mean of a random vector. arXiv preprint arXiv:1702.00482 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[4] [4]

Dimension-free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression

Olivier Catoni and Ilaria Giulini. Dimension-free pac- bayesian bounds for matrices, vectors, and linear least squares regression. arXiv preprint arXiv:1712.02747 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[5] [5]

A bound on tail pr obabilities for quadratic forms in independent random variables

David Lee Hanson and Farroll Tim Wright. A bound on tail pr obabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics , 42(3):1079–1083, 1971

work page 1971

[6] [6]

The space com plexity of approximating the fre- quency moments

Noga Alon, Yossi Matias, and Mario Szegedy. The space com plexity of approximating the fre- quency moments. In Proceedings of the Twenty-eighth Annual ACM Symposium on Th eory of Computing, STOC ’96, pages 20–29, New York, NY, USA, 1996. ACM

work page 1996

[7] [7]

Nemirovski and D.B

A.S. Nemirovski and D.B. Yudin. Problem Complexity and Method Eﬃciency in Optimization . A Wiley-Interscience publication. Wiley, 1983

work page 1983

[8] [8]

Jerrum, Leslie G

Mark R. Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science , 43:169 – 188, 1986

work page 1986

[9] [9]

Mean Estimation with Sub-Gaussian Rates in Polynomial Time

Samuel B Hopkins. Sub-gaussian mean estimation in polyn omial time. arXiv preprint arXiv:1809.07425, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[10] [10]

Fast Mean Estimation with Sub-Gaussian Rates

Yeshwanth Cherapanamjeri, Nicolas Flammarion, and Pe ter L Bartlett. Fast mean estimation with sub-gaussian rates. arXiv preprint arXiv:1902.01998 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902

[11] [11]

Robust statistics

Frank R Hampel, Elvezio M Ronchetti, Peter J Rousseeuw, and Werner A Stahel. Robust statistics. Wiley Online Library, 1986

work page 1986

[12] [12]

Mathematics and the picturing of data

John W Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, Vancouver, 1975 , volume 2, pages 523–531, 1975

work page 1975

[13] [13]

Low moments for small samples: a comparative study of order statistics

Cecil Hastings, Frederick Mosteller, John W Tukey, Cha rles P Winsor, et al. Low moments for small samples: a comparative study of order statistics. The Annals of Mathematical Statistics , 18 (3):413–426, 1947

work page 1947

[14] [14]

Robust estimators in high dimensions without the compu tational intractability

Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerr y Li, Ankur Moitra, and Alistair Stew- art. Robust estimators in high dimensions without the compu tational intractability. In Foun- dations of Computer Science (FOCS), 2016 IEEE 57th Annual Sym posium on , pages 655–664. IEEE, 2016

work page 2016

[15] [15]

Agnostic es timation of mean and covariance

Kevin A Lai, Anup B Rao, and Santosh Vempala. Agnostic es timation of mean and covariance. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annua l Symposium on , pages 665–674. IEEE, 2016

work page 2016

[16] [16]

Robust moment estimation and im- proved clustering via sum of squares

Pravesh K Kothari, Jacob Steinhardt, and David Steurer . Robust moment estimation and im- proved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing , pages 1035–1046. ACM, 2018. 16

work page 2018

[17] [17]

Learning from untrusted data

Moses Charikar, Jacob Steinhardt, and Gregory Valiant . Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory o f Computing , pages 47–

work page

[18] [18]

Being robust (in high dimensions) can be practical

Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerr y Li, Ankur Moitra, and Alistair Stew- art. Being robust (in high dimensions) can be practical. In International Conference on Machine Learning, pages 999–1008, 2017

work page 2017

[19] [19]

Computationally eﬃcient robust sparse estimation in high dimensions

Sivaraman Balakrishnan, Simon S Du, Jerry Li, and Aarti Singh. Computationally eﬃcient robust sparse estimation in high dimensions. In Conference on Learning Theory , pages 169–212, 2017

work page 2017

[20] [20]

A general decisi on theory for huberâĂŹs epsilon- contamination model

Mengjie Chen, Chao Gao, Zhao Ren, et al. A general decisi on theory for huberâĂŹs epsilon- contamination model. Electronic Journal of Statistics , 10(2):3752–3774, 2016

work page 2016

[21] [21]

Mixture models, robustne ss, and sum of squares proofs

Samuel B Hopkins and Jerry Li. Mixture models, robustne ss, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory o f Computing , pages 1021–

work page

[22] [22]

Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers

Jacob Steinhardt, Moses Charikar, and Gregory Valiant . Resilience: A criterion for learning in the presence of arbitrary outliers. arXiv preprint arXiv:1703.04940 , 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[23] [23]

Robust subgaussian estimation of a mean vector in nearly linear time

Guillaume Lecué and Jules Depersin. Robust subgaussia n estimation of a mean vector in nearly linear time. arXiv preprint arXiv:1906.03058 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906

[24] [24]

Quantum Entropy Scoring for Fast Robust Mean Estimation and Improved Outlier Detection

Yihe Dong, Samuel B Hopkins, and Jerry Li. Quantum entro py scoring for fast robust mean estimation and improved outlier detection dong. arXiv preprint arXiv:1906.11366 , 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906

[25] [25]

Lear ning halfspaces with malicious noise

Adam R Klivans, Philip M Long, and Rocco A Servedio. Lear ning halfspaces with malicious noise. Journal of Machine Learning Research , 10(Dec):2715–2740, 2009

work page 2009

[26] [26]

The power of localization for eﬃciently learning linear separators with noise

Pranjal Awasthi, Maria Florina Balcan, and Philip M Lon g. The power of localization for eﬃciently learning linear separators with noise. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing , pages 449–458. ACM, 2014

work page 2014

[27] [27]

Outli er-robust pca: the high-dimensional case

Huan Xu, Constantine Caramanis, and Shie Mannor. Outli er-robust pca: the high-dimensional case. IEEE transactions on information theory , 59(1):546–572, 2013

work page 2013

[28] [28]

High Dimensional Robust Sparse Regression

Liu Liu, Yanyao Shen, Tianyang Li, and Constantine Cara manis. High dimensional robust sparse regression. arXiv preprint arXiv:1805.11643 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[29] [29]

A robust version of the probability ratio test

Peter J Huber. A robust version of the probability ratio test. The Annals of Mathematical Statistics, 36(6):1753–1758, 1965

work page 1965

[30] [30]

Robust m-estimators of multi variate location and scatter

Ricardo Antonio Maronna. Robust m-estimators of multi variate location and scatter. The annals of statistics , pages 51–67, 1976

work page 1976

[31] [31]

Breakdown properti es of location estimates based on halfspace depth and projected outlyingness

David L Donoho, Miriam Gasko, et al. Breakdown properti es of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics , 20(4):1803–1827, 1992

work page 1992

[32] [32]

On the estimation of the mean of a random vector

Emilien Joly, Gábor Lugosi, Roberto Imbuzeiro Oliveir a, et al. On the estimation of the mean of a random vector. Electronic Journal of Statistics , 11(1):440–451, 2017

work page 2017

[33] [33]

Robust Estimation via Robust Gradient Estimation

Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishn an, and Pradeep Ravikumar. Robust estimation via robust gradient estimation. arXiv preprint arXiv:1802.06485 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[34] [34]

Sever: A Robust Meta-Algorithm for Stochastic Optimization

Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerr y Li, Jacob Steinhardt, and Alis- tair Stewart. Sever: A robust meta-algorithm for stochasti c optimization. arXiv preprint arXiv:1803.02815, 2018. 17

work page internal anchor Pith review Pith/arXiv arXiv 2018

[35] [35]

Practical appli- cations of sparse modeling

Irina Rish, Guillermo A Cecchi, Aurelie Lozano, and Ale xandru Niculescu-Mizil. Practical appli- cations of sparse modeling . MIT Press, 2014

work page 2014

[36] [36]

Robust Learning: Information Theory and Algorithms

Jacob Steinhardt. Robust Learning: Information Theory and Algorithms . PhD thesis, Stanford University, 2018

work page 2018

[37] [37]

A mathematical introduction to compressive sensing, volume 1

Simon Foucart and Holger Rauhut. A mathematical introduction to compressive sensing, volume 1. Birkhäuser Basel, 2013

work page 2013

[38] [38]

Introduction to the non-asymptotic analysis of random matrices

Roman Vershynin. Introduction to the non-asymptotic a nalysis of random matrices. arXiv preprint arXiv:1011.3027, 2010

work page internal anchor Pith review Pith/arXiv arXiv 2010

[39] [39]

Bayesian theory, volume 405

José M Bernardo and Adrian FM Smith. Bayesian theory, volume 405. John Wiley & Sons, 2009

work page 2009

[40] [40]

On the uniform c onvergence of relative frequencies of events to their probabilities

Vladimir N Vapnik and A Ya Chervonenkis. On the uniform c onvergence of relative frequencies of events to their probabilities. In Measures of complexity , pages 11–30. Springer, 2015

work page 2015

[41] [41]

High-dimensional statistics: A non-asymptotic viewpoint , volume 48

Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint , volume 48. Cam- bridge University Press, 2019

work page 2019

[42] [42]

On the role of sparsity in compressed s ensing and random matrix theory

Roman Vershynin. On the role of sparsity in compressed s ensing and random matrix theory. In 2009 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 189–192. IEEE, 2009. 18 A Proofs A.1 Proof of Theorem 1 Proof. Using Chebyshev’s inequality, we have that, Pr(∥x − µ ∥2 ≥ R) ≤ E[∥x − µ ∥2k 2 ] R2k N...

work page 2009

[43] [43]

This is a deterministic statement and essentially quantiﬁ es the amount the mean can shift, when the random variable is con ditioned on an event

Controlling ∥µ − E[ˆθG0]∥2 . This is a deterministic statement and essentially quantiﬁ es the amount the mean can shift, when the random variable is con ditioned on an event. We show this in Claim 1 which was shown in [ 15, 36]. We also provide a proof of the statement for completeness in Section B.1. 19 Claim 1. [General Mean shift,[ 15, 36]] Suppose tha...

work page

[44] [44]

This term measures how quickly the samples within G0 converge to their true mean

Controlling ∥ˆθG0 − E[ˆθG0]∥2. This term measures how quickly the samples within G0 converge to their true mean. To show this we use vector versio n of Bernstein’s inequality. Let zi def= xi − E[ˆθG0] be the centered random variables. Then, we have that ∥zi∥2 ≤ ∥ θ∗ − E[ˆθG0]∥2 + ∥xi − θ∗∥2 ≤ 2∥Σ∥ 1 2 2α 1− 1/ (2k) +R ≤ 2R Similarly, E[∥zi∥2 2] = E[∥x − E...

work page

[45] [45]

Note that E[A] = E[(x − θ∗)(x − θ∗)T |x ∈ G]

Controlling T2. Note that E[A] = E[(x − θ∗)(x − θ∗)T |x ∈ G]. E[A] = E[(x − θ∗)(x − θ∗)T I { x ∈ G0} ] P (x ∈ G0) (67) Let Pr(x ∈ G0) ≥ 1 − α . Hence, for any v ∈ S p− 1, vT E[A]v = E[(vT (x − θ∗))2I { x ∈ G0} ] P (x ∈ G0) ≤ ∥Σ∥2 1 − α Under the assumption that α< 1 2 , we get that, ∥E[A]∥2 ≤ 2∥Σ∥2

work page

[46] [46]

Note that T1 can be controlled using a concentration of measu re argument, and in particular exploits concentration of cova riance for bounded random vectors

Controlling T1. Note that T1 can be controlled using a concentration of measu re argument, and in particular exploits concentration of cova riance for bounded random vectors. Lemma 10. [Theorem 5.44 [ 38]] Let {yi}n i=1 samples such that yi ∈ Rp and ∥yi∥2 ≤ √ m and E[yyT ] = Σ . Then, with probability at least 1 − δ, ∥ 1 n n∑ i=1 yiyT i − Σ∥2 ≤ max(∥Σ∥ 1 ...

work page