Robust Moment-Based Estimation via Spectral Gradient Reweighting

Amit Singer; Liu Zhang

arxiv: 2605.27718 · v1 · pith:WEFTTKOAnew · submitted 2026-05-26 · 🧮 math.ST · cs.LG· stat.ME· stat.TH

Robust Moment-Based Estimation via Spectral Gradient Reweighting

Liu Zhang , Amit Singer This is my paper

Pith reviewed 2026-06-29 14:42 UTC · model grok-4.3

classification 🧮 math.ST cs.LGstat.MEstat.TH

keywords robust GMMspectral gradient reweightingfinite-sample boundscontaminated datamoment estimationGaussian mixturesoutlier robustnessgeneralized method of moments

0 comments

The pith

SGR-GMM reweights gradients via a spectral game to deliver robust GMM estimation with explicit local finite-sample error bounds under contamination.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Moment-based methods rely on sample averages and therefore break down when even a fraction of observations are outliers. The paper develops the SGR-GMM algorithm that inserts a spectral gradient reweighting step inside the moment-matching optimization to soften the influence of bad points. The reweighting step is cast as an entropy-regularized game whose solution yields nearly clean gradients; the overall procedure is then shown to converge locally and to satisfy an error bound whose leading term scales with the contamination fraction. A reader should care because the bound makes precise how much clean-data stability is needed to tolerate a given level of corruption, opening the door to reliable inference in regimes where full likelihood methods are unavailable or misspecified.

Core claim

The SGR-GMM algorithm formulates per-observation gradient reweighting as an entropy-regularized spectral game between a sample-weight player and a density-matrix player, proves explicit convergence radius and finite termination for the fixed-center updates, and establishes a local finite-sample parameter estimation error bound that depends explicitly on the contamination fraction, inlier gradient stability, local GMM identification strength, and optimization accuracy. The same construction specializes to a robust diagonally-weighted GMM estimator for heteroscedastic low-rank Gaussian mixtures observed under additive Gaussian noise and strong contamination.

What carries the argument

The SGR primitive: an entropy-regularized spectral game between a sample-weight player and a density-matrix player that produces soft-reweighted gradients for the moment equations.

If this is right

Fixed-center SGR updates converge inside an explicit radius and terminate after a finite number of steps.
The robust DGMM specialization produces nearly-oracle gradient estimates and substantially lower error than non-robust moment baselines on contaminated heteroscedastic mixtures.
The finite-sample bound isolates the separate contributions of contamination level, inlier stability, identification strength, and solver tolerance.
The procedure remains a valid GMM estimator when the contamination fraction is zero.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same spectral-game reweighting could be inserted into other moment-matching procedures whose gradients admit a low-rank or diagonal structure.
Empirical checks of inlier gradient stability on hold-out clean data might serve as a practical diagnostic before trusting the estimator on a new contaminated collection.
Because the bound is local, one could test whether initializing from a coarse robust location estimator enlarges the basin in which the guarantee applies.

Load-bearing premise

Inlier gradient stability and local GMM identification strength must hold at a level sufficient for the chosen contamination fraction.

What would settle it

A data set generated from a locally identified GMM whose clean inliers satisfy the stated gradient-stability condition, yet the recovered parameter error after running SGR-GMM to the claimed accuracy exceeds the explicit bound by a constant factor.

Figures

Figures reproduced from arXiv: 2605.27718 by Amit Singer, Liu Zhang.

**Figure 1.** Figure 1: Mean ℓ2 estimation error under increasing contamination in the case of directional outliers. 5.1.2 Progress over outer-loop iterations [PITH_FULL_IMAGE:figures/full_fig_p026_1.png] view at source ↗

**Figure 2.** Figure 2: Progress over outer-loop iterations in the case of directional outliers. The horizontal dotted [PITH_FULL_IMAGE:figures/full_fig_p027_2.png] view at source ↗

**Figure 3.** Figure 3: Sensitivity to the user-supplied contamination level when the actual contamination is [PITH_FULL_IMAGE:figures/full_fig_p027_3.png] view at source ↗

**Figure 4.** Figure 4: Representative convergence diagnostics for [PITH_FULL_IMAGE:figures/full_fig_p029_4.png] view at source ↗

**Figure 5.** Figure 5: Distribution of repeated-trial DGMM errors in the clean, noise-only, contamination-only, [PITH_FULL_IMAGE:figures/full_fig_p030_5.png] view at source ↗

**Figure 6.** Figure 6: Mixing probability error, center error, and covariance errors across the methods under [PITH_FULL_IMAGE:figures/full_fig_p031_6.png] view at source ↗

read the original abstract

Moment-based estimation is a theoretically attractive approach to parametric inference, especially when likelihood-based estimation is unavailable, misspecified, or computationally inconvenient. However, the moment equations involve sample averages, which makes moment-based estimation sensitive to outliers. We propose the SGR-GMM algorithm, a robust generalized method of moments (GMM) procedure that uses a spectral gradient reweighting (SGR) primitive to soft-reweight the per-observation gradients during the moment-matching optimization. Our analysis has three layers. First, for a fixed center, the SGR primitive is formulated as an entropy-regularized spectral game between a sample-weight player and a density-matrix player, which is analyzed using classical multiplicative-weights and matrix-multiplicative-weights regret bounds. Second, we establish explicit convergence radius and finite termination bound for the fixed-center updates in the SGR primitive. Third, we prove a local finite-sample parameter estimation error bound with explicit dependence on the contamination fraction, inlier gradient stability, local GMM identification strength, and optimization accuracy. We further specialize the SGR-GMM algorithm to obtain a robust diagonally-weighted GMM (DGMM) estimator for estimating heteroscedastic low-rank Gaussian mixtures observed under additive Gaussian noise and strong contamination. In the numerical experiments, the SGR primitive produces nearly-oracle gradient estimation and the robust DGMM specialization substantially improves over non-robust moment baselines. The code and data are available at https://github.com/liu-lzhang/sgr-gmm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SGR-GMM gives a fresh spectral-game reweighting primitive for robust GMM with conditional local error bounds, but those bounds rest on stability assumptions that the experiments do not verify.

read the letter

The paper's core contribution is the SGR primitive: an entropy-regularized spectral game between sample weights and a density matrix, solved via multiplicative-weights regret bounds, plus a fixed-center convergence radius and a local finite-sample GMM error bound that explicitly tracks contamination fraction, inlier gradient stability, identification strength, and optimization error. They then specialize it to a robust DGMM estimator for heteroscedastic low-rank Gaussian mixtures under additive noise.

What works: the game formulation does not collapse to earlier reweighting schemes, the regret analysis imports standard online-learning tools cleanly, the specialization is concrete, code and data are released, and the numerics show the reweighting produces near-oracle gradients and beats plain moment baselines under contamination.

The soft spot is exactly where the stress-test flags it. The finite-sample bound is conditional on inlier gradient stability and local GMM identification being sufficiently strong relative to the contamination level. Nothing in the abstract or the reported experiments supplies an a-priori guarantee or post-hoc check that these quantities actually meet the thresholds for the noise and contamination fractions tested. If they do not, the explicit bound does not apply and the convergence claims become less relevant to the estimator's actual error. The analysis itself looks formally grounded; the gap is between the theorem statement and the numerical regime.

This is for researchers who already work on robust moment methods or contaminated GMM. It is coherent on its own terms and deserves a serious referee, even though the practical reach of the bound needs clarification.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes the SGR-GMM algorithm for robust generalized method of moments estimation. It formulates the spectral gradient reweighting (SGR) primitive as an entropy-regularized spectral game between sample-weight and density-matrix players, analyzed via classical multiplicative-weights and matrix-multiplicative-weights regret bounds. It establishes explicit convergence radius and finite termination for fixed-center updates, and proves a local finite-sample parameter estimation error bound depending explicitly on contamination fraction, inlier gradient stability, local GMM identification strength, and optimization accuracy. The approach is specialized to a robust diagonally-weighted GMM (DGMM) estimator for heteroscedastic low-rank Gaussian mixtures under additive noise and contamination. Numerical experiments indicate that SGR yields nearly-oracle gradients and the robust DGMM outperforms non-robust baselines; code and data are provided.

Significance. If the conditional assumptions hold, the explicit dependence of the error bound on contamination, stability, identification, and accuracy supplies a theoretically grounded robust moment estimator with verifiable conditions, which is valuable when likelihood methods are unavailable or misspecified. The reproducibility via public code is a clear strength.

major comments (2)

[Abstract (third layer of analysis) and numerical experiments] Abstract (third layer): The local finite-sample parameter estimation error bound is derived under the assumption that inlier gradient stability and local GMM identification strength are sufficiently large relative to the contamination fraction. For the specialized robust DGMM estimator on heteroscedastic low-rank Gaussian mixtures, neither the general theory nor the numerical section supplies an a-priori guarantee or post-hoc verification that these quantities meet the required thresholds at the tested contamination fractions and noise levels. Without this, the explicit bound does not apply to the reported estimator error or convergence claims.
[Analysis of fixed-center updates and DGMM specialization] The convergence-radius and finite-termination claims for the fixed-center SGR updates rest on the game formulation and external regret bounds, but the manuscript does not demonstrate that the inlier gradient stability assumption (required for the downstream error bound) is preserved under the specific reweighting produced by the SGR primitive in the DGMM specialization.

minor comments (1)

[Abstract] The abstract states that the SGR primitive produces nearly-oracle gradient estimation, but the precise metric (e.g., which norm or expectation) and comparison baseline are not defined in the summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review. The comments correctly identify gaps in connecting the conditional theoretical bounds to the numerical experiments and in explicitly linking the SGR reweighting to preservation of the stability assumption. We address each point below and will incorporate revisions.

read point-by-point responses

Referee: The local finite-sample parameter estimation error bound is derived under the assumption that inlier gradient stability and local GMM identification strength are sufficiently large relative to the contamination fraction. For the specialized robust DGMM estimator on heteroscedastic low-rank Gaussian mixtures, neither the general theory nor the numerical section supplies an a-priori guarantee or post-hoc verification that these quantities meet the required thresholds at the tested contamination fractions and noise levels. Without this, the explicit bound does not apply to the reported estimator error or convergence claims.

Authors: We agree that the error bound is conditional on inlier gradient stability and local identification strength being sufficiently large relative to contamination, and that the manuscript provides neither a priori guarantees nor post-hoc verification of these conditions for the DGMM experiments. The bound is presented as holding under those assumptions, while the numerics serve as separate empirical evidence. In revision we will add post-hoc numerical verification (e.g., subsample-based estimates of gradient stability and identification strength) for the reported contamination and noise levels, together with a short discussion of practical checks. revision: yes
Referee: The convergence-radius and finite-termination claims for the fixed-center SGR updates rest on the game formulation and external regret bounds, but the manuscript does not demonstrate that the inlier gradient stability assumption (required for the downstream error bound) is preserved under the specific reweighting produced by the SGR primitive in the DGMM specialization.

Authors: The stability assumption concerns the uncontaminated inlier population. The SGR primitive is constructed to recover inlier moments via the entropy-regularized game, so that sufficiently accurate optimization yields reweighted gradients whose stability is inherited from the inliers. We acknowledge, however, that an explicit argument showing preservation under the DGMM reweighting map is absent. In revision we will insert a short supporting remark (or lemma) establishing that, when the optimization accuracy condition of the convergence theorem holds, the reweighted gradients remain stable whenever the original inlier gradients are. revision: yes

Circularity Check

0 steps flagged

No significant circularity: external regret bounds and conditional assumptions keep derivation independent

full rationale

The paper derives its SGR primitive analysis and local finite-sample error bound by invoking classical multiplicative-weights and matrix-multiplicative-weights regret bounds from external online-learning literature, then states explicit dependence on contamination fraction, inlier gradient stability, local GMM identification strength, and optimization accuracy as assumptions. These quantities are not fitted from the target data, self-defined, or obtained via self-citation chains; the central claims therefore remain non-tautological and externally grounded. No load-bearing step reduces by construction to the paper's own inputs or renames a fitted quantity as a prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the new SGR primitive and on standard regret bounds from online optimization; no free parameters are explicitly fitted in the abstract description.

axioms (1)

standard math Classical multiplicative-weights and matrix-multiplicative-weights regret bounds apply to the entropy-regularized spectral game
Invoked to analyze the SGR primitive in the first layer of the analysis.

invented entities (1)

SGR primitive no independent evidence
purpose: Soft-reweighting of per-observation gradients via entropy-regularized spectral game during moment-matching optimization
New algorithmic component introduced to achieve robustness in GMM.

pith-pipeline@v0.9.1-grok · 5794 in / 1408 out tokens · 49276 ms · 2026-06-29T14:42:09.678658+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 4 canonical work pages · 1 internal anchor

[1]

The multiplicative weights update method: a meta-algorithm and applications.Theory Comput., 8:121–164, 2012

Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications.Theory Comput., 8:121–164, 2012. 3, 9, 12, 34

2012
[2]

Convex Optimization: Algorithms and Complexity

Sébastien Bubeck. Convex Optimization: Algorithms and Complexity, November 2015. arXiv:1405.4980 [math]. 3, 9, 12, 13

work page internal anchor Pith review Pith/arXiv arXiv 2015
[3]

High-dimensional robust mean estimation via gradient descent

Yu Cheng, Ilias Diakonikolas, Rong Ge, and Mahdi Soltanolkotabi. High-dimensional robust mean estimation via gradient descent. InInternational conference on machine learning, pages 1768–1778. PMLR, 2020. 3, 5, 19

2020
[4]

Dalalyan and Arshak Minasyan

Arnak S. Dalalyan and Arshak Minasyan. All-in-one robust estimator of the Gaussian mean. The Annals of Statistics, 50(2), April 2022. 3, 5, 16

2022
[5]

J. E. Dennis, Jr. and Robert B. Schnabel.Numerical methods for unconstrained optimization and nonlinear equations, volume 16 ofClassics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1996. Corrected reprint of the 1983 original. 4, 10

1996
[6]

Robust estimators in high-dimensions without the computational intractability.SIAM Journal on Computing, 48(2):742–864, 2019

Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Robust estimators in high-dimensions without the computational intractability.SIAM Journal on Computing, 48(2):742–864, 2019. 3

2019
[7]

Sever: A robust meta-algorithm for stochastic optimization

Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart. Sever: A robust meta-algorithm for stochastic optimization. InInternational Conference on Machine Learning, pages 1596–1606. PMLR, 2019. 3, 5

2019
[8]

Cambridge university press, 2023

Ilias Diakonikolas and Daniel M Kane.Algorithmic high-dimensional robust statistics. Cambridge university press, 2023. 3

2023
[9]

Quantum entropy scoring for fast robust mean estimation and improved outlier detection.Advances in Neural Information Processing Systems, 32, 2019

Yihe Dong, Samuel Hopkins, and Jerry Li. Quantum entropy scoring for fast robust mean estimation and improved outlier detection.Advances in Neural Information Processing Systems, 32, 2019. 5

2019
[10]

David Donoho and Peter J. Huber. The notion of breakdown point. InA Festschrift for Erich L. Lehmann, Wadsworth Statist./Probab. Ser., pages 157–184. Wadsworth, Belmont, CA, 1983. 3

1983
[11]

Wiley Online Library, 2004

Alastair Hall.Generalized Method of Moments. Wiley Online Library, 2004. 4, 17, 18, 31

2004
[12]

PhD thesis, University of California, Berkeley, 1968

Frank Rudolf Hampel.Contributions to the theory of robust estimation. PhD thesis, University of California, Berkeley, 1968. 3

1968
[13]

Large sample properties of generalized method of moments estimators

Lars Peter Hansen. Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054, 1982. 2, 4, 17

1982
[14]

Introduction to Online Convex Optimization, August 2023

Elad Hazan. Introduction to Online Convex Optimization, August 2023. arXiv:1909.05207 [cs]. 3, 9

work page arXiv 2023
[15]

Robust and heavy-tailed mean estimation made simple, via regret minimization.Advances in Neural Information Processing Systems, 33:11902–11912,

Sam Hopkins, Jerry Li, and Fred Zhang. Robust and heavy-tailed mean estimation made simple, via regret minimization.Advances in Neural Information Processing Systems, 33:11902–11912,
[16]

Robust estimation of a location parameter.The Annals of Mathematical Statistics, 35(1):73–101, 1964

Peter J Huber. Robust estimation of a location parameter.The Annals of Mathematical Statistics, 35(1):73–101, 1964. 3

1964
[17]

Peter J. Huber. A robust version of the probability ratio test.Ann. Math. Statist., 36:1753–1758,
[18]

Agnostic estimation of mean and covariance

Kevin A Lai, Anup B Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 665–674. IEEE, 2016. 3

2016
[19]

Global convergence of iteratively reweighted least squares for robust subspace recovery.arXiv preprint arXiv:2506.20533, 2025

Gilad Lerman, Kang Li, Tyler Maunu, and Teng Zhang. Global convergence of iteratively reweighted least squares for robust subspace recovery.arXiv preprint arXiv:2506.20533, 2025. 4

work page arXiv 2025
[20]

Fast, robust and non-convex subspace recovery.Inf

Gilad Lerman and Tyler Maunu. Fast, robust and non-convex subspace recovery.Inf. Inference, 7(2):277–336, 2018. 4, 19

2018
[21]

A theoretical review of modern robust statistics.Annu

Po-Ling Loh. A theoretical review of modern robust statistics.Annu. Rev. Stat. Appl., 12:477– 496, 2025. 3

2025
[22]

Luenberger and Yinyu Ye.Linear and Nonlinear Programming

David G. Luenberger and Yinyu Ye.Linear and Nonlinear Programming. Number 116 in International Series in Operations Research & Management Science. Springer US, Boston, MA, third edition edition, 2008. 4

2008
[23]

Large sample estimation and hypothesis testing

Whitney K Newey and Daniel McFadden. Large sample estimation and hypothesis testing. In Handbook of Econometrics, volume 4, pages 2111–2245. Elsevier, 1994. 4, 17, 18, 19

1994
[24]

Cambridge university press, 2010

Michael A Nielsen and Isaac L Chuang.Quantum computation and quantum information. Cambridge university press, 2010. 7

2010
[25]

Contributions to the mathematical theory of evolution.Philosophical Transactions of the Royal Society of London

Karl Pearson. Contributions to the mathematical theory of evolution.Philosophical Transactions of the Royal Society of London. A, 185:71–110, 1894. 2
[26]

Robust estimation via robust gradient estimation.J

Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishnan, and Pradeep Ravikumar. Robust estimation via robust gradient estimation.J. R. Stat. Soc. Ser. B. Stat. Methodol., 82(3):601–627,
[27]

Robust generalized method of moments: a finite sample viewpoint.Advances in Neural Information Processing Systems, 35:15970–15981, 2022

Dhruv Rohatgi and Vasilis Syrgkanis. Robust generalized method of moments: a finite sample viewpoint.Advances in Neural Information Processing Systems, 35:15970–15981, 2022. 3, 5

2022
[28]

PhD thesis, ETH Zürich, 1979

Elvezio Ronchetti.Robusthatseigenschaften von Tests. PhD thesis, ETH Zürich, 1979. 3

1979
[29]

Inequalities for quantum entropy: a review with conditions for equality

Mary Beth Ruskai. Inequalities for quantum entropy: a review with conditions for equality. volume 43, pages 4358–4375. 2002. Quantum information theory. 7

2002
[30]

CS395T: Continuous algorithms, part viii: Matrix multiplicative weights, 2025

Kevin Tian. CS395T: Continuous algorithms, part viii: Matrix multiplicative weights, 2025. Lecture notes. 7

2025
[31]

Diagonally-weighted generalized method of moments estimation for Gaussian mixture modeling.arXiv preprint arXiv:2507.20459, 2025

Liu Zhang, Oscar Mickelin, Sheng Xu, and Amit Singer. Diagonally-weighted generalized method of moments estimation for Gaussian mixture modeling.arXiv preprint arXiv:2507.20459, 2025. 4, 21, 24, 28

work page arXiv 2025
[32]

Robust estimation via generalized quasi- gradients.Inf

Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt. Robust estimation via generalized quasi- gradients.Inf. Inference, 11(2):581–636, 2022. 3, 5, 8 33 A Supplementary proofs A.1 Supplementary proofs for fixed-center regret bound In what follows, we will prove that given a fixed centerbµ[s], the averaged weight vector returned from the MW-MMW rounds,bw[s] = w...

2022

[1] [1]

The multiplicative weights update method: a meta-algorithm and applications.Theory Comput., 8:121–164, 2012

Sanjeev Arora, Elad Hazan, and Satyen Kale. The multiplicative weights update method: a meta-algorithm and applications.Theory Comput., 8:121–164, 2012. 3, 9, 12, 34

2012

[2] [2]

Convex Optimization: Algorithms and Complexity

Sébastien Bubeck. Convex Optimization: Algorithms and Complexity, November 2015. arXiv:1405.4980 [math]. 3, 9, 12, 13

work page internal anchor Pith review Pith/arXiv arXiv 2015

[3] [3]

High-dimensional robust mean estimation via gradient descent

Yu Cheng, Ilias Diakonikolas, Rong Ge, and Mahdi Soltanolkotabi. High-dimensional robust mean estimation via gradient descent. InInternational conference on machine learning, pages 1768–1778. PMLR, 2020. 3, 5, 19

2020

[4] [4]

Dalalyan and Arshak Minasyan

Arnak S. Dalalyan and Arshak Minasyan. All-in-one robust estimator of the Gaussian mean. The Annals of Statistics, 50(2), April 2022. 3, 5, 16

2022

[5] [5]

J. E. Dennis, Jr. and Robert B. Schnabel.Numerical methods for unconstrained optimization and nonlinear equations, volume 16 ofClassics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1996. Corrected reprint of the 1983 original. 4, 10

1996

[6] [6]

Robust estimators in high-dimensions without the computational intractability.SIAM Journal on Computing, 48(2):742–864, 2019

Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Robust estimators in high-dimensions without the computational intractability.SIAM Journal on Computing, 48(2):742–864, 2019. 3

2019

[7] [7]

Sever: A robust meta-algorithm for stochastic optimization

Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart. Sever: A robust meta-algorithm for stochastic optimization. InInternational Conference on Machine Learning, pages 1596–1606. PMLR, 2019. 3, 5

2019

[8] [8]

Cambridge university press, 2023

Ilias Diakonikolas and Daniel M Kane.Algorithmic high-dimensional robust statistics. Cambridge university press, 2023. 3

2023

[9] [9]

Quantum entropy scoring for fast robust mean estimation and improved outlier detection.Advances in Neural Information Processing Systems, 32, 2019

Yihe Dong, Samuel Hopkins, and Jerry Li. Quantum entropy scoring for fast robust mean estimation and improved outlier detection.Advances in Neural Information Processing Systems, 32, 2019. 5

2019

[10] [10]

David Donoho and Peter J. Huber. The notion of breakdown point. InA Festschrift for Erich L. Lehmann, Wadsworth Statist./Probab. Ser., pages 157–184. Wadsworth, Belmont, CA, 1983. 3

1983

[11] [11]

Wiley Online Library, 2004

Alastair Hall.Generalized Method of Moments. Wiley Online Library, 2004. 4, 17, 18, 31

2004

[12] [12]

PhD thesis, University of California, Berkeley, 1968

Frank Rudolf Hampel.Contributions to the theory of robust estimation. PhD thesis, University of California, Berkeley, 1968. 3

1968

[13] [13]

Large sample properties of generalized method of moments estimators

Lars Peter Hansen. Large sample properties of generalized method of moments estimators. Econometrica, 50(4):1029–1054, 1982. 2, 4, 17

1982

[14] [14]

Introduction to Online Convex Optimization, August 2023

Elad Hazan. Introduction to Online Convex Optimization, August 2023. arXiv:1909.05207 [cs]. 3, 9

work page arXiv 2023

[15] [15]

Robust and heavy-tailed mean estimation made simple, via regret minimization.Advances in Neural Information Processing Systems, 33:11902–11912,

Sam Hopkins, Jerry Li, and Fred Zhang. Robust and heavy-tailed mean estimation made simple, via regret minimization.Advances in Neural Information Processing Systems, 33:11902–11912,

[16] [16]

Robust estimation of a location parameter.The Annals of Mathematical Statistics, 35(1):73–101, 1964

Peter J Huber. Robust estimation of a location parameter.The Annals of Mathematical Statistics, 35(1):73–101, 1964. 3

1964

[17] [17]

Peter J. Huber. A robust version of the probability ratio test.Ann. Math. Statist., 36:1753–1758,

[18] [18]

Agnostic estimation of mean and covariance

Kevin A Lai, Anup B Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 665–674. IEEE, 2016. 3

2016

[19] [19]

Global convergence of iteratively reweighted least squares for robust subspace recovery.arXiv preprint arXiv:2506.20533, 2025

Gilad Lerman, Kang Li, Tyler Maunu, and Teng Zhang. Global convergence of iteratively reweighted least squares for robust subspace recovery.arXiv preprint arXiv:2506.20533, 2025. 4

work page arXiv 2025

[20] [20]

Fast, robust and non-convex subspace recovery.Inf

Gilad Lerman and Tyler Maunu. Fast, robust and non-convex subspace recovery.Inf. Inference, 7(2):277–336, 2018. 4, 19

2018

[21] [21]

A theoretical review of modern robust statistics.Annu

Po-Ling Loh. A theoretical review of modern robust statistics.Annu. Rev. Stat. Appl., 12:477– 496, 2025. 3

2025

[22] [22]

Luenberger and Yinyu Ye.Linear and Nonlinear Programming

David G. Luenberger and Yinyu Ye.Linear and Nonlinear Programming. Number 116 in International Series in Operations Research & Management Science. Springer US, Boston, MA, third edition edition, 2008. 4

2008

[23] [23]

Large sample estimation and hypothesis testing

Whitney K Newey and Daniel McFadden. Large sample estimation and hypothesis testing. In Handbook of Econometrics, volume 4, pages 2111–2245. Elsevier, 1994. 4, 17, 18, 19

1994

[24] [24]

Cambridge university press, 2010

Michael A Nielsen and Isaac L Chuang.Quantum computation and quantum information. Cambridge university press, 2010. 7

2010

[25] [25]

Contributions to the mathematical theory of evolution.Philosophical Transactions of the Royal Society of London

Karl Pearson. Contributions to the mathematical theory of evolution.Philosophical Transactions of the Royal Society of London. A, 185:71–110, 1894. 2

[26] [26]

Robust estimation via robust gradient estimation.J

Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishnan, and Pradeep Ravikumar. Robust estimation via robust gradient estimation.J. R. Stat. Soc. Ser. B. Stat. Methodol., 82(3):601–627,

[27] [27]

Robust generalized method of moments: a finite sample viewpoint.Advances in Neural Information Processing Systems, 35:15970–15981, 2022

Dhruv Rohatgi and Vasilis Syrgkanis. Robust generalized method of moments: a finite sample viewpoint.Advances in Neural Information Processing Systems, 35:15970–15981, 2022. 3, 5

2022

[28] [28]

PhD thesis, ETH Zürich, 1979

Elvezio Ronchetti.Robusthatseigenschaften von Tests. PhD thesis, ETH Zürich, 1979. 3

1979

[29] [29]

Inequalities for quantum entropy: a review with conditions for equality

Mary Beth Ruskai. Inequalities for quantum entropy: a review with conditions for equality. volume 43, pages 4358–4375. 2002. Quantum information theory. 7

2002

[30] [30]

CS395T: Continuous algorithms, part viii: Matrix multiplicative weights, 2025

Kevin Tian. CS395T: Continuous algorithms, part viii: Matrix multiplicative weights, 2025. Lecture notes. 7

2025

[31] [31]

Diagonally-weighted generalized method of moments estimation for Gaussian mixture modeling.arXiv preprint arXiv:2507.20459, 2025

Liu Zhang, Oscar Mickelin, Sheng Xu, and Amit Singer. Diagonally-weighted generalized method of moments estimation for Gaussian mixture modeling.arXiv preprint arXiv:2507.20459, 2025. 4, 21, 24, 28

work page arXiv 2025

[32] [32]

Robust estimation via generalized quasi- gradients.Inf

Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt. Robust estimation via generalized quasi- gradients.Inf. Inference, 11(2):581–636, 2022. 3, 5, 8 33 A Supplementary proofs A.1 Supplementary proofs for fixed-center regret bound In what follows, we will prove that given a fixed centerbµ[s], the averaged weight vector returned from the MW-MMW rounds,bw[s] = w...

2022