pith. machine review for the scientific record.

arxiv: 2603.26029 · v2 · submitted 2026-03-27 · 🧮 math.ST · cs.CC · stat.TH

Recognition: no theorem link

Detection Is Harder Than Estimation in Certain Regimes: Inference for Moment and Cumulant Tensors

Anru R. Zhang, Runshi Tang, Yuefeng Han

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 23:26 UTC · model grok-4.3

classification 🧮 math.ST · cs.CC · stat.TH
keywords moment tensors · cumulant tensors · estimation rate · detection · computational hardness · low-degree polynomials · tensor spectral norm · reverse gap

The pith

Efficient estimation of high-order cumulant tensors succeeds in regimes where low-degree methods cannot detect them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the minimax rate, measured in tensor spectral norm, for estimating order-d moment and cumulant tensors from n sub-Gaussian samples in p dimensions is √(p/n) ∧ 1. A specially constructed estimator attains this rate up to logarithmic factors, outperforming the raw sample moment tensor when d ≥ 3. At the same time, low-degree polynomial analysis indicates that testing whether the d-th cumulant tensor vanishes after whitening is computationally hard for n ≪ p^{d/2}. In an overlapping parameter range the efficient estimator's error lies below the separation that low-degree tests can distinguish, producing a reverse gap in which estimation is easier than detection. The root cause is that the spectral-norm loss itself is NP-hard to compute.

Core claim

The central claim is that there exists a regime in which an efficiently computable estimator for the order-d cumulant tensor achieves error smaller than the separation at which low-degree tests can reliably distinguish the null from the alternative, after whitening. This produces an unusual reverse detection-estimation gap because the tensor spectral norm metric is NP-hard to evaluate.

What carries the argument

The tensor spectral norm as the performance metric, which is NP-hard to compute, together with the low-degree polynomial framework used to certify computational hardness of detection.
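The NP-hardness concerns exact computation of the spectral norm; in practice it can only be lower-bounded, for instance by the symmetric higher-order power method. A minimal illustrative sketch (editorial, not code from the paper):

```python
import numpy as np

def tensor_spectral_norm(T, n_restarts=20, n_iters=100, seed=0):
    """Approximate max over unit u of |T(u,u,u)| for a symmetric order-3
    tensor via the higher-order power method with random restarts.
    This is only a lower bound: exact computation is NP-hard."""
    rng = np.random.default_rng(seed)
    p = T.shape[0]
    best = 0.0
    for _ in range(n_restarts):
        u = rng.standard_normal(p)
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            v = np.einsum('ijk,j,k->i', T, u, u)  # contract two modes
            nv = np.linalg.norm(v)
            if nv < 1e-12:
                break
            u = v / nv
        best = max(best, abs(np.einsum('ijk,i,j,k->', T, u, u, u)))
    return best

# sanity check: for a rank-1 tensor a⊗a⊗a the spectral norm is ||a||^3
a = np.array([3.0, 4.0])                       # ||a|| = 5
T = np.einsum('i,j,k->ijk', a, a, a)
print(tensor_spectral_norm(T))                 # ≈ 125.0
```

Random restarts matter here: unlike the matrix case, a single power-iteration run on a general tensor can converge to a non-maximal critical point.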

If this is right

  • The minimax estimation rate for the d-th moment or cumulant tensor in spectral norm is √(p/n) ∧ 1.
  • The sample moment tensor is rate-suboptimal for d ≥ 3, but a modified estimator attains the optimal rate up to logs.
  • Low-degree evidence shows detection of non-vanishing cumulants is hard when n ≪ p^{d/2}.
  • The reverse gap appears precisely because the spectral norm is NP-hard to compute, so efficient procedures can control estimation error without solving the detection problem.
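The error decay of the raw sample moment tensor can be probed numerically. The sketch below is an editorial illustration, not the paper's experiment: it forms the sample third-moment tensor of standard Gaussian data, whose population tensor is zero, so any norm of the estimate is pure error, and tracks a power-method lower bound on its spectral norm as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_third_moment(X):
    # empirical order-3 moment tensor (1/n) * sum_i x_i ⊗ x_i ⊗ x_i
    return np.einsum('ni,nj,nk->ijk', X, X, X) / X.shape[0]

def spec_norm_lb(T, iters=200):
    # crude single-run power-method lower bound on the spectral norm
    u = np.ones(T.shape[0]) / np.sqrt(T.shape[0])
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, u, u)
        nv = np.linalg.norm(v)
        if nv < 1e-12:
            break
        u = v / nv
    return abs(np.einsum('ijk,i,j,k->', T, u, u, u))

# For X ~ N(0, I_p) the population third moment is the zero tensor, so
# the spectral norm of the sample tensor is exactly the estimation error.
p = 10
errs = []
for n in [100, 1000, 10000]:
    X = rng.standard_normal((n, p))
    errs.append(spec_norm_lb(sample_third_moment(X)))
print(errs)  # error shrinks as n grows
```

This only shows the error decreases in n; distinguishing the sample tensor's rate from the optimal √(p/n) would require the log-factor bookkeeping done in the paper.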

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar reverse gaps may appear in other tensor or high-order moment problems whenever the loss function is computationally intractable.
  • Practitioners facing high-order tensor data might obtain useful estimates even when reliable detection remains out of reach.
  • Approximation schemes for the tensor spectral norm could close the gap or confirm that the hardness is intrinsic.

Load-bearing premise

The observations are sub-Gaussian and the low-degree polynomial framework accurately reflects computational hardness for the detection problem.

What would settle it

Discovery of a polynomial-time algorithm that distinguishes zero from non-zero d-th cumulant tensors (after whitening) for sample sizes n ≪ p^{d/2}, or a direct numerical check showing the proposed estimator fails to achieve error of order √(p/n) on synthetic sub-Gaussian data.
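The detection problem itself is easy to set up on synthetic data. The toy check below is editorial, with the Frobenius norm standing in for the NP-hard spectral norm: it whitens a sample and compares the empirical third-cumulant tensor of Gaussian data (null) against visibly skewed data (alternative).

```python
import numpy as np

rng = np.random.default_rng(2)

def whitened_third_cumulant(X):
    # center, whiten with cov^{-1/2}, then form the empirical third
    # cumulant (for d = 3 this equals the third central moment tensor)
    Xc = X - X.mean(0)
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric cov^{-1/2}
    Z = Xc @ W
    return np.einsum('ni,nj,nk->ijk', Z, Z, Z) / len(Z)

p, n = 5, 50000
gauss = rng.standard_normal((n, p))          # null: cumulant tensor = 0
skew = rng.exponential(size=(n, p)) - 1.0    # alternative: skewed coords
k_null = np.linalg.norm(whitened_third_cumulant(gauss))
k_alt = np.linalg.norm(whitened_third_cumulant(skew))
print(k_null, k_alt)  # the skewed sample has a far larger cumulant norm
```

With n this large relative to p the separation is obvious; the paper's hardness regime is the opposite corner, n ≪ p^{d/2}, where no such gap is visible to low-degree statistics.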

Figures

Figures reproduced from arXiv: 2603.26029 by Anru R. Zhang, Runshi Tang, Yuefeng Han.

Figure 1. Phase diagram of statistical and computational limits in moment and cumulant tensor …
Original abstract

We study estimation and detection of high-order moment and cumulant tensors from $n$ i.i.d. observations of a $p$-dimensional random vector, with performance measured in tensor spectral norm. Under sub-Gaussianity, we show that the minimax rate for estimating the order-$d$ moment and cumulant tensors is $\sqrt{p/n}\wedge 1$. In contrast to covariance estimation, the sample moment tensor is generally not rate-optimal for $d\ge 3$, and we construct an estimator that attains the minimax rate up to logarithmic factors. On the computational side, we study testing whether the $d$-th order cumulant tensor vanishes after whitening. Using the low-degree polynomial framework, we provide evidence that detection is computationally hard when $n\ll p^{d/2}$. At the same time, we identify a regime in which an efficiently computable estimator achieves error smaller than the separation at which low-degree tests can reliably distinguish the null from the alternative. This reveals an unusual reverse detection–estimation gap: computationally efficient detection can be harder than computationally efficient estimation. The underlying reason is that the relevant loss, tensor spectral norm, is itself NP-hard to compute, creating a new form of computational–statistical gap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript derives minimax rates of √(p/n) ∧ 1 for estimating order-d moment and cumulant tensors in spectral norm under sub-Gaussian observations. It shows the empirical moment tensor is suboptimal for d ≥ 3 and constructs a near-optimal estimator attaining the rate up to logarithmic factors. For detection of non-vanishing cumulants after whitening, the low-degree polynomial method is used to indicate computational hardness when n ≪ p^{d/2}. The paper identifies a regime in which an efficient estimator achieves spectral-norm error below the low-degree detection threshold, establishing a reverse detection-estimation gap whose origin is the NP-hardness of computing the tensor spectral norm itself.

Significance. If the central claims hold, the work identifies an unusual computational-statistical tradeoff in which efficient estimation can succeed in a regime where efficient detection fails, driven by the computational hardness of the loss function rather than by information-theoretic limits. The minimax upper and lower bounds follow standard sub-Gaussian arguments, and the low-degree analysis is consistent with prior applications of the framework; the explicit construction of a rate-optimal estimator and the concrete regime exhibiting the reverse gap are the primary contributions.

major comments (1)
  1. [Detection analysis (low-degree section)] The reverse-gap claim in the abstract and detection section rests on low-degree polynomial evidence that detection is hard for n ≪ p^{d/2}. This framework shows that the maximum correlation of degree-k polynomials with the likelihood ratio is o(1) below the threshold, but it supplies only suggestive evidence of computational hardness rather than a rigorous proof that no polynomial-time algorithm exists. If an efficient algorithm outside the low-degree class succeeds at detection in this regime, the claimed reverse gap disappears. The manuscript should state the precise strength of this evidence and, if possible, supply a concrete test (e.g., a specific algorithm class) that would falsify the hardness claim.
minor comments (2)
  1. [Introduction] The notation distinguishing moment tensors from cumulant tensors after whitening is introduced without an explicit reference to the relevant definition; adding a short display equation or pointer to the appendix would improve readability.
  2. [Estimator construction] The logarithmic factors in the upper bound for the constructed estimator are stated but not derived in the main text; a brief sketch of where the logs arise (e.g., from union bounds or covering numbers) would help readers assess tightness.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment below and will incorporate the suggested clarifications in the revised version.

read point-by-point responses
  1. Referee: The reverse-gap claim in the abstract and detection section rests on low-degree polynomial evidence that detection is hard for n ≪ p^{d/2}. This framework shows that the maximum correlation of degree-k polynomials with the likelihood ratio is o(1) below the threshold, but it supplies only suggestive evidence of computational hardness rather than a rigorous proof that no polynomial-time algorithm exists. If an efficient algorithm outside the low-degree class succeeds at detection in this regime, the claimed reverse gap disappears. The manuscript should state the precise strength of this evidence and, if possible, supply a concrete test (e.g., a specific algorithm class) that would falsify the hardness claim.

    Authors: We agree that the low-degree polynomial method provides suggestive evidence of computational hardness rather than a rigorous unconditional proof. In the revised manuscript, we will explicitly state in the detection section (and update the abstract accordingly) that the hardness claim is evidence under the low-degree framework, which is a standard but not exhaustive proxy for polynomial-time algorithms. We will also add a brief discussion of falsifying tests, noting that the claim would be refuted by the discovery of an efficient algorithm (for instance, via convex relaxations such as semidefinite programming or tensor power iteration variants) that reliably detects non-vanishing cumulants for n ≪ p^{d/2} while remaining outside the low-degree class. This clarification preserves the paper's contributions while accurately qualifying the computational evidence. revision: yes

Circularity Check

0 steps flagged

No significant circularity; the rates and the gap rest on external minimax arguments and the low-degree framework.

full rationale

The estimation minimax rate √(p/n) ∧ 1 is obtained from standard sub-Gaussian concentration and tensor-norm arguments that do not reduce to any fitted parameter inside the paper. The detection threshold n ≪ p^{d/2} is imported from the established low-degree polynomial method (external literature), which is used only to supply suggestive computational-hardness evidence rather than a self-contained proof. The reverse gap is then obtained by direct comparison of these two independently sourced thresholds. No equation, estimator, or claim is defined in terms of itself, no internal fit is relabeled as a prediction, and no load-bearing step collapses to a self-citation whose validity is presupposed by the present work. The analysis is therefore checked against external benchmarks rather than against itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the sub-Gaussian tail assumption for concentration and on the low-degree polynomial method as a proxy for computational hardness; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption The p-dimensional random vector has sub-Gaussian marginals
    Invoked to obtain concentration bounds for the sample moment tensors in the minimax analysis.

pith-pipeline@v0.9.0 · 5531 in / 1271 out tokens · 34777 ms · 2026-05-14T23:26:47.939969+00:00 · methodology

discussion (0)

