pith. machine review for the scientific record.

arxiv: 2603.26029 · v2 · submitted 2026-03-27 · 🧮 math.ST · cs.CC · stat.TH

Recognition: no theorem link

Detection Is Harder Than Estimation in Certain Regimes: Inference for Moment and Cumulant Tensors

Anru R. Zhang, Runshi Tang, Yuefeng Han

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 23:26 UTC · model grok-4.3

classification 🧮 math.ST · cs.CC · stat.TH
keywords moment tensors · cumulant tensors · estimation rate · detection · computational hardness · low-degree polynomials · tensor spectral norm · reverse gap

The pith

Efficient estimation of high-order cumulant tensors succeeds in regimes where low-degree methods cannot detect them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the minimax rate, measured in tensor spectral norm, for estimating order-d moment and cumulant tensors from n sub-Gaussian samples in p dimensions is √(p/n) ∧ 1. A specially constructed estimator attains this rate up to logarithmic factors, outperforming the raw sample moment tensor when d ≥ 3. At the same time, low-degree polynomial analysis indicates that testing whether the d-th cumulant tensor vanishes after whitening is computationally hard for n ≪ p^{d/2}. In an overlapping parameter range the efficient estimator's error lies below the separation that low-degree tests can distinguish, producing a reverse gap in which estimation is easier than detection. The root cause is that the spectral-norm loss itself is NP-hard to compute.

Core claim

The central claim is that there exists a regime in which an efficiently computable estimator for the order-d cumulant tensor achieves error smaller than the separation at which low-degree tests can reliably distinguish the null from the alternative, after whitening. This produces an unusual reverse detection-estimation gap because the tensor spectral norm metric is NP-hard to evaluate.

What carries the argument

The tensor spectral norm as the performance metric, which is NP-hard to compute, together with the low-degree polynomial framework used to certify computational hardness of detection.
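The NP-hardness concerns exact computation of the spectral norm; in practice it can only be lower-bounded, for instance by the symmetric higher-order power method. A minimal illustrative sketch (editorial, not code from the paper):

```python
import numpy as np

def tensor_spectral_norm(T, n_restarts=20, n_iters=100, seed=0):
    """Approximate max over unit u of |T(u,u,u)| for a symmetric order-3
    tensor via the higher-order power method with random restarts.
    This is only a lower bound: exact computation is NP-hard."""
    rng = np.random.default_rng(seed)
    p = T.shape[0]
    best = 0.0
    for _ in range(n_restarts):
        u = rng.standard_normal(p)
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            v = np.einsum('ijk,j,k->i', T, u, u)  # contract two modes
            nv = np.linalg.norm(v)
            if nv < 1e-12:
                break
            u = v / nv
        best = max(best, abs(np.einsum('ijk,i,j,k->', T, u, u, u)))
    return best

# sanity check: for a rank-1 tensor a⊗a⊗a the spectral norm is ||a||^3
a = np.array([3.0, 4.0])                       # ||a|| = 5
T = np.einsum('i,j,k->ijk', a, a, a)
print(tensor_spectral_norm(T))                 # ≈ 125.0
```

Random restarts matter here: unlike the matrix case, a single power-iteration run on a general tensor can converge to a non-maximal critical point.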

If this is right

  • The minimax estimation rate for the d-th moment or cumulant tensor in spectral norm is √(p/n) ∧ 1.
  • The sample moment tensor is rate-suboptimal for d ≥ 3, but a modified estimator attains the optimal rate up to logs.
  • Low-degree evidence shows detection of non-vanishing cumulants is hard when n ≪ p^{d/2}.
  • The reverse gap appears precisely because the spectral norm is NP-hard to compute, so efficient procedures can control estimation error without solving the detection problem.
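The error decay of the raw sample moment tensor can be probed numerically. The sketch below is an editorial illustration, not the paper's experiment: it forms the sample third-moment tensor of standard Gaussian data, whose population tensor is zero, so any norm of the estimate is pure error, and tracks a power-method lower bound on its spectral norm as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_third_moment(X):
    # empirical order-3 moment tensor (1/n) * sum_i x_i ⊗ x_i ⊗ x_i
    return np.einsum('ni,nj,nk->ijk', X, X, X) / X.shape[0]

def spec_norm_lb(T, iters=200):
    # crude single-run power-method lower bound on the spectral norm
    u = np.ones(T.shape[0]) / np.sqrt(T.shape[0])
    for _ in range(iters):
        v = np.einsum('ijk,j,k->i', T, u, u)
        nv = np.linalg.norm(v)
        if nv < 1e-12:
            break
        u = v / nv
    return abs(np.einsum('ijk,i,j,k->', T, u, u, u))

# For X ~ N(0, I_p) the population third moment is the zero tensor, so
# the spectral norm of the sample tensor is exactly the estimation error.
p = 10
errs = []
for n in [100, 1000, 10000]:
    X = rng.standard_normal((n, p))
    errs.append(spec_norm_lb(sample_third_moment(X)))
print(errs)  # error shrinks as n grows
```

This only shows the error decreases in n; distinguishing the sample tensor's rate from the optimal √(p/n) would require the log-factor bookkeeping done in the paper.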

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar reverse gaps may appear in other tensor or high-order moment problems whenever the loss function is computationally intractable.
  • Practitioners facing high-order tensor data might obtain useful estimates even when reliable detection remains out of reach.
  • Approximation schemes for the tensor spectral norm could close the gap or confirm that the hardness is intrinsic.

Load-bearing premise

The observations are sub-Gaussian and the low-degree polynomial framework accurately reflects computational hardness for the detection problem.

What would settle it

Discovery of a polynomial-time algorithm that distinguishes zero from non-zero d-th cumulant tensors (after whitening) for sample sizes n ≪ p^{d/2}, or a direct numerical check showing the proposed estimator fails to achieve error of order √(p/n) on synthetic sub-Gaussian data.
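The detection problem itself is easy to set up on synthetic data. The toy check below is editorial, with the Frobenius norm standing in for the NP-hard spectral norm: it whitens a sample and compares the empirical third-cumulant tensor of Gaussian data (null) against visibly skewed data (alternative).

```python
import numpy as np

rng = np.random.default_rng(2)

def whitened_third_cumulant(X):
    # center, whiten with cov^{-1/2}, then form the empirical third
    # cumulant (for d = 3 this equals the third central moment tensor)
    Xc = X - X.mean(0)
    cov = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric cov^{-1/2}
    Z = Xc @ W
    return np.einsum('ni,nj,nk->ijk', Z, Z, Z) / len(Z)

p, n = 5, 50000
gauss = rng.standard_normal((n, p))          # null: cumulant tensor = 0
skew = rng.exponential(size=(n, p)) - 1.0    # alternative: skewed coords
k_null = np.linalg.norm(whitened_third_cumulant(gauss))
k_alt = np.linalg.norm(whitened_third_cumulant(skew))
print(k_null, k_alt)  # the skewed sample has a far larger cumulant norm
```

With n this large relative to p the separation is obvious; the paper's hardness regime is the opposite corner, n ≪ p^{d/2}, where no such gap is visible to low-degree statistics.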

Figures

Figures reproduced from arXiv: 2603.26029 by Anru R. Zhang, Runshi Tang, Yuefeng Han.

Figure 1. Phase diagram of statistical and computational limits in moment and cumulant tensor …
Original abstract

We study estimation and detection of high-order moment and cumulant tensors from $n$ i.i.d. observations of a $p$-dimensional random vector, with performance measured in tensor spectral norm. Under sub-Gaussianity, we show that the minimax rate for estimating the order-$d$ moment and cumulant tensors is $\sqrt{p/n}\wedge 1$. In contrast to covariance estimation, the sample moment tensor is generally not rate-optimal for $d\ge 3$, and we construct an estimator that attains the minimax rate up to logarithmic factors. On the computational side, we study testing whether the $d$-th order cumulant tensor vanishes after whitening. Using the low-degree polynomial framework, we provide evidence that detection is computationally hard when $n\ll p^{d/2}$. At the same time, we identify a regime in which an efficiently computable estimator achieves error smaller than the separation at which low-degree tests can reliably distinguish the null from the alternative. This reveals an unusual reverse detection–estimation gap: computationally efficient detection can be harder than computationally efficient estimation. The underlying reason is that the relevant loss, tensor spectral norm, is itself NP-hard to compute, creating a new form of computational–statistical gap.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript derives minimax rates of √(p/n) ∧ 1 for estimating order-d moment and cumulant tensors in spectral norm under sub-Gaussian observations. It shows the empirical moment tensor is suboptimal for d ≥ 3 and constructs a near-optimal estimator attaining the rate up to logarithmic factors. For detection of non-vanishing cumulants after whitening, the low-degree polynomial method is used to indicate computational hardness when n ≪ p^{d/2}. The paper identifies a regime in which an efficient estimator achieves spectral-norm error below the low-degree detection threshold, establishing a reverse detection-estimation gap whose origin is the NP-hardness of computing the tensor spectral norm itself.

Significance. If the central claims hold, the work identifies an unusual computational-statistical tradeoff in which efficient estimation can succeed in a regime where efficient detection fails, driven by the computational hardness of the loss function rather than by information-theoretic limits. The minimax upper and lower bounds follow standard sub-Gaussian arguments, and the low-degree analysis is consistent with prior applications of the framework; the explicit construction of a rate-optimal estimator and the concrete regime exhibiting the reverse gap are the primary contributions.

major comments (1)
  1. [Detection analysis (low-degree section)] The reverse-gap claim in the abstract and detection section rests on low-degree polynomial evidence that detection is hard for n ≪ p^{d/2}. This framework shows that the maximum correlation of degree-k polynomials with the likelihood ratio is o(1) below the threshold, but it supplies only suggestive evidence of computational hardness rather than a rigorous proof that no polynomial-time algorithm exists. If an efficient algorithm outside the low-degree class succeeds at detection in this regime, the claimed reverse gap disappears. The manuscript should state the precise strength of this evidence and, if possible, supply a concrete test (e.g., a specific algorithm class) that would falsify the hardness claim.
minor comments (2)
  1. [Introduction] The notation distinguishing moment tensors from cumulant tensors after whitening is introduced without an explicit reference to the relevant definition; adding a short display equation or pointer to the appendix would improve readability.
  2. [Estimator construction] The logarithmic factors in the upper bound for the constructed estimator are stated but not derived in the main text; a brief sketch of where the logs arise (e.g., from union bounds or covering numbers) would help readers assess tightness.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. We address the major comment below and will incorporate the suggested clarifications in the revised version.

read point-by-point responses
  1. Referee: The reverse-gap claim in the abstract and detection section rests on low-degree polynomial evidence that detection is hard for n ≪ p^{d/2}. This framework shows that the maximum correlation of degree-k polynomials with the likelihood ratio is o(1) below the threshold, but it supplies only suggestive evidence of computational hardness rather than a rigorous proof that no polynomial-time algorithm exists. If an efficient algorithm outside the low-degree class succeeds at detection in this regime, the claimed reverse gap disappears. The manuscript should state the precise strength of this evidence and, if possible, supply a concrete test (e.g., a specific algorithm class) that would falsify the hardness claim.

    Authors: We agree that the low-degree polynomial method provides suggestive evidence of computational hardness rather than a rigorous unconditional proof. In the revised manuscript, we will explicitly state in the detection section (and update the abstract accordingly) that the hardness claim is evidence under the low-degree framework, which is a standard but not exhaustive proxy for polynomial-time algorithms. We will also add a brief discussion of falsifying tests, noting that the claim would be refuted by the discovery of an efficient algorithm (for instance, via convex relaxations such as semidefinite programming or tensor power iteration variants) that reliably detects non-vanishing cumulants for n ≪ p^{d/2} while remaining outside the low-degree class. This clarification preserves the paper's contributions while accurately qualifying the computational evidence. revision: yes

Circularity Check

0 steps flagged

No significant circularity; the rates and the gap rest on external minimax arguments and the low-degree framework.

full rationale

The estimation minimax rate √(p/n) ∧ 1 is obtained from standard sub-Gaussian concentration and tensor-norm arguments that do not reduce to any fitted parameter inside the paper. The detection threshold n ≪ p^{d/2} is imported from the established low-degree polynomial method (external literature), which is used only to supply suggestive computational-hardness evidence rather than a self-contained proof. The reverse gap is then obtained by direct comparison of these two independently sourced thresholds. No equation, estimator, or claim is defined in terms of itself, no internal fit is relabeled as a prediction, and no load-bearing step collapses to a self-citation whose validity is presupposed by the present work. The analysis is therefore checked against external benchmarks rather than against itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the sub-Gaussian tail assumption for concentration and on the low-degree polynomial method as a proxy for computational hardness; no free parameters or new entities are introduced.

axioms (1)
  • domain assumption The p-dimensional random vector has sub-Gaussian marginals
    Invoked to obtain concentration bounds for the sample moment tensors in the minimax analysis.

pith-pipeline@v0.9.0 · 5531 in / 1271 out tokens · 34777 ms · 2026-05-14T23:26:47.939969+00:00 · methodology

discussion (0)

