pith. sign in

arxiv: 2606.28573 · v1 · pith:MHLU6MWInew · submitted 2026-06-26 · 💻 cs.LG · math.ST· stat.TH

Replica Symmetry Breaking and Algorithmic Thresholds in Empirical Risk Minimization under Multi-Index Model

Pith reviewed 2026-06-30 00:39 UTC · model grok-4.3

classification 💻 cs.LG math.STstat.TH
keywords multi-index modelempirical risk minimizationapproximate message passingreplica symmetry breakingalgorithmic thresholdshigh-dimensional asymptoticsnon-convex optimization
0
0 comments X

The pith

An incremental approximate message passing algorithm achieves a training error conjectured to be optimal among all polynomial-time methods for empirical risk minimization in multi-index models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines high-dimensional supervised learning where the response depends on an unknown k-dimensional subspace of the features. It introduces the IAMP algorithm to optimize an m-dimensional projection model and derives exact asymptotic expressions for its training error and the test-training error relationship as n and d grow with fixed ratio alpha. This provides a concrete picture of the accessible optima in non-convex landscapes for efficient algorithms.

Core claim

We propose an incremental approximate message passing (IAMP) algorithm and precisely characterize the training error it achieves, as well as the relation between test and training error, in the high dimensional asymptotics n,d→∞, with n/d→α∈(0,+∞). Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms.

What carries the argument

Incremental approximate message passing (IAMP) algorithm, which incrementally constructs solutions while characterizing the empirical risk in the multi-index setting.

If this is right

  • The training error of IAMP is exactly characterized in the asymptotic limit.
  • The relationship between test and training error is precisely described.
  • This performance level is expected to be the best achievable by any polynomial-time algorithm.
  • Gradient-based methods are limited to the regions accessible by IAMP in this model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar algorithmic thresholds may appear in more general neural network architectures.
  • The replica symmetry breaking phenomena likely govern the computational hardness in these ERM problems.
  • Finite-size simulations could test the accuracy of the high-dimensional predictions for moderate n and d.

Load-bearing premise

The optimality among polynomial-time algorithms is transferred from earlier related models without a dedicated proof for the multi-index case.

What would settle it

Demonstrating a polynomial-time algorithm that attains strictly lower training error than the IAMP prediction for some choice of k, m, and alpha in the high-dimensional limit would disprove the optimality conjecture.

Figures

Figures reproduced from arXiv: 2606.28573 by Andrea Montanari, Kangjie Zhou.

Figure 1
Figure 1. Figure 1: Left panel: The limiting overlap with w∗ for Bayes AMP, ERM (value achieved by our two-stage algorithm), and projected gradient descent (PGD). Right panel: The limiting training error (normalized empirical correlation) and test error (normalized population correlation) for these three methods. These values are computed under the limit where α → ∞ and λ → 0 such that αλ → α. See Section 5 for a comprehensiv… view at source ↗
Figure 2
Figure 2. Figure 2: Left panel: The limiting overlap with w∗ for Bayes AMP, ERM and PGD. Right panel: The normalized limiting empirical and population correlation for these three methods. In our implementation of PGD, we set d = 100 and α = 200, sweeping α ∈ [0, 10] so that λ ranges in [0, 0.05]. All reported numerical results are averaged over 100 independent runs of PGD. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Left panel: The limiting overlap with w∗ for Bayes AMP, ERM and PGD, under different choices of the activation σ. Right panel: The normalized limiting empirical and population correlation for these three methods. In this experiment, we fix σ1 = 1 and vary k ∈ {2, 3} and σk ∈ {1, 2, 5}. The top and bottom rows present our results for k = 2 and k = 3, respectively. The remaining experimental setup is the sam… view at source ↗
read the original abstract

Modern machine learning models are trained by optimizing high-dimensional non-convex empirical risk functions. Such cost functions can have a multitude of local optima and yet, gradient-based optimization appears to converge to near-global optima. Within a simple supervised learning setting, we develop a precise picture of which parts of the empirical risk landscape are accessible by polynomial-time algorithms. We are given i.i.d. pairs $\{(\boldsymbol{x}_i,y_i):\; 1 \le i\le n\}$ with $\boldsymbol{x}_i\in \mathbb{R}^d$ standard Gaussian feature vectors, and $y_i\in\mathbb{R}$ response variables that depend on $\boldsymbol{x}_i$ through their projections on an unknown $k$-dimensional subspace. We use empirical risk minimization to learn a model that depends on an $m$-dimensional projection of the data (e.g., an $m$-neurons neural network). We propose an incremental approximate message passing (IAMP) algorithm and precisely characterize the training error it achieves, as well as the relation between test and training error, in the high dimensional asymptotics $n,d\to\infty$, with $n/d\to\alpha \in (0, +\infty)$. Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper considers supervised learning with responses depending on an unknown k-dimensional subspace of Gaussian features. It introduces an incremental approximate message passing (IAMP) algorithm for empirical risk minimization over an m-dimensional projection (e.g., an m-neuron network) and derives the high-dimensional asymptotic training error of IAMP together with the relation between its test and training errors, in the limit n,d→∞ with n/d→α∈(0,∞). The manuscript states that, based on earlier work in related models, this performance is expected to be optimal among all polynomial-time algorithms.

Significance. A rigorous high-dimensional characterization of IAMP under the multi-index model would clarify which parts of the non-convex ERM landscape are reachable by efficient algorithms and would connect replica-symmetry-breaking analyses to concrete algorithmic thresholds. The work could therefore inform both the statistical-physics approach to learning and the design of practical subspace-learning procedures, provided the optimality expectation is placed on firmer footing.

major comments (1)
  1. [Abstract] Abstract (final sentence): The claim that IAMP performance is optimal among polynomial-time algorithms rests on an expectation transferred from 'earlier work in related models' without a new reduction, state-evolution fixed-point analysis, or explicit verification that the multi-index structure does not introduce additional poly-time barriers (new fixed points or phase transitions) absent from the single-index or related cases referenced. This transfer is load-bearing for the title's reference to 'Algorithmic Thresholds' and for the central narrative about accessible regions of the ERM landscape.

Simulated Author's Rebuttal

1 responses · 1 unresolved

We thank the referee for their careful reading and for identifying a key point regarding the presentation of our optimality expectation. We address the comment below and propose revisions to clarify the scope of our claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract (final sentence): The claim that IAMP performance is optimal among polynomial-time algorithms rests on an expectation transferred from 'earlier work in related models' without a new reduction, state-evolution fixed-point analysis, or explicit verification that the multi-index structure does not introduce additional poly-time barriers (new fixed points or phase transitions) absent from the single-index or related cases referenced. This transfer is load-bearing for the title's reference to 'Algorithmic Thresholds' and for the central narrative about accessible regions of the ERM landscape.

    Authors: We agree that the manuscript does not contain a new reduction, state-evolution analysis of additional fixed points, or explicit verification ruling out multi-index-specific poly-time barriers. The optimality statement is presented as an expectation transferred from prior results on related models (single-index and planted problems with comparable structure), where analogous AMP algorithms achieve conjectured optimality. The core contribution remains the high-dimensional characterization of training error and test-training relation for IAMP under the multi-index ERM. We will revise the abstract to replace 'we expect that the performance achieved by our algorithm is optimal' with 'we conjecture, based on earlier work in related models, that the performance achieved by our algorithm is optimal among polynomial-time methods,' making the conjectural nature explicit. We will also adjust the title and narrative to refer to 'algorithmic thresholds achieved by IAMP' rather than implying proven optimality across all poly-time methods. These changes address the load-bearing concern without altering the technical results. revision: yes

standing simulated objections not resolved
  • A rigorous reduction or state-evolution analysis confirming that the multi-index model introduces no new polynomial-time barriers beyond those in the referenced single-index cases.

Circularity Check

1 steps flagged

Optimality expectation transferred from prior related models without new derivation in this paper

specific steps
  1. self citation load bearing [Abstract]
    "Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms."

    The optimality claim among polynomial-time algorithms is not derived or proven within this paper for the multi-index model; it is instead transferred from prior work whose authors overlap with the present authors. No new reduction, fixed-point analysis, or argument is provided showing that multi-index structure introduces no additional poly-time barriers.

full rationale

The paper's core contribution is a precise asymptotic characterization of the training error achieved by the proposed IAMP algorithm (and its relation to test error) under the multi-index model. This appears to be derived internally via the algorithm definition and high-dimensional analysis. However, the additional statement that this performance is expected to be optimal among polynomial-time algorithms is justified only by reference to 'earlier work in related models' without a new proof or state-evolution argument specific to the multi-index setting. This matches the self-citation load-bearing pattern for the optimality aspect, but the central characterization retains independent content, so the overall circularity is moderate rather than total.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the central claim rests on the high-dimensional limit and on optimality results imported from prior models.

axioms (2)
  • domain assumption High-dimensional asymptotics n,d→∞ with n/d→α fixed
    The precise characterization of training and test error is stated to hold in this limit.
  • domain assumption Optimality of IAMP among poly-time algorithms transfers from earlier related models
    Explicitly invoked in the final sentence of the abstract.

pith-pipeline@v0.9.1-grok · 5769 in / 1245 out tokens · 36880 ms · 2026-06-30T00:39:14.159155+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 8 canonical work pages · 2 internal anchors

  1. [1]

    Benjamin Aubin, Antoine Maillard, Florent Krzakala, Nicolas Macris, and Lenka Zdeborov \'a , The committee machine: Computational to statistical gaps in learning a two-layers neural network, Advances in Neural Information Processing Systems 31 (2018)

  2. [2]

    11, 2282--2330

    Gerard Ben Arous, Song Mei, Andrea Montanari, and Mihai Nica, The landscape of the spiked tensor model, Communications on Pure and Applied Mathematics 72 (2019), no. 11, 2282--2330

  3. [3]

    609--633

    Antonio Auffinger, Andrea Montanari, and Eliran Subag, Optimization of random high-dimensional functions: Structure and algorithms, Spin Glass Theory and Far Beyond: Replica Symmetry Breaking After 40 Years, World Scientific, 2023, pp. 609--633

  4. [4]

    Kiana Asgari, Andrea Montanari, and Basil Saeed, Local minima of the empirical risk in high dimension: General theorems and convex examples, Annals of Statistics (2026)

  5. [5]

    12, 5451--5460

    Jean Barbier, Florent Krzakala, Nicolas Macris, L \'e o Miolane, and Lenka Zdeborov \'a , Optimal errors and phase transitions in high-dimensional generalized linear models, Proceedings of the National Academy of Sciences 116 (2019), no. 12, 5451--5460

  6. [6]

    2, 764--785

    Mohsen Bayati and Andrea Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Transactions on Information Theory 57 (2011), no. 2, 764--785

  7. [7]

    4, 1997--2017

    , The lasso risk for gaussian matrices, IEEE Transactions on Information Theory 58 (2011), no. 4, 1997--2017

  8. [8]

    5, 822--883

    Yuxin Chen and Emmanuel J Cand \`e s, Solving random quadratic systems of equations is nearly as easy as solving linear systems, Communications on pure and applied mathematics 70 (2017), no. 5, 822--883

  9. [9]

    Alexander S Cherny, Singular stochastic differential equations, Springer Science & Business Media, 2005

  10. [10]

    1078--1141

    Michael Celentano, Andrea Montanari, and Yuchen Wu, The estimation error of general first order methods, Conference on Learning Theory, PMLR, 2020, pp. 1078--1141

  11. [11]

    1, 129--173

    Wei-Kuo Chen and Arnab Sen, Parisi formula, disorder chaos and fluctuation for the ground state energy in the spherical mixed p-spin models, Communications in Mathematical Physics 350 (2017), no. 1, 129--173

  12. [12]

    Persi Diaconis and David Freedman, Asymptotics of graphical projection pursuit, The Annals of Statistics (1984), 793--815

  13. [13]

    David Donoho and Andrea Montanari, High dimensional robust M-estimation: Asymptotic variance via approximate message passing , Probability Theory and Related Fields 166 (2016), 935--969

  14. [14]

    6, 2922--2960

    Ahmed El Alaoui, Andrea Montanari, and Mark Sellke, Optimization of mean-field spin glasses, The Annals of Probability 49 (2021), no. 6, 2922--2960

  15. [15]

    1, 95--175

    Noureddine El Karoui, On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators, Probability Theory and Related Fields 170 (2018), no. 1, 95--175

  16. [16]

    9, 881--890

    Jerome H Friedman and John W Tukey, A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on computers 100 (1974), no. 9, 881--890

  17. [17]

    1, 180--205

    David Gamarnik and Aukosh Jagannath, The overlap gap property and approximate message passing algorithms for p-spin models, The Annals of Probability 49 (2021), no. 1, 180--205

  18. [18]

    G Gy \"o rgyi and P Reimann, Beyond storage capacity in a single model neuron: Continuous replica symmetry breaking, Journal of Statistical Physics 101 (2000), 679--702

  19. [19]

    369--376

    David Gamarnik and Madhu Sudan, Limits of local algorithms over sparse random graphs, Proceedings of the 5th conference on Innovations in theoretical computer science, 2014, pp. 369--376

  20. [20]

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, and Aidan Clark, Training compute-optimal large language models, arXiv:2203.15556 (2022)

  21. [21]

    312--322

    Brice Huang and Mark Sellke, Tight Lipschitz hardness for optimizing mean field spin glasses , 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2022, pp. 312--322

  22. [22]

    2, 1--42

    , Optimization algorithms for multi-species spherical spin glasses, Journal of Statistical Physics 191 (2024), no. 2, 1--42

  23. [23]

    2, 115--144

    Adel Javanmard and Andrea Montanari, State evolution for general approximate message passing algorithms, with applications to spatial coupling, Information and Inference: A Journal of the IMA 2 (2013), no. 2, 115--144

  24. [24]

    3, 979--1017

    Aukosh Jagannath and Ian Tobasco, Low temperature asymptotics of spherical mean field spin glasses, Communications in Mathematical Physics 352 (2017), no. 3, 979--1017

  25. [25]

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei, Scaling laws for neural language models, arXiv preprint arXiv:2001.08361 (2020)

  26. [26]

    Antoine Maillard, Tony Bonnaire, and Giulio Biroli, Topological exploration of high-dimensional empirical risk landscapes: general approach, and applications to phase retrieval, arXiv preprint arXiv:2602.17779 (2026)

  27. [27]

    1445--1450

    Marco Mondelli and Andrea Montanari, Fundamental limits of weak recovery with applications to phase retrieval, Conference On Learning Theory, PMLR, 2018, pp. 1445--1450

  28. [28]

    A. Montanari, Optimization of the sherrington-kirkpatrick hamiltonian, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS) (Los Alamitos, CA, USA), IEEE Computer Society, nov 2019, pp. 1417--1433

  29. [29]

    Andrea Montanari, Optimization of random cost functions and statistical physics, Proceedings of the 2023 Solvay Conference on Physics, 2024

  30. [30]

    Andrea Montanari and Emile Richard, A statistical model for tensor pca, Advances in neural information processing systems 27 (2014)

  31. [31]

    1, 1--173

    Andrea Montanari and Subhabrata Sen, A friendly tutorial on mean-field spin glass techniques for non-physicists, Foundations and Trends in Machine Learning 17 (2024), no. 1, 1--173

  32. [32]

    Andrea Montanari and Basil Saeed, Topological trivialization in non-convex empirical risk minimization, arXiv preprint arXiv:2602.14969 (2026)

  33. [33]

    Andrea Montanari and Yuchen Wu, Statistically optimal first order algorithms: A proof via orthogonalization, arXiv preprint arXiv:2201.05101 (2022)

  34. [34]

    Andrea Montanari and Kangjie Zhou, Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks, arXiv preprint arXiv:2206.06526 (2022)

  35. [35]

    , Which exceptional low-dimensional projections of a gaussian point cloud can be found in polynomial time?, arXiv preprint arXiv:2406.02970 (2024)

  36. [36]

    David Nualart, The malliavin calculus and related topics, Springer, 2006

  37. [37]

    Shai Shalev-Shwartz and Shai Ben-David, Understanding machine learning: From theory to algorithms, Cambridge University Press, 2014

  38. [38]

    5, 1021--1044

    Eliran Subag, Following the ground states of full-rsb spherical spin glasses, Communications on Pure and Applied Mathematics 74 (2021), no. 5, 1021--1044

  39. [39]

    54, Springer Science & Business Media, 2010

    Michel Talagrand, Mean field models for spin glasses: Volume i: Basic examples, vol. 54, Springer Science & Business Media, 2010

  40. [40]

    1683--1709

    Christos Thrampoulidis, Samet Oymak, and Babak Hassibi, Regularized linear regression: A precise analysis of the estimation error, Conference on Learning Theory, PMLR, 2015, pp. 1683--1709

  41. [41]

    Matteo Vilucchio, Yatin Dandi, Mat \'e o Pirio Rossignol, Cedric Gerbelot, and Florent Krzakala, Asymptotics of non-convex generalized linear models in high-dimensions: A proof of the replica formula, arXiv preprint arXiv:2502.20003 (2025)