Replica Symmetry Breaking and Algorithmic Thresholds in Empirical Risk Minimization under Multi-Index Model
Pith reviewed 2026-06-30 00:39 UTC · model grok-4.3
The pith
An incremental approximate message passing algorithm achieves a training error conjectured to be optimal among all polynomial-time methods for empirical risk minimization in multi-index models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose an incremental approximate message passing (IAMP) algorithm and precisely characterize the training error it achieves, as well as the relation between test and training error, in the high dimensional asymptotics n,d→∞, with n/d→α∈(0,+∞). Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms.
What carries the argument
Incremental approximate message passing (IAMP) algorithm, which incrementally constructs solutions while characterizing the empirical risk in the multi-index setting.
If this is right
- The training error of IAMP is exactly characterized in the asymptotic limit.
- The relationship between test and training error is precisely described.
- This performance level is expected to be the best achievable by any polynomial-time algorithm.
- Gradient-based methods are limited to the regions accessible by IAMP in this model.
Where Pith is reading between the lines
- Similar algorithmic thresholds may appear in more general neural network architectures.
- The replica symmetry breaking phenomena likely govern the computational hardness in these ERM problems.
- Finite-size simulations could test the accuracy of the high-dimensional predictions for moderate n and d.
Load-bearing premise
The optimality among polynomial-time algorithms is transferred from earlier related models without a dedicated proof for the multi-index case.
What would settle it
Demonstrating a polynomial-time algorithm that attains strictly lower training error than the IAMP prediction for some choice of k, m, and alpha in the high-dimensional limit would disprove the optimality conjecture.
Figures
read the original abstract
Modern machine learning models are trained by optimizing high-dimensional non-convex empirical risk functions. Such cost functions can have a multitude of local optima and yet, gradient-based optimization appears to converge to near-global optima. Within a simple supervised learning setting, we develop a precise picture of which parts of the empirical risk landscape are accessible by polynomial-time algorithms. We are given i.i.d. pairs $\{(\boldsymbol{x}_i,y_i):\; 1 \le i\le n\}$ with $\boldsymbol{x}_i\in \mathbb{R}^d$ standard Gaussian feature vectors, and $y_i\in\mathbb{R}$ response variables that depend on $\boldsymbol{x}_i$ through their projections on an unknown $k$-dimensional subspace. We use empirical risk minimization to learn a model that depends on an $m$-dimensional projection of the data (e.g., an $m$-neurons neural network). We propose an incremental approximate message passing (IAMP) algorithm and precisely characterize the training error it achieves, as well as the relation between test and training error, in the high dimensional asymptotics $n,d\to\infty$, with $n/d\to\alpha \in (0, +\infty)$. Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper considers supervised learning with responses depending on an unknown k-dimensional subspace of Gaussian features. It introduces an incremental approximate message passing (IAMP) algorithm for empirical risk minimization over an m-dimensional projection (e.g., an m-neuron network) and derives the high-dimensional asymptotic training error of IAMP together with the relation between its test and training errors, in the limit n,d→∞ with n/d→α∈(0,∞). The manuscript states that, based on earlier work in related models, this performance is expected to be optimal among all polynomial-time algorithms.
Significance. A rigorous high-dimensional characterization of IAMP under the multi-index model would clarify which parts of the non-convex ERM landscape are reachable by efficient algorithms and would connect replica-symmetry-breaking analyses to concrete algorithmic thresholds. The work could therefore inform both the statistical-physics approach to learning and the design of practical subspace-learning procedures, provided the optimality expectation is placed on firmer footing.
major comments (1)
- [Abstract] Abstract (final sentence): The claim that IAMP performance is optimal among polynomial-time algorithms rests on an expectation transferred from 'earlier work in related models' without a new reduction, state-evolution fixed-point analysis, or explicit verification that the multi-index structure does not introduce additional poly-time barriers (new fixed points or phase transitions) absent from the single-index or related cases referenced. This transfer is load-bearing for the title's reference to 'Algorithmic Thresholds' and for the central narrative about accessible regions of the ERM landscape.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for identifying a key point regarding the presentation of our optimality expectation. We address the comment below and propose revisions to clarify the scope of our claims.
read point-by-point responses
-
Referee: [Abstract] Abstract (final sentence): The claim that IAMP performance is optimal among polynomial-time algorithms rests on an expectation transferred from 'earlier work in related models' without a new reduction, state-evolution fixed-point analysis, or explicit verification that the multi-index structure does not introduce additional poly-time barriers (new fixed points or phase transitions) absent from the single-index or related cases referenced. This transfer is load-bearing for the title's reference to 'Algorithmic Thresholds' and for the central narrative about accessible regions of the ERM landscape.
Authors: We agree that the manuscript does not contain a new reduction, state-evolution analysis of additional fixed points, or explicit verification ruling out multi-index-specific poly-time barriers. The optimality statement is presented as an expectation transferred from prior results on related models (single-index and planted problems with comparable structure), where analogous AMP algorithms achieve conjectured optimality. The core contribution remains the high-dimensional characterization of training error and test-training relation for IAMP under the multi-index ERM. We will revise the abstract to replace 'we expect that the performance achieved by our algorithm is optimal' with 'we conjecture, based on earlier work in related models, that the performance achieved by our algorithm is optimal among polynomial-time methods,' making the conjectural nature explicit. We will also adjust the title and narrative to refer to 'algorithmic thresholds achieved by IAMP' rather than implying proven optimality across all poly-time methods. These changes address the load-bearing concern without altering the technical results. revision: yes
- A rigorous reduction or state-evolution analysis confirming that the multi-index model introduces no new polynomial-time barriers beyond those in the referenced single-index cases.
Circularity Check
Optimality expectation transferred from prior related models without new derivation in this paper
specific steps
-
self citation load bearing
[Abstract]
"Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms."
The optimality claim among polynomial-time algorithms is not derived or proven within this paper for the multi-index model; it is instead transferred from prior work whose authors overlap with the present authors. No new reduction, fixed-point analysis, or argument is provided showing that multi-index structure introduces no additional poly-time barriers.
full rationale
The paper's core contribution is a precise asymptotic characterization of the training error achieved by the proposed IAMP algorithm (and its relation to test error) under the multi-index model. This appears to be derived internally via the algorithm definition and high-dimensional analysis. However, the additional statement that this performance is expected to be optimal among polynomial-time algorithms is justified only by reference to 'earlier work in related models' without a new proof or state-evolution argument specific to the multi-index setting. This matches the self-citation load-bearing pattern for the optimality aspect, but the central characterization retains independent content, so the overall circularity is moderate rather than total.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption High-dimensional asymptotics n,d→∞ with n/d→α fixed
- domain assumption Optimality of IAMP among poly-time algorithms transfers from earlier related models
Reference graph
Works this paper leans on
-
[1]
Benjamin Aubin, Antoine Maillard, Florent Krzakala, Nicolas Macris, and Lenka Zdeborov \'a , The committee machine: Computational to statistical gaps in learning a two-layers neural network, Advances in Neural Information Processing Systems 31 (2018)
2018
-
[2]
11, 2282--2330
Gerard Ben Arous, Song Mei, Andrea Montanari, and Mihai Nica, The landscape of the spiked tensor model, Communications on Pure and Applied Mathematics 72 (2019), no. 11, 2282--2330
2019
-
[3]
609--633
Antonio Auffinger, Andrea Montanari, and Eliran Subag, Optimization of random high-dimensional functions: Structure and algorithms, Spin Glass Theory and Far Beyond: Replica Symmetry Breaking After 40 Years, World Scientific, 2023, pp. 609--633
2023
-
[4]
Kiana Asgari, Andrea Montanari, and Basil Saeed, Local minima of the empirical risk in high dimension: General theorems and convex examples, Annals of Statistics (2026)
2026
-
[5]
12, 5451--5460
Jean Barbier, Florent Krzakala, Nicolas Macris, L \'e o Miolane, and Lenka Zdeborov \'a , Optimal errors and phase transitions in high-dimensional generalized linear models, Proceedings of the National Academy of Sciences 116 (2019), no. 12, 5451--5460
2019
-
[6]
2, 764--785
Mohsen Bayati and Andrea Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Transactions on Information Theory 57 (2011), no. 2, 764--785
2011
-
[7]
4, 1997--2017
, The lasso risk for gaussian matrices, IEEE Transactions on Information Theory 58 (2011), no. 4, 1997--2017
2011
-
[8]
5, 822--883
Yuxin Chen and Emmanuel J Cand \`e s, Solving random quadratic systems of equations is nearly as easy as solving linear systems, Communications on pure and applied mathematics 70 (2017), no. 5, 822--883
2017
-
[9]
Alexander S Cherny, Singular stochastic differential equations, Springer Science & Business Media, 2005
2005
-
[10]
1078--1141
Michael Celentano, Andrea Montanari, and Yuchen Wu, The estimation error of general first order methods, Conference on Learning Theory, PMLR, 2020, pp. 1078--1141
2020
-
[11]
1, 129--173
Wei-Kuo Chen and Arnab Sen, Parisi formula, disorder chaos and fluctuation for the ground state energy in the spherical mixed p-spin models, Communications in Mathematical Physics 350 (2017), no. 1, 129--173
2017
-
[12]
Persi Diaconis and David Freedman, Asymptotics of graphical projection pursuit, The Annals of Statistics (1984), 793--815
1984
-
[13]
David Donoho and Andrea Montanari, High dimensional robust M-estimation: Asymptotic variance via approximate message passing , Probability Theory and Related Fields 166 (2016), 935--969
2016
-
[14]
6, 2922--2960
Ahmed El Alaoui, Andrea Montanari, and Mark Sellke, Optimization of mean-field spin glasses, The Annals of Probability 49 (2021), no. 6, 2922--2960
2021
-
[15]
1, 95--175
Noureddine El Karoui, On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators, Probability Theory and Related Fields 170 (2018), no. 1, 95--175
2018
-
[16]
9, 881--890
Jerome H Friedman and John W Tukey, A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on computers 100 (1974), no. 9, 881--890
1974
-
[17]
1, 180--205
David Gamarnik and Aukosh Jagannath, The overlap gap property and approximate message passing algorithms for p-spin models, The Annals of Probability 49 (2021), no. 1, 180--205
2021
-
[18]
G Gy \"o rgyi and P Reimann, Beyond storage capacity in a single model neuron: Continuous replica symmetry breaking, Journal of Statistical Physics 101 (2000), 679--702
2000
-
[19]
369--376
David Gamarnik and Madhu Sudan, Limits of local algorithms over sparse random graphs, Proceedings of the 5th conference on Innovations in theoretical computer science, 2014, pp. 369--376
2014
-
[20]
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, and Aidan Clark, Training compute-optimal large language models, arXiv:2203.15556 (2022)
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[21]
312--322
Brice Huang and Mark Sellke, Tight Lipschitz hardness for optimizing mean field spin glasses , 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2022, pp. 312--322
2022
-
[22]
2, 1--42
, Optimization algorithms for multi-species spherical spin glasses, Journal of Statistical Physics 191 (2024), no. 2, 1--42
2024
-
[23]
2, 115--144
Adel Javanmard and Andrea Montanari, State evolution for general approximate message passing algorithms, with applications to spatial coupling, Information and Inference: A Journal of the IMA 2 (2013), no. 2, 115--144
2013
-
[24]
3, 979--1017
Aukosh Jagannath and Ian Tobasco, Low temperature asymptotics of spherical mean field spin glasses, Communications in Mathematical Physics 352 (2017), no. 3, 979--1017
2017
-
[25]
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei, Scaling laws for neural language models, arXiv preprint arXiv:2001.08361 (2020)
work page internal anchor Pith review Pith/arXiv arXiv 2001
- [26]
-
[27]
1445--1450
Marco Mondelli and Andrea Montanari, Fundamental limits of weak recovery with applications to phase retrieval, Conference On Learning Theory, PMLR, 2018, pp. 1445--1450
2018
-
[28]
A. Montanari, Optimization of the sherrington-kirkpatrick hamiltonian, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS) (Los Alamitos, CA, USA), IEEE Computer Society, nov 2019, pp. 1417--1433
2019
-
[29]
Andrea Montanari, Optimization of random cost functions and statistical physics, Proceedings of the 2023 Solvay Conference on Physics, 2024
2023
-
[30]
Andrea Montanari and Emile Richard, A statistical model for tensor pca, Advances in neural information processing systems 27 (2014)
2014
-
[31]
1, 1--173
Andrea Montanari and Subhabrata Sen, A friendly tutorial on mean-field spin glass techniques for non-physicists, Foundations and Trends in Machine Learning 17 (2024), no. 1, 1--173
2024
- [32]
- [33]
- [34]
- [35]
-
[36]
David Nualart, The malliavin calculus and related topics, Springer, 2006
2006
-
[37]
Shai Shalev-Shwartz and Shai Ben-David, Understanding machine learning: From theory to algorithms, Cambridge University Press, 2014
2014
-
[38]
5, 1021--1044
Eliran Subag, Following the ground states of full-rsb spherical spin glasses, Communications on Pure and Applied Mathematics 74 (2021), no. 5, 1021--1044
2021
-
[39]
54, Springer Science & Business Media, 2010
Michel Talagrand, Mean field models for spin glasses: Volume i: Basic examples, vol. 54, Springer Science & Business Media, 2010
2010
-
[40]
1683--1709
Christos Thrampoulidis, Samet Oymak, and Babak Hassibi, Regularized linear regression: A precise analysis of the estimation error, Conference on Learning Theory, PMLR, 2015, pp. 1683--1709
2015
- [41]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.