Replica Symmetry Breaking and Algorithmic Thresholds in Empirical Risk Minimization under Multi-Index Model

Andrea Montanari; Kangjie Zhou

arxiv: 2606.28573 · v1 · pith:MHLU6MWInew · submitted 2026-06-26 · 💻 cs.LG · math.ST· stat.TH

Replica Symmetry Breaking and Algorithmic Thresholds in Empirical Risk Minimization under Multi-Index Model

Andrea Montanari , Kangjie Zhou This is my paper

Pith reviewed 2026-06-30 00:39 UTC · model grok-4.3

classification 💻 cs.LG math.STstat.TH

keywords multi-index modelempirical risk minimizationapproximate message passingreplica symmetry breakingalgorithmic thresholdshigh-dimensional asymptoticsnon-convex optimization

0 comments

The pith

An incremental approximate message passing algorithm achieves a training error conjectured to be optimal among all polynomial-time methods for empirical risk minimization in multi-index models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines high-dimensional supervised learning where the response depends on an unknown k-dimensional subspace of the features. It introduces the IAMP algorithm to optimize an m-dimensional projection model and derives exact asymptotic expressions for its training error and the test-training error relationship as n and d grow with fixed ratio alpha. This provides a concrete picture of the accessible optima in non-convex landscapes for efficient algorithms.

Core claim

We propose an incremental approximate message passing (IAMP) algorithm and precisely characterize the training error it achieves, as well as the relation between test and training error, in the high dimensional asymptotics n,d→∞, with n/d→α∈(0,+∞). Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms.

What carries the argument

Incremental approximate message passing (IAMP) algorithm, which incrementally constructs solutions while characterizing the empirical risk in the multi-index setting.

If this is right

The training error of IAMP is exactly characterized in the asymptotic limit.
The relationship between test and training error is precisely described.
This performance level is expected to be the best achievable by any polynomial-time algorithm.
Gradient-based methods are limited to the regions accessible by IAMP in this model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar algorithmic thresholds may appear in more general neural network architectures.
The replica symmetry breaking phenomena likely govern the computational hardness in these ERM problems.
Finite-size simulations could test the accuracy of the high-dimensional predictions for moderate n and d.

Load-bearing premise

The optimality among polynomial-time algorithms is transferred from earlier related models without a dedicated proof for the multi-index case.

What would settle it

Demonstrating a polynomial-time algorithm that attains strictly lower training error than the IAMP prediction for some choice of k, m, and alpha in the high-dimensional limit would disprove the optimality conjecture.

Figures

Figures reproduced from arXiv: 2606.28573 by Andrea Montanari, Kangjie Zhou.

**Figure 1.** Figure 1: Left panel: The limiting overlap with w∗ for Bayes AMP, ERM (value achieved by our two-stage algorithm), and projected gradient descent (PGD). Right panel: The limiting training error (normalized empirical correlation) and test error (normalized population correlation) for these three methods. These values are computed under the limit where α → ∞ and λ → 0 such that αλ → α. See Section 5 for a comprehensiv… view at source ↗

**Figure 2.** Figure 2: Left panel: The limiting overlap with w∗ for Bayes AMP, ERM and PGD. Right panel: The normalized limiting empirical and population correlation for these three methods. In our implementation of PGD, we set d = 100 and α = 200, sweeping α ∈ [0, 10] so that λ ranges in [0, 0.05]. All reported numerical results are averaged over 100 independent runs of PGD. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗

**Figure 3.** Figure 3: Left panel: The limiting overlap with w∗ for Bayes AMP, ERM and PGD, under different choices of the activation σ. Right panel: The normalized limiting empirical and population correlation for these three methods. In this experiment, we fix σ1 = 1 and vary k ∈ {2, 3} and σk ∈ {1, 2, 5}. The top and bottom rows present our results for k = 2 and k = 3, respectively. The remaining experimental setup is the sam… view at source ↗

read the original abstract

Modern machine learning models are trained by optimizing high-dimensional non-convex empirical risk functions. Such cost functions can have a multitude of local optima and yet, gradient-based optimization appears to converge to near-global optima. Within a simple supervised learning setting, we develop a precise picture of which parts of the empirical risk landscape are accessible by polynomial-time algorithms. We are given i.i.d. pairs $\{(\boldsymbol{x}_i,y_i):\; 1 \le i\le n\}$ with $\boldsymbol{x}_i\in \mathbb{R}^d$ standard Gaussian feature vectors, and $y_i\in\mathbb{R}$ response variables that depend on $\boldsymbol{x}_i$ through their projections on an unknown $k$-dimensional subspace. We use empirical risk minimization to learn a model that depends on an $m$-dimensional projection of the data (e.g., an $m$-neurons neural network). We propose an incremental approximate message passing (IAMP) algorithm and precisely characterize the training error it achieves, as well as the relation between test and training error, in the high dimensional asymptotics $n,d\to\infty$, with $n/d\to\alpha \in (0, +\infty)$. Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper characterizes training error for an incremental AMP algorithm in the multi-index ERM setting but rests its poly-time optimality expectation on transfer from prior overlapping work without a fresh argument.

read the letter

The core contribution is a high-dimensional asymptotic analysis of an incremental approximate message passing algorithm applied to empirical risk minimization under a multi-index model. It gives explicit characterizations of the training error and the training-test error relation as n and d go to infinity with fixed ratio.

This extends the authors' earlier message-passing results to the multi-index case and introduces the incremental variant, which appears to be the main technical step. The model setup with unknown k-dimensional subspaces and m-dimensional projections is stated clearly.

The characterization itself is the part that could be useful if the state-evolution derivations are carried through without gaps. Readers already working in this line will recognize the framework.

The soft spot is the optimality claim. The abstract says the algorithm's performance is expected to be optimal among polynomial-time methods based on earlier work in related models. No new reduction, fixed-point analysis, or argument is supplied to show that the multi-index structure does not introduce additional barriers not present in the single-index cases. The reliance on transfer from overlapping prior papers makes the claim an expectation rather than a derived result for this setting.

This is for people tracking algorithmic thresholds and replica methods in high-dimensional statistics. A reader who knows the single-index AMP literature will extract the extension quickly.

It deserves a serious referee because the error characterization, if rigorous, adds a concrete data point even if the optimality statement needs tightening or explicit caveats.

Referee Report

1 major / 0 minor

Summary. The paper considers supervised learning with responses depending on an unknown k-dimensional subspace of Gaussian features. It introduces an incremental approximate message passing (IAMP) algorithm for empirical risk minimization over an m-dimensional projection (e.g., an m-neuron network) and derives the high-dimensional asymptotic training error of IAMP together with the relation between its test and training errors, in the limit n,d→∞ with n/d→α∈(0,∞). The manuscript states that, based on earlier work in related models, this performance is expected to be optimal among all polynomial-time algorithms.

Significance. A rigorous high-dimensional characterization of IAMP under the multi-index model would clarify which parts of the non-convex ERM landscape are reachable by efficient algorithms and would connect replica-symmetry-breaking analyses to concrete algorithmic thresholds. The work could therefore inform both the statistical-physics approach to learning and the design of practical subspace-learning procedures, provided the optimality expectation is placed on firmer footing.

major comments (1)

[Abstract] Abstract (final sentence): The claim that IAMP performance is optimal among polynomial-time algorithms rests on an expectation transferred from 'earlier work in related models' without a new reduction, state-evolution fixed-point analysis, or explicit verification that the multi-index structure does not introduce additional poly-time barriers (new fixed points or phase transitions) absent from the single-index or related cases referenced. This transfer is load-bearing for the title's reference to 'Algorithmic Thresholds' and for the central narrative about accessible regions of the ERM landscape.

Simulated Author's Rebuttal

1 responses · 1 unresolved

We thank the referee for their careful reading and for identifying a key point regarding the presentation of our optimality expectation. We address the comment below and propose revisions to clarify the scope of our claims.

read point-by-point responses

Referee: [Abstract] Abstract (final sentence): The claim that IAMP performance is optimal among polynomial-time algorithms rests on an expectation transferred from 'earlier work in related models' without a new reduction, state-evolution fixed-point analysis, or explicit verification that the multi-index structure does not introduce additional poly-time barriers (new fixed points or phase transitions) absent from the single-index or related cases referenced. This transfer is load-bearing for the title's reference to 'Algorithmic Thresholds' and for the central narrative about accessible regions of the ERM landscape.

Authors: We agree that the manuscript does not contain a new reduction, state-evolution analysis of additional fixed points, or explicit verification ruling out multi-index-specific poly-time barriers. The optimality statement is presented as an expectation transferred from prior results on related models (single-index and planted problems with comparable structure), where analogous AMP algorithms achieve conjectured optimality. The core contribution remains the high-dimensional characterization of training error and test-training relation for IAMP under the multi-index ERM. We will revise the abstract to replace 'we expect that the performance achieved by our algorithm is optimal' with 'we conjecture, based on earlier work in related models, that the performance achieved by our algorithm is optimal among polynomial-time methods,' making the conjectural nature explicit. We will also adjust the title and narrative to refer to 'algorithmic thresholds achieved by IAMP' rather than implying proven optimality across all poly-time methods. These changes address the load-bearing concern without altering the technical results. revision: yes

standing simulated objections not resolved

A rigorous reduction or state-evolution analysis confirming that the multi-index model introduces no new polynomial-time barriers beyond those in the referenced single-index cases.

Circularity Check

1 steps flagged

Optimality expectation transferred from prior related models without new derivation in this paper

specific steps

self citation load bearing [Abstract]
"Based on earlier work in related models, we expect that the performance achieved by our algorithm is optimal among polynomial-time algorithms."

The optimality claim among polynomial-time algorithms is not derived or proven within this paper for the multi-index model; it is instead transferred from prior work whose authors overlap with the present authors. No new reduction, fixed-point analysis, or argument is provided showing that multi-index structure introduces no additional poly-time barriers.

full rationale

The paper's core contribution is a precise asymptotic characterization of the training error achieved by the proposed IAMP algorithm (and its relation to test error) under the multi-index model. This appears to be derived internally via the algorithm definition and high-dimensional analysis. However, the additional statement that this performance is expected to be optimal among polynomial-time algorithms is justified only by reference to 'earlier work in related models' without a new proof or state-evolution argument specific to the multi-index setting. This matches the self-citation load-bearing pattern for the optimality aspect, but the central characterization retains independent content, so the overall circularity is moderate rather than total.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the central claim rests on the high-dimensional limit and on optimality results imported from prior models.

axioms (2)

domain assumption High-dimensional asymptotics n,d→∞ with n/d→α fixed
The precise characterization of training and test error is stated to hold in this limit.
domain assumption Optimality of IAMP among poly-time algorithms transfers from earlier related models
Explicitly invoked in the final sentence of the abstract.

pith-pipeline@v0.9.1-grok · 5769 in / 1245 out tokens · 36880 ms · 2026-06-30T00:39:14.159155+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 8 canonical work pages · 2 internal anchors

[1]

Benjamin Aubin, Antoine Maillard, Florent Krzakala, Nicolas Macris, and Lenka Zdeborov \'a , The committee machine: Computational to statistical gaps in learning a two-layers neural network, Advances in Neural Information Processing Systems 31 (2018)

2018
[2]

11, 2282--2330

Gerard Ben Arous, Song Mei, Andrea Montanari, and Mihai Nica, The landscape of the spiked tensor model, Communications on Pure and Applied Mathematics 72 (2019), no. 11, 2282--2330

2019
[3]

609--633

Antonio Auffinger, Andrea Montanari, and Eliran Subag, Optimization of random high-dimensional functions: Structure and algorithms, Spin Glass Theory and Far Beyond: Replica Symmetry Breaking After 40 Years, World Scientific, 2023, pp. 609--633

2023
[4]

Kiana Asgari, Andrea Montanari, and Basil Saeed, Local minima of the empirical risk in high dimension: General theorems and convex examples, Annals of Statistics (2026)

2026
[5]

12, 5451--5460

Jean Barbier, Florent Krzakala, Nicolas Macris, L \'e o Miolane, and Lenka Zdeborov \'a , Optimal errors and phase transitions in high-dimensional generalized linear models, Proceedings of the National Academy of Sciences 116 (2019), no. 12, 5451--5460

2019
[6]

2, 764--785

Mohsen Bayati and Andrea Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Transactions on Information Theory 57 (2011), no. 2, 764--785

2011
[7]

4, 1997--2017

, The lasso risk for gaussian matrices, IEEE Transactions on Information Theory 58 (2011), no. 4, 1997--2017

2011
[8]

5, 822--883

Yuxin Chen and Emmanuel J Cand \`e s, Solving random quadratic systems of equations is nearly as easy as solving linear systems, Communications on pure and applied mathematics 70 (2017), no. 5, 822--883

2017
[9]

Alexander S Cherny, Singular stochastic differential equations, Springer Science & Business Media, 2005

2005
[10]

1078--1141

Michael Celentano, Andrea Montanari, and Yuchen Wu, The estimation error of general first order methods, Conference on Learning Theory, PMLR, 2020, pp. 1078--1141

2020
[11]

1, 129--173

Wei-Kuo Chen and Arnab Sen, Parisi formula, disorder chaos and fluctuation for the ground state energy in the spherical mixed p-spin models, Communications in Mathematical Physics 350 (2017), no. 1, 129--173

2017
[12]

Persi Diaconis and David Freedman, Asymptotics of graphical projection pursuit, The Annals of Statistics (1984), 793--815

1984
[13]

David Donoho and Andrea Montanari, High dimensional robust M-estimation: Asymptotic variance via approximate message passing , Probability Theory and Related Fields 166 (2016), 935--969

2016
[14]

6, 2922--2960

Ahmed El Alaoui, Andrea Montanari, and Mark Sellke, Optimization of mean-field spin glasses, The Annals of Probability 49 (2021), no. 6, 2922--2960

2021
[15]

1, 95--175

Noureddine El Karoui, On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators, Probability Theory and Related Fields 170 (2018), no. 1, 95--175

2018
[16]

9, 881--890

Jerome H Friedman and John W Tukey, A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on computers 100 (1974), no. 9, 881--890

1974
[17]

1, 180--205

David Gamarnik and Aukosh Jagannath, The overlap gap property and approximate message passing algorithms for p-spin models, The Annals of Probability 49 (2021), no. 1, 180--205

2021
[18]

G Gy \"o rgyi and P Reimann, Beyond storage capacity in a single model neuron: Continuous replica symmetry breaking, Journal of Statistical Physics 101 (2000), 679--702

2000
[19]

369--376

David Gamarnik and Madhu Sudan, Limits of local algorithms over sparse random graphs, Proceedings of the 5th conference on Innovations in theoretical computer science, 2014, pp. 369--376

2014
[20]

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, and Aidan Clark, Training compute-optimal large language models, arXiv:2203.15556 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022
[21]

312--322

Brice Huang and Mark Sellke, Tight Lipschitz hardness for optimizing mean field spin glasses , 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2022, pp. 312--322

2022
[22]

2, 1--42

, Optimization algorithms for multi-species spherical spin glasses, Journal of Statistical Physics 191 (2024), no. 2, 1--42

2024
[23]

2, 115--144

Adel Javanmard and Andrea Montanari, State evolution for general approximate message passing algorithms, with applications to spatial coupling, Information and Inference: A Journal of the IMA 2 (2013), no. 2, 115--144

2013
[24]

3, 979--1017

Aukosh Jagannath and Ian Tobasco, Low temperature asymptotics of spherical mean field spin glasses, Communications in Mathematical Physics 352 (2017), no. 3, 979--1017

2017
[25]

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei, Scaling laws for neural language models, arXiv preprint arXiv:2001.08361 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2001
[26]

Antoine Maillard, Tony Bonnaire, and Giulio Biroli, Topological exploration of high-dimensional empirical risk landscapes: general approach, and applications to phase retrieval, arXiv preprint arXiv:2602.17779 (2026)

work page arXiv 2026
[27]

1445--1450

Marco Mondelli and Andrea Montanari, Fundamental limits of weak recovery with applications to phase retrieval, Conference On Learning Theory, PMLR, 2018, pp. 1445--1450

2018
[28]

A. Montanari, Optimization of the sherrington-kirkpatrick hamiltonian, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS) (Los Alamitos, CA, USA), IEEE Computer Society, nov 2019, pp. 1417--1433

2019
[29]

Andrea Montanari, Optimization of random cost functions and statistical physics, Proceedings of the 2023 Solvay Conference on Physics, 2024

2023
[30]

Andrea Montanari and Emile Richard, A statistical model for tensor pca, Advances in neural information processing systems 27 (2014)

2014
[31]

1, 1--173

Andrea Montanari and Subhabrata Sen, A friendly tutorial on mean-field spin glass techniques for non-physicists, Foundations and Trends in Machine Learning 17 (2024), no. 1, 1--173

2024
[32]

Andrea Montanari and Basil Saeed, Topological trivialization in non-convex empirical risk minimization, arXiv preprint arXiv:2602.14969 (2026)

work page arXiv 2026
[33]

Andrea Montanari and Yuchen Wu, Statistically optimal first order algorithms: A proof via orthogonalization, arXiv preprint arXiv:2201.05101 (2022)

work page arXiv 2022
[34]

Andrea Montanari and Kangjie Zhou, Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks, arXiv preprint arXiv:2206.06526 (2022)

work page arXiv 2022
[35]

, Which exceptional low-dimensional projections of a gaussian point cloud can be found in polynomial time?, arXiv preprint arXiv:2406.02970 (2024)

work page arXiv 2024
[36]

David Nualart, The malliavin calculus and related topics, Springer, 2006

2006
[37]

Shai Shalev-Shwartz and Shai Ben-David, Understanding machine learning: From theory to algorithms, Cambridge University Press, 2014

2014
[38]

5, 1021--1044

Eliran Subag, Following the ground states of full-rsb spherical spin glasses, Communications on Pure and Applied Mathematics 74 (2021), no. 5, 1021--1044

2021
[39]

54, Springer Science & Business Media, 2010

Michel Talagrand, Mean field models for spin glasses: Volume i: Basic examples, vol. 54, Springer Science & Business Media, 2010

2010
[40]

1683--1709

Christos Thrampoulidis, Samet Oymak, and Babak Hassibi, Regularized linear regression: A precise analysis of the estimation error, Conference on Learning Theory, PMLR, 2015, pp. 1683--1709

2015
[41]

Matteo Vilucchio, Yatin Dandi, Mat \'e o Pirio Rossignol, Cedric Gerbelot, and Florent Krzakala, Asymptotics of non-convex generalized linear models in high-dimensions: A proof of the replica formula, arXiv preprint arXiv:2502.20003 (2025)

work page arXiv 2025

[1] [1]

Benjamin Aubin, Antoine Maillard, Florent Krzakala, Nicolas Macris, and Lenka Zdeborov \'a , The committee machine: Computational to statistical gaps in learning a two-layers neural network, Advances in Neural Information Processing Systems 31 (2018)

2018

[2] [2]

11, 2282--2330

Gerard Ben Arous, Song Mei, Andrea Montanari, and Mihai Nica, The landscape of the spiked tensor model, Communications on Pure and Applied Mathematics 72 (2019), no. 11, 2282--2330

2019

[3] [3]

609--633

Antonio Auffinger, Andrea Montanari, and Eliran Subag, Optimization of random high-dimensional functions: Structure and algorithms, Spin Glass Theory and Far Beyond: Replica Symmetry Breaking After 40 Years, World Scientific, 2023, pp. 609--633

2023

[4] [4]

Kiana Asgari, Andrea Montanari, and Basil Saeed, Local minima of the empirical risk in high dimension: General theorems and convex examples, Annals of Statistics (2026)

2026

[5] [5]

12, 5451--5460

Jean Barbier, Florent Krzakala, Nicolas Macris, L \'e o Miolane, and Lenka Zdeborov \'a , Optimal errors and phase transitions in high-dimensional generalized linear models, Proceedings of the National Academy of Sciences 116 (2019), no. 12, 5451--5460

2019

[6] [6]

2, 764--785

Mohsen Bayati and Andrea Montanari, The dynamics of message passing on dense graphs, with applications to compressed sensing, IEEE Transactions on Information Theory 57 (2011), no. 2, 764--785

2011

[7] [7]

4, 1997--2017

, The lasso risk for gaussian matrices, IEEE Transactions on Information Theory 58 (2011), no. 4, 1997--2017

2011

[8] [8]

5, 822--883

Yuxin Chen and Emmanuel J Cand \`e s, Solving random quadratic systems of equations is nearly as easy as solving linear systems, Communications on pure and applied mathematics 70 (2017), no. 5, 822--883

2017

[9] [9]

Alexander S Cherny, Singular stochastic differential equations, Springer Science & Business Media, 2005

2005

[10] [10]

1078--1141

Michael Celentano, Andrea Montanari, and Yuchen Wu, The estimation error of general first order methods, Conference on Learning Theory, PMLR, 2020, pp. 1078--1141

2020

[11] [11]

1, 129--173

Wei-Kuo Chen and Arnab Sen, Parisi formula, disorder chaos and fluctuation for the ground state energy in the spherical mixed p-spin models, Communications in Mathematical Physics 350 (2017), no. 1, 129--173

2017

[12] [12]

Persi Diaconis and David Freedman, Asymptotics of graphical projection pursuit, The Annals of Statistics (1984), 793--815

1984

[13] [13]

David Donoho and Andrea Montanari, High dimensional robust M-estimation: Asymptotic variance via approximate message passing , Probability Theory and Related Fields 166 (2016), 935--969

2016

[14] [14]

6, 2922--2960

Ahmed El Alaoui, Andrea Montanari, and Mark Sellke, Optimization of mean-field spin glasses, The Annals of Probability 49 (2021), no. 6, 2922--2960

2021

[15] [15]

1, 95--175

Noureddine El Karoui, On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators, Probability Theory and Related Fields 170 (2018), no. 1, 95--175

2018

[16] [16]

9, 881--890

Jerome H Friedman and John W Tukey, A projection pursuit algorithm for exploratory data analysis, IEEE Transactions on computers 100 (1974), no. 9, 881--890

1974

[17] [17]

1, 180--205

David Gamarnik and Aukosh Jagannath, The overlap gap property and approximate message passing algorithms for p-spin models, The Annals of Probability 49 (2021), no. 1, 180--205

2021

[18] [18]

G Gy \"o rgyi and P Reimann, Beyond storage capacity in a single model neuron: Continuous replica symmetry breaking, Journal of Statistical Physics 101 (2000), 679--702

2000

[19] [19]

369--376

David Gamarnik and Madhu Sudan, Limits of local algorithms over sparse random graphs, Proceedings of the 5th conference on Innovations in theoretical computer science, 2014, pp. 369--376

2014

[20] [20]

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, and Aidan Clark, Training compute-optimal large language models, arXiv:2203.15556 (2022)

work page internal anchor Pith review Pith/arXiv arXiv 2022

[21] [21]

312--322

Brice Huang and Mark Sellke, Tight Lipschitz hardness for optimizing mean field spin glasses , 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2022, pp. 312--322

2022

[22] [22]

2, 1--42

, Optimization algorithms for multi-species spherical spin glasses, Journal of Statistical Physics 191 (2024), no. 2, 1--42

2024

[23] [23]

2, 115--144

Adel Javanmard and Andrea Montanari, State evolution for general approximate message passing algorithms, with applications to spatial coupling, Information and Inference: A Journal of the IMA 2 (2013), no. 2, 115--144

2013

[24] [24]

3, 979--1017

Aukosh Jagannath and Ian Tobasco, Low temperature asymptotics of spherical mean field spin glasses, Communications in Mathematical Physics 352 (2017), no. 3, 979--1017

2017

[25] [25]

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei, Scaling laws for neural language models, arXiv preprint arXiv:2001.08361 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2001

[26] [26]

Antoine Maillard, Tony Bonnaire, and Giulio Biroli, Topological exploration of high-dimensional empirical risk landscapes: general approach, and applications to phase retrieval, arXiv preprint arXiv:2602.17779 (2026)

work page arXiv 2026

[27] [27]

1445--1450

Marco Mondelli and Andrea Montanari, Fundamental limits of weak recovery with applications to phase retrieval, Conference On Learning Theory, PMLR, 2018, pp. 1445--1450

2018

[28] [28]

A. Montanari, Optimization of the sherrington-kirkpatrick hamiltonian, 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS) (Los Alamitos, CA, USA), IEEE Computer Society, nov 2019, pp. 1417--1433

2019

[29] [29]

Andrea Montanari, Optimization of random cost functions and statistical physics, Proceedings of the 2023 Solvay Conference on Physics, 2024

2023

[30] [30]

Andrea Montanari and Emile Richard, A statistical model for tensor pca, Advances in neural information processing systems 27 (2014)

2014

[31] [31]

1, 1--173

Andrea Montanari and Subhabrata Sen, A friendly tutorial on mean-field spin glass techniques for non-physicists, Foundations and Trends in Machine Learning 17 (2024), no. 1, 1--173

2024

[32] [32]

Andrea Montanari and Basil Saeed, Topological trivialization in non-convex empirical risk minimization, arXiv preprint arXiv:2602.14969 (2026)

work page arXiv 2026

[33] [33]

Andrea Montanari and Yuchen Wu, Statistically optimal first order algorithms: A proof via orthogonalization, arXiv preprint arXiv:2201.05101 (2022)

work page arXiv 2022

[34] [34]

Andrea Montanari and Kangjie Zhou, Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks, arXiv preprint arXiv:2206.06526 (2022)

work page arXiv 2022

[35] [35]

, Which exceptional low-dimensional projections of a gaussian point cloud can be found in polynomial time?, arXiv preprint arXiv:2406.02970 (2024)

work page arXiv 2024

[36] [36]

David Nualart, The malliavin calculus and related topics, Springer, 2006

2006

[37] [37]

Shai Shalev-Shwartz and Shai Ben-David, Understanding machine learning: From theory to algorithms, Cambridge University Press, 2014

2014

[38] [38]

5, 1021--1044

Eliran Subag, Following the ground states of full-rsb spherical spin glasses, Communications on Pure and Applied Mathematics 74 (2021), no. 5, 1021--1044

2021

[39] [39]

54, Springer Science & Business Media, 2010

Michel Talagrand, Mean field models for spin glasses: Volume i: Basic examples, vol. 54, Springer Science & Business Media, 2010

2010

[40] [40]

1683--1709

Christos Thrampoulidis, Samet Oymak, and Babak Hassibi, Regularized linear regression: A precise analysis of the estimation error, Conference on Learning Theory, PMLR, 2015, pp. 1683--1709

2015

[41] [41]

Matteo Vilucchio, Yatin Dandi, Mat \'e o Pirio Rossignol, Cedric Gerbelot, and Florent Krzakala, Asymptotics of non-convex generalized linear models in high-dimensions: A proof of the replica formula, arXiv preprint arXiv:2502.20003 (2025)

work page arXiv 2025