pith. machine review for the scientific record.

arxiv: 2605.08552 · v1 · submitted 2026-05-08 · 📊 stat.ML · cs.LG

Recognition: no theorem link

Learnability and Competition in High-Dimensional Multi-Component ICA

Eser Ilke Genc, Samet Demir, Zafer Dogan

Pith reviewed 2026-05-12 01:28 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords independent component analysis · mean-field theory · high-dimensional limits · online learning · overlap dynamics · phase structure · learnability boundaries

The pith

In the high-dimensional limit, multi-component online ICA obeys a deterministic ODE for the overlap matrix that distinguishes decoupled and competition phases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an asymptotically exact mean-field theory for learning several independent components simultaneously via online ICA. It shows that the joint distribution of learned estimates and ground-truth components converges to a deterministic limit, which closes into a system of ordinary differential equations for the matrix of overlaps between learned directions and true sources. The resulting description identifies two initialization-dependent regimes: one in which estimates align with distinct components and evolve nearly independently, and another in which overlapping initializations produce orthogonality-driven conflicts that slow reorientation and delay convergence. A reader would care because the framework supplies explicit conditions on step size, data moments, and initialization that determine which components remain recoverable and how long recovery takes.
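
To fix ideas, here is a minimal sketch of the kind of iteration the theory describes: several directions updated from streaming samples and re-orthonormalized, with the overlap matrix tracked along the way. The data model, cubic nonlinearity, η/n step size, and Gram-Schmidt step are placeholder choices rather than the paper's exact algorithm, and `online_ica_overlaps` is a hypothetical helper name.

```python
import numpy as np

def online_ica_overlaps(n=4000, p=2, eta=1.0, steps=40000, seed=0):
    """Sketch of multi-component online ICA with Gram-Schmidt re-orthonormalization.

    Placeholder modelling choices (not taken from the paper): orthonormal true
    components, Laplace sources, weak isotropic noise, cubic nonlinearity,
    eta/n step size. Records the overlap matrix Q[i, j] = <w_i, u_j> every
    200 iterations.
    """
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((n, p)))     # true components: orthonormal columns
    W, _ = np.linalg.qr(rng.standard_normal((n, p)))     # estimates: random orthonormal start

    overlaps = []
    for k in range(steps):
        c = rng.laplace(size=p) / np.sqrt(2.0)           # unit-variance non-Gaussian sources
        x = U @ c + rng.standard_normal(n) / np.sqrt(n)  # observed sample (placeholder model)
        h = W.T @ x                                      # projections onto current estimates
        W = W + (eta / n) * np.outer(x, h ** 3)          # Hebbian-like stochastic update
        W, R = np.linalg.qr(W)                           # Gram-Schmidt re-orthonormalization
        W = W * np.sign(np.diag(R))                      # undo QR sign ambiguity
        if k % 200 == 0:
            overlaps.append(W.T @ U)                     # overlap matrix Q_k, shape (p, p)
    return np.array(overlaps)
```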

Core claim

In the high-dimensional limit the joint empirical distribution of learned estimates and ground-truth components converges to a deterministic process, yielding a closed ODE system for the overlap matrix between learned directions and true components. This characterization reveals a genuinely multi-component, initialization-driven phase structure: a decoupled regime, where estimates align with distinct components and evolve nearly independently, and a competition regime, where overlapping initializations induce orthogonality-driven conflicts, slow reorientation, and delayed convergence. Steady-state analysis gives explicit learnability boundaries and competition conditions linking step size, data moments, and initialization.

What carries the argument

The closed system of ordinary differential equations for the overlap matrix between learned directions and true components, derived from the mean-field limit of the stochastic online updates together with orthogonalization.
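
Schematically (the explicit right-hand side is the paper's derivation and is not reproduced here; the rescaled time t = k/n is the usual online-learning convention, assumed rather than quoted):

```latex
% Schematic notation only: Q is the overlap matrix named in the abstract; the
% explicit form of F and the exact time rescaling are the paper's results.
\[
  Q_{i,j}(t) \;=\; \big\langle w_i(t),\, u_j \big\rangle, \qquad i, j = 1, \dots, p,
\]
\[
  \frac{dQ_{i,j}}{dt} \;=\; F_{i,j}\!\big(Q;\ \eta,\ \text{data moments}\big),
  \qquad t \approx k/n .
\]
```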

If this is right

  • Larger higher-order moments shrink the interval of stable learning rates.
  • Overlapping initializations lengthen convergence times through orthogonality-driven conflicts.
  • The number of recoverable components changes in discrete steps as the learning rate is varied.
  • The phase boundaries link step size, data moments, and initialization to predict when the decoupled regime is reached.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The ODE description could be used to select initializations that avoid the competition regime and thereby shorten training.
  • Similar mean-field closures might characterize competition effects in other simultaneous orthogonalized learning rules.
  • The predicted staircase in the number of recoverable components can be checked directly on large synthetic ensembles by counting successful recoveries at different learning rates.
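
The last point can be checked mechanically. The sketch below reuses the hypothetical `online_ica_overlaps` helper from the earlier sketch; the recovery threshold and learning-rate grid are arbitrary choices.

```python
import numpy as np

# Staircase check under the same placeholder model as the earlier sketch.
def count_recovered(etas, n=4000, p=3, steps=60000, thresh=0.9, trials=5):
    counts = []
    for eta in etas:
        per_trial = []
        for t in range(trials):
            Q = online_ica_overlaps(n=n, p=p, eta=eta, steps=steps, seed=t)[-1]
            # Component j counts as recovered if some estimate aligns with it.
            per_trial.append(int(np.sum(np.max(np.abs(Q), axis=0) > thresh)))
        counts.append(np.mean(per_trial))
    return np.array(counts)

etas = np.linspace(0.2, 4.0, 20)
print(count_recovered(etas))  # a staircase shows up as flat plateaus separated by discrete jumps
```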

Load-bearing premise

The high-dimensional limit and the mean-field closure that converts the stochastic online updates plus orthogonalization into a deterministic ODE system for the overlap matrix remain valid when multiple components are learned simultaneously.

What would settle it

Simulations of the multi-component ICA algorithm in successively higher dimensions should show the empirical trajectories of the overlap matrix approaching the numerical solution of the derived ODE system, with the difference vanishing as dimension tends to infinity.
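
A minimal version of that test, under the same placeholder model as the earlier sketch: trajectories of a fixed overlap entry at increasing dimension should collapse onto a single curve. Comparing that curve against the paper's ODE solution would additionally require the explicit right-hand side, which is not reproduced here.

```python
import numpy as np

# Dimension-scan sketch; reuses the hypothetical online_ica_overlaps helper.
def overlap_traces(dims=(1000, 2000, 4000, 8000), p=2, eta=1.0, horizon=10.0):
    traces = {}
    for n in dims:
        steps = int(horizon * n)                 # same rescaled time t = k / n for every n
        Q = online_ica_overlaps(n=n, p=p, eta=eta, steps=steps, seed=0)
        t = np.arange(Q.shape[0]) * 200.0 / n    # the sketch records Q every 200 iterations
        traces[n] = (t, Q[:, 0, 0])              # follow the (1, 1) overlap entry
    return traces

for n, (t, q) in overlap_traces().items():
    print(n, q[-1])   # final values should agree across n if the limit is deterministic
```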

Figures

Figures reproduced from arXiv: 2605.08552 by Eser Ilke Genc, Samet Demir, Zafer Dogan.

Figure 1. Evolution of the joint probability limiting density for … [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2. Learning dynamics in decoupled and competition regimes. Lines denote ODE predictions, … [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3. Phase portraits of the coupled 2 × 2 system in Figure 2b. First row dynamics (Q1,1, Q1,2) evolve independently in (a), with markers representing specific timestamps: t = 0 (orange), t = 2700 (green), and t = 5000 (yellow), and the dashed red line represents the theoretically predicted competition boundary in (14). In (b), (c), and (d), we plot the second row dynamics (Q2,1, Q2,2) at the corresponding times… view at source ↗
Figure 4. Steady-state analysis reveals learnability boundaries, staircase behavior, and competition … [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5. Multi-component ICA on the Indian Pines dataset. In (a) ODE predictions accurately … [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6. Evolution of the limiting marginal densities, corresponding to the total density in Figure 1 … [PITH_FULL_IMAGE:figures/full_fig_p047_6.png] view at source ↗
Figure 7. Evolution of the joint limiting probability density for … [PITH_FULL_IMAGE:figures/full_fig_p048_7.png] view at source ↗
Figure 8. Evolution of the limiting marginal densities corresponding to the total density in Figure … [PITH_FULL_IMAGE:figures/full_fig_p048_8.png] view at source ↗
Figure 9. Competition is also visible across other nonlinearities and orthogonalization schemes. [PITH_FULL_IMAGE:figures/full_fig_p056_9.png] view at source ↗
Original abstract

Independent Component Analysis (ICA) is a foundational tool for unsupervised representation learning, yet its high-dimensional theory remains largely limited to single-component recovery. We develop an asymptotically exact mean-field theory for multi-component online ICA, capturing the coupling induced by simultaneous learning and orthogonalization. In the high-dimensional limit, the joint empirical distribution of learned estimates and ground-truth components converges to a deterministic process, yielding a closed ODE system for the overlap matrix between learned directions and true components. This characterization reveals a genuinely multi-component, initialization-driven phase structure: a decoupled regime, where estimates align with distinct components and evolve nearly independently, and a competition regime, where overlapping initializations induce orthogonality-driven conflicts, slow reorientation, and delayed convergence. Our steady-state analysis gives explicit learnability boundaries and competition conditions linking step size, data moments, and initialization. These conditions show that larger higher-order moments and competition shrink the stable learning-rate window, increase convergence times, and predict a staircase phenomenon in which the number of recoverable components changes discretely with the learning rate. Experiments on synthetic data and hyperspectral remote sensing data validate the predicted trajectories and phase behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops an asymptotically exact mean-field theory for multi-component online ICA. In the high-dimensional limit, the joint empirical distribution of learned estimates and ground-truth components converges to a deterministic process, yielding a closed ODE system for the overlap matrix between learned directions and true components. This reveals an initialization-driven phase structure consisting of a decoupled regime (estimates align with distinct components and evolve nearly independently) and a competition regime (overlapping initializations induce orthogonality-driven conflicts, slow reorientation, and delayed convergence). Steady-state analysis supplies explicit learnability boundaries and competition conditions linking step size, data moments, and initialization; these predict a staircase phenomenon in the number of recoverable components. The predictions are validated on synthetic data and hyperspectral remote sensing data.

Significance. If the mean-field closure is rigorously justified, the work would meaningfully extend single-component ICA theory to the simultaneous multi-component setting. The explicit identification of initialization-dependent regimes, the closed ODE characterization, and the resulting learnability boundaries with the staircase prediction constitute a substantive theoretical contribution that could guide both analysis and practical tuning of online ICA algorithms.

major comments (2)
  1. [Mean-field limit and ODE derivation] The central claim of an asymptotically exact closed ODE system for the overlap matrix (abstract and mean-field analysis) rests on the joint empirical distribution converging to a deterministic process whose evolution closes exactly on the overlap matrix. In the competition regime the orthogonalization step projects each direction onto the orthogonal complement of the others, coupling all learned vectors through their current Gram matrix. The manuscript must explicitly show that all required moments concentrate and that no additional state variables (pairwise overlaps among estimates or data fourth-order tensors) are needed; absent this demonstration the closure is at best approximate rather than asymptotically exact.
  2. [Steady-state analysis] The steady-state learnability boundaries and competition conditions (steady-state analysis) are asserted to link step size, data moments, and initialization independently. Without the explicit ODEs and the derivation steps that produce these boundaries, it is impossible to verify whether they are obtained from the dynamics or reduce to fitted quantities by construction. This directly affects the interpretation of the decoupled/competition phase structure and the predicted staircase phenomenon.
minor comments (2)
  1. The abstract is information-dense; a brief statement of the form of the ODE system (even without full derivation) would improve accessibility for readers unfamiliar with mean-field ICA analyses.
  2. Notation for the overlap matrix and the precise definition of the competition regime should be introduced with a short table or diagram early in the manuscript to aid cross-referencing with the experimental figures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments, which highlight important aspects of the mean-field derivation and steady-state analysis. We address each major comment point by point below and will revise the manuscript to incorporate additional explicit derivations and justifications.

read point-by-point responses
  1. Referee: [Mean-field limit and ODE derivation] The central claim of an asymptotically exact closed ODE system for the overlap matrix (abstract and mean-field analysis) rests on the joint empirical distribution converging to a deterministic process whose evolution closes exactly on the overlap matrix. In the competition regime the orthogonalization step projects each direction onto the orthogonal complement of the others, coupling all learned vectors through their current Gram matrix. The manuscript must explicitly show that all required moments concentrate and that no additional state variables (pairwise overlaps among estimates or data fourth-order tensors) are needed; absent this demonstration the closure is at best approximate rather than asymptotically exact.

    Authors: We agree that an explicit demonstration of the mean-field closure is necessary for rigor, especially accounting for the orthogonalization-induced coupling in the competition regime. In the high-dimensional limit, concentration of measure ensures that the empirical joint distribution converges to its deterministic limit, with all required moments (including those arising from the projection) expressible solely as functions of the overlap matrix between learned directions and true components. The Gram matrix of the estimates is determined by these overlaps together with the enforced orthogonality, without introducing independent pairwise overlaps among estimates or higher-order data tensors as additional state variables. We will add a new subsection to the mean-field analysis section that spells out the concentration arguments and closure steps, together with an appendix containing the full moment calculations and the explicit form of the projected dynamics. This revision will establish that the ODE system is closed on the overlap matrix. revision: yes

  2. Referee: [Steady-state analysis] The steady-state learnability boundaries and competition conditions (steady-state analysis) are asserted to link step size, data moments, and initialization independently. Without the explicit ODEs and the derivation steps that produce these boundaries, it is impossible to verify whether they are obtained from the dynamics or reduce to fitted quantities by construction. This directly affects the interpretation of the decoupled/competition phase structure and the predicted staircase phenomenon.

    Authors: We acknowledge that the original manuscript presented the steady-state boundaries without sufficient intermediate steps. These boundaries are obtained by setting the time derivatives in the closed ODE system to zero, solving the resulting algebraic equations for the fixed-point overlaps, and imposing stability via the eigenvalues of the linearized dynamics. The competition conditions follow from the same fixed-point analysis under overlapping initializations. We will include the explicit ODE equations in the main text and provide a complete step-by-step derivation of the learnability boundaries, competition thresholds, and the resulting staircase in the number of recoverable components in a new appendix. This will make clear that the predictions are derived directly from the dynamics. revision: yes
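
The fixed-point and stability recipe described in this response can be written out generically; the right-hand side F below is a placeholder, not the paper's ODE system.

```python
import numpy as np
from scipy.optimize import fsolve

# Generic version of the recipe: solve F(q*) = 0, linearize, and check the
# eigenvalues of the Jacobian at the fixed point.
def stability_of_fixed_point(F, q0, eps=1e-6):
    q_star = fsolve(F, q0)
    d = len(q_star)
    J = np.zeros((d, d))
    for j in range(d):                                   # forward-difference Jacobian
        dq = np.zeros(d)
        dq[j] = eps
        J[:, j] = (np.asarray(F(q_star + dq)) - np.asarray(F(q_star))) / eps
    eigs = np.linalg.eigvals(J)
    return q_star, eigs, bool(np.all(eigs.real < 0))     # stable iff all real parts negative

# Toy stand-in dynamics dq/dt = q - q**3: the fixed point q* = 1 is stable.
q_star, eigs, stable = stability_of_fixed_point(lambda q: q - q**3, q0=np.array([0.5]))
print(q_star, eigs, stable)
```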

Circularity Check

0 steps flagged

Mean-field closure for multi-component ICA overlap matrix is asymptotically derived without reduction to inputs

full rationale

The paper derives the closed ODE system from the claimed high-dimensional convergence of the joint empirical distribution of estimates and components. This is a standard concentration argument for online stochastic updates with orthogonalization, not a self-definitional loop or a fitted parameter renamed as a prediction. No load-bearing self-citations, uniqueness theorems from prior author work, or smuggled ansatzes are invoked for the central closure. The competition regime coupling via Gram matrix is explicitly part of the claimed deterministic process rather than an untracked extra state. Absent any quoted equation that reduces the target result to its own fitted inputs by construction, the derivation chain remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no explicit free parameters, axioms, or invented entities; the mean-field closure and high-dimensional limit are invoked at a high level without stated assumptions or fitted quantities.

pith-pipeline@v0.9.0 · 5500 in / 1224 out tokens · 39169 ms · 2026-05-12T01:28:16.508888+00:00 · methodology

