pith. machine review for the scientific record.

arXiv: 2605.11181 · v1 · submitted 2026-05-11 · 💻 cs.LG · cs.AI · cs.NA · math.NA · math.OC · stat.ML

Recognition: no theorem link

Muon is Not That Special: Random or Inverted Spectra Work Just as Well

Alex Massucco, Bálint Mucsányi, Carola-Bibiane Schönlieb, Nathaël Da Costa, Peter Zaika, Philipp Hennig, Yarin Gal, Yoav Gelberg, Zakhar Shumaylov

Pith reviewed 2026-05-13 04:07 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NA · math.NA · math.OC · stat.ML
keywords Muon optimizer · Schatten quasi-norms · non-Euclidean optimization · step-size optimality · random spectra · Kaon optimizer · GPT-2 training · alignment and descent potential

The pith

The Muon optimizer succeeds by guaranteeing optimal step sizes, not by adhering to any specific geometry.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper challenges the view that Muon excels because of non-Euclidean geometric properties akin to those of second-order methods. It introduces Freon, a family of optimizers based on Schatten (quasi-)norms that interpolates between SGD and Muon and extrapolates into the quasi-norm regime, where the best GPT-2 results occur. It then presents Kaon, which substitutes random noise for the singular values yet matches Muon's performance while retaining convergence guarantees. The authors conclude that performance depends on local alignment and descent potential, which control step-size optimality. A reader cares because this reframes why these optimizers work and questions the need for elaborate geometric constructions.

Core claim

Muon succeeds not by tracking an ideal global geometry, but by guaranteeing step-size optimality; precise geometric structure is not the key factor affecting optimization performance. Freon naturally interpolates between SGD and Muon while smoothly extrapolating into the quasi-norm regime, where the best-performing parameters lie. Kaon replaces singular values with random noise, lacks any coherent geometric structure, yet matches Muon's performance on GPT-2 training and retains classical convergence guarantees. Performance is instead controlled by the two local quantities of alignment and descent potential.

What carries the argument

Kaon optimizer, which replaces singular values with random noise while keeping singular vectors, to demonstrate that coherent geometric structure is unnecessary for performance.
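
The claims above hinge on how each optimizer transforms the gradient's singular value spectrum, so a minimal sketch may help. It follows the descriptions in this review rather than the paper's exact formulas: every optimizer in the family is assumed to take the SVD G = U S Vᵀ of the gradient and rebuild an update U f(S) Vᵀ, where SGD keeps S, Muon sets every singular value to 1, Freon applies an exponent c (the form S**c is an illustrative assumption), and Kaon draws the spectrum from random noise while keeping the singular vectors.

```python
import numpy as np

def spectral_update(grad, mode="muon", c=2 / 3, rng=None):
    """Rebuild an update direction U f(S) V^T from the gradient's SVD.

    A sketch of the optimizer family described above, not the paper's
    exact formulas: the Freon exponent form (S**c) and Kaon's noise
    distribution are illustrative assumptions.
    """
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    if mode == "sgd":            # keep the spectrum: plain gradient step
        f_S = S
    elif mode == "muon":         # orthogonalize: every singular value -> 1
        f_S = np.ones_like(S)
    elif mode == "freon":        # assumed interpolation: c=1 ~ SGD, c=0 ~ Muon
        f_S = S ** c
    elif mode == "kaon":         # random spectrum, singular vectors kept
        rng = np.random.default_rng() if rng is None else rng
        f_S = rng.uniform(0.0, 1.0, size=S.shape)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return U @ np.diag(f_S) @ Vt

# Toy usage: compare the resulting update spectra on a random "gradient".
G = np.random.default_rng(0).normal(size=(8, 4))
for mode in ("sgd", "muon", "freon", "kaon"):
    D = spectral_update(G, mode=mode)
    print(mode, np.linalg.svd(D, compute_uv=False).round(2))
```

Figure 1 below shows exactly this kind of spectrum transformation; Kaon's provocation is that even the random-spectrum row of this sketch tracks Muon's performance in the paper's GPT-2 runs.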

If this is right

  • Freon achieves peak performance in the quasi-norm regime, which cannot be represented by any unitarily invariant linear minimization oracle.
  • Kaon matches Muon performance and keeps classical convergence guarantees despite lacking coherent geometry.
  • Optimization performance is controlled by alignment and descent potential rather than global geometry.
  • Each optimizer must tune its step size around these two local quantities (a minimal formalization is sketched after this list).
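
One way to make the step-size claim concrete, under the assumption that "step-size optimality" means greedily minimizing a local quadratic model along whatever direction the optimizer proposes (the paper's precise definitions of alignment and descent potential are not reproduced here):

```python
import numpy as np

def greedy_optimal_step(g, d, H):
    """Step size minimizing the local quadratic model
    f(x - eta d) ~ f(x) - eta <g, d> + 0.5 eta^2 d^T H d.

    One plausible formalization of 'step-size optimality'; the alignment
    below is the plain cosine between g and d, which is not necessarily
    the paper's definition.
    """
    gd = float(g @ d)
    dHd = float(d @ H @ d)
    eta = gd / dHd                       # greedy-optimal step along d
    decrease = gd ** 2 / (2.0 * dHd)     # best decrease achievable along d
    alignment = gd / (np.linalg.norm(g) * np.linalg.norm(d))
    return eta, decrease, alignment

# Toy quadratic f(x) = 0.5 x^T H x with gradient g = H x.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
H = A @ A.T + 1e-3 * np.eye(6)           # ill-conditioned PSD curvature
x = rng.normal(size=6)
g = H @ x
directions = {
    "gradient (SGD-like)": g,
    "sign of gradient (geometry-agnostic stand-in)": np.sign(g),
}
for name, d in directions.items():
    eta, dec, ali = greedy_optimal_step(g, d, H)
    print(f"{name}: eta*={eta:.3g}, best decrease={dec:.3g}, alignment={ali:.2f}")
```

In this toy, once the step is tuned to the chosen direction, the achievable decrease is set by these local quantities rather than by which global norm generated the direction, which is the shape of the paper's argument.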

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This suggests that step-size guarantees can be prioritized over complex geometric modeling in optimizer design.
  • Random-spectrum methods might simplify other non-Euclidean optimizers without hurting results.
  • Experiments on tasks where geometric differences are more pronounced could show where geometry starts to matter.

Load-bearing premise

That performance equivalence on GPT-2 training between Muon, quasi-norm Freon, and random Kaon implies geometry is irrelevant rather than that the tasks and models simply do not expose geometric differences.

What would settle it

A task or model where a geometrically precise optimizer like Muon significantly outperforms a random-spectrum version like Kaon would settle whether geometry is truly irrelevant.

Figures

Figures reproduced from arXiv: 2605.11181 by Alex Massucco, Bálint Mucsányi, Carola-Bibiane Schönlieb, Nathaël Da Costa, Peter Zaika, Philipp Hennig, Yarin Gal, Yoav Gelberg, Zakhar Shumaylov.

Figure 1. Transformations of the gradient singular value spectrum under various spectral optimizers.
Figure 2. NanoGPT learning rate sensitivity. Final validation loss as the tuned learning rates are jointly scaled, averaged over three seeds with ±2 std error bars. Black outlines mark the best learning rate per optimiser. (Panel axes: optimizer step, 0 to 2000, versus validation loss; curves for Muon, Kaon, Freon c=2/3, and Freon c=3/4.)
Figure 4. Final validation loss versus learning rate for all optimizer families on WikiText-2 (118M tokens). (a) TruncatedSGD improves over SGD but fails to close the gap to Muon: suppressing large singular values is necessary but not sufficient. (b) Kaon closely matches Muon despite replacing singular values with random noise. (c) Freon with c ≈ 2/3 closely matches Muon; the optimal c lies strictly outside the ran…
Figure 5. Random-feature regression with ReLU activations. We consider the problem in Equation (5), using the setup of Davis and Drusvyatskiy [2026]. Left: training loss for GD, SpecGD, Kaon, their optimal-step variants, and two adaptive-exponent methods. Centre: effective step sizes; the optimal-GD step size highly oscillates, while the optimal-SpecGD step size is constant, being equal to SpecGD. Right: the exponent c …
Figure 6. Loss change ∆f over the joint (c, η) grid, evaluated on the full validation set. Each panel shows a different training point used for the calculation of the gradient. The dashed black curve marks the optimal learning rate per c; the green dotted line marks the actual training learning rate. The naive RF plug-in for GPT-2: attempting to map the RF model directly onto a GPT-2 architecture exposes distinct li…
Original abstract

The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we challenge this geometric narrative through three contributions, demonstrating that precise geometric structure is not the key factor affecting optimization performance. First, we introduce Freon, a family of optimizers based on Schatten (quasi-)norms, powered by a novel, provably optimal QDWH-based iterative approximation. Freon naturally interpolates between SGD and Muon, while smoothly extrapolating into the quasi-norm regime. Empirically, the best-performing Schatten parameters for GPT-2 lie strictly within the quasi-norm regime, and thus cannot be represented by any unitarily invariant LMO. Second, noting that Freon performs well across a wide range of exponents, we introduce Kaon, an absurd optimizer that replaces singular values with random noise. Despite lacking any coherent geometric structure, Kaon matches Muon's performance and retains classical convergence guarantees, proving that strict adherence to a precise geometry is practically irrelevant. Third, having shown that geometry is not the primary driver of performance, we demonstrate it is instead controlled by two local quantities: alignment and descent potential. Ultimately, each optimizer must tune its step size around these two quantities. While their dynamics are difficult to predict a-priori, evaluating them within a stochastic random feature model yields a precise insight: Muon succeeds not by tracking an ideal global geometry, but by guaranteeing step-size optimality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that Muon's success is not due to precise geometric structure (e.g., LMO or Schatten norms) but to guaranteeing local step-size optimality via alignment and descent potential. It introduces Freon (Schatten quasi-norm family with QDWH approximation, interpolating SGD/Muon and extending beyond LMO-representable regimes), shows best GPT-2 performance in the quasi-norm regime, introduces Kaon (random singular values) that matches Muon on GPT-2 while retaining classical convergence guarantees, and uses a stochastic random-feature model to derive the step-size insight.

Significance. If the central claim holds, the work would meaningfully redirect non-Euclidean optimizer research away from global geometry toward local dynamical quantities, with practical implications for design. Credit is due for the provably optimal QDWH-based Freon approximation and for retaining classical convergence guarantees under Kaon's random spectra; these are concrete, reusable contributions.

major comments (3)
  1. [GPT-2 experiments] GPT-2 experiments (abstract and §4): the reported performance equivalence between Muon, Freon (quasi-norm), and Kaon (random spectra) is load-bearing for the claim that 'precise geometric structure is not the key factor,' yet no error bars, run counts, or statistical tests are mentioned; without them the matching cannot be distinguished from noise and does not rule out that the transformer landscape simply fails to expose geometric differences.
  2. [Random-feature model] Random-feature model (final section): the derivation is presented as yielding 'precise insight' into step-size optimality, but the manuscript provides no details on how model parameters are chosen independently of the optimizer runs; if calibrated on the same GPT-2 trajectories the explanation risks post-hoc fitting rather than independent prediction.
  3. [Implications and discussion] Implications and discussion: the inference that GPT-2 equivalence shows geometry is irrelevant in general is not supported by any controlled ablation on quadratics, high-condition-number problems, or strongly convex landscapes where LMO/Schatten geometry is theoretically predicted to matter; such tests are required to close the gap between the empirical observation and the broad claim.
minor comments (2)
  1. [Abstract] Abstract: 'classical convergence guarantees for Kaon' is stated without citing the specific theorem or stating the assumptions under which they hold.
  2. [Freon] Freon section: the QDWH iterative approximation is described as 'provably optimal', but no convergence rate, iteration complexity, or implementation pseudocode is supplied, hindering reproducibility (a baseline iteration is sketched below for context).
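
For context on that last point: the orthogonal polar factor U Vᵀ that Muon-style updates require can be computed without an explicit SVD by a Halley-type fixed-point iteration. The sketch below is the plain, statically weighted Halley baseline; the paper's QDWH variant re-weights the three coefficients dynamically at each step and is not reproduced here.

```python
import numpy as np

def halley_polar(G, iters=12):
    """Plain Halley iteration for the orthogonal polar factor of G.

    X_{k+1} = X_k (3 I + X_k^T X_k) (I + 3 X_k^T X_k)^{-1}, started from
    X_0 = G / ||G||_2, converges to U V^T where G = U S V^T. This is the
    statically weighted baseline; the paper's QDWH scheme chooses the
    weighting coefficients dynamically and is not reproduced here.
    """
    X = G / np.linalg.norm(G, 2)          # scale so singular values lie in (0, 1]
    I = np.eye(G.shape[1])
    for _ in range(iters):
        XtX = X.T @ X
        X = X @ (3.0 * I + XtX) @ np.linalg.inv(I + 3.0 * XtX)
    return X

# Sanity check against an explicit SVD of a random matrix.
G = np.random.default_rng(1).normal(size=(10, 5))
U, _, Vt = np.linalg.svd(G, full_matrices=False)
print("max deviation from U V^T:", np.abs(halley_polar(G) - U @ Vt).max())
```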

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript accordingly to improve statistical rigor and clarify the theoretical components.

Point-by-point responses
  1. Referee: [GPT-2 experiments] GPT-2 experiments (abstract and §4): the reported performance equivalence between Muon, Freon (quasi-norm), and Kaon (random spectra) is load-bearing for the claim that 'precise geometric structure is not the key factor,' yet no error bars, run counts, or statistical tests are mentioned; without them the matching cannot be distinguished from noise and does not rule out that the transformer landscape simply fails to expose geometric differences.

    Authors: We agree that error bars, run counts, and statistical tests are necessary to rigorously support the performance equivalence. In the revised manuscript we report results from five independent random seeds for each optimizer variant on GPT-2, include standard-error bars on all relevant figures and tables, and add paired t-tests confirming that differences between Muon, Freon (quasi-norm regime), and Kaon are not statistically significant (p > 0.05); an illustrative sketch of such a paired test follows these responses. These additions directly address the concern that the observed matching could be attributable to noise. revision: yes

  2. Referee: [Random-feature model] Random-feature model (final section): the derivation is presented as yielding 'precise insight' into step-size optimality, but the manuscript provides no details on how model parameters are chosen independently of the optimizer runs; if calibrated on the same GPT-2 trajectories the explanation risks post-hoc fitting rather than independent prediction.

    Authors: The random-feature model parameters (feature dimension, kernel bandwidth, and sampling distribution) were fixed a priori using standard values from the random-feature literature for attention kernels, without reference to the GPT-2 optimizer trajectories. We have added an appendix subsection that documents the exact parameter choices, shows that the step-size optimality prediction is stable across reasonable variations of those parameters, and confirms that no fitting to the empirical loss curves was performed. This establishes the derivation as an independent explanatory tool rather than a post-hoc rationalization. revision: yes

  3. Referee: [Implications and discussion] Implications and discussion: the inference that GPT-2 equivalence shows geometry is irrelevant in general is not supported by any controlled ablation on quadratics, high-condition-number problems, or strongly convex landscapes where LMO/Schatten geometry is theoretically predicted to matter; such tests are required to close the gap between the empirical observation and the broad claim.

    Authors: We accept that the GPT-2 results alone do not constitute a universal proof and have expanded the discussion section to explicitly scope our claims to the non-convex, high-dimensional regimes characteristic of language-model training. We explain why quadratic or strongly convex test problems are unlikely to be representative in this setting (presence of saddle points, heterogeneous curvature, and stochastic gradients) and why the success of a geometry-free optimizer such as Kaon on GPT-2 is therefore informative for the practical domain we study. While additional controlled ablations on simpler landscapes would be valuable, they lie outside the paper’s focus on modern deep-learning optimization; the combination of the GPT-2 evidence and the random-feature analysis is sufficient to support the stated conclusions. revision: partial
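
Taking the rebuttal's statistical protocol at face value, a paired comparison across shared seeds could look like the sketch below. The seed-level losses are placeholders for illustration only; the actual values, seed count, and any multiple-comparison correction are not given on this page.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical final validation losses, one entry per shared random seed.
# Placeholder numbers, not values reported by the paper.
losses = {
    "muon":  np.array([3.281, 3.275, 3.290, 3.284, 3.279]),
    "freon": np.array([3.283, 3.277, 3.288, 3.286, 3.280]),
    "kaon":  np.array([3.285, 3.274, 3.292, 3.283, 3.281]),
}

# Paired t-tests on per-seed differences, as the rebuttal describes.
for other in ("freon", "kaon"):
    t, p = ttest_rel(losses["muon"], losses[other])
    print(f"muon vs {other}: t={t:.2f}, p={p:.3f}")
```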

Circularity Check

0 steps flagged

No significant circularity; empirical evidence and model analysis remain independent of inputs

full rationale

The derivation proceeds by defining Freon via Schatten (quasi-)norms with a new QDWH solver, reporting empirical best parameters on GPT-2, constructing Kaon by explicit replacement of singular values with random noise, observing performance parity, and then analyzing alignment plus descent potential inside a separate stochastic random-feature model. None of these steps reduce by construction to prior results or fitted parameters: Kaon is deliberately geometry-free by definition, the GPT-2 equivalence is an external observation rather than a tautology, and the random-feature model supplies an independent local analysis whose parameters are not stated to be calibrated on the optimizer runs themselves. The central claim therefore rests on falsifiable empirical comparisons rather than self-referential renaming or post-hoc fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unstated premise that GPT-2 training dynamics are representative of the regime where geometry should matter, plus the assumption that the random-feature model faithfully captures the relevant local quantities without additional fitted parameters.

axioms (1)
  • domain assumption GPT-2 training trajectories expose the same local alignment and descent statistics that would be observed in any setting where geometric structure matters.
    Invoked when generalizing from the reported GPT-2 results to the broader claim that geometry is irrelevant.

pith-pipeline@v0.9.0 · 5638 in / 1380 out tokens · 55915 ms · 2026-05-13T04:07:47.492016+00:00 · methodology

