Recognition: unknown
Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval
Pith reviewed 2026-05-08 16:20 UTC · model grok-4.3
The pith
Linear associative memories store only n/log n associations under winner-take-all but reach full quadratic capacity n under listwise retrieval.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the isotropic Gaussian model for stored pairs, any linear memory requires d squared proportional to n log n to achieve reliable top-1 retrieval, and the correlation matrix memory realizes this threshold through a sharp phase transition. Under the proposed Tail-Average Margin criterion for listwise retrieval, capacity scales as d squared proportional to n, with exact limiting laws for scores and margins derived from a two-parameter variational principle that also yields a closed-form critical load in the ridgeless case.
What carries the argument
The correlation matrix memory formed by superposing key-target outer products, together with the Tail-Average Margin (TAM) convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list.
If this is right
- The logarithmic factor is information-theoretically necessary for every linear memory under winner-take-all decoding.
- The Tail-Average Margin criterion restores the full quadratic capacity d squared proportional to n for listwise retrieval.
- The variational theory supplies exact limiting distributions of true scores, competitor scores, margins, and percentile profiles at any fixed load.
- A small-tail extrapolation of the same theory yields the conjectural sharp top-1 threshold d squared approximately 2 n log n.
Where Pith is reading between the lines
- The same extreme-value logarithmic cost is expected to appear for top-1 decoding whenever stored vectors possess sub-Gaussian tails.
- The scalar variational principle for TAM may serve as a template for analyzing other convex listwise criteria in high-dimensional linear models.
- Memory-dimension rules derived from the ridgeless critical load could be used directly to size practical linear associative stores for target list accuracy.
Load-bearing premise
The stored key-value pairs follow an isotropic Gaussian distribution.
What would settle it
Numerical simulation that tracks the empirical fraction of successful top-1 retrievals for the correlation matrix memory while sweeping the load alpha equals n over d squared across the value 1 over log n and checks whether a sharp drop occurs exactly there.
Figures
read the original abstract
How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2\to\alpha$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes capacity thresholds for d×d linear associative memories storing n key-value pairs under an isotropic Gaussian model. For top-1 (winner-take-all) retrieval, it proves that capacity scales as d² ≍ n log n, achieved by the correlation matrix memory through a sharp phase transition and necessary for any linear memory due to extreme-value analysis. For listwise retrieval, it introduces the Tail-Average Margin (TAM) convex criterion and develops an exact asymptotic theory at load α = n/d² via a two-parameter scalar variational principle, yielding phase transitions, limiting laws for scores/margins/percentiles, and a closed-form critical load in the ridgeless limit. A small-tail extrapolation produces the conjecture d² ∼ 2n log n for the sharp top-1 threshold.
Significance. If the results hold, the work provides sharp, criterion-dependent capacity thresholds that clarify fundamental limits of linear associative memory. The identification of the logarithmic factor as the intrinsic extreme-value price of strict dominance in top-1 retrieval, combined with the quadratic scaling under TAM listwise retrieval, is a notable contribution. The TAM criterion and two-parameter variational principle offer a flexible, rich framework with explicit phenomenology (e.g., ridgeless critical load and limiting distributions). Credit is due for the rigorous necessity argument for top-1 and the exact asymptotic theory for the listwise case, though the top-1 conjecture rests on extrapolation.
major comments (2)
- [Abstract and necessity section] Abstract and necessity argument for top-1: the claim that d² ≍ n log n is necessary for any linear memory (not just correlation matrix memory) is load-bearing for the 'intrinsic price' conclusion; the extreme-value analysis correctly isolates the log n factor but the reduction showing no other linear construction can exceed this scale requires explicit verification of the argument's scope.
- [TAM asymptotic theory section] TAM variational principle and ridgeless limit: the closed-form critical load separating satisfiable/unsatisfiable phases is derived from the two-parameter scalar variational principle, but the explicit equations governing the limiting laws of true scores, competitor scores, and margins should be stated to allow direct verification of the predicted percentile profiles.
minor comments (3)
- [TAM definition] The definition and convexity of the Tail-Average Margin (TAM) criterion should include an explicit formula or optimization expression to clarify its relation to the upper-tail average.
- [Variational analysis] Notation for the load parameter α = n/d² and the two variational parameters should be introduced with a summary table or equation block for readability across the asymptotic analysis.
- [Conjecture section] The small-tail extrapolation leading to the conjectural d² ∼ 2n log n threshold would benefit from a brief discussion of the approximation error or regime of validity.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address the major comments point by point below.
read point-by-point responses
-
Referee: [Abstract and necessity section] Abstract and necessity argument for top-1: the claim that d² ≍ n log n is necessary for any linear memory (not just correlation matrix memory) is load-bearing for the 'intrinsic price' conclusion; the extreme-value analysis correctly isolates the log n factor but the reduction showing no other linear construction can exceed this scale requires explicit verification of the argument's scope.
Authors: We thank the referee for highlighting the importance of the necessity claim. Section 3 establishes necessity for arbitrary linear memories by observing that any such memory produces competitor scores that are linear functionals of the isotropic Gaussian key vectors; the resulting extreme-value tail is therefore identical to the correlation-matrix case and forces the log n factor. To make the scope of this reduction fully explicit, we will insert a clarifying remark in the revised manuscript. revision: yes
-
Referee: [TAM asymptotic theory section] TAM variational principle and ridgeless limit: the closed-form critical load separating satisfiable/unsatisfiable phases is derived from the two-parameter scalar variational principle, but the explicit equations governing the limiting laws of true scores, competitor scores, and margins should be stated to allow direct verification of the predicted percentile profiles.
Authors: We agree that stating the governing equations will improve verifiability. The limiting laws follow from the saddle-point conditions of the two-parameter variational problem already introduced in Section 4; in the revision we will write these equations explicitly together with the resulting expressions for the limiting distributions of true scores, competitor scores, margins, and percentile profiles. revision: yes
Circularity Check
No significant circularity; derivations self-contained in model
full rationale
The paper's central claims—the logarithmic capacity threshold d² ≍ n log n for top-1 retrieval and quadratic scaling d² ≍ n for listwise TAM retrieval—are derived via extreme-value analysis and a two-parameter scalar variational principle applied directly to the isotropic Gaussian stored-pair model. The correlation-matrix construction achieves the threshold through a sharp phase transition, and necessity for arbitrary linear memories follows from the same large-deviation argument identifying the log n factor as the strict-domination price. The ridgeless critical load and limiting laws are obtained from the variational principle without parameter fitting or self-referential definitions. The final top-1 conjecture is explicitly labeled as extrapolation, not a load-bearing step. No self-citation chains, ansatz smuggling, or renaming of known results appear in the derivation chain.
Axiom & Free-Parameter Ledger
free parameters (1)
- load alpha = n/d^2
axioms (1)
- domain assumption Key-value pairs drawn i.i.d. from isotropic Gaussian
invented entities (1)
-
Tail-Average Margin (TAM)
no independent evidence
Forward citations
Cited by 1 Pith paper
-
Factual recall in linear associative memories: sharp asymptotics and mechanistic insights
Linear associative memories store up to p_c log p_c / d^2 = 1/2 associations, with optimal weights pushing correct scores just above the extreme value of competing outputs.
Reference graph
Works this paper leans on
-
[1]
Vershynin, Roman , year=
-
[2]
Vivien Cabannes and Elvis Dohmatob and Alberto Bietti , year=. 2310.02984 , archivePrefix=
-
[3]
Eshaan Nichani and Jason D. Lee and Alberto Bietti , year=. 2412.06538 , archivePrefix=
-
[4]
Alberto Bietti and Vivien Cabannes and Diane Bouchacourt and Herve Jegou and Leon Bottou , year=. 2306.00802 , archivePrefix=
-
[5]
Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory
Juno Kim and Eshaan Nichani and Denny Wu and Alberto Bietti and Jason D. Lee , year=. 2603.26554 , archivePrefix=
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
1972 , volume=
Teuvo Kohonen , journal=. 1972 , volume=
1972
-
[7]
, journal=
Amari, S.-I. , journal=. 1972 , volume=
1972
-
[8]
, year =
Hopfield, J.J. , year =. Proceedings of the National Academy of Sciences , doi =
-
[9]
and Kaiser,
Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser,. 2017 , booktitle =
2017
-
[10]
Gershman and Ila Fiete and Kazuki Irie , year=
Samuel J. Gershman and Ila Fiete and Kazuki Irie , year=. 2501.02950 , archivePrefix=
-
[11]
Hopfield networks is all you need.arXiv preprint arXiv:2008.02217, 2020
Hubert Ramsauer and Bernhard Schäfl and Johannes Lehner and Philipp Seidl and Michael Widrich and Thomas Adler and Lukas Gruber and Markus Holzleitner and Milena Pavlović and Geir Kjetil Sandve and Victor Greiff and David Kreil and Michael Kopp and Günter Klambauer and Johannes Brandstetter and Sepp Hochreiter , year=. 2008.02217 , archivePrefix=
-
[12]
Yibo Jiang and Goutham Rajendran and Pradeep Ravikumar and Bryon Aragam , year=. 2406.18400 , archivePrefix=
-
[13]
Mathematical Biosciences , volume =. 1972 , issn =. doi:10.1016/0025-5564(72)90075-2 , author =
-
[14]
Locating and Editing Factual Associations in
Meng, Kevin and Bau, David and Andonian, Alex and Belinkov, Yonatan , booktitle =. Locating and Editing Factual Associations in
-
[15]
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva and Roei Schuster and Jonathan Berant and Omer Levy , year=. 2012.14913 , archivePrefix=
work page internal anchor Pith review arXiv 2012
-
[16]
Nelson Elhage and Tristan Hume and Catherine Olsson and Nicholas Schiefer and Tom Henighan and Shauna Kravec and Zac Hatfield-Dodds and Robert Lasenby and Dawn Drain and Carol Chen and Roger Grosse and Sam McCandlish and Jared Kaplan and Dario Amodei and Martin Wattenberg and Christopher Olah , year=. 2209.10652 , archivePrefix=
work page internal anchor Pith review arXiv
- [17]
-
[18]
Tyrrell and Uryasev, Stanislav , year =
Rockafellar, R. Tyrrell and Uryasev, Stanislav , year =. The Journal of Risk , doi =
-
[19]
Tyrrell and Uryasev, Stanislav , year =
Rockafellar, R. Tyrrell and Uryasev, Stanislav , year =. Journal of Banking & Finance , doi =
-
[20]
and Lim, Chinghway and Yu, Bin , title =
El Karoui, Noureddine and Bean, Derek and Bickel, Peter J. and Lim, Chinghway and Yu, Bin , title =. Proceedings of the National Academy of Sciences , volume =. 2013 , doi =
2013
-
[21]
and El Karoui, Noureddine , title =
Lei, Lihua and Bickel, Peter J. and El Karoui, Noureddine , title =. Probability Theory and Related Fields , volume =. 2018 , doi =
2018
-
[22]
Probability Theory and Related Fields , volume =
El Karoui, Noureddine , title =. Probability Theory and Related Fields , volume =. 2018 , doi =
2018
-
[23]
Elias, Peter , title =
-
[24]
Foundations and Trends in Theoretical Computer Science , volume =
Guruswami, Venkatesan , title =. Foundations and Trends in Theoretical Computer Science , volume =. 2007 , doi =
2007
-
[25]
Proceedings of the 24th International Conference on Machine Learning , pages =
Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang , title =. Proceedings of the 24th International Conference on Machine Learning , pages =. 2007 , doi =
2007
-
[26]
Kaoru Nakano , journal=. 1972 , volume=. doi:10.1109/TSMC.1972.4309133 , month =
-
[27]
Hebb, D. O. , title =. 1949 , pagetotal =
1949
-
[28]
Willshaw, D. J. and Buneman, O. P. and Longuet-Higgins, H. C. , journal=. 1969 , volume=
1969
-
[29]
E. Gardner , title =. doi:10.1209/0295-5075/4/4/016 , year =
-
[30]
, title=
Cover, Thomas M. , title=. IEEE Transactions on Electronic Computers , year=
-
[31]
Gardner , title=
E. Gardner , title=. Journal of Physics A: Mathematical and General , year=
-
[32]
Alex Graves and Greg Wayne and Ivo Danihelka , year=. 1410.5401 , archivePrefix=
work page internal anchor Pith review arXiv
-
[33]
Jason Weston and Sumit Chopra and Antoine Bordes , year=. 1410.3916 , archivePrefix=
-
[34]
Dmitry Krotov and John J Hopfield , year=. 1606.01164 , archivePrefix=
-
[35]
Advances in Neural Information Processing Systems , year=
Attention Approximates Sparse Distributed Memory , author=. Advances in Neural Information Processing Systems , year=
-
[36]
Beren Millidge and Tommaso Salvatori and Yuhang Song and Thomas Lukasiewicz and Rafal Bogacz , year=. 2202.04557 , archivePrefix=
-
[37]
Zeyuan Allen-Zhu and Yuanzhi Li , year=. 2404.05405 , archivePrefix=
-
[38]
Physical Review Letters , publisher=
Lucibello, Carlo and M\'ezard, Marc , year=. Physical Review Letters , publisher=. doi:10.1103/PhysRevLett.132.077301 , pages =
-
[39]
Frontiers in Computational Neuroscience , year =
Folli, Viola and Leonetti, Marco and Ruocco, Giancarlo , title =. Frontiers in Computational Neuroscience , year =. doi:10.3389/fncom.2016.00144 , issn =
-
[40]
IEEE Transactions on Information Theory , year=
The capacity of the Hopfield associative memory , author=. IEEE Transactions on Information Theory , year=
-
[41]
Nuri Mert Vural and Alberto Bietti and Mahdi Soltanolkotabi and Denny Wu , year=. 2603.15923 , archivePrefix=
-
[42]
Emmanuel J. Candes and Pragya Sur , year=. 1804.09753 , archivePrefix=
-
[43]
Proceedings of the National Academy of Sciences , year=
Sur, Pragya and Cand. Proceedings of the National Academy of Sciences , year=
-
[44]
Proceedings of the National Academy of Sciences , year=
Barbier, Jean and Krzakala, Florent and Macris, Nicolas and Miolane, L. Proceedings of the National Academy of Sciences , year=
-
[45]
and Zdeborov
Aubin, Benjamin and Krzakala, Florent and Lu, Yue M. and Zdeborov. Advances in Neural Information Processing Systems , year=
-
[46]
International conference on machine learning , pages=
The role of regularization in classification of high-dimensional noisy gaussian mixture , author=. International conference on machine learning , pages=
-
[47]
Vivien Cabannes and Berfin Simsek and Alberto Bietti , year=. 2402.18724 , archivePrefix=
-
[48]
Alessio Giorlandino and Sebastian Goldt and Antoine Maillard , year=
-
[49]
1973 , volume=
Kohonen, Teuvo and Ruohonen, Matti , journal=. 1973 , volume=
1973
-
[50]
, journal=
Kosko, B. , journal=. 1988 , volume=
1988
-
[51]
Kanerva, Pentti , title =
-
[52]
Journal of Statistical Physics , volume=
On a model of associative memory with huge storage capacity , author=. Journal of Statistical Physics , volume=. 2017 , publisher=
2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.