arxiv: 2605.05189 · v1 · submitted 2026-05-06 · 📊 stat.ML · cs.IT· cs.LG· math.IT

Recognition: unknown

Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

Nicholas Barnfield , Juno Kim , Eshaan Nichani , Jason D. Lee , Yue M. Lu

Authors on Pith no claims yet

Pith reviewed 2026-05-08 16:20 UTC · model grok-4.3

classification 📊 stat.ML cs.ITcs.LGmath.IT

keywords linear associative memorycapacity thresholdswinner-take-alllistwise retrievalcorrelation matrix memoryphase transitiontail-average marginextreme value statistics

0 comments

The pith

Linear associative memories store only n/log n associations under winner-take-all but reach full quadratic capacity n under listwise retrieval.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that retrieval criterion determines the usable capacity of a d by d linear memory beyond its raw d squared degrees of freedom. In the isotropic Gaussian model, top-1 decoding forces every correct target to beat its largest distractor and therefore requires the scale d squared proportional to n log n; the correlation matrix memory achieves this threshold via a sharp phase transition, and the same lower bound holds for every linear map. Replacing strict top-1 with listwise retrieval formalized by the convex Tail-Average Margin criterion removes the extreme-value penalty and restores the quadratic scaling d squared proportional to n, together with exact asymptotic laws for scores, margins, and percentile profiles obtained from a two-parameter variational problem.

Core claim

In the isotropic Gaussian model for stored pairs, any linear memory requires d squared proportional to n log n to achieve reliable top-1 retrieval, and the correlation matrix memory realizes this threshold through a sharp phase transition. Under the proposed Tail-Average Margin criterion for listwise retrieval, capacity scales as d squared proportional to n, with exact limiting laws for scores and margins derived from a two-parameter variational principle that also yields a closed-form critical load in the ridgeless case.

What carries the argument

The correlation matrix memory formed by superposing key-target outer products, together with the Tail-Average Margin (TAM) convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list.

If this is right

The logarithmic factor is information-theoretically necessary for every linear memory under winner-take-all decoding.
The Tail-Average Margin criterion restores the full quadratic capacity d squared proportional to n for listwise retrieval.
The variational theory supplies exact limiting distributions of true scores, competitor scores, margins, and percentile profiles at any fixed load.
A small-tail extrapolation of the same theory yields the conjectural sharp top-1 threshold d squared approximately 2 n log n.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same extreme-value logarithmic cost is expected to appear for top-1 decoding whenever stored vectors possess sub-Gaussian tails.
The scalar variational principle for TAM may serve as a template for analyzing other convex listwise criteria in high-dimensional linear models.
Memory-dimension rules derived from the ridgeless critical load could be used directly to size practical linear associative stores for target list accuracy.

Load-bearing premise

The stored key-value pairs follow an isotropic Gaussian distribution.

What would settle it

Numerical simulation that tracks the empirical fraction of successful top-1 retrievals for the correlation matrix memory while sweeping the load alpha equals n over d squared across the value 1 over log n and checks whether a sharp drop occurs exactly there.

Figures

Figures reproduced from arXiv: 2605.05189 by Eshaan Nichani, Jason D. Lee, Juno Kim, Nicholas Barnfield, Yue M. Lu.

**Figure 1.** Figure 1: Finite-dimensional validation of the predicted phase boundary. The solid curve is view at source ↗

**Figure 2.** Figure 2: Scalar theory compared with finite-dimensional simulations along a one view at source ↗

**Figure 3.** Figure 3: Numerical evidence for the small-tail top-1 conjecture. view at source ↗

read the original abstract

How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2\to\alpha$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper pins down that linear associative memory pays a log n price for strict top-1 retrieval but recovers d² ~ n scaling under a relaxed listwise criterion via a new convex margin.

read the letter

The main thing to know is that the capacity of a d by d linear memory is not a fixed multiple of d squared; it depends on the retrieval rule. Under winner-take-all, the paper shows a sharp threshold at d squared proportional to n log n that is both achieved by the correlation matrix construction and necessary for any linear map in their model. Under listwise retrieval they introduce the Tail-Average Margin and obtain an exact two-parameter variational characterization that yields closed-form critical loads and limiting score laws at load alpha = n over d squared.

Referee Report

2 major / 3 minor

Summary. The paper analyzes capacity thresholds for d×d linear associative memories storing n key-value pairs under an isotropic Gaussian model. For top-1 (winner-take-all) retrieval, it proves that capacity scales as d² ≍ n log n, achieved by the correlation matrix memory through a sharp phase transition and necessary for any linear memory due to extreme-value analysis. For listwise retrieval, it introduces the Tail-Average Margin (TAM) convex criterion and develops an exact asymptotic theory at load α = n/d² via a two-parameter scalar variational principle, yielding phase transitions, limiting laws for scores/margins/percentiles, and a closed-form critical load in the ridgeless limit. A small-tail extrapolation produces the conjecture d² ∼ 2n log n for the sharp top-1 threshold.

Significance. If the results hold, the work provides sharp, criterion-dependent capacity thresholds that clarify fundamental limits of linear associative memory. The identification of the logarithmic factor as the intrinsic extreme-value price of strict dominance in top-1 retrieval, combined with the quadratic scaling under TAM listwise retrieval, is a notable contribution. The TAM criterion and two-parameter variational principle offer a flexible, rich framework with explicit phenomenology (e.g., ridgeless critical load and limiting distributions). Credit is due for the rigorous necessity argument for top-1 and the exact asymptotic theory for the listwise case, though the top-1 conjecture rests on extrapolation.

major comments (2)

[Abstract and necessity section] Abstract and necessity argument for top-1: the claim that d² ≍ n log n is necessary for any linear memory (not just correlation matrix memory) is load-bearing for the 'intrinsic price' conclusion; the extreme-value analysis correctly isolates the log n factor but the reduction showing no other linear construction can exceed this scale requires explicit verification of the argument's scope.
[TAM asymptotic theory section] TAM variational principle and ridgeless limit: the closed-form critical load separating satisfiable/unsatisfiable phases is derived from the two-parameter scalar variational principle, but the explicit equations governing the limiting laws of true scores, competitor scores, and margins should be stated to allow direct verification of the predicted percentile profiles.

minor comments (3)

[TAM definition] The definition and convexity of the Tail-Average Margin (TAM) criterion should include an explicit formula or optimization expression to clarify its relation to the upper-tail average.
[Variational analysis] Notation for the load parameter α = n/d² and the two variational parameters should be introduced with a summary table or equation block for readability across the asymptotic analysis.
[Conjecture section] The small-tail extrapolation leading to the conjectural d² ∼ 2n log n threshold would benefit from a brief discussion of the approximation error or regime of validity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the major comments point by point below.

read point-by-point responses

Referee: [Abstract and necessity section] Abstract and necessity argument for top-1: the claim that d² ≍ n log n is necessary for any linear memory (not just correlation matrix memory) is load-bearing for the 'intrinsic price' conclusion; the extreme-value analysis correctly isolates the log n factor but the reduction showing no other linear construction can exceed this scale requires explicit verification of the argument's scope.

Authors: We thank the referee for highlighting the importance of the necessity claim. Section 3 establishes necessity for arbitrary linear memories by observing that any such memory produces competitor scores that are linear functionals of the isotropic Gaussian key vectors; the resulting extreme-value tail is therefore identical to the correlation-matrix case and forces the log n factor. To make the scope of this reduction fully explicit, we will insert a clarifying remark in the revised manuscript. revision: yes
Referee: [TAM asymptotic theory section] TAM variational principle and ridgeless limit: the closed-form critical load separating satisfiable/unsatisfiable phases is derived from the two-parameter scalar variational principle, but the explicit equations governing the limiting laws of true scores, competitor scores, and margins should be stated to allow direct verification of the predicted percentile profiles.

Authors: We agree that stating the governing equations will improve verifiability. The limiting laws follow from the saddle-point conditions of the two-parameter variational problem already introduced in Section 4; in the revision we will write these equations explicitly together with the resulting expressions for the limiting distributions of true scores, competitor scores, margins, and percentile profiles. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations self-contained in model

full rationale

The paper's central claims—the logarithmic capacity threshold d² ≍ n log n for top-1 retrieval and quadratic scaling d² ≍ n for listwise TAM retrieval—are derived via extreme-value analysis and a two-parameter scalar variational principle applied directly to the isotropic Gaussian stored-pair model. The correlation-matrix construction achieves the threshold through a sharp phase transition, and necessity for arbitrary linear memories follows from the same large-deviation argument identifying the log n factor as the strict-domination price. The ridgeless critical load and limiting laws are obtained from the variational principle without parameter fitting or self-referential definitions. The final top-1 conjecture is explicitly labeled as extrapolation, not a load-bearing step. No self-citation chains, ansatz smuggling, or renaming of known results appear in the derivation chain.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

Central claims rest on isotropic Gaussian model for pairs and newly defined Tail-Average Margin; variational principle is the core derivation.

free parameters (1)

load alpha = n/d^2
Normalized load parametrizing the asymptotic regime and its critical value.

axioms (1)

domain assumption Key-value pairs drawn i.i.d. from isotropic Gaussian
Enables extreme-value and variational calculations.

invented entities (1)

Tail-Average Margin (TAM) no independent evidence
purpose: Convex criterion certifying correct target in candidate list
New functional formalizing listwise regime.

pith-pipeline@v0.9.0 · 10900 in / 1045 out tokens · 91170 ms · 2026-05-08T16:20:23.724603+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Factual recall in linear associative memories: sharp asymptotics and mechanistic insights
stat.ML 2026-05 unverdicted novelty 7.0

Linear associative memories store up to p_c log p_c / d^2 = 1/2 associations, with optimal weights pushing correct scores just above the extreme value of competing outputs.

Reference graph

Works this paper leans on

52 extracted references · 23 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1]

Vershynin, Roman , year=
[2]

2310.02984 , archivePrefix=

Vivien Cabannes and Elvis Dohmatob and Alberto Bietti , year=. 2310.02984 , archivePrefix=

work page arXiv
[3]

Understanding factual recall in transformers via associative memories.arXiv preprint arXiv:2412.06538, 2024

Eshaan Nichani and Jason D. Lee and Alberto Bietti , year=. 2412.06538 , archivePrefix=

work page arXiv
[4]

2306.00802 , archivePrefix=

Alberto Bietti and Vivien Cabannes and Diane Bouchacourt and Herve Jegou and Leon Bottou , year=. 2306.00802 , archivePrefix=

work page arXiv
[5]

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

Juno Kim and Eshaan Nichani and Denny Wu and Alberto Bietti and Jason D. Lee , year=. 2603.26554 , archivePrefix=

work page internal anchor Pith review Pith/arXiv arXiv
[6]

1972 , volume=

Teuvo Kohonen , journal=. 1972 , volume=

1972
[7]

, journal=

Amari, S.-I. , journal=. 1972 , volume=

1972
[8]

, year =

Hopfield, J.J. , year =. Proceedings of the National Academy of Sciences , doi =
[9]

and Kaiser,

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser,. 2017 , booktitle =

2017
[10]

Gershman and Ila Fiete and Kazuki Irie , year=

Samuel J. Gershman and Ila Fiete and Kazuki Irie , year=. 2501.02950 , archivePrefix=

work page arXiv
[11]

Hopfield networks is all you need.arXiv preprint arXiv:2008.02217, 2020

Hubert Ramsauer and Bernhard Schäfl and Johannes Lehner and Philipp Seidl and Michael Widrich and Thomas Adler and Lukas Gruber and Markus Holzleitner and Milena Pavlović and Geir Kjetil Sandve and Victor Greiff and David Kreil and Michael Kopp and Günter Klambauer and Johannes Brandstetter and Sepp Hochreiter , year=. 2008.02217 , archivePrefix=

work page arXiv 2008
[12]

2406.18400 , archivePrefix=

Yibo Jiang and Goutham Rajendran and Pradeep Ravikumar and Bryon Aragam , year=. 2406.18400 , archivePrefix=

work page arXiv
[13]

1972 , issn =

Mathematical Biosciences , volume =. 1972 , issn =. doi:10.1016/0025-5564(72)90075-2 , author =

work page doi:10.1016/0025-5564(72)90075-2 1972
[14]

Locating and Editing Factual Associations in

Meng, Kevin and Bau, David and Andonian, Alex and Belinkov, Yonatan , booktitle =. Locating and Editing Factual Associations in
[15]

Transformer Feed-Forward Layers Are Key-Value Memories

Mor Geva and Roei Schuster and Jonathan Berant and Omer Levy , year=. 2012.14913 , archivePrefix=

work page internal anchor Pith review arXiv 2012
[16]

Toy Models of Superposition

Nelson Elhage and Tristan Hume and Catherine Olsson and Nicholas Schiefer and Tom Henighan and Shauna Kravec and Zac Hatfield-Dodds and Robert Lasenby and Dawn Drain and Carol Chen and Roger Grosse and Sam McCandlish and Jared Kaplan and Dario Amodei and Martin Wattenberg and Christopher Olah , year=. 2209.10652 , archivePrefix=

work page internal anchor Pith review arXiv
[17]

Kawata and T

Ryotaro Kawata and Taiji Suzuki , year=. 2602.01863 , archivePrefix=

work page arXiv
[18]

Tyrrell and Uryasev, Stanislav , year =

Rockafellar, R. Tyrrell and Uryasev, Stanislav , year =. The Journal of Risk , doi =
[19]

Tyrrell and Uryasev, Stanislav , year =

Rockafellar, R. Tyrrell and Uryasev, Stanislav , year =. Journal of Banking & Finance , doi =
[20]

and Lim, Chinghway and Yu, Bin , title =

El Karoui, Noureddine and Bean, Derek and Bickel, Peter J. and Lim, Chinghway and Yu, Bin , title =. Proceedings of the National Academy of Sciences , volume =. 2013 , doi =

2013
[21]

and El Karoui, Noureddine , title =

Lei, Lihua and Bickel, Peter J. and El Karoui, Noureddine , title =. Probability Theory and Related Fields , volume =. 2018 , doi =

2018
[22]

Probability Theory and Related Fields , volume =

El Karoui, Noureddine , title =. Probability Theory and Related Fields , volume =. 2018 , doi =

2018
[23]

Elias, Peter , title =
[24]

Foundations and Trends in Theoretical Computer Science , volume =

Guruswami, Venkatesan , title =. Foundations and Trends in Theoretical Computer Science , volume =. 2007 , doi =

2007
[25]

Proceedings of the 24th International Conference on Machine Learning , pages =

Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang , title =. Proceedings of the 24th International Conference on Machine Learning , pages =. 2007 , doi =

2007
[26]

1972 , volume=

Kaoru Nakano , journal=. 1972 , volume=. doi:10.1109/TSMC.1972.4309133 , month =

work page doi:10.1109/tsmc.1972.4309133 1972
[27]

Hebb, D. O. , title =. 1949 , pagetotal =

1949
[28]

Willshaw, D. J. and Buneman, O. P. and Longuet-Higgins, H. C. , journal=. 1969 , volume=

1969
[29]

Gardner , title =

E. Gardner , title =. doi:10.1209/0295-5075/4/4/016 , year =

work page doi:10.1209/0295-5075/4/4/016
[30]

, title=

Cover, Thomas M. , title=. IEEE Transactions on Electronic Computers , year=
[31]

Gardner , title=

E. Gardner , title=. Journal of Physics A: Mathematical and General , year=
[32]

Neural Turing Machines

Alex Graves and Greg Wayne and Ivo Danihelka , year=. 1410.5401 , archivePrefix=

work page internal anchor Pith review arXiv
[33]

Memory Networks

Jason Weston and Sumit Chopra and Antoine Bordes , year=. 1410.3916 , archivePrefix=

work page Pith review arXiv
[34]

1606.01164 , archivePrefix=

Dmitry Krotov and John J Hopfield , year=. 1606.01164 , archivePrefix=

work page arXiv
[35]

Advances in Neural Information Processing Systems , year=

Attention Approximates Sparse Distributed Memory , author=. Advances in Neural Information Processing Systems , year=
[36]

2202.04557 , archivePrefix=

Beren Millidge and Tommaso Salvatori and Yuhang Song and Thomas Lukasiewicz and Rafal Bogacz , year=. 2202.04557 , archivePrefix=

work page arXiv
[37]

Physics of language models: Part 3.3, knowledge capacity scaling laws.arXiv preprint arXiv:2404.05405, 2024

Zeyuan Allen-Zhu and Yuanzhi Li , year=. 2404.05405 , archivePrefix=

work page arXiv
[38]

Physical Review Letters , publisher=

Lucibello, Carlo and M\'ezard, Marc , year=. Physical Review Letters , publisher=. doi:10.1103/PhysRevLett.132.077301 , pages =

work page doi:10.1103/physrevlett.132.077301
[39]

Frontiers in Computational Neuroscience , year =

Folli, Viola and Leonetti, Marco and Ruocco, Giancarlo , title =. Frontiers in Computational Neuroscience , year =. doi:10.3389/fncom.2016.00144 , issn =

work page doi:10.3389/fncom.2016.00144 2016
[40]

IEEE Transactions on Information Theory , year=

The capacity of the Hopfield associative memory , author=. IEEE Transactions on Information Theory , year=
[41]

2603.15923 , archivePrefix=

Nuri Mert Vural and Alberto Bietti and Mahdi Soltanolkotabi and Denny Wu , year=. 2603.15923 , archivePrefix=

work page arXiv
[42]

Candes and Pragya Sur , year=

Emmanuel J. Candes and Pragya Sur , year=. 1804.09753 , archivePrefix=

work page arXiv
[43]

Proceedings of the National Academy of Sciences , year=

Sur, Pragya and Cand. Proceedings of the National Academy of Sciences , year=
[44]

Proceedings of the National Academy of Sciences , year=

Barbier, Jean and Krzakala, Florent and Macris, Nicolas and Miolane, L. Proceedings of the National Academy of Sciences , year=
[45]

and Zdeborov

Aubin, Benjamin and Krzakala, Florent and Lu, Yue M. and Zdeborov. Advances in Neural Information Processing Systems , year=
[46]

International conference on machine learning , pages=

The role of regularization in classification of high-dimensional noisy gaussian mixture , author=. International conference on machine learning , pages=
[47]

2402.18724 , archivePrefix=

Vivien Cabannes and Berfin Simsek and Alberto Bietti , year=. 2402.18724 , archivePrefix=

work page arXiv
[48]

Alessio Giorlandino and Sebastian Goldt and Antoine Maillard , year=
[49]

1973 , volume=

Kohonen, Teuvo and Ruohonen, Matti , journal=. 1973 , volume=

1973
[50]

, journal=

Kosko, B. , journal=. 1988 , volume=

1988
[51]

Kanerva, Pentti , title =
[52]

Journal of Statistical Physics , volume=

On a model of associative memory with huge storage capacity , author=. Journal of Statistical Physics , volume=. 2017 , publisher=

2017