arxiv: 2605.01192 · v1 · submitted 2026-05-02 · 💻 cs.LG · cs.IT · math.IT


Linear-Readout Floors and Threshold Recovery in Computation in Superposition

Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó


Pith reviewed 2026-05-09 14:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · math.IT

keywords computation in superposition · linear readouts · threshold recovery · Welch bound · cross-talk · capacity regimes · biorthogonal systems


The pith

Two approaches to computation in superposition reach different capacities without contradiction because they preserve distinct interface invariants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recent methods for computation in superposition report different maximum numbers of features computable in width d. Hänni et al. use an approximate-linear recursive template to certify Õ(d^{3/2}) features, while Adler and Shavit use thresholded Boolean recovery to reach near-quadratic capacity, up to logarithmic factors. The paper argues that these regimes are compatible by deriving a Welch-type lower bound on cross-talk for any unit-diagonal linear readout: Ω(d^{-1/2}) when the feature count F is much larger than d, and tight on average for unit-norm tight frames. Matching this floor against the published tolerance of the Hänni correction layer accounts for the d^{3/2} scale as a template-specific compatibility threshold rather than a universal limit. At quadratic load, F = d^2, linear readouts incur Ω(s/d) average per-coordinate squared error on Bernoulli sparse states, while random-support threshold recovery still succeeds for sparsities s = O(d/log d).

Core claim

The results are not contradictory because the methods maintain different interface invariants. A rank-trace Welch-type lower bound for biorthogonal linear readouts shows that worst-case off-diagonal cross-talk is Ω(d^{-1/2}) when F ≫ d, and this floor is achieved on average by unit-norm tight frames. Matching the floor to the published tolerance of the Hänni correction layer accounts for the d^{3/2} regime as a compatibility threshold for the approximate-linear template, while thresholded Boolean recovery evades the floor and reaches higher loads.
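
The rank-trace argument behind a floor of this kind can be sketched in a few lines. This is a reconstruction under assumed notation, not the paper's own statement: features v_1, ..., v_F and readouts u_1, ..., u_F in R^d, stacked as the columns of V and U, with cross-talk matrix M = U^T V and the unit-diagonal condition M_ii = 1.

```latex
% Assumed notation (not taken from the paper): M_{ij} = <u_i, v_j>, M_{ii} = 1.
\begin{align*}
M &= U^{\top} V \in \mathbb{R}^{F \times F}, \qquad \operatorname{rank}(M) \le d, \qquad \operatorname{tr} M = F, \\
F^2 &= (\operatorname{tr} M)^2 \le \operatorname{rank}(M)\, \|M\|_F^2 \le d\, \|M\|_F^2, \\
\sum_{i \ne j} M_{ij}^2 &= \|M\|_F^2 - F \ge \frac{F^2}{d} - F = \frac{F(F-d)}{d}, \\
\max_{i \ne j} |M_{ij}| &\ge \sqrt{\frac{1}{F(F-1)} \sum_{i \ne j} M_{ij}^2} \ge \sqrt{\frac{F-d}{d(F-1)}} \approx d^{-1/2} \quad (F \gg d).
\end{align*}
```

The second line is the standard inequality between the trace and the Frobenius norm of a matrix of rank at most d. For a unit-norm tight frame with U = V, the matrix M has d nonzero eigenvalues all equal to F/d, so that inequality holds with equality, which is one way to read the "tight on average" claim.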

What carries the argument

The rank-trace Welch-type lower bound on worst-case off-diagonal cross-talk for any unit-diagonal linear readout of a biorthogonal system: Ω(d^{-1/2}) for F ≫ d, and tight on average for unit-norm tight frames.
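
A small numerical check can make the floor concrete. The sketch below is illustrative only and is not taken from the paper: it assumes the explicit floor sqrt((F - d) / (d (F - 1))) that follows from the rank-trace inequality, uses a complex harmonic frame as a stand-in for a unit-norm tight frame, and uses a random Gaussian readout rescaled to unit diagonal. The function names `welch_floor`, `harmonic_frame`, and `random_unit_diagonal_readout` are hypothetical.

```python
import numpy as np

def welch_floor(d: int, F: int) -> float:
    """Rank-trace floor on the largest off-diagonal entry of a unit-diagonal
    F x F matrix of rank at most d: sqrt((F - d) / (d * (F - 1)))."""
    return float(np.sqrt((F - d) / (d * (F - 1))))

def offdiag(M: np.ndarray) -> np.ndarray:
    """Return the off-diagonal entries of a square matrix as a flat array."""
    return M[~np.eye(M.shape[0], dtype=bool)]

def harmonic_frame(d: int, F: int) -> np.ndarray:
    """d x F complex harmonic frame: unit-norm columns, V V^H = (F/d) I."""
    rows = np.arange(d)[:, None]
    cols = np.arange(F)[None, :]
    return np.exp(2j * np.pi * rows * cols / F) / np.sqrt(d)

def random_unit_diagonal_readout(d: int, F: int, seed: int = 0) -> np.ndarray:
    """Cross-talk matrix M = U^T V for random features V and random readouts U,
    with each readout u_i rescaled so that <u_i, v_i> = 1."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((d, F))
    U = rng.standard_normal((d, F))
    U = U / np.sum(U * V, axis=0)  # enforce the unit diagonal column by column
    return U.T @ V

d, F = 16, 256
floor = welch_floor(d, F)

Vh = harmonic_frame(d, F)
G = Vh.conj().T @ Vh  # Gram matrix of the tight frame (readout = frame itself)
tight_rms = float(np.sqrt(np.mean(np.abs(offdiag(G)) ** 2)))

M = random_unit_diagonal_readout(d, F)
random_max = float(np.max(np.abs(offdiag(M))))

print(f"floor={floor:.4f}  tight-frame RMS={tight_rms:.4f}  random max={random_max:.4f}")
```

Under these assumptions the tight frame's root-mean-square cross-talk sits on the floor, while the arbitrary unit-diagonal readout has a worst-case entry at or above it.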

If this is right

At quadratic feature load F = d^2, random-support threshold recovery succeeds for sparsities s = O(d/log d).
Linear readouts incur Ω(s/d) average per-coordinate squared error on Bernoulli sparse states at the same load.
The published tolerance of the Hänni correction layer aligns with the Welch floor at the d^{3/2} scale.
Robust nonlinear reset beyond the Hänni template remains an open question.
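
The contrast between the first two consequences can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the construction of Hänni et al. or of Adler and Shavit: features are random Gaussian directions normalized to unit length, the linear readout is the transpose of the feature matrix, the threshold is fixed at 1/2, and the width d = 128 and sparsity s = 2 are arbitrary small values chosen so the example runs quickly. The function name `simulate` is hypothetical.

```python
import numpy as np

def simulate(d: int = 128, s: int = 2, trials: int = 20, seed: int = 0):
    """Compare linear and thresholded readouts at quadratic load F = d^2.

    Returns (linear_mse, threshold_mse): the average per-coordinate squared
    error of the raw linear readout and of the same readout thresholded at 1/2.
    """
    rng = np.random.default_rng(seed)
    F = d * d
    V = rng.standard_normal((d, F))
    V /= np.linalg.norm(V, axis=0)  # unit-norm feature directions (assumption)

    lin_err, thr_err = 0.0, 0.0
    for _ in range(trials):
        support = rng.choice(F, size=s, replace=False)  # random s-sparse support
        x = np.zeros(F)
        x[support] = 1.0
        h = V[:, support].sum(axis=1)     # superposed state in width d
        y = V.T @ h                       # linear readout: x plus cross-talk
        x_hat = (y > 0.5).astype(float)   # thresholded Boolean recovery
        lin_err += np.mean((y - x) ** 2)
        thr_err += np.mean((x_hat - x) ** 2)
    return lin_err / trials, thr_err / trials

d, s = 128, 2
linear_mse, threshold_mse = simulate(d=d, s=s)
print(f"s/d={s / d:.4f}  linear MSE={linear_mse:.4f}  threshold MSE={threshold_mse:.6f}")
```

In this toy setting the linear readout's error tracks s/d, while thresholding removes most of the cross-talk because each inactive coordinate only needs to stay below 1/2. The sketch does not test the O(d/log d) sparsity range, whose constants it makes no attempt to match.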

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

An adaptive interface that switches between linear and thresholded recovery depending on load could combine the strengths of both regimes.
Relaxing the unit-diagonal or biorthogonal constraint on the readout might allow linear templates to exceed the Welch floor.
The same Welch analysis could be applied to other recovery mechanisms to map out a full spectrum of achievable capacity regimes.

Load-bearing premise

The two approaches maintain fundamentally different interface invariants that are preserved across the capacity regimes compared.

What would settle it

Direct computation of the average per-coordinate squared error of linear readouts on Bernoulli sparse states at quadratic feature load F = d^2, which should scale as Ω(s/d) if the bound governs the Hänni regime.
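
One way to see where an Ω(s/d) error could come from is the following worked bound. It is a sketch under assumptions that are not spelled out in the text above: coordinates x_j are independent Bernoulli(p) with p = s/F, the readout is y = M x for a unit-diagonal matrix M of rank at most d, and the rank-trace inequality gives Σ_{i≠j} M_ij^2 ≥ F(F-d)/d for any such M.

```latex
% Assumed setup: x_j ~ Bernoulli(p) i.i.d., p = s/F, y = M x, M_{ii} = 1, rank(M) <= d.
\begin{align*}
y_i - x_i &= \sum_{j \ne i} M_{ij} x_j, \\
\mathbb{E}\,(y_i - x_i)^2 &= p(1-p) \sum_{j \ne i} M_{ij}^2 + p^2 \Big( \sum_{j \ne i} M_{ij} \Big)^2 \ge p(1-p) \sum_{j \ne i} M_{ij}^2, \\
\frac{1}{F} \sum_{i=1}^{F} \mathbb{E}\,(y_i - x_i)^2 &\ge \frac{p(1-p)}{F} \cdot \frac{F(F-d)}{d} = \frac{s}{d} \Big(1 - \frac{s}{F}\Big) \Big(1 - \frac{d}{F}\Big).
\end{align*}
```

At F = d^2 the two correction factors are close to 1 whenever s is much smaller than d^2, so the average per-coordinate squared error is at least of order s/d under these assumptions.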

Original abstract

Two recent approaches to computation in superposition reach different recursive capacity regimes: Hänni et al. certify $\tilde{O}(d^{3/2})$ computable features in width $d$ via an approximate-linear recursive template, while Adler and Shavit reach near-quadratic capacity (up to logarithmic factors) using thresholded Boolean recovery. The main contribution of this paper is conceptual: we argue these results are not contradictory because they maintain different interface invariants, and we formalize the distinction. As a tool, we record a rank-trace Welch-type lower bound for biorthogonal linear readouts: for $F \gg d$, the worst-case off-diagonal cross-talk of any unit-diagonal linear readout is $\Omega(d^{-1/2})$, and the bound is tight on average for unit-norm tight frames. At quadratic feature load $F=d^2$, random-support threshold recovery succeeds for sparsities $s=O(d/\log d)$, while linear readouts still incur $\Omega(s/d)$ average per-coordinate squared error on Bernoulli sparse states. Matching the Welch floor against the published tolerance of the Hänni correction layer explains the $d^{3/2}$ scale as a compatibility threshold for that template, not a universal upper bound. Robust nonlinear reset beyond the Hänni template is left open.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reconciles the d^{3/2} and near-quadratic capacity results by tying the former to linear-readout cross-talk floors specific to the Hänni template rather than to a universal limit.


This paper's main contribution is showing that the two recent capacity results for computation in superposition are compatible because they operate under different interface invariants: Hänni et al. use an approximate-linear recursive template that runs into a cross-talk floor, while Adler and Shavit use thresholded Boolean recovery, which maintains a different invariant. The d^{3/2} scale comes from matching the linear-readout error to the published tolerance of the Hänni correction layer, not from any hard barrier on superposition itself. They leave nonlinear reset as an open question, which keeps the claim proportionate.

What the paper does well is record a rank-trace Welch-type lower bound for biorthogonal linear readouts: for F much larger than d, any unit-diagonal linear readout has worst-case off-diagonal cross-talk Ω(d^{-1/2}), and this is tight on average for unit-norm tight frames. They then apply it to show that at quadratic load F = d^2, linear readouts still suffer Ω(s/d) average per-coordinate squared error on Bernoulli sparse states, while random-support threshold recovery works for s = O(d/log d). This is a useful tool from frame theory that directly explains the scaling difference without internal contradiction or parameter fitting. The argument about preserved invariants across regimes looks clean and avoids circularity by treating the Welch bound as an independent lower bound.

Soft spots are limited. The derivation and tightness argument for the bound are stated in the abstract but would benefit from more explicit steps and error analysis in the main text to make verification straightforward. The reconciliation also rests on the invariants holding up under the compared regimes, which seems plausible but could use one more formal check against edge cases.

Overall this is for people working on superposition in neural architectures or related coding questions. It shows clear thinking and honest engagement with prior work. I would bring it to a reading group and recommend sending it to peer review; the new bound and the non-contradiction framing are worth referee time even if revisions are needed on the details.

Referee Report

1 major / 1 minor

Summary. The paper claims that the differing capacity regimes reported by Hänni et al. (Õ(d^{3/2})) and Adler and Shavit (near-quadratic) for computation in superposition are compatible because they rely on distinct interface invariants: an approximate-linear recursive template versus thresholded Boolean recovery. It supports this by deriving a rank-trace Welch-type lower bound on the off-diagonal cross-talk for biorthogonal linear readouts, showing Ω(d^{-1/2}) for large F, and contrasting the error behaviors at quadratic load. The d^{3/2} scale is attributed to matching the Welch floor to the Hänni layer's tolerance, leaving open the question of nonlinear reset.

Significance. This work is significant for reconciling apparent contradictions in the literature on superposition computation. By formalizing the distinction in recovery methods and providing a tool (the Welch bound) to analyze linear readouts, it offers a framework for understanding capacity limits as dependent on the interface rather than absolute. The explicit derivation of linear-readout error scaling from the bound and the identification of the Hänni template as the source of the d^{3/2} limit are valuable contributions. It encourages exploration of robust nonlinear methods.

major comments (1)

[Abstract and Welch bound matching argument] The claim that matching the Welch floor against the published tolerance of the Hänni correction layer explains the d^{3/2} scale is load-bearing for the central contribution. However, the scaling step from the Ω(d^{-1/2}) cross-talk to the specific exponent is not explicitly derived or shown with error analysis in the manuscript, leaving the explanation of the exact threshold partial.

minor comments (1)

[Abstract] The Õ notation should be accompanied by a definition of, or a reference to, the suppressed factors for clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of our work's significance in reconciling the Hänni et al. and Adler-Shavit capacity regimes through distinct interface invariants. We appreciate the identification of the Welch bound as a useful tool and address the major comment below. We will revise the manuscript to make the central scaling argument fully explicit.


Referee: [Abstract and Welch bound matching argument] The claim that matching the Welch floor against the published tolerance of the Hänni correction layer explains the d^{3/2} scale is load-bearing for the central contribution. However, the scaling step from the Ω(d^{-1/2}) cross-talk to the specific exponent is not explicitly derived or shown with error analysis in the manuscript, leaving the explanation of the exact threshold partial.

Authors: We agree that the scaling from the Ω(d^{-1/2}) Welch cross-talk floor to the precise d^{3/2} exponent requires an explicit step-by-step derivation together with error analysis, and that the current manuscript presents this only at the level of the abstract summary. This is a fair observation. In the revised version we will insert a dedicated paragraph (in the discussion following the statement of the Welch bound) that derives the compatibility threshold: we will show how the additive error tolerance reported for the Hänni approximate-linear correction layer, when confronted with the worst-case off-diagonal cross-talk lower bound, limits reliable recursion to F = Õ(d^{3/2}) before the accumulated interference exceeds the layer's robustness. The added text will also contain a short propagation argument relating the per-coordinate squared error to the published tolerance, thereby making the exact threshold explicit. This change strengthens rather than alters the central conceptual claim.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained


The paper presents a rank-trace Welch-type lower bound as an independent tool derived from frame theory (not fitted to its own results) and matches it to the external published tolerance from Hänni et al. to explain the d^{3/2} scale as a template-specific threshold. The central claim distinguishes interface invariants (approximate-linear recursive vs. thresholded Boolean recovery) without any self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain. All steps remain externally falsifiable and do not reduce to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The argument rests on standard frame-theory assumptions for tight frames and the preservation of distinct readout interfaces; no new free parameters or invented entities are introduced.

axioms (1)

domain assumption: Unit-norm tight frames achieve the average Welch bound on off-diagonal cross-talk for biorthogonal linear readouts.
Invoked to establish tightness of the Ω(d^{-1/2}) floor at quadratic feature load.



Reference graph

Works this paper leans on

17 extracted references · 12 canonical work pages · 6 internal anchors

[1] M. Adler, N. Shavit, On the complexity of neural computation in superposition, arXiv:2409.15318v3, 2026

[2] H. Borobia, E. Seguí-Mas, G. Tormo-Carbó, How pruning reshapes features: Sparse autoencoder analysis of weight-pruned language models, arXiv:2603.25325, 2026

[3] T. Bricken, A. Templeton, J. Batson, et al., Towards monosemanticity: Decomposing language models with dictionary learning, Transformer Circuits Thread, Anthropic, 2023

[4] H. Cunningham, A. Ewart, L. Riggs, R. Huben, L. Sharkey, Sparse autoencoders find highly interpretable features in language models, arXiv:2309.08600, 2023

[5] N. Elhage, T. Hume, C. Olsson, et al., Toy models of superposition, Transformer Circuits Thread, Anthropic, 2022

[6] L. Gao, T. Dupré la Tour, H. Tillman, et al., Scaling and evaluating sparse autoencoders, arXiv:2406.04093, 2024

[7] K. Hänni, J. Mendel, D. Vaintrob, L. Chan, Mathematical models of computation in superposition, in: ICML 2024 Workshop on Mechanistic Interpretability, 2024. arXiv:2408.05451

[8] G. Ivanov, N. Oozeer, S. Raval, T. Pejovic, S. Upadhyay, A. Abdullah, Spectral superposition: A theory of feature geometry, arXiv:2602.02224, 2026

[9] W. B. Johnson, J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in: Contemporary Mathematics, vol. 26, AMS, 1984, pp. 189–206

[10] T. Lieberum, S. Rajamanoharan, et al., Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2, arXiv:2408.05147, 2024

[11] Y. Liu, Z. Liu, J. Gore, Superposition yields robust neural scaling, arXiv:2505.10465, 2025

[12] E. J. Michaud, L. Gorton, T. McGrath, Understanding sparse autoencoder scaling in the presence of feature manifolds, arXiv:2509.02565, 2025

[13] L. Prieto, E. Stevinson, M. Barsbey, T. Birdal, P. A. M. Mediano, From data statistics to feature geometry: How correlations shape superposition, arXiv:2603.09972, 2026

[14] N. Sarkar, D. J. Deka, Geometric limits of knowledge distillation: A minimum-width theorem via superposition theory, arXiv:2604.04037, 2026

[15] L. Sharkey, B. Chughtai, J. Batson, et al., Open problems in mechanistic interpretability, arXiv:2501.16496, 2025

[16] A. Templeton, T. Conerly, J. Marcus, et al., Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet, Transformer Circuits Thread, Anthropic, 2024

[17] L. R. Welch, Lower bounds on the maximum cross correlation of signals, IEEE Trans. Inform. Theory 20 (3) (1974) 397–399

Table A.1: Observed SAE dictionary sizes vs. reference capacity scales.

Model         d      F_obs   F/d  d^{3/2}  d^2/ln d  F/d^{3/2}
Gemma 3 1B    1,152  9,216   8.0  39,100   188,200   0.24
Gemma 2 2B    2,304  18,432  8.0  110,592  686,000   0.17
Llama 3.2 1B  2,048  16,384  8...
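
The reference scales in Table A.1 are simple functions of the width d and the observed dictionary size F, so the two complete rows can be recomputed directly. The sketch below is only an arithmetic check: the helper name `reference_scales` is hypothetical, the inputs are copied from the table, and the truncated Llama row is left out because its remaining values are not available here.

```python
import math

# (model, width d, observed SAE dictionary size F), copied from the two
# complete rows of Table A.1; the truncated third row is deliberately omitted.
rows = [
    ("Gemma 3 1B", 1152, 9216),
    ("Gemma 2 2B", 2304, 18432),
]

def reference_scales(d: int, F: int) -> dict:
    """Recompute the derived columns of the table from d and F."""
    return {
        "F/d": F / d,
        "d^(3/2)": d ** 1.5,
        "d^2/ln d": d * d / math.log(d),
        "F/d^(3/2)": F / d ** 1.5,
    }

table = {name: reference_scales(d, F) for name, d, F in rows}
for name, scales in table.items():
    print(name, {key: round(value, 2) for key, value in scales.items()})
```

Both observed dictionary sizes sit well below the d^{3/2} scale (ratios of roughly 0.24 and 0.17), consistent with the last column of the table.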