Linear-Readout Floors and Threshold Recovery in Computation in Superposition
Pith reviewed 2026-05-09 14:15 UTC · model grok-4.3
The pith
Two approaches to computation in superposition reach different capacities without contradiction because they preserve distinct interface invariants.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The results are not contradictory because the two methods maintain different interface invariants. A rank-trace Welch-type lower bound for biorthogonal linear readouts shows that the worst-case off-diagonal cross-talk is Ω(d^{-1/2}) when F ≫ d, and unit-norm tight frames attain this floor on average. Matching the floor against the published tolerance of the Hänni correction layer explains the d^{3/2} regime as a compatibility threshold for the approximate-linear template. Thresholded Boolean recovery, by contrast, evades the floor and reaches higher loads.
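For readers who want the mechanics, here is a minimal sketch of the standard rank-trace argument behind a Welch-type floor. It rests on an assumed setup, which is our reading and not necessarily the paper's exact formalism. A hypothetical encoding E (d × F) composed with a hypothetical linear readout R (F × d) gives an F × F cross-talk matrix M = RE. M has unit diagonal and rank at most d. The paper's own proof may be organized differently.

```latex
% Assumed setup: M = RE \in \mathbb{C}^{F\times F},\ M_{ii} = 1,\ r = \operatorname{rank} M \le d;
% P is the orthogonal projector onto the column space of M, so M = PM and \|P\|_F^2 = r.
\begin{aligned}
F = \operatorname{tr} M = \operatorname{tr}(PM)
  &\le \|P\|_F\,\|M\|_F = \sqrt{r}\,\|M\|_F
  \quad\Longrightarrow\quad \|M\|_F^2 \ge \frac{F^2}{r} \ge \frac{F^2}{d},\\
\sum_{i\ne j} |M_{ij}|^2 = \|M\|_F^2 - F
  &\ge \frac{F^2}{d} - F,\\
\max_{i\ne j} |M_{ij}|
  &\ge \sqrt{\frac{F^2/d - F}{F(F-1)}} = \sqrt{\frac{F-d}{d(F-1)}}
  \;\longrightarrow\; d^{-1/2} \quad (F/d \to \infty).
\end{aligned}
```

Equality in the Cauchy–Schwarz step requires M to be a multiple of a rank-d projector. The Gram matrix of a unit-norm tight frame is exactly such a matrix. The last step bounds a maximum by an average, so in general the bound is tight only "on average".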
What carries the argument
The rank-trace Welch-type lower bound on the worst-case off-diagonal cross-talk of any unit-diagonal linear readout of a biorthogonal system. The bound is Ω(d^{-1/2}) and is tight on average for unit-norm tight frames.
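The sketch below is a quick numerical check of the "tight on average" statement. It uses the classical harmonic frame: the columns of the first d rows of the F-point DFT matrix, scaled by 1/sqrt(d). This is a standard unit-norm tight frame, assumed here for illustration and not necessarily the construction used in the paper. The code compares the RMS and maximum cross-talk against the Welch value sqrt((F − d)/(d(F − 1))). All helper names are hypothetical.

```python
import cmath
import math


def welch_bound(F: int, d: int) -> float:
    """Welch lower bound on max_{i != j} |<f_i, f_j>| for F unit vectors in dimension d."""
    if F <= d:
        return 0.0
    return math.sqrt((F - d) / (d * (F - 1)))


def harmonic_frame(F: int, d: int) -> list[list[complex]]:
    """F unit vectors in C^d: columns of the first d rows of the F-point DFT, scaled by 1/sqrt(d)."""
    return [[cmath.exp(2j * math.pi * k * n / F) / math.sqrt(d) for n in range(d)]
            for k in range(F)]


def crosstalk_stats(frame: list[list[complex]]) -> tuple[float, float]:
    """Return (max, RMS) of |<f_i, f_j>| over all ordered pairs with i != j."""
    F = len(frame)
    max_c, sum_sq = 0.0, 0.0
    for i in range(F):
        for j in range(F):
            if i != j:
                c = abs(sum(a.conjugate() * b for a, b in zip(frame[i], frame[j])))
                max_c = max(max_c, c)
                sum_sq += c * c
    return max_c, math.sqrt(sum_sq / (F * (F - 1)))


F, d = 16, 4  # quadratic load F = d^2
mx, rms = crosstalk_stats(harmonic_frame(F, d))
print(f"Welch bound {welch_bound(F, d):.4f}, RMS cross-talk {rms:.4f}, max cross-talk {mx:.4f}")
```

For this frame the RMS cross-talk matches the Welch value to floating-point precision. The maximum sits strictly above it because the harmonic frame is not equiangular. For F ≫ d the bound itself approaches 1/sqrt(d).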
If this is right
- At quadratic feature load F = d^2, random-support threshold recovery succeeds for sparsities s = O(d/log d).
- At the same load, linear readouts incur Ω(s/d) average per-coordinate squared error on Bernoulli sparse states.
- The published tolerance of the Hänni correction layer aligns with the Welch floor precisely at the d^{3/2} scale.
- Robust nonlinear reset beyond the Hänni template remains an open question.
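The sketch below makes the contrast in the first two bullets concrete. It superposes s features from a random ±1/sqrt(d) dictionary at load F = d^2 and reads every coordinate out by plain correlation. It is an illustrative stand-in built on our own assumptions (random sign dictionary, one-shot correlation readout, threshold 1/2), not the Adler-Shavit or Hänni construction. All names are hypothetical. In this toy setting, thresholding the correlations typically recovers the support exactly when s is small, while the raw linear readout still carries an average per-coordinate squared error on the order of s/d.

```python
import random


def random_sign_dictionary(F: int, d: int, rng: random.Random) -> list[int]:
    """Draw F random vectors with entries ±1/sqrt(d); each is stored as a d-bit mask (set bit = -1)."""
    return [rng.getrandbits(d) for _ in range(F)]


def inner(a: int, b: int, d: int) -> float:
    """Inner product of two ±1/sqrt(d) vectors: (agreements - disagreements) / d."""
    return (d - 2 * bin(a ^ b).count("1")) / d


def compare_readouts(d: int, s: int, seed: int = 0, threshold: float = 0.5):
    """Superpose s random features at load F = d^2, then read out linearly and by thresholding."""
    F = d * d
    rng = random.Random(seed)
    dictionary = random_sign_dictionary(F, d, rng)
    support = set(rng.sample(range(F), s))
    # Linear readout of y = sum_{k in support} f_k: coordinate i reads <f_i, y>.
    readout = [sum(inner(dictionary[i], dictionary[k], d) for k in support) for i in range(F)]
    # Average per-coordinate squared error of the raw linear readout against the 0/1 state.
    linear_mse = sum((r - float(i in support)) ** 2 for i, r in enumerate(readout)) / F
    # Thresholded Boolean recovery: feature i is declared active iff its readout exceeds the threshold.
    recovered = {i for i, r in enumerate(readout) if r > threshold}
    return support, recovered, linear_mse


support, recovered, mse = compare_readouts(d=256, s=2)
print("exact support recovery:", recovered == support)
print(f"linear readout MSE: {mse:.4f} (s/d = {2 / 256:.4f})")
```

Storing each sign vector as a d-bit integer keeps the F = 65,536 correlations cheap in pure Python. Raising s at fixed d eventually produces false positives. The cross-talk noise has a standard deviation of roughly sqrt(s/d), and it must stay below the threshold across all F coordinates at once.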
Where Pith is reading between the lines
- An adaptive interface that switches between linear and thresholded recovery depending on load could combine the strengths of both regimes.
- Relaxing the unit-diagonal or biorthogonal constraint on the readout might let linear templates evade the Welch floor.
- The same Welch analysis could be applied to other recovery mechanisms to map out a full spectrum of achievable capacity regimes.
Load-bearing premise
The two approaches maintain fundamentally different interface invariants that are preserved across the capacity regimes compared.
What would settle it
Directly computing the average per-coordinate squared error of linear readouts on Bernoulli sparse states at quadratic feature load F = d^2. If the bound governs the Hänni regime, this error should be Ω(s/d).
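Here is one way to carry out this computation analytically, sketched under our own modelling assumptions. The state x has i.i.d. Bernoulli(p) coordinates with p = s/F. The linear readout returns Mx for a cross-talk matrix M with unit diagonal and rank at most d, so the error is e = (M − I)x. The second line reuses the rank-trace inequality ‖M‖_F² ≥ (tr M)²/rank M ≥ F²/d.

```latex
\begin{aligned}
\mathbb{E}\,|e_i|^2
  &= \sum_{j,k\ne i} M_{ij}\overline{M_{ik}}\;\mathbb{E}[x_j x_k]
   = p(1-p)\sum_{j\ne i}|M_{ij}|^2 + p^2\Big|\sum_{j\ne i} M_{ij}\Big|^2
   \;\ge\; p(1-p)\sum_{j\ne i}|M_{ij}|^2,\\
\frac{1}{F}\sum_{i=1}^{F}\mathbb{E}\,|e_i|^2
  &\ge \frac{p(1-p)}{F}\Big(\|M\|_F^2 - F\Big)
   \ge p(1-p)\Big(\frac{F}{d}-1\Big)
   = s\Big(1-\frac{s}{F}\Big)\Big(\frac{1}{d}-\frac{1}{F}\Big).
\end{aligned}
```

Take F = d^2, s ≤ F/2 and d ≥ 2. The right-hand side is then at least s(d − 1)/(2d^2) ≥ s/(4d), which is Ω(s/d).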
Original abstract
Two recent approaches to computation in superposition reach different recursive capacity regimes: Hänni et al. certify $\tilde{O}(d^{3/2})$ computable features in width $d$ via an approximate-linear recursive template, while Adler and Shavit reach near-quadratic capacity (up to logarithmic factors) using thresholded Boolean recovery. The main contribution of this paper is conceptual: we argue these results are not contradictory because they maintain different interface invariants, and we formalize the distinction. As a tool, we record a rank-trace Welch-type lower bound for biorthogonal linear readouts: for $F \gg d$, the worst-case off-diagonal cross-talk of any unit-diagonal linear readout is $\Omega(d^{-1/2})$, and the bound is tight on average for unit-norm tight frames. At quadratic feature load $F=d^2$, random-support threshold recovery succeeds for sparsities $s=O(d/\log d)$, while linear readouts still incur $\Omega(s/d)$ average per-coordinate squared error on Bernoulli sparse states. Matching the Welch floor against the published tolerance of the Hänni correction layer explains the $d^{3/2}$ scale as a compatibility threshold for that template, not a universal upper bound. Robust nonlinear reset beyond the Hänni template is left open.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the differing capacity regimes reported for computation in superposition are compatible. Hänni et al. report Õ(d^{3/2}) and Adler and Shavit report near-quadratic capacity. The paper argues the two rely on distinct interface invariants: an approximate-linear recursive template versus thresholded Boolean recovery. It supports this by deriving a rank-trace Welch-type lower bound of Ω(d^{-1/2}) on the off-diagonal cross-talk of biorthogonal linear readouts for large F. It then contrasts the error behavior of the two approaches at quadratic load. The d^{3/2} scale is attributed to matching the Welch floor against the Hänni layer's tolerance, and the question of nonlinear reset is left open.
Significance. This work is significant for reconciling an apparent contradiction in the literature on computation in superposition. It formalizes the distinction between recovery methods and provides a tool (the Welch bound) for analyzing linear readouts. Together these offer a framework in which capacity limits depend on the interface rather than being absolute. The explicit derivation of linear-readout error scaling from the bound is a valuable contribution. So is the identification of the Hänni template as the source of the d^{3/2} limit. The work also encourages exploration of robust nonlinear methods.
major comments (1)
- [Abstract and Welch bound matching argument] The claim that matching the Welch floor against the published tolerance of the Hänni correction layer explains the d^{3/2} scale is load-bearing for the central contribution. However, the manuscript does not explicitly derive the scaling step from the Ω(d^{-1/2}) cross-talk to the specific exponent, nor does it give an error analysis. The explanation of the exact threshold therefore remains partial.
minor comments (1)
- [Abstract] The Õ notation should be accompanied by a definition or a statement of which factors it suppresses.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work's significance in reconciling the Hänni et al. and Adler-Shavit capacity regimes through distinct interface invariants. We appreciate the identification of the Welch bound as a useful tool and address the major comment below. We will revise the manuscript to make the central scaling argument fully explicit.
Point-by-point responses
- Referee: [Abstract and Welch bound matching argument] The claim that matching the Welch floor against the published tolerance of the Hänni correction layer explains the d^{3/2} scale is load-bearing for the central contribution. However, the manuscript does not explicitly derive the scaling step from the Ω(d^{-1/2}) cross-talk to the specific exponent, nor does it give an error analysis. The explanation of the exact threshold therefore remains partial.
Authors: We agree that the scaling from the Ω(d^{-1/2}) Welch cross-talk floor to the precise d^{3/2} exponent requires an explicit step-by-step derivation together with error analysis. We also agree that the current manuscript presents this only at the level of the abstract summary; this is a fair observation. In the revised version we will insert a dedicated paragraph, in the discussion following the statement of the Welch bound, that derives the compatibility threshold. It will show how the additive error tolerance reported for the Hänni approximate-linear correction layer, confronted with the worst-case off-diagonal cross-talk lower bound, limits reliable recursion to F = Õ(d^{3/2}) before the accumulated interference exceeds the layer's robustness. The added text will also contain a short propagation argument relating the per-coordinate squared error to the published tolerance, making the exact threshold explicit. This change strengthens rather than alters the central conceptual claim.
Revision: yes
Circularity Check
No significant circularity; derivation self-contained
Full rationale
The paper presents a rank-trace Welch-type lower bound as an independent tool derived from frame theory, not fitted to its own results. It matches the bound to the external published tolerance from Hänni et al. to explain the d^{3/2} scale as a template-specific threshold. The central claim distinguishes interface invariants (approximate-linear recursive vs. thresholded Boolean recovery) without any self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain. All steps remain externally falsifiable and do not reduce to the paper's own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: unit-norm tight frames achieve the average Welch bound on off-diagonal cross-talk for biorthogonal linear readouts.
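The ledger's domain assumption follows from a short calculation, sketched here under a standard setup. Unit vectors f_1, ..., f_F ∈ C^d are collected as the columns of Φ, and the frame is tight: ΦΦ* = (F/d) I_d. The constant F/d is forced by tr ΦΦ* = F.

```latex
\begin{aligned}
G &= \Phi^{*}\Phi, \qquad G^2 = \Phi^{*}(\Phi\Phi^{*})\Phi = \tfrac{F}{d}\,G,\\
\|G\|_F^2 &= \operatorname{tr}(G^2) = \tfrac{F}{d}\operatorname{tr} G = \tfrac{F^2}{d},\\
\frac{1}{F(F-1)}\sum_{i\ne j}|G_{ij}|^2 &= \frac{F^2/d - F}{F(F-1)} = \frac{F-d}{d(F-1)}.
\end{aligned}
```

The mean squared off-diagonal cross-talk therefore equals the square of the Welch value exactly. The maximum can only exceed it, and it equals the Welch value precisely when all off-diagonal magnitudes coincide, that is, when the frame is equiangular.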
Reference graph
Works this paper leans on
- [1]
- [2] H. Borobia, E. Seguí-Mas, G. Tormo-Carbó, How pruning reshapes features: Sparse autoencoder analysis of weight-pruned language models, arXiv:2603.25325, 2026.
- [3] T. Bricken, A. Templeton, J. Batson, et al., Towards monosemanticity: Decomposing language models with dictionary learning, Transformer Circuits Thread, Anthropic, 2023.
- [4] H. Cunningham, A. Ewart, L. Riggs, R. Huben, L. Sharkey, Sparse autoencoders find highly interpretable features in language models, arXiv:2309.08600, 2023.
- [5] N. Elhage, T. Hume, C. Olsson, et al., Toy models of superposition, Transformer Circuits Thread, Anthropic, 2022.
- [6] L. Gao, T. Dupré la Tour, H. Tillman, et al., Scaling and evaluating sparse autoencoders, arXiv:2406.04093, 2024.
- [7]
- [8]
- [9] W. B. Johnson, J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in: Contemporary Mathematics, vol. 26, AMS, 1984, pp. 189–206.
- [10] T. Lieberum, S. Rajamanoharan, et al., Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2, arXiv:2408.05147, 2024.
- [11] Y. Liu, Z. Liu, J. Gore, Superposition yields robust neural scaling, arXiv:2505.10465, 2025.
- [12]
- [13]
- [14] N. Sarkar, D. J. Deka, Geometric limits of knowledge distillation: A minimum-width theorem via superposition theory, arXiv:2604.04037, 2026.
- [15] L. Sharkey, B. Chughtai, J. Batson, et al., Open problems in mechanistic interpretability, arXiv:2501.16496, 2025.
- [16] A. Templeton, T. Conerly, J. Marcus, et al., Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet, Transformer Circuits Thread, Anthropic, 2024.
- [17] L. R. Welch, Lower bounds on the maximum cross correlation of signals, IEEE Trans. Inform. Theory 20 (3) (1974) 397–399.

Table A.1: Observed SAE dictionary sizes vs. reference capacity scales.
Model | d | F_obs | F/d | d^{3/2} | d^2/ln d | F/d^{3/2}
Gemma 3 1B | 1,152 | 9,216 | 8.0 | 39,100 | 188,200 | 0.24
Gemma 2 2B | 2,304 | 18,432 | 8.0 | 110,592 | 686,000 | 0.17
Llama 3.2 1B | 2,048 | 16,384 | 8...
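The reference columns of Table A.1 are simple arithmetic in d and F_obs. The sketch below recomputes them for the two complete rows; the Llama 3.2 1B row is truncated in this copy and is left out. It assumes that ln denotes the natural logarithm, and the function name is hypothetical.

```python
import math

# Complete rows of Table A.1 as reported:
# (model, d, F_obs, F/d, d^{3/2}, d^2/ln d, F/d^{3/2})
ROWS = [
    ("Gemma 3 1B", 1152, 9216, 8.0, 39_100, 188_200, 0.24),
    ("Gemma 2 2B", 2304, 18432, 8.0, 110_592, 686_000, 0.17),
]


def reference_scales(d: int, F: int) -> tuple[float, float, float, float]:
    """Recompute (F/d, d^{3/2}, d^2/ln d, F/d^{3/2}), taking ln as the natural logarithm."""
    d32 = d ** 1.5
    return F / d, d32, d * d / math.log(d), F / d32


if __name__ == "__main__":
    for name, d, F, *reported in ROWS:
        computed = reference_scales(d, F)
        print(name, [f"{c:,.2f}" for c in computed], "reported:", reported)
```

Both complete rows agree with the recomputed values to within rounding, well under 1% for the scale columns. A last-column value below 1 simply means the observed dictionary size F_obs is smaller than d^{3/2}.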