pith. machine review for the scientific record.

arxiv: 2605.01192 · v1 · submitted 2026-05-02 · 💻 cs.LG · cs.IT · math.IT

Recognition: unknown

Linear-Readout Floors and Threshold Recovery in Computation in Superposition


Pith reviewed 2026-05-09 14:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.IT · math.IT
keywords computation in superposition · linear readouts · threshold recovery · Welch bound · cross-talk · capacity regimes · biorthogonal systems

The pith

Two approaches to computation in superposition reach different capacities without contradiction because they preserve distinct interface invariants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Recent methods for computation in superposition report different maximum numbers of features in width d. One (Hänni et al.) uses an approximate-linear recursive template to certify Õ(d^{3/2}) features, while the other (Adler and Shavit) uses thresholded Boolean recovery to reach near-quadratic capacity up to logarithmic factors. The paper argues that these regimes are compatible. Its main tool is a Welch-type lower bound on cross-talk for any unit-diagonal linear readout: the bound is Ω(d^{-1/2}) when the feature count F greatly exceeds d, and it is tight on average for unit-norm tight frames. Matching this floor against the published tolerance of the Hänni correction layer places the threshold at the d^{3/2} scale, which the paper reads as a template-specific compatibility threshold rather than a universal limit. At quadratic load F = d², linear readouts incur Ω(s/d) average per-coordinate squared error on Bernoulli sparse states, while random-support threshold recovery still succeeds for sparsities s = O(d/log d).

Core claim

The results are not contradictory because the methods maintain different interface invariants. A rank-trace Welch-type lower bound for biorthogonal linear readouts shows that the worst-case off-diagonal cross-talk is Ω(d^{-1/2}) when F ≫ d, and this floor is achieved on average by unit-norm tight frames. Matching the floor to the published tolerance of the Hänni correction layer accounts for the d^{3/2} regime as a compatibility threshold for the approximate-linear template, while thresholded Boolean recovery evades the floor and reaches higher loads.
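As a numerical sanity check, assume the classical Welch bound: any F > d unit vectors in R^d contain a pair with |⟨u_i, u_j⟩| ≥ sqrt((F - d)/(d(F - 1))). The sketch below covers only the symmetric special case in which the readout reuses the encoding vectors (V = U), not the general biorthogonal case the paper treats; `welch_bound` and `max_coherence` are illustrative names, not the paper's code.

```python
import numpy as np


def welch_bound(d: int, F: int) -> float:
    """Welch lower bound on max_{i != j} |<u_i, u_j>| for F unit vectors in dimension d."""
    if F <= d:
        return 0.0  # F <= d unit vectors can be mutually orthogonal
    return float(np.sqrt((F - d) / (d * (F - 1))))


def max_coherence(U: np.ndarray) -> float:
    """Largest off-diagonal |Gram| entry for a d x F matrix with unit-norm columns."""
    G = U.conj().T @ U
    np.fill_diagonal(G, 0.0)
    return float(np.max(np.abs(G)))


# At quadratic load F = d^2 the bound simplifies to 1/sqrt(d + 1), i.e. Theta(d^{-1/2}).
for d in (16, 64, 256):
    print(d, welch_bound(d, d * d), 1.0 / np.sqrt(d + 1))

# Every unit-norm frame sits at or above the floor; a random Gaussian frame sits well above it.
rng = np.random.default_rng(0)
d, F = 32, 32 * 32
U = rng.standard_normal((d, F))
U /= np.linalg.norm(U, axis=0, keepdims=True)
print(max_coherence(U), ">=", welch_bound(d, F))
```

At F = d² the expression reduces exactly to 1/sqrt(d + 1), which is the Ω(d^{-1/2}) floor quoted above.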

What carries the argument

The rank-trace Welch-type lower bound on worst-case off-diagonal cross-talk for any unit-diagonal linear readout of a biorthogonal system, which is Ω(d^{-1/2}) and tight on average for unit-norm tight frames.
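For reference, here is a standard rank-trace argument that yields a bound of this form. It is a sketch under assumed notation (encoders u_i and readouts v_i in R^d stacked as d × F matrices U and V, with ⟨v_i, u_i⟩ = 1); the paper's own proof may be organized differently.

```latex
% Assumed setup: M = V^T U is F x F, has unit diagonal, and rank at most d.
M = V^{\top} U \in \mathbb{R}^{F \times F}, \qquad M_{ii} = 1, \qquad \operatorname{rank} M \le d.

% At most d eigenvalues of M are nonzero; apply Cauchy-Schwarz, then Schur's inequality:
F^2 = (\operatorname{tr} M)^2 = \Big(\sum_{\lambda_k \ne 0} \lambda_k\Big)^2
    \le d \sum_k |\lambda_k|^2 \le d\,\|M\|_F^2 = d\Big(F + \sum_{i \ne j} M_{ij}^2\Big).

% Average, then worst case, of the off-diagonal cross-talk:
\frac{1}{F(F-1)} \sum_{i \ne j} M_{ij}^2 \ge \frac{F-d}{d(F-1)}
\quad\Longrightarrow\quad
\max_{i \ne j} |M_{ij}| \ge \sqrt{\frac{F-d}{d(F-1)}} \approx d^{-1/2} \quad (F \gg d).

% The average bound is attained when V = U is a unit-norm tight frame, U U^{\top} = (F/d) I_d.
```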

If this is right

  • At quadratic feature load F = d², random-support threshold recovery succeeds for sparsities s = O(d/log d).
  • Linear readouts incur Ω(s/d) average per-coordinate squared error on Bernoulli sparse states at the same load.
  • The published tolerance of the Hänni correction layer aligns with the Welch floor precisely at the d^{3/2} scale.
  • Robust nonlinear reset beyond the Hänni template remains an open question.
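Combining the first two bullets, assuming a fixed constant c in s = O(d/log d), gives the linear readout's error at the sparsity where threshold recovery still works, along with the load gap between the two regimes:

```latex
% Linear-readout error at the threshold-recovery sparsity, F = d^2:
s = \frac{c\,d}{\log d}
\quad\Longrightarrow\quad
\frac{1}{F}\sum_{i=1}^{F} \mathbb{E}\,(\hat{x}_i - x_i)^2 = \Omega\!\left(\frac{s}{d}\right) = \Omega\!\left(\frac{c}{\log d}\right).

% Load gap between the near-quadratic and d^{3/2} regimes:
\frac{d^{2}}{d^{3/2}} = d^{1/2}.
```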

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • An adaptive interface that switches between linear and thresholded recovery depending on load could combine the strengths of both regimes.
  • Relaxing the unit-diagonal or biorthogonal constraint on the readout might let linear templates evade the Welch floor.
  • The same Welch analysis could be applied to other recovery mechanisms to map out a full spectrum of achievable capacity regimes.

Load-bearing premise

The two approaches maintain fundamentally different interface invariants that are preserved across the capacity regimes compared.

What would settle it

Direct computation of the average per-coordinate squared error of linear readouts on Bernoulli sparse states at quadratic feature load F = d², which should scale as Ω(s/d) if the bound governs the Hänni regime.
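A minimal Monte Carlo version of this check, assuming a random Gaussian unit-norm dictionary W at F = d² and the matched-filter readout x̂ = WᵀWx. That readout has unit diagonal, but it is not the Hänni template itself, and `linear_readout_mse` is a hypothetical helper. For random unit vectors E⟨w_i, w_j⟩² = 1/d, so each coordinate picks up roughly s/d of squared cross-talk, and the printed error should track s/d.

```python
import numpy as np


def linear_readout_mse(d: int, s: float, trials: int = 20, seed: int = 0) -> float:
    """Average per-coordinate squared error of x_hat = W^T W x at load F = d^2.

    Illustrative assumptions, not the paper's construction:
      * W is d x F with i.i.d. Gaussian columns normalized to unit length,
        so the readout W^T W has unit diagonal;
      * x is a Bernoulli sparse Boolean state, each coordinate active w.p. s / F.
    """
    rng = np.random.default_rng(seed)
    F = d * d
    W = rng.standard_normal((d, F))
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    errs = []
    for _ in range(trials):
        x = (rng.random(F) < s / F).astype(float)  # Bernoulli sparse state
        x_hat = W.T @ (W @ x)                       # encode in width d, then read out linearly
        errs.append(np.mean((x_hat - x) ** 2))
    return float(np.mean(errs))


d = 64
for s in (2, 4, 8, 16):
    mse = linear_readout_mse(d, s)
    print(f"s={s:2d}  mse={mse:.4f}  s/d={s / d:.4f}")
```

This probes a single random construction, so it illustrates the s/d scaling rather than proving the Ω(s/d) floor over all linear readouts.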

Original abstract

Two recent approaches to computation in superposition reach different recursive capacity regimes: Hänni et al. certify $\tilde{O}(d^{3/2})$ computable features in width $d$ via an approximate-linear recursive template, while Adler and Shavit reach near-quadratic capacity (up to logarithmic factors) using thresholded Boolean recovery. The main contribution of this paper is conceptual: we argue these results are not contradictory because they maintain different interface invariants, and we formalize the distinction. As a tool, we record a rank-trace Welch-type lower bound for biorthogonal linear readouts: for $F \gg d$, the worst-case off-diagonal cross-talk of any unit-diagonal linear readout is $\Omega(d^{-1/2})$, and the bound is tight on average for unit-norm tight frames. At quadratic feature load $F=d^2$, random-support threshold recovery succeeds for sparsities $s=O(d/\log d)$, while linear readouts still incur $\Omega(s/d)$ average per-coordinate squared error on Bernoulli sparse states. Matching the Welch floor against the published tolerance of the Hänni correction layer explains the $d^{3/2}$ scale as a compatibility threshold for that template, not a universal upper bound. Robust nonlinear reset beyond the Hänni template is left open.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that the differing capacity regimes reported by Hänni et al. (Õ(d^{3/2})) and Adler and Shavit (near-quadratic) for computation in superposition are compatible because they rely on distinct interface invariants: an approximate-linear recursive template versus thresholded Boolean recovery. It supports this with a rank-trace Welch-type lower bound on the off-diagonal cross-talk of biorthogonal linear readouts, which is Ω(d^{-1/2}) for F ≫ d, and by contrasting the error behavior of the two methods at quadratic load. The d^{3/2} scale is attributed to matching the Welch floor against the Hänni layer's tolerance, and robust nonlinear reset is left open.

Significance. This work is significant for reconciling apparent contradictions in the literature on superposition computation. By formalizing the distinction in recovery methods and providing a tool (the Welch bound) to analyze linear readouts, it offers a framework for understanding capacity limits as dependent on the interface rather than absolute. The explicit derivation of linear-readout error scaling from the bound and the identification of the Hänni template as the source of the d^{3/2} limit are valuable contributions. It encourages exploration of robust nonlinear methods.

major comments (1)
  1. [Abstract and Welch bound matching argument] The claim that matching the Welch floor against the published tolerance of the Hänni correction layer explains the d^{3/2} scale is load-bearing for the central contribution. However, the scaling step from the Ω(d^{-1/2}) cross-talk to the specific exponent is not explicitly derived or shown with error analysis in the manuscript, leaving the explanation of the exact threshold partial.
minor comments (1)
  1. [Abstract] The use of tilde O notation should be accompanied by a definition or reference to the suppressed factors for clarity.
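    For reference, the conventional meaning of the notation (the paper may fix the polylogarithmic exponent differently):

```latex
f(d) = \tilde{O}\big(g(d)\big) \iff f(d) = O\big(g(d)\,\log^{k} g(d)\big) \text{ for some constant } k \ge 0.
```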

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of our work's significance in reconciling the Hänni et al. and Adler-Shavit capacity regimes through distinct interface invariants. We appreciate the identification of the Welch bound as a useful tool and address the major comment below. We will revise the manuscript to make the central scaling argument fully explicit.

Point-by-point responses
  1. Referee: [Abstract and Welch bound matching argument] The claim that matching the Welch floor against the published tolerance of the Hänni correction layer explains the d^{3/2} scale is load-bearing for the central contribution. However, the scaling step from the Ω(d^{-1/2}) cross-talk to the specific exponent is not explicitly derived or shown with error analysis in the manuscript, leaving the explanation of the exact threshold partial.

    Authors: We agree that the scaling from the Ω(d^{-1/2}) Welch cross-talk floor to the precise d^{3/2} exponent requires an explicit step-by-step derivation together with error analysis, and that the current manuscript presents this only at the level of the abstract summary. This is a fair observation. In the revised version we will insert a dedicated paragraph (in the discussion following the statement of the Welch bound) that derives the compatibility threshold: we will show how the additive error tolerance reported for the Hänni approximate-linear correction layer, when confronted with the worst-case off-diagonal cross-talk lower bound, limits reliable recursion to F = Õ(d^{3/2}) before the accumulated interference exceeds the layer's robustness. The added text will also contain a short propagation argument relating the per-coordinate squared error to the published tolerance, thereby making the exact threshold explicit. This change strengthens rather than alters the central conceptual claim.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

Full rationale

The paper presents a rank-trace Welch-type lower bound as an independent tool derived from frame theory (not fitted to its own results) and matches it to the external published tolerance from Hänni et al. to explain the d^{3/2} scale as a template-specific threshold. The central claim distinguishes interface invariants (approximate-linear recursive vs. thresholded Boolean recovery) without any self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain. All steps remain externally falsifiable and do not reduce to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The argument rests on standard frame-theory assumptions for tight frames and the preservation of distinct readout interfaces; no new free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: unit-norm tight frames achieve the average Welch bound on off-diagonal cross-talk for biorthogonal linear readouts.
    Invoked to establish tightness of the Ω(d^{-1/2}) floor at quadratic feature load.
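The assumption can be checked numerically on one standard unit-norm tight frame, the harmonic frame (d distinct rows of the F-point DFT, columns scaled to unit norm). This construction is chosen here for illustration and is not taken from the paper. It is complex-valued; the Welch bound holds for complex unit vectors as well. `harmonic_frame` is a hypothetical helper name.

```python
import numpy as np


def harmonic_frame(d: int, F: int) -> np.ndarray:
    """d x F complex harmonic frame: d rows of the F-point DFT, columns scaled to unit norm.

    Distinct DFT rows are mutually orthogonal, so U @ U^H = (F/d) I_d,
    which makes the columns a unit-norm tight frame.
    """
    k = np.arange(d)[:, None]        # chosen DFT frequencies 0..d-1
    n = np.arange(F)[None, :]        # frame-vector index
    return np.exp(2j * np.pi * k * n / F) / np.sqrt(d)


d = 16
F = d * d                            # quadratic load
U = harmonic_frame(d, F)
G = U.conj().T @ U                   # F x F Gram matrix with unit diagonal
off = G[~np.eye(F, dtype=bool)]      # all off-diagonal cross-talk terms
mean_sq = np.mean(np.abs(off) ** 2)
welch_sq = (F - d) / (d * (F - 1))   # squared Welch bound; equals 1/(d + 1) at F = d^2
print(mean_sq, welch_sq)             # equal up to floating-point error
print(np.max(np.abs(off)), np.sqrt(welch_sq))  # the worst case still sits above the floor
```

The average squared cross-talk meets the bound exactly, while the worst-case entry does not; this is the "tight on average" distinction used above.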

pith-pipeline@v0.9.0 · 5548 in / 1281 out tokens · 33352 ms · 2026-05-09T14:15:44.340171+00:00 · methodology


Reference graph

Works this paper leans on

17 extracted references · 12 canonical work pages · 6 internal anchors

  1. [1] M. Adler, N. Shavit, On the complexity of neural computation in superposition, arXiv:2409.15318v3, 2026

  2. [2] H. Borobia, E. Seguí-Mas, G. Tormo-Carbó, How pruning reshapes features: Sparse autoencoder analysis of weight-pruned language models, arXiv:2603.25325, 2026

  3. [3] T. Bricken, A. Templeton, J. Batson, et al., Towards monosemanticity: Decomposing language models with dictionary learning, Transformer Circuits Thread, Anthropic, 2023

  4. [4] H. Cunningham, A. Ewart, L. Riggs, R. Huben, L. Sharkey, Sparse autoencoders find highly interpretable features in language models, arXiv:2309.08600, 2023

  5. [5] N. Elhage, T. Hume, C. Olsson, et al., Toy models of superposition, Transformer Circuits Thread, Anthropic, 2022

  6. [6] L. Gao, T. Dupré la Tour, H. Tillman, et al., Scaling and evaluating sparse autoencoders, arXiv:2406.04093, 2024

  7. [7] K. Hänni, J. Mendel, D. Vaintrob, L. Chan, Mathematical models of computation in superposition, in: ICML 2024 Workshop on Mechanistic Interpretability, 2024. arXiv:2408.05451.

  8. [8] G. Ivanov, N. Oozeer, S. Raval, T. Pejovic, S. Upadhyay, A. Abdullah, Spectral superposition: A theory of feature geometry, arXiv:2602.02224, 2026

  9. [9] W. B. Johnson, J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, in: Contemporary Mathematics, vol. 26, AMS, 1984, pp. 189–206

  10. [10] T. Lieberum, S. Rajamanoharan, et al., Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2, arXiv:2408.05147, 2024

  11. [11] Y. Liu, Z. Liu, J. Gore, Superposition yields robust neural scaling, arXiv:2505.10465, 2025

  12. [12] E. J. Michaud, L. Gorton, T. McGrath, Understanding sparse autoencoder scaling in the presence of feature manifolds, arXiv:2509.02565, 2025

  13. [13] L. Prieto, E. Stevinson, M. Barsbey, T. Birdal, P. A. M. Mediano, From data statistics to feature geometry: How correlations shape superposition, arXiv:2603.09972, 2026

  14. [14] N. Sarkar, D. J. Deka, Geometric limits of knowledge distillation: A minimum-width theorem via superposition theory, arXiv:2604.04037, 2026

  15. [15] L. Sharkey, B. Chughtai, J. Batson, et al., Open problems in mechanistic interpretability, arXiv:2501.16496, 2025

  16. [16] A. Templeton, T. Conerly, J. Marcus, et al., Scaling monosemanticity: Extracting interpretable features from Claude 3 Sonnet, Transformer Circuits Thread, Anthropic, 2024

  17. [17] L. R. Welch, Lower bounds on the maximum cross correlation of signals, IEEE Trans. Inform. Theory 20 (3) (1974) 397–399.

Table A.1: Observed SAE dictionary sizes vs. reference capacity scales.
Model        | d     | F_obs  | F/d | d^{3/2} | d^2/ln d | F/d^{3/2}
Gemma 3 1B   | 1,152 | 9,216  | 8.0 | 39,100  | 188,200  | 0.24
Gemma 2 2B   | 2,304 | 18,432 | 8.0 | 110,592 | 686,000  | 0.17
Llama 3.2 1B | 2,048 | 16,384 | 8...