Expander SAEs apply left-d-regular expander masks to TopK SAEs, learning only dn decoder parameters instead of mn and tracing a storage-fidelity frontier that reaches 293x compression with 84% retained performance on Qwen2.5-3B.
From superposition to sparse codes: interpretable representations in neural networks
4 Pith papers cite this work.
years: 2026 (4)
representative citing papers
LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.
SRF factorizes similarity matrices into low-dimensional non-negative interpretable dimensions, shown to work on sparse data and match task-specific models across simulations and real datasets.
Introduces the Manifold Probe to discover representation manifolds in superposition and demonstrates causal steering on time concepts in Llama 2-7b.
citing papers explorer
- When Does LeJEPA Learn a World Model?
LeJEPA achieves linear identifiability of latent variables uniquely when the latents are Gaussian in worlds with stationary additive-noise transitions.