Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models

Patrick Leask, Neel Nanda, Noura Al Moubayed · arXiv 2505.17769

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

representative citing papers

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

cs.LG · 2026-06-04 · conditional · novelty 7.0

SASA replaces single-vector decoders in SAEs with learned subspaces plus block sparsity and nuclear-norm regularization, proving that a single group becomes the global minimizer once block size meets intrinsic dimension and yielding polynomial rather than exponential sample complexity.

citing papers explorer

Showing 1 of 1 citing paper.

Subspace-Aware Sparse Autoencoders for Effective Mechanistic Interpretability

cs.LG · 2026-06-04 · conditional · none · ref 31
