Gated SAEs decouple the decision of which features to use from the estimate of how large their activations should be. Because the L1 penalty applies only to the selection step, it no longer shrinks activation magnitudes, and Gated SAEs need about half as many firing features to reach comparable fidelity.
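The following is a minimal, dependency-free sketch of how such a gated encoder could separate selection from magnitude. It is not the authors' implementation. All names (`gated_sae_forward`, `W_gate`, `r_mag`, `b_mag`, and so on) are hypothetical, and the scheme that ties the magnitude weights to the gate weights through a per-feature rescaling `exp(r_mag[i])` is an assumption. A binary gate decides whether each feature fires, and a separate ReLU path supplies its size. The L1 term sees only the gate pre-activations, so it never pulls the magnitudes toward zero.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gated_sae_forward(
    x: Vector,
    W_gate: Matrix,  # n_features x d_model (hypothetical name)
    b_gate: Vector,
    r_mag: Vector,   # per-feature log-scale tying magnitude weights to gate weights (assumption)
    b_mag: Vector,
    W_dec: Matrix,   # d_model x n_features
    b_dec: Vector,
) -> Tuple[Vector, Vector, Vector]:
    """Return (features, reconstruction, gate pre-activations) for one input."""
    x_cent = [xi - bi for xi, bi in zip(x, b_dec)]
    # Gate path: decides WHICH features fire.
    pi_gate = [p + b for p, b in zip(matvec(W_gate, x_cent), b_gate)]
    # Magnitude path: decides HOW LARGE active features are.
    # Each row of W_gate is rescaled by exp(r_mag[i]).
    W_mag = [[math.exp(r) * w for w in row] for row, r in zip(W_gate, r_mag)]
    pi_mag = [p + b for p, b in zip(matvec(W_mag, x_cent), b_mag)]
    # Binary gate (Heaviside step) times ReLU magnitude.
    f = [(1.0 if pg > 0 else 0.0) * max(pm, 0.0) for pg, pm in zip(pi_gate, pi_mag)]
    x_hat = [xh + b for xh, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat, pi_gate


def gated_sae_loss(x: Vector, x_hat: Vector, pi_gate: Vector, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 penalty on the gate path only."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # The sparsity penalty acts on ReLU(pi_gate), not on the magnitudes f.
    sparsity = l1_coeff * sum(max(p, 0.0) for p in pi_gate)
    return recon + sparsity


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    f, x_hat, pi_gate = gated_sae_forward(
        [1.0, -1.0], eye, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], eye, [0.0, 0.0]
    )
    print(f, x_hat, gated_sae_loss([1.0, -1.0], x_hat, pi_gate, 0.1))
```

In this sketch, changing `r_mag` or `b_mag` rescales an active feature without affecting whether it fires or what it contributes to the L1 term. That separation is the mechanism the summary credits with removing shrinkage. The sketch leaves out any auxiliary loss terms the original training objective may use and performs no gradient-based training.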
3 Pith papers cite this work (field: cs.LG; all 3 unverdicted).
Representative citing papers:
Entropy-Based Characterisation of the Polarised Regime in Latent Variable Models: An entropy criterion on mean representations characterises the polarised regime in VAEs and related models, with theoretical links to KL minimisation and empirical tests across several architectures.
Sparse autoencoders applied to language model activations yield more interpretable and monosemantic features than alternative approaches, enabling finer causal analysis on the indirect object identification task.