Manifold steering along activation geometry induces behavioral trajectories that match the natural manifold of outputs, while linear steering produces off-manifold, unnatural behaviors.
arXiv preprint arXiv:2506.19708
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
DriftScope: Measuring The Hidden Effects of Diffusion Model Adaptation
Adapting diffusion models causes hidden damage to unrelated concepts detectable via sparse autoencoders and zero-shot classification, and DriftScope provides a prompt-level token-drift diagnostic.