Top-k logits in vision-language models leak task-irrelevant image information at levels comparable to tuned-lens projections of the residual stream.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
What do your logits know? (The answer may surprise you!)