arXiv preprint arXiv:2508.20279 (2025)
2 Pith papers cite this work; their polarity classification is still being indexed.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
HONES ranks feed-forward neurons by their causal contributions from task-relevant attention heads and uses lightweight scaling to steer performance on multiple vision-language tasks.
citing papers explorer
- The Hidden Evolution of Disguised Visual Context inside the VLM
Visual tokens enter VLMs as raw signals and are reshaped differently by in-context versus layer-injection paradigms, each capturing distinct frequency characteristics that drive task performance.
- From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision-Language Models
HONES ranks feed-forward neurons by their causal contributions from task-relevant attention heads and uses lightweight scaling to steer performance on multiple vision-language tasks.