arxiv: 2605.10815 · 2 revisions
Probing Cross-modal Information Hubs in Audio-Visual LLMs