Vision-language models overestimate common ground in asymmetric dialogues by treating map content as evidence of mutual understanding rather than tracking how grounding unfolds through interaction.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Introduces the Image Reconstruction Game benchmark, showing that the describer model dominates reconstruction quality in multi-turn VLM-generator dialogue, with math images the hardest category and token budget affecting convergence.
Citing papers explorer
- The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue