pith. machine review for the scientific record.

arxiv: 2601.09365 · v2 · submitted 2026-01-14 · 💻 cs.CL · cs.AI

Recognition: unknown

Frame of Reference: Addressing the Challenges of Common Ground Representation in Situational Dialogs

Authors on Pith: no claims yet
classification: 💻 cs.CL · cs.AI
keywords: ground · common · dialogs · shared · ability · conversational · dialog · establish
original abstract

Common ground plays a critical role in situated spoken dialogs, where interlocutors must establish and maintain shared references to entities, events, and relations to sustain coherent interaction in a shared space and over time. With the increasing presence of embodied conversational agents and social robots, the ability to correctly ground this kind of conversational content in order to refer back later also becomes important for dialog systems. Prior studies have demonstrated that LLMs are capable of performing certain grounding acts like acknowledgments. However, relatively little work has investigated their capacity to leverage the grounded information, such as in complex scenarios involving space and time (e.g., "let's go to that café near the park we went to yesterday"). To that end, in this work, we evaluate a model's ability to establish common ground by utilizing these "relational references" in the dynamic and shared environments of situated dialogs. We then test multiple methods for representing common ground and further propose approaches to improve their performance by using reinforcement learning on our synthetically generated dialog data.

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue

    cs.CL · 2026-04 · unverdicted · novelty 6.0

    Incremental visual scaffolding using multimodal models improves persistent common ground representation in situated dialogue by reducing representational blur compared to text-only approaches, with hybrid text-visual ...