Multimodal conversation structure understanding
1 Pith paper cites this work. Polarity classification is still indexing.

Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers
Evaluating Large Language Models Abilities for Addressee, Turn-change, and Next Speaker Prediction in Meetings
LLMs beat humans and supervised models at next speaker prediction in meetings using only text, while multimodal LLMs improve on addressee and turn-change tasks but remain below human performance.