Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

2025 · cs.CL · arXiv 2510.19028

2 Pith papers cite this work. Polarity classification is still indexing.



abstract

As LLMs are increasingly deployed in real-world interactions, their social reasoning in interpersonal communication becomes critical. To explore their capabilities, we introduce SCRIPTS, a 1.1k-dialogue dataset in English and Korean sourced from movie scripts, and propose a social reasoning task based on SCRIPTS that evaluates the capacity of LLMs to infer the social relationships (e.g., friends, lovers) between speakers in each dialogue. Evaluating nine models on our task, we find that current LLMs achieve around 75–80% on the English dataset and 58–69% on the Korean dataset, and that models predict an Unlikely relationship in 10–25% of responses in both languages. Furthermore, we find that thinking models and chain-of-thought prompting provide minimal benefits for social reasoning and occasionally amplify social biases. In sum, current LLMs' social reasoning capabilities have significant limitations, especially for Korean, highlighting the need for efforts to develop socially-aware LLMs across languages.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

LLMs exhibit pseudo-deliberation, with consistent value-action misalignment in generated dialogues despite reasoning, as measured by the new VALDI framework across 4941 scenarios.

LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control

cs.CL · 2026-05-20 · unverdicted · novelty 5.0

LoCar is a localization-aware evaluation framework for in-vehicle assistants that identifies unstable Korean honorific control and weaker performance on strategic metrics like clarification and proactivity in current LLMs.

citing papers explorer

Showing 2 of 2 citing papers.

Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions · cs.CL · 2026-05-11 · unverdicted · none · ref 33 · internal anchor
LoCar: Localization-Aware Evaluation of In-Vehicle Assistants through Fine-Grained Sociolinguistic Control · cs.CL · 2026-05-20 · unverdicted · none · ref 11 · internal anchor
