Proceedings of the National Academy of Sciences
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
citing papers explorer
- Are you with me? A Framework for Detecting Mental Model Discrepancies in Task-Based Team Dialogues
  A new framework identifies four mental model discrepancy types in team dialogues and demonstrates they carry predictive signals for future misalignments via uniform-weighted historical counts.
- Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations
  Improvements in LLM Theory of Mind on static benchmarks do not reliably improve performance in dynamic, first-person human-AI interactions across goal-oriented and experience-oriented tasks.
- Modeling Pathology-Like Behavioral Patterns in Language Models Through Behavioral Fine-Tuning
  Fine-tuning LLMs on structured tasks inspired by maladaptive behaviors produces stable, context-general shifts in next-token distributions and response tendencies consistent with altered behavioral priors.
- Do LLMs have core beliefs?
  LLMs generally fail to maintain stable worldviews under adversarial conversational pressure, indicating they lack core beliefs akin to those in human cognition.