arXiv preprint arXiv:2506.23046
2 Pith papers cite this work, both from 2026.
Citing papers:

- Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action
  Introduces the NCP-ExploreToM framework to evaluate LLMs on inducing belief states via planning and action, with GPT-5 succeeding on ~80% of tasks and outperforming humans.
- GroupToM-Bench: Benchmarking Group Theory of Mind and Nonlinear Social Emergence in MLLMs
  GroupToM-Bench is presented as the first multimodal benchmark for group-level Theory of Mind, spanning micro-level belief-desire-intention (BDI) states to macro-level outcome prediction; experiments show that current MLLMs lag human baselines on nonlinear social dynamics.