Minding language models’ (lack of) theory of mind: A plug-and-play multi-character belief tracker

Sclar, Melanie, Kumar, Sachin, West, Peter, Suhr, Alane, Choi, Yejin, Tsvetkov, Yulia · 2023 · DOI 10.18653/v1/2023.acl-long.780

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Bayesian Social Deduction with Graph-Informed Language Models

cs.AI · 2025-06-21 · unverdicted · novelty 7.0

A hybrid Bayesian-graph LLM agent performs competitively against large models and achieves a 67% win rate against humans in controlled Avalon play, outperforming baselines and human teammates.

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

cs.CL · 2026-06-04 · unverdicted · novelty 6.0

AURA improves implicit-need coverage by 0.07 over ReAct baselines on a 100-query benchmark by inserting an intent-inference step controlled by a gap score, while cutting probes by 82% on factual tasks.

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

PDDL-Mind improves LLM accuracy on theory-of-mind benchmarks by over 5% by translating stories into verifiable PDDL states that decouple environment tracking from belief inference.

citing papers explorer

Showing 3 of 3 citing papers.

Bayesian Social Deduction with Graph-Informed Language Models cs.AI · 2025-06-21 · unverdicted · none · ref 48
AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents cs.CL · 2026-06-04 · unverdicted · none · ref 54
PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking cs.CL · 2026-04-20 · unverdicted · none · ref 82
