Revisiting the evaluation of theory of mind through question answering

Matthew Le, Y-Lan Boureau, Maximilian Nickel · 2019 · DOI 10.18653/v1/d19-1598

5 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Instructions Shape Production of Language, not Processing

cs.CL · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.

EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents

cs.AI · 2026-05-11 · conditional · novelty 6.0 · 2 refs

EnactToM is an evolving benchmark of embodied multi-agent tasks that tests functional Theory of Mind by requiring agents to act optimally on implicit beliefs in partially observable 3D environments.

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

PDDL-Mind improves LLM accuracy on theory-of-mind benchmarks by over 5% by translating stories into verifiable PDDL states that decouple environment tracking from belief inference.

CogniFold: Always-On Proactive Memory via Cognitive Folding

cs.AI · 2026-05-13

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

cs.CL · 2026-04-22

citing papers explorer

Instructions Shape Production of Language, not Processing

cs.CL · 2026-05-11 · unverdicted · none · ref 32 · 2 links

EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents

cs.AI · 2026-05-11 · conditional · none · ref 5 · 2 links

PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking

cs.CL · 2026-04-20 · unverdicted · none · ref 81

CogniFold: Always-On Proactive Memory via Cognitive Folding

cs.AI · 2026-05-13 · unreviewed · ref 30

DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

cs.CL · 2026-04-22 · unreviewed · ref 23
