arxiv: 2605.12745 · v1 · submitted 2026-05-12 · 💻 cs.HC · cs.AI

What Do You Think I Think? Accounting for Human Beliefs Using Second-Order Theory of Mind

Patrick Callaghan, Reid Simmons, Henny Admoni

Pith reviewed 2026-05-14 19:33 UTC · model grok-4.3

classification 💻 cs.HC cs.AI

keywords second-order theory of mind · I-POMDP · cognitive biases and heuristics · human-agent interaction · user study · feedback generation · belief modeling

The pith

An agent using second-order theory of mind can detect and correct for humans' mistaken beliefs about the agent's own knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that extending the I-POMDP framework to second-order theory of mind lets an agent track how a person's beliefs about the agent's knowledge evolve and what cognitive biases and heuristics drive any errors in those beliefs. Once detected, the agent can generate targeted feedback that accounts for those biases during interaction. An in-person user study found that this ToM-2 capability made teacher actions significantly more informative and that people judged the resulting feedback more useful than feedback from agents lacking the second-order model.

Core claim

Using the I-POMDP as a framework for a second-order Theory of Mind, this work endows an agent with the ability to model the evolution of a person's erroneous beliefs about an agent and the cognitive biases and heuristics from which they arise, enabling the agent to detect when such biases might be at play and adaptively generate feedback that accounts for them.

What carries the argument

I-POMDP extended to second-order Theory of Mind (ToM-2), which tracks a human's beliefs about the agent's knowledge state together with the cognitive biases and heuristics that shape those beliefs.
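
As a rough illustration of what tracking second-order beliefs involves, the sketch below keeps a probability distribution over two hypothetical models of the teacher (one whose picture of the learner is accurate, one who overestimates what the learner knows) and updates it from observed teacher actions. It is a flat Bayesian filter standing in for the nested I-POMDP machinery, not the paper's implementation; the model names, action labels, likelihood values, and the 0.6 threshold are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical teacher models: each maps a teacher action to the probability
# of seeing that action if the model is true. All numbers are illustrative.
TEACHER_MODELS: Dict[str, Dict[str, float]] = {
    "accurate": {"informative_card": 0.7, "redundant_card": 0.2, "end_session": 0.1},
    "overestimates": {"informative_card": 0.2, "redundant_card": 0.4, "end_session": 0.4},
}


def update_second_order(belief: Dict[str, float], action: str) -> Dict[str, float]:
    """Bayes update of the learner's belief about which teacher model holds."""
    unnorm = {m: p * TEACHER_MODELS[m][action] for m, p in belief.items()}
    z = sum(unnorm.values())
    return {m: v / z for m, v in unnorm.items()}


def needs_corrective_feedback(belief: Dict[str, float], threshold: float = 0.6) -> bool:
    """Flag a likely mismatch between what the teacher thinks the learner knows
    and what the learner actually knows (threshold is an assumption)."""
    return belief["overestimates"] >= threshold


# Start undecided, then observe two low-information teacher actions.
belief = {"accurate": 0.5, "overestimates": 0.5}
for action in ["redundant_card", "redundant_card"]:
    belief = update_second_order(belief, action)

print(belief, needs_corrective_feedback(belief))
```

In a full ToM-2 model each candidate would itself contain a belief about the learner's belief state, and the likelihoods would come from solving the teacher's decision problem rather than from a hand-written table.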

Load-bearing premise

That the I-POMDP framework, when extended to second-order beliefs, can accurately capture and detect the evolution of a person's erroneous beliefs about the agent together with the specific cognitive biases and heuristics driving them.

What would settle it

A controlled user study that finds no significant increase in teacher-action informativeness when the agent uses the ToM-2 model compared with a first-order baseline would falsify the central claim.
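
One way such a comparison could be run is sketched below as an exact one-sided permutation test on a per-participant informativeness score. The between-subjects design, the choice of test, and the score values are assumptions made for illustration; the numbers are synthetic placeholders and not data from the paper.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def exact_permutation_p(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """One-sided exact permutation p-value for mean(treatment) - mean(baseline).

    Enumerates every way of relabelling the pooled scores, so it is only
    practical for small samples.
    """
    pooled = list(baseline) + list(treatment)
    n_treat = len(treatment)
    observed = mean(treatment) - mean(baseline)
    at_least_as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_treat):
        chosen = set(idx)
        t = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if mean(t) - mean(b) >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


# Synthetic placeholder scores (NOT study data).
first_order_scores = [1.0, 2.0, 3.0]
tom2_scores = [4.0, 5.0, 6.0]
p_value = exact_permutation_p(first_order_scores, tom2_scores)
print(p_value)
```

A large p-value from a well-powered study of this kind is the outcome that would count against the central claim.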

Figures

Figures reproduced from arXiv: 2605.12745 by Henny Admoni, Patrick Callaghan, Reid Simmons.

**Figure 2.** Proof-of-Concept Domain: Multi-featured cards are sorted into bins according to a rule.

**Figure 3.** Mean Relative Information Gain: The IG of the card placed by the human normalized by the IG of that time step’s optimal card. Left: When holding feedback content constant, the ToM-0 and ToM-2 learners elicit more informative teaching due to the ToM-2 learner’s adaptive feedback process. Right: The content of the ToM-2 learner’s two-statement feedback elicits more informative teaching …

**Figure 4.** Mean Teacher Action Count: ToM-2 learners tend to elicit fewer teacher actions before teachers successfully end the session. Marginal significance indicated by +.
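
The relative information gain named in the Figure 3 caption can be made concrete with a small sketch. It assumes a toy version of the card-sorting domain in which the learner holds a belief over candidate rules, each card is placed in the bin dictated by the true rule, and the learner updates with a deterministic likelihood. The rule names, card encoding, and the convention of returning 0 when no card is informative are assumptions for the example, not details taken from the paper.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Card = Tuple[str, str]  # (color, shape), a hypothetical encoding
Rule = Callable[[Card], int]

# Hypothetical candidate sorting rules: each maps a card to a bin index.
RULES: Dict[str, Rule] = {
    "by_color": lambda c: 0 if c[0] == "red" else 1,
    "by_shape": lambda c: 0 if c[1] == "circle" else 1,
    "all_left": lambda c: 0,
}


def entropy(belief: Dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def posterior(belief: Dict[str, float], card: Card, bin_: int) -> Dict[str, float]:
    """Keep only the rules consistent with seeing `card` placed in `bin_`."""
    unnorm = {h: p * (1.0 if RULES[h](card) == bin_ else 0.0) for h, p in belief.items()}
    z = sum(unnorm.values())
    return dict(belief) if z == 0 else {h: v / z for h, v in unnorm.items()}


def information_gain(belief: Dict[str, float], card: Card, true_rule: str) -> float:
    bin_ = RULES[true_rule](card)
    return entropy(belief) - entropy(posterior(belief, card, bin_))


def relative_information_gain(belief: Dict[str, float], chosen: Card,
                              candidates: Sequence[Card], true_rule: str) -> float:
    """IG of the chosen card divided by the IG of the best available card."""
    best = max(information_gain(belief, c, true_rule) for c in candidates)
    return 0.0 if best == 0 else information_gain(belief, chosen, true_rule) / best


cards = [("red", "circle"), ("red", "square"), ("blue", "circle"), ("blue", "square")]
uniform = {h: 1 / 3 for h in RULES}
for card in cards:
    print(card, round(relative_information_gain(uniform, card, cards, "by_color"), 3))
```

Under these assumptions a card on which all candidate rules agree scores 0, and the card that isolates the true rule scores 1.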

Original abstract

Discrepancies between an agent's actual knowledge and what a person thinks the agent knows can hinder interactions. If an agent could detect such discrepancies, it could provide feedback to account for them and improve current and future interactions. Using the I-POMDP as a framework for a second-order Theory of Mind (ToM-2), this work endows an agent with the ability to model the evolution of a person's erroneous beliefs about an agent and the cognitive biases and heuristics (CBH) from which they arise. In doing so, the agent can detect when CBH might be at play during an interaction and adaptively generate feedback that accounts for them. An in-person user study shows how a ToM-2 learner can account for the effects of a teacher's CBH to significantly improve the informativeness of teacher actions, and subjective results suggest people find the ToM-2 learner's feedback more useful.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper extends I-POMDP to track second-order beliefs shaped by cognitive biases and heuristics, then uses that for adaptive feedback, with a user study claiming gains in informativeness.

The main point is a straightforward extension of the I-POMDP framework to second-order theory of mind. The agent models not only what a person believes about its knowledge but also how cognitive biases and heuristics might be causing those beliefs to drift, then generates feedback that accounts for the drift. That combination is the actual new piece here, and it is a reasonable way to make the model more useful in collaborative settings.

The in-person user study is presented as evidence that this leads to more informative teacher actions and better subjective ratings from participants. That is the part that could matter for applied work. The modeling itself stays grounded in the existing I-POMDP structure without adding free parameters or invented entities, which keeps it clean.

The soft spots are on the empirical side. The abstract reports significant improvement but gives no sample size, no description of controls, no statistical tests, and no exclusion rules. Without those, it is difficult to tell whether the reported gains come from the bias-specific modeling or simply from having any second-order belief tracking at all. The stress-test point about validating that the inferred biases actually match participants' real heuristics is worth checking in the methods; if that check is missing or weak, the central claim loses force.

This is the kind of paper that would interest people working on human-AI collaboration and practical ToM implementations. A reader who wants a concrete mechanism for handling belief mismatches would get something usable from the framework and study design. It deserves a serious referee to look at the full methods, statistics, and any bias-validation steps, even if the paper needs tightening on reporting.

Referee Report

2 major / 1 minor

Summary. The paper proposes extending the I-POMDP framework to a second-order Theory of Mind (ToM-2) model that tracks the evolution of a person's erroneous beliefs about an agent's knowledge and attributes discrepancies to specific cognitive biases and heuristics (CBH). It claims that an agent using this model can detect CBH during interactions and generate adaptive feedback, with an in-person user study demonstrating significantly improved informativeness of teacher actions and higher subjective usefulness ratings compared to baselines.

Significance. If the user-study results and CBH attribution hold under proper validation, the work could advance human-AI interaction design by enabling agents to address human misconceptions in real time, with applications in tutoring systems, collaborative robotics, and assistive technologies. The integration of CBH into an established I-POMDP belief-update structure offers a concrete, falsifiable mechanism for second-order belief modeling that goes beyond standard Bayesian updates.

major comments (2)

[User Study] User Study section: The abstract and results claim statistically significant improvement in teacher-action informativeness and positive subjective ratings, yet no sample size, control conditions, statistical tests, exclusion criteria, or power analysis are reported. Without these, the central claim that ToM-2 accounting for CBH drives the gains cannot be evaluated.
[Modeling Approach] Modeling section (I-POMDP extension): Standard I-POMDP belief updates are Bayesian and do not encode non-Bayesian CBH such as anchoring or confirmation bias. The manuscript must specify how custom observation/transition functions inject these biases, report the fitting procedure from interaction data, and provide targeted validation (e.g., alignment of inferred CBH labels with participants' self-reports or controlled bias-induction trials) to show that performance gains are CBH-specific rather than generic second-order tracking.
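
The distinction raised in the second comment can be pictured with a minimal sketch that contrasts a standard Bayesian update with one distorted by confirmation bias. The distortion used here (tempering the likelihood with an exponent below 1 whenever the evidence cuts against the currently favored hypothesis) is a generic modeling device chosen for illustration, not the mechanism described in the manuscript; the hypothesis labels, probabilities, and `discount` value are likewise assumptions.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Standard Bayesian belief update over a finite hypothesis set."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}


def confirmation_biased_update(prior: Dict[str, float], likelihood: Dict[str, float],
                               discount: float = 0.3) -> Dict[str, float]:
    """Bayes update that under-weights evidence against the favored hypothesis.

    `discount` in (0, 1] flattens the likelihood when the evidence is
    disconfirming; discount = 1 recovers the standard update.
    """
    favored = max(prior, key=prior.get)
    disconfirming = likelihood[favored] < max(likelihood.values())
    if disconfirming:
        likelihood = {h: l ** discount for h, l in likelihood.items()}
    return bayes_update(prior, likelihood)


prior = {"A": 0.6, "B": 0.4}
evidence_against_a = {"A": 0.2, "B": 0.8}

rational = bayes_update(prior, evidence_against_a)
biased = confirmation_biased_update(prior, evidence_against_a)
print(round(rational["A"], 3), round(biased["A"], 3))
```

A parameter such as `discount` is exactly the kind of quantity whose fitting procedure and validation the comment asks the authors to report.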

minor comments (1)

[Notation] Notation for second-order belief states and CBH parameters should be defined explicitly in a single table or section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important areas where additional clarity and detail will strengthen the manuscript. We address each major comment below and will revise the paper to incorporate the requested information and clarifications.

Referee: [User Study] User Study section: The abstract and results claim statistically significant improvement in teacher-action informativeness and positive subjective ratings, yet no sample size, control conditions, statistical tests, exclusion criteria, or power analysis are reported. Without these, the central claim that ToM-2 accounting for CBH drives the gains cannot be evaluated.

Authors: We agree that these details are necessary for proper evaluation of the results. In the revised manuscript we will expand the User Study section to explicitly report the sample size, fully describe all control conditions (including the ToM-1 baseline), specify the statistical tests and their results, list exclusion criteria, and include a power analysis. We will also add a supplementary analysis that isolates the contribution of CBH modeling to the observed gains. revision: yes

Referee: [Modeling Approach] Modeling section (I-POMDP extension): Standard I-POMDP belief updates are Bayesian and do not encode non-Bayesian CBH such as anchoring or confirmation bias. The manuscript must specify how custom observation/transition functions inject these biases, report the fitting procedure from interaction data, and provide targeted validation (e.g., alignment of inferred CBH labels with participants' self-reports or controlled bias-induction trials) to show that performance gains are CBH-specific rather than generic second-order tracking.

Authors: We acknowledge that the current description of the I-POMDP extension does not provide sufficient technical detail on the non-Bayesian components. In the revision we will explicitly define the modified observation and transition functions that encode specific CBH (e.g., anchoring and confirmation bias), describe the parameter-fitting procedure applied to the collected interaction data, and add targeted validation results comparing inferred CBH labels against participants' self-reported biases. revision: yes

Circularity Check

0 steps flagged

No significant circularity in I-POMDP ToM-2 extension

The paper extends the pre-existing I-POMDP framework to second-order ToM without any self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations that collapse the central claims. The derivation relies on standard Bayesian belief updates within I-POMDP, augmented by custom functions for CBH, but these are not shown to be tautological with the reported user-study outcomes. The in-person study supplies independent empirical evidence on informativeness and usefulness, so the central claims rest on evidence external to the model rather than on circular reasoning.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that I-POMDP can represent second-order beliefs including cognitive biases; no free parameters or invented entities are specified in the abstract.

axioms (1)

Domain assumption: I-POMDP can be extended to model the evolution of a person's erroneous beliefs about an agent and the cognitive biases and heuristics from which they arise.
This assumption underpins the agent's ability to detect CBH and generate corrective feedback.

pith-pipeline@v0.9.0 · 5456 in / 1181 out tokens · 77621 ms · 2026-05-14T19:33:56.397800+00:00 · methodology

