pith. machine review for the scientific record. sign in

arxiv: 2605.03855 · v2 · submitted 2026-05-05 · 💻 cs.RO

Recognition: 1 theorem link

Evaluating Generative Models as Interactive Emergent Representations of Human-Like Collaborative Behavior

Authors on Pith no claims yet

Pith reviewed 2026-05-08 18:23 UTC · model grok-4.3

classification 💻 cs.RO
keywords human-AI collaborationembodied agentsfoundation modelscollaborative behaviortheory of mindLLM evaluationcolor-matching gamebehavior detection
0
0 comments X

The pith

Foundation model agents exhibit emergent collaborative behaviors like perspective-taking and theory of mind in a human-AI color-matching game without explicit training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether large language model agents in embodied settings can display human-like collaborative behaviors that suggest they have mental models of their partners. This matters for building AI that can coordinate effectively with humans in shared tasks. The authors built a 2D game where agents and humans match colors through coordination, defined five specific behaviors as signs of such mental models, and created an automated system using LLM judges to detect them. Results indicate these behaviors emerge consistently across models, with variations by model and task stage, and humans report positive collaboration experiences. This provides evidence and a method for evaluating such capabilities in generative models.

Core claim

Embodied foundation model agents consistently exhibit emergent collaborative behaviors—perspective-taking, collaborator-aware planning, introspection, theory of mind, and clarification—without being explicitly trained to do so, as identified by an automated behavior detection system using LLM-based judges that achieves fair to substantial agreement with human annotations in a 2D collaborative color-matching game; these behaviors show distinct patterns across different LLMs and collaboration stages, accompanied by positive human satisfaction in user studies.

What carries the argument

The automated behavior detection system that uses LLM-based judges to identify five collaborative behaviors (perspective-taking, collaborator-aware planning, introspection, theory of mind, and clarification) in the 2D game environment.

If this is right

  • Foundation models can serve as interactive emergent representations of human-like collaborative behavior in embodied settings.
  • Collaborative behaviors occur at varying frequencies during different stages of collaboration.
  • Distinct patterns of these behaviors appear across different large language models.
  • Human participants report positive collaboration experiences with such agents, appreciating task focus and plan verbalization.
  • The experimental framework enables further assessment of collaboration effectiveness in human-AI teams.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If these behaviors indicate mental models, then scaling up models or refining prompts could enhance coordination without additional training data.
  • The approach might extend to real-world robotic applications where AI must adapt to human partners dynamically.
  • Discrepancies between LLM judges and humans could highlight limitations in current models' understanding of social cues.
  • This work opens the door to using game-based evaluations for measuring theory-of-mind capabilities in generative AI.

Load-bearing premise

The five predefined collaborative behaviors reliably indicate underlying mental models of collaborators and that LLM-based judges accurately detect them in a manner that generalizes beyond this game.

What would settle it

A replication study where human annotators disagree substantially with the LLM judges on behavior detection in new game sessions or where no emergent behaviors are observed in a wider range of foundation models.

read the original abstract

Human-AI collaboration requires AI agents to understand human behavior for effective coordination. While advances in foundation models show promising capabilities in understanding and showing human-like behavior, their application in embodied collaborative settings needs further investigation. This work examines whether embodied foundation model agents exhibit emergent collaborative behaviors indicating underlying mental models of their collaborators, which is an important aspect of effective coordination. This paper develops a 2D collaborative game environment where large language model agents and humans complete color-matching tasks requiring coordination. We define five collaborative behaviors as indicators of emergent mental model representation: perspective-taking, collaborator-aware planning, introspection, theory of mind, and clarification. An automated behavior detection system using LLM-based judges identifies these behaviors, achieving fair to substantial agreement with human annotations. Results from the automated behavior detection system show that foundation models consistently exhibit emergent collaborative behaviors without being explicitly trained to do so. These behaviors occur at varying frequencies during collaboration stages, with distinct patterns across different LLMs. A user study was also conducted to evaluate human satisfaction and perceived collaboration effectiveness, with the results indicating positive collaboration experiences. Participants appreciated the agents' task focus, plan verbalization, and initiative, while suggesting improvements in response times and human-like interactions. This work provides an experimental framework for human-AI collaboration, empirical evidence of collaborative behaviors in embodied LLM agents, a validated behavioral analysis methodology, and an assessment of collaboration effectiveness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces a 2D color-matching collaborative game in which LLM agents interact with humans or other agents. It defines five behaviors (perspective-taking, collaborator-aware planning, introspection, theory of mind, and clarification) as indicators of emergent mental models, deploys an automated LLM-judge detection system that achieves fair-to-substantial agreement with human annotations, reports that foundation models exhibit these behaviors at varying frequencies without explicit training, and presents a user study showing positive human perceptions of collaboration effectiveness.

Significance. If the detection methodology proves robust, the work supplies a concrete experimental framework and initial quantitative evidence that embodied LLM agents can display human-like collaborative behaviors in a coordination task. This could help evaluate and improve human-AI teaming systems. The current manuscript, however, leaves the reliability of the LLM judges and the mapping from observed behaviors to mental models insufficiently substantiated.

major comments (3)
  1. [Abstract] Abstract: the claim that foundation models 'consistently exhibit emergent collaborative behaviors' rests on an automated detector whose agreement with humans is described only as 'fair to substantial.' Without per-behavior kappa values or a breakdown showing that no behavior falls in the fair range (0.21-0.40), the reliability of the frequency counts used to support consistency cannot be evaluated.
  2. [Behavior detection system] Behavior detection system (described after the game definition): the paper provides no details on judge prompting, model choice for the judges, bias-mitigation steps, or any baseline detector (e.g., rule-based or non-LLM). Because every quantitative result flows through these LLM judges, the absence of such controls is load-bearing for the central emergence claim and leaves open the possibility that outputs reflect shared training-data priors rather than independent observation of agent traces.
  3. [Results and interpretation] Results and interpretation sections: the five behaviors are treated as direct indicators of 'underlying mental models,' yet the manuscript reports no additional validation such as direct probing of the agents, human mental-model elicitation, or comparison against human-human play traces. Frequency patterns alone in a single game do not establish that the behaviors reflect genuine collaborator modeling rather than surface-level response patterns.
minor comments (2)
  1. The abstract and methods should report the exact LLMs tested, number of trials per condition, and any data-exclusion criteria or statistical tests applied to the behavior frequencies.
  2. Figure captions and table legends should explicitly state whether error bars represent standard error, standard deviation, or confidence intervals.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We appreciate the referee's identification of areas where additional methodological transparency and interpretive caution would strengthen the work. We address each major comment below and will incorporate revisions to improve the reliability assessment of the LLM judges and to moderate claims about mental models.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that foundation models 'consistently exhibit emergent collaborative behaviors' rests on an automated detector whose agreement with humans is described only as 'fair to substantial.' Without per-behavior kappa values or a breakdown showing that no behavior falls in the fair range (0.21-0.40), the reliability of the frequency counts used to support consistency cannot be evaluated.

    Authors: We agree that the abstract claim would be more robust with granular agreement metrics. In the revised manuscript we will add a table reporting per-behavior Cohen's kappa values (computed from our existing human annotations) and will adjust the wording from 'consistently exhibit' to 'frequently exhibit' if any behavior falls in the fair range. This change will be reflected in both the abstract and the results summary. revision: yes

  2. Referee: [Behavior detection system] Behavior detection system (described after the game definition): the paper provides no details on judge prompting, model choice for the judges, bias-mitigation steps, or any baseline detector (e.g., rule-based or non-LLM). Because every quantitative result flows through these LLM judges, the absence of such controls is load-bearing for the central emergence claim and leaves open the possibility that outputs reflect shared training-data priors rather than independent observation of agent traces.

    Authors: We acknowledge the omission of these details. The revised manuscript will expand the Behavior Detection System section (and add an appendix) with: the exact judge prompts for each behavior, the specific model used (GPT-4o), bias-mitigation steps including multiple independent judges and majority voting, and a comparison to a keyword-based rule detector on the same traces. These additions will allow readers to evaluate whether detections exceed surface priors. revision: yes

  3. Referee: [Results and interpretation] Results and interpretation sections: the five behaviors are treated as direct indicators of 'underlying mental models,' yet the manuscript reports no additional validation such as direct probing of the agents, human mental-model elicitation, or comparison against human-human play traces. Frequency patterns alone in a single game do not establish that the behaviors reflect genuine collaborator modeling rather than surface-level response patterns.

    Authors: We accept that frequency patterns alone do not constitute direct proof of internal mental models. In revision we will (1) change phrasing throughout results and discussion from 'indicators of underlying mental models' to 'behaviors consistent with emergent collaborator modeling,' (2) add an explicit limitations paragraph noting the lack of direct probing or human-human baselines, and (3) outline future work on agent-state probing and human-human comparisons. The user-study results on perceived effectiveness will be repositioned as complementary rather than confirmatory evidence. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical frequencies of a priori behaviors

full rationale

The paper defines five collaborative behaviors upfront as indicators of mental models, then applies a separate LLM-based judge system to detect their presence in agent interaction traces during the color-matching game. Detection reliability is assessed via agreement with human annotations (reported as fair to substantial), and results are reported as observed occurrence frequencies across collaboration stages and models. No equations, parameter fitting, or self-referential derivations are present; the central claim of emergent behaviors follows directly from the measured detection rates rather than reducing to the definitions or any fitted inputs by construction. The methodology is self-contained against external human validation benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is inferred from the described methodology; the central claim rests on assumptions about what behaviors count as evidence of mental models and the reliability of LLM judges.

axioms (2)
  • domain assumption The five behaviors (perspective-taking, collaborator-aware planning, introspection, theory of mind, clarification) serve as valid indicators of emergent mental model representation in collaborators.
    This premise directly links observed actions to the paper's claim about underlying representations.
  • domain assumption LLM-based judges can detect these behaviors with fair to substantial agreement to human annotations.
    This enables the automated system whose results support the main finding.

pith-pipeline@v0.9.0 · 5556 in / 1476 out tokens · 44820 ms · 2026-05-08T18:23:20.908215+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 32 canonical work pages · 4 internal anchors

  1. [1]

    Evaluating XAI: A Comparison of Rule-Based and Example-Based Ex- planations

    Bard, N., Foerster, J.N., Chandar, S., Burch, N., Lanctot, M., Song, H.F., Parisotto, E., Dumoulin, V., Moitra, S., Hughes, E., Dunning, I., Mourad, S., Larochelle, H., Bellemare, M.G., Bowling, M.: The Hanabi challenge: A new frontier for AI research. Artificial Intelligence280, 103216 (2020) https://doi.org/10.1016/j.artint. 2019.103216 40

  2. [2]

    CoRRabs/2501.08389(2025) https://doi.org/ 10.48550/arXiv.2501.08389

    Belsare, A., Karimi, Z., Mattson, C., Brown, D.S.: Toward zero-shot user intent recognition in shared autonomy. CoRRabs/2501.08389(2025) https://doi.org/ 10.48550/arXiv.2501.08389

  3. [3]

    ACDC: The adverse conditions dataset with correspondences for robust semantic driving scene perception,

    Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., Sheikh, Y.A.: OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence (2019) https://doi.org/10.1109/TPAMI. 2019.2929257

  4. [4]

    A coefficient of agreement for nominal scales.Educational and Psychological Measurement, 20(1):37–46, 1960

    Cohen, J.: A coefficient of agreement for nominal scales. Educational and Psychological Measurement20(1), 37–46 (1960) https://doi.org/10.1177/001316446002000104

  5. [5]

    In: Advances in Neural Information Processing Systems (NeurIPS), vol

    Dragan, A.: On the Utility of Learning about Humans for Human- AI Coordination. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 32 (2019). https://proceedings.neurips.cc/paper files/paper/2019/ file/f5b1b89d98b7286673128a5fb112cb9a-Paper.pdf

  6. [6]

    International Journal of Social Robotics15(5), 867–877 (2023) https://doi

    Cucciniello, I., Sangiovanni, S., Maggi, G., Rossi, S.: Mind perception in hri: Exploring users’ attribution of mental and emotional states to robots with different behavioural styles. International Journal of Social Robotics15(5), 867–877 (2023) https://doi. org/10.1007/s12369-023-00989-z

  7. [7]

    In: ACM/IEEE Int

    Devin, S., Alami, R.: An implemented theory of mind to improve human-robot shared plans execution. In: ACM/IEEE Int. Conf. Human-Robot Interaction (HRI), pp. 319–326 (2016). https://doi.org/10.1109/HRI.2016.7451768

  8. [8]

    Journal of Human-Robot Interaction2(2), 58–79 (2013) https://doi.org/10.5898/JHRI.2.2.Deits

    Deits, R., Tellex, S., Thaker, P., Simeonov, D., Kollar, T., Roy, N.: Clarifying com- mands with information-theoretic human-robot dialog. Journal of Human-Robot Interaction2(2), 58–79 (2013) https://doi.org/10.5898/JHRI.2.2.Deits

  9. [9]

    Current Biology15(17), 644–645 (2005) https: //doi.org/10.1016/j.cub.2005.08.041

    Frith, C., Frith, U.: Theory of mind. Current Biology15(17), 644–645 (2005) https: //doi.org/10.1016/j.cub.2005.08.041

  10. [10]

    CoRR abs/1907.08584(2019) https://doi.org/10.48550/arXiv.1907.08584

    Szlam, A.: Craftassist: A framework for dialogue-enabled interactive agents. CoRR abs/1907.08584(2019) https://doi.org/10.48550/arXiv.1907.08584

  11. [11]

    arXiv preprint arXiv:2504.15236 , year =

    Huang, S., Durmus, E., McCain, M., Handa, K., Tamkin, A., Hong, J., Stern, M., Somani, A., Zhang, X., Ganguli, D.: Values in the wild: Discovering and mapping values in real-world language model interactions. In: Second Confer- ence on Language Modeling (2025). https://doi.org/10.48550/arXiv.2504.15236 . https://openreview.net/forum?id=zJHZJClG1Z

  12. [12]

    https: //doi.org/10.31234/osf.io/munc9 42

    Huijzer, R., Hill, Y.: Large Language Models Show Human Behavior (2023). https: //doi.org/10.31234/osf.io/munc9 41

  13. [13]

    In: Proc

    Hiatt, L.M., Harrison, A.M., Trafton, J.G.: Accommodating human variability in human-robot teams through theory of mind. In: Proc. Int. Joint Conf. Artificial Intelligence (IJCAI), pp. 2066–2071 (2011). https://doi.org/10.5591/ 978-1-57735-516-8/IJCAI11-345

  14. [14]

    Inner Monologue: Embodied Reasoning through Planning with Language Models

    Levine, S., Hausman, K., Ichter, B.: Inner Monologue: Embodied Reasoning through Planning with Language Models. In: Proc. 6th Conf. Robot Learning (CoRL), vol. 205, pp. 1769–1782 (2023). https://doi.org/10.48550/arXiv.2207.05608 . https:// proceedings.mlr.press/v205/huang23c.html

  15. [15]

    In: Proc

    Johnson, M., Hofmann, K., Hutton, T., Bignell, D.: The malmo platform for artifi- cial intelligence experimentation. In: Proc. Int. Joint Conf. Artificial Intelligence (IJCAI), pp. 4246–4247 (2016)

  16. [16]

    Bilancia, L

    Jahanmahin, R., Masoud, S., Rickli, J., Djuric, A.: Human-robot interactions in manufacturing: A survey of human behavior modeling. Robotics and Computer- Integrated Manufacturing78, 102404 (2022) https://doi.org/10.1016/j.rcim.2022. 102404

  17. [17]

    Proceed- ings of the National Academy of Sciences121(45) (2024) https://doi.org/10.1073/ pnas.2405460121

    Kosinski, M.: Evaluating Large Language Models in Theory of Mind Tasks. Proceed- ings of the National Academy of Sciences121(45) (2024) https://doi.org/10.1073/ pnas.2405460121

  18. [18]

    OpenVLA: An Open-Source Vision-Language-Action Model

    Kim, M.J., Pertsch, K., Karamcheti, S., Xiao, T., Balakrishna, A., Nair, S., Rafailov, R., Foster, E., Lam, G., Sanketi, P., Vuong, Q., Kollar, T., Burchfiel, B., Tedrake, R., Sadigh, D., Levine, S., Liang, P., Finn, C.: Openvla: An open-source vision- language-action model. CoRRabs/2406.09246(2024) https://doi.org/10.48550/ arXiv.2406.09246

  19. [19]

    In: 2024 IEEE/RSJ Int

    Kannan, S.S., Venkatesh, V.L.N., Min, B.-C.: Smart-llm: Smart multi-agent robot task planning using large language models. In: 2024 IEEE/RSJ Int. Conf. Intelli- gent Robots and Systems (IROS), pp. 12140–12147 (2024). https://doi.org/10.1109/ IROS58592.2024.10802322

  20. [20]

    In: 2023 IEEE Int

    Khanna, P., Yadollahi, E., Bjorkman, M., Leite, I., Smith, C.: Effects of explanation strategies to resolve failures in human-robot collaboration. In: 2023 IEEE Int. Conf. Robot and Human Interactive Communication (RO-MAN), pp. 1829–1836 (2023). https://doi.org/10.1109/RO-MAN57019.2023.10309394

  21. [21]

    In: Proc

    Liu, Z., Bahety, A., Song, S.: REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction. In: Proc. 7th Conf. Robot Learning (CoRL), vol. 229, pp. 3468–3484 (2023). https://proceedings.mlr.press/v229/liu23g.html

  22. [22]

    LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods

    Li, H., Dong, Q., Chen, J., Su, H., Zhou, Y., Ai, Q., Ye, Z., Liu, Y.: LLMs- as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods. CoRR 42 abs/2412.05579(2024) https://doi.org/10.48550/arXiv.2412.05579

  23. [23]

    MediaPipe: A Framework for Building Perception Pipelines

    Grundmann, M.: Mediapipe: A framework for building perception pipelines. CoRR abs/1906.08172(2019) https://doi.org/10.48550/arXiv.1906.08172

  24. [24]

    In: 38th Conf

    Manling, L.,et al.: Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making. In: 38th Conf. Neural Information Processing Systems (NeurIPS), pp. 100428–100444 (2024)

  25. [25]

    In: Workshop on Vision-Language Models for Navigation and Manipulation at ICRA (2024)

    Kreiman, T., Xu, C., Luo, J., Tan, Y.L., Sadigh, D., Finn, C., Levine, S.: Octo: An open-source generalist robot policy. In: Workshop on Vision-Language Models for Navigation and Manipulation at ICRA (2024). https://openreview.net/forum?id= jGrtIvJBpS

  26. [26]

    In: Advances in Neural Information Processing Systems (NeurIPS), vol

    Lowe, R.: Training language models to follow instructions with human feed- back. In: Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 27730–27744 (2022). https://proceedings.neurips.cc/paper files/paper/2022/ file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf

  27. [27]

    In: Proc

    Park, J.S., O’Brien, J., Cai, C.J., Morris, M.R., Liang, P., Bernstein, M.S.: Generative agents: Interactive simulacra of human behavior. In: Proc. 36th Annu. ACM Symp. User Interface Software and Technology, pp. 1–22 (2023). https://doi.org/10.1145/ 3586183.3606763

  28. [28]

    New York TImes

    Sheridan, T.B.: Human-robot interaction: Status and challenges. Human Fac- tors58(4), 525–532 (2016) https://doi.org/10.1177/0018720816644364 . PMID: 27098262

  29. [29]

    Llm with tools: A survey.arXiv preprint arXiv:2409.18807, 2024

    Shen, Z.: LLM With Tools: A Survey. CoRRabs/2409.18807(2024) https://doi. org/10.48550/arXiv.2409.18807

  30. [30]

    IEEE Int

    Shaji, S., Huppertz, F., Mitrevski, A., Houben, S.: From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition? In: Proc. IEEE Int. Conf. Robotics and Automation (ICRA) (2026)

  31. [31]

    In: Proc

    Sharma, A., Rao, S., Brockett, C., Malhotra, A., Jojic, N., Dolan, B.: Investigating agency of LLMs in human-AI collaboration tasks. In: Proc. 18th Conf. Euro- pean Chapter Assoc. Comput. Linguistics (Volume 1: Long Papers), pp. 1968–1987 (2024). https://doi.org/10.18653/v1/2024.eacl-long.119 . https://aclanthology.org/ 2024.eacl-long.119/ 43

  32. [32]

    agentic ai: A conceptual taxonomy, applications and challenges

    Sapkota, R., Roumeliotis, K.I., Karkee, M.: Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenges. Information Fusion126, 103599 (2025) https://doi.org/10.1016/j.inffus.2025.103599

  33. [33]

    In: Proc

    Sidji, M., Smith, W., Rogerson, M.J.: The hidden rules of hanabi: How humans out- perform ai agents. In: Proc. CHI Conf. Human Factors in Computing Systems, pp. 1–16 (2023). https://doi.org/10.1145/3544548.3581550

  34. [34]

    CoRRabs/2403.02274(2024) https://doi.org/10

    Shrestha, S., Zha, Y., Banagiri, S., Gao, G., Aloimonos, Y., Fermuller, C.: Natsgd: A dataset with speech, gestures, and demonstrations for robot learning in natu- ral human-robot interaction. CoRRabs/2403.02274(2024) https://doi.org/10. 48550/arXiv.2403.02274

  35. [35]

    Large language models fail on trivial alterations to theory-of-mind tasks, 2023

    Ullman, T.: Large language models fail on trivial alterations to theory-of-mind tasks. CoRRabs/2302.08399(2023) https://doi.org/10.48550/arXiv.2302.08399

  36. [36]

    Verma, M., Bhambri, S., Kambhampati, S.: Theory of Mind Abilities of Large Language Models in Human-Robot Interaction: An Illusion? In: Companion of ACM/IEEE Int. Conf. Human-Robot Interaction (HRI), pp. 36–45 (2024). https: //doi.org/10.1145/3610978.3640767

  37. [37]

    In: Proc

    Hoorn, D.P.M., Neerincx, A., Graaf, M.M.A.: ”I think you are doing a bad job!”: The Effect of Blame Attribution by a Robot in Human-Robot Collaboration. In: Proc. ACM/IEEE Int. Conf. Human-Robot Interaction (HRI), pp. 140–148 (2021). https: //doi.org/10.1145/3434073.3444681 . https://doi.org/10.1145/3434073.3444681

  38. [38]

    first come, first go

    Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., Anandkumar, A.: Voyager: An Open-Ended Embodied Agent with Large Language Models. Trans- actions on Machine Learning Research (2024) https://doi.org/10.48550/arXiv.2305. 16291

  39. [39]

    Ex- ploring large language models for communica- tion games: An empirical study on werewolf

    Xu, Y., Wang, S., Li, P., Luo, F., Wang, X., Liu, W., Liu, Y.: Exploring large lan- guage models for communication games: An empirical study on werewolf. CoRR abs/2309.04658(2023) https://doi.org/10.48550/arXiv.2309.04658

  40. [40]

    Frontiers in Robotics and AI10 (2023) https://doi.org/10.3389/frobt.2023.1233328

    Zhang, Y., Doyle, T.: Integrating intention-based systems in human-robot interaction: a scoping review of sensors, algorithms, and trust. Frontiers in Robotics and AI10 (2023) https://doi.org/10.3389/frobt.2023.1233328

  41. [41]

    Ortega, F

    Zhang, L., Ji, Z., Chen, B.: CREW: Facilitating Human-AI Teaming Research. Trans- actions on Machine Learning Research (2024) https://doi.org/10.48550/arXiv.2408. 00170 44