pith. machine review for the scientific record. sign in

arxiv: 2604.15316 · v1 · submitted 2026-03-01 · 💻 cs.HC · cs.AI

Recognition: no theorem link

Anthropomorphism and Trust in Human-Large Language Model interactions

Authors on Pith no claims yet

Pith reviewed 2026-05-15 17:37 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords anthropomorphismtrustlarge language modelshuman-AI interactionwarmthempathycompetencerelational perceptions
0
0 comments X

The pith

Warmth and cognitive empathy drive perceptions of anthropomorphism and trust in LLMs more than competence does.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines more than 2,000 human interactions with LLM chatbots that were varied in displayed warmth, competence, and empathy. Warmth and cognitive empathy predicted higher scores on anthropomorphism, trust, similarity, relational closeness, usefulness, and lower frustration, while competence predicted every outcome except anthropomorphism. Affective empathy mainly shaped relational perceptions such as closeness but left epistemic judgments like usefulness unaffected. Subjective, personally relevant conversation topics strengthened the effects, producing greater human-likeness and connection than objective topics did. These patterns indicate that specific social cues shape how users relate to and evaluate artificial agents.

Core claim

In more than 2,000 human-LLM interactions, warmth and cognitive empathy significantly predicted perceptions on all outcomes (perceived anthropomorphism, trust, similarity, relational closeness, frustration, usefulness), while competence predicted all outcomes except for anthropomorphism. Affective empathy primarily predicted perceived relational measures but did not predict the epistemic outcomes. Topic sub-analyses showed that more subjective, personally relevant topics amplified these effects, producing greater human-likeness and relational connection with the LLM than did objective topics.

What carries the argument

Systematic variation of warmth (friendliness), competence (capability and coherence), and empathy (cognitive and affective) in LLM chatbots, used to predict user ratings of anthropomorphism, trust, and related perceptions.

Load-bearing premise

The systematic variations in the LLM chatbots successfully and independently manipulated perceived warmth, competence, and empathy as intended by the experimenters.

What would settle it

If participants in a replication report no reliable differences in perceived warmth, competence, or empathy across the varied chatbot conditions, the reported predictive relationships would not hold.

Figures

Figures reproduced from arXiv: 2604.15316 by Akila Kadambi, Alison Lentz, Antonio Damasio, Iulia Comsa, Jonas Kaplan, Katie Siri-Ngammuang, Lisa Aziz-Zadeh, Srini Narayanan, Tanishka Shah, Tara Buechler, Ylenia D'Elia.

Figure 1
Figure 1. Figure 1: Task Schematic. Participants were randomly assigned a topic for conversing with the chatbot (LLM), and then randomly assigned to either the warmth/competence condition or the cognitive/affective empathy condition. Each condition yielded nine different pairwise combinations (minimum-minimum, medium-maximum, etc.). Each combination was presented twice, for a total of 18 LLM interactions. Participants engaged… view at source ↗
Figure 2
Figure 2. Figure 2: Effects of warmth, competence, affective empathy, and cognitive empathy on anthropomorphism and trust. Mean ratings (±95% CI) for anthropomorphism (top row) and trust (bottom row) as a function of manipulated trait levels. In the Warmth/Competence condition (left column), increasing competence produced a monotonic rise in trust and usefulness, while increasing warmth strongly enhanced anthropomorphism and … view at source ↗
Figure 3
Figure 3. Figure 3: Effects of warmth, competence, affective empathy, and cognitive empathy on different outcomes across ma￾nipulated levels of warmth/competence (left) and affective/cognitive empathy (right). Higher competence significantly increased epistemic factors, such as perceived usefulness and reduced frustration, whereas higher warmth strengthened perceived relational closeness. The empathy condition largely paralle… view at source ↗
Figure 4
Figure 4. Figure 4: Participant ratings show trait-level manipulations across all four dimensions. Violin plots show the distribution of participant ratings (1-7 scale) for each manipulated trait level (Minimal, Medium, Maximal). Black points represent mean ratings with error bars indicating 95% confidence intervals. Pairwise comparisons between levels were conducted using Tukey’s HSD test. Top row: In the warmth/competence c… view at source ↗
Figure 5
Figure 5. Figure 5: Topic Type Comparisons for Significant Survey Questions for Warmth/Competence and Empathy Condi￾tions. Bar graphs comparing mean participant responses (±SE) to subjective versus objective topics across multiple survey questions. Each row represents a different survey question, with the left column showing results from the Warmth/Competence condition and the right column showing results from the Empathy con… view at source ↗
read the original abstract

With large language models (LLMs) becoming increasingly prevalent in daily life, so too has the tendency to attribute to them human-like minds and emotions, or anthropomorphize them. Here, we investigate dimensions people use to anthropomorphize and attribute trust toward LLMs across more than 2,000 human-LLM interactions. Participants (N=115) engaged with LLM chatbots systematically varied in warmth (friendliness), competence (capability, coherence), and empathy (cognitive and affective). Warmth and cognitive empathy significantly predicted perceptions on all outcomes (perceived anthropomorphism, trust, similarity, relational closeness, frustration, usefulness), while competence predicted all outcomes except for anthropomorphism. Affective empathy primarily predicted perceived relational measures, but did not predict the epistemic outcomes. Topic sub-analyses showed that more subjective, personally relevant topics (e.g., relationship advice) amplified these effects, producing greater human-likeness and relational connection with the LLM than did objective topics. Together, these findings reveal that warmth, competence, and empathy are key dimensions through which people attribute relational and epistemic perceptions to artificial agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reports results from an empirical study in which 115 participants completed over 2,000 interactions with LLM chatbots whose prompts were systematically varied along warmth, competence, cognitive empathy, and affective empathy. Regression analyses show that warmth and cognitive empathy predict perceived anthropomorphism, trust, similarity, relational closeness, frustration, and usefulness; competence predicts all outcomes except anthropomorphism; affective empathy predicts primarily relational measures; and effects are amplified for subjective, personally relevant topics.

Significance. If the prompt manipulations successfully and independently altered the targeted dimensions, the work supplies concrete evidence on the relative contributions of warmth, competence, and empathy to anthropomorphism and trust in LLMs. The large interaction count and within-subjects topic variation provide reasonable statistical power and allow examination of boundary conditions, which could inform both psychological theory and practical chatbot design.

major comments (2)
  1. [Methods/Results] Methods and Results sections: the manuscript reports no manipulation-check statistics (means, standard deviations, or tests confirming that the warmth, competence, cognitive-empathy, and affective-empathy variations produced distinct and low-correlation perceptions). Because the central claim attributes distinct predictive roles to these dimensions, the absence of these checks leaves open the possibility that the observed regression patterns reflect a single underlying factor such as overall fluency rather than the intended constructs.
  2. [Results] Results section: the regression models treat warmth, competence, cognitive empathy, and affective empathy as simultaneous predictors without reported multicollinearity diagnostics (VIF values or correlation matrix). If these dimensions are correlated in participants' perceptions, the claimed independent contributions cannot be unambiguously interpreted.
minor comments (2)
  1. [Abstract] Abstract: the summary would be strengthened by a brief statement of the manipulation-check outcomes or at least the direction and significance of the key regression coefficients.
  2. [Discussion] Discussion: the claim that subjective topics 'amplified' effects would benefit from explicit statistical comparison (interaction terms or simple-effects tests) rather than descriptive sub-analyses alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the validation of our experimental manipulations and the interpretation of the regression results. We address each major point below and have revised the manuscript to incorporate the requested diagnostics and checks.

read point-by-point responses
  1. Referee: [Methods/Results] Methods and Results sections: the manuscript reports no manipulation-check statistics (means, standard deviations, or tests confirming that the warmth, competence, cognitive-empathy, and affective-empathy variations produced distinct and low-correlation perceptions). Because the central claim attributes distinct predictive roles to these dimensions, the absence of these checks leaves open the possibility that the observed regression patterns reflect a single underlying factor such as overall fluency rather than the intended constructs.

    Authors: We agree that the original manuscript omitted explicit manipulation-check statistics, which limits the ability to confirm the independence of the prompt variations. In the revised manuscript we have added these analyses to the Methods and Results sections, reporting means and standard deviations for each perceived dimension across conditions as well as the full correlation matrix among the four dimensions. The correlations range from 0.28 to 0.61, indicating moderate rather than perfect overlap. In addition, we include one-way ANOVAs confirming that each prompt manipulation produced statistically significant differences on its target dimension. These results, together with the differential outcome patterns (competence fails to predict anthropomorphism while warmth does), argue against the possibility that all effects collapse to a single factor such as fluency. revision: yes

  2. Referee: [Results] Results section: the regression models treat warmth, competence, cognitive empathy, and affective empathy as simultaneous predictors without reported multicollinearity diagnostics (VIF values or correlation matrix). If these dimensions are correlated in participants' perceptions, the claimed independent contributions cannot be unambiguously interpreted.

    Authors: We accept that multicollinearity diagnostics were missing from the original submission and are necessary for unambiguous interpretation of the coefficients. The revised Results section now reports the complete correlation matrix among the four predictors and the variance inflation factor (VIF) for every regression model. All VIF values fall between 1.4 and 3.2, well below conventional thresholds of 5 or 10. These diagnostics support the claim that each dimension makes a statistically independent contribution to the outcome variables. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical regression on fresh participant data

full rationale

The paper reports an experimental study (N=115, >2000 interactions) in which LLM chatbots were varied along warmth/competence/empathy dimensions and participants rated outcomes. All central claims are statistical associations obtained via regression on measured variables. No equations, fitted parameters, or derivations are present that could reduce any result to its own inputs by construction. No self-citation chains or ansatzes are invoked to justify the core findings. The work is therefore self-contained against external benchmarks and receives the default non-circularity score.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions of psychological measurement and statistical inference rather than new theoretical constructs.

axioms (2)
  • domain assumption Self-report scales validly capture perceived anthropomorphism, trust, and relational closeness in human-LLM interactions
    Invoked implicitly when interpreting all outcome measures; common in social psychology but subject to demand characteristics.
  • domain assumption The LLM variations successfully and orthogonally manipulated warmth, competence, and empathy
    Required for attributing effects to the intended dimensions; stated in the abstract as systematic variation.

pith-pipeline@v0.9.0 · 5532 in / 1295 out tokens · 55684 ms · 2026-05-15T17:37:39.050358+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 2 internal anchors

  1. [1]

    Tahiroglu, D., & Taylor, M. (2019). Anthropomorphism, social understanding, and imaginary companions.British Journal of Developmental Psychology, 37(2), 284-299

  2. [2]

    Shanahan, M. (2024). Talking about large language models.Communications of the ACM, 67(2), 68-79. https: //doi.org/10.48550/arXiv.2212.03551

  3. [3]

    (2024, October)

    Akbulut, C., Weidinger, L., Manzini, A., Gabriel, I., & Rieser, V . (2024, October). All too human? Mapping and mitigating the risk from anthropomorphic AI. InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society (V ol. 7, No. 1, pp. 13-26).https://doi.org/10.1609/aies.v7i1.31613

  4. [4]

    Weidinger, L., Mellor, J.F., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S.M., Hawkins, W.T., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., & Gabriel, I. (2021). Ethical and social risks of harm from Language Model...

  5. [5]

    (2025, October 1)

    Blake, S. (2025, October 1). Third of Americans have had a romantic relationship with AI.Newsweek. https: //www.newsweek.com/third-americans-have-had-romantic-relationship-ai-10814798

  6. [6]

    Shevlin, H. (2024). All too human? Identifying and mitigating ethical risks of Social AI.Law, Ethics & Technology, 1(2).https://doi.org/10.55092/let20240003

  7. [7]

    (2025, November 8)

    Kuenssberg, L. (2025, November 8). Mothers say AI chatbots encouraged their sons to kill themselves.BBC News. https://www.bbc.com/news/articles/ce3xgwyywe4o

  8. [8]

    E., & Graziano, M

    Guingrich, R. E., & Graziano, M. S. (2025). A Longitudinal Randomized Control Study of Companion Chatbot Use: Anthropomorphism and Its Mediating Role on Social Impacts.arXiv preprint arXiv:2509.19515

  9. [9]

    Calo, M. R. (2011). Against notice skepticism in privacy (and elsewhere).Notre Dame Law Review, 87, 1027. https://ndlawreview.org/wp-content/uploads/2013/06/Calo.pdf

  10. [10]

    D., & Grimm, C

    Kaminski, E., Rueben, M., Smart, W. D., & Grimm, C. M. (2017). Averting robot eyes, 76 Md. L Rev, 983, 1001-1020

  11. [11]

    T., Cuddy, A

    Fiske, S. T., Cuddy, A. J. C., & Glick, P. (2007). Universal dimensions of social cognition: Warmth and competence. Trends in Cognitive Sciences, 11(2), 77–83

  12. [12]

    T., & Fiske, S

    Harris, L. T., & Fiske, S. T. (2006). Dehumanizing the Lowest of the Low: Neuroimaging Responses to Extreme Out-Groups.Psychological Science, 17(10), 847–853

  13. [13]

    Kadambi, A., Ringold, S., Kamath, S., Raman, N., Jayashankar, A., Damasio, A., Narayanan, S., Kaplan, J., & Aziz-Zadeh, L. (2025). Humanizing the dehumanized: A test of strategies. https://doi.org/10.21203/rs.3. rs-7330548/v1

  14. [14]

    Epley, N., Waytz, A., & Cacioppo, J. T. (2008). On seeing human: A three-factor theory of anthropomorphism. Psychological Review, 114(4), 864–886

  15. [15]

    Christoforakos, L., Gallucci, A., Surmava-Große, T., Ullrich, D., & Diefenbach, S. (2021). Can robots earn our trust the same way humans do? A systematic exploration of competence, warmth, and anthropomorphism as determinants of trust development in HRI.Frontiers in Robotics and AI, 8, 640444. https://doi.org/10. 3389/frobt.2021.640444

  16. [16]

    Sorin, V ., Brin, D., Barash, Y ., Konen, E., Charney, A., Nadkarni, G., & Klang, E. (2024). Large language models and empathy: systematic review.Journal of Medical Internet Research, 26, e52597

  17. [17]

    G., Aharon-Peretz, J., & Perry, D

    Shamay-Tsoory, S. G., Aharon-Peretz, J., & Perry, D. (2009). Two systems for empathy: a double dissociation between emotional and cognitive empathy in inferior frontal gyrus versus ventromedial prefrontal lesions.Brain, 132(3), 617–627

  18. [18]

    Zaki, J., Weber, J., Bolger, N., & Ochsner, K. (2009). The neural bases of empathic accuracy.Proceedings of the National Academy of Sciences, 106(27), 11382–11387

  19. [19]

    Waytz, A., Cacioppo, J., & Epley, N. (2010). Who sees human? The stability and importance of individual differences in anthropomorphism.Perspectives on Psychological Science, 5(3), 219–232. https://doi.org/10. 1177/1745691610369336

  20. [20]

    S., Goswami, R., Saegusa, K., & Broadbent, E

    Johanson, D., Ahn, H. S., Goswami, R., Saegusa, K., & Broadbent, E. (2023). The effects of healthcare robot empathy statements and head nodding on trust and satisfaction: A video study.ACM Transactions on Human-Robot Interaction, 12(1), 1–21.https://doi.org/10.1145/3549534

  21. [21]

    W., Poliak, A., Dredze, M., Leas, E

    Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum.JAMA internal medicine, 183(6), 589–596. https: //doi.org/10.1001/jamainte...

  22. [22]

    B., Wright, A

    Liu, S., McCoy, A. B., Wright, A. P., Carew, B., Genkins, J. Z., Huang, S. S., Peterson, J. F., Steitz, B., & Wright, A. (2024). Leveraging large language models for generating responses to patient messages-a subjective analysis. Journal of the American Medical Informatics Association : JAMIA, 31(6), 1367–1379. https://doi.org/10. 1093/jamia/ocae052

  23. [23]

    Welivita, A., & Pu, P. (2024). Are large language models more empathetic than humans?arXiv. https://doi. org/10.48550/arXiv.2406.05063

  24. [24]

    Song, J., & Lin, H. (2024). Exploring the effect of artificial intelligence intellect on consumer decision delegation: The role of trust, task objectivity, and anthropomorphism.Journal of Consumer Behaviour , 23(2), 727–747

  25. [25]

    P., Boatfield, C., Wang, X., DeCero, E., Krupica, I

    Ta-Johnson, V . P., Boatfield, C., Wang, X., DeCero, E., Krupica, I. C., Rasof, S. D., Motzer, A., & Pedryc, W. M. (2022). Assessing the topics and motivating factors behind human-social chatbot interactions: Thematic analysis of user experiences.JMIR Human Factors, 9(4), 1–12.https://doi.org/10.2196/38876

  26. [26]

    R Project for Statistical Computing. (n.d.). The R Project for Statistical Computing. Retrieved November 6, 2025, fromhttps://www.r-project.org/

  27. [27]

    Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4.Journal of Statistical Software, 67(1), 1–48

  28. [28]

    Madhavan, P., & Wiegmann, D. A. (2007). Similarities and differences between human–human and hu- man–automation trust: An integrative review.Theoretical Issues in Ergonomics Science, 8(4), 277–301. https://doi.org/10.1080/14639220500337708

  29. [29]

    A., Billings, D

    Hancock, P. A., Billings, D. R., Schaefer, K. E., Chen, J. Y . C., de Visser, E. J., & Parasuraman, R. (2011). A Meta-Analysis of Factors Affecting Trust in Human-Robot Interaction.Human Factors: The Journal of the Human Factors and Ergonomics Society, 53(5), 517–527

  30. [30]

    Colombatto, C., & Fleming, S. M. (2023). Illusions of confidence in artificial systems.Preprint at psyArXiv, 10

  31. [31]

    (2021, March)

    Jacovi, A., Marasovi´c, A., Miller, T., & Goldberg, Y . (2021, March). Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. InProceedings of the 2021 ACM conference on fairness, accountability, and transparency (pp. 624-635)

  32. [32]

    S., & Rocher, L

    Ibrahim, L., Hafner, F. S., & Rocher, L. (2025). Training language models to be warm and empathetic makes them less reliable and more sycophantic.arXiv preprint arXiv:2507.21919

  33. [33]

    (2025, June)

    Malmqvist, L. (2025, June). Sycophancy in large language models: Causes and mitigations. InIntelligent Computing-Proceedings of the Computing Conference (pp. 61-74). Cham: Springer Nature Switzerland

  34. [34]

    Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., & Jurafsky, D. (2025). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence.arXiv preprint arXiv:2510.01395

  35. [35]

    Towards Understanding Sycophancy in Language Models

    Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Cheng, N., Durmus, E., Hatfield- Dodds, Z., Johnston, S. R., Kravec, S., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M., & Perez, E. (2023). Towards understanding sycophancy in language models (arXiv preprint arXiv:2310.13548)

  36. [36]

    H., Pennycook, G., & Rand, D

    Costello, T. H., Pennycook, G., & Rand, D. G. (2025, February 17). Just the facts: How dialogues with AI reduce conspiracy beliefs.https://doi.org/10.31234/osf.io/h7n8u_v1

  37. [37]

    H., Pennycook, G., & Rand, D

    Costello, T. H., Pennycook, G., & Rand, D. G. (2024). Durably reducing conspiracy beliefs through dialogues with AI.Science, 385(6714), eadq1814

  38. [38]

    B., & DeSanti, A

    Shank, D. B., & DeSanti, A. (2018). Attributions of morality and mind to artificial intelligence after real-world moral violations.Computers in human behavior , 86, 401-411

  39. [39]

    Peter, S., Riemer, K., & West, J. D. (2025). The benefits and dangers of anthropomorphic conversational agents. Proceedings of the National Academy of Sciences of the United States of America, 122(22), e2415898122

  40. [40]

    Nyholm, L., Santamäki-Fischer, R., & Fagerström, L. (2021). Users’ ambivalent sense of security with humanoid robots in healthcare.Informatics for Health and Social Care, 46(2), 218–226

  41. [41]

    fellow feeling

    Binns, R. (2018). Algorithmic accountability and public reason.Philosophy & Technology, 31(4), 543–556. https://doi.org/10.1007/s13347-017-0263-5 A Appendix A.1 Prompts Warmth and Competence (WC)You are a large language model characterized by different levels of competence and warmth. Competence is the dimension of social perception that reflects a person...