Anthropomorphism and Trust in Human-Large Language Model Interactions
Pith reviewed 2026-05-15 17:37 UTC · model grok-4.3
The pith
Warmth and cognitive empathy drive perceptions of anthropomorphism and trust in LLMs more than competence does.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across more than 2,000 human-LLM interactions, warmth and cognitive empathy significantly predicted all six outcomes (perceived anthropomorphism, trust, similarity, relational closeness, frustration, and usefulness), while competence predicted every outcome except anthropomorphism. Affective empathy predicted primarily the relational measures, not the epistemic outcomes. Topic sub-analyses showed that subjective, personally relevant topics amplified these effects, producing greater perceived human-likeness and relational connection with the LLM than objective topics did.
What carries the argument
Systematic variation of warmth (friendliness), competence (capability and coherence), and empathy (cognitive and affective) in LLM chatbots, used to predict user ratings of anthropomorphism, trust, and related perceptions.
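The paper's reference list includes lme4 [27], so the repeated-measures structure (115 participants, over 2,000 interactions) plausibly calls for a mixed-effects regression. A minimal sketch under that assumption; the data frame `ratings` and the column names (`trust`, `warmth`, `competence`, `cog_empathy`, `aff_empathy`, `participant`) are illustrative, not the authors' actual variables:

```r
# Sketch of the kind of model the design implies: each outcome regressed on
# the four perceived dimensions, with a random intercept per participant to
# absorb repeated measures. All variable names are illustrative assumptions.
library(lme4)

fit <- lmer(
  trust ~ warmth + competence + cog_empathy + aff_empathy +
    (1 | participant),
  data = ratings
)
summary(fit)  # fixed-effect estimates: one coefficient per dimension
```

The same form would presumably be refit with each of the six outcomes (anthropomorphism, trust, similarity, closeness, frustration, usefulness) as the response.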
Load-bearing premise
The systematic variations in the LLM chatbots successfully and independently manipulated perceived warmth, competence, and empathy as intended by the experimenters.
What would settle it
If a replication found no reliable differences in perceived warmth, competence, or empathy across the varied chatbot conditions, the manipulation would have failed and the reported predictive relationships could no longer be attributed to the intended dimensions.
read the original abstract
With large language models (LLMs) becoming increasingly prevalent in daily life, so too has the tendency to attribute to them human-like minds and emotions, or anthropomorphize them. Here, we investigate dimensions people use to anthropomorphize and attribute trust toward LLMs across more than 2,000 human-LLM interactions. Participants (N=115) engaged with LLM chatbots systematically varied in warmth (friendliness), competence (capability, coherence), and empathy (cognitive and affective). Warmth and cognitive empathy significantly predicted perceptions on all outcomes (perceived anthropomorphism, trust, similarity, relational closeness, frustration, usefulness), while competence predicted all outcomes except for anthropomorphism. Affective empathy primarily predicted perceived relational measures, but did not predict the epistemic outcomes. Topic sub-analyses showed that more subjective, personally relevant topics (e.g., relationship advice) amplified these effects, producing greater human-likeness and relational connection with the LLM than did objective topics. Together, these findings reveal that warmth, competence, and empathy are key dimensions through which people attribute relational and epistemic perceptions to artificial agents.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports results from an empirical study in which 115 participants completed over 2,000 interactions with LLM chatbots whose prompts were systematically varied along warmth, competence, cognitive empathy, and affective empathy. Regression analyses show that warmth and cognitive empathy predict perceived anthropomorphism, trust, similarity, relational closeness, frustration, and usefulness; competence predicts all outcomes except anthropomorphism; affective empathy predicts primarily relational measures; and effects are amplified for subjective, personally relevant topics.
Significance. If the prompt manipulations successfully and independently altered the targeted dimensions, the work supplies concrete evidence on the relative contributions of warmth, competence, and empathy to anthropomorphism and trust in LLMs. The large interaction count and within-subjects topic variation provide reasonable statistical power and allow examination of boundary conditions, which could inform both psychological theory and practical chatbot design.
major comments (2)
- [Methods/Results] The manuscript reports no manipulation-check statistics (means, standard deviations, or tests confirming that the warmth, competence, cognitive-empathy, and affective-empathy variations produced distinct, weakly correlated perceptions). Because the central claim attributes distinct predictive roles to these dimensions, the absence of these checks leaves open the possibility that the observed regression patterns reflect a single underlying factor, such as overall fluency, rather than the intended constructs.
- [Results] The regression models treat warmth, competence, cognitive empathy, and affective empathy as simultaneous predictors without reported multicollinearity diagnostics (VIF values or a correlation matrix). If these dimensions are correlated in participants' perceptions, the claimed independent contributions cannot be unambiguously interpreted.
minor comments (2)
- [Abstract] The abstract would be strengthened by a brief statement of the manipulation-check outcomes, or at least of the direction and significance of the key regression coefficients.
- [Discussion] The claim that subjective topics 'amplified' effects would benefit from an explicit statistical comparison (interaction terms or simple-effects tests) rather than descriptive sub-analyses alone; a sketch of such a test follows this list.
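One way to run the interaction test suggested above, again with illustrative variable names; `topic_subjective` is assumed to be a 0/1 indicator for subjective versus objective topics, and `ratings` the hypothetical per-interaction data frame:

```r
# Interaction test for the "amplification" claim: does the effect of each
# dimension grow on subjective topics? lmerTest adds p-values to lme4 fits.
# All variable names are illustrative assumptions, not the authors' code.
library(lmerTest)

fit_int <- lmer(
  anthropomorphism ~ (warmth + competence + cog_empathy + aff_empathy) *
    topic_subjective +
    (1 | participant),
  data = ratings
)
summary(fit_int)  # significant dimension-by-topic terms support amplification
```

Significant dimension-by-topic coefficients would substantiate "amplified" beyond descriptive sub-analyses.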
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the validation of our experimental manipulations and the interpretation of the regression results. We address each major point below and have revised the manuscript to incorporate the requested diagnostics and checks.
read point-by-point responses
- Referee: [Methods/Results] The manuscript reports no manipulation-check statistics (means, standard deviations, or tests confirming that the warmth, competence, cognitive-empathy, and affective-empathy variations produced distinct, weakly correlated perceptions). Because the central claim attributes distinct predictive roles to these dimensions, the absence of these checks leaves open the possibility that the observed regression patterns reflect a single underlying factor, such as overall fluency, rather than the intended constructs.
Authors: We agree that the original manuscript omitted explicit manipulation-check statistics, which limited the ability to confirm the independence of the prompt variations. In the revised manuscript we have added these analyses to the Methods and Results sections, reporting means and standard deviations for each perceived dimension across conditions as well as the full correlation matrix among the four dimensions. The correlations range from 0.28 to 0.61, indicating moderate rather than perfect overlap. In addition, we include one-way ANOVAs confirming that each prompt manipulation produced statistically significant differences on its target dimension. These results, together with the differential outcome patterns (competence fails to predict anthropomorphism while warmth does), argue against the possibility that all effects collapse to a single factor such as fluency; a sketch of these diagnostics follows the point-by-point responses. revision: yes
- Referee: [Results] The regression models treat warmth, competence, cognitive empathy, and affective empathy as simultaneous predictors without reported multicollinearity diagnostics (VIF values or a correlation matrix). If these dimensions are correlated in participants' perceptions, the claimed independent contributions cannot be unambiguously interpreted.
Authors: We accept that multicollinearity diagnostics were missing from the original submission and are necessary for unambiguous interpretation of the coefficients. The revised Results section now reports the complete correlation matrix among the four predictors and the variance inflation factor (VIF) for every regression model. All VIF values fall between 1.4 and 3.2, well below conventional thresholds of 5 or 10. These diagnostics support the claim that each dimension makes a statistically independent contribution to the outcome variables. revision: yes
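Both responses describe standard, easily reproduced diagnostics. A minimal R sketch of how they could be computed, using base R plus the car package; the data frame `ratings` and column names (`condition`, `trust`, `warmth`, `competence`, `cog_empathy`, `aff_empathy`) are illustrative assumptions, not taken from the paper:

```r
# Minimal sketch of the manipulation checks and collinearity diagnostics
# described in the rebuttal. Assumes a hypothetical data frame `ratings`
# with one row per interaction: the assigned prompt `condition` plus the
# four perceived dimensions (column names illustrative, not from the paper).
library(car)  # for vif()

dims <- c("warmth", "competence", "cog_empathy", "aff_empathy")

# Correlation matrix among the perceived dimensions
# (the rebuttal reports correlations of 0.28-0.61)
round(cor(ratings[, dims]), 2)

# One-way ANOVA per dimension: did each prompt manipulation shift its target?
for (d in dims) {
  f <- reformulate("condition", response = d)
  print(summary(aov(f, data = ratings)))
}

# Variance inflation factors from the fixed-effects regression
# (the rebuttal reports VIFs of 1.4-3.2, below thresholds of 5 or 10)
ols <- lm(trust ~ warmth + competence + cog_empathy + aff_empathy,
          data = ratings)
vif(ols)
```

Strictly, with repeated measures per participant a mixed model would be the more rigorous check, but the one-way ANOVAs mirror what the rebuttal describes.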
Circularity Check
No circularity: purely empirical regression on fresh participant data
full rationale
The paper reports an experimental study (N=115, >2000 interactions) in which LLM chatbots were varied along warmth/competence/empathy dimensions and participants rated outcomes. All central claims are statistical associations obtained via regression on measured variables. No equations, fitted parameters, or derivations are present that could reduce any result to its own inputs by construction. No self-citation chains or ansatzes are invoked to justify the core findings. The work is therefore self-contained against external benchmarks and receives the default non-circularity score.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Self-report scales validly capture perceived anthropomorphism, trust, and relational closeness in human-LLM interactions.
- domain assumption: The LLM variations successfully and orthogonally manipulated warmth, competence, and empathy.
Reference graph
Works this paper leans on
- [1] Tahiroglu, D., & Taylor, M. (2019). Anthropomorphism, social understanding, and imaginary companions. British Journal of Developmental Psychology, 37(2), 284–299.
- [2] Shanahan, M. (2024). Talking about large language models. Communications of the ACM, 67(2), 68–79. https://doi.org/10.48550/arXiv.2212.03551
- [3] Akbulut, C., Weidinger, L., Manzini, A., Gabriel, I., & Rieser, V. (2024, October). All too human? Mapping and mitigating the risk from anthropomorphic AI. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (Vol. 7, No. 1, pp. 13–26). https://doi.org/10.1609/aies.v7i1.31613
- [4] Weidinger, L., Mellor, J. F., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S. M., Hawkins, W. T., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L. A., Isaac, W. S., Legassick, S., Irving, G., & Gabriel, I. (2021). Ethical and social risks of harm from Language Models. arXiv. https://doi.org/10.48550/arXiv.2112.04359
- [5] Blake, S. (2025, October 1). Third of Americans have had a romantic relationship with AI. Newsweek. https://www.newsweek.com/third-americans-have-had-romantic-relationship-ai-10814798
- [6] Shevlin, H. (2024). All too human? Identifying and mitigating ethical risks of Social AI. Law, Ethics & Technology, 1(2). https://doi.org/10.55092/let20240003
- [7] Kuenssberg, L. (2025, November 8). Mothers say AI chatbots encouraged their sons to kill themselves. BBC News. https://www.bbc.com/news/articles/ce3xgwyywe4o
- [8] Guingrich, R. E., & Graziano, M. S. (2025). A longitudinal randomized control study of companion chatbot use: Anthropomorphism and its mediating role on social impacts. arXiv preprint arXiv:2509.19515.
- [9] Calo, M. R. (2011). Against notice skepticism in privacy (and elsewhere). Notre Dame Law Review, 87, 1027. https://ndlawreview.org/wp-content/uploads/2013/06/Calo.pdf
- [10] Kaminski, M. E., Rueben, M., Smart, W. D., & Grimm, C. M. (2017). Averting robot eyes. 76 Md. L. Rev. 983, 1001–1020.
- [11] Fiske, S. T., Cuddy, A. J. C., & Glick, P. (2007). Universal dimensions of social cognition: Warmth and competence. Trends in Cognitive Sciences, 11(2), 77–83.
- [12] Harris, L. T., & Fiske, S. T. (2006). Dehumanizing the lowest of the low: Neuroimaging responses to extreme out-groups. Psychological Science, 17(10), 847–853.
- [13] Kadambi, A., Ringold, S., Kamath, S., Raman, N., Jayashankar, A., Damasio, A., Narayanan, S., Kaplan, J., & Aziz-Zadeh, L. (2025). Humanizing the dehumanized: A test of strategies. https://doi.org/10.21203/rs.3.rs-7330548/v1
- [14] Epley, N., Waytz, A., & Cacioppo, J. T. (2007). On seeing human: A three-factor theory of anthropomorphism. Psychological Review, 114(4), 864–886.
- [15] Christoforakos, L., Gallucci, A., Surmava-Große, T., Ullrich, D., & Diefenbach, S. (2021). Can robots earn our trust the same way humans do? A systematic exploration of competence, warmth, and anthropomorphism as determinants of trust development in HRI. Frontiers in Robotics and AI, 8, 640444. https://doi.org/10.3389/frobt.2021.640444
- [16] Sorin, V., Brin, D., Barash, Y., Konen, E., Charney, A., Nadkarni, G., & Klang, E. (2024). Large language models and empathy: Systematic review. Journal of Medical Internet Research, 26, e52597.
- [17] Shamay-Tsoory, S. G., Aharon-Peretz, J., & Perry, D. (2009). Two systems for empathy: A double dissociation between emotional and cognitive empathy in inferior frontal gyrus versus ventromedial prefrontal lesions. Brain, 132(3), 617–627.
- [18] Zaki, J., Weber, J., Bolger, N., & Ochsner, K. (2009). The neural bases of empathic accuracy. Proceedings of the National Academy of Sciences, 106(27), 11382–11387.
- [19] Waytz, A., Cacioppo, J., & Epley, N. (2010). Who sees human? The stability and importance of individual differences in anthropomorphism. Perspectives on Psychological Science, 5(3), 219–232. https://doi.org/10.1177/1745691610369336
- [20] Johanson, D., Ahn, H. S., Goswami, R., Saegusa, K., & Broadbent, E. (2023). The effects of healthcare robot empathy statements and head nodding on trust and satisfaction: A video study. ACM Transactions on Human-Robot Interaction, 12(1), 1–21. https://doi.org/10.1145/3549534
- [21] Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183(6), 589–596. https://doi.org/10.1001/jamainte...
- [22] Liu, S., McCoy, A. B., Wright, A. P., Carew, B., Genkins, J. Z., Huang, S. S., Peterson, J. F., Steitz, B., & Wright, A. (2024). Leveraging large language models for generating responses to patient messages: A subjective analysis. Journal of the American Medical Informatics Association, 31(6), 1367–1379. https://doi.org/10.1093/jamia/ocae052
- [23] Welivita, A., & Pu, P. (2024). Are large language models more empathetic than humans? arXiv. https://doi.org/10.48550/arXiv.2406.05063
- [24] Song, J., & Lin, H. (2024). Exploring the effect of artificial intelligence intellect on consumer decision delegation: The role of trust, task objectivity, and anthropomorphism. Journal of Consumer Behaviour, 23(2), 727–747.
- [25] Ta-Johnson, V. P., Boatfield, C., Wang, X., DeCero, E., Krupica, I. C., Rasof, S. D., Motzer, A., & Pedryc, W. M. (2022). Assessing the topics and motivating factors behind human-social chatbot interactions: Thematic analysis of user experiences. JMIR Human Factors, 9(4), 1–12. https://doi.org/10.2196/38876
- [26] R Project for Statistical Computing. (n.d.). The R Project for Statistical Computing. Retrieved November 6, 2025, from https://www.r-project.org/
- [27] Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
- [28] Madhavan, P., & Wiegmann, D. A. (2007). Similarities and differences between human–human and human–automation trust: An integrative review. Theoretical Issues in Ergonomics Science, 8(4), 277–301. https://doi.org/10.1080/14639220500337708
- [29] Hancock, P. A., Billings, D. R., Schaefer, K. E., Chen, J. Y. C., de Visser, E. J., & Parasuraman, R. (2011). A meta-analysis of factors affecting trust in human-robot interaction. Human Factors: The Journal of the Human Factors and Ergonomics Society, 53(5), 517–527.
- [30] Colombatto, C., & Fleming, S. M. (2023). Illusions of confidence in artificial systems. Preprint at PsyArXiv.
- [31] Jacovi, A., Marasović, A., Miller, T., & Goldberg, Y. (2021, March). Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 624–635).
- [32] Ibrahim, L., Hafner, F. S., & Rocher, L. (2025). Training language models to be warm and empathetic makes them less reliable and more sycophantic. arXiv preprint arXiv:2507.21919.
- [33] Malmqvist, L. (2025, June). Sycophancy in large language models: Causes and mitigations. In Intelligent Computing: Proceedings of the Computing Conference (pp. 61–74). Cham: Springer Nature Switzerland.
- [34]
- [35] Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Cheng, N., Durmus, E., Hatfield-Dodds, Z., Johnston, S. R., Kravec, S., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M., & Perez, E. (2023). Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548.
- [36] Costello, T. H., Pennycook, G., & Rand, D. G. (2025, February 17). Just the facts: How dialogues with AI reduce conspiracy beliefs. https://doi.org/10.31234/osf.io/h7n8u_v1
- [37] Costello, T. H., Pennycook, G., & Rand, D. G. (2024). Durably reducing conspiracy beliefs through dialogues with AI. Science, 385(6714), eadq1814.
- [38] Shank, D. B., & DeSanti, A. (2018). Attributions of morality and mind to artificial intelligence after real-world moral violations. Computers in Human Behavior, 86, 401–411.
- [39] Peter, S., Riemer, K., & West, J. D. (2025). The benefits and dangers of anthropomorphic conversational agents. Proceedings of the National Academy of Sciences of the United States of America, 122(22), e2415898122.
- [40] Nyholm, L., Santamäki-Fischer, R., & Fagerström, L. (2021). Users' ambivalent sense of security with humanoid robots in healthcare. Informatics for Health and Social Care, 46(2), 218–226.
- [41] Binns, R. (2018). Algorithmic accountability and public reason. Philosophy & Technology, 31(4), 543–556. https://doi.org/10.1007/s13347-017-0263-5