arxiv: 2604.15316 · v1 · submitted 2026-03-01 · 💻 cs.HC · cs.AI

Recognition: no theorem link

Anthropomorphism and Trust in Human-Large Language Model interactions

Akila Kadambi , Ylenia D'Elia , Tanishka Shah , Iulia Comsa , Alison Lentz , Katie Siri-Ngammuang , Tara Buechler , Jonas Kaplan

show 3 more authors

Antonio Damasio Srini Narayanan Lisa Aziz-Zadeh

Authors on Pith no claims yet

Pith reviewed 2026-05-15 17:37 UTC · model grok-4.3

classification 💻 cs.HC cs.AI

keywords anthropomorphismtrustlarge language modelshuman-AI interactionwarmthempathycompetencerelational perceptions

0 comments

The pith

Warmth and cognitive empathy drive perceptions of anthropomorphism and trust in LLMs more than competence does.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines more than 2,000 human interactions with LLM chatbots that were varied in displayed warmth, competence, and empathy. Warmth and cognitive empathy predicted higher scores on anthropomorphism, trust, similarity, relational closeness, usefulness, and lower frustration, while competence predicted every outcome except anthropomorphism. Affective empathy mainly shaped relational perceptions such as closeness but left epistemic judgments like usefulness unaffected. Subjective, personally relevant conversation topics strengthened the effects, producing greater human-likeness and connection than objective topics did. These patterns indicate that specific social cues shape how users relate to and evaluate artificial agents.

Core claim

In more than 2,000 human-LLM interactions, warmth and cognitive empathy significantly predicted perceptions on all outcomes (perceived anthropomorphism, trust, similarity, relational closeness, frustration, usefulness), while competence predicted all outcomes except for anthropomorphism. Affective empathy primarily predicted perceived relational measures but did not predict the epistemic outcomes. Topic sub-analyses showed that more subjective, personally relevant topics amplified these effects, producing greater human-likeness and relational connection with the LLM than did objective topics.

What carries the argument

Systematic variation of warmth (friendliness), competence (capability and coherence), and empathy (cognitive and affective) in LLM chatbots, used to predict user ratings of anthropomorphism, trust, and related perceptions.

Load-bearing premise

The systematic variations in the LLM chatbots successfully and independently manipulated perceived warmth, competence, and empathy as intended by the experimenters.

What would settle it

If participants in a replication report no reliable differences in perceived warmth, competence, or empathy across the varied chatbot conditions, the reported predictive relationships would not hold.

Figures

Figures reproduced from arXiv: 2604.15316 by Akila Kadambi, Alison Lentz, Antonio Damasio, Iulia Comsa, Jonas Kaplan, Katie Siri-Ngammuang, Lisa Aziz-Zadeh, Srini Narayanan, Tanishka Shah, Tara Buechler, Ylenia D'Elia.

**Figure 1.** Figure 1: Task Schematic. Participants were randomly assigned a topic for conversing with the chatbot (LLM), and then randomly assigned to either the warmth/competence condition or the cognitive/affective empathy condition. Each condition yielded nine different pairwise combinations (minimum-minimum, medium-maximum, etc.). Each combination was presented twice, for a total of 18 LLM interactions. Participants engaged… view at source ↗

**Figure 2.** Figure 2: Effects of warmth, competence, affective empathy, and cognitive empathy on anthropomorphism and trust. Mean ratings (±95% CI) for anthropomorphism (top row) and trust (bottom row) as a function of manipulated trait levels. In the Warmth/Competence condition (left column), increasing competence produced a monotonic rise in trust and usefulness, while increasing warmth strongly enhanced anthropomorphism and … view at source ↗

**Figure 3.** Figure 3: Effects of warmth, competence, affective empathy, and cognitive empathy on different outcomes across manipulated levels of warmth/competence (left) and affective/cognitive empathy (right). Higher competence significantly increased epistemic factors, such as perceived usefulness and reduced frustration, whereas higher warmth strengthened perceived relational closeness. The empathy condition largely paralle… view at source ↗

**Figure 4.** Figure 4: Participant ratings show trait-level manipulations across all four dimensions. Violin plots show the distribution of participant ratings (1-7 scale) for each manipulated trait level (Minimal, Medium, Maximal). Black points represent mean ratings with error bars indicating 95% confidence intervals. Pairwise comparisons between levels were conducted using Tukey’s HSD test. Top row: In the warmth/competence c… view at source ↗

**Figure 5.** Figure 5: Topic Type Comparisons for Significant Survey Questions for Warmth/Competence and Empathy Conditions. Bar graphs comparing mean participant responses (±SE) to subjective versus objective topics across multiple survey questions. Each row represents a different survey question, with the left column showing results from the Warmth/Competence condition and the right column showing results from the Empathy con… view at source ↗

read the original abstract

With large language models (LLMs) becoming increasingly prevalent in daily life, so too has the tendency to attribute to them human-like minds and emotions, or anthropomorphize them. Here, we investigate dimensions people use to anthropomorphize and attribute trust toward LLMs across more than 2,000 human-LLM interactions. Participants (N=115) engaged with LLM chatbots systematically varied in warmth (friendliness), competence (capability, coherence), and empathy (cognitive and affective). Warmth and cognitive empathy significantly predicted perceptions on all outcomes (perceived anthropomorphism, trust, similarity, relational closeness, frustration, usefulness), while competence predicted all outcomes except for anthropomorphism. Affective empathy primarily predicted perceived relational measures, but did not predict the epistemic outcomes. Topic sub-analyses showed that more subjective, personally relevant topics (e.g., relationship advice) amplified these effects, producing greater human-likeness and relational connection with the LLM than did objective topics. Together, these findings reveal that warmth, competence, and empathy are key dimensions through which people attribute relational and epistemic perceptions to artificial agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows warmth and cognitive empathy in LLMs drive anthropomorphism, trust, and relational perceptions more than competence does, with stronger effects on subjective topics.

read the letter

The core finding is straightforward: when LLMs are prompted to vary in warmth, competence, and empathy types, warmth and cognitive empathy track with higher perceived anthropomorphism, trust, similarity, closeness, usefulness, and lower frustration across the board. Competence drops out for anthropomorphism specifically, and affective empathy sticks mostly to relational measures. Subjective topics like relationship advice amplify the whole pattern compared with objective ones. That three-way interaction with topic type is the piece that feels freshest relative to the cited prior work on human-AI perception.

Referee Report

2 major / 2 minor

Summary. The paper reports results from an empirical study in which 115 participants completed over 2,000 interactions with LLM chatbots whose prompts were systematically varied along warmth, competence, cognitive empathy, and affective empathy. Regression analyses show that warmth and cognitive empathy predict perceived anthropomorphism, trust, similarity, relational closeness, frustration, and usefulness; competence predicts all outcomes except anthropomorphism; affective empathy predicts primarily relational measures; and effects are amplified for subjective, personally relevant topics.

Significance. If the prompt manipulations successfully and independently altered the targeted dimensions, the work supplies concrete evidence on the relative contributions of warmth, competence, and empathy to anthropomorphism and trust in LLMs. The large interaction count and within-subjects topic variation provide reasonable statistical power and allow examination of boundary conditions, which could inform both psychological theory and practical chatbot design.

major comments (2)

[Methods/Results] Methods and Results sections: the manuscript reports no manipulation-check statistics (means, standard deviations, or tests confirming that the warmth, competence, cognitive-empathy, and affective-empathy variations produced distinct and low-correlation perceptions). Because the central claim attributes distinct predictive roles to these dimensions, the absence of these checks leaves open the possibility that the observed regression patterns reflect a single underlying factor such as overall fluency rather than the intended constructs.
[Results] Results section: the regression models treat warmth, competence, cognitive empathy, and affective empathy as simultaneous predictors without reported multicollinearity diagnostics (VIF values or correlation matrix). If these dimensions are correlated in participants' perceptions, the claimed independent contributions cannot be unambiguously interpreted.

minor comments (2)

[Abstract] Abstract: the summary would be strengthened by a brief statement of the manipulation-check outcomes or at least the direction and significance of the key regression coefficients.
[Discussion] Discussion: the claim that subjective topics 'amplified' effects would benefit from explicit statistical comparison (interaction terms or simple-effects tests) rather than descriptive sub-analyses alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the validation of our experimental manipulations and the interpretation of the regression results. We address each major point below and have revised the manuscript to incorporate the requested diagnostics and checks.

read point-by-point responses

Referee: [Methods/Results] Methods and Results sections: the manuscript reports no manipulation-check statistics (means, standard deviations, or tests confirming that the warmth, competence, cognitive-empathy, and affective-empathy variations produced distinct and low-correlation perceptions). Because the central claim attributes distinct predictive roles to these dimensions, the absence of these checks leaves open the possibility that the observed regression patterns reflect a single underlying factor such as overall fluency rather than the intended constructs.

Authors: We agree that the original manuscript omitted explicit manipulation-check statistics, which limits the ability to confirm the independence of the prompt variations. In the revised manuscript we have added these analyses to the Methods and Results sections, reporting means and standard deviations for each perceived dimension across conditions as well as the full correlation matrix among the four dimensions. The correlations range from 0.28 to 0.61, indicating moderate rather than perfect overlap. In addition, we include one-way ANOVAs confirming that each prompt manipulation produced statistically significant differences on its target dimension. These results, together with the differential outcome patterns (competence fails to predict anthropomorphism while warmth does), argue against the possibility that all effects collapse to a single factor such as fluency. revision: yes
Referee: [Results] Results section: the regression models treat warmth, competence, cognitive empathy, and affective empathy as simultaneous predictors without reported multicollinearity diagnostics (VIF values or correlation matrix). If these dimensions are correlated in participants' perceptions, the claimed independent contributions cannot be unambiguously interpreted.

Authors: We accept that multicollinearity diagnostics were missing from the original submission and are necessary for unambiguous interpretation of the coefficients. The revised Results section now reports the complete correlation matrix among the four predictors and the variance inflation factor (VIF) for every regression model. All VIF values fall between 1.4 and 3.2, well below conventional thresholds of 5 or 10. These diagnostics support the claim that each dimension makes a statistically independent contribution to the outcome variables. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical regression on fresh participant data

full rationale

The paper reports an experimental study (N=115, >2000 interactions) in which LLM chatbots were varied along warmth/competence/empathy dimensions and participants rated outcomes. All central claims are statistical associations obtained via regression on measured variables. No equations, fitted parameters, or derivations are present that could reduce any result to its own inputs by construction. No self-citation chains or ansatzes are invoked to justify the core findings. The work is therefore self-contained against external benchmarks and receives the default non-circularity score.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions of psychological measurement and statistical inference rather than new theoretical constructs.

axioms (2)

domain assumption Self-report scales validly capture perceived anthropomorphism, trust, and relational closeness in human-LLM interactions
Invoked implicitly when interpreting all outcome measures; common in social psychology but subject to demand characteristics.
domain assumption The LLM variations successfully and orthogonally manipulated warmth, competence, and empathy
Required for attributing effects to the intended dimensions; stated in the abstract as systematic variation.

pith-pipeline@v0.9.0 · 5532 in / 1295 out tokens · 55684 ms · 2026-05-15T17:37:39.050358+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages · 2 internal anchors

[1]

Tahiroglu, D., & Taylor, M. (2019). Anthropomorphism, social understanding, and imaginary companions.British Journal of Developmental Psychology, 37(2), 284-299

work page 2019
[2]

Shanahan, M. (2024). Talking about large language models.Communications of the ACM, 67(2), 68-79. https: //doi.org/10.48550/arXiv.2212.03551

work page doi:10.48550/arxiv.2212.03551 2024
[3]

(2024, October)

Akbulut, C., Weidinger, L., Manzini, A., Gabriel, I., & Rieser, V . (2024, October). All too human? Mapping and mitigating the risk from anthropomorphic AI. InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society (V ol. 7, No. 1, pp. 13-26).https://doi.org/10.1609/aies.v7i1.31613

work page doi:10.1609/aies.v7i1.31613 2024
[4]

Weidinger, L., Mellor, J.F., Rauh, M., Griffin, C., Uesato, J., Huang, P., Cheng, M., Glaese, M., Balle, B., Kasirzadeh, A., Kenton, Z., Brown, S.M., Hawkins, W.T., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell, L., Hendricks, L.A., Isaac, W.S., Legassick, S., Irving, G., & Gabriel, I. (2021). Ethical and social risks of harm from Language Model...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2112.04359 2021
[5]

(2025, October 1)

Blake, S. (2025, October 1). Third of Americans have had a romantic relationship with AI.Newsweek. https: //www.newsweek.com/third-americans-have-had-romantic-relationship-ai-10814798

work page 2025
[6]

Shevlin, H. (2024). All too human? Identifying and mitigating ethical risks of Social AI.Law, Ethics & Technology, 1(2).https://doi.org/10.55092/let20240003

work page doi:10.55092/let20240003 2024
[7]

(2025, November 8)

Kuenssberg, L. (2025, November 8). Mothers say AI chatbots encouraged their sons to kill themselves.BBC News. https://www.bbc.com/news/articles/ce3xgwyywe4o

work page 2025
[8]

E., & Graziano, M

Guingrich, R. E., & Graziano, M. S. (2025). A Longitudinal Randomized Control Study of Companion Chatbot Use: Anthropomorphism and Its Mediating Role on Social Impacts.arXiv preprint arXiv:2509.19515

work page arXiv 2025
[9]

Calo, M. R. (2011). Against notice skepticism in privacy (and elsewhere).Notre Dame Law Review, 87, 1027. https://ndlawreview.org/wp-content/uploads/2013/06/Calo.pdf

work page 2011
[10]

D., & Grimm, C

Kaminski, E., Rueben, M., Smart, W. D., & Grimm, C. M. (2017). Averting robot eyes, 76 Md. L Rev, 983, 1001-1020

work page 2017
[11]

T., Cuddy, A

Fiske, S. T., Cuddy, A. J. C., & Glick, P. (2007). Universal dimensions of social cognition: Warmth and competence. Trends in Cognitive Sciences, 11(2), 77–83

work page 2007
[12]

T., & Fiske, S

Harris, L. T., & Fiske, S. T. (2006). Dehumanizing the Lowest of the Low: Neuroimaging Responses to Extreme Out-Groups.Psychological Science, 17(10), 847–853

work page 2006
[13]

Kadambi, A., Ringold, S., Kamath, S., Raman, N., Jayashankar, A., Damasio, A., Narayanan, S., Kaplan, J., & Aziz-Zadeh, L. (2025). Humanizing the dehumanized: A test of strategies. https://doi.org/10.21203/rs.3. rs-7330548/v1

work page doi:10.21203/rs.3 2025
[14]

Epley, N., Waytz, A., & Cacioppo, J. T. (2008). On seeing human: A three-factor theory of anthropomorphism. Psychological Review, 114(4), 864–886

work page 2008
[15]

Christoforakos, L., Gallucci, A., Surmava-Große, T., Ullrich, D., & Diefenbach, S. (2021). Can robots earn our trust the same way humans do? A systematic exploration of competence, warmth, and anthropomorphism as determinants of trust development in HRI.Frontiers in Robotics and AI, 8, 640444. https://doi.org/10. 3389/frobt.2021.640444

work page arXiv 2021
[16]

Sorin, V ., Brin, D., Barash, Y ., Konen, E., Charney, A., Nadkarni, G., & Klang, E. (2024). Large language models and empathy: systematic review.Journal of Medical Internet Research, 26, e52597

work page 2024
[17]

G., Aharon-Peretz, J., & Perry, D

Shamay-Tsoory, S. G., Aharon-Peretz, J., & Perry, D. (2009). Two systems for empathy: a double dissociation between emotional and cognitive empathy in inferior frontal gyrus versus ventromedial prefrontal lesions.Brain, 132(3), 617–627

work page 2009
[18]

Zaki, J., Weber, J., Bolger, N., & Ochsner, K. (2009). The neural bases of empathic accuracy.Proceedings of the National Academy of Sciences, 106(27), 11382–11387

work page 2009
[19]

Waytz, A., Cacioppo, J., & Epley, N. (2010). Who sees human? The stability and importance of individual differences in anthropomorphism.Perspectives on Psychological Science, 5(3), 219–232. https://doi.org/10. 1177/1745691610369336

work page 2010
[20]

S., Goswami, R., Saegusa, K., & Broadbent, E

Johanson, D., Ahn, H. S., Goswami, R., Saegusa, K., & Broadbent, E. (2023). The effects of healthcare robot empathy statements and head nodding on trust and satisfaction: A video study.ACM Transactions on Human-Robot Interaction, 12(1), 1–21.https://doi.org/10.1145/3549534

work page doi:10.1145/3549534 2023
[21]

W., Poliak, A., Dredze, M., Leas, E

Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum.JAMA internal medicine, 183(6), 589–596. https: //doi.org/10.1001/jamainte...

work page doi:10.1001/jamainternmed.2023.1838 2023
[22]

B., Wright, A

Liu, S., McCoy, A. B., Wright, A. P., Carew, B., Genkins, J. Z., Huang, S. S., Peterson, J. F., Steitz, B., & Wright, A. (2024). Leveraging large language models for generating responses to patient messages-a subjective analysis. Journal of the American Medical Informatics Association : JAMIA, 31(6), 1367–1379. https://doi.org/10. 1093/jamia/ocae052

work page 2024
[23]

Welivita, A., & Pu, P. (2024). Are large language models more empathetic than humans?arXiv. https://doi. org/10.48550/arXiv.2406.05063

work page doi:10.48550/arxiv.2406.05063 2024
[24]

Song, J., & Lin, H. (2024). Exploring the effect of artificial intelligence intellect on consumer decision delegation: The role of trust, task objectivity, and anthropomorphism.Journal of Consumer Behaviour , 23(2), 727–747

work page 2024
[25]

P., Boatfield, C., Wang, X., DeCero, E., Krupica, I

Ta-Johnson, V . P., Boatfield, C., Wang, X., DeCero, E., Krupica, I. C., Rasof, S. D., Motzer, A., & Pedryc, W. M. (2022). Assessing the topics and motivating factors behind human-social chatbot interactions: Thematic analysis of user experiences.JMIR Human Factors, 9(4), 1–12.https://doi.org/10.2196/38876

work page doi:10.2196/38876 2022
[26]

R Project for Statistical Computing. (n.d.). The R Project for Statistical Computing. Retrieved November 6, 2025, fromhttps://www.r-project.org/

work page 2025
[27]

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4.Journal of Statistical Software, 67(1), 1–48

work page 2015
[28]

Madhavan, P., & Wiegmann, D. A. (2007). Similarities and differences between human–human and hu- man–automation trust: An integrative review.Theoretical Issues in Ergonomics Science, 8(4), 277–301. https://doi.org/10.1080/14639220500337708

work page doi:10.1080/14639220500337708 2007
[29]

A., Billings, D

Hancock, P. A., Billings, D. R., Schaefer, K. E., Chen, J. Y . C., de Visser, E. J., & Parasuraman, R. (2011). A Meta-Analysis of Factors Affecting Trust in Human-Robot Interaction.Human Factors: The Journal of the Human Factors and Ergonomics Society, 53(5), 517–527

work page 2011
[30]

Colombatto, C., & Fleming, S. M. (2023). Illusions of confidence in artificial systems.Preprint at psyArXiv, 10

work page 2023
[31]

(2021, March)

Jacovi, A., Marasovi´c, A., Miller, T., & Goldberg, Y . (2021, March). Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. InProceedings of the 2021 ACM conference on fairness, accountability, and transparency (pp. 624-635)

work page 2021
[32]

S., & Rocher, L

Ibrahim, L., Hafner, F. S., & Rocher, L. (2025). Training language models to be warm and empathetic makes them less reliable and more sycophantic.arXiv preprint arXiv:2507.21919

work page arXiv 2025
[33]

(2025, June)

Malmqvist, L. (2025, June). Sycophancy in large language models: Causes and mitigations. InIntelligent Computing-Proceedings of the Computing Conference (pp. 61-74). Cham: Springer Nature Switzerland

work page 2025
[34]

Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., & Jurafsky, D. (2025). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence.arXiv preprint arXiv:2510.01395

work page arXiv 2025
[35]

Towards Understanding Sycophancy in Language Models

Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Cheng, N., Durmus, E., Hatfield- Dodds, Z., Johnston, S. R., Kravec, S., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M., & Perez, E. (2023). Towards understanding sycophancy in language models (arXiv preprint arXiv:2310.13548)

work page internal anchor Pith review Pith/arXiv arXiv 2023
[36]

H., Pennycook, G., & Rand, D

Costello, T. H., Pennycook, G., & Rand, D. G. (2025, February 17). Just the facts: How dialogues with AI reduce conspiracy beliefs.https://doi.org/10.31234/osf.io/h7n8u_v1

work page doi:10.31234/osf.io/h7n8u_v1 2025
[37]

H., Pennycook, G., & Rand, D

Costello, T. H., Pennycook, G., & Rand, D. G. (2024). Durably reducing conspiracy beliefs through dialogues with AI.Science, 385(6714), eadq1814

work page 2024
[38]

B., & DeSanti, A

Shank, D. B., & DeSanti, A. (2018). Attributions of morality and mind to artificial intelligence after real-world moral violations.Computers in human behavior , 86, 401-411

work page 2018
[39]

Peter, S., Riemer, K., & West, J. D. (2025). The benefits and dangers of anthropomorphic conversational agents. Proceedings of the National Academy of Sciences of the United States of America, 122(22), e2415898122

work page 2025
[40]

Nyholm, L., Santamäki-Fischer, R., & Fagerström, L. (2021). Users’ ambivalent sense of security with humanoid robots in healthcare.Informatics for Health and Social Care, 46(2), 218–226

work page 2021
[41]

fellow feeling

Binns, R. (2018). Algorithmic accountability and public reason.Philosophy & Technology, 31(4), 543–556. https://doi.org/10.1007/s13347-017-0263-5 A Appendix A.1 Prompts Warmth and Competence (WC)You are a large language model characterized by different levels of competence and warmth. Competence is the dimension of social perception that reflects a person...

work page doi:10.1007/s13347-017-0263-5 2018