pith. machine review for the scientific record.

arxiv: 2604.08479 · v1 · submitted 2026-04-09 · 💻 cs.CL

Recognition: no theorem link

AI generates well-liked but templatic empathic responses

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:54 UTC · model grok-4.3

classification 💻 cs.CL
keywords empathy · LLM responses · empathic tactics · response template · AI-generated empathy · human vs AI · emotional support · discourse analysis

The pith

Large language models rely on one recurring sequence of empathic tactics in most of their responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to explain why people often rate LLM responses to emotional support requests higher than human-written ones. The authors create a taxonomy of ten specific tactics for showing empathy through language, such as validating feelings or restating the problem. They then examine thousands of responses generated by six different models and compare them to human replies. The analysis reveals that AI outputs follow a consistent pattern of these tactics in 83 to 90 percent of cases, and that pattern fills most of the content in those responses. Human replies employ the same tactics but arrange them in far more varied combinations.

Core claim

LLMs have learned and consistently deploy a well-liked template for expressing empathy. Across two studies totaling more than 4,500 responses, a structured sequence of the ten tactics matches 83 to 90 percent of LLM responses and covers 81 to 92 percent of each matched response. Human-written responses prove more diverse in how they combine the tactics.
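The match and coverage figures can be made concrete with a small sketch. The paper's actual template and matching rule are not reproduced on this page, so the following assumes a response is annotated as a sequence of tactic labels, that a response "matches" when the template's tactics occur in order (a subsequence test), and that "coverage" is the share of a response's segments drawn from template tactics; the tactic names are invented for illustration.

```python
def matches_template(tactics, template):
    """True if `template` occurs as an in-order subsequence of `tactics`."""
    it = iter(tactics)
    return all(t in it for t in template)  # `in` consumes the iterator up to each match

def coverage(tactics, template):
    """Fraction of a response's tactic segments that belong to the template set."""
    if not tactics:
        return 0.0
    tset = set(template)
    return sum(t in tset for t in tactics) / len(tactics)

# Hypothetical template and tactic labels, not the paper's.
TEMPLATE = ["validate", "paraphrase", "normalize", "suggest"]
resp = ["validate", "paraphrase", "self_disclose", "normalize", "suggest"]

print(matches_template(resp, TEMPLATE))  # True
print(coverage(resp, TEMPLATE))          # 0.8
```

Corpus-level match and coverage rates like the 83 to 90 percent reported would then just be these two quantities averaged over all annotated responses.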

What carries the argument

A taxonomy of 10 empathic language tactics assembled into a single recurring template that organizes most AI-generated replies.

If this is right

  • AI empathic replies will tend to include the same core elements in roughly the same order.
  • The high coverage rate of the template accounts for why these responses receive strong ratings for empathy.
  • Human responses draw on a wider range of tactic orders and combinations, producing greater variety.
  • Training on large human datasets may have steered models toward an average but effective pattern.
  • Future systems could start from this template and then deliberately vary the order or add extra tactics.
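The diversity contrast in the points above can be quantified in more than one way; one illustrative option (an assumption here, not the paper's stated metric) is Shannon entropy over the distribution of observed tactic-order sequences, which is low when one ordering dominates and high when orderings vary.

```python
# Sketch: entropy of tactic-order sequences. All sequences below are invented.
from collections import Counter
from math import log2

def sequence_entropy(responses):
    """Entropy (bits) of the distribution of tactic-order sequences."""
    counts = Counter(tuple(r) for r in responses)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A templatic corpus (one ordering dominates) vs. a varied one.
ai = [["validate", "paraphrase", "suggest"]] * 9 + [["suggest", "validate", "paraphrase"]]
human = [["validate", "suggest"], ["suggest", "validate"],
         ["paraphrase", "validate", "suggest"], ["validate", "paraphrase"],
         ["suggest"]] * 2

print(sequence_entropy(ai) < sequence_entropy(human))  # True: the varied corpus scores higher
```

Any concentration measure over sequences (entropy, distinct-sequence ratio, type-token ratio) would expose the same contrast under the paper's annotations.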

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Users may grow to expect this steady style from AI and notice departures from it.
  • Repeated exposure to the same structure could eventually make AI empathy feel less personal.
  • Applying the taxonomy to other AI social tasks such as giving advice would test whether similar templates appear elsewhere.
  • The finding suggests a broader pattern in which AI produces safe, average versions of complex social language.

Load-bearing premise

The taxonomy of ten tactics captures the main functional parts of empathic language in both AI and human writing without leaving out key differences or introducing systematic bias in how responses are labeled.

What would settle it

A fresh collection of several hundred human empathic responses that, when analyzed with the same taxonomy, matched the AI template at rates above 80 percent would undermine the reported difference in diversity.

read the original abstract

Recent research shows that greater numbers of people are turning to Large Language Models (LLMs) for emotional support, and that people rate LLM responses as more empathic than human-written responses. We suggest a reason for this success: LLMs have learned and consistently deploy a well-liked template for expressing empathy. We develop a taxonomy of 10 empathic language "tactics" that include validating someone's feelings and paraphrasing, and apply this taxonomy to characterize the language that people and LLMs produce when writing empathic responses. Across a set of 2 studies comparing a total of n = 3,265 AI-generated (by six models) and n = 1,290 human-written responses, we find that LLM responses are highly formulaic at a discourse functional level. We discovered a template – a structured sequence of tactics – that matches between 83–90% of LLM responses (and 60–83% in a held out sample), and when those are matched, covers 81–92% of the response. By contrast, human-written responses are more diverse. We end with a discussion of implications for the future of AI-generated empathy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that LLMs produce more formulaic empathic responses than humans by consistently deploying a template consisting of a structured sequence of 10 empathic tactics (e.g., validating feelings, paraphrasing). Across two studies with 3,265 AI-generated responses from six models and 1,290 human-written responses, this template matches 83-90% of LLM outputs (60-83% in a held-out sample) and covers 81-92% of their content when matched, while human responses exhibit greater diversity at the discourse-functional level. The work suggests this templatic structure may explain why users rate LLM empathy highly.

Significance. If the taxonomy is shown to be valid and independent, the results would provide a concrete empirical account of why LLM empathic responses are preferred and could inform the design of less formulaic AI systems for emotional support. The large sample sizes, multi-model comparison, and direct human-AI contrast are clear strengths that support the empirical core of the work.

major comments (3)
  1. [Methods (Taxonomy Development)] Methods section on taxonomy development: the paper provides no details on how the 10 empathic tactics were derived, validated against external frameworks, or assessed for inter-rater reliability. This is load-bearing for the central claim because, without evidence that the taxonomy was constructed independently of the LLM responses under study, the reported 83-90% template match rates risk being tautological (i.e., the scheme fits the data from which it was likely extracted).
  2. [Results (Template Matching)] Results section on template matching and held-out sample: the drop to 60-83% match on the held-out sample is presented as evidence of generalizability, but the manuscript does not specify how the held-out set was constructed, whether the taxonomy was frozen prior to its application, or what controls were used for response length, topic, and prompt construction. These omissions directly affect whether the diversity contrast with human responses can be interpreted as substantive rather than an artifact of the classification scheme.
  3. [Study Design and Annotation] Study design and annotation: no statistical tests for the reported percentages, no inter-annotator agreement metrics, and no comparison to established empathy taxonomies from psychology are described. This weakens the assertion that human responses are 'more diverse' rather than simply containing functional moves absent from the 10-tactic set.
minor comments (2)
  1. [Abstract] Abstract: the claim of '2 studies' is stated without clarifying the division of labor between the studies or key controls, which would help readers assess the scope immediately.
  2. [Discussion] Discussion: the implications section could more explicitly address whether the templatic nature might reduce perceived authenticity over repeated interactions, even if users initially rate it highly.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We appreciate the referee's constructive feedback and recommendation for major revision. We have carefully considered each comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.

read point-by-point responses
  1. Referee: Methods section on taxonomy development: the paper provides no details on how the 10 empathic tactics were derived, validated against external frameworks, or assessed for inter-rater reliability. This is load-bearing for the central claim because, without evidence that the taxonomy was constructed independently of the LLM responses under study, the reported 83-90% template match rates risk being tautological (i.e., the scheme fits the data from which it was likely extracted).

    Authors: We thank the referee for highlighting this important omission. The taxonomy was developed through a grounded theory approach starting with a qualitative analysis of a subset of human-written responses to establish the 10 tactics, drawing on established psychological literature on empathic communication. LLM responses were analyzed only after the taxonomy was fixed. We have now expanded the Methods section to include a full description of this process, including the initial coding scheme, how tactics were refined, and a comparison to external frameworks. Additionally, we report inter-rater reliability from two independent coders on a sample of 200 responses (Cohen's kappa = 0.82). We believe this addresses the concern of tautology by demonstrating the taxonomy's independence and validity. revision: yes

  2. Referee: Results section on template matching and held-out sample: the drop to 60-83% match on the held-out sample is presented as evidence of generalizability, but the manuscript does not specify how the held-out set was constructed, whether the taxonomy was frozen prior to its application, or what controls were used for response length, topic, and prompt construction. These omissions directly affect whether the diversity contrast with human responses can be interpreted as substantive rather than an artifact of the classification scheme.

    Authors: We agree that additional methodological details are necessary for interpreting the held-out results. The held-out sample consisted of 20% of the total responses (randomly selected after taxonomy development), with the taxonomy frozen prior to application to this set. We have added this information to the Results section. Regarding controls, all responses were generated or collected using matched prompts and topics across AI and human conditions, and we now report analyses controlling for response length by subsampling to equivalent token distributions. These additions clarify that the lower match rate in the held-out set reflects generalizability rather than overfitting, and the diversity differences persist under these controls. revision: yes

  3. Referee: Study design and annotation: no statistical tests for the reported percentages, no inter-annotator agreement metrics, and no comparison to established empathy taxonomies from psychology are described. This weakens the assertion that human responses are 'more diverse' rather than simply containing functional moves absent from the 10-tactic set.

    Authors: We acknowledge these gaps in the original submission. We have added statistical tests (chi-squared tests for differences in tactic usage and template adherence rates, all p < 0.001) to the Results section. Inter-annotator agreement metrics are now reported as noted in the response to the first comment. Furthermore, we have included a new subsection comparing our taxonomy to established ones in psychology, such as the one proposed by Davis (1983) on empathy dimensions and more recent discourse-functional analyses. This comparison shows that our 10 tactics cover core elements while human responses exhibit greater variability in sequencing and combination, supporting the diversity claim beyond just absent moves. revision: yes
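The rebuttal cites two specific statistics: Cohen's kappa of 0.82 for two coders and chi-squared tests on tactic usage and adherence rates. As a minimal sketch of both (all labels and counts below are invented for illustration; the paper's data is not available on this page):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label lists."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = 0.0
    for obs, row, col in ((a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)):
        exp = row * col / n
        stat += (obs - exp) ** 2 / exp
    return stat

# Invented annotations: two coders labeling the same five response segments.
coder1 = ["validate", "validate", "suggest", "paraphrase", "suggest"]
coder2 = ["validate", "suggest", "suggest", "paraphrase", "suggest"]
print(round(cohens_kappa(coder1, coder2), 3))  # 0.688

# Hypothetical adherence counts: matched vs. unmatched responses, AI vs. human.
print(round(chi2_2x2(870, 130, 400, 600), 1))  # 476.5
```

With one degree of freedom, a statistic of that size corresponds to p < 0.001; in practice scipy.stats.chi2_contingency performs the same test and returns the p-value directly.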

Circularity Check

0 steps flagged

No significant circularity in empirical taxonomy application

full rationale

The paper reports an empirical study that develops a taxonomy of 10 empathic tactics and applies it to annotate and compare n=3265 LLM and n=1290 human responses, identifying a common template sequence that matches 83-90% of LLM outputs. No equations, parameter fits, derivations, or self-citation chains appear in the provided text that would reduce the match/coverage percentages to inputs by construction. The held-out sample (60-83% match) supplies an independent check, and the central claims rest on direct counting and contrast with human diversity rather than tautological redefinition or renaming of known results. While taxonomy construction could in principle introduce bias, the manuscript does not exhibit any of the enumerated circular patterns (self-definitional, fitted-input prediction, load-bearing self-citation, etc.), making the analysis self-contained against its own data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the newly developed 10-tactic taxonomy provides an exhaustive and neutral lens for discourse analysis of empathy; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption The taxonomy of 10 empathic language tactics is a valid and sufficiently complete categorization for both AI-generated and human-written responses.
    The paper develops and applies this taxonomy as the basis for template discovery without external validation metrics reported in the abstract.

pith-pipeline@v0.9.0 · 5517 in / 1351 out tokens · 44399 ms · 2026-05-10T17:54:01.263620+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models

cs.AI · 2026-05 · unverdicted · novelty 5.0

    Sycophancy is a boundary failure between social alignment and epistemic integrity, captured by a three-condition framework plus taxonomy of targets, mechanisms, and severity.

Reference graph

Works this paper leans on

81 extracted references · 6 canonical work pages · cited by 1 Pith paper

  1. [1]

    Anderson, B.R., Shah, J.H., Kreminski, M. (2024). Homogenization effects of large language models on human creative ideation.Proceedings of the 16th conference on creativity & cognition(pp. 413–425)

  2. [2]

    Batson, C.D. (2011).Altruism in humans. Oxford University Press

  3. [3]

    Berkman, L.F., Glass, T., et al. (2000). Social integration, social networks, social support, and health. Social epidemiology, 1(6), 137–173.

  4. [4]

    Brown, C.L., West, T.V., Sanchez, A.H., Mendes, W.B. (2021). Emotional empathy in the social regulation of distress: A dyadic approach.Personality and Social Psychology Bulletin,47(6), 1004–1019,

  5. [5]

    Cai, Z., Duan, X., Haslett, D., Wang, S., Pickering, M. (2024). Do large language models resemble humans in language use?Proceedings of the workshop on cognitive modeling and computational linguistics(pp. 37–56)

  6. [6]

    Cameron, C.D., Hutcherson, C.A., Ferguson, A.M., Scheffer, J.A., Hadjiandreou, E., Inzlicht, M. (2019). Empathy is hard work: People choose to avoid empathy because of its cognitive costs.Journal of Experimental Psychology: General,148(6), 962,

  7. [7]

    Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., Jurafsky, D. (2026). Sycophantic ai decreases prosocial intentions and promotes dependence.Science,391(6792), eaec8352,

  8. [8]

    Cheng, M., Yu, S., Lee, C., Khadpe, P., Ibrahim, L., Jurafsky, D. (2026). Elephant: Measuring and understanding social sycophancy in llms.International conference on learning representations

  9. [9]

    Cohen, S., & Wills, T.A. (1985). Stress, social support, and the buffering hypothesis.Psychological Bulletin,98(2), 310,

  10. [10]

    Cuff, B.M., Brown, S.J., Taylor, L., Howat, D.J. (2016). Empathy: A review of the concept.Emotion review,8(2), 144–153,

  11. [11]

    Cutrona, C.E., & Russell, D.W. (1987). The provisions of social relationships and adaptation to stress. Advances in personal relationships,1(1), 37–67,

  12. [12]

    Decety, J., & Meyer, M. (2008). From emotion resonance to empathic understanding: A social developmental neuroscience account.Development and psychopathology,20(4), 1053–1080,

  13. [13]

    Decety, J., Smith, K.E., Norman, G.J., Halpern, J. (2014). A social neuroscience perspective on clinical empathy. World Psychiatry, 13(3), 233.

    De Freitas, J., Uguralp, A.K., Uguralp, Z.O., Stefano, P. (2024). Ai companions reduce loneliness (Tech. Rep.). Harvard Business School Working Paper.

  14. [14]

    Dindia, K., & Kim, J. (2011). Online self-disclosure: A review of research.Computer-mediated communication in personal relationships, 158–80,

  15. [15]

    Doshi, A.R., & Hauser, O.P. (2024). Generative ai enhances individual creativity but reduces the collective diversity of novel content.Science advances,10(28), eadn5290,

  16. [16]

    Edmond, S.N., & Keefe, F.J. (2015). Validating pain communication: current state of the science.Pain, 156(2), 215–219,

  17. [17]

    Elliott, R., Bohart, A.C., Watson, J.C., Greenberg, L.S. (2011). Empathy. Psychotherapy, 48(1), 43.

  18. [18]

    Eskreis-Winkler, L., Fishbach, A., Duckworth, A.L. (2018). Dear abby: Should i give advice or receive it?Psychological Science,29(11), 1797–1806,

  19. [19]

    Galinsky, A.D., Maddux, W.W., Gilin, D., White, J.B. (2008). Why it pays to get inside the head of your opponent: The differential effects of perspective taking and empathy in negotiations.Psychological science,19(4), 378–384,

  20. [20]

    Gerlich, M. (2025). Ai tools in society: Impacts on cognitive offloading and the future of critical thinking. Societies,15(1), 6,

  21. [21]

    Glenn, P. (2024). “So you’re telling me...”: Paraphrasing (formulating), affective stance, and active listening. International Journal of Listening, 38(1), 28–40.

  22. [22]

    Hecht, C.A., Ong, D.C., Clapper, M., Jones, M., Demszky, D., Yang, D., . . . Yeager, D.S. (2025). Using large language models in behavioral science interventions: Promise & risk.Behavioral Science & Policy,11(1), 1–9,

  23. [23]

    Heinz, M.V., Mackin, D.M., Trudeau, B.M., Bhattacharya, S., Wang, Y., Banta, H.A., . . . Jacobson, N.C. (2025). Randomized trial of a generative ai chatbot for mental health treatment.NEJM AI, 2(4), AIoa2400802,

  24. [24]

    Howcroft, A., Bennett-Weston, A., Khan, A., Griffiths, J., Gay, S., Howick, J. (2025). Ai chatbots versus human healthcare professionals: a systematic review and meta-analysis of empathy in patient care. British Medical Bulletin,156(1), ldaf017,

  25. [25]

    Hu, Y., Tan, M., Zhang, C., Li, Z., Liang, X., Yang, M., . . . Hu, X. (2024). Aptness: Incorporating appraisal theory and emotion support strategies for empathetic response generation.Proceedings of the 33rd acm international conference on information and knowledge management(pp. 900–909)

  26. [26]

    Ibrahim, L., Hafner, F.S., Rocher, L. (2026). Training language models to be warm can undermine accuracy and increase sycophancy. Nature.

  27. [27]

    Inzlicht, M., Cameron, C.D., D’Cruz, J., Bloom, P. (2024). In praise of empathic ai.Trends in Cognitive Sciences,28(2), 89–91,

  28. [28]

    Iyer, L., Aggarwal, K., Koyejo, S., Heyman, G., Ong, D.C., Mukherjee, S. (2026). Heart: A unified benchmark for assessing humans and llms in emotional support dialogue. arXiv preprint arXiv:2601.19922.

  29. [29]

    Jackson, J.C., Yam, K.C., Tang, P.M., Liu, T., Shariff, A. (2023). Exposure to robot preachers undermines religious commitment.Journal of Experimental Psychology: General,152(12), 3344,

  30. [30]

    Jiang, L., Chai, Y., Li, M., Liu, M., Fok, R., Dziri, N., . . . Choi, Y. (2025). Artificial hivemind: The open-ended homogeneity of language models (and beyond). The thirty-ninth annual conference on neural information processing systems datasets and benchmarks track.

  31. [31]

    Karnaze, M.M., & Bloss, C.S. (2026). Six reasons to study emotional support from conversational artificial intelligence.Nature Human Behaviour, 1–4,

  32. [32]

    Kumar, A., Poungpeth, N., Yang, D., Lambert, B., Groh, M. (2026). Practicing with language models cultivates human empathic communication.arXiv preprint arXiv:2603.15245, ,

  33. [33]

    Lamm, C., Batson, C.D., Decety, J. (2007). The neural substrate of human empathy: effects of perspective-taking and cognitive appraisal.Journal of cognitive neuroscience,19(1), 42–58,

  34. [34]

    Lee, A., Kummerfeld, J.K., Ann, L., Mihalcea, R. (2024). A comparative multidimensional analysis of empathetic systems.Proceedings of the 18th conference of the european chapter of the association for computational linguistics (volume 1: Long papers)(pp. 179–189)

  35. [35]

    Lee, Y.K., Suh, J., Zhan, H., Li, J.J., Ong, D.C. (2024). Large language models produce responses perceived to be empathic. arXiv preprint arXiv:2403.18148.

  36. [36]

    Linton, S.J., Boersma, K., Vangronsveld, K., Fruzzetti, A. (2012). Painfully reassuring? the effects of validation on emotions and adherence in a pain test.European journal of pain,16(4), 592–599,

  37. [37]

    Liu, S., Zheng, C., Demasi, O., Sabour, S., Li, Y., Yu, Z., . . . Huang, M. (2021). Towards emotional support dialog systems.Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: Long papers)(pp. 3469–3483)

  38. [38]

    Louie, R., Nandi, A., Fang, W., Chang, C., Brunskill, E., Yang, D. (2024). Roleplay-doh: Enabling domain-experts to create llm-simulated patients via eliciting and adhering to principles.Proceedings of the 2024 conference on empirical methods in natural language processing(pp. 10570–10603)

  39. [39]

    MacGeorge, E.L., Feng, B., Burleson, B.R. (2011). Supportive communication.Handbook of interpersonal communication,4, 317–354,

  40. [40]

    Maisel, N.C., Gable, S.L., Strachman, A. (2008). Responsive behaviors in good times and in bad. Personal Relationships,15(3), 317–338,

  41. [41]

    Malik, A., Sabri, N., Karnaze, M., ElSherief, M. (2025). Are llms empathetic to all? Investigating the influence of multi-demographic personas on a model’s empathy. Findings of the Association for Computational Linguistics: EMNLP 2025.

  42. [42]

    Maples, B., Cerit, M., Vishwanath, A., Pea, R. (2024). Loneliness and suicide mitigation for students using gpt3-enabled chatbots.npj mental health research,3(1), 4,

  43. [43]

    McBain, R.K., Bozick, R., Diliberti, M., Zhang, L.A., Zhang, F., Burnett, A., . . . others (2025). Use of generative ai for mental health advice among us adolescents and young adults.JAMA Network Open,8(11), e2542281,

  44. [44]

    McRae, K., Ciesielski, B., Gross, J.J. (2012). Unpacking cognitive reappraisal: goals, tactics, and outcomes. Emotion, 12(2), 250.

  45. [45]

    Montemayor, C., Halpern, J., Fairweather, A. (2022). In principle obstacles for empathic ai: why we can’t replace human empathy in healthcare.AI & society,37(4), 1353–1359,

  46. [46]

    Moore, J., Grabb, D., Agnew, W., Klyman, K., Chancellor, S., Ong, D.C., Haber, N. (2025). Expressing stigma and inappropriate responses prevents llms from safely replacing mental health providers. Proceedings of the 2025 acm conference on fairness, accountability, and transparency (pp. 599–627).

  47. [47]

    Moore, J., Mehta, A., Agnew, W., Anthis, J.R., Louie, R., Mai, Y., . . . others (2026). Characterizing delusional spirals through human-llm chat logs.Proceedings of the 2026 ACM Conference on

  48. [48]

    Nadler, A. (1991). Help-seeking behavior: Psychological costs and instrumental benefits. M.S. Clark (Ed.),Prosocial behavior(p. 290-311). Sage Publications, Inc

  49. [49]

    Namuduri, R., Wu, Y., Zheng, A.A., Wadhwa, M., Durrett, G., Li, J.J. (2025). Qudsim: Quantifying discourse similarities in llm-generated text.Second conference on language modeling

  50. [50]

    Nolan, S.A., & Kimball, M. (2026, Jan 8). ”Slop” was the 2025 word of the year. What comes next? Retrieved from https://www.psychologytoday.com/us/blog/misinformation-desk/202601/2025s-word-of-the-year-was-slop-what-comes-next

    Oldemburgo de Mello, V., Ayad, R., Côté, É., Inbar, Y., Plaks, J., Inzlicht, M. (2025). The moralization of artificial intelli...

  51. [51]

    Ong, D.C., Goldenberg, A., Inzlicht, M., Perry, A. (2026). Ai-generated empathy: Opportunities, limits, and future directions. Current Directions in Psychological Science.

  52. [52]

    Ovsyannikova, D., de Mello, V.O., Inzlicht, M. (2025). Third-party evaluators perceive ai as more compassionate than expert humans.Communications Psychology,3(1), 4,

  53. [53]

    Perry, A. (2023). Ai will never convey the essence of human empathy.Nature Human Behaviour,7(11), 1808–1809,

  54. [54]

    Perry, A., & Shamay-Tsoory, S. (2013). Understanding emotional and cognitive empathy: A neuropsychological perspective. Understanding other minds: Perspectives from developmental social neuroscience, 179–194.

  55. [55]

    Rathje, S., Ye, M., Globig, L., Pillai, R., de Mello, V., Van Bavel, J. (2025). Sycophantic ai increases attitude extremity and overconfidence

  56. [56]

    Reinhart, A., Markey, B., Laudenbach, M., Pantusen, K., Yurko, R., Weinberg, G., Brown, D.W. (2025). Do llms write like humans? variation in grammatical and rhetorical styles.Proceedings of the National Academy of Sciences,122(8), e2422455122,

  57. [57]

    Rogers, C., & Farson, R. (1957). Active listening. Chicago: Univ. of Chicago.

  58. [58]

    Rubin, M., Li, J.Z., Zimmerman, F., Ong, D.C., Goldenberg, A., Perry, A. (2025). The value of perceiving a human response: Comparing perceived human versus ai-generated empathy. Nature Human Behaviour.

  59. [59]

    Shaib, C., Chakrabarty, T., Garcia-Olano, D., Wallace, B.C. (2025). Measuring ai “slop” in text. arXiv preprint arXiv:2509.19163.

  60. [60]

    Shaib, C., Elazar, Y., Li, J.J., Wallace, B.C. (2024). Detection and measurement of syntactic templates in generated text.Proceedings of the 2024 conference on empirical methods in natural language processing(pp. 6416–6431)

  61. [61]

    Shamay-Tsoory, S.G., Aharon-Peretz, J., Perry, D. (2009). Two systems for empathy: a double dissociation between emotional and cognitive empathy in inferior frontal gyrus versus ventromedial prefrontal lesions. Brain, 132(3), 617–627.

  62. [62]

    Sharma, A., Lin, I.W., Miner, A.S., Atkins, D.C., Althoff, T. (2023). Human–ai collaboration enables more empathic conversations in text-based peer-to-peer mental health support.Nature Machine Intelligence,5(1), 46–57,

  63. [63]

    Sharma, A., Miner, A., Atkins, D., Althoff, T. (2020). A computational approach to understanding empathy expressed in text-based mental health support.Proceedings of the 2020 conference on empirical methods in natural language processing (emnlp)(pp. 5263–5276)

  64. [64]

    Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S.R., . . . others (2024). Towards understanding sycophancy in language models.Twelfth international conference on learning representations

  65. [65]

    Siddals, S., Torous, J., Coxon, A. (2024). “it happened to be the perfect thing”: experiences of generative ai chatbots for mental health.npj Mental Health Research,3(1), 48,

  66. [66]

    Sprecher, S., & Hendrick, S.S. (2004). Self-disclosure in intimate relationships: Associations with individual and relationship characteristics over time. Journal of Social and Clinical Psychology, 23(6), 857–877.

  67. [67]

    Stade, E., Tait, Z., Campione, S., Stirman, S., et al. (2025). Current real-world use of large language models for mental health

  68. [68]

    Suh, J., Le, L., Shayegani, E., Ramos, G., Amores, J., Ong, D.C., . . . Hernandez, J. (2026). Sense-7: Taxonomy and dataset for measuring user perceptions of empathy in sustained human-ai conversations. IEEE Transactions on Affective Computing.

  69. [69]

    Suzgun, M., Gur, T., Bianchi, F., Ho, D.E., Icard, T., Jurafsky, D., Zou, J. (2025). Language models cannot reliably distinguish belief from knowledge and fact. Nature Machine Intelligence, 1–11.

  70. [70]

    Taylor, S.E., et al. (2011). Social support: A review. The Oxford handbook of health psychology, 1, 189–214.

  71. [71]

    Thomas, D.R., & Hodges, I.D. (2026). A new taxonomy of social support: Clarifying supportive behaviours and measures.Journal of Health Psychology, 13591053251412946,

  72. [72]

    Uchino, B.N., Cacioppo, J.T., Kiecolt-Glaser, J.K. (1996). The relationship between social support and physiological processes: a review with emphasis on underlying mechanisms and implications for health.Psychological Bulletin,119(3), 488,

  73. [73]

    Watson, J.C. (2007). Facilitating empathy.European Psychotherapy,7(1), 59–65,

  74. [74]

    Welivita, A., & Pu, P. (2020). A taxonomy of empathetic response intents in human social conversations. Proceedings of the 28th international conference on computational linguistics(pp. 4886–4899)

  75. [75]

    Welivita, A., & Pu, P. (2024). Are large language models more empathetic than humans?arXiv preprint arXiv:2406.05063, ,

  76. [76]

    Wenger, J.D., Cameron, C.D., Inzlicht, M. (2026). People choose to receive human empathy despite rating ai empathy higher.Communications Psychology, ,

  77. [77]

    Yin, Y., Jia, N., Wakslak, C.J. (2024). Ai can help people feel heard, but an ai label diminishes this impact.Proceedings of the National Academy of Sciences,121(14), e2319112121,

  78. [78]

    Zaki, J. (2014). Empathy: a motivated account.Psychological Bulletin,140(6), 1608,

  79. [79]

    Zaki, J., & Ochsner, K.N. (2012). The neuroscience of empathy: progress, pitfalls and promise.Nature neuroscience,15(5), 675–680,

  80. [80]

    Zhang, Y., Das, S.S.S., Zhang, R. (2024). Verbosity ≠ veracity: Demystify verbosity compensation behavior of large language models. arXiv preprint arXiv:2411.07858.

Showing first 80 references.