arxiv: 2509.09870 · v2 · submitted 2025-09-11 · 💻 cs.HC · cs.AI · cs.CL

Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks

Pith reviewed 2026-05-18 16:56 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.CL
keywords LLM conversational agents · personality expression · Big Five traits · user perceptions · personality alignment · goal-oriented tasks · Trait Modulation Keys

The pith

Medium personality expression in LLM agents produces the highest user ratings for intelligence, trust, and likeability in goal-oriented tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how much personality conversational agents built on large language models should display when helping users with tasks such as travel planning. In an experiment with 150 participants, agents were set to low, medium, or high expression of the Big Five traits through a new control method. Medium expression consistently received the strongest positive feedback across intelligence, enjoyment, anthropomorphism, adoption intention, trust, and likeability, beating both lower and higher settings. Matching the agent's traits to the user's own personality improved results further, especially for extraversion and emotional stability. Cluster analysis also found three user groups with different compatibility patterns, one of which showed especially favorable responses.

Core claim

In a between-subjects experiment (N=150) where participants completed travel planning with LLM-based agents, medium personality expression across the Big Five traits produced the most positive evaluations on intelligence, enjoyment, anthropomorphism, intention to adopt, trust, and likeability, significantly outperforming both low and high expression levels. Personality alignment between user and agent further enhanced outcomes, with extraversion and emotional stability as the most influential traits, and three distinct compatibility profiles emerged from cluster analysis.
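To make the idea of user-agent personality alignment concrete, here is a minimal sketch. The formula is an assumption for illustration, not the paper's definition: the figure captions only state that trait distances and the overall Personality Alignment Score lie on a 0–1 scale. The sketch takes hypothetical trait scores already rescaled to 0–1, uses the absolute difference as the per-trait distance, and defines alignment as one minus the mean distance. The function names `trait_distances` and `alignment_score` are likewise hypothetical.

```python
# Hypothetical alignment score: 1 - mean absolute trait distance.
# The paper's actual formula is not given in this page; this is an assumption.

TRAITS = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "emotional_stability",
)


def trait_distances(user, agent):
    """Per-trait absolute distance between two profiles scored on 0-1."""
    return {t: abs(user[t] - agent[t]) for t in TRAITS}


def alignment_score(user, agent):
    """Overall alignment in [0, 1]; 1 means identical profiles."""
    distances = trait_distances(user, agent)
    return 1 - sum(distances.values()) / len(distances)


if __name__ == "__main__":
    # Toy profiles, invented for illustration only.
    user = {t: 0.5 for t in TRAITS}
    agent = dict(user, extraversion=1.0)
    print(trait_distances(user, agent))
    print(alignment_score(user, agent))
```

Under this assumed definition, a single badly mismatched trait (such as extraversion above) lowers the overall score only modestly, which is one reason per-trait distances are worth inspecting alongside the aggregate.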

What carries the argument

The Trait Modulation Keys (TMK) framework, a prompting method intended to isolate and control how strongly an LLM agent expresses each Big Five trait.

Load-bearing premise

The Trait Modulation Keys framework successfully isolates and controls personality expression levels in the LLM agents without introducing unintended changes to response quality, coherence, or task performance.
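The size of the control space this premise covers can be sketched directly. The Figure 4 caption reports 243 trait-level configurations, which matches five traits at three levels each (3^5 = 243). The snippet below enumerates that space; the dictionary representation and the helper names `all_configurations` and `uniform_configuration` are assumptions for illustration and say nothing about how the TMK prompts themselves are written.

```python
from itertools import product

# Trait names follow the Figure 9 caption; levels follow the study conditions.
TRAITS = (
    "openness",
    "conscientiousness",
    "extraversion",
    "agreeableness",
    "emotional_stability",
)
LEVELS = ("low", "medium", "high")


def all_configurations():
    """Return every assignment of one expression level to each trait."""
    return [
        dict(zip(TRAITS, combo))
        for combo in product(LEVELS, repeat=len(TRAITS))
    ]


def uniform_configuration(level):
    """Configuration in which all five traits share the same level."""
    return {trait: level for trait in TRAITS}


if __name__ == "__main__":
    configs = all_configurations()
    print(len(configs))  # 243
    print(uniform_configuration("medium") in configs)
```

If the three study conditions set all five traits to the same level, as the summary above suggests, they correspond to only 3 of these 243 cells, so fidelity checks across the full grid and the user study probe different parts of the space.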

What would settle it

A follow-up experiment in which agents with medium personality expression do not receive significantly higher scores than low- or high-expression agents on the same perception measures, or in which alignment produces no measurable improvement.

Figures

Figures reproduced from arXiv: 2509.09870 by Hasibur Rahman, Smit Desai.

Figure 1: Overview of our Large Language Model (LLM)-based Conversational Agents’ (CA) personality prompting and its impact on …
Figure 2: Text-only study interface and example exchange with the NYC day-trip CA. The left sidebar presents the vignette and the …
Figure 3: TMK prompting framework used to elicit three distinct personality profiles, illustrated with example responses from Low CA, …
Figure 4: Control fidelity of TMK personality prompting on the CA, measured with Mini-IPIP across all 243 trait–level configurations. …
Figure 5: Single session study flow.
Section 3.5 (Participants): To detect a one-standard-deviation effect in our 3×1 between-subjects design, a priori power analysis indicated a minimum sample of 148 participants (𝛼 = .05; 1 − 𝛽 = .80). To meet this threshold, we therefore recruited 150 US adults through Prolific, with 50 participants assigned to each experimental condition (low, medium, high) in a counterbalanced order. …
Figure 6: Participants’ Mini-IPIP Big-Five scores by experimental condition (…
Figure 7: Post hoc Mann–Whitney 𝑈 comparisons (Bonferroni-adjusted, 𝛼 = .05) for the medium vs low (left) and high (right) across six outcomes. Points show group medians; circle size encodes |rank–biserial 𝑟| (.10/.30/.50). Arrows mark pairs with Bonferroni-corrected 𝑝 ≤ .05 and point toward the higher median. Medium CA > low for all six constructs; medium > high for Likeability and Intelligence only (others n.s.). …
Figure 8: Distribution of user–CA Personality Alignment Scores (0–1) by experimental condition (…
Figure 9: Personality alignment clusters from 𝑘-means on trait distances (Openness, Conscientiousness, Extraversion, Agreeableness, Emotional Stability; 0–1) plus overall Personality Alignment Score. The radar plot shows three profiles: Extraversion-Misaligned (purple) has a large Extraversion distance with otherwise moderate distances and a mid alignment score; Globally-Misaligned (green) shows high distances on al…
Figure 10: Post hoc Mann–Whitney 𝑈 results (Bonferroni-adjusted, 𝛼 = .05) comparing the Well-Aligned cluster (C2) with Extraversion-Misaligned (C0, left) and Globally-Misaligned (C1, right). Points show cluster medians; circle size encodes |𝑟| (.10/.30/.50). Arrows mark pairs with Bonferroni-corrected 𝑝 ≤ .05 and point toward the higher median. C2 > C0 for Intelligence, Trust, and Likeability (others n.s.). C2 > C1 …
Figure 11: Control fidelity of TMK across two LLMs using Mini-IPIP trait scores. Five panels (one per Big-Five trait) plot target levels …
Figure 12: Model selection for 𝑘-means clustering. Left (Elbow): within-cluster sum of squares drops steeply from 𝑘=2 to 𝑘=3 and then flattens, indicating an elbow at 𝑘≈3. Right (Silhouette): scores are highest at very small 𝑘 and decrease thereafter; at 𝑘=3 the silhouette is ≈ .31. Balancing fit and parsimony, we select 𝑘=3.
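The post hoc procedure named in the Figure 7 and Figure 10 captions (Mann–Whitney 𝑈 with Bonferroni adjustment and rank-biserial 𝑟 as effect size) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: it uses the large-sample normal approximation with tie and continuity corrections, the function names `mann_whitney_u` and `bonferroni` are hypothetical, and the ratings in the usage block are invented toy values.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (u, p, r): U counted for x, the approximate two-sided
    p-value, and the rank-biserial r (positive when x tends higher).
    """
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # U for x: pairs where x beats y, with ties counted as one half.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)

    # Tie correction term: sum of t^3 - t over groups of tied values.
    pooled = sorted(x + y)
    tie_term, i = 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        tie_term += t**3 - t
        i = j

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        p = 1.0  # every value tied: no evidence of a difference
    else:
        z = max(abs(u - mean_u) - 0.5, 0.0) / sqrt(var_u)
        p = erfc(z / sqrt(2))  # two-sided tail of the standard normal

    r = 2 * u / (n1 * n2) - 1
    return u, p, r


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


if __name__ == "__main__":
    # Invented toy ratings on a 1-7 scale, for illustration only.
    medium = [6, 6, 5, 7, 6, 5, 6, 7]
    low = [4, 5, 3, 4, 5, 4, 3, 5]
    u, p, r = mann_whitney_u(medium, low)
    print(u, bonferroni(p, 2), r)
```

With about 50 participants per condition the normal approximation is usually considered adequate; for much smaller groups an exact test would be the safer choice.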
Original abstract

Large language models (LLMs) enable conversational agents (CAs) to express distinctive personalities, raising new questions about how such designs shape user perceptions. This study investigates how personality expression levels and user-agent personality alignment influence perceptions in goal-oriented tasks. In a between-subjects experiment (N=150), participants completed travel planning with CAs exhibiting low, medium, or high expression across the Big Five traits, controlled via our novel Trait Modulation Keys framework. Results revealed an inverted-U relationship: medium expression produced the most positive evaluations across Intelligence, Enjoyment, Anthropomorphism, Intention to Adopt, Trust, and Likeability, significantly outperforming both extremes. Personality alignment further enhanced outcomes, with Extraversion and Emotional Stability emerging as the most influential traits. Cluster analysis identified three distinct compatibility profiles, with "Well-Aligned" users reporting substantially positive perceptions. These findings demonstrate that personality expression and strategic trait alignment constitute optimal design targets for CA personality, offering design implications as LLM-based CAs become increasingly prevalent.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This paper reports a between-subjects experiment (N=150) in which participants completed a goal-oriented travel-planning task with LLM-based conversational agents whose Big Five personality expression was set to low, medium, or high levels via a novel Trait Modulation Keys framework. The central findings are an inverted-U pattern in which medium expression produced the highest ratings on Intelligence, Enjoyment, Anthropomorphism, Intention to Adopt, Trust, and Likeability, plus positive effects of personality alignment (especially Extraversion and Emotional Stability) and three emergent user compatibility clusters.

Significance. If the Trait Modulation Keys truly isolate personality strength without altering response length, coherence, or task helpfulness, the results supply concrete design guidance for LLM conversational agents: moderate rather than extreme personality expression is preferable, and deliberate trait alignment can further improve user perceptions. The sizable sample and multi-scale measurement strengthen the empirical contribution to human-AI interaction research.

major comments (2)
  1. [Methods] Methods section (Trait Modulation Keys): The manuscript presents the framework as cleanly isolating personality expression levels, yet reports no quantitative checks (response length, lexical diversity, coherence ratings, or task-completion metrics) across the low/medium/high conditions. Without such evidence or equivalence tests, systematic differences in agent verbosity or helpfulness could explain the inverted-U pattern on the six perception scales rather than personality per se.
  2. [Results] Results section: The headline claim that medium expression significantly outperforms both extremes on all six scales, and that alignment effects are driven by specific traits, is load-bearing for the design implications, but the provided abstract and summary contain no p-values, effect sizes, confidence intervals, or exclusion criteria. These details are required to evaluate whether the statistical evidence supports the reported superiority.
minor comments (2)
  1. The paper would be strengthened by including the exact Trait Modulation Key prompts or parameter settings in an appendix to support replication.
  2. Cluster-analysis description should specify the distance metric, linkage method, and validation procedure used to identify the three compatibility profiles.
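As context for minor comment 2: the Figure 9 and Figure 12 captions indicate 𝑘-means with elbow and silhouette checks. Standard 𝑘-means minimizes within-cluster squared Euclidean distance and has no linkage step, so the details worth reporting are the distance, the initialization, and the model-selection criteria. The sketch below shows those pieces in plain Python. It is illustrative only: the farthest-first initialization is an assumption chosen for determinism, the function names are hypothetical, and the 2-D points are toy data rather than the paper's six-dimensional alignment features.

```python
from math import dist


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm with farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all current centers.
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so centers are too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centers


def wcss(points, labels, centers):
    """Within-cluster sum of squares (the elbow-plot quantity)."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette width; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: three well-separated groups of four points each.
    corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
    points = [(x + dx, y + dy)
              for dx, dy in [(0, 0), (10, 10), (0, 10)]
              for x, y in corners]
    for k in (2, 3, 4):
        labels, centers = kmeans(points, k)
        print(k, wcss(points, labels, centers), silhouette(points, labels))
```

A silhouette near .31, as reported for 𝑘=3 in Figure 12, indicates overlapping rather than cleanly separated clusters, which is why reporting the validation procedure matters here.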

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have reviewed each major comment carefully and provide point-by-point responses below, indicating where revisions will be made to improve clarity and rigor.

Point-by-point responses
  1. Referee: [Methods] Methods section (Trait Modulation Keys): The manuscript presents the framework as cleanly isolating personality expression levels, yet reports no quantitative checks (response length, lexical diversity, coherence ratings, or task-completion metrics) across the low/medium/high conditions. Without such evidence or equivalence tests, systematic differences in agent verbosity or helpfulness could explain the inverted-U pattern on the six perception scales rather than personality per se.

    Authors: We agree that the absence of quantitative validation for the Trait Modulation Keys leaves open the possibility of confounds. The original manuscript did not report these checks because the primary focus was on the framework's implementation and downstream perceptual effects. In the revised version we will add a dedicated subsection to the Methods (and corresponding results) that reports mean response length (word count), lexical diversity (type-token ratio), automated coherence scores, and task-completion rates for each personality-expression condition. We will also include equivalence tests (TOST) with pre-specified bounds to demonstrate that these variables do not differ meaningfully across conditions. These additions will strengthen the claim that the inverted-U pattern is driven by personality expression rather than ancillary linguistic features. revision: yes

  2. Referee: [Results] Results section: The headline claim that medium expression significantly outperforms both extremes on all six scales, and that alignment effects are driven by specific traits, is load-bearing for the design implications, but the provided abstract and summary contain no p-values, effect sizes, confidence intervals, or exclusion criteria. These details are required to evaluate whether the statistical evidence supports the reported superiority.

    Authors: The full Results section already reports the statistical detail behind these claims (post hoc Mann–Whitney U comparisons with Bonferroni adjustment at α = .05 and rank-biserial r effect sizes, as summarized in Figures 7 and 10), together with the exclusion criteria applied in the Participants subsection. However, we acknowledge that the abstract and the high-level summary supplied to the referee omit these numbers. We will therefore revise the abstract to include the key statistical outcomes (e.g., the scales on which the medium condition differs significantly from each extreme, effect sizes, and the most influential alignment traits). We will also add a brief sentence in the Results overview that explicitly references the exclusion criteria and directs readers to the full statistical tables. revision: yes
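The manipulation checks promised in response 1 (lexical diversity via type-token ratio, plus TOST equivalence tests on measures such as response length) can be sketched as follows. This is an illustration under stated assumptions, not the authors' planned analysis: the TOST here uses a large-sample normal (z) approximation where a t-based version would normally be preferred for small groups, the tokenizer is a simple regular expression, the function names are hypothetical, and the word counts and equivalence bounds in the usage block are invented.

```python
import re
from math import sqrt
from statistics import NormalDist, mean, variance


def type_token_ratio(text):
    """Unique word types divided by total word tokens (0.0 if empty)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def tost_equivalence(x, y, low, high):
    """Two one-sided tests for mean(x) - mean(y) lying inside (low, high).

    Large-sample z approximation. Returns the larger of the two
    one-sided p-values; equivalence is supported when it is below alpha.
    """
    diff = mean(x) - mean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        raise ValueError("both samples have zero variance")
    std_normal = NormalDist()
    # H0: diff <= low, rejected when diff sits well above the lower bound.
    p_lower = 1 - std_normal.cdf((diff - low) / se)
    # H0: diff >= high, rejected when diff sits well below the upper bound.
    p_upper = std_normal.cdf((diff - high) / se)
    return max(p_lower, p_upper)


if __name__ == "__main__":
    # Invented response lengths (words) for two conditions.
    medium_lengths = [100, 102, 98, 101, 99, 100]
    high_lengths = [101, 99, 100, 102, 98, 100]
    # Hypothetical pre-specified bounds of +/- 10 words.
    print(tost_equivalence(medium_lengths, high_lengths, -10, 10))
    print(type_token_ratio("Sure, the plan is simple and the plan is short."))
```

The equivalence bounds carry the argument: they need to be fixed in advance and justified as the largest difference that would still be considered negligible for user perceptions.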

Circularity Check

0 steps flagged

Empirical user study with direct measurements exhibits no circularity

Full rationale

The paper reports a between-subjects experiment (N=150) in which participants completed goal-oriented travel planning tasks with LLM agents whose personality expression was varied across low/medium/high levels via the Trait Modulation Keys framework. Outcomes consist of participant ratings on established scales (Intelligence, Enjoyment, Anthropomorphism, Intention to Adopt, Trust, Likeability) plus post-hoc cluster analysis of alignment profiles. No equations, fitted parameters, or predictions are presented; results are obtained from raw experimental data and standard statistical comparisons. The central claims therefore rest on external participant responses rather than reducing to quantities defined by the analysis itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard experimental assumptions rather than mathematical derivations or new postulated entities.

axioms (1)
  • [standard math] Standard assumptions of between-subjects experimental design and statistical testing (e.g., independence of observations, appropriate use of ANOVA or equivalent)
    Invoked to interpret differences across low/medium/high conditions and alignment effects


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Same Voice, Different Lab: On the Homogenization of Frontier LLM Personalities

    cs.HC 2026-03 unverdicted novelty 5.0

    Frontier LLMs homogenize toward systematic and analytical personalities, suppressing emotional traits like remorseful or sycophantic, indicating an implicit consensus on optimal assistant behavior.

  2. The Differential Effects of Agreeableness and Extraversion on Older Adults' Perceptions of Conversational AI Explanations in Assistive Settings

    cs.HC 2026-03 unverdicted novelty 5.0

    High agreeableness in LLM voice assistants increases older adults' empathy perceptions and real-time explanations outperform history-based ones, but personality does not affect perceived intelligence.

Reference graph

Works this paper leans on

117 extracted references · 117 canonical work pages · cited by 2 Pith papers · 1 internal anchor

  1. [1]

    Jennifer L. Aaker. 1997. Dimensions of Brand Personality. Journal of Marketing Research 34, 3 (1997), 347–356. doi:10.2307/3151897

  2. [2]

    Rangina Ahmad, Dominik Siemon, and Susanne Robra-Bissantz. 2020. ExtraBot vs IntroBot: The Influence of Linguistic Cues on Communication Satisfaction. AMCIS 2020 Proceedings (Aug. 2020). https://aisel.aisnet.org/amcis2020/cognitive_in_is/cognitive_in_is/10

  3. [3]

    Abeer Alessa and Hend Al-Khalifa. 2023. Towards Designing a ChatGPT Conversational Companion for Elderly People. In Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA ’23) . Association for Computing Machinery, New York, NY, USA, 667–674. doi:10.1145/3594806.3596572

  4. [4]

    Mohammad Amin Kuhail, Mohamed Bahja, Ons Al-Shamaileh, Justin Thomas, Amina Alkazemi, and Joao Negreiros. 2024. Assessing the Impact of Chatbot-Human Personality Congruence on User Behavior: A Chatbot-Based Advising System Case. IEEE Access 12 (2024), 71761–71782. doi:10.1109/ACCESS.2024.3402977

  5. [5]

    Sean Andrist, Bilge Mutlu, and Adriana Tapus. 2015. Look Like Me: Matching Robot Personality via Gaze to Increase Motivation. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15) . Association for Computing Machinery, New York, NY, USA, Manuscript submitted to ACM 30 Hasibur Rahman and Smit Desai 3603–3612. doi:...

  6. [6]

    Michael Argyle. 1988. Bodily communication. Methuen & Co Ltd, London, UK

  7. [7]

    Christiane Atzmüller and Peter M. Steiner. 2010. Experimental vignette studies in survey research. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences 6, 3 (2010), 128–138. doi:10.1027/1614-2241/a000014 Place: Germany Publisher: Hogrefe Publishing

  8. [8]

    Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, Jackson Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, Shauna Kravec, Liane Lovitt, Neel Nanda, Catherine Olsson, ...

  9. [9]

    Christoph Bartneck. 2023. Godspeed Questionnaire Series: Translations and Usage. In International Handbook of Behavioral Health Assessment . Springer, Cham, 1–35. doi:10.1007/978-3-030-89738-3_24-1

  10. [10]

    Christoph Bartneck, Dana Kulić, Elizabeth Croft, and Susana Zoghbi. 2009. Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots.International Journal of Social Robotics 1, 1 (Jan. 2009), 71–81. doi:10.1007/s12369- 008-0001-3

  11. [11]

    Bernier and Brian Scassellati

    Emily P. Bernier and Brian Scassellati. 2010. The similarity-attraction effect in human-robot interaction. In 2010 IEEE 9th International Conference on Development and Learning . 286–290. doi:10.1109/DEVLRN.2010.5578828 ISSN: 2161-9476

  12. [12]

    Beukeboom, Martin Tanis, and Ivar E

    Camiel J. Beukeboom, Martin Tanis, and Ivar E. Vermeulen. 2013. The language of extraversion: Extraverted people talk more abstractly, introverts are more concrete. Journal of Language and Social Psychology 32, 2 (2013), 191–201. doi:10.1177/0261927X12460844 tex.eprint: https://doi.org/10.1177/0261927X12460844

  13. [13]

    Bickmore, Suzanne E

    Timothy W. Bickmore, Suzanne E. Mitchell, Brian W. Jack, Michael K. Paasche-Orlow, Laura M. Pfeifer, and Julie Odonnell. 2010. Response to a Relational Agent by Hospital Patients with Depressive Symptoms. Interacting with Computers 22, 4 (July 2010), 289–298. doi:10.1016/j.intcom.2009. 12.001

  14. [14]

    Ryan L Boyd and James W Pennebaker. 2017. Language-based personality: a new approach to personality in a digital world. Current Opinion in Behavioral Sciences 18 (Dec. 2017), 63–68. doi:10.1016/j.cobeha.2017.07.017

  15. [15]

    Michael Braun, Anja Mainz, Ronee Chadowitz, Bastian Pfleging, and Florian Alt. 2019. At Your Service: Designing Voice Assistant Personalities to Improve Automotive User Interfaces. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19) . Association for Computing Machinery, New York, NY, USA, 1–11. doi:10.1145/3290605.3300270

  16. [16]

    Burger and David F

    Jerry M. Burger and David F. Caldwell. 2000. Personality, Social Activities, Job-Search Behavior and Interview Success: Distinguishing Between PANAS Trait Positive Affect and NEO Extraversion. Motivation and Emotion 24, 1 (March 2000), 51–62. doi:10.1023/A:1005539609679

  17. [17]

    Wanling Cai, Yucheng Jin, and Li Chen. 2022. Impacts of Personal Characteristics on User Trust in Conversational Recommender Systems. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22) . Association for Computing Machinery, New York, NY, USA, 1–14. doi:10.1145/3491102.3517471

  18. [18]

    Philippe Rushton

    Anne Campbell and J. Philippe Rushton. 1978. Bodily communication and personality. British Journal of Social and Clinical Psychology 17, 1 (1978), 31–36. doi:10.1111/j.2044-8260.1978.tb00893.x _eprint: https://bpspsychub.onlinelibrary.wiley.com/doi/pdf/10.1111/j.2044-8260.1978.tb00893.x

  19. [19]

    D. W. Carment and C. G. Miles. 1965. Persuasiveness and persuasibility as related to intelligence and extraversion. British Journal of Social & Clinical Psychology 4, 1 (1965), 1–7. doi:10.1111/j.2044-8260.1965.tb00433.x Place: United Kingdom Publisher: British Psychological Society

  20. [20]

    Carter, Joshua D

    Nathan T. Carter, Joshua D. Miller, and Thomas A. Widiger. 2018. Extreme Personalities at Work and in Life.Current Directions in Psychological Science 27, 6 (Dec. 2018), 429–436. doi:10.1177/0963721418793134 Publisher: SAGE Publications Inc

  21. [21]

    Zhu, Kunyao Lan, Zhiling Zhang, and Lyuchun Cui

    Siyuan Chen, Mengyue Wu, Kenny Q. Zhu, Kunyao Lan, Zhiling Zhang, and Lyuchun Cui. 2023. LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and Evaluation. doi:10.48550/arXiv.2305.13614 arXiv:2305.13614 [cs]

  22. [22]

    Jessie Chin and Smit Desai. 2021. Being a Nice Partner: The Effects of Age and Interaction Types on the Perceived Social Abilities of Conversational Agents. In Technology, Mind, and Behavior. doi:10.1037/tms0000027

  23. [23]

    Jessie Chin, Smit Desai, Sheny (Cheng-Hsuan) Lin, and Shannon Mejia. 2024. Like My Aunt Dorothy: Effects of Conversational Styles on Perceptions, Acceptance and Metaphorical Descriptions of Voice Assistants during Later Adulthood. Proc. ACM Hum.-Comput. Interact. 8, CSCW1 (April 2024), 88:1–88:21. doi:10.1145/3637365

  24. [24]

    Costa and Robert R

    Paul T. Costa and Robert R. McCrae. 1992. Four ways five factors are basic. Personality and Individual Differences 13, 6 (June 1992), 653–665. doi:10.1016/0191-8869(92)90236-I

  25. [25]

    Ian J. Deary. 2009. The trait approach to personality. In The Cambridge handbook of personality psychology . Cambridge University Press, New York, NY, US, 89–109. doi:10.1017/CBO9780511596544.009

  26. [26]

    Smit Desai, Jessie Chin, Dakuo Wang, Benjamin Cowan, and Michael Twidale. 2025. Toward Metaphor-Fluid Conversation Design for Voice User Interfaces. arXiv:2502.11554 (Feb. 2025). doi:10.48550/arXiv.2502.11554 arXiv:2502.11554 [cs]

  27. [27]

    Smit Desai, Mateusz Dubiel, and Luis A. Leiva. 2024. Examining Humanness as a Metaphor to Design Voice User Interfaces. In Proceedings of the 6th ACM Conference on Conversational User Interfaces (CUI ’24) . Association for Computing Machinery, New York, NY, USA, 1–15. doi:10.1145/ 3640794.3665535

  28. [28]

    Smit Desai and Michael Twidale. 2023. Metaphors in Voice User Interfaces: A Slippery Fish. ACM Trans. Comput.-Hum. Interact. 30, 6 (Sept. 2023), 89:1–89:37. doi:10.1145/3609326 Manuscript submitted to ACM Vibe Check: Understanding the Effects of LLM-Based Conversational Agents’ Personality and Alignment on User Perceptions in Goal-Oriented Tasks 31

  29. [29]

    Jean-Marc Dewaele and Adrian Furnham. 2000. Personality and speech production: A pilot study of second language learners. Personality and Individual Differences 28, 2 (2000), 355–365. doi:10.1016/S0191-8869(99)00106-3 Place: Netherlands Publisher: Elsevier Science

  30. [30]

    Brent Donnellan, Frederick L

    M. Brent Donnellan, Frederick L. Oswald, Brendan M. Baird, and Richard E. Lucas. 2006. The Mini-IPIP Scales: Tiny-yet-effective measures of the Big Five Factors of Personality. Psychological Assessment 18, 2 (2006), 192–203. doi:10.1037/1040-3590.18.2.192 Place: US Publisher: American Psychological Association

  31. [31]

    Mateusz Dubiel, Sylvain Daronnat, and Luis A. Leiva. 2022. Conversational Agents Trust Calibration: A User-Centred Perspective to Design. In Proceedings of the 4th Conference on Conversational User Interfaces (CUI ’22) . Association for Computing Machinery, New York, NY, USA, 1–6. doi:10.1145/3543829.3544518

  32. [32]

    Adrian Furnham. 1990. Language and personality. In Handbook of language and social psychology . John Wiley & Sons, Oxford, England, 73–95

  33. [33]

    Asma Ghandeharioun, Daniel McDuff, Mary Czerwinski, and Kael Rowan. 2019. Towards Understanding Emotional Intelligence for Behavior Change Chatbots. In 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII) . 8–14. doi:10.1109/ACII.2019.8925433 ISSN: 2156-8111

  34. [34]

    Gill and Jon Oberlander

    Alastair J. Gill and Jon Oberlander. 2002. Taking Care of the Linguistic Features of Extraversion. Proceedings of the Annual Meeting of the Cognitive Science Society 24, 24 (2002). https://escholarship.org/uc/item/6n5652cx

  35. [35]

    Ulrich Gnewuch, Meng Yu, and Alexander Maedche. 2020. The Effect of Perceived Similarity in Dominance on Customer Self-Disclosure to Chatbots in Conversational Commerce. ECIS 2020 Research Papers (June 2020). https://aisel.aisnet.org/ecis2020_rp/53

  36. [36]

    Goldberg

    Lewis R. Goldberg. 1992. The development of markers for the Big-Five factor structure.Psychological Assessment 4, 1 (1992), 26–42. doi:10.1037/1040- 3590.4.1.26 Place: US Publisher: American Psychological Association

  37. [37]

    Adam M. Grant. 2013. Rethinking the Extraverted Sales Ideal: The Ambivert Advantage. Psychological Science 24, 6 (June 2013), 1024–1030. doi:10.1177/0956797612463706 Publisher: SAGE Publications Inc

  38. [38]

    Grant and Barry Schwartz

    Adam M. Grant and Barry Schwartz. 2011. Too Much of a Good Thing: The Challenge and Opportunity of the Inverted U. Perspectives on Psychological Science 6, 1 (Jan. 2011), 61–76. doi:10.1177/1745691610393523 Publisher: SAGE Publications Inc

  39. [39]

    Kashyap Haresamudram, Nena Van As, and Stefan Larsson. 2025. Tasks Over Traits: User Perception of Humanlike Features in Goal-Oriented Chatbots. International Journal of Human–Computer Interaction 0, 0 (March 2025), 1–19. doi:10.1080/10447318.2025.2470311 Publisher: Taylor & Francis _eprint: https://doi.org/10.1080/10447318.2025.2470311

  40. [40]

    Evelien Heyselaar. 2023. The CASA theory no longer applies to desktop computers. Scientific Reports 13, 1 (Nov. 2023), 19693. doi:10.1038/s41598- 023-46527-9 Publisher: Nature Publishing Group

  41. [41]

    Hirsh and Jordan B

    Jacob B. Hirsh and Jordan B. Peterson. 2009. Personality and language use in self-narratives. Journal of Research in Personality 43, 3 (June 2009), 524–527. doi:10.1016/j.jrp.2009.01.006

  42. [42]

    Thomas Holtgraves. 2011. Text messaging, personality, and the social context. Journal of Research in Personality 45, 1 (Feb. 2011), 92–99. doi:10.1016/j.jrp.2010.11.015

  43. [43]

    Jen-tse Huang, Wenxiang Jiao, Man Ho Lam, Eric John Li, Wenxuan Wang, and Michael Lyu. 2024. On the Reliability of Psychological Scales on Large Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, Mi...

  44. [44]

    Bahar Irfan, Sanna-Mari Kuoppamäki, and Gabriel Skantze. 2023. Between Reality and Delusion: Challenges of Applying Large Language Models to Companion Robots for Open-Domain Dialogues with Older Adults. doi:10.21203/rs.3.rs-2884789/v1 ISSN: 2693-5015

  45. [45]

    KATHERINE Isbister and CLIFFORD Nass. 2000. Consistency of personality in interactive characters: verbal cues, non-verbal cues, and user characteristics. International Journal of Human-Computer Studies 53, 2 (Aug. 2000), 251–267. doi:10.1006/ijhc.2000.0368

  46. [46]

    Bisantz, and Colin G

    Jiun-Yin Jian, Ann M. Bisantz, and Colin G. Drury. 2000. Foundations for an Empirically Determined Scale of Trust in Automated Systems. International Journal of Cognitive Ergonomics 4, 1 (March 2000), 53–71. doi:10.1207/S15327566IJCE0401_04 Publisher: Routledge _eprint: https://doi.org/10.1207/S15327566IJCE0401_04

  47. [47]

    Guangyuan Jiang, Manjie Xu, Song-Chun Zhu, Wenjuan Han, Chi Zhang, and Yixin Zhu. 2023. Evaluating and inducing personality in pre-trained language models. In Proceedings of the 37th International Conference on Neural Information Processing Systems (NIPS ’23) . Curran Associates Inc., Red Hook, NY, USA, 10622–10643

  48. [48]

    Hang Jiang, Xiajie Zhang, Xubo Cao, Cynthia Breazeal, Deb Roy, and Jad Kabbara. 2024. PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits. In Findings of the Association for Computational Linguistics: NAACL 2024 , Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Linguistics, Mexico ...

  49. [49]

    Xinsheng Jiang, Xiaojun Li, Xia Dong, and Lan Wang. 2022. How the Big Five personality traits related to aggression from perspectives of the benign and malicious envy. BMC Psychology 10 (Aug. 2022), 203. doi:10.1186/s40359-022-00906-5

  50. [50]

    Oliver P. John, E. M. Donahue, and R. L. Kentle. 1991. Big Five Inventory. doi:10.1037/t07550-000 Institution: American Psychological Association

  51. [51]

    O. P. John and Sanjay Srivastava. 1999. The Big-Five Trait Taxonomy: History, Measurement, and Theoretical Perspectives. In Handbook of Personality: Theory and Research , O. P. John and L.A. Pervin (Eds.). Vol. 2. Guilford Press

  52. [52]

    Maciej Karwowski, Izabela Lebuda, Ewa Wisniewska, and Jacek Gralewski. 2013. Big five personality traits as the predictors of creative self-efficacy and creative personal identity: Does gender matter? The Journal of Creative Behavior 47, 3 (2013), 215–232. doi:10.1002/jocb.32 Place: United Kingdom Publisher: Wiley-Blackwell Publishing Ltd.. Manuscript sub...

  53. [53]

    Weaver III

    Christian Kiewitz and James B. Weaver III. 2007. The aggression questionnaire. In Handbook of research on electronic surveys and measurements . Idea Group Reference/IGI Global, Hershey, PA, US, 343–347. doi:10.4018/978-1-59140-792-8.ch047

  54. [54]

    Nikola Kovacevic, Tobias Boschung, Christian Holz, Markus Gross, and Rafael Wampfler. 2024. Chatbots With Attitude: Enhancing Chatbot Interactions Through Dynamic Personality Infusion. In Proceedings of the 6th ACM Conference on Conversational User Interfaces (CUI ’24). Association for Computing Machinery, New York, NY, USA, 1–16. doi:10.1145/3640794.3665543

  55. [55]

    Kwan Min Lee and Clifford Nass. 2003. Designing social presence of social actors in human computer interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’03). Association for Computing Machinery, New York, NY, USA, 289–296. doi:10.1145/642611.642662

  56. [56]

    Seo-young Lee, Soomin Kim, Gyuho Lee, and Joonhwan Lee. 2018. Robots in Diverse Contexts: Effects of Robots Tasks on Expected Personality. In Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’18). Association for Computing Machinery, New York, NY, USA, 169–170. doi:10.1145/3173386.3176989

  57. [57]

    François Mairesse, Marilyn A. Walker, Matthias R. Mehl, and Roger K. Moore. 2007. Using linguistic cues for the automatic recognition of personality in conversation and text. J. Artif. Int. Res. 30, 1 (Nov. 2007), 457–500

  58. [58]

    Robert McCrae. 2002. Cross-Cultural Research on the Five-Factor Model of Personality. Online Readings in Psychology and Culture 4, 4 (Aug. 2002). doi:10.9707/2307-0919.1038

  59. [59]

    R. R. McCrae and O. P. John. 1992. An introduction to the five-factor model and its applications. Journal of Personality 60, 2 (June 1992), 175–215. doi:10.1111/j.1467-6494.1992.tb00970.x

  60. [60]

    Matthias R. Mehl, Samuel D. Gosling, and James W. Pennebaker. 2006. Personality in its natural habitat: manifestations and implicit folk theories of personality in daily life. Journal of Personality and Social Psychology 90, 5 (May 2006), 862–877. doi:10.1037/0022-3514.90.5.862

  61. [61]

    Youngme Moon and Clifford Nass. 1996. How “Real” Are Computer Personalities?: Psychological Responses to Personality Types in Human-Computer Interaction. Communication Research 23, 6 (Dec. 1996), 651–674. doi:10.1177/009365096023006002 Publisher: SAGE Publications Inc

  62. [62]

    Masahiro Mori, Karl F. MacDorman, and Norri Kageki. 2012. The Uncanny Valley [From the Field]. IEEE Robotics & Automation Magazine 19, 2 (June 2012), 98–100. doi:10.1109/MRA.2012.2192811

  63. [63]

    Sara Moussawi, Marios Koufaris, and Raquel Benbunan-Fich. 2021. How perceptions of intelligence and anthropomorphism affect adoption of personal intelligent agents. Electronic Markets 31, 2 (June 2021), 343–364. doi:10.1007/s12525-020-00411-w

  64. [64]

    Clifford Nass and Scott Brave. 2005. Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship. The MIT Press

  65. [65]

    Clifford Nass and Kwan Min Lee. 2001. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. Journal of Experimental Psychology: Applied 7, 3 (2001), 171–181. doi:10.1037/1076-898X.7.3.171 Place: US Publisher: American Psychological Association

  66. [66]

    Clifford Nass, Jonathan Steuer, and Ellen R. Tauber. 1994. Computers are social actors. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’94). Association for Computing Machinery, New York, NY, USA, 72–78. doi:10.1145/191666.191703

  67. [67]

    Quynh N. Nguyen, Anna Sidorova, and Russell Torres. 2022. User interactions with chatbot interfaces vs. Menu-based interfaces: An empirical study. Computers in Human Behavior 128 (March 2022), 107093. doi:10.1016/j.chb.2021.107093

  68. [68]

    Jon Oberlander and Alastair J. Gill. 2004. Individual differences and implicit language: personality, parts-of-speech and pervasiveness. Proceedings of the Annual Meeting of the Cognitive Science Society 26, 26 (2004). https://escholarship.org/uc/item/94c490mq

  69. [69]

    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human fee...

  70. [70]

    Rock Yuren Pang, Hope Schroeder, Kynnedy Simone Smith, Solon Barocas, Ziang Xiao, Emily Tseng, and Danielle Bragg. 2025. Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. ACM, Yokohama, Japan, 1–20. doi:10.1145...

  71. [71]

    Ashwin Paranjape and Christopher Manning. 2021. Human-like informative conversations: Better acknowledgements using conditional mutual information. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Ha...

  72. [72]

    Laura Parks-Leduc, Gilad Feldman, and Anat Bardi. 2015. Personality traits and personal values: a meta-analysis. Personality and Social Psychology Review: An Official Journal of the Society for Personality and Social Psychology, Inc 19, 1 (Feb. 2015), 3–29. doi:10.1177/1088868314538548

  73. [73]

    M. Patterson and D.S. Holmes. 1966. Social Interaction Correlates of the MMPI Extraversion Introversion Scale. American Psychologist 21 (1966), 724–25

  74. [74]

    Matej Pavlić, Matea Kramarić, and Ana Butković. 2023. The relationship between personality and creative self-beliefs at different levels of personality hierarchy. Psihologijske Teme 32, 1 (2023), 125–141. doi:10.31820/pt.32.1.7 Place: Croatia Publisher: University of Rijeka

  75. [75]

    James Pennebaker, Martha Francis, and Roger Booth. 2001. Linguistic inquiry and word count (LIWC). Erlbaum Publishers, Mahwah, NJ

  76. [76]

    J. W. Pennebaker and L. A. King. 1999. Linguistic styles: language use as an individual difference. Journal of Personality and Social Psychology 77, 6 (Dec. 1999), 1296–1312. doi:10.1037//0022-3514.77.6.1296

  77. [77]

    Claudio S. Pinhanez. 2021. Expose Uncertainty, Instill Distrust, Avoid Explanations: Towards Ethical Guidelines for AI. arXiv:2112.01281 (Nov. 2021). doi:10.48550/arXiv.2112.01281 arXiv:2112.01281 [cs]

  78. [78]

    Alisha Pradhan, Leah Findlater, and Amanda Lazar. 2019. "Phantom Friend" or "Just a Box with Information": Personification and Ontological Categorization of Smart Speaker-based Voice Assistants by Older Adults. Proc. ACM Hum.-Comput. Interact. 3, CSCW (Nov. 2019), 214:1–214:21. doi:10.1145/3359316

  79. [79]

    Alisha Pradhan, Amanda Lazar, and Leah Findlater. 2020. Use of Intelligent Voice Assistants by Older Adults with Low Technology Use. ACM Trans. Comput.-Hum. Interact. 27, 4 (Sept. 2020), 31:1–31:27. doi:10.1145/3373759

  80. [80]

    Fangying Quan, Yan Gou, Yibo Gao, Xinxin Yu, and Bao Wei. 2024. The relationship between neuroticism and social aggression: a moderated mediation model. BMC Psychology 12 (Aug. 2024), 443. doi:10.1186/s40359-024-01938-9

Showing first 80 references.