pith.

arxiv: 1907.10664 · v1 · pith:UQVDNNW4 · submitted 2019-07-23 · 💻 cs.HC

Towards Understanding Emotional Intelligence for Behavior Change Chatbots

Pith reviewed 2026-05-24 17:41 UTC · model grok-4.3

classification 💻 cs.HC
keywords: emotion-aware chatbot, experience sampling, behavior change, emotional intelligence, mHealth, extraversion, personality

The pith

An emotion-aware chatbot for mood tracking is preferred by extraverts and yields more positive mood reports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper designs a chatbot that performs experience sampling while responding with emotional awareness and empathy. It tests this design in a one-week study with 39 participants. Extraverts showed significantly higher preference for the emotion-aware version compared with introverts. Participants also produced a higher share of positive mood reports when using the empathetic bot. The work ends by offering design guidelines for similar tools in mobile health settings.

Core claim

The authors built an emotion-aware chatbot that conducts experience sampling in an empathetic manner and evaluated it with N=39 participants over the course of a week. Their results show that extraverts preferred the emotion-aware chatbot significantly more than introverts, and that participants reported a higher percentage of positive mood reports when interacting with the empathetic bot. They conclude with guidelines for the design of emotion-aware chatbots for potential use in mHealth contexts.
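The "percentage of positive mood reports" outcome can be made concrete with a small sketch. It assumes each mood report has already been coded as positive or not (the label set below is hypothetical, not the paper's emoji set) and averages per-participant percentages within a condition. That per-participant framing matches the Figure 5 caption, but it is not confirmed to be the authors' exact computation.

```python
from statistics import mean

# Hypothetical coding: which mood labels count as "positive".
POSITIVE_MOODS = {"happy", "excited", "calm", "content"}


def positive_share(reports: list[str]) -> float:
    """Percentage of one participant's mood reports that are positive."""
    if not reports:
        raise ValueError("participant has no mood reports")
    positives = sum(label in POSITIVE_MOODS for label in reports)
    return 100.0 * positives / len(reports)


def condition_average(participants: dict[str, list[str]]) -> float:
    """Mean of per-participant percentages, so frequent reporters do not dominate."""
    return mean(positive_share(reports) for reports in participants.values())
```

For example, a participant with one positive report out of two (50%) and another with three out of four (75%) average to 62.5%, whereas pooling all six reports would give about 66.7%. The level at which percentages are averaged matters whenever participants respond at different rates.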

What carries the argument

The emotion-aware chatbot that conducts experience sampling empathetically by detecting and responding to user emotions.
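A minimal sketch of this mechanism, assuming moods are self-reported from a fixed set and placed on the valence/arousal quadrants of Russell's circumplex model [19]. The mood labels, quadrant coding, and reply texts are all hypothetical; the paper's actual emotion detection and dialogue content are not reproduced here. The `emotion_aware` flag contrasts a reply that depends on the reported emotion with a neutral acknowledgement that does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mood:
    valence: int  # +1 pleasant, -1 unpleasant (hypothetical coding)
    arousal: int  # +1 activated, -1 deactivated


# Hypothetical mood set, one label per circumplex quadrant.
MOODS = {
    "excited": Mood(+1, +1),
    "calm": Mood(+1, -1),
    "stressed": Mood(-1, +1),
    "sad": Mood(-1, -1),
}

# Hypothetical empathetic replies keyed by (valence, arousal) quadrant.
EMPATHETIC_REPLIES = {
    (+1, +1): "That's great to hear! What's giving you that energy?",
    (+1, -1): "Glad you're feeling at ease. What helped you relax?",
    (-1, +1): "That sounds stressful. Do you want to tell me what's going on?",
    (-1, -1): "I'm sorry you're feeling down. I'm here if you want to talk.",
}
NEUTRAL_REPLY = "Thanks, your mood has been recorded."


def respond(label: str, emotion_aware: bool) -> str:
    """Reply to one experience-sampling mood report."""
    mood = MOODS[label]  # KeyError for labels outside the hypothetical set
    if not emotion_aware:
        return NEUTRAL_REPLY  # same acknowledgement regardless of mood
    return EMPATHETIC_REPLIES[(mood.valence, mood.arousal)]
```

Keeping the neutral path inside the same function mirrors the kind of comparison the study draws: everything about the sampling prompt stays fixed and only the reply to the reported emotion changes.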

If this is right

  • Behavior change tools in mHealth can increase engagement by matching emotional responses to user personality traits such as extraversion.
  • Empathetic sampling may shift the distribution of self-reported moods toward more positive entries during longitudinal tracking.
  • Design guidelines from the study can be used to build other chatbots that combine experience sampling with emotional intelligence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Personality-based tailoring of chatbot emotional style could extend to other health behaviors such as medication reminders or exercise prompts.
  • The preference difference suggests that introverts might respond better to chatbots with lower emotional expressiveness.
  • Longer-term studies could test whether the mood-report shift leads to measurable changes in actual behavior or symptom tracking accuracy.

Load-bearing premise

The differences in preference and mood reports are caused by the emotional awareness features rather than other aspects of the chatbot design or the sampling task.

What would settle it

A controlled comparison that keeps all other chatbot elements identical but removes emotional responses, then finds no difference in extravert preference or positive mood reports.
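Such a comparison could be analyzed with a two-sided permutation test on per-participant positive-report percentages. A permutation test makes no normality assumption, which suits small groups. This is a generic sketch with hypothetical inputs, not the analysis the authors ran.

```python
import random
from statistics import mean


def permutation_test(a: list[float], b: list[float],
                     n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed for a reproducible p-value
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)  # copy so the caller's lists are not shuffled
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Counting the observed labelling as one permutation keeps p above zero.
    return (hits + 1) / (n_perm + 1)
```

Pass one list of per-participant percentages from each arm. With very small groups the set of distinct relabellings is limited, so the smallest attainable p-value is coarse.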

Figures

Figures reproduced from arXiv: 1907.10664 by Asma Ghandeharioun, Daniel McDuff, Kael Rowan, Mary Czerwinski.

Figure 1: System design.
Figure 3: Set of emojis used.
Figure 5: Average percentage of ESM responses per participant with one …
Original abstract

A natural conversational interface that allows longitudinal symptom tracking would be extremely valuable in health/wellness applications. However, the task of designing emotionally-aware agents for behavior change is still poorly understood. In this paper, we present the design and evaluation of an emotion-aware chatbot that conducts experience sampling in an empathetic manner. We evaluate it through a human-subject experiment with N=39 participants over the course of a week. Our results show that extraverts preferred the emotion-aware chatbot significantly more than introverts. Also, participants reported a higher percentage of positive mood reports when interacting with the empathetic bot. Finally, we provide guidelines for the design of emotion-aware chatbots for potential use in mHealth contexts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents the design of an emotion-aware chatbot for empathetic experience sampling in behavior-change contexts and reports results from a one-week human-subject study with N=39 participants. It claims that extraverts preferred the emotion-aware chatbot significantly more than introverts and that participants gave a higher percentage of positive mood reports when interacting with the empathetic version; the work concludes with design guidelines for mHealth applications.

Significance. If the causal attribution to emotional-awareness features can be established, the directional findings on personality-moderated preference and mood reporting would offer useful empirical grounding for chatbot design in wellness applications. The N=39 sample size and longitudinal element provide a modest but concrete data point; however, the absence of controls leaves the central claims only moderately supported.

major comments (2)
  1. [Abstract] Abstract (results paragraph): the claims that extraverts preferred the emotion-aware chatbot more and that positive mood reports increased rest on the assumption that these outcomes are caused by the emotional-awareness/empathetic components, yet the study description supplies no non-emotion-aware control arm, counterbalanced conditions, or regression covariates for interaction length, prompt wording, or demand characteristics.
  2. [Abstract] Abstract (evaluation paragraph): with only N=39 and no reported statistical methods, power analysis, or handling of personality-measurement reliability, the reported significance of the extraversion-preference difference cannot be evaluated for robustness or confounds.
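To make the power concern in comment 2 concrete, the standard Fisher z approximation gives the sample size needed to detect a correlation r with a two-sided test, n ≈ ((z₁₋α/₂ + z₁₋β) / artanh r)² + 3, and conversely the approximate power at a given n. The effect size r = 0.3 used below is a hypothetical benchmark, not a value reported by the paper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N to detect correlation r (two-sided), via Fisher's z."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


def power_for_correlation(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power at sample size n; ignores the negligible opposite tail."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(math.atanh(r)) * math.sqrt(n - 3) - z_alpha)
```

Under this approximation, `n_for_correlation(0.3)` returns 85, and `power_for_correlation(0.3, 39)` is about 0.46, so a moderate correlation would be missed more often than not at N=39.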

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our abstract and evaluation. The comments correctly identify areas where causal language and statistical reporting can be clarified. We respond point-by-point below.

Point-by-point responses
  1. Referee: [Abstract] Abstract (results paragraph): the claims that extraverts preferred the emotion-aware chatbot more and that positive mood reports increased rest on the assumption that these outcomes are caused by the emotional-awareness/empathetic components, yet the study description supplies no non-emotion-aware control arm, counterbalanced conditions, or regression covariates for interaction length, prompt wording, or demand characteristics.

    Authors: The study was a single-condition longitudinal deployment; the extraversion finding compares preference ratings between high- and low-extraversion participants using the same emotion-aware chatbot. The mood-report claim compares positive-mood percentages across the sample but does not include a non-empathetic control arm. We agree that causal attribution to the empathetic features cannot be established without such a control. We will revise the abstract to remove any implication of causality, explicitly describe the single-arm design, and add a limitations paragraph discussing the absence of a control condition and potential confounds. revision: yes

  2. Referee: [Abstract] Abstract (evaluation paragraph): with only N=39 and no reported statistical methods, power analysis, or handling of personality-measurement reliability, the reported significance of the extraversion-preference difference cannot be evaluated for robustness or confounds.

    Authors: The full manuscript reports the statistical tests (Pearson correlation between extraversion scores and preference ratings, with exact p-value and effect size), the 10-item Big-Five Inventory used, and its established reliability coefficients. No a-priori power analysis was performed because the study was exploratory. We will expand the abstract's evaluation paragraph to include a one-sentence summary of the statistical approach and will add an explicit statement on sample-size limitations and lack of power analysis. revision: partial

Circularity Check

0 steps flagged

Empirical user study with observed data; no derivations, fitted predictions, or self-citation chains

full rationale

The paper reports a human-subject experiment (N=39) evaluating an emotion-aware chatbot for experience sampling. Central claims rest on direct participant preference ratings and mood-report percentages, with no mathematical equations, parameter fitting, or predictive models that could reduce to inputs by construction. No self-citations invoke uniqueness theorems or ansatzes; the work contains no derivation chain at all. Methodological concerns (e.g., control conditions) affect validity but do not constitute circularity under the defined patterns. The result is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical HCI study; no free parameters, axioms, or invented entities are introduced or required.

pith-pipeline@v0.9.0 · 5645 in / 985 out tokens · 30400 ms · 2026-05-24T17:41:41.924183+00:00 · methodology


Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

  1. [1]

    Environmental effects on cognitive and affective states: The experiential time sampling approach,

    S. Prescott and M. Csikszentmihalyi, “Environmental effects on cognitive and affective states: The experiential time sampling approach,” Social Behavior and Personality: an international journal, vol. 9, no. 1, pp. 23–32, 1981

  2. [2]

    Experience sampling: promises and pitfalls, strength and weaknesses,

    C. N. Scollon, C.-K. Prieto, and E. Diener, “Experience sampling: promises and pitfalls, strength and weaknesses,” in Assessing well-being. Springer, 2009, pp. 157–180

  3. [3]

    Validity and reliability of the experience-sampling method,

    M. Csikszentmihalyi and R. Larson, “Validity and reliability of the experience-sampling method,” in Flow and the foundations of positive psychology. Springer, 2014, pp. 35–54

  4. [4]

    Trends in ambulatory self-report: the role of momentary experience in psychosomatic medicine,

    T. S. Conner and L. F. Barrett, “Trends in ambulatory self-report: the role of momentary experience in psychosomatic medicine,” Psychosomatic medicine, vol. 74, no. 4, p. 327, 2012

  5. [5]

    “Kind and grateful”: a context-sensitive smartphone app utilizing inspirational content to promote gratitude,

    A. Ghandeharioun, A. Azaria, S. Taylor, and R. W. Picard, “‘Kind and grateful’: a context-sensitive smartphone app utilizing inspirational content to promote gratitude,” Psychology of well-being, vol. 6, no. 1, pp. 1–21, 2016

  6. [6]

    Use of in-game rewards to motivate daily self-report compliance: Randomized controlled trial,

    S. Taylor, C. Ferguson, F. Peng, M. Schoeneich, and R. W. Picard, “Use of in-game rewards to motivate daily self-report compliance: Randomized controlled trial,” Journal of medical Internet research, vol. 21, no. 1, p. e11683, 2019

  7. [7]

    Echoes from the past: how technology mediated reflection improves well-being,

    E. Isaacs, A. Konrad, A. Walendowski, T. Lennig, V. Hollis, and S. Whittaker, “Echoes from the past: how technology mediated reflection improves well-being,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2013, pp. 1071–1080

  8. [8]

    Toward an affect-sensitive autotutor,

    S. D’Mello, R. W. Picard, and A. Graesser, “Toward an affect-sensitive autotutor,” IEEE Intelligent Systems, vol. 22, no. 4, 2007

  9. [9]

    Simsensei kiosk: A virtual human interviewer for healthcare decision support,

    D. DeVault, R. Artstein, G. Benn, T. Dey, E. Fast, A. Gainer, K. Georgila, J. Gratch, A. Hartholt, M. Lhommet et al., “Simsensei kiosk: A virtual human interviewer for healthcare decision support,” in Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems. International Foundation for Autonomous Agents and Multiagen...

  10. [10]

    An affectively aware virtual therapist for depression counseling,

    L. Ring, T. Bickmore, and P. Pedrelli, “An affectively aware virtual therapist for depression counseling,” in ACM SIGCHI Conference on Human Factors in Computing Systems (CHI) workshop on Computing and Mental Health , 2016

  11. [11]

    Embedded empathy in continuous, interactive health assessment,

    K. Liu and R. W. Picard, “Embedded empathy in continuous, interactive health assessment,” in CHI Workshop on HCI Challenges in Health Assessment, vol. 1, no. 2. Citeseer, 2005, p. 3

  12. [12]

    Relational agents: a model and implementation of building user trust,

    T. Bickmore and J. Cassell, “Relational agents: a model and implementation of building user trust,” in Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 2001, pp. 396–403

  13. [13]

    Creating rapport with virtual agents,

    J. Gratch, N. Wang, J. Gerten, E. Fast, and R. Duffy, “Creating rapport with virtual agents,” in International Workshop on Intelligent Virtual Agents. Springer, 2007, pp. 125–138

  14. [14]

    It's only a computer: Virtual humans increase willingness to disclose,

    G. M. Lucas, J. Gratch, A. King, and L.-P. Morency, “It's only a computer: Virtual humans increase willingness to disclose,” Computers in Human Behavior, vol. 37, pp. 94–100, 2014

  15. [15]

    The problem of informant accuracy: The validity of retrospective data,

    H. R. Bernard, P. Killworth, D. Kronenfeld, and L. Sailer, “The problem of informant accuracy: The validity of retrospective data,” Annual review of anthropology, vol. 13, no. 1, pp. 495–517, 1984

  16. [16]

    Using the past to enhance the present: Boosting happiness through positive reminiscence,

    F. B. Bryant, C. M. Smart, and S. P. King, “Using the past to enhance the present: Boosting happiness through positive reminiscence,” Journal of Happiness Studies, vol. 6, no. 3, pp. 227–260, 2005

  17. [17]

    Studyportal api,

    K. Rowan, “Studyportal api,” http://studyservice.cloudapp.net/docs/, 2013, online, Retrieved August 14, 2017

  18. [18]

    Improving smartphone users’ affect and wellbeing with personalized positive psychology interventions,

    S. Jeong and C. L. Breazeal, “Improving smartphone users’ affect and wellbeing with personalized positive psychology interventions,” in Proceedings of the Fourth International Conference on Human Agent Interaction. ACM, 2016, pp. 131–137

  19. [19]

    A circumplex model of affect,

    J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161–1178, 1980

  20. [20]

    The structure of negative emotional states: Comparison of the depression anxiety stress scales (dass) with the beck depression and anxiety inventories,

    P. F. Lovibond and S. H. Lovibond, “The structure of negative emotional states: Comparison of the depression anxiety stress scales (dass) with the beck depression and anxiety inventories,” Behaviour research and therapy, vol. 33, no. 3, pp. 335–343, 1995

  21. [21]

    This computer responds to user frustration: Theory, design, and results,

    J. Klein, Y. Moon, and R. W. Picard, “This computer responds to user frustration: Theory, design, and results,” Interacting with computers, vol. 14, no. 2, pp. 119–140, 2002

  22. [22]

    Subtle expressivity by relational agents,

    T. Bickmore and R. Picard, “Subtle expressivity by relational agents,” in Proceedings of the CHI 2003 Workshop on Subtle Expressivity for Characters and Robots, 2003

  23. [23]

    The media equation: How people treat computers, television, and new media like real people and places

    B. Reeves and C. I. Nass, The media equation: How people treat computers, television, and new media like real people and places. Cambridge university press, 1996

  24. [24]

    Does computer-generated speech manifest personality? an experimental test of similarity-attraction,

    C. Nass and K. M. Lee, “Does computer-generated speech manifest personality? an experimental test of similarity-attraction,” in Proceedings of the SIGCHI conference on Human Factors in Computing Systems. ACM, 2000, pp. 329–336

  25. [25]

    The influence of users personality and gender on the processing of virtual agents multimodal behavior,

    S. Buisine and J.-C. Martin, “The influence of users personality and gender on the processing of virtual agents multimodal behavior,” Advances in Psychology Research, vol. 65, pp. 1–14, 2010

  26. [26]

    Personality structure: Emergence of the five-factor model,

    J. M. Digman, “Personality structure: Emergence of the five-factor model,” Annual review of psychology, vol. 41, no. 1, pp. 417–440, 1990

  27. [27]

    Development and validation of brief measures of positive and negative affect: the panas scales

    D. Watson, L. A. Clark, and A. Tellegen, “Development and validation of brief measures of positive and negative affect: the panas scales.” Journal of personality and social psychology, vol. 54, no. 6, p. 1063, 1988

  28. [28]

    Personality depends on the medium: differences in self-perception on snapchat, facebook and offline,

    L. Taber and S. Whittaker, “Personality depends on the medium: differences in self-perception on snapchat, facebook and offline,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018, p. 607

  29. [29]

    Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm,

    B. Felbo, A. Mislove, A. Søgaard, I. Rahwan, and S. Lehmann, “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm,” arXiv preprint arXiv:1708.00524, 2017

  30. [30]

    Toward controlled generation of text,

    Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov, and E. P. Xing, “Toward controlled generation of text,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017, pp. 1587–1596

  31. [31]

    Approximating interactive human evaluation with self-play for open-domain dialog systems,

    A. Ghandeharioun, J. Shen, N. Jaques, C. Ferguson, N. Jones, A. Lapedriza, and R. Picard, “Approximating interactive human evaluation with self-play for open-domain dialog systems,” arXiv preprint arXiv:, 2019