Towards Understanding Emotional Intelligence for Behavior Change Chatbots
Pith reviewed 2026-05-24 17:41 UTC · model grok-4.3
The pith
An emotion-aware chatbot for mood tracking is preferred more by extraverts than by introverts and is associated with a higher share of positive mood reports.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors built an emotion-aware chatbot that conducts experience sampling in an empathetic manner and evaluated it with N=39 participants over the course of a week. Their results show that extraverts preferred the emotion-aware chatbot significantly more than introverts, and that participants reported a higher percentage of positive mood reports when interacting with the empathetic bot. They conclude with guidelines for the design of emotion-aware chatbots for potential use in mHealth contexts.
What carries the argument
An emotion-aware chatbot that conducts experience sampling empathetically, detecting user emotions and responding to them.
If this is right
- Behavior change tools in mHealth can increase engagement by matching emotional responses to user personality traits such as extraversion.
- Empathetic sampling may shift the distribution of self-reported moods toward more positive entries during longitudinal tracking.
- Design guidelines from the study can be used to build other chatbots that combine experience sampling with emotional intelligence.
Where Pith is reading between the lines
- Personality-based tailoring of chatbot emotional style could extend to other health behaviors such as medication reminders or exercise prompts.
- The preference difference suggests that introverts might respond better to chatbots with lower emotional expressiveness.
- Longer-term studies could test whether the mood-report shift leads to measurable changes in actual behavior or symptom tracking accuracy.
Load-bearing premise
The differences in preference and mood reports are caused by the emotional awareness features rather than other aspects of the chatbot design or the sampling task.
What would settle it
A controlled comparison that keeps all other chatbot elements identical but removes emotional responses, then finds no difference in extravert preference or positive mood reports.
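A minimal sketch of how such a comparison might be analyzed, assuming a between-subjects design in which each participant contributes one number: the fraction of their mood reports that were positive. The function name `permutation_test`, the two-arm layout, and all data values are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    group_a, group_b: per-participant fractions of positive mood reports
    (hypothetical layout; one value per participant).
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Reassign arm labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up values for illustration only; not data from the study.
    empathetic = [0.70, 0.65, 0.80, 0.55, 0.75]
    neutral = [0.60, 0.50, 0.70, 0.65, 0.55]
    diff, p = permutation_test(empathetic, neutral, n_perm=2000)
    print(f"difference = {diff:.3f}, permutation p = {p:.3f}")
```

Permuting at the participant level rather than the report level respects the fact that repeated mood reports from one person are not independent. A large p-value from a small sample would not by itself show that there is no difference.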
Original abstract
A natural conversational interface that allows longitudinal symptom tracking would be extremely valuable in health/wellness applications. However, the task of designing emotionally-aware agents for behavior change is still poorly understood. In this paper, we present the design and evaluation of an emotion-aware chatbot that conducts experience sampling in an empathetic manner. We evaluate it through a human-subject experiment with N=39 participants over the course of a week. Our results show that extraverts preferred the emotion-aware chatbot significantly more than introverts. Also, participants reported a higher percentage of positive mood reports when interacting with the empathetic bot. Finally, we provide guidelines for the design of emotion-aware chatbots for potential use in mHealth contexts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the design of an emotion-aware chatbot for empathetic experience sampling in behavior-change contexts and reports results from a one-week human-subject study with N=39 participants. It claims that extraverts preferred the emotion-aware chatbot significantly more than introverts and that participants gave a higher percentage of positive mood reports when interacting with the empathetic version; the work concludes with design guidelines for mHealth applications.
Significance. If the causal attribution to emotional-awareness features can be established, the directional findings on personality-moderated preference and mood reporting would offer useful empirical grounding for chatbot design in wellness applications. The N=39 sample size and longitudinal element provide a modest but concrete data point; however, the absence of controls leaves the central claims only moderately supported.
Major comments (2)
- [Abstract] Abstract (results paragraph): the claims that extraverts preferred the emotion-aware chatbot more and that positive mood reports increased rest on the assumption that these outcomes are caused by the emotional-awareness/empathetic components, yet the study description supplies no non-emotion-aware control arm, counterbalanced conditions, or regression covariates for interaction length, prompt wording, or demand characteristics.
- [Abstract] Abstract (evaluation paragraph): with only N=39 and no reported statistical methods, power analysis, or handling of personality-measurement reliability, the reported significance of the extraversion-preference difference cannot be evaluated for robustness or confounds.
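To make the sample-size concern concrete, the sketch below approximates statistical power for a correlation test using the Fisher z-transformation. It assumes the relevant test is a two-sided test of a Pearson correlation at alpha = 0.05; the function names `correlation_power` and `min_detectable_r` are illustrative, and nothing here reproduces an analysis from the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a true correlation r
    with n participants, via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    # Under the approximation, atanh(sample r) is normal with
    # standard error 1 / sqrt(n - 3).
    shift = abs(math.atanh(r)) * math.sqrt(n - 3)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def min_detectable_r(n, power=0.8, alpha=0.05):
    """Smallest |r| detectable at the requested power (same approximation)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z_total = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)
    return math.tanh(z_total / math.sqrt(n - 3))


if __name__ == "__main__":
    n = 39  # sample size reported in the abstract
    print(f"min detectable |r| at 80% power: {min_detectable_r(n):.2f}")
    print(f"power to detect r = 0.30: {correlation_power(0.30, n):.2f}")
```

Under this approximation, with n = 39 the smallest correlation detectable at 80% power comes out near 0.44, so only fairly strong extraversion-preference associations would be reliably detected.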
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our abstract and evaluation. The comments correctly identify areas where causal language and statistical reporting can be clarified. We respond point-by-point below.
- Referee: [Abstract] Abstract (results paragraph): the claims that extraverts preferred the emotion-aware chatbot more and that positive mood reports increased rest on the assumption that these outcomes are caused by the emotional-awareness/empathetic components, yet the study description supplies no non-emotion-aware control arm, counterbalanced conditions, or regression covariates for interaction length, prompt wording, or demand characteristics.
  Authors: The study was a single-condition longitudinal deployment; the extraversion finding compares preference ratings between high- and low-extraversion participants using the same emotion-aware chatbot. The mood-report claim compares positive-mood percentages across the sample but does not include a non-empathetic control arm. We agree that causal attribution to the empathetic features cannot be established without such a control. We will revise the abstract to remove any implication of causality, explicitly describe the single-arm design, and add a limitations paragraph discussing the absence of a control condition and potential confounds.
  Revision: yes
- Referee: [Abstract] Abstract (evaluation paragraph): with only N=39 and no reported statistical methods, power analysis, or handling of personality-measurement reliability, the reported significance of the extraversion-preference difference cannot be evaluated for robustness or confounds.
  Authors: The full manuscript reports the statistical tests (Pearson correlation between extraversion scores and preference ratings, with exact p-value and effect size), the 10-item Big-Five Inventory used, and its established reliability coefficients. No a-priori power analysis was performed because the study was exploratory. We will expand the abstract's evaluation paragraph to include a one-sentence summary of the statistical approach and will add an explicit statement on sample-size limitations and lack of power analysis.
  Revision: partial
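The rebuttal names a Pearson correlation between extraversion scores and preference ratings. The sketch below shows one way such a statistic and an approximate confidence interval could be computed. The helper names `pearson_r` and `fisher_ci`, the rating scales, and every data value are hypothetical; the Fisher-z interval is a standard approximation chosen for illustration and is not claimed to be the authors' method.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation (Fisher z).

    Requires |r| < 1 and n > 3.
    """
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the study.
    extraversion = [2.0, 3.5, 4.0, 1.5, 4.5, 3.0, 2.5, 5.0]
    preference = [3, 4, 5, 2, 5, 3, 3, 4]
    r = pearson_r(extraversion, preference)
    low, high = fisher_ci(r, len(extraversion))
    print(f"r = {r:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Reporting the interval alongside the p-value shows how wide the plausible range of the effect remains at a small sample size, which speaks directly to the referee's robustness concern.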
Circularity Check
Empirical user study with observed data; no derivations, fitted predictions, or self-citation chains
The paper reports a human-subject experiment (N=39) evaluating an emotion-aware chatbot for experience sampling. Central claims rest on direct participant preference ratings and mood-report percentages, with no mathematical equations, parameter fitting, or predictive models that could reduce to inputs by construction. No self-citations invoke uniqueness theorems or ansatzes; the work contains no derivation chain at all. Methodological concerns (e.g., control conditions) affect validity but do not constitute circularity under the defined patterns. The result is self-contained against external benchmarks.