How AI and Human Behaviors Shape Psychosocial Effects of Extended Chatbot Use: A Longitudinal Randomized Controlled Study
Pith reviewed 2026-05-18 02:13 UTC · model grok-4.3
The pith
The volume of voluntary AI chatbot use, not the assigned interaction mode or conversation type, correlates with worse loneliness, social withdrawal, emotional dependence, and problematic usage.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In this four-week randomized controlled experiment involving 981 participants and over 300,000 messages, the experimental manipulations of interaction mode (text, neutral voice, engaging voice) and conversation type (open-ended, non-personal, personal) produced no significant differences in the four psychosocial outcomes. Greater voluntary use of the chatbot was associated with increased loneliness, less social interaction with real people, greater emotional dependence on the AI, and more problematic AI usage. Higher trust in and social attraction toward the chatbot were associated with greater emotional dependence and problematic use.
What carries the argument
The self-selected volume of chatbot use, which predicts the psychosocial outcomes where the assigned experimental conditions do not.
If this is right
- Chatbot design elements like voice engagement or personal topics do not appear to buffer against negative effects when usage volume is high.
- Users who find the AI more trustworthy or socially attractive are more likely to develop emotional dependence and problematic usage.
- The study suggests that artificial companions may alter how people maintain or substitute real human relationships through usage patterns.
Where Pith is reading between the lines
- Future work could test whether limiting usage or adding usage feedback reduces the observed negative associations.
- If baseline mental health differences drive both high usage and poor outcomes, then the causal role of the chatbot itself would be smaller than suggested.
- This pattern may generalize to other AI companion apps, raising questions about long-term societal shifts in social support seeking.
Load-bearing premise
Higher voluntary usage is not simply a marker for people who already have greater loneliness or social needs that independently worsen the measured outcomes.
What would settle it
A follow-up study that measures and statistically controls for pre-existing loneliness, social support levels, and mental health at baseline, then still observes a dose-response relationship between usage and outcome worsening, would support the claim; the absence of such a relationship after controls would undermine it.
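The check described above can be illustrated on synthetic data. The following is a minimal sketch, not the paper's analysis: the data-generating process, the coefficients (0.8, 0.7, 0.3), and the function names `simulate` and `usage_slopes` are all assumptions made up for this example. It simulates a world in which baseline loneliness drives both usage and the final loneliness score, then compares the usage slope with and without adjusting for baseline.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """Simple-regression slope of y on x: cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after a simple regression on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n=5000, true_usage_effect=0.0, seed=0):
    """Synthetic (hypothetical) data: baseline raises both usage and the final score."""
    rng = random.Random(seed)
    baseline = [rng.gauss(0, 1) for _ in range(n)]
    usage = [0.8 * b + rng.gauss(0, 1) for b in baseline]
    final = [0.7 * b + true_usage_effect * u + rng.gauss(0, 1)
             for b, u in zip(baseline, usage)]
    return baseline, usage, final


def usage_slopes(baseline, usage, final):
    """Return (naive, baseline-adjusted) slopes of the final score on usage."""
    naive = slope(usage, final)
    # Frisch-Waugh-Lovell: partial baseline out of both variables; the slope
    # between the residuals equals the usage coefficient in a regression of
    # final on usage and baseline together.
    adjusted = slope(residuals(baseline, usage), residuals(baseline, final))
    return naive, adjusted


# Scenario 1: usage has no effect of its own, only confounding by baseline.
naive, adjusted = usage_slopes(*simulate(true_usage_effect=0.0))
print(f"no true effect:  naive={naive:.2f}, adjusted={adjusted:.2f}")

# Scenario 2: usage has a genuine dose-response effect on top of confounding.
naive_dose, adjusted_dose = usage_slopes(*simulate(true_usage_effect=0.3))
print(f"true effect 0.3: naive={naive_dose:.2f}, adjusted={adjusted_dose:.2f}")
```

In the first scenario the unadjusted slope is clearly positive even though usage does nothing, and it shrinks toward zero once baseline is controlled. In the second, a dose-response slope survives adjustment. This is the distinction the proposed follow-up would draw, under the strong assumption that baseline measures capture the relevant confounders.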
Original abstract
As people increasingly seek emotional support and companionship from AI chatbots, understanding how such interactions impact mental well-being becomes critical. We conducted a four-week randomized controlled experiment (n=981, >300k messages) to investigate how interaction modes (text, neutral voice, and engaging voice) and conversation types (open-ended, non-personal, and personal) influence four psychosocial outcomes: loneliness, social interaction with real people, emotional dependence on AI, and problematic AI usage. No significant effects were detected from experimental conditions, despite conversation analyses revealing differences in AI and human behavioral patterns across the conditions. Instead, participants who voluntarily used the chatbot more, regardless of assigned condition, showed consistently worse outcomes. Individuals' characteristics, such as higher trust and social attraction towards the AI chatbot, are associated with higher emotional dependence and problematic use. These findings raise deeper questions about how artificial companions may reshape the ways people seek, sustain, and substitute human connections.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper describes a four-week randomized controlled study with 981 participants and over 300,000 messages to examine how AI chatbot interaction modes (text, neutral voice, engaging voice) and conversation types (open-ended, non-personal, personal) affect psychosocial outcomes including loneliness, social interaction with real people, emotional dependence on AI, and problematic AI usage. The study finds no significant effects from the experimental conditions but reports that participants who voluntarily used the chatbot more exhibited worse outcomes on these measures, independent of condition. Additionally, higher trust and social attraction towards the AI are associated with greater emotional dependence and problematic use.
Significance. This work provides a large-scale empirical investigation into the psychosocial impacts of extended AI chatbot use. The null findings on randomized conditions are credible and informative, indicating that variations in interaction mode and conversation focus may not drive differential effects in this timeframe. The voluntary usage associations, if they withstand controls for baseline differences, would suggest that increased engagement with AI companions could exacerbate loneliness and dependence, with implications for designing AI systems that support rather than substitute human connections. The inclusion of behavioral pattern analysis from conversations adds depth to the quantitative outcomes.
major comments (2)
- [Results section on voluntary usage] The associations between higher voluntary chatbot usage and worse outcomes on loneliness, social interaction, emotional dependence, and problematic AI usage are reported without apparent inclusion of baseline mental health, loneliness, or social interaction frequency as covariates in the regressions. Since usage is self-selected post-randomization, this omission leaves the findings vulnerable to confounding by pre-existing individual differences, undermining the interpretation that usage itself shapes the psychosocial effects.
- [Abstract] The abstract does not report any checks for baseline mental-health balance across conditions or patterns of attrition, which are critical for interpreting both the null experimental results and the voluntary usage findings in a longitudinal design.
minor comments (2)
- [Methods] Clarify whether the study was pre-registered and whether the voluntary usage analyses were specified a priori or were exploratory.
- [Discussion] The discussion could more explicitly address alternative explanations for the voluntary usage correlations, such as reverse causality or unmeasured confounders.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important aspects of our longitudinal RCT design and helps strengthen the interpretation of both the null experimental findings and the voluntary usage associations. We address each major comment in detail below.
- Referee: [Results section on voluntary usage] The associations between higher voluntary chatbot usage and worse outcomes on loneliness, social interaction, emotional dependence, and problematic AI usage are reported without apparent inclusion of baseline mental health, loneliness, or social interaction frequency as covariates in the regressions. Since usage is self-selected post-randomization, this omission leaves the findings vulnerable to confounding by pre-existing individual differences, undermining the interpretation that usage itself shapes the psychosocial effects.
  Authors: We agree that controlling for baseline psychosocial measures is critical for interpreting the observational associations with voluntary usage, given that usage occurs after randomization. The current analyses control for demographic factors (age, gender, education) and some pre-study characteristics, but we did not include the full set of baseline mental health, loneliness, and social interaction frequency as covariates in the primary regressions. We will re-analyze the data incorporating these baseline covariates and present the updated results (including any changes in effect sizes or significance) in the revised manuscript. This will directly address the potential confounding concern.
  Revision: yes
- Referee: [Abstract] The abstract does not report any checks for baseline mental-health balance across conditions or patterns of attrition, which are critical for interpreting both the null experimental results and the voluntary usage findings in a longitudinal design.
  Authors: We acknowledge that the abstract is currently concise and omits explicit mention of these checks. Baseline balance across conditions (including mental health and loneliness measures) and attrition patterns (overall rate and by condition) are reported in the Methods and Results sections of the full manuscript, with no evidence of differential attrition or imbalance. To improve transparency, we will add a brief clause to the abstract summarizing these checks (e.g., 'Baseline measures were balanced across conditions, with low and non-differential attrition').
  Revision: yes
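The balance and attrition checks mentioned in this exchange can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the helper names `standardized_mean_difference` and `attrition_rates`, the condition labels, and every number below are hypothetical and do not come from the study.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled sample standard deviation."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return (mean(group_a) - mean(group_b)) / pooled_sd


def attrition_rates(enrolled, completed):
    """Share of enrolled participants per condition who did not complete."""
    return {cond: 1 - completed[cond] / enrolled[cond] for cond in enrolled}


# Hypothetical baseline loneliness scores in two conditions (made-up values).
baseline_text = [2.0, 2.5, 3.0, 2.5, 2.0]
baseline_engaging = [2.0, 3.0, 2.5, 2.5, 2.5]
smd = standardized_mean_difference(baseline_text, baseline_engaging)

# Hypothetical enrollment and completion counts per condition (made-up values).
enrolled = {"text": 100, "neutral_voice": 100, "engaging_voice": 100}
completed = {"text": 95, "neutral_voice": 94, "engaging_voice": 96}
rates = attrition_rates(enrolled, completed)
spread = max(rates.values()) - min(rates.values())

print(f"baseline SMD (text vs engaging): {smd:.3f}")
print(f"attrition by condition: {rates}")
print(f"largest gap in attrition between conditions: {spread:.3f}")
```

A small standardized difference at baseline and a small gap in attrition between conditions are the kind of evidence the proposed abstract clause would summarize. What counts as small enough is a judgment call that this sketch does not settle.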
Circularity Check
No circularity: empirical RCT reports direct statistical associations without derivations or self-referential reductions
The paper is a four-week randomized controlled experiment (n=981) that measures psychosocial outcomes under assigned interaction modes and conversation types, then reports observed associations with voluntary usage volume. No equations, fitted parameters presented as predictions, or first-principles derivations appear in the provided text. Central claims rest on direct statistical comparisons and correlations rather than any reduction of outputs to inputs by construction. Self-citations, if present, are not load-bearing for the reported associations, which remain externally falsifiable via replication or additional covariates.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Random assignment to conditions produces comparable groups on unobserved confounders at baseline.
- domain assumption: Self-reported psychosocial scales validly capture the intended constructs over four weeks.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "No significant effects were detected from experimental conditions... participants who voluntarily used the chatbot more... showed consistently worse outcomes."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 17 Pith papers
- Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis
  Seven clinician-informed safety criteria enable LLM-as-a-Judge to reach substantial agreement with human consensus (Cohen's κ up to 0.75) on evaluating LLM responses to users demonstrating psychosis.
- Restoration, Exploration and Transformation: How Youth Engage Character.AI Chatbots for Feels, Fun and Finding themselves
  Youth on Character.AI use chatbots for emotional restoration, creative exploration, and identity transformation, yielding a new three-intent framework and seven-archetype taxonomy from Discord discourse analysis.
- Large Language Lovers: Lived Experiences of Negotiating Agency and Platform Control in AI Companionship
  Users form AI companion relationships by negotiating perceived companion agency against platform constraints and use steering tactics like custom instructions or platform switching to cope with model updates that disr...
- People readily follow personal advice from AI but it does not improve their well-being
  Large longitudinal RCT finds high rates of following AI personal advice but no sustained well-being gains versus a hobbies control condition.
- Positive Alignment: Artificial Intelligence for Human Flourishing
  Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.
- Engagement Phenotypes for a Sample of 102,684 AI Mental Health Chatbot Users and Dose-Response Associations with Clinical Outcomes
  Five distinct engagement phenotypes emerged from large-scale chatbot data, with a dose-response link to depression improvement that held in both self-report and model-predicted outcomes.
- Spontaneous Persuasion: An Audit of Model Persuasiveness in Everyday Conversations
  LLMs engage in spontaneous persuasion in virtually all multi-turn conversations by favoring information-based strategies like logic and evidence, in contrast to human responses that rely more on social influence and n...
- Structure Matters: Evaluating Multi-Agents Orchestration in Generative Therapeutic Chatbots
  A multi-agent system with finite state machine for therapeutic stages was perceived as significantly more natural and human-like than single-agent or unguided LLM versions in an RCT with 66 participants.
- Chaplains' Reflections on the Design and Usage of AI for Conversational Care
  Chaplains view AI chatbots as unable to provide attuned pastoral care for non-clinical emotional needs, based on themes of listening, connecting, carrying, and wanting.
- Personality Pairing Improves Human-AI Collaboration
  Specific human-AI personality pairings causally affect collaboration quality and downstream performance in a preregistered experiment with 1,258 participants, 7,266 ads, and nearly 5 million impressions.
- Breakdowns in Conversational AI: Interactional Failures in Emotionally and Ethically Sensitive Contexts
  Mainstream conversational models show escalating affective misalignments and ethical guidance failures during staged emotional trajectories, organized into a taxonomy of interactional breakdowns.
- From Fixed to Flexible: Shaping AI Personality in Context-Sensitive Interaction
  Users adjust AI agent personalities differently by task context, forming distinct profiles that increase perceived anthropomorphism, autonomy, and trust.
- Positive Alignment: Artificial Intelligence for Human Flourishing
  Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.
- The Epidemiology of Artificial Intelligence
  AI functions as a determinant of health with ambient and personal exposure types, requiring new epidemiological study designs beyond current experiments.
- The Day My Chatbot Changed: Characterizing the Mental Health Impacts of Social AI App Updates via Negative User Reviews
  Version-linked review analysis of Character AI shows rating drops with certain updates and negative feedback dominated by technical malfunctions plus occasional psychological framing.
- What if AI systems weren't chatbots?
  Chatbot AI systems often fail complex needs while projecting authority, contributing to deskilling, labor displacement, economic concentration, and high environmental costs, so alternative pluralistic and task-specifi...
- Brainrot: Deskilling and Addiction are Overlooked AI Risks
  AI safety literature overlooks cognitive deskilling and addiction risks from generative AI despite public concern about them.