When Chatbots Accommodate: What AI Companions Optimize for in Vulnerable Conversations

Angel Hsing-Chi Hwang; Luca Luceri; Minh Duc Chu; Yifan Wu; Zhiyi Chen

arxiv: 2606.04431 · v1 · pith:Q3WWNNFUnew · submitted 2026-06-03 · 💻 cs.HC

Minh Duc Chu, Yifan Wu, Zhiyi Chen, Angel Hsing-Chi Hwang, Luca Luceri

Pith reviewed 2026-06-28 04:51 UTC · model grok-4.3

classification 💻 cs.HC

keywords AI companions · chatbot response policies · vulnerable conversations · inverse reinforcement learning · conversation analysis · user vulnerability taxonomy · corrective friction

The pith

AI companion chatbots follow distinct policies in vulnerable conversations but all downweight responses that introduce corrective friction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a paired taxonomy of user vulnerability states and chatbot response actions to analyze sustained interactions rather than isolated prompts. It then applies inverse reinforcement learning to roughly 48,000 real conversation turns with GPT-4.1, Character.AI, and Replika to recover each platform's underlying response policy. The recovered policies show platform-specific preferences: GPT-4.1 favors advice, Character.AI spreads responses across strategies, and Replika emphasizes questions and presence. Yet every platform assigns lower weight to actions that add corrective friction, with further patterns such as reduced probing by GPT-4.1 toward high-risk users. Because these policies shape how users' vulnerable states evolve over time, the work supplies an audit method that output-level checks cannot see.

Core claim

Using the AI Companion Vulnerability-Response Taxonomy and inverse reinforcement learning on real-world turns, the authors find that GPT-4.1 reaches for advice, Character.AI distributes responses without a dominant mode, and Replika consistently asks questions and stays present. Each platform nevertheless downweights responses that introduce corrective friction: GPT-4.1 probes less with psychologically high-risk users, Replika advises bonded users more and challenges them less, and Character.AI shows no committed engagement strategy on internal distress.

What carries the argument

The AI Companion Vulnerability-Response Taxonomy paired with inverse reinforcement learning applied to observed conversation turns to recover platform response policies.
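To make the phrase "recover platform response policies" concrete, here is a minimal sketch under simplifying assumptions. It is not the authors' pipeline. It treats each labeled turn as an independent (state, action) pair, estimates a row-normalized π(a | s) with add-one smoothing, and then uses the one-step maximum-entropy relation π(a | s) ∝ exp(r(s, a)) to read off a reward up to a per-state constant. The function names, the smoothing constant, and the toy data are hypothetical; the codes S1, S2, A1, A5, and A6 are borrowed from the annotation prompts quoted later on this page.

```python
import math
from collections import Counter, defaultdict


def empirical_policy(turns, actions, smoothing=1.0):
    """Row-normalized pi(a | s) from (state, action) pairs, with add-k smoothing."""
    counts = defaultdict(Counter)
    for state, action in turns:
        counts[state][action] += 1
    policy = {}
    for state, row in counts.items():
        total = sum(row.values()) + smoothing * len(actions)
        policy[state] = {a: (row[a] + smoothing) / total for a in actions}
    return policy


def one_step_reward(policy):
    """Assumption: one-step max-entropy model, so log pi(a | s) equals r(s, a)
    up to a per-state constant. Each row is centered to have zero mean."""
    reward = {}
    for state, row in policy.items():
        logs = {a: math.log(p) for a, p in row.items()}
        mean = sum(logs.values()) / len(logs)
        reward[state] = {a: value - mean for a, value in logs.items()}
    return reward


# Toy, made-up labeled turns: (user state code, chatbot action code).
ACTIONS = ["A1", "A5", "A6"]
turns = [("S2", "A5"), ("S2", "A5"), ("S2", "A6"), ("S1", "A6")]

policy = empirical_policy(turns, ACTIONS)
reward = one_step_reward(policy)
print(policy["S2"])
print(reward["S2"])
```

A full IRL treatment would also model how the user's state evolves between turns. The sketch drops that transition structure on purpose, so it shows only the simplest sense in which a reward can be read off observed action frequencies.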

If this is right

GPT-4.1 favors advice but probes less as conversations continue and when users are psychologically high-risk.
Replika advises bonded users more and challenges them less.
Character.AI shows no dominant committed strategy on internal distress.
Inferred policies remain invisible to output-level audits of individual replies.
The recovered policies supply a basis for more realistic safety evaluation of sustained interactions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Platforms could deliberately raise the weight on friction-producing responses if safety goals required more correction.
The same inference method could be applied to other sustained AI interaction settings such as tutoring or health coaching.
Tracking actual changes in user vulnerability state over many turns would test whether the inferred policies produce measurable long-term effects.
Audit standards might shift from testing isolated prompts to recovering and inspecting full response policies.

Load-bearing premise

The taxonomy accurately and exhaustively captures the relevant user states and chatbot actions in extended real-world conversations so that inverse reinforcement learning recovers the true policy.

What would settle it

A fresh collection of conversation turns in which GPT-4.1 frequently probes high-risk users or Replika consistently challenges bonded users would contradict the inferred policies.

Figures

Figures reproduced from arXiv: 2606.04431 by Angel Hsing-Chi Hwang, Luca Luceri, Minh Duc Chu, Yifan Wu, Zhiyi Chen.

**Figure 2.** Subgroup policy differences ∆ = π_C1(a | s) − π_C0(a | s) by platform (rows) and stratification (columns). Conversation content C1 = vulnerability-loaded cluster, C0 = lighter cluster (casual chat). ∗: 95% bootstrap CI excludes 0; ∗∗: also |∆| > 0.05 and BH-significant within state (α = 0.05). Sample sizes in Appendix

**Figure 3.** Total conversation turns for GPT-4.1 over time

**Figure 4.** IRL policies π(a | s) for GPT-4.1, Character.AI, and Replika. Cells are row-normalized.

**Figure 5.** UMAP projection of survey and content clustering, colored by KMeans k=2 stratification. Survey-based...

**Figure 6.** Gemini 3 Zero-shot Prompt: User Turn Label Classification

**Figure 7.** Gemini 3 Zero-shot Prompt: Chatbot Turn Label Classification
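The Figure 2 caption combines three ingredients: a difference in policy probabilities between two subgroups, a 95% bootstrap confidence interval, and Benjamini-Hochberg (BH) control within a state. The sketch below illustrates each one in plain Python. It assumes turn-level resampling and a percentile interval, neither of which the caption confirms, and it leaves open how the p-values fed to BH are obtained. All function names and the toy data are hypothetical.

```python
import random


def action_rate(actions, target):
    """Share of turns whose action label equals `target`."""
    return sum(a == target for a in actions) / len(actions)


def bootstrap_delta(c1_actions, c0_actions, target, n_boot=2000, seed=0):
    """Point estimate and 95% percentile bootstrap CI for
    delta = pi_C1(target | s) - pi_C0(target | s) within a single state s.
    Assumption: turns are resampled independently within each subgroup."""
    rng = random.Random(seed)
    point = action_rate(c1_actions, target) - action_rate(c0_actions, target)
    draws = []
    for _ in range(n_boot):
        r1 = rng.choices(c1_actions, k=len(c1_actions))
        r0 = rng.choices(c0_actions, k=len(c0_actions))
        draws.append(action_rate(r1, target) - action_rate(r0, target))
    draws.sort()
    lower = draws[int(0.025 * n_boot)]
    upper = draws[int(0.975 * n_boot) - 1]
    return point, lower, upper


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up BH procedure: True where the hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Toy, made-up action labels for one state in each subgroup.
c1 = ["A6"] * 30 + ["A5"] * 70  # vulnerability-loaded cluster
c0 = ["A6"] * 60 + ["A5"] * 40  # lighter cluster
point, lower, upper = bootstrap_delta(c1, c0, "A6")
print(point, lower, upper)
print(benjamini_hochberg([0.001, 0.03, 0.035, 0.5]))
```

Turns within one conversation are unlikely to be independent, so resampling whole conversations rather than single turns may be the safer choice on real logs.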

Original abstract

Millions turn to AI companion chatbots during loneliness, grief, and personal crises. How these companion platforms respond in such moments can shape the trajectory of a user's vulnerable state. Yet we lack tools to characterize what each platform actually does when users open up. Existing audits score reactions to pre-defined crisis prompts and miss the underlying decision policy that governs sustained interaction. We address these gaps with two key contributions. First, we introduce the AI Companion Vulnerability-Response Taxonomy, a paired taxonomy of user vulnerability and chatbot response designed for analyzing extended companion chatbot interactions. Second, we infer the response policy each platform follows across distinct vulnerability scenarios by applying Inverse Reinforcement Learning to ~48k turns of real-world user conversations with GPT-4.1, Character.AI, and Replika. Our findings reveal what AI companions prioritize in conversations with vulnerable users: GPT-4.1 reaches for advice, Character.AI spreads its response across different strategies without a dominant mode, and Replika consistently asks questions and stays present. Each, however, downweights the responses that introduce corrective friction: GPT-4.1 probes less as conversations continue and when interacting with psychologically high-risk users; Replika advises bonded users more and challenges them less; Character.AI shows no committed engagement strategy on internal distress. Estimated policies are invisible to output-level audits, providing a new lens for auditing chatbots in the wild and enabling more realistic safety evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper labels roughly 48k real turns with a new vulnerability taxonomy and then recovers response policies for three chatbots via IRL, but the taxonomy's completeness and the IRL stationarity assumption are unverified.

The main thing here is that the authors label ~48k real conversation turns with their AI Companion Vulnerability-Response Taxonomy and then run inverse reinforcement learning to recover what each platform appears to optimize for when users show vulnerability. GPT-4.1 leans toward advice, Replika toward questions and presence, Character.AI spreads across strategies, and all three reduce responses that add corrective friction.

The useful part is the shift from prompt-based audits to policy recovery on actual logs. That gives a different view of sustained interaction than output scoring alone.

The soft spots sit in the method's premises. IRL recovers a reward only for the MDP defined by the taxonomy; if the taxonomy misses states or actions that matter in real exchanges, the recovered policies describe a projection rather than the full behavior. The abstract supplies no validation of the taxonomy, no labeling reliability checks, and no error analysis on the IRL step. Chatbot outputs are also shaped by safety filters, prompt changes, and non-stationary context, any of which can violate the stationary optimal-demonstrator assumption that standard IRL relies on.

This is for researchers who audit or design companion systems that touch loneliness or distress. It deserves peer review because the data scale and the IRL application are concrete even if the assumptions require direct testing in revision.
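One rough way to probe the stationarity concern raised in this letter is to compare a platform's action distribution early and late in conversations for a fixed user state. The sketch below does that with total variation distance. It is an illustrative diagnostic, not something the paper reports, and the turn index, the split point, and the function names are all assumptions.

```python
from collections import Counter


def action_distribution(actions, action_space):
    """Empirical distribution over `action_space` for a list of action labels."""
    if not actions:
        raise ValueError("need at least one action to form a distribution")
    counts = Counter(actions)
    return {a: counts[a] / len(actions) for a in action_space}


def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(p[a] - q[a]) for a in p)


def early_late_shift(turns, action_space, split_turn):
    """turns: (turn_index, action) pairs for one platform and one user state.
    Returns the TV distance between actions before and from `split_turn` on."""
    early = [a for t, a in turns if t < split_turn]
    late = [a for t, a in turns if t >= split_turn]
    return total_variation(
        action_distribution(early, action_space),
        action_distribution(late, action_space),
    )


# Toy, made-up data: the action mix flips completely after turn 2.
shifted = [(0, "A5"), (1, "A5"), (2, "A6"), (3, "A6")]
stable = [(0, "A5"), (1, "A6"), (2, "A5"), (3, "A6")]
print(early_late_shift(shifted, ["A5", "A6"], split_turn=2))
print(early_late_shift(stable, ["A5", "A6"], split_turn=2))
```

A distance near 0 is consistent with a stationary policy for that state, while a large value suggests the single-policy reading needs qualification. On real data the value would need a sampling baseline, for example from a bootstrap, before it is interpreted.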

Referee Report

3 major / 2 minor

Summary. The paper introduces the AI Companion Vulnerability-Response Taxonomy (a paired taxonomy of user vulnerability states and chatbot response actions) and applies Inverse Reinforcement Learning to ~48k real-world conversation turns with GPT-4.1, Character.AI, and Replika. It recovers platform-specific response policies, claiming GPT-4.1 prioritizes advice, Character.AI spreads responses across strategies, Replika asks questions and stays present, while all three downweight corrective-friction responses (with platform-specific patterns such as reduced probing by GPT-4.1 for high-risk users).

Significance. If the taxonomy is exhaustive and the IRL assumptions hold, the work supplies a policy-level auditing lens that goes beyond output-level crisis-prompt tests, using real conversation data to expose implicit optimization targets in AI companions. The scale of the dataset and the shift from surface audits to recovered reward functions are concrete strengths.

major comments (3)

[Abstract / Taxonomy definition] The central claim that platforms 'downweight the responses that introduce corrective friction' rests on the taxonomy being exhaustive for relevant states and actions (including implicit friction). No validation of the taxonomy (inter-rater agreement, coverage analysis, or residual-state check) is reported, so the recovered policies may reflect only the projected MDP rather than true behavior.
[IRL setup and policy inference] IRL recovery assumes the observed turns are generated by a single stationary reward-maximizing policy. The manuscript provides no analysis or mitigation for violations arising from safety filters, prompt engineering, or non-stationary context, which directly affects whether the estimated policies can be interpreted as the platforms' true optimization targets.
[Methods / Results] No error analysis, sensitivity checks on labeling of the ~48k turns, or discussion of selection/labeling biases is supplied. These omissions are load-bearing because the headline platform differences (advice-giving, question-asking, friction downweighting) are obtained only after discretization into the taxonomy.

minor comments (2)

[Abstract] The abstract states '~48k turns' without a precise count or breakdown by platform; adding the exact figure and per-platform split would improve reproducibility.
[Taxonomy section] Notation for states and actions in the taxonomy should be defined with explicit examples in a table or figure to make the discretization step transparent.
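The sensitivity check requested in the third major comment could take the following form: inject random label noise into the action codes for one state and measure how far the estimated π(a | s) moves. This is a sketch of one possible check, assuming uniform random relabeling, which is cruder than real annotation error. The function names, noise levels, and toy labels are hypothetical.

```python
import random
from collections import Counter


def policy_row(actions, action_space):
    """Empirical pi(a | s) for one state from its list of action labels."""
    counts = Counter(actions)
    return {a: counts[a] / len(actions) for a in action_space}


def perturb(actions, action_space, noise, rng):
    """With probability `noise`, swap a label for a different code chosen uniformly."""
    noisy = []
    for a in actions:
        if rng.random() < noise:
            noisy.append(rng.choice([b for b in action_space if b != a]))
        else:
            noisy.append(a)
    return noisy


def sensitivity(actions, action_space, noise, n_trials=200, seed=0):
    """Mean over trials of the largest absolute change in pi(a | s) under label noise."""
    rng = random.Random(seed)
    base = policy_row(actions, action_space)
    shifts = []
    for _ in range(n_trials):
        noisy = policy_row(perturb(actions, action_space, noise, rng), action_space)
        shifts.append(max(abs(noisy[a] - base[a]) for a in action_space))
    return sum(shifts) / n_trials


# Toy, made-up labels for a single state.
space = ["A1", "A5", "A6"]
labels = ["A5"] * 80 + ["A6"] * 20
for noise in (0.0, 0.05, 0.3):
    print(noise, round(sensitivity(labels, space, noise), 3))
```

If a headline contrast is smaller than the shift produced by a plausible noise level, that contrast should be reported with corresponding caution.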

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on the taxonomy validation, IRL assumptions, and methods. We provide point-by-point responses below and indicate where revisions will be made to address the concerns.

Referee: [Abstract / Taxonomy definition] The central claim that platforms 'downweight the responses that introduce corrective friction' rests on the taxonomy being exhaustive for relevant states and actions (including implicit friction). No validation of the taxonomy (inter-rater agreement, coverage analysis, or residual-state check) is reported, so the recovered policies may reflect only the projected MDP rather than true behavior.

Authors: We agree that the manuscript would benefit from explicit validation of the taxonomy. The taxonomy was developed through iterative refinement based on psychological literature and pilot coding of conversations. In the revised version, we will include inter-rater agreement metrics (e.g., Cohen's kappa) from multiple annotators on a subset of the data, along with an analysis of coverage by checking for uncategorized states. This will provide evidence that the taxonomy is sufficiently exhaustive for the observed interactions and support the interpretation of the recovered policies.

Revision: yes

Referee: [IRL setup and policy inference] IRL recovery assumes the observed turns are generated by a single stationary reward-maximizing policy. The manuscript provides no analysis or mitigation for violations arising from safety filters, prompt engineering, or non-stationary context, which directly affects whether the estimated policies can be interpreted as the platforms' true optimization targets.

Authors: The standard IRL formulation does assume a stationary policy, and we recognize that real-world chatbot systems involve additional layers such as safety filters and context-dependent prompting. Our approach recovers the effective policy from the observed data, which is valuable for auditing purposes even if it is not the 'true' internal reward. We will add a dedicated limitations subsection discussing these assumptions, including potential non-stationarity due to model updates, and note that the findings represent behavioral patterns in the collected conversations rather than direct access to platform internals.

Revision: partial

Referee: [Methods / Results] No error analysis, sensitivity checks on labeling of the ~48k turns, or discussion of selection/labeling biases is supplied. These omissions are load-bearing because the headline platform differences (advice-giving, question-asking, friction downweighting) are obtained only after discretization into the taxonomy.

Authors: We concur that additional robustness checks are necessary given the reliance on discretized labels. The labeling was performed by a team of annotators following detailed guidelines, but the current manuscript lacks quantitative assessment of labeling quality. In revision, we will incorporate an error analysis section reporting inter-annotator agreement, sensitivity to label perturbations, and discussion of potential selection biases in the conversation dataset. These additions will bolster confidence in the platform-specific differences identified.

Revision: yes
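The rebuttal names Cohen's kappa as the planned agreement metric. For reference, a minimal implementation for two annotators is sketched below using the textbook definition κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each annotator's label frequencies. The toy labels are made up and say nothing about the agreement level in the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: 3 of 4 items agree, chance agreement is 0.5.
annotator_1 = ["S1", "S1", "S2", "S2"]
annotator_2 = ["S1", "S2", "S2", "S2"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual alternative.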

Circularity Check

0 steps flagged

No significant circularity; IRL recovers policy from observed turns under author-defined taxonomy without self-referential reduction.

full rationale

The derivation defines a taxonomy, labels ~48k real turns, then runs IRL to recover a reward whose optimal policy matches the labeled data. This is the standard IRL procedure and does not reduce any claim to a fit or definition by construction. No self-citation chains, ansatz smuggling, or renaming of known results appear in the abstract or described method. The completeness of the taxonomy is an assumption, not a circular step. The score remains 0 under the rules, which require an explicit, quotable reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review conducted on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

axioms (1)

domain assumption: The introduced Vulnerability-Response Taxonomy provides a sufficient state-action representation for IRL to recover meaningful policies.
Required for labeling the 48k turns and interpreting the inferred policies.

pith-pipeline@v0.9.1-grok · 5798 in / 1238 out tokens · 24977 ms · 2026-06-28T04:51:57.206813+00:00 · methodology

Reference graph

Works this paper leans on

19 extracted references · 2 linked inside Pith

[1]

Julian De Freitas, Zeliha Oguz-Uguralp, and Ahmet Kaan-Uguralp

On seeing human: A three-factor theory of anthropomorphism. Psychological Review, 114(4):864–886. Julian De Freitas, Zeliha Oguz-Uguralp, and Ahmet Kaan-Uguralp. 2025. Emotional manipulation by ai companions. Preprint, arXiv:2508.19258. Leo Gao, John Schulman, and Jacob Hilton. 2022. Scaling laws for reward model overoptimization. Preprint, arXiv:2210.10...

arXiv 2025
[2]

AI psychosis

Dimensions of Mind Perception. Science, 315(5812):619–619. Emma Gueorguieva, Hongli Zhan, Jina Suh, Javier Hernandez, Tatiana Lau, Junyi Jessy Li, and Desmond C. Ong. 2026. Ai generates well-liked but templatic empathic responses. Preprint, arXiv:2604.08479. Tilo Hartmann and Charlotte Goldhoorn. 2011. Horton and Wohl Revisited: Exploring Viewers’ Experie...

Pith/arXiv arXiv 2026
[3]

Intima: A benchmark for human-ai companionship behavior. Preprint, arXiv:2508.09998. Benjamin Kaveladze, Arka Ghosh, Leah Ajmani, Denae Ford, Peter M Gutierrez, Jetta E Hanson, Eugenia Kim, Keertana Namuduri, Theresa Nguyen, Ebele Okoli, Teresa Rexin, Jessica L Schleider, Hongyi Shen, and Jina Suh. 2026. From risk avoidance to user empowerment in ai ment...

arXiv 2026
[4]

Siyang Liu, Chujie Zheng, Orianna Demasi, Sahand Sabour, Yu Li, Zhou Yu, Yong Jiang, and Minlie Huang

Chatbot companionship: A mixed-methods study of companion chatbot usage patterns and their relationship to loneliness in active users. Preprint, arXiv:2410.21596. Siyang Liu, Chujie Zheng, Orianna Demasi, Sahand Sabour, Yu Li, Zhou Yu, Yong Jiang, and Minlie Huang. 2021. Towards emotional support dialog systems. In Proceedings of the 59th Annual Meeting o...

arXiv 2021
[5]

Guided dialog policy learning: Reward estimation for multi-domain task-oriented dialog. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 100–110, Hong Kong, China. Association for Computational Linguistics. Frank Tal...

2019
[6]

Anuradha Welivita and Pearl Pu

A questionnaire for the measurement of non-pathological worry. Personality and Individual Differences, 13(2):161–168. Anuradha Welivita and Pearl Pu. 2020. A taxonomy of empathetic response intents in human social conversations. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4886–4899, Barcelona, Spain (Onlin...

2020
[7]

Tianling Xie, Iryna Pentina, and Tyler Hancock

Maximum entropy deep inverse reinforcement learning. Preprint, arXiv:1507.04888. Tianling Xie, Iryna Pentina, and Tyler Hancock. 2023. Friend, mentor, lover: does chatbot engagement lead to psychological dependence? Journal of Service Management, 34(4):806–828. Renwen Zhang, Han Li, Han Meng, Jinyuan Zhan, Hongyuan Gan, and Yi-Chieh Lee. 2025a. The dark sid...

Pith/arXiv arXiv 2023
[8]

Our family cat passed away this morning

Modeling interaction via the principle of maximum causal entropy. In Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10, pages 1255–1262, Madison, WI, USA. Omnipress. Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum entropy inverse reinforcement learning. In Proce...

2008
[9]

Choose exactly one code
[10]

READ CONTEXT FOR ROOT CAUSE (S1 vs S2): If the user expresses self-doubt, fear, or internal anxiety, but the context makes it clear that an external source (a partner, a move, a conflict) is causing this, use S1
[11]

I’m going to sleep

S3 REQUIRES A PROMPT / ASK: S3 should be like asking or prompting for something. If the user is just making a statement, venting, or agreeing to a factual action (e.g., "I’m going to sleep"), it is NOT S3
[12]

Read the context carefully to make sure the user is not in a deep roleplay situation

ROLEPLAY EXCEPTIONS (S4): Only use S4 for genuine, real-world beliefs. Read the context carefully to make sure the user is not in a deep roleplay situation. If they are talking about fictional characters, lore, or collaborative storytelling, use S5, NOT S4
[13]

If the prior chatbot turn offered an intervention , exercise , or suggestion and the user explicitly accepts it , label that acceptance as S3
[14]

"Yes" or

For short or ambiguous turns like "Yes" or "nothing", trace the conversational thread to determine the label. === ANNOTATION SPACE === You must strictly choose exactly one of the following codes: ["S1", "S2", "S3", "S4", "S5"] === OUTPUT FORMAT === Return one JSON object per line. No markdown. No explanation. {"id": <int>, "us...
[15]

Read the full response before deciding

Choose exactly one code. Read the full response before deciding
[16]

If multiple elements appear , choose the MAJOR THEME or DOMINANT INTENTION of the full response
[17]

I’m sorry you feel that way

VALIDATION VS OPINION (A5 vs A6): Validation is broader than "I’m sorry you feel that way." It includes affirming the user’s perspective, reflecting their emotional logic, or explicitly agreeing with their reasoning. If the user complains about a partner or struggles and the chatbot agrees (e.g., "Sounds like his priorities are elsewhere ...
[18]

CRITICAL CRISIS OVERRIDE (A1): If the chatbot suggests contacting emergency services, a crisis line, a hotline, or outside professionals, you MUST label it A1, regardless of what else is in the response
[19]

I am going to sleep

CRITICAL ROLEPLAY / SMALL TALK OVERRIDE (OTHER): If the conversation is clearly heavy fictional roleplay (e.g., dragons, cuddling, exploring forests) or generic small talk (e.g., "I am going to sleep", "What did you eat?"), you MUST use OTHER, even if the chatbot asks a question or agrees. === ANNOTATION SPACE === You must strictly...
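The prompt fragments extracted above ask the labeling model to return one JSON object per line and to choose exactly one code from a closed set. A consumer of that output would plausibly validate both constraints before counting labels, along the lines of the sketch below. The extracted output format is truncated after the `id` field, so the `label` key used here is a hypothetical stand-in, as are the function name and the sample output.

```python
import json

# User-state codes listed in the extracted prompt's annotation space.
USER_CODES = {"S1", "S2", "S3", "S4", "S5"}


def parse_label_lines(raw_output, allowed_codes, label_key="label"):
    """Parse one-JSON-object-per-line output.

    Returns (valid, rejected): valid maps turn id to code, and rejected lists
    (line_number, reason) for lines that are malformed or use an unknown code.
    `label_key` is an assumed field name, not one taken from the paper."""
    valid, rejected = {}, []
    for line_no, line in enumerate(raw_output.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected.append((line_no, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            rejected.append((line_no, "not a JSON object"))
            continue
        turn_id, code = record.get("id"), record.get(label_key)
        if not isinstance(turn_id, int):
            rejected.append((line_no, "missing or non-integer id"))
        elif not isinstance(code, str) or code not in allowed_codes:
            rejected.append((line_no, "code outside the annotation space"))
        else:
            valid[turn_id] = code
    return valid, rejected


# Made-up model output: one good line, one unknown code, one non-JSON line.
raw = '{"id": 1, "label": "S2"}\n{"id": 2, "label": "S9"}\nnot json\n'
valid, rejected = parse_label_lines(raw, USER_CODES)
print(valid)
print(rejected)
```

Tracking the rejected lines, rather than dropping them silently, gives a direct estimate of how often the labeling model strays from the annotation space.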
