pith. machine review for the scientific record.

arxiv: 2601.21459 · v4 · submitted 2026-01-29 · 💻 cs.LG · cs.AI

HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing

Pith reviewed 2026-05-16 09:38 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords LLM role-playing · cognitive simulation · dual-layer thinking · reinforcement learning · reasoning traces · persona simulation · human-aligned rewards

The pith

HER enables LLMs to simulate character inner thoughts by separating first-person persona reasoning from third-person model oversight and training on reverse-engineered data plus human-aligned rewards.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to move LLM role-playing beyond surface-level tones and knowledge toward genuine cognitive simulation of why characters act as they do. Prior work lacked high-quality reasoning traces in training data and lacked reward signals that reliably match human judgments of persona behavior. HER supplies both by reverse-engineering reasoning-augmented examples and by defining explicit human-aligned principles that guide reward models. Models trained this way on Qwen3-32B show clear gains on role-play benchmarks. If the approach holds, role-play systems for companions, games, and content creation could produce more coherent and believable inner monologues.

Core claim

HER is a unified framework for cognitive-level persona simulation. It introduces dual-layer thinking that keeps characters' first-person thinking distinct from the LLM's third-person analysis. The authors curate reasoning-augmented role-playing data via reverse engineering, construct human-aligned principles, and train reward models on those principles. Supervised and reinforcement learning on these resources produces models that outperform the Qwen3-32B baseline by 30.26 points on CoSER and 14.97 percent on the Minimax Role-Play Bench.
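
The RL stage can be pictured with a small sketch. According to the paper's Figure 2 caption, the reward model (the GRM) compares the policy response with a baseline response; the caption is truncated before it states how that comparison becomes a reward, so the win-rate mapping below is an assumption. The order swap (to damp position bias), the `Judge` signature, and `toy_judge` are likewise hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable

# Hypothetical judge signature: (context, cand_1, cand_2) -> 1 or 2,
# naming whichever candidate the judge prefers.
Judge = Callable[[str, str, str], int]


def pairwise_reward(context: str, policy_resp: str, baseline_resp: str,
                    judge: Judge) -> float:
    """Fraction of the two orderings in which the judge prefers the policy.

    Assumption: reward is a win rate in {0.0, 0.5, 1.0}; the source does not
    state the actual reward mapping.
    """
    first = judge(context, policy_resp, baseline_resp)   # policy shown as cand_1
    second = judge(context, baseline_resp, policy_resp)  # policy shown as cand_2
    wins = (1 if first == 1 else 0) + (1 if second == 2 else 0)
    return wins / 2.0


def toy_judge(context: str, cand_1: str, cand_2: str) -> int:
    """Stand-in judge that prefers the longer candidate; ties go to cand_1."""
    return 1 if len(cand_1) >= len(cand_2) else 2
```

In a real pipeline `toy_judge` would be replaced by a call to the trained reward model; a tie-prone judge like this one yields 0.5 when it always favours the first position, which is the case the order swap is meant to expose.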

What carries the argument

Dual-layer thinking mechanism that separates a character's first-person inner reasoning from the LLM's third-person oversight, supported by reverse-engineered reasoning traces and human-aligned reward models.
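
To make the two layers concrete, the sketch below splits one model turn into its third-person planning block and its in-character segments. The tag names `<system_thinking>`, `<role_thinking>`, and `<role_action>` are the ones that appear in the paper's figure captions and prompt excerpts; the parser itself, the `ParsedTurn` type, and the rule that untagged text counts as speech are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

SYSTEM_RE = re.compile(r"<system_thinking>(.*?)</system_thinking>", re.DOTALL)
ROLE_RE = re.compile(r"<(role_thinking|role_action)>(.*?)</\1>", re.DOTALL)


@dataclass
class ParsedTurn:
    system_thinking: str              # model's third-person planning layer
    segments: List[Tuple[str, str]]   # (kind, text) in original order


def parse_turn(raw: str) -> ParsedTurn:
    """Split a turn into the planning block and ordered role segments."""
    match = SYSTEM_RE.search(raw)
    system = match.group(1).strip() if match else ""
    body = raw[match.end():] if match else raw
    segments: List[Tuple[str, str]] = []
    pos = 0
    for tag in ROLE_RE.finditer(body):
        speech = body[pos:tag.start()].strip()
        if speech:  # assumption: untagged text between tags is spoken dialogue
            segments.append(("speech", speech))
        segments.append((tag.group(1), tag.group(2).strip()))
        pos = tag.end()
    tail = body[pos:].strip()
    if tail:
        segments.append(("speech", tail))
    return ParsedTurn(system, segments)


def visible_text(turn: ParsedTurn) -> str:
    """Text other characters can perceive: speech and actions, not thoughts."""
    return " ".join(text for kind, text in turn.segments if kind != "role_thinking")
```

Keeping the planning block out of `segments` mirrors the separation the framework describes: the model's analysis never enters the conversation, and the character's private thoughts are dropped from what other participants see.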

If this is right

  • HER models deliver a 30.26-point gain on the CoSER benchmark over the Qwen3-32B baseline.
  • The same training yields a 14.97 percent improvement on the Minimax Role-Play Bench.
  • Released datasets, principles, and models provide resources that future work can build on for cognitive role simulation.
  • Applications such as digital companions and games gain more consistent inner-thought simulation without additional prompt engineering.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The reverse-engineering technique for obtaining reasoning traces could reduce the cost of creating high-quality thought data for other dialogue or planning tasks.
  • Maintaining an explicit separation between character and model perspectives may help maintain coherence over longer multi-turn interactions.
  • Reward models trained on the human-aligned principles might transfer to preference tuning in general conversational agents beyond role-play.

Load-bearing premise

Reverse-engineered reasoning data and the constructed reward models supply traces and signals that accurately reflect human preferences for how personas should think and act.

What would settle it

Training two otherwise identical models, one with the dual-layer distinction and reverse-engineered traces and one without, and then comparing their scores on CoSER and the Minimax Role-Play Bench would directly test whether these components are necessary. If the ablated model matches the full one, the components are not what drives the reported gains.

Figures

Figures reproduced from arXiv: 2601.21459 by Aili Chen, Chengyu Du, Deming Ding, Junteng Liu, Liheng Feng, Pengyu Zhao, Rong Tian, Rui Xu, Weiyuan Li, Xintao Wang, Yanghua Xiao, Yuhao Li, Zijun Sun, Zishan Huang.

Figure 1
The reasoning-driven LLM role-play framework of HER. HER introduces Dual-layer Thinking and a three-stage reverse synthesis pipeline to construct reasoning-augmented LLM role-play trajectories. … where an agent must remain in character throughout an interactive conversation. Large language models (LLMs) have demonstrated strong general-purpose language capabilities, largely attributed to large-scale pretra…
Figure 2
Overview of HER training. Top: we train a Role-play GRM by distilling reusable principles from real conversational preference data, and teaching the model to do pairwise judging with by-case principles → analysis → final decision. Bottom: we first cold-start the LLM role-play model with SFT on HER data, and then apply RL where the GRM compares the policy response with a baseline response to produce the rew…
Figure 3
Performance of HER Role-play RL training on CoSER Benchmark. 4.3 Reward Model Supervision: General vs. By-case Principles. We compare by-case principles with fixed principles on a test set of 4,739 preference pairs annotated by human experts. All GRM variants in this section are trained from the same SFT checkpoint; only the supervision format differs. Further details on data construction are in Appendix …
Figure 4
Pattern collapse vs. stable dimension-wise judgments during GRM RL training. … construction and mixing different judging patterns with controlled proportions in Appendix C.
Figure 5
Effect of system thinking and RL on CoSER Benchmark. We compare a base model, SFT without thinking, SFT with system_thinking, and RL model. 4.5 System Thinking Improves Character Fidelity. We test whether enabling explicit system thinking during training and inference improves in-character ability. Specifically, the model generates an explicit system thinking block before each response to reason about cha…
Figure 6
Figure 6 shows the collapse dynamics: in the Collapsed setting, Top-1 pattern concentration crosses the 90% threshold by step 28 and reaches 96.3% at step 50, with entropy dropping from 1.32 to 0.29; in contrast, the Diversified setting maintains Top-1 concentration between 43–54% throughout 100 steps and keeps entropy consistently above 2.0. Details in Appendix B.4.
Figure 7
Failure Type 1: Character “mind-reads” another’s…
Figure 8
Failure Type 2: <system_thinking> uses character’s first-person voice instead of model’s third-person planning perspective. Type 3: Hallucinated enhancement
Figure 9
Failure Type 3: Enhancement without dialogue
Figure 10
Data schema showing the hierarchical structure.
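
The Figure 6 caption tracks two collapse indicators: Top-1 pattern concentration and entropy. A minimal sketch of how such quantities can be computed from a batch of judging-pattern labels follows. The label strings, the function names, and the logarithm base are assumptions; the caption does not say which base its entropy values use.

```python
import math
from collections import Counter
from typing import Sequence


def top1_concentration(patterns: Sequence[str]) -> float:
    """Share of samples that fall into the single most frequent pattern."""
    if not patterns:
        raise ValueError("patterns must be non-empty")
    counts = Counter(patterns)
    return max(counts.values()) / len(patterns)


def shannon_entropy(patterns: Sequence[str], base: float = 2.0) -> float:
    """Shannon entropy of the empirical pattern distribution.

    Assumption: base 2 (bits) by default; pass base=math.e for nats.
    """
    if not patterns:
        raise ValueError("patterns must be non-empty")
    n = len(patterns)
    counts = Counter(patterns)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())
```

Logged once per training step, the two numbers move in opposite directions during collapse: concentration rises toward 1 while entropy falls toward 0, which is the pattern the caption describes for the Collapsed setting.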
Original abstract

LLM role-playing, i.e., using LLMs to simulate specific personas, has emerged as a key capability in various applications, such as companionship, content creation and digital games. While current models effectively capture character tones and knowledge, simulating the inner thoughts behind their behaviors remains a challenge. Towards cognitive simulation in LLM role-play, previous efforts mainly suffer from two deficiencies: lacking data with high-quality reasoning traces, and lacking reliable reward signals aligned with human preferences. In this paper, we propose HER, a unified framework for cognitive-level persona simulation. HER introduces dual-layer thinking, which distinguishes characters' first-person thinking from LLMs' third-person thinking. To bridge these gaps, we curate reasoning-augmented role-playing data via reverse engineering, and construct human-aligned principles and reward models. Leveraging these resources, we train HER models based on Qwen3-32B via supervised and reinforcement learning. Extensive experiments validate the effectiveness of our approach. Notably, our models significantly outperform the Qwen3-32B baseline, achieving a 30.26 improvement on the CoSER benchmark and a 14.97% gain on the Minimax Role-Play Bench. Our datasets, principles, and models are released to facilitate future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes the HER framework for cognitive-level persona simulation in LLMs. It introduces dual-layer thinking to separate characters' first-person inner thoughts from the LLM's third-person reasoning. The authors curate reasoning-augmented role-playing data via reverse engineering, construct human-aligned principles and reward models, and train Qwen3-32B models with supervised fine-tuning followed by reinforcement learning. They report large gains over the Qwen3-32B baseline: +30.26 on the CoSER benchmark and +14.97% on the Minimax Role-Play Bench, and release the associated datasets, principles, and models.

Significance. If the reverse-engineered traces and reward models prove reliable, the dual-layer approach could provide a practical route to better inner-thought simulation in role-play agents. The public release of the curated resources is a clear strength that supports reproducibility and follow-on work.

major comments (3)
  1. [Abstract / Experiments] Abstract and Experiments section: the headline performance deltas (+30.26 on CoSER, +14.97% on Minimax) are stated without error bars, confidence intervals, number of runs, or statistical tests, so it is impossible to judge whether the gains are robust or attributable to the proposed framework rather than base-model scale or generic RL.
  2. [Data Curation] Data curation section: the reverse-engineered reasoning-augmented traces are presented as high-quality, yet no human agreement scores, inter-annotator reliability, or validation against expert annotations are reported; this validation is load-bearing for the claim that the performance improvement stems from cognitive-level traces rather than artifacts of the reverse-engineering process.
  3. [Experiments] Experiments section: no ablation studies isolate the contribution of dual-layer thinking, the human-aligned principles, or the learned reward model from the base Qwen3-32B checkpoint or from standard SFT+RL; without these controls the central attribution of gains to HER remains untested.
minor comments (2)
  1. [Methods] Clarify the precise operational definition of 'first-person thinking' versus 'third-person thinking' with concrete prompt examples early in the methods.
  2. [Data Curation] Add a table summarizing the scale and composition of the curated dataset (number of dialogues, average trace length, source personas).
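
Major comment 1 asks for uncertainty estimates on the headline deltas. One inexpensive option, sketched here under stated assumptions, is a paired bootstrap over per-item benchmark scores. It assumes both models were scored on the same items in the same order; the function name and its defaults are hypothetical, and nothing here reflects the authors' actual evaluation protocol.

```python
import random
import statistics
from typing import Sequence, Tuple


def paired_bootstrap_ci(model_scores: Sequence[float],
                        baseline_scores: Sequence[float],
                        n_resamples: int = 2000,
                        alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Return (mean difference, CI lower bound, CI upper bound).

    Resamples per-item score differences with replacement and takes
    percentile bounds of the resampled means.
    """
    if len(model_scores) != len(baseline_scores) or not model_scores:
        raise ValueError("score lists must be non-empty and equally long")
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(diffs)
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return statistics.fmean(diffs), lower, upper
```

An interval that excludes zero would support the claim that a gain is not evaluation noise. It would not, on its own, separate the framework's contribution from generic SFT+RL effects, which is the concern of major comment 3.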

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will incorporate revisions to improve the robustness and clarity of the claims.

  1. Referee: [Abstract / Experiments] Abstract and Experiments section: the headline performance deltas (+30.26 on CoSER, +14.97% on Minimax) are stated without error bars, confidence intervals, number of runs, or statistical tests, so it is impossible to judge whether the gains are robust or attributable to the proposed framework rather than base-model scale or generic RL.

    Authors: We agree that the lack of error bars and statistical measures makes it difficult to fully assess robustness. The reported figures come from single-run evaluations, which is common given the computational expense of LLM training and inference. In the revision we will add an explicit statement on the evaluation protocol and, where additional runs are feasible, include standard deviations across seeds. This will help distinguish framework-driven gains from baseline variability. revision: partial

  2. Referee: [Data Curation] Data curation section: the reverse-engineered reasoning-augmented traces are presented as high-quality, yet no human agreement scores, inter-annotator reliability, or validation against expert annotations are reported; this validation is load-bearing for the claim that the performance improvement stems from cognitive-level traces rather than artifacts of the reverse-engineering process.

    Authors: The reverse-engineering procedure uses a structured, principle-guided prompting approach to generate traces. While internal sampling checks were performed, quantitative inter-annotator agreement was not computed because the process is largely automated. We will revise the data curation section to describe the verification protocol, report agreement on a sampled subset, and include representative examples that illustrate alignment with cognitive simulation. revision: yes

  3. Referee: [Experiments] Experiments section: no ablation studies isolate the contribution of dual-layer thinking, the human-aligned principles, or the learned reward model from the base Qwen3-32B checkpoint or from standard SFT+RL; without these controls the central attribution of gains to HER remains untested.

    Authors: We acknowledge that component-wise ablations would strengthen causal attribution. The current results compare the full HER pipeline against the base Qwen3-32B and implicit standard SFT+RL baselines, but do not isolate each element. In the revised manuscript we will add ablation experiments that remove dual-layer thinking and the learned reward model individually, reporting their incremental contributions on the same benchmarks. revision: yes
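
The second response promises agreement figures on a sampled subset of traces. Cohen's kappa is one common statistic for two annotators; the rebuttal does not say which statistic will be used, so the choice here is an assumption, as are the function name and the label values in the usage note.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected if each annotator labelled independently at their
    # own marginal rates.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For example, two annotators who each mark sampled traces as "good" or "bad" would pass their two label lists in item order; a value near 0 means agreement no better than chance, and a value near 1 means near-perfect agreement.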

Circularity Check

0 steps flagged

No significant circularity; empirical pipeline relies on external data curation


The paper introduces dual-layer thinking and trains HER models on Qwen3-32B via SFT and RL after curating reasoning-augmented data through reverse engineering and constructing human-aligned principles plus reward models. These steps depend on newly created external resources and standard training procedures rather than any self-definitional equations, fitted parameters renamed as predictions, or load-bearing self-citation chains. The reported gains (+30.26 on CoSER, +14.97% on Minimax) are presented as empirical results of this process, with no reduction of claims to inputs by construction visible in the abstract or described framework. The derivation remains self-contained through data creation and RL optimization.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not describe any free parameters, axioms, or invented entities; the method relies on standard supervised and reinforcement learning applied to newly curated data and rewards.

pith-pipeline@v0.9.0 · 5563 in / 1105 out tokens · 36711 ms · 2026-05-16T09:38:57.430880+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

    Relation between the paper passage and the cited Recognition theorem.

    HER introduces dual-layer thinking, which distinguishes characters' first-person thinking from LLMs' third-person thinking... we curate reasoning-augmented role-playing data via reverse engineering, and construct human-aligned principles and reward models... train HER models based on Qwen3-32B via supervised and reinforcement learning.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    We train a Role-play GRM by distilling reusable principles... pairwise judging with by-case principles → analysis → final decision... RL where the GRM compares the policy response with a baseline response

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
