
arxiv: 2606.10051 · v1 · pith:KRVKV6O5new · submitted 2026-06-08 · 💻 cs.CY · cs.HC

The Empirically Grounded Adaptive Virtual Patient for Psychotherapy Training: Disclosure That Responds to Therapist Micro-Skills

Pith reviewed 2026-06-27 14:37 UTC · model grok-4.3

classification 💻 cs.CY cs.HC
keywords: adaptive virtual patient · psychotherapy training · structural equation model · LLM disclosure · therapist empathy · exploration skills · patient openness · micro-skills

The pith

Virtual patient adapts disclosure level to therapist empathy and exploration

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Adaptive Virtual Patient (AVP), which changes how much it reveals in simulated therapy sessions according to the trainee's empathy and exploratory questions. Adaptation is driven by a structural equation model, fitted to nearly 2,000 hours of real psychotherapy transcripts, that tracks how these micro-skills affect patient openness over time. An LLM produces the patient's utterances while a separate dynamics module updates the disclosure level each turn. In an evaluation across 80 sessions with 20 clinicians and trainees, AVP disclosure rose with therapist skill while a prompt-only baseline stayed flat, and ablations showed that the empirically motivated parameters outperformed the alternatives. The goal is scalable practice that stays responsive to the trainee, in contrast to fixed scripts or LLMs that drift over long sessions.

Core claim

The AVP conditions LLM-generated responses on a disclosure level that is updated each turn by a structural equation model fitted to real transcripts, which quantifies how therapist empathy and exploration shift patient openness. In 1,033 turns of testing the AVP's disclosure rose in response to these skills while baselines did not, with exploration carrying most of the signal and the full parameterization outperforming ablated versions.

What carries the argument

Structural equation model that quantifies how therapist empathy and exploration shift patient openness over time, used to update the disclosure level that conditions LLM utterances.
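
The turn-by-turn mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the thresholds, the `DisclosureState` class, and `build_prompt` are hypothetical names and values, and the 1:3 empathy-to-exploration weighting is borrowed from the composite skill score quoted in the Figure 9 caption, on the assumption that the dynamics module weights the skills similarly.

```python
from dataclasses import dataclass

# Hypothetical cut-offs on the cumulative score; the paper's actual
# thresholds and update rule are not given on this page.
MODERATE_THRESHOLD = 4.0
HIGH_THRESHOLD = 10.0

# Assumed weighting, taken from the composite skill score in the
# Figure 9 caption (emp_total + 3 x exploration).
EMPATHY_WEIGHT = 1.0
EXPLORATION_WEIGHT = 3.0


@dataclass
class DisclosureState:
    """Tracks a cumulative skill score and maps it to G, M, or H."""

    cumulative_score: float = 0.0

    def update(self, emp_total: float, exploration: float) -> str:
        """Add this turn's weighted skill scores and return the new level."""
        self.cumulative_score += (
            EMPATHY_WEIGHT * emp_total + EXPLORATION_WEIGHT * exploration
        )
        return self.level()

    def level(self) -> str:
        """Map the cumulative score to guarded, moderate, or high disclosure."""
        if self.cumulative_score >= HIGH_THRESHOLD:
            return "H"
        if self.cumulative_score >= MODERATE_THRESHOLD:
            return "M"
        return "G"


def build_prompt(persona: str, level: str, therapist_utterance: str) -> str:
    """Condition the (unspecified) LLM on the current disclosure level."""
    guidance = {
        "G": "Stay guarded; share only surface-level information.",
        "M": "Be moderately open; share some feelings and context.",
        "H": "Disclose fully, including core concerns.",
    }[level]
    return (
        f"{persona}\n"
        f"Disclosure level {level}: {guidance}\n"
        f"Therapist: {therapist_utterance}\n"
        "Patient:"
    )


# Usage: four turns of (emp_total, exploration) scores from a skill coder.
state = DisclosureState()
levels = [state.update(e, x) for e, x in [(1, 0), (1, 1), (0, 1), (1, 1)]]
print(levels)
```

Keeping the state outside the LLM means the disclosure level cannot drift with the conversation history; the model only ever sees the level the state tracker hands it.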

If this is right

  • AVP disclosure increases with therapist empathy and exploration while prompt-only versions stay flat.
  • Exploration supplies the strongest adaptive signal according to the model and ablations.
  • The empirically fitted parameters outperform the alternative weightings tested in the ablations.
  • The system maintains consistent adaptation across the 80 evaluation sessions, which total 1,033 turns.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same modeling approach could be used to adapt other patient behaviors such as emotional expression or resistance.
  • Training programs could add real-time scoring of micro-skills to give trainees immediate feedback on what increases disclosure.
  • The model might need re-fitting or testing when applied to therapy styles or patient populations not well represented in the original transcripts.

Load-bearing premise

The relationships between therapist micro-skills and patient openness measured in real transcripts can be applied directly to adjust disclosure in LLM responses without losing validity.

What would settle it

New sessions in which high therapist empathy and exploration scores produce no corresponding rise in AVP disclosure level, or produce the same flat pattern as the prompt-only baseline.

Figures

Figures reproduced from arXiv: 2606.10051 by Angela Chen, Canwen Wang, Catherine Bao, Haiyi Zhu, Robert E. Kraut, Siwei Jin, Tongshuang Wu.

Figure 1. Overview of the AVP framework.
Figure 3. P3 Faithful Realization. Confusion matrices comparing the system-derived disclosure label to an independent evaluator’s label (G/M/H). AVP overall agreement = 0.651; SVP = 0.311. The SVP H column is dominated by false positives.
Figure 4. System interface. Left panel: VP condition selection and patient background. Main panel: turn-by-turn chat workspace. A restart control allows the same case to be replayed from initial state.
Figure 5. P2 Gradual Build-up: AVP vs. SVP (trimmed 0–80%). Decile-averaged evaluator-coded disclosure (G=1, M=2, H=3) over session progress, trimmed to remove the closing-turn wrap-up artifact. AVP produces a strongly climbing trajectory (Δ = +0.97, R² = 0.94); SVP is essentially flat (Δ = +0.56, R² = 0.15). Shaded bands show 95% CIs across sessions.
Figure 6. P2 Per-persona disclosure trajectories (trimmed 0–80%). Left panel: Sam persona; right panel: Alex persona. The adaptive advantage is larger for Alex (AVP Δ = +1.10, R² = .94; SVP Δ = +0.39, R² = .01) than for Sam (AVP Δ = +0.85, R² = .89; SVP Δ = +0.65, R² = .42), consistent with Sam’s adversarial persona eliciting a less distinct trajectory contrast.
Figure 7. P1 Responsiveness: β̂ coefficients for empathy and exploration. Grouped bars show OLS coefficients from Δeval ~ β̂₀ + β̂₁·emp_total + β̂₂·exploration with cluster-robust SEs (session level). Exploration drives responsiveness in AVP (β̂ = +0.281, p < 10⁻⁶); the empathy block is near zero in both designs (β̂ < 0.005, p > .87). Error bars = ±1.96 × SE.
Figure 8. P3 Faithful Realization (text alignment view): confusion matrices. Cell values show counts and row-normalized proportions. AVP overall agreement = 0.651; G-precision = 0.83, indicating the state manager successfully restrains over-disclosure. SVP overall agreement = 0.311; SVP’s H column is dominated by false positives (the always-H card causes the system to claim H-level disclosure while the evaluator obs…
Figure 9. P4 Faithful Realization (skill split view). Sessions split at the median composite skill score (emp_total + 3 × exploration, n = 20 per stratum). AVP shows a directional pattern: high-skill sessions (Δ = +0.85) climb steeper than low-skill (Δ = +0.60), though the formal slope × skill interaction is underpowered (β̂ = +0.020, p = .27). SVP trajectories are flat and skill-invariant.
Figure 10. Human Likert ratings by condition and dimension (n = 20 per condition). Error bars show ±1.96 × SE. Solid fill = Static VP (Sam Alpha; Alex Alpha); hatched fill = Adaptive VP (Sam Beta; Alex Beta). Alex Beta (adaptive) is rated highest on all four dimensions; Sam Beta (adaptive) is rated lower than Sam Alpha on perceived adaptivity, suggesting a persona-dependent reversal of the expected adaptivity advanta…
Figure 11. Evaluator agreement by weight scheme (post-hoc ablation). The deployed weighting outperforms all alternatives. Empathy-only loses 15.8 pp in AVP (the worst alternative) because without exploration the cumulative score cannot reach H-level disclosure. Exploration-only loses only 7.2 pp, confirming exploration as the dominant signal.
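
The agreement figures quoted in the confusion-matrix captions (overall agreement, G-precision) can be computed along these lines. This is a hedged sketch on toy labels, not the paper's data or code: the function names are hypothetical, and it assumes that "precision" treats the system-derived label as the prediction and the evaluator's label as the reference.

```python
from collections import Counter

LEVELS = ("G", "M", "H")


def confusion_counts(system_labels, evaluator_labels):
    """Count (system, evaluator) label pairs, one pair per turn."""
    if len(system_labels) != len(evaluator_labels):
        raise ValueError("label lists must be the same length")
    return Counter(zip(system_labels, evaluator_labels))


def overall_agreement(counts):
    """Share of turns where both labels match (the matrix diagonal)."""
    total = sum(counts.values())
    matches = sum(counts[(lvl, lvl)] for lvl in LEVELS)
    return matches / total if total else 0.0


def precision(counts, level):
    """Of the turns the system labelled `level`, the share the evaluator agreed with."""
    claimed = sum(counts[(level, ev)] for ev in LEVELS)
    return counts[(level, level)] / claimed if claimed else 0.0


# Toy example: six turns with made-up labels.
system = ["G", "G", "M", "H", "H", "M"]
evaluator = ["G", "M", "M", "H", "M", "M"]
counts = confusion_counts(system, evaluator)
print(overall_agreement(counts), precision(counts, "G"), precision(counts, "H"))
```

Under this reading, a low H-precision is what "the H column is dominated by false positives" would look like: the system claims H on many turns that the evaluator codes lower.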

Original abstract

Simulated patients offer a scalable way to train psychotherapy micro-skills such as empathic responding and exploratory probing, but current systems either follow fixed scripts or rely on LLMs that drift unpredictably over long sessions. We present the Adaptive Virtual Patient (AVP), which adapts its disclosure behavior (from guarded, through moderate openness, to full disclosure) in response to trainee skill. The AVP is grounded in a structural equation model fit to nearly 2,000 hours of real-world psychotherapy transcripts, which quantifies how therapist empathy and exploration shift a patient's openness over time. An LLM generates the AVP's utterances conditioned on a disclosure level that the dynamics module updates each turn. In an evaluation with 20 clinicians and trainees over 80 sessions (1,033 turns), the AVP's disclosure rises in response to therapist empathy and exploration, while a prompt-only baseline stays flat; ablations confirm that the empirically motivated parameterization outperforms alternatives, with exploration carrying most of the adaptive signal.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Adaptive Virtual Patient (AVP), an LLM-based simulated patient for psychotherapy training whose disclosure level (guarded to full) is dynamically updated each turn by a structural equation model (SEM) fitted to real psychotherapy transcripts. The SEM quantifies effects of therapist empathy and exploration on patient openness; the resulting scalar conditions the LLM prompt. A study with 20 clinicians/trainees across 80 sessions (1,033 turns) reports that AVP disclosure rises appropriately with therapist skill while a prompt-only baseline remains flat; ablations indicate the empirical parameterization (especially exploration) drives the adaptation.

Significance. If the SEM-to-LLM transfer is shown to preserve the fitted causal relationships in generated text, the work supplies a concrete, data-grounded mechanism for controllable long-horizon patient simulation that outperforms prompt engineering alone. This directly addresses a practical bottleneck in scalable micro-skill training and supplies an explicit, falsifiable link between transcript-derived parameters and system behavior.

major comments (3)
  1. [§5, §4.2] §5 (Evaluation) and §4.2 (Dynamics module): The reported adaptation is measured solely via the internal disclosure scalar and participant ratings of session quality; no independent coding or quantitative analysis of the generated utterances' actual disclosure/openness content (e.g., via blinded raters applying the same openness scale used to fit the SEM) is presented. This leaves the central claim—that LLM outputs respect the SEM coefficients—unverified and vulnerable to prompt artifacts.
  2. [§3.1] §3.1 (SEM construction): The manuscript states the model was fit to “nearly 2,000 hours” of transcripts but supplies neither the exact sample size (sessions/transcripts), fitting procedure (e.g., maximum likelihood, estimator), goodness-of-fit indices (CFI, RMSEA, SRMR), nor any cross-validation or sensitivity checks on the empathy/exploration coefficients. These parameters are load-bearing for the adaptation rule and must be reported for reproducibility and credibility assessment.
  3. [§5.3] §5.3 (Quantitative results) and Table 2/3 (if present): The comparison of AVP vs. baseline and ablations reports directional changes in disclosure but omits the statistical tests employed, exact p-values or confidence intervals, effect sizes, and any correction for multiple comparisons across the 1,033 turns. Without these, the strength of evidence that “exploration carries most of the adaptive signal” cannot be evaluated.
minor comments (2)
  1. [Abstract, §2] Abstract and §2: The phrase “nearly 2,000 hours” should be replaced by the precise figure and a citation to the transcript corpus once the fitting details are added.
  2. [Figure 1] Figure 1 (system diagram): The arrow from “Dynamics module” to “LLM prompt” should explicitly label the disclosure scalar and note whether it is normalized or passed raw.
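
For readers unfamiliar with the fit indices requested in major comment 2, two of them can be computed from chi-square statistics as sketched below. The formulas are the standard textbook definitions (using the N − 1 convention for RMSEA; some software uses N); the input values in the usage lines are made up and say nothing about the paper's model. SRMR is omitted because it needs the observed and model-implied correlation matrices.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 convention)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int,
        chi2_baseline: float, df_baseline: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    model_excess = max(chi2_model - df_model, 0.0)
    worst_excess = max(model_excess, chi2_baseline - df_baseline, 0.0)
    if worst_excess == 0.0:
        return 1.0
    return 1.0 - model_excess / worst_excess


# Hypothetical inputs, for illustration only.
print(rmsea(chi2=30.0, df=10, n=201))
print(cfi(chi2_model=30.0, df_model=10, chi2_baseline=520.0, df_baseline=20))
```

Lower RMSEA and higher CFI indicate better fit; reporting them alongside the sample size would let readers judge how much weight the empathy and exploration coefficients can bear.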

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, indicating the revisions we will make to strengthen the manuscript.

  1. Referee: [§5, §4.2] §5 (Evaluation) and §4.2 (Dynamics module): The reported adaptation is measured solely via the internal disclosure scalar and participant ratings of session quality; no independent coding or quantitative analysis of the generated utterances' actual disclosure/openness content (e.g., via blinded raters applying the same openness scale used to fit the SEM) is presented. This leaves the central claim—that LLM outputs respect the SEM coefficients—unverified and vulnerable to prompt artifacts.

    Authors: We agree that direct coding of generated utterances against the openness scale would provide stronger verification that LLM outputs respect the SEM coefficients. The current evaluation centers on the disclosure scalar (the explicit mechanism) and clinician ratings of session quality as external indicators of adaptation. In the revision we will add a post-hoc blinded coding analysis on a sample of utterances to quantify alignment with predicted disclosure levels and address potential prompt artifacts. revision: yes

  2. Referee: [§3.1] §3.1 (SEM construction): The manuscript states the model was fit to “nearly 2,000 hours” of transcripts but supplies neither the exact sample size (sessions/transcripts), fitting procedure (e.g., maximum likelihood, estimator), goodness-of-fit indices (CFI, RMSEA, SRMR), nor any cross-validation or sensitivity checks on the empathy/exploration coefficients. These parameters are load-bearing for the adaptation rule and must be reported for reproducibility and credibility assessment.

    Authors: We will expand §3.1 to report the exact sample size, estimation procedure, all goodness-of-fit indices, cross-validation results, and sensitivity checks on the key coefficients. These details were generated during model development but omitted for brevity; including them will support reproducibility and credibility assessment of the adaptation rule. revision: yes

  3. Referee: [§5.3] §5.3 (Quantitative results) and Table 2/3 (if present): The comparison of AVP vs. baseline and ablations reports directional changes in disclosure but omits the statistical tests employed, exact p-values or confidence intervals, effect sizes, and any correction for multiple comparisons across the 1,033 turns. Without these, the strength of evidence that “exploration carries most of the adaptive signal” cannot be evaluated.

    Authors: We will revise §5.3 and associated tables to report the statistical tests used (linear mixed-effects models), exact p-values, confidence intervals, effect sizes, and a note on multiple-comparison handling for the pre-specified contrasts. This will allow full evaluation of the evidence strength regarding the role of exploration. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external transcript data and human evaluation

The structural equation model is fitted to an independent corpus of nearly 2,000 hours of real psychotherapy transcripts (external to the present LLM system). The adaptation mechanism updates a disclosure scalar from this model and conditions LLM generation on it; the central empirical claim is then tested via a separate human evaluation (20 clinicians, 80 sessions, 1,033 turns) that compares against a prompt-only baseline and ablations. No equation or claim reduces by construction to its own fitted inputs or to a self-citation chain; the evaluation provides an external benchmark that is not tautological with the parameterization.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The claim depends on fitted SEM parameters from transcripts and the assumption that the model transfers to LLM control; no new entities are postulated.

free parameters (1)
  • SEM coefficients for empathy and exploration effects on openness
    Fitted to nearly 2,000 hours of real psychotherapy transcripts to quantify temporal shifts in patient disclosure.
axioms (1)
  • Domain assumption: Therapist empathy and exploration causally influence patient openness in a manner quantifiable by a linear structural equation model that generalizes beyond the original transcripts.
    Invoked to justify updating the disclosure level each turn and to claim adaptation in the evaluation.

pith-pipeline@v0.9.1-grok · 5729 in / 1422 out tokens · 30981 ms · 2026-06-27T14:37:41.515750+00:00 · methodology
