
arxiv: 2606.17786 · v1 · pith:J5H5A7LH · submitted 2026-06-16 · 💻 cs.HC · cs.CL

Toward Accessible Psychotherapy Training Using AI-Driven Interactive Patient Avatars

Pith reviewed 2026-06-26 23:08 UTC · model grok-4.3

classification 💻 cs.HC cs.CL
keywords Acceptance and Commitment Therapy · virtual patients · large language models · psychotherapy training · automated feedback · AI in healthcare · simulated dialogue

The pith

Large language models can simulate realistic therapy patients and provide accurate feedback on Acceptance and Commitment Therapy interventions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a system for training psychotherapists in Acceptance and Commitment Therapy (ACT) through spoken dialogue with AI-driven virtual patients. The patients are simulated by large language models conditioned on profiles derived from real therapy sessions, and a separate model evaluates each therapist response for fidelity to ACT principles. Practicing psychologists rated the patient behavior as highly realistic and reported that the immediate feedback made them more aware of their intervention choices and helped them experiment with alternative responses. Across 49 therapy transcripts, the best-performing feedback model, GPT-4o-mini, reproduced human supervisor ACT fidelity ratings with a mean absolute error (MAE) of 6.12. The authors position the system as a scalable, low-risk complement to existing training methods.

Core claim

The authors establish that large language models can be used to simulate patient behavior in therapy dialogues and to automatically assess therapist adherence to ACT fidelity criteria in a manner that aligns with human expert judgments, with the system supporting deliberate practice through immediate feedback.

What carries the argument

LLM-simulated patients conditioned on real therapy profiles, paired with an automated ACT fidelity evaluator that provides turn-by-turn feedback.
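The turn-by-turn loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_turn`, `toy_patient`, `toy_evaluator`, and the `ACT_CUES` keyword list are hypothetical stand-ins. In the paper both roles are filled by LLMs, and therapist speech is first transcribed from audio; the keyword check here is only a placeholder so the sketch runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (speaker, utterance) pairs


@dataclass
class TurnFeedback:
    therapist_utterance: str
    patient_reply: str
    act_consistent: bool
    comment: str


@dataclass
class Session:
    history: History = field(default_factory=list)
    feedback: List[TurnFeedback] = field(default_factory=list)


def run_turn(
    session: Session,
    therapist_utterance: str,
    patient_model: Callable[[History], str],
    evaluator: Callable[[History, str], Tuple[bool, str]],
) -> TurnFeedback:
    """One therapist turn: rate it, then let the simulated patient answer."""
    # Rate the therapist turn against the context that preceded it.
    consistent, comment = evaluator(session.history, therapist_utterance)
    session.history.append(("therapist", therapist_utterance))
    # The simulated patient sees the full history, including the new turn.
    reply = patient_model(session.history)
    session.history.append(("patient", reply))
    fb = TurnFeedback(therapist_utterance, reply, consistent, comment)
    session.feedback.append(fb)
    return fb


# Hypothetical stand-in for the LLM-simulated patient.
def toy_patient(history: History) -> str:
    return "I just keep thinking I'll mess it up."


# Hypothetical placeholder cues; NOT the ACT fidelity criteria used in the paper.
ACT_CUES = ("notice", "willing", "values", "right now")


# Hypothetical stand-in for the LLM fidelity evaluator (simple keyword match).
def toy_evaluator(history: History, utterance: str) -> Tuple[bool, str]:
    hits = [cue for cue in ACT_CUES if cue in utterance.lower()]
    if hits:
        return True, "Uses ACT-process language: " + ", ".join(hits)
    return False, "No ACT-process cue detected; consider an acceptance or values move."
```

Keeping the patient and the evaluator as separate callables mirrors the paper's separation of the simulated patient from the fidelity evaluator, so either model can be swapped without touching the other.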

If this is right

  • Therapists gain immediate awareness of intervention choices during practice sessions.
  • Experimentation with alternative responses becomes feasible in a low-risk environment.
  • Training can occur without the logistical limits of scheduling real patients or supervisors.
  • Automated feedback can approximate human supervisor ratings with low error using suitable models like GPT-4o-mini.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Such systems might eventually allow measurement of skill improvement over multiple sessions with the avatar.
  • Extending the patient profiles to include diverse cultural or demographic backgrounds could address access issues in training.
  • The feedback mechanism could be adapted for other evidence-based therapies if fidelity criteria are defined similarly.

Load-bearing premise

Large language models conditioned on real therapy profiles can generate patient responses realistic enough to stand in for human patients, and the automated evaluator can judge ACT fidelity without systematic bias compared to human supervisors.

What would settle it

A controlled study comparing therapist fidelity and client outcomes in real sessions after training with the AI system versus traditional supervision.

Figures

Figures reproduced from arXiv: 2606.17786 by Andrew Gloster, Pascal Riachi, Rafael Wampfler, Sofie Kamber, Stella Brogna.

Figure 1. System architecture of the psychotherapy training application. Therapist utterances are transcribed from audio and evaluated for ACT-consistency by …
Figure 2. Automated ACT-aligned feedback interface. (a) ACT-consistent feed…
Figure 3. LLM agreement with human supervisor ratings on partial transcripts.
Figure 4. Trade-off between inference speed and rating accuracy for six LLMs.
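Figure 4 frames the choice of feedback model as a trade-off between inference speed and rating accuracy. One standard way to read such a plot is to keep only the Pareto-optimal models: those for which no other model is both at least as fast and at least as accurate. The sketch below is an illustration under that assumption; the model names, latencies, and MAE values are hypothetical placeholders, not numbers from the paper.

```python
from typing import Dict, List, Tuple


def pareto_front(models: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return names of models not dominated on (latency, MAE); lower is better for both."""
    front = []
    for name, (lat, mae) in models.items():
        # A model is dominated if another is no worse on both axes and strictly better on one.
        dominated = any(
            l2 <= lat and m2 <= mae and (l2 < lat or m2 < mae)
            for other, (l2, m2) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (seconds per rating, MAE) pairs for illustration only.
example = {
    "model-a": (0.8, 7.0),
    "model-b": (1.5, 6.2),
    "model-c": (2.0, 6.5),  # dominated by model-b
    "model-d": (0.9, 8.0),  # dominated by model-a
}
print(pareto_front(example))
```

For a live training loop, the final pick from the front depends on how much feedback latency a spoken conversation can tolerate.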
Original abstract

Training psychotherapists in evidence-based interventions such as Acceptance and Commitment Therapy (ACT) requires repeated practice with meaningful feedback, yet opportunities for safe, standardized training are limited by ethical, logistical, and resource constraints. We introduce a system designed to support ACT-oriented psychotherapy training through spoken dialogue with an embodied virtual patient. The system uses large language models to simulate patient behavior conditioned on profiles derived from real therapy sessions and configurable clinical scenarios, while a separate automated evaluator provides turn-by-turn feedback on therapist responses based on established ACT fidelity criteria. Rather than aiming to replace supervision, the system is intended to support deliberate practice by enabling experimentation, reflection, and immediate feedback in low-risk settings. Expert evaluation with practicing psychologists confirmed high realism in patient behavior and demonstrated that immediate turn-by-turn ACT feedback increased therapists' awareness of intervention choices and enabled effective experimentation with alternative responses. Quantitative evaluation across 49 therapy transcripts identified GPT-4o-mini as the optimal feedback model, achieving the lowest mean absolute error (MAE = 6.12) in replicating human supervisor ACT fidelity ratings with statistically significant agreement. This work demonstrates the potential of fidelity-aware simulated patients as a scalable complement to psychotherapy training.
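The model-selection criterion in the abstract (lowest mean absolute error against human supervisor fidelity ratings) can be sketched as follows. This assumes one numeric fidelity score per transcript from each rater; the rating scale and aggregation the paper uses are not stated here, and the scores below are hypothetical, not the paper's data.

```python
from typing import Dict, List, Tuple


def mean_absolute_error(pred: List[float], ref: List[float]) -> float:
    """Average absolute difference between model scores and human reference scores."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def select_model(
    model_scores: Dict[str, List[float]], human: List[float]
) -> Tuple[str, Dict[str, float]]:
    """Compute each model's MAE against the human ratings and pick the lowest."""
    maes = {name: mean_absolute_error(s, human) for name, s in model_scores.items()}
    best = min(maes, key=maes.get)
    return best, maes


# Hypothetical per-transcript fidelity totals, for illustration only.
human = [40, 55, 62, 48]
scores = {
    "model-x": [44, 50, 60, 50],
    "model-y": [30, 60, 70, 40],
}
best, maes = select_model(scores, human)
print(best, maes)
```

MAE is reported in the units of the rating scale, which is why the meaning of "MAE = 6.12" depends on the range of the fidelity measure.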

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a system for training psychotherapists in Acceptance and Commitment Therapy (ACT) via spoken dialogue with embodied virtual patient avatars simulated by large language models conditioned on profiles from real therapy sessions. A separate automated evaluator supplies turn-by-turn feedback on therapist responses according to ACT fidelity criteria. The authors report that expert evaluation with practicing psychologists confirmed high realism in patient behavior and that the feedback increased awareness of intervention choices; quantitative evaluation on 49 therapy transcripts identified GPT-4o-mini as optimal, with MAE = 6.12 and statistically significant agreement to human supervisor ratings.

Significance. If the interactive components perform as claimed, the work offers a scalable, low-risk platform for deliberate practice that could complement traditional supervision under resource constraints. The integration of profile-conditioned simulation with fidelity-aware automated feedback represents a concrete step toward accessible training tools; however, the reported evidence is limited to offline transcript analysis and qualitative expert statements, so the practical significance remains conditional on further validation of the live spoken-dialogue loop.

major comments (2)
  1. [Quantitative evaluation] (abstract and associated results): The MAE = 6.12 result and the claim of statistically significant agreement are obtained exclusively on 49 static therapy transcripts. No data are supplied on evaluator performance inside the multi-turn spoken, embodied interaction loop, where patient responses evolve, speech input is used, and context shifts. This gap directly undermines the central claim that the system supports effective real-time experimentation and feedback.
  2. [Expert evaluation] (abstract): The statements that experts 'confirmed high realism in patient behavior' and that feedback 'increased therapists' awareness' are presented without protocol details, the number of psychologists, rating scales, or inter-rater metrics. Because the realism of the LLM-conditioned avatar in spoken dialogue is a load-bearing premise for the training utility, the absence of these specifics prevents assessment of whether the qualitative evidence supports the claim.
minor comments (1)
  1. [Abstract] The abstract states 'statistically significant agreement' but does not name the test, degrees of freedom, or exact p-value; adding this information would improve interpretability of the MAE result.
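To make the minor comment concrete: the paper does not name its significance test, so the following is only one possible choice, shown as a sketch. It computes Pearson's r between LLM and human ratings and a two-sided permutation p-value. Note that correlation measures association rather than absolute agreement, so it complements MAE rather than replacing it; an intraclass correlation would be another common option. `pearson_r` and `permutation_p_value` are hypothetical helpers, not the authors' code.

```python
import random
from statistics import mean
from typing import List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(
    x: List[float], y: List[float], n_perm: int = 5000, seed: int = 0
) -> Tuple[float, float]:
    """Two-sided permutation test of r: shuffle y to break any pairing with x."""
    rng = random.Random(seed)
    observed = pearson_r(x, y)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed pairing itself, so p is never exactly 0.
    return observed, (extreme + 1) / (n_perm + 1)
```

Reporting r, the number of transcripts, and the exact p-value from a procedure like this would address the referee's request.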

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments, which identify key gaps in our evaluation reporting. We respond point by point below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Quantitative evaluation] (abstract and associated results): The MAE = 6.12 result and the claim of statistically significant agreement are obtained exclusively on 49 static therapy transcripts. No data are supplied on evaluator performance inside the multi-turn spoken, embodied interaction loop, where patient responses evolve, speech input is used, and context shifts. This gap directly undermines the central claim that the system supports effective real-time experimentation and feedback.

    Authors: We agree that the quantitative results are derived solely from offline transcript analysis and provide no direct evidence on evaluator performance during live multi-turn spoken interactions. The transcript-based evaluation was chosen as a feasible initial proxy for model selection given practical constraints on live data collection. We will revise the abstract, results, and discussion sections to explicitly delimit the scope of the current evaluation, remove or qualify any overstatements about real-time support, and add a limitations subsection calling for future live-loop studies. No new live-interaction data will be added in this revision.
    Revision: partial.

  2. Referee: [Expert evaluation] (abstract): The statements that experts 'confirmed high realism in patient behavior' and that feedback 'increased therapists' awareness' are presented without protocol details, the number of psychologists, rating scales, or inter-rater metrics. Because the realism of the LLM-conditioned avatar in spoken dialogue is a load-bearing premise for the training utility, the absence of these specifics prevents assessment of whether the qualitative evidence supports the claim.

    Authors: The full manuscript contains a dedicated expert-evaluation subsection with protocol details, participant numbers, rating instruments, and inter-rater statistics. These were omitted from the abstract for brevity. We will expand the abstract to include the key quantitative descriptors of the expert study and ensure the methods section supplies complete, reproducible protocol information.
    Revision: yes.

standing simulated objections not resolved
  • No empirical data exist on automated-evaluator performance inside the live spoken-dialogue loop; this cannot be supplied without new experiments.

Circularity Check

0 steps flagged

No significant circularity detected in evaluation chain

full rationale

The paper's central quantitative result compares the automated evaluator's ACT fidelity scores directly against independent human supervisor ratings on 49 external therapy transcripts (MAE=6.12, statistically significant agreement), providing an external benchmark rather than any self-referential fit or prediction. Patient simulation is described as conditioned on profiles from real sessions without definitional circularity, and no load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The derivation remains self-contained via direct comparison to external human judgments.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on LLM capabilities for realistic simulation and automated evaluation accuracy; no explicit free parameters are described. The work assumes LLMs can be effectively conditioned on real-session profiles without detailing how this conditioning is implemented or validated.
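Because the conditioning step is not detailed, the following is only one plausible approach, shown for illustration: render a structured patient profile and a clinical scenario into a system prompt for the patient model. The `PatientProfile` fields and `build_system_prompt` are hypothetical; they do not reflect the paper's profile schema, and no real-session data is involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatientProfile:
    # Hypothetical fields; the paper's actual profile schema is not described.
    age: int
    presenting_problem: str
    avoidance_patterns: List[str]
    stated_values: List[str]


def build_system_prompt(profile: PatientProfile, scenario: str) -> str:
    """Render a profile plus a configurable scenario into role-play instructions."""
    avoidance = "; ".join(profile.avoidance_patterns) or "none recorded"
    values = "; ".join(profile.stated_values) or "not yet articulated"
    return (
        "You are role-playing a psychotherapy patient in a training session. "
        "Stay in character and do not give the therapist advice.\n"
        f"Age: {profile.age}\n"
        f"Presenting problem: {profile.presenting_problem}\n"
        f"Typical avoidance: {avoidance}\n"
        f"Stated values: {values}\n"
        f"Session scenario: {scenario}\n"
    )


profile = PatientProfile(34, "persistent worry about work performance", ["postpones difficult emails"], [])
print(build_system_prompt(profile, "first session after referral"))
```

Separating the profile from the scenario string reflects the abstract's distinction between profiles derived from real sessions and configurable clinical scenarios; validating that the resulting behavior is realistic remains the open question the ledger notes.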

axioms (2)
  • domain assumption: Large language models can be conditioned on profiles derived from real therapy sessions to simulate realistic patient behavior in spoken dialogue.
    Invoked in the system description for patient simulation.
  • domain assumption: An automated evaluator can provide turn-by-turn feedback that aligns with established ACT fidelity criteria as judged by human supervisors.
    Central to the quantitative evaluation claim.
invented entities (1)
  • Embodied virtual patient avatar (no independent evidence)
    purpose: To enable interactive, low-risk practice sessions for ACT training
    Core new component of the proposed system.

pith-pipeline@v0.9.1-grok · 5745 in / 1580 out tokens · 38077 ms · 2026-06-26T23:08:34.711204+00:00 · methodology

