Toward Accessible Psychotherapy Training Using AI-Driven Interactive Patient Avatars
Pith reviewed 2026-06-26 23:08 UTC · model grok-4.3
The pith
Large language models can simulate realistic therapy patients and provide accurate feedback on Acceptance and Commitment Therapy interventions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that large language models can simulate patient behavior in therapy dialogues and automatically assess therapist adherence to ACT fidelity criteria in a manner that aligns with human expert judgments. The resulting system supports deliberate practice through immediate feedback.
What carries the argument
LLM-simulated patients conditioned on real therapy profiles paired with an automated ACT fidelity evaluator that provides turn-by-turn feedback.
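The pairing described above can be pictured as a simple turn loop. The Python sketch below is illustrative only: the paper's implementation is not reproduced on this page, so `PracticeSession`, `stub_patient`, and `stub_evaluator` are hypothetical names, and the two stubs stand in for whatever LLM calls the real system makes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Context = List[Tuple[str, str]]  # (therapist utterance, patient reply) pairs


@dataclass
class Turn:
    therapist: str
    patient: str
    feedback: str


@dataclass
class PracticeSession:
    """Hypothetical container pairing a simulated patient with an evaluator."""
    profile: str
    patient_model: Callable[[str, Context, str], str]
    evaluator: Callable[[Context, str], str]
    history: List[Turn] = field(default_factory=list)

    def step(self, therapist_utterance: str) -> Turn:
        # Both components see the dialogue so far plus the new utterance.
        context = [(t.therapist, t.patient) for t in self.history]
        feedback = self.evaluator(context, therapist_utterance)
        reply = self.patient_model(self.profile, context, therapist_utterance)
        turn = Turn(therapist_utterance, reply, feedback)
        self.history.append(turn)
        return turn


def stub_patient(profile: str, context: Context, utterance: str) -> str:
    # Placeholder for an LLM call conditioned on the patient profile.
    return f"[{profile}] reply {len(context) + 1}"


def stub_evaluator(context: Context, utterance: str) -> str:
    # Placeholder for an LLM call that scores the utterance against ACT criteria.
    return "question" if utterance.strip().endswith("?") else "statement"


session = PracticeSession("demo profile", stub_patient, stub_evaluator)
first = session.step("What matters most to you right now?")
second = session.step("That sounds difficult.")
```

Keeping the patient and the evaluator as separate callables mirrors the description of a "separate automated evaluator"; whether the real system shares context between them in this way is an assumption.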
If this is right
- Therapists gain immediate awareness of intervention choices during practice sessions.
- Experimentation with alternative responses becomes feasible in a low-risk environment.
- Training can occur without the logistical limits of scheduling real patients or supervisors.
- Automated feedback can approximate human supervisor ratings when a suitable model is used; GPT-4o-mini achieved the lowest mean absolute error (MAE = 6.12) across 49 transcripts.
Where Pith is reading between the lines
- Such systems might eventually allow measurement of skill improvement over multiple sessions with the avatar.
- Extending the patient profiles to include diverse cultural or demographic backgrounds could address access issues in training.
- The feedback mechanism could be adapted for other evidence-based therapies if fidelity criteria are defined similarly.
Load-bearing premise
Large language models conditioned on real therapy profiles can generate patient responses realistic enough to stand in for human patients, and the automated evaluator can judge ACT fidelity without systematic bias compared to human supervisors.
What would settle it
A controlled study comparing therapist fidelity and client outcomes in real sessions after training with the AI system versus traditional supervision.
Original abstract
Training psychotherapists in evidence-based interventions such as Acceptance and Commitment Therapy (ACT) requires repeated practice with meaningful feedback, yet opportunities for safe, standardized training are limited by ethical, logistical, and resource constraints. We introduce a system designed to support ACT-oriented psychotherapy training through spoken dialogue with an embodied virtual patient. The system uses large language models to simulate patient behavior conditioned on profiles derived from real therapy sessions and configurable clinical scenarios, while a separate automated evaluator provides turn-by-turn feedback on therapist responses based on established ACT fidelity criteria. Rather than aiming to replace supervision, the system is intended to support deliberate practice by enabling experimentation, reflection, and immediate feedback in low-risk settings. Expert evaluation with practicing psychologists confirmed high realism in patient behavior and demonstrated that immediate turn-by-turn ACT feedback increased therapists' awareness of intervention choices and enabled effective experimentation with alternative responses. Quantitative evaluation across 49 therapy transcripts identified GPT-4o-mini as the optimal feedback model, achieving the lowest mean absolute error (MAE = 6.12) in replicating human supervisor ACT fidelity ratings with statistically significant agreement. This work demonstrates the potential of fidelity-aware simulated patients as a scalable complement to psychotherapy training.
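The model-selection criterion mentioned in the abstract (lowest mean absolute error against human supervisor ratings) can be sketched in a few lines of Python. The model names and ratings below are made-up illustrative values, not data from the paper, and `select_model` is a hypothetical helper; the rating scale is likewise assumed.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired ratings."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def select_model(candidate_scores, human_scores):
    """Return the candidate with the lowest MAE, plus all per-model errors."""
    errors = {
        name: mean_absolute_error(scores, human_scores)
        for name, scores in candidate_scores.items()
    }
    best = min(errors, key=errors.get)
    return best, errors


# Illustrative (invented) fidelity ratings for four transcripts.
human = [40.0, 55.0, 62.0, 30.0]
candidates = {
    "model_a": [45.0, 50.0, 60.0, 38.0],
    "model_b": [60.0, 40.0, 80.0, 10.0],
}
best, errors = select_model(candidates, human)
```

MAE is expressed in the units of the rating scale, so a figure such as 6.12 can only be interpreted once the scale's range is known.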
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a system for training psychotherapists in Acceptance and Commitment Therapy (ACT) via spoken dialogue with embodied virtual patient avatars simulated by large language models conditioned on profiles from real therapy sessions. A separate automated evaluator supplies turn-by-turn feedback on therapist responses according to ACT fidelity criteria. The authors report that expert evaluation with practicing psychologists confirmed high realism in patient behavior and that the feedback increased awareness of intervention choices; quantitative evaluation on 49 therapy transcripts identified GPT-4o-mini as optimal, with MAE = 6.12 and statistically significant agreement to human supervisor ratings.
Significance. If the interactive components perform as claimed, the work offers a scalable, low-risk platform for deliberate practice that could complement traditional supervision under resource constraints. The integration of profile-conditioned simulation with fidelity-aware automated feedback represents a concrete step toward accessible training tools; however, the reported evidence is limited to offline transcript analysis and qualitative expert statements, so the practical significance remains conditional on further validation of the live spoken-dialogue loop.
major comments (2)
- [Quantitative evaluation] Quantitative evaluation (abstract and associated results): the MAE = 6.12 result and claim of statistically significant agreement are obtained exclusively on 49 static therapy transcripts. No data are supplied on evaluator performance inside the multi-turn spoken, embodied interaction loop where patient responses evolve, speech input is used, and context shifts; this gap directly undermines the central claim that the system supports effective real-time experimentation and feedback.
- [Expert evaluation] Expert evaluation (abstract): the statements that experts 'confirmed high realism in patient behavior' and that feedback 'increased therapists' awareness' are presented without any protocol details, number of psychologists, rating scales, or inter-rater metrics. Because realism of the LLM-conditioned avatar in spoken dialogue is a load-bearing premise for the training utility, the absence of these specifics prevents assessment of whether the qualitative evidence supports the claim.
minor comments (1)
- [Abstract] The abstract states 'statistically significant agreement' but does not name the test, degrees of freedom, or exact p-value; adding this information would improve interpretability of the MAE result.
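As an illustration of the kind of test that could be named, the Python sketch below runs a two-sided permutation test on the Pearson correlation between evaluator scores and supervisor ratings. This page does not say which test the authors used, so the choice of test is an assumption, and the ratings are invented for the example.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the correlation."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Invented ratings: six transcripts scored by a supervisor and by a model.
human = [10, 20, 30, 40, 50, 60]
model = [12, 19, 33, 41, 48, 62]
r = pearson_r(human, model)
p = permutation_p_value(human, model)
```

Reporting the statistic, the test name, and the p-value together, alongside the MAE, would address the point raised in this comment.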
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which identify key gaps in our evaluation reporting. We respond point by point below and indicate planned revisions.
- Referee: [Quantitative evaluation] Quantitative evaluation (abstract and associated results): the MAE = 6.12 result and claim of statistically significant agreement are obtained exclusively on 49 static therapy transcripts. No data are supplied on evaluator performance inside the multi-turn spoken, embodied interaction loop where patient responses evolve, speech input is used, and context shifts; this gap directly undermines the central claim that the system supports effective real-time experimentation and feedback.
  Authors: We agree that the quantitative results are derived solely from offline transcript analysis and provide no direct evidence on evaluator performance during live multi-turn spoken interactions. The transcript-based evaluation was chosen as a feasible initial proxy for model selection given practical constraints on live data collection. We will revise the abstract, results, and discussion sections to explicitly delimit the scope of the current evaluation, remove or qualify any overstatements about real-time support, and add a limitations subsection calling for future live-loop studies. No new live-interaction data will be added in this revision.
  Revision: partial
- Referee: [Expert evaluation] Expert evaluation (abstract): the statements that experts 'confirmed high realism in patient behavior' and that feedback 'increased therapists' awareness' are presented without any protocol details, number of psychologists, rating scales, or inter-rater metrics. Because realism of the LLM-conditioned avatar in spoken dialogue is a load-bearing premise for the training utility, the absence of these specifics prevents assessment of whether the qualitative evidence supports the claim.
  Authors: The full manuscript contains a dedicated expert-evaluation subsection with protocol details, participant numbers, rating instruments, and inter-rater statistics. These were omitted from the abstract for brevity. We will expand the abstract to include the key quantitative descriptors of the expert study and ensure the methods section supplies complete, reproducible protocol information.
  Revision: yes
- No empirical data exist on automated-evaluator performance inside the live spoken-dialogue loop; this cannot be supplied without new experiments.
Circularity Check
No significant circularity detected in evaluation chain
The paper's central quantitative result compares the automated evaluator's ACT fidelity scores directly against independent human supervisor ratings on 49 external therapy transcripts (MAE=6.12, statistically significant agreement), providing an external benchmark rather than any self-referential fit or prediction. Patient simulation is described as conditioned on profiles from real sessions without definitional circularity, and no load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The derivation remains self-contained via direct comparison to external human judgments.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Large language models can be conditioned on profiles derived from real therapy sessions to simulate realistic patient behavior in spoken dialogue.
- domain assumption: An automated evaluator can provide turn-by-turn feedback that aligns with established ACT fidelity criteria as judged by human supervisors.
invented entities (1)
- Embodied virtual patient avatar: no independent evidence