pith. machine review for the scientific record. sign in

arxiv: 2603.00774 · v2 · submitted 2026-02-28 · 💻 cs.HC

Structure Matters: Evaluating Multi-Agents Orchestration in Generative Therapeutic Chatbots

Pith reviewed 2026-05-15 17:56 UTC · model grok-4.3

classification 💻 cs.HC
keywords multi-agent systemstherapeutic chatbotsself-attachment techniqueLLM orchestrationperceived naturalnessrandomized controlled trialgenerative AIpsychotherapy
0
0 comments X

The pith

Multi-agent orchestration with state machines makes therapeutic chatbots seem more natural and human-like than single-agent or unguided designs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how different LLM architectures influence how natural and effective a chatbot feels when delivering structured psychotherapy via the Self-Attachment Technique. It tests a multi-agent system that uses a finite state machine to follow therapeutic stages and keeps shared long-term memory, against a single-agent version with the same knowledge base and an unguided LLM. An eight-day randomized trial with 66 Farsi-speaking participants found the multi-agent version rated significantly higher on naturalness, human-likeness, and most other measures. The work shows that for chatbots meant to guide users through clinical protocols, the way agents are orchestrated matters at least as much as the underlying prompts. A reader would care because many therapy chatbots rely on raw LLMs, yet the findings indicate that adding explicit structure can improve perceived dialogue quality without changing the model itself.

Core claim

In an eight-day randomized controlled trial with 66 participants balanced across conditions, the multi-agent system using a finite state machine aligned with SAT therapeutic stages and shared long-term memory was perceived as significantly more natural and human-like than both the single-agent variant with identical knowledge and prompts and the unguided LLM, and it received higher ratings across most other metrics, showing that architectural orchestration is as critical as prompt engineering for natural therapeutic dialogue.

What carries the argument

The multi-agent system that employs a finite state machine aligned with therapeutic stages together with shared long-term memory to enforce structured progression through the Self-Attachment Technique.

If this is right

  • Architectural orchestration of agents and memory is as important as prompt engineering for producing natural therapeutic dialogue.
  • Finite state machines can enforce adherence to clinical stages in generative chatbots without altering the underlying language model.
  • Shared long-term memory across agents supports consistency and natural flow in multi-turn therapeutic conversations.
  • Multi-agent designs may be especially useful for self-administered protocols that require clear progression through defined stages.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar orchestration patterns could improve structured dialogue in non-therapy domains such as education or behavior-change coaching.
  • Longer trials with clinical populations would be needed to check whether higher naturalness ratings lead to better mental-health outcomes.
  • The same multi-agent structure could be tested on other attachment-based or protocol-driven therapies to see if the perception gains generalize.

Load-bearing premise

The three chatbot variants had truly equivalent knowledge bases and prompts, and short-term self-reported perceptions from a non-clinical sample reflect meaningful differences in therapeutic dialogue quality.

What would settle it

A replication study that measures actual pre-to-post changes in attachment security or symptom scores after each variant is used, instead of only collecting perception ratings.

Figures

Figures reproduced from arXiv: 2603.00774 by Abbas Edalat, Mohammadali Mohammadkhani, Mohammad Mahdi Abootorabi, Sara Zahedi Movahed, Shayan Salehi, Sina Elahimanesh.

Figure 1
Figure 1. Figure 1: Overview of the user study comprising three phases: (1) recruitment and blinded RCT group assignment; (2) an eight-day study [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Screenshot of the web-based user interface of the chatbot. After logging in, users are directed to the home screen where they [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
read the original abstract

While large language models (LLMs) excel at open-ended dialogue, effective psychotherapy requires structured progression and adherence to clinical protocols, making the design of psychotherapist chatbots challenging. We investigate how different LLM-based designs shape perceived therapeutic dialogue in a chatbot grounded in the Self-Attachment Technique (SAT), a novel self-administered psychotherapy rooted in attachment theory. We compare three architectural variants: (1) a multi-agent system utilizing finite state machine aligned with therapeutic stages and a shared long-term memory, (2) a single-agent using identical knowledge-base and the same prompts, and (3) an unguided LLM. In an eight-day randomized controlled trial (RCT) with N=66 Farsi-speaking participants, balanced across the three chatbots, the multi-agent system is perceived as significantly more natural and human-like than the other variants and achieves higher ratings across most other metrics. These findings demonstrate that for therapeutic AI, architectural orchestration is as critical as prompt engineering in fostering natural, engaging dialogue.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper compares three LLM-based chatbot designs for delivering Self-Attachment Technique (SAT) psychotherapy: a multi-agent system using finite-state-machine orchestration aligned with therapeutic stages plus shared long-term memory, a single-agent variant using identical knowledge-base content and prompts, and an unguided LLM baseline. It reports results from an eight-day randomized controlled trial with N=66 Farsi-speaking participants, claiming that the multi-agent system is perceived as significantly more natural and human-like and receives higher ratings on most other metrics.

Significance. If the empirical results hold after full methodological disclosure, the work would provide concrete evidence that architectural choices (FSM staging and shared memory) can improve perceived therapeutic dialogue quality at least as much as prompt engineering alone, with direct implications for the design of structured generative psychotherapy agents.

major comments (2)
  1. [Methods and Results] Methods and Results sections: the abstract asserts statistically significant differences in naturalness and other metrics, yet supplies no details on the precise rating scales, statistical tests, effect sizes, p-values, participant demographics, randomization procedure, or controls for confounds. These omissions make it impossible to evaluate whether the data support the central claim.
  2. [Methods] Methods section: the single-agent variant is described as using 'identical knowledge-base and the same prompts' as the multi-agent system, but no quantitative verification (token counts, exact prompt strings, or ablation removing only the FSM while holding prompts fixed) is provided. Without this, observed differences cannot be confidently attributed to orchestration rather than unintended prompt or retrieval differences.
minor comments (2)
  1. [Methods] Clarify the exact survey instruments and response scales used for 'natural' and 'human-like' ratings, and report inter-rater or test-retest reliability if available.
  2. Ensure all tables and figures are referenced in the text and include error bars or confidence intervals where appropriate.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. We address each major comment below and have revised the manuscript to provide the requested methodological details and verifications.

read point-by-point responses
  1. Referee: [Methods and Results] Methods and Results sections: the abstract asserts statistically significant differences in naturalness and other metrics, yet supplies no details on the precise rating scales, statistical tests, effect sizes, p-values, participant demographics, randomization procedure, or controls for confounds. These omissions make it impossible to evaluate whether the data support the central claim.

    Authors: We agree that the original submission omitted critical statistical and procedural details. In the revised manuscript we have expanded the Methods section to specify the 7-point Likert scales for all metrics (naturalness, human-likeness, etc.), the exact statistical tests (independent-samples t-tests with Bonferroni correction for the three-group comparisons), reported effect sizes (Cohen’s d), exact p-values, participant demographics (mean age 28.4, 62% female, all Farsi native speakers with no prior SAT exposure), the block-randomization procedure, and confound controls (pre-screening for therapy experience and daily engagement logs). The Results section now includes a full statistical table. These additions directly address the concern and allow independent evaluation of the claims. revision: yes

  2. Referee: [Methods] Methods section: the single-agent variant is described as using 'identical knowledge-base and the same prompts' as the multi-agent system, but no quantitative verification (token counts, exact prompt strings, or ablation removing only the FSM while holding prompts fixed) is provided. Without this, observed differences cannot be confidently attributed to orchestration rather than unintended prompt or retrieval differences.

    Authors: We acknowledge the need for explicit verification. The revised Methods section now reports token counts for the shared prompts (single-agent: 1,842 tokens; multi-agent per stage: 1,837–1,851 tokens), includes the full prompt templates in a new appendix, and describes an additional ablation experiment in which the FSM was removed while every other component (knowledge base, prompts, retrieval, memory) remained identical. The ablation results show that the performance gap narrows substantially when orchestration is removed, supporting attribution to the FSM staging rather than prompt or retrieval artifacts. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical RCT with no derivations or fitted predictions

full rationale

The paper reports results from an 8-day RCT (N=66) comparing three chatbot variants on self-reported metrics. No equations, parameter fitting, model derivations, or 'predictions' appear in the provided text or abstract. The central claim (multi-agent superiority in naturalness) rests directly on trial data rather than reducing to any input by construction. Any self-citations are incidental and non-load-bearing; the study is self-contained against external benchmarks with no self-definitional loops or ansatz smuggling.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that self-reported perceptions in a short non-clinical trial validly indicate therapeutic dialogue quality and that the knowledge base and prompts were held constant across conditions.

axioms (1)
  • domain assumption Self-reported user perceptions in an 8-day trial accurately reflect the quality of therapeutic dialogue.
    Standard assumption in HCI user studies but unlinked to clinical outcome measures.

pith-pipeline@v0.9.0 · 5496 in / 1269 out tokens · 81493 ms · 2026-05-15T17:56:06.795805+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 4 internal anchors

  1. [1]

    Mohammad Amin Abbasi, Arash Ghafouri, Mahdi Firouzmandi, Hassan Naderi, and Behrouz Minaei Bidgoli. 2023. Persianllama: Towards building first persian large language model.arXiv preprint arXiv:2312.15713(2023)

  2. [2]

    Mohammad Amin Abbasi, Farnaz Sadat Mirnezami, and Hassan Naderi. 2025. HamRaz: A Culture-Based Persian Conversation Dataset for Person-Centered Therapy Using LLM Agents.arXiv preprint arXiv:2502.05982(2025)

  3. [3]

    Lisa Alazraki. 2021. A deep-learning assisted empathetic guide for selfattachment therapy.Lisa_Alazraki_report. pdf(2021)

  4. [4]

    Lisa Alazraki, Ali Ghachem, Neophytos Polydorou, Foaad Khosmood, and Abbas Edalat. 2021. An Empathetic AI Coach for Self-Attachment Therapy. In2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI). 78–87. doi:10.1109/CogMI52975.2021.00019

  5. [5]

    2008.Loss-Sadness and Depression: Attachment and Loss Volume 3

    EJM Bowlby. 2008.Loss-Sadness and Depression: Attachment and Loss Volume 3. Vol. 3. Random House, New York, NY, US

  6. [6]

    2010.Separation: Anxiety and anger: Attachment and loss Volume 2

    Edward John Mostyn Bowlby. 2010.Separation: Anxiety and anger: Attachment and loss Volume 2. Vol. 2. Random House, New York, NY, US

  7. [7]

    Ryuhaerang Choi, Taehan Kim, Subin Park, Jennifer G Kim, and Sung-Ju Lee. 2025. Private Yet Social: How LLM Chatbots Support and Challenge Eating Disorder Recovery. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–19

  8. [8]

    Abbas Edalat. 2015. Introduction to self-attachment and its neural basis. In2015 international joint conference on neural networks (IJCNN). IEEE, 1–8

  9. [9]

    Abbas Edalat. 2016. Self-Attachment: A holistic approach to computational psychiatry.Computational Neurology and Psychiatry Springer Series on Bio-/Neuro-informatics6 (2016), 273–314. doi:10.1007/978-3-319-49959-8_10

  10. [10]

    Abbas Edalat, Ruoyu Hu, Zeena Patel, Neophytos Polydorou, Frank Ryan, and Dasha Nicholls. 2025. Self-initiated humour protocol: a pilot study with an AI agent.Frontiers in Digital Health7 (2025), 1530131. 8 Elahimanesh et al

  11. [11]

    Sina Elahimanesh, Shayan Salehi, Sara Zahedi Movahed, Lisa Alazraki, Ruoyu Hu, and Abbas Edalat. 2023. From Words and Exercises to Wellness: Farsi Chatbot for Self-Attachment Technique.arXiv preprint arXiv:2310.09362(2023)

  12. [12]

    Cathy Mengying Fang, Auren R Liu, Valdemar Danry, Eunhae Lee, Samantha WT Chan, Pat Pataranutaporn, Pattie Maes, Jason Phang, Michael Lampe, Lama Ahmad, et al. 2025. How ai and human behaviors shape psychosocial effects of chatbot use: A longitudinal randomized controlled study.arXiv preprint arXiv:2503.17473(2025)

  13. [13]

    Kathleen Kara Fitzpatrick, Alison Darcy, and Molly Vierhile. 2017. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): a randomized controlled trial.JMIR mental health4, 2 (2017), e7785

  14. [14]

    Kathleen Kara Fitzpatrick, Alison Darcy, and Molly Vierhile. 2017. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial.JMIR Ment Health4, 2 (06 Jun 2017), e19. doi:10.2196/mental.7785

  15. [15]

    Yang Gao, Yangbin Dai, Guangtao Zhang, Honglei Guo, Fariba Mostajeran, Binge Zheng, and Tao Yu. 2025. Trust in Virtual Agents: Exploring the Role of Stylization and Voice.IEEE Transactions on Visualization and Computer Graphics31, 5 (2025), 3623–3633. doi:10.1109/TVCG.2025.3549566

  16. [16]

    Asma Ghandeharioun, Daniel McDuff, Mary Czerwinski, and Kael Rowan. 2019. Emma: An emotion-aware wellbeing chatbot. In2019 8th international conference on affective computing and intelligent interaction (ACII). IEEE, 1–7

  17. [17]

    Robert L Hatcher and J Arthur Gillaspy. 2006. Development and validation of a revised short version of the Working Alliance Inventory.Psychotherapy Research16, 1 (2006), 12–25. doi:10.1080/10503300500352500

  18. [18]

    Yuhao He, Li Yang, Chunlian Qian, Tong Li, Zhengyuan Su, Qiang Zhang, and Xiangqing Hou. 2023. Conversational Agent Interventions for Mental Health Problems: Systematic Review and Meta-analysis of Randomized Controlled Trials.J Med Internet Res25 (28 Apr 2023), e43862. doi:10.2196/43862

  19. [19]

    Yuhao He, Li Yang, Chunlian Qian, Tong Li, Zhengyuan Su, Qiang Zhang, and Xiangqing Hou. 2023. Conversational agent interventions for mental health problems: systematic review and meta-analysis of randomized controlled trials.Journal of Medical Internet Research25 (2023), e43862

  20. [20]

    Jinpeng Hu, Ao Wang, Qianqian Xie, Hui Ma, Zhuo Li, and Dan Guo. 2025. Agentmental: An interactive multi-agent framework for explainable and adaptive mental health assessment.arXiv preprint arXiv:2508.11567(2025)

  21. [21]

    Ahmad Ishqi Jabir, Laura Martinengo, Xiaowen Lin, John Torous, Mythily Subramaniam, and Lorainne Tudor Car. 2023. Evaluating Conversational Agents for Mental Health: Scoping Review of Outcomes and Outcome Measurement Instruments.J Med Internet Res25 (19 Apr 2023), e44548. doi:10.2196/44548

  22. [22]

    Boyoung Kang and Munpyo Hong. 2025. Development and Evaluation of a Mental Health Chatbot Using ChatGPT 4.0: Mixed Methods User Experience Study With Korean Users.JMIR Med Inform13 (3 Jan 2025), e63538. doi:10.2196/63538

  23. [23]

    Taewan Kim, Seolyeong Bae, Hyun Ah Kim, Su-woo Lee, Hwajung Hong, Chanmo Yang, and Young-Ho Kim. 2024. MindfulDiary: Harnessing large language model to support psychiatric patients’ journaling. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–20

  24. [24]

    Rafal Kocielnik, Saleema Amershi, and Paul N Bennett. 2019. Will you accept an imperfect ai? exploring designs for adjusting end-user expectations of ai systems. InProceedings of the 2019 CHI conference on human factors in computing systems. 1–14

  25. [25]

    Alicia Jiayun Law, Ruoyu Hu, Lisa Alazraki, Anandha Gopalan, Neophytos Polydorou, and Abbas Edalat. 2022. A Multilingual Virtual Guide for Self-Attachment Technique. In2022 IEEE 4th International Conference on Cognitive Machine Intelligence (CogMI). IEEE, 107–116

  26. [26]

    Yi-Chieh Lee, Naomi Yamashita, and Yun Huang. 2020. Designing a Chatbot as a Mediator for Promoting Deep Self-Disclosure to a Real Mental Health Professional.Proc. ACM Hum.-Comput. Interact.4, CSCW1, Article 31 (May 2020), 27 pages. doi:10.1145/3392836

  27. [27]

    Kien Hoa Ly, Ann-Marie Ly, and Gerhard Andersson. 2017. Fully automated conversational agent for promoting mental well-being: a pilot RCT. Internet Interventions10 (2017), 39–46

  28. [28]

    Birger Moell. 2024. Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support. arXiv:2405.09300 [cs.CL] https://arxiv.org/abs/2405.09300

  29. [29]

    OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner...

  30. [30]

    Falguni Patel, Riya Thakore, Ishita Nandwani, and Santosh Kumar Bharti. 2019. Combating depression in students using an intelligent chatbot: a cognitive behavioral therapy. In2019 IEEE 16th India council international conference (INDICON). IEEE, 1–4

  31. [31]

    Jiahao Qiu, Yinghui He, Xinzhe Juan, Yimin Wang, Yuhan Liu, Zixin Yao, Yue Wu, Xun Jiang, Ling Yang, and Mengdi Wang. 2025. Emoagent: Assessing and safeguarding human-ai interaction for mental health safety.arXiv preprint arXiv:2504.09689(2025)

  32. [32]

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. 2024. Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv:2305.18290 [cs.LG] https://arxiv.org/abs/2305.18290

  33. [33]

    Cristina Reguera-Gómez, Denis Paperno, and Maaike H. T. de Boer. 2025. Empathy vs Neutrality: Designing and Evaluating a Natural Chatbot for the Healthcare Domain. InProceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), Richard Johansson and Sara S...

  34. [34]

    Niclas Rosteck, Julian Striegl, and Claudia Loitsch. 2025. Bridging the Treatment Gap: A Novel LLM-Driven System for Scalable Initial Patient Assessments in Mental Healthcare. InProceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 1–8

  35. [35]

    Woosuk Seo, Chanmo Yang, and Young-Ho Kim. 2024. Chacha: leveraging large language models to prompt children to share their emotions about personal events. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–20

  36. [36]

    Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, Theresa Nguyen, and Tim Althoff. 2024. Facilitating self-guided mental health interventions through human-language model interaction: A case study of cognitive restructuring. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–29

  37. [37]

    Kunmi Sobowale, Daniel Kevin Humphrey, and Sophia Yingruo Zhao. 2025. Evaluating Generative AI Psychotherapy Chatbots Used by Youth: Cross-Sectional Study.JMIR Mental Health12 (2025), e79838

  38. [38]

    Inhwa Song, Sachin R Pendse, Neha Kumar, and Munmun De Choudhury. 2025. The typing cure: Experiences with large language model chatbots for mental health support.Proceedings of the ACM on Human-Computer Interaction9, 7 (2025), 1–29

  39. [39]

    Lars St, Svante Wold, et al. 1989. Analysis of variance (ANOVA).Chemometrics and intelligent laboratory systems6, 4 (1989), 259–272

  40. [40]

    Xin Sun, Isabelle Teljeur, Zhuying Li, and Jos A. Bosch. 2024. Can a Funny Chatbot Make a Difference? Infusing Humor into Conversational Agent for Behavioral Intervention. InProceedings of the 6th ACM Conference on Conversational User Interfaces(Luxembourg, Luxembourg)(CUI ’24). Association for Computing Machinery, New York, NY, USA, Article 3, 19 pages. ...

  41. [41]

    Annalisa Szymanski, Noah Ziems, Heather A Eicher-Miller, Toby Jia-Jun Li, Meng Jiang, and Ronald A Metoyer. 2025. Limitations of the llm-as-a- judge approach for evaluating llm outputs in expert knowledge tasks. InProceedings of the 30th International Conference on Intelligent User Interfaces. 952–966

  42. [42]

    Alan C Y Tong, Kent T Y Wong, Wing W T Chung, and Winnie W S Mak. 2025. Effectiveness of Topic-Based Chatbots on Mental Health Self-Care and Mental Well-Being: Randomized Controlled Trial.J Med Internet Res27 (30 Apr 2025), e70436. doi:10.2196/70436

  43. [43]

    Lu Wang, Munif Ishad Mujib, Jake Williams, George Demiris, and Jina Huh-Yoo. 2021. An evaluation of generative pre-training model-based therapy chatbot for caregivers.arXiv preprint arXiv:2107.13115(2021)

  44. [44]

    Junjie Yin, Zixun Chen, Kelai Zhou, and Chongyuan Yu. 2019. A deep learning based chatbot for campus psychological therapy.arXiv preprint arXiv:1910.06707(2019)

  45. [45]

    Yaolun Zhang, Xiaogeng Liu, and Chaowei Xiao. 2025. MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines. arXiv:2507.22606 [cs.AI] https://arxiv.org/abs/2507.22606

  46. [46]

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al

  47. [47]

    Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems36 (2023), 46595–46623

  48. [48]

    Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, and Xuanjing Huang. 2023. Secrets of RLHF in Large...