A Survey on LLM-based Conversational User Simulation
Pith reviewed 2026-05-08 03:16 UTC · model grok-4.3
The pith
A new taxonomy classifies LLM-based conversational user simulations by user granularity and simulation objectives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a novel taxonomy covering user granularity and simulation objectives. Additionally, we systematically analyze core techniques and evaluation methodologies, identifying open challenges and organizing existing work under a unified framework.
What carries the argument
The taxonomy based on user granularity (how detailed or individualized the simulated users are) and simulation objectives (the intended purpose, such as training or evaluation).
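The two-axis design can be made concrete as a classification grid: each surveyed simulator occupies one (granularity, objective) cell, and empty cells surface research gaps. A minimal sketch, with hypothetical category labels since the survey's exact axis values are not reproduced here:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical axis values -- illustrative placeholders, not the survey's own labels.
class Granularity(Enum):
    GENERIC = "generic population"     # averaged user behavior
    GROUP = "group/persona profile"    # demographic or persona cluster
    INDIVIDUAL = "individual user"     # one specific user's history

class Objective(Enum):
    TRAINING = "training data generation"
    EVALUATION = "system evaluation"
    ROBUSTNESS = "robustness testing"

@dataclass(frozen=True)
class SimulatorEntry:
    paper: str
    granularity: Granularity
    objective: Objective

# Placing surveyed papers into taxonomy cells makes coverage gaps visible:
corpus = [
    SimulatorEntry("Paper A", Granularity.GENERIC, Objective.TRAINING),
    SimulatorEntry("Paper B", Granularity.INDIVIDUAL, Objective.EVALUATION),
]
cells = {(e.granularity, e.objective) for e in corpus}
empty = [(g, o) for g in Granularity for o in Objective if (g, o) not in cells]
print(len(empty))  # → 7  (3x3 grid minus the 2 occupied cells)
```

Each empty cell is exactly the kind of gap the review flags above, e.g. high granularity combined with robustness testing.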
If this is right
- Existing papers can be reorganized and compared more systematically using the shared taxonomy.
- New simulators can be designed with explicit choices about granularity and objective from the start.
- Evaluation protocols can be aligned to the taxonomy categories rather than developed ad hoc.
- Open challenges identified in the survey become clearer targets for follow-up research.
Where Pith is reading between the lines
- The taxonomy may accelerate progress by making it easier to spot gaps, such as simulations that combine high user granularity with specific robustness-testing objectives.
- Adoption could lead to shared benchmark suites tailored to each taxonomy cell rather than generic dialogue metrics.
- Extensions of the framework might incorporate temporal consistency across long conversations or cultural variation in user behavior.
Load-bearing premise
The taxonomy based on user granularity and simulation objectives provides a comprehensive and non-overlapping classification of all relevant LLM-based conversational user simulation work.
What would settle it
A published LLM-based user simulator that cannot be placed into any category of the proposed taxonomy without stretching or redefining the axes.
Original abstract
User simulation has long played a vital role in computer science due to its potential to support a wide range of applications. Language, as the primary medium of human communication, forms the foundation of social interaction and behavior. Consequently, simulating conversational behavior has become a key area of study. Recent advancements in large language models (LLMs) have significantly catalyzed progress in this domain by enabling high-fidelity generation of synthetic user conversation. In this paper, we survey recent advancements in LLM-based conversational user simulation. We introduce a novel taxonomy covering user granularity and simulation objectives. Additionally, we systematically analyze core techniques and evaluation methodologies. We aim to keep the research community informed of the latest advancements in conversational user simulation and to further facilitate future research by identifying open challenges and organizing existing work under a unified framework.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys recent advancements in LLM-based conversational user simulation. It introduces a novel taxonomy based on user granularity and simulation objectives, systematically analyzes core techniques and evaluation methodologies, identifies open challenges, and organizes existing work under a unified framework to inform the research community.
Significance. If the taxonomy proves comprehensive and non-overlapping and the literature coverage is exhaustive, the survey could usefully consolidate a rapidly growing subfield, providing researchers with a shared vocabulary and highlighting gaps in techniques and evaluation practices. The absence of machine-checked elements or new empirical results is expected for a survey, but the value hinges on verifiable systematicity rather than post-hoc organization.
major comments (3)
- [Introduction and §2 (or equivalent methods/literature review section)] The central claim of a 'systematic analysis' and 'unified framework' organizing 'existing work' requires an explicit literature search protocol. No section describes the databases queried, search keywords, date range, inclusion/exclusion criteria, or number of papers screened. Without this, the taxonomy and subsequent technique/evaluation analysis cannot be assessed for completeness or selection bias.
- [§3 (Taxonomy section)] The taxonomy is asserted to cover user granularity and simulation objectives comprehensively with negligible overlap. The manuscript provides no inter-coder agreement metric, discussion of edge cases (e.g., multi-objective or hybrid-granularity simulations), or explicit mapping of all surveyed papers to categories. This directly affects the claim that the taxonomy partitions the literature without unclassifiable cases.
- [Evaluation methodologies section] The analysis of evaluation methodologies lacks a clear breakdown of how many papers use each method and whether the taxonomy dimensions correlate with evaluation choices. If the taxonomy is meant to organize the field, the evaluation section should include a contingency table or similar cross-tabulation showing coverage.
minor comments (2)
- [Figure 1] Figure 1 (taxonomy diagram) would benefit from explicit arrows or labels showing how the two dimensions interact, rather than a simple grid.
- [Core techniques section] Some citations in the techniques section appear to be grouped by high-level category without individual paper summaries; adding one-sentence contributions for the most influential works would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important opportunities to improve the transparency and rigor of our survey. We address each major comment point by point below and commit to revisions that strengthen the manuscript without altering its core contributions.
Point-by-point responses
Referee: The central claim of a 'systematic analysis' and 'unified framework' organizing 'existing work' requires an explicit literature search protocol. No section describes the databases queried, search keywords, date range, inclusion/exclusion criteria, or number of papers screened. Without this, the taxonomy and subsequent technique/evaluation analysis cannot be assessed for completeness or selection bias.
Authors: We agree that an explicit literature search protocol was not detailed in the original manuscript. In the revised version, we will add a new subsection (placed in the Introduction) that fully documents the search process. This will specify the databases and sources queried (arXiv, ACL Anthology, Google Scholar, and selected conference proceedings), the search keywords and Boolean combinations employed (e.g., 'LLM user simulation', 'conversational user simulation', 'synthetic conversational agents'), the date range (primarily 2022–2024 to capture post-LLM developments), inclusion/exclusion criteria, and the counts of papers identified, screened, and included. This addition will enable readers to evaluate completeness and potential biases directly. revision: yes
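The promised screening counts follow mechanically once hits are deduplicated and filtered by the stated criteria. A sketch of such a pipeline, where the queries, hit records, and the date-range criterion are illustrative stand-ins for the protocol the authors describe:

```python
import re

# Hypothetical search hits; titles and years are illustrative only.
hits = [
    {"title": "Simulating Users with LLMs",        "year": 2023},
    {"title": "Simulating Users with LLMs",        "year": 2023},  # duplicate hit
    {"title": "A Pre-LLM User Simulator",          "year": 2019},
    {"title": "LLM-based Dialogue User Simulator", "year": 2024},
]

def include(paper):
    # Inclusion criterion from the planned subsection: post-LLM date range.
    return 2022 <= paper["year"] <= 2024

# Deduplicate by normalized title, then apply inclusion criteria.
seen, screened = set(), []
for p in hits:
    key = re.sub(r"\W+", " ", p["title"]).strip().lower()
    if key not in seen:
        seen.add(key)
        screened.append(p)
included = [p for p in screened if include(p)]
print(len(hits), len(screened), len(included))  # → 4 3 2 (identified, screened, included)
```

Reporting exactly these three counts (identified, screened, included) is what lets readers audit completeness and selection bias.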
Referee: §3 (Taxonomy section): The taxonomy is asserted to cover user granularity and simulation objectives comprehensively with negligible overlap. The manuscript provides no inter-coder agreement metric, discussion of edge cases (e.g., multi-objective or hybrid-granularity simulations), or explicit mapping of all surveyed papers to categories. This directly affects the claim that the taxonomy partitions the literature without unclassifiable cases.
Authors: We acknowledge that additional validation details would strengthen the taxonomy claims. In the revision, we will insert an explicit mapping (as a table in §3 or an appendix) that assigns every surveyed paper to its primary taxonomy categories. We will also expand the section to discuss edge cases, including multi-objective and hybrid-granularity simulations, with concrete examples of how they are classified and any boundary decisions made. While formal inter-coder agreement statistics are less common in single-team surveys, we will describe the iterative internal refinement process used to minimize overlap and ensure coverage, supported by the new mapping table. revision: yes
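The inter-coder agreement statistic the referee asks for is cheap to compute once two annotators have independently assigned papers to taxonomy cells. A minimal Cohen's kappa sketch, with hypothetical granularity labels for six papers:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' category assignments."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement under independent labeling with each
    # annotator's marginal category frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two coders assigning hypothetical granularity labels to six papers:
coder1 = ["generic", "individual", "group", "group", "individual", "generic"]
coder2 = ["generic", "individual", "group", "individual", "individual", "generic"]
print(cohens_kappa(coder1, coder2))  # → 0.75
```

Even for a single-team survey, reporting kappa between two internal coders (plus the disagreement cases) would substantiate the "negligible overlap" claim more directly than a narrative description of the refinement process.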
Referee: The analysis of evaluation methodologies lacks a clear breakdown of how many papers use each method and whether the taxonomy dimensions correlate with evaluation choices. If the taxonomy is meant to organize the field, the evaluation section should include a contingency table or similar cross-tabulation showing coverage.
Authors: We agree that quantitative cross-analysis would better demonstrate the taxonomy's organizing value. In the revised evaluation methodologies section, we will add (1) explicit counts and percentages of papers using each evaluation method and (2) a contingency table (or equivalent cross-tabulation) that breaks down evaluation methods by the two taxonomy dimensions (user granularity and simulation objectives). This table will highlight coverage, potential correlations, and gaps, directly supporting the unified-framework claim. revision: yes
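The promised cross-tabulation reduces to a few lines once each surveyed paper carries (granularity, objective, evaluation-method) annotations. A sketch with hypothetical labels; the real values would come from the mapping table promised for §3:

```python
from collections import Counter
from itertools import product

# Hypothetical annotations -- illustrative only, not the survey's actual coding.
papers = [
    {"granularity": "generic",    "objective": "training",   "eval": "automatic"},
    {"granularity": "individual", "objective": "evaluation", "eval": "human"},
    {"granularity": "individual", "objective": "evaluation", "eval": "automatic"},
    {"granularity": "group",      "objective": "training",   "eval": "human"},
]

# Contingency table: taxonomy cell (granularity x objective) vs. evaluation method.
table = Counter(((p["granularity"], p["objective"]), p["eval"]) for p in papers)

cells = sorted({(p["granularity"], p["objective"]) for p in papers})
methods = sorted({p["eval"] for p in papers})
for cell, method in product(cells, methods):
    print(cell, method, table[(cell, method)])  # counts per (cell, method) pair
```

Zero counts in this table are exactly the coverage gaps the unified-framework claim should surface.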
Circularity Check
No circularity: survey taxonomy and analysis rest on external literature synthesis
Full rationale
This is a literature survey paper with no derivations, equations, fitted parameters, or predictions. The central contribution is a proposed taxonomy (user granularity × simulation objectives) plus systematic review of techniques and evaluations drawn from cited external works. No step reduces by construction to the paper's own inputs; the taxonomy is explicitly introduced as novel rather than derived from prior self-citations, and completeness claims are framed as synthesis rather than self-verifying. Standard review practices (citing prior papers) do not trigger any of the enumerated circularity patterns.