Agentic Persona Generation with Critique-Refinement: An Industrial Evaluation

David Dewar; Mehrdad Sabetzadeh; Mohammad Hossein Amini; Shiva Nejati

arxiv: 2606.09637 · v1 · pith:XQGMHTAEnew · submitted 2026-06-08 · 💻 cs.SE


Pith reviewed 2026-06-27 15:19 UTC · model grok-4.3

classification 💻 cs.SE

keywords: persona generation, LLM agents, critique-refinement, industrial evaluation, software engineering, requirements elicitation, agentic systems, expert validation


The pith

PerGent generates personas via an iterative LLM critique-refinement loop that reaches 96.9% expert approval in an industrial test.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PerGent as a method to automate persona creation for software engineering tasks such as requirements elicitation. It coordinates a generator LLM and a critic LLM through an orchestrator that refines outputs over multiple rounds using sources like interviews, surveys, and job postings. In a deployment at Kinaxis, PerGent outperformed three baselines, including one-shot approaches: it achieved the highest expert approval rate, reproduced more content from prior expert personas, and added substantial new material. A reader would care because manual persona development remains costly and difficult to scale, so a reliable automated alternative could expand the use of personas in design and validation.

Core claim

PerGent, an industry-grade method for persona generation built around an iterative critique-refinement loop, uses a generator and a critic LLM agent coordinated by an orchestrator to refine personas from external resources such as interviews, surveys, and job postings through a user-defined maximum number of rounds. In an expert in-situ evaluation at Kinaxis, PerGent achieved the highest expert approval rate of 96.9 percent, exceeding all baselines. Compared to baselines, PerGent reproduces a larger proportion of expert content while also contributing substantial new content beyond the pre-LLM personas.

What carries the argument

The critique-refinement loop in which a generator LLM produces personas and a critic LLM evaluates them against provided data sources, coordinated by an orchestrator for iterative passes up to a maximum round limit.
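The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `Critique`, `refine_persona`, `toy_generate`, and `toy_critic` are hypothetical, the two agents are plain callables standing in for LLM calls, and stopping early once the critic approves is an assumption (the source only states that refinement runs up to a user-defined maximum number of rounds).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Critique:
    approved: bool   # critic's verdict on the current draft
    feedback: str    # what the generator should fix next


# Hypothetical agent signatures; in practice each would wrap an LLM call.
Generator = Callable[[List[str], Optional[str], Optional[str]], str]
Critic = Callable[[str, List[str]], Critique]


def refine_persona(generate: Generator, critique: Critic,
                   resources: List[str], max_rounds: int) -> Tuple[str, List[str]]:
    """Orchestrator: draft once, then critique and refine up to max_rounds times."""
    persona = generate(resources, None, None)      # initial draft
    history = [persona]
    for _ in range(max_rounds):
        verdict = critique(persona, resources)     # critic checks draft against resources
        if verdict.approved:                       # assumed early-stop rule
            break
        persona = generate(resources, persona, verdict.feedback)
        history.append(persona)
    return persona, history


# Toy stand-ins for the two agents, only so the sketch runs end to end.
def toy_generate(resources, previous, feedback):
    if previous is None:
        return "Planner persona: goals unknown"
    return previous.replace("goals unknown", "goals: " + feedback)


def toy_critic(persona, resources):
    if "goals unknown" in persona:
        return Critique(False, resources[0])
    return Critique(True, "")


final, history = refine_persona(toy_generate, toy_critic,
                                ["reduce stockouts"], max_rounds=3)
print(final)
print(len(history))
```

With `max_rounds=0` the function returns the first draft unchanged, which corresponds to a one-shot setup under this sketch's assumptions.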

If this is right

PerGent exceeds all three baselines, including the one-shot ones, in expert approval rate during in-situ evaluation.
PerGent reproduces a larger proportion of content from pre-existing expert personas than the baselines.
PerGent contributes substantial new content not found in pre-LLM expert personas.
The method supports deployment and evaluation inside an active industrial context such as Kinaxis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the iterative loop drives the gains, single-shot LLM methods may consistently miss nuanced details that multiple critique passes can capture.
The same generator-critic structure could be tested on related artifacts such as user stories or acceptance criteria.
Lessons from the Kinaxis deployment could inform how teams set round limits or select data sources when adapting the method elsewhere.

Load-bearing premise

Expert approval rates from a single-company in-situ setting provide an unbiased and generalizable measure of persona quality.

What would settle it

A replication study at a different company with independent experts that finds PerGent's approval rate falls below one or more one-shot baselines.

Figures

Figures reproduced from arXiv: 2606.09637 by David Dewar, Mehrdad Sabetzadeh, Mohammad Hossein Amini, Shiva Nejati.

**Figure 3.** Overview of our agentic persona generator (PerGent).

**Figure 4.** Workflow of PERGENT and the baselines. PERGENTNORES uses both Steps 1 and 2, but without external resources; ONESHOT and ONESHOT+RES use only Step 1, with external resources used only by ONESHOT+RES.

RQ3 (Cost). What is the cost of PerGent in terms of LLM calls and tokens per call? We compare PerGent and the baselines on persona-generation cost using the number of LLM calls and tokens required to generat…
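The four compared configurations in the Figure 4 caption can be summarized as a small lookup table. The sketch below is only a reading of that caption: the `Variant` class and `steps_for` helper are hypothetical names, and treating Step 2 as the critique-refinement stage is an assumption, since the caption itself does not define the steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Variant:
    name: str
    uses_step2: bool       # runs Step 2 (assumed here to be critique-refinement)
    uses_resources: bool   # receives external resources (interviews, surveys, ...)


# Encoded from the Figure 4 caption; PERGENT itself uses both steps and resources.
VARIANTS = {
    "PERGENT": Variant("PERGENT", uses_step2=True, uses_resources=True),
    "PERGENTNORES": Variant("PERGENTNORES", uses_step2=True, uses_resources=False),
    "ONESHOT": Variant("ONESHOT", uses_step2=False, uses_resources=False),
    "ONESHOT+RES": Variant("ONESHOT+RES", uses_step2=False, uses_resources=True),
}


def steps_for(name: str) -> List[str]:
    """Return the workflow steps a variant executes."""
    variant = VARIANTS[name]
    return ["step1", "step2"] if variant.uses_step2 else ["step1"]


for name, variant in VARIANTS.items():
    print(name, steps_for(name), "resources" if variant.uses_resources else "no resources")
```

Laying the variants out this way makes the ablation structure visible: each baseline switches off Step 2, the external resources, or both.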

**Figure 5.** Experiment workflows for our research questions (RQ1, RQ2 and RQ3).

**Figure 6.** Edit categories for generated persona items.

**Figure 7.** Distinctness vs. full preservation. Relative to ONESHOT, PERGENT improves distinctness and full preservation by 10.7% and 12.5%, respectively; the corresponding gains over ONESHOT+RES are 9.4% and 12.4%, and over PERGENTNORES are 5.1% and 9.3%.

Original abstract

Personas are widely used in software engineering to support requirements elicitation, design, and validation, but their manual creation is costly, time-consuming, and hard to scale. Recent LLM-based approaches automate persona generation from textual data; however, they typically rely on single-shot generation and subjective evaluations, limiting practical reliability. We present PerGent, an industry-grade method for persona generation built around an iterative critique-refinement loop. Specifically, PerGent uses a generator and a critic LLM agent, coordinated by an orchestrator, to iteratively refine personas using external resources such as interviews, surveys, and job postings through a critique-refinement loop with a user-defined maximum number of rounds. We deploy and evaluate PerGent in an industrial setting at Kinaxis, comparing it with three baselines, including one-shot methods. In an expert in-situ evaluation, PerGent achieved the highest expert approval rate (96.9%), exceeding all baselines. We further compare PerGent-generated personas with best-practice personas manually created by domain experts prior to the adoption of LLMs. Compared to baselines, PerGent reproduces a larger proportion of expert content while also contributing substantial new content beyond the pre-LLM personas. We conclude with lessons learned from deploying and evaluating PerGent at Kinaxis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PerGent shows a workable generator-critic loop for personas in one company setting, but the high approval rate may partly reflect the iterative process and shared context rather than method quality alone.


The paper introduces PerGent, which coordinates a generator LLM and a critic LLM through an orchestrator to iteratively refine personas using external sources like interviews and job postings, up to a user-set round limit. The authors deployed it at Kinaxis and report 96.9% expert approval, higher than all three baselines (including the single-shot ones), plus better overlap with pre-LLM manual personas while adding new content.

What is new is the explicit multi-agent critique-refinement loop grounded in real external resources rather than pure single-shot prompting. The industrial case and the direct comparison to existing expert-created personas give it more grounding than most LLM persona papers.

The soft spot is the evaluation. Ratings come from in-situ experts at the same company, and only PerGent runs the iterative loop; baselines do not. The abstract gives no expert count, blinding procedure, or inter-rater numbers, so the approval gap could partly trace to evaluator familiarity with the Kinaxis context or the refinement process itself. Single-company scope also caps how far the numbers generalize.

This is for SE teams that need scalable persona generation and for researchers studying agentic LLM workflows. It has enough concrete deployment and quantitative comparison to merit peer review, though referees will likely ask for more on the rating protocol and blinding.

Referee Report

1 major / 1 minor

Summary. The paper presents PerGent, an agentic persona generation method that uses a generator-critic LLM pair coordinated by an orchestrator in an iterative critique-refinement loop (with user-defined max rounds and external resources such as interviews and job postings). In an industrial deployment at Kinaxis, PerGent is compared to three baselines (including one-shot LLM methods) and to pre-LLM manually created expert personas; the central empirical claim is that PerGent attains a 96.9% expert approval rate (highest among methods) while reproducing a larger share of expert content and adding substantial new content.

Significance. If the evaluation results hold under more rigorous controls, the work supplies a rare industrial case study showing that multi-agent iterative refinement can outperform single-shot LLM generation for a practically important SE artifact. The explicit comparison against pre-LLM expert personas provides independent grounding that is uncommon in LLM persona papers and strengthens the practical relevance of the findings.

major comments (1)

[Abstract and Evaluation section] The central superiority claim rests on the reported 96.9% expert approval rate and the reproduction metric versus pre-LLM personas, yet the manuscript supplies no information on the number of experts, blinding procedures, inter-rater agreement, or how familiarity with the iterative Kinaxis context was controlled. These omissions are load-bearing because the evaluation is in-situ and the iterative loop is unique to PerGent.
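To make concrete what the requested statistics involve, the sketch below computes an approval rate and a two-rater agreement score from binary approve/reject labels. Everything in it is illustrative: the label lists are made-up toy data, not the paper's ratings, and Cohen's kappa is used only as a simple, well-known agreement measure; the paper may report a different statistic.

```python
from collections import Counter
from typing import Sequence


def approval_rate(labels: Sequence[int]) -> float:
    """Share of items marked approved (1) among all rated items."""
    return sum(labels) / len(labels)


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1.0:          # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy ratings (hypothetical): 1 = approved, 0 = not approved.
rater_a = [1, 1, 1, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 0, 1, 1, 1, 1]

print(approval_rate(rater_a))
print(cohen_kappa(rater_a, rater_b))
```

In this toy case both raters approve 75% of items, yet kappa is only about 0.33, which illustrates why a high approval rate alone says little about how consistently experts judge.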

minor comments (1)

[Method section] The description of how external resources are ingested and how the orchestrator decides termination could be expanded for reproducibility.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comments on the evaluation methodology. We agree that additional details are needed to support the reported results and will revise the manuscript accordingly.


Referee: [Abstract and Evaluation section] The central superiority claim rests on the reported 96.9% expert approval rate and the reproduction metric versus pre-LLM personas, yet the manuscript supplies no information on the number of experts, blinding procedures, inter-rater agreement, or how familiarity with the iterative Kinaxis context was controlled. These omissions are load-bearing because the evaluation is in-situ and the iterative loop is unique to PerGent.

Authors: We agree that the manuscript omits key details on the evaluation procedure. In the revised version we will expand the Evaluation section with a dedicated paragraph reporting the exact number of experts who performed the approval ratings, any blinding procedures employed, inter-rater agreement statistics, and how experts' familiarity with the Kinaxis context was handled. Because the study is an in-situ industrial deployment, complete blinding to the generation method was not feasible; we will explicitly note this limitation and describe the mitigation steps taken. These additions will allow readers to assess the strength of the 96.9% approval claim and the pre-LLM persona comparison.

Revision: yes

Circularity Check

0 steps flagged

No circularity; claims rest on external expert judgments and pre-LLM baselines


The paper presents an industrial evaluation of PerGent using expert approval rates (96.9%) collected in-situ at Kinaxis and direct comparisons to independently created pre-LLM personas. These metrics are external to the generation process and not reduced to fitted parameters, self-definitions, or self-citation chains. No equations, uniqueness theorems, or ansatzes are invoked; the central claims are grounded in independent human assessment rather than internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the procedural effectiveness of the critique-refinement loop and on the validity of expert approval as a quality signal. No numerical parameters are fitted, no new physical or mathematical entities are postulated, and the only notable assumption is the reliability of the expert evaluation.

axioms (1)

Domain assumption: Expert in-situ approval rates constitute a reliable and unbiased proxy for persona quality and utility.
The 96.9% approval figure and the comparison to pre-LLM personas are treated as decisive evidence of superiority.



