Does Persona Make LLMs K-pop Fans? A Pilot Study of LLM-Based Online Concert Audience Agents
Pith reviewed 2026-06-27 20:41 UTC · model grok-4.3
The pith
Persona conditioning makes LLM-generated K-pop fan chat appear more natural without increasing social connectedness or engagement.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Persona conditioning substantially improved model-level chat quality and perceived naturalness, but did not translate into differences in social connectedness, engagement, or affective response. Interviews suggest that online K-pop concert chat may operate as collective monologue rather than interpersonal dialogue, and that meaningful participation depends on shared identification with the specific artist and fandom.
What carries the argument
A multi-agent system of ten LLM agents generating real-time fan chat messages next to a K-pop performance video, comparing agents given distinct fan identities and chat styles against a no-persona baseline.
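The comparison described above can be sketched as two ways of building the ten agents' system prompts. This is a minimal illustration, not the authors' implementation: the `Persona` fields follow the abstract's wording (fan identity, bias, chat style), while the prompt text, function names, and placeholder personas are assumptions made for this example. The model call itself is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Persona:
    """Hypothetical persona record; field names follow the abstract's wording."""
    fan_identity: str
    bias: str        # the fan's favorite group member
    chat_style: str


def build_system_prompt(agent_id: int, persona: Optional[Persona]) -> str:
    """Return the system prompt for one audience agent (wording is illustrative)."""
    base = (
        f"You are audience member #{agent_id} watching a K-pop performance video. "
        "Post one short live-chat message reacting to the current moment."
    )
    if persona is None:
        return base  # no-persona baseline condition
    return (
        f"{base} You are {persona.fan_identity}. Your bias is {persona.bias}. "
        f"Your chat style: {persona.chat_style}."
    )


def build_audience(personas: Optional[List[Persona]], n_agents: int = 10) -> List[str]:
    """Build one prompt per agent; pass personas=None for the baseline condition."""
    if personas is not None and len(personas) != n_agents:
        raise ValueError("need exactly one persona per agent")
    return [
        build_system_prompt(i + 1, personas[i] if personas is not None else None)
        for i in range(n_agents)
    ]


# Placeholder personas invented for this sketch, not taken from the paper.
example_personas = [
    Persona(f"a fan of type {i}", f"member {i % 4 + 1}", "short and emoji-heavy")
    for i in range(10)
]
persona_prompts = build_audience(example_personas)
baseline_prompts = build_audience(None)
```

Keeping the base instruction identical across conditions means the only manipulated factor is the persona text, which matches the within-subjects comparison the review describes.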
Load-bearing premise
The study assumes that ratings from eleven participants in a within-subjects design can reliably detect or rule out differences in social connectedness, engagement, and affective response.
What would settle it
Re-running the experiment with a larger sample of K-pop fans and finding statistically significant gains in social connectedness or engagement scores for the persona condition would challenge the central claim.
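To give a rough sense of what "a larger sample" might mean, the sketch below estimates the number of participants a two-sided paired t-test would need under a normal approximation. The effect sizes, alpha, and power are conventional illustrative choices, not values from the paper, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(d_z: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N for a two-sided paired t-test using the normal approximation.

    d_z is the standardized mean of the paired differences. The approximation
    runs slightly low compared with an exact t-based calculation.
    """
    if d_z <= 0:
        raise ValueError("d_z must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the target power
    return math.ceil(((z_alpha + z_beta) / d_z) ** 2)


for d_z in (0.2, 0.5, 0.8):
    print(f"d_z = {d_z}: about {paired_sample_size(d_z)} participants")
```

Under these assumptions, detecting a medium effect (d_z = 0.5) takes roughly 32 participants, about three times the pilot's eleven.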
Original abstract
A concert is a collective experience, but recorded performance videos are typically watched alone, stripping away the shared audience presence that makes concerts feel eventful. We investigate whether persona-based LLM audience agents can recreate aspects of this collective experience by generating real-time fan chat alongside a K-pop performance video. We present a multi-agent system in which ten LLM agents react through live-chat messages, comparing a persona-conditioned audience (each agent assigned a distinct fan identity, bias, and chat style) with a no-persona baseline. In a within-subjects pilot with K-pop fans (N=11), persona conditioning substantially improved model-level chat quality and perceived naturalness, but did not translate into differences in social connectedness, engagement, or affective response. Interviews suggest that online K-pop concert chat may operate as collective monologue rather than interpersonal dialogue, and that meaningful participation depends on shared identification with the specific artist and fandom. Persona conditioning can make LLM audiences appear more natural, but culturally meaningful collective experience may require deeper alignment between persona, crowd behavior, fandom identity, and user expectations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a pilot study of a multi-agent LLM system that generates real-time fan chat alongside K-pop performance videos. It compares a persona-conditioned condition (ten agents each assigned distinct fan identities, biases, and chat styles) against a no-persona baseline. In a within-subjects experiment with N=11 K-pop fans, persona conditioning improved model-level chat quality and perceived naturalness, but produced no detectable differences in social connectedness, engagement, or affective response. Qualitative interviews suggest online concert chat functions more as collective monologue than interpersonal dialogue and requires shared fandom identification for meaningful participation.
Significance. If the results hold, the work supplies initial empirical evidence that persona conditioning can enhance surface-level naturalness of LLM agents but may be insufficient for recreating culturally meaningful collective experiences in virtual fan settings. The pilot design and interview insights offer concrete directions for HCI research on LLM-based social simulations, particularly the distinction between monologue-style chat and dialogue-based engagement.
major comments (2)
- [Results (statistical analysis of null effects)] The central claim that persona conditioning 'did not translate into differences' in social connectedness, engagement, or affective response rests on N=11 within-subjects data. No power analysis, effect sizes, confidence intervals, or equivalence tests are reported for these null results, leaving open the possibility that practically meaningful effects went undetected.
- [Methods (dependent measures)] The measures of social connectedness, engagement, and affective response are used to support the key negative finding, yet the manuscript provides no information on their validation, reliability, or sensitivity in the context of a collective-experience hypothesis.
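The statistics requested in the first major comment can be sketched for a single paired measure as follows. This is an illustration under stated assumptions: the ratings are synthetic placeholders rather than study data, the equivalence margin is an arbitrary example, and the critical t values are hardcoded for 10 degrees of freedom (11 pairs) to keep the block free of external dependencies.

```python
import math
from statistics import mean, stdev

# Critical t values for df = 10 (11 paired observations).
T_CRIT_975_DF10 = 2.228  # two-sided 95% interval
T_CRIT_95_DF10 = 1.812   # one-sided 5% level, used for the 90% interval


def paired_summary(persona, baseline, bound):
    """Return mean difference, d_z, 95% CI, and a TOST equivalence decision.

    `bound` is the equivalence margin in raw scale units (a hypothetical choice).
    """
    if len(persona) != 11 or len(baseline) != 11:
        raise ValueError("critical values are hardcoded for exactly 11 pairs")
    diffs = [p - b for p, b in zip(persona, baseline)]
    m, sd = mean(diffs), stdev(diffs)
    se = sd / math.sqrt(len(diffs))
    ci95 = (m - T_CRIT_975_DF10 * se, m + T_CRIT_975_DF10 * se)
    # TOST at alpha = 0.05 is equivalent to the 90% CI lying inside (-bound, bound).
    ci90 = (m - T_CRIT_95_DF10 * se, m + T_CRIT_95_DF10 * se)
    return {
        "mean_diff": m,
        "d_z": m / sd,  # standardized effect size for paired data
        "ci95": ci95,
        "equivalent": -bound < ci90[0] and ci90[1] < bound,
    }


# Synthetic 5-point ratings, invented for illustration only.
persona_ratings = [4, 5, 3, 4, 5, 4, 3, 5, 4, 4, 3]
baseline_ratings = [4, 4, 3, 5, 4, 4, 4, 5, 3, 4, 3]
summary = paired_summary(persona_ratings, baseline_ratings, bound=1.0)
print(summary)
```

A non-significant difference on its own cannot support a claim of "no difference". The equivalence decision addresses this by asking whether effects larger than the chosen margin can be ruled out, and that margin should be justified before data collection.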
minor comments (2)
- [Abstract] The abstract states the sample size and design but could more explicitly note that the study is a pilot when summarizing the null results.
- [Methods] Clarify how 'model-level chat quality' was operationalized and scored, including any inter-rater or automated metrics used.
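As one example of the kind of automated metric the second minor comment asks about, the sketch below computes distinct-n, a common lexical-diversity measure for generated text. Nothing in the reviewed text says the authors used this metric; it is shown only to illustrate what an explicit operationalization could look like, and the whitespace tokenizer is a simplifying assumption that would need replacing for Korean or mixed-language chat.

```python
from typing import List


def distinct_n(messages: List[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all messages.

    Tokens are lowercased whitespace-separated words. Values near 1.0 mean
    little repetition; values near 0.0 mean the agents repeat the same phrases.
    """
    total = 0
    unique = set()
    for msg in messages:
        tokens = msg.lower().split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


# Made-up chat lines for illustration.
sample_chat = ["so good", "so good", "this stage is unreal"]
print(distinct_n(sample_chat, n=2))
```

Reporting such a metric separately for the persona and baseline conditions, alongside any human ratings and their inter-rater agreement, would make the "model-level chat quality" claim easier to verify.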
Simulated Author's Rebuttal
We thank the referee for the insightful comments on our pilot study. We agree that the small N and lack of statistical details for null effects, as well as missing information on measure validation, are limitations that warrant clarification and expansion in the manuscript. Below we respond point-by-point and commit to revisions that strengthen reporting while preserving the exploratory intent of the pilot.
- Referee: [Results (statistical analysis of null effects)] The central claim that persona conditioning 'did not translate into differences' in social connectedness, engagement, or affective response rests on N=11 within-subjects data. No power analysis, effect sizes, confidence intervals, or equivalence tests are reported for these null results, leaving open the possibility that practically meaningful effects went undetected.
  Authors: We acknowledge the validity of this concern. The study was designed and presented as a pilot (N=11) to assess feasibility of the multi-agent system and to collect qualitative insights on fan chat dynamics, not to deliver powered hypothesis tests. That said, we agree that null results require more transparent reporting. In the revised manuscript we will add effect sizes (e.g., Cohen’s d) and 95% confidence intervals for all between-condition comparisons on the three dependent variables. We will also expand the limitations section to explicitly discuss the low statistical power and the risk of undetected effects, framing the quantitative results as preliminary. Post-hoc power analysis and equivalence testing can be included if the editor deems them essential, though we note that pre-specifying equivalence bounds would be more appropriate for a confirmatory follow-up study.
  Revision: yes
- Referee: [Methods (dependent measures)] The measures of social connectedness, engagement, and affective response are used to support the key negative finding, yet the manuscript provides no information on their validation, reliability, or sensitivity in the context of a collective-experience hypothesis.
  Authors: The referee correctly identifies a gap in the current draft. The scales were drawn from established instruments in the social-presence and audience-experience literature; however, the manuscript does not cite their origins, report reliability in our sample, or discuss their sensitivity to collective fandom experiences. In revision we will (1) add citations and brief validation history for each measure, (2) report internal-consistency statistics (Cronbach’s alpha) calculated from the N=11 data, and (3) include a short paragraph addressing the measures’ applicability and potential limitations when testing a collective-experience hypothesis in a virtual-concert setting. These additions will better contextualize the null quantitative findings.
  Revision: yes
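The internal-consistency statistic the authors commit to reporting can be computed as in the sketch below. It assumes a multi-item scale stored as one list of scores per item, with participants in the same order in every list; the function name and data layout are illustrative and not taken from the manuscript.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    `items` is a list of item columns, each holding one score per participant.
    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per participant")
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Made-up scores for a two-item scale and four participants.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

Alpha is undefined for single-item measures, and with only eleven participants the estimate will be imprecise, so reporting it with that caveat would be reasonable.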
Circularity Check
No circularity: empirical user study with direct observations
This is a straightforward empirical pilot study: the authors implement a multi-agent LLM chat system, run a within-subjects comparison of two conditions (persona vs. baseline), collect ratings and interviews from N=11 participants, and report observed differences (or lack thereof) on chat quality, naturalness, connectedness, engagement, and affect. No equations, parameter fittings presented as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. All claims rest on direct measurement rather than reduction to prior inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLM agents can approximate real fan chat behavior when conditioned with personas