Recognition: unknown
How Personal Characteristics Shape User Exploration of Diverse Movie Recommendations with a LLM-Based Multi-Agent System
Pith reviewed 2026-05-08 02:10 UTC · model grok-4.3
The pith
LLM multi-agent systems raise perceived novelty and diversity in movie recommendations compared to single-agent baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The multi-agent system significantly increases Perceived Novelty and Shannon Diversity. Conscientiousness is positively associated with Perceived Accuracy and diversity, whereas extraversion is negatively associated with Perceived Diversity. Prior experience with GenAI-based recommendations is positively associated with Shannon Diversity, while skepticism toward GenAI is negatively associated with it. Significant interaction effects exist between system design and user characteristics.
What carries the argument
The LLM-based multi-agent system for movie recommendations, which coordinates multiple specialized agents to generate suggestions versus a single-agent baseline.
If this is right
- Recommender systems may benefit from multi-agent coordination to surface less obvious but relevant items.
- Designs should incorporate personality assessment to adjust diversity levels for different users.
- Skepticism toward AI can reduce engagement with diverse outputs, suggesting a need for transparency features.
- Prior experience with generative AI tools predicts greater acceptance of novel recommendations.
Where Pith is reading between the lines
- Systems could detect user traits in real time and switch between single- and multi-agent modes dynamically.
- The same multi-agent approach might increase content exploration in domains such as music playlists or news feeds.
- Longer-term deployments could test whether higher diversity exposure changes actual viewing habits over weeks.
Load-bearing premise
The measured differences in perceived novelty, diversity, and accuracy are caused by the multi-agent architecture itself rather than unmeasured differences in how the systems were implemented or how participants interpreted the questions.
What would settle it
An experiment that holds all implementation details constant except the number of agents and still finds no difference in Shannon diversity or perceived novelty scores would falsify the central claim.
Figures
read the original abstract
Diversity is an important evaluation criterion for recommender systems beyond accuracy, yet users differ in their willingness to engage with novel and diverse content. In this work, we investigate how a Large Language Model (LLM)-based multi-agent system supports users' exploration of diverse recommendations, and how individual characteristics shape user experiences. We conducted a between-subjects user study (N = 100) comparing a single-agent system (baseline) with a multi-agent system for movie recommendations. We measured Perceived Accuracy, diversity, novelty, and overall rating, and examined the influence of personal characteristics, including personality traits, demographics, GenAI recommendation experience, and GenAI skepticism. Results show that the multi-agent system significantly increases Perceived Novelty and Shannon Diversity. Conscientiousness is positively associated with Perceived Accuracy and diversity, whereas extraversion is negatively associated with Perceived Diversity. Prior experience with GenAI-based recommendations is positively associated with Shannon Diversity, while skepticism toward GenAI is negatively associated with it. We also observe significant interaction effects between system design and user characteristics. These findings highlight the importance of personality-aware conversational recommender systems and caution against one-size-fits-all multi-agent designs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports a between-subjects user study (N=100) comparing a single-agent LLM baseline to a multi-agent LLM system for movie recommendations. It claims the multi-agent system produces significantly higher Perceived Novelty and Shannon Diversity, with conscientiousness positively associated with Perceived Accuracy and diversity, extraversion negatively associated with Perceived Diversity, prior GenAI experience positively associated with Shannon Diversity, and GenAI skepticism negatively associated with it; interaction effects between system type and user characteristics are also reported.
Significance. If the experimental conditions are shown to differ only in agent architecture, the results would usefully extend work on conversational recommenders by demonstrating how multi-agent LLM designs can increase exploration metrics and by documenting moderation by personality traits and GenAI attitudes. The N=100 between-subjects design is adequate for detecting main effects and provides direct empirical measurements rather than model-derived predictions.
major comments (2)
- [System Design / Methods] The central claim that observed increases in Perceived Novelty and Shannon Diversity are attributable to the multi-agent architecture (rather than other implementation differences) requires explicit demonstration that prompts, LLM call budgets, temperature settings, response formatting, and recommendation-generation logic were held constant across conditions. The manuscript must report these controls in the system-description or methods section; without them the reported main effects and personality interactions could be artifacts of unmatched implementation details.
- [Results] The abstract states that significant interaction effects exist between system design and user characteristics, yet the manuscript must specify the exact statistical tests, effect sizes, degrees of freedom, and any correction for multiple comparisons applied to these interactions and the personality associations. Without this information the robustness of the moderation claims cannot be evaluated.
minor comments (1)
- [Abstract] The abstract would benefit from a brief statement of the measurement instruments (e.g., exact Likert scales or diversity formulas) used for Perceived Novelty, Accuracy, and Shannon Diversity.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that greater transparency in system controls and statistical reporting will strengthen the manuscript. We address each major comment below and will incorporate revisions accordingly.
read point-by-point responses
-
Referee: [System Design / Methods] The central claim that observed increases in Perceived Novelty and Shannon Diversity are attributable to the multi-agent architecture (rather than other implementation differences) requires explicit demonstration that prompts, LLM call budgets, temperature settings, response formatting, and recommendation-generation logic were held constant across conditions. The manuscript must report these controls in the system-description or methods section; without them the reported main effects and personality interactions could be artifacts of unmatched implementation details.
Authors: We agree that explicit documentation of matched implementation details is necessary to support causal attribution to the multi-agent architecture. In the study, both conditions used the same underlying GPT-4 model, identical temperature (0.7), the same movie database and user profile inputs, equivalent response formatting instructions, and matched recommendation-generation logic. The sole intended difference was the introduction of agent collaboration and role specialization in the multi-agent condition. We acknowledge that the original manuscript described the systems at a high level without a side-by-side parameter table. We will revise the Methods section to include such a table (or enumerated list) confirming that prompt lengths and specificity were comparable (role-specific adaptations only), API call budgets were equivalent per recommendation, and no other systematic differences existed. This revision will directly address the concern. revision: yes
-
Referee: [Results] The abstract states that significant interaction effects exist between system design and user characteristics, yet the manuscript must specify the exact statistical tests, effect sizes, degrees of freedom, and any correction for multiple comparisons applied to these interactions and the personality associations. Without this information the robustness of the moderation claims cannot be evaluated.
Authors: We agree that full statistical transparency is required. The interactions were examined via moderated multiple regression (system type dummy-coded and interacted with each continuous user characteristic in separate models per DV). Main-effect personality associations were assessed with linear regression (or Pearson correlation where appropriate). We will expand the Results section to report, for each test: the exact model (e.g., moderated regression), F or t statistics, degrees of freedom, p-values, effect sizes (partial eta-squared or R^2 change), and whether a multiple-comparison correction (Bonferroni across the family of interaction tests) was applied. If any test did not survive correction, we will note it. These additions will allow readers to fully evaluate the moderation claims. revision: yes
Circularity Check
No circularity: empirical user study with direct measurements
full rationale
The paper reports results from a between-subjects user study (N=100) that directly measures Perceived Accuracy, diversity, novelty, and ratings, then performs statistical analysis of associations with personality traits, demographics, GenAI experience, and skepticism. No equations, derivations, fitted models, or predictions are present that could reduce to inputs by construction. Central claims rest on observed data differences between single-agent and multi-agent conditions rather than any self-definitional logic, self-citation chains, or renamed known results. The design is self-contained against external benchmarks of user perception and does not invoke load-bearing uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard assumptions underlying between-subjects statistical tests (e.g., independence of observations, approximate normality for t-tests or ANOVA)
Reference graph
Works this paper leans on
-
[1]
M Mehdi Afsar, Trafford Crump, and Behrouz Far. 2022. Reinforcement learning based recommender systems: A survey.Comput. Surveys55, 7 (2022), 1–38
2022
-
[2]
Elina Maria Ahokas. 2025. The Influence of AI Literacy on User Preferences for Explainable AI in Recommender Systems
2025
-
[3]
Gabrielle Aparecida Pires Alves, Dietmar Jannach, Rodrigo Ferrari de Souza, Daniela Damian, and Marcelo Garcia Manzato. 2024. Digitally nudging users to explore off-profile recommendations: here be dragons.User Modeling and User-Adapted Interaction34, 2 (2024), 441–481
2024
-
[4]
Qazi Mohammad Areeb, Mohammad Nadeem, Shahab Saquib Sohail, Raza Imam, Faiyaz Doctor, Yassine Himeur, Amir Hussain, and Abbes Amira. 2023. Filter bubbles in recommender systems: Fact or fallacy—A systematic review.Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery13, 6 (2023), e1512
2023
-
[5]
Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. InProceedings of the 21st Annual International ACM SIGIR Conference on Research and Develop- ment in Information Retrieval(Melbourne, Australia)(SIGIR ’98). Association for Computing Machinery, New York, NY, USA, 335–336. d...
-
[6]
Diego Carraro and Derek Bridge. 2025. Enhancing recommendation diversity by re-ranking with large language models.ACM Transactions on Recommender Systems4, 2 (2025), 1–40
2025
-
[7]
Pablo Castells, Neil Hurley, and Saul Vargas. 2021. Novelty and diversity in recommender systems. InRecommender systems handbook. Springer, 603–646
2021
-
[8]
Li Chen, Wen Wu, and Liang He. 2013. How personality influences users’ needs for recommendation diversity? InCHI’13 extended abstracts on human factors in computing systems. 829–834
2013
- [9]
-
[10]
Ying-Chih Chen, Matthew J Benus, and Jaclyn Hernandez. 2019. Managing uncertainty in scientific argumentation.Science Education103, 5 (2019), 1235– 1276
2019
-
[11]
Sahraoui Dhelim, Nyothiri Aung, Mohammed Amine Bouras, Huansheng Ning, and Erik Cambria. 2022. A survey on personality-aware recommendation systems. Artificial Intelligence Review55, 3 (2022), 2409–2454
2022
-
[12]
Patrik Dokoupil, Ludovico Boratto, and Ladislav Peska. 2024. User perceptions of diversity in recommender systems. InProceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization. 212–222
2024
-
[13]
Michael D Ekstrand, F Maxwell Harper, Martijn C Willemsen, and Joseph A Konstan. 2014. User perception of differences in recommender algorithms. In Proceedings of the 8th ACM Conference on Recommender systems. 161–168
2014
-
[14]
Elena Ðerić, Domagoj Frank, and Marin Milković. 2025. Trust in generative AI tools: A comparative study of higher education students, teachers, and re- searchers.Information16, 7 (2025), 622
2025
-
[15]
Hans J Eysenck. 1991. Dimensions of personality: The biosocial approach to personality. InExplorations in temperament: International perspectives on theory and measurement. Springer, 87–103
1991
- [16]
-
[17]
Bruce Ferwerda, Mark P Graus, Andreu Vall, Marko Tkalcic, and Markus Schedl
-
[18]
In4th Workshop on Emotions and Personality in Personalized Systems (EMPIRE 2016), Boston, MA, USA, September 16th, 2016
The influence of users’ personality traits on satisfaction and attractiveness of diversified recommendation lists. In4th Workshop on Emotions and Personality in Personalized Systems (EMPIRE 2016), Boston, MA, USA, September 16th, 2016. 43–47
2016
-
[19]
Bruce Ferwerda and Markus Schedl. 2016. Personality-based user modeling for music recommender systems. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 254–257
2016
- [20]
-
[21]
Shuyu Guo, Shuo Zhang, Weiwei Sun, Pengjie Ren, Zhumin Chen, and Zhaochun Ren. 2023. Towards explainable conversational recommender systems. InProceed- ings of the 46th International ACM SIGIR conference on research and development in information retrieval. 2786–2795
2023
-
[22]
Min Hou, Le Wu, Yuxin Liao, Yonghui Yang, Zhen Zhang, Changlong Zheng, Han Wu, and Richang Hong. 2025. A survey on generative recommendation: Data, model, and tasks.arXiv preprint arXiv:2510.27157(2025)
work page internal anchor Pith review arXiv 2025
-
[23]
Rong Hu and Pearl Pu. 2011. Enhancing recommendation diversity with organi- zation interfaces. InProceedings of the 16th international conference on Intelligent user interfaces. 347–350
2011
-
[24]
Rong Hu and Pearl Pu. 2011. Helping users perceive recommendation diversity.. InDiveRS@ RecSys. 43–50
2011
-
[25]
Dietmar Jannach. 2023. Evaluating conversational recommender systems: A landscape of research.Artificial Intelligence Review56, 3 (2023), 2365–2400
2023
-
[26]
Dietmar Jannach, Ahtsham Manzoor, Wanling Cai, and Li Chen. 2021. A survey on conversational recommender systems.ACM Computing Surveys (CSUR)54, 5 (2021), 1–36
2021
-
[27]
Mathias Jesse, Christine Bauer, and Dietmar Jannach. 2023. Intra-list similarity and human diversity perceptions of recommendations: the details matter: M. Jesse et al.User Modeling and User-Adapted Interaction33, 4 (2023), 769–802
2023
-
[28]
Yucheng Jin, Li Chen, Wanling Cai, and Xianglin Zhao. 2024. CRS-Que: A user- centric evaluation framework for conversational recommender systems.ACM Transactions on Recommender Systems2, 1 (2024), 1–34
2024
-
[29]
Yucheng Jin, Nava Tintarev, Nyi Nyi Htun, and Katrien Verbert. 2020. Effects of personal characteristics in control-oriented user interfaces for music recom- mender systems: Y. Jin et al.User Modeling and User-Adapted Interaction30, 2 (2020), 199–249
2020
-
[30]
Yucheng Jin, Nava Tintarev, and Katrien Verbert. 2018. Effects of individual traits on diversity-aware music recommender user interfaces. InProceedings of the 26th Conference on User Modeling, Adaptation and Personalization. 291–299
2018
-
[31]
Yucheng Jin, Nava Tintarev, and Katrien Verbert. 2018. Effects of personal char- acteristics on music recommender systems with different levels of controllability. InProceedings of the 12th ACM Conference on Recommender Systems. 13–21
2018
-
[32]
2010.Handbook of personality: Theory and research
Oliver P John, Richard W Robins, and Lawrence A Pervin. 2010.Handbook of personality: Theory and research. Guilford Press
2010
-
[33]
Marius Kaminskas and Derek Bridge. 2016. Diversity, serendipity, novelty, and coverage: a survey and empirical analysis of beyond-accuracy objectives in recommender systems.ACM Transactions on Interactive Intelligent Systems (TiiS) 7, 1 (2016), 1–42
2016
-
[34]
Marius Kaminskas and Derek Bridge. 2016. Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems.ACM Trans. Interact. Intell. Syst.7, 1, Article 2 (Dec. 2016), 42 pages. doi:10.1145/2926720
-
[35]
Jie Kang, Kyle Condiff, Shuo Chang, Joseph A Konstan, Loren Terveen, and F Maxwell Harper. 2017. Understanding how people use natural language to ask for recommendations. InProceedings of the Eleventh ACM Conference on Recommender Systems. 229–237
2017
-
[36]
Komal Kapoor, Vikas Kumar, Loren Terveen, Joseph A Konstan, and Paul Schrater
-
[37]
I like to explore sometimes
" I like to explore sometimes" Adapting to Dynamic User Novelty Pref- erences. InProceedings of the 9th ACM Conference on Recommender Systems. 19–26
-
[38]
Alex Kulesza, Ben Taskar, et al. 2012. Determinantal point processes for machine learning.Foundations and Trends®in Machine Learning5, 2–3 (2012), 123–286
2012
-
[39]
Matevž Kunaver and Tomaž Požrl. 2017. Diversity in recommender systems–A survey.Knowledge-based systems123 (2017), 154–162
2017
-
[40]
Yu Liang. 2019. Recommender system for developing new preferences and goals. InProceedings of the 13th ACM Conference on Recommender Systems. 611–615
2019
-
[41]
Yu Liang and Martijn C Willemsen. 2023. Promoting music exploration through personalized nudging in a genre exploration recommender.International Journal of Human–Computer Interaction39, 7 (2023), 1495–1518
2023
- [42]
-
[43]
Raj Mahmud, Yufeng Wu, Abdullah Bin Sawad, Shlomo Berkovsky, Mukesh Prasad, and A Baki Kocaballi. 2025. Evaluating user experience in conversational recommender systems: A systematic review across classical and LLM-powered approaches. InProceedings of the 37th Australian Conference on Human-Computer Interaction. 81–93
2025
-
[44]
Mohammad Naiseh, Dena Al-Thani, Nan Jiang, and Raian Ali. 2023. How the different explanation classes impact trust calibration: The case of clinical decision support systems.International Journal of Human-Computer Studies169 (2023), 102941. UMAP ’26, June 08–11, 2026, Gothenburg, Sweden Zhou et al
2023
-
[45]
Tien T Nguyen, Pik-Mai Hui, F Maxwell Harper, Loren Terveen, and Joseph A Konstan. 2014. Exploring the filter bubble: the effect of using recommender systems on content diversity. InProceedings of the 23rd international conference on World wide web. 677–686
2014
-
[46]
Tien T Nguyen, F Maxwell Harper, Loren Terveen, and Joseph A Konstan. 2018. User personality and user satisfaction with recommender systems.Information systems frontiers20, 6 (2018), 1173–1189
2018
-
[47]
Rudolph L Philipp and Gerald JS Wilde. 1970. Stimulation seeking behaviour and extraversion.Acta Psychologica32 (1970), 269–280
1970
-
[48]
Behnam Rahdari, Branislav Kveton, and Peter Brusilovsky. 2022. The magic of carousels: Single vs. multi-list recommender systems. InProceedings of the 33rd ACM Conference on Hypertext and Social Media. 166–174
2022
-
[49]
Beatrice Rammstedt and Oliver P John. 2007. Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German.Journal of research in Personality41, 1 (2007), 203–212
2007
- [50]
-
[51]
Alan Said. 2025. On explaining recommendations with Large Language Models: a review.Frontiers in Big Data7 (2025), 1505284
2025
-
[52]
Navya Nishith Sharan and Daniela Maria Romano. 2020. The effects of personality and locus of control on trust in humans versus artificial intelligence.Heliyon6, 8 (2020)
2020
-
[53]
Sarama Shehmir and Rasha Kashef. 2025. LLM4Rec: A Comprehensive Sur- vey on the Integration of Large Language Models in Recommender Sys- tems—Approaches, Applications and Challenges.Future Internet17, 6 (2025), 252
2025
-
[54]
Aletta Smits, Ester Bartels, Chris Detweiler, and Koen van Turnhout. 2023. Results of the Workshop on Algorithmic Affordances in Recommender Interfaces. InIFIP Conference on Human-Computer Interaction. Springer, 165–172
2023
- [55]
-
[56]
Ruixuan Sun, Avinash Akella, Ruoyan Kong, Moyan Zhou, and Joseph A Kon- stan. 2024. Interactive content diversity and user exploration in online movie recommenders: A field experiment.International Journal of Human–Computer Interaction40, 22 (2024), 7233–7247
2024
-
[57]
Yueming Sun and Yi Zhang. 2018. Conversational recommender system. InThe 41st international acm sigir conference on research & development in information retrieval. 235–244
2018
-
[58]
Muh-Chyun Tang and I-Han Liao. 2022. Preference diversity and openness to novelty: Scales construction from the perspective of movie recommendation. Journal of the Association for Information Science and Technology73, 9 (2022), 1222–1235
2022
-
[59]
Nava Tintarev, Matt Dennis, and Judith Masthoff. 2013. Adapting recommenda- tion diversity to openness to experience: a study of human behaviour. InInter- national Conference on User Modeling, Adaptation, and Personalization. Springer, 190–202
2013
-
[60]
Chun-Hua Tsai and Peter Brusilovsky. 2017. Enhancing recommendation diver- sity through a dual recommendation interface. InCEUR Workshop Proceedings, Vol. 17
2017
-
[61]
Chun-Hua Tsai and Peter Brusilovsky. 2019. Exploring social recommendations with visual diversity-promoting interfaces.ACM Transactions on Interactive Intelligent Systems (TiiS)10, 1 (2019), 1–34
2019
-
[62]
Zihan Wang, Shi Feng, Daling Wang, Kaisong Song, Gang Wu, Yifei Zhang, Han Zhao, and Ge Yu. 2025. Diversity-enhanced conversational recommendation via multi-agent reinforcement learning.Knowledge and Information Systems(2025), 1–29
2025
-
[63]
Wen Wu, Li Chen, and Liang He. 2013. Using personality to adjust diversity in recommender systems. InProceedings of the 24th ACM Conference on Hypertext and Social Media(Paris, France)(HT ’13). Association for Computing Machinery, New York, NY, USA, 225–229. doi:10.1145/2481492.2481521
-
[64]
Wen Wu, Li Chen, and Yu Zhao. 2018. Personalizing recommendation diversity based on user personality.User Modeling and User-Adapted Interaction28, 3 (2018), 237–276
2018
- [65]
-
[66]
Sojeong Yun and Youn-kyung Lim. 2025. User Experience with LLM-powered Conversational Recommendation Systems: A Case of Music Recommendation. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–15
2025
-
[67]
Yizhe Zhang, Yucheng Jin, Li Chen, and Ting Yang. 2026. A cross-domain study on the user experience of ChatGPT-based recommendations.International Journal of Human-Computer Studies(2026), 103743
2026
-
[68]
Yu Zhang, Jingwei Sun, Li Feng, Cen Yao, Mingming Fan, Liuxin Zhang, Qianying Wang, Xin Geng, and Yong Rui. 2024. See widely, think wisely: Toward designing a generative multi-agent system to burst filter bubbles. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–24
2024
- [69]
-
[70]
Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo, Joseph Rushton Wakeling, and Yi-Cheng Zhang. 2010. Solving the apparent diversity-accuracy dilemma of recommender systems.Proceedings of the National Academy of Sciences107, 10 (2010), 4511–4515
2010
-
[71]
Yaochen Zhu, Harald Steck, Dawen Liang, Yinhan He, Nathan Kallus, and Jundong Li. 2025. Llm-based conversational recommendation agents with collaborative verbalized experience.Proceedings of the Proc. of EMNLP Findings(2025), 2207– 2220
2025
-
[72]
Cai-Nicolas Ziegler, Sean M McNee, Joseph A Konstan, and Georg Lausen. 2005. Improving recommendation lists through topic diversification. InProceedings of the 14th international conference on World Wide Web. 22–32
2005
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.