pith. machine review for the scientific record. sign in

arxiv: 2604.24405 · v2 · submitted 2026-04-27 · 💻 cs.HC

Recognition: unknown

How Personal Characteristics Shape User Exploration of Diverse Movie Recommendations with a LLM-Based Multi-Agent System

Authors on Pith no claims yet

Pith reviewed 2026-05-08 02:10 UTC · model grok-4.3

classification 💻 cs.HC
keywords recommender systemsmulti-agent systemsLLMpersonality traitsdiversitynoveltyuser study
0
0 comments X

The pith

LLM multi-agent systems raise perceived novelty and diversity in movie recommendations compared to single-agent baselines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether an LLM-based multi-agent system helps users explore more diverse movie suggestions than a single-agent version. A between-subjects study with 100 participants measured perceived accuracy, novelty, diversity, and overall ratings while tracking personality traits, demographics, prior AI recommendation experience, and skepticism toward generative AI. The multi-agent setup produced higher perceived novelty and Shannon diversity scores. Conscientiousness correlated with stronger accuracy perceptions and diversity, extraversion correlated with lower diversity perceptions, and prior AI experience boosted diversity while skepticism reduced it. Interaction effects between system type and user traits also appeared.

Core claim

The multi-agent system significantly increases Perceived Novelty and Shannon Diversity. Conscientiousness is positively associated with Perceived Accuracy and diversity, whereas extraversion is negatively associated with Perceived Diversity. Prior experience with GenAI-based recommendations is positively associated with Shannon Diversity, while skepticism toward GenAI is negatively associated with it. Significant interaction effects exist between system design and user characteristics.

What carries the argument

The LLM-based multi-agent system for movie recommendations, which coordinates multiple specialized agents to generate suggestions versus a single-agent baseline.

If this is right

  • Recommender systems may benefit from multi-agent coordination to surface less obvious but relevant items.
  • Designs should incorporate personality assessment to adjust diversity levels for different users.
  • Skepticism toward AI can reduce engagement with diverse outputs, suggesting a need for transparency features.
  • Prior experience with generative AI tools predicts greater acceptance of novel recommendations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Systems could detect user traits in real time and switch between single- and multi-agent modes dynamically.
  • The same multi-agent approach might increase content exploration in domains such as music playlists or news feeds.
  • Longer-term deployments could test whether higher diversity exposure changes actual viewing habits over weeks.

Load-bearing premise

The measured differences in perceived novelty, diversity, and accuracy are caused by the multi-agent architecture itself rather than unmeasured differences in how the systems were implemented or how participants interpreted the questions.

What would settle it

An experiment that holds all implementation details constant except the number of agents and still finds no difference in Shannon diversity or perceived novelty scores would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.24405 by Yirui Huang, Yucheng Jin, Yufan Zhou, Zhao Wang.

Figure 1
Figure 1. Figure 1: Main interface of the experimental system. The interface consists of three main panels. (a) Agent Profile Panel: view at source ↗
Figure 2
Figure 2. Figure 2: Interaction Effects of Condition and Personality Traits on User Perceptions view at source ↗
read the original abstract

Diversity is an important evaluation criterion for recommender systems beyond accuracy, yet users differ in their willingness to engage with novel and diverse content. In this work, we investigate how a Large Language Model (LLM)-based multi-agent system supports users' exploration of diverse recommendations, and how individual characteristics shape user experiences. We conducted a between-subjects user study (N = 100) comparing a single-agent system (baseline) with a multi-agent system for movie recommendations. We measured Perceived Accuracy, diversity, novelty, and overall rating, and examined the influence of personal characteristics, including personality traits, demographics, GenAI recommendation experience, and GenAI skepticism. Results show that the multi-agent system significantly increases Perceived Novelty and Shannon Diversity. Conscientiousness is positively associated with Perceived Accuracy and diversity, whereas extraversion is negatively associated with Perceived Diversity. Prior experience with GenAI-based recommendations is positively associated with Shannon Diversity, while skepticism toward GenAI is negatively associated with it. We also observe significant interaction effects between system design and user characteristics. These findings highlight the importance of personality-aware conversational recommender systems and caution against one-size-fits-all multi-agent designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports a between-subjects user study (N=100) comparing a single-agent LLM baseline to a multi-agent LLM system for movie recommendations. It claims the multi-agent system produces significantly higher Perceived Novelty and Shannon Diversity, with conscientiousness positively associated with Perceived Accuracy and diversity, extraversion negatively associated with Perceived Diversity, prior GenAI experience positively associated with Shannon Diversity, and GenAI skepticism negatively associated with it; interaction effects between system type and user characteristics are also reported.

Significance. If the experimental conditions are shown to differ only in agent architecture, the results would usefully extend work on conversational recommenders by demonstrating how multi-agent LLM designs can increase exploration metrics and by documenting moderation by personality traits and GenAI attitudes. The N=100 between-subjects design is adequate for detecting main effects and provides direct empirical measurements rather than model-derived predictions.

major comments (2)
  1. [System Design / Methods] The central claim that observed increases in Perceived Novelty and Shannon Diversity are attributable to the multi-agent architecture (rather than other implementation differences) requires explicit demonstration that prompts, LLM call budgets, temperature settings, response formatting, and recommendation-generation logic were held constant across conditions. The manuscript must report these controls in the system-description or methods section; without them the reported main effects and personality interactions could be artifacts of unmatched implementation details.
  2. [Results] The abstract states that significant interaction effects exist between system design and user characteristics, yet the manuscript must specify the exact statistical tests, effect sizes, degrees of freedom, and any correction for multiple comparisons applied to these interactions and the personality associations. Without this information the robustness of the moderation claims cannot be evaluated.
minor comments (1)
  1. [Abstract] The abstract would benefit from a brief statement of the measurement instruments (e.g., exact Likert scales or diversity formulas) used for Perceived Novelty, Accuracy, and Shannon Diversity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that greater transparency in system controls and statistical reporting will strengthen the manuscript. We address each major comment below and will incorporate revisions accordingly.

read point-by-point responses
  1. Referee: [System Design / Methods] The central claim that observed increases in Perceived Novelty and Shannon Diversity are attributable to the multi-agent architecture (rather than other implementation differences) requires explicit demonstration that prompts, LLM call budgets, temperature settings, response formatting, and recommendation-generation logic were held constant across conditions. The manuscript must report these controls in the system-description or methods section; without them the reported main effects and personality interactions could be artifacts of unmatched implementation details.

    Authors: We agree that explicit documentation of matched implementation details is necessary to support causal attribution to the multi-agent architecture. In the study, both conditions used the same underlying GPT-4 model, identical temperature (0.7), the same movie database and user profile inputs, equivalent response formatting instructions, and matched recommendation-generation logic. The sole intended difference was the introduction of agent collaboration and role specialization in the multi-agent condition. We acknowledge that the original manuscript described the systems at a high level without a side-by-side parameter table. We will revise the Methods section to include such a table (or enumerated list) confirming that prompt lengths and specificity were comparable (role-specific adaptations only), API call budgets were equivalent per recommendation, and no other systematic differences existed. This revision will directly address the concern. revision: yes

  2. Referee: [Results] The abstract states that significant interaction effects exist between system design and user characteristics, yet the manuscript must specify the exact statistical tests, effect sizes, degrees of freedom, and any correction for multiple comparisons applied to these interactions and the personality associations. Without this information the robustness of the moderation claims cannot be evaluated.

    Authors: We agree that full statistical transparency is required. The interactions were examined via moderated multiple regression (system type dummy-coded and interacted with each continuous user characteristic in separate models per DV). Main-effect personality associations were assessed with linear regression (or Pearson correlation where appropriate). We will expand the Results section to report, for each test: the exact model (e.g., moderated regression), F or t statistics, degrees of freedom, p-values, effect sizes (partial eta-squared or R^2 change), and whether a multiple-comparison correction (Bonferroni across the family of interaction tests) was applied. If any test did not survive correction, we will note it. These additions will allow readers to fully evaluate the moderation claims. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical user study with direct measurements

full rationale

The paper reports results from a between-subjects user study (N=100) that directly measures Perceived Accuracy, diversity, novelty, and ratings, then performs statistical analysis of associations with personality traits, demographics, GenAI experience, and skepticism. No equations, derivations, fitted models, or predictions are present that could reduce to inputs by construction. Central claims rest on observed data differences between single-agent and multi-agent conditions rather than any self-definitional logic, self-citation chains, or renamed known results. The design is self-contained against external benchmarks of user perception and does not invoke load-bearing uniqueness theorems or ansatzes from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claims rest on the validity of self-report measures, the assumption that the multi-agent vs single-agent manipulation was the only systematic difference between conditions, and standard statistical assumptions for significance testing; no free parameters, invented entities, or non-standard axioms are introduced.

axioms (1)
  • standard math Standard assumptions underlying between-subjects statistical tests (e.g., independence of observations, approximate normality for t-tests or ANOVA)
    The abstract reports significant effects and associations, which presuppose these common statistical assumptions.

pith-pipeline@v0.9.0 · 5514 in / 1300 out tokens · 30733 ms · 2026-05-08T02:10:50.685291+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

72 extracted references · 12 canonical work pages · 1 internal anchor

  1. [1]

    M Mehdi Afsar, Trafford Crump, and Behrouz Far. 2022. Reinforcement learning based recommender systems: A survey.Comput. Surveys55, 7 (2022), 1–38

  2. [2]

    Elina Maria Ahokas. 2025. The Influence of AI Literacy on User Preferences for Explainable AI in Recommender Systems

  3. [3]

    Gabrielle Aparecida Pires Alves, Dietmar Jannach, Rodrigo Ferrari de Souza, Daniela Damian, and Marcelo Garcia Manzato. 2024. Digitally nudging users to explore off-profile recommendations: here be dragons.User Modeling and User-Adapted Interaction34, 2 (2024), 441–481

  4. [4]

    Qazi Mohammad Areeb, Mohammad Nadeem, Shahab Saquib Sohail, Raza Imam, Faiyaz Doctor, Yassine Himeur, Amir Hussain, and Abbes Amira. 2023. Filter bubbles in recommender systems: Fact or fallacy—A systematic review.Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery13, 6 (2023), e1512

  5. [5]

    Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. InProceedings of the 21st Annual International ACM SIGIR Conference on Research and Develop- ment in Information Retrieval(Melbourne, Australia)(SIGIR ’98). Association for Computing Machinery, New York, NY, USA, 335–336. d...

  6. [6]

    Diego Carraro and Derek Bridge. 2025. Enhancing recommendation diversity by re-ranking with large language models.ACM Transactions on Recommender Systems4, 2 (2025), 1–40

  7. [7]

    Pablo Castells, Neil Hurley, and Saul Vargas. 2021. Novelty and diversity in recommender systems. InRecommender systems handbook. Springer, 603–646

  8. [8]

    Li Chen, Wen Wu, and Liang He. 2013. How personality influences users’ needs for recommendation diversity? InCHI’13 extended abstracts on human factors in computing systems. 829–834

  9. [9]

    Nuo Chen, Quanyu Dai, Xiaoyu Dong, Xiao-Ming Wu, and Zhenhua Dong. 2025. Large Language Models as Evaluators for Conversational Recommender Systems: Benchmarking System Performance from a User-Centric Perspective.arXiv preprint arXiv:2501.09493(2025)

  10. [10]

    Ying-Chih Chen, Matthew J Benus, and Jaclyn Hernandez. 2019. Managing uncertainty in scientific argumentation.Science Education103, 5 (2019), 1235– 1276

  11. [11]

    Sahraoui Dhelim, Nyothiri Aung, Mohammed Amine Bouras, Huansheng Ning, and Erik Cambria. 2022. A survey on personality-aware recommendation systems. Artificial Intelligence Review55, 3 (2022), 2409–2454

  12. [12]

    Patrik Dokoupil, Ludovico Boratto, and Ladislav Peska. 2024. User perceptions of diversity in recommender systems. InProceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization. 212–222

  13. [13]

    Michael D Ekstrand, F Maxwell Harper, Martijn C Willemsen, and Joseph A Konstan. 2014. User perception of differences in recommender algorithms. In Proceedings of the 8th ACM Conference on Recommender systems. 161–168

  14. [14]

    Elena Ðerić, Domagoj Frank, and Marin Milković. 2025. Trust in generative AI tools: A comparative study of higher education students, teachers, and re- searchers.Information16, 7 (2025), 622

  15. [15]

    Hans J Eysenck. 1991. Dimensions of personality: The biosocial approach to personality. InExplorations in temperament: International perspectives on theory and measurement. Springer, 87–103

  16. [16]

    Jiabao Fang, Shen Gao, Pengjie Ren, Xiuying Chen, Suzan Verberne, and Zhaochun Ren. 2024. A multi-agent conversational recommender system.arXiv preprint arXiv:2402.01135(2024)

  17. [17]

    Bruce Ferwerda, Mark P Graus, Andreu Vall, Marko Tkalcic, and Markus Schedl

  18. [18]

    In4th Workshop on Emotions and Personality in Personalized Systems (EMPIRE 2016), Boston, MA, USA, September 16th, 2016

    The influence of users’ personality traits on satisfaction and attractiveness of diversified recommendation lists. In4th Workshop on Emotions and Personality in Personalized Systems (EMPIRE 2016), Boston, MA, USA, September 16th, 2016. 43–47

  19. [19]

    Bruce Ferwerda and Markus Schedl. 2016. Personality-based user modeling for music recommender systems. InJoint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 254–257

  20. [20]

    Luke Friedman, Sameer Ahuja, David Allen, Zhenning Tan, Hakim Sidahmed, Changbo Long, Jun Xie, Gabriel Schubiner, Ajay Patel, Harsh Lara, et al. 2023. Leveraging large language models in conversational recommender systems.arXiv preprint arXiv:2305.07961(2023)

  21. [21]

    Shuyu Guo, Shuo Zhang, Weiwei Sun, Pengjie Ren, Zhumin Chen, and Zhaochun Ren. 2023. Towards explainable conversational recommender systems. InProceed- ings of the 46th International ACM SIGIR conference on research and development in information retrieval. 2786–2795

  22. [22]

    Min Hou, Le Wu, Yuxin Liao, Yonghui Yang, Zhen Zhang, Changlong Zheng, Han Wu, and Richang Hong. 2025. A survey on generative recommendation: Data, model, and tasks.arXiv preprint arXiv:2510.27157(2025)

  23. [23]

    Rong Hu and Pearl Pu. 2011. Enhancing recommendation diversity with organi- zation interfaces. InProceedings of the 16th international conference on Intelligent user interfaces. 347–350

  24. [24]

    Rong Hu and Pearl Pu. 2011. Helping users perceive recommendation diversity.. InDiveRS@ RecSys. 43–50

  25. [25]

    Dietmar Jannach. 2023. Evaluating conversational recommender systems: A landscape of research.Artificial Intelligence Review56, 3 (2023), 2365–2400

  26. [26]

    Dietmar Jannach, Ahtsham Manzoor, Wanling Cai, and Li Chen. 2021. A survey on conversational recommender systems.ACM Computing Surveys (CSUR)54, 5 (2021), 1–36

  27. [27]

    Mathias Jesse, Christine Bauer, and Dietmar Jannach. 2023. Intra-list similarity and human diversity perceptions of recommendations: the details matter: M. Jesse et al.User Modeling and User-Adapted Interaction33, 4 (2023), 769–802

  28. [28]

    Yucheng Jin, Li Chen, Wanling Cai, and Xianglin Zhao. 2024. CRS-Que: A user- centric evaluation framework for conversational recommender systems.ACM Transactions on Recommender Systems2, 1 (2024), 1–34

  29. [29]

    Yucheng Jin, Nava Tintarev, Nyi Nyi Htun, and Katrien Verbert. 2020. Effects of personal characteristics in control-oriented user interfaces for music recom- mender systems: Y. Jin et al.User Modeling and User-Adapted Interaction30, 2 (2020), 199–249

  30. [30]

    Yucheng Jin, Nava Tintarev, and Katrien Verbert. 2018. Effects of individual traits on diversity-aware music recommender user interfaces. InProceedings of the 26th Conference on User Modeling, Adaptation and Personalization. 291–299

  31. [31]

    Yucheng Jin, Nava Tintarev, and Katrien Verbert. 2018. Effects of personal char- acteristics on music recommender systems with different levels of controllability. InProceedings of the 12th ACM Conference on Recommender Systems. 13–21

  32. [32]

    2010.Handbook of personality: Theory and research

    Oliver P John, Richard W Robins, and Lawrence A Pervin. 2010.Handbook of personality: Theory and research. Guilford Press

  33. [33]

    Marius Kaminskas and Derek Bridge. 2016. Diversity, serendipity, novelty, and coverage: a survey and empirical analysis of beyond-accuracy objectives in recommender systems.ACM Transactions on Interactive Intelligent Systems (TiiS) 7, 1 (2016), 1–42

  34. [34]

    Marius Kaminskas and Derek Bridge. 2016. Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems.ACM Trans. Interact. Intell. Syst.7, 1, Article 2 (Dec. 2016), 42 pages. doi:10.1145/2926720

  35. [35]

    Jie Kang, Kyle Condiff, Shuo Chang, Joseph A Konstan, Loren Terveen, and F Maxwell Harper. 2017. Understanding how people use natural language to ask for recommendations. InProceedings of the Eleventh ACM Conference on Recommender Systems. 229–237

  36. [36]

    Komal Kapoor, Vikas Kumar, Loren Terveen, Joseph A Konstan, and Paul Schrater

  37. [37]

    I like to explore sometimes

    " I like to explore sometimes" Adapting to Dynamic User Novelty Pref- erences. InProceedings of the 9th ACM Conference on Recommender Systems. 19–26

  38. [38]

    Alex Kulesza, Ben Taskar, et al. 2012. Determinantal point processes for machine learning.Foundations and Trends®in Machine Learning5, 2–3 (2012), 123–286

  39. [39]

    Matevž Kunaver and Tomaž Požrl. 2017. Diversity in recommender systems–A survey.Knowledge-based systems123 (2017), 154–162

  40. [40]

    Yu Liang. 2019. Recommender system for developing new preferences and goals. InProceedings of the 13th ACM Conference on Recommender Systems. 611–615

  41. [41]

    Yu Liang and Martijn C Willemsen. 2023. Promoting music exploration through personalized nudging in a genre exploration recommender.International Journal of Human–Computer Interaction39, 7 (2023), 1495–1518

  42. [42]

    Qidong Liu, Xiangyu Zhao, Yuhao Wang, Yejing Wang, Zijian Zhang, Yuqi Sun, Xi- ang Li, Maolin Wang, Pengyue Jia, Chong Chen, et al. 2024. Large Language Model Enhanced Recommender Systems: A Survey.arXiv preprint arXiv:2412.13432 (2024)

  43. [43]

    Raj Mahmud, Yufeng Wu, Abdullah Bin Sawad, Shlomo Berkovsky, Mukesh Prasad, and A Baki Kocaballi. 2025. Evaluating user experience in conversational recommender systems: A systematic review across classical and LLM-powered approaches. InProceedings of the 37th Australian Conference on Human-Computer Interaction. 81–93

  44. [44]

    Mohammad Naiseh, Dena Al-Thani, Nan Jiang, and Raian Ali. 2023. How the different explanation classes impact trust calibration: The case of clinical decision support systems.International Journal of Human-Computer Studies169 (2023), 102941. UMAP ’26, June 08–11, 2026, Gothenburg, Sweden Zhou et al

  45. [45]

    Tien T Nguyen, Pik-Mai Hui, F Maxwell Harper, Loren Terveen, and Joseph A Konstan. 2014. Exploring the filter bubble: the effect of using recommender systems on content diversity. InProceedings of the 23rd international conference on World wide web. 677–686

  46. [46]

    Tien T Nguyen, F Maxwell Harper, Loren Terveen, and Joseph A Konstan. 2018. User personality and user satisfaction with recommender systems.Information systems frontiers20, 6 (2018), 1173–1189

  47. [47]

    Rudolph L Philipp and Gerald JS Wilde. 1970. Stimulation seeking behaviour and extraversion.Acta Psychologica32 (1970), 269–280

  48. [48]

    Behnam Rahdari, Branislav Kveton, and Peter Brusilovsky. 2022. The magic of carousels: Single vs. multi-list recommender systems. InProceedings of the 33rd ACM Conference on Hypertext and Social Media. 166–174

  49. [49]

    Beatrice Rammstedt and Oliver P John. 2007. Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German.Journal of research in Personality41, 1 (2007), 203–212

  50. [50]

    Giorgio Robino. 2025. Conversation routines: A prompt engineering framework for task-oriented dialog systems.arXiv preprint arXiv:2501.11613(2025)

  51. [51]

    Alan Said. 2025. On explaining recommendations with Large Language Models: a review.Frontiers in Big Data7 (2025), 1505284

  52. [52]

    Navya Nishith Sharan and Daniela Maria Romano. 2020. The effects of personality and locus of control on trust in humans versus artificial intelligence.Heliyon6, 8 (2020)

  53. [53]

    Sarama Shehmir and Rasha Kashef. 2025. LLM4Rec: A Comprehensive Sur- vey on the Integration of Large Language Models in Recommender Sys- tems—Approaches, Applications and Challenges.Future Internet17, 6 (2025), 252

  54. [54]

    Aletta Smits, Ester Bartels, Chris Detweiler, and Koen van Turnhout. 2023. Results of the Workshop on Algorithmic Affordances in Recommender Interfaces. InIFIP Conference on Human-Computer Interaction. Springer, 165–172

  55. [55]

    Haocan Sun, Weizi Liu, Di Wu, Guoming Yu, and Mike Yao. 2025. Revisiting Trust in the Era of Generative AI: Factorial Structure and Latent Profiles.arXiv preprint arXiv:2510.10199(2025)

  56. [56]

    Ruixuan Sun, Avinash Akella, Ruoyan Kong, Moyan Zhou, and Joseph A Kon- stan. 2024. Interactive content diversity and user exploration in online movie recommenders: A field experiment.International Journal of Human–Computer Interaction40, 22 (2024), 7233–7247

  57. [57]

    Yueming Sun and Yi Zhang. 2018. Conversational recommender system. InThe 41st international acm sigir conference on research & development in information retrieval. 235–244

  58. [58]

    Muh-Chyun Tang and I-Han Liao. 2022. Preference diversity and openness to novelty: Scales construction from the perspective of movie recommendation. Journal of the Association for Information Science and Technology73, 9 (2022), 1222–1235

  59. [59]

    Nava Tintarev, Matt Dennis, and Judith Masthoff. 2013. Adapting recommenda- tion diversity to openness to experience: a study of human behaviour. InInter- national Conference on User Modeling, Adaptation, and Personalization. Springer, 190–202

  60. [60]

    Chun-Hua Tsai and Peter Brusilovsky. 2017. Enhancing recommendation diver- sity through a dual recommendation interface. InCEUR Workshop Proceedings, Vol. 17

  61. [61]

    Chun-Hua Tsai and Peter Brusilovsky. 2019. Exploring social recommendations with visual diversity-promoting interfaces.ACM Transactions on Interactive Intelligent Systems (TiiS)10, 1 (2019), 1–34

  62. [62]

    Zihan Wang, Shi Feng, Daling Wang, Kaisong Song, Gang Wu, Yifei Zhang, Han Zhao, and Ge Yu. 2025. Diversity-enhanced conversational recommendation via multi-agent reinforcement learning.Knowledge and Information Systems(2025), 1–29

  63. [63]

    Wen Wu, Li Chen, and Liang He. 2013. Using personality to adjust diversity in recommender systems. InProceedings of the 24th ACM Conference on Hypertext and Social Media(Paris, France)(HT ’13). Association for Computing Machinery, New York, NY, USA, 225–229. doi:10.1145/2481492.2481521

  64. [64]

    Wen Wu, Li Chen, and Yu Zhao. 2018. Personalizing recommendation diversity based on user personality.User Modeling and User-Adapted Interaction28, 3 (2018), 237–276

  65. [65]

    Yu Xia, Sungchul Kim, Tong Yu, Ryan A Rossi, and Julian McAuley. 2025. Multi- Agent Collaborative Filtering: Orchestrating Users and Items for Agentic Recom- mendations.arXiv preprint arXiv:2511.18413(2025)

  66. [66]

    Sojeong Yun and Youn-kyung Lim. 2025. User Experience with LLM-powered Conversational Recommendation Systems: A Case of Music Recommendation. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–15

  67. [67]

    Yizhe Zhang, Yucheng Jin, Li Chen, and Ting Yang. 2026. A cross-domain study on the user experience of ChatGPT-based recommendations.International Journal of Human-Computer Studies(2026), 103743

  68. [68]

    Yu Zhang, Jingwei Sun, Li Feng, Cen Yao, Mingming Fan, Liuxin Zhang, Qianying Wang, Xin Geng, and Yong Rui. 2024. See widely, think wisely: Toward designing a generative multi-agent system to burst filter bubbles. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–24

  69. [69]

    Xiaoyan Zhao, Yang Deng, Wenjie Wang, Hong Cheng, Rui Zhang, See-Kiong Ng, Tat-Seng Chua, et al . 2025. Exploring the Impact of Personality Traits on Conversational Recommender Systems: A Simulation with Large Language Models.arXiv preprint arXiv:2504.12313(2025)

  70. [70]

    Tao Zhou, Zoltán Kuscsik, Jian-Guo Liu, Matúš Medo, Joseph Rushton Wakeling, and Yi-Cheng Zhang. 2010. Solving the apparent diversity-accuracy dilemma of recommender systems.Proceedings of the National Academy of Sciences107, 10 (2010), 4511–4515

  71. [71]

    Yaochen Zhu, Harald Steck, Dawen Liang, Yinhan He, Nathan Kallus, and Jundong Li. 2025. Llm-based conversational recommendation agents with collaborative verbalized experience.Proceedings of the Proc. of EMNLP Findings(2025), 2207– 2220

  72. [72]

    Cai-Nicolas Ziegler, Sean M McNee, Joseph A Konstan, and Georg Lausen. 2005. Improving recommendation lists through topic diversification. InProceedings of the 14th international conference on World Wide Web. 22–32