Transparent and Controllable Recommendation Filtering via Multimodal Multi-Agent Collaboration
Pith reviewed 2026-05-10 05:49 UTC · model grok-4.3
The pith
A multimodal multi-agent system with fact-grounded checks cuts the false positive rate of recommendation filters by 74.3 percent on adversarial tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that an end-to-cloud multimodal multi-agent architecture, built around a fact-grounded adjudication pipeline and a dynamic two-tier preference graph supporting Delta-adjustments, curbs inferential hallucinations and over-association during content filtering, yielding a 74.3 percent drop in the false positive rate on 473 adversarial samples, nearly twice the F1-score of text-only baselines, and improved intent alignment in a 7-day study with 19 participants.
What carries the argument
The fact-grounded adjudication pipeline that verifies claims against objective evidence to block hallucinations, combined with the dynamic two-tier preference graph that accepts explicit user Delta-adjustments to preserve fine-grained intents without catastrophic forgetting.
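The abstract does not spell out how the adjudication step works internally, so the following is only a minimal sketch under one assumption: an inferred topic label may justify a block only when it is backed by verified evidence, which is what would curb over-association from unsupported LLM inferences. Every name and value here is hypothetical, not the authors' implementation.

```python
# Hypothetical sketch; the paper's actual agent interfaces are not given in the
# abstract. Idea: an inferred topic may justify a block only if it is backed by
# objective evidence, so ungrounded LLM associations are dropped.

def adjudicate(inferred_topics: list[str],
               evidence: dict[str, bool],
               disliked: set[str]) -> tuple[bool, list[str]]:
    """Return (block?, grounded rationale) for one feed item."""
    grounded = [t for t in inferred_topics if evidence.get(t, False)]
    blocked = any(t in disliked for t in grounded)
    return blocked, grounded


if __name__ == "__main__":
    # An over-associated label ("anxiety-inducing marketing") fails the evidence
    # check, so the benign educational item is kept rather than filtered.
    print(adjudicate(
        inferred_topics=["anxiety-inducing marketing", "nutrition basics"],
        evidence={"anxiety-inducing marketing": False, "nutrition basics": True},
        disliked={"anxiety-inducing marketing"},
    ))  # -> (False, ['nutrition basics'])
```

A complementary sketch of the two-tier preference graph with Delta-adjustments appears after the author rebuttal below.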
Load-bearing premise
The multi-agent orchestration and fact-grounded checks will eliminate over-association and hallucinations without introducing new biases or errors, and the small 19-participant study will generalize to larger user groups.
What would settle it
Evaluating the system on a fresh adversarial dataset several times larger than 473 samples and measuring whether the false positive rate reduction remains near 74 percent would directly test the central performance claim.
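As a concrete illustration of the arithmetic such a replication would test, the relative reduction is computed from confusion counts on the new adversarial set; the counts below are invented for illustration and are not the paper's figures.

```python
# Illustrative only: how a relative false-positive-rate reduction is computed
# from confusion counts on an adversarial benchmark. Counts are invented.

def fpr(false_pos: int, true_neg: int) -> float:
    return false_pos / (false_pos + true_neg)

baseline_fpr = fpr(false_pos=140, true_neg=333)  # hypothetical text-only baseline
system_fpr = fpr(false_pos=36, true_neg=437)     # hypothetical proposed system

reduction = (baseline_fpr - system_fpr) / baseline_fpr
print(f"relative FPR reduction: {reduction:.1%}")  # replication would need ~74% here
```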
original abstract
While personalized recommender systems excel at content discovery, they frequently expose users to undesirable or discomforting information, highlighting the critical need for user-centric filtering tools. Current methods leveraging Large Language Models (LLMs) struggle with two major bottlenecks: they lack multimodal awareness to identify visually inappropriate content, and they are highly prone to "over-association" -- incorrectly generalizing a user's specific dislike (e.g., anxiety-inducing marketing) to block benign, educational materials. These unconstrained hallucinations lead to a high volume of false positives, ultimately undermining user agency. To overcome these challenges, we introduce a novel framework that integrates end-to-cloud collaboration, multimodal perception, and multi-agent orchestration. Our system employs a fact-grounded adjudication pipeline to eliminate inferential hallucinations. Furthermore, it constructs a dynamic, two-tier preference graph that allows for explicit, human-in-the-loop modifications (via Delta-adjustments), explicitly preventing the algorithm from catastrophically forgetting fine-grained user intents. Evaluated on an adversarial dataset comprising 473 highly confusing samples, the proposed architecture effectively curbed over-association, decreasing the false positive rate by 74.3% and achieving nearly twice the F1-Score of traditional text-only baselines. Additionally, a 7-day longitudinal field study with 19 participants demonstrated robust intent alignment and enhanced governance efficiency. User feedback confirmed that the framework drastically improves algorithmic transparency, rebuilds user control, and alleviates the fear of missing out (FOMO), paving the way for transparent human-AI co-governance in personalized feeds.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a multimodal multi-agent framework for user-centric recommendation filtering that combines end-to-cloud collaboration, multimodal perception, and multi-agent orchestration. It employs a fact-grounded adjudication pipeline to curb inferential hallucinations and a dynamic two-tier preference graph with Delta-adjustments for human-in-the-loop control. On an adversarial set of 473 highly confusing samples, the system is claimed to reduce the false positive rate by 74.3% and nearly double the F1-score relative to text-only baselines; a 7-day field study with 19 participants is reported to show improved intent alignment and governance efficiency.
Significance. If the quantitative claims hold under rigorous controls, the work would address a practical gap in LLM-based content filters by reducing over-association while preserving user agency through explicit controllability mechanisms. The emphasis on transparency and human-AI co-governance aligns with growing interest in accountable recommender systems, though the small evaluation scales limit immediate generalizability.
major comments (3)
- [Abstract] The headline result of a 74.3% FPR reduction and nearly 2x F1 improvement on 473 adversarial samples is load-bearing for the central claim, yet no protocol is given for constructing the 'highly confusing' samples, no inter-rater reliability statistics, no baseline implementation details, and no statistical tests or confidence intervals on the deltas. Without these, selection bias or post-hoc labeling cannot be ruled out.
- [Abstract] The 7-day longitudinal study with 19 participants is presented as demonstrating 'robust intent alignment,' but the text provides no control condition, power analysis, handling of self-report bias, or quantitative metrics beyond qualitative user feedback. This small-N design is insufficient to support generalization claims.
- [Abstract] The fact-grounded adjudication pipeline and two-tier preference graph are introduced as core innovations that 'eliminate inferential hallucinations' and prevent catastrophic forgetting, but no formal definitions, pseudocode, or ablation results are supplied to show how these components achieve the reported gains without introducing new biases.
minor comments (2)
- [Abstract] The abstract uses the term 'over-association' without a precise operational definition or example that distinguishes it from standard false-positive filtering errors.
- No mention of reproducibility artifacts (code, prompts, or dataset release) is made, which would strengthen the empirical claims.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments identify important gaps in reporting and rigor that we will address through targeted revisions. Below we respond point-by-point to the major comments, committing to concrete additions that strengthen the manuscript without altering its core claims.
point-by-point responses
-
Referee: [Abstract] The headline result of a 74.3% FPR reduction and nearly 2x F1 improvement on 473 adversarial samples is load-bearing for the central claim, yet no protocol is given for constructing the 'highly confusing' samples, no inter-rater reliability statistics, no baseline implementation details, and no statistical tests or confidence intervals on the deltas. Without these, selection bias or post-hoc labeling cannot be ruled out.
Authors: We agree that the evaluation protocol requires substantially more detail to support the headline claims. In the revised manuscript we will add: (1) an explicit protocol describing how the 473 adversarial samples were constructed, including the multi-stage curation process, semantic similarity criteria, and examples of confusing vs. benign items; (2) inter-rater reliability statistics (Cohen’s kappa) from the annotation process; (3) complete baseline implementation details, including exact model versions, prompting templates, and decoding parameters; and (4) statistical tests (paired Wilcoxon signed-rank) together with 95% confidence intervals on all performance deltas. These additions will allow independent assessment of selection bias and reproducibility. revision: yes
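A minimal sketch of the statistics promised in this response, using standard calls (sklearn's cohen_kappa_score, scipy's wilcoxon) on placeholder data; none of the arrays below come from the paper.

```python
# Placeholder data only; illustrates the reporting additions promised above.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Inter-rater reliability on the adversarial-sample labels.
rater_a = rng.integers(0, 2, size=473)
rater_b = np.where(rng.random(473) < 0.9, rater_a, 1 - rater_a)  # ~90% raw agreement
print("Cohen's kappa:", round(cohen_kappa_score(rater_a, rater_b), 3))

# Paired per-item error indicators (1 = wrong decision) for baseline vs. system.
baseline_err = (rng.random(473) < 0.30).astype(int)
system_err = (rng.random(473) < 0.08).astype(int)
stat, p = wilcoxon(baseline_err, system_err)
print("Wilcoxon signed-rank p-value:", p)

# 95% bootstrap confidence interval on the per-item error-rate delta.
diffs = baseline_err - system_err
boot = [rng.choice(diffs, size=diffs.size, replace=True).mean() for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% CI on error-rate delta: [{lo:.3f}, {hi:.3f}]")
```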
-
Referee: [Abstract] The 7-day longitudinal study with 19 participants is presented as demonstrating 'robust intent alignment,' but the text provides no control condition, power analysis, handling of self-report bias, or quantitative metrics beyond qualitative user feedback. This small-N design is insufficient to support generalization claims.
Authors: We acknowledge the limitations of the small-scale field study. In revision we will: clarify the within-subject design and any control elements used; report a post-hoc power analysis for the observed effect sizes; describe mitigation steps for self-report bias (triangulation with logged filtering actions and anonymous response formats); and include quantitative metrics such as pre/post changes in intent-alignment accuracy and governance efficiency scores alongside the qualitative feedback. We will also revise the language to present the study as exploratory rather than generalizable, explicitly noting the small N as a limitation and outlining plans for larger validation. revision: partial
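A sketch of what the promised post-hoc power analysis could look like for a within-subject design with n = 19, using statsmodels' TTestPower; the effect size is an assumed placeholder, not a value reported in the paper.

```python
# Placeholder effect size; illustrates a post-hoc power check for n = 19.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()  # power for a paired / one-sample t-test

power = analysis.solve_power(effect_size=0.6, nobs=19, alpha=0.05,
                             alternative="two-sided")
print(f"achieved power at d = 0.6, n = 19: {power:.2f}")

# Sample size needed for 80% power at the same assumed effect size.
n_needed = analysis.solve_power(effect_size=0.6, power=0.8, alpha=0.05)
print(f"participants needed for 80% power: {n_needed:.1f}")
```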
-
Referee: [Abstract] The fact-grounded adjudication pipeline and two-tier preference graph are introduced as core innovations that 'eliminate inferential hallucinations' and prevent catastrophic forgetting, but no formal definitions, pseudocode, or ablation results are supplied to show how these components achieve the reported gains without introducing new biases.
Authors: We accept that formalization and component-level validation are necessary. In the revised version we will supply: (1) formal definitions of the fact-grounded adjudication pipeline (retrieval-augmented verification against user-stated facts) and the two-tier preference graph (coarse static layer plus dynamic Delta-adjustment layer); (2) pseudocode for the multi-agent orchestration, adjudication logic, and graph-update procedure; and (3) ablation experiments that isolate each component’s contribution to the FPR and F1 improvements, together with an analysis of any introduced biases (e.g., fact-base coverage gaps). These additions will demonstrate how the mechanisms deliver the observed gains. revision: yes
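To make the promised formalization concrete, here is a minimal, hypothetical sketch of a two-tier structure in which automatic learning touches only the coarse tier while user Delta-adjustments sit in a protected fine tier; field names and update rules are assumptions, not the authors' definitions.

```python
# Hypothetical sketch of a two-tier preference store; not the authors' code.
from dataclasses import dataclass, field


@dataclass
class TwoTierPreferenceGraph:
    coarse: set = field(default_factory=set)    # inferred broad dislikes
    deltas: dict = field(default_factory=dict)  # explicit user overrides (topic -> block?)

    def learn_coarse(self, topic: str) -> None:
        # Automatic update path: may add or re-derive coarse dislikes,
        # but is never allowed to touch the delta tier.
        self.coarse.add(topic)

    def apply_delta(self, topic: str, block: bool) -> None:
        # Human-in-the-loop path: a fine-grained intent recorded explicitly.
        self.deltas[topic] = block

    def should_block(self, topics: list) -> bool:
        for t in topics:
            if t in self.deltas:                # fine tier takes precedence
                return self.deltas[t]
        return any(t in self.coarse for t in topics)


g = TwoTierPreferenceGraph()
g.learn_coarse("anxiety-inducing marketing")
g.apply_delta("educational health content", block=False)
g.learn_coarse("educational health content")   # later re-learning cannot override the delta
print(g.should_block(["educational health content"]))  # False: the user's intent is preserved
```

The override-first lookup is the property that would prevent explicit exceptions from being silently forgotten when the coarse layer is rebuilt.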
Circularity Check
No circularity: empirical results are measured outcomes, not self-referential definitions
full rationale
The paper describes a multimodal multi-agent recommendation filtering framework and reports performance on an external adversarial dataset (473 samples) plus a separate 19-participant field study. No equations, parameter-fitting steps, or derivation chains appear in the abstract or described content that would reduce any claimed prediction or result back to the system's own inputs by construction. Metrics such as the 74.3% FPR reduction and doubled F1-score are presented as observed experimental outcomes rather than quantities defined in terms of fitted parameters or self-citations. The fact-grounded adjudication pipeline is introduced as a design choice whose effectiveness is validated externally, not presupposed. This is the common case of a system paper whose central claims rest on independent evaluation rather than internal redefinition.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Multi-agent orchestration combined with fact-grounded adjudication can eliminate inferential hallucinations in content filtering decisions
- domain assumption: A dynamic two-tier preference graph with Delta-adjustments prevents catastrophic forgetting of fine-grained user intents
invented entities (2)
- two-tier preference graph: no independent evidence
- fact-grounded adjudication pipeline: no independent evidence