Recognition: no theorem link
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
Pith reviewed 2026-05-15 18:23 UTC · model grok-4.3
The pith
GroupGPT splits intervention timing and response generation between an on-device model and a cloud model, cutting token use by up to 3× while sanitizing private messages in group chats.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GroupGPT introduces an edge-cloud collaboration architecture that decouples the reasoning for when an agent should intervene (handled locally) from the generation of responses (handled in the cloud), allowing accurate, timely replies in multi-user chats while reducing token consumption by up to three times and sanitizing user data before transmission.
What carries the argument
The edge-cloud model collaboration architecture that separates on-device intervention timing and privacy sanitization from cloud-based response generation.
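The decoupling described here can be sketched as a minimal pipeline. Everything below is illustrative: the function names (`should_intervene`, `sanitize`, `cloud_generate`), the trigger rule, and the redaction pattern are stand-ins, not the paper's actual interfaces.

```python
import re

def should_intervene(history):
    # Stub for the on-device timing model: reply only when the newest
    # message addresses the assistant directly (hypothetical policy).
    return "@assistant" in history[-1]

def sanitize(message):
    # Stub for the on-device sanitizer: redact long digit runs that
    # could be phone numbers before anything leaves the device.
    return re.sub(r"\b\d{10,}\b", "[REDACTED]", message)

def cloud_generate(sanitized_history):
    # Stand-in for the cloud LLM call; only sanitized text reaches it.
    return f"(cloud reply to: {sanitized_history[-1]})"

def handle_incoming(message, history):
    history = history + [message]
    if not should_intervene(history):   # on-device decision: cheap, private
        return None                     # stay silent, zero cloud tokens
    sanitized = [sanitize(m) for m in history]  # on-device sanitization
    return cloud_generate(sanitized)    # cloud handles generation only
```

Under this split, cloud tokens are spent only on turns the local model chooses to answer, and only on sanitized context, which is where both the claimed token savings and the privacy guarantee originate.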
If this is right
- Multi-user chats become feasible at larger scales because token costs drop sharply while quality stays high.
- Privacy improves because sensitive details never leave the user's device in raw form.
- The same split supports multimodal inputs such as images, videos, memes, and voice messages without extra overhead.
- Dedicated benchmarks like MUIR allow systematic measurement of both timing accuracy and reply quality across model sizes.
- Users report positive experiences across diverse group scenarios when timing and relevance are handled locally first.
Where Pith is reading between the lines
- The architecture could be adapted to other settings that mix personal data with shared conversations, such as collaborative document editing or family messaging apps.
- As on-device models grow stronger, more of the reasoning could stay local, further reducing cloud dependency and latency.
- MUIR-style datasets might help standardize testing for any agent that must track multiple participants over long threads.
- The token savings open the door to running such assistants continuously on consumer hardware without hitting usage limits.
Load-bearing premise
Small on-device models can reliably pick the right moments to respond and clean messages without losing essential context from the full group history.
What would settle it
Real-world group chat logs where the on-device model either intervenes at clearly wrong times or removes context that leads to inaccurate or incomplete cloud responses.
Original abstract
Recent advances in large language models (LLMs) have enabled increasingly capable chatbots. However, most existing systems focus on single-user settings and do not generalize well to multi-user group chat interactions, where agents require more proactive and accurate intervention under complex, evolving contexts. Existing approaches typically rely on LLMs for both intervention reasoning and response generation, leading to high token consumption, limited scalability, and potential privacy risks. To address these challenges, we propose GroupGPT, a token-efficient and privacy-preserving agentic framework for multi-user chat assistant. GroupGPT adopts an edge-cloud model collaboration architecture to decouple intervention timing from response generation, enabling efficient and accurate decision-making while preserving user privacy through on-device processing of sensitive information. The framework also supports multimodal inputs, including memes, images, videos, and voice messages. To support evaluation of timing accuracy and response quality, we further introduce MUIR, a benchmark dataset for multi-user chat assistant intervention reasoning. MUIR contains 2,500 annotated group chat segments with intervention labels and rationales. We evaluate a range of models on MUIR, spanning from open-source to proprietary variants, including both LLMs and their smaller counterparts. Extensive experiments demonstrate that GroupGPT generates accurate and well-timed responses, achieving an average score of 4.72/5.0 in LLM-based evaluation, and is well-received by users across diverse group chat scenarios. Moreover, GroupGPT reduces the token usage by up to 3 times compared to baselines, while providing privacy sanitization of user messages before cloud transmission. Code is available at: https://github.com/Eliot-Shen/GroupGPT.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces GroupGPT, an edge-cloud collaborative agentic framework for multi-user group chat assistants. It decouples on-device intervention timing and privacy sanitization from cloud-based response generation to reduce token usage and protect sensitive data, while supporting multimodal inputs. The authors release the MUIR benchmark (2,500 annotated group-chat segments with labels and rationales) and report that GroupGPT achieves an average LLM-as-judge score of 4.72/5, up to 3× token reduction versus baselines, and positive user feedback across diverse scenarios.
Significance. If the per-component claims hold, the work offers a practical path toward scalable, privacy-aware multi-user LLM agents and supplies a new benchmark that could standardize evaluation of intervention reasoning. The edge-cloud split and explicit sanitization step address real deployment constraints that single-model approaches have largely ignored.
major comments (3)
- [§5] §5 (Experiments): no precision, recall, or F1 scores are reported for the on-device timing model’s intervention decisions, nor any ablation that measures response quality when timing is correct versus incorrect. The aggregate 4.72/5 LLM score therefore cannot be attributed to the proposed architecture rather than to the evaluation protocol itself.
- [§5.2] §5.2 and Table 3: the “up to 3× token reduction” claim lacks per-baseline token counts, variance across chat lengths, and a breakdown separating on-device versus cloud tokens. Without these numbers it is impossible to verify the efficiency gain or to reproduce the result.
- [§4] §4 (MUIR benchmark): the annotation protocol, inter-annotator agreement, and rationale quality statistics are not provided. Given that the benchmark is central to all quantitative claims, the absence of these reliability metrics undermines confidence in the 2,500-segment evaluation.
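The timing metrics requested in the first comment are cheap to compute once per-turn intervention labels exist. A sketch over invented binary labels (MUIR's actual label schema is not specified here):

```python
def timing_metrics(gold, pred):
    # Precision/recall/F1 for binary "should intervene" decisions.
    tp = sum(g and p for g, p in zip(gold, pred))        # correct interventions
    fp = sum((not g) and p for g, p in zip(gold, pred))  # spurious interventions
    fn = sum(g and (not p) for g, p in zip(gold, pred))  # missed interventions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [1, 0, 1, 1, 0, 0, 1]  # annotated "should intervene" labels (invented)
pred = [1, 0, 0, 1, 1, 0, 1]  # on-device model decisions (invented)
p, r, f1 = timing_metrics(gold, pred)
```

Reporting these alongside the 4.72/5 score, stratified by whether the timing call was correct, would let the reader separate the architecture's contribution from the evaluation protocol.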
minor comments (2)
- [Figure 2] The architecture diagram (Figure 2) does not label the sanitization module or show how full group history is truncated before cloud transmission.
- [§3.2] The abstract states that smaller on-device models are used, but the exact model sizes, quantization, and latency numbers are only mentioned in passing in §3.2.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback on our manuscript. We address each of the major comments below and commit to revising the paper to incorporate additional details and analyses as requested.
Point-by-point responses
Referee: [§5] §5 (Experiments): no precision, recall, or F1 scores are reported for the on-device timing model’s intervention decisions, nor any ablation that measures response quality when timing is correct versus incorrect. The aggregate 4.72/5 LLM score therefore cannot be attributed to the proposed architecture rather than to the evaluation protocol itself.
Authors: We agree that reporting precision, recall, and F1 scores for the on-device intervention timing model, along with an ablation study on response quality for correct versus incorrect timing decisions, would provide stronger evidence for the contribution of our architecture. In the revised manuscript, we will add these metrics and the ablation analysis to better isolate the impact of the timing component. revision: yes
Referee: [§5.2] §5.2 and Table 3: the “up to 3× token reduction” claim lacks per-baseline token counts, variance across chat lengths, and a breakdown separating on-device versus cloud tokens. Without these numbers it is impossible to verify the efficiency gain or to reproduce the result.
Authors: We acknowledge that more granular token usage data is necessary to substantiate the efficiency claims. We will revise Table 3 to include detailed per-baseline token counts, variance or standard deviations across varying chat lengths, and an explicit breakdown of on-device versus cloud token consumption. This will enable verification and reproduction of the reported up to 3× reduction. revision: yes
Referee: [§4] §4 (MUIR benchmark): the annotation protocol, inter-annotator agreement, and rationale quality statistics are not provided. Given that the benchmark is central to all quantitative claims, the absence of these reliability metrics undermines confidence in the 2,500-segment evaluation.
Authors: We recognize the importance of documenting the annotation process and reliability metrics for the MUIR benchmark. In the revised version, we will include a detailed description of the annotation protocol, inter-annotator agreement statistics (such as Cohen's or Fleiss' kappa), and any available statistics on the quality of the provided rationales to strengthen confidence in the benchmark. revision: yes
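The inter-annotator agreement statistic the authors promise is itself a few lines. A minimal Cohen's kappa for two annotators' binary intervention labels, with invented labels purely for illustration:

```python
def cohens_kappa(a, b):
    # Observed agreement minus chance agreement, normalized.
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

ann1 = [1, 1, 0, 0, 1, 0, 1, 0]  # annotator 1 intervention labels (invented)
ann2 = [1, 0, 0, 0, 1, 0, 1, 1]  # annotator 2 intervention labels (invented)
kappa = cohens_kappa(ann1, ann2)
```

For more than two annotators, Fleiss' kappa generalizes the same observed-versus-chance comparison across the full pool.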
Circularity Check
No circularity; empirical claims rest on new benchmark and experiments
Full rationale
The paper introduces GroupGPT as an edge-cloud architecture for multi-user chat, together with the MUIR benchmark of 2,500 annotated segments. All headline claims (the 4.72/5 LLM score, up to 3× token reduction, privacy sanitization) are presented as outcomes of experiments on this benchmark rather than as derivations, equations, or fitted parameters that reduce to their own inputs by construction. No self-definitional steps, predictions from fitted inputs, load-bearing self-citations, uniqueness theorems, or ansatz smuggling appear in the abstract or described content. The work is self-contained rather than dependent on external benchmarks, and it does not invoke prior results by the authors to force its central architecture.
Axiom & Free-Parameter Ledger
discussion (0)