Pith · machine review for the scientific record

arxiv: 2604.06452 · v1 · submitted 2026-04-07 · 💻 cs.CL

Recognition: 2 theorem links

Learning to Interrupt in Language-based Multi-agent Communication

Ansong Ni, Asli Celikyilmaz, Danqing Wang, Da Yin, Lei Li, Ruta Desai

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:04 UTC · model grok-4.3

classification 💻 cs.CL
keywords multi-agent communication · large language models · interruption learning · communication efficiency · dialogue management · reinforcement learning for agents

The pith

A learned policy lets listener agents interrupt speakers in LLM multi-agent setups to cut communication costs by 32.2 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework for multi-agent LLM systems in which the listening agent can break into the speaker's output instead of waiting for a full message. LLMs alone tend to interrupt too soon because they overestimate what they already know, so the authors train a separate predictor that chooses interruption moments by balancing expected future task reward against added communication cost. The resulting method, applied to two-agent text pictionary games, three-agent meeting scheduling, and three-agent debates, lowers total tokens exchanged while matching or exceeding baseline success rates. The learned interruption behavior transfers to unseen agents and tasks without retraining. If this holds, multi-agent LLM conversations can become shorter and more adaptive by borrowing the back-and-forth pattern of human dialogue.

Core claim

Predicting interruption points from estimates of future reward and communication cost produces an effective listener policy that reduces overall message volume by 32.2 percent relative to non-interruptible baselines while preserving or improving task performance across text pictionary, meeting scheduling, and debate scenarios, and that this policy generalizes to new agent combinations and task types.

What carries the argument

HANDRAISER, a learned predictor that selects interruption points by weighing estimated future task reward against the cost of continued speaking.
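As a rough sketch of that weighing, the decision could take the following shape. Everything here is our illustration, not the paper's notation: the estimator inputs, the linear cost weight, and the zero threshold are all assumptions.

```python
# Hypothetical sketch of a HANDRAISER-style interruption decision.
# The paper's actual estimator, cost model, and threshold are not
# specified in the text above; these names are stand-ins.

def should_interrupt(est_reward_if_interrupt: float,
                     est_reward_if_wait: float,
                     remaining_cost_estimate: float,
                     cost_weight: float = 0.01,
                     threshold: float = 0.0) -> bool:
    """Interrupt when the expected reward lost by cutting the speaker
    off is outweighed by the communication cost of letting them finish."""
    gain = est_reward_if_interrupt - est_reward_if_wait
    saved = cost_weight * remaining_cost_estimate
    return gain + saved > threshold

# Interrupting barely hurts expected success (-0.02) but saves an
# estimated 400 tokens of further speech, so the listener breaks in.
print(should_interrupt(0.70, 0.72, 400.0))  # True
```

The point of the sketch is only the trade-off structure: a purely prompted LLM tends to set the equivalent of `threshold` too low (overconfidence), which is what the learned predictor corrects.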

If this is right

  • Multi-agent LLM systems can complete the same tasks with substantially fewer tokens exchanged overall.
  • The underlying language models themselves do not need retraining to gain these efficiency improvements.
  • The same interruption policy works across agent counts and task domains without modification.
  • Listeners gain the ability to request clarification or state partial beliefs at moments that actually help the group.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Combining listener-side interruption with existing speaker-side message compression could produce further savings.
  • The approach might transfer to single-model settings where an LLM decides when to pause its own generation and query the user.
  • Large-scale deployments could see meaningful drops in API token spend once the interruption policy is learned once and reused.

Load-bearing premise

The interruption predictor trained on reward and cost estimates will continue to select useful stopping points when applied to different agents or tasks without retraining or retuning.

What would settle it

Running the trained interruption policy on a new multi-agent task without retraining. The claim fails if total communication length stays the same or increases while task success falls below the no-interruption baseline; it holds if tokens drop with success preserved.
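That failure condition can be written as a one-line check; the metric names here are hypothetical, not the paper's:

```python
def settles_against(method_tokens: int, method_success: float,
                    base_tokens: int, base_success: float) -> bool:
    """Return True if a transfer run would count against the claim:
    no token savings AND worse task success than the no-interruption
    baseline."""
    return method_tokens >= base_tokens and method_success < base_success

# Fewer tokens with slightly better success: the claim survives.
print(settles_against(680, 0.81, 1000, 0.80))  # False
```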

Figures

Figures reproduced from arXiv: 2604.06452 by Ansong Ni, Asli Celikyilmaz, Danqing Wang, Da Yin, Lei Li, Ruta Desai.

Figure 1
Figure 1: Text Pictionary game as an example: a describer describes a word for the guesser. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2: In non-interruptible communication (left), one agent (Bob) must wait for the… [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3: Sampling to estimate the expectation of communication cost and task performance. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4: Avg. success ratio of the Llama-8B listening agent with different speaking agents (Llama-70B, Llama-405B, Gemini-2.0, GPT-4o) under the Generic, Concise, Random, Prompt, and HandRaiser strategies. [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 6
Figure 6: GPT-4o's understanding of incorrect guesses. [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 7
Figure 7: Ablation on different thresholds and chunk sizes. "SR" denotes success ratio. [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
read the original abstract

Multi-agent systems using large language models (LLMs) have demonstrated impressive capabilities across various domains. However, current agent communication suffers from verbose output that overload context and increase computational costs. Although existing approaches focus on compressing the message from the speaker side, they struggle to adapt to different listeners and identify relevant information. An effective way in human communication is to allow the listener to interrupt and express their opinion or ask for clarification. Motivated by this, we propose an interruptible communication framework that allows the agent who is listening to interrupt the current speaker. Through prompting experiments, we find that current LLMs are often overconfident and interrupt before receiving enough information. Therefore, we propose a learning method that predicts the appropriate interruption points based on the estimated future reward and cost. We evaluate our framework across various multi-agent scenarios, including 2-agent text pictionary games, 3-agent meeting scheduling, and 3-agent debate. The results of the experiment show that our HANDRAISER can reduce the communication cost by 32.2% compared to the baseline with comparable or superior task performance. This learned interruption behavior can also be generalized to different agents and tasks.
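The framework the abstract describes, a speaker streaming output that the listener may cut short, can be sketched as a chunked loop. All names below are hypothetical; in the real system the policy is the learned predictor and the chunks come from an LLM:

```python
def converse(speaker_chunks, interrupt_policy):
    """Stream speaker output chunk by chunk; after each chunk the
    listener's policy decides whether to break in. Returns the
    transcript the listener actually consumed and whether it ended
    in an interruption."""
    heard = []
    for chunk in speaker_chunks:
        heard.append(chunk)
        if interrupt_policy(heard):
            return heard, True   # listener takes the floor early
    return heard, False          # speaker finished uninterrupted

# Toy pictionary-style run: interrupt once a decisive clue arrives.
chunks = ["It's an animal,", "it barks,", "it's man's best friend,", "four legs..."]
heard, cut = converse(chunks, lambda h: "barks" in " ".join(h))
print(len(heard), cut)  # 2 True
```

Communication saved is simply the chunks never emitted; the hard part, which the paper's learned predictor addresses, is a policy that neither fires on the first clue nor waits for the whole message.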

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity audit, and an axiom-and-free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces HANDRAISER, an interruptible communication framework for LLM-based multi-agent systems. It motivates the approach via human-like listener interruptions to reduce verbose outputs and context overload, identifies LLM overconfidence through prompting experiments, and proposes a learning method to predict interruption points based on estimated future reward and cost. The framework is evaluated on 2-agent text pictionary games, 3-agent meeting scheduling, and 3-agent debate, claiming a 32.2% communication cost reduction relative to baseline with comparable or superior task performance, along with generalization of the learned interruption behavior to different agents and tasks.

Significance. If the results hold, the work could meaningfully advance efficient multi-agent LLM systems by shifting from speaker-side compression to adaptive listener-initiated interruptions. The reward-cost estimation approach offers a principled, learning-based alternative to heuristic interruption rules and may scale better across scenarios than fixed prompting strategies.

major comments (2)
  1. [Experiments] Experiments section: The generalization claim (zero-shot transfer of the interruption policy across tasks with differing turn structures, information asymmetry, and reward sparsity) is load-bearing for the central result but rests on the unverified assumption that LLM-derived reward/cost estimates transfer without retraining. The manuscript should report explicit cross-task zero-shot results or clarify retraining details, as failure here would undermine both the 32.2% cost reduction and the generalization statement.
  2. [Method] Method section: The procedure for estimating future reward and cost (used as the learning objective for interruption prediction) lacks sufficient algorithmic or equation-level detail. Without this, it is impossible to determine whether the estimates avoid inheriting the overconfidence bias shown in the prompting experiments or whether the policy is robust to the task differences noted above.

minor comments (2)
  1. [Abstract] Abstract: The performance claim of 32.2% cost reduction is stated without reference to baselines, variance, number of runs, or statistical tests; these details should be summarized even in the abstract for immediate verifiability.
  2. [Experiments] The manuscript would benefit from a table comparing cost and task metrics across all three scenarios with explicit baseline definitions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's thoughtful and constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to strengthen the clarity and rigor of the presentation.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: The generalization claim (zero-shot transfer of the interruption policy across tasks with differing turn structures, information asymmetry, and reward sparsity) is load-bearing for the central result but rests on the unverified assumption that LLM-derived reward/cost estimates transfer without retraining. The manuscript should report explicit cross-task zero-shot results or clarify retraining details, as failure here would undermine both the 32.2% cost reduction and the generalization statement.

    Authors: We thank the referee for highlighting the importance of making the generalization evidence explicit. The manuscript reports results across multiple tasks (text pictionary, scheduling, and debate) and demonstrates that the learned interruption policy transfers to different agents and tasks. To directly address the concern, we will add a new subsection in the Experiments section with explicit zero-shot cross-task transfer results, including tables showing performance when the policy trained on one task is applied without retraining to others with differing turn structures, information asymmetry, and reward sparsity. This will confirm the transfer of the reward/cost estimates and support the reported cost reductions. revision: yes

  2. Referee: [Method] Method section: The procedure for estimating future reward and cost (used as the learning objective for interruption prediction) lacks sufficient algorithmic or equation-level detail. Without this, it is impossible to determine whether the estimates avoid inheriting the overconfidence bias shown in the prompting experiments or whether the policy is robust to the task differences noted above.

    Authors: We agree that the Method section requires more precise description of the estimation procedure. In the revised version, we will expand this section with algorithmic details, pseudocode, and equations detailing how future reward (derived from expected task success) and cost (derived from projected communication overhead) are estimated via LLM-based simulation of trajectories. We will clarify that these learned estimates serve as the training objective for the interruption predictor, distinguishing them from direct LLM prompting and thereby avoiding the overconfidence bias shown in our prompting experiments. We will also add discussion of robustness to the noted task differences. revision: yes
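One plausible shape for the estimation the rebuttal describes, sampling simulated continuations of the dialogue and averaging their outcomes (the idea in Figure 3), is sketched below. Every name is a stand-in under our assumptions, not the paper's implementation:

```python
import random

def estimate_reward_and_cost(rollout, n_samples=8, seed=0):
    """Monte Carlo estimate of expected task reward and remaining
    communication cost at a candidate interruption point.
    `rollout` stands in for simulating the rest of the conversation;
    it returns (task_reward, extra_tokens) for one sampled future."""
    rng = random.Random(seed)
    rewards, costs = [], []
    for _ in range(n_samples):
        r, c = rollout(rng)
        rewards.append(r)
        costs.append(c)
    return sum(rewards) / n_samples, sum(costs) / n_samples

# Toy rollout: the task succeeds ~70% of the time if we keep listening,
# at a further cost of 100-300 tokens.
toy = lambda rng: (1.0 if rng.random() < 0.7 else 0.0, rng.randint(100, 300))
r_hat, c_hat = estimate_reward_and_cost(toy)
```

Training the interruption predictor against such averaged estimates, rather than against a single prompted self-assessment, is what would distinguish the method from the overconfident direct-prompting baseline.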

Circularity Check

0 steps flagged

No significant circularity; learning objective tied to external task outcomes

full rationale

The paper presents a learning method for predicting interruption points from estimated future reward and cost, evaluated empirically across distinct multi-agent scenarios (pictionary, scheduling, debate) with reported cost reductions. No equations, derivations, or self-citations appear in the provided text that would reduce the interruption policy or its predictions to the training inputs by construction. The reward/cost estimation is described as an objective linked to downstream task performance rather than a tautological fit or renaming. Generalization claims rest on experimental transfer, not on any self-definitional or uniqueness-imported structure. This is the expected non-circular outcome for an empirical learning paper without load-bearing analytic reductions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only abstract available; ledger is therefore incomplete and limited to assumptions visible in the summary text.

free parameters (1)
  • reward and cost estimation parameters
    The interruption predictor is trained on estimated future reward and cost; these quantities are learned from data and therefore constitute free parameters whose values are not reported.
axioms (1)
  • domain assumption: Current LLMs are overconfident and interrupt too early when prompted
    The paper states this finding from prompting experiments as motivation for the learned method.

pith-pipeline@v0.9.0 · 5511 in / 1196 out tokens · 62321 ms · 2026-05-10T19:04:15.575721+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

51 extracted references · 20 canonical work pages · 8 internal anchors

  2. [2]

    https://www.anthropic.com/engineering/multi-agent-research-system

Anthropic. https://www.anthropic.com/engineering/multi-agent-research-system, 2025

  3. [3]

    Interruptions and the interpretation of conversation

Adrian Bennett. Interruptions and the interpretation of conversation. In Annual Meeting of the Berkeley Linguistics Society, pp. 557–575, 1978

  4. [4]

    Why Do Multi-Agent LLM Systems Fail?

    Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multi-agent llm systems fail? arXiv preprint arXiv:2503.13657, 2025

  5. [5]

Chateval: Towards better LLM-based evaluators through multi-agent debate

Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. Chateval: Towards better LLM-based evaluators through multi-agent debate. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=FQepisCUWu

  6. [6]

    Autoagents: A framework for automatic agent generation

Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje Karlsson, Jie Fu, and Yemin Shi. Autoagents: A framework for automatic agent generation. In IJCAI, pp. 22–30, 2024. URL https://www.ijcai.org/proceedings/2024/3

  7. [7]

    Can large language models be trusted for evaluation? scalable meta-evaluation of llms as evaluators via agent debate

    Steffi Chern, Ethan Chern, Graham Neubig, and Pengfei Liu. Can large language models be trusted for evaluation? scalable meta-evaluation of llms as evaluators via agent debate. arXiv preprint arXiv:2401.16788, 2024

  8. [8]

    Mechanism design for multi-agent meeting scheduling

Elisabeth Crawford and Manuela Veloso. Mechanism design for multi-agent meeting scheduling. Web Intelligence and Agent Systems, 4(2): 209–220, 2006

  9. [9]

    Improving Factuality and Reasoning in Language Models through Multiagent Debate

    Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. arXiv preprint arXiv:2305.14325, 2023

  10. [10]

    Alpacafarm: A simulation framework for methods that learn from human feedback

Yann Dubois, Chen Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy S Liang, and Tatsunori B Hashimoto. Alpacafarm: A simulation framework for methods that learn from human feedback. Advances in Neural Information Processing Systems, 36: 30039–30069, 2023

  11. [11]

Thinkless: LLM learns when to think

    Gongfan Fang, Xinyin Ma, and Xinchao Wang. Thinkless: Llm learns when to think. arXiv preprint arXiv:2505.13379, 2025

  12. [12]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  13. [13]

    Large language model based multi-agents: A survey of progress and challenges

    Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: A survey of progress and challenges. In IJCAI, 2024

  14. [14]

    Training large language models to reason in a continuous latent space

    Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason E Weston, and Yuandong Tian. Training large language models to reason in a continuous latent space. In Second Conference on Language Modeling, 2025. URL https://openreview.net/forum?id=Itxz7S4Ip3

  15. [15]

    Llm-based multi-agent systems for software engineering: Literature review, vision, and the road ahead

Junda He, Christoph Treude, and David Lo. Llm-based multi-agent systems for software engineering: Literature review, vision, and the road ahead. ACM Transactions on Software Engineering and Methodology, 34(5): 1–30, 2025

  16. [16]

    Self-evolving multi-agent collaboration networks for software development

    Yue Hu, Yuzhu Cai, Yaxin Du, Xinyu Zhu, Xiangrui Liu, Zijie Yu, Yuchen Hou, Shuo Tang, and Siheng Chen. Self-evolving multi-agent collaboration networks for software development. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=4R71pdPBZp

  17. [17]

    GPT-4o System Card

    Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024

  18. [18]

    C3ot: Generating shorter chain-of-thought without compromising effectiveness

Yu Kang, Xianghui Sun, Liangyu Chen, and Wei Zou. C3ot: Generating shorter chain-of-thought without compromising effectiveness. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp. 24312–24320, 2025

  19. [19]

    Debating with more persuasive llms leads to more truthful answers

Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R Bowman, Tim Rocktäschel, and Ethan Perez. Debating with more persuasive llms leads to more truthful answers. Proceedings of Machine Learning Research, 235: 23662–23733, 2024

  20. [20]

    Codi: Co-evolving contrastive diffusion models for mixed-type tabular synthesis

Chaejeong Lee, Jayoung Kim, and Noseong Park. Codi: Co-evolving contrastive diffusion models for mixed-type tabular synthesis. In International Conference on Machine Learning, pp. 18940–18956. PMLR, 2023

  21. [21]

Improving Multi-Agent Debate with Sparse Communication Topology

Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, and Eugene Ie. Improving multi-agent debate with sparse communication topology. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (eds.), Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 7281–7294, Miami, Florida, USA, November 2024. Association for Co...

  22. [22]

    Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

    Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, and Shuming Shi. Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118, 2023

  23. [23]

    Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization

    Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization. In COLM, 2024

  24. [24]

    Conversation, politeness, and interruption

William G Lycan. Conversation, politeness, and interruption. Papers in Linguistics, 10(1–2): 23–53, 1977

  25. [25]

    Interruption and influence in discussion groups

Sik Hung Ng, Mark Brooke, and Michael Dunne. Interruption and influence in discussion groups. Journal of Language and Social Psychology, 14(4): 369–381, 1995

  26. [26]

    Openclaw

OpenClaw. https://github.com/openclaw/openclaw, 2026

  27. [27]

    Generative agents: Interactive simulacra of human behavior

Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, pp. 1–22, 2023

  28. [28]

    Scaling large language model-based multi-agent collaboration

    Chen Qian, Zihao Xie, YiFei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Scaling large language model-based multi-agent collaboration. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=K3n5jPkrU6

  29. [29]

    Concise: Confidence-guided compression in step-by-step efficient reasoning

    Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Fandong Meng, Jie Zhou, Ju Ren, and Yaoxue Zhang. Concise: Confidence-guided compression in step-by-step efficient reasoning. arXiv preprint arXiv:2505.04881, 2025

  30. [30]

    Multi-agent meeting scheduling: A negotiation perspective

    Bram M Renting, Holger Hoos, and Catholijn M Jonker. Multi-agent meeting scheduling: A negotiation perspective. In The Sixteenth Workshop on Adaptive and Learning Agents, 2024

  31. [31]

    Coding theorems for a discrete source with a fidelity criterion

Claude E Shannon et al. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec., pt. 4, pp. 142–163, 1959

  32. [32]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M Dai, Anja Hauth, Katie Millican, et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023

  33. [33]

    Kimi K2.5: Visual Agentic Intelligence

    Kimi Team, Tongtong Bai, Yifan Bai, Yiping Bao, SH Cai, Yuan Cao, Y Charles, HS Che, Cheng Chen, Guanduo Chen, et al. Kimi K2.5: Visual agentic intelligence. arXiv preprint arXiv:2602.02276, 2026

  34. [34]

    Multi-Agent Collaboration Mechanisms: A Survey of LLMs

    Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O'Sullivan, and Hoang D Nguyen. Multi-agent collaboration mechanisms: A survey of llms. arXiv preprint arXiv:2501.06322, 2025

  35. [35]

    Beyond turn-based interfaces: Synchronous llms as full-duplex dialogue agents

Bandhav Veluri, Benjamin N Peloquin, Bokai Yu, Hongyu Gong, and Shyamnath Gollakota. Beyond turn-based interfaces: Synchronous llms as full-duplex dialogue agents. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 21390–21402, 2024

  36. [36]

    Avalon’s game of thoughts: Battle against deception through recursive contemplation

    Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, and Gao Huang. Avalon's game of thoughts: Battle against deception through recursive contemplation. ArXiv, abs/2310.01320, 2023. URL https://api.semanticscholar.org/CorpusID:263605971

  37. [37]

    Mmlu-pro: A more robust and challenging multi-task language understanding benchmark

Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, et al. Mmlu-pro: A more robust and challenging multi-task language understanding benchmark. Advances in Neural Information Processing Systems, 37: 95266–95290, 2024

  38. [38]

AgentDropout: Dynamic agent elimination for token-efficient and high-performance LLM-based multi-agent collaboration

Zhexuan Wang, Yutong Wang, Xuebo Liu, Liang Ding, Miao Zhang, Jie Liu, and Min Zhang. AgentDropout: Dynamic agent elimination for token-efficient and high-performance LLM-based multi-agent collaboration. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.), Proceedings of the 63rd Annual Meeting of the Association for ...

  39. [39]

    Tokenskip: Controllable chain-of-thought compression in llms

Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li, and Wenjie Li. Tokenskip: Controllable chain-of-thought compression in llms. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 3351–3363, 2025

  40. [40]

Exploring large language models for communication games: An empirical study on werewolf

    Yuzhuang Xu, Shuo Wang, Peng Li, Fuwen Luo, Xiaolong Wang, Weidong Liu, and Yang Liu. Exploring large language models for communication games: An empirical study on werewolf. arXiv preprint arXiv:2309.04658, 2023

  41. [41]

Beyond self-talk: A communication-centric survey of LLM-based multi-agent systems

    Bingyu Yan, Zhibo Zhou, Litian Zhang, Lian Zhang, Ziyi Zhou, Dezhuang Miao, Zhoujun Li, Chaozhuo Li, and Xiaoming Zhang. Beyond self-talk: A communication-centric survey of llm-based multi-agent systems. arXiv preprint arXiv:2502.14321, 2025

  42. [42]

    Exchange-of-thought: Enhancing large language model capabilities through cross-model communication

Zhangyue Yin, Qiushi Sun, Cheng Chang, Qipeng Guo, Junqi Dai, Xuan-Jing Huang, and Xipeng Qiu. Exchange-of-thought: Enhancing large language model capabilities through cross-model communication. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 15135–15153, 2023

  43. [43]

Cut the crap: An economical communication pipeline for LLM-based multi-agent systems

Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Xu Yu, and Tianlong Chen. Cut the crap: An economical communication pipeline for LLM-based multi-agent systems. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=LkzuPorQ5L

  44. [44]

    Beyond the turn-based game: Enabling real-time conversations with duplex models

Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, and Zhiyuan Liu. Beyond the turn-based game: Enabling real-time conversations with duplex models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 11543–11557, 2024a

  45. [45]

    Chain of agents: Large language models collaborating on long-context tasks

Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, and Sercan Ö Arık. Chain of agents: Large language models collaborating on long-context tasks. Advances in Neural Information Processing Systems, 37: 132208–132237, 2024b

  46. [46]

Natural plan: Benchmarking llms on natural language planning

    Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V Le, Ed H Chi, et al. Natural plan: Benchmarking llms on natural language planning. arXiv preprint arXiv:2406.04520, 2024

  47. [47]

Multi-agent design: Optimizing agents with better prompts and topologies

Han Zhou, Xingchen Wan, Ruoxi Sun, Hamid Palangi, Shariq Iqbal, Ivan Vulić, Anna Korhonen, and Sercan Ö Arık. Multi-agent design: Optimizing agents with better prompts and topologies. arXiv preprint arXiv:2502.02533, 2025

  48. [48]

Language agents as optimizable graphs

Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. Language agents as optimizable graphs. arXiv preprint arXiv:2402.16823, 2024
