pith. sign in

arxiv: 2605.19418 · v1 · pith:XUGA5QD2new · submitted 2026-05-19 · 💻 cs.AI

Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling

Pith reviewed 2026-05-20 05:42 UTC · model grok-4.3

classification 💻 cs.AI
keywords signed graphsmulti-agent systemsLLM reasoningconflict modelingmessage passingagent collaborationgraph-based aggregation
0
0 comments X

The pith

SIGMA models inter-agent relations as a signed graph to suppress conflicts and improve multi-agent LLM reasoning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SIGMA to fix how multiple LLM agents combine their outputs when they disagree. It builds a signed graph that marks edges as trust, conflict, or neutral, each with a confidence weight, then routes information through message passing that strengthens reliable signals and weakens contradictory ones. This replaces naive averaging that lets errors spread unchecked. The final step uses both the graph structure and conflict information to produce a single consistent answer. Readers would care because many practical tasks require agents to handle opposing views without collapsing into lower accuracy.

Core claim

Given a query, SIGMA selects relevant and diverse agents, constructs a signed interaction graph with confidence-weighted edges that encode trust, conflict, and neutral relations, runs conflict-aware signed message passing to reinforce trustworthy agents while suppressing conflicting signals, and finishes with structure- and conflict-aware weighted aggregation that delivers globally consistent predictions.

What carries the argument

The signed relational graph with confidence-weighted edges, which enables conflict-aware signed message passing to distinguish and act on trust versus conflict relations among agents.

If this is right

  • Accuracy improves on six standard benchmarks across varied LLM backbones.
  • Conflict-resilient performance rises in multiple multi-agent configurations.
  • The method identifies reliable interaction patterns that prior graph-based MAS frameworks miss.
  • Globally consistent predictions emerge even when raw agent outputs contain opposing signals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same signed-graph construction could be tested on single-agent chain-of-thought traces by treating intermediate steps as conflicting nodes.
  • Dynamic updates to the signed graph during multi-turn interactions might handle changing agent stances better than a static construction.
  • Hybrid teams mixing LLMs with human experts could assign signed edges based on domain expertise to weight human input more heavily in conflicts.

Load-bearing premise

A structured signed interaction graph with confidence-weighted edges can be constructed from a query such that conflict-aware signed message passing reliably suppresses erroneous signals and yields globally consistent predictions.

What would settle it

On a benchmark dataset with deliberately injected agent conflicts where the true answer comes from the conflicting agents, if SIGMA's accuracy falls below that of simple majority or averaging baselines, the central claim would not hold.

Figures

Figures reproduced from arXiv: 2605.19418 by Chaozhuo Li, Daojing He, Longgang He, Longzhu He.

Figure 1
Figure 1. Figure 1: (Left) Prior MAS treat all agents as equally reliable, including chain, tree, and graph-based structures. (Right) SIGMA models trust, conflict, and neutral relations via signed graph modeling, enabling the system to identify which agents to trust or challenge for conflict-resilient reasoning. To address this, we propose a SIgned Graph-informed Multi-Agent Reason￾ing framework for LLM-based multi￾agent syst… view at source ↗
Figure 2
Figure 2. Figure 2: Visualization of Balanced Triad types. First two are balanced, last two are imbalanced. Signed Graph Representation. Inspired by Balance Theory [35], signed graphs represent relationships by polarity and magnitude. Specifically, each interaction is given by Aij = sij · wij , where sij ∈ {−1, 0, +1} indicates polarity and wij ≥ 0 its magnitude, as shown in Definition 1. Definition 1 (Balance Theory [35]). A… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of SIGMA, through four stages enabling robust multi-agent reasoning: (I) Query￾Guided Agent Selection, leveraging multi-dimensional attributes; (II) Signed Relational Graph Construction, explicitly modeling heterogeneous inter-agent relations; (III) Conflict-Aware Signed Message Passing; and (IV) Signed Consensus Readout, where SIGMA integrates agent representations to yield a globally coherent co… view at source ↗
Figure 4
Figure 4. Figure 4: Panels (a) and (b) present the ablation study, illustrating the contribution of each module in [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Hyperparameter sensitivity. Panels (a) and (b) show the effect of [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Case study [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
read the original abstract

LLM-based multi-agent systems (MAS) have demonstrated strong reasoning and decision-making capabilities that consistently surpass those of single LLM agents. However, their performance often suffers from naive aggregation mechanisms that assume uniformly cooperative interactions. Upon close inspection, we observe that existing graph-based MAS frameworks (1) propagate errors when conflicting signals arise without control, and (2) lack explicit modeling of conflicting inter-agent relations as well as structural awareness, failing to identify reliable interaction patterns. To bridge this gap, we introduce SIGMA, a novel SIgned Graph-informed Multi-Agent reasoning framework that explicitly captures trust, conflict, and neutral relations among agents via a signed relational graph. Specifically, given a query, SIGMA first selects a set of relevant and diverse agents, then constructs a structured signed interaction graph with confidence-weighted edges. Reasoning proceeds through conflict-aware signed message passing, which reinforces information from trustworthy agents while suppressing conflicting signals, and terminates with a structure- and conflict-aware weighted aggregation to yield globally consistent and conflict-resilient predictions. Extensive experiments on six benchmark datasets, across multiple LLM backbones and diverse multi-agent configurations, demonstrate that SIGMA consistently outperforms state-of-the-art baselines, achieving notable gains in both accuracy and conflict-resilient performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce SIGMA, a SIgned Graph-informed Multi-Agent reasoning framework for LLM-based systems. It selects relevant agents, constructs a signed interaction graph with confidence-weighted edges to capture trust, conflict, and neutral relations, performs conflict-aware signed message passing to reinforce trustworthy information and suppress conflicts, and uses structure- and conflict-aware weighted aggregation for consistent predictions. Extensive experiments on six benchmark datasets across multiple LLM backbones and multi-agent configurations demonstrate consistent outperformance over state-of-the-art baselines in accuracy and conflict-resilient performance.

Significance. If the results are substantiated, SIGMA offers a promising approach to enhancing the reliability of multi-agent LLM reasoning by explicitly modeling conflicting inter-agent relations using signed graphs, addressing limitations in existing frameworks that propagate errors under conflicts. The framework's design provides a structured method for achieving globally consistent outputs, and the broad experimental evaluation across datasets and backbones adds to its potential impact. The work merits credit for identifying specific gaps in prior graph-based MAS approaches and proposing a targeted solution.

major comments (2)
  1. [Abstract] The abstract asserts 'notable gains in both accuracy and conflict-resilient performance' without providing any numerical values, standard deviations, or specific baseline comparisons. This omission is problematic as it is central to evaluating the empirical support for the outperformance claim.
  2. [Framework Description (likely §3)] The construction of the signed interaction graph depends on deriving edge signs and confidence weights from LLM responses or judgments. The manuscript does not include tests for robustness against variations or noise in these judgments, which could lead to incorrect conflict/trust relations and affect the reliability of the subsequent message passing and aggregation steps.
minor comments (2)
  1. [Abstract] The acronym SIGMA is introduced but its expansion is given as SIgned Graph-informed Multi-Agent; consider ensuring consistent capitalization.
  2. Some sentences in the abstract are long and could be split for improved readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive overall assessment of SIGMA. We address each major comment point by point below, indicating where revisions have been made to the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The abstract asserts 'notable gains in both accuracy and conflict-resilient performance' without providing any numerical values, standard deviations, or specific baseline comparisons. This omission is problematic as it is central to evaluating the empirical support for the outperformance claim.

    Authors: We agree that quantitative details strengthen the abstract's claims. In the revised manuscript we have updated the abstract to report concrete average accuracy gains of 5.3% over the strongest baselines across the six datasets (with standard deviations of 1.2-2.1% from five runs), while preserving conciseness. Full per-dataset numbers and baseline comparisons remain in Section 4 and Tables 1-3. revision: yes

  2. Referee: [Framework Description (likely §3)] The construction of the signed interaction graph depends on deriving edge signs and confidence weights from LLM responses or judgments. The manuscript does not include tests for robustness against variations or noise in these judgments, which could lead to incorrect conflict/trust relations and affect the reliability of the subsequent message passing and aggregation steps.

    Authors: We acknowledge this limitation in the original submission. While experiments across multiple LLM backbones and agent configurations provide indirect evidence of stability, we have added a dedicated sensitivity analysis (new Section 5.4) that injects controlled sign-flip noise into the LLM-derived judgments at rates of 10-30% and measures downstream impact. Results show graceful degradation with SIGMA still outperforming baselines; we also discuss this as a direction for future work. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in SIGMA derivation chain

full rationale

The paper presents SIGMA as a novel construction: agent selection from a query, followed by building a signed interaction graph with confidence-weighted edges, then conflict-aware signed message passing, and finally structure-aware weighted aggregation. These steps are defined procedurally as new modeling choices rather than derived from or reduced to prior fitted parameters or self-referential definitions within the paper. Performance claims rest on empirical comparisons against external baselines across six independent benchmark datasets and multiple LLM backbones, with no equations or procedures shown to make the reported accuracy or conflict-resilience gains equivalent to quantities defined by construction inside the same work. The framework is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 1 invented entities

The central claim rests on domain assumptions about the constructibility and utility of signed graphs rather than on new mathematical derivations or external benchmarks; no free parameters are explicitly named but edge weights are described as confidence-weighted.

free parameters (1)
  • edge confidence weights
    Edges are described as confidence-weighted; the abstract does not specify how these weights are obtained or whether they involve fitting.
axioms (2)
  • domain assumption A set of relevant and diverse agents can be selected for any query
    First explicit step of the SIGMA pipeline.
  • domain assumption Signed relations (trust, conflict, neutral) among agents can be meaningfully assigned and used for message passing
    Core modeling premise that enables conflict-aware propagation.
invented entities (1)
  • Signed interaction graph no independent evidence
    purpose: To explicitly represent trust, conflict, and neutral relations with weighted edges for conflict-resilient reasoning
    New structural object introduced for MAS; no independent evidence outside the framework is provided in the abstract.

pith-pipeline@v0.9.0 · 5750 in / 1509 out tokens · 60020 ms · 2026-05-20T05:42:30.599185+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · 7 internal anchors

  1. [1]

    From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

    Mohamed Amine Ferrag, Norbert Tihanyi, and Merouane Debbah. From llm reasoning to autonomous ai agents: A comprehensive review. arXiv preprint arXiv:2504.19678, 2025

  2. [2]

    Agentic reasoning: A streamlined framework for enhancing llm reasoning with agentic tools

    Junde Wu, Jiayuan Zhu, Yuyuan Liu, Min Xu, and Yueming Jin. Agentic reasoning: A streamlined framework for enhancing llm reasoning with agentic tools. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (V olume1: Long Papers) (ACL), pages 28489–28503, 2025

  3. [3]

    Beyond react: A planner-centric framework for complex tool- augmented llm reasoning

    Xiaolong Wei, Yuehu Dong, Xingliang Wang, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, and Dawei Yin. Beyond react: A planner-centric framework for complex tool- augmented llm reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 40, pages 33845–33853, 2026

  4. [4]

    Multi-Agent Collaboration Mechanisms: A Survey of LLMs

    Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O’Sullivan, and Hoang D Nguyen. Multi-agent collaboration mechanisms: A survey of llms. arXiv preprint arXiv:2501.06322, 2025

  5. [5]

    Citysim: Modeling urban behaviors and city dynam- ics with large-scale llm-driven agent simulation

    Nicolas Bougie and Narimawa Watanabe. Citysim: Modeling urban behaviors and city dynam- ics with large-scale llm-driven agent simulation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track (EMNLP), pages 215–229, 2025

  6. [6]

    From words to actions: Unveiling the theoretical underpinnings of llm-driven autonomous systems

    Jianliang He, Siyu Chen, Fengzhuo Zhang, and Zhuoran Yang. From words to actions: Unveiling the theoretical underpinnings of llm-driven autonomous systems. In International Conference on Machine Learning (ICML), pages 17807–17841, 2024

  7. [7]

    Agents of change: Self-evolving llm agents for strategic planning

    Nikolas Belle, Dakota Barnes, Alfonso Amayuelas, Ivan Bercovich, Xin Eric Wang, and William Wang. Agents of change: Self-evolving llm agents for strategic planning. arXiv preprint arXiv:2506.04651, 2025

  8. [8]

    Agentic llm framework for adaptive decision discourse

    Antoine Dolant and Praveen Kumar. Agentic llm framework for adaptive decision discourse. arXiv preprint arXiv:2502.10978, 2025

  9. [9]

    Embodied agent interface: benchmarking llms for embodied decision making

    Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, et al. Embodied agent interface: benchmarking llms for embodied decision making. In Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS), pages 100428–100534, 2024

  10. [10]

    Automisty: a multi-agent llm framework for automated code generation in the misty social robot

    Xiao Wang, Lu Dong, Sahana Rangasrinivasan, Ifeoma Nwogu, Srirangaraj Setlur, and Venu- gopal Govindaraju. Automisty: a multi-agent llm framework for automated code generation in the misty social robot. In 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 9194–9201. IEEE, 2025

  11. [11]

    A survey on large language models for code generation

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. ACM Transactions on Software Engineering and Methodology (ACM TOSEM), 35(2):1–72, 2026

  12. [12]

    Data interpreter: An llm agent for data science

    Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Ceyao Zhang, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, et al. Data interpreter: An llm agent for data science. In Findings of the Association for Computational Linguistics (ACL Findings), pages 19796– 19821, 2025

  13. [13]

    Voyager: An Open-Ended Embodied Agent with Large Language Models

    Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. V oyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023

  14. [14]

    Improv- ing factuality and reasoning in language models through multiagent debate

    Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improv- ing factuality and reasoning in language models through multiagent debate. In International Conference on Machine Learning (ICML), 2024. 10

  15. [15]

    Llm-based multi-agent systems for software engineering: Literature review, vision, and the road ahead

    Junda He, Christoph Treude, and David Lo. Llm-based multi-agent systems for software engineering: Literature review, vision, and the road ahead. ACM Transactions on Software Engineering and Methodology (ACM TOSEM), 34(5):1–30, 2025

  16. [16]

    Many heads are better than one: Improved scientific idea generation by a llm-based multi-agent system

    Haoyang Su, Renqi Chen, Shixiang Tang, Zhenfei Yin, Xinzhe Zheng, Jinzhe Li, Biqing Qi, Qi Wu, Hui Li, Wanli Ouyang, et al. Many heads are better than one: Improved scientific idea generation by a llm-based multi-agent system. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (V olume1: Long Papers) (ACL), pages 28...

  17. [17]

    LLM Multi-Agent Systems: Challenges and Open Problems

    Shanshan Han, Qifan Zhang, Weizhao Jin, and Zhaozhuo Xu. Llm multi-agent systems: Challenges and open problems. arXiv preprint arXiv:2402.03578, 2024

  18. [18]

    Cut the crap: An economical communication pipeline for llm-based multi-agent systems

    Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Xu Yu, and Tianlong Chen. Cut the crap: An economical communication pipeline for llm-based multi-agent systems. In International Conference on Learning Representations (ICLR), 2025

  19. [19]

    Metagpt: Meta programming for a multi-agent collaborative framework

    Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. Metagpt: Meta programming for a multi-agent collaborative framework. In International Conference on Learning Representations (ICLR), 2024

  20. [20]

    L2mac: Large language model automatic computer for extensive code generation

    Samuel Holt, Max Ruiz Luyten, and Mihaela van der Schaar. L2mac: Large language model automatic computer for extensive code generation. In International Conference on Learning Representations (ICLR), 2024

  21. [21]

    Chatdev: Communicative agents for software development

    Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, et al. Chatdev: Communicative agents for software development. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (volume 1: Long papers) (ACL), pages 15174–15186, 2024

  22. [22]

    Mixture-of- agents enhances large language model capabilities

    Junlin Wang, WANG Jue, Ben Athiwaratkun, Ce Zhang, and James Zou. Mixture-of- agents enhances large language model capabilities. In International Conference on Learning Representations (ICLR), 2025

  23. [23]

    Rethinking mixture-of-agents: Is mixing different large language models beneficial? arXiv preprint arXiv:2502.00674, 2025

    Wenzhe Li, Yong Lin, Mengzhou Xia, and Chi Jin. Rethinking mixture-of-agents: Is mixing different large language models beneficial? arXiv preprint arXiv:2502.00674, 2025

  24. [24]

    Autogen: Enabling next-gen llm applications via multi-agent conversations

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. In First Conference on Language Modeling (COLM), 2024

  25. [25]

    Gptswarm: Language agents as optimizable graphs

    Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. Gptswarm: Language agents as optimizable graphs. In International Conference on Machine Learning (ICML), 2024

  26. [26]

    Graph-of-agents: A graph-based framework for multi-agent llm collaboration

    Sukwon Yun, Jie Peng, Pingzhi Li, Wendong Fan, Jie Chen, James Zou, Guohao Li, and Tianlong Chen. Graph-of-agents: A graph-based framework for multi-agent llm collaboration. In International Conference on Learning Representations (ICLR), 2026

  27. [27]

    G-designer: Architecting multi-agent communica- tion topologies via graph neural networks

    Guibin Zhang, Yanwei Yue, Xiangguo Sun, Guancheng Wan, Miao Yu, Junfeng Fang, Kun Wang, Tianlong Chen, and Dawei Cheng. G-designer: Architecting multi-agent communica- tion topologies via graph neural networks. In International Conference on Machine Learning (ICML), pages 76678–76692, 2025

  28. [28]

    Graphplanner: Graph memory-augmented agentic routing for multi-agent llms

    Tao Feng, Haozhen Zhang, Zijie Lei, Peixuan Han, and Jiaxuan You. Graphplanner: Graph memory-augmented agentic routing for multi-agent llms. In International Conference on Learning Representations (ICLR), 2026

  29. [29]

    Large language model based multi-agents: a survey of progress and challenges

    Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. Large language model based multi-agents: a survey of progress and challenges. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI), pages 8048–8057, 2024. 11

  30. [30]

    Learning distilled collaboration graph for multi-agent perception

    Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, and Wenjun Zhang. Learning distilled collaboration graph for multi-agent perception. InProceedings of the 35th International Conference on Neural Information Processing Systems (NeurIPS), pages 29541–29552, 2021

  31. [31]

    Signed graphs

    Thomas Zaslavsky. Signed graphs. Discrete Applied Mathematics (DAM), 4(1):47–74, 1982

  32. [32]

    Signed graph neural network with latent groups

    Haoxin Liu, Ziwei Zhang, Peng Cui, Yafeng Zhang, Qiang Cui, Jiashuo Liu, and Wenwu Zhu. Signed graph neural network with latent groups. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (SIGKDD), pages 1066–1075, 2021

  33. [33]

    Dropedge not foolproof: effective augmentation method for signed graph neural networks

    Zeyu Zhang, Lu Li, Shuyan Wan, Sijie Wang, Zhiyi Wang, Zhiyuan Lu, Dong Hao, and Wanli Li. Dropedge not foolproof: effective augmentation method for signed graph neural networks. In Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS), pages 117041–117069, 2024

  34. [34]

    Signed graph convolutional networks

    Tyler Derr, Yao Ma, and Jiliang Tang. Signed graph convolutional networks. In IEEE International Conference on Data Mining (ICDM), pages 929–934, 2018

  35. [35]

    Attitudes and cognitive organization

    Fritz Heider. Attitudes and cognitive organization. The Journal of Psychology, 21(1):107–112, 1946

  36. [36]

    Structure balance and gradient matching-based signed graph condensation

    Rong Li, Long Xu, Songbai Liu, Junkai Ji, Lingjie Li, Qiuzhen Lin, and Lijia Ma. Structure balance and gradient matching-based signed graph condensation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), volume 39, pages 12121–12129, 2025

  37. [37]

    Retrieval-augmented generation for knowledge-intensive nlp tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS), pages 9459–9474, 2020

  38. [38]

    Dense passage retrieval for open-domain question answering

    Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, 2020

  39. [39]

    Di- versifying training pool predictability for zero-shot coordination: a theory of mind approach

    Dung Nguyen, Hung Le, Kien Do, Sunil Gupta, Svetha Venkatesh, and Truyen Tran. Di- versifying training pool predictability for zero-shot coordination: a theory of mind approach. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI), pages 166–174, 2024

  40. [40]

    Self-consistency improves chain of thought reasoning in language models

    Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V Le, Ed H Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In International Conference on Learning Representations (ICLR), 2023

  41. [41]

    Dropout as a bayesian approximation: Representing model uncertainty in deep learning

    Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (ICML), pages 1050–1059, 2016

  42. [42]

    A unified theory of diversity in ensemble learning

    Danny Wood, Tingting Mu, Andrew M Webb, Henry WJ Reeve, Mikel Lujan, and Gavin Brown. A unified theory of diversity in ensemble learning. Journal of Machine Learning Research (JMLR), 24(359):1–49, 2023

  43. [43]

    Multistrategy ensemble learning: Reducing error by com- bining ensemble learning techniques

    Geoffrey I Webb and Zijian Zheng. Multistrategy ensemble learning: Reducing error by com- bining ensemble learning techniques. IEEE Transactions on Knowledge and Data Engineering (TKDE), 16(8):980–991, 2004

  44. [44]

    A comprehensive survey on multi-agent cooperative decision-making: Scenarios, approaches, challenges and perspectives

    Weiqiang Jin, Hongyang Du, Biao Zhao, Xingwu Tian, Bohang Shi, and Guang Yang. A comprehensive survey on multi-agent cooperative decision-making: Scenarios, approaches, challenges and perspectives. arXiv preprint arXiv:2503.13415, 2025

  45. [45]

    Measuring massive multitask language understanding

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. In International Conference on Learning Representations (ICLR), 2021. 12

  46. [46]

    Mmlu-pro: a more robust and chal- lenging multi-task language understanding benchmark

    Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, et al. Mmlu-pro: a more robust and chal- lenging multi-task language understanding benchmark. In Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS), pages 95266–95290, 2024

  47. [47]

    Gpqa: A graduate-level google-proof q&a benchmark

    David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R Bowman. Gpqa: A graduate-level google-proof q&a benchmark. In First Conference on Language Modeling (COLM), 2024

  48. [48]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021

  49. [49]

    Solving general arithmetic word problems

    Subhro Roy and Dan Roth. Solving general arithmetic word problems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1743–1752, 2015

  50. [50]

    Evaluating Large Language Models Trained on Code

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021

  51. [51]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H Chi, Quoc V Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS), pages 24824–24837, 2022

  52. [52]

    Complexity-based prompting for multi-step reasoning

    Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, and Tushar Khot. Complexity-based prompting for multi-step reasoning. In International Conference on Learning Representations (ICLR), 2023

  53. [53]

    Progressive-hint prompting improves reasoning in large language models

    Chuanyang Zheng, Zhengying Liu, Enze Xie, Zhenguo Li, and Yu Li. Progressive-hint prompt- ing improves reasoning in large language models. arXiv preprint arXiv:2304.09797, 2023

  54. [54]

    A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration

    Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. Dynamic llm-agent net- work: An llm-agent collaboration framework with agent team optimization. arXiv preprint arXiv:2310.02170, 2023

  55. [55]

    Chateval: Towards better llm-based evaluators through multi-agent debate

    Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. Chateval: Towards better llm-based evaluators through multi-agent debate. In International Conference on Learning Representations (ICLR), 2024

  56. [56]

    Large language model as a policy teacher for training reinforcement learning agents

    Zihao Zhou, Bin Hu, Chenyang Zhao, Pu Zhang, and Bin Liu. Large language model as a policy teacher for training reinforcement learning agents. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI), pages 5671–5679, 2024

  57. [57]

    Self-organized agents: A LLM multi- agent framework toward ultra large-scale code generation and optimiza- tion,

    Yoichi Ishibashi and Yoshimasa Nishimura. Self-organized agents: A llm multi-agent framework toward ultra large-scale code generation and optimization. arXiv preprint arXiv:2404.02183, 2024

  58. [58]

    Scaling large language model-based multi-agent collabora- tion

    Chen Qian, Zihao Xie, YiFei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, et al. Scaling large language model-based multi-agent collabora- tion. In International Conference on Learning Representations (ICLR), 2025

  59. [59]

    Masrouter: Learning to route llms for multi-agent systems

    Yanwei Yue, Guibin Zhang, Boyang Liu, Guancheng Wan, Kun Wang, Dawei Cheng, and Yiyan Qi. Masrouter: Learning to route llms for multi-agent systems. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (V olume1: Long Papers) (ACL), pages 15549–15572, 2025

  60. [60]

    Multi-agent reinforcement learning is a sequence modeling problem

    Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, and Yaodong Yang. Multi-agent reinforcement learning is a sequence modeling problem. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS), pages 16509–16521, 2022. 13

  61. [61]

    Grandmaster level in starcraft ii using multi-agent reinforcement learning

    Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Jun- young Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019

  62. [62]

    Signal-to-noise ratios, performance criteria, and transformations

    George Box. Signal-to-noise ratios, performance criteria, and transformations. Technometrics, 30(1):1–17, 1988

  63. [63]

    Minimization of functions having lipschitz continuous first partial derivatives

    Larry Armijo. Minimization of functions having lipschitz continuous first partial derivatives. Pacific Journal of Mathematics (PJM), 16(1):1–3, 1966

  64. [64]

    Status-aware signed heterogeneous network embedding with graph neural networks

    Wanyu Lin and Baochun Li. Status-aware signed heterogeneous network embedding with graph neural networks. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 35(4):4580–4592, 2022

  65. [65]

    Clique identification in signed graphs: a balance theory based model.IEEE Transactions on Knowledge and Data Engineering (TKDE), 35(12):12513–12527, 2023

    Renjie Sun, Yanping Wu, Xiaoyang Wang, Chen Chen, Wenjie Zhang, and Xuemin Lin. Clique identification in signed graphs: a balance theory based model.IEEE Transactions on Knowledge and Data Engineering (TKDE), 35(12):12513–12527, 2023

  66. [66]

    Ning Gong, Michael Korostelev, Qiangguo Ren, Li Bai, Saroj Biswas, and Frank Ferrese

    Yoonhyuk Choi, Taewook Ko, Jiho Choi, and Chong-Kwon Kim. Beyond binary: Improving signed message passing in graph neural networks for multi-class graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025. 14 Conflict-Resilient Multi-Agent Reasoning via Signed Graph Modeling Supplementary Materials Appendix Contents A Notations ...