{"total":22,"items":[{"citing_arxiv_id":"2606.22385","ref_index":128,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MetaPS: Adaptive Programmatic Strategy Selection for Market Agents","primary_cat":"cs.AI","submitted_at":"2026-06-21T08:22:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09249","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MAGIS: Evidence-Based Multi-Agent Reasoning for Interpretable Strabismus Clinical Decision-Making","primary_cat":"cs.CV","submitted_at":"2026-06-08T09:21:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MAGIS applies multi-agent reasoning with dual-evidence constrained context and corrective verification to raise weighted F1 from 72.0% to 91.3% on a strabismus benchmark while improving report consistency, alignment, and completeness.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28334","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Cybersecurity SuperIntelligence (CSI): What's the best harness for cybersecurity?","primary_cat":"cs.CR","submitted_at":"2026-05-27T11:37:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"CSI meta-scaffold unifies five LLM agent harnesses; a blackboard multi-agent system solves 19/33 cybench challenges (57.6%) versus 15/33 for the best single 
scaffold.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26178","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ATOM: Instantiating Budget-Controllable Multi-Agent Collaboration via Nucleus-Electron Hierarchy","primary_cat":"cs.MA","submitted_at":"2026-05-25T06:41:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ATOM uses a nucleus-electron hierarchy and task-driven RL to generate budget-controllable multi-agent collaboration graphs for LLMs, claiming SOTA performance with up to 30% better token efficiency on six benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17361","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"\\textsc{MasFACT}: Continual Multi-Agent Topology Learning via Geometry-Aware Posterior Transfer","primary_cat":"cs.LG","submitted_at":"2026-05-17T09:58:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MasFACT transfers historical topology priors across tasks via Fused Gromov-Wasserstein optimal transport and PAC-Bayes conservative adaptation to reduce topology forgetting in continual multi-agent settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15706","ref_index":25,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Differentiable Mixture-of-Agents Incentivizes Swarm Intelligence of Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-15T07:54:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DMoA is a differentiable multi-agent framework for LLMs that uses recurrent 
context-aware routing and predictive entropy for test-time adaptation, claiming SOTA results on 9 benchmarks with efficiency and robustness.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13213","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Hierarchical Attacks for Multi-Modal Multi-Agent Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-13T09:06:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HAM³ achieves up to 78.3% attack success rate on the GQA benchmark by hierarchically attacking perception, communication, and reasoning layers in multi-modal multi-agent systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09703","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MOTOR-Bench: A Real-world Dataset and Multi-agent Framework for Zero-shot Human Mental State Understanding","primary_cat":"cs.CV","submitted_at":"2026-05-10T18:51:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MOTOR-Bench supplies a real-world video dataset for structured mental state understanding in learning settings, while MOTOR-MAS improves zero-shot prediction of behavior, cognition, and emotion labels over single models and other multi-agent systems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"These frameworks demonstrate that collaboration is helpful, but their structure stems from heuristics of task decomposition rather than domain theory. Increasing the number of agents does not solve this limitation. Qian et al. discovered that performance follows a reasonable growth pattern and tends to saturate before reaching its theoretical limit [26]. 
Yang et al. [27] provides a formal explanation: homogeneous agents produce correlated outputs, thus the evidence contributed by newly added agents diminishes. Two different agents can achieve the same processing power as sixteen homogeneous agents, indicating that the structure of information is far more important than the number of agents. However, these frameworks are not designed specifically"},{"citing_arxiv_id":"2605.09278","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium","primary_cat":"cs.AI","submitted_at":"2026-05-10T03:04:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Common designs include embedding-based retrieval [56, 69, 94], graph-structured memory [12, 51], and hierarchical or multi-tier storage [45, 75, 84]. These systems are typically optimised for retrieval quality, with veracity and provenance treated as secondary [61, 80, 92]. Multi-agent variants extend the same designs to a shared store accessible to several agents at once [49, 52, 89]. LLM hallucinations and memory safety. LLMs frequently produce plausible-sounding but incorrect content [5, 25, 26, 85]. Common mitigations include retrieval augmentation [19, 32], self-consistency verification [41, 60], and uncertainty calibration [27, 53, 64, 83]. 
Beyond intrinsic errors, conflicts between retrieved memories further complicate downstream reasoning [ 68, 74]."},{"citing_arxiv_id":"2605.09076","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Robust Multi-Agent LLMs under Byzantine Faults","primary_cat":"cs.MA","submitted_at":"2026-05-09T17:37:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SAC is a decentralized iterative filter-and-refine protocol that achieves (F+1)-robustness in LLM multi-agent systems, suppressing Byzantine influence and improving performance on reasoning benchmarks where prior methods fail.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"e067503fUnit conversion: 6 wallops=5 ballops, 3 ballops=11 fallops; wallops for 110 fallops 36 7bfcd56aFirst odd year after 2006 whose digits split into 3-digit and 1-digit groups with common factor>12013 Number Theory a10973bfSmallest integer with exactly 16 divisors, including 12 and 15 120 27b01b01Base-7 cryptarithm AB7 + BA7 = AA07; productA·B6 583c9eafNumber of even positive divisors of 252 12 37bab629Count ofn∈[1,29]for whichn/30has a repeating decimal expansion 20 Counting & Probability 0a3e457d20-member club, 3 distinct officers, with constraint Alex serves only if Bob does not 6732 d790474fJar with 4 red, 2 white marbles; swap-and-sample procedure;P(red) = 11/18 11/18 b98d41f7Coefficient ofx 2y2 in(x+y) 4 + (x+ 2y) 4 30 03694fa9120 triangles formed bynvertices on a base; findn16"},{"citing_arxiv_id":"2605.02939","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Static Analysis to Audience Dissemination: A Training-Free Multimodal Controversy Detection Multi-Agent 
Framework","primary_cat":"cs.LG","submitted_at":"2026-05-01T07:57:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AuDisAgent reformulates multimodal controversy detection as a dynamic audience dissemination process using screening, panel discussion, and arbitration agents, plus comment bootstrapping, and reports outperforming prior static methods on a public dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27221","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction","primary_cat":"cs.AI","submitted_at":"2026-04-29T21:43:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"from 2010 to 2025 with date, city, and venuereturns hundreds of rows, each of which must be correct and mutually consistent, which is wide search. The first demands long, coherent reasoning; the second demands broad, verified coverage. A monolithic agent handles neither well at scale: the context saturates, errors compound, and the fixed plan made at the start cannot adapt to what later steps uncover. Hierarchical agent frameworks [16, 11, 31] and automatically searched workflow pipelines [29, 8] partially address this issue through task decomposition, but often rely on fixed or weakly adaptive planning strategies, with limited feedback from downstream execution to upstream decomposition [13, 27]. 
Meanwhile, recent self-evolving and memory-augmented agents improve execution through reusable external skills [7, 20, 17] or structured long-term"},{"citing_arxiv_id":"2604.18133","ref_index":117,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures","primary_cat":"cs.AI","submitted_at":"2026-04-20T12:00:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"can give rise to behaviors such as complementary information integration and stable consensus formation. For example, AgentVerse [116] shows that modular expert teams outperform individual agents and display emergent behaviors such as spontaneous assistance and implicit goal alignment. Some studies further examine how collective performance evolves with agent population size. MACNET [117] explores collaboration from a few to thousands of agents and identifies a collaborative scaling law. System performance grows logistically with agent count, and collaborative emergence occurs earlier than traditional neural scaling. 
However, LFM-based agents inherently carry risks of role inconsistency, accumulation of cognitive biases, and large-"},{"citing_arxiv_id":"2604.17503","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology","primary_cat":"cs.AI","submitted_at":"2026-04-19T15:46:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"https://github.com/niez233/skillgraph. Keywords: Visual Multi-Agent Systems · Multimodal Reasoning 1 Introduction The rapid development of vision-language models (VLMs) has advanced single-model perceptual and reasoning capabilities. Consequently, research is shifting from a single-agent paradigm to Visual Multi-Agent Systems (VMAS) to leverage collective intelligence [13,24,34,40,45,48]. The core hypothesis is that VMAS, by wiring together specialized agents into a collaborative network, can yield substantial performance gains on complex, multi-step multimodal tasks that remain intractable for individual models. 
In an ideal VMAS framework, agents form a dynamic ensemble of experts whose communication structure is tailored to the multimodal characteristics of each query, enabling more effective reasoning"},{"citing_arxiv_id":"2604.22820","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows","primary_cat":"cs.MA","submitted_at":"2026-04-17T15:31:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Complete cyclic subtask graphs offer a lens to measure when multi-agent revisitation aids recovery and exploration versus when it increases costs or is dominated by other bottlenecks in LLM agent workflows.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.13559","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"WebMAC: A Multi-Agent Collaborative Framework for Scenario Testing of Web Systems","primary_cat":"cs.SE","submitted_at":"2026-04-15T07:07:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"WebMAC uses three specialized multi-agent modules to clarify test scenarios, partition them for adequacy, and generate executable scripts, yielding 30-60% higher success rates and 29% better efficiency than SOTA on four web systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02674","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Do Agent Societies Develop Intellectual Elites? 
The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems","primary_cat":"cs.MA","submitted_at":"2026-04-03T03:08:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"LLM Multi-Agent Systems and Coordination: LLM MAS extend reasoning beyond single-agent limits through structured interaction across role-playing, orchestration, software development, and social simulation [33, 66, 25, 50, 48]. Deliberative mechanisms such as debate improve reasoning quality [16, 34, 8, 10], while communication topology strongly shapes collective outcomes [51, 70, 36]. Despite this progress, scaling agent count does not reliably improve performance and can degrade due to coordination failures and saturation [11, 28, 6, 27]. These limitations depend on protocol and scaffold design [45] and are not fully explained by model capability alone, indicating a lack of principled multi-agent grounding [30]. Surveys summarize advances and open challenges [23,"},{"citing_arxiv_id":"2604.03295","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Scaling Teams or Scaling Time? 
Memory Enabled Lifelong Learning in LLM Multi-Agent Systems","primary_cat":"cs.MA","submitted_at":"2026-03-27T19:34:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLMA-Mem improves long-horizon performance in LLM multi-agent systems over baselines while reducing cost and shows non-monotonic scaling where memory-enabled smaller teams can beat larger ones.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.02334","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web","primary_cat":"cs.AI","submitted_at":"2026-01-18T13:09:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Holos is a five-layer LLM-based multi-agent system architecture using the Nuwa engine for agent generation, a market-driven Orchestrator for coordination, and an endogenous value cycle for incentive-compatible persistence in the Agentic Web.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.20857","ref_index":59,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory","primary_cat":"cs.CL","submitted_at":"2025-11-25T21:08:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning 
datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.07799","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models","primary_cat":"cs.CL","submitted_at":"2025-10-09T05:28:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GTD generates task-adaptive, sparse communication topologies for multi-LLM agents via guided iterative graph diffusion steered by a proxy model predicting accuracy, utility, and cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.08435","ref_index":195,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Automated Design of Agentic Systems","primary_cat":"cs.AI","submitted_at":"2024-08-15T21:59:23+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}