Recognition: 2 Lean theorem links
MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs
Pith reviewed 2026-05-12 05:08 UTC · model grok-4.3
The pith
MAGE lets frozen language model agents improve by retrieving guidance from a co-evolutionary knowledge graph of successes and corrections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MAGE introduces a four-subgraph co-evolutionary knowledge graph that externalizes self-knowledge for agents. The experience subgraph holds both teacher corrections of failures and the agent's own successful reasoning traces. These are retrieved to condition a frozen execution model, while the graph and associated bandits update from rewards. Structural analysis argues that append-only growth, bounded coverage, and filtered retrieval enable stable improvement of the retrieval substrate.
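The core loop can be sketched in a few lines: an append-only store of traces and corrections, task-filtered retrieval, and a frozen model that sees guidance only through its prompt. All names here (`ExperienceSubgraph`, `solve`, the tuple schema) are illustrative assumptions, not the paper's API.

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceSubgraph:
    """Append-only store of success traces and teacher corrections (illustrative schema)."""
    entries: list = field(default_factory=list)  # (task_tag, kind, text) tuples

    def append(self, task_tag, kind, text):
        # Append-only: existing entries are never deleted or overwritten.
        self.entries.append((task_tag, kind, text))

    def retrieve(self, task_tag, k=2):
        # Task-filtered retrieval: only entries tagged for the current task type.
        matches = [e for e in self.entries if e[0] == task_tag]
        return matches[-k:]

def solve(task_tag, question, graph, frozen_model):
    # The backbone stays frozen; guidance enters only through the prompt.
    guidance = graph.retrieve(task_tag)
    prompt = "\n".join(text for _, _, text in guidance) + "\n" + question
    return frozen_model(prompt)

graph = ExperienceSubgraph()
graph.append("math", "success_trace", "Worked example: decompose, compute, verify.")
graph.append("qa", "teacher_correction", "Fix: quote the retrieved passage before answering.")
answer = solve("math", "Compute 7 * 8.", graph, frozen_model=lambda prompt: "56")
```

The point of the sketch is the separation of concerns: learning happens entirely in `graph`, never in the model's weights.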
What carries the argument
The four-subgraph co-evolutionary knowledge graph whose experience subgraph delivers task-conditioned guidance retrieved for the frozen model.
If this is right
- The framework delivers strong results against prompt-based frozen-backbone baselines on nine benchmarks including mathematical reasoning, multi-hop and open-domain question answering, spatio-temporal analysis, financial numerical reasoning, medical multiple-choice questions, an open-world survival game, and web navigation.
- Self-harvested success traces and teacher-written corrections prove complementary, with success memories aiding reasoning-template tasks and corrective memories helping complex composition and interaction.
- Append-only memory growth paired with bounded curriculum coverage and task-filtered retrieval sustains improvement of the retrieval substrate.
- Task-level and skill-level routing bandits update jointly with the graph from the reward stream to guide evolution.
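The joint bandit update in the last bullet can be illustrated with a standard UCB1 learner fed twice from one reward stream; the arm names and the choice of UCB1 are assumptions for the sketch, since the paper does not specify the bandit algorithm in the text above.

```python
import math

class UCBBandit:
    """Minimal UCB1 bandit; a stand-in for the task- and skill-level routers."""
    def __init__(self, arms):
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}
        self.total = 0

    def select(self):
        # Try every arm once, then balance mean reward against an exploration bonus.
        for arm, count in self.counts.items():
            if count == 0:
                return arm
        return max(self.counts, key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.total) / self.counts[a]))

    def update(self, arm, reward):
        # Incremental mean update from the shared reward signal.
        self.counts[arm] += 1
        self.total += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Both routing levels consume the same reward stream, as the claim describes.
task_bandit = UCBBandit(["math", "qa"])
skill_bandit = UCBBandit(["decompose", "verify"])
for reward in [1.0, 0.0, 1.0]:
    t, s = task_bandit.select(), skill_bandit.select()
    task_bandit.update(t, reward)
    skill_bandit.update(s, reward)
```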
Where Pith is reading between the lines
- This approach could allow agent systems to accumulate expertise indefinitely without increasing model size or requiring gradient updates.
- The separation of the experience subgraph from other structural elements suggests it might integrate with existing retrieval-augmented systems in new domains.
- Extending the co-evolution to include direct agent-to-agent knowledge exchange could support more complex multi-agent collaborations.
- If the bandits scale well, the method offers a template for automated skill acquisition in long-horizon tasks.
Load-bearing premise
The structural analysis showing that append-only memory growth, bounded curriculum coverage, and task-filtered retrieval support stable improvement of the retrieval substrate holds for the reported benchmarks and generalizes.
What would settle it
A new benchmark where performance plateaus or drops after multiple evolution cycles even as the knowledge graph enlarges would falsify the stability claim.
Original abstract
Self-evolving language-model agents must decide what to learn next and how to preserve what they have learned across iterations. Existing systems typically carry this cross-iteration knowledge as natural-language feedback, flat episodic memory, or implicit reinforcement signals, none of which cleanly supports a frozen weak backbone at inference time. This paper introduces MAGE (Multi-Agent Graph-guided Evolution), a framework that externalizes self-knowledge into a four-subgraph co-evolutionary knowledge graph. Its experience subgraph stores both teacher-written failure corrections and the learner's own past correct reasoning traces, which are retrieved as task-conditioned guidance for a frozen execution model. During evolution, the graph, a task-level search bandit, and a skill-level routing bandit are updated from the same reward stream, while the learner's backbone remains unchanged. We further provide structural analysis showing how append-only memory growth, bounded curriculum coverage, and task-filtered retrieval together support stable improvement of the retrieval substrate for frozen-learner evolution. Across nine benchmarks spanning mathematical reasoning, multi-hop and open-domain question answering, spatio-temporal analysis, financial numerical reasoning, medical multiple-choice, an open-world survival game, and web navigation, MAGE achieves strong performance against prompt-based frozen-backbone baselines. Ablations show that self-harvested success traces and teacher-written corrections are complementary, with success memories contributing most on reasoning-template-heavy tasks and corrective memories supporting harder composition and interaction settings.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MAGE, a multi-agent framework that externalizes self-knowledge into a four-subgraph co-evolutionary knowledge graph for frozen-backbone LLM agents. Success traces and teacher corrections are stored in the experience subgraph and retrieved via task-conditioned guidance; task-level and skill-level bandits update the graph and routing from a shared reward stream. The work reports strong empirical gains over prompt-based frozen baselines across nine benchmarks (math reasoning, multi-hop/open QA, spatio-temporal, financial, medical, survival game, web navigation), with ablations indicating complementary contributions from success and corrective memories, plus a structural analysis arguing that append-only growth, bounded curriculum coverage, and task-filtered retrieval enable stable retrieval-substrate improvement.
Significance. If the reported gains and supporting analysis hold, the framework offers a practical route to cross-iteration improvement without backbone updates, addressing a key limitation of current self-evolving agents. The explicit separation of memory, retrieval, and bandit-driven evolution, together with the multi-domain evaluation and memory-type ablations, provides concrete evidence that structured external memory can stabilize and enhance frozen-learner performance.
Major comments (2)
- [§4.3] §4.3 (Structural Analysis): the claim that append-only memory growth combined with bounded curriculum coverage and task-filtered retrieval guarantees stable improvement of the retrieval substrate is supported only by qualitative arguments and a limited set of coverage plots; no quantitative bound or sensitivity analysis is given for how curriculum size or retrieval threshold affects long-term stability, which is load-bearing for the generalization statement beyond the nine reported benchmarks.
- [Table 2, §5.1] Table 2 and §5.1: the main results compare against prompt-based frozen-backbone baselines, but the baseline implementations are described only at high level; it is unclear whether they receive equivalent retrieval or memory access, so the magnitude of the reported gains cannot be isolated from differences in prompting or retrieval setup.
Minor comments (3)
- [§3.2] §3.2: the four-subgraph architecture is introduced with a diagram, but the precise schema for each subgraph (node/edge types, update rules) is only summarized; an explicit table or pseudocode listing the fields and update operations would improve reproducibility.
- [§5.2] §5.2 (Ablations): the success-trace vs. correction ablation reports aggregate scores but does not break down per-benchmark variance or statistical significance; adding error bars or p-values would strengthen the complementarity claim.
- [References] References: several recent works on memory-augmented agents and graph-based retrieval (e.g., on episodic memory or KG-augmented LLMs) appear under-cited relative to the claims made in the introduction.
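The reproducibility gap flagged in the §3.2 comment could be closed with something as small as an explicit schema and update-rule listing. The sketch below is a guess at what such a listing might look like; the field names and admission threshold are invented for illustration, since the paper only summarizes the subgraphs.

```python
from dataclasses import dataclass

# Hypothetical node schema for the experience subgraph; the paper names this
# subgraph explicitly, but the concrete fields here are assumptions.
@dataclass
class ExperienceNode:
    task_tag: str   # used for task-filtered retrieval
    kind: str       # "success_trace" or "teacher_correction"
    text: str       # the stored guidance
    reward: float   # outcome that admitted this node

def admit(nodes, candidate, threshold=0.5):
    """Update rule as a pure append: the graph only grows, and a node is
    admitted only if its reward clears the threshold."""
    if candidate.reward >= threshold:
        nodes.append(candidate)
    return nodes

nodes = []
admit(nodes, ExperienceNode("math", "success_trace", "worked trace", reward=1.0))
admit(nodes, ExperienceNode("math", "success_trace", "noisy trace", reward=0.2))
```

A table of this form, one row per subgraph, would make the node/edge types and update operations auditable.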
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and outline planned revisions to improve clarity and rigor.
Point-by-point responses
-
Referee: [§4.3] §4.3 (Structural Analysis): the claim that append-only memory growth combined with bounded curriculum coverage and task-filtered retrieval guarantees stable improvement of the retrieval substrate is supported only by qualitative arguments and a limited set of coverage plots; no quantitative bound or sensitivity analysis is given for how curriculum size or retrieval threshold affects long-term stability, which is load-bearing for the generalization statement beyond the nine reported benchmarks.
Authors: We acknowledge that §4.3 currently relies on qualitative arguments and coverage plots without quantitative bounds or sensitivity analysis. While the design (append-only growth to avoid forgetting, bounded curriculum for tractable retrieval, and task-filtered access to limit noise) is intended to promote stability, we agree this requires stronger empirical grounding for broader generalization claims. In revision we will add a dedicated sensitivity analysis subsection with experiments varying curriculum size and retrieval thresholds, reporting metrics such as retrieval hit rate, performance variance, and substrate quality over extended iterations. revision: yes
-
Referee: [Table 2, §5.1] Table 2 and §5.1: the main results compare against prompt-based frozen-backbone baselines, but the baseline implementations are described only at high level; it is unclear whether they receive equivalent retrieval or memory access, so the magnitude of the reported gains cannot be isolated from differences in prompting or retrieval setup.
Authors: The baselines are standard prompt-only implementations of the frozen backbone that receive no external memory, retrieval, or knowledge-graph access; this is by design to isolate the contribution of MAGE's co-evolutionary substrate. To remove ambiguity we will expand §5.1 with explicit baseline prompt templates, input formatting details, and a clear statement that no retrieval or memory components are used. This will better separate framework gains from prompting differences. revision: yes
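One of the stability metrics the first response proposes, retrieval hit rate across evolution iterations, is simple to operationalize. The function and toy trace below are a minimal sketch under the assumption that retrieved items carry task tags; none of the names come from the paper.

```python
def retrieval_hit_rate(queries, retrieved):
    """Fraction of queries for which at least one retrieved item shares the
    query's task tag -- one proposed stability metric."""
    hits = sum(1 for q, items in zip(queries, retrieved)
               if any(tag == q for tag, _ in items))
    return hits / len(queries)

# Toy evolution trace: hit rate per iteration as the memory grows.
queries = ["math", "qa", "math"]
per_iter = [
    [[("qa", "x")], [], []],                            # iteration 1: sparse memory
    [[("math", "t")], [("qa", "c")], [("math", "t")]],  # iteration 2: fuller memory
]
rates = [retrieval_hit_rate(queries, r) for r in per_iter]
# A stability check would assert that rates are non-decreasing across iterations.
```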
Circularity Check
No significant circularity detected
Full rationale
The paper describes an empirical multi-agent framework (MAGE) that externalizes knowledge into co-evolutionary graphs updated from an external reward stream, with a frozen backbone at inference. Performance is reported via direct benchmark comparisons to prompt-based baselines across nine tasks; the structural analysis of append-only growth and task-filtered retrieval is presented as explanatory support rather than a formal derivation. No equations, self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims remain independent of the inputs by construction, consistent with a self-contained empirical result.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · echoes Theorem 1 (EVOKG Information Monotonicity): append-only invariant on principle, failure-memory, and success-memory nodes; I(Y; K_{k+1}) ≥ I(Y; K_k)
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear link to Theorem 4 (Task-Filtered Retrieval Support): A_{k+1}(t) ≥ A_k(t) − ε_K under append-only graph growth and bounded curriculum coverage
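Read off from the echoed theorem statements, and assuming K_k denotes the knowledge graph after evolution round k and A_k(t) the retrieval support for task t, the two claims can be written compactly as:

```latex
% Theorem 1 (EVOKG Information Monotonicity): under append-only growth,
% retained mutual information with the target cannot decrease.
I(Y; K_{k+1}) \;\ge\; I(Y; K_k)

% Theorem 4 (Task-Filtered Retrieval Support): per-task retrieval support
% degrades by at most a bounded slack \varepsilon_K as the graph grows.
A_{k+1}(t) \;\ge\; A_k(t) - \varepsilon_K
```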
Reference graph
Works this paper leans on
-
[1]
Awais Ahmed, Xiaoyang Zeng, Rui Xi, Mengshu Hou, and Syed Attique Shah. Med-prompt: A novel prompt engineering framework for medicine prediction on free-text clinical notes. Journal of King Saud University-Computer and Information Sciences, 36(2):101933, 2024
work page 2024
-
[2]
Marc-Antoine Allard, Arnaud Teinturier, Victor Xing, and Gautier Viaud. Experiential reflective learning for self-improving llm agents. arXiv preprint arXiv:2603.24639, 2026
-
[3]
Semantic parsing on freebase from question-answer pairs
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. Semantic parsing on freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544, 2013
work page 2013
-
[4]
Guoxin Chen, Zile Qiao, Wenqing Wang, Donglei Yu, Xuanzhong Chen, Hao Sun, Minpeng Liao, Kai Fan, Yong Jiang, Pengjun Xie, et al. Mars: Optimizing dual-system deep research via multi-agent reinforcement learning. arXiv preprint arXiv:2510.04935, 2025
-
[5]
Yixing Chen, Yiding Wang, Siqi Zhu, Haofei Yu, Tao Feng, Muhan Zhang, Mostofa Patwary, and Jiaxuan You. Multi-agent evolve: Llm self-improve through co-evolution. arXiv preprint arXiv:2510.23595, 2025
-
[6]
Finqa: A dataset of numerical reasoning over financial data
Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan R Routledge, et al. Finqa: A dataset of numerical reasoning over financial data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3697–3711, 2021
work page 2021
-
[7]
Training Verifiers to Solve Math Word Problems
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021
work page 2021
-
[8]
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph rag approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024
work page 2024
-
[9]
Benchmarking the spectrum of agent capabilities
Danijar Hafner. Benchmarking the spectrum of agent capabilities. arXiv preprint arXiv:2109.06780, 2021
-
[10]
Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits. What disease does this patient have? a large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14):6421, 2021
work page 2021
-
[11]
Jing Li, Zhijie Sun, Zhicheng Zhou, Suming Qiu, Junjie Huang, Haijia Sun, and Linyuan Qiu. Agentic-kgr: Co-evolutionary knowledge graph construction through multi-agent reinforcement learning. arXiv preprint arXiv:2510.09156, 2025
-
[12]
Stbench: Assessing the ability of large language models in spatio-temporal analysis
Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, and Jingping Bi. Stbench: Assessing the ability of large language models in spatio-temporal analysis. In Companion Proceedings of the ACM on Web Conference 2025, pages 749–752, 2025
work page 2025
-
[13]
Yulin Peng, Xinxin Zhu, Chenxing Wei, Nianbo Zeng, Leilei Wang, Ying Tiffany He, and F Richard Yu. Sage: Multi-agent self-evolution for llm reasoning. arXiv preprint arXiv:2603.15255, 2026
-
[14]
Lingfei Qian, Weipeng Zhou, Yan Wang, Xueqing Peng, Jimin Huang, and Qianqian Xie. Fino1: On the transferability of reasoning enhanced llms to finance. arXiv e-prints, pages arXiv–2502, 2025
work page 2025
-
[15]
Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36:8634–8652, 2023
work page 2023
-
[16]
Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, and Jiaqi Wang. Seagent: Self-evolving computer use agent with autonomous learning from experience. arXiv preprint arXiv:2508.04700, 2025
-
[17]
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023
work page 2023
-
[18]
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022
work page 2022
-
[19]
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Xing Jin, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, et al. Ragen: Understanding self-evolution in llm agents via multi-turn reinforcement learning. arXiv preprint arXiv:2504.20073, 2025
work page 2025
-
[20]
Evo-Memory: Benchmarking LLM Agent Test-Time Learning with Self-Evolving Memory
Tianxin Wei, Noveen Sachdeva, Benjamin Coleman, Zhankui He, Yuanchen Bei, Xuying Ning, Mengting Ai, Yunzhe Li, Jingrui He, Ed H Chi, et al. Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory. arXiv preprint arXiv:2511.20857, 2025
-
[21]
EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle
Rong Wu, Xiaoman Wang, Jianbiao Mei, Pinlong Cai, Daocheng Fu, Cheng Yang, Licheng Wen, Xuemeng Yang, Yufan Shen, Yuxin Wang, et al. Evolver: Self-evolving llm agents through an experience-driven lifecycle. arXiv preprint arXiv:2510.16079, 2025
work page 2025
-
[22]
Peng Xia, Kaide Zeng, Jiaqi Liu, Can Qin, Fang Wu, Yiyang Zhou, Caiming Xiong, and Huaxiu Yao. Agent0: Unleashing self-evolving agents from zero data via tool-integrated reasoning. arXiv preprint arXiv:2511.16043, 2025
-
[23]
A-MEM: Agentic Memory for LLM Agents
Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for llm agents. arXiv preprint arXiv:2502.12110, 2025
work page 2025
-
[24]
Ruiyi Yang, Hao Xue, Imran Razzak, Shirui Pan, Hakim Hacid, and Flora D Salim. Divide by question, conquer by agent: Split-rag with question-driven graph partitioning. arXiv preprint arXiv:2505.13994, 2025
-
[25]
Toward self-evolving systems of llm agents through exploration and iterative feedback
Yongjin Yang, Sinjae Kang, Juyong Lee, Dongjun Lee, Se-Young Yun, and Kimin Lee. Toward self-evolving systems of llm agents through exploration and iterative feedback
-
[26]
Hotpotqa: A dataset for diverse, explainable multi-hop question answering
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D Manning. Hotpotqa: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2369–2380, 2018
work page 2018
-
[27]
Shunyu Yao, Howard Chen, John Yang, and Karthik Narasimhan. Webshop: Towards scalable real-world web interaction with grounded language agents. Advances in Neural Information Processing Systems, 35:20744–20757, 2022
work page 2022
-
[28]
React: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2022
work page 2022
-
[29]
Chenglin Yu, Yang Yu, Songmiao Wang, Yucheng Wang, Yifan Yang, Jinjia Li, Ming Li, and Hongxia Yang. Infiagent: Self-evolving pyramid agent framework for infinite scenarios. arXiv preprint arXiv:2509.22502, 2025
-
[30]
Yunpeng Zhai, Shuchang Tao, Cheng Chen, Anni Zou, Ziqian Chen, Qingxu Fu, Shinji Mai, Li Yu, Jiaji Deng, Zouying Cao, et al. Agentevolver: Towards efficient self-evolving agent system. arXiv preprint arXiv:2511.10395, 2025
-
[31]
Jie Zhang, Cezara Petrui, Kristina Nikolić, and Florian Tramèr. Realmath: A continuous benchmark for evaluating language models on research-level mathematics. arXiv preprint arXiv:2505.12575, 2025
-
[32]
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
Shengtao Zhang, Jiaqian Wang, Ruiwen Zhou, Junwei Liao, Yuchen Feng, Zhuo Li, Yujie Zheng, Weinan Zhang, Ying Wen, Zhiyu Li, et al. Memrl: Self-evolving agents via runtime reinforcement learning on episodic memory. arXiv preprint arXiv:2601.03192, 2026
-
[33]
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, and Gao Huang. Absolute zero: Reinforced self-play reasoning with zero data. arXiv preprint arXiv:2505.03335, 2025
work page 2025
-
[34]
Sirius: Self-Improving Multi-Agent Systems via Bootstrapped Reasoning
Wanjia Zhao, Mert Yuksekgonul, Shirley Wu, and James Zou. Sirius: Self-improving multi-agent systems via bootstrapped reasoning. arXiv preprint arXiv:2502.04780, 2025
-
[35]
Xinjie Zhao, Moritz Blum, Fan Gao, Yingjian Chen, Boming Yang, Luis Marquez-Carpintero, Mónica Pina-Navarro, Yanran Fu, So Morikawa, Yusuke Iwasawa, et al. Agentigraph: A multi-agent knowledge graph framework for interactive, domain-specific llm chatbots. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, page...
work page 2025