Clarus: Coordinating Autonomous Research Agents toward Web-Scale Scientific Collaboration

Bo Huang; Chenxi Zeng; Hanwen Zhu; Junwei Liao; Ming Zhou; Shuai Shao; Weinan Zhang; Xiaohang Nie; Yang Li; Yuanjian Zhou

arXiv: 2606.30246 · v1 · pith:7FUPOYA3new · submitted 2026-06-29 · 💻 cs.AI · cs.CY · cs.MA

Zihan Guo, Zeyi Chen, Zhiyu Chen, Zicai Cui, Shuai Shao, Bo Huang, Zhi Han, Yuanyi Song, Yuan Yuan, Chenxi Zeng, Xiaohang Nie, Zhengxi Yu, Hanwen Zhu, Junwei Liao, Ming Zhou, Yang Li, Yuanjian Zhou, Weinan Zhang

Pith reviewed 2026-06-30 06:13 UTC · model grok-4.3

classification 💻 cs.AI · cs.CY · cs.MA

keywords: autonomous research agents · scientific collaboration · multi-agent systems · research infrastructure · collaboration networks · agent coordination

The pith

Clarus organizes research goals into traceable, reviewable, attributable collaboration networks across phases and participants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Clarus as infrastructure for coordinating autonomous research agents, where an agent may be an AI system, a human, a team or an organization, so that research moves beyond isolated tasks and closed loops. It reframes research as an open, multi-phase collaboration that must track questions, evidence, participants and resources under uncertainty. The system combines a minimal object model for projects, agents and resources with four layers and pluggable modules that adapt to different risks and constraints. In a controlled paper-generation case study, the resulting collaboration network is reported to remain traceable, reviewable, attributable and accumulative. A sympathetic reader would care because current agent tools lack mechanisms for shared, auditable progress at web scale.

Core claim

Clarus reformulates research as an open, auditable, attributable, and resource-aware multi-phase collaboration process. It defines a minimal project-agent-resource object model and organizes scientific collaboration through four layers: Research Application, Digital Collaboration, Physical Substrate, and Physical World. Core modules are implemented as pluggable mechanisms. Through a controlled paper-generation case study, Clarus organizes a research goal into a traceable, reviewable, attributable, and accumulative collaboration network across phases, tasks, and participants.

What carries the argument

The four-layer architecture (Research Application, Digital Collaboration, Physical Substrate, Physical World) combined with the project-agent-resource object model and pluggable coordination modules.
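To make the project-agent-resource model and the four layers more concrete, here is a minimal Python sketch. Only the three object names and the four layer names come from the paper; every field, method, and example value (`agent_id`, `kind`, `layer`, `add_agent`, and so on) is a hypothetical illustration, not the Clarus schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    """The four layers named in the paper."""
    RESEARCH_APPLICATION = "Research Application"
    DIGITAL_COLLABORATION = "Digital Collaboration"
    PHYSICAL_SUBSTRATE = "Physical Substrate"
    PHYSICAL_WORLD = "Physical World"


@dataclass(frozen=True)
class Agent:
    # Hypothetical fields: the paper only says an agent may be an AI system,
    # a human researcher, a team, a laboratory, or an organization.
    agent_id: str
    kind: str


@dataclass(frozen=True)
class Resource:
    # Hypothetical fields: which layer exposes the resource is an assumption.
    resource_id: str
    layer: Layer


@dataclass
class Project:
    # Hypothetical container tying a research goal to agents and resources.
    goal: str
    agents: list[Agent] = field(default_factory=list)
    resources: list[Resource] = field(default_factory=list)

    def add_agent(self, agent: Agent) -> None:
        self.agents.append(agent)

    def add_resource(self, resource: Resource) -> None:
        self.resources.append(resource)


project = Project(goal="Produce a benchmark paper")
project.add_agent(Agent("a1", "ai_system"))
project.add_agent(Agent("h1", "human"))
project.add_resource(Resource("gpu-0", Layer.PHYSICAL_SUBSTRATE))
```

Keeping agents and resources as immutable records while the project is the only mutable container is one possible design choice; it makes later attribution easier because a record cannot change after it is registered.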

If this is right

Research projects shift from closed workflows to open, auditable processes that record contributions across phases.
Agents and participants gain explicit attribution and reviewability within the collaboration network.
Pluggable modules allow adaptation to task risk, collaboration structure, and resource limits without redesign.
Accumulative networks support long-term building on prior phases and participants.
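The idea of a pluggable mechanism that adapts to task risk can be sketched with a small interface and two interchangeable implementations. The interface name `AuditPolicy`, both policies, and the risk rule in `select_policy` are assumptions made for illustration; the paper does not specify its module interfaces.

```python
from typing import Protocol


class AuditPolicy(Protocol):
    """Hypothetical plug-in interface: decide whether an artifact passes audit."""

    def passes(self, artifact: dict) -> bool: ...


class ProvenanceRequired:
    """Strict policy: the artifact must carry a non-empty provenance record."""

    def passes(self, artifact: dict) -> bool:
        return bool(artifact.get("provenance"))


class AlwaysAccept:
    """Lenient policy: every artifact passes."""

    def passes(self, artifact: dict) -> bool:
        return True


def select_policy(task_risk: str) -> AuditPolicy:
    # Assumed rule: high-risk tasks get the stricter policy.
    return ProvenanceRequired() if task_risk == "high" else AlwaysAccept()


artifact = {"name": "results.csv", "provenance": None}
strict = select_policy("high").passes(artifact)   # False: no provenance recorded
lenient = select_policy("low").passes(artifact)   # True: lenient policy accepts
```

Because callers depend only on the `passes` method, a policy can be swapped without touching the code that invokes it, which is the property the "without redesign" claim relies on.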

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Open attribution could reduce redundant work if multiple independent groups adopt the same object model.
The Physical Substrate layer may need extra protocols when real labs or equipment enter the network.
The trust mechanisms described could extend to versioned data sharing across organizations.

Load-bearing premise

The four-layer architecture and pluggable mechanisms can handle coordination under uncertainty and varying resource constraints at web scale.

What would settle it

A replication of the paper-generation case study in which the produced collaboration network lacks clear traceability or attributability for tasks and phases.
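A replication check of this kind could be partly automated. The sketch below flags tasks that lack an attributed agent or have no trace event at all. The record layout (`id`, `phase`, `agent`, `task_id`, `kind`) and the "Writing" phase name are hypothetical, since the page does not describe the Clarus trace format.

```python
def untraceable_tasks(tasks, trace_events):
    """Return ids of tasks that lack an attributed agent or any trace event."""
    traced = {event["task_id"] for event in trace_events}
    return sorted(
        task["id"]
        for task in tasks
        if not task.get("agent") or task["id"] not in traced
    )


# Hypothetical records from a replicated run.
tasks = [
    {"id": "t1", "phase": "Experiment", "agent": "a1"},
    {"id": "t2", "phase": "Experiment", "agent": None},  # no attribution
    {"id": "t3", "phase": "Writing", "agent": "h1"},     # no trace event
]
trace_events = [
    {"task_id": "t1", "kind": "artifact"},
    {"task_id": "t2", "kind": "audit"},
]

missing = untraceable_tasks(tasks, trace_events)
```

An empty result would be consistent with the paper's claim for that run; a non-empty one would count as evidence against it.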

Figures

Figures reproduced from arXiv: 2606.30246 by Bo Huang, Chenxi Zeng, Hanwen Zhu, Junwei Liao, Ming Zhou, Shuai Shao, Weinan Zhang, Xiaohang Nie, Yang Li, Yuanjian Zhou, Yuanyi Song, Yuan Yuan, Zeyi Chen, Zhengxi Yu, Zhi Han, Zhiyu Chen, Zicai Cui, Zihan Guo.

**Figure 1.** From closed workflows and open research “dark forests” to trusted scientific collaboration networks. Closed multi-agent workflows organize fixed roles into a bounded pipeline, while open research networks introduce heterogeneous agents, organizations, tools, data, and physical resources with unverifiable identity, ambiguous credit, broken provenance, and ungoverned access. Clarus addresses this transition…

**Figure 2.** Clarus four-layer architecture. Clarus organizes open scientific collaboration into four layers. The Research Application layer structures research goals into project, phase, subtask, and artifact objects. The Digital Collaboration layer provides identity, discovery, collaboration, and utility capabilities for open participants. The Physical Substrate layer mediates controlled access to decentralized rea…

**Figure 3.** Application workflow in Clarus. Clarus transforms a research goal into an attributable project lifecycle. Open team formation discovers and assembles agents around the task, after which the project container executes repeated phase loops through phase planning, subtask DAG construction, agent execution, artifact and evidence collection, audit and credit confirmation, and phase checkpoints. The resulting re…
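The subtask DAG step in this workflow can be illustrated with Python's standard `graphlib` module, which yields an execution order that respects dependencies. The subtask names and edges below are invented for the example and are not taken from the MirrorEval run.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks; each key maps to the set of subtasks it depends on.
subtask_dag = {
    "design_benchmark": set(),
    "collect_data": {"design_benchmark"},
    "run_baselines": {"collect_data"},
    "analyze_results": {"run_baselines"},
    "audit_artifacts": {"analyze_results", "collect_data"},
}

# static_order() returns the nodes so that every dependency comes first.
order = list(TopologicalSorter(subtask_dag).static_order())

# Record one trace entry per executed subtask, in execution order.
trace = [{"step": i, "subtask": name} for i, name in enumerate(order)]
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, so a malformed plan is rejected before any agent executes.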

**Figure 4.** Clarus prototype interface and MirrorEval paper artifact. a: The Clarus project room brings phase state, agent execution records, artifact registration, DAG overview, and related information into one inspectable interface. b: The final MirrorEval paper page assembled by the system. c to h: Representative pages of the final paper, including the problem setting, benchmark structure, experimental pipeline, ma…

**Figure 5.** End-to-end execution results of the MirrorEval case study. a: Subtask DAGs across six research phases, showing how Clarus decomposes an open-ended research goal into a phased, executable, and traceable task structure. b: Final credit settlement, comparing equal share with the contribution allocation derived from process records, artifact ownership, and handoff. c: The paper preview interface, showing the c…
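The comparison in panel b, equal share versus an allocation derived from recorded contributions, can be sketched as a simple normalization. The agent names, the weights, and the additive weighting rule are all assumptions; the page does not give the formula Clarus uses for credit settlement.

```python
def equal_share(agents):
    """Baseline: every agent receives the same fraction of credit."""
    return {agent: 1 / len(agents) for agent in agents}


def contribution_share(records):
    """Sum hypothetical contribution weights per agent and normalize to 1."""
    totals = {}
    for agent, weight in records:
        totals[agent] = totals.get(agent, 0.0) + weight
    grand_total = sum(totals.values())
    return {agent: weight / grand_total for agent, weight in totals.items()}


# Hypothetical (agent, weight) records from three evidence sources.
records = [
    ("planner", 2.0),   # process record
    ("executor", 5.0),  # artifact ownership
    ("executor", 1.0),  # handoff
    ("auditor", 2.0),   # process record
]

shares = contribution_share(records)
baseline = equal_share(["planner", "executor", "auditor"])
```

In this toy data the executor receives 60% of the credit under the contribution rule, against one third under equal share, which is the kind of gap the figure is described as comparing.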

**Figure 6.** Audit-triggered replanning in the Experiment phase. a: Adapter provenance audit failure in the first version of Experiment DAG, where the red node indicates a failed audit. b: Human checkpoint and DAG diff record. Because Auto checkpoint decision was enabled, the system automatically accepts the recommendation to repair the current phase and triggers replanning from first version to second one. c: The rep…
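A DAG diff record between the first and second plan versions could look like the sketch below. The repair rule (insert a verification step after the failed node and reroute its dependents through it) and all node names are assumptions for illustration; the caption does not say how Clarus actually rewrites the DAG.

```python
def replan_after_audit(dag, failed, repair):
    """Assumed repair rule: dependents of the failed node now wait for a repair step."""
    new_dag = {
        node: {repair if dep == failed else dep for dep in deps}
        for node, deps in dag.items()
    }
    new_dag[repair] = {failed}  # the repair step itself follows the failed node
    return new_dag


def dag_diff(old, new):
    """Compare two DAG versions given as {node: set_of_dependencies}."""
    old_nodes, new_nodes = set(old), set(new)
    return {
        "added": sorted(new_nodes - old_nodes),
        "removed": sorted(old_nodes - new_nodes),
        "rewired": sorted(n for n in old_nodes & new_nodes if old[n] != new[n]),
    }


# Hypothetical first version of the Experiment DAG.
v1 = {
    "build_adapter": set(),
    "run_experiment": {"build_adapter"},
    "summarize": {"run_experiment"},
}
v2 = replan_after_audit(v1, failed="build_adapter", repair="verify_adapter_provenance")
diff = dag_diff(v1, v2)
```

Storing the diff rather than only the new plan keeps the replanning decision itself inspectable after the fact.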

**Figure 7.** Credit attribution and provenance tracking in the MirrorEval run. a: Credit-share changes of different agents across the six research phases. Solid lines denote accumulated cross-phase settlement, while dashed lines denote within-phase or intermediate states. b: Distribution of 589 trace events, showing that credit, artifact, task, audit, phase, and routing records jointly form the evidence trail. c: The n…

Original abstract

Existing autonomous research agents can support parts of the research process, but most systems still treat research as either an isolated assistant task or a closed workflow. Therefore, autonomous science needs a collaboration infrastructure that coordinates projects, agents, and digital and physical resources. We identify this as a shift from code-centered execution loops to research-oriented collaboration processes, where questions, evidence, participants, and resources must be coordinated under uncertainty. In this framing, an agent may be an AI system, a human researcher, a team, a laboratory, or an organization-backed participant. To this end, we present Clarus, a collaboration infrastructure for coordinating autonomous research agents toward web-scale scientific collaboration. Clarus reformulates research as an open, auditable, attributable, and resource-aware multi-phase collaboration process. It defines a minimal project-agent-resource object model and organizes scientific collaboration through four layers including Research Application, Digital Collaboration, Physical Substrate, and Physical World. Core modules are implemented as pluggable mechanisms, allowing Clarus to adapt to task risk, collaboration structure, and resource constraints. Through a controlled paper-generation case study, we show that Clarus can organize a research goal into a traceable, reviewable, attributable, and accumulative collaboration network across phases, tasks, and participants. Together, the object model, collaboration protocol, trust mechanisms, and prototype validation provide an initial foundation for open research networks. Clarus is now available at clarus.holosai.io.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Clarus defines a project-agent-resource model and four-layer protocol for auditable research collaboration, but the controlled case study supplies no metrics or tests of uncertainty handling.

Clarus names a specific object model and four-layer stack (Research Application, Digital Collaboration, Physical Substrate, Physical World) to turn research into an open, attributable process that can include both AI agents and human participants. The pluggable modules are meant to adapt to different risks and constraints, and the authors make the prototype available at clarus.holosai.io.

The design is new in its explicit focus on auditability and accumulation across phases rather than just execution loops. That framing is useful for people thinking about infrastructure for distributed science.

The controlled paper-generation case study is the only evidence offered. It claims the system produces traceable networks, yet the description contains no numbers on completion rates, review quality, attribution accuracy, or how the layers behaved when resources were limited. Because the study is controlled, it also does not exercise the web-scale conditions the introduction highlights.

The central assumption—that the architecture plus pluggable mechanisms will suffice under uncertainty—therefore sits on the design rather than on demonstrated behavior. No failure modes or quantitative validation appear in the available text.

This paper is for groups already working on multi-agent scientific systems who want a concrete object model to compare against. It is preliminary but coherent enough to merit referee time; a review could usefully press for the missing metrics and stress tests before any stronger claims are made.

Referee Report

2 major / 0 minor

Summary. The manuscript presents Clarus, a collaboration infrastructure for coordinating autonomous research agents (AI systems, humans, teams, labs, or organizations) toward web-scale scientific collaboration. It reformulates research as an open, auditable, attributable, and resource-aware multi-phase process using a minimal project-agent-resource object model organized across four layers (Research Application, Digital Collaboration, Physical Substrate, Physical World). Core modules are implemented as pluggable mechanisms to adapt to task risk, collaboration structure, and resource constraints. The central claim is validated through a controlled paper-generation case study demonstrating that Clarus organizes a research goal into a traceable, reviewable, attributable, and accumulative collaboration network across phases, tasks, and participants.

Significance. If the case study evidence holds and generalizes beyond the controlled setting, Clarus could provide a foundational object model, protocol, and trust mechanisms for open research networks, shifting from isolated code-centered loops to coordinated multi-participant processes under uncertainty. The prototype availability at clarus.holosai.io offers a concrete implementation starting point. However, the lack of quantitative metrics or scaling analysis in the validation limits demonstrated significance for web-scale claims.

major comments (2)

[Case Study] Case Study section: The controlled paper-generation case study is presented without quantitative results, error analysis, implementation details, or metrics on traceability, reviewability, or attributability. This leaves the central claim—that Clarus organizes a research goal into an effective collaboration network—unsupported by evidence in the manuscript.
[Architecture and Pluggable Mechanisms] Architecture and Pluggable Mechanisms sections: The four-layer architecture plus pluggable mechanisms are asserted to handle coordination under uncertainty and varying resource constraints for web-scale use, yet no specific mechanisms, failure modes, or experiments exercising these conditions appear; the controlled case study does not test web-scale conditions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to respond. We address each major comment below, clarifying the intended scope of the work as a conceptual infrastructure proposal supported by a controlled demonstration.

Referee: [Case Study] Case Study section: The controlled paper-generation case study is presented without quantitative results, error analysis, implementation details, or metrics on traceability, reviewability, or attributability. This leaves the central claim—that Clarus organizes a research goal into an effective collaboration network—unsupported by evidence in the manuscript.

Authors: The case study is presented as a controlled, qualitative demonstration to illustrate how the project-agent-resource model structures a research goal into traceable phases with attributable contributions across participants. We acknowledge that it provides no quantitative metrics, error analysis, or statistical evaluation of traceability or attributability. This choice reflects the paper's focus on introducing the object model, layers, and protocol rather than conducting a performance benchmark study. The prototype at clarus.holosai.io supplies additional implementation details for inspection. We agree that quantitative metrics would provide stronger support for the claims but maintain that the existing demonstration is sufficient to show the model's organizational capability within the scope of this work. revision: no
Referee: [Architecture and Pluggable Mechanisms] Architecture and Pluggable Mechanisms sections: The four-layer architecture plus pluggable mechanisms are asserted to handle coordination under uncertainty and varying resource constraints for web-scale use, yet no specific mechanisms, failure modes, or experiments exercising these conditions appear; the controlled case study does not test web-scale conditions.

Authors: The four-layer architecture and pluggable mechanisms are described conceptually to enable adaptation to task risk, collaboration structures, and resource constraints through modular trust and resource modules. Specific mechanisms for attribution, auditing, and resource awareness are outlined in the relevant sections, but we recognize that no detailed failure-mode analysis or experiments under web-scale conditions or high uncertainty are included. The case study exercises the layers at small scale to validate the model. We view the design as a foundational proposal rather than a fully evaluated system at web scale and agree that scaling experiments would be required to substantiate broader applicability claims. revision: no

Circularity Check

0 steps flagged

No circularity: systems description with independent case study

The paper contains no equations, derivations, fitted parameters, or mathematical claims. Its central demonstration is a controlled case study that produces a traceable collaboration network; this is presented as an empirical outcome of the described architecture rather than a quantity that reduces to the architecture by definition or by self-citation. No load-bearing step invokes a prior result from the same authors that is itself unverified, nor does any claim rename a known result or smuggle an ansatz. The architecture and case study are therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract-only review provides no explicit free parameters, axioms, or external evidence for invented elements. The system itself is presented as the core contribution.

invented entities (1)

project-agent-resource object model (no independent evidence)
purpose: To coordinate research elements in an open process
note: Introduced as a minimal model in the abstract, without independent validation or falsifiable predictions

pith-pipeline@v0.9.1-grok · 5851 in / 1069 out tokens · 28098 ms · 2026-06-30T06:13:12.168093+00:00 · methodology


[1] [1]

A survey on llm-based multi- agent system: Recent advances and new frontiers in application.arXiv preprint arXiv:2412.17481,

Shuaihang Chen, Yuanxing Liu, Wei Han, Weinan Zhang, and Ting Liu. A survey on llm-based multi- agent system: Recent advances and new frontiers in application.arXiv preprint arXiv:2412.17481,

work page arXiv

[2] [2]

Abul Ehtesham, Aditi Singh, Gaurav Kumar Gupta, and Saket Kumar. A survey of agent interoper- ability protocols: Model context protocol (mcp), agent communication protocol (acp), agent-to-agent protocol (a2a), and agent network protocol (anp).arXiv preprint arXiv:2505.02279,

work page arXiv

[3] [3]

Agentic LLM Reasoning in a Self-Driving Laboratory for Air-Sensitive Lithium Halide Spinel Conductors

Yuxing Fei, Bernardus Rendy, Xiaochen Yang, Junhee Woo, Xu Huang, Chang Li, Shilong Wang, David Milsted, Yan Zeng, and Gerbrand Ceder. Agentic llm reasoning in a self-driving laboratory for air-sensitive lithium halide spinel conductors.arXiv preprint arXiv:2604.11957,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Unilabos: An ai-native operating system for autonomous laboratories.arXiv preprint arXiv:2512.21766,

Jing Gao, Junhan Chang, Haohui Que, Yanfei Xiong, Shixiang Zhang, Xianwei Qi, Zhen Liu, Jun-Jie Wang, Qianjun Ding, Xinyu Li, et al. Unilabos: An ai-native operating system for autonomous laboratories.arXiv preprint arXiv:2512.21766,

work page arXiv

[5] [5]

Towards an AI co-scientist

24 Clarus Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. Towards an ai co-scientist. arXiv preprint arXiv:2502.18864,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

Betaweb: Towards a blockchain-enabled trustworthy agentic web.arXiv preprint arXiv:2508.13787,

Zihan Guo, Yuanjian Zhou, Chenyi Wang, Linlin You, Minjie Bian, and Weinan Zhang. Betaweb: Towards a blockchain-enabled trustworthy agentic web.arXiv preprint arXiv:2508.13787,

work page arXiv

[7] [7]

Which contributions deserve credit? perceptions of attribution in human-ai co-creation

Jessica He, Stephanie Houde, and Justin D Weisz. Which contributions deserve credit? Perceptions of attribution in human-AI co-creation. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–18, 2025.

[8]

Repro-bench: Can agentic AI systems assess the reproducibility of social science research?

Chuxuan Hu, Liyun Zhang, Yeji Lim, Aum Wadhwani, Austin Peters, and Daniel Kang. Repro-bench: Can agentic AI systems assess the reproducibility of social science research? In Findings of the Association for Computational Linguistics: ACL 2025, pp. 23616–23626, 2025.

[9]

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

Komal Kumar, Aman Chadha, Salman Khan, Fahad Shahbaz Khan, and Hisham Cholakkal. Paper circle: An open-source multi-agent research discovery and analysis framework. arXiv preprint arXiv:2604.06170,

[10]

Agent-oriented planning in multi-agent systems

Ao Li, Yuexiang Xie, Songze Li, Fugee Tsung, Bolin Ding, and Yaliang Li. Agent-oriented planning in multi-agent systems. In International Conference on Learning Representations, volume 2025, pp. 19495–19517, 2025.

[11]

AutoSOTA: An End-to-End Automated Research System for State-of-the-Art AI Model Discovery

Yu Li, Chenyang Shao, Xinyang Liu, Ruotong Zhao, Peijie Liu, Hongyuan Su, Zhibin Chen, Qinglong Yang, Anjie Xu, Yi Fang, et al. Autosota: An end-to-end automated research system for state-of-the-art AI model discovery. arXiv preprint arXiv:2604.05550,

[12]

Autonomous epitaxial atomic-layer synthesis via real-time computer vision of electron diffraction

Haotong Liang, Yunlong Sun, Ryan Paxson, Chih-Yu Lee, Alex T. Hall, Zoey Warecki, John Cumings, Hideomi Koinuma, Aaron Gilad Kusne, Mikk Lippmaa, and Ichiro Takeuchi. Autonomous epitaxial atomic-layer synthesis via real-time computer vision of electron diffraction. arXiv preprint arXiv:2602.20432,

[13]

A vision for auto research with LLM agents

Chengwei Liu, Chong Wang, Jiayue Cao, Jingquan Ge, Kun Wang, Lyuye Zhang, Ming-Ming Cheng, Penghai Zhao, Tianlin Li, Xiaojun Jia, et al. A vision for auto research with LLM agents. arXiv preprint arXiv:2504.18765,

[14]

The Last Human-Written Paper: Agent-Native Research Artifacts

Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, et al. The last human-written paper: Agent-native research artifacts. arXiv preprint arXiv:2604.24658, 2026a. Jiaqi Liu, Shi Qiu, Mairui Li, Bingzhou Li, Haonian Ji, Siwei Han, Xinyu Ye, Peng Xia, Zihan Dong, Congyu Zhang, et al. A...

[15]

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292,

[16]

Leveraging large language models for effective and explainable multi-agent credit assignment

Kartik Nagpal, Dayi Dong, Jean-Baptiste Bouvier, and Negar Mehr. Leveraging large language models for effective and explainable multi-agent credit assignment. arXiv preprint arXiv:2502.16863,

[17]

Holos: A Web-Scale LLM-Based Multi-Agent System for the Agentic Web

Xiaohang Nie, Zihan Guo, Zicai Cui, Jiachi Yang, Zeyi Chen, Leheyi De, Yu Zhang, Junwei Liao, Bo Huang, Yingxuan Yang, et al. Holos: A web-scale LLM-based multi-agent system for the agentic web. arXiv preprint arXiv:2604.02334, 2026a. Xiaohang Nie, Zihan Guo, Kezhuo Yang, Zhichong Zheng, Bochen Ge, Shuai Pan, Zeyi Chen, Youling Xiang, Yu Zhang, Weiwen Liu,...

[18]

AgentRxiv: Towards collaborative autonomous research

Samuel Schmidgall and Michael Moor. AgentRxiv: Towards collaborative autonomous research. arXiv preprint arXiv:2503.18102,

[19]

Agent laboratory: Using LLM agents as research assistants

Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. Agent laboratory: Using LLM agents as research assistants. Findings of the Association for Computational Linguistics: EMNLP 2025, pp. 5977–6043, 2025.

[20]

Authenticated delegation and authorized AI agents

Tobin South, Samuele Marro, Thomas Hardjono, Robert Mahari, Cedric Deslandes Whitney, Dazza Greenwood, Alan Chan, and Alex Pentland. Authenticated delegation and authorized AI agents. arXiv preprint arXiv:2501.09674,

[21]

PaperBench: Evaluating AI's Ability to Replicate AI Research

Giulio Starace, Oliver Jaffe, Dane Sherburn, James Aung, Jun Shern Chan, Leon Maksin, Rachel Dias, Evan Mays, Benjamin Kinsella, Wyatt Thompson, et al. Paperbench: Evaluating AI’s ability to replicate AI research. arXiv preprint arXiv:2504.01848,

[22]

Multi-agent coordination across diverse applications: A survey

Lijun Sun, Yijun Yang, Qiqi Duan, Yuhui Shi, Chao Lyu, Yu-Cheng Chang, Chin-Teng Lin, and Yang Shen. Multi-agent coordination across diverse applications: A survey. arXiv preprint arXiv:2502.14743,

[23]

Value-Decomposition Networks For Cooperative Multi-Agent Learning

Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, et al. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296,

[24]

Open science 2.0: Towards a truly collaborative research ecosystem

Robert T Thibault, Olavo B Amaral, Felipe Argolo, Anita E Bandrowski, Natascha I Drude, et al. Open science 2.0: Towards a truly collaborative research ecosystem. PLoS Biology, 21(10):e3002362,

[25]

Multi-Agent Collaboration Mechanisms: A Survey of LLMs

Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O’Sullivan, and Hoang D Nguyen. Multi-agent collaboration mechanisms: A survey of LLMs. arXiv preprint arXiv:2501.06322,

[26]

Reasoning Provenance for Autonomous AI Agents: Structured Behavioral Analytics Beyond State Checkpoints and Execution Traces

Neelmani Vispute and Aditya Kadam. Reasoning provenance for autonomous AI agents: Structured behavioral analytics beyond state checkpoints and execution traces. arXiv preprint arXiv:2603.21692,

[27]

Agent-temporal attention for reward redistribution in episodic multi-agent reinforcement learning

Baicen Xiao, Bhaskar Ramasubramanian, and Radha Poovendran. Agent-temporal attention for reward redistribution in episodic multi-agent reinforcement learning. arXiv preprint arXiv:2201.04612,

[28]

ASI-Evolve: AI accelerates AI

Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, and Pengfei Liu. ASI-Evolve: AI accelerates AI. arXiv preprint arXiv:2603.29640,

[29]

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The AI scientist-v2: Workshop-level automated scientific discovery via agentic tree search. arXiv preprint arXiv:2504.08066,

[30]

ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

Ruofeng Yang, Yongcan Li, and Shuai Li. ARIS: Autonomous research via adversarial multi-agent collaboration. arXiv preprint arXiv:2605.03042,

[31]

R&D-agent: Automating data-driven AI solution building through LLM-powered automated research, development, and evolution

Xu Yang, Xiao Yang, Shikai Fang, Bowen Xian, Yuante Li, Jian Wang, Minrui Xu, Haoran Pan, Xinpeng Hong, Weiqing Liu, et al. R&D-agent: Automating data-driven AI solution building through LLM-powered automated research, development, and evolution. arXiv preprint arXiv:2505.14738, 2025a. Yingxuan Yang, Huacan Chai, Yuanyi Song, Siyuan Qi, Muning Wen, Ning Li...

[32]

Verified multi-agent orchestration: A plan-execute-verify-replan framework for complex query resolution

Xing Zhang, Yanwei Cui, Guanghui Wang, Wei Qiu, Ziyuan Li, Fangwei Han, Yajing Huang, Hengzhi Qiu, Bing Zhu, and Peiyang He. Verified multi-agent orchestration: A plan-execute-verify-replan framework for complex query resolution. arXiv preprint arXiv:2603.11445,

[33]

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

Chenyu Zhou, Huacan Chai, Wenteng Chen, Zihan Guo, Rong Shan, Yuanyi Song, Tianyi Xu, Yingxuan Yang, Aofan Yu, Weiming Zhang, et al. Externalization in LLM agents: A unified review of memory, skills, protocols and harness engineering. arXiv preprint arXiv:2604.08224,

[34]

Agent-as-a-judge: Evaluate agents with agents

Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, Wenyi Wang, Dmitrii Khizbullin, Yunyang Xiong, Zechun Liu, Ernie Chang, Raghuraman Krishnamoorthi, Yuandong Tian, et al. Agent-as-a-judge: Evaluate agents with agents. arXiv preprint arXiv:2410.10934,
