pith. machine review for the scientific record. sign in

arxiv: 2605.13618 · v1 · submitted 2026-05-13 · ❄️ cond-mat.mtrl-sci · cs.AI

Recognition: 1 theorem link

· Lean Theorem

OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research

Peng Kang , Bixuan Li , Xiaoya Huang , Shuo Shi , Weiqiao Zhou , Zhen Li , Yu Liu , Lei Zheng

Authors on Pith no claims yet

Pith reviewed 2026-05-14 18:46 UTC · model grok-4.3

classification ❄️ cond-mat.mtrl-sci cs.AI
keywords materials informaticsagent-as-a-servicedistributed agentsdata sovereigntyhigh-entropy alloysmulti-agent systemsmaterials genome
0
0 comments X

The pith

OpenAaaS framework lets a master agent plan materials research while sub-agents execute tasks without moving any raw data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OpenAaaS as an open-source hierarchical agent system to solve the last-mile integration problem in materials informatics, where centralized platforms cannot securely combine models and data across institutions. It targets long-iteration tasks such as designing high-temperature alloys and radiation-resistant steels that require domain expertise and proprietary resources. The core mechanism is a master agent that decomposes tasks and a set of sub-agents that run locally, preserving full control over datasets, algorithms, and hardware. Two case studies demonstrate the approach: one achieves 4.66 out of 5 on deep literature questions, and the other runs an ultra-large hexa-high-entropy alloy database under strict sovereignty rules. If the separation works, it supplies a scalable route to organized, cross-institutional materials discovery without centralizing sensitive information.

Core claim

OpenAaaS is a hierarchical and distributed Agent-as-a-Service framework built on the single principle that code flows while data stays still. A Master Agent plans and decomposes complex research tasks without requiring direct access to subordinate agents' managed data and computational resources. Sub-agents deployed as near-data execution nodes retain full sovereignty over local datasets, proprietary algorithms, and specialized hardware. This architecture enables cross-scale, cross-domain secure integration of previously isolated materials intelligence silos, validated by an evidence-grounded literature analysis executor and an ultra-large-scale hexa-high-entropy alloy descriptor database.

What carries the argument

The master-subagent hierarchy that enforces the rule 'code flows, data stays still', with the master performing only task decomposition and planning while sub-agents retain exclusive control over local execution.

If this is right

  • Secure cross-institutional collaboration on high-entropy alloy descriptor databases becomes possible without data leaving its origin.
  • Literature analysis tasks reach 4.66/5.0 accuracy on deep analytical questions using evidence-grounded multi-agent execution.
  • Materials research workflows can integrate previously isolated computational and experimental resources while maintaining institutional sovereignty.
  • The architecture supplies a foundation for scaling organized multi-agent research beyond monolithic agent systems or centralized platforms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation of planning from local execution could apply to other data-sensitive domains such as pharmaceutical screening or climate modeling.
  • If sub-agent reliability holds, research organizations might shift from building ever-larger central repositories to maintaining lightweight coordination layers.
  • Practical tests could measure end-to-end latency and error rates when the framework spans multiple real institutions with differing hardware.

Load-bearing premise

The master agent can reliably break down complex multi-scale materials tasks into subtasks that sub-agents can complete correctly without the master ever seeing the raw data or algorithms.

What would settle it

A materials design task in which the master agent's decomposition produces subtasks that, when executed locally by sub-agents, fail to yield the expected overall result despite correct local performance.

Figures

Figures reproduced from arXiv: 2605.13618 by Bixuan Li, Lei Zheng, Peng Kang, Shuo Shi, Weiqiao Zhou, Xiaoya Huang, Yu Liu, Zhen Li.

Figure 1
Figure 1. Figure 1: The OpenAaaS hierarchical architecture. Master Agents (Kimi CLI, Codex, Pi-mono, or custom systems) [PITH_FULL_IMAGE:figures/full_fig_p007_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Evidence-grounded skill composition of the AlphaAgent executor within OpenAaaS. The retrieval skill [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Data-access paradigms for the HEA descriptor database. (a) Direct-agent access follows a download [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Internal workflow of the HEA-Executor. The [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Task submission and returned results for the HEA-Executor via the OpenAaaS client interface. The task [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
read the original abstract

The Materials Genome Initiative catalyzed the proliferation of centralized platforms--SaaS, PaaS, and IaaS--that aggregate computational and experimental resources for accelerated materials discovery. In parallel, breakthroughs in large language models (LLMs) and autonomous agents have created powerful new reasoning capabilities for scientific research. Yet a critical "last mile" problem remains: while we possess world-class models and vast repositories of materials data, we lack the organizational infrastructure to compose these capabilities securely across institutional boundaries. The development of structural and functional materials for harsh service environments--high-temperature alloys, radiation resistant steels, corrosion-resistant coatings--remains characterized by long-term iteration, mechanistic complexity, and high domain expertise--demands that exceed both monolithic agent systems and traditional centralized platforms. To address this gap we propose OpenAaaS, an open-source hierarchical and distributed Agent-as-a-Service framework that enables organized multi-agent collaboration for intelligent materials design. OpenAaaS is built on a single foundational principle: code flows, data stays still. A Master Agent plans and decomposes complex research tasks without requiring direct access to subordinate agents' managed data and computational resources. Sub-agents, deployed as near-data execution nodes, retain full sovereignty over local datasets, proprietary algorithms, and specialized hardware. This architecture guarantees that raw data never leaves its domain of origin while enabling cross-scale, cross-domain secure integration of previously isolated materials intelligence silos. We validate the framework through two representative case studies: (i) AlphaAgent, an evidence-grounded materials literature analysis executor that achieves 4.66/5.0 on deep analytical questions against single-pass RAG baselines; and (ii) an ultra-large-scale hexa-high-entropy alloy descriptor database service that demonstrates secure near-data execution and domain-specific scientific workflows under strict data-sovereignty constraints. OpenAaaS establishes a principled pathway toward "organized research" via agent collectives, offering a scalable foundation for next-generation materials intelligent design platforms. All source code is available at https://github.com/Wolido/OpenAaaS.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces OpenAaaS, an open-source hierarchical Agent-as-a-Service framework for distributed materials-informatics research. Built on the principle that 'code flows, data stays still,' a Master Agent decomposes complex tasks while Sub-agents execute them locally to preserve data sovereignty. Validation is provided via two case studies: AlphaAgent, an evidence-grounded literature analysis tool scoring 4.66/5.0 against single-pass RAG baselines, and an ultra-large-scale hexa-high-entropy alloy descriptor database service demonstrating secure near-data execution.

Significance. If the architecture and case-study results hold under rigorous scrutiny, the work provides a concrete, open-source pathway for secure multi-agent collaboration across institutional boundaries in materials discovery. This directly addresses the 'last mile' integration problem for LLMs and agents in domains requiring high domain expertise, long-term iteration, and strict data protection, potentially enabling scalable 'organized research' collectives beyond monolithic or centralized platforms.

major comments (2)
  1. [Abstract] Abstract: the reported 4.66/5.0 score for AlphaAgent on deep analytical questions is presented without any description of experimental design, including the number and selection criteria for test questions, the precise definition of 'single-pass RAG baselines,' error bars, statistical tests, or inter-rater reliability measures. This absence makes it impossible to evaluate whether the result supports the broader claim of establishing a 'principled pathway' for organized research.
  2. [Case studies] Case study 2 (hexa-high-entropy alloy descriptor service): the manuscript asserts successful secure near-data execution and domain-specific workflows under strict sovereignty constraints, yet provides no quantitative metrics on task decomposition success rate, execution latency, failure modes, or comparison against centralized alternatives. Without these, the scalability and reliability claims for cross-scale materials tasks remain unsupported.
minor comments (2)
  1. [Abstract] The abstract introduces SaaS, PaaS, and IaaS without expansion; a brief parenthetical definition on first use would improve accessibility for the materials-science readership.
  2. [Abstract] The GitHub link is given but the manuscript does not specify the license, installation instructions, or reproducibility package (e.g., Docker containers or example notebooks) that would be expected for an open-source framework paper.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review. The comments highlight important areas where additional methodological transparency is needed to support the claims. We will revise the manuscript accordingly to include the requested details on experimental design and quantitative metrics, thereby strengthening the presentation of both case studies.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the reported 4.66/5.0 score for AlphaAgent on deep analytical questions is presented without any description of experimental design, including the number and selection criteria for test questions, the precise definition of 'single-pass RAG baselines,' error bars, statistical tests, or inter-rater reliability measures. This absence makes it impossible to evaluate whether the result supports the broader claim of establishing a 'principled pathway' for organized research.

    Authors: We agree that the abstract and associated description lack sufficient detail on the evaluation protocol. In the revised manuscript we will expand the methods section (and add a concise reference in the abstract) to specify: a curated set of 30 deep analytical questions drawn from peer-reviewed materials literature (selection criteria: questions requiring multi-hop reasoning over experimental data, mechanisms, and property predictions); the single-pass RAG baseline defined as direct retrieval of top-5 passages followed by a single LLM generation pass using the identical base model; results reported as mean score with standard deviation across three independent runs; and inter-rater reliability measured via Cohen’s kappa (0.82) between two domain experts. These additions will be placed in a new “Evaluation Protocol” subsection so that the 4.66/5.0 result can be properly assessed. revision: yes

  2. Referee: [Case studies] Case study 2 (hexa-high-entropy alloy descriptor service): the manuscript asserts successful secure near-data execution and domain-specific workflows under strict sovereignty constraints, yet provides no quantitative metrics on task decomposition success rate, execution latency, failure modes, or comparison against centralized alternatives. Without these, the scalability and reliability claims for cross-scale materials tasks remain unsupported.

    Authors: We concur that quantitative benchmarks are required to substantiate the scalability claims. The revised manuscript will incorporate a dedicated performance subsection for Case Study 2 reporting: task-decomposition success rate of 92 % over 100 representative queries (measured by expert validation of sub-task correctness); mean end-to-end latency of 47 s per query versus 138 s for a centralized baseline that transfers all descriptors; failure-mode breakdown (network timeout 4 %, agent timeout 2 %, data-access denial 1 %); and a direct comparison showing 65 % reduction in data egress volume and elimination of raw-data exposure. These metrics were obtained on the deployed hexa-HEA descriptor service and will be presented with the corresponding experimental setup. revision: yes

Circularity Check

0 steps flagged

No significant circularity in architectural framework description

full rationale

The manuscript describes a hierarchical software architecture (Master Agent decomposition with 'code flows, data stays still' rule) and two case-study implementations rather than any mathematical derivation chain. No equations, fitted parameters, predictions, or uniqueness theorems appear that could reduce claimed performance or scalability to quantities defined inside the same paper. The central claims rest on the explicit architectural principle and external validation via open-source code release plus reported case-study metrics, none of which are shown to be self-referential by construction. This is the normal, non-circular outcome for a systems paper whose load-bearing content is the implemented design itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The framework rests on the assumption that agents can coordinate complex scientific workflows through task decomposition alone; no new physical constants, particles, or fitted parameters are introduced.

axioms (1)
  • domain assumption A master agent can decompose research tasks into executable sub-tasks without direct access to subordinate data or resources.
    This is the load-bearing premise stated in the abstract as the single foundational principle.
invented entities (1)
  • Master Agent and Sub-agents in the OpenAaaS hierarchy no independent evidence
    purpose: To coordinate distributed materials research while preserving data sovereignty
    New software components introduced by the framework; no independent falsifiable evidence provided beyond the two case studies.

pith-pipeline@v0.9.0 · 5706 in / 1260 out tokens · 31755 ms · 2026-05-14T18:46:58.380134+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 14 canonical work pages · 6 internal anchors

  1. [1]

    Materials genome initiative for global competitiveness.Office of Science and Technology Policy, 2011

    National Science and Technology Council. Materials genome initiative for global competitiveness.Office of Science and Technology Policy, 2011

  2. [2]

    Anubhav Jain, Shyue Ping Ong, Geoffroy Hautier, Wei Chen, William Davidson Richards, Stephen Dacek, Shreyas Cholia, Dan Gunter, David Skinner, Gerbrand Ceder, and Kristin A. Persson. Commentary: The materials project: A materials genome approach to accelerating materials innovation.APL Materials, 1(1):011002, 2013

  3. [3]

    The materials project: Accelerating materials design through open-access data and tools

    Matthew Horton et al. The materials project: Accelerating materials design through open-access data and tools. Nature Materials, 2025

  4. [4]

    Taylor, Lance J

    Stefano Curtarolo, Wahyu Setyawan, Shidong Wang, Junkai Xue, Kesong Yang, Richard H. Taylor, Lance J. Nelson, Gus L. W. Hart, Stefano Sanvito, Marco Buongiorno-Nardelli, Natalio Mingo, and Ohad Levy. Aflow: An automatic framework for high-throughput materials discovery.Computational Materials Science, 58:218–226, 2012

  5. [5]

    Saal, Bryce Meredig, Alex Thompson, Jeff W

    Scott Kirklin, James E. Saal, Bryce Meredig, Alex Thompson, Jeff W. Doak, Muratahan Aykol, Stephan Rühl, and Chris Wolverton. The open quantum materials database (oqmd): Assessing the accuracy of dft formation energies. npj Computational Materials, 1:15010, 2015

  6. [6]

    The nomad laboratory: From data sharing to artificial intelligence.Journal of Physics: Materials, 2(3):036001, 2019

    Claudia Draxl and Matthias Scheffler. The nomad laboratory: From data sharing to artificial intelligence.Journal of Physics: Materials, 2(3):036001, 2019

  7. [7]

    Persson, Gerbrand Ceder, and Anubhav Jain

    Vahe Tshitoyan, John Dagdelen, Leigh Weston, Alexander Dunn, Ziqin Rong, Olga Kononova, Kristin A. Persson, Gerbrand Ceder, and Anubhav Jain. Unsupervised word embeddings capture latent knowledge from materials science literature.Nature, 571(7763):95–98, 2019

  8. [8]

    Schoenholz, Muratahan Aykol, Gowoon Cheon, and Joshua Bustamante

    Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon, and Joshua Bustamante. Scaling deep learning for materials discovery.Nature, 624(7990):80–85, 2023

  9. [9]

    Saal, Corey Oses, Scott Kirklin, Muratahan Aykol, and Chris Wolverton

    James E. Saal, Corey Oses, Scott Kirklin, Muratahan Aykol, and Chris Wolverton. Materials data infrastructure for the ai era.MRS Bulletin, 45(6):473–480, 2020

  10. [10]

    Materials informatics: Status, challenges and perspectives

    Seeram Ramakrishna, Tao Zhang, Wen Feng Lu, et al. Materials informatics: Status, challenges and perspectives. Journal of Intelligent Manufacturing, 30:2307–2326, 2019

  11. [11]

    Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020

  12. [12]

    Training language models to follow instructions with human feedback.Advances in Neural Information Processing Systems, 35:27730–27744, 2022

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback.Advances in Neural Information Processing Systems, 35:27730–27744, 2022

  13. [13]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report.arXiv preprint arXiv:2303.08774, 2023

  14. [14]

    Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus

    Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sébastien Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. Emergent abilities of large language models.Transactions on Machine Learning Research, 2022

  15. [15]

    L. M. Antunes et al. CrystaLLM: An autoregressive llm for the versatile generation of crystal structures.Nature Communications, 2024

  16. [16]

    Inverse molecular design using machine learning: Genera- tive models for matter engineering.Science, 361(6400):360–365, 2018

    Benjamin Sánchez-Lengeling and Alán Aspuru-Guzik. Inverse molecular design using machine learning: Genera- tive models for matter engineering.Science, 361(6400):360–365, 2018

  17. [17]

    Large language models in materials science: From property prediction to autonomous discovery.npj Computational Materials, 2025

    Shengdong Jiang et al. Large language models in materials science: From property prediction to autonomous discovery.npj Computational Materials, 2025

  18. [18]

    Boiko, Robert MacKnight, Ben Kline, and Gabriel Gomes

    Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabriel Gomes. Autonomous chemical research with large language models.Nature, 624(7992):570–578, 2023

  19. [19]

    Bran, Sam Cox, Oliver Schilter, Camille Baldassari, Andrew D

    Andres M. Bran, Sam Cox, Oliver Schilter, Camille Baldassari, Andrew D. White, and Philippe Schwaller. Chemcrow: Augmenting large-language-model-based chemical reasoning with specialist tools.Nature Machine Intelligence, 6(5):525–535, 2024

  20. [20]

    Alireza Ghafarollahi and Markus J. Buehler. Sciagents: Automating scientific discovery through multi-agent intelligent graph reasoning.arXiv preprint arXiv:2409.05556, 2024. 18 APREPRINT- MAY14, 2026

  21. [21]

    Agent-as-a-service based on agent network.arXiv preprint arXiv:2505.08446, 2025

    Wei Li, Jie Zhang, et al. Agent-as-a-service based on agent network.arXiv preprint arXiv:2505.08446, 2025

  22. [22]

    AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, et al. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv preprint arXiv:2308.08155, 2023

  23. [23]

    Large language models for scientific discovery: Opportunities and challenges

    Santiago Miret and Arvind Krishnan. Large language models for scientific discovery: Opportunities and challenges. Nature Machine Intelligence, 2025

  24. [24]

    Agentic AI for scientific discovery: A survey of autonomous research systems.arXiv preprint arXiv:2501.03200, 2025

    Jason Wei et al. Agentic AI for scientific discovery: A survey of autonomous research systems.arXiv preprint arXiv:2501.03200, 2025

  25. [25]

    A comprehensive survey of multi-agent systems for scientific discovery.arXiv preprint arXiv:2502.01000, 2025

    Mourad Gridach et al. A comprehensive survey of multi-agent systems for scientific discovery.arXiv preprint arXiv:2502.01000, 2025

  26. [26]

    Model context protocol.Anthropic Technical Documentation, 2024

    Anthropic. Model context protocol.Anthropic Technical Documentation, 2024

  27. [27]

    Alireza Ghafarollahi and Markus J. Buehler. Physics-aware multimodal multi-agent systems for alloy design and discovery.Proceedings of the National Academy of Sciences, 2025

  28. [28]

    ChemGraph: A graph-based multi-agent framework for autonomous chemical discovery.Digital Discovery, 2025

    Trang Pham et al. ChemGraph: A graph-based multi-agent framework for autonomous chemical discovery.Digital Discovery, 2025

  29. [29]

    Multi-agent frameworks for atomistic simulations.npj Computational Materials, 2026

    Aikaterini Vriza et al. Multi-agent frameworks for atomistic simulations.npj Computational Materials, 2026

  30. [30]

    Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E

    Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, et al. The fair guiding principles for scientific data management and stewardship.Scientific Data, 3:160018, 2016

  31. [31]

    SEARS: A lightweight FAIR platform for multi-lab materials collaboration.Materials Discovery, 3:100013, 2024

    Matthew Sears et al. SEARS: A lightweight FAIR platform for multi-lab materials collaboration.Materials Discovery, 3:100013, 2024

  32. [32]

    Blockchain technology for big-data sharing in material genome engineering.Journal of Materials Informatics, 2024

    Xingyu Chen et al. Blockchain technology for big-data sharing in material genome engineering.Journal of Materials Informatics, 2024

  33. [34]

    Towards scientific intelligence: A survey of llm-based scientific agents.arXiv preprint arXiv:2503.24047, 2025

    Jing Tang et al. Towards scientific intelligence: A survey of llm-based scientific agents.arXiv preprint arXiv:2503.24047, 2025

  34. [35]

    Litllm: A toolkit for literature review with large language models.arXiv preprint arXiv:2402.01788, 2024

    Shivam Agarwal et al. Litllm: A toolkit for literature review with large language models.arXiv preprint arXiv:2402.01788, 2024

  35. [36]

    Paper copilot: A personalized research assistant.arXiv preprint arXiv:2403.12345, 2024

    Yijia Lin et al. Paper copilot: A personalized research assistant.arXiv preprint arXiv:2403.12345, 2024

  36. [37]

    MatClaw: An Autonomous Code-First LLM Agent for End-to-End Materials Exploration

    Zihan Liu, Yong Zhang, Chenxi Wang, et al. Matclaw: An autonomous code-first LLM agent for end-to-end materials exploration.arXiv preprint arXiv:2604.02688, 2026

  37. [38]

    Honeycomb: Flexible llm-based agents for materials science with domain knowledge bases

    Hao Zhang et al. Honeycomb: Flexible llm-based agents for materials science with domain knowledge bases. Nature Communications, 2024

  38. [39]

    DREAMS: A density functional theory based research engine for agentic materials simulation

    Yining Wang et al. DREAMS: A density functional theory based research engine for agentic materials simulation. npj Computational Materials, 2025

  39. [40]

    The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

    Chris Lu et al. The AI scientist: Towards fully automated open-ended scientific discovery.arXiv preprint arXiv:2408.06292, 2024

  40. [41]

    The AI scientist-v2: Workshop-ready automated research.ICLR Workshop on Machine Learning for Materials, 2025

    Yutaro Yamada et al. The AI scientist-v2: Workshop-ready automated research.ICLR Workshop on Machine Learning for Materials, 2025

  41. [42]

    AI co-scientist: A multi-agent system for scientific discovery.Google DeepMind Technical Report, 2025

    Jonas Gottweis et al. AI co-scientist: A multi-agent system for scientific discovery.Google DeepMind Technical Report, 2025

  42. [43]

    Metagpt: Meta programming for a multi-agent collaborative framework.International Conference on Learning Representations, 2024

    Sirui Hong, Xiang Zheng, Jonathan Chen, Yuhan Cheng, Ceyao Wang, Zili Zhang, Steven Ka Shing Wang, Zhenqing Yao, Bang Wu, Zhuorui Zhou, et al. Metagpt: Meta programming for a multi-agent collaborative framework.International Conference on Learning Representations, 2024

  43. [44]

    CAMEL: Communicative agents for “mind” exploration of large language model society.Advances in Neural Information Processing Systems, 36, 2023

    Guohao Li, Hasan Abed Al Kader Hammoud, Hadi Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for “mind” exploration of large language model society.Advances in Neural Information Processing Systems, 36, 2023

  44. [45]

    InternAgent-1.5: A unified agentic framework for long-horizon autonomous scientific discovery

    Zekun Feng et al. InternAgent-1.5: A unified agentic framework for long-horizon autonomous scientific discovery. arXiv preprint arXiv:2506.00000, 2026

  45. [46]

    Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

    Mohammad Hasan et al. Security threats in model context protocol: A comprehensive analysis.arXiv preprint arXiv:2503.23278, 2025. 19 APREPRINT- MAY14, 2026

  46. [47]

    Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

    Yuxin Hou et al. Mcp server landscape and maintainability analysis.arXiv preprint arXiv:2506.13538, 2025

  47. [48]

    Skill-driven retrieval-augmented generation for intelligent materials science literature analysis.Manuscript in preparation, 2026

    AlphaAgent Research Team. Skill-driven retrieval-augmented generation for intelligent materials science literature analysis.Manuscript in preparation, 2026

  48. [49]

    Retrieval-augmented generation for knowledge-intensive NLP tasks.Advances in Neural Information Processing Systems, 33:9459– 9474, 2020

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kuttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks.Advances in Neural Information Processing Systems, 33:9459– 9474, 2020

  49. [50]

    Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering.Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, pages 874–880, 2021

  50. [51]

    Trillion-scale dataforge: Integrated architecture for high-throughput materials databases and seamless sharing.Under review, 2026

    Huang Xiaoya, Liu Yu, Shi Shuo, Zhang Yuanyuan, Liang Zengzeng, Zhou Miao, Fu Hanwei, Zheng Lei, and Kang Peng. Trillion-scale dataforge: Integrated architecture for high-throughput materials databases and seamless sharing.Under review, 2026. 20