arxiv: 2605.15132 · v1 · submitted 2026-05-14 · 💻 cs.AI · cs.DC· cs.MA

Recognition: no theorem link

APWA: A Distributed Architecture for Parallelizable Agentic Workflows

Evan Rose , Tushin Mallick , Matthew D. Laws , Cristina Nita-Rotaru , Alina Oprea

Authors on Pith no claims yet

Pith reviewed 2026-05-15 03:07 UTC · model grok-4.3

classification 💻 cs.AI cs.DCcs.MA

keywords multi-agent systemslarge language modelsdistributed architectureparallel workflowsagentic tasksworkflow decompositionautonomous agents

0 comments

The pith

APWA breaks complex agentic tasks into independent subproblems that run in parallel without communication.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents APWA, a distributed architecture for multi-agent LLM systems that splits large workflows into separate parts. These parts execute on different resources with no need for agents to exchange information during processing. This approach addresses the coordination and scaling limits that cause existing systems to fail on bigger jobs. The architecture handles varied domains by supporting heterogeneous data and parallel patterns.

Core claim

APWA is a distributed multi-agent system architecture designed to process heavily parallelizable agentic workloads by dynamically decomposing them into non-interfering subproblems that execute independently on heterogeneous resources without cross-communication, enabling scaling on larger tasks where prior systems fail completely.

What carries the argument

The decomposition of agentic workflows into non-interfering subproblems that permit fully independent execution without cross-communication on heterogeneous resources.

Load-bearing premise

Complex agentic workflows can be reliably decomposed into non-interfering subproblems that require no cross-communication and run independently.

What would settle it

A complex query whose correct solution requires ongoing information exchange between subproblems, causing APWA to produce errors or lose its scaling advantage.

Figures

Figures reproduced from arXiv: 2605.15132 by Alina Oprea, Cristina Nita-Rotaru, Evan Rose, Matthew D. Laws, Tushin Mallick.

**Figure 2.** Figure 2: APWA Distributed System Architecture. 3.1 APWA system-level abstractions As other multi-agent systems, APWA is organized around three main abstractions: the manager, the worker, and the executor. However, there are fundamental differences with respect to other systems, as these abstractions target data-parallel or task-parallel workflows. Both manager and worker have planning roles, the manager is performi… view at source ↗

read the original abstract

Autonomous multi-agent systems based on large language models (LLMs) have demonstrated remarkable abilities in independently solving complex tasks in a wide breadth of application domains. However, these systems hit critical reasoning, coordination, and computational scaling bottlenecks as the size and complexity of their tasks grow. These limitations hinder multi-agent systems from achieving high-throughput processing for highly parallelizable tasks, despite the availability of parallel computing and reasoning primitives in the underlying LLMs. We introduce the Agent-Parallel Workload Architecture (APWA), a distributed multi-agent system architecture designed for the efficient processing of heavily parallelizable agentic workloads. APWA facilitates parallel execution by decomposing workflows into non-interfering subproblems that can be processed using independent resources without cross-communication. It supports heterogeneous data and parallel processing patterns, and it accommodates tasks from a wide breadth of domains. In our evaluation, we demonstrate that APWA can dynamically decompose complex queries into parallelizable workflows and scales on larger tasks in settings where prior systems fail completely.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

APWA introduces a distributed architecture for decomposing agentic workflows into independent parallel subproblems, but the scaling claims rest on thin evidence in the abstract.

read the letter

The core idea here is a distributed setup called APWA that breaks complex multi-agent LLM tasks into non-interfering subproblems for execution on separate resources without any cross-communication. This targets the coordination and throughput limits that show up once workflows get larger and more parallelizable, which is a practical pain point in the space. The architecture also claims to handle heterogeneous data and processing patterns across domains, which broadens its potential reach. What stands out as new is the specific emphasis on dynamic decomposition tailored to agentic workloads, rather than generic parallelism primitives. That framing gives it a clear engineering angle that prior systems apparently do not emphasize as directly. The paper does a reasonable job laying out the motivation and high-level design without overclaiming theoretical novelty. The soft spots are mostly around verification. The abstract asserts that APWA scales on larger tasks where earlier approaches fail completely, yet it supplies no methods, datasets, metrics, or failure cases to back that up. The central assumption—that decomposition reliably produces subproblems with zero dependencies—is load-bearing, and without details on how independence is checked or what happens on latent interactions, the scaling advantage stays unproven. The stress-test note correctly flags this gap; if any real workflow has hidden data or control links, independent execution would break correctness. This work is aimed at engineers and researchers building multi-agent LLM systems who need concrete architectural options for parallelism. A reader focused on implementation patterns rather than formal proofs could extract useful design points from the high-level description. I would send it for peer review. The idea is grounded enough in a real bottleneck to merit referee input on the experiments and any code or results in the full manuscript, even if substantial revisions are likely needed to strengthen the evidence.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the Agent-Parallel Workload Architecture (APWA), a distributed multi-agent system for LLM-based autonomous agents. APWA is designed to handle heavily parallelizable agentic workloads by dynamically decomposing complex queries into non-interfering subproblems that execute independently on heterogeneous resources with no cross-communication. The paper claims support for heterogeneous data patterns across domains and reports that evaluation demonstrates successful decomposition plus scaling on larger tasks where prior systems fail completely.

Significance. If the decomposition reliably produces independent subproblems, APWA would address key coordination and scaling bottlenecks in multi-agent LLM systems, enabling higher throughput for parallel tasks. This could have substantial practical impact on domains requiring high-volume agentic processing, provided the independence assumption holds under realistic workflow conditions.

major comments (2)

[Abstract] Abstract: the claim that APWA 'scales on larger tasks in settings where prior systems fail completely' is presented without any quantitative metrics, baselines, datasets, error bars, or implementation details, leaving the central scaling result unverifiable from the supplied text.
[Abstract] Abstract: the core mechanism asserts that workflows are decomposed into 'non-interfering subproblems' requiring 'no cross-communication,' yet the manuscript supplies no dependency analysis, concrete decomposition examples, failure-mode characterization, or validation that latent data/control dependencies are absent in the evaluated tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript introducing APWA. We address each major comment point by point below, clarifying the content of the full paper and indicating planned revisions to the abstract for improved clarity and verifiability.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that APWA 'scales on larger tasks in settings where prior systems fail completely' is presented without any quantitative metrics, baselines, datasets, error bars, or implementation details, leaving the central scaling result unverifiable from the supplied text.

Authors: The abstract is intentionally concise, but the full manuscript (Section 4: Evaluation) provides the requested details: quantitative throughput and latency metrics on parallel workloads of increasing size, baselines including AutoGen and CrewAI, datasets drawn from multi-domain agentic benchmarks (e.g., parallel planning and data-processing tasks), error bars from repeated runs (n=5), and implementation notes on the distributed runtime. We will revise the abstract to incorporate the key quantitative result (e.g., 'APWA sustains linear scaling to 200+ subproblems with 3.8x higher throughput where baselines timeout'), making the claim verifiable directly from the abstract while preserving its brevity. revision: yes
Referee: [Abstract] Abstract: the core mechanism asserts that workflows are decomposed into 'non-interfering subproblems' requiring 'no cross-communication,' yet the manuscript supplies no dependency analysis, concrete decomposition examples, failure-mode characterization, or validation that latent data/control dependencies are absent in the evaluated tasks.

Authors: Section 3.2 of the manuscript presents a formal dependency-graph analysis that identifies data and control dependencies to guarantee non-interference, with concrete decomposition examples illustrated in Figures 2 and 3 for representative queries. Failure modes (including undetected latent dependencies) are characterized in Section 5.2, and validation is reported via automated static checks plus manual inspection on the evaluated task suite, confirming absence of cross-communication requirements. We will add one sentence to the abstract summarizing the validation approach ('Decomposition is validated by static dependency analysis ensuring independence'). This provides the requested high-level evidence in the abstract without duplicating the full technical treatment in the body. revision: partial

Circularity Check

0 steps flagged

No significant circularity in architecture proposal

full rationale

The paper presents APWA as a new distributed architecture whose central claim is an empirical demonstration that dynamic decomposition enables scaling on parallelizable agentic tasks where prior systems fail. No equations, fitted parameters, or derivation chain appear in the abstract or described structure. The decomposition into non-interfering subproblems is introduced as a design choice supported by evaluation results rather than reduced by construction from prior self-citations, ansatzes, or fitted inputs. The architecture is therefore self-contained against external benchmarks and receives a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that workflows admit non-interfering decomposition; no free parameters or invented physical entities are introduced in the abstract.

axioms (1)

domain assumption Complex agentic workflows can be decomposed into non-interfering subproblems processable independently without cross-communication
Invoked in the abstract as the enabling condition for parallel execution and scaling.

invented entities (1)

APWA architecture no independent evidence
purpose: Distributed system for parallel agentic workloads
Newly proposed architecture whose independent evidence is limited to the abstract's evaluation claim.

pith-pipeline@v0.9.0 · 5488 in / 1121 out tokens · 49473 ms · 2026-05-15T03:07:47.800391+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages

[1]

PII masking 300k dataset

Ai4Privacy. PII masking 300k dataset. https://huggingface.co/datasets/ai4privacy /pii-masking-300k, 2024

work page 2024
[2]

Hadoop.https://hadoop.apache.org/

Apache Software Foundation. Hadoop.https://hadoop.apache.org/

work page
[3]

Schema-driven information extraction from heterogeneous tables

Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, Mark Dredze, and Alan Ritter. Schema-driven information extraction from heterogeneous tables. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors,Findings of the Association for Computational Linguistics: EMNLP 2024, pages 10252–10273, Miami, Florida, USA, November 2024. Association for Comp...

work page 2024
[4]

Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes

Daniil A. Boiko, Robert MacKnight, Ben Kline, and Gabe Gomes. Autonomous chemical research with large language models.Nature, 624(7992):570–578, December 2023

work page 2023
[5]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott G...

work page 1901
[6]

MLE-bench: Evaluating machine learning agents on machine learning engineering

Jun Shern Chan, Neil Chowdhury, Oliver Jaffe, James Aung, Dane Sherburn, Evan Mays, Giulio Starace, Kevin Liu, Leon Maksin, Tejal Patwardhan, Aleksander Madry, and Lilian Weng. MLE-bench: Evaluating machine learning agents on machine learning engineering. In Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors,International Conference on Learning Represe...

work page 2025
[7]

AutoAgents: A framework for automatic agent generation

Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje Karlsson, Jie Fu, and Yemin Shi. AutoAgents: A framework for automatic agent generation. In Kate Larson, editor, Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI- 24, pages 22–30. International Joint Conferences on Artificial Intelligence Organ...

work page 2024
[8]

AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors

Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, and Jie Zhou. AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors. In B. Kim, Y . Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y . Sun, edit...

work page 2024
[9]

CrewAI.https://www.crewai.com/

CrewAI, Inc. CrewAI.https://www.crewai.com/

work page
[10]

MapReduce: Simplified data processing on large clusters

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In6th Symposium on Operating Systems Design and Implementation (OSDI 04), pages 137–150, San Francisco, CA, December 2004. USENIX Association

work page 2004
[11]

PentestGPT: Evaluating and harnessing large language models for automated penetration testing

Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, and Stefan Rass. PentestGPT: Evaluating and harnessing large language models for automated penetration testing. In33rd USENIX Security Symposium (USENIX Security 24), pages 847–864, Philadelphia, PA, August 2024. USENIX Association. 10

work page 2024
[12]

Mind2web: Towards a generalist agent for the web

Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Sam Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2web: Towards a generalist agent for the web. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 28091–28114. Curran Associates, Inc., 2023

work page 2023
[13]

Self-collaboration code generation via chatgpt

Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. Self-collaboration code generation via chatgpt. ACM Trans. Softw. Eng. Methodol., 33(7), September 2024

work page 2024
[14]

Tenenbaum, and Igor Mordatch

Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. In Ruslan Salakhutdi- nov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine L...

work page 2024
[15]

Faith and fate: Limits of transformers on compositionality

Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang (Lorraine) Li, Liwei Jiang, Bill Yuchen Lin, Sean Welleck, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena Hwang, Soumya Sanyal, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, and Yejin Choi. Faith and fate: Limits of transformers on compositionality. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, ...

work page 2023
[16]

Magentic-One: A generalist multi-agent system for solving complex tasks

Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, Erkang (Eric) Zhu, Friederike Niedtner, Grace Proebsting, Griffin Bassman, Jack Gerrits, Jacob Alber, Peter Chang, Ricky Loynd, Robert West, Victor Dibia, Ahmed Awadallah, Ece Kamar, Rafah Hosn, and Saleema Amershi. Magentic-One: A generalist multi-agent system for solving complex t...

work page 2024
[17]

Project Gutenberg, Urbana, Illinois, 2008

Edward Gibbon.The History of the Decline and Fall of the Roman Empire. Project Gutenberg, Urbana, Illinois, 2008. Accessed: 2026-04-27

work page 2008
[18]

LLM task interference: An initial study on the impact of task-switch in conversational history

Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark Gales, and Mario Fritz. LLM task interference: An initial study on the impact of task-switch in conversational history. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors,Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 14633–14652, Miami, Florida, USA, Nove...

work page 2024
[19]

Project Gutenberg, Urbana, Illinois, 2005

Thomas Hardy.The Dynasts: An Epic-Drama of the War with Napoleon. Project Gutenberg, Urbana, Illinois, 2005. Accessed: 2026-04-27

work page 2005
[20]

Measuring massive multitask language understanding

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. InInternational Conference on Learning Representations, 2021

work page 2021
[21]

MetaGPT: Meta programming for a multi-agent collaborative framework

Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. MetaGPT: Meta programming for a multi-agent collaborative framework. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[22]

RULER: What’s the real context size of your long-context language models? InFirst Conference on Language Modeling, 2024

Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, and Boris Ginsburg. RULER: What’s the real context size of your long-context language models? InFirst Conference on Language Modeling, 2024

work page 2024
[23]

Owl: Optimized workforce learning for general multi-agent assistance in real- world task automation

Mengkang Hu, Yuhang Zhou, Wendong Fan, Yuzhou Nie, Ziyu Ye, Bowei Xia, Tao Sun, Zhaoxuan Jin, Yingru Li, Zeyu Zhang, Yifeng Wang, Qianshuo Ye, Bernard Ghanem, Ping Luo, and Guohao Li. Owl: Optimized workforce learning for general multi-agent assistance in real- world task automation. In D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, a...

work page 2025
[24]

Large language models cannot self-correct reasoning yet

Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Yu, Xinying Song, and Denny Zhou. Large language models cannot self-correct reasoning yet. In B. Kim, Y . Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y . Sun, editors,International Conference on Learning Representations, volume 2024, pages 32808–32824, 2024

work page 2024
[25]

Dryad: distributed data-parallel programs from sequential building blocks

Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. InProceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, EuroSys ’07, pages 59–72, New York, NY , USA, 2007. Association for Computing Machinery

work page 2007
[26]

SWE-bench: Can language models resolve real-world github issues? In B

Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. SWE-bench: Can language models resolve real-world github issues? In B. Kim, Y . Yue, S. Chaudhuri, K. Fragkiadaki, M. Khan, and Y . Sun, editors,International Conference on Learning Representations, volume 2024, pages 54107–54157, 2024

work page 2024
[27]

Mahoney, Kurt Keutzer, and Amir Gholami

Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, and Amir Gholami. An LLM compiler for parallel function calling. In Ruslan Salakhutdi- nov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning,...

work page 2024
[28]

Mdagents: An adaptive collaboration of llms for medical decision-making

Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Hyeon- hoon Lee, Marzyeh Ghassemi, Cynthia Breazeal, and Hae Won Park. Mdagents: An adaptive collaboration of llms for medical decision-making. In A. Globerson, L. Mackey, D. Bel- grave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors,Advances in Neural Information Processi...

work page 2024
[29]

LangChain.https://github.com/langchain-ai/langchain/

Langchain, Inc. LangChain.https://github.com/langchain-ai/langchain/

work page
[30]

SEC-bench: Automated bench- marking of llm agents on real-world software security tasks

Hwiwon Lee, Ziqi Zhang, Hanxiao Lu, and Lingming Zhang. SEC-bench: Automated bench- marking of llm agents on real-world software security tasks. In D. Belgrave, C. Zhang, H. Lin, R. Pascanu, P. Koniusz, M. Ghassemi, and N. Chen, editors,Advances in Neural Information Processing Systems, volume 38, pages 116342–116378. Curran Associates, Inc., 2025

work page 2025
[31]

CAMEL: Communicative agents for "mind" exploration of large language model society

Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for "mind" exploration of large language model society. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 51991–52008. Curran Associates, Inc., 2023

work page 2023
[32]

Encouraging divergent thinking in large language models through multi-agent debate

Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. Encouraging divergent thinking in large language models through multi-agent debate. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17889...

work page 2024
[33]

Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang

Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts.Transactions of the Association for Computational Linguistics, 12:157–173, 2024

work page 2024
[34]

Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D

Andres M. Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D. White, and Philippe Schwaller. Augmenting large language models with chemistry tools.Nature Machine Intelli- gence, 6(5):525–535, May 2024

work page 2024
[35]

Austern, Aart J.C Bik, James C

Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. InProceed- ings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pages 135–146, New York, NY , USA, 2010. Association for Computing Machinery

work page 2010
[36]

Large language model guided protocol fuzzing

Ruijie Meng, Martin Mirchev, Marcel Böhme, and Abhik Roychoudhury. Large language model guided protocol fuzzing. InProceedings 2024 Network and Distributed System Security Symposium, NDSS 2024. Internet Society, 2024. 12

work page 2024
[37]

Microsoft agent framework

Microsoft Corporation. Microsoft agent framework. https://github.com/microsoft/age nt-framework/

work page
[38]

Model context protocol (MCP)

Model Context Protocol a Series of LF Projects, LLC. Model context protocol (MCP). https: //github.com/modelcontextprotocol/modelcontextprotocol

work page
[39]

Jordan, and Ion Stoica

Philipp Moritz, Robert Nishihara, Stephanie Wang, Alexey Tumanov, Richard Liaw, Eric Liang, Melih Elibol, Zongheng Yang, William Paul, Michael I. Jordan, and Ion Stoica. Ray: A distributed framework for emerging AI applications. In13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 561–577, Carlsbad, CA, October 2018. USE...

work page 2018
[40]

Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Mad- havapeddy, and Steven Hand

Derek G. Murray, Malte Schwarzkopf, Christopher Smowton, Steven Smith, Anil Mad- havapeddy, and Steven Hand. CIEL: A universal execution engine for distributed Data-Flow computing. In8th USENIX Symposium on Networked Systems Design and Implementation (NSDI 11), Boston, MA, March 2011. USENIX Association

work page 2011
[41]

Flow: Modularized agentic workflow automation

Boye Niu, Yiliao Song, Kai Lian, Yifan Shen, Yu Yao, Kun Zhang, and Tongliang Liu. Flow: Modularized agentic workflow automation. In Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors,International Conference on Learning Representations, volume 2025, pages 74949– 74977, 2025

work page 2025
[42]

Introducing GPT-5.4

OpenAI. Introducing GPT-5.4. https://openai.com/index/introducing-gpt-5-4/ ,

work page
[43]

Accessed: 2026-04-27

work page 2026
[44]

Introducing GPT-5.4 mini and nano

OpenAI. Introducing GPT-5.4 mini and nano. https://openai.com/index/introducing -gpt-5-4-mini-and-nano/, 2025. Accessed: 2026-04-27

work page 2025
[45]

Openai agents sdk

OpenAI. Openai agents sdk. https://github.com/openai/openai-agents-python ,

work page
[46]

Accessed: 2026-05-05

work page 2026
[47]

Piccolo: Building fast, distributed programs with partitioned tables

Russell Power and Jinyang Li. Piccolo: Building fast, distributed programs with partitioned tables. In9th USENIX Symposium on Operating Systems Design and Implementation (OSDI 10), Vancouver, BC, October 2010. USENIX Association

work page 2010
[48]

Project gutenberg

Project Gutenberg. Project gutenberg. https://www.gutenberg.org/. Accessed: 2026-04- 27

work page 2026
[49]

Webrl: Training llm web agents via self-evolving online curriculum reinforcement learning

Zehan Qi, Xiao Liu, Iat Long Iong, Hanyu Lai, Xueqiao Sun, Jiadai Sun, Xinyue Yang, Yu Yang, Shuntian Yao, Wei Xu, Jie Tang, and Yuxiao Dong. Webrl: Training llm web agents via self-evolving online curriculum reinforcement learning. In Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors,International Conference on Learning Representations, volume 2025, ...

work page 2025
[50]

ChatDev: Communicative agents for software development

Chen Qian, Wei Liu, Hongzhang Liu, Nuo Chen, Yufan Dang, Jiahao Li, Cheng Yang, Weize Chen, Yusheng Su, Xin Cong, Juyuan Xu, Dahai Li, Zhiyuan Liu, and Maosong Sun. ChatDev: Communicative agents for software development. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors,Proceedings of the 62nd Annual Meeting of the Association for Computational Li...

work page 2024
[51]

Scaling large language model- based multi-agent collaboration

Chen Qian, Zihao Xie, YiFei Wang, Wei Liu, Kunlun Zhu, Hanchen Xia, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Scaling large language model- based multi-agent collaboration. In Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors, International Conference on Learning Representations, volume 2025, pages 41488–41505, 2025

work page 2025
[52]

Benchmarking agentic workflow generation

Shuofei Qiao, Runnan Fang, Zhisong Qiu, Xiaobin Wang, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, and Huajun Chen. Benchmarking agentic workflow generation. InWorkshop on Reasoning and Planning for Large Language Models, 2025

work page 2025
[53]

Dask: Parallel computation with blocked algorithms and task scheduling

Matthew Rocklin. Dask: Parallel computation with blocked algorithms and task scheduling. In Kathryn Huff and James Bergstra, editors,Proceedings of the 14th Python in Science Conference, pages 126–132, 2015. 13

work page 2015
[54]

Toolformer: Language models can teach themselves to use tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 6...

work page 2023
[55]

Project Gutenberg, Urbana, Illinois, 1998

William Shakespeare.Romeo and Juliet. Project Gutenberg, Urbana, Illinois, 1998. Accessed: 2026-04-27

work page 1998
[56]

Reflex- ion: language agents with verbal reinforcement learning

Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflex- ion: language agents with verbal reinforcement learning. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 8634–8652. Curran Associates, Inc., 2023

work page 2023
[57]

AutoML-agent: A multi-agent LLM framework for full-pipeline AutoML

Patara Trirat, Wonyong Jeong, and Sung Ju Hwang. AutoML-agent: A multi-agent LLM framework for full-pipeline AutoML. In Aarti Singh, Maryam Fazel, Daniel Hsu, Simon Lacoste- Julien, Felix Berkenkamp, Tegan Maharaj, Kiri Wagstaff, and Jerry Zhu, editors,Proceedings of the 42nd International Conference on Machine Learning, volume 267 ofProceedings of Machin...

work page 2025
[58]

Mixture-of-agents enhances large language model capabilities

Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Y Zou. Mixture-of-agents enhances large language model capabilities. In Y . Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors,International Conference on Learning Representations, volume 2025, pages 33944– 33963, 2025

work page 2025
[59]

MegaAgent: A large-scale autonomous LLM-based multi-agent system without predefined SOPs

Qian Wang, Tianyu Wang, Zhenheng Tang, Qinbin Li, Nuo Chen, Jingsheng Liang, and Bingsheng He. MegaAgent: A large-scale autonomous LLM-based multi-agent system without predefined SOPs. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors,Findings of the Association for Computational Linguistics: ACL 2025, pages 4998–5036...

work page 2025
[60]

Openhands: An open platform for ai software developers as generalist agents

Xingyao Wang, Boxuan Li, Yufan Song, Frank F Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Daniel Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, and Graham Neubig. Openhands: An open platform for ai software...

work page 2025
[61]

AutoGen: Enabling next-gen LLM applications via multi-agent conversations

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, and Chi Wang. AutoGen: Enabling next-gen LLM applications via multi-agent conversations. InFirst Conference on Language Modeling, 2024

work page 2024
[62]

SWE-agent: Agent-computer interfaces enable automated software engineering

John Yang, Carlos Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 50528–5065...

work page 2024
[63]

LeanDojo: Theorem proving with retrieval- augmented language models

Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J Prenger, and Animashree Anandkumar. LeanDojo: Theorem proving with retrieval- augmented language models. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors,Advances in Neural Information Processing Systems, volume 36, pages 21573–21...

work page 2023
[64]

React: Synergizing reasoning and acting in language models

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InThe Eleventh International Conference on Learning Representations, 2023

work page 2023
[65]

Suchow, Zhenyu Cui, Rong Liu, Zhaozhuo Xu, Denghui Zhang, Koduvayur 14 Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, and Qianqian Xie

Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yuechen Jiang, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Zhenyu Cui, Rong Liu, Zhaozhuo Xu, Denghui Zhang, Koduvayur 14 Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, and Qianqian Xie. Fincon: A synthesized llm multi-agent system with conceptual verbal reinforcement for enhanced financial deci...

work page 2024
[66]

Franklin, Scott Shenker, and Ion Stoica

Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauly, Michael J. Franklin, Scott Shenker, and Ion Stoica. Resilient distributed datasets: A Fault- Tolerant abstraction for In-Memory cluster computing. In9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pages 15–28, San Jose, CA, April 2012. U...

work page 2012
[67]

GPT-4V(ision) is a generalist web agent, if grounded

Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, and Yu Su. GPT-4V(ision) is a generalist web agent, if grounded. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning, volume 235 ofProceedings of Machine Learn...

work page 2024
[68]

subtask manager

Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, and Jürgen Schmidhuber. GPTSwarm: Language agents as optimizable graphs. In Ruslan Salakhutdi- nov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors,Proceedings of the 41st International Conference on Machine Learning...

work page 2024