arxiv: 2604.22446 · v1 · submitted 2026-04-24 · 💻 cs.AI


From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

Jun Wang, Lee Ka Yiu, Meng Fang, Weilin Luo, Yu Fu, Yuxuan Huang, Zhengxu Yu, Zhiyuan He


Pith reviewed 2026-05-08 12:01 UTC · model grok-4.3


keywords: multi-agent systems · organizational frameworks · talent market · E²R tree search · self-organizing agents · heterogeneous agents · dynamic recruitment


The pith

Multi-agent systems can become dynamic self-organizing companies by packaging agents as recruitable talents managed through a market and review loop.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that current multi-agent systems stay limited by fixed team structures and session-bound learning because they lack a dedicated organizational layer for assembling, governing, and improving agent workforces over time. It introduces the OneManCompany framework that turns individual agent capabilities into portable Talents, recruits them on demand from a Talent Market, and handles decisions with an Explore-Execute-Review tree search. This search decomposes tasks top-down into accountable units and aggregates outcomes bottom-up to drive refinement in a single loop. The approach supplies formal termination and deadlock-freedom guarantees while allowing reconfiguration during execution. On PRDBench it reaches 84.67 percent success, exceeding prior methods by 15.48 points, and extends to varied domains through abstraction over different agent backends.

Core claim

OneManCompany elevates multi-agent systems to the organizational level by encapsulating skills, tools, and configurations into portable Talents, enabling dynamic recruitment via a community-driven Talent Market, and operationalizing decisions through an Explore-Execute-Review tree search that unifies planning, execution, and evaluation with formal termination guarantees.

What carries the argument

The Explore-Execute-Review (E²R) tree search, a hierarchical loop that decomposes tasks top-down into accountable units and aggregates execution outcomes bottom-up to drive systematic review and refinement.
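The loop described above can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: `TaskNode`, `e2r`, the three callables (`explore`, `execute`, `review`), and the bounds `max_depth` and `max_attempts` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TaskNode:
    """One accountable unit of work in the task tree."""
    description: str
    children: List["TaskNode"] = field(default_factory=list)
    result: str = ""
    cost: float = 0.0
    accepted: bool = False


def e2r(
    node: TaskNode,
    explore: Callable[[TaskNode], List[str]],
    execute: Callable[[str], Tuple[str, float]],
    review: Callable[[TaskNode], bool],
    depth: int = 0,
    max_depth: int = 3,      # assumed bound, not a value from the paper
    max_attempts: int = 2,   # assumed bound on review-triggered retries
) -> TaskNode:
    """Explore top-down, execute at the leaves, review bottom-up."""
    for _ in range(max_attempts):
        # Explore: split the task unless the depth bound is reached.
        subtasks = explore(node) if depth < max_depth else []
        if subtasks:
            node.children = [
                e2r(TaskNode(d), explore, execute, review,
                    depth + 1, max_depth, max_attempts)
                for d in subtasks
            ]
            # Aggregate child outcomes bottom-up (latest attempt only).
            node.result = "; ".join(c.result for c in node.children)
            node.cost = sum(c.cost for c in node.children)
        else:
            # Execute: a leaf is carried out directly.
            node.result, node.cost = execute(node.description)
        # Review: accept, or loop to re-explore and re-execute.
        node.accepted = review(node)
        if node.accepted:
            break
    return node
```

Because both the recursion depth and the number of retries per node are capped in this sketch, the call always returns, even when `review` rejects every result; whether the paper's guarantees rest on the same kind of bounds is not stated in the summary above.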

If this is right

Dynamic on-demand recruitment from the Talent Market closes capability gaps and reconfigures the organization during task execution.
The single hierarchical loop provides termination and deadlock-freedom guarantees while mirroring human enterprise feedback for refinement.
Abstraction through typed organizational interfaces allows the same structure to work across heterogeneous agent backends and domains.
Empirical results show an 84.67 percent success rate on PRDBench that exceeds prior state-of-the-art methods by 15.48 points.
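As a quick arithmetic check on the headline numbers, and taking the margin as an absolute difference in percentage points (as the abstract words it), the implied success rate of the strongest prior method and the corresponding relative gain are:

```latex
\[
84.67 - 15.48 = 69.19 \ \text{(percent)}, \qquad
\frac{15.48}{69.19} \approx 0.224 \ \text{(about a 22.4\% relative improvement)}
\]
```

The 69.19 percent figure is derived here from the two reported numbers; it is not quoted from the paper.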

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The model suggests agent organizations could scale like startups by repeatedly hiring and retiring Talents based on performance review.
Market-based recruitment opens the possibility of agent economies where talents are traded or incentivized across separate organizations.
Coordination overhead in the Talent Market becomes a key variable to test when moving from benchmark tasks to long-horizon real-world workflows.

Load-bearing premise

The Explore-Execute-Review tree search delivers both formal termination guarantees and practical performance gains, while the Talent Market can be realized without prohibitive coordination overhead or security issues.

What would settle it

Running E²R on a task that forces repeated deep decomposition and checking for non-termination or deadlock would test the formal guarantees. Measuring how far the PRDBench success rate falls below 84.67 percent when the Talent Market component is removed would test the recruitment claim.

Figures

Figures reproduced from arXiv: 2604.22446 by Jun Wang, Lee Ka Yiu, Meng Fang, Weilin Luo, Yu Fu, Yuxuan Huang, Zhengxu Yu, Zhiyuan He.

**Figure 1.** The running OMC system, where the three proposed pillars converge into a unified management interface. Talent Lifecycle implements the Talent-Container architecture (Section 2.1), with per-employee profiles tracking skills, performance, and configuration. Task Decomposition realises the E²R tree search (Section 2.2) through hierarchical task trees with DAG dependencies. Agent Coordination enables struct…

**Figure 2.** An overview of the proposed OMC AI organisation system. The central hierarchy mirrors…

**Figure 3.** Employee = Talent + Container. Each employee in OMC is composed of a Talent (a portable agent package encapsulating role, skills, and tools), which is hired from the Talent Market, and a Container (a runtime backend such as LangGraph, Claude Code, or a script-driven backend, together with six organisational interfaces: execution, task management, event communication, storage, context assembly, and lifecycle management).…
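The composition in this caption can be sketched as follows. This is an illustration under stated assumptions, not OMC's API: the class names, the `hire` helper, and the single `execute` method are hypothetical, and only one of the six organisational interfaces listed in the caption is modelled.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass(frozen=True)
class Talent:
    """Portable agent package: role, skills, and tools."""
    role: str
    skills: Tuple[str, ...] = ()
    tools: Tuple[str, ...] = ()


class Container(Protocol):
    """Runtime backend; only the execution interface is sketched here.

    The method name and signature are hypothetical. The other five
    interfaces named in the caption are left out for brevity.
    """

    def execute(self, talent: Talent, task: str) -> str: ...


class ScriptContainer:
    """Toy script-driven backend used only for illustration."""

    def execute(self, talent: Talent, task: str) -> str:
        return f"[{talent.role}] {task}"


@dataclass
class Employee:
    """Employee = Talent + Container."""
    talent: Talent
    container: Container

    def work(self, task: str) -> str:
        return self.container.execute(self.talent, task)


def hire(market: Dict[str, Talent], skill: str, container: Container) -> Employee:
    """Recruit the first Talent in the market that lists the needed skill."""
    for talent in market.values():
        if skill in talent.skills:
            return Employee(talent, container)
    raise LookupError(f"no Talent offers skill {skill!r}")
```

Keeping the Talent immutable and separate from the Container is what makes it portable in this sketch: the same `Talent` value can be paired with a different backend without modification.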

**Figure 4.** An illustration of the E²R tree search loop: Explore, Execute, Review. Stage 2: Execute (agents carry out assigned work). Each assigned employee executes its task through the organisational layer (Section 2.1). We write $f_{e_v}$ for the internal execution function of employee $e_v$, which takes the task description $d_v$ and produces a result and cost: $(r_v, c_v) = f_{e_v}(d_v)$ (6), where $r_v$ is the result and $c_v$ is the exe…

**Figure 5.** Task lifecycle finite state machine (FSM). Double-bordered states are terminal. The completed→accepted transition requires explicit supervisor review, preventing unverified results from propagating downstream. AND-Semantics. A node $v$ is resolved according to the following recursive definition: $\mathrm{resolved}(v) \iff \begin{cases} \phi_v \in \{\text{accepted}, \text{finished}\} & \text{if } v \text{ is a leaf} \\ \forall v' \in \mathrm{children}(v) \setminus S : \mathrm{resolved}(v') & \text{otherwise} \end{cases}$ (10) …
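The review gate and the recursive definition in this caption can be sketched as below. Several parts are assumptions: only the states completed, accepted, and finished appear in the caption, so the `pending` state and the transition table are illustrative, and because the definition of the set S is cut off, it is modelled here as a set of child names to skip.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

# Only completed, accepted, and finished are named in the caption; the
# "pending" state and this transition table are illustrative assumptions.
TRANSITIONS = {
    "pending": {"completed"},
    "completed": {"accepted", "pending"},  # a rejection sends work back
    "accepted": {"finished"},
    "finished": set(),
}


@dataclass
class Node:
    name: str
    state: str = "pending"
    children: List["Node"] = field(default_factory=list)


def advance(node: Node, new_state: str, reviewed: bool = False) -> None:
    """Move a node to new_state, enforcing supervisor review on acceptance."""
    if new_state not in TRANSITIONS[node.state]:
        raise ValueError(f"illegal transition {node.state} -> {new_state}")
    if (node.state, new_state) == ("completed", "accepted") and not reviewed:
        raise PermissionError("completed -> accepted needs supervisor review")
    node.state = new_state


def resolved(node: Node, skipped: FrozenSet[str] = frozenset()) -> bool:
    """AND-semantics: a parent resolves only when every counted child does."""
    if not node.children:
        return node.state in {"accepted", "finished"}
    return all(resolved(c, skipped) for c in node.children
               if c.name not in skipped)
```

A parent with two children stays unresolved until both have passed review, unless the unfinished child is placed in the skipped set.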

**Figure 6.** Game development task tree: iterative decomposition with human-in-the-loop feedback. The evaluator’s rejection triggers re-exploration, creating a new skill for the Art Designer and re-executing the asset pipeline. We manually verified all repository links and star counts reported in the autonomously generated article, confirming that every entry is real and accurate. The final article delivered to the CEO…

**Figure 7.** Sample frames from the generated audiobook video, depicting animal-character scenes with…

**Figure 9.** Content-generation case study: team assembly, recruited agents, output artefacts, and cost breakdown. (a) The OMC company workspace after team assembly, with the newly hired agents (L1) seated in the Analytics and Marketing departments. (b)–(c) Profile cards of the two recruited Talents: an AI Agent Trend Research Analyst and a Research-Driven Technical Writer. (d) Project documents produced autonomously, …

**Figure 10.** Street fight game case study: team assembly, recruited agents, output artefacts, and cost breakdown. (a) The OMC company workspace after team assembly, with the newly hired agents (L1) seated in the Game Development and Art departments. (b) Group meeting between the Game Developer and Art Designer agents collaborating on game design. (c)–(d) Profile cards of the two recruited Talents: a Game Developer AI …

**Figure 11.** AI short drama case study: team assembly, recruited agents, generated scenes, and cost breakdown. (a) Company workspace after team assembly. (b)–(c) Profile cards of the two recruited Talents. (d) Output artefacts including episode scripts, scene images, voice-over audio, and final videos. (h) Cost breakdown: approximately 1.56M tokens at $1.57 total (15.7% of the $10 budget). This case study tests cross-…

**Figure 12.** Automated research survey case study. (a) The OMC workspace after team assembly, with three recruited specialists seated alongside the founding team. (b)–(d) Profile cards of the recruited Talents: two Research Scientists (Claude Sonnet 4.6) and one AI Engineer (self-hosted). (e) The 18 deliverable documents produced autonomously, including literature reviews, failure mode taxonomies, and research proposa…

**Figure 13.** Mind map generated autonomously by OMC’s research team, covering six themes (Founda…

Original abstract

Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a principled organisational layer that governs how a workforce of agents is assembled, governed, and improved over time, decoupled from what individual agents know. To fill this gap, we introduce OneManCompany (OMC), a framework that elevates multi-agent systems to the organisational level. OMC encapsulates skills, tools, and runtime configurations into portable agent identities called Talents, orchestrated through typed organisational interfaces that abstract over heterogeneous backends. A community-driven Talent Market enables on-demand recruitment, allowing the organisation to close capability gaps and reconfigure itself dynamically during execution. Organisational decision-making is operationalised through an Explore-Execute-Review (E²R) tree search, which unifies planning, execution, and evaluation in a single hierarchical loop: tasks are decomposed top-down into accountable units and execution outcomes are aggregated bottom-up to drive systematic review and refinement. This loop provides formal guarantees on termination and deadlock freedom while mirroring the feedback mechanisms of human enterprises. Together, these contributions transform multi-agent systems from static, pre-configured pipelines into self-organising and self-improving AI organisations capable of adapting to open-ended tasks across diverse domains. Empirical evaluation on PRDBench shows that OMC achieves an 84.67% success rate, surpassing the state of the art by 15.48 percentage points, with cross-domain case studies further demonstrating its generality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OMC gives multi-agent systems a company-like structure with Talents and a market, but the E²R termination claims look shaky once recruitment is unbounded.


The main thing to know is that this paper lifts multi-agent work to the organizational level: it wraps agents into portable Talents, adds a Talent Market for on-demand hiring, and runs everything through a single Explore-Execute-Review tree search that claims to handle planning, execution, and review with formal termination and deadlock-freedom guarantees. The reported 84.67% success on PRDBench, 15.48 points above prior work, plus several cross-domain case studies, suggests the setup can deliver measurable gains on open-ended tasks.

Referee Report

2 major / 1 minor

Summary. The paper presents the OneManCompany (OMC) framework for elevating multi-agent systems to an organizational level. It introduces Talents as portable agent identities encapsulating skills and configurations, a Talent Market for on-demand recruitment to dynamically reconfigure the organization, and an Explore-Execute-Review (E²R) tree search that unifies planning, execution, and evaluation with claimed formal guarantees on termination and deadlock freedom. The framework is evaluated on PRDBench, where it achieves an 84.67% success rate, 15.48 points above the state of the art, and is demonstrated on cross-domain case studies.

Significance. If the empirical results and formal guarantees hold, the work could significantly advance multi-agent systems by enabling self-organizing and self-improving AI organizations capable of adapting to open-ended tasks. The Talent Market and E²R loop offer a novel organizational abstraction that decouples workforce assembly from individual agent capabilities, with potential impact on autonomous agent coordination and enterprise-like AI systems.

major comments (2)

Abstract: The central performance claim (84.67% success rate, +15.48 points over SOTA) is stated without experimental protocol, baseline descriptions, statistical tests, number of runs, or error bars. This is load-bearing for the empirical contribution and prevents assessment of validity or reproducibility.
Abstract (E²R description): The assertion that the E²R tree search 'provides formal guarantees on termination and deadlock freedom' is made without derivation, proof sketch, or section reference. The Talent Market enables on-demand recruitment of heterogeneous agents, which can render the effective branching factor unbounded; standard tree-search termination requires finite branching or explicit depth bounds independent of recruitment. No such restrictions are stated, leaving the formal claim unsubstantiated.

minor comments (1)

Abstract: The E²R notation and Talent/Talent Market terminology are introduced without forward references to the sections where they are formally defined; add explicit section pointers for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our empirical and formal contributions. We address each point below and will revise the manuscript accordingly.


Referee: Abstract: The central performance claim (84.67% success rate, +15.48 points over SOTA) is stated without experimental protocol, baseline descriptions, statistical tests, number of runs, or error bars. This is load-bearing for the empirical contribution and prevents assessment of validity or reproducibility.

Authors: We agree that the abstract would benefit from additional context on the evaluation to support the headline result. The full manuscript (Section 5) specifies the PRDBench protocol, baselines, run counts, and statistical tests. In revision we will expand the abstract with a brief clause on the evaluation setup (e.g., number of runs and primary baselines) while preserving length constraints. revision: yes
Referee: Abstract (E²R description): The assertion that the E²R tree search 'provides formal guarantees on termination and deadlock freedom' is made without derivation, proof sketch, or section reference. The Talent Market enables on-demand recruitment of heterogeneous agents, which can render the effective branching factor unbounded; standard tree-search termination requires finite branching or explicit depth bounds independent of recruitment. No such restrictions are stated, leaving the formal claim unsubstantiated.

Authors: The termination and deadlock-freedom proofs appear in Section 4.3; they rely on an explicit maximum search depth and a review step that guarantees monotonic progress, independent of the instantaneous size of the Talent Market. Recruitment is constrained by task-specific matching and does not produce unbounded branching because the depth bound is fixed a priori and unproductive branches are pruned. We will insert a section reference into the abstract and add a one-sentence proof sketch in the revised introduction to make the claim self-contained. revision: yes
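The disagreement in this exchange can be made concrete with a standard counting bound. The symbols here are assumptions introduced for illustration and are not taken from the paper: $D$ is a fixed maximum depth and $b$ is a maximum number of children created per node.

```latex
\[
|V| \;\le\; \sum_{k=0}^{D} b^{k} \;=\; \frac{b^{D+1}-1}{b-1}
\qquad (b \ge 2)
\]
```

Under these assumptions the task tree is finite only when both $D$ and $b$ are finite, so a depth bound alone settles the referee's objection only if each exploration step is also guaranteed to create finitely many children and each node is retried a bounded number of times.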

Circularity Check

0 steps flagged

No circularity: OMC framework and E²R guarantees are presented as independent design with external benchmark evaluation.


The paper defines OMC, Talents, the Talent Market, and the E²R loop as a new organisational layer with claimed formal termination and deadlock-freedom guarantees. No equations, fitted parameters, or self-citations appear in the abstract or described contributions. The 84.67% success rate is reported as an empirical result on PRDBench (an external benchmark), not derived from or fitted to the same inputs. The derivation chain introduces new abstractions without reducing to self-definition, renamed known results, or load-bearing self-citations. This is the common case of a self-contained design contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The framework rests on several domain assumptions and newly introduced entities whose independent support is limited to the design description and single benchmark result.

axioms (2)

domain assumption: Heterogeneous agent backends can be abstracted behind typed organisational interfaces without loss of capability.
Invoked to enable portable Talents and dynamic assembly.
ad hoc to paper: The E²R tree search provides both termination guarantees and effective organizational improvement.
Central to the decision-making claim but not demonstrated in the abstract.

invented entities (3)

Talents · no independent evidence
purpose: Portable identities that encapsulate skills, tools, and runtime configurations.
New abstraction to decouple individual capabilities from organizational structure.
Talent Market · no independent evidence
purpose: Community-driven platform enabling on-demand recruitment and dynamic reconfiguration.
Invented mechanism to close capability gaps during execution.
E²R tree search · no independent evidence
purpose: Hierarchical loop unifying planning, execution, and review with formal guarantees.
New operationalization of organizational decision-making.

pith-pipeline@v0.9.0 · 5621 in / 1601 out tokens · 97448 ms · 2026-05-08T12:01:47.732968+00:00 · methodology

