pith. machine review for the scientific record.

arxiv: 2604.22446 · v1 · submitted 2026-04-24 · 💻 cs.AI

Recognition: unknown

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

Jun Wang, Lee Ka Yiu, Meng Fang, Weilin Luo, Yu Fu, Yuxuan Huang, Zhengxu Yu, Zhiyuan He


Pith reviewed 2026-05-08 12:01 UTC · model grok-4.3

classification 💻 cs.AI
keywords multi-agent systems · organizational frameworks · talent market · E2R tree search · self-organizing agents · heterogeneous agents · dynamic recruitment

The pith

Multi-agent systems can become dynamic self-organizing companies by packaging agents as recruitable talents managed through a market and review loop.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that current multi-agent systems stay limited by fixed team structures and session-bound learning because they lack a dedicated organizational layer for assembling, governing, and improving agent workforces over time. It introduces the OneManCompany framework that turns individual agent capabilities into portable Talents, recruits them on demand from a Talent Market, and handles decisions with an Explore-Execute-Review tree search. This search decomposes tasks top-down into accountable units and aggregates outcomes bottom-up to drive refinement in a single loop. The approach supplies formal termination and deadlock-freedom guarantees while allowing reconfiguration during execution. On PRDBench it reaches 84.67 percent success, exceeding prior methods by 15.48 points, and extends to varied domains through abstraction over different agent backends.

Core claim

OneManCompany elevates multi-agent systems to the organizational level by encapsulating skills, tools, and configurations into portable Talents, enabling dynamic recruitment via a community-driven Talent Market, and operationalizing decisions through an Explore-Execute-Review tree search that unifies planning, execution, and evaluation with formal termination guarantees.

What carries the argument

The Explore-Execute-Review (E²R) tree search, a hierarchical loop that decomposes tasks top-down into accountable units and aggregates execution outcomes bottom-up to drive systematic review and refinement.
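
As a rough illustration of the loop described above, the following Python sketch decomposes a task top-down, executes the leaves, and reviews results bottom-up. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `Node`, `e2r`, `explore`, `execute`, and `review`, the `max_depth` bound, and the retry limit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One task in the tree (hypothetical structure)."""
    description: str
    depth: int = 0
    children: List["Node"] = field(default_factory=list)
    result: Optional[str] = None
    accepted: bool = False


def e2r(node: Node,
        explore: Callable[[str], List[str]],
        execute: Callable[[str], str],
        review: Callable[[str, str], bool],
        max_depth: int = 3,
        max_retries: int = 2) -> Node:
    """Explore-Execute-Review over a task tree (illustrative only)."""
    for _ in range(max_retries + 1):
        # Explore: split the task into sub-tasks unless the depth bound is hit.
        subtasks = explore(node.description) if node.depth < max_depth else []
        node.children = [Node(d, node.depth + 1) for d in subtasks]
        if node.children:
            # Recurse top-down, then aggregate child results bottom-up.
            for child in node.children:
                e2r(child, explore, execute, review, max_depth, max_retries)
            node.result = " | ".join(c.result or "" for c in node.children)
        else:
            # Execute: a leaf task is carried out directly.
            node.result = execute(node.description)
        # Review: accept the result, or loop again to refine it.
        node.accepted = review(node.description, node.result)
        if node.accepted:
            break
    return node


# Toy callbacks standing in for real agents.
root = e2r(
    Node("build app"),
    explore=lambda d: ["design", "code"] if d == "build app" else [],
    execute=lambda d: f"done:{d}",
    review=lambda d, r: True,
)
print(root.result)  # done:design | done:code
```

The depth bound and retry limit are included only so that the sketch visibly halts; whether the paper uses these exact mechanisms is not stated on this page.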

If this is right

  • Dynamic on-demand recruitment from the Talent Market closes capability gaps and reconfigures the organization during task execution.
  • The single hierarchical loop provides termination and deadlock-freedom guarantees while mirroring human enterprise feedback for refinement.
  • Abstraction through typed organizational interfaces allows the same structure to work across heterogeneous agent backends and domains.
  • Empirical results show an 84.67 percent success rate on PRDBench that exceeds prior state-of-the-art methods by 15.48 points.
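
As a quick arithmetic check on the last bullet, the two reported figures imply the strongest prior score on PRDBench. The values below are derived here from those two numbers and are not themselves quoted on this page; they also assume that "points" means percentage points, as the abstract states.

```latex
\[
  \text{prior best} = 84.67\% - 15.48\ \text{pp} = 69.19\%,
  \qquad
  \text{relative gain} = \frac{15.48}{69.19} \approx 22.4\%
\]
```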

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The model suggests agent organizations could scale like startups by repeatedly hiring and retiring Talents based on performance review.
  • Market-based recruitment opens the possibility of agent economies where talents are traded or incentivized across separate organizations.
  • Coordination overhead in the Talent Market becomes a key variable to test when moving from benchmark tasks to long-horizon real-world workflows.

Load-bearing premise

The Explore-Execute-Review tree search delivers both formal termination guarantees and practical performance gains, while the Talent Market can be realized without prohibitive coordination overhead or security issues.

What would settle it

Running E²R on a task requiring repeated deep decomposition levels and checking for non-termination or deadlock, or measuring whether success rate on PRDBench falls below 84.67 percent when the Talent Market component is removed.

Figures

Figures reproduced from arXiv: 2604.22446 by Jun Wang, Lee Ka Yiu, Meng Fang, Weilin Luo, Yu Fu, Yuxuan Huang, Zhengxu Yu, Zhiyuan He.

Figure 1
Figure 1: The running OMC system, where the three proposed pillars converge into a unified management interface. Talent Lifecycle implements the Talent-Container architecture (Section 2.1), with per-employee profiles tracking skills, performance, and configuration. Task Decomposition realises the E²R tree search (Section 2.2) through hierarchical task trees with DAG dependencies. Agent Coordination enables struct…
Figure 2
Figure 2: An overview of the proposed OMC AI organisation system. The central hierarchy mirrors…
Figure 3
Figure 3: Employee = Talent + Container. Each employee in OMC is composed of a Talent (a portable agent package encapsulating role, skills, and tools), which is hired from the Talent Market, and a Container (a runtime backend such as LangGraph, Claude Code, or script-driven, together with six organisational interfaces: execution, task management, event communication, storage, context assembly, and lifecycle management).…
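
The composition in the Figure 3 caption can be pictured with a small Python sketch. This is an assumption-laden illustration: the class names, field names, and the `ScriptContainer` backend are hypothetical, and only one of the six organisational interfaces named in the caption (execution) is modelled, since this page does not give their signatures.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Talent:
    """Portable agent package: role, skills, and tools (per the caption)."""
    role: str
    skills: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)


class Container(Protocol):
    """Runtime backend; only the execution interface is sketched here."""
    def execute(self, talent: Talent, task: str) -> str: ...


class ScriptContainer:
    """Hypothetical script-driven backend, used purely for illustration."""
    def execute(self, talent: Talent, task: str) -> str:
        return f"[{talent.role}] {task}"


@dataclass
class Employee:
    """Employee = Talent + Container."""
    talent: Talent
    container: Container

    def run(self, task: str) -> str:
        # The organisation talks to the container, never to the backend directly.
        return self.container.execute(self.talent, task)


writer = Employee(Talent("Technical Writer", skills=["drafting"]), ScriptContainer())
output = writer.run("summarise findings")
```

Keeping the Talent free of backend details is what would let the same package be placed in a different Container, which is the portability the caption describes.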
Figure 4
Figure 4: An illustration of the E²R tree search loop: Explore, Execute, Review. Stage 2: Execute (agents carry out assigned work). Each assigned employee executes its task through the organisational layer (Section 2.1). We write $f_{e_v}$ for the internal execution function of employee $e_v$, which takes the task description $d_v$ and produces a result and cost: $(r_v, c_v) = f_{e_v}(d_v)$ (6), where $r_v$ is the result and $c_v$ is the exe…
Figure 5
Figure 5: Task lifecycle finite state machine (FSM). Double-bordered states are terminal. The completed→accepted transition requires explicit supervisor review, preventing unverified results from propagating downstream. AND-Semantics. A node $v$ is resolved according to the following recursive definition: $\mathrm{resolved}(v) \iff \begin{cases} \phi_v \in \{\text{accepted}, \text{finished}\} & \text{if } v \text{ is a leaf} \\ \forall v' \in \mathrm{children}(v) \setminus S : \mathrm{resolved}(v') & \text{otherwise} \end{cases}$ (10) …
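
Equation (10) in the Figure 5 caption reads as a recursive predicate over the task tree. A minimal Python sketch follows. The `TaskNode` class is hypothetical, and the set S is passed in as `excluded` because the excerpt does not define it; here it is simply assumed to be a set of child names to ignore.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

# Leaf states that count as resolved, taken from equation (10).
RESOLVED_LEAF_STATES = {"accepted", "finished"}


@dataclass
class TaskNode:
    name: str
    state: str = "pending"  # phi_v in the caption
    children: List["TaskNode"] = field(default_factory=list)


def resolved(v: TaskNode, excluded: FrozenSet[str] = frozenset()) -> bool:
    """AND-semantics: a parent is resolved only if every non-excluded child is."""
    if not v.children:
        return v.state in RESOLVED_LEAF_STATES
    return all(resolved(c, excluded) for c in v.children
               if c.name not in excluded)


tree = TaskNode("root", children=[
    TaskNode("a", state="accepted"),
    TaskNode("b", state="completed"),  # completed but not yet reviewed
])
```

In this toy tree the root is unresolved because child `b` is only `completed`, which matches the caption's point that results must be explicitly accepted before they propagate.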
Figure 6
Figure 6: Game development task tree: iterative decomposition with human-in-the-loop feedback. The evaluator’s rejection triggers re-exploration, creating a new skill for the Art Designer and re-executing the asset pipeline. We manually verified all repository links and star counts reported in the autonomously generated article, confirming that every entry is real and accurate. The final article delivered to the CEO…
Figure 7
Figure 7: Sample frames from the generated audiobook video, depicting animal-character scenes with…
Figure 9
Figure 9: Content-generation case study: team assembly, recruited agents, output artefacts, and cost breakdown. (a) The OMC company workspace after team assembly, with the newly hired agents (L1) seated in the Analytics and Marketing departments. (b)–(c) Profile cards of the two recruited Talents: an AI Agent Trend Research Analyst and a Research-Driven Technical Writer. (d) Project documents produced autonomously, …
Figure 10
Figure 10: Street fight game case study: team assembly, recruited agents, output artefacts, and cost breakdown. (a) The OMC company workspace after team assembly, with the newly hired agents (L1) seated in the Game Development and Art departments. (b) Group meeting between the Game Developer and Art Designer agents collaborating on game design. (c)–(d) Profile cards of the two recruited Talents: a Game Developer AI …
Figure 11
Figure 11: AI short drama case study: team assembly, recruited agents, generated scenes, and cost breakdown. (a) Company workspace after team assembly. (b)–(c) Profile cards of the two recruited Talents. (d) Output artefacts including episode scripts, scene images, voice-over audio, and final videos. (h) Cost breakdown: approximately 1.56M tokens at $1.57 total (15.7% of the $10 budget). This case study tests cross-…
Figure 12
Figure 12: Automated research survey case study. (a) The OMC workspace after team assembly, with three recruited specialists seated alongside the founding team. (b)–(d) Profile cards of the recruited Talents: two Research Scientists (Claude Sonnet 4.6) and one AI Engineer (self-hosted). (e) The 18 deliverable documents produced autonomously, including literature reviews, failure mode taxonomies, and research proposa…
Figure 13
Figure 13: Mind map generated autonomously by OMC’s research team, covering six themes (Founda…
Original abstract

Individual agent capabilities have advanced rapidly through modular skills and tool integrations, yet multi-agent systems remain constrained by fixed team structures, tightly coupled coordination logic, and session-bound learning. We argue that this reflects a deeper absence: a principled organisational layer that governs how a workforce of agents is assembled, governed, and improved over time, decoupled from what individual agents know. To fill this gap, we introduce \emph{OneManCompany (OMC)}, a framework that elevates multi-agent systems to the organisational level. OMC encapsulates skills, tools, and runtime configurations into portable agent identities called \emph{Talents}, orchestrated through typed organisational interfaces that abstract over heterogeneous backends. A community-driven \emph{Talent Market} enables on-demand recruitment, allowing the organisation to close capability gaps and reconfigure itself dynamically during execution. Organisational decision-making is operationalised through an \emph{Explore-Execute-Review} ($\text{E}^2$R) tree search, which unifies planning, execution, and evaluation in a single hierarchical loop: tasks are decomposed top-down into accountable units and execution outcomes are aggregated bottom-up to drive systematic review and refinement. This loop provides formal guarantees on termination and deadlock freedom while mirroring the feedback mechanisms of human enterprises. Together, these contributions transform multi-agent systems from static, pre-configured pipelines into self-organising and self-improving AI organisations capable of adapting to open-ended tasks across diverse domains. Empirical evaluation on PRDBench shows that OMC achieves an $84.67\%$ success rate, surpassing the state of the art by $15.48$ percentage points, with cross-domain case studies further demonstrating its generality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents the OneManCompany (OMC) framework for elevating multi-agent systems to an organizational level. It introduces Talents as portable agent identities encapsulating skills and configurations, a Talent Market for on-demand recruitment to dynamically reconfigure the organization, and an Explore-Execute-Review (E²R) tree search that unifies planning, execution, and evaluation with claimed formal guarantees on termination and deadlock freedom. The framework is evaluated on PRDBench, where it achieves an 84.67% success rate, 15.48 percentage points above the state of the art, and is demonstrated on cross-domain case studies.

Significance. If the empirical results and formal guarantees hold, the work could significantly advance multi-agent systems by enabling self-organizing and self-improving AI organizations capable of adapting to open-ended tasks. The Talent Market and E²R loop offer a novel organizational abstraction that decouples workforce assembly from individual agent capabilities, with potential impact on autonomous agent coordination and enterprise-like AI systems.

major comments (2)
  1. Abstract: The central performance claim (84.67% success rate, +15.48 points over SOTA) is stated without experimental protocol, baseline descriptions, statistical tests, number of runs, or error bars. This is load-bearing for the empirical contribution and prevents assessment of validity or reproducibility.
  2. Abstract (E²R description): The assertion that the E²R tree search 'provides formal guarantees on termination and deadlock freedom' is made without derivation, proof sketch, or section reference. The Talent Market enables on-demand recruitment of heterogeneous agents, which can render the effective branching factor unbounded; standard tree-search termination requires finite branching or explicit depth bounds independent of recruitment. No such restrictions are stated, leaving the formal claim unsubstantiated.
minor comments (1)
  1. Abstract: The E²R notation and Talent/Talent Market terminology are introduced without forward references to the sections where they are formally defined; add explicit section pointers for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our empirical and formal contributions. We address each point below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: Abstract: The central performance claim (84.67% success rate, +15.48 points over SOTA) is stated without experimental protocol, baseline descriptions, statistical tests, number of runs, or error bars. This is load-bearing for the empirical contribution and prevents assessment of validity or reproducibility.

    Authors: We agree that the abstract would benefit from additional context on the evaluation to support the headline result. The full manuscript (Section 5) specifies the PRDBench protocol, baselines, run counts, and statistical tests. In revision we will expand the abstract with a brief clause on the evaluation setup (e.g., number of runs and primary baselines) while preserving length constraints. revision: yes

  2. Referee: Abstract (E²R description): The assertion that the E²R tree search 'provides formal guarantees on termination and deadlock freedom' is made without derivation, proof sketch, or section reference. The Talent Market enables on-demand recruitment of heterogeneous agents, which can render the effective branching factor unbounded; standard tree-search termination requires finite branching or explicit depth bounds independent of recruitment. No such restrictions are stated, leaving the formal claim unsubstantiated.

    Authors: The termination and deadlock-freedom proofs appear in Section 4.3; they rely on an explicit maximum search depth and a review step that guarantees monotonic progress, independent of the instantaneous size of the Talent Market. Recruitment is constrained by task-specific matching and does not produce unbounded branching because the depth bound is fixed a priori and unproductive branches are pruned. We will insert a section reference into the abstract and add a one-sentence proof sketch in the revised introduction to make the claim self-contained. revision: yes
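
The exchange above turns on whether a fixed depth bound is enough. Under the additional assumption, which the referee notes is not stated, that each node has at most `b` children, a tree of depth at most `D` has at most $\sum_{k=0}^{D} b^k$ nodes, so an expansion that visits each node a bounded number of times must halt. The Python sketch below only illustrates that counting argument; `max_nodes` and `expand` are hypothetical names, and nothing here reproduces the proof the authors attribute to Section 4.3.

```python
from typing import Callable, List


def max_nodes(branching: int, depth: int) -> int:
    """Upper bound on tree size: sum of branching**k for k = 0..depth."""
    return sum(branching ** k for k in range(depth + 1))


def expand(task: str,
           children_of: Callable[[str], List[str]],
           branching: int,
           depth: int) -> int:
    """Count nodes visited when expansion is capped at `branching` and `depth`."""
    if depth == 0:
        return 1
    kids = children_of(task)[:branching]  # enforce finite branching
    return 1 + sum(expand(k, children_of, branching, depth - 1) for k in kids)


def endless(task: str) -> List[str]:
    """A decomposer that never stops proposing sub-tasks on its own."""
    return [f"{task}.{i}" for i in range(10)]


visited = expand("root", endless, branching=3, depth=4)
bound = max_nodes(3, 4)
```

Even with a decomposer that always proposes more work, the visit count meets the bound exactly and goes no further. Dropping the branching cap while keeping the depth cap would leave the count finite only if `children_of` itself returns finite lists, which is the gap the referee identifies.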

Circularity Check

0 steps flagged

No circularity: OMC framework and E²R guarantees are presented as independent design with external benchmark evaluation.

full rationale

The paper defines OMC, Talents, Talent Market, and the E²R loop as a new organisational layer with claimed formal termination/deadlock guarantees. No equations, fitted parameters, or self-citations appear in the abstract or described contributions. The 84.67% success rate is reported as an empirical result on PRDBench (an external benchmark), not derived from or fitted to the same inputs. The derivation chain introduces new abstractions without reducing to self-definition, renamed known results, or load-bearing self-citations. This is the common case of a self-contained design contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The framework rests on several domain assumptions and newly introduced entities whose independent support is limited to the design description and single benchmark result.

axioms (2)
  • domain assumption: Heterogeneous agent backends can be abstracted behind typed organisational interfaces without loss of capability
    Invoked to enable portable Talents and dynamic assembly.
  • ad hoc to paper: The E²R tree search provides both termination guarantees and effective organizational improvement
    Central to the decision-making claim but not demonstrated in the abstract.
invented entities (3)
  • Talents (no independent evidence)
    purpose: Portable identities that encapsulate skills, tools, and runtime configurations
    New abstraction to decouple individual capabilities from organizational structure.
  • Talent Market (no independent evidence)
    purpose: Community-driven platform enabling on-demand recruitment and dynamic reconfiguration
    Invented mechanism to close capability gaps during execution.
  • E²R tree search (no independent evidence)
    purpose: Hierarchical loop unifying planning, execution, and review with formal guarantees
    New operationalization of organizational decision-making.

pith-pipeline@v0.9.0 · 5621 in / 1601 out tokens · 97448 ms · 2026-05-08T12:01:47.732968+00:00 · methodology

