pith. machine review for the scientific record.

arxiv: 2605.10754 · v1 · submitted 2026-05-11 · 💻 cs.AI

Recognition: 2 theorem links


The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:59 UTC · model grok-4.3

classification 💻 cs.AI
keywords agent cybernetics · foundation agents · LLM agents · cybernetics · agent design principles · reliability · self-improvement · long-horizon agents

The pith

Cybernetics supplies the first principles needed to build reliable, long-running foundation agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Foundation agents powered by large language models handle complex tasks over thousands of steps yet are assembled through trial-and-error rather than theory. The paper claims classical cybernetics, the science of control and communication in complex systems, fills this gap. It maps six canonical cybernetic laws onto six agent design principles and combines them into three engineering goals: reliability, lifelong running, and self-improvement. The resulting framework, Agent Cybernetics, is used to diagnose problems and recommend fixes in code generation, computer use, and automated research.
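
One way to picture the framework's shape as an engineering artifact, offered as an editorial aid rather than the paper's own formalization: the six laws and principles are not enumerated in the text above, so the law and principle entries below are illustrative classical candidates, not the authors' list; only the three desiderata names come from the paper.

```python
from dataclasses import dataclass

# Illustrative sketch only. The law/principle names below are classical
# cybernetics candidates, NOT the paper's enumerated six; the three
# desiderata (reliability, lifelong running, self-improvement) are the
# paper's terms.
DESIDERATA = {"reliability", "lifelong_running", "self_improvement"}

@dataclass(frozen=True)
class Mapping:
    law: str            # classical cybernetic law
    principle: str      # corresponding agent design principle
    serves: frozenset   # which of the three engineering goals it supports

ILLUSTRATIVE_MAPPINGS = [
    Mapping("requisite variety",
            "match the agent's action and representation space to task variety",
            frozenset({"reliability", "lifelong_running"})),
    Mapping("negative feedback",
            "compare observed outcomes to the goal and correct drift at each step",
            frozenset({"reliability"})),
    Mapping("homeostasis",
            "keep essential variables (budget, context, task focus) within bounds",
            frozenset({"lifelong_running", "self_improvement"})),
]

def coverage(mappings):
    """Report which engineering goals the chosen principles address."""
    covered = set().union(*(m.serves for m in mappings))
    return sorted(covered), sorted(DESIDERATA - covered)

if __name__ == "__main__":
    print(coverage(ILLUSTRATIVE_MAPPINGS))  # all three goals covered here
```

Read this way, "diagnosing problems" amounts to asking which mapping a failing agent violates.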

Core claim

By mapping six canonical laws of classical cybernetics onto six agent design principles and synthesizing those principles into the three engineering desiderata of reliability, lifelong running, and self-improvement, the authors establish Agent Cybernetics as the missing theoretical scaffold for foundation agents that perceive, reason, and act across long horizons.

What carries the argument

The direct mapping of six classical cybernetics laws to six modern agent design principles, which are then synthesized into the three desiderata that define the Agent Cybernetics framework.

If this is right

  • Code generation agents remain on-task across many steps when control and communication laws are applied to error handling (a minimal control-loop sketch follows this list).
  • Computer-use agents achieve lifelong running by adapting to changing interfaces through feedback mechanisms drawn from cybernetics.
  • Automated research agents can pursue safe self-improvement when capability growth is constrained by the three desiderata.
  • Common failure modes such as drifting off-task or exceeding representational capacity are diagnosed and mitigated using the mapped principles.
  • Engineering shifts from assembling primitives by trial and error to designing agents from the three explicit desiderata.
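
A minimal sketch of the control-loop reading behind the first and fourth bullets, assuming a homeostatic treatment of task drift as an essential variable. The interfaces (env, propose_step, drift_score, replan) are hypothetical placeholders introduced for illustration, not an API from the paper.

```python
# Negative-feedback "stay on task" loop: drift from the goal is treated as
# an essential variable that must be held inside a bound. All interfaces
# here are hypothetical stand-ins, not the paper's.

DRIFT_LIMIT = 0.3    # illustrative bound on the essential variable
MAX_STEPS = 1000     # long-horizon step budget

def run_agent(goal, env, propose_step, drift_score, replan):
    plan = [goal]
    for step in range(MAX_STEPS):
        action = propose_step(goal, plan, env.observe())
        observation = env.execute(action)
        # Feedback: measure how far recent behaviour has drifted from the goal.
        drift = drift_score(goal, plan, observation)
        if drift > DRIFT_LIMIT:
            # Correct the essential variable before the error compounds,
            # rather than letting later steps build on an off-task state.
            plan = replan(goal, plan, observation)
        if env.done():
            return "success", step
    return "budget_exhausted", MAX_STEPS
```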

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could inspire new evaluation benchmarks that test agents on cybernetic properties like stability and adaptation rather than task success alone.
  • Links to broader control theory may allow agents to borrow stability guarantees from engineering domains outside AI.
  • Applying the same mapping to non-LLM agents could reveal whether the approach is specific to language models or general to autonomous systems.

Load-bearing premise

The six laws of classical cybernetics can be mapped directly onto LLM-based agents in a useful way without further derivation or empirical checks.
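
One editorial illustration (not the paper's derivation) of what a non-trivial mapping would have to specify: a common information-theoretic reading of Ashby's law of requisite variety, restated for a context-limited agent. The symbols E, D, and R are introduced here for exposition only.

```latex
% Editorial illustration, not the paper's derivation: one common
% information-theoretic reading of Ashby's law of requisite variety.
% E = residual task error, D = environmental disturbances,
% R = the agent's corrective responses.
\[
  H(E) \;\geq\; H(D) - H(R)
\]
% If finite context, a fixed tool set, and sampling noise cap the usable
% response variety H(R), the outcome variety H(E) cannot be pushed below
% H(D) - H(R). A "direct mapping" of the law would have to say how H(R)
% is measured for a token-based, stochastic policy.
```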

What would settle it

Build two versions of a long-horizon agent for the same task, one following the Agent Cybernetics principles and one following standard engineering practice, then measure which version stays on-task longer and exhibits safer self-improvement.
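
A sketch of how that comparison might be scored, under assumed interfaces: build_cybernetic_agent, build_baseline_agent, run_task, and the steps_before_drift field are hypothetical stand-ins, not artifacts released with the paper.

```python
import statistics

def persistence(agent, tasks, run_task, max_steps=5000):
    """Median number of steps an agent stays on-task before drifting or failing."""
    on_task = []
    for task in tasks:
        trace = run_task(agent, task, max_steps=max_steps)
        # steps_before_drift: first step at which the run left the task,
        # or max_steps if it never did (an assumed field of the trace).
        on_task.append(trace.steps_before_drift)
    return statistics.median(on_task)

def settle(tasks, run_task, build_cybernetic_agent, build_baseline_agent):
    cybernetic = persistence(build_cybernetic_agent(), tasks, run_task)
    baseline = persistence(build_baseline_agent(), tasks, run_task)
    return {
        "cybernetic_median_steps": cybernetic,
        "baseline_median_steps": baseline,
        "framework_helps": cybernetic > baseline,
    }
```

A matching check for safer self-improvement would gate any capability change behind the same suite, accepting it only if these scores do not regress.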

Figures

Figures reproduced from arXiv: 2605.10754 by Chang Yang, He Zhao, Shuyue Hu, Xinrun Wang, Zhuoyi Lin.

Figure 1: From Classical Cybernetics to Agent Cybernetics.
Figure 2: The terminology between classical cybernetics (left) and agent cybernetics (right).
read the original abstract

LLM-based foundation agents that perceive, reason, and act across thousands of reasoning steps are rapidly becoming the dominant paradigm for deploying artificial intelligence in open-ended, long-horizon complex tasks. Despite this significance, the field remains overwhelmingly engineering-driven. Engineering practice has converged on useful primitives (tool loops, memory banks, harnesses, reflection steps), yet these are assembled by empirical trial and error rather than from first principles. Fundamental questions remain open: under what conditions does a long-running agent remain on-task? How should an agent respond when its environment exceeds its representational capacity? What architectural properties are necessary for safe self-improvement? We argue that cybernetics, the mid-twentieth-century science of control and communication in complex systems, provides the missing theoretical scaffold for foundation agents. By mapping six canonical laws of classical cybernetics onto six agent design principles, and synthesizing those principles into three engineering desiderata (reliability, lifelong running, and self-Improvement), we arrive at a framework termed Agent Cybernetics. Three application domains, code generation, computer use and automated research, exemplify the analytical framework of agent cybernetics by identifying failure modes and concrete engineering recommendations. We hope that agent cybernetics opens a new research venue and establishes the scientific foundation that foundation agents need for principled, reliable real-world deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that cybernetics, the mid-twentieth-century science of control and communication, supplies the missing theoretical scaffold for LLM-based foundation agents. It does so by mapping six canonical laws of classical cybernetics onto six agent design principles, synthesizing those principles into three engineering desiderata (reliability, lifelong running, and self-improvement), and applying the resulting 'Agent Cybernetics' framework to identify failure modes and concrete recommendations in the domains of code generation, computer use, and automated research.

Significance. If the mapping can be shown to yield non-trivial, testable guidance that improves upon existing agent architectures, the framework could help organize engineering practice around long-horizon reliability and safe self-improvement. As presented, however, the contribution remains an interpretive synthesis without derivation, empirical tests, or direct comparison to prior agent models, limiting its immediate impact on the field.

major comments (3)
  1. [Framework introduction and mapping section] The central mapping of the six classical cybernetics laws to six agent design principles is asserted without step-by-step derivation or any adaptation to discrete token-based stochastic reasoning, finite context windows, or sampling noise. This absence is load-bearing because the claim that cybernetics is 'the missing science' rests on the mapping being useful and non-circular.
  2. [Application domains section] The three application domains (code generation, computer use, automated research) identify failure modes conceptually but supply no measurements, baselines, or ablation studies showing that the proposed principles resolve the stated open questions on on-task persistence or safe self-improvement.
  3. [Related work and synthesis section] No explicit comparison is made to existing agent paradigms (ReAct-style loops, memory-augmented systems, hierarchical planners) to demonstrate that the synthesized desiderata are novel or superior rather than a relabeling of known engineering practices.
minor comments (2)
  1. [Abstract] The abstract contains an inconsistent capitalization ('self-Improvement').
  2. [References] Ensure original sources for the six canonical cybernetics laws are cited with precise references rather than relying on secondary summaries.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive review and for identifying areas where the presentation of the Agent Cybernetics framework can be strengthened. We address each major comment below, clarifying the conceptual scope of the work while outlining targeted revisions to improve rigor and clarity.

read point-by-point responses
  1. Referee: [Framework introduction and mapping section] The central mapping of the six classical cybernetics laws to six agent design principles is asserted without step-by-step derivation or any adaptation to discrete token-based stochastic reasoning, finite context windows, or sampling noise. This absence is load-bearing because the claim that cybernetics is 'the missing science' rests on the mapping being useful and non-circular.

    Authors: We acknowledge that the mappings are presented as an interpretive synthesis rather than a formal derivation from first principles. The intent is to adapt classical cybernetic laws (originally for continuous control systems) to the discrete, stochastic, and context-limited setting of LLM agents by identifying functional analogies in control, feedback, and stability. In the revised manuscript we will expand the framework section with an explicit justification subsection for each mapping. This will include step-by-step reasoning on how each law translates to token-based reasoning, finite context, and sampling noise, using concrete agent failure examples to show non-circularity. The expanded discussion will make the utility of the mapping more transparent while preserving the paper's conceptual character. revision: partial

  2. Referee: [Application domains section] The three application domains (code generation, computer use, automated research) identify failure modes conceptually but supply no measurements, baselines, or ablation studies showing that the proposed principles resolve the stated open questions on on-task persistence or safe self-improvement.

    Authors: The manuscript is a theoretical synthesis whose primary contribution is the framework itself; the application domains serve to illustrate how the desiderata surface concrete failure modes and recommendations rather than to validate them empirically. We agree that the absence of measurements limits immediate impact. In revision we will add a forward-looking subsection to the applications section that outlines testable predictions and experimental designs (e.g., persistence metrics in long-horizon code tasks or safety checks in self-improvement loops) that future work could use to evaluate the principles. This keeps the current scope intact while addressing the referee's concern about empirical grounding. revision: partial

  3. Referee: [Related work and synthesis section] No explicit comparison is made to existing agent paradigms (ReAct-style loops, memory-augmented systems, hierarchical planners) to demonstrate that the synthesized desiderata are novel or superior rather than a relabeling of known engineering practices.

    Authors: We will insert a new subsection (likely in the synthesis or related-work portion) that directly contrasts Agent Cybernetics with ReAct-style loops, memory-augmented architectures, and hierarchical planners. The comparison will show that while these paradigms supply useful primitives, they lack an explicit organizing theory for long-horizon reliability, lifelong operation under representational limits, and safe self-improvement. By mapping the three desiderata onto these existing practices, we will argue that Agent Cybernetics supplies a higher-level scaffold that integrates rather than duplicates them. This addition will clarify the framework's novelty without changing the paper's non-empirical nature. revision: yes

Circularity Check

1 step flagged

Agent Cybernetics is defined by the authors' mapping and synthesis, making the 'missing science' claim self-referential by construction.

specific steps
  1. self-definitional [Abstract]
    "By mapping six canonical laws of classical cybernetics onto six agent design principles, and synthesizing those principles into three engineering desiderata (reliability, lifelong running, and self-Improvement), we arrive at a framework termed Agent Cybernetics."

    The framework 'Agent Cybernetics' is explicitly defined as the output of the authors' mapping and synthesis operation; the assertion that cybernetics thereby supplies the missing scaffold for foundation agents is therefore equivalent to the construction itself rather than derived from independent evidence or non-trivial transformation of the laws.

full rationale

The paper's derivation chain consists solely of asserting a direct mapping from six classical cybernetics laws to six agent principles, followed by synthesis into three desiderata, after which the result is named 'Agent Cybernetics' and declared the missing theoretical scaffold. No equations, adaptation steps for token-based or stochastic agents, or contrasts with prior frameworks appear in the provided text. This reduces the central claim to the authors' definitional act rather than an independent derivation or external benchmark, satisfying the self-definitional pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unproven premise that classical cybernetics laws transfer directly to LLM agents; no free parameters are fitted, but the framework itself is an invented organizing structure whose utility is asserted rather than demonstrated.

axioms (1)
  • domain assumption: Six canonical laws of classical cybernetics apply to foundation agents
    Invoked in the abstract as the basis for the mapping without derivation or justification for why these particular laws are the right ones for LLM agents.
invented entities (1)
  • Agent Cybernetics framework (no independent evidence)
    purpose: To serve as the theoretical scaffold that turns cybernetics laws into agent design principles and three desiderata
    New term and synthesis introduced by the authors; no independent evidence or falsifiable prediction is supplied outside the mapping itself.

pith-pipeline@v0.9.0 · 5541 in / 1513 out tokens · 74009 ms · 2026-05-12T04:59:19.472356+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 11 internal anchors

  1. [1]

    On-policy distillation of language models: Learning from self-generated mistakes

    Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos Garea, Matthieu Geist, and Olivier Bachem. On-policy distillation of language models: Learning from self-generated mistakes. InICLR, 2024

  2. [2]

    Design for a Brain: The Origin of Adaptive Behaviour

    William Ashby.Design for a Brain: The Origin of Adaptive Behaviour. Springer Science & Business Media, 2013

  3. [3]

    An Introduction to Cybernetics

    William Ross Ashby.An Introduction to Cybernetics. Chapman & Hall, 1956

  4. [4]

    Windows Agent Arena: Evaluating Multi-modal OS Agents at Scale

    Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, et al. Windows agent arena: Evaluating multi-modal os agents at scale.arXiv preprint arXiv:2409.08264, 2024

  5. [5]

    The Wisdom of the Body

    Walter Bradford Cannon.The Wisdom of the Body. Norton & Co., 1939

  6. [6]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

  7. [7]

    On the Measure of Intelligence

    François Chollet. On the measure of intelligence.arXiv preprint arXiv:1911.01547, 2019

  8. [8]

    SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

    Xiang Deng, Jeff Da, Edwin Pan, Yannis Yiming He, Charles Ide, Kanak Garg, Niklas Lauffer, Andrew Park, Nitin Pasari, Chetan Rane, et al. SWE-bench pro: Can AI agents solve long- horizon software engineering tasks?arXiv preprint arXiv:2509.16941, 2025

  9. [9]

    Improving Factuality and Reasoning in Language Models through Multiagent Debate

    Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. InICML, 2024

  10. [10]

    Internagent-1.5: A unified agentic framework for long-horizon autonomous scientific discovery.arXiv preprint arXiv:2602.08990, 2026

    Shiyang Feng, Runmin Ma, Xiangchao Yan, Yue Fan, Yusong Hu, Songtao Huang, Shuaiyu Zhang, Zongsheng Cao, Tianshuo Peng, Jiakang Yuan, et al. Internagent-1.5: A unified agentic framework for long-horizon autonomous scientific discovery.arXiv preprint arXiv:2602.08990, 2026

  11. [11]

    Autonomous closed-loop framework for reproducible perovskite solar cells.Nature, pages 1–3, 2026

    Danpeng Gao, Shuaihua Lu, Chunlei Zhang, Ning Wang, Zexin Yu, Xianglang Sun, Rebecca Martin, Francesco Vanin, Liangchen Qian, Nicholas Long, et al. Autonomous closed-loop framework for reproducible perovskite solar cells.Nature, pages 1–3, 2026

  12. [12]

    On the Reliability of Computer Use Agents

    Gonzalo Gonzalez-Pumariega, Saaket Agashe, Jiachen Yang, Ang Li, and Xin Eric Wang. On the reliability of computer use agents.arXiv preprint arXiv:2604.17849, 2026

  13. [13]

    A survey on LLM-as-a-judge.The Innovation, 2024

    Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, et al. A survey on LLM-as-a-judge.The Innovation, 2024

  14. [14]

    Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025

  15. [15]

    Olympiad-level formal mathematical reasoning with reinforcement learning.Nature, pages 1–3, 2025

    Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z Horváth, Goran Žuži´c, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, et al. Olympiad-level formal mathematical reasoning with reinforcement learning.Nature, pages 1–3, 2025

  16. [16]

    Physical Intelligence, Bo Ai, Ali Amin, Raichelle Aniceto, Ashwin Balakrishna, Greg Balke, Kevin Black, George Bokinsky, Shihao Cao, Thomas Charbonnier, Vedant Choudhary, Foster Collins, Ken Conley, Grace Connors, James Darpinian, Karan Dhabalia, Maitrayee Dhaka, Jared DiCarlo, Danny Driess, Michael Equi, Adnan Esmail, Yunhao Fang, Chelsea Finn, 15 Cather...

  17. [17]

    Adaptation of agentic ai: A survey of post-training, memory, and skills.arXiv preprint arXiv:2512.16301, 2026a

    Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, et al. Adaptation of agentic AI.arXiv preprint arXiv:2512.16301, 2025

  18. [18]

    SWE-bench: Can language models resolve real-world github issues? In ICLR, 2024

    Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R Narasimhan. SWE-bench: Can language models resolve real-world github issues? In ICLR, 2024

  19. [19]

    OS-Harm: A benchmark for measuring safety of computer use agents

    Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, J Zico Kolter, Nicolas Flammarion, and Maksym Andriushchenko. OS-Harm: A benchmark for measuring safety of computer use agents. InNeurIPS Datasets and Benchmarks Track, 2025

  20. [20]

    Meta-Harness: End-to-End Optimization of Model Harnesses

    Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. Meta-harness: End-to-end optimization of model harnesses.arXiv preprint arXiv:2603.28052, 2026

  21. [21]

    Retrieval-augmented generation for knowledge-intensive NLP tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. InNeurIPS, pages 9459–9474, 2020

  22. [22]

    SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

    Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, et al. SkillsBench: Benchmarking how well agent skills work across diverse tasks.arXiv preprint arXiv:2602.12670, 2026

  23. [23]

    Autoharness: improving llm agents by automatically synthesizing a code harness.arXiv preprint arXiv:2603.03329, 2026

    Xinghua Lou, Miguel Lázaro-Gredilla, Antoine Dedieu, Carter Wendelken, Wolfgang Lehrach, and Kevin P Murphy. AutoHarness: improving LLM agents by automatically synthesizing a code harness.arXiv preprint arXiv:2603.03329, 2026

  24. [24]

    Training language models to follow instructions with human feedback

    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. InNeurIPS, pages 27730–27744, 2022

  25. [25]

    Tool learning with foundation models.ACM Computing Surveys, 57(4):1–40, 2024

    Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Xuanhe Zhou, Yufei Huang, Chaojun Xiao, et al. Tool learning with foundation models.ACM Computing Surveys, 57(4):1–40, 2024

  26. [26]

    Direct preference optimization: your language model is secretly a reward model

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D Manning, and Chelsea Finn. Direct preference optimization: your language model is secretly a reward model. InNeurIPS, pages 53728–53741, 2023

  27. [27]

    Mastering Atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

    Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Si- mon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering Atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

  28. [28]

    A mathematical theory of communication.The Bell System Technical Journal, 27(3):379–423, 1948

    Claude Elwood Shannon. A mathematical theory of communication.The Bell System Technical Journal, 27(3):379–423, 1948

  29. [29]

    Reflexion: language agents with verbal reinforcement learning

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: language agents with verbal reinforcement learning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 8634–8652, 2023

  30. [30]

    Cognitive architectures for language agents.Transactions on Machine Learning Research, 2023

    Theodore Sumers, Shunyu Yao, Karthik R Narasimhan, and Thomas L Griffiths. Cognitive architectures for language agents.Transactions on Machine Learning Research, 2023

  31. [31]

    Reinforcement Learning: An Introduction

    Richard S Sutton and Andrew G Barto.Reinforcement Learning: An Introduction. MIT press Cambridge, 1998

  32. [32]

    Karlsson, Bo An, Shuicheng Y AN, and Zongqing Lu

    Weihao Tan, Wentao Zhang, Xinrun Xu, Haochong Xia, Ziluo Ding, Boyu Li, Bohan Zhou, Junpeng Yue, Jiechuan Jiang, Yewen Li, Ruyi An, Molei Qin, Chuqiao Zong, Longtao Zheng, YuJie Wu, Xiaoqiang Chai, Yifei Bi, Tianbao Xie, Pengjie Gu, Xiyun Li, Ceyao Zhang, Long Tian, Chaojie Wang, Xinrun Wang, Börje F. Karlsson, Bo An, Shuicheng Y AN, and Zongqing Lu. Crad...

  33. [33]

    The information bottleneck method

    Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000

  34. [34]

    Deep learning and the information bottleneck principle

    Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In 2015 ieee information theory workshop (itw), pages 1–5. Ieee, 2015

  35. [35]

    Solving olympiad geometry without human demonstrations.Nature, 625(7995):476–482, 2024

    Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations.Nature, 625(7995):476–482, 2024

  36. [36]

    Engineering Cybernetics

    Hsue Shen Tsien.Engineering Cybernetics. McGraw-Hill, New York, 1954

  37. [37]

    Parametrically retargetable decision-makers tend to seek power

    Alexander Matt Turner and Prasad Tadepalli. Parametrically retargetable decision-makers tend to seek power. InNeurIPS, pages 31391–31401, 2022

  38. [38]

    Cybernetics of cybernetics

    Heinz von Foerster. Cybernetics of cybernetics. In Understanding Understanding: Essays on Cybernetics and Cognition, pages 283–286. Springer, 2003

  39. [39]

    The Human Use of Human Beings: Cybernetics and Society

    Norbert Wiener.The Human Use of Human Beings: Cybernetics and Society. Grand Central Publishing, 1988

  40. [40]

    Cybernetics or Control and Communication in the Animal and the Machine

    Norbert Wiener.Cybernetics or Control and Communication in the Animal and the Machine. MIT Press, 2019

  41. [41]

    An Introduction to Multiagent Systems

    Michael Wooldridge.An Introduction to Multiagent Systems. John wiley & sons, 2009

  42. [42]

    OSWorld: benchmarking multimodal agents for open-ended tasks in real computer environments

    Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, et al. OSWorld: benchmarking multimodal agents for open-ended tasks in real computer environments. InNeurIPS, pages 52040–52094, 2024

  43. [43]

    Sim- pletir: End-to-end reinforcement learning for multi-turn tool-integrated reasoning.arXiv preprint arXiv:2509.02479,

    Zhenghai Xue, Longtao Zheng, Qian Liu, Yingru Li, Xiaosen Zheng, Zejun Ma, and Bo An. SimpleTIR: End-to-end reinforcement learning for multi-turn tool-integrated reasoning.arXiv preprint arXiv:2509.02479, 2025

  44. [44]

    AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution

    Yutao Yang, Junsong Li, Qianjun Pan, Bihao Zhan, Yuxuan Cai, Lin Du, Jie Zhou, Kai Chen, Qin Chen, Xin Li, et al. Autoskill: Experience-driven lifelong learning via skill self-evolution. arXiv preprint arXiv:2603.01145, 2026

  45. [45]

    On-Policy Context Distillation for Language Models

    Tianzhu Ye, Li Dong, Xun Wu, Shaohan Huang, and Furu Wei. On-policy context distillation for language models.arXiv preprint arXiv:2602.12275, 2026

  46. [46]

    Stop Overvaluing Multi-Agent Debate: We Must Rethink Evaluation and Embrace Model Heterogeneity

    Hangfan Zhang, Zhiyao Cui, Jianhao Chen, Xinrun Wang, Qiaosheng Zhang, Zhen Wang, Dinghao Wu, and Shuyue Hu. Stop overvaluing multi-agent debate–we must rethink evaluation and embrace model heterogeneity.arXiv preprint arXiv:2502.08788, 2025

  47. [47]

    Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, and Jeff Clune. Darwin Godel machine: Open-ended evolution of self-improving agents.arXiv preprint arXiv:2505.22954, 2025

  48. [48]

    Hyperagents.arXiv preprint arXiv:2603.19461, 2026

    Jenny Zhang, Bingchen Zhao, Wannan Yang, Jakob Foerster, Jeff Clune, Minqi Jiang, Sam Devlin, and Tatiana Shavrina. Hyperagents. arXiv preprint arXiv:2603.19461, 2026

  49. [49]

    A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?

    Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Wenyue Hua, Haolun Wu, Zhihan Guo, Yufei Wang, Niklas Muennighoff, et al. A survey on test-time scaling in large language models: What, how, where, and how well?arXiv preprint arXiv:2503.24235, 2025

  50. [50]

    A multimodal robotic platform for multi-element electrocatalyst discovery.Nature, 647(8089):390–396, 2025

    Zhen Zhang, Zhichu Ren, Chia-Wei Hsu, Weibin Chen, Zhang-Wei Hong, Chi-Feng Lee, Aubrey Penn, Hongbin Xu, Daniel J Zheng, Shuhan Miao, et al. A multimodal robotic platform for multi-element electrocatalyst discovery.Nature, 647(8089):390–396, 2025

  51. [51]

    Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    Chenyu Zhou, Huacan Chai, Wenteng Chen, Zihan Guo, Rong Shan, Yuanyi Song, Tianyi Xu, Yingxuan Yang, Aofan Yu, Weiming Zhang, et al. Externalization in LLM agents: A unified review of memory, skills, protocols and harness engineering.arXiv preprint arXiv:2604.08224, 2026

  52. [52]

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. In CoRL, pages 2165–2183, 2023