pith. machine review for the scientific record.

arxiv: 2605.10754 · v1 · submitted 2026-05-11 · 💻 cs.AI

Recognition: 2 theorem links


The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:59 UTC · model grok-4.3

classification 💻 cs.AI
keywords agent cybernetics · foundation agents · LLM agents · cybernetics · agent design principles · reliability · self-improvement · long-horizon agents

The pith

Cybernetics supplies the first principles needed to build reliable, long-running foundation agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Foundation agents powered by large language models handle complex tasks over thousands of steps yet are assembled through trial-and-error rather than theory. The paper claims classical cybernetics, the science of control and communication in complex systems, fills this gap. It maps six canonical cybernetic laws onto six agent design principles and combines them into three engineering goals: reliability, lifelong running, and self-improvement. The resulting framework, Agent Cybernetics, is used to diagnose problems and recommend fixes in code generation, computer use, and automated research.
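
One way to picture the framework's shape as an engineering artifact, offered as an editorial aid rather than the paper's own formalization: the six laws and principles are not enumerated in the text above, so the law and principle entries below are illustrative classical candidates, not the authors' list; only the three desiderata names come from the paper.

```python
from dataclasses import dataclass

# Illustrative sketch only. The law/principle names below are classical
# cybernetics candidates, NOT the paper's enumerated six; the three
# desiderata (reliability, lifelong running, self-improvement) are the
# paper's terms.
DESIDERATA = {"reliability", "lifelong_running", "self_improvement"}

@dataclass(frozen=True)
class Mapping:
    law: str            # classical cybernetic law
    principle: str      # corresponding agent design principle
    serves: frozenset   # which of the three engineering goals it supports

ILLUSTRATIVE_MAPPINGS = [
    Mapping("requisite variety",
            "match the agent's action and representation space to task variety",
            frozenset({"reliability", "lifelong_running"})),
    Mapping("negative feedback",
            "compare observed outcomes to the goal and correct drift at each step",
            frozenset({"reliability"})),
    Mapping("homeostasis",
            "keep essential variables (budget, context, task focus) within bounds",
            frozenset({"lifelong_running", "self_improvement"})),
]

def coverage(mappings):
    """Report which engineering goals the chosen principles address."""
    covered = set().union(*(m.serves for m in mappings))
    return sorted(covered), sorted(DESIDERATA - covered)

if __name__ == "__main__":
    print(coverage(ILLUSTRATIVE_MAPPINGS))  # all three goals covered here
```

Read this way, "diagnosing problems" amounts to asking which mapping a failing agent violates.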

Core claim

By mapping six canonical laws of classical cybernetics onto six agent design principles and synthesizing those principles into the three engineering desiderata of reliability, lifelong running, and self-improvement, the authors establish Agent Cybernetics as the missing theoretical scaffold for foundation agents that perceive, reason, and act across long horizons.

What carries the argument

The direct mapping of six classical cybernetics laws to six modern agent design principles, which are then synthesized into the three desiderata that define the Agent Cybernetics framework.

If this is right

  • Code generation agents remain on-task across many steps when control and communication laws are applied to error handling (a minimal control-loop sketch follows this list).
  • Computer-use agents achieve lifelong running by adapting to changing interfaces through feedback mechanisms drawn from cybernetics.
  • Automated research agents can pursue safe self-improvement when capability growth is constrained by the three desiderata.
  • Common failure modes such as drifting off-task or exceeding representational capacity are diagnosed and mitigated using the mapped principles.
  • Engineering shifts from assembling primitives by trial and error to designing agents from the three explicit desiderata.
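
A minimal sketch of the control-loop reading behind the first and fourth bullets, assuming a homeostatic treatment of task drift as an essential variable. The interfaces (env, propose_step, drift_score, replan) are hypothetical placeholders introduced for illustration, not an API from the paper.

```python
# Negative-feedback "stay on task" loop: drift from the goal is treated as
# an essential variable that must be held inside a bound. All interfaces
# here are hypothetical stand-ins, not the paper's.

DRIFT_LIMIT = 0.3    # illustrative bound on the essential variable
MAX_STEPS = 1000     # long-horizon step budget

def run_agent(goal, env, propose_step, drift_score, replan):
    plan = [goal]
    for step in range(MAX_STEPS):
        action = propose_step(goal, plan, env.observe())
        observation = env.execute(action)
        # Feedback: measure how far recent behaviour has drifted from the goal.
        drift = drift_score(goal, plan, observation)
        if drift > DRIFT_LIMIT:
            # Correct the essential variable before the error compounds,
            # rather than letting later steps build on an off-task state.
            plan = replan(goal, plan, observation)
        if env.done():
            return "success", step
    return "budget_exhausted", MAX_STEPS
```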

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could inspire new evaluation benchmarks that test agents on cybernetic properties like stability and adaptation rather than task success alone.
  • Links to broader control theory may allow agents to borrow stability guarantees from engineering domains outside AI.
  • Applying the same mapping to non-LLM agents could reveal whether the approach is specific to language models or general to autonomous systems.

Load-bearing premise

The six laws of classical cybernetics can be mapped directly onto LLM-based agents in a useful way without further derivation or empirical checks.
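
One editorial illustration (not the paper's derivation) of what a non-trivial mapping would have to specify: a common information-theoretic reading of Ashby's law of requisite variety, restated for a context-limited agent. The symbols E, D, and R are introduced here for exposition only.

```latex
% Editorial illustration, not the paper's derivation: one common
% information-theoretic reading of Ashby's law of requisite variety.
% E = residual task error, D = environmental disturbances,
% R = the agent's corrective responses.
\[
  H(E) \;\geq\; H(D) - H(R)
\]
% If finite context, a fixed tool set, and sampling noise cap the usable
% response variety H(R), the outcome variety H(E) cannot be pushed below
% H(D) - H(R). A "direct mapping" of the law would have to say how H(R)
% is measured for a token-based, stochastic policy.
```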

What would settle it

Build two versions of a long-horizon agent for the same task, one following the Agent Cybernetics principles and one following standard engineering practice, then measure which version stays on-task longer and exhibits safer self-improvement.
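
A sketch of how that comparison might be scored, under assumed interfaces: build_cybernetic_agent, build_baseline_agent, run_task, and the steps_before_drift field are hypothetical stand-ins, not artifacts released with the paper.

```python
import statistics

def persistence(agent, tasks, run_task, max_steps=5000):
    """Median number of steps an agent stays on-task before drifting or failing."""
    on_task = []
    for task in tasks:
        trace = run_task(agent, task, max_steps=max_steps)
        # steps_before_drift: first step at which the run left the task,
        # or max_steps if it never did (an assumed field of the trace).
        on_task.append(trace.steps_before_drift)
    return statistics.median(on_task)

def settle(tasks, run_task, build_cybernetic_agent, build_baseline_agent):
    cybernetic = persistence(build_cybernetic_agent(), tasks, run_task)
    baseline = persistence(build_baseline_agent(), tasks, run_task)
    return {
        "cybernetic_median_steps": cybernetic,
        "baseline_median_steps": baseline,
        "framework_helps": cybernetic > baseline,
    }
```

A matching check for safer self-improvement would gate any capability change behind the same suite, accepting it only if these scores do not regress.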

Figures

Figures reproduced from arXiv: 2605.10754 by Chang Yang, He Zhao, Shuyue Hu, Xinrun Wang, Zhuoyi Lin.

Figure 1: From Classical Cybernetics to Agent Cybernetics.
Figure 2: The terminology between classical cybernetics (left) and agent cybernetics (right).
read the original abstract

LLM-based foundation agents that perceive, reason, and act across thousands of reasoning steps are rapidly becoming the dominant paradigm for deploying artificial intelligence in open-ended, long-horizon complex tasks. Despite this significance, the field remains overwhelmingly engineering-driven. Engineering practice has converged on useful primitives (tool loops, memory banks, harnesses, reflection steps), yet these are assembled by empirical trial and error rather than from first principles. Fundamental questions remain open: under what conditions does a long-running agent remain on-task? How should an agent respond when its environment exceeds its representational capacity? What architectural properties are necessary for safe self-improvement? We argue that cybernetics, the mid-twentieth-century science of control and communication in complex systems, provides the missing theoretical scaffold for foundation agents. By mapping six canonical laws of classical cybernetics onto six agent design principles, and synthesizing those principles into three engineering desiderata (reliability, lifelong running, and self-Improvement), we arrive at a framework termed Agent Cybernetics. Three application domains, code generation, computer use and automated research, exemplify the analytical framework of agent cybernetics by identifying failure modes and concrete engineering recommendations. We hope that agent cybernetics opens a new research venue and establishes the scientific foundation that foundation agents need for principled, reliable real-world deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that cybernetics, the mid-twentieth-century science of control and communication, supplies the missing theoretical scaffold for LLM-based foundation agents. It does so by mapping six canonical laws of classical cybernetics onto six agent design principles, synthesizing those principles into three engineering desiderata (reliability, lifelong running, and self-improvement), and applying the resulting 'Agent Cybernetics' framework to identify failure modes and concrete recommendations in the domains of code generation, computer use, and automated research.

Significance. If the mapping can be shown to yield non-trivial, testable guidance that improves upon existing agent architectures, the framework could help organize engineering practice around long-horizon reliability and safe self-improvement. As presented, however, the contribution remains an interpretive synthesis without derivation, empirical tests, or direct comparison to prior agent models, limiting its immediate impact on the field.

major comments (3)
  1. [Framework introduction and mapping section] The central mapping of the six classical cybernetics laws to six agent design principles is asserted without step-by-step derivation or any adaptation to discrete token-based stochastic reasoning, finite context windows, or sampling noise. This absence is load-bearing because the claim that cybernetics is 'the missing science' rests on the mapping being useful and non-circular.
  2. [Application domains section] The three application domains (code generation, computer use, automated research) identify failure modes conceptually but supply no measurements, baselines, or ablation studies showing that the proposed principles resolve the stated open questions on on-task persistence or safe self-improvement.
  3. [Related work and synthesis section] No explicit comparison is made to existing agent paradigms (ReAct-style loops, memory-augmented systems, hierarchical planners) to demonstrate that the synthesized desiderata are novel or superior rather than a relabeling of known engineering practices.
minor comments (2)
  1. [Abstract] The abstract contains an inconsistent capitalization ('self-Improvement').
  2. [References] Ensure original sources for the six canonical cybernetics laws are cited with precise references rather than relying on secondary summaries.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive review and for identifying areas where the presentation of the Agent Cybernetics framework can be strengthened. We address each major comment below, clarifying the conceptual scope of the work while outlining targeted revisions to improve rigor and clarity.

read point-by-point responses
  1. Referee: [Framework introduction and mapping section] The central mapping of the six classical cybernetics laws to six agent design principles is asserted without step-by-step derivation or any adaptation to discrete token-based stochastic reasoning, finite context windows, or sampling noise. This absence is load-bearing because the claim that cybernetics is 'the missing science' rests on the mapping being useful and non-circular.

    Authors: We acknowledge that the mappings are presented as an interpretive synthesis rather than a formal derivation from first principles. The intent is to adapt classical cybernetic laws (originally for continuous control systems) to the discrete, stochastic, and context-limited setting of LLM agents by identifying functional analogies in control, feedback, and stability. In the revised manuscript we will expand the framework section with an explicit justification subsection for each mapping. This will include step-by-step reasoning on how each law translates to token-based reasoning, finite context, and sampling noise, using concrete agent failure examples to show non-circularity. The expanded discussion will make the utility of the mapping more transparent while preserving the paper's conceptual character. revision: partial

  2. Referee: [Application domains section] The three application domains (code generation, computer use, automated research) identify failure modes conceptually but supply no measurements, baselines, or ablation studies showing that the proposed principles resolve the stated open questions on on-task persistence or safe self-improvement.

    Authors: The manuscript is a theoretical synthesis whose primary contribution is the framework itself; the application domains serve to illustrate how the desiderata surface concrete failure modes and recommendations rather than to validate them empirically. We agree that the absence of measurements limits immediate impact. In revision we will add a forward-looking subsection to the applications section that outlines testable predictions and experimental designs (e.g., persistence metrics in long-horizon code tasks or safety checks in self-improvement loops) that future work could use to evaluate the principles. This keeps the current scope intact while addressing the referee's concern about empirical grounding. revision: partial

  3. Referee: [Related work and synthesis section] No explicit comparison is made to existing agent paradigms (ReAct-style loops, memory-augmented systems, hierarchical planners) to demonstrate that the synthesized desiderata are novel or superior rather than a relabeling of known engineering practices.

    Authors: We will insert a new subsection (likely in the synthesis or related-work portion) that directly contrasts Agent Cybernetics with ReAct-style loops, memory-augmented architectures, and hierarchical planners. The comparison will show that while these paradigms supply useful primitives, they lack an explicit organizing theory for long-horizon reliability, lifelong operation under representational limits, and safe self-improvement. By mapping the three desiderata onto these existing practices, we will argue that Agent Cybernetics supplies a higher-level scaffold that integrates rather than duplicates them. This addition will clarify the framework's novelty without changing the paper's non-empirical nature. revision: yes

Circularity Check

1 step flagged

Agent Cybernetics is defined by the authors' mapping and synthesis, making the 'missing science' claim self-referential by construction.

specific steps
  1. self-definitional [Abstract]
    "By mapping six canonical laws of classical cybernetics onto six agent design principles, and synthesizing those principles into three engineering desiderata (reliability, lifelong running, and self-Improvement), we arrive at a framework termed Agent Cybernetics."

    The framework 'Agent Cybernetics' is explicitly defined as the output of the authors' mapping and synthesis operation; the assertion that cybernetics thereby supplies the missing scaffold for foundation agents is therefore equivalent to the construction itself rather than derived from independent evidence or non-trivial transformation of the laws.

full rationale

The paper's derivation chain consists solely of asserting a direct mapping from six classical cybernetics laws to six agent principles, followed by synthesis into three desiderata, after which the result is named 'Agent Cybernetics' and declared the missing theoretical scaffold. No equations, adaptation steps for token-based or stochastic agents, or contrasts with prior frameworks appear in the provided text. This reduces the central claim to the authors' definitional act rather than an independent derivation or external benchmark, satisfying the self-definitional pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the unproven premise that classical cybernetics laws transfer directly to LLM agents; no free parameters are fitted, but the framework itself is an invented organizing structure whose utility is asserted rather than demonstrated.

axioms (1)
  • domain assumption: Six canonical laws of classical cybernetics apply to foundation agents
    Invoked in the abstract as the basis for the mapping without derivation or justification for why these particular laws are the right ones for LLM agents.
invented entities (1)
  • Agent Cybernetics framework (no independent evidence)
    purpose: To serve as the theoretical scaffold that turns cybernetics laws into agent design principles and three desiderata
    New term and synthesis introduced by the authors; no independent evidence or falsifiable prediction is supplied outside the mapping itself.

pith-pipeline@v0.9.0 · 5541 in / 1513 out tokens · 74009 ms · 2026-05-12T04:59:19.472356+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 11 internal anchors

  1. [1]

    On-policy distillation of language models: Learning from self-generated mistakes

    Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos Garea, Matthieu Geist, and Olivier Bachem. On-policy distillation of language models: Learning from self-generated mistakes. InICLR, 2024

  2. [2]

    Design for a Brain: The Origin of Adaptive Behaviour

    William Ashby.Design for a Brain: The Origin of Adaptive Behaviour. Springer Science & Business Media, 2013

  3. [3]

    An Introduction to Cybernetics

    William Ross Ashby.An Introduction to Cybernetics. Chapman & Hall, 1956

  4. [4]

    Windows Agent Arena: Evaluating Multi-modal OS Agents at Scale

    Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, et al. Windows agent arena: Evaluating multi-modal os agents at scale.arXiv preprint arXiv:2409.08264, 2024

  5. [5]

    The Wisdom of the Body

    Walter Bradford Cannon.The Wisdom of the Body. Norton & Co., 1939

  6. [6]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory.arXiv preprint arXiv:2504.19413, 2025

  7. [7]

    On the Measure of Intelligence

    François Chollet. On the measure of intelligence.arXiv preprint arXiv:1911.01547, 2019

  8. [8]

    SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

    Xiang Deng, Jeff Da, Edwin Pan, Yannis Yiming He, Charles Ide, Kanak Garg, Niklas Lauffer, Andrew Park, Nitin Pasari, Chetan Rane, et al. SWE-bench pro: Can AI agents solve long- horizon software engineering tasks?arXiv preprint arXiv:2509.16941, 2025

  9. [9]

    Improving Factuality and Reasoning in Language Models through Multiagent Debate

    Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. InICML, 2024

  10. [10]

    Internagent-1.5: A unified agentic framework for long-horizon autonomous scientific discovery.arXiv preprint arXiv:2602.08990, 2026

    Shiyang Feng, Runmin Ma, Xiangchao Yan, Yue Fan, Yusong Hu, Songtao Huang, Shuaiyu Zhang, Zongsheng Cao, Tianshuo Peng, Jiakang Yuan, et al. Internagent-1.5: A unified agentic framework for long-horizon autonomous scientific discovery.arXiv preprint arXiv:2602.08990, 2026

  11. [11]

    Autonomous closed-loop framework for reproducible perovskite solar cells.Nature, pages 1–3, 2026

    Danpeng Gao, Shuaihua Lu, Chunlei Zhang, Ning Wang, Zexin Yu, Xianglang Sun, Rebecca Martin, Francesco Vanin, Liangchen Qian, Nicholas Long, et al. Autonomous closed-loop framework for reproducible perovskite solar cells.Nature, pages 1–3, 2026

  12. [12]

    On the Reliability of Computer Use Agents

    Gonzalo Gonzalez-Pumariega, Saaket Agashe, Jiachen Yang, Ang Li, and Xin Eric Wang. On the reliability of computer use agents.arXiv preprint arXiv:2604.17849, 2026

  13. [13]

    A survey on LLM-as-a-judge.The Innovation, 2024

    Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, et al. A survey on LLM-as-a-judge.The Innovation, 2024

  14. [14]

    Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025

    Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse control tasks through world models.Nature, 640(8059):647–653, 2025

  15. [15]

    Olympiad-level formal mathematical reasoning with reinforcement learning.Nature, pages 1–3, 2025

    Thomas Hubert, Rishi Mehta, Laurent Sartran, Miklós Z Horváth, Goran Žuži´c, Eric Wieser, Aja Huang, Julian Schrittwieser, Yannick Schroecker, Hussain Masoom, et al. Olympiad-level formal mathematical reasoning with reinforcement learning.Nature, pages 1–3, 2025

  16. [16]

    Physical Intelligence, Bo Ai, Ali Amin, Raichelle Aniceto, Ashwin Balakrishna, Greg Balke, Kevin Black, George Bokinsky, Shihao Cao, Thomas Charbonnier, Vedant Choudhary, Foster Collins, Ken Conley, Grace Connors, James Darpinian, Karan Dhabalia, Maitrayee Dhaka, Jared DiCarlo, Danny Driess, Michael Equi, Adnan Esmail, Yunhao Fang, Chelsea Finn, 15 Cather...

  17. [17]

    Adaptation of agentic ai: A survey of post-training, memory, and skills.arXiv preprint arXiv:2512.16301, 2026a

    Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, et al. Adaptation of agentic AI.arXiv preprint arXiv:2512.16301, 2025

  18. [18]

    SWE-bench: Can language models resolve real-world github issues? In ICLR, 2024

    Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R Narasimhan. SWE-bench: Can language models resolve real-world github issues? In ICLR, 2024

  19. [19]

    OS-Harm: A benchmark for measuring safety of computer use agents

    Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, J Zico Kolter, Nicolas Flammarion, and Maksym Andriushchenko. OS-Harm: A benchmark for measuring safety of computer use agents. InNeurIPS Datasets and Benchmarks Track, 2025

  20. [20]

    Meta-Harness: End-to-End Optimization of Model Harnesses

    Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. Meta-harness: End-to-end optimization of model harnesses.arXiv preprint arXiv:2603.28052, 2026

  21. [21]

    Retrieval-augmented generation for knowledge-intensive NLP tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. InNeurIPS, pages 9459–9474, 2020

  22. [22]

    SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

    Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, et al. SkillsBench: Benchmarking how well agent skills work across diverse tasks.arXiv preprint arXiv:2602.12670, 2026

  23. [23]

    Autoharness: improving llm agents by automatically synthesizing a code harness.arXiv preprint arXiv:2603.03329, 2026

    Xinghua Lou, Miguel Lázaro-Gredilla, Antoine Dedieu, Carter Wendelken, Wolfgang Lehrach, and Kevin P Murphy. AutoHarness: improving LLM agents by automatically synthesizing a code harness.arXiv preprint arXiv:2603.03329, 2026

  24. [24]

    Training language models to follow instructions with human feedback

    Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. InNeurIPS, pages 27730–27744, 2022

  25. [25]

    Tool learning with foundation models.ACM Computing Surveys, 57(4):1–40, 2024

    Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Xuanhe Zhou, Yufei Huang, Chaojun Xiao, et al. Tool learning with foundation models.ACM Computing Surveys, 57(4):1–40, 2024

  26. [26]

    Direct preference optimization: your language model is secretly a reward model

    Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D Manning, and Chelsea Finn. Direct preference optimization: your language model is secretly a reward model. InNeurIPS, pages 53728–53741, 2023

  27. [27]

    Mastering Atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

    Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Si- mon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, et al. Mastering Atari, go, chess and shogi by planning with a learned model.Nature, 588(7839):604–609, 2020

  28. [28]

    A mathematical theory of communication.The Bell System Technical Journal, 27(3):379–423, 1948

    Claude Elwood Shannon. A mathematical theory of communication.The Bell System Technical Journal, 27(3):379–423, 1948

  29. [29]

    Reflexion: language agents with verbal reinforcement learning

    Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: language agents with verbal reinforcement learning. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 8634–8652, 2023

  30. [30]

    Cognitive architectures for language agents.Transactions on Machine Learning Research, 2023

    Theodore Sumers, Shunyu Yao, Karthik R Narasimhan, and Thomas L Griffiths. Cognitive architectures for language agents.Transactions on Machine Learning Research, 2023

  31. [31]

    Reinforcement Learning: An Introduction

    Richard S Sutton and Andrew G Barto.Reinforcement Learning: An Introduction. MIT press Cambridge, 1998

  32. [32]

    Karlsson, Bo An, Shuicheng Y AN, and Zongqing Lu

    Weihao Tan, Wentao Zhang, Xinrun Xu, Haochong Xia, Ziluo Ding, Boyu Li, Bohan Zhou, Junpeng Yue, Jiechuan Jiang, Yewen Li, Ruyi An, Molei Qin, Chuqiao Zong, Longtao Zheng, YuJie Wu, Xiaoqiang Chai, Yifei Bi, Tianbao Xie, Pengjie Gu, Xiyun Li, Ceyao Zhang, Long Tian, Chaojie Wang, Xinrun Wang, Börje F. Karlsson, Bo An, Shuicheng Y AN, and Zongqing Lu. Crad...

  33. [33]

    The information bottleneck method

    Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottleneck method. arXiv preprint physics/0004057, 2000

  34. [34]

    Deep learning and the information bottleneck principle

    Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In 2015 ieee information theory workshop (itw), pages 1–5. Ieee, 2015

  35. [35]

    Solving olympiad geometry without human demonstrations.Nature, 625(7995):476–482, 2024

    Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations.Nature, 625(7995):476–482, 2024

  36. [36]

    Engineering Cybernetics

    Hsue Shen Tsien.Engineering Cybernetics. McGraw-Hill, New York, 1954

  37. [37]

    Parametrically retargetable decision-makers tend to seek power

    Alexander Matt Turner and Prasad Tadepalli. Parametrically retargetable decision-makers tend to seek power. InNeurIPS, pages 31391–31401, 2022

  38. [38]

    Cybernetics of cybernetics

    Heinz von Foerster. Cybernetics of cybernetics. In Understanding Understanding: Essays on Cybernetics and Cognition, pages 283–286. Springer, 2003

  39. [39]

    The Human Use of Human Beings: Cybernetics and Society

    Norbert Wiener.The Human Use of Human Beings: Cybernetics and Society. Grand Central Publishing, 1988

  40. [40]

    Cybernetics or Control and Communication in the Animal and the Machine

    Norbert Wiener.Cybernetics or Control and Communication in the Animal and the Machine. MIT Press, 2019

  41. [41]

    An Introduction to Multiagent Systems

    Michael Wooldridge.An Introduction to Multiagent Systems. John wiley & sons, 2009

  42. [42]

    OSWorld: benchmarking multimodal agents for open-ended tasks in real computer environments

    Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, et al. OSWorld: benchmarking multimodal agents for open-ended tasks in real computer environments. InNeurIPS, pages 52040–52094, 2024

  43. [43]

    Sim- pletir: End-to-end reinforcement learning for multi-turn tool-integrated reasoning.arXiv preprint arXiv:2509.02479,

    Zhenghai Xue, Longtao Zheng, Qian Liu, Yingru Li, Xiaosen Zheng, Zejun Ma, and Bo An. SimpleTIR: End-to-end reinforcement learning for multi-turn tool-integrated reasoning.arXiv preprint arXiv:2509.02479, 2025

  44. [44]

    AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution

    Yutao Yang, Junsong Li, Qianjun Pan, Bihao Zhan, Yuxuan Cai, Lin Du, Jie Zhou, Kai Chen, Qin Chen, Xin Li, et al. Autoskill: Experience-driven lifelong learning via skill self-evolution. arXiv preprint arXiv:2603.01145, 2026

  45. [45]

    On-Policy Context Distillation for Language Models

    Tianzhu Ye, Li Dong, Xun Wu, Shaohan Huang, and Furu Wei. On-policy context distillation for language models.arXiv preprint arXiv:2602.12275, 2026

  46. [46]

    Stop Overvaluing Multi-Agent Debate: We Must Rethink Evaluation and Embrace Model Heterogeneity

    Hangfan Zhang, Zhiyao Cui, Jianhao Chen, Xinrun Wang, Qiaosheng Zhang, Zhen Wang, Dinghao Wu, and Shuyue Hu. Stop overvaluing multi-agent debate–we must rethink evaluation and embrace model heterogeneity.arXiv preprint arXiv:2502.08788, 2025

  47. [47]

    Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents

    Jenny Zhang, Shengran Hu, Cong Lu, Robert Lange, and Jeff Clune. Darwin Godel machine: Open-ended evolution of self-improving agents.arXiv preprint arXiv:2505.22954, 2025

  48. [48]

    Hyperagents.arXiv preprint arXiv:2603.19461, 2026

    Jenny Zhang, Bingchen Zhao, Wannan Yang, Jakob Foerster, Jeff Clune, Minqi Jiang, Sam Devlin, and Tatiana Shavrina. Hyperagents. arXiv preprint arXiv:2603.19461, 2026

  49. [49]

    A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?

    Qiyuan Zhang, Fuyuan Lyu, Zexu Sun, Lei Wang, Weixu Zhang, Wenyue Hua, Haolun Wu, Zhihan Guo, Yufei Wang, Niklas Muennighoff, et al. A survey on test-time scaling in large language models: What, how, where, and how well?arXiv preprint arXiv:2503.24235, 2025

  50. [50]

    A multimodal robotic platform for multi-element electrocatalyst discovery.Nature, 647(8089):390–396, 2025

    Zhen Zhang, Zhichu Ren, Chia-Wei Hsu, Weibin Chen, Zhang-Wei Hong, Chi-Feng Lee, Aubrey Penn, Hongbin Xu, Daniel J Zheng, Shuhan Miao, et al. A multimodal robotic platform for multi-element electrocatalyst discovery.Nature, 647(8089):390–396, 2025

  51. [51]

    Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

    Chenyu Zhou, Huacan Chai, Wenteng Chen, Zihan Guo, Rong Shan, Yuanyi Song, Tianyi Xu, Yingxuan Yang, Aofan Yu, Weiming Zhang, et al. Externalization in LLM agents: A unified review of memory, skills, protocols and harness engineering.arXiv preprint arXiv:2604.08224, 2026

  52. [52]

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Brianna Zitkovich, Tianhe Yu, Sichun Xu, Peng Xu, Ted Xiao, Fei Xia, Jialin Wu, Paul Wohlhart, Stefan Welker, Ayzaan Wahid, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. In CoRL, pages 2165–2183, 2023