Recognition: unknown
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
Pith reviewed 2026-05-10 14:27 UTC · model grok-4.3
The pith
Claude Code's architecture is shaped by five human values that lead to concrete choices in permissions, context management, and extensibility.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Claude Code centers on a simple while-loop that calls the model, runs tools, and repeats, yet most of its code resides in surrounding systems: a permission framework with seven modes and an ML-based classifier, a five-layer compaction pipeline for context management, four extensibility mechanisms (MCP, plugins, skills, and hooks), a subagent delegation mechanism with worktree isolation, and append-oriented session storage. The authors trace these elements to five human values and thirteen design principles, then contrast the resulting architecture with OpenClaw to illustrate how deployment context alters the concrete answers to recurring design questions.
What carries the argument
The core while-loop for model-tool iteration, surrounded by a permission system, compaction pipeline, extensibility mechanisms, subagent isolation, and append-only session storage that together realize the design principles.
If this is right
- The same design questions yield different architectural answers when deployment context changes from CLI to gateway.
- Per-action safety classification versus perimeter-level access control represents a key divergence driven by context.
- Context-window extensions versus gateway-wide capability registration address similar needs with different mechanisms.
- Future agent systems should explicitly address the six open design directions identified from empirical, architectural, and policy literature.
- Tracing values through principles to implementations provides a reusable lens for evaluating other agentic coding tools.
Where Pith is reading between the lines
- This value-to-implementation mapping could serve as a checklist for open-source agent developers to audit alignment with user priorities.
- Policy discussions around AI agents could reference these principles when balancing automation with human oversight requirements.
- Empirical user studies might test whether systems explicitly built on these values produce higher trust or fewer errors in long coding sessions.
- The approach could extend to partial analyses of other commercial agents if documentation or API behavior is made available.
Load-bearing premise
That the publicly available TypeScript source code and the authors' interpretive mapping sufficiently reveal the true motivating human values and that the thirteen design principles comprehensively capture the architecture without selection bias.
What would settle it
A line-by-line review of the Claude Code TypeScript codebase that finds the permission modes, compaction layers, or other listed components bear no traceable link to the five stated values, or that identifies major architectural pieces absent from the thirteen principles.
read the original abstract
Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user. This study describes its comprehensive architecture by analyzing the publicly available TypeScript source code and further comparing it with OpenClaw, an independent open-source AI agent system that answers many of the same design questions from a different deployment context. Our analysis identifies five human values, philosophies, and needs that motivate the architecture (human decision authority, safety and security, reliable execution, capability amplification, and contextual adaptability) and traces them through thirteen design principles to specific implementation choices. The core of the system is a simple while-loop that calls the model, runs tools, and repeats. Most of the code, however, lives in the systems around this loop: a permission system with seven modes and an ML-based classifier, a five-layer compaction pipeline for context management, four extensibility mechanisms (MCP, plugins, skills, and hooks), a subagent delegation mechanism with worktree isolation, and append-oriented session storage. A comparison with OpenClaw, a multi-channel personal assistant gateway, shows that the same recurring design questions produce different architectural answers when the deployment context changes: from per-action safety classification to perimeter-level access control, from a single CLI loop to an embedded runtime within a gateway control plane, and from context-window extensions to gateway-wide capability registration. We finally identify six open design directions for future agent systems, grounded in recent empirical, architectural, and policy literature.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes the architecture of Claude Code, an agentic coding tool, by inspecting its publicly available TypeScript source code. It identifies five motivating human values (human decision authority, safety and security, reliable execution, capability amplification, contextual adaptability) and traces them through thirteen design principles to concrete mechanisms including a seven-mode permission system with ML classifier, five-layer context compaction pipeline, four extensibility mechanisms (MCP, plugins, skills, hooks), subagent worktree isolation, and append-oriented session storage. A comparison with OpenClaw illustrates how the same design questions yield different solutions under different deployment contexts, and the work concludes with six open design directions for future AI agent systems grounded in empirical and policy literature.
Significance. If the interpretive mapping holds, the paper offers a useful case study for software engineering researchers and practitioners working on AI agents, by concretely linking high-level values to low-level implementation choices and showing context-dependent trade-offs via the OpenClaw contrast. The enumeration of open directions provides a starting point for future work. The analysis is grounded in real artifacts rather than abstract models, which strengthens its potential utility for system designers.
major comments (2)
- [Sections describing the value-to-principle tracing and source-code analysis] The central claim—that five specific human values motivate the architecture and are systematically traced through thirteen design principles—rests on manual code inspection without any described formal method (e.g., coding protocol, decision criteria, or inter-rater process) for principle extraction. This interpretive step is load-bearing for the narrative and the subsequent OpenClaw comparison.
- [Comparison with OpenClaw] The comparison section asserts that deployment context produces different architectural answers (e.g., per-action safety classification vs. perimeter control), but provides no systematic evaluation framework or metrics to substantiate that the differences are attributable to context rather than author selection.
minor comments (2)
- [Abstract and Introduction] The abstract and introduction would benefit from an explicit statement of the analysis scope (e.g., which version of the TypeScript codebase was examined and the date of inspection) to aid reproducibility.
- [Implementation details of the permission system] Figure captions and the description of the seven-mode permission system could clarify how the ML-based classifier integrates with the modes, as the current text leaves the interaction somewhat implicit.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights opportunities to strengthen the transparency of our interpretive analysis and the framing of the OpenClaw comparison. We address each major comment below and outline targeted revisions.
read point-by-point responses
-
Referee: [Sections describing the value-to-principle tracing and source-code analysis] The central claim—that five specific human values motivate the architecture and are systematically traced through thirteen design principles—rests on manual code inspection without any described formal method (e.g., coding protocol, decision criteria, or inter-rater process) for principle extraction. This interpretive step is load-bearing for the narrative and the subsequent OpenClaw comparison.
Authors: We agree that greater methodological transparency would strengthen the paper. The analysis was performed via iterative manual inspection of the publicly available TypeScript source, beginning with identification of the core model-tool loop and then examining surrounding subsystems (permissions, context compaction, extensibility, delegation, and storage) for recurring patterns. Values were mapped based on alignment between implementation choices and documented design goals in code comments, error handling, and user-facing safeguards. To address the concern, we will add a dedicated 'Analysis Methodology' subsection that explicitly describes this process, the decision criteria for extracting the thirteen principles, and the rationale for linking them to the five values. This will make the interpretive steps reproducible in principle while preserving the exploratory nature of the study. revision: yes
-
Referee: [Comparison with OpenClaw] The comparison section asserts that deployment context produces different architectural answers (e.g., per-action safety classification vs. perimeter control), but provides no systematic evaluation framework or metrics to substantiate that the differences are attributable to context rather than author selection.
Authors: The OpenClaw section is presented as an illustrative contrast to demonstrate how identical design questions receive different answers under different deployment constraints, rather than as a controlled empirical comparison. We do not claim statistical attribution or provide quantitative metrics because the intent is to surface concrete trade-offs for system designers. We will revise the section to (a) explicitly label it as an illustrative case study, (b) add a side-by-side table summarizing the six recurring design questions and their resolutions in each system, and (c) include a short limitations paragraph acknowledging that observed differences may also reflect project scope, developer priorities, and implementation timelines in addition to deployment context. revision: partial
Circularity Check
No circularity: purely descriptive mapping of external code artifacts
full rationale
The paper's central activity is manual inspection of publicly available TypeScript source code for Claude Code, followed by interpretive labeling of observed mechanisms with five human values and thirteen design principles. No equations, fitted parameters, predictions, or first-principles derivations exist. The claimed tracing from values to principles to implementations is performed by author inspection rather than by any self-referential reduction or self-citation chain that would make the output equivalent to the input by construction. External comparison with OpenClaw is likewise descriptive. Per the evaluation rules, this is self-contained interpretive analysis with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Publicly available TypeScript source code accurately and completely represents the deployed system's design decisions and motivations.
Forward citations
Cited by 3 Pith papers
-
Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours
An agentic red teaming system automates creation of adversarial testing workflows from natural language goals, unifying ML and generative AI attacks and achieving 85% success rate on Meta Llama Scout with no custom hu...
-
HARBOR: Automated Harness Optimization
HARBOR formalizes harness optimization as constrained noisy Bayesian optimization over mixed-variable spaces and reports a case study where it outperforms manual tuning on a production coding agent.
-
Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification
DEMM defines four executable evidence-sufficiency categories plus a conflicting category for agentic AI decisions and rolls per-property verdicts into a five-level maturity rubric.
Reference graph
Works this paper leans on
-
[1]
Anthropic PBC, no
Bartz v. Anthropic PBC, no. 3:24-cv-05417-WHA. U.S. District Court for the Northern District of California, Order on Motion for Summary Judgment (June 23, 2025), Alsup, J. Court docket:https://www.courtlistener.com/docket/ 69058235/bartz-v-anthropic-pbc/,
2025
-
[2]
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, et al. Do as i can, not as i say: Grounding language in robotic affordances.arXiv preprint arXiv:2204.01691,
work page internal anchor Pith review arXiv
-
[3]
Aizierjiang Aiersilan. The vibe-check protocol: Quantifying cognitive offloading in ai programming.arXiv preprint arXiv:2601.02410,
-
[4]
InversePrompt: Turning claude against itself, one prompt at a time
Elad Beber. InversePrompt: Turning claude against itself, one prompt at a time. https://cymulate.com/blog/ cve-2025-547954-54795-claude-inverseprompt/,
2025
-
[5]
CVE-2025-54794, CVE-2025-54795; updated April 6,
2025
-
[6]
Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein. Measuring the impact of early-2025 ai on experienced open-source developer productivity.arXiv preprint arXiv:2507.09089,
-
[7]
International ai safety report 2026.arXiv preprint arXiv:2602.21012, 2026
Yoshua Bengio, Stephen Clare, Carina Prunkl, Maksym Andriushchenko, Ben Bucknall, Malcolm Murray, Rishi Bommasani, Stephen Casper, Tom Davidson, Raymond Douglas, et al. International ai safety report 2026.arXiv preprint arXiv:2602.21012,
-
[8]
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Johan Bjorck, Fernando Castañeda, Nikita Cherniadev, Xingye Da, Runyu Ding, Linxi Fan, Yu Fang, Dieter Fox, Fengyuan Hu, Spencer Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots.arXiv preprint arXiv:2503.14734,
work page internal anchor Pith review arXiv
-
[9]
Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Lachy Groom, Karol Hausman, Brian Ichter, et al.π0: A vision-language-action flow model for general robot control.arXiv preprint arXiv:2410.24164,
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
38 Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, et al. Rt-2: Vision-language-action models transfer web knowledge to robotic control, 2023.URL https://arxiv. org/abs/2307.15818, 1:2,
work page internal anchor Pith review arXiv 2023
-
[11]
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multi-agent llm systems fail?arXiv preprint arXiv:2503.13657,
work page internal anchor Pith review arXiv
-
[12]
Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde De Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374,
work page internal anchor Pith review Pith/arXiv arXiv
-
[13]
Need help? designing proactive ai assistants for programming
Valerie Chen, Alan Zhu, Sebastian Zhao, Hussein Mozannar, David Sontag, and Ameet Talwalkar. Need help? designing proactive ai assistants for programming. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1–18,
2025
-
[14]
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready ai agents with scalable long-term memory.arXiv preprint arXiv:2504.19413,
work page internal anchor Pith review arXiv
-
[15]
Caught in the hook: RCE and API token ex- filtration through Claude Code project files
Aviv Donenfeld and Oded Vanunu. Caught in the hook: RCE and API token ex- filtration through Claude Code project files. https://research.checkpoint.com/2026/ rce-and-api-token-exfiltration-through-claude-code-project-files-cve-2025-59536/ ,
2026
-
[16]
Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch
CVE-2025-59536 (CVSS 8.7), CVE-2026-21852 (CVSS 5.3). Yilun Du, Shuang Li, Antonio Torralba, Joshua B Tenenbaum, and Igor Mordatch. Improving factuality and reasoning in language models through multiagent debate. InForty-first international conference on machine learning,
2025
-
[17]
Paul Gauthier. Aider: AI pair programming in your terminal, 2024.https://github.com/Aider-AI/aider. Open-source software,https://aider.chat. 39 Juraj Gottweis, Wei-Hung Weng, Alexander Daryin, Tao Tu, Anil Palepu, Petar Sirkovic, Artiom Myaskovsky, Felix Weissenberger, Keran Rong, Ryutaro Tanno, et al. Towards an ai co-scientist.arXiv preprint arXiv:2502.18864,
work page internal anchor Pith review arXiv 2024
-
[18]
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xian- gliang Zhang. Large language model based multi-agents: A survey of progress and challenges.arXiv preprint arXiv:2402.01680,
work page internal anchor Pith review arXiv
-
[19]
Hao He, Courtney Miller, Shyam Agarwal, Christian Kästner, and Bogdan Vasilescu. Speed at the cost of quality: How cursor ai increases short-term velocity and long-term complexity in open-source projects.arXiv preprint arXiv:2511.04427,
-
[20]
arXiv preprint arXiv:2408.08435 , year=
Shengran Hu, Cong Lu, and Jeff Clune. Automated design of agentic systems.arXiv preprint arXiv:2408.08435,
-
[21]
Memory in the Age of AI Agents
Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, et al. Memory in the age of ai agents.arXiv preprint arXiv:2512.13564,
work page internal anchor Pith review arXiv
-
[22]
Wei-Chieh Huang, Weizhi Zhang, Yueqing Liang, Yuanchen Bei, Yankai Chen, Tao Feng, Xinyu Pan, Zhen Tan, Yu Wang, Tianxin Wei, et al. Rethinking memory mechanisms of foundation agents in the second half.arXiv preprint arXiv:2602.06052,
-
[23]
Agents.https://huyenchip.com/2025/01/07/agents.html,
Chip Huyen. Agents.https://huyenchip.com/2025/01/07/agents.html,
2025
-
[24]
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. Swe-bench: Can language models resolve real-world github issues?arXiv preprint arXiv:2310.06770,
work page internal anchor Pith review Pith/arXiv arXiv
-
[25]
Sayash Kapoor, Benedikt Stroebl, Zachary S Siegel, Nitya Nadgir, and Arvind Narayanan. Ai agents that matter. arXiv preprint arXiv:2407.01502,
-
[26]
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
November 2023; popularizes the LLM-as-OS framing. Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T Joshi, Hanna Moazam, et al. Dspy: Compiling declarative language model calls into self-improving pipelines.arXiv preprint arXiv:2310.03714,
work page internal anchor Pith review arXiv 2023
-
[27]
Nataliya Kosmyna, Eugene Hauptmann, Ye Tong Yuan, Jessica Situ, Xian-Hao Liao, Ashly Vivian Beresnitzky, Iris Braunstein, and Pattie Maes. Your brain on chatgpt: Accumulation of cognitive debt when using an ai assistant for essay writing task.arXiv preprint arXiv:2506.08872, 4,
-
[28]
LangGraph: Build resilient language agents as graphs, 2024.https://github.com/langchain-ai/ langgraph
LangChain, Inc. LangGraph: Build resilient language agents as graphs, 2024.https://github.com/langchain-ai/ langgraph. GitHub repository. Geonsun Lee, Min Xia, Nels Numan, Xun Qian, David Li, Yanhe Chen, Achin Kulshrestha, Ishan Chatterjee, Yinda Zhang, Dinesh Manocha, et al. Sensible agent: A framework for unobtrusive interaction with proactive ar agents...
2024
-
[29]
Encouraging divergent thinking in large language models through multi-agent debate
Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Shuming Shi, and Zhaopeng Tu. Encouraging divergent thinking in large language models through multi-agent debate. InProceedings of the 2024 conference on empirical methods in natural language processing, pages 17889–17904,
2024
-
[30]
Proactive conversational agents with inner thoughts
Xingyu Bruce Liu, Shitao Fang, Weiyan Shi, Chien-Sheng Wu, Takeo Igarashi, and Xiang’Anthony’ Chen. Proactive conversational agents with inner thoughts. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1–19,
2025
-
[31]
Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild
Yue Liu, Ratnadira Widyasari, Yanjie Zhao, Ivana Clairine Irsan, and David Lo. Debt behind the ai boom: A large-scale empirical study of ai-generated code in the wild.arXiv preprint arXiv:2603.28592,
work page internal anchor Pith review Pith/arXiv arXiv
-
[32]
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist: Towards fully automated open-ended scientific discovery.arXiv preprint arXiv:2408.06292,
work page internal anchor Pith review arXiv
-
[33]
Agent design patterns.https://rlancemartin.github.io/2026/01/09/agent_design/,
Lance Martin. Agent design patterns.https://rlancemartin.github.io/2026/01/09/agent_design/,
2026
-
[34]
Luca Nannini, Adam Leon Smith, Michele Joshua Maggini, Enrico Panai, Sandra Feliciano, Aleksandr Tiulkanov, Elena Maran, James Gealy, and Piercosma Bisconti. Ai agents under eu law.arXiv preprint arXiv:2604.04604,
work page internal anchor Pith review Pith/arXiv arXiv
-
[35]
AlphaEvolve: A coding agent for scientific and algorithmic discovery
Alexander Novikov, Ngân V˜ u, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco JR Ruiz, Abbas Mehrabian, et al. Alphaevolve: A coding agent for scientific and algorithmic discovery.arXiv preprint arXiv:2506.13131,
work page internal anchor Pith review arXiv
-
[36]
Beyond reactivity: Measuring proactive problem solving in llm agents
Gil Pasternak, Dheeraj Rajagopal, Julia White, Dhruv Atreja, Matthew Thomas, George Hurn-Maloney, and Ash Lewis. Beyond reactivity: Measuring proactive problem solving in llm agents.arXiv preprint arXiv:2510.19771,
- [37]
-
[38]
Do users write more insecure code with ai assistants? InProceedings of the 2023 ACM SIGSAC conference on computer and communications security, pages 2785–2799,
Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh. Do users write more insecure code with ai assistants? InProceedings of the 2023 ACM SIGSAC conference on computer and communications security, pages 2785–2799,
2023
-
[39]
Assistance or disruption? exploring and evaluating the design and trade-offs of proactive ai programming support
Kevin Pu, Daniel Lazaro, Ian Arawjo, Haijun Xia, Ziang Xiao, Tovi Grossman, and Yan Chen. Assistance or disruption? exploring and evaluating the design and trade-offs of proactive ai programming support. InProceedings of the 2025 CHI conference on human factors in computing systems, pages 1–21,
2025
-
[40]
How to stay ahead of AI as an early-career engineer.IEEE Spectrum, 2025.https://spectrum.ieee
Gwendolyn Rak. How to stay ahead of AI as an early-career engineer.IEEE Spectrum, 2025.https://spectrum.ieee. org/ai-effect-entry-level-jobs. Charles Reis and Steven D Gribble. Isolating web programs in modern browser architectures. InProceedings of the 4th ACM European conference on Computer systems, pages 219–232,
2025
-
[41]
How ai impacts skill formation.arXiv preprint arXiv:2601.20245,
Judy Hanwen Shen and Alex Tamkin. How ai impacts skill formation.arXiv preprint arXiv:2601.20245,
-
[42]
The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems
Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, Yawen Duan, Mick Yang, A Pinar Ozisik, Stephen Casper, and Noam Kolt. The 2025 ai agent index: Documenting technical and safety features of deployed agentic ai systems. arXiv preprint arXiv:2602.17753,
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[43]
Developer Productivity With and Without GitHub Copilot: A Longitudinal Mixed-Methods Case Study
Open-source multi-channel AI assistant gateway. MIT License. Viktoria Stray, Elias Goldmann Brandtzæg, Viggo Tellefsen Wivestad, Astri Barbala, and Nils Brede Moe. Devel- oper productivity with and without github copilot: A longitudinal mixed-methods case study.arXiv preprint arXiv:2509.20353,
work page internal anchor Pith review Pith/arXiv arXiv
-
[44]
Yifan Sui, Han Zhao, Rui Ma, Zhiyuan He, Hao Wang, Jianxun Li, and Yuqing Yang. Act while thinking: Accelerating llm agents via pattern-aware speculative tool execution.arXiv preprint arXiv:2603.18897,
-
[45]
Training proactive and personalized llm agents.arXiv preprint arXiv:2511.02208, 2025
Weiwei Sun, Xuhui Zhou, Weihua Du, Xingyao Wang, Sean Welleck, Graham Neubig, Maarten Sap, and Yiming Yang. Training proactive and personalized llm agents.arXiv preprint arXiv:2511.02208,
-
[46]
com/atlas/ai-infrastructure-roadmap-five-frontiers-for-2026,
Bessemer Venture Partners,https://www.bvp. com/atlas/ai-infrastructure-roadmap-five-frontiers-for-2026,
2026
-
[47]
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models.arXiv preprint arXiv:2305.16291,
work page internal anchor Pith review Pith/arXiv arXiv
-
[48]
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Xingyao Wang, Boxuan Li, Yufan Song, Frank F Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, et al. Openhands: An open platform for ai software developers as generalist agents.arXiv preprint arXiv:2407.16741, 2024b. Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, and Graham Neubig. Agent workflow memory.arXiv preprint arXiv...
work page internal anchor Pith review arXiv 2023
- [49]
-
[50]
Ai agent systems: Architectures, applications, and evaluation,
Bin Xu. Ai agent systems: Architectures, applications, and evaluation.arXiv preprint arXiv:2601.01743,
-
[51]
A-MEM: Agentic Memory for LLM Agents
Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for llm agents.arXiv preprint arXiv:2502.12110,
work page internal anchor Pith review arXiv
-
[52]
Shunyu Yao, Noah Shinn, Pedram Razavi, and Karthik Narasimhan.τ-bench: A benchmark for tool-agent-user interaction in real-world domains.arXiv preprint arXiv:2406.12045,
work page internal anchor Pith review arXiv
-
[53]
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, et al. Agentic context engineering: Evolving contexts for self-improving language models.arXiv preprint arXiv:2510.04618, 2025a. Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Jieming Zhu, Zhenhua Dong, an...
work page internal anchor Pith review arXiv
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.