Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

· 2026 · cs.SE · arXiv 2604.14228

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

open full Pith review browse 9 citing papers arXiv PDF

abstract

Claude Code is an agentic coding tool that can run shell commands, edit files, and call external services on behalf of the user. This study describes its architecture by analyzing the publicly available source code and comparing it with two independent open-source AI agent systems, OpenClaw and Hermes Agent, that answer many of similar or even the same design questions. Our analysis identifies five human values, philosophies, and needs that motivate the architecture: human decision authority, safety, security, and privacy, reliable execution, capability amplification, and contextual adaptability. We then trace them through thirteen design principles to implementation choices. The core of the system is a simple while-loop that calls the model, runs tools, and repeats. Most of the code, however, lives in the systems around this loop: a permission system with seven modes and an ML-based classifier, a five-layer compaction pipeline for context management, four extensibility mechanisms (MCP, plugins, skills, and hooks), a subagent delegation and orchestration mechanism, and append-oriented session storage. Comparisons with OpenClaw and Hermes Agent show that the same design questions produce different answers across three deployment contexts. Claude Code emphasizes per-action safety, OpenClaw emphasizes perimeter-level access, and Hermes renders per-action approvals across many surfaces. At the runtime layer, Claude Code uses a single CLI loop, OpenClaw embeds the runtime within a gateway control plane, and Hermes uses one process whose role is set by its entry point. At the context and extension layer, Claude Code extends the context window, OpenClaw registers gateway-wide capabilities, and Hermes provides pluggable memory and model backends. We finally identify six open design directions for future agent systems, grounded in recent empirical, architectural, and policy literature.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Self-Harness: Harnesses That Improve Themselves

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

Self-Harness lets LLM agents autonomously refine their interaction harnesses through weakness mining, proposal generation, and validation, raising held-out pass rates on Terminal-Bench-2.0 from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1% across three models.

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

cs.AI · 2026-05-29 · unverdicted · novelty 7.0

Proposes the Intelligent Computing Architecture (ICA) as a six-layer framework with dual probabilistic-deterministic planes and three Amdahl-style heuristics to unify design of LLM-based systems.

When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA

cs.RO · 2026-06-07 · unverdicted · novelty 6.0

Closed-Loop Trace Distillation distills one-line natural-language prompts from labeled training traces to improve VLM accuracy on predicting minimal-success action chains in Exploratory Manipulation Trace QA by 0.38-0.47 across simulator and real-robot tasks.

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

An agentic red teaming system automates creation of adversarial testing workflows from natural language goals, unifying ML and generative AI attacks and achieving 85% success rate on Meta Llama Scout with no custom human code.

HARBOR: Automated Harness Optimization

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

HARBOR formalizes harness optimization as constrained noisy Bayesian optimization over mixed-variable spaces and reports a case study where it outperforms manual tuning on a production coding agent.

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

cs.CR · 2026-05-31 · unverdicted · novelty 5.0

BraveGuard trains guard models on realistic agent trajectories derived from open-world threats, raising detection accuracy on AgentHazard from 38.79% to 82.38%.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification

cs.CY · 2026-04-29 · unverdicted · novelty 4.0

DEMM defines four executable evidence-sufficiency categories plus a conflicting category for agentic AI decisions and rolls per-property verdicts into a five-level maturity rubric.

"Skill issues'': data-centric optimization of lakehouse agents

cs.AI · 2026-05-31 · unverdicted · novelty 3.0

Data-centric optimization of skills for agents on a branching lakehouse improves accuracy by 31.9% on 25 tasks via state-verification evaluation.

citing papers explorer

Showing 1 of 1 citing paper after filters.

When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA cs.RO · 2026-06-07 · unverdicted · none · ref 21 · internal anchor
Closed-Loop Trace Distillation distills one-line natural-language prompts from labeled training traces to improve VLM accuracy on predicting minimal-success action chains in Exploratory Manipulation Trace QA by 0.38-0.47 across simulator and real-robot tasks.

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer