The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents

2025 · cs.SE · arXiv 2511.03690

10 Pith papers cite this work. Polarity classification is still being indexed.

abstract

Agents are now used widely in the process of software development, but building production-ready software engineering agents is a complex task. Deploying software agents effectively requires flexibility in implementation and experimentation, reliable and secure execution, and interfaces for users to interact with agents. In this paper, we present the OpenHands Software Agent SDK, a toolkit for implementing software development agents that satisfy these desiderata. This toolkit is a complete architectural redesign of the agent components of the popular OpenHands framework for software development agents. To achieve flexibility, we design a simple interface for implementing agents that requires only a few lines of code in the default case, but is easily extensible to more complex full-featured agents with features such as custom tools, memory management, and more. For security and reliability, it delivers seamless local-to-remote execution portability and integrated REST/WebSocket services. For interaction with human users, it can connect directly to a variety of interfaces, such as visual workspaces (VSCode, VNC, browser), command-line interfaces, and APIs. Compared with existing SDKs from OpenAI, Claude and Google, OpenHands uniquely integrates native sandboxed execution, lifecycle control, model-agnostic multi-LLM routing, and built-in security analysis. We validate the architecture empirically: production deployment data shows that V1 substantially reduces system-attributable failures over V0 with negligible event-sourcing overhead, and evaluations across multiple models and benchmarks demonstrate strong agent performance. Put together, these elements allow the OpenHands Software Agent SDK to provide a practical foundation for prototyping, unlocking new classes of custom applications, and reliably deploying agents at scale.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

DOLORES, an agent using a formal language for meta-reasoning to construct adaptive scaffolds on the fly, outperforms prior scaffolding methods by 24.8% on average across four hard benchmarks and multiple model sizes.

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

cs.CR · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

SkCC introduces a typed intermediate representation and compiler pipeline to make LLM agent skills portable across frameworks and enforce security constraints before deployment.

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

cs.SE · 2026-05-26 · unverdicted · novelty 6.0

RAMP evaluates 15 models on production-like serial workflows and reports completion rates collapsing from 100% to 20% with none finishing the full pipeline and costs varying by three orders of magnitude.

Shepherd: Enabling Programmable Meta-Agents via Reversible Agentic Execution Traces

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Shepherd provides a reversible execution trace substrate for LLM agents that enables meta-agents to inspect and transform runs, yielding reported gains on coding and terminal benchmarks via supervision, counterfactual repair, and RL credit assignment.

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

cs.CL · 2026-02-10 · conditional · novelty 6.0

EcoGym is a new open benchmark with three economic environments that reveals no leading LLM dominates at sustained plan-and-execute decision making across scenarios.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development

cs.SE · 2026-05-19 · unverdicted · novelty 4.0

Agentic Agile-V uses Agile-V as backbone and a Specify-Constrain-Orchestrate-Prove-Evolve-Verify loop to convert AI agent conversations into traceable engineering artifacts with acceptance evidence.

Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models

cs.CL · 2026-04-22 · unverdicted · novelty 4.0

A 3B model with few-shot prompting reaches 79.7% of GPT-5 tool-use performance while a hypernetwork adaptation adds zero measurable benefit across four benchmarks.

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

cs.AI · 2026-04-20

citing papers explorer

Showing 10 of 10 citing papers.

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

cs.CL · 2026-05-12 · unverdicted · none · ref 62 · internal anchor

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

cs.CR · 2026-05-05 · unverdicted · none · ref 45 · 2 links · internal anchor

Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems

cs.SE · 2026-05-26 · unverdicted · none · ref 25 · internal anchor

Shepherd: Enabling Programmable Meta-Agents via Reversible Agentic Execution Traces

cs.AI · 2026-05-11 · unverdicted · none · ref 39 · 2 links · internal anchor

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

cs.CL · 2026-02-10 · conditional · none · ref 36 · internal anchor

Code as Agent Harness

cs.CL · 2026-05-18 · accept · none · ref 254 · internal anchor

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · none · ref 149 · internal anchor

Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development

cs.SE · 2026-05-19 · unverdicted · none · ref 23 · internal anchor

Meta-Tool: Efficient Few-Shot Tool Adaptation for Small Language Models

cs.CL · 2026-04-22 · unverdicted · none · ref 50 · internal anchor

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

cs.AI · 2026-04-20 · unreviewed · ref 97 · internal anchor
