pith. machine review for the scientific record. sign in

arxiv: 2512.02393 · v3 · submitted 2025-12-02 · 💻 cs.SE · cs.AI· cs.CL

Recognition: unknown

Process-Centric Analysis of Agentic Software Systems

Authors on Pith no claims yet
classification 💻 cs.SE cs.AIcs.CL
keywords systemsagenticgraphectoryissuessoftwareagentsoftenstrategies
0
0 comments X
read the original abstract

Agentic systems are modern software systems: they consist of orchestrated modules, expose interfaces, and are deployed in software pipelines. Unlike conventional programs, their execution, i.e., trajectories, is inherently stochastic and adaptive to the problems they solve. Evaluation of such systems is often outcome-centric. This narrow focus overlooks detailed insights, failing to explain how agents reason, plan, act, or change their strategies. Inspired by the structured representation of conventional software systems as graphs, we introduce Graphectory to systematically encode the temporal and semantic relations in such systems. Using Graphectory, we automatically analyze 4000 trajectories of two dominant agentic programming workflows, SWE-agent and OpenHands, with four backbone Large Language Models (LLMs), attempting to resolve SWE-bench issues. Our automated analyses (completed within four minutes) reveal that: (1) agents using richer prompts or stronger LLMs exhibit more complex Graphectory, reflecting deeper exploration, broader context gathering, and more thorough validation; (2) agents' strategies vary with problem difficulty and the underlying LLM - for resolved issues, strategies often follow coherent localization-patching-validation steps, while unresolved ones exhibit chaotic or backtracking behaviors; and (3) even successful agentic systems often display inefficient processes. We also implement a novel technique for real-time construction and analysis of Graphectory and Langutory during agent execution to flag trajectory issues. Upon detecting such issues, the technique notifies the agent with a diagnostic message and, when applicable, rolls back the trajectory. Experiments show that online monitoring and interventions improve resolution rates by 6.9%-23.5% across models for problematic instances, while significantly shortening trajectories with near-zero overhead.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Evaluating Plan Compliance in Autonomous Programming Agents

    cs.SE 2026-04 unverdicted novelty 7.0

    Autonomous programming agents frequently fail to follow instructed plans, falling back on incomplete internalized workflows, while standard plans and periodic reminders improve performance but poor plans can degrade i...

  2. ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories

    cs.SE 2026-04 unverdicted novelty 7.0

    ReCodeAgent uses a multi-agent system to translate and validate large code repositories across multiple programming languages, achieving 60.8% higher test pass rates than prior neuro-symbolic and agentic methods on 11...

  3. AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits

    cs.SE 2026-04 conditional novelty 7.0

    AgentSZZ is an LLM-agent framework that identifies bug-inducing commits with up to 27.2% higher F1 scores than prior methods by enabling adaptive exploration and causal tracing, especially for cross-file and ghost commits.

  4. Beyond Resolution Rates: Behavioral Drivers of Coding Agent Success and Failure

    cs.SE 2026-04 accept novelty 7.0

    Large-scale trajectory analysis of 19 coding agents on 500 tasks finds that LLM choice drives outcomes more than framework design and that context-gathering plus validation behaviors improve success beyond task diffic...