DAIRA integrates dynamic tracing into LLM agents to achieve 79.4% resolution rate on SWE-bench Verified for code defect repair.
Title resolution pending
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 7roles
background 2polarities
background 2representative citing papers
The first empirical study of test overfitting shows that auto-generated tests from issues can lead to code that passes observed tests but misses important cases or breaks functionality in SWE-bench issue resolution.
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
Empirical study of 3977 agent trajectories finds Python execution errors correlate with lower success rates on GitHub issues, flags challenging errors, and reports three confirmed bugs in the SWE-Bench platform.
Agentless, a basic three-phase LLM pipeline for bug localization, repair, and validation, outperforms complex open-source agents on SWE-bench Lite with 32% success rate at $0.70 cost.
Agent-generated tests mainly act as observational feedback channels and do not meaningfully improve issue resolution success in current LLM software engineering agents.
Auto-Diagnose applies LLMs to summarize and diagnose root causes of integration test failures, reporting 90.14% accuracy on 71 manual cases and positive adoption after Google-wide rollout.
citing papers explorer
-
Dynamic analysis enhances issue resolution
DAIRA integrates dynamic tracing into LLM agents to achieve 79.4% resolution rate on SWE-bench Verified for code defect repair.
-
Investigating Test Overfitting on SWE-bench
The first empirical study of test overfitting shows that auto-generated tests from issues can lead to code that passes observed tests but misses important cases or breaks functionality in SWE-bench issue resolution.
-
Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints
Agent-CoEvo is a multi-agent LLM framework that coevolves code patches and test patches to resolve repository-level issues, outperforming fixed-test baselines on SWE-bench Lite and SWT-bench Lite.
-
Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios
Empirical study of 3977 agent trajectories finds Python execution errors correlate with lower success rates on GitHub issues, flags challenging errors, and reports three confirmed bugs in the SWE-Bench platform.
-
Agentless: Demystifying LLM-based Software Engineering Agents
Agentless, a basic three-phase LLM pipeline for bug localization, repair, and validation, outperforms complex open-source agents on SWE-bench Lite with 32% success rate at $0.70 cost.
-
Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents
Agent-generated tests mainly act as observational feedback channels and do not meaningfully improve issue resolution success in current LLM software engineering agents.
-
LLM-Based Automated Diagnosis Of Integration Test Failures At Google
Auto-Diagnose applies LLMs to summarize and diagnose root causes of integration test failures, reporting 90.14% accuracy on 71 manual cases and positive adoption after Google-wide rollout.