Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance
The rapid adoption of AI-powered coding assistants is transforming software development practices, yet systematic comparisons of their effectiveness across task types and over time remain limited. This paper presents an empirical study of five popular agents (OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code), analyzing 7,156 pull requests (PRs) from the AIDev dataset. Temporal trend analysis reveals heterogeneous evolution patterns: Devin exhibits the only consistent positive trend in acceptance rate (+0.77% per week over 32 weeks), whereas the other agents remain largely stable. Our analysis suggests that PR task type is a dominant factor in acceptance rates: documentation tasks achieve 82.1% acceptance versus 66.1% for new features, a 16-percentage-point gap that exceeds the typical variance between agents on most tasks. OpenAI Codex achieves consistently high acceptance rates across all nine task categories (59.6%-88.6%), and stratified Chi-square tests confirm statistically significant advantages over other agents in several categories. However, no single agent performs best across all task types: Claude Code leads in documentation (92.3%) and features (72.6%), while Cursor excels in fix tasks (80.4%).
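The weekly trend figures (e.g., Devin's +0.77% per week) imply a per-agent regression of acceptance rate on a week index. Here is a minimal sketch of that computation, assuming a hypothetical PR table with `agent`, `week`, and `merged` columns; the actual AIDev schema and the paper's analysis code may differ.

```python
# Hypothetical sketch of the temporal trend analysis: weekly acceptance
# rates regressed on a week index, per agent. The column names "agent",
# "week", and "merged" are illustrative assumptions, not the AIDev schema.
import pandas as pd
from scipy.stats import linregress

def weekly_acceptance_trend(prs: pd.DataFrame, agent: str):
    """Return (slope, p-value) for one agent's weekly acceptance-rate trend."""
    sub = prs[prs["agent"] == agent]
    weekly = sub.groupby("week")["merged"].mean()  # fraction of PRs merged each week
    fit = linregress(weekly.index.to_numpy(), weekly.to_numpy())
    # A slope of +0.0077 would correspond to the +0.77%/week reported for Devin.
    return fit.slope, fit.pvalue
```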
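The stratified Chi-square tests compare agents within a single task category rather than pooling all PRs. A sketch of one stratum's test, again under the same assumed column names:

```python
# A minimal sketch of one stratum of the stratified Chi-square tests:
# within a single task category, test whether merge outcome is independent
# of the agent. Column names are illustrative assumptions.
import pandas as pd
from scipy.stats import chi2_contingency

def agent_outcome_test(prs: pd.DataFrame, task_type: str):
    """Chi-square test on the agent x outcome contingency table for one task type."""
    sub = prs[prs["task_type"] == task_type]
    table = pd.crosstab(sub["agent"], sub["merged"])  # rows: agents; cols: merged True/False
    chi2, p, dof, _expected = chi2_contingency(table)
    return chi2, p, dof
```

Testing per category, rather than over all PRs at once, keeps an agent's task mix from confounding the comparison, which is presumably why the paper stratifies before testing.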
Forward citations
Cited by 2 Pith papers
- Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles
  AI coding agents are classified along a Collaborator-Assistant spectrum using an Initiator x Approver taxonomy on 29,585 PR lifecycles, revealing agent initiation in collaborator tools but near-universal human merge g...
- Collaborator or Assistant? How AI Coding Agents Partition Work Across Pull Request Lifecycles
  AI coding tools divide into collaborators that initiate most PRs and assistants that support human-led ones, yet humans retain merge authority across all five tools examined.