Pith: machine review for the scientific record

arxiv: 2601.17581 · v3 · submitted 2026-01-24 · 💻 cs.SE · cs.AI


How AI Coding Agents Modify Code: A Large-Scale Study of GitHub Pull Requests

Daniel Ogenrwot, John Businge

keywords: agents, coding, commits, human, agentic, code, development, differ

AI coding agents are increasingly acting as autonomous contributors by generating and submitting pull requests (PRs). However, we lack empirical evidence on how these agent-generated PRs differ from human contributions, particularly in how they modify code and describe their changes. Understanding these differences is essential for assessing their reliability and impact on development workflows. Using the MSR 2026 Mining Challenge version of the AIDev dataset, we analyze 24,014 merged Agentic PRs (440,295 commits) and 5,081 merged Human PRs (23,242 commits). We examine additions, deletions, commits, and files touched, and evaluate the consistency between PR descriptions and their diffs using lexical and semantic similarity. Agentic PRs differ substantially from Human PRs in commit count (Cliff's $\delta = 0.5429$) and show moderate differences in files touched and deleted lines. They also exhibit slightly higher description-to-diff similarity across all measures. These findings provide a large-scale empirical characterization of how AI coding agents contribute to open source development.
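The headline comparison in the abstract rests on Cliff's delta, a non-parametric effect size: the probability that a value drawn from one group exceeds one from the other, minus the reverse probability. A minimal sketch of the computation follows; the per-PR commit counts are hypothetical illustrations, not values from the AIDev dataset.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y).

    Ranges from -1 to 1; 0 means complete overlap between groups,
    and |delta| around 0.5 (as reported for commit counts) is a
    large effect by the usual thresholds.
    """
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical per-PR commit counts for the two groups.
agentic_commits = [18, 22, 15, 30, 25]
human_commits = [4, 6, 3, 8, 5]

print(cliffs_delta(agentic_commits, human_commits))  # 1.0: complete separation
```

In this toy example every agentic count exceeds every human count, so delta is 1.0; the paper's reported delta of 0.5429 for commits indicates strong but not total separation between the distributions.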

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub

    cs.SE 2026-04 accept novelty 7.0

    AgenticFlict is a public dataset of 29K+ textual merge conflicts from AI agent PRs, collected via merge simulation on 107K processed PRs and showing a 27.67% conflict rate with variation across agents.

  2. AIRA: AI-Induced Risk Audit: A Structured Inspection Framework for AI-Generated Code

    cs.SE 2026-04 unverdicted novelty 5.0

AIRA is a 15-check audit framework that finds AI-generated code exhibits 1.8 times as many high-severity failure-untruthful patterns as human-written code in a matched replication study.