Scaling Reproducibility: An AI-Assisted Workflow for Large-Scale Replication and Reanalysis
read the original abstract
Computational reproducibility is central to scientific credibility, yet verifying published results at scale remains costly. We develop an AI-assisted workflow for automated full-paper replication -- retrieving materials, reconstructing environments, executing code, and matching outputs to point estimates reported in regression tables. We define a universe of all empirical and quantitative papers from the three top political science journals (2010--2025) and measure stated data availability using automated extraction. For a stratified sample of 384 studies, we apply the workflow to conduct full-paper replication, totaling 3,523 empirical models. We find that journal verification requirements, combined with data archiving mandates, drive reproducibility: the share of fully or largely reproducible papers rises from 20.8% before DA-RT adoption to 82.5% after, and conditional on accessible replication packages, 92.1% of papers are fully or largely reproducible (234/254). As a secondary application, we apply standardized IV diagnostics to 84 studies (597 IV specifications among 1,910 replicated models), illustrating how automated execution enables systematic reanalysis across heterogeneous empirical settings.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
Can Coding Agents Reproduce Findings in Computational Materials Science?
AutoMat benchmark shows current LLM coding agents achieve at most 54.1% success when reproducing computational materials science claims from papers.
-
Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results
LLM agents can reproduce many social science results from paper descriptions and data alone, though performance varies and failures trace to both agents and paper underspecification.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.