Scaling Reproducibility: An AI-Assisted Workflow for Large-Scale Replication and Reanalysis

Leo Yang Yang; Yiqing Xu

arxiv: 2602.16733 · v3 · pith:UHCFNC5Unew · submitted 2026-02-17 · econ.EM · stat.ME

keywords: replication, automated, empirical, reproducibility, workflow, ai-assisted, apply, data

Computational reproducibility is central to scientific credibility, yet verifying published results at scale remains costly. We develop an AI-assisted workflow for automated full-paper replication -- retrieving materials, reconstructing environments, executing code, and matching outputs to point estimates reported in regression tables. We define a universe of all empirical and quantitative papers from the three top political science journals (2010--2025) and measure stated data availability using automated extraction. For a stratified sample of 384 studies, we apply the workflow to conduct full-paper replication, totaling 3,523 empirical models. We find that journal verification requirements, combined with data archiving mandates, drive reproducibility: the share of fully or largely reproducible papers rises from 20.8% before DA-RT adoption to 82.5% after, and conditional on accessible replication packages, 92.1% of papers are fully or largely reproducible (234/254). As a secondary application, we apply standardized IV diagnostics to 84 studies (597 IV specifications among 1,910 replicated models), illustrating how automated execution enables systematic reanalysis across heterogeneous empirical settings.
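
The output-matching step and the headline share can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`matches_reported`, `paper_match_rate`, `share`) and the half-unit rounding tolerance are assumptions introduced here for illustration. The idea is to treat a replicated point estimate as matching when it agrees with the published value at the precision printed in the regression table.

```python
def decimals_in(reported: str) -> int:
    """Number of digits after the decimal point in a reported estimate."""
    return len(reported.split(".")[1]) if "." in reported else 0


def matches_reported(reported: str, replicated: float) -> bool:
    """Hypothetical matching rule: the replicated estimate must lie within
    half a unit of the last printed digit of the reported estimate."""
    tolerance = 0.5 * 10 ** (-decimals_in(reported))
    # Small slack guards against floating-point noise at the boundary.
    return abs(float(reported) - replicated) <= tolerance + 1e-12


def paper_match_rate(pairs: list[tuple[str, float]]) -> float:
    """Fraction of (reported, replicated) estimate pairs that match."""
    if not pairs:
        return 0.0
    return sum(matches_reported(r, x) for r, x in pairs) / len(pairs)


def share(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Illustrative pairs only; these values are made up, not taken from the paper.
example_pairs = [("0.123", 0.12349), ("-1.50", -1.5004), ("0.123", 0.1242)]
example_rate = paper_match_rate(example_pairs)  # two of three pairs match

# The abstract's conditional share: 234 of 254 papers with accessible packages.
conditional_share = share(234, 254)  # 92.1
```

Matching on the printed precision rather than a fixed absolute tolerance keeps the rule comparable across tables that report two, three, or four decimals. How the paper itself maps match rates to "fully" or "largely" reproducible is not stated in the abstract, so no threshold is assumed here.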

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Can Coding Agents Reproduce Findings in Computational Materials Science?
cs.SE 2026-05 conditional novelty 8.0

AutoMat benchmark shows current LLM coding agents achieve at most 54.1% success when reproducing computational materials science claims from papers.

Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results
cs.AI 2026-04 conditional novelty 6.0

LLM agents can reproduce many social science results from paper descriptions and data alone, though performance varies and failures trace to both agents and paper underspecification.