Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

Daniel A. McFarland; Haley Lepp; Hancheng Cao; Haotian Ye; James Y. Zou; Lingjiao Chen; Sheng Liu; Weixin Liang; Xuandong Zhao; Yaohui Zhang

arxiv: 2403.07183 · v4 · pith:JLKT2QTNnew · submitted 2024-03-11 · 💻 cs.CL · cs.AI · cs.LG · cs.SI

Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou

classification 💻 cs.CL · cs.AI · cs.LG · cs.SI

keywords text, peer, reviews, approach, case, chatgpt, conferences, corpus

Abstract

We present an approach for estimating the fraction of text in a large corpus that is likely to have been substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews that report lower confidence, that were submitted close to the deadline, and that come from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text that may be too subtle to detect at the individual level, and discuss the implications of such trends for peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
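The idea of a corpus-level maximum likelihood estimate can be illustrated with a toy mixture model. The sketch below is not the authors' implementation. It assumes each document can be scored under a human-written reference distribution and an AI-generated reference distribution, and that the corpus is a mixture of the two with unknown weight `alpha`. The single binary feature, the rates `P_HUMAN` and `Q_AI`, and all function names are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy setup: each document is reduced to one binary feature
# (for example, whether it contains some marker word). Both rates below
# are made-up assumptions, not values from the paper.
P_HUMAN = 0.10  # assumed feature rate in human-written reference text
Q_AI = 0.60     # assumed feature rate in AI-generated reference text


def simulate_likelihoods(alpha, n_docs, seed=0):
    """Draw a synthetic corpus with AI fraction `alpha` and return, for each
    document, its likelihood under the human and the AI reference model."""
    rng = random.Random(seed)
    p_human, q_ai = [], []
    for _ in range(n_docs):
        rate = Q_AI if rng.random() < alpha else P_HUMAN
        has_feature = rng.random() < rate
        p_human.append(P_HUMAN if has_feature else 1 - P_HUMAN)
        q_ai.append(Q_AI if has_feature else 1 - Q_AI)
    return p_human, q_ai


def log_likelihood(alpha, p_human, q_ai):
    """Log-likelihood of the corpus under the mixture (1 - alpha) * P + alpha * Q."""
    return sum(math.log((1 - alpha) * p + alpha * q) for p, q in zip(p_human, q_ai))


def estimate_alpha(p_human, q_ai, tol=1e-6):
    """Maximize the log-likelihood over alpha in [0, 1] by ternary search.
    The objective is concave in alpha, so the search converges to the maximizer."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, p_human, q_ai) < log_likelihood(m2, p_human, q_ai):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


if __name__ == "__main__":
    p, q = simulate_likelihoods(alpha=0.15, n_docs=20000, seed=0)
    print(f"estimated alpha: {estimate_alpha(p, q):.3f}")
```

Note that the estimate never labels any individual document as AI-written; it only recovers the mixture weight for the corpus as a whole, which matches the abstract's point that corpus-level trends may be detectable even when individual cases are not.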

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PeerPrism: Peer Evaluation Expertise vs Review-writing AI
cs.CL 2026-04 unverdicted novelty 7.0

PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.

The Impact of AI-Generated Text on the Internet
cs.CY 2026-04 unverdicted novelty 7.0

By mid-2025 roughly 35% of new websites are AI-generated or AI-assisted, correlating with lower semantic diversity and higher positive sentiment but showing no significant drop in factual accuracy or stylistic diversity.

Detecting Verbatim LLM Copy-Paste in Homework
cs.CR 2026-05 unverdicted novelty 6.0

SteganoPrompt embeds a hidden instruction in assignment prompts via the Unicode Tags block so that LLMs add a detectable signature to responses when the prompt is pasted verbatim.

Rethinking Publication: A Certification Framework for AI-Enabled Research
cs.AI 2026-04 conditional novelty 6.0

The paper introduces a certification framework that grades AI research contributions into Categories A, B, and C based on pipeline reach at submission time and adds benchmark slots for fully automated work.

Rethinking Publication: A Certification Framework for AI-Enabled Research
cs.AI 2026-04 unverdicted novelty 6.0

A two-layer certification framework decouples knowledge validity from human authorship to accommodate AI-enabled research in existing publication systems.

AI Disclosure with DAISY
cs.HC 2026-04 conditional novelty 6.0

DAISY is a structured form tool that generates more complete AI disclosure statements for research papers without reducing author comfort levels.

Publish and Perish: How AI-Accelerated Writing Without Proportional Verification Investment Degrades Scientific Knowledge
physics.soc-ph 2026-04 unverdicted novelty 5.0

A minimal ODE model of AI adoption in writing and reviewing predicts a short-term knowledge peak followed by 40% long-term decline unless review acceleration exceeds writing acceleration.

Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews
cs.CY 2025-09 unverdicted novelty 5.0

Controlled prompt interventions reveal strong affiliation bias in LLM peer reviews favoring top-ranked institutions, plus effects from seniority and publication history.