Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.
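The abstract does not spell out the form of the estimator, so the following is only a minimal sketch of one generic way to estimate a corpus-level fraction by maximum likelihood, not the authors' method. It assumes each document has already been scored under two reference distributions (one fit to human-written text, one to AI-generated text) and treats the corpus as a two-component mixture with unknown weight `alpha`. The function name `estimate_alpha`, its parameters, and the toy likelihood values are all hypothetical.

```python
def estimate_alpha(p_human, p_ai, iters=200):
    """Estimate the mixture weight alpha in (1 - alpha) * P + alpha * Q.

    p_human[i]: likelihood of document i under the human reference distribution P.
    p_ai[i]:    likelihood of document i under the AI reference distribution Q.
    Assumes every likelihood is strictly positive (illustrative sketch only).
    Uses expectation-maximization with the two components held fixed.
    """
    if len(p_human) != len(p_ai) or not p_human:
        raise ValueError("need two non-empty likelihood lists of equal length")

    alpha = 0.5  # neutral starting point
    for _ in range(iters):
        # E-step: posterior probability that each document came from Q.
        resp = [
            alpha * q / (alpha * q + (1.0 - alpha) * p)
            for p, q in zip(p_human, p_ai)
        ]
        # M-step: the new weight is the mean posterior.
        alpha = sum(resp) / len(resp)
    return alpha


# Toy corpus (made-up numbers): a marker word appears with probability 0.1 in
# human text and 0.6 in AI text, and 200 of 1000 documents contain it.
p_human = [0.1] * 200 + [0.9] * 800
p_ai = [0.6] * 200 + [0.4] * 800

# Closed-form MLE for this toy case is (0.2 - 0.1) / (0.6 - 0.1) = 0.2.
print(round(estimate_alpha(p_human, p_ai), 3))
```

The estimate is a property of the corpus as a whole: no individual document is classified, which is consistent with the abstract's point that corpus-level trends may be too subtle to detect at the individual level.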
Forward citations
Cited by 8 Pith papers
- PeerPrism: Peer Evaluation Expertise vs Review-writing AI
  PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.
- The Impact of AI-Generated Text on the Internet
  By mid-2025 roughly 35% of new websites are AI-generated or AI-assisted, correlating with lower semantic diversity and higher positive sentiment but showing no significant drop in factual accuracy or stylistic diversity.
- Detecting Verbatim LLM Copy-Paste in Homework
  SteganoPrompt embeds a hidden instruction in assignment prompts via the Unicode Tags block so that LLMs add a detectable signature to responses when the prompt is pasted verbatim.
- Rethinking Publication: A Certification Framework for AI-Enabled Research
  The paper introduces a certification framework that grades AI research contributions into Categories A, B, and C based on pipeline reach at submission time and adds benchmark slots for fully automated work.
- Rethinking Publication: A Certification Framework for AI-Enabled Research
  A two-layer certification framework decouples knowledge validity from human authorship to accommodate AI-enabled research in existing publication systems.
- AI Disclosure with DAISY
  DAISY is a structured form tool that generates more complete AI disclosure statements for research papers without reducing author comfort levels.
- Publish and Perish: How AI-Accelerated Writing Without Proportional Verification Investment Degrades Scientific Knowledge
  A minimal ODE model of AI adoption in writing and reviewing predicts a short-term knowledge peak followed by 40% long-term decline unless review acceleration exceeds writing acceleration.
- Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews
  Controlled prompt interventions reveal strong affiliation bias in LLM peer reviews favoring top-ranked institutions, plus effects from seniority and publication history.