AI-assisted writing and the reorganization of scientific knowledge
Pith reviewed 2026-05-10 11:26 UTC · model grok-4.3
The pith
Post-2023 AI-assisted writing links to higher disruption but narrower knowledge recombination in science.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using approximately two million full-text articles published 2021-2024 and linked to citation networks, the study shows that before 2023 AI-assisted writing intensity was weakly or negatively associated with disruption, but after 2023 the association turns positive within-author and within-field. Over the same period the positive link to cross-field citation breadth weakens substantially and the negative association with citation concentration attenuates, so that the rise in disruption is not matched by broader knowledge sourcing.
What carries the argument
AI-assisted writing intensity, defined as the predicted share of text in a paper exhibiting features consistent with LLM-generated text, and its statistical associations with citation-based measures of disruption, cross-field breadth, and concentration.
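The paper does not publish its detector pipeline, but the intensity definition above can be made concrete with a minimal sketch: given per-sentence scores from some LLM-text detector, intensity is the token-weighted share of text whose score clears a threshold. The 0.5 cutoff, token weighting, and `ai_intensity` name are illustrative assumptions, not the study's documented procedure.

```python
# Hypothetical sketch: per-sentence detector scores -> paper-level
# "AI-assisted writing intensity" share. Threshold and token weighting
# are assumptions for illustration only.

def ai_intensity(sentences, scores, threshold=0.5):
    """Token-weighted share of text flagged as LLM-like.

    sentences: list of sentence strings
    scores: per-sentence probabilities from some LLM-text detector
    """
    total = sum(len(s.split()) for s in sentences)
    if total == 0:
        return 0.0
    flagged = sum(len(s.split())
                  for s, p in zip(sentences, scores) if p >= threshold)
    return flagged / total

# Toy example: the last two of three sentences exceed the threshold.
sents = ["Deep results follow.", "We delve into the data.", "Methods were standard."]
probs = [0.2, 0.9, 0.7]
share = ai_intensity(sents, probs)
```

Note that everything downstream in the paper conditions on this one scalar per paper, which is why the referee treats the classifier's validation as load-bearing.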
Load-bearing premise
The assumption that the predicted share of LLM-like text in a paper accurately reflects the actual intensity of generative AI assistance during writing, rather than capturing unrelated stylistic or field-specific patterns.
What would settle it
Direct measurement of actual AI tool usage through author surveys or editing logs in a sample of papers, followed by re-estimation of the post-2023 associations between measured usage and disruption or citation breadth to test whether the patterns persist.
Original abstract
Generative AI systems such as ChatGPT are increasingly used in scientific writing, yet their broader implications for the organization of scientific knowledge remain unclear. We examine whether AI-assisted writing intensity, measured as the share of text in a paper that is predicted to exhibit features consistent with LLM-generated text, is associated with scientific disruption and knowledge recombination. Using approximately two million full-text research articles published between 2021 and 2024 and linked to citation networks, we document a sharp temporal pattern beginning in 2023. Before 2023, higher AI-assisted writing intensity is weakly or negatively associated with disruption; after 2023, the association becomes positive in within-author, within-field analyses. Over the same period, the positive association between AI-assisted writing intensity and cross-field citation breadth weakens substantially, and the negative association with citation concentration attenuates. Thus, the post-2023 increase in disruption is not accompanied by broader knowledge sourcing. These patterns suggest that generative AI is associated with more disruptive citation structures without a corresponding expansion in cross-field recombination. Rather than simply broadening the search space of science, AI-assisted writing may be associated with new forms of recombination built from relatively narrower knowledge inputs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the relationship between AI-assisted writing intensity, proxied by the proportion of text exhibiting LLM-like features, and scientific disruption and knowledge recombination using a large corpus of approximately two million papers published 2021-2024. Employing fixed-effects regressions, it reports that after 2023, higher AI intensity is associated with greater disruption, while associations with cross-field citation breadth weaken and with citation concentration attenuate, implying that increased disruption occurs without broader knowledge sourcing.
Significance. Should the proxy measure prove valid, these results would indicate that generative AI is reshaping scientific output toward more disruptive but less interdisciplinary recombination patterns. This has potential implications for understanding AI's effects on scientific progress and could guide future research on AI tools in academia. The scale of the data and the use of within-author and within-field designs are notable strengths that help isolate the associations.
major comments (2)
- [Abstract] Abstract: the AI-assisted writing intensity measure is defined only as 'the share of text in a paper that is predicted to exhibit features consistent with LLM-generated text,' with no reported classifier accuracy, training data, validation metrics, false-positive rates by field, or robustness to alternative detectors. This is load-bearing for all post-2023 coefficient shifts, as any time-varying stylistic confounder captured by the detector would directly bias the before-after contrasts.
- [Abstract] Abstract (results paragraph): the within-author and within-field fixed-effects models are invoked to support the temporal change in associations, yet no details are given on exact specifications, additional controls, or tests for parallel trends pre-2023. Without these, it remains unclear whether the reported sign flip in the disruption association is robust to reasonable alternative modeling choices.
minor comments (2)
- The abstract could more explicitly note the exact sample construction (e.g., how full-text availability and citation linking were handled) to aid replicability.
- Consider adding a supplementary table of descriptive statistics for the AI-intensity variable by year and field.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving transparency and robustness, and we will revise the manuscript to address them directly.
point-by-point responses
Referee: [Abstract] Abstract: the AI-assisted writing intensity measure is defined only as 'the share of text in a paper that is predicted to exhibit features consistent with LLM-generated text,' with no reported classifier accuracy, training data, validation metrics, false-positive rates by field, or robustness to alternative detectors. This is load-bearing for all post-2023 coefficient shifts, as any time-varying stylistic confounder captured by the detector would directly bias the before-after contrasts.
Authors: We agree that full documentation of the AI detection classifier is essential for evaluating the proxy's validity and for interpreting the post-2023 coefficient changes. The current manuscript provides a high-level description but lacks the requested metrics and robustness checks. In the revision we will add a dedicated methods subsection (and appendix) reporting the training corpus, model performance (accuracy, precision, recall, F1, AUC), field-specific false-positive rates, and results from alternative detectors. We will also discuss potential time-varying stylistic confounders and how they are mitigated by the within-author and within-field designs. revision: yes
Referee: [Abstract] Abstract (results paragraph): the within-author and within-field fixed-effects models are invoked to support the temporal change in associations, yet no details are given on exact specifications, additional controls, or tests for parallel trends pre-2023. Without these, it remains unclear whether the reported sign flip in the disruption association is robust to reasonable alternative modeling choices.
Authors: We acknowledge that the abstract's space constraints left the econometric specifications underspecified. The full manuscript contains the core fixed-effects regressions, but we agree that explicit details and pre-trend diagnostics are needed. In the revision we will expand the methods and results sections to report the precise model equations, the full set of controls, clustering choices, and formal tests for parallel trends in the pre-2023 period. We will also present robustness checks using alternative specifications to confirm that the sign change in the disruption association is not sensitive to modeling decisions. revision: yes
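The within-author design at issue in this exchange can be sketched in a simplified form: demean the outcome and the AI-intensity measure within each author, then estimate separate slopes before and after 2023. This is a simplification of a full interaction specification with field fixed effects and controls; the function name, the toy panel, and the regression-through-the-origin step on demeaned pairs are all assumptions for illustration.

```python
# Simplified sketch of a within-author pre/post-2023 contrast.
# Not the paper's actual specification: no field effects, no controls,
# no clustering; slopes are fit through the origin on author-demeaned data.
from collections import defaultdict

def within_author_slopes(rows):
    """rows: list of (author, year, ai_intensity, disruption).
    Returns (slope_pre2023, slope_post2023) after author demeaning."""
    # Author means over the whole panel (the "within" transformation).
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for a, _, x, y in rows:
        s = sums[a]
        s[0] += x; s[1] += y; s[2] += 1
    means = {a: (s[0] / s[2], s[1] / s[2]) for a, s in sums.items()}

    def slope(pairs):
        # Least-squares slope through the origin on demeaned (x, y) pairs.
        sxx = sum(x * x for x, _ in pairs)
        sxy = sum(x * y for x, y in pairs)
        return sxy / sxx if sxx else 0.0

    pre, post = [], []
    for a, year, x, y in rows:
        mx, my = means[a]
        (post if year >= 2023 else pre).append((x - mx, y - my))
    return slope(pre), slope(post)

# Toy single-author panel straddling the 2023 break.
rows = [("A", 2021, 0.0, 1.0), ("A", 2022, 0.2, 0.8),
        ("A", 2023, 0.4, 1.2), ("A", 2024, 0.6, 1.6)]
pre_slope, post_slope = within_author_slopes(rows)
```

A pre-trend diagnostic of the kind the referee requests would extend this by estimating a separate slope per pre-2023 year and testing their equality.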
Circularity Check
No significant circularity in empirical associations
full rationale
The paper's core claims rest on observational regressions linking a text-derived AI-intensity measure (share of LLM-like features) to independent citation-network outcomes (disruption index, cross-field breadth, concentration). These variables are constructed from distinct data sources (full-text content versus the citation graph), and the reported pre/post-2023 shifts are estimated via within-author and within-field fixed effects. No equations, self-definitions, or self-citation chains reduce the associations to fitted parameters or tautologies by construction. The measurement of AI assistance is a predictive classifier output, but the paper presents statistical patterns rather than derivations that presuppose their own results. This is a standard empirical study whose central findings are not true by construction: the citation-based outcomes do not depend on the text-based measure used to explain them.
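The disruption index invoked above (the CD-style measure of Funk and Owen-Smith) has a simple set-based form: papers that cite the focal paper while ignoring its references push the score toward +1, papers that cite both pull it toward -1. A minimal sketch, with toy inputs that are not the study's data:

```python
# CD-style disruption index for one focal paper, following the
# Funk & Owen-Smith construction; inputs here are hypothetical sets.

def cd_index(citers_of_focal, citers_of_refs):
    """citers_of_focal: papers citing the focal paper.
    citers_of_refs: papers citing at least one of its references."""
    only_focal = citers_of_focal - citers_of_refs   # n_i: "disrupting" citers
    both = citers_of_focal & citers_of_refs         # n_j: "consolidating" citers
    only_refs = citers_of_refs - citers_of_focal    # n_k: citers bypassing the focal paper
    n = len(only_focal) + len(both) + len(only_refs)
    return (len(only_focal) - len(both)) / n if n else 0.0

# Two papers cite only the focal work, one cites both, one bypasses it.
score = cd_index({"p1", "p2", "p3"}, {"p3", "p4"})
```

The index is purely a function of the citation graph, which is the independence from the text-derived AI-intensity measure that the rationale relies on.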
Axiom & Free-Parameter Ledger
free parameters (1)
- LLM text classifier threshold or model parameters
axioms (2)
- domain assumption Text features can reliably indicate the presence and intensity of generative AI assistance in scientific writing
- domain assumption Citation network metrics validly capture scientific disruption and knowledge recombination
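The second axiom bundles two concrete metrics named elsewhere in the report: cross-field breadth as Shannon entropy over the fields a paper cites, and citation concentration as a Herfindahl-Hirschman index over the same field shares. A hedged sketch with illustrative field labels (the paper's exact field taxonomy and normalization are not given here):

```python
# Sketch of the two recombination measures: Shannon-entropy breadth and
# Herfindahl-Hirschman concentration over cited-field shares.
import math

def field_shares(cited_fields):
    """Share of citations going to each field."""
    if not cited_fields:
        return []
    counts = {}
    for f in cited_fields:
        counts[f] = counts.get(f, 0) + 1
    total = len(cited_fields)
    return [c / total for c in counts.values()]

def breadth_entropy(cited_fields):
    """Shannon entropy in bits; higher = broader knowledge sourcing."""
    return -sum(p * math.log2(p) for p in field_shares(cited_fields))

def concentration_hhi(cited_fields):
    """Sum of squared field shares; 1.0 = all citations in one field."""
    return sum(p * p for p in field_shares(cited_fields))
```

On this construction, the paper's post-2023 pattern would show up as entropy gains flattening while the HHI stops falling, even as the disruption index rises.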
Reference graph
Works this paper leans on
- [1] T. Bergstrom, D. Ruediger, A Third Transformation. Generative AI and Scholarly (2024)
- [2]
- [3] M. H. B. Salih, T. Gul, S. S. Bukhari, I. Liaqat, A. Majeed, AI-Enhanced Scholarly Communication: Transforming Peer Review, Knowledge Dissemination, and Academic Publishing Workflows in the Digital Era (2025)
- [4] A. Matarazzo, R. Torlone, A survey on large language models with some insights on their capabilities and limitations. arXiv preprint arXiv:2501.04040 (2025)
- [5] X. Tang, X. Duan, Z. Cai, Large language models for automated literature review: An evaluation of reference generation, abstract writing, and review composition. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 1602-1617 (2025)
- [6] J. E. Dittmar, Information technology and economic change: The impact of the printing press. The Quarterly Journal of Economics 126, 1133-1172 (2011)
- [7] M. McLuhan, The Gutenberg Galaxy (University of Toronto Press, 2011)
- [8] J. A. Evans, Electronic publication and the narrowing of science and scholarship. Science 321, 395-399 (2008)
- [9] R. J. Funk, J. Owen-Smith, A dynamic network measure of technological change. Management Science 63, 791-817 (2017)
- [10] L. Wu, D. Wang, J. A. Evans, Large teams develop and small teams disrupt science and technology. Nature 566, 378-382 (2019)
- [11] M. Park, E. Leahey, R. J. Funk, Papers and patents are becoming less disruptive over time. Nature 613, 138-144 (2023)
- [12] C. Leibel, L. Bornmann, What do we know about the disruption index in scientometrics? An overview of the literature. Scientometrics 129, 601-639 (2024)
- [13]
- [14] J. Bai, P. Perron, Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18, 1-22 (2003)
- [15] P. Azoulay, J. S. Graff Zivin, J. Wang, Superstar extinction. The Quarterly Journal of Economics 125, 549-589 (2010)
- [16] L. Bornmann, S. Devarakonda, A. Tekles, G. Chacko, Are disruption index indicators convergently valid? The comparison of several indicator variants with assessments by peers. Quantitative Science Studies 1, 1242-1259 (2020)
- [17] J. Lin, Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37, 145-151 (1991)
- [18] S. A. Rhoades, The Herfindahl-Hirschman index. Fed. Res. Bull. 79, 188 (1993)
- [19] J. M. Wooldridge, Introductory Econometrics: A Modern Approach (Nelson Education, 2015)
- [20] M. C. Lovell, A simple proof of the FWL theorem. The Journal of Economic Education 39, 88-91 (2008)
- [21] J. M. Wooldridge, Econometric Analysis of Cross Section and Panel Data (MIT Press, 2010)
- [22] J. Priem, H. Piwowar, R. Orr, OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv preprint arXiv:2205.01833 (2022)
- [23] Google LLC, BigQuery (2024)
- [24]