AI-assisted writing and the reorganization of scientific knowledge
Pith reviewed 2026-05-10 11:26 UTC · model grok-4.3
The pith
Post-2023 AI-assisted writing links to higher disruption but narrower knowledge recombination in science.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using approximately two million full-text articles published 2021-2024 and linked to citation networks, the study shows that before 2023 AI-assisted writing intensity was weakly or negatively associated with disruption, but after 2023 the association turns positive within-author and within-field. Over the same period the positive link to cross-field citation breadth weakens substantially and the negative association with citation concentration attenuates, so that the rise in disruption is not matched by broader knowledge sourcing.
What carries the argument
AI-assisted writing intensity, defined as the predicted share of text in a paper exhibiting features consistent with LLM-generated text, and its statistical associations with citation-based measures of disruption, cross-field breadth, and concentration.
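The paper does not publish its detector pipeline, but the intensity definition above can be made concrete with a minimal sketch: given per-sentence scores from some LLM-text detector, intensity is the token-weighted share of text whose score clears a threshold. The 0.5 cutoff, token weighting, and `ai_intensity` name are illustrative assumptions, not the study's documented procedure.

```python
# Hypothetical sketch: per-sentence detector scores -> paper-level
# "AI-assisted writing intensity" share. Threshold and token weighting
# are assumptions for illustration only.

def ai_intensity(sentences, scores, threshold=0.5):
    """Token-weighted share of text flagged as LLM-like.

    sentences: list of sentence strings
    scores: per-sentence probabilities from some LLM-text detector
    """
    total = sum(len(s.split()) for s in sentences)
    if total == 0:
        return 0.0
    flagged = sum(len(s.split())
                  for s, p in zip(sentences, scores) if p >= threshold)
    return flagged / total

# Toy example: the last two of three sentences exceed the threshold.
sents = ["Deep results follow.", "We delve into the data.", "Methods were standard."]
probs = [0.2, 0.9, 0.7]
share = ai_intensity(sents, probs)
```

Note that everything downstream in the paper conditions on this one scalar per paper, which is why the referee treats the classifier's validation as load-bearing.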
Load-bearing premise
The assumption that the predicted share of LLM-like text in a paper accurately reflects the actual intensity of generative AI assistance during writing, rather than capturing unrelated stylistic or field-specific patterns.
What would settle it
Direct measurement of actual AI tool usage through author surveys or editing logs in a sample of papers, followed by re-estimation of the post-2023 associations between measured usage and disruption or citation breadth to test whether the patterns persist.
Original abstract
Generative AI systems such as ChatGPT are increasingly used in scientific writing, yet their broader implications for the organization of scientific knowledge remain unclear. We examine whether AI-assisted writing intensity, measured as the share of text in a paper that is predicted to exhibit features consistent with LLM-generated text, is associated with scientific disruption and knowledge recombination. Using approximately two million full-text research articles published between 2021 and 2024 and linked to citation networks, we document a sharp temporal pattern beginning in 2023. Before 2023, higher AI-assisted writing intensity is weakly or negatively associated with disruption; after 2023, the association becomes positive in within-author, within-field analyses. Over the same period, the positive association between AI-assisted writing intensity and cross-field citation breadth weakens substantially, and the negative association with citation concentration attenuates. Thus, the post-2023 increase in disruption is not accompanied by broader knowledge sourcing. These patterns suggest that generative AI is associated with more disruptive citation structures without a corresponding expansion in cross-field recombination. Rather than simply broadening the search space of science, AI-assisted writing may be associated with new forms of recombination built from relatively narrower knowledge inputs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes the relationship between AI-assisted writing intensity, proxied by the proportion of text exhibiting LLM-like features, and scientific disruption and knowledge recombination using a large corpus of approximately two million papers published 2021-2024. Employing fixed-effects regressions, it reports that after 2023, higher AI intensity is associated with greater disruption, while associations with cross-field citation breadth weaken and with citation concentration attenuate, implying that increased disruption occurs without broader knowledge sourcing.
Significance. Should the proxy measure prove valid, these results would indicate that generative AI is reshaping scientific output toward more disruptive but less interdisciplinary recombination patterns. This has potential implications for understanding AI's effects on scientific progress and could guide future research on AI tools in academia. The scale of the data and the use of within-author and within-field designs are notable strengths that help isolate the associations.
major comments (2)
- [Abstract] Abstract: the AI-assisted writing intensity measure is defined only as 'the share of text in a paper that is predicted to exhibit features consistent with LLM-generated text,' with no reported classifier accuracy, training data, validation metrics, false-positive rates by field, or robustness to alternative detectors. This is load-bearing for all post-2023 coefficient shifts, as any time-varying stylistic confounder captured by the detector would directly bias the before-after contrasts.
- [Abstract] Abstract (results paragraph): the within-author and within-field fixed-effects models are invoked to support the temporal change in associations, yet no details are given on exact specifications, additional controls, or tests for parallel trends pre-2023. Without these, it remains unclear whether the reported sign flip in the disruption association is robust to reasonable alternative modeling choices.
minor comments (2)
- The abstract could more explicitly note the exact sample construction (e.g., how full-text availability and citation linking were handled) to aid replicability.
- Consider adding a supplementary table of descriptive statistics for the AI-intensity variable by year and field.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving transparency and robustness, and we will revise the manuscript to address them directly.
point-by-point responses
Referee: [Abstract] Abstract: the AI-assisted writing intensity measure is defined only as 'the share of text in a paper that is predicted to exhibit features consistent with LLM-generated text,' with no reported classifier accuracy, training data, validation metrics, false-positive rates by field, or robustness to alternative detectors. This is load-bearing for all post-2023 coefficient shifts, as any time-varying stylistic confounder captured by the detector would directly bias the before-after contrasts.
Authors: We agree that full documentation of the AI detection classifier is essential for evaluating the proxy's validity and for interpreting the post-2023 coefficient changes. The current manuscript provides a high-level description but lacks the requested metrics and robustness checks. In the revision we will add a dedicated methods subsection (and appendix) reporting the training corpus, model performance (accuracy, precision, recall, F1, AUC), field-specific false-positive rates, and results from alternative detectors. We will also discuss potential time-varying stylistic confounders and how they are mitigated by the within-author and within-field designs. revision: yes
Referee: [Abstract] Abstract (results paragraph): the within-author and within-field fixed-effects models are invoked to support the temporal change in associations, yet no details are given on exact specifications, additional controls, or tests for parallel trends pre-2023. Without these, it remains unclear whether the reported sign flip in the disruption association is robust to reasonable alternative modeling choices.
Authors: We acknowledge that the abstract's space constraints left the econometric specifications underspecified. The full manuscript contains the core fixed-effects regressions, but we agree that explicit details and pre-trend diagnostics are needed. In the revision we will expand the methods and results sections to report the precise model equations, the full set of controls, clustering choices, and formal tests for parallel trends in the pre-2023 period. We will also present robustness checks using alternative specifications to confirm that the sign change in the disruption association is not sensitive to modeling decisions. revision: yes
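The within-author design at issue in this exchange can be sketched in a simplified form: demean the outcome and the AI-intensity measure within each author, then estimate separate slopes before and after 2023. This is a simplification of a full interaction specification with field fixed effects and controls; the function name, the toy panel, and the regression-through-the-origin step on demeaned pairs are all assumptions for illustration.

```python
# Simplified sketch of a within-author pre/post-2023 contrast.
# Not the paper's actual specification: no field effects, no controls,
# no clustering; slopes are fit through the origin on author-demeaned data.
from collections import defaultdict

def within_author_slopes(rows):
    """rows: list of (author, year, ai_intensity, disruption).
    Returns (slope_pre2023, slope_post2023) after author demeaning."""
    # Author means over the whole panel (the "within" transformation).
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for a, _, x, y in rows:
        s = sums[a]
        s[0] += x; s[1] += y; s[2] += 1
    means = {a: (s[0] / s[2], s[1] / s[2]) for a, s in sums.items()}

    def slope(pairs):
        # Least-squares slope through the origin on demeaned (x, y) pairs.
        sxx = sum(x * x for x, _ in pairs)
        sxy = sum(x * y for x, y in pairs)
        return sxy / sxx if sxx else 0.0

    pre, post = [], []
    for a, year, x, y in rows:
        mx, my = means[a]
        (post if year >= 2023 else pre).append((x - mx, y - my))
    return slope(pre), slope(post)

# Toy single-author panel straddling the 2023 break.
rows = [("A", 2021, 0.0, 1.0), ("A", 2022, 0.2, 0.8),
        ("A", 2023, 0.4, 1.2), ("A", 2024, 0.6, 1.6)]
pre_slope, post_slope = within_author_slopes(rows)
```

A pre-trend diagnostic of the kind the referee requests would extend this by estimating a separate slope per pre-2023 year and testing their equality.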
Circularity Check
No significant circularity in empirical associations
full rationale
The paper's core claims rest on observational regressions linking a text-derived AI-intensity measure (share of LLM-like features) to independent citation-network outcomes (disruption index, cross-field breadth, concentration). These variables are constructed from distinct data sources (full-text content versus the citation graph), and the reported pre/post-2023 shifts are estimated via within-author and within-field fixed effects. No equations, self-definitions, or self-citation chains reduce the associations to fitted parameters or tautologies by construction. The measurement of AI assistance is a predictive classifier output, but the paper presents statistical patterns rather than derivations that presuppose their own results. This is a standard empirical study whose central findings are not true by construction: the citation-based outcomes do not depend on the text-based measure used to explain them.
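The disruption index invoked above (the CD-style measure of Funk and Owen-Smith) has a simple set-based form: papers that cite the focal paper while ignoring its references push the score toward +1, papers that cite both pull it toward -1. A minimal sketch, with toy inputs that are not the study's data:

```python
# CD-style disruption index for one focal paper, following the
# Funk & Owen-Smith construction; inputs here are hypothetical sets.

def cd_index(citers_of_focal, citers_of_refs):
    """citers_of_focal: papers citing the focal paper.
    citers_of_refs: papers citing at least one of its references."""
    only_focal = citers_of_focal - citers_of_refs   # n_i: "disrupting" citers
    both = citers_of_focal & citers_of_refs         # n_j: "consolidating" citers
    only_refs = citers_of_refs - citers_of_focal    # n_k: citers bypassing the focal paper
    n = len(only_focal) + len(both) + len(only_refs)
    return (len(only_focal) - len(both)) / n if n else 0.0

# Two papers cite only the focal work, one cites both, one bypasses it.
score = cd_index({"p1", "p2", "p3"}, {"p3", "p4"})
```

The index is purely a function of the citation graph, which is the independence from the text-derived AI-intensity measure that the rationale relies on.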
Axiom & Free-Parameter Ledger
free parameters (1)
- LLM text classifier threshold or model parameters
axioms (2)
- domain assumption Text features can reliably indicate the presence and intensity of generative AI assistance in scientific writing
- domain assumption Citation network metrics validly capture scientific disruption and knowledge recombination
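The second axiom bundles two concrete metrics named elsewhere in the report: cross-field breadth as Shannon entropy over the fields a paper cites, and citation concentration as a Herfindahl-Hirschman index over the same field shares. A hedged sketch with illustrative field labels (the paper's exact field taxonomy and normalization are not given here):

```python
# Sketch of the two recombination measures: Shannon-entropy breadth and
# Herfindahl-Hirschman concentration over cited-field shares.
import math

def field_shares(cited_fields):
    """Share of citations going to each field."""
    if not cited_fields:
        return []
    counts = {}
    for f in cited_fields:
        counts[f] = counts.get(f, 0) + 1
    total = len(cited_fields)
    return [c / total for c in counts.values()]

def breadth_entropy(cited_fields):
    """Shannon entropy in bits; higher = broader knowledge sourcing."""
    return -sum(p * math.log2(p) for p in field_shares(cited_fields))

def concentration_hhi(cited_fields):
    """Sum of squared field shares; 1.0 = all citations in one field."""
    return sum(p * p for p in field_shares(cited_fields))
```

On this construction, the paper's post-2023 pattern would show up as entropy gains flattening while the HHI stops falling, even as the disruption index rises.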
Reference graph
Works this paper leans on
- [1] T. Bergstrom, D. Ruediger, A Third Transformation. Generative AI and Scholarly (2024)
- [2]
- [3] M. H. B. Salih, T. Gul, S. S. Bukhari, I. Liaqat, A. Majeed, AI-Enhanced Scholarly Communication: Transforming Peer Review, Knowledge Dissemination, and Academic Publishing Workflows in the Digital Era (2025)
- [4] A. Matarazzo, R. Torlone, A survey on large language models with some insights on their capabilities and limitations. arXiv preprint arXiv:2501.04040 (2025)
- [5] X. Tang, X. Duan, Z. Cai, Large language models for automated literature review: An evaluation of reference generation, abstract writing, and review composition. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 1602-1617 (2025)
- [6] J. E. Dittmar, Information technology and economic change: The impact of the printing press. The Quarterly Journal of Economics 126, 1133-1172 (2011)
- [7] M. McLuhan, The Gutenberg Galaxy (University of Toronto Press, 2011)
- [8] J. A. Evans, Electronic publication and the narrowing of science and scholarship. Science 321, 395-399 (2008)
- [9] R. J. Funk, J. Owen-Smith, A dynamic network measure of technological change. Management Science 63, 791-817 (2017)
- [10] L. Wu, D. Wang, J. A. Evans, Large teams develop and small teams disrupt science and technology. Nature 566, 378-382 (2019)
- [11] M. Park, E. Leahey, R. J. Funk, Papers and patents are becoming less disruptive over time. Nature 613, 138-144 (2023)
- [12] C. Leibel, L. Bornmann, What do we know about the disruption index in scientometrics? An overview of the literature. Scientometrics 129, 601-639 (2024)
- [13]
- [14] J. Bai, P. Perron, Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18, 1-22 (2003)
- [15] P. Azoulay, J. S. Graff Zivin, J. Wang, Superstar extinction. The Quarterly Journal of Economics 125, 549-589 (2010)
- [16] L. Bornmann, S. Devarakonda, A. Tekles, G. Chacko, Are disruption index indicators convergently valid? The comparison of several indicator variants with assessments by peers. Quantitative Science Studies 1, 1242-1259 (2020)
- [17] J. Lin, Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory 37, 145-151 (1991)
- [18] S. A. Rhoades, The Herfindahl-Hirschman index. Fed. Res. Bull. 79, 188 (1993)
- [19] J. M. Wooldridge, Introductory Econometrics: A Modern Approach (Nelson Education, 2015)
- [20] M. C. Lovell, A simple proof of the FWL theorem. The Journal of Economic Education 39, 88-91 (2008)
- [21] J. M. Wooldridge, Econometric Analysis of Cross Section and Panel Data (MIT Press, 2010)
- [22] J. Priem, H. Piwowar, R. Orr, OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. arXiv preprint arXiv:2205.01833 (2022)
- [23] Google LLC, BigQuery (2024)
- [24]