
arxiv: 2604.05714 · v1 · submitted 2026-04-07 · ⚛️ physics.soc-ph

Publish and Perish: How AI-Accelerated Writing Without Proportional Verification Investment Degrades Scientific Knowledge

Pith reviewed 2026-05-10 18:49 UTC · model grok-4.3

classification ⚛️ physics.soc-ph
keywords: AI in scientific publishing · peer review dynamics · knowledge production model · system dynamics · publication acceleration · review quality · scientific output degradation

The pith

AI-accelerated writing without matching review acceleration degrades scientific knowledge output after a brief peak.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models how AI tools raise writing and submission rates faster than peer review capacity can grow. Larger submission volumes lengthen review queues, which in turn pressure reviewers to adopt AI tools that lower verification standards. The result is a temporary rise in net knowledge output followed by decline as errors accumulate and quality erodes. With a writing acceleration factor of γ = 2.0 and a review acceleration factor of only δ = 0.5, the model forecasts a peak near 2026, then a drop to 60 percent of baseline output in the long run. The system avoids net loss only if review accelerates more than writing, and recent submission data from major venues are qualitatively consistent with the early acceleration phase.

Core claim

The central claim is that adoption of writing AI increases submissions and review queue size, which endogenously drives reviewer AI adoption under pressure and degrades verification quality. When review acceleration lags writing acceleration, knowledge output peaks temporarily and then settles at a lower steady state.

What carries the argument

A minimal two-variable ordinary differential equation model coupling review queue evolution to verification quality degradation via an endogenous, queue-pressure-driven review AI adoption mechanism.
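
To make the mechanism concrete, here is a minimal sketch of a two-variable system of this kind, written in plain Python with a forward-Euler integrator. It is not the paper's model: the logistic adoption curve, the saturating form of reviewer adoption `phi_r`, the relaxation of quality toward a target, the definition of output as throughput times quality, and the constants `queue_scale`, `tau`, `k_adopt` and `t_mid` are all assumptions chosen for illustration. Only γ = 2.0, δ = 0.5, the quality floor of 0.20 and the 20-year horizon come from this page, so the numbers it prints should not be expected to match the reported 1.10K0 peak or 0.60K0 steady state.

```python
import math


def simulate(gamma=2.0, delta=0.5, q_min=0.20, queue_scale=2.0,
             tau=2.0, k_adopt=1.0, t_mid=3.0, years=20.0, dt=0.01):
    """Forward-Euler integration of an assumed toy queue/quality model.

    Returns (times, output), where output is K(t)/K0 with baseline
    submission and review rates both normalized to 1.
    """
    queue, quality = 0.0, 1.0  # pre-AI equilibrium: empty queue, full quality
    times, output = [], []
    steps = int(round(years / dt))
    for i in range(steps + 1):
        t = i * dt
        # Assumed external logistic adoption of writing AI.
        phi_w = 1.0 / (1.0 + math.exp(-k_adopt * (t - t_mid)))
        # Assumed saturating, queue-driven adoption of review AI.
        phi_r = queue / (queue + queue_scale)
        submissions = 1.0 + gamma * phi_w
        throughput = 1.0 + delta * phi_r
        times.append(t)
        # Assumed output measure: review throughput weighted by quality.
        output.append(throughput * quality)
        # Quality relaxes toward a target that falls to q_min at full adoption.
        target = 1.0 - (1.0 - q_min) * phi_r
        queue = max(0.0, queue + dt * (submissions - throughput))
        quality += dt * (target - quality) / tau
    return times, output


times, output = simulate()
peak = max(output)
peak_time = times[output.index(peak)]
print(f"peak K/K0 = {peak:.2f} at t = {peak_time:.1f} yr")
print(f"K/K0 at t = {times[-1]:.0f} yr = {output[-1]:.2f}")
```

Under these assumed forms, output first rises above baseline because throughput grows while quality still lags the queue, and then falls below baseline once quality catches up. That is the qualitative shape the paper describes, though the magnitudes here depend entirely on the illustrative constants.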

Load-bearing premise

The assumption that review quality will degrade endogenously as queue pressure causes reviewers to adopt AI tools at rates that reduce verification effectiveness.

What would settle it

Tracking of submission volumes, error rates in published work, and net knowledge indicators through 2030 showing sustained rise or stability past the predicted 2028 onset instead of the modeled decline.

Figures

Figures reproduced from arXiv: 2604.05714 by Seok Joon Kwon.

Figure 1: Model dynamics of the Publish and Perish paradox. (a) AI adoption in writing (φw, external logistic) and in review (φr, endogenous, queue-driven). (b) Submission rate S(t), review throughput R(t), and review queue Q(t) (right axis, dashed). (c) Verification quality q(t), with analytical steady state qss = 0.40 and quality floor qmin = 0.20 marked. (d) Normalized knowledge output K(t)/K0, showing honeymoon …
Figure 2: Empirical validation. Annual submissions for (a) NeurIPS (2008-2025), (b) ICLR (2013-2026), (c) arXiv monthly (2008-2025), and (d) bioRxiv annual (2014-2025). Circles: pre-ChatGPT data; squares: post-ChatGPT data. CAGR shown for each period. AI-intensive venues (a-c) accelerated post-ChatGPT; bioRxiv (d) decelerated, providing a suggestive comparison. Data sources: [19-23]
Figure 3: Parameter space exploration of the knowledge degradation. Heatmap of K(t = 20)/K0 as a function of writing acceleration γ and review acceleration δ. The K/K0 = 1 contour (black) separates the paradox regime (red, below) from the benefit regime (blue, above). The dashed white line marks δ = γ (simplified critical condition). Black star: current observed operating point (γ = 2.0, δ = 0.5)
Figure 4: Policy lever analysis. (a) Effect of review acceleration δ on K(t)/K0 over 20 years (2022-2042). Baseline δ = 0.5 (bold red); critical threshold δ = 2.0 (green) restores K ≈ K0. (b) Effect of quality floor qmin on K(t)/K0. Higher institutional standards (qmin = 0.4-0.6) significantly mitigate knowledge loss
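
Figure 2 compares venues by compound annual growth rate (CAGR) before and after ChatGPT. For readers who want to recompute such figures from raw submission counts, a minimal sketch of the standard CAGR formula follows. The counts in the example are made-up placeholders, not values from the paper or from any venue, and the function name `cagr` is hypothetical.

```python
def cagr(first_value, last_value, years):
    """Compound annual growth rate between two observations.

    Returns a fraction, so 0.25 means 25% growth per year.
    """
    if first_value <= 0 or years <= 0:
        raise ValueError("first_value and years must be positive")
    return (last_value / first_value) ** (1.0 / years) - 1.0


# Placeholder counts for illustration only (not real venue data).
pre = cagr(1000, 2000, 4)   # hypothetical pre-ChatGPT window of 4 years
post = cagr(2000, 5000, 3)  # hypothetical post-ChatGPT window of 3 years
print(f"pre-period CAGR: {pre:.1%}, post-period CAGR: {post:.1%}")
```

A higher post-period CAGR than pre-period CAGR is what the caption means by a venue having "accelerated". The comparison says nothing by itself about the cause of the acceleration.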
Original abstract

Artificial intelligence tools are accelerating manuscript production far faster than peer review capacity can expand. Applying the theory of constraints from manufacturing science, we formalize this asymmetry through a minimal two-variable ordinary differential equation model coupling review queue evolution and verification quality degradation via an endogenous, queue-pressure-driven review AI adoption mechanism. The causal chain is: writing AI adoption increases submissions, growing the review queue, which drives reviewer AI adoption under pressure, degrading verification quality and reducing net knowledge output. Under empirically informed parameters (writing acceleration γ = 2.0, review acceleration δ = 0.5), the model predicts a deceptive honeymoon where knowledge output peaks at 1.10K0 (circa 2026), followed by paradox onset at t = 6 years (2028) and long-term degradation to 0.68K0 (32% loss), approaching a steady state of 0.60K0 (40% loss). The critical condition for net benefit is δ > γ; the current operating point lies deep in the paradox regime. Empirical validation against NeurIPS, ICLR, arXiv, and bioRxiv submission data shows qualitative consistency with observed post-ChatGPT acceleration patterns. Policy analysis reveals that only combined interventions such as review infrastructure investment paired with institutional quality standards can restore positive knowledge production.
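
The headline figures in the abstract are simple conversions, and spelling them out shows why 32% and 40% need not conflict: according to the paper's conclusion, 0.68K0 is the value at the end of the 20-year simulation and 0.60K0 is the analytical steady state. The sketch below assumes only that the change is measured as K/K0 − 1 and that t = 0 falls in 2022 (the paper anchors it at the November 2022 ChatGPT release). The helper names are hypothetical.

```python
BASE_YEAR = 2022  # t = 0 is the ChatGPT release (November 2022)


def percent_change(ratio):
    """Percent change of K relative to K0, given ratio = K / K0."""
    return round((ratio - 1.0) * 100)


def calendar_year(t_years):
    """Calendar year corresponding to model time t (in whole years)."""
    return BASE_YEAR + t_years


milestones = {
    "honeymoon peak": 1.10,
    "end of 20-year horizon": 0.68,
    "analytical steady state": 0.60,
}
for label, ratio in milestones.items():
    print(f"{label}: K/K0 = {ratio:.2f} ({percent_change(ratio):+d}%)")

print("paradox onset year:", calendar_year(6))
print("end of simulation horizon:", calendar_year(20))
```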

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The paper claims that AI-accelerated writing outpaces verification capacity, creating a 'publish and perish' dynamic that degrades net scientific knowledge. It formalizes this via a minimal two-variable ODE model coupling submission queue growth to endogenous review-quality degradation through queue-pressure-driven reviewer AI adoption. With parameters γ=2.0 (writing acceleration) and δ=0.5 (review acceleration), the model predicts a short-term 'honeymoon' peak in knowledge output at 1.10K0 circa 2026, followed by paradox onset at t=6 years and long-term decline to a steady state of 0.60K0 (40% loss). Qualitative consistency is shown with post-ChatGPT submission trends from NeurIPS, ICLR, arXiv, and bioRxiv; the critical condition for net benefit is stated as δ > γ, with current parameters deep in the paradox regime. Policy conclusions favor combined infrastructure investment and quality standards.

Significance. If the modeling assumptions hold, the work supplies a transparent, minimal dynamical framework that isolates the asymmetry between writing and review acceleration and identifies a clear threshold condition (δ > γ) for sustained knowledge production. The approach is a strength in its use of theory-of-constraints logic to generate falsifiable trajectories and policy implications. However, the significance is tempered by the absence of empirical grounding for the key endogenous degradation term and lack of sensitivity or uncertainty quantification around the 40% loss prediction.

major comments (4)
  1. [ODE model] The two-variable ODE system (model equations in the main text): the verification-quality degradation term is introduced by assumption as an endogenous, queue-pressure-driven reviewer AI adoption mechanism rather than derived from data on review times, rejection rates, or post-publication quality metrics. This functional form and the specific value δ=0.5 are load-bearing for the central claim of long-term degradation to 0.60K0; if the mapping from queue length to quality loss is weaker or absent, the degradation shrinks or disappears whether or not δ > γ holds, and the 40% loss does not occur.
  2. [Results and parameter selection] Parameter choice and results section: γ=2.0 and δ=0.5 are described as 'empirically informed' yet no data sources, fitting procedure, or calibration against review-quality indicators are supplied. The quantitative predictions (peak at 1.10K0, steady state 0.60K0) are generated directly by the same equations that embed these hand-chosen values, creating circularity; no sensitivity analysis or error bounds on the 40% loss figure are reported.
  3. [Empirical validation] Empirical validation paragraph: consistency with NeurIPS, ICLR, arXiv, and bioRxiv submission-volume trends is stated as qualitative only, with no quantitative fit statistics, statistical tests, or comparison to alternative models. This leaves the support for the predicted honeymoon and degradation trajectories weak relative to the strength of the policy conclusions drawn.
  4. [Abstract] Abstract: the text reports both 'degradation to 0.68K0 (32% loss)' and 'steady state of 0.60K0 (40% loss)' without clarifying the distinction or reconciling the two figures; this internal inconsistency affects the precision of the headline claim.
minor comments (2)
  1. [Model setup] The baseline notation K0 is used throughout but should be defined explicitly on first use in the main text for readers unfamiliar with the normalization.
  2. [Abstract and results] The abstract and results use slightly varying loss percentages (32% vs 40%); a single consistent figure or explicit explanation of the difference would improve clarity.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each of the major comments point by point below, indicating where revisions will be made to strengthen the paper.

  1. Referee: [ODE model] The verification-quality degradation term is introduced by assumption as an endogenous, queue-pressure-driven reviewer AI adoption mechanism rather than derived from data on review times, rejection rates, or post-publication quality metrics. This functional form and the specific value δ=0.5 are load-bearing for the central claim of long-term degradation to 0.60K0.

    Authors: Because this is a minimal model in the spirit of sociophysics, we posit the degradation term to encapsulate the theory-of-constraints logic and do not claim an empirical derivation. We will revise the text to label this more explicitly as a core modeling assumption, discuss its sensitivity to alternative forms, and outline potential empirical validation strategies using review time or quality data. The threshold condition δ > γ is a structural result of the model. revision: partial

  2. Referee: [Results and parameter selection] γ=2.0 and δ=0.5 are described as 'empirically informed' yet no data sources, fitting procedure, or calibration against review-quality indicators are supplied. No sensitivity analysis or error bounds on the 40% loss figure are reported.

    Authors: We agree that the parameters are illustrative choices informed by observed submission growth rates rather than a formal fit. We will add a dedicated sensitivity analysis subsection varying γ and δ around these values and report the resulting range of steady-state knowledge levels to provide uncertainty bounds. revision: yes

  3. Referee: [Empirical validation] Consistency with NeurIPS, ICLR, arXiv, and bioRxiv submission-volume trends is stated as qualitative only, with no quantitative fit statistics, statistical tests, or comparison to alternative models.

    Authors: The validation is deliberately qualitative to demonstrate consistency with post-ChatGPT trends in multiple venues. We will expand this section with additional descriptive statistics on the trends and explicitly note the absence of quantitative model fitting due to the focus on dynamical insight rather than prediction. revision: partial

  4. Referee: [Abstract] The text reports both 'degradation to 0.68K0 (32% loss)' and 'steady state of 0.60K0 (40% loss)' without clarifying the distinction or reconciling the two figures.

    Authors: This is an oversight in the abstract wording. The 0.68K0 figure is the knowledge level at the end of the 20-year simulation horizon, while 0.60K0 is the analytical long-term steady state. We will revise the abstract to distinguish the two clearly. revision: yes
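
The sensitivity analysis promised in response 2 can be prototyped as a small grid sweep over γ and δ, in the spirit of the Figure 3 heatmap. The sketch below uses an assumed toy model, not the paper's equations: saturating reviewer adoption driven by the queue, quality relaxing toward a floor-limited target, and output defined as throughput times quality. The constants `queue_scale`, `tau`, `k_adopt` and `t_mid` and the grid values are illustrative assumptions, so the threshold it shows need not coincide with the paper's δ > γ condition.

```python
import math


def output_at_horizon(gamma, delta, q_min=0.20, queue_scale=2.0, tau=2.0,
                      k_adopt=1.0, t_mid=3.0, years=20.0, dt=0.01):
    """K(years)/K0 for an assumed toy queue/quality model (forward Euler)."""
    queue, quality = 0.0, 1.0
    for i in range(int(round(years / dt))):
        t = i * dt
        # Assumed logistic adoption of writing AI.
        phi_w = 1.0 / (1.0 + math.exp(-k_adopt * (t - t_mid)))
        # Assumed saturating, queue-driven adoption of review AI.
        phi_r = queue / (queue + queue_scale)
        submissions = 1.0 + gamma * phi_w
        throughput = 1.0 + delta * phi_r
        target = 1.0 - (1.0 - q_min) * phi_r
        queue = max(0.0, queue + dt * (submissions - throughput))
        quality += dt * (target - quality) / tau
    phi_r = queue / (queue + queue_scale)
    return (1.0 + delta * phi_r) * quality


gammas = [1.0, 2.0, 3.0]  # illustrative grid, not the paper's
deltas = [0.5, 2.0, 4.0]
for gamma in gammas:
    row = ", ".join(
        f"delta={delta}: {output_at_horizon(gamma, delta):.2f}"
        for delta in deltas
    )
    print(f"gamma={gamma}: {row}")
```

Reporting the spread of K(20)/K0 across such a grid, rather than a single point estimate, is one way to attach rough uncertainty bounds to a headline loss figure. A calibrated analysis would replace the assumed functional forms with the paper's own.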

Circularity Check

0 steps flagged

No circularity: model outputs are forward simulations under explicit parameter choices

The paper constructs an explicit two-variable ODE model whose outputs (honeymoon peak, onset time, steady-state value) are obtained by integrating the system forward under the stated parameter values γ=2.0 and δ=0.5. These numerical results are not equivalent to the inputs by construction; they depend on the specific functional forms chosen for queue growth and quality degradation. The critical condition δ > γ is stated separately as a model property, and the parameter values are presented as externally informed rather than fitted to the target degradation metric. No self-definitional loop, fitted-input-as-prediction, or self-citation load-bearing step is exhibited in the quoted material. The derivation chain therefore remains self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests on two hand-set acceleration parameters and the modeling assumption that reviewer AI adoption is driven by queue pressure and produces lower verification quality. No new physical entities are postulated.

free parameters (2)
  • γ (writing acceleration) = 2.0
    Empirically informed multiplier on submission rate due to writing AI; set to 2.0 to produce the reported trajectories.
  • δ (review acceleration) = 0.5
    Empirically informed multiplier on review rate due to reviewer AI; set to 0.5 to place the system in the paradox regime.
axioms (1)
  • domain assumption Writing AI adoption increases submissions, which grows the review queue and endogenously drives reviewer AI adoption under pressure, which in turn degrades verification quality.
    This causal chain is the core mechanism of the ODE model and is not derived from first principles or external data within the abstract.



Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

  1. [1]

    Publish and Perish: How AI-Accelerated Writing Without Proportional Verification Investment Degrades Scientific Knowledge S. Joon Kwon1 School of Chemical Engineering, Department of Semiconductor Convergence Engineering, Department of Future Energy Engineering, Department of Quantum Information Engineering, & SKKU Institute of Energy ...

  2. [2]

    Model Formulation 2.1. Causal chain and design philosophy The model is deliberately minimal: two state variables (review queue Q and verification quality q), one external input (writing AI penetration φw(t)), and one endogenous coupling (review AI penetration φr(t)). The one-directional causal chain can be described as φw(t)↑ → S↑ → Q↑ → φr(Q)↑ → q↓ → K↓ ...

  3. [3]

    For writing acceleration (γ = 2.0), Amdahl’s law provides a lower bound on AI-driven pipeline acceleration

    Model parameters, values, and justifications
    Symbol | Value | Definition | Justification
    γ | 2.0 | Writing acceleration factor | Amdahl lower bound γ = 0.3-0.7; empirical NeurIPS CAGR supports γ = 2.0 (includes community growth)
    δ | 0.5 | Review acceleration factor | AI automates approximately 30% of review mechanics leading to net 50% speedup on that fraction
    Qc | 2.0 | Queue hal...

  4. [4]

    System dynamics Figure 1 presents the baseline simulation over a 20-year horizon (initial point t = 0 at November 2022 (ChatGPT release date))

    Results 3.1. System dynamics Figure 1 presents the baseline simulation over a 20-year horizon (initial point t = 0 at November 2022 (ChatGPT release date)). Under baseline parameters (Q0 = 0, q0 = 1.0, representing the pre-AI equilibrium), the model predicts a two-phase trajectory. The first phase is about deceptive honeymoon (t = 0-6 yr, 2022-2028). In t...

  5. [5]

    The model’s γ = 2.0 likely overstates the AI-tool-specific contribution; however, the qualitative prediction of differential acceleration by AI-tool-adoption intensity is supported

    A definitive causal test would require within-venue variation in AI tool access, which is not currently available. The model’s γ = 2.0 likely overstates the AI-tool-specific contribution; however, the qualitative prediction of differential acceleration by AI-tool-adoption intensity is supported. On the review side, the model predicts φr = 0.15-0.20 by yea...

  6. [6]

    Open peer review (publishing reviews alongside papers) deters low-effort AI-generated reviews, as several journals (i.e., eLife, EMBO, F1000Research) already demonstrate [26]

    incentivize thorough reviews, since reviewing is currently uncompensated labor. Open peer review (publishing reviews alongside papers) deters low-effort AI-generated reviews, as several journals (i.e., eLife, EMBO, F1000Research) already demonstrate [26]. For funders and institutions, the model reveals that volume-based quantitative metrics (papers per ye...

  7. [7]

    The critical condition is δ > γ: review acceleration must exceed writing acceleration

    before declining to 0.68K0 at 20 years (32% knowledge loss) and approaching an analytical steady state of 0.60K0 (40% loss). The critical condition is δ > γ: review acceleration must exceed writing acceleration. Currently, γ = 2.0 while δ = 0.5, placing academic publishing firmly in the paradox regime leading to knowledge degradation. The paradox is not i...

  8. [8]

    ICLR submissions: 2013-2026 data from Paper Copilot and OpenReview [21]

    and Paper Copilot [20]. ICLR submissions: 2013-2026 data from Paper Copilot and OpenReview [21]. arXiv submissions: monthly rates from arxiv.org [22]. bioRxiv preprints: annual totals [23]. ICLR AI review detection rates from [1, 2]. Nature AI usage survey from [4]. Peer review crisis reports from [16-18]. Code availability. Model implementation: Python 3...

  9. [9]

    Liang, W. et al. Monitoring AI-modified content at scale: a case study on the impact of ChatGPT on AI conference peer reviews. arXiv 2403.07183 (2024)

  10. [10]

    Liang, W. et al. Mapping the increasing use of LLMs in scientific papers. arXiv 2404.01268 (2024)

  11. [11]

    Checco, A. et al. AI-assisted peer review. Humanit. Soc. Sci. Commun. 8, 25 (2021)

  12. [12]

    More than half of researchers now use AI for peer review—often against guidance

    Nature News. More than half of researchers now use AI for peer review—often against guidance. Nature 648, 16 (2025)

  13. [13]

    Huang, J. et al. The rise of AI-generated scientific text. Science 382, 1316–1318 (2023)

  14. [14]

    Barnett, A. et al. Mass-manufactured research on UK Biobank overwhelms legitimate work. J. Clin. Epidemiol. (2026)

  15. [15]

    Barnett, A. & Spick, M. Research integrity is locked into an arms race with agentic AI slop. LSE Impact Blog (2026)

  16. [16]

    Goldratt, E. M. The Goal: A Process of Ongoing Improvement (North River Press, 1984)

  17. [17]

    AI reviewers are here—we are not ready

    Nature News. AI reviewers are here—we are not ready. Nature 648, 9 (2025)

  18. [18]

    The WyCash portfolio management system

    Cunningham, W. The WyCash portfolio management system. OOPSLA ’92 Experience Report (1992)

  19. [19]

    Kruchten, P., Nord, R. L. & Ozkaya, I. Managing Technical Debt (Addison-Wesley, 2019)

  20. [20]

    Tornhill, A. & Borg, M. Code Red: The Business Impact of Code Quality (Addison-Wesley, 2024)

  21. [21]

    Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005)

  22. [22]

    1,500 scientists lift the lid on reproducibility

    Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016)

  23. [23]

    Errington, T. M. et al. Investigating the replicability of preclinical cancer biology. eLife 10, e71601 (2021)

  24. [24]

    The peer-review crisis: how to fix an overloaded system

    Nature News. The peer-review crisis: how to fix an overloaded system. Nature 631, 7 (2025)

  25. [25]

    Prophy.ai Blog (2025)

  26. [26]

    Shah, C. et al. Will anyone review this paper? Screening, sorting, and the feedback cycles that imperil peer review. arXiv 2507.10734 (2025)

  27. [27]

    Reflections on the 2025 review process from the program committee chairs (2025)

    NeurIPS Blog. Reflections on the 2025 review process from the program committee chairs (2025)

  28. [28]

    NeurIPS statistics

    Paper Copilot. NeurIPS statistics. https://papercopilot.com/statistics/neurips-statistics/ (accessed 2026)

  29. [29]

    ICLR statistics

    Paper Copilot. ICLR statistics. https://papercopilot.com/statistics/iclr-statistics/ (accessed 2026)

  30. [30]

    Submission rate statistics

    arXiv. Submission rate statistics. https://arxiv.org/stats/monthly_submissions (accessed 2026)

  31. [31]

    How bioRxiv changed the way biologists share ideas—in numbers

    Nature News. How bioRxiv changed the way biologists share ideas—in numbers. Nature 637, 412 (2026)

  32. [32]

    Kaltenbrunner, W. & Birch, K. Post-publication peer review: challenges and opportunities. Learn. Publ. 35, 274–282 (2022)

  33. [33]

    Open evaluation: a vision for entirely transparent post-publication peer review

    Kriegeskorte, N. Open evaluation: a vision for entirely transparent post-publication peer review. Front. Comput. Neurosci. 6, 79 (2012)

  34. [34]

    AI is transforming peer review—and many scientists are worried

    Nature News. AI is transforming peer review—and many scientists are worried. Nature 629, 14 (2025)

  35. [35]

    Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).

  36. [36]

    (a) AI adoption in writing (φw, external logistic) and in review (φr, endogenous, queue-driven)

    Model dynamics of the Publish and Perish paradox. (a) AI adoption in writing (φw, external logistic) and in review (φr, endogenous, queue-driven). (b) Submission rate S(t), review throughput R(t), and review queue Q(t) (right axis, dashed). (c) Verification quality q(t), with analytical steady state qss = 0.40 and quality floor qmin = 0.20 marked. (d) Nor...

  37. [37]

    Annual submissions for (a) NeurIPS (2008-2025), (b) ICLR (2013-2026), (c) arXiv monthly (2008-2025), and (d) bioRxiv annual (2014-2025)

    Empirical validation. Annual submissions for (a) NeurIPS (2008-2025), (b) ICLR (2013-2026), (c) arXiv monthly (2008-2025), and (d) bioRxiv annual (2014-2025). Circles: pre-ChatGPT data; squares: post-ChatGPT data. CAGR shown for each period. AI-intensive venues (a-c) accelerated post-ChatGPT; bioRxiv (d) decelerated, providing a suggestive comparison. Dat...

  38. [38]

    (a) Effect of review acceleration δ on K(t)/K0 over 20 years (year 2022-2042)

    Policy lever analysis. (a) Effect of review acceleration δ on K(t)/K0 over 20 years (year 2022-2042). Baseline δ = 0.5 (bold red); critical threshold δ = 2.0 (green) restores K ≈ K0. (b) Effect of quality floor qmin on K(t)/K0. Higher institutional standards (qmin = 0.4-0.6) significantly mitigate knowledge loss