Smaller, Younger, and More Impactful: How AI-Assisted Writing Transforms Research Teams
Pith reviewed 2026-07-04 17:47 UTC · model glm-5.2
The pith
AI-Assisted Writing Shrinks Research Teams and Raises Their Impact
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper's central claim is that AI-assisted writing is associated with a structural shift in research teams — toward fewer authors and lower average seniority — and that this shift is accompanied by an increase, not a decrease, in the probability of producing highly cited work. In the PLoS dataset, AI-assisted teams had a 3.0 percentage-point higher probability of producing a top-5% FWCI publication than matched controls; in the Nature dataset, the advantage was 2.7 percentage points. The team-age effect was strongest among the youngest teams (25th percentile), suggesting AI acts as an equalizer for junior researchers who lack the writing experience and academic-discourse mastery of their,
What carries the argument
The paper relies on an AI Usage Score (0–1) derived from a text-based detection algorithm (Liang et al., 2025) that estimates the proportion of AI-modified content in each paper's full text. Papers scoring above the 95th percentile of the pre-ChatGPT distribution for their journal are classified as 'AI-assisted.' The dependent variables are Team Age (average career age of co-authors, based on years since first publication), Team Size (author count), and Top 5% FWCI (a binary indicator for whether a paper falls in the top 5% of field-weighted citation impact). Controls include first-author and corresponding-author prior productivity, collaborator counts, prior team-size and team-age habits,FW
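The classification rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`journal`, `date`, `score`), the function names, and the inclusive-quantile interpolation are all hypothetical choices; only the per-journal 95th-percentile rule and the November 30, 2022 cutoff come from the paper.

```python
from collections import defaultdict
from statistics import quantiles

GPT_RELEASE = "2022-11-30"  # ISO dates compare correctly as plain strings


def journal_thresholds(papers):
    """95th percentile of pre-GPT AI usage scores, computed per journal."""
    pre_scores = defaultdict(list)
    for p in papers:
        if p["date"] < GPT_RELEASE:
            pre_scores[p["journal"]].append(p["score"])
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # The interpolation method is an assumption; each journal needs at least
    # two pre-GPT papers for quantiles() to work.
    return {j: quantiles(s, n=20, method="inclusive")[-1]
            for j, s in pre_scores.items()}


def classify_papers(papers):
    """Label each paper as pre-GPT, AI-assisted, or human-written (post-GPT)."""
    thresholds = journal_thresholds(papers)
    labels = []
    for p in papers:
        if p["date"] < GPT_RELEASE:
            labels.append("pre-GPT")
        elif p["score"] > thresholds[p["journal"]]:
            labels.append("AI-assisted")
        else:
            labels.append("human-written (post-GPT)")
    return labels
```

Because the threshold is a percentile of the pre-GPT scores themselves, roughly 5% of pre-GPT papers exceed it by construction, which is the false-positive floor the referee report examines.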
If this is right
- If AI-assisted writing genuinely enables smaller, younger teams to produce high-impact research, funding agencies may need to reconsider evaluation criteria that implicitly reward large, senior-heavy teams.
- The reversal of the decades-long trend toward larger teams could change the structure of scientific labor markets, reducing demand for senior co-authors whose primary contribution is manuscript polishing and editing.
- Junior researchers and non-native English speakers may gain disproportionate advantage from AI writing tools, potentially democratizing access to elite publication venues.
- If the pattern generalizes beyond PLoS and Nature, it could signal a broad reorganization of how scientific collaboration is structured, with AI absorbing coordination and communication overhead that previously required larger teams.
Load-bearing premise
The entire analysis rests on the accuracy of a single AI-detection algorithm (from Liang et al., 2025) that estimates how much of each paper's text was modified by AI. If this detector systematically misclassifies certain writing styles, disciplines, or author demographics as 'AI-assisted' or 'human-written,' every downstream association — team age, team size, impact — could be biased. The paper does not independently validate the detector on its specific PLoS and Nature full
What would settle it
If an independent, validated AI-detection method applied to the same corpus produced substantially different group classifications — for instance, if many papers currently labeled 'AI-assisted' were reclassified as human-written — the observed associations with team age, size, and impact could weaken or disappear.
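One way to quantify "substantially different group classifications" is to compare the two detectors' labels paper by paper. The sketch below is only an illustration: the function names are hypothetical, the inputs are assumed to be parallel lists of booleans (True meaning AI-assisted), and Cohen's kappa is a standard agreement statistic chosen here for convenience, not one the paper or the review prescribes.

```python
def flip_rate(original, alternative):
    """Share of papers the original detector labels AI-assisted (True)
    that the alternative detector labels human-written (False)."""
    flagged = [alt for orig, alt in zip(original, alternative) if orig]
    if not flagged:
        raise ValueError("no AI-assisted papers in the original labels")
    return sum(1 for alt in flagged if not alt) / len(flagged)


def cohens_kappa(a, b):
    """Chance-corrected agreement between two binary labelings."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label lists must be non-empty and equally long")
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    p_a = sum(a) / n  # share labeled AI-assisted by detector A
    p_b = sum(b) / n  # share labeled AI-assisted by detector B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both labelings are constant")
    return (observed - expected) / (1 - expected)
```

A high flip rate or a low kappa would indicate that the group assignments depend heavily on the choice of detector.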
Original abstract
The era of Big Science has long been defined by increasingly large and specialized research teams pushing the frontiers of knowledge. However, recent advances in artificial intelligence (AI), particularly large language models (LLMs), are beginning to reshape academic writing and scientific research, potentially disrupting the longstanding trend toward ever-larger teams and transforming other dimensions of research team structure. Drawing on 147,074 full-text publications from the PLoS family and the Nature portfolio since 2020, we examined whether and how AI-assisted writing influences team structure and team outcomes in science. Using multiple methods, including ordinary least square, quantile regression, Poisson regression, logistic regression and propensity score matching, we found that research teams using AI-assisted writing tend to be younger and smaller. Importantly, this shift toward more compact, junior-leaning teams does not come at the expense of scientific impact. On the contrary, we observed a higher probability of research teams that employed AI-assisted writing producing highly impactful publications. These results highlight the significant role of AI-assisted writing in reshaping not only how research is produced, but also how research teams are formed and assembled. Our findings call for policy improvements in research evaluation, funding, and training to address this emerging trend.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper examines whether AI-assisted writing is associated with changes in research team structure (age and size) and scientific impact, using 147,074 publications from the PLoS family and Nature portfolio (2020–2025). AI usage is measured via the Liang et al. (2025) text-based detection algorithm applied to full-text articles. The authors find that teams with higher AI usage scores tend to be younger and smaller, and that AI-assisted publications are more likely to achieve top-5% FWCI. The analysis employs OLS, quantile regression, Poisson regression, logistic regression, and propensity score matching, with extensive controls and fixed effects.
Significance. The paper addresses a timely and important question at the intersection of AI adoption and scientific team dynamics. Its strengths include the use of full-text (rather than abstract-only) analysis across two distinct publisher portfolios, a multi-method analytical strategy with consistent specifications, and a falsifiable set of predictions that are tested across multiple estimators. The finding that AI-assisted teams are smaller and younger without apparent loss of impact, if robust, has genuine policy relevance. The use of an externally published detector (Liang et al. 2025, Nature Human Behaviour) rather than a home-grown measure is a reasonable methodological choice, though it introduces dependencies discussed below.
major comments (3)
- §3.2.1: The AI usage score from Liang et al. (2025) is the sole basis for the key independent variable, and the paper acknowledges that 5% of pre-GPT papers exceed the 95th-percentile threshold; these are necessarily false positives since ChatGPT did not exist. The paper does not examine what characteristics these pre-GPT false-positive papers share. If the detector systematically flags certain human writing styles (e.g., formulaic prose, non-native English patterns) that correlate with author career stage or team composition, the headline findings could be partially or wholly artifactual. The authors should at minimum: (a) analyze the pre-GPT false-positive papers for systematic correlates (discipline, author seniority, team size, journal), and (b) discuss the detector's known sensitivity to writing-style confounds. The discipline and journal fixed effects in the regressions partially, but not...
- §3.2.2, Eq. (1): Career age is defined as years since an author's first publication found in the PLoS family and Nature Portfolio specifically. A researcher who published extensively in Science, Cell, Lancet, or other venues before appearing in PLoS/Nature would have their career age systematically underestimated. This measurement choice is load-bearing for RQ1 (team age) and could bias results if, for example, AI-assisted writing is more common among researchers who are newer to PLoS/Nature specifically but not necessarily junior in their careers. The authors should either justify why this measurement is adequate given that OpenAlex (which they already use) provides broader publication histories, or acknowledge this as a substantive limitation rather than a minor one.
- Table 2, Panel B (Nature): The PSM result for team size in the Nature portfolio is not statistically significant (ATT = -0.131, p = 0.384). Yet the headline claim in the abstract and §4.2 states that 'teams using AI-assisted writing tend to be smaller' as a general finding across both datasets. The Poisson regression (Table 3) does show a significant coefficient for Nature, but the PSM—the authors' preferred quasi-causal method—does not corroborate this for Nature. The paper should more carefully distinguish which findings are robust across methods and datasets and which are not, particularly in the abstract and conclusion.
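The career-age concern in the §3.2.2 comment can be made concrete with a toy calculation. This is a hedged sketch: the publication histories are invented, the venue set is an illustrative subset, the function names are hypothetical, and career age is taken as a simple year difference, which may not match the exact form of the paper's Eq. (1).

```python
from statistics import mean

# Illustrative subset only; the paper's venue list is much longer.
PLOS_NATURE = {"PLoS ONE", "Nature Communications"}


def first_publication_year(history, venues=None):
    """Earliest publication year, optionally restricted to a set of venues."""
    years = [year for venue, year in history if venues is None or venue in venues]
    if not years:
        raise ValueError("author has no publications in the selected venues")
    return min(years)


def career_age(history, paper_year, venues=None):
    """Years elapsed between the first counted publication and the paper."""
    return paper_year - first_publication_year(history, venues)


def team_age(histories, paper_year, venues=None):
    """Average career age across all co-authors of one paper."""
    return mean(career_age(h, paper_year, venues) for h in histories)


# Hypothetical two-author team publishing in 2024.
senior = [("Science", 2005), ("Cell", 2010), ("PLoS ONE", 2021)]
junior = [("PLoS ONE", 2022)]

restricted = team_age([senior, junior], 2024, venues=PLOS_NATURE)  # 2.5
broad = team_age([senior, junior], 2024)                           # 10.5
```

In this invented case the venue-restricted measure makes a team led by a 19-year veteran look almost as junior as a team of newcomers, which is the direction of bias the referee describes.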
minor comments (6)
- §3.2.3 is labeled '3.3.3 Team size' in the text, breaking the section numbering sequence.
- Table 3: The variable 'First Author Co-author Avg. Career Age (ln)' appears in the Poisson regression but is not described in the variable definitions section. It is unclear how this differs from the 'First Author Prior Team Age (ln)' used in Table 1.
- §4.1.1: The text states that for Nature, 'the differences in team age distributions across the three groups are not statistically significant,' yet the OLS regression (Table 1) and PSM (Table 2) show significant results. This apparent tension between the descriptive and inferential results should be clarified.
- Abstract: The phrase 'higher probability of research teams that employed AI-assisted writing producing highly impactful publications' is grammatically awkward; consider revision.
- The paper would benefit from reporting the actual AI usage score distribution statistics (mean, median, SD) for each of the three groups, beyond the visual distribution in Fig. 1.
- Several references have 2026 publication dates (e.g., Hao et al., 2026; He & Bu, 2026). If these are forthcoming or in-press, this should be noted.
Simulated Author's Rebuttal
We thank the referee for a careful and constructive report. All three major comments identify legitimate issues. We will (1) analyze pre-GPT false-positive papers for systematic correlates and add a discussion of detector writing-style confounds, (2) recompute career age using OpenAlex's broader publication histories and treat the current measurement as a robustness check, and (3) revise the abstract, §4.2, and conclusion to accurately reflect that the team-size finding is robust across methods for PLoS but that PSM does not corroborate the Poisson result for Nature. We view all three revisions as strengthening the paper.
Point-by-point responses
-
Referee: §3.2.1: The AI usage score from Liang et al. (2025) is the sole basis for the key independent variable, and the paper acknowledges that 5% of pre-GPT papers exceed the 95th-percentile threshold—these are necessarily false positives since ChatGPT did not exist. The paper does not examine what characteristics these pre-GPT false-positive papers share. If the detector systematically flags certain human writing styles (e.g., formulaic prose, non-native English patterns) that correlate with author career stage or team composition, the headline findings could be partially or wholly artifactual. The authors should at minimum: (a) analyze the pre-GPT false-positive papers for systematic correlates (discipline, author seniority, team size, journal), and (b) discuss the detector's known sensitivity to writing-style confounds.
Authors: The referee raises a valid and important concern. We agree that if the detector systematically flags particular human writing styles that correlate with team composition, our findings could be confounded. We will address this in two ways. First, we will conduct the requested analysis of pre-GPT false-positive papers: we will compare the 5% of pre-GPT papers exceeding the threshold to other pre-GPT papers on discipline, author seniority (career age), team size, and journal. If the false positives are systematically associated with younger or smaller teams, this would suggest a writing-style confound that inflates our headline findings; if not, the concern is substantially mitigated. Second, we will add a dedicated paragraph in §3.2.1 discussing the detector's known sensitivity to formulaic prose and non-native English writing patterns, drawing on the validation evidence reported in Liang et al. (2025) and related literature. We will also note that our regression specifications include discipline and journal fixed effects, which partially absorb systematic differences in writing conventions across fields and venues, though we agree these do not fully eliminate individual-level confounds within a discipline-journal cell. We acknowledge that this is a genuine limitation of relying on any text-based detector and will state this explicitly.
revision: yes
-
Referee: §3.2.2, Eq. (1): Career age is defined as years since an author's first publication found in the PLoS family and Nature Portfolio specifically. A researcher who published extensively in Science, Cell, Lancet, or other venues before appearing in PLoS/Nature would have their career age systematically underestimated. This measurement choice is load-bearing for RQ1 (team age) and could bias results if, for example, AI-assisted writing is more common among researchers who are newer to PLoS/Nature specifically but not necessarily junior in their careers. The authors should either justify why this measurement is adequate given that OpenAlex (which they already use) provides broader publication histories, or acknowledge this as a substantive limitation rather than a minor one.
Authors: The referee is correct that restricting career age to first appearance in PLoS/Nature systematically underestimates seniority for researchers whose first publications appeared in other venues. This is a substantive measurement concern, not a minor one. We already use OpenAlex for author disambiguation and metadata, and OpenAlex does index broader publication histories across venues. We will recompute career age using each author's first publication year as recorded in OpenAlex across all indexed venues, which provides a far more accurate measure of academic seniority. We will then re-estimate all team-age analyses (OLS, quantile regression, PSM) with the revised measure. We will retain the original PLoS/Nature-based measure as a robustness check and report both sets of results. If the pattern holds under the broader measure, this strengthens our findings; if it weakens, we will report that honestly. We will also move this measurement issue from its current implicit treatment to an explicit discussion in the limitations section, acknowledging that even OpenAlex coverage is not complete, particularly for researchers in non-English-dominant scientific communities.
revision: yes
-
Referee: Table 2, Panel B (Nature): The PSM result for team size in the Nature portfolio is not statistically significant (ATT = -0.131, p = 0.384). Yet the headline claim in the abstract and §4.2 states that 'teams using AI-assisted writing tend to be smaller' as a general finding across both datasets. The Poisson regression (Table 3) does show a significant coefficient for Nature, but the PSM—the authors' preferred quasi-causal method—does not corroborate this for Nature. The paper should more carefully distinguish which findings are robust across methods and which are not, particularly in the abstract and conclusion.
Authors: The referee is correct. The PSM result for team size in the Nature portfolio is not statistically significant, and our current framing in the abstract and §4.2 overstates the generality of the team-size finding. We will revise the manuscript as follows. First, in §4.2, we will explicitly state that the team-size finding is robust across PSM and Poisson regression for PLoS, but that for Nature, the Poisson coefficient is significant while the PSM ATT is not (p = 0.384). We will offer a substantive interpretation: the Nature sample has far fewer AI-assisted papers (1,581 vs. 11,914 in PLoS), reducing PSM statistical power, and Nature teams are on average larger (mean ~8.5 vs. ~6 in PLoS), meaning the absolute difference of 0.131 authors may be too small to detect reliably. Second, we will revise the abstract to state that AI-assisted writing is associated with smaller teams in PLoS across multiple methods, and with a significant negative coefficient in Poisson regression for Nature, while noting that the PSM result for Nature does not reach significance. Third, the conclusion will be adjusted to reflect this nuance rather than presenting the team-size finding as uniformly robust across both datasets. We agree that precision here is essential for the paper's credibility.
revision: yes
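For readers unfamiliar with the matching step at issue in the last exchange, the following sketch shows 1-to-1 nearest-neighbor matching without replacement under a caliper of 0.2 standard deviations of the propensity score, which is how the paper describes its procedure. Everything else is an assumption made for illustration: the propensity scores are taken as already estimated, the matching is greedy in input order, the population standard deviation is pooled over both groups, and the function and tuple layouts are hypothetical.

```python
from statistics import mean, pstdev


def match_with_caliper(treated, controls, caliper_sd=0.2):
    """Greedy 1-to-1 nearest-neighbor matching without replacement.

    treated, controls: lists of (unit_id, propensity_score, outcome).
    Returns a list of (treated_unit, control_unit) pairs.
    """
    all_scores = [score for _, score, _ in treated + controls]
    caliper = caliper_sd * pstdev(all_scores)  # pooled SD is an assumption
    available = list(controls)
    pairs = []
    for t in treated:
        if not available:
            break
        nearest = min(available, key=lambda c: abs(c[1] - t[1]))
        if abs(nearest[1] - t[1]) <= caliper:
            pairs.append((t, nearest))
            available.remove(nearest)  # without replacement
    return pairs


def att(pairs):
    """Average treatment effect on the treated: mean matched outcome gap."""
    return mean(t[2] - c[2] for t, c in pairs)


# Hypothetical units: (id, propensity score, team size).
treated = [("t1", 0.30, 5), ("t2", 0.60, 4), ("t3", 0.95, 9)]
controls = [("c1", 0.31, 6), ("c2", 0.58, 6), ("c3", 0.10, 7)]
pairs = match_with_caliper(treated, controls)
```

Treated units with no control inside the caliper are dropped, so a group with few treated papers ends up with few matched pairs, which is consistent with the authors' point that the smaller Nature sample reduces statistical power.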
Circularity Check
No circularity found: the paper's central claims are derived from external data sources and an externally-authored AI detection method, with no self-definitional or fitted-input-as-prediction patterns.
Full rationale
The paper's central claims rest on three independent legs: (1) the AI usage score from Liang et al. (2025), an external benchmark authored by a different research group (no author overlap with the present paper); (2) bibliometric data from OpenAlex (author career age, team size, citations) computed via standard scientometric definitions (Eq. 1 for CareerAge, author counts for team size, FWCI normalization by year/discipline/document type); and (3) standard econometric methods (OLS, Poisson, logistic regression, PSM) applied to these variables. The regression equations (Eqs. 2-5) specify AIUsageScore as the independent variable and team age, team size, and Top5% FWCI as dependent variables—none of these are defined in terms of each other. The control variables (prior productivity, prior team structure, prior impact) are computed from historical data that is temporally prior to and distinct from the dependent variables. The 95th-percentile threshold for AI classification is derived from the pre-GPT distribution of the same external detector, not from a parameter the authors fit to their outcome data. No author of the present paper appears in the author list of Liang et al. (2025), so the load-bearing measurement is not a self-citation. The Nature dataset is noted as 'provided by Liang et al. (2025),' but this is data sharing, not a circular definitional dependency. The derivation chain is self-contained against external benchmarks, and no 'prediction' reduces by construction to a fitted input. The concerns raised by the skeptic (detector false positives, writing-style confounds) are correctness and validity risks, not circularity.
Axiom & Free-Parameter Ledger
free parameters (3)
- AI detection threshold (PLoS) = 0.092
- AI detection threshold (Nature) = 0.026
- Career age classification boundaries = 5, 15, 30 years
axioms (4)
- domain assumption: The AI usage score from Liang et al. (2025) accurately estimates the proportion of AI-modified content in academic text.
- domain assumption: Career age measured within PLoS and Nature publications is a valid proxy for author seniority.
- domain assumption: Field-weighted citation impact (FWCI) within a 3-year window is a valid measure of scientific impact.
- standard math: Article teams (co-authors on a single paper) are a valid proxy for research teams.
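The FWCI axiom rests on a normalization that the paper describes as dividing each publication's citation count by the mean citation count of papers sharing its publication year, discipline, and document type. A minimal sketch of that calculation follows; the field names and function name are hypothetical, and the 3-year citation window is assumed to be applied upstream when the counts are collected.

```python
from collections import defaultdict
from statistics import mean


def fwci(papers):
    """Field-weighted citation impact for each paper, in input order.

    Each paper is a dict with 'year', 'discipline', 'doc_type', 'citations'.
    """
    groups = defaultdict(list)
    for p in papers:
        key = (p["year"], p["discipline"], p["doc_type"])
        groups[key].append(p["citations"])
    baseline = {key: mean(counts) for key, counts in groups.items()}

    scores = []
    for p in papers:
        b = baseline[(p["year"], p["discipline"], p["doc_type"])]
        # A peer group with zero mean citations has no meaningful ratio;
        # returning 0.0 there is an arbitrary choice made for this sketch.
        scores.append(p["citations"] / b if b > 0 else 0.0)
    return scores
```

A value of 1.0 means a paper is cited exactly as often as the average of its peer group, so the Top 5% FWCI indicator compares papers on this relative scale rather than on raw counts.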
Reference graph
Works this paper leans on
-
[1]
only how research is produced, but also how research teams are formed and assembled. Our findings call for policy improvements in research evaluation, funding, and training to address this emerging trend. Keywords: Large language models; scientific collaboration; academic writing; team structure; scientific impact; artificial intelligence 1 INTRODUCTION...
-
[2]
and funding acquiring (Qian et al., 2026). While a growing body of empirical studies have documented the profound impact of AI on science, it has largely done so at either the micro level (i.e., on individual researchers or publications) (Bianchini et al.,
-
[3]
or the macro level (i.e., on research system as a whole) (Hao et al., 2026). By contrast, despite a few exceptions (Slade et al., 2026), the meso level, particularly the level of the research team, remains underexplored. AI has evolved beyond its initial role as a methodological tool to function as a general-purpose collaborator, capable of managing high-...
-
[4]
This study offers several important theoretical and practical contributions
to the full text of publications in the two datasets, to quantify AI adoption in academic writing of each publication, and explore whether and how it is associated with team size, team age, and team outcomes. This study offers several important theoretical and practical contributions. First, it enhances our understanding of how emerging technologies are i...
-
[5]
and the increasing complexity of scientific problems, which necessitate larger collaborative efforts. Our finding suggests that AI is pushing back against a long‑standing trend in the era of big science: the steady growth of team size. Second, this research provides critical insight into how AI shapes science at the meso level, specifically within researc...
-
[6]
help LLMs break down complex problems, mimic human reasoning, and learn from expert input. This lets them adapt quickly to new scientific challenges. AI can also process huge amounts of data automatically, spotting relationships and anomalies that human researchers might miss (Bianchini et al., 2025). As a result, AI is increasingly being integrated into ...
-
[7]
Nature portfolio dataset consists of 41,080 publications published from January 2020 to September
-
[8]
OpenAlex is an open-access and comprehensive bibliographic database (Priem et al., 2022). It indexes over 200 million publications from at least 109,000 global institutions and about 124,000 venues. Due to its comprehensive coverage and open-access nature, it has been widely adopted in relevant research (Culbert et al., 2025; Yang et al., 2026; Zhang et a...
-
[9]
3.2 Variables In this study, the unit of analysis is a research team denoted by 𝑖, which is proxied by an “article team”. Research teams are defined as research groups or project groups who work together (Guzzo & Dickson, 1996). In scientometrics studies or science of science studies, research teams are often measured by “article teams” (Leahey, 2016; Liu...
-
[10]
Fig.1 provides the distribution of AI usage scores for both PLoS and Nature
That score reflects the share of content altered beyond basic orthographic and grammatical corrections. Fig.1 provides the distribution of AI usage scores for both PLoS and Nature. It shows that most publications demonstrate very low AI usage scores and only a small portion exhibits high AI usage scores. In addition, a higher mean AI usage score is found...
-
[11]
Since these papers were written prior to the availability of ChatGPT, they are unlikely to apply AI-assisted writing and serve as the pre-treatment baseline. Group 2: Human-written (post-GPT) group: This group consists of publications from November 30, 2022 onward whose AI usage score is below the percentile of the pre-GPT score distribution for the same ...
-
[12]
The distribution of AI usage scores in the PLoS family (a) and Nature portfolio (b). 3.2.2 Team age This variable is applied to measure the average seniority of authors in a research team. Following established practice (Xu et al., 2022), we defined the career age of authors as the number of years elapsed between their first publications in the PLoS famil...
-
[13]
Late-career: Authors with a career age ≥ 30 years. 3.3.3 Team size Following the traditional measurement (Lee et al., 2015; Liu, Jaiswal, et al., 2022), we quantified team size of 𝑖 by counting the total number of authors listed in a publication’s byline. 3.2.4 Team outcome Team outcome is operationalized as the normalized citation count of a research tea...
-
[14]
We denoted this variable by $FWCI_i$
To enable cross‑publication comparison, we normalized each publication’s citation count by the mean citation count of all papers sharing the same publication year, discipline, and document type (e.g., article or review). We denoted this variable by $FWCI_i$. This normalized measure captures a publication’s relative citation performance against its peers. An ...
-
[15]
$\mathit{TeamAge}_i = \beta_0 + \beta_1 \mathit{AIUsageScore}_i + \Gamma X_i + \mathit{Time}_i + \mathit{Discipline}_i + \mathit{Journal}_i + \epsilon_i$ (2) where $i$ denotes a research team; $\mathit{AIUsageScore}_i$ represents the key independent variable, the estimated proportion of AI-assisted writing for a publication produced by $i$; $\mathit{TeamAge}_i$ is the dependent variable, defined as the average career age of all authors for the publication produced by $i$. $X_i$...
-
[16]
The summary statistics of variables employed in this study are shown in Table S1 in Appendix. 3.3.3 Propensity score matching Publications may differ systematically in key confounding variables that are correlated with both the AI usage score and the three dependent variables. This raises concerns that any observed relationship between AI usage and the dep...
-
[17]
When the dependent variable was Top5% FWCI, we excluded FWCI from the covariates
When the dependent variable was team size or team age, the covariates included FWCI and ten author-level controls. When the dependent variable was Top5% FWCI, we excluded FWCI from the covariates. We utilized 1 to 1 nearest neighbor matching without replacement, imposing a caliper of 0.2 standard deviations of the propensity score to enforce strict simila...
-
[18]
5 DISCUSSION AND CONCLUSION Based on 147,074 full-text publications from the PLoS family and the Nature portfolio since 2020, we examined whether and how AI-assisted writing shapes research teams, focusing on team age, team size, and team outcomes. Our findings show that teams using AI-assisted writing tend to be younger and smaller. Importantly, this shi...
-
[19]
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
to ensure transparent and accountable use of these tools, preserving the integrity of scholarly communication. This study has several limitations. First, it only analyzes publications from two sources: the PLoS family and the Nature portfolio. The findings therefore apply directly only to these two open‑access publishers. Whether they generalize to other ...
-
[20]
https://doi.org/10.1007/s10462-024-10902-3 Bouschery, S. G., Blazevic, V., & Piller, F. T. (2023). Augmenting human innovation teams with artificial intelligence: Exploring transformer-based language models. Journal of Product Innovation Management, 40(2), 139-153. https://doi.org/10.1111/jpim.12656 Brynjolfsson, E., Li, D., & Raymond, L. ...
-
[21]
https://doi.org/10.1057/s41599-026-06956-z Jaiswal, A., Babu, A. R., Zadeh, M. Z., Banerjee, D., & Makedon, F. (2021). A Survey on Contrastive Self-Supervised Learning. Technologies, 9(1),
-
[22]
https://www.mdpi.com/2227-7080/9/1/2 Jones, B. F. (2009). The Burden of Knowledge and the “Death of the Renaissance Man”: Is Innovation Getting Harder? The Review of Economic Studies, 76(1), 283-317. https://doi.org/10.1111/j.1467-937X.2008.00531.x Jones, B. F., Wuchty, S., & Uzzi, B. (2008). Multi-university research teams: Shifting impact, geography, an...
-
[23]
Journal of the Association for Information Science and Technology, 66(7), 1323-1332. https://doi.org/10.1002/asi.23266 Lee, Y.-N., Walsh, J. P., & Wang, J. (2015). Creativity in scientific teams: Unpacking novelty and impact. Research Policy, 44(3), 684-697. https://doi.org/10.1016/j.respol.2014.10.007 Liang, W., Izzo, Z...
-
[24]
https://doi.org/10.1057/s41599-023-02392-5 Liu, Y., Xie, Y., Shen, X., & Wu, D. (2026). Artificial intelligence use and scientific innovation. Journal of the Association for Information Science and Technology, 77(5), 682-698. https://doi.org/10.1002/asi.70043 Lund, B. D., Wang, T., Mannuru, N. R., Nie, B., Shimray, S., & Wang, Z. (2023). C...