The Fast and Spurious: Developer Productivity with GenAI

Anita Sarma; Bianca Trinkenreich; Igor Steinmacher; Katie Kimura; Sadia Afroz; Tyler Menezes; Zixuan Feng

arxiv: 2510.24265 · v2 · submitted 2025-10-28 · 💻 cs.SE · cs.HC

Sadia Afroz, Zixuan Feng, Tyler Menezes, Katie Kimura, Bianca Trinkenreich, Igor Steinmacher, Anita Sarma

Pith reviewed 2026-05-18 03:23 UTC · model grok-4.3

keywords: GenAI adoption · developer productivity · SPACE framework · software engineering survey · code review burden · cognitive load · effort redistribution · spurious productivity gains

The pith

GenAI speeds up coding tasks but shifts effort to code review and output verification, leaving perceived productivity gains potentially spurious.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys 415 software practitioners to examine how generative AI affects developer productivity across multiple dimensions. Using the SPACE framework, it shows that frequent users report faster task completion and higher output volume, yet these are offset by increased code review burden, ongoing cognitive load from verifying AI suggestions, and no improvement in collaboration. The results point to a redistribution of effort rather than a net reduction in work. The study also connects the challenges developers report to possible mitigation strategies. This leads to the view that current productivity gains from GenAI may be surface-level accelerations accompanied by hidden costs.

Core claim

Frequent GenAI users complete tasks more quickly and produce more code, but these gains are counterbalanced by higher demands on code review, sustained cognitive effort to check AI outputs for correctness, and unchanged collaboration patterns. Applying the SPACE framework to the survey data reveals systematic shifts of effort across satisfaction, performance, activity, communication, and efficiency dimensions. At the present stage of adoption, this pattern indicates that perceived productivity improvements may be spurious.

What carries the argument

Survey mapping of GenAI usage levels to perceived changes across the five SPACE productivity dimensions, which tracks where effort increases and decreases.

If this is right

Faster individual task completion does not reduce total workload because of added review time.
Cognitive load from verifying AI outputs persists even as speed improves.
Collaboration and communication patterns show no change with GenAI use.
Developers face specific challenges that can be addressed with targeted mitigation strategies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Teams may need to adjust schedules to account for extra review time when rolling out GenAI tools.
Pairing self-reported data with logged metrics could test whether the observed shifts hold in practice.
GenAI tools could be refined to lower the verification overhead that currently offsets speed gains.

Load-bearing premise

Developers' self-reported perceptions of productivity changes accurately reflect actual shifts without systematic bias or the need for objective measures like time logs.

What would settle it

A follow-up study that collects objective time-tracking or version-control data on coding, review, and verification effort before and after GenAI adoption to check whether net productivity increases.
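
The comparison such a study would make can be sketched in a few lines. This is a minimal illustration, not the paper's method: the function name `net_effort_change`, the phase names, and every number below are hypothetical and invented purely to show the arithmetic. The point is that per-phase changes can be large while the total stays flat, which is what "redistribution rather than reduction" would look like in logged data.

```python
def net_effort_change(before, after):
    """Return per-phase deltas and the total delta in hours (after minus before)."""
    phases = sorted(set(before) | set(after))
    deltas = {p: after.get(p, 0.0) - before.get(p, 0.0) for p in phases}
    return deltas, sum(deltas.values())


# Hypothetical weekly hours per developer. These figures are made up for
# illustration and are NOT taken from the paper or from any dataset.
before = {"coding": 20.0, "review": 6.0, "verification": 2.0}
after = {"coding": 14.0, "review": 9.0, "verification": 5.0}

deltas, total = net_effort_change(before, after)
for phase, change in deltas.items():
    print(f"{phase}: {change:+.1f} h")
print(f"net change: {total:+.1f} h")
```

With real time logs, a net change near zero alongside a drop in coding hours would support the redistribution reading, while a clearly negative net change would count against it.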

Figures

Figures reproduced from arXiv: 2510.24265 by Anita Sarma, Bianca Trinkenreich, Igor Steinmacher, Katie Kimura, Sadia Afroz, Tyler Menezes, Zixuan Feng.

**Figure 1.** Split-violin plots of aggregated SPACE scores. Left half = non-frequent AI users (blue), right half = frequent AI users

**Figure 3.** Distribution of responses to Performance dimen…

**Figure 4.** Distribution of responses to Activity dimension

**Figure 5.** Distribution of responses to Communication and…

**Figure 6.** Distribution of responses to Efficiency and flow

Original abstract

Generative AI (GenAI) tools are increasingly being adopted in software development as productivity aids, since there is evidence that GenAI tools can improve individual aspects of productivity. However, productivity is multidimensional; accelerating one aspect of work may simply shift effort to another. In this paper, we investigate how GenAI adoption affects different dimensions of developer productivity. We surveyed 415 software practitioners to understand how they perceive productivity changes associated with AI adoption, using the SPACE framework (Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow). Our results reveal systematic redistribution of effort across SPACE dimensions. While frequent GenAI users reported faster task completion and higher output volume, these gains were offset by increased code review burden, persistent cognitive load from output verification, and unchanged collaboration patterns. We further provide an empirical mapping between the challenges perceived by developers and potential strategies to mitigate them. Overall, our findings suggest that, at the current stage of GenAI adoption, perceived productivity gains may be spurious -- surface-level acceleration, often accompanied by redistributed effort and hidden costs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

Survey of 415 devs finds GenAI speeds some tasks but shifts effort to review and verification, though self-reports leave the 'spurious gains' claim open to bias questions.

The key point from this paper is that GenAI tools seem to speed up individual task completion and increase output for frequent users, but those benefits get balanced out by more time spent on code reviews, verifying AI-generated code, and no real change in team collaboration. The authors conclude that the productivity improvements might be more apparent than real.

They apply the SPACE framework to organize survey responses from 415 software practitioners. This gives a structured look at how adoption affects satisfaction and well-being, performance, activity levels, communication, and efficiency. The new data shows specific shifts, like higher activity in some areas but added burdens in others. They also outline mitigation strategies based on what developers reported as challenges.

This approach works well because it moves beyond single-metric views of productivity and captures the multidimensional nature. The sample size is reasonable for a survey, and the findings align with some existing concerns in the field about hidden costs of AI assistance.

The main limitation is the reliance on self-reported data alone. Perceptions of faster work or increased review time could be influenced by biases in how people recall their experiences or respond to questions. Without cross-checks using objective measures such as version control logs, pull request timelines, or actual time tracking, the evidence for effort redistribution stays somewhat soft. The paper would be stronger with even a small set of corroborating metrics from a subset of respondents.

This paper is aimed at software engineering researchers studying tool adoption and at practitioners or managers evaluating GenAI in their teams. Anyone looking at productivity frameworks or AI in development will get practical insights from the redistribution patterns and the suggested mitigations.

It should go to peer review. The empirical mapping is useful and timely, and referees can help tighten the methods around validation.

Referee Report

2 major / 2 minor

Summary. The paper reports results from a survey of 415 software practitioners on how GenAI tool adoption affects developer productivity across the SPACE dimensions (Satisfaction, Performance, Activity, Communication/collaboration, Efficiency/flow). Frequent users report faster task completion and higher output volume, but these are offset by greater code-review burden, persistent cognitive load from verifying AI-generated code, and unchanged collaboration patterns; the authors conclude that perceived productivity gains may therefore be spurious and supply an empirical mapping from reported challenges to mitigation strategies.

Significance. If the central redistribution claim holds, the work would be a useful addition to the empirical literature on GenAI in software engineering by moving beyond single-dimension speed claims and applying the established SPACE lens. The challenge-to-strategy mapping supplies concrete practitioner guidance. The study is timely and the sample size is respectable for a perception survey, but the absence of objective corroboration limits how strongly the “spurious” interpretation can be advanced.

major comments (2)

[§3] §3 (Survey Design and Data Collection): The central claim that gains are spurious depends on self-reported perceptions accurately reflecting actual effort redistribution. The manuscript provides no objective metrics (commit rates, PR review durations, time logs), reports no response rate, and does not describe controls for self-selection bias. This is load-bearing for the abstract’s conclusion because the observed offsets in review burden and verification load could be artifacts of recall or social-desirability bias rather than genuine shifts.
[§4] §4 (Results): The quantitative presentation of SPACE-dimension changes relies on Likert-scale or frequency responses without reported effect sizes, confidence intervals, or statistical tests comparing frequent versus infrequent users. Without these, it is difficult to judge whether the reported offsets are large enough to render the performance/activity gains spurious.

minor comments (2)

[Abstract] The abstract states that the survey used the SPACE framework but does not list the exact items or adaptations; a short appendix or table with the instrument would improve reproducibility.
[Figure 2] Figure 2 (SPACE dimension shifts) would benefit from error bars or significance markers to allow readers to assess the reliability of the reported differences.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below, indicating planned revisions where feasible.

Referee: [§3] §3 (Survey Design and Data Collection): The central claim that gains are spurious depends on self-reported perceptions accurately reflecting actual effort redistribution. The manuscript provides no objective metrics (commit rates, PR review durations, time logs), reports no response rate, and does not describe controls for self-selection bias. This is load-bearing for the abstract’s conclusion because the observed offsets in review burden and verification load could be artifacts of recall or social-desirability bias rather than genuine shifts.

Authors: We agree that the study is based on self-reported perceptions, which is the standard approach when applying the SPACE framework to capture how developers experience productivity across multiple dimensions. Objective metrics such as commit rates or time logs would require a longitudinal design with direct access to development data, which was outside the scope of this perception survey. We will add an expanded limitations section that explicitly discusses potential biases including recall bias and social-desirability bias. The survey was distributed through open professional channels (e.g., LinkedIn groups, developer forums, and social media), so a response rate cannot be calculated; we will state this clearly in the methods. We will also elaborate on our sampling approach and efforts to reach a broad practitioner population to address self-selection concerns.

revision: partial

Referee: [§4] §4 (Results): The quantitative presentation of SPACE-dimension changes relies on Likert-scale or frequency responses without reported effect sizes, confidence intervals, or statistical tests comparing frequent versus infrequent users. Without these, it is difficult to judge whether the reported offsets are large enough to render the performance/activity gains spurious.

Authors: We thank the referee for highlighting the need for stronger statistical presentation. The original analysis emphasized descriptive patterns and the observed redistribution of effort. We will revise the results section to include effect sizes (using appropriate measures for ordinal data), confidence intervals for key findings, and statistical comparisons (e.g., Mann-Whitney U or chi-square tests) between frequent and infrequent GenAI users. These additions will allow readers to better assess the magnitude and reliability of the reported offsets.

revision: yes
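
For readers unfamiliar with the comparison named in this response, the following is a minimal sketch of a Mann-Whitney U statistic paired with Cliff's delta, one common effect size for ordinal data. It is an assumption about what such an analysis could look like, not the authors' actual procedure. The function names and the Likert responses are hypothetical and invented for illustration only.

```python
from itertools import product


def mann_whitney_u(a, b):
    """U statistic for group a: count of cross-group pairs with x > y, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x, y in product(a, b))


def cliffs_delta(a, b):
    """Cliff's delta in [-1, 1]: share of pairs with x > y minus share with x < y."""
    greater = sum(x > y for x, y in product(a, b))
    less = sum(x < y for x, y in product(a, b))
    return (greater - less) / (len(a) * len(b))


# Hypothetical 5-point Likert responses (1 = strongly disagree, 5 = strongly agree).
# These values are made up and are NOT survey data from the paper.
frequent = [4, 5, 4, 3, 5, 4]
infrequent = [3, 2, 4, 3, 3, 2]

u = mann_whitney_u(frequent, infrequent)
delta = cliffs_delta(frequent, infrequent)
print(f"U = {u}, Cliff's delta = {delta:.3f}")
```

The two quantities are linked by delta = 2U / (n·m) − 1, so reporting delta alongside U costs nothing extra. A p-value and confidence interval would still be needed; a library routine such as `scipy.stats.mannwhitneyu` can supply the test, which this sketch deliberately leaves out to stay dependency-free.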

standing simulated objections not resolved

We cannot provide objective metrics such as commit rates, PR review durations, or time logs, as the study was designed as a cross-sectional perception survey and collected no such data.

Circularity Check

0 steps flagged

Empirical survey with no derivation or self-referential reduction

The paper is a survey-based empirical study of 415 practitioners using the established SPACE framework. Central claims about redistributed effort and spurious perceived gains are presented as direct interpretations of self-reported responses rather than any mathematical derivation, fitted parameter, or equation that reduces to prior inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify the results. The study is self-contained against its own data collection and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on the assumption that practitioner self-reports are a valid proxy for productivity changes; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Self-reported perceptions from software practitioners accurately reflect changes in productivity dimensions.
The entire analysis depends on survey responses without stated validation against objective logs or metrics.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

To Copilot and Beyond: 22 AI Systems Developers Want Built
cs.SE · 2026-04 · unverdicted · novelty 5.0

Survey of 860 developers reveals 22 desired AI systems for non-coding tasks with explicit constraints on authority, provenance, and quality signals, framed as bounded delegation where AI handles assembly work but not ...

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1] Anonym. 2025. Developer Productivity with GenAI — Appendix. https://doi.org/10.5281/zenodo.17459831
[2] Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein. 2025. Measuring the impact of early-2025 AI on experienced open-source developer productivity. arXiv preprint arXiv:2507.09089 (2025)
[3] Ana Casic and Eri Panselina. 2025. Quiet cracking: The hidden crisis silently reshaping work. https://www.talentlms.com/research/quiet-cracking-workplace-survey
[4] Thomas Dohmke, Marco Iansiti, and Greg Richards. 2023. Sea change in software development: Economic and productivity analysis of the ai-powered developer lifecycle. arXiv preprint arXiv:2306.15033 (2023)
[5] Zixuan Feng, Amreeta Chatterjee, Anita Sarma, and Iftekhar Ahmed. 2022. A case study of implicit mentoring, its prevalence, and impact in Apache. In ESEC/FSE. 797–809
[6] Zixuan Feng, Igor Steinmacher, Marco Gerosa, Tyler Menezes, Alexander Serebrenik, Reed Milewicz, and Anita Sarma. 2025. The multifaceted nature of mentoring in oss: strategies, qualities, and ideal outcomes. In CHASE. IEEE, 203–214
[7] Zixuan Feng, Thomas Zimmermann, Lorenzo Pisani, Christopher Gooley, Jeremiah Wander, and Anita Sarma. 2025. When Domains Collide: An Activity Theory Exploration of Cross-Disciplinary Collaboration. arXiv preprint arXiv:2506.20063 (2025)
[8] Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler. 2021. The SPACE of Developer Productivity: There’s more to it than you think. Queue 19, 1 (2021), 20–48
[9–10] Ranim Khojah, Mazen Mohamad, Philipp Leitner, and Francisco de Oliveira Neto. 2024. Beyond code generation: An observational study of chatgpt usage in software engineering practice. FSE 1 (2024), 1819–1840
[11] Mohammad Amin Kuhail, Sujith Samuel Mathew, Ashraf Khalil, Jose Berengueres, and Syed Jawad Hussain Shah. 2024. “Will I be replaced?” Assessing ChatGPT’s effect on software development and programmer perceptions of AI tools. Science of Computer Programming 235 (2024), 103111
[12] André N Meyer, Thomas Fritz, Gail C Murphy, and Thomas Zimmermann. 2014. Software developers’ perceptions of productivity. In FSE. 19–29
[13] Courtney Miller, Rudrajit Choudhuri, Mara Ulloa, Sankeerti Haniyur, Robert DeLine, Margaret-Anne Storey, Emerson Murphy-Hill, Christian Bird, and Jenna L Butler. 2025. "Maybe We Need Some More Examples:" Individual and Team Drivers of Developer GenAI Tool Use. arXiv preprint arXiv:2507.21280 (2025)
[14] Audris Mockus, Roy Fielding, and James Herbsleb. 2002. Two case studies of open source software development: Apache and Mozilla. TOSEM 11, 3 (2002)
[15] Anh Nguyen-Duc, Beatriz Cabrero-Daniel, Adam Przybylek, Chetan Arora, Dron Khanna, Tomas Herda, Usman Rafiq, Jorge Melegati, Eduardo Guerra, Kai-Kristian Kemell, et al. 2025. Generative artificial intelligence for software engineering—A research agenda. Software: Practice and Experience (2025)
[16] Abi Noda, Margaret-Anne Storey, Nicole Forsgren, and Michaela Greiler. 2023. DevEx: What Actually Drives Productivity: The developer-centric approach to measuring and improving productivity. Queue 21, 2 (2023), 35–53
[17] Edson Oliveira, Eduardo Fernandes, Igor Steinmacher, Marco Cristo, Tayana Conte, and Alessandro Garcia. 2020. Code and commit metrics of developer productivity: a study on team leaders perceptions. EMSE 25, 4 (2020), 2519–2549
[18] Elise Paradis, Kate Grey, Quinn Madison, et al. 2025. How much does AI impact development speed? An enterprise RCT. In Proc. ICSE-SEIP. 618–629
[19] Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer. 2023. The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. arXiv preprint arXiv:2302.06590
[20] Ketai Qiu, Niccolò Puccinelli, Matteo Ciniselli, and Luca Di Grazia. 2025. From today’s code to tomorrow’s symphony: The AI transformation of developer’s routine by 2030. TOSEM 34, 5 (2025), 1–17
[21] Ya Gao & GitHub Customer Research. [n. d.]. Research: Quantifying GitHub Copilot’s impact in the enterprise with Accenture. https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-in-the-enterprise-with-accenture/. Accessed: 2025-08-08
[22] Daniel Rodríguez, MA Sicilia, E García, and Rachel Harrison. 2012. Empirical findings on team size and productivity in software development. Journal of Systems and Software 85, 3 (2012), 562–570
[23] Mario Rodriguez. 2023. Research: Quantifying GitHub Copilot’s impact on code quality. https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-code-quality/. Accessed: 2025-10-21
[24] Alan Shimel. 2025. Stack Overflow Survey Shows AI Adoption for Devs. DevOps.com (August 12 2025). https://devops.com/stack-overflow-survey-shows-ai-adoption-for-devs/ Accessed: 2025-10-05
[25] Margaret-Anne Storey, Brian Houck, and Thomas Zimmermann. 2022. How developers and managers define and trade productivity for quality. In CHASE
[26] Margaret-Anne Storey, Thomas Zimmermann, Christian Bird, Jacek Czerwonka, Brendan Murphy, and Eirini Kalliamvakou. 2019. Towards a theory of software developer job satisfaction and perceived productivity. TSE 47, 10 (2019)
[27] Franziska Tobisch and Florian Matthes. 2025. Knowledge Sharing and Coordination in Large-Scale Agile Software Development–A Systematic Literature Review and an Interview Study. In International Conference on Agile Software Development. Springer, 81–99
[28] Anna Tong. 2025. AI slows down some experienced software developers, study finds. https://www.reuters.com/business/ai-slows-down-some-experienced-software-developers-study-finds-2025-07-10/. Accessed: 2025-10-21
[29] Bianca Trinkenreich, Fabio Santos, and Klaas-jan Stol. 2024. Predicting attrition among software professionals: Antecedents and consequences of burnout and engagement. TOSEM 33, 8 (2024), 1–45
[30] Priyan Vaithilingam, Tianyi Zhang, and Elena L Glassman. 2022. Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In CHI EA. 1–7
[31] Bogdan Vasilescu, Yue Yu, Huaimin Wang, Premkumar Devanbu, and Vladimir Filkov. 2015. Quality and productivity outcomes relating to continuous integration in GitHub. In ESEC/FSE. 805–816
[32] Brennan Wilkes, Alessandra Maciel Paz Milani, and Margaret-Anne Storey. 2023. A framework for automating the measurement of devops research and assessment (dora) metrics. In ICSME. IEEE, 62–72
[33] Liang Yu. 2025. Paradigm shift on Coding Productivity Using GenAI. arXiv preprint arXiv:2504.18404 (2025)
[34] Ilya Zakharov, Ekaterina Koshchenko, and Agnia Sergeyuk. 2025. AI in Software Engineering: Perceived Roles and Their Impact on Adoption. In FSE Companion. 1305–1309
[35] Minghui Zhou and Audris Mockus. 2010. Developer fluency: Achieving true mastery in software projects. In FSE. 137–146
[36] Albert Ziegler, Eirini Kalliamvakou, X Alice Li, Andrew Rice, Devon Rifkin, Shawn Simister, Ganesh Sittampalam, and Edward Aftandilian. 2022. Productivity assessment of neural code completion. In MAPS. 21–29
