Self-Admitted GenAI Usage in Open-Source Software

Christoph Treude; Fabio Calefato; Hideaki Hata; Raula Gaikovina Kula; Sebastian Baltes; Tao Xiao; Youmei Fan

arXiv: 2507.10422 · v4 · submitted 2025-07-14 · 💻 cs.SE

Tao Xiao, Youmei Fan, Fabio Calefato, Christoph Treude, Raula Gaikovina Kula, Hideaki Hata, Sebastian Baltes

Pith reviewed 2026-05-19 04:53 UTC · model grok-4.3

classification 💻 cs.SE

keywords generative AI · open source software · self-admitted usage · code churn · software development · GitHub repositories · AI-assisted coding

The pith

Open-source developers explicitly admit using generative AI tools in commits and comments, revealing careful project management and no overall rise in code churn.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines self-admitted GenAI usage as explicit developer references to tools like Copilot or ChatGPT in project artifacts. It scans more than 200,000 repositories to locate 1,292 such mentions across 156 projects and classifies them into 32 tasks, 10 content types, and 11 purposes. The work reviews 13 policy documents and surveys developers to surface ethical, legal, and practical concerns. Longitudinal tracking of code changes in 151 repositories shows no general increase in churn after these admissions appear. This evidence indicates that open-source teams actively shape how generative AI enters their workflows rather than letting it run unchecked.

Core claim

Developers in open-source projects actively manage generative AI by making explicit self-admissions in commit messages, code comments, and documentation. These admissions form the basis for a taxonomy of tasks, content, and purposes, while policy documents and surveys expose concerns about attribution and quality. Analysis of code churn over time in repositories that contain such admissions finds no general increase, which contradicts common narratives about the disruptive effects of GenAI on software development.

What carries the argument

Self-admitted GenAI usage: the practice of developers explicitly noting the use of generative AI tools for creating content in software artifacts such as commit messages, comments, and documentation.
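A minimal sketch of how such mentions might be surfaced by keyword matching before manual review. The keyword list, the function name `find_self_admissions`, and the sample artifacts are all hypothetical; the paper's actual search terms are not given on this page.

```python
import re

# Hypothetical keyword list (an assumption, not the paper's search terms).
GENAI_PATTERN = re.compile(
    r"\b(copilot|chatgpt|gpt-4|generated (?:by|with) ai)\b",
    re.IGNORECASE,
)


def find_self_admissions(artifacts):
    """Return (artifact_id, matched_term) pairs for texts mentioning a GenAI tool."""
    hits = []
    for artifact_id, text in artifacts:
        match = GENAI_PATTERN.search(text)
        if match:
            hits.append((artifact_id, match.group(1).lower()))
    return hits


# Invented sample artifacts, for illustration only.
sample = [
    ("commit:a1", "Refactor parser, generated with ChatGPT and reviewed by hand"),
    ("commit:b2", "Fix off-by-one error in pagination"),
    ("comment:c3", "# Copilot suggested this regex; verify before release"),
]

for artifact_id, term in find_self_admissions(sample):
    print(artifact_id, term)
```

Keyword hits of this kind are only candidates. A term such as "copilot" can appear for unrelated reasons, which is why a manual review step is still needed after the search.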

If this is right

Project maintainers should establish explicit rules for transparency and proper attribution of AI-generated contributions.
Quality control steps become necessary whenever generative AI assists in writing or modifying code.
Adoption of generative AI tools does not produce a measurable general increase in code churn within open-source repositories.
Ethical and legal considerations around generative AI require attention at the level of individual projects rather than only at the tool level.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar self-admission mechanisms could help track the introduction of other new development tools beyond generative AI.
Comparing admitted and non-admitted usage in the same repositories would test how complete the current sample is.
Maintainers might adopt standardized admission formats to streamline code review for AI-assisted changes.

Load-bearing premise

Explicit self-admissions in commits, comments, and documentation provide a representative sample of actual generative AI usage across open-source projects.

What would settle it

A broad scan of repositories that locates widespread generative AI code without corresponding self-admissions, or a re-analysis of churn rates that shows a clear rise once non-admitted usage is included.

Figures

Figures reproduced from arXiv: 2507.10422 by Christoph Treude, Fabio Calefato, Hideaki Hata, Raula Gaikovina Kula, Sebastian Baltes, Tao Xiao, Youmei Fan.

Original abstract

The widespread adoption of generative AI (GenAI) tools such as GitHub Copilot and ChatGPT is transforming software development. Since generated source code is virtually impossible to distinguish from manually written code, their real-world usage and impact on open-source software (OSS) development remain poorly understood. In this paper, we introduce the concept of self-admitted GenAI usage, that is, developers explicitly referring to the use of GenAI tools for content creation in software artifacts. Using this concept as a lens to study how GenAI tools are integrated into OSS projects, we analyze a curated sample of more than 200,000 GitHub repositories, identifying 1,292 such self-admissions across 156 repositories in commit messages, code comments, and project documentation. Using a mixed methods approach, we derive a taxonomy of 32 tasks, 10 content types, and 11 purposes associated with GenAI usage based on 1,292 qualitatively coded mentions. We then analyze 13 documents with policies and usage guidelines for GenAI tools and conduct a developer survey to uncover the ethical, legal, and practical concerns behind them. Our findings reveal that developers actively manage how GenAI is used in their projects, highlighting the need for project-level transparency, attribution, and quality control practices in AI-assisted software development. Finally, we examine the longitudinal impact of GenAI adoption on code churn in 151 repositories with self-admitted GenAI usage and find no general increase, contradicting popular narratives on the impact of GenAI on software development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper maps self-admitted GenAI use in OSS with a solid taxonomy, but the no-churn result is undercut by selection on projects that disclose it.

Hi, the main things to know are that this is the first large-scale extraction of self-admitted GenAI mentions from real OSS artifacts, yielding a taxonomy of 32 tasks, 10 content types, and 11 purposes, plus a longitudinal check showing no general rise in code churn within the 151 admitting repositories they tracked. They scanned over 200k repos to surface 1,292 mentions across 156 projects, then coded them qualitatively and supplemented with policy documents and a survey on concerns. That gives a concrete baseline on what developers actually say they are doing with tools like Copilot or ChatGPT, and the churn analysis is a direct test of impact claims. The mixed-methods setup fits the observational nature of the data and the null churn finding is worth having on record even if limited in scope.

The soft spot is the sample itself. By conditioning on explicit admissions in commits, comments, or docs, the study likely captures projects that are already more open or structured in their processes, which could explain why churn stays flat. That selection makes it difficult to treat the result as a general contradiction to narratives about GenAI raising maintenance costs across all usage. The abstract leaves inter-rater reliability, exact sampling details, and statistical controls for the churn part unspecified, so those need to be checked in the full text to judge robustness.

This work is aimed at empirical software engineering researchers who study tool adoption and transparency practices. Readers focused on AI-assisted development guidelines or code quality metrics will get usable categories and counts from it. It deserves a serious referee because the artifact-based data collection is new and the questions are timely, though any review should press on the generalizability of the churn claim and ask for more on how the admitting subset compares to the broader population of GenAI-using projects.

I would send it to review with targeted requests for sampling discussion and quantitative details.

Referee Report

2 major / 2 minor

Summary. The paper introduces the concept of self-admitted GenAI usage in OSS projects and analyzes over 200,000 GitHub repositories to identify 1,292 explicit mentions across 156 repositories. It develops a taxonomy of 32 tasks, 10 content types, and 11 purposes via qualitative coding, examines 13 policy documents, conducts a developer survey on ethical/legal/practical concerns, and performs longitudinal churn analysis on 151 repositories, concluding that developers actively manage GenAI usage and that there is no general increase in code churn after adoption, contradicting popular narratives.

Significance. If the central claims hold, the work offers timely empirical insights into real-world GenAI integration in open-source development, emphasizing project-level transparency, attribution, and quality controls. The mixed-methods design—qualitative coding of 1,292 items paired with quantitative churn tracking across 151 repositories—is appropriate and provides both depth in usage patterns and breadth in longitudinal impact assessment.

major comments (2)

[Methods section, data collection and sampling] The no-general-increase claim in the longitudinal churn analysis of 151 repositories (selected from 156 with self-admissions out of >200k repos) is load-bearing for contradicting broader narratives. However, the paper does not address whether self-admitting projects differ systematically from other GenAI-using projects in governance, review processes, or maturity, all factors that could affect churn rates. This selection effect weakens the generalizability of the 'no increase' result.
[Longitudinal impact analysis, churn tracking subsection] The before/after comparison lacks reported statistical controls, baseline matching, or robustness checks for confounding factors such as project size or concurrent changes. Without these, the finding of no general increase cannot reliably support the claim that it contradicts popular narratives on GenAI's impact.
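To make the before/after comparison concrete, here is a minimal sketch for a single repository. It assumes churn is measured as lines added plus lines deleted per commit and that the split point is the date of the first self-admission; the paper's exact churn definition and windows are not stated on this page. The function names and commit records are hypothetical.

```python
from datetime import date
from statistics import median


def churn(commit):
    """Churn of one commit, taken here as lines added plus lines deleted."""
    return commit["added"] + commit["deleted"]


def median_churn_before_after(commits, admission_date):
    """Split commits at the admission date and return (median_before, median_after)."""
    before = [churn(c) for c in commits if c["date"] < admission_date]
    after = [churn(c) for c in commits if c["date"] >= admission_date]
    if not before or not after:
        return None  # no comparison possible without commits on both sides
    return median(before), median(after)


# Invented commit history, for illustration only.
commits = [
    {"date": date(2023, 1, 10), "added": 40, "deleted": 10},
    {"date": date(2023, 2, 5), "added": 20, "deleted": 20},
    {"date": date(2023, 3, 1), "added": 35, "deleted": 5},
    {"date": date(2023, 4, 12), "added": 30, "deleted": 15},
    {"date": date(2023, 5, 20), "added": 25, "deleted": 10},
]

result = median_churn_before_after(commits, date(2023, 3, 15))
print(result)
```

A raw comparison like this is exactly what the second major comment questions: it does not control for project size, age, or concurrent changes.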

minor comments (2)

[Abstract] No information is provided on inter-rater reliability for the qualitative coding of the 1,292 items or on exact sampling frame details, which would strengthen the mixed-methods description.
[Taxonomy and policy analysis] The taxonomy derivation and policy analysis sections would benefit from explicit discussion of how the 32 tasks/10 content types/11 purposes were validated beyond initial coding.
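As an illustration of the inter-rater reliability figure the first minor comment asks for, the following sketch computes Cohen's kappa for two raters. This is one common agreement statistic, chosen here as an assumption; this page does not say which statistic the paper reports. The labels are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for six coded mentions.
rater_a = ["code", "code", "docs", "test", "docs", "code"]
rater_b = ["code", "docs", "docs", "test", "docs", "code"]

print(round(cohens_kappa(rater_a, rater_b), 3))
```

With more than two raters, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.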

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's comments. We value the feedback on the methods and longitudinal analysis, which are key to our contributions. Below, we provide point-by-point responses and indicate planned revisions to address the concerns raised.

Referee: Methods section, data collection and sampling: The no-general-increase claim in the longitudinal churn analysis of 151 repositories (selected from 156 with self-admissions out of >200k repos) is load-bearing for contradicting broader narratives. However, the paper does not address whether self-admitting projects differ systematically from other GenAI-using projects in governance, review processes, or maturity—factors that could affect churn rates. This selection effect weakens generalizability of the 'no increase' result.

Authors: We agree that our sample is limited to projects that self-admit GenAI usage, which may indeed differ from non-admitting projects in terms of transparency practices and project maturity. Our analysis is intentionally scoped to self-admitted usage as this provides observable evidence of adoption. To strengthen the manuscript, we will revise the discussion and limitations sections to explicitly acknowledge this selection effect and its implications for generalizability. We will also suggest that future research could explore ways to identify GenAI usage in non-admitting projects. (Revision: yes)
Referee: Longitudinal impact analysis (churn tracking subsection): The before/after comparison lacks reported statistical controls, baseline matching, or robustness checks for confounding factors such as project size or concurrent changes. Without these, the finding of no general increase cannot reliably support the claim that it contradicts popular narratives on GenAI's impact.

Authors: The churn analysis provides an initial longitudinal view based on available data from the 151 repositories. We acknowledge the value of additional statistical rigor. In the revised version, we will include baseline matching on key project characteristics such as size and age, perform statistical tests to assess significance of changes, and add robustness checks. We will also discuss potential confounding factors like concurrent project changes as a limitation of the current analysis. (Revision: yes)
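One way the promised statistical checks could be complemented is with a nonparametric effect size. The sketch below computes Cliff's delta between churn values after and before adoption. It is an illustrative choice, not a statement of what the authors ran, and the numbers are invented.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: share of pairs with x > y minus share of pairs with x < y."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Invented per-period churn values for one repository.
churn_after = [40, 55, 30]
churn_before = [35, 50, 45]

delta = cliffs_delta(churn_after, churn_before)
print(round(delta, 3))
```

The value ranges from -1 to 1. Values near 0 mean neither period tends to have higher churn, which is the pattern a "no general increase" finding would predict.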

Circularity Check

0 steps flagged

No circularity: purely observational empirical study with direct data extraction

This is an empirical mixed-methods paper that identifies self-admitted GenAI mentions via keyword search and manual review across >200k repositories, qualitatively codes 1,292 instances into taxonomies, surveys developers, and performs before/after churn comparison on the 151 repositories containing such admissions. No equations, parameter fitting, first-principles derivations, or predictions are present. All quantitative results (counts, taxonomies, churn deltas) are extracted directly from the sampled artifacts without any reduction to prior self-citations or inputs by construction. The selection of self-admitting projects is an explicit methodological filter rather than a hidden tautology, and the 'no general increase' claim is scoped to the observed subset. This matches the expected non-circular outcome for observational repository mining studies.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on the domain assumption that self-admitted mentions are a valid proxy for GenAI usage and that the curated sample of repositories is representative of broader OSS; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: self-admitted mentions in commits, comments, and documentation accurately reflect intentional GenAI use without substantial under- or over-reporting.
Central to treating the 1,292 mentions as the primary data source for taxonomy and churn analysis.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "we examine the longitudinal impact of GenAI adoption on code churn in 151 repositories with self-admitted GenAI usage and find no general increase, contradicting popular narratives"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "We followed a mixed-methods research design... qualitative analysis... Regression Discontinuity Design (RDD)"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Dataset of Agentic AI Coding Tool Configurations
cs.SE · 2026-05 · accept · novelty 8.0
A publicly released dataset of 15,591 configuration artifacts for five agentic AI coding tools, drawn from 4,738 GitHub repositories along with associated files and AI-co-authored commits.

A Large-Scale Empirical Study of AI-Generated Code in Real-World Repositories
cs.SE · 2026-03 · unverdicted · novelty 7.0
A large-scale study of real-world repositories finds that AI-generated code differs from human-written code in complexity, structural traits, defect indicators, and commit-level activity patterns.

Agentic Much? Adoption of Coding Agents on GitHub
cs.SE · 2026-01 · conditional · novelty 7.0
Coding agents reached 22-29% adoption in GitHub projects within months of release, with agent-assisted commits larger and focused on features and bug fixes.

A survey of generative AI adoption and perceived productivity among scientists who program
cs.SE · 2025-12 · unverdicted · novelty 6.0
Survey of 868 scientific programmers shows generative AI adoption is highest among the inexperienced, who prefer conversational tools, and perceived productivity correlates most with volume of accepted generated code ...

Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows
cs.SE · 2026-04 · unverdicted · novelty 5.0
Large-scale analysis of AI bot PRs shows Copilot and Codex achieve the highest CI/CD success rates but more frequent AI contributions correlate with reduced workflow reliability.

Engineering Students' Usage and Perceptions of GitHub Copilot in Open-Source Projects
cs.SE · 2026-04 · unverdicted · novelty 5.0
Students primarily used Copilot chat and code generation features during open-source contributions, with usage patterns varying significantly by gender, programming skill, and AI experience.


work page 2024

[12] [12]

Sampling projects in GitHub for MSR studies,

O. Dabic, E. Aghajani, and G. Bavota, “Sampling projects in GitHub for MSR studies,” in MSR ’21, 2021

work page 2021

[13] [13]

Octoverse: The state of open source and rise of ai in 2024,

“Octoverse: The state of open source and rise of ai in 2024,” https://github.blog/news-insights/octoverse/ octoverse-2024/, 2024, accessed: 2025-07-01

work page 2024

[14] [14]

Curating GitHub for engineered software projects,

N. Munaiah, S. Kroh, C. Cabrey, and M. Nagappan, “Curating GitHub for engineered software projects,” Empir. Softw. Eng., vol. 22, no. 6, pp. 3219–3253, 2017

work page 2017

[15] [15]

Ulfsnes, N

R. Ulfsnes, N. B. Moe, V . Stray, and M. Skarpen, Trans- forming Software Development with Generative AI: Empir- ical Insights on Collaboration and Workflow . Springer, 2024

work page 2024

[16] [16]

Measuring nominal scale agreement among many raters

J. L. Fleiss, “Measuring nominal scale agreement among many raters.” Psychol. Bull., vol. 76, no. 5, 1971

work page 1971

[17] [17]

Self-admitted GenAI usage in open-source software,

T. Xiao, Y. Fan, F. Calefato, C. Treude, R. G. Kula, H. Hata, and S. Baltes, “Self-admitted GenAI usage in open-source software,” Jul. 2025. [Online]. Available: https://doi.org/10.5281/zenodo.15871467

work page doi:10.5281/zenodo.15871467 2025

[18] [18]

Charmaz, Constructing grounded theory

K. Charmaz, Constructing grounded theory. SAGE, 2014

work page 2014

[19] [19]

Understanding interob- server agreement: the kappa statistic,

A. J. Viera, J. M. Garrett et al., “Understanding interob- server agreement: the kappa statistic,” Fam med, vol. 37, no. 5, pp. 360–363, 2005. 17

work page 2005

[20] [20]

Code churn: A measure for estimating the impact of code change,

J. C. Munson and S. G. Elbaum, “Code churn: A measure for estimating the impact of code change,” in ICSM ’98. IEEE, 1998, pp. 24–31

work page 1998

[21] [21]

Examining the impact of self-admitted technical debt on software quality,

S. Wehaibi, E. Shihab, and L. Guerrouj, “Examining the impact of self-admitted technical debt on software quality,” in SANER ’16, vol. 1, 2016, pp. 179–188

work page 2016

[22] [22]

Use of relative code churn measures to predict system defect density,

N. Nagappan and T. Ball, “Use of relative code churn measures to predict system defect density,” inICSE ’05, 2005, pp. 284–292

work page 2005

[23] [23]

Individual comparisons by ranking methods,

F. Wilcoxon, “Individual comparisons by ranking methods,” in Biometrics Bulletin, 1945, pp. 80–83

work page 1945

[24] [24]

Cohen, Statistical power analysis for the behavioral sci- ences

J. Cohen, Statistical power analysis for the behavioral sci- ences. Routledge, 2013

work page 2013

[25] [25]

On a test of whether one of two random variables is stochastically larger than the other,

H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The annals of mathematical statistics , pp. 50–60, 1947

work page 1947

[26] [26]

Dominance statistics: Ordinal analyses to answer ordinal questions

N. Cliff, “Dominance statistics: Ordinal analyses to answer ordinal questions.” Psychological bulletin , vol. 114, no. 3, p. 494, 1993

work page 1993

[27] [27]

Exploring methods for evaluating group differences on the nsse and other surveys: Are the t-test and cohen’s d indices the most appropriate choices,

J. Romano, J. D. Kromrey, J. Coraggio, J. Skowronek, and L. Devine, “Exploring methods for evaluating group differences on the nsse and other surveys: Are the t-test and cohen’s d indices the most appropriate choices,” in Annual Meeting of SAIR, vol. 14, 2006

work page 2006

[28] [28]

Regression- discontinuity analysis: An alternative to the ex post facto experiment

D. L. Thistlethwaite and D. T. Campbell, “Regression- discontinuity analysis: An alternative to the ex post facto experiment.” Journal of Educational psychology , vol. 51, no. 6, p. 309, 1960

work page 1960

[29] [29]

Regression discontinu- ity designs: A guide to practice,

G. W. Imbens and T. Lemieux, “Regression discontinu- ity designs: A guide to practice,” Journal of econometrics, vol. 142, no. 2, pp. 615–635, 2008

work page 2008

[30] [30]

Effects of adopting code review bots on pull requests to oss projects,

M. Wessel, A. Serebrenik, I. Wiese, I. Steinmacher, and M. A. Gerosa, “Effects of adopting code review bots on pull requests to oss projects,” in ICSME ’20, 2020

work page 2020

[31] [31]

Github actions: the impact on the pull request pro- cess,

M. Wessel, J. Vargovich, M. A. Gerosa, and C. Treude, “Github actions: the impact on the pull request pro- cess,” Empir. Softw. Eng., vol. 28, no. 6, p. 131, 2023

work page 2023

[32] [32]

The software bill of materials,

D. Riehle, “The software bill of materials,” Computer, vol. 58, no. 4, pp. 115–120, 2025

work page 2025

[33] [33]

An empirical study on software bill of materials: Where we stand and the road ahead,

B. Xia, T. Bi, Z. Xing, Q. Lu, and L. Zhu, “An empirical study on software bill of materials: Where we stand and the road ahead,” in ICSE ’23, 2023, pp. 2630–2642

work page 2023

[34] [34]

Analyz- ing developer use of ChatGPT generated code in open source github projects,

B. Grewal, W. Lu, S. Nadi, and C.-P . Bezemer, “Analyz- ing developer use of ChatGPT generated code in open source github projects,” in MSR ’24, 2024, p. 157–161

work page 2024

[35] [35]

On the taxon- omy of developers’ discussion topics with ChatGPT,

E. Sagdic, A. Bayram, and M. R. Islam, “On the taxon- omy of developers’ discussion topics with ChatGPT,” in MSE ’24, 2024, p. 197–201

work page 2024

[36] [36]

ChatGPT in action: Analyzing its use in software development,

A. I. Champa, M. F. Rabbi, C. Nachuma, and M. F. Zi- bran, “ChatGPT in action: Analyzing its use in software development,” in MSR ’24, 2024, p. 182–186

work page 2024

[37] [37]

Can ChatGPT support developers? an empirical evaluation of large language models for code generation,

K. Jin, C.-Y. Wang, H. V . Pham, and H. Hemmati, “Can ChatGPT support developers? an empirical evaluation of large language models for code generation,” in MSE ’24, 2024, p. 167–171

work page 2024

[38] [38]

Lost at C: A user study on the security implications of large language model code assistants,

G. Sandoval, H. Pearce, T. Nys, R. Karri, S. Garg, and B. Dolan-Gavitt, “Lost at C: A user study on the security implications of large language model code assistants,” pp. 2205–2222, 2023

work page 2023

[39] [39]

Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code?

O. Asare, M. Nagappan, and N. Asokan, “Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code?” Empir. Softw. Eng., vol. 28, no. 6, p. 129, 2023

work page 2023

[40] [40]

Quality assessment of chatgpt generated code and their use by developers,

M. L. Siddiq, L. Roney, J. Zhang, and J. C. D. S. Santos, “Quality assessment of chatgpt generated code and their use by developers,” in MSR ’24, 2024, p. 152–156

work page 2024

[41] [41]

Write me this code: An analysis of ChatGPT quality for producing source code,

K. Moratis, T. Diamantopoulos, D.-N. Nastos, and A. Symeonidis, “Write me this code: An analysis of ChatGPT quality for producing source code,” in MSR ’24, 2024, p. 147–151

work page 2024

[42] [42]

Ai writes, we analyze: The ChatGPT python code saga,

M. F. Rabbi, A. I. Champa, M. F. Zibran, and M. R. Islam, “Ai writes, we analyze: The ChatGPT python code saga,” in MSR ’24, 2024, p. 177–181

work page 2024

[43] [43]

Does generative ai gener- ate smells related to container orchestration?: An ex- ploratory study with kubernetes manifests,

Y. Zhang, R. Meredith, W. Reeves, J. Coriolano, M. A. Babar, and A. Rahman, “Does generative ai gener- ate smells related to container orchestration?: An ex- ploratory study with kubernetes manifests,” in MSR ’24, 2024, p. 192–196

work page 2024

[44] [44]

Future of software development with generative ai,

J. Sauvola, S. Tarkoma, M. Klemettinen, J. Riekki, and D. Doermann, “Future of software development with generative ai,” Autom. Softw. Eng., vol. 31, no. 1, 2024

work page 2024

[45] [45]

Navigating the complexity of generative AI adoption in software engineering,

D. Russo, “Navigating the complexity of generative AI adoption in software engineering,” ACM Trans. Softw. Eng. Methodol., vol. 33, no. 5, pp. 135:1–135:50, 2024

work page 2024

[46] [46]

Technology acceptance model: a literature review from 1986 to 2013,

N. Marangunic and A. Granic, “Technology acceptance model: a literature review from 1986 to 2013,” Univers. Access Inf. Soc., vol. 14, no. 1, pp. 81–95, 2015

work page 1986

[47] [47]

“the law doesn’t work like a computer

N. Wintersgill, T. Stalnaker, L. A. Heymann, O. Cha- parro, and D. Poshyvanyk, ““the law doesn’t work like a computer”’: Exploring software licensing issues faced by legal practitioners,” Proc. ACM Softw. Eng., vol. 1, no. FSE, pp. 882–905, 2024

work page 2024

[48] [48]

Ai copilot code quality: 2025 data suggests 4x growth in code clones,

GitClear, “Ai copilot code quality: 2025 data suggests 4x growth in code clones,” https://gitclear.com/ai_ assistant_code_quality_2025_research, 2025

work page 2025

[49] [49]

Asleep at the keyboard? assessing the secu- rity of GitHub copilot’s code contributions,

H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the keyboard? assessing the secu- rity of GitHub copilot’s code contributions,” Commun. ACM, vol. 68, no. 2, pp. 96–105, 2025

work page 2025

[50] [50]

An empirical study of software reuse vs. defect-density and stability,

P . Mohagheghi, R. Conradi, O. M. Killi, and H. Schwarz, “An empirical study of software reuse vs. defect-density and stability,” in ICSE ’04, 2004

work page 2004

[51] [51]

Investigating on the impact of software clones on technical debt,

A. Lerina and L. Nardi, “Investigating on the impact of software clones on technical debt,” in 2019 IEEE/ACM International Conference on Technical Debt (TechDebt) . IEEE, 2019, pp. 108–112

work page 2019

[52] [52]

Code reuse in practice: Benefiting or harming technical debt,

D. Feitosa, A. Ampatzoglou, A. Gkortzis, S. Bibi, and A. Chatzigeorgiou, “Code reuse in practice: Benefiting or harming technical debt,” J. Syst. Softw. , vol. 167, p. 110618, 2020

work page 2020