Self-Admitted GenAI Usage in Open-Source Software
Pith reviewed 2026-05-19 04:53 UTC · model grok-4.3
The pith
Open-source developers explicitly admit using generative AI tools in commit messages, code comments, and documentation. Projects with such admissions actively manage how the tools are used and show no general rise in code churn.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Developers in open-source projects actively manage generative AI by making explicit self-admissions in commit messages, code comments, and documentation. These admissions form the basis for a taxonomy of tasks, content, and purposes, while policy documents and surveys expose concerns about attribution and quality. Analysis of code churn over time in repositories that contain such admissions finds no general increase, which contradicts common narratives about the disruptive effects of GenAI on software development.
What carries the argument
Self-admitted GenAI usage is the practice of developers explicitly noting their use of generative AI tools to create content in software artifacts such as commit messages, comments, and documentation.
If this is right
- Project maintainers should establish explicit rules for transparency and proper attribution of AI-generated contributions.
- Quality control steps become necessary whenever generative AI assists in writing or modifying code.
- Adoption of generative AI tools does not produce a measurable general increase in code churn within open-source repositories.
- Ethical and legal considerations around generative AI require attention at the level of individual projects rather than only at the tool level.
Where Pith is reading between the lines
- Similar self-admission mechanisms could help track the introduction of other new development tools beyond generative AI.
- Comparing admitted and non-admitted usage in the same repositories would test how complete the current sample is.
- Maintainers might adopt standardized admission formats to streamline code review for AI-assisted changes.
Load-bearing premise
Explicit self-admissions in commits, comments, and documentation provide a representative sample of actual generative AI usage across open-source projects.
What would settle it
A broad scan of repositories that locates widespread generative AI code without corresponding self-admissions, or a re-analysis of churn rates that shows a clear rise once non-admitted usage is included.
Original abstract
The widespread adoption of generative AI (GenAI) tools such as GitHub Copilot and ChatGPT is transforming software development. Since generated source code is virtually impossible to distinguish from manually written code, their real-world usage and impact on open-source software (OSS) development remain poorly understood. In this paper, we introduce the concept of self-admitted GenAI usage, that is, developers explicitly referring to the use of GenAI tools for content creation in software artifacts. Using this concept as a lens to study how GenAI tools are integrated into OSS projects, we analyze a curated sample of more than 200,000 GitHub repositories, identifying 1,292 such self-admissions across 156 repositories in commit messages, code comments, and project documentation. Using a mixed methods approach, we derive a taxonomy of 32 tasks, 10 content types, and 11 purposes associated with GenAI usage based on 1,292 qualitatively coded mentions. We then analyze 13 documents with policies and usage guidelines for GenAI tools and conduct a developer survey to uncover the ethical, legal, and practical concerns behind them. Our findings reveal that developers actively manage how GenAI is used in their projects, highlighting the need for project-level transparency, attribution, and quality control practices in AI-assisted software development. Finally, we examine the longitudinal impact of GenAI adoption on code churn in 151 repositories with self-admitted GenAI usage and find no general increase, contradicting popular narratives on the impact of GenAI on software development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the concept of self-admitted GenAI usage in OSS projects and analyzes over 200,000 GitHub repositories to identify 1,292 explicit mentions across 156 repositories. It develops a taxonomy of 32 tasks, 10 content types, and 11 purposes via qualitative coding, examines 13 policy documents, conducts a developer survey on ethical/legal/practical concerns, and performs longitudinal churn analysis on 151 repositories, concluding that developers actively manage GenAI usage and that there is no general increase in code churn after adoption, contradicting popular narratives.
Significance. If the central claims hold, the work offers timely empirical insights into real-world GenAI integration in open-source development, emphasizing project-level transparency, attribution, and quality controls. The mixed-methods design—qualitative coding of 1,292 items paired with quantitative churn tracking across 151 repositories—is appropriate and provides both depth in usage patterns and breadth in longitudinal impact assessment.
major comments (2)
- [Methods section, data collection and sampling] The no-general-increase claim in the longitudinal churn analysis of 151 repositories (selected from 156 with self-admissions out of >200k repos) is load-bearing for contradicting broader narratives. However, the paper does not address whether self-admitting projects differ systematically from other GenAI-using projects in governance, review processes, or maturity, all factors that could affect churn rates. This selection effect weakens the generalizability of the 'no increase' result.
- [Longitudinal impact analysis, churn tracking subsection] The before/after comparison lacks reported statistical controls, baseline matching, or robustness checks for confounding factors such as project size or concurrent changes. Without these, the finding of no general increase cannot reliably support the claim that it contradicts popular narratives on GenAI's impact.
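To make the quantity under discussion concrete, the following is a minimal sketch of one common way to measure churn: summing added and deleted lines per month from `git log --numstat` output. The paper's exact churn definition is not stated here, so the added-plus-deleted measure, the `commit <date>` header format, and the function name `churn_by_month` are all assumptions made for illustration.

```python
from collections import defaultdict


def churn_by_month(numstat_log):
    """Sum added + deleted lines per YYYY-MM.

    Assumes (hypothetically) that the log was produced with something like
    `git log --numstat --date=short --pretty=format:'commit %ad'`, so each
    commit starts with a 'commit YYYY-MM-DD' line followed by tab-separated
    numstat rows: added, deleted, path.
    """
    churn = defaultdict(int)
    month = None
    for line in numstat_log.splitlines():
        if line.startswith("commit "):
            month = line.split()[1][:7]  # keep only YYYY-MM
            continue
        parts = line.split("\t")
        if month is None or len(parts) != 3:
            continue  # blank lines or anything that is not a numstat row
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # git reports '-' for binary files
        churn[month] += int(added) + int(deleted)
    return dict(churn)
```

A monthly series like this could then be split at the date of a repository's first self-admission to obtain the before and after windows the referee refers to.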
minor comments (2)
- [Abstract] No information is provided on inter-rater reliability for the qualitative coding of the 1,292 items or on exact sampling frame details, which would strengthen the mixed-methods description.
- [Taxonomy and policy analysis] The taxonomy derivation and policy analysis sections would benefit from explicit discussion of how the 32 tasks/10 content types/11 purposes were validated beyond initial coding.
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's comments. We value the feedback on the methods and longitudinal analysis, which are key to our contributions. Below, we provide point-by-point responses and indicate planned revisions to address the concerns raised.
-
Referee: Methods section, data collection and sampling: The no-general-increase claim in the longitudinal churn analysis of 151 repositories (selected from 156 with self-admissions out of >200k repos) is load-bearing for contradicting broader narratives. However, the paper does not address whether self-admitting projects differ systematically from other GenAI-using projects in governance, review processes, or maturity—factors that could affect churn rates. This selection effect weakens generalizability of the 'no increase' result.
Authors: We agree that our sample is limited to projects that self-admit GenAI usage, which may indeed differ from non-admitting projects in terms of transparency practices and project maturity. Our analysis is intentionally scoped to self-admitted usage as this provides observable evidence of adoption. To strengthen the manuscript, we will revise the discussion and limitations sections to explicitly acknowledge this selection effect and its implications for generalizability. We will also suggest that future research could explore ways to identify GenAI usage in non-admitting projects.
Revision: yes
-
Referee: Longitudinal impact analysis (churn tracking subsection): The before/after comparison lacks reported statistical controls, baseline matching, or robustness checks for confounding factors such as project size or concurrent changes. Without these, the finding of no general increase cannot reliably support the claim that it contradicts popular narratives on GenAI's impact.
Authors: The churn analysis provides an initial longitudinal view based on available data from the 151 repositories. We acknowledge the value of additional statistical rigor. In the revised version, we will include baseline matching on key project characteristics such as size and age, perform statistical tests to assess significance of changes, and add robustness checks. We will also discuss potential confounding factors like concurrent project changes as a limitation of the current analysis.
Revision: yes
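As a rough illustration of what a paired before/after test on per-repository churn could look like, the sketch below uses an exact two-sided sign test together with Cliff's delta as an effect size. The sign test is a deliberately simple stand-in chosen for this example and is not claimed to be the test the authors use or plan to use. The function names and input layout (one churn value per repository, before and after adoption) are assumptions.

```python
from math import comb


def sign_test_p(before, after):
    """Exact two-sided sign test on paired values; tied pairs are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)  # number of repositories whose churn rose
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross pairs, in [-1, 1]."""
    gt = sum(x > y for x in xs for y in ys)
    lt = sum(x < y for x in xs for y in ys)
    return (gt - lt) / (len(xs) * len(ys))
```

With hypothetical per-repository churn values, `sign_test_p(before, after)` gives a p-value for a systematic shift and `cliffs_delta(after, before)` indicates its direction and size. Neither addresses the confounders raised by the referee, which would still need matching or a design such as regression discontinuity.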
Circularity Check
No circularity: purely observational empirical study with direct data extraction
This is an empirical mixed-methods paper that identifies self-admitted GenAI mentions via keyword search and manual review across >200k repositories, qualitatively codes 1,292 instances into taxonomies, surveys developers, and performs before/after churn comparison on the 151 repositories containing such admissions. No equations, parameter fitting, first-principles derivations, or predictions are present. All quantitative results (counts, taxonomies, churn deltas) are extracted directly from the sampled artifacts without any reduction to prior self-citations or inputs by construction. The selection of self-admitting projects is an explicit methodological filter rather than a hidden tautology, and the 'no general increase' claim is scoped to the observed subset. This matches the expected non-circular outcome for observational repository mining studies.
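The keyword-search step described above can be sketched roughly as follows. The pattern list is hypothetical, since the paper's actual search terms are not given on this page. Matches are only candidates that would still need the manual review the study describes.

```python
import re

# Hypothetical keyword list for illustration; not the paper's search terms.
GENAI_PATTERN = re.compile(
    r"\b(?:chatgpt|github copilot|copilot|generated (?:by|with) ai)\b",
    re.IGNORECASE,
)


def find_candidate_admissions(messages):
    """Return (index, matched term) pairs for texts that mention a GenAI tool."""
    hits = []
    for i, msg in enumerate(messages):
        match = GENAI_PATTERN.search(msg)
        if match:
            hits.append((i, match.group(0).lower()))
    return hits
```

The same function could be applied to commit messages, code comments, or documentation lines. A plain keyword match will also flag texts that merely discuss a tool, such as a project that integrates with the ChatGPT API, which is one reason a manual pass is needed.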
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Self-admitted mentions in commits, comments, and documentation accurately reflect intentional GenAI use without substantial under- or over-reporting.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
we examine the longitudinal impact of GenAI adoption on code churn in 151 repositories with self-admitted GenAI usage and find no general increase, contradicting popular narratives
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
We followed a mixed-methods research design... qualitative analysis... Regression Discontinuity Design (RDD)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 6 Pith papers
-
A Dataset of Agentic AI Coding Tool Configurations
A publicly released dataset of 15,591 configuration artifacts for five agentic AI coding tools, drawn from 4,738 GitHub repositories along with associated files and AI-co-authored commits.
-
A Large-Scale Empirical Study of AI-Generated Code in Real-World Repositories
A large-scale study of real-world repositories finds that AI-generated code differs from human-written code in complexity, structural traits, defect indicators, and commit-level activity patterns.
-
Agentic Much? Adoption of Coding Agents on GitHub
Coding agents reached 22-29% adoption in GitHub projects within months of release, with agent-assisted commits larger and focused on features and bug fixes.
-
A survey of generative AI adoption and perceived productivity among scientists who program
Survey of 868 scientific programmers shows generative AI adoption is highest among the inexperienced, who prefer conversational tools, and perceived productivity correlates most with volume of accepted generated code ...
-
Reliability of AI Bots Footprints in GitHub Actions CI/CD Workflows
Large-scale analysis of AI bot PRs shows Copilot and Codex achieve the highest CI/CD success rates but more frequent AI contributions correlate with reduced workflow reliability.
-
Engineering Students' Usage and Perceptions of GitHub Copilot in Open-Source Projects
Students primarily used Copilot chat and code generation features during open-source contributions, with usage patterns varying significantly by gender, programming skill, and AI experience.