How Humans, Bots, and Agents Communicate About Vulnerabilities in Pull Requests
Pith reviewed 2026-06-29 03:29 UTC · model grok-4.3
The pith
This registered report outlines a planned study comparing how humans, bots, and agents reference vulnerabilities in pull requests using explicit and implicit signals.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors plan to analyze explicit vulnerability references such as CVEs or GHSAs and implicit security-related signals across pull request titles, descriptions, review comments, commit messages, and timeline discussions, then relate these to introduced or fixed vulnerabilities and to review activity and outcomes, comparing across human, bot, and agent accounts.
What carries the argument
Comparison of explicit and implicit vulnerability signals in the AIDev-pop dataset across multiple pull request components to identify differences by account type.
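The distinction between explicit and implicit signals can be made concrete with a small sketch. The version below is only an illustration and not the authors' detection method: the function name `find_signals`, the loose GHSA character class, and the two-phrase lexicon (taken from the examples quoted in the abstract) are all assumptions.

```python
import re

# Explicit identifiers: CVE-YYYY-NNNN (four or more digits) and GHSA-xxxx-xxxx-xxxx.
# Assumption: the GHSA character class is kept deliberately loose for this sketch.
EXPLICIT = re.compile(
    r"\b(CVE-\d{4}-\d{4,}|GHSA(?:-[0-9a-z]{4}){3})\b", re.IGNORECASE
)

# Assumption: a tiny illustrative lexicon holding only the two phrases quoted
# in the abstract; the study's real lexicon is not given in the source.
IMPLICIT_PHRASES = ("unauthorized access", "sql injection")


def find_signals(text: str) -> dict:
    """Return the explicit identifiers and implicit phrases found in one text field."""
    # Upper-case the matches so that differently cased duplicates collapse.
    explicit = sorted({match.upper() for match in EXPLICIT.findall(text)})
    lowered = text.lower()
    implicit = [phrase for phrase in IMPLICIT_PHRASES if phrase in lowered]
    return {"explicit": explicit, "implicit": implicit}
```

The same function could be applied separately to a title, description, review comment, commit message, or timeline entry, so that each pull request component is scored on its own.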
If this is right
- Vulnerability references may be associated with whether vulnerabilities are introduced or fixed in the modified code.
- References may relate to pull request review activity and outcomes.
- The study will generate data on communication practices involving automated accounts in modern software development.
Where Pith is reading between the lines
- Including implicit signals could surface vulnerability discussions that prior studies, which were limited to explicit identifiers, would have missed.
- Findings on account-type differences could inform design choices for future bots and agents regarding security topics.
- The planned analysis could be extended to additional datasets to test whether patterns hold beyond the current sample.
Load-bearing premise
The AIDev-pop dataset provides adequate coverage of pull requests involving bots and coding agents to enable valid comparisons of communication patterns across account types.
What would settle it
A count showing that the AIDev-pop dataset contains too few pull requests with bot or agent activity would rule out meaningful comparisons across account types.
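A feasibility check of this kind could be sketched as a per-account-type count. The record fields `account_type` and `text`, the `has_signal` predicate, and the `minimum` threshold below are hypothetical; the source does not describe the AIDev-pop schema or any cutoff.

```python
from collections import Counter


def signal_counts_by_account_type(prs, has_signal):
    """Count pull requests carrying a vulnerability signal, per account type."""
    # Start every group at zero so that an empty group stays visible in the result.
    counts = Counter({"human": 0, "bot": 0, "agent": 0})
    for pr in prs:
        if has_signal(pr["text"]):
            counts[pr["account_type"]] += 1
    return counts


def underpowered_groups(counts, minimum):
    """List the account types whose count falls below a chosen minimum."""
    return sorted(group for group, n in counts.items() if n < minimum)
```

If `underpowered_groups` returns a non-empty list for any reasonable `minimum`, the comparison involving those account types would be the one at risk.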
Original abstract
Developers may reference vulnerabilities in pull request discussions through both explicit identifiers, such as CVEs or GHSAs, and implicit security-related language (e.g., "unauthorized access" or "SQL injection"). Prior work has primarily focused on explicit identifiers, potentially overlooking vulnerability discussions that lack formal references. Bots and coding agents are becoming more common in pull requests, raising new questions about how different accounts communicate about vulnerabilities. In this registered report, we describe our planned study of vulnerability communication in pull requests by humans, bots, and coding agents. Building on the AIDev-pop dataset, we analyze explicit vulnerability references and implicit security-related signals across pull request titles, descriptions, review comments, commit messages, and timeline discussions. We further investigate whether these references are associated with vulnerabilities introduced or fixed in the modified code and how they relate to pull request review activity and outcomes. This study contributes a large-scale empirical investigation of vulnerability communication practices in modern software development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a registered report outlining a planned large-scale empirical study of vulnerability communication in pull requests. It examines explicit references (CVEs, GHSAs) and implicit security-related language across PR titles, descriptions, review comments, commit messages, and timelines, comparing patterns among humans, bots, and coding agents using the AIDev-pop dataset. The study further plans to link these references to introduced or fixed vulnerabilities and to review activity/outcomes.
Significance. If the planned analyses can be executed, the work would address a gap in prior research by incorporating implicit signals and automated account types, contributing empirical evidence on vulnerability discussions in modern development workflows. The registered-report format is a clear strength, as it commits to the analysis plan in advance and supports reproducibility. The contribution hinges on whether AIDev-pop supplies adequate coverage for the cross-account comparisons.
major comments (1)
- [Abstract / Planned Methods] Abstract and planned-methods description: The central claim of performing valid comparisons of vulnerability communication across humans, bots, and coding agents rests on the untested premise that AIDev-pop contains sufficient bot- and agent-authored PRs with explicit or implicit vulnerability signals in titles, descriptions, comments, commits, and timelines. No preliminary counts, sampling strategy, or power analysis for these subgroups are supplied, so the cross-account analysis cannot be guaranteed to be feasible as described.
minor comments (2)
- [Abstract] The distinction between 'bots' and 'coding agents' is introduced but not operationally defined; a clear classification rule or reference to how AIDev-pop labels these account types would improve reproducibility.
- [Abstract] The abstract states the study will 'investigate whether these references are associated with vulnerabilities introduced or fixed,' but does not specify the code-analysis method (e.g., static analysis tool or diff-based detection) that will be used to identify such vulnerabilities.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our registered report. We address the single major comment below and commit to revisions that strengthen the description of dataset feasibility.
- Referee: [Abstract / Planned Methods] Abstract and planned-methods description: The central claim of performing valid comparisons of vulnerability communication across humans, bots, and coding agents rests on the untested premise that AIDev-pop contains sufficient bot- and agent-authored PRs with explicit or implicit vulnerability signals in titles, descriptions, comments, commits, and timelines. No preliminary counts, sampling strategy, or power analysis for these subgroups are supplied, so the cross-account analysis cannot be guaranteed to be feasible as described.
Authors: We agree that the registered report would benefit from explicit discussion of dataset coverage to support the planned comparisons. In the revised manuscript we will add a dedicated subsection on the AIDev-pop dataset that reports all publicly documented statistics on the distribution of human-, bot-, and agent-authored PRs. We will also describe our sampling strategy (first filtering the full dataset for PRs containing explicit CVE/GHSA references or implicit security keywords, then stratifying by account type) and commit to performing and transparently reporting a post-hoc power analysis once the filtered sample sizes are known. These additions address the concern while preserving the pre-registered analysis plan.
Revision: yes
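One possible form of the power calculation mentioned in the response is sketched below. It is an assumption of this sketch, not a detail from the source, that the comparison is between two account types on the proportion of pull requests containing a vulnerability reference, tested with a two-sided two-proportion z-test under the normal approximation. The function name and parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation and ignores the negligible far tail.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or n1 <= 0 or n2 <= 0:
        raise ValueError("proportions must lie in (0, 1) and group sizes must be positive")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error under the null hypothesis uses the pooled proportion.
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # Standard error under the alternative uses each group's own proportion.
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return NormalDist().cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)
```

Because both standard errors shrink with group size, the smallest stratum (plausibly the agent or bot group after filtering) limits the achievable power, which is the feasibility concern the referee raises.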
Circularity Check
No circularity in registered report plan
This is a registered report outlining a planned empirical study on vulnerability communication in PRs using the external AIDev-pop dataset. No equations, derivations, fitted parameters, predictions, or self-citations appear in the provided text. The document describes future analysis steps without any load-bearing claims that reduce to inputs by construction, self-definition, or author-overlapping citations. The central contribution is a descriptive plan, which is self-contained against external benchmarks and contains no derivation chain to inspect.