Insights into Security-Related AI-Generated Pull Requests
Pith reviewed 2026-05-10 01:43 UTC · model grok-4.3
The pith
AI-generated security pull requests introduce recurring weaknesses like regex inefficiencies and injection flaws, with many flawed ones still merged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study draws 675 security-related AI-generated pull requests from a corpus of more than 33,000 AI PRs and identifies a small set of recurring weaknesses, such as regex inefficiencies, injection flaws, and path traversal. Many flawed contributions are still merged, while rejections often arise from social or process factors such as inactivity or missing test coverage. Unlike in human PRs, commit message quality shows limited effect on acceptance or latency, and the work extends existing rejection taxonomies with categories unique to AI-generated security contributions.
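The "regex inefficiency" weakness class can be made concrete with a small sketch. The pattern below is a hypothetical illustration, not one taken from the analyzed PRs: a nested quantifier such as `(a+)+` causes catastrophic backtracking on near-miss input, while an equivalent linear pattern matches the same language without the blowup.

```python
import re
import time

# Hypothetical example of the "regex inefficiency" weakness class:
# nested quantifiers like (a+)+ force the backtracking engine to try
# exponentially many ways to split the input when the match fails.
redos_pattern = re.compile(r"^(a+)+$")
safe_pattern = re.compile(r"^a+$")  # same language, linear-time match

subject = "a" * 22 + "!"  # almost matches, so backtracking is maximal

start = time.perf_counter()
redos_pattern.match(subject)
slow = time.perf_counter() - start

start = time.perf_counter()
safe_pattern.match(subject)
fast = time.perf_counter() - start

print(f"nested quantifier: {slow:.4f}s, linearized: {fast:.6f}s")
```

Adding one more `a` to the subject roughly doubles the slow path's running time, which is why such patterns are a denial-of-service risk when they run against attacker-controlled input.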
What carries the argument
Identification of security-related AI PRs and categorization of their recurring weaknesses, together with an extended taxonomy of rejection reasons specific to AI security submissions.
If this is right
- Many flawed security-related contributions from AI agents are merged into software projects.
- Rejections of AI security PRs are more often tied to inactivity or missing tests than to the security weaknesses themselves.
- Commit message quality does not strongly influence acceptance or review latency for AI-generated security PRs.
- Rejection taxonomies for pull requests can be extended with new categories that apply specifically to AI security submissions.
Where Pith is reading between the lines
- AI coding agents may need targeted safeguards against the narrow set of weaknesses that recur in security PRs.
- Project maintainers could benefit from automated detectors tuned to the common AI security flaws before merging.
- The limited role of commit messages suggests review processes for AI PRs may need different signals than those used for human PRs.
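The automated-detector suggestion above could be sketched as a diff scanner keyed to the recurring weakness classes the study reports. Everything in this sketch (the `CHECKS` patterns, `scan_diff`, the sample diff lines) is an illustrative assumption, not the paper's actual tooling, and real detectors would use proper static analysis rather than line regexes.

```python
import re

# Illustrative heuristics for the three weakness classes the study reports
# as recurring in AI security PRs. These patterns are assumptions made for
# this sketch; a production detector would use AST-aware static analysis.
CHECKS = {
    "nested-quantifier regex (possible ReDoS)":
        re.compile(r'\((?:[^()]*[+*])\)[+*]'),
    "path traversal sequence":
        re.compile(r'\.\./'),
    "string-built SQL (possible injection)":
        re.compile(r'(?i)(select|insert|delete|update)\b[^"\']*["\']\s*\+'),
}

def scan_diff(added_lines):
    """Return (line_no, check_name) for each heuristic hit in added lines."""
    hits = []
    for no, line in enumerate(added_lines, start=1):
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                hits.append((no, name))
    return hits

# Hypothetical added lines from a PR diff, one per weakness class.
diff = [
    'pattern = re.compile(r"^(a+)+$")',
    'open(base_dir + "../../etc/passwd")',
    'query = "SELECT * FROM users WHERE id=" + user_id',
]
for no, name in scan_diff(diff):
    print(f"line {no}: {name}")
```

Wired into CI as a pre-merge gate, a check like this would surface exactly the narrow, recurring flaw set before a human reviewer relies on the PR's own description of itself.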
Load-bearing premise
The 675 security-related PRs were accurately and consistently identified from the 33,000 AI-generated PRs without significant selection or labeling bias.
What would settle it
An independent review of the AI PR dataset that reveals a substantially different distribution of weaknesses or shows that rejections are driven primarily by security concerns instead of process factors.
Original abstract
Recent years have experienced growing contributions of AI coding agents that assist human developers in various software engineering tasks. However, this growing AI-assisted autonomy raises questions about security and trust. In this paper, we analyze more than 33,000 AI-generated pull requests (PRs) and identify 675 security-related submissions made by agentic AIs. Then we examine the security-related PRs with a focus on recurring security weaknesses, review outcomes and latency, commit message quality, and rejection reasons. The results show that security-related AI PRs introduce a small set of recurring weaknesses such as regex inefficiencies, injection flaws, and path traversal. Many flawed contributions are still merged, while rejections often arise from social or process factors such as inactivity or missing test coverage. The commit message quality of AI PRs has a limited effect on acceptance or latency, in contrast to human PRs reported in previous studies. We also extend existing rejection taxonomies by adding categories that are unique to AI-generated security contributions. These findings offer new insights into the strengths and shortcomings of autonomous coding systems in secure software development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes more than 33,000 AI-generated pull requests and identifies 675 security-related submissions. It examines these for recurring security weaknesses (regex inefficiencies, injection flaws, path traversal), review outcomes and latency, commit message quality, and rejection reasons. Key findings are that many flawed AI PRs are still merged, rejections often stem from social/process factors like inactivity or missing tests, commit message quality has limited effect on acceptance (unlike human PRs), and existing rejection taxonomies are extended with AI-specific categories.
Significance. If the 675-PR sample is accurately and representatively identified, the study supplies useful empirical observations on security risks from autonomous AI coding agents in open-source settings. The identification of recurring weakness patterns and the extension of rejection taxonomies could inform both tooling and future research on AI-assisted secure development.
major comments (1)
- Abstract: the identification of the 675 security-related PRs from the 33,000 AI-generated PRs is stated without any description of the detection method (e.g., keywords, classifier, manual review), validation procedure, inter-rater reliability, or false-positive/negative rates. Because every subsequent claim—recurring weaknesses, merge rates, rejection reasons, and taxonomy extensions—rests on this subset being a clean sample, the absence of these details is load-bearing for the central empirical contribution.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and the recommendation for major revision. We address the single major comment below and agree that strengthening the abstract will improve the manuscript.
Point-by-point responses
Referee: Abstract: the identification of the 675 security-related PRs from the 33,000 AI-generated PRs is stated without any description of the detection method (e.g., keywords, classifier, manual review), validation procedure, inter-rater reliability, or false-positive/negative rates. Because every subsequent claim—recurring weaknesses, merge rates, rejection reasons, and taxonomy extensions—rests on this subset being a clean sample, the absence of these details is load-bearing for the central empirical contribution.
Authors: We agree that the abstract would be strengthened by briefly describing the identification process. The full manuscript details this in the methodology section, which outlines the multi-stage approach used to select the 675 security-related PRs from the larger corpus. We will revise the abstract to include a concise summary of the detection method, validation steps, and any reported reliability or error considerations. This revision will be incorporated in the next version of the paper.
Revision: yes
Circularity Check
No circularity: purely observational empirical study
Full rationale
The paper performs data collection and qualitative analysis on external GitHub PRs (33k total, 675 labeled security-related). No equations, fitted parameters, predictions, or derivations exist. Identification of the 675 PRs, weakness taxonomy, merge/rejection statistics, and taxonomy extensions are direct observations from the dataset rather than reductions to self-definitions, self-citations, or renamed inputs. No load-bearing self-citation chains or ansatzes are present. The central claims rest on external data and manual/automated labeling whose validity is a separate methodological concern, not a circularity issue.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: AI-generated PRs can be reliably distinguished from human-written ones, and their security relevance can be accurately assessed through manual or automated review.