Longitudinal Analyses of SAST Tools: A CodeQL Case Study
Recognition: 1 theorem link · Lean theorem
Pith reviewed 2026-05-11 03:30 UTC · model grok-4.3
The pith
CodeQL detects 171 CVEs across OSS but only 83 of them before their fixes, and detections shift as the tool is updated.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CodeQL identifies a total of 171 CVEs in the studied repositories. For 83 of those, a version of the tool released before the fix commit could already have flagged the vulnerability. Within vulnerable files, half of the detections have more than 50 percent of findings concentrated at the exact vulnerable location, making them potentially actionable with file-level triage. Detections are not stable: 21 CVEs stop being reported after a version change, and 17 of those are never redetected.
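One compact way to state the intra-file concentration metric behind the actionability claim, in notation of our own (the paper's formal definition is not reproduced here): for a detected CVE, let F_file be the CodeQL findings in the vulnerable file and F_loc the subset of those findings at the vulnerable location.

```latex
% Our notation, inferred from the abstract; not the paper's formal definition.
c \;=\; \frac{\lvert F_{\mathrm{loc}} \rvert}{\lvert F_{\mathrm{file}} \rvert},
\qquad \text{reported result: } c > 0.5 \text{ for } 50\% \text{ of the 171 detections.}
```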
What carries the argument
Longitudinal apparatus that replays multiple historical versions of CodeQL on the pre-fix state of each CVE repository to measure pre-fix detection, actionability via location distance, and stability of alerts across tool releases.
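A minimal sketch of that replay loop, under our own assumptions (not the paper's artifact): one unpacked CodeQL bundle per version under a local directory, and a caller that supplies the repository checkout, fix commit, and language. The `codeql database create/analyze` invocations follow the public CodeQL CLI.

```python
"""Minimal sketch of the longitudinal replay loop (not the authors' code).

Assumptions on our part: CodeQL bundles are unpacked one per version under
VERSIONS_DIR, and the caller supplies the repo checkout, fix commit hash,
and language.
"""
import subprocess
from pathlib import Path

VERSIONS_DIR = Path("codeql-versions")  # e.g. codeql-versions/2.12.0/codeql

def replay(repo: Path, fix_commit: str, language: str, out_dir: Path) -> None:
    # Check out the parent of the fix commit: the last pre-fix (vulnerable) state.
    subprocess.run(["git", "-C", str(repo), "checkout", f"{fix_commit}^"], check=True)
    for codeql in sorted(VERSIONS_DIR.glob("*/codeql")):
        version = codeql.parent.name
        db = out_dir / f"db-{version}"
        sarif = out_dir / f"results-{version}.sarif"
        # Build a CodeQL database for this snapshot, then run the default queries.
        subprocess.run([str(codeql), "database", "create", str(db),
                        f"--language={language}", f"--source-root={repo}"], check=True)
        subprocess.run([str(codeql), "database", "analyze", str(db),
                        "--format=sarif-latest", f"--output={sarif}"], check=True)
        # Downstream: match each SARIF alert against the patch diff to score
        # pre-fix detection, actionability, and cross-version stability.
```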
If this is right
- SAST tools can block many vulnerabilities from entering OSS codebases when used before merges.
- Tool updates can remove coverage for previously detectable vulnerabilities, creating new blind spots.
- Focusing triage on the single vulnerable file rather than the whole codebase makes about half of the detections actionable.
- Developers should treat SAST output as version-dependent and may need to retain older tool versions for critical checks.
Where Pith is reading between the lines
- Pinning to a specific CodeQL version or running parallel analyses with several versions could reduce lost detections.
- The same longitudinal replay method could be used to benchmark other SAST tools and identify which maintain stable coverage over time.
- Rule changes that drop coverage for real CVEs suggest a need for regression testing of new tool releases against known vulnerable code; a minimal sketch of such a check follows this list.
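The sketch below is our construction, not the paper's: it assumes the per-version detection sets come from a harness like the replay loop sketched earlier, and gates an upgrade on not losing coverage for previously detected CVEs.

```python
"""Sketch of a coverage regression gate for CodeQL upgrades (our construction)."""

def coverage_regressions(baseline: set[str], candidate: set[str]) -> set[str]:
    """CVEs the pinned baseline version detects that the candidate release loses."""
    return baseline - candidate

# Illustrative CVE IDs only: block the upgrade if known coverage is lost.
baseline_hits = {"CVE-2021-44228", "CVE-2022-0001"}
candidate_hits = {"CVE-2021-44228"}
lost = coverage_regressions(baseline_hits, candidate_hits)
if lost:
    print(f"do not upgrade yet; lost coverage for: {sorted(lost)}")
```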
Load-bearing premise
The 3993 CVEs and 1622 repositories are representative of the wider OSS ecosystem, and CodeQL findings can be mapped accurately to the precise vulnerable code locations.
What would settle it
Repeating the analysis on a fresh, larger sample of CVEs drawn from different repositories and languages shows substantially lower pre-fix detection rates or much higher rates of lost detections after version changes.
Original abstract
Open-source software (OSS) pipelines rely on automated static analysis tools to prevent the introduction of vulnerabilities in code. However, there is limited understanding of the efficacy of these tools across the OSS ecosystem over time. In this paper, we introduce a novel method to evaluate static application security testing (SAST) tools through longitudinal measurements and perform the largest academic study of CodeQL -- the most prevalent static analysis tool from GitHub -- on OSS codebases. We apply our apparatus on 114 versions of CodeQL over time on 3993 CVEs from 1622 repositories to measure key properties of the tool, culminating in more than 20 billion lines of code analyzed. First, we measure its effectiveness, i.e., its ability to detect vulnerabilities before they are fixed. Then, we determine whether these detections were actionable through two measures of the distance between findings and vulnerability location either over the entire codebase or within the vulnerable file. Finally, we study the stability of CodeQL by examining how vulnerability detections hold across versions and the evolution of CodeQL on the accuracy-precision trade-off. We find that CodeQL identifies a total of 171 CVEs, and that for 83 of them, a CodeQL version prior to the fix could detect it. Such detections are in general actionable if findings are triaged across files, as for 50% of the 171 detections, more than 50% of findings in the vulnerable file are located in the vulnerable location. Finally, we show that CVE detections are not monotonic across versions as 21 CVEs were no longer detected following a version change and 17 that were never redetected. Our study shows that using SAST tools is a matter of best practice as they prevent numerous vulnerabilities from being introduced, but that developers should be aware of changes that may leave blind spots in detections upon updates of the tool.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a longitudinal evaluation method for SAST tools and applies it to 114 historical versions of CodeQL across 3993 CVEs from 1622 OSS repositories (over 20 billion LOC analyzed). It reports that CodeQL detects 171 CVEs in total, with 83 detectable by a pre-fix version; claims these detections are generally actionable under file-level triage because 50% of the 171 cases have more than 50% of intra-file findings at the vulnerable location; and shows that detections are not monotonic across versions (21 CVEs lost after an update, 17 never redetected). The work concludes that SAST tools prevent vulnerabilities but that updates can introduce blind spots.
Significance. If the empirical measurements hold, the study provides the largest-scale academic longitudinal data on CodeQL's real-world effectiveness, actionability, and version stability in OSS codebases. The scale, use of historical tool versions, and dual metrics (whole-codebase and intra-file distance) are strengths that go beyond single-snapshot analyses and can directly inform developer practices and tool-maintenance decisions.
Major comments (2)
- [Abstract and results section on actionability] The actionability claim (that detections are 'in general actionable' when triaged across files) rests on the intra-file concentration statistic (50% of the 171 detections have >50% of file-level findings at the vulnerable location). This requires reliable extraction of the exact vulnerable location from CVE/patch data and precise alignment of CodeQL alerts to that location across 114 tool versions. The manuscript provides no methodological details on CVE selection criteria, the definition of 'vulnerable location,' false-positive filtering, or how alert sites are matched when queries evolve; any systematic offset would invalidate the concentration metric. (Abstract; results on actionability.) A sketch of one possible alert-to-location matching step follows this list.
- [Abstract and data-collection section] The headline counts (171 total detections, 83 pre-fix) are presented without describing how the 3993-CVE sample was constructed or how 'vulnerable location' is operationalized. This leaves open whether the sample is biased toward easily mappable cases and whether the reported percentages generalize. (Abstract; § on data collection / CVE corpus.)
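To make the alignment concern concrete, here is a minimal sketch of one possible alert-to-location matching step (our construction, not the paper's pipeline). The result and location field paths follow the SARIF 2.1.0 schema [49]; how `vulnerable_lines` (file URI to pre-fix line numbers) is derived from the patch diff is an assumption on our part, and is exactly the step the report asks the authors to document.

```python
"""Sketch: does any CodeQL alert land on a line changed by the fix commit?"""
import json
from pathlib import Path

def alerts_at_location(sarif_path: Path,
                       vulnerable_lines: dict[str, set[int]]) -> int:
    """Count alerts whose start line falls on a vulnerable line of the same file."""
    sarif = json.loads(sarif_path.read_text())
    hits = 0
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "")
                line = phys.get("region", {}).get("startLine")
                if line is not None and line in vulnerable_lines.get(uri, set()):
                    hits += 1
    return hits
```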
Minor comments (1)
- [Abstract] The abstract states 'more than 20 billion lines of code analyzed' but does not clarify whether this is unique LOC or includes re-analysis of the same files across versions; a brief clarification would improve reproducibility. (A back-of-envelope check follows this list.)
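A rough check of that figure, under our own assumption that every one of the 3993 CVE snapshots was analyzed by all 114 versions:

```latex
% Our arithmetic, assuming all 3993 snapshots were analyzed by all 114 versions.
\frac{20 \times 10^{9}\ \text{LOC}}{3993 \times 114\ \text{analyses}} \approx 4.4 \times 10^{4}\ \text{LOC per analysis}
```

Roughly 44 kLOC per analysis is a plausible size for the analyzed portion of a repository snapshot, which would suggest the headline count aggregates re-analysis of the same files across versions rather than unique LOC.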
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive assessment of the study's scale and potential impact. We address each major comment below and will perform a major revision to incorporate additional methodological details as requested.
Point-by-point responses
-
Referee: [Abstract and results section on actionability] The actionability claim (that detections are 'in general actionable' when triaged across files) rests on the intra-file concentration statistic (50% of the 171 detections have >50% of file-level findings at the vulnerable location). This requires reliable extraction of the exact vulnerable location from CVE/patch data and precise alignment of CodeQL alerts to that location across 114 tool versions. The manuscript provides no methodological details on CVE selection criteria, the definition of 'vulnerable location,' false-positive filtering, or how alert sites are matched when queries evolve; any systematic offset would invalidate the concentration metric. (Abstract; results on actionability.)
Authors: We agree that the manuscript would benefit from expanded methodological transparency to support the actionability claims. In the revised version, we will add a new subsection under Data Collection detailing: CVE selection criteria (including how the 3993-CVE corpus was filtered from a larger set of public CVEs with available patches and repositories), the operational definition of 'vulnerable location' (extracted from CVE descriptions, patch diffs, and commit metadata), the alignment procedure for CodeQL alerts across the 114 tool versions (accounting for query changes), and any post-processing to mitigate false positives. We will also include a limitations paragraph discussing potential alignment offsets and their impact on the intra-file concentration metric. These additions will allow readers to evaluate the reliability of the 50% statistic. revision: yes
-
Referee: [Abstract and data-collection section] The headline counts (171 total detections, 83 pre-fix) are presented without describing how the 3993-CVE sample was constructed or how 'vulnerable location' is operationalized. This leaves open whether the sample is biased toward easily mappable cases and whether the reported percentages generalize. (Abstract; § on data collection / CVE corpus.)
Authors: We acknowledge the need for explicit description of the sample construction. The 3993 CVEs were drawn from public OSS repositories with available historical code and patches, but the current text does not detail inclusion/exclusion criteria or potential biases toward mappable cases. In the revision, we will expand the Data Collection section to describe the full sampling process, report the number of CVEs excluded at each stage (e.g., due to missing patches or non-analyzable repositories), and discuss generalizability of the 171 detections and 83 pre-fix cases. This will clarify whether the headline counts are representative of the broader CVE population. revision: yes
Circularity Check
Pure empirical measurement study with no derivation chain
Full rationale
The paper reports direct tallies from executing 114 CodeQL versions on 3993 CVEs across 1622 repositories, including counts of detections (171 total, 83 pre-fix) and intra-file concentration statistics (50% of cases with >50% findings at vulnerable location). No equations, fitted parameters, predictions, or self-citations are used to derive results; all claims are raw outputs of the measurement apparatus. The mapping of findings to CVE locations is an input assumption, not a derived claim that reduces to itself.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relevance: unclear. Matched passage: "We apply our apparatus on 114 versions of CodeQL over time on 3993 CVEs ... measure the distance between findings and vulnerability location either over the entire codebase or within the vulnerable file."
Reference graph
Works this paper leans on
- [1] Amit Seal Ami, Kevin Moran, Denys Poshyvanyk, and Adwait Nadkarni. 2024. "False Negative - That One Is Going to Kill You": Understanding Industry Perspectives of Static Analysis Based Security Testing. In 2024 IEEE Symposium on Security and Privacy (SP). 3979–3997. https://doi.org/10.1109/SP54263.2024.00019
- [2] Anthropic. [n. d.]. Claude Mythos Preview
- [3] Stefan Axelsson. 2000. The Base-Rate Fallacy and the Difficulty of Intrusion Detection. ACM Trans. Inf. Syst. Secur. 3, 3 (Aug. 2000), 186–205. https://doi.org/10.1145/357830.357849
- [4] Lingfeng Bao, Xin Xia, Ahmed E. Hassan, and Xiaohu Yang. 2022. V-SZZ: Automatic Identification of Version Ranges Affected by CVE Vulnerabilities. In Proceedings of the 44th International Conference on Software Engineering. ACM, Pittsburgh, Pennsylvania, 2352–2364. https://doi.org/10.1145/3510003.3510113
- [5] Setu Kumar Basak, Jamison Cox, Bradley Reaves, and Laurie Williams. 2023. A Comparative Study of Software Secrets Reporting by Secret Detection Tools. In 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE Computer Society, Los Alamitos, CA, USA, 1–12. https://doi.org/10.1109/ESEM56168.2023.10304853
- [6] Moritz Beller, Radjino Bholanath, Shane McIntosh, and Andy Zaidman. 2016. Analyzing the State of Static Analysis: A Large-Scale Evaluation in Open Source Software. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 1. 470–481. https://doi.org/10.1109/SANER.2016.105
- [7] Gareth Bennett, Tracy Hall, Steve Counsell, Emily Winter, and Thomas Shippey. 2024. Do Developers Use Static Application Security Testing (SAST) Tools Straight Out of the Box? A Large-Scale Empirical Study. In Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '24). Association for Computing Machinery, New York, NY, USA, 454–460. https://doi.org/10.1145/3674805.3690750
- [9] Gareth Bennett, Tracy Hall, Emily Winter, and Steve Counsell. 2024. Semgrep*: Improving the Limited Performance of Static Application Security Testing (SAST) Tools. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering. ACM, Salerno, Italy, 614–623. https://doi.org/10.1145/3661167.3661262
- [10] Guru Bhandari, Amara Naseer, and Leon Moonen. 2021. CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software. In Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2021). Association for Computing Machinery, New York, NY, USA, 30–39. https://doi.org/1...
- [11] Tim Boland and Paul E. Black. 2012. Juliet 1.1 C/C++ and Java Test Suite. Computer 45, 10 (Oct. 2012), 88–90. https://doi.org/10.1109/MC.2012.345
- [12] Tiago Brito, Mafalda Ferreira, Miguel Monteiro, Pedro Lopes, Miguel Barros, José Fragoso Santos, and Nuno Santos. 2023. Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.Js Packages. IEEE Transactions on Reliability 72, 4 (Dec. 2023), 1324–1339. https://doi.org/10.1109/TR.2023.3286301
- [13] Guillaume Cardoen, Tom Mens, and Alexandre Decan. 2024. A Dataset of GitHub Actions Workflow Histories. In Proceedings of the 21st International Conference on Mining Software Repositories (MSR '24). Association for Computing Machinery, New York, NY, USA, 677–681. https://doi.org/10.1145/3643991.3644867
- [14] Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, and Christoph Treude. 2024. An Empirical Study of Static Analysis Tools for Secure Code Review. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024). Association for Computing Machinery, New York, NY, USA, 691–703. https://doi.org/10.1145...
- [15] Maria Christakis and Christian Bird. 2016. What Developers Want and Need from Program Analysis: An Empirical Study. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE '16). Association for Computing Machinery, New York, NY, USA, 332–343. https://doi.org/10.1145/2970276.2970347
- [16] Jamie Cool. 2019. Announcing GitHub Security Lab: Securing the World's Code, Together
- [17] Oege de Moor, Damien Sereni, Mathieu Verbaere, Elnar Hajiyev, Pavel Avgustinov, Torbjörn Ekman, Neil Ongkingco, and Julian Tibble. 2008. .QL: Object-oriented Queries Made Easy. In Generative and Transformational Techniques in Software Engineering II: International Summer School, GTTSE 2007, Braga, Portugal, July 2-7, 2007. Revised Papers, Ralf Lämmel, ...
- [18] Richard A. Dubniczky, Krisztofer Zoltan Horvát, Tamás Bisztray, Mohamed Amine Ferrag, Lucas C. Cordeiro, and Norbert Tihanyi. 2025. CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs Towards CWE Detection. In Theoretical Aspects of Software Engineering: 19th International Symposium, TASE 2025, Limassol, Cyprus, July 14–16, 2025, Proceeding...
- [19] Douglas Everson, Long Cheng, and Zhenkai Zhang. 2022. Log4shell: Redefining the Web Attack Surface. In Proceedings 2022 Workshop on Measurements, Attacks, and Defenses for the Web. Internet Society, San Diego, CA, USA. https://doi.org/10.14722/madweb.2022.23010
- [20] FIRST.Org, Inc. [n. d.]. CVSS v3.1 Specification Document. https://www.first.org/cvss/v3.1/specification-document
- [21] Seyed Mohammad Ghaffarian and Hamid Reza Shahriari. 2017. Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques: A Survey. ACM Comput. Surv. 50, 4 (Aug. 2017), 56:1–56:36. https://doi.org/10.1145/3092566
- [22] GitHub. [n. d.]. CodeQL Wall of Fame. https://securitylab.github.com/codeql-wall-of-fame/
- [23] GitHub. 2022. CodeQL for VS Code: Download CodeQL Databases from GitHub.Com - GitHub Changelog
- [24] GitHub. 2025. Incremental Security Analysis Makes CodeQL up to 20% Faster in Pull Requests - GitHub Changelog
- [25] GitHub. 2025. Introducing GitHub Secret Protection and GitHub Code Security - GitHub Changelog
- [26] GitHub. 2026. Faster Incremental Analysis with CodeQL in Pull Requests - GitHub Changelog
- [27] GitHub. 2026. Github/Codeql. GitHub
- [28] HackerOne. [n. d.]. HackerOne. https://www.hackerone.com/
- [29] Michael Hilton, Nicholas Nelson, Timothy Tunnell, Darko Marinov, and Danny Dig. 2017. Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). Association for Computing Machinery, New York, NY, USA, 197–207. https://doi.org/10.1145/310...
- [30] Kevin Hogan, Noel Warford, Robert Morrison, David Miller, Sean Malone, and James Purtilo. 2019. The Challenges of Labeling Vulnerability-Contributing Commits. In 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). 270–275. https://doi.org/10.1109/ISSREW.2019.00083
- [31] Junze Hu, Xiangyu Jin, Yizhe Zeng, Yuling Liu, Yunpeng Li, Dan Du, Kaiyu Xie, and Hongsong Zhu. 2025. QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration. https://doi.org/10.48550/arXiv.2506.23644 arXiv:2506.23644 [cs]
- [32] Emanuele Iannone, Roberta Guadagni, Filomena Ferrucci, Andrea De Lucia, and Fabio Palomba. 2023. The Secret Life of Software Vulnerabilities: A Large-Scale Empirical Study. IEEE Transactions on Software Engineering 49, 1 (Jan. 2023), 44–63. https://doi.org/10.1109/TSE.2022.3140868
- [33] Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge. 2013. Why Don't Software Developers Use Static Analysis Tools to Find Bugs? In 2013 35th International Conference on Software Engineering (ICSE). IEEE, San Francisco, CA, USA, 672–681. https://doi.org/10.1109/ICSE.2013.6606613
- [35] Wooseok Kang, Byoungho Son, and Kihong Heo. 2022. TRACER: Signature-based Static Analysis for Detecting Recurring Vulnerabilities. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS '22). Association for Computing Machinery, New York, NY, USA, 1695–1708. https://doi.org/10.1145/3548606.3560664
- [36] Avishree Khare, Saikat Dutta, Ziyang Li, Alaia Solko-Breslin, Rajeev Alur, and Mayur Naik. 2024. Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities. https://doi.org/10.48550/arXiv.2311.16169 arXiv:2311.16169 [cs]
- [37] Piergiorgio Ladisa, Henrik Plate, Matias Martinez, and Olivier Barais. 2023. SoK: Taxonomy of Attacks on Open-Source Software Supply Chains. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, CA, USA, 1509–1526. https://doi.org/10.1109/SP46215.2023.10179304
- [38] Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, and Hanspeter Pfister. 2014. UpSet: Visualization of Intersecting Sets. IEEE Transactions on Visualization and Computer Graphics 20, 12 (Dec. 2014), 1983–1992. https://doi.org/10.1109/TVCG.2014.2346248
- [39] Kaixuan Li, Sen Chen, Lingling Fan, Ruitao Feng, Han Liu, Chengwei Liu, Yang Liu, and Yixiang Chen. 2023. Comparison and Evaluation on Static Application Security Testing (SAST) Tools for Java. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, San Francisco, CA, USA, 9...
- [40] Yuan Li, Peisen Yao, Kan Yu, Chengpeng Wang, Yaoyang Ye, Song Li, Meng Luo, Yepang Liu, and Kui Ren. 2025. Understanding Industry Perspectives of Static Application Security Testing (SAST) Evaluation. Proc. ACM Softw. Eng. 2, FSE (June 2025), FSE134:3033–FSE134:3056. https://doi.org/10.1145/3729404
- [41] Ziyang Li, Saikat Dutta, and Mayur Naik. 2025. IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. https://doi.org/10.48550/arXiv.2405.17238 arXiv:2405.17238 [cs]
- [42] Zongjie Li, Zhibo Liu, Wai Kin Wong, Pingchuan Ma, and Shuai Wang. 2024. Evaluating C/C++ Vulnerability Detectability of Query-Based Static Application Security Testing Tools. IEEE Transactions on Dependable and Secure Computing 21, 5 (Sept. 2024), 4600–4618. https://doi.org/10.1109/TDSC.2024.3354789
- [43] Mario Lins, René Mayrhofer, Michael Roland, Daniel Hofer, and Martin Schwaighofer. 2024. On the Critical Path to Implant Backdoors and the Effectiveness of Potential Mitigation Techniques: Early Learnings from XZ. https://doi.org/10.48550/arXiv.2404.08987 arXiv:2404.08987 [cs]
- [44] Stephan Lipp, Sebastian Banescu, and Alexander Pretschner. 2022. An Empirical Study on the Effectiveness of Static C Code Analyzers for Vulnerability Detection. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, Virtual, South Korea, 544–555. https://doi.org/10.1145/3533767.3534380
- [45] Shuhan Liu, Jiayuan Zhou, Xing Hu, Filipe Roseiro Cogo, Xin Xia, and Xiaohu Yang. 2025. An Empirical Study on Vulnerability Disclosure Management of Open Source Software Systems. ACM Trans. Softw. Eng. Methodol. 34, 7 (Aug. 2025), 214:1–214:31. https://doi.org/10.1145/3716822
- [46] Michael Meli, Matthew R. McNiece, and Bradley Reaves. 2019. How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories. In Proceedings 2019 Network and Distributed System Security Symposium. Internet Society, San Diego, CA. https://doi.org/10.14722/ndss.2019.23418
- [47] Andrew Meneely, Harshavardhan Srinivasan, Ayemi Musa, Alberto Rodríguez Tejeda, Matthew Mokary, and Brian Spates. 2013. When a Patch Goes Bad: Exploring the Properties of Vulnerability-Contributing Commits. In 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 65–74. https://doi.org/10.1109/ESEM.2013.19
- [48] MITRE. 2025. CWE - 2025 CWE Top 25 Most Dangerous Software Weaknesses. https://cwe.mitre.org/top25/archive/2025/2025_cwe_top25.html
- [49] OASIS. [n. d.]. Static Analysis Results Interchange Format (SARIF) Version 2.1.0. https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/sarif-v2.1.0-cs01.html#_Toc16012611
- [50] Marc Ohm, Henrik Plate, Arnold Sykosch, and Michael Meier. 2020. Backstabber's Knife Collection: A Review of Open Source Software Supply Chain Attacks. In Detection of Intrusions and Malware, and Vulnerability Assessment: 17th International Conference, DIMVA 2020, Lisbon, Portugal, June 24–26, 2020, Proceedings. Springer-Verlag, Berlin, Heidelberg, 23–...
- [51] OWASP Foundation. [n. d.]. OWASP Benchmark. https://owasp.org/www-project-benchmark/
- [52] Eric Pauley, Paul Barford, and Patrick McDaniel. 2023. The CVE Wayback Machine: Measuring Coordinated Disclosure from Exploits against Two Years of Zero-Days. In Proceedings of the 2023 ACM on Internet Measurement Conference (IMC '23). Association for Computing Machinery, New York, NY, USA, 236–252. https://doi.org/10.1145/3618257.3624810
- [53] Valentina Piantadosi, Simone Scalabrino, and Rocco Oliveto. 2019. Fixing of Security Vulnerabilities in Open Source Projects: A Case Study of Apache HTTP Server and Apache Tomcat. In 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST). 68–78. https://doi.org/10.1109/ICST.2019.00017
- [54] Niklas Risse, Jing Liu, and Marcel Böhme. 2025. Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection. Proc. ACM Softw. Eng. 2, ISSTA (June 2025), ISSTA018:388–ISSTA018:410. https://doi.org/10.1145/3728887
- [55] Mingjie Shen, Akul Abhilash Pillai, Brian A. Yuan, James C. Davis, and Aravind Machiry. 2025. Finding 709 Defects in 258 Projects: An Experience Report on Applying CodeQL to Open-Source Embedded Software (Experience Paper). 2, ...
- [56] Kiran Sridhar and Ming Ng. 2021. Hacking for Good: Leveraging HackerOne Data to Develop an Economic Model of Bug Bounties. Journal of Cybersecurity 7, 1 (Feb. 2021), tyab007. https://doi.org/10.1093/cybsec/tyab007
- [57] Tamás Szabó. 2023. Incrementalizing Production CodeQL Analyses. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 1716–1726. https://doi.org/10.1145/3611643.3613860
- [58] Shrey Tiwari, Serena Chen, Alexander Joukov, Peter Vandervelde, Ao Li, and Rohan Padhye. 2025. It's About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software. In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR). 39–51. https://doi.org/10.1109/MSR66628.2025.00020
- [59] Thomas Walshe and Andrew Simpson. 2020. An Empirical Study of Bug Bounty Programs. In 2020 IEEE 2nd International Workshop on Intelligent Bug Fixing (IBF). 35–44. https://doi.org/10.1109/IBF50092.2020.9034828
- [60] Claire Wang, Ziyang Li, Saikat Dutta, and Mayur Naik. 2026. QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities. https://doi.org/10.48550/arXiv.2511.08462 arXiv:2511.08462 [cs]
- [61] Yanjing Yang, Xin Zhou, Runfeng Mao, Jinwei Xu, Lanxin Yang, Yu Zhang, Haifeng Shen, and He Zhang. 2025. DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection. Journal of Systems and Software 219 (Jan. 2025), 112234. https://doi.org/10.1016/j.jss.2024.112234
- [62] Xin Zhou, Duc-Manh Tran, Thanh Le-Cong, Ting Zhang, Ivana Clairine Irsan, Joshua Sumarlin, Bach Le, and David Lo. 2024. Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection. https://doi.org/10.48550/arXiv.2407.16235 arXiv:2407.16235 [cs]