Longitudinal Analyses of SAST Tools: A CodeQL Case Study
Recognition: 1 theorem link · Lean theorem
Pith reviewed 2026-05-11 03:30 UTC · model grok-4.3
The pith
CodeQL detects 171 CVEs across OSS but only 83 of them before their fixes, and detections shift as the tool is updated.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CodeQL identifies a total of 171 CVEs in the studied repositories. For 83 of those, a version of the tool released before the fix commit could already have flagged the vulnerability. Within vulnerable files, half of the detections have more than 50 percent of findings concentrated at the exact vulnerable location, making them potentially actionable with file-level triage. Detections are not stable: 21 CVEs stop being reported after a version change, and 17 of those are never redetected.
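One compact way to state the intra-file concentration metric behind the actionability claim, in notation of our own (the paper's formal definition is not reproduced here): for a detected CVE, let F_file be the CodeQL findings in the vulnerable file and F_loc the subset of those findings at the vulnerable location.

```latex
% Our notation, inferred from the abstract; not the paper's formal definition.
c \;=\; \frac{\lvert F_{\mathrm{loc}} \rvert}{\lvert F_{\mathrm{file}} \rvert},
\qquad \text{reported result: } c > 0.5 \text{ for } 50\% \text{ of the 171 detections.}
```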
What carries the argument
Longitudinal apparatus that replays multiple historical versions of CodeQL on the pre-fix state of each CVE repository to measure pre-fix detection, actionability via location distance, and stability of alerts across tool releases.
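A minimal sketch of that replay loop, under our own assumptions (not the paper's artifact): one unpacked CodeQL bundle per version under a local directory, and a caller that supplies the repository checkout, fix commit, and language. The `codeql database create/analyze` invocations follow the public CodeQL CLI.

```python
"""Minimal sketch of the longitudinal replay loop (not the authors' code).

Assumptions on our part: CodeQL bundles are unpacked one per version under
VERSIONS_DIR, and the caller supplies the repo checkout, fix commit hash,
and language.
"""
import subprocess
from pathlib import Path

VERSIONS_DIR = Path("codeql-versions")  # e.g. codeql-versions/2.12.0/codeql

def replay(repo: Path, fix_commit: str, language: str, out_dir: Path) -> None:
    # Check out the parent of the fix commit: the last pre-fix (vulnerable) state.
    subprocess.run(["git", "-C", str(repo), "checkout", f"{fix_commit}^"], check=True)
    for codeql in sorted(VERSIONS_DIR.glob("*/codeql")):
        version = codeql.parent.name
        db = out_dir / f"db-{version}"
        sarif = out_dir / f"results-{version}.sarif"
        # Build a CodeQL database for this snapshot, then run the default queries.
        subprocess.run([str(codeql), "database", "create", str(db),
                        f"--language={language}", f"--source-root={repo}"], check=True)
        subprocess.run([str(codeql), "database", "analyze", str(db),
                        "--format=sarif-latest", f"--output={sarif}"], check=True)
        # Downstream: match each SARIF alert against the patch diff to score
        # pre-fix detection, actionability, and cross-version stability.
```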
If this is right
- SAST tools can block many vulnerabilities from entering OSS codebases when used before merges.
- Tool updates can remove coverage for previously detectable vulnerabilities, creating new blind spots.
- Focusing triage on the single vulnerable file rather than the whole codebase makes about half of the detections actionable.
- Developers should treat SAST output as version-dependent and may need to retain older tool versions for critical checks.
Where Pith is reading between the lines
- Pinning to a specific CodeQL version or running parallel analyses with several versions could reduce lost detections.
- The same longitudinal replay method could be used to benchmark other SAST tools and identify which maintain stable coverage over time.
- Rule changes that drop coverage for real CVEs suggest a need for regression testing of new tool releases against known vulnerable code; a minimal sketch of such a check follows this list.
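The sketch below is our construction, not the paper's: it assumes the per-version detection sets come from a harness like the replay loop sketched earlier, and gates an upgrade on not losing coverage for previously detected CVEs.

```python
"""Sketch of a coverage regression gate for CodeQL upgrades (our construction)."""

def coverage_regressions(baseline: set[str], candidate: set[str]) -> set[str]:
    """CVEs the pinned baseline version detects that the candidate release loses."""
    return baseline - candidate

# Illustrative CVE IDs only: block the upgrade if known coverage is lost.
baseline_hits = {"CVE-2021-44228", "CVE-2022-0001"}
candidate_hits = {"CVE-2021-44228"}
lost = coverage_regressions(baseline_hits, candidate_hits)
if lost:
    print(f"do not upgrade yet; lost coverage for: {sorted(lost)}")
```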
Load-bearing premise
The 3993 CVEs and 1622 repositories are representative of the wider OSS ecosystem, and CodeQL findings can be mapped accurately to the precise vulnerable code locations.
What would settle it
Repeating the analysis on a fresh, larger sample of CVEs drawn from different repositories and languages shows substantially lower pre-fix detection rates or much higher rates of lost detections after version changes.
Original abstract
Open-source software (OSS) pipelines rely on automated static analysis tools to prevent the introduction of vulnerabilities in code. However, there is limited understanding of the efficacy of these tools across the OSS ecosystem over time. In this paper, we introduce a novel method to evaluate static application security testing (SAST) tools through longitudinal measurements and perform the largest academic study of CodeQL -- the most prevalent static analysis tool from GitHub -- on OSS codebases. We apply our apparatus on 114 versions of CodeQL over time on 3993 CVEs from 1622 repositories to measure key properties of the tool, culminating in more than 20 billion lines of code analyzed. First, we measure its effectiveness, i.e., its ability to detect vulnerabilities before they are fixed. Then, we determine whether these detections were actionable through two measures of the distance between findings and vulnerability location either over the entire codebase or within the vulnerable file. Finally, we study the stability of CodeQL by examining how vulnerability detections hold across versions and the evolution of CodeQL on the accuracy-precision trade-off. We find that CodeQL identifies a total of 171 CVEs, and that for 83 of them, a CodeQL version prior to the fix could detect it. Such detections are in general actionable if findings are triaged across files, as for 50% of the 171 detections, more than 50% of findings in the vulnerable file are located in the vulnerable location. Finally, we show that CVE detections are not monotonic across versions as 21 CVEs were no longer detected following a version change and 17 that were never redetected. Our study shows that using SAST tools is a matter of best practice as they prevent numerous vulnerabilities from being introduced, but that developers should be aware of changes that may leave blind spots in detections upon updates of the tool.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a longitudinal evaluation method for SAST tools and applies it to 114 historical versions of CodeQL across 3993 CVEs from 1622 OSS repositories (over 20 billion LOC analyzed). It reports that CodeQL detects 171 CVEs in total, with 83 detectable by a pre-fix version; claims these detections are generally actionable under file-level triage because 50% of the 171 cases have more than 50% of intra-file findings at the vulnerable location; and shows that detections are not monotonic across versions (21 CVEs lost after an update, 17 never redetected). The work concludes that SAST tools prevent vulnerabilities but that updates can introduce blind spots.
Significance. If the empirical measurements hold, the study provides the largest-scale academic longitudinal data on CodeQL's real-world effectiveness, actionability, and version stability in OSS codebases. The scale, use of historical tool versions, and dual metrics (whole-codebase and intra-file distance) are strengths that go beyond single-snapshot analyses and can directly inform developer practices and tool-maintenance decisions.
Major comments (2)
- [Abstract and results section on actionability] The actionability claim (that detections are 'in general actionable' when triaged across files) rests on the intra-file concentration statistic (50% of the 171 detections have >50% of file-level findings at the vulnerable location). This requires reliable extraction of the exact vulnerable location from CVE/patch data and precise alignment of CodeQL alerts to that location across 114 tool versions. The manuscript provides no methodological details on CVE selection criteria, the definition of 'vulnerable location,' false-positive filtering, or how alert sites are matched when queries evolve; any systematic offset would invalidate the concentration metric. (Abstract; results on actionability.) A sketch of one possible alert-to-location matching step follows this list.
- [Abstract and data-collection section] The headline counts (171 total detections, 83 pre-fix) are presented without describing how the 3993-CVE sample was constructed or how 'vulnerable location' is operationalized. This leaves open whether the sample is biased toward easily mappable cases and whether the reported percentages generalize. (Abstract; § on data collection / CVE corpus.)
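To make the alignment concern concrete, here is a minimal sketch of one possible alert-to-location matching step (our construction, not the paper's pipeline). The result and location field paths follow the SARIF 2.1.0 schema [49]; how `vulnerable_lines` (file URI to pre-fix line numbers) is derived from the patch diff is an assumption on our part, and is exactly the step the report asks the authors to document.

```python
"""Sketch: does any CodeQL alert land on a line changed by the fix commit?"""
import json
from pathlib import Path

def alerts_at_location(sarif_path: Path,
                       vulnerable_lines: dict[str, set[int]]) -> int:
    """Count alerts whose start line falls on a vulnerable line of the same file."""
    sarif = json.loads(sarif_path.read_text())
    hits = 0
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "")
                line = phys.get("region", {}).get("startLine")
                if line is not None and line in vulnerable_lines.get(uri, set()):
                    hits += 1
    return hits
```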
Minor comments (1)
- [Abstract] The abstract states 'more than 20 billion lines of code analyzed' but does not clarify whether this is unique LOC or includes re-analysis of the same files across versions; a brief clarification would improve reproducibility. (A back-of-envelope check follows this list.)
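A rough check of that figure, under our own assumption that every one of the 3993 CVE snapshots was analyzed by all 114 versions:

```latex
% Our arithmetic, assuming all 3993 snapshots were analyzed by all 114 versions.
\frac{20 \times 10^{9}\ \text{LOC}}{3993 \times 114\ \text{analyses}} \approx 4.4 \times 10^{4}\ \text{LOC per analysis}
```

Roughly 44 kLOC per analysis is a plausible size for the analyzed portion of a repository snapshot, which would suggest the headline count aggregates re-analysis of the same files across versions rather than unique LOC.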
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive assessment of the study's scale and potential impact. We address each major comment below and will perform a major revision to incorporate additional methodological details as requested.
Point-by-point responses
-
Referee: [Abstract and results section on actionability] The actionability claim (that detections are 'in general actionable' when triaged across files) rests on the intra-file concentration statistic (50% of the 171 detections have >50% of file-level findings at the vulnerable location). This requires reliable extraction of the exact vulnerable location from CVE/patch data and precise alignment of CodeQL alerts to that location across 114 tool versions. The manuscript provides no methodological details on CVE selection criteria, the definition of 'vulnerable location,' false-positive filtering, or how alert sites are matched when queries evolve; any systematic offset would invalidate the concentration metric. (Abstract; results on actionability.)
Authors: We agree that the manuscript would benefit from expanded methodological transparency to support the actionability claims. In the revised version, we will add a new subsection under Data Collection detailing: CVE selection criteria (including how the 3993-CVE corpus was filtered from a larger set of public CVEs with available patches and repositories), the operational definition of 'vulnerable location' (extracted from CVE descriptions, patch diffs, and commit metadata), the alignment procedure for CodeQL alerts across the 114 tool versions (accounting for query changes), and any post-processing to mitigate false positives. We will also include a limitations paragraph discussing potential alignment offsets and their impact on the intra-file concentration metric. These additions will allow readers to evaluate the reliability of the 50% statistic. revision: yes
-
Referee: [Abstract and data-collection section] The headline counts (171 total detections, 83 pre-fix) are presented without describing how the 3993-CVE sample was constructed or how 'vulnerable location' is operationalized. This leaves open whether the sample is biased toward easily mappable cases and whether the reported percentages generalize. (Abstract; § on data collection / CVE corpus.)
Authors: We acknowledge the need for explicit description of the sample construction. The 3993 CVEs were drawn from public OSS repositories with available historical code and patches, but the current text does not detail inclusion/exclusion criteria or potential biases toward mappable cases. In the revision, we will expand the Data Collection section to describe the full sampling process, report the number of CVEs excluded at each stage (e.g., due to missing patches or non-analyzable repositories), and discuss generalizability of the 171 detections and 83 pre-fix cases. This will clarify whether the headline counts are representative of the broader CVE population. revision: yes
Circularity Check
Pure empirical measurement study with no derivation chain
Full rationale
The paper reports direct tallies from executing 114 CodeQL versions on 3993 CVEs across 1622 repositories, including counts of detections (171 total, 83 pre-fix) and intra-file concentration statistics (50% of cases with >50% findings at vulnerable location). No equations, fitted parameters, predictions, or self-citations are used to derive results; all claims are raw outputs of the measurement apparatus. The mapping of findings to CVE locations is an input assumption, not a derived claim that reduces to itself.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relevance: unclear. Matched passage: "We apply our apparatus on 114 versions of CodeQL over time on 3993 CVEs ... measure the distance between findings and vulnerability location either over the entire codebase or within the vulnerable file."
Reference graph
Works this paper leans on
- [1] Amit Seal Ami, Kevin Moran, Denys Poshyvanyk, and Adwait Nadkarni. 2024. "False Negative - That One Is Going to Kill You": Understanding Industry Perspectives of Static Analysis Based Security Testing. In 2024 IEEE Symposium on Security and Privacy (SP). 3979–3997. https://doi.org/10.1109/SP54263.2024.00019
- [2] Anthropic. [n. d.]. Claude Mythos Preview
- [3] Stefan Axelsson. 2000. The Base-Rate Fallacy and the Difficulty of Intrusion Detection. ACM Trans. Inf. Syst. Secur. 3, 3 (Aug. 2000), 186–205. https://doi.org/10.1145/357830.357849
- [4] Lingfeng Bao, Xin Xia, Ahmed E. Hassan, and Xiaohu Yang. 2022. V-SZZ: Automatic Identification of Version Ranges Affected by CVE Vulnerabilities. In Proceedings of the 44th International Conference on Software Engineering. ACM, Pittsburgh, Pennsylvania, 2352–2364. https://doi.org/10.1145/3510003.3510113
- [5] Setu Kumar Basak, Jamison Cox, Bradley Reaves, and Laurie Williams. 2023. A Comparative Study of Software Secrets Reporting by Secret Detection Tools. In 2023 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE Computer Society, Los Alamitos, CA, USA, 1–12. https://doi.org/10.1109/ESEM56168.2023.10304853
- [6] Moritz Beller, Radjino Bholanath, Shane McIntosh, and Andy Zaidman. 2016. Analyzing the State of Static Analysis: A Large-Scale Evaluation in Open Source Software. In 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), Vol. 1. 470–481. https://doi.org/10.1109/SANER.2016.105
- [7] Gareth Bennett, Tracy Hall, Steve Counsell, Emily Winter, and Thomas Shippey. 2024. Do Developers Use Static Application Security Testing (SAST) Tools Straight Out of the Box? A Large-Scale Empirical Study. In Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM '24). Association for Computing Machinery, New York, NY, USA, 454–460. https://doi.org/10.1145/3674805.3690750
- [9] Gareth Bennett, Tracy Hall, Emily Winter, and Steve Counsell. 2024. Semgrep*: Improving the Limited Performance of Static Application Security Testing (SAST) Tools. In Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering. ACM, Salerno, Italy, 614–623. https://doi.org/10.1145/3661167.3661262
- [10] Guru Bhandari, Amara Naseer, and Leon Moonen. 2021. CVEfixes: Automated Collection of Vulnerabilities and Their Fixes from Open-Source Software. In Proceedings of the 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2021). Association for Computing Machinery, New York, NY, USA, 30–39. https://doi.org/1...
- [11] Tim Boland and Paul E. Black. 2012. Juliet 1.1 C/C++ and Java Test Suite. Computer 45, 10 (Oct. 2012), 88–90. https://doi.org/10.1109/MC.2012.345
- [12] Tiago Brito, Mafalda Ferreira, Miguel Monteiro, Pedro Lopes, Miguel Barros, José Fragoso Santos, and Nuno Santos. 2023. Study of JavaScript Static Analysis Tools for Vulnerability Detection in Node.Js Packages. IEEE Transactions on Reliability 72, 4 (Dec. 2023), 1324–1339. https://doi.org/10.1109/TR.2023.3286301
- [13] Guillaume Cardoen, Tom Mens, and Alexandre Decan. 2024. A Dataset of GitHub Actions Workflow Histories. In Proceedings of the 21st International Conference on Mining Software Repositories (MSR '24). Association for Computing Machinery, New York, NY, USA, 677–681. https://doi.org/10.1145/3643991.3644867
- [14] Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, and Christoph Treude. 2024. An Empirical Study of Static Analysis Tools for Secure Code Review. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024). Association for Computing Machinery, New York, NY, USA, 691–703. https://doi.org/10.1145...
- [15] Maria Christakis and Christian Bird. 2016. What Developers Want and Need from Program Analysis: An Empirical Study. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE '16). Association for Computing Machinery, New York, NY, USA, 332–343. https://doi.org/10.1145/2970276.2970347
- [16] Jamie Cool. 2019. Announcing GitHub Security Lab: Securing the World's Code, Together
- [17] Oege de Moor, Damien Sereni, Mathieu Verbaere, Elnar Hajiyev, Pavel Avgustinov, Torbjörn Ekman, Neil Ongkingco, and Julian Tibble. 2008. .QL: Object-oriented Queries Made Easy. In Generative and Transformational Techniques in Software Engineering II: International Summer School, GTTSE 2007, Braga, Portugal, July 2-7, 2007. Revised Papers, Ralf Lämmel, ...
- [18] Richard A. Dubniczky, Krisztofer Zoltan Horvát, Tamás Bisztray, Mohamed Amine Ferrag, Lucas C. Cordeiro, and Norbert Tihanyi. 2025. CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs Towards CWE Detection. In Theoretical Aspects of Software Engineering: 19th International Symposium, TASE 2025, Limassol, Cyprus, July 14–16, 2025, Proceeding...
- [19] Douglas Everson, Long Cheng, and Zhenkai Zhang. 2022. Log4shell: Redefining the Web Attack Surface. In Proceedings 2022 Workshop on Measurements, Attacks, and Defenses for the Web. Internet Society, San Diego, CA, USA. https://doi.org/10.14722/madweb.2022.23010
- [20] FIRST.Org, Inc. [n. d.]. CVSS v3.1 Specification Document. https://www.first.org/cvss/v3.1/specification-document
- [21] Seyed Mohammad Ghaffarian and Hamid Reza Shahriari. 2017. Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques: A Survey. ACM Comput. Surv. 50, 4 (Aug. 2017), 56:1–56:36. https://doi.org/10.1145/3092566
- [22] GitHub. [n. d.]. CodeQL Wall of Fame. https://securitylab.github.com/codeql-wall-of-fame/
- [23] GitHub. 2022. CodeQL for VS Code: Download CodeQL Databases from GitHub.Com - GitHub Changelog
- [24] GitHub. 2025. Incremental Security Analysis Makes CodeQL up to 20% Faster in Pull Requests - GitHub Changelog
- [25] GitHub. 2025. Introducing GitHub Secret Protection and GitHub Code Security - GitHub Changelog
- [26] GitHub. 2026. Faster Incremental Analysis with CodeQL in Pull Requests - GitHub Changelog
- [27] GitHub. 2026. Github/Codeql. GitHub
- [28] HackerOne. [n. d.]. HackerOne. https://www.hackerone.com/
- [29] Michael Hilton, Nicholas Nelson, Timothy Tunnell, Darko Marinov, and Danny Dig. 2017. Trade-Offs in Continuous Integration: Assurance, Security, and Flexibility. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). Association for Computing Machinery, New York, NY, USA, 197–207. https://doi.org/10.1145/310...
- [30] Kevin Hogan, Noel Warford, Robert Morrison, David Miller, Sean Malone, and James Purtilo. 2019. The Challenges of Labeling Vulnerability-Contributing Commits. In 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW). 270–275. https://doi.org/10.1109/ISSREW.2019.00083
- [31] Junze Hu, Xiangyu Jin, Yizhe Zeng, Yuling Liu, Yunpeng Li, Dan Du, Kaiyu Xie, and Hongsong Zhu. 2025. QLPro: Automated Code Vulnerability Discovery via LLM and Static Code Analysis Integration. https://doi.org/10.48550/arXiv.2506.23644 arXiv:2506.23644 [cs]
- [32] Emanuele Iannone, Roberta Guadagni, Filomena Ferrucci, Andrea De Lucia, and Fabio Palomba. 2023. The Secret Life of Software Vulnerabilities: A Large-Scale Empirical Study. IEEE Transactions on Software Engineering 49, 1 (Jan. 2023), 44–63. https://doi.org/10.1109/TSE.2022.3140868
- [33] Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge. 2013. Why Don't Software Developers Use Static Analysis Tools to Find Bugs? In 2013 35th International Conference on Software Engineering (ICSE). IEEE, San Francisco, CA, USA, 672–681. https://doi.org/10.1109/ICSE.2013.6606613
- [35] Wooseok Kang, Byoungho Son, and Kihong Heo. 2022. TRACER: Signature-based Static Analysis for Detecting Recurring Vulnerabilities. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS '22). Association for Computing Machinery, New York, NY, USA, 1695–1708. https://doi.org/10.1145/3548606.3560664
- [36] Avishree Khare, Saikat Dutta, Ziyang Li, Alaia Solko-Breslin, Rajeev Alur, and Mayur Naik. 2024. Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities. https://doi.org/10.48550/arXiv.2311.16169 arXiv:2311.16169 [cs]
- [37] Piergiorgio Ladisa, Henrik Plate, Matias Martinez, and Olivier Barais. 2023. SoK: Taxonomy of Attacks on Open-Source Software Supply Chains. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, San Francisco, CA, USA, 1509–1526. https://doi.org/10.1109/SP46215.2023.10179304
- [38] Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, and Hanspeter Pfister. 2014. UpSet: Visualization of Intersecting Sets. IEEE Transactions on Visualization and Computer Graphics 20, 12 (Dec. 2014), 1983–1992. https://doi.org/10.1109/TVCG.2014.2346248
- [39] Kaixuan Li, Sen Chen, Lingling Fan, Ruitao Feng, Han Liu, Chengwei Liu, Yang Liu, and Yixiang Chen. 2023. Comparison and Evaluation on Static Application Security Testing (SAST) Tools for Java. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, San Francisco, CA, USA, 9...
- [40] Yuan Li, Peisen Yao, Kan Yu, Chengpeng Wang, Yaoyang Ye, Song Li, Meng Luo, Yepang Liu, and Kui Ren. 2025. Understanding Industry Perspectives of Static Application Security Testing (SAST) Evaluation. Proc. ACM Softw. Eng. 2, FSE (June 2025), FSE134:3033–FSE134:3056. https://doi.org/10.1145/3729404
- [41] Ziyang Li, Saikat Dutta, and Mayur Naik. 2025. IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. https://doi.org/10.48550/arXiv.2405.17238 arXiv:2405.17238 [cs]
- [42] Zongjie Li, Zhibo Liu, Wai Kin Wong, Pingchuan Ma, and Shuai Wang. 2024. Evaluating C/C++ Vulnerability Detectability of Query-Based Static Application Security Testing Tools. IEEE Transactions on Dependable and Secure Computing 21, 5 (Sept. 2024), 4600–4618. https://doi.org/10.1109/TDSC.2024.3354789
- [43] Mario Lins, René Mayrhofer, Michael Roland, Daniel Hofer, and Martin Schwaighofer. 2024. On the Critical Path to Implant Backdoors and the Effectiveness of Potential Mitigation Techniques: Early Learnings from XZ. https://doi.org/10.48550/arXiv.2404.08987 arXiv:2404.08987 [cs]
- [44] Stephan Lipp, Sebastian Banescu, and Alexander Pretschner. 2022. An Empirical Study on the Effectiveness of Static C Code Analyzers for Vulnerability Detection. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, Virtual, South Korea, 544–555. https://doi.org/10.1145/3533767.3534380
- [45] Shuhan Liu, Jiayuan Zhou, Xing Hu, Filipe Roseiro Cogo, Xin Xia, and Xiaohu Yang. 2025. An Empirical Study on Vulnerability Disclosure Management of Open Source Software Systems. ACM Trans. Softw. Eng. Methodol. 34, 7 (Aug. 2025), 214:1–214:31. https://doi.org/10.1145/3716822
- [46] Michael Meli, Matthew R. McNiece, and Bradley Reaves. 2019. How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories. In Proceedings 2019 Network and Distributed System Security Symposium. Internet Society, San Diego, CA. https://doi.org/10.14722/ndss.2019.23418
- [47] Andrew Meneely, Harshavardhan Srinivasan, Ayemi Musa, Alberto Rodríguez Tejeda, Matthew Mokary, and Brian Spates. 2013. When a Patch Goes Bad: Exploring the Properties of Vulnerability-Contributing Commits. In 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. 65–74. https://doi.org/10.1109/ESEM.2013.19
- [48] MITRE. 2025. CWE - 2025 CWE Top 25 Most Dangerous Software Weaknesses. https://cwe.mitre.org/top25/archive/2025/2025_cwe_top25.html
- [49] OASIS. [n. d.]. Static Analysis Results Interchange Format (SARIF) Version 2.1.0. https://docs.oasis-open.org/sarif/sarif/v2.1.0/cs01/sarif-v2.1.0-cs01.html#_Toc16012611
- [50] Marc Ohm, Henrik Plate, Arnold Sykosch, and Michael Meier. 2020. Backstabber's Knife Collection: A Review of Open Source Software Supply Chain Attacks. In Detection of Intrusions and Malware, and Vulnerability Assessment: 17th International Conference, DIMVA 2020, Lisbon, Portugal, June 24–26, 2020, Proceedings. Springer-Verlag, Berlin, Heidelberg, 23–...
- [51] OWASP Foundation. [n. d.]. OWASP Benchmark. https://owasp.org/www-project-benchmark/
- [52] Eric Pauley, Paul Barford, and Patrick McDaniel. 2023. The CVE Wayback Machine: Measuring Coordinated Disclosure from Exploits against Two Years of Zero-Days. In Proceedings of the 2023 ACM on Internet Measurement Conference (IMC '23). Association for Computing Machinery, New York, NY, USA, 236–252. https://doi.org/10.1145/3618257.3624810
- [53] Valentina Piantadosi, Simone Scalabrino, and Rocco Oliveto. 2019. Fixing of Security Vulnerabilities in Open Source Projects: A Case Study of Apache HTTP Server and Apache Tomcat. In 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST). 68–78. https://doi.org/10.1109/ICST.2019.00017
- [54] Niklas Risse, Jing Liu, and Marcel Böhme. 2025. Top Score on the Wrong Exam: On Benchmarking in Machine Learning for Vulnerability Detection. Proc. ACM Softw. Eng. 2, ISSTA (June 2025), ISSTA018:388–ISSTA018:410. https://doi.org/10.1145/3728887
- [55] Mingjie Shen, Akul Abhilash Pillai, Brian A. Yuan, James C. Davis, and Aravind Machiry. 2025. Finding 709 Defects in 258 Projects: An Experience Report on Applying CodeQL to Open-Source Embedded Software (Experience Paper). 2, ...
- [56] Kiran Sridhar and Ming Ng. 2021. Hacking for Good: Leveraging HackerOne Data to Develop an Economic Model of Bug Bounties. Journal of Cybersecurity 7, 1 (Feb. 2021), tyab007. https://doi.org/10.1093/cybsec/tyab007
- [57] Tamás Szabó. 2023. Incrementalizing Production CodeQL Analyses. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 1716–1726. https://doi.org/10.1145/3611643.3613860
- [58] Shrey Tiwari, Serena Chen, Alexander Joukov, Peter Vandervelde, Ao Li, and Rohan Padhye. 2025. It's About Time: An Empirical Study of Date and Time Bugs in Open-Source Python Software. In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR). 39–51. https://doi.org/10.1109/MSR66628.2025.00020
- [59] Thomas Walshe and Andrew Simpson. 2020. An Empirical Study of Bug Bounty Programs. In 2020 IEEE 2nd International Workshop on Intelligent Bug Fixing (IBF). 35–44. https://doi.org/10.1109/IBF50092.2020.9034828
- [60] Claire Wang, Ziyang Li, Saikat Dutta, and Mayur Naik. 2026. QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities. https://doi.org/10.48550/arXiv.2511.08462 arXiv:2511.08462 [cs]
- [61] Yanjing Yang, Xin Zhou, Runfeng Mao, Jinwei Xu, Lanxin Yang, Yu Zhang, Haifeng Shen, and He Zhang. 2025. DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection. Journal of Systems and Software 219 (Jan. 2025), 112234. https://doi.org/10.1016/j.jss.2024.112234
- [62] Xin Zhou, Duc-Manh Tran, Thanh Le-Cong, Ting Zhang, Ivana Clairine Irsan, Joshua Sumarlin, Bach Le, and David Lo. 2024. Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection. https://doi.org/10.48550/arXiv.2407.16235 arXiv:2407.16235 [cs]