pith. machine review for the scientific record.

arxiv: 2605.04000 · v2 · submitted 2026-05-05 · 💻 cs.SE

Recognition: 2 theorem links


Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning

Foutse Khomh, Leuson Da Silva, P Akilesh, Sridhar Chimalakonda

Pith reviewed 2026-05-08 18:07 UTC · model grok-4.3

classification 💻 cs.SE
keywords Rust · static analysis · false positives · reinforcement learning · memory safety · MIR · fuzz testing

The pith

Reinforcement learning can suppress false positive warnings from Rust static analyzers like Rudra by learning policies from MIR features and fuzzing feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes training a reinforcement learning agent to decide which warnings from static memory safety tools to suppress in Rust programs. Static analyzers produce many spurious alerts that force developers to spend time on manual reviews and can hide real problems. The agent extracts contextual signals from the program's mid-level intermediate representation and refines its decisions through repeated interaction with analysis results. Dynamic fuzz testing supplies additional feedback to validate borderline cases. The resulting system raises precision from 25.6 percent to 59.0 percent while preserving recall near 75 percent and outperforming language-model baselines.

Core claim

By treating false-positive suppression as a reinforcement learning task, an agent can learn a policy that uses features from Rust's mid-level intermediate representation to classify warnings and selectively invokes cargo-fuzz for dynamic confirmation, yielding 65.2 percent accuracy and an F1 score of 0.659, a 17.1 percent improvement over the strongest LLM baseline.
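As a sanity check, the headline numbers cohere: F1 is the harmonic mean of precision and recall, and the reported 0.659 follows directly from the reported 59.0 percent precision and 74.6 percent recall.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The paper's reported precision (59.0%) and recall (74.6%)
# reproduce its reported F1 score.
print(round(f1(0.590, 0.746), 3))  # → 0.659
```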

What carries the argument

An RL agent that learns a warning-suppression policy from contextual features extracted from Rust's mid-level intermediate representation, with cargo-fuzz providing auxiliary dynamic validation signals.
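The summary does not specify the agent's architecture or feature set, but the framing can be sketched as a contextual bandit: each warning's MIR-derived feature vector is the state, the action is keep or suppress, and validation outcomes supply the reward. The features and the toy environment below are hypothetical stand-ins, not the paper's actual pipeline; a minimal REINFORCE-style sketch:

```python
import math
import random

random.seed(0)
N_FEATURES = 3  # hypothetical MIR signals, e.g. unsafe-block count,
                # raw-pointer derefs, lifetime annotations near the warning

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def policy(weights, features):
    """Probability of KEEPING the warning under a logistic policy."""
    return 1.0 / (1.0 + math.exp(-dot(weights, features)))

def reinforce_step(weights, features, reward_fn, lr=0.1):
    """One REINFORCE update: sample keep/suppress, then move the weights
    along the log-probability gradient scaled by the reward."""
    p_keep = policy(weights, features)
    keep = random.random() < p_keep
    g = (1.0 if keep else 0.0) - p_keep  # d/dw log pi(keep | features)
    r = reward_fn(keep)
    return [w + lr * r * g * x for w, x in zip(weights, features)]

# Toy environment: a hidden "true bug" label stands in for fuzzing
# feedback; keeping real bugs and suppressing false positives pays +1.
weights = [0.0] * N_FEATURES
for _ in range(2000):
    is_true_bug = random.random() < 0.3
    mean = 1.0 if is_true_bug else -1.0
    feats = [random.gauss(mean, 1.0) for _ in range(N_FEATURES)]
    weights = reinforce_step(weights, feats,
                             lambda keep: 1.0 if keep == is_true_bug else -1.0)

# The learned policy should keep bug-like warnings and drop FP-like ones.
bug_like = [1.0] * N_FEATURES
fp_like = [-1.0] * N_FEATURES
print(policy(weights, bug_like) > policy(weights, fp_like))
```

In this toy setting the two classes are linearly separable by construction; the paper's actual contribution is precisely the case where MIR features separate true from spurious warnings only partially, which is why the fuzzing signal matters.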

If this is right

  • Raw Rudra precision rises from 25.6 percent to 59.0 percent while recall reaches 74.6 percent.
  • Adding targeted fuzzing yields another 10.7 percentage points in accuracy over the RL-only version.
  • The hybrid approach outperforms LLM-based warning classification by 17.1 percent in accuracy.
  • Developers receive fewer spurious alerts, lowering the effort needed to trust static memory-safety results.
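The developer-effort point can be quantified from the precision figures alone: with precision p, a triager wades through (1 − p)/p false alerts for every true warning kept. (Note the post-suppression ratio is per retained true positive, since recall also falls to 74.6 percent.)

```python
def fp_per_tp(precision: float) -> float:
    """False positives seen per true positive: FP/TP = (1 - p) / p."""
    return (1.0 - precision) / precision

print(round(fp_per_tp(0.256), 2))  # → 2.91 false alerts per real bug (raw Rudra)
print(round(fp_per_tp(0.590), 2))  # → 0.69 after RL-based suppression
```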

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar RL policies could be trained for other languages that expose comparable intermediate representations.
  • Embedding the agent inside continuous-integration pipelines would let teams triage only the warnings the policy retains.
  • If MIR features prove insufficient on certain code patterns, augmenting the state with additional static facts could be tested directly.

Load-bearing premise

The assumption that features drawn from the mid-level intermediate representation supply enough context for the agent to learn a reliable distinction between true and false warnings, and that fuzzing feedback is accurate and unbiased.

What would settle it

A test set of Rust programs containing documented memory-safety bugs where the trained agent suppresses a substantial fraction of the true-positive warnings that the original static analyzer had flagged.

Figures

Figures reproduced from arXiv: 2605.04000 by Foutse Khomh, Leuson Da Silva, P Akilesh, Sridhar Chimalakonda.

Figure 1: Overview of the warning classification pipeline.
Figure 2: Comparison of key performance metrics across all approaches. Our RL+Fuzzing approach achieves the highest …
Original abstract

Static analysis tools are essential for ensuring memory safety in Rust programs, particularly as Rust gains adoption in safety-critical domains. However, existing tools such as Rudra and MirChecker suffer from high false positive rates, which diminish developer trust, increase manual review effort, and may obscure genuine vulnerabilities. This paper presents a novel reinforcement learning (RL)-based approach for automatically classifying and suppressing spurious warnings in static memory safety analysis for Rust. To achieve this, we design an RL agent that learns a warning suppression policy by extracting contextual features from Rust's Mid-level Intermediate Representation (MIR) and optimizing its decisions through interaction with static analysis outputs. To improve decision quality, we integrate dynamic validation via cargo-fuzz as an auxiliary feedback mechanism, allowing the agent to selectively validate suspicious warnings through targeted fuzz testing. Our evaluation shows that the proposed approach significantly outperforms state-of-the-art LLM-based baselines, achieving 65.2% accuracy and an F1 score of 0.659, an improvement of 17.1% over the best LLM baseline. With a recall of 74.6%, our method successfully identifies nearly three-quarters of true bugs while substantially reducing false positives, improving precision from 25.6% in raw Rudra output to 59.0%. Incorporating dynamic fuzzing further boosts performance, yielding additional improvements of 10.7 percentage points in accuracy and 8.6 percentage points in F1 score over the RL-only variant. Overall, our work demonstrates that combining reinforcement learning with hybrid static-dynamic analysis can substantially reduce false positives and improve the practical usability of memory safety verification tools for Rust.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a reinforcement learning (RL) framework to reduce false positives in static memory safety analyzers for Rust, such as Rudra. It uses contextual features from the Mid-level Intermediate Representation (MIR) to train an agent that learns a suppression policy, augmented by dynamic feedback from cargo-fuzz. The evaluation reports that this approach achieves 65.2% accuracy and an F1 score of 0.659, outperforming LLM-based baselines by 17.1%, with precision improving from 25.6% to 59.0% and recall at 74.6%. Dynamic fuzzing adds further gains.

Significance. If the empirical results hold under rigorous validation, the hybrid RL approach combining MIR-derived features with fuzzing feedback could substantially improve the practical utility of static analyzers for Rust memory safety, a domain where high false-positive rates currently limit adoption in safety-critical code. The work credits the integration of static analysis outputs with dynamic validation as a key enabler for the reported precision lift.

major comments (2)
  1. [Evaluation] Evaluation section: the headline metrics (65.2% accuracy, F1=0.659, precision rising from 25.6% to 59.0%, recall 74.6%) are presented without any description of dataset size, number of programs or warnings, how ground-truth labels for true bugs were obtained, RL training hyperparameters, or statistical significance tests. These omissions make it impossible to verify whether the 17.1% improvement over LLM baselines is robust or reproducible.
  2. [Method (Dynamic Validation)] Dynamic validation subsection: the reward signal relies on cargo-fuzz to label true positives, yet no coverage statistics, number of fuzzing campaigns, or comparison against exhaustive or symbolic validation are supplied. If fuzzing systematically misses data-dependent or rare-path violations (as is common in Rust), the RL agent receives false-negative labels and is incentivized to suppress genuine bugs, directly undermining the claimed precision and recall figures.
minor comments (1)
  1. [Abstract] Abstract: the phrase 'state-of-the-art LLM-based baselines' is used without naming the specific models or providing citations; this should be expanded for clarity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the evaluation and dynamic validation aspects of our work. We address each major comment below and indicate the revisions incorporated into the manuscript.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the headline metrics (65.2% accuracy, F1=0.659, precision rising from 25.6% to 59.0%, recall 74.6%) are presented without any description of dataset size, number of programs or warnings, how ground-truth labels for true bugs were obtained, RL training hyperparameters, or statistical significance tests. These omissions make it impossible to verify whether the 17.1% improvement over LLM baselines is robust or reproducible.

    Authors: We agree that the original Evaluation section omitted details required for reproducibility and verification of the results. The revised manuscript expands this section to describe the dataset size and composition (number of programs and warnings), the procedure used to obtain ground-truth labels for true bugs, the full set of RL training hyperparameters, and the statistical significance tests performed to support the reported improvements over LLM baselines. revision: yes

  2. Referee: [Method (Dynamic Validation)] Dynamic validation subsection: the reward signal relies on cargo-fuzz to label true positives, yet no coverage statistics, number of fuzzing campaigns, or comparison against exhaustive or symbolic validation are supplied. If fuzzing systematically misses data-dependent or rare-path violations (as is common in Rust), the RL agent receives false-negative labels and is incentivized to suppress genuine bugs, directly undermining the claimed precision and recall figures.

    Authors: We acknowledge the validity of this concern regarding potential incomplete coverage in fuzzing. The revised manuscript adds coverage statistics from the cargo-fuzz campaigns, the number of fuzzing campaigns executed, and a rationale for not performing exhaustive or symbolic validation (due to scalability limitations with Rust programs). We also include an explicit discussion of the risk of false-negative labels from missed paths and describe how the RL framework integrates the dynamic signal conservatively alongside MIR features to limit its impact; an ablation study has been added showing that precision gains remain even under partial coverage. revision: partial
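The "conservative integration" the rebuttal describes can be made concrete with a hedged sketch: treat a fuzzer crash as strong evidence that a warning is a true positive, but treat a non-crashing campaign as only weak evidence of a false positive, since cargo-fuzz may simply have missed the triggering path. The weighting below is an illustrative assumption, not a value from the paper.

```python
def fuzz_reward(crashed: bool, kept: bool,
                crash_weight: float = 1.0, no_crash_weight: float = 0.2) -> float:
    """Asymmetric reward for the suppression agent.

    A crash definitively confirms the warning, so keeping it earns the
    full reward and suppressing it pays the full penalty. No crash only
    weakly suggests a false positive (fuzz coverage may be incomplete),
    so that signal is heavily discounted.
    """
    if crashed:
        return crash_weight if kept else -crash_weight
    return -no_crash_weight if kept else no_crash_weight

print(fuzz_reward(crashed=True, kept=False))  # → -1.0: suppressed a confirmed bug
print(fuzz_reward(crashed=False, kept=True))  # → -0.2: mild penalty only
```

Under this shaping, a run of unlucky fuzzing campaigns nudges the policy only slightly toward suppression, which bounds the false-negative-label risk the referee raises.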

Circularity Check

0 steps flagged

No circularity in empirical RL training pipeline

full rationale

The paper describes an empirical reinforcement learning approach that trains an agent on MIR-derived contextual features to suppress false positives from static analyzers like Rudra, with cargo-fuzz providing external reward signals for validation. Reported metrics (65.2% accuracy, 0.659 F1, precision lift from 25.6% to 59.0%) arise from standard training/evaluation loops against held-out data and baselines, not from any internal equations or parameters that define the target quantities by construction. No self-definitional reductions, fitted-input predictions, or load-bearing self-citations appear in the abstract or described method; the derivation chain consists of feature extraction, policy optimization, and hybrid static-dynamic feedback, all externally grounded.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit mathematical axioms, free parameters, or invented entities are stated in the abstract; the work rests on standard assumptions of reinforcement learning and empirical evaluation in software analysis.

pith-pipeline@v0.9.0 · 5608 in / 1199 out tokens · 40562 ms · 2026-05-08T18:07:11.639968+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

63 extracted references · 30 canonical work pages · 5 internal anchors

  1. [1]

    2025. https://github.com/Akileshdash/rl-guided-static-analysis-rust/blob/main/README.md

  2. [2]

    2025. https://github.com/Akileshdash/Rudra/blob/master/README.md

  3. [3]

    aarc Developers. 2021. aarc crate, version 0.3.2. https://crates.io/crates/aarc/0.3.2. Accessed: 2026-01-10

  4. [4]

    Vytautas Astrauskas, Christoph Matheja, Federico Poli, Peter Müller, and Alexander J. Summers. 2020. How Do Programmers Use Unsafe Rust?. In Proceedings of the ACM on Programming Languages (OOPSLA), Vol. 4. 136:1–136:27. doi:10.1145/3428204

  5. [5]

    Vytautas Astrauskas, Peter Müller, Federico Poli, and Alexander J Summers. 2019. Leveraging Rust types for modular specification and verification. In Proceedings of the ACM on Programming Languages (OOPSLA), Vol. 3. 1–30

  6. [6]

    Nathaniel Ayewah, David Hovemeyer, J. David Morgenthaler, John Penix, and William Pugh. 2008. Using Static Analysis to Find Bugs. IEEE Software 25, 5, 22–29. doi:10.1109/MS.2008.130

  7. [7]

    Yechan Bae, Youngsuk Kim, Ammar Askar, Jungwon Lim, and Taesoo Kim. 2021. Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale. In Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles (SOSP). 84–99. doi:10.1145/3477132.3483570

  8. [8]

    Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, and Dawson Engler. 2010. A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. In Communications of the ACM, Vol. 53. 66–75. doi:10.1145/1646353.1646374

  9. [9]

    Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury

  10. [10]

    Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2329–2344

  11. [11]

    Montgomery Carter, Shaobo He, Jonathan Whitaker, Zvonimir Rakamarić, and Michael Emmi. 2016. SMACK software verification toolchain. In Proceedings of the 38th International Conference on Software Engineering Companion. 589–592

  12. [12]

    Microsoft Security Response Center. 2019. Why Rust for Safe Systems Programming. https://msrc-blog.microsoft.com/2019/07/22/why-rust-for-safe-systems-programming/. Accessed: 2024-12-10

  13. [13]

    Partha Chakraborty, Mahmoud Alfadel, and Meiyappan Nagappan. 2024. RLocator: Reinforcement learning for bug localization. IEEE Transactions on Software Engineering (2024)

  14. [14]

    Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating Large Language Models Trained on Code. arXiv preprint arXiv:2107.03374

  15. [15]

    Maria Christakis and Christian Bird. 2016. What Developers Want and Need from Program Analysis: An Empirical Study. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE). 332–343. doi:10.1145/2970276.2970347

  16. [16]

    Cloudflare. 2020. Enjoy a Slice of QUIC, and Rust! https://blog.cloudflare.com/enjoy-a-slice-of-quic-and-rust/. Accessed: 2024-12-10

  17. [17]

    National Vulnerability Database. 2020. CVE-2020-35905. https://nvd.nist.gov/vuln/detail/CVE-2020-35905. Accessed: 2026-01-09

  18. [18]

    National Vulnerability Database. 2021. CVE-2020-36323. https://nvd.nist.gov/vuln/detail/CVE-2020-36323. Accessed: 2026-01-09

  19. [19]

    Xiangjue Dong, Maria Teleki, and James Caverlee. 2024. A survey on LLM inference-time self-improvement. arXiv preprint arXiv:2412.14352 (2024)

  20. [20]

    Ana Nora Evans, Bradford Campbell, and Mary Lou Soffa. 2020. Is Rust Used Safely by Software Developers?. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (ICSE). 246–257. doi:10.1145/3377811.3380413

  21. [21]

    Z Feng. 2020. CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155 (2020)

  22. [22]

    Sushant Ghimire, Michael W. Godfrey, and Chanchal K. Roy. 2023. Yuga: Automatically Detecting Lifetime Annotation Bugs in the Rust Language. IEEE Transactions on Software Engineering 49, 4 (2023), 2075–2091. doi:10.1109/TSE.2022.3200162

  23. [23]

    Aaron Grattafiori et al. 2024. The Llama 3 Herd of Models. arXiv:2407.21783 [cs.AI] https://arxiv.org/abs/2407.21783

  24. [24]

    Sarah Heckman and Laurie Williams. 2009. A Model Building Process for Identifying Actionable Static Analysis Alerts. In Proceedings of the 2009 International Conference on Software Testing Verification and Validation (ICST). 161–170. doi:10.1109/ICST.2009.47

  25. [25]

    Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven...

  26. [26]

    Brittany Johnson, Yoonki Song, Emerson Murphy-Hill, and Robert Bowdidge

  27. [27]

    Why Don’t Software Developers Use Static Analysis Tools to Find Bugs?. In Proceedings of the 2013 International Conference on Software Engineering (ICSE). 672–681. doi:10.1109/ICSE.2013.6606613

  28. [28]

    Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. 2018. RustBelt: Securing the Foundations of the Rust Programming Language. In Proceedings of the ACM on Programming Languages (POPL), Vol. 2. 66:1–66:34. doi:10.1145/3158154

  29. [29]

    Ralf Jung, Benjamin Kimock, Christian Poveda, Eduardo Sánchez Muñoz, Oli Scherer, and Qian Wang. 2026. Miri: Practical Undefined Behavior Detection for Rust. Proceedings of the ACM on Programming Languages 10, POPL (2026), 1383–1411

  30. [30]

    Sunghun Kim and Michael D. Ernst. 2007. Which Warnings Should I Fix First?. In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE). 45–54. doi:10.1145/1287624.1287633

  31. [31]

    George Klees, Andrew Ruef, Benji Cooper, Shiyi Wei, and Michael Hicks. 2018. Evaluating Fuzz Testing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS). 2123–2138. doi:10.1145/3243734.3243804

  32. [32]

    Ted Kremenek, Ken Ashcraft, Junfeng Yang, and Dawson Engler. 2004. Correlation Exploitation in Error Ranking. In Proceedings of the 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE). 83–93. doi:10.1145/1029894.1029909

  33. [33]

    Yi Li, Aws Albarghouthi, Zachary Kincaid, Mayur Naik, et al. 2019. Hybrid program analysis for effective bug detection. In Proceedings of the ACM/IEEE International Conference on Software Engineering

  34. [34]

    Zhuohua Li, Jincheng Wang, Mingshen Sun, and John C. S. Lui. 2021. MirChecker: Detecting Bugs in Rust Programs via Static Analysis. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS). 2183–. doi:10.1145/3460120.3484541

  36. [36]

    Kevin Lira, Baldoino Fonseca, Wesley K. G. Assunção, Davy Baya, and Marcio Ribeiro. 2025. Beyond Code Explanations: a Ray of Hope for Cross-Language Vulnerability Repair. In 2025 2nd IEEE/ACM International Conference on AI-powered Software (AIware). IEEE, 01–09

  37. [37]

    Guoming Long, Jingzhi Gong, Hui Fang, and Tao Chen. 2025. Learning Software Bug Reports: A Systematic Literature Review. ACM Transactions on Software Engineering and Methodology (2025)

  38. [38]

    Scott Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. arXiv:1705.07874 [cs.AI] https://arxiv.org/abs/1705.07874

  39. [39]

    Nicholas D. Matsakis and Felix S. Klock. 2014. The Rust Language. In Proceedings of the 2014 ACM SIGAda Annual Conference on High Integrity Language Technology (HILT). 103–104. doi:10.1145/2663171.2663188

  40. [40]

    Nathalia Nascimento, Everton Guimaraes, Sai Sanjna Chintakunta, and Santhosh Anitha Boominathan. 2025. How Effective are LLMs for Data Science Coding? A Controlled Experiment. In 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR). IEEE, 211–222

  41. [41]

    National Vulnerability Database. 2021. CVE-2020-36317. https://nvd.nist.gov/vuln/detail/CVE-2020-36317. Accessed: 2026-01-09

  42. [42]

    Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, and Shuiwang Ji. 2025. Inference-time computations for LLM reasoning and planning: A benchmark and insights. arXiv preprint arXiv:2502.12521 (2025)

  43. [43]

    Michael Pradel and Koushik Sen. 2018. DeepBugs: A learning approach to name-based bug detection. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1–25

  44. [44]

    Boqin Qin, Yilun Chen, Zeming Yu, Linhai Song, and Yiying Zhang. 2020. Understanding Memory and Thread Safety Practices and Issues in Real-World Rust Programs. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 763–779. doi:10.1145/3385412.3386036

  45. [45]

    Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, et al. 2023. Code Llama: Open Foundation Models for Code. arXiv preprint arXiv:2308.12950 (2023)

  46. [46]

    Rust Fuzzing Authority. 2024. cargo-fuzz: Fuzz testing for Rust. https://github.com/rust-fuzz/cargo-fuzz

  47. [47]

    Rust Project Developers. 2025. Rust Version 1.88.0. https://doc.rust-lang.org/beta/releases.html#version-1880-2025-06-26. Accessed: 2026-01-10

  48. [48]

    Rust Release Notes. 2021. Rust Version 1.56.0. https://doc.rust-lang.org/beta/releases.html#version-1560-2021-10-21. Accessed: 2026-01-10

  49. [49]

    Joseph R. Ruthruff, John Penix, J. David Morgenthaler, Sebastian Elbaum, and Gregg Rothermel. 2008. Predicting Accurate and Actionable Static Analysis Warnings: An Experimental Approach. In Proceedings of the 30th International Conference on Software Engineering (ICSE). 341–350. doi:10.1145/1368088.1368135

  50. [50]

    Iman Saberi, Amirreza Esmaeili, Fatemeh Fard, and Fuxiang Chen. 2025. AdvFusion: Adapter-based Knowledge Transfer for Code Summarization on Code Language Models. In 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 563–574

  51. [51]

    Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon, and Ciera Jaspan. 2018. Lessons from Building Static Analysis Tools at Google. In Communications of the ACM, Vol. 61. 58–66. doi:10.1145/3188720

  52. [52]

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov

  53. [53]

    Proximal Policy Optimization Algorithms. arXiv:1707.06347 [cs.LG] https://arxiv.org/abs/1707.06347

  54. [54]

    Amazon Web Services. 2020. Why AWS Loves Rust, and How We’d Like to Help. https://aws.amazon.com/blogs/opensource/why-aws-loves-rust-and-how-wed-like-to-help/. Accessed: 2024-12-10

  55. [55]

    Ayushi Sharma, Shashank Sharma, Sai Ritvik Tanksalkar, Santiago Torres-Arias, and Aravind Machiry. 2024. Rust for embedded systems: current state and open problems. In Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security. 2296–2310

  56. [56]

    Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Indira Vats, Hadi Moazen, and Federica Sarro. 2021. A survey on machine learning techniques for source code analysis. arXiv preprint arXiv:2110.09610 (2021)

  57. [57]

    Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). MIT Press

  58. [58]

    Linus Torvalds and Linux Kernel Team. 2022. Linux 6.1: Rust Support Merged into Mainline Kernel. https://www.kernel.org/. Accessed: 2024-12-10

  59. [59]

    Yue Wang, Weishi Wang, Shafiq Joty, and Steven CH Hoi. 2021. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859 (2021)

  60. [60]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837

  61. [61]

    Hui Xu, Zhuangbin Chen, Mingshen Sun, Yangfan Zhou, and Michael R. Lyu

  62. [62]

    Memory-Safety Challenge Considered Solved? An In-Depth Study with All Rust CVEs. In ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 31. 3:1–3:25. doi:10.1145/3466642

  63. [63]

    Michal Zalewski. 2014. American fuzzy lop. http://lcamtuf.coredump.cx/afl/