pith. machine review for the scientific record.

arxiv: 2605.00413 · v1 · submitted 2026-05-01 · 💻 cs.SE

Recognition: unknown

ClozeMaster: Fuzzing Rust Compiler by Harnessing LLMs for Infilling Masked Real Programs

Pith reviewed 2026-05-09 19:26 UTC · model grok-4.3

classification: 💻 cs.SE
keywords: Rust compiler fuzzing, LLM infilling, clozeMask strategy, test case generation, compiler bug detection, historical bug reports, fuzzer effectiveness, test program synthesis

The pith

Masking specific structures in the code of historical Rust bug reports and letting an LLM infill them produces valid new test programs that trigger compiler bugs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CLOZEMASTER, a fuzzer that creates test programs for the Rust compiler by taking code from past bug reports, applying a bracket-based clozeMask strategy to hide targeted structures, and asking an LLM to complete the missing parts. The new programs stay close enough to known bug triggers to remain effective, while the infilled parts yield fresh cases that explore different compiler behaviors. A sympathetic reader would care for two reasons: Rust is increasingly used in critical systems, where compiler flaws can undermine its memory- and thread-safety guarantees, and Rust's complex syntax and strict requirements make valid test programs hard to generate, to the point that programs written directly by an LLM are often invalid. The authors report that the resulting tests found 27 confirmed bugs in rustc and mrustc, 10 of which developers have fixed, and that CLOZEMASTER reaches higher code coverage than existing fuzzers.

Core claim

Our investigation into Rust compiler bug issues shows that test cases triggering historical bugs assist in software testing. Inspired by this, we introduce the clozeMask strategy: extracting test code from historical issue reports, identifying and masking code snippets with specific structures, and using an LLM to fill in the masked portions for synthesizing new test programs. This approach harnesses the generative capabilities of LLMs while retaining the ability to trigger Rust compiler bugs, enabling comprehensive testing of the compiler's behavior, particularly exploring edge cases. We implemented our approach as CLOZEMASTER, which identified 27 confirmed bugs for rustc and mrustc, of which 10 have been fixed by developers.

What carries the argument

The clozeMask strategy, which extracts code from historical bug reports, masks snippets with specific structures using brackets, and uses LLM infilling to synthesize novel test programs that still trigger compiler bugs.

Load-bearing premise

That masking specific structures in historical bug-triggering Rust code and having LLMs infill the masks will reliably produce valid, novel test programs that trigger additional previously unknown compiler bugs.

What would settle it

A side-by-side run of CLOZEMASTER against current Rust fuzzers on the same compiler versions that finds zero additional unique bugs or shows no gain in code coverage metrics.

Figures

Figures reproduced from arXiv: 2605.00413 by Baowen Xu, Hongyan Gao, Jiangchang Wu, Maolin Sun, Yibiao Yang, Yuming Zhou.

Figure 1: Evolution of Bug Reports in rustc Components from January 2019 to August 2023: A Heatmap Analysis. … understanding and generation of code semantics. Recently, researchers have leveraged pre-trained LLMs from open-source libraries, employing carefully designed prompt strategies or fine-tuning techniques for specific tasks like domain adaptive code generation [45]. Although LLMs perform well on many NLP tasks, …
Figure 3: The Overview of CLOZEMASTER. … random approach with a probability of p to delete code tokens at the code segment level in the original dataset. We manually set the value of p to 0.2, as this allows us to retain the original semantic meaning of the code as much as possible when implementing the RD strategy. Additionally, we apply RS as a data augmentation technique for code segments in the original dataset. Sp…
Figure 4: Confirmed bugs that affect the corresponding release
Figure 5: The distribution of unique bugs detected by …
Figure 6: Example cases generated by CLOZEMASTER that can reveal bugs in rustc. The portion generated using the clozeMask strategy is highlighted with light orange. … (3.34%). Among these, generic const exprs triggers the most Internal Compiler Errors (ICEs) (3/11), due to it being a compiler feature still under development. 2) Examples of reported bugs: Here, we present a case study to illustrate the key characterist…
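Figure 3's caption describes a random-deletion (RD) augmentation that drops code tokens with probability p = 0.2 at the code-segment level. A minimal Python sketch of that idea follows, assuming whitespace-split tokens (the paper presumably uses its model's own tokenizer) and a hypothetical rule that never returns an empty segment. The RS technique the caption also mentions is not sketched, because its definition is cut off.

```python
import random
from typing import List, Optional


def random_deletion(tokens: List[str], p: float = 0.2,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Drop each token independently with probability p (0.2 in the caption).

    Token order is preserved. If every token happens to be dropped, one
    randomly chosen token is kept so the segment is never empty; that
    fallback is an assumption of this sketch, not something the paper states.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    kept = [tok for tok in tokens if rng.random() >= p]
    if tokens and not kept:
        kept = [rng.choice(tokens)]
    return kept


# Whitespace tokenization is a stand-in for the model's real tokenizer.
segment = "let x = foo ( 1 ) ;".split()
augmented = random_deletion(segment, p=0.2, rng=random.Random(42))
```

Passing a seeded `random.Random` keeps augmentation runs reproducible, which matters when the augmented data feeds model training.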
Original abstract

Ensuring the reliability of the Rust compiler is of paramount importance, given increasing adoption of Rust for critical systems development, due to its emphasis on memory and thread safety. However, generating valid test programs for the Rust compiler poses significant challenges, given Rust's complex syntax and strict requirements. With the growing popularity of large language models (LLMs), much research in software testing has explored using LLMs to generate test cases. Still, directly using LLMs to generate Rust programs often results in a large number of invalid test cases. Existing studies have indicated that test cases triggering historical compiler bugs can assist in software testing. Our investigation into Rust compiler bug issues supports this observation. Inspired by existing work and our empirical research, we introduce a bracket-based masking and filling strategy called clozeMask. The clozeMask strategy involves extracting test code from historical issue reports, identifying and masking code snippets with specific structures, and using an LLM to fill in the masked portions for synthesizing new test programs. This approach harnesses the generative capabilities of LLMs while retaining the ability to trigger Rust compiler bugs. It enables comprehensive testing of the compiler's behavior, particularly exploring edge cases. We implemented our approach as a prototype CLOZEMASTER. CLOZEMASTER has identified 27 confirmed bugs for rustc and mrustc, of which 10 have been fixed by developers. Furthermore, our experimental results indicate that CLOZEMASTER outperforms existing fuzzers in terms of code coverage and effectiveness.
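The infill step needs the masked program turned into a prompt for a code model. The sketch below assumes a StarCoder-style fill-in-the-middle format with `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>` sentinels; the abstract does not say which model or prompt format CLOZEMASTER uses, so treat the sentinels, the `<MASK>` placeholder and the function names as illustrative. No model is called here: the completion is whatever text the chosen model returns.

```python
# StarCoder-style fill-in-the-middle sentinels. This is an assumption:
# other infilling models (InCoder, for example) use different special tokens.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"
MASK = "<MASK>"  # hypothetical placeholder left by the masking step


def build_fim_prompt(masked_src: str) -> str:
    """Turn a program with exactly one MASK into a prefix-suffix-middle prompt."""
    if masked_src.count(MASK) != 1:
        raise ValueError("expected exactly one mask in the program")
    prefix, suffix = masked_src.split(MASK)
    # The model is expected to generate the missing middle after FIM_MIDDLE.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice(masked_src: str, infill: str) -> str:
    """Put the model's completion back where the mask was."""
    return masked_src.replace(MASK, infill, 1)


masked = "fn main() {<MASK>}"
prompt = build_fim_prompt(masked)
# prompt == "<fim_prefix>fn main() {<fim_suffix>}<fim_middle>"
candidate = splice(masked, " let v: Vec<u8> = Vec::new(); ")
```

Each spliced candidate would then go to the compiler; the prompt builder itself knows nothing about Rust, which is what lets the same masked seed yield many different programs.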

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces CLOZEMASTER, a fuzzer for the Rust compiler (rustc and mrustc) that extracts code snippets from historical bug reports, applies a bracket-based masking strategy (clozeMask) to specific syntactic structures, and uses LLMs to infill the masked portions to synthesize new test programs. It claims this yields 27 confirmed bugs (10 fixed by developers) and outperforms existing fuzzers in code coverage and bug-finding effectiveness.

Significance. If the empirical claims hold after providing missing controls, the work could meaningfully advance compiler testing by showing how constrained LLM infilling on real bug-triggering seeds can produce valid, novel programs that trigger additional bugs, addressing the high invalidity rate of unconstrained LLM generation. The approach builds on prior observations about historical bugs aiding testing and could generalize to other languages with complex syntax.

major comments (3)
  1. [§4 (Experimental Results)] Experimental evaluation (likely §4 or §5): The abstract and results claim 27 confirmed bugs and superior coverage, but provide no validity rate for LLM infills, no count of total programs generated, no deduplication method against the seed corpus or known bug database, and no details on the bug confirmation process (e.g., triage criteria or reproduction steps). These omissions are load-bearing for the central effectiveness claim, as the observed bugs could stem from the historical seeds rather than the clozeMask strategy.
  2. [§4 (Experimental Results)] Baseline comparison (likely §4.2 or §5): The paper states CLOZEMASTER outperforms existing fuzzers but does not name the specific baselines, report the experimental setup (time budgets, seed selection criteria, hardware), or include statistical measures (e.g., variance across runs or significance tests) for coverage differences. Without these, the superiority claim cannot be assessed.
  3. [§3 (Approach)] clozeMask strategy definition (likely §3): The description of identifying and masking 'specific structures' is high-level; the paper should specify the exact masking rules, how bracket-based structures are chosen to ensure novelty, and any post-infill validation steps. This directly affects whether the method reliably produces valid, distinct programs as assumed in the weakest point of the argument.
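Several of the numbers missing from major comment 1 fall out of a simple bookkeeping pass over compiler runs. The sketch below is hypothetical: the `Run` record and helpers are invented for illustration, the whitespace-insensitive hash is a much cruder stand-in for the AST hashing the simulated rebuttal mentions, and the outcome buckets assume rustc's usual conventions (exit status 0 on success, 1 on ordinary errors, 101 plus an "internal compiler error" banner on an ICE).

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    """One compiler invocation on one generated program (hypothetical record)."""
    source: str
    returncode: int
    stderr: str
    timed_out: bool = False


def classify(run: Run) -> str:
    """Bucket a run, assuming rustc's usual exit conventions.

    0 = compiled, 1 = ordinary compile error, 101 plus an
    'internal compiler error' banner = the compiler itself panicked (ICE).
    """
    if run.timed_out:
        return "timeout"
    if run.returncode == 101 or "internal compiler error" in run.stderr:
        return "ice"
    return "ok" if run.returncode == 0 else "compile_error"


def fingerprint(src: str) -> str:
    """Whitespace-insensitive SHA-256; a crude stand-in for AST hashing."""
    return hashlib.sha256(" ".join(src.split()).encode("utf-8")).hexdigest()


def summarize(runs: List[Run], seeds: List[str]) -> Dict[str, object]:
    """Deduplicate against the seeds and each other, then tally outcomes."""
    seen = {fingerprint(s) for s in seeds}
    unique: List[Run] = []
    for run in runs:
        fp = fingerprint(run.source)
        if fp not in seen:
            seen.add(fp)
            unique.append(run)
    counts = Counter(classify(r) for r in unique)
    n = len(unique)
    return {
        "generated": len(runs),
        "unique": n,
        # "valid" here means compiled cleanly; other definitions are possible
        "validity_rate": counts["ok"] / n if n else 0.0,
        "counts": dict(counts),
    }
```

Counting "valid" as "compiled cleanly" is one choice among several; a parse-only check, as the simulated rebuttal proposes, would give a different and likely higher rate.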
minor comments (2)
  1. [Abstract and §2] The abstract says 'our investigation into Rust compiler bug issues supports this observation', but the main text should cite the specific historical reports or dataset used for seed extraction.
  2. [§4] Figure or table presenting coverage results should include raw numbers (e.g., lines/branches covered) alongside percentages for reproducibility.
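The statistics requested in major comment 2 and minor comment 2 are cheap to compute once per-run coverage counts exist. The sketch below reports a percentage to sit next to the raw count, the sample mean and standard deviation across repeated runs, and the Wilcoxon signed-rank statistic for paired runs (for example, two fuzzers on the same seed set and time budget). It is illustrative only and the function names are hypothetical; it computes the statistic W, not a p-value, for which a table or `scipy.stats.wilcoxon` would be the usual route.

```python
import statistics
from typing import List, Tuple


def coverage_pct(covered: int, total: int) -> float:
    """Percentage to report alongside the raw covered/total counts."""
    return 100.0 * covered / total


def mean_std(values: List[float]) -> Tuple[float, float]:
    """Sample mean and sample standard deviation across repeated runs."""
    return statistics.mean(values), statistics.stdev(values)


def wilcoxon_w(a: List[float], b: List[float]) -> Tuple[float, float, float]:
    """Wilcoxon signed-rank statistic for paired samples a[i] vs b[i].

    Zero differences are dropped and tied |d| values share an average rank.
    Returns (W_plus, W_minus, W) with W = min(W_plus, W_minus).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of equal |d| values (a tie group)
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)
```

A useful sanity check is that W_plus + W_minus always equals n(n+1)/2, where n is the number of nonzero paired differences.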

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback. The comments highlight important areas where additional details and clarifications will strengthen the paper. We address each major comment below and will incorporate revisions to improve clarity and completeness.

  1. Referee: The abstract and results claim 27 confirmed bugs and superior coverage, but provide no validity rate for LLM infills, no count of total programs generated, no deduplication method against the seed corpus or known bug database, and no details on the bug confirmation process (e.g., triage criteria or reproduction steps). These omissions are load-bearing for the central effectiveness claim, as the observed bugs could stem from the historical seeds rather than the clozeMask strategy.

    Authors: We agree these details are essential for substantiating the claims. In the revised manuscript, we will add: (1) the validity rate of LLM-generated infills (measured as the percentage of programs that parse successfully with rustc); (2) the total number of programs generated across all experiments; (3) the deduplication method, which uses AST hashing and similarity thresholds to remove duplicates against both the original seed corpus and a maintained database of previously reported Rust compiler bugs; and (4) a full description of the bug confirmation process, including automated reproduction on the latest rustc/mrustc versions, manual triage for uniqueness, and confirmation via developer feedback or issue tracking. We will also explicitly state that all 27 bugs were verified as novel and not reproducible from the unmodified historical seeds. revision: yes

  2. Referee: The paper states CLOZEMASTER outperforms existing fuzzers but does not name the specific baselines, report the experimental setup (time budgets, seed selection criteria, hardware), or include statistical measures (e.g., variance across runs or significance tests) for coverage differences. Without these, the superiority claim cannot be assessed.

    Authors: We acknowledge the need for full experimental transparency. The revised §4 will explicitly name the baselines (RustFuzz, AFL++, and two LLM-based generators from recent compiler testing literature), detail the setup (24-hour time budgets per run, seeds drawn from the same historical bug corpus, hardware configuration with Intel Xeon processors and 64 GB RAM), and report statistical measures including mean code coverage with standard deviation over multiple runs and results from paired statistical significance tests (e.g., Wilcoxon signed-rank) to support the coverage and bug-finding comparisons. revision: yes

  3. Referee: The description of identifying and masking 'specific structures' is high-level; the paper should specify the exact masking rules, how bracket-based structures are chosen to ensure novelty, and any post-infill validation steps. This directly affects whether the method reliably produces valid, distinct programs as assumed in the weakest point of the argument.

    Authors: We will expand the description in §3 with precise details. The clozeMask rules use the Rust parser to locate bracket-delimited constructs (function bodies, match expressions, impl blocks, and struct literals) and mask their interior content while preserving the outer brackets and surrounding context. Structures are selected from historical bug reports based on syntactic complexity metrics (e.g., nesting depth and presence of unsafe or generic code). Novelty is ensured by requiring the LLM infill to differ semantically from the seed (via type and control-flow checks). Post-infill, we apply a two-stage validation: (1) rustc parsing to discard syntactically invalid programs, and (2) differential execution against the seed to confirm behavioral novelty before fuzzing. revision: yes
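The simulated rebuttal's selection rule, preferring bracket-delimited constructs with greater nesting depth, can be sketched as a ranking over brace pairs. Everything here is an assumption: the rebuttal is machine-generated rather than the authors' text, depth stands in for whatever complexity metric the paper actually uses, and `k` and `min_body` are hypothetical parameters. The usual naive-scanner caveat applies: braces inside strings or comments are not handled.

```python
from typing import List, Tuple


def spans_with_depth(src: str) -> List[Tuple[int, int, int]]:
    """(open, close, depth) for each balanced {...}; depth 1 is outermost."""
    stack: List[int] = []
    out: List[Tuple[int, int, int]] = []
    for i, ch in enumerate(src):
        if ch == "{":
            stack.append(i)
        elif ch == "}" and stack:
            open_i = stack.pop()
            out.append((open_i, i, len(stack) + 1))
    return out


def pick_mask_targets(src: str, k: int = 1, min_body: int = 1) -> List[Tuple[int, int]]:
    """Rank brace bodies by nesting depth (deepest first) and keep the top k.

    Bodies shorter than min_body characters after stripping are skipped, so
    the model is never asked to refill an empty block. Ties keep source order.
    """
    cands = [
        (o, c, d) for o, c, d in spans_with_depth(src)
        if len(src[o + 1 : c].strip()) >= min_body
    ]
    cands.sort(key=lambda t: (-t[2], t[0]))
    return [(o, c) for o, c, _ in cands[:k]]
```

The returned (open, close) pairs are in the same form a brace-masking helper would consume, so ranking and masking stay separable.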

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external data and LLM generation

The paper describes an empirical fuzzing technique (clozeMask) that extracts code from historical Rust compiler bug reports, applies bracket-based masking, and uses LLMs to infill new programs. No equations, fitted parameters, or derivations are present. Claims of 27 confirmed bugs and superior coverage are presented as experimental outcomes, not as quantities forced by construction from the method's inputs. Historical bug reports and general LLM capabilities are external to the paper; no self-citation chain or self-definitional loop supports the central results. The work is self-contained against external benchmarks (actual compiler runs and developer fixes) and receives a normal non-finding score.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the unverified assumption that LLM infilling preserves bug-triggering properties while producing valid code; no free parameters or invented entities beyond the named strategy are evident from the abstract.

axioms (1)
  • Domain assumption: Historical Rust compiler bug reports contain code that, when masked and infilled by LLMs, can trigger new bugs while remaining valid.
    This is the core premise drawn from the authors' empirical investigation but not demonstrated in the abstract.
invented entities (1)
  • clozeMask (no independent evidence)
    purpose: Bracket-based masking and LLM infilling strategy for test program synthesis
    Newly introduced technique that forms the basis of the approach.

pith-pipeline@v0.9.0 · 5585 in / 1303 out tokens · 66299 ms · 2026-05-09T19:26:52.174621+00:00 · methodology

Reference graph

Works this paper leans on

70 extracted references · 42 canonical work pages · 3 internal anchors

  1. [1]

    How do programmers use unsafe rust?

    V. Astrauskas, C. Matheja, F. Poli, P. Müller, and A. J. Summers, “How do programmers use unsafe rust?” Proc. ACM Program. Lang., vol. 4, no. OOPSLA, pp. 136:1–136:27, 2020. [Online]. Available: https://doi.org/10.1145/3428204

  2. [2]

    Verus: Verifying rust programs using linear ghost types,

    A. Lattuada, T. Hance, C. Cho, M. Brun, I. Subasinghe, Y. Zhou, J. Howell, B. Parno, and C. Hawblitzel, “Verus: Verifying rust programs using linear ghost types,” Proc. ACM Program. Lang., vol. 7, no. OOPSLA1, apr 2023. [Online]. Available: https://doi.org/10.1145/3586037

  3. [3]

    Hardware/software co-assurance for the rust programming language applied to zero trust architecture development,

    D. Hardin, “Hardware/software co-assurance for the rust programming language applied to zero trust architecture development,” Ada Lett., vol. 42, no. 2, p. 55–61, apr 2023. [Online]. Available: https://doi.org/10.1145/3591335.3591340

  4. [4]

    Understanding memory and thread safety practices and issues in real-world rust programs,

    B. Qin, Y. Chen, Z. Yu, L. Song, and Y. Zhang, “Understanding memory and thread safety practices and issues in real-world rust programs,” in Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, A. F. Donaldson and E. Torlak, Eds. ACM, 2020, pp. 763–779. [O...

  5. [5]

    Rustsmith: Random differential compiler testing for rust,

    M. Sharma, P. Yu, and A. F. Donaldson, “Rustsmith: Random differential compiler testing for rust,” in Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2023. New York, NY, USA: Association for Computing Machinery, 2023, p. 1483–1486. [Online]. Available: https://doi.org/10.1145/3597926.3604919

  6. [6]

    Adventure of a lifetime: Extract method refactoring for rust,

    S. Thy, A. Costea, K. Gopinathan, and I. Sergey, “Adventure of a lifetime: Extract method refactoring for rust,” Proc. ACM Program. Lang., vol. 7, no. OOPSLA2, oct 2023. [Online]. Available: https://doi.org/10.1145/3622821

  7. [7]

    A grounded conceptual model for ownership types in rust,

    W. Crichton, G. Gray, and S. Krishnamurthi, “A grounded conceptual model for ownership types in rust,” Proc. ACM Program. Lang., vol. 7, no. OOPSLA2, oct 2023. [Online]. Available: https://doi.org/10.1145/3622841

  8. [8]

    Rulf: Rust library fuzzing via api dependency graph traversal,

    J. Jiang, H. Xu, and Y. Zhou, “Rulf: Rust library fuzzing via api dependency graph traversal,” in Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’21. IEEE Press, 2022, p. 581–592. [Online]. Available: https://doi.org/10.1109/ASE51524.2021.9678813

  9. [9]

    Next steps for rust in the kernel,

    J. Corbet, “Next steps for rust in the kernel,” Website, 2022, https://lwn.net/Articles/908347/

  10. [10]

    Microsoft is busy rewriting core windows code in memory-safe rust,

    T. Claburn, “Microsoft is busy rewriting core windows code in memory-safe rust,” Website, 2023, https://www.theregister.com/2023/04/27/microsoft_windows_rust/

  11. [11]

    Huggingface, “candle,” 2023, https://github.com/huggingface/candle

  12. [12]

    The rise of rust, the ‘viral’ secure programming language that’s taking over tech

    L. H. Newman, “The rise of rust, the ‘viral’ secure programming language that’s taking over tech.” 2022, https://www.wired.com/story/rust-secure-programming-language-memory-safe/

  13. [13]

    The top programming languages,

    Github, “The top programming languages,” Website, 2022, https://octoverse.github.com/2022/top-programming-languages

  14. [14]

    The case for memory safe roadmaps,

    U. government, “The case for memory safe roadmaps,” 2024, https://www.cisa.gov/resources-tools/resources/case-memory-safe-roadmaps

  15. [15]

    Testing the compiler for a new-born programming language: An industrial case study (experience paper),

    Y. Zhao, J. Chen, R. Fu, H. Ye, and Z. Wang, “Testing the compiler for a new-born programming language: An industrial case study (experience paper),” in Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2023. New York, NY, USA: Association for Computing Machinery, 2023, p. 551–563. [Online]. Availabl...

  16. [16]

    Finding and understanding bugs in c compilers,

    X. Yang, Y. Chen, E. Eide, and J. Regehr, “Finding and understanding bugs in c compilers,” SIGPLAN Not., vol. 46, no. 6, p. 283–294, jun 2011. [Online]. Available: https://doi.org/10.1145/1993316.1993532

  18. [18]

    Random testing for c and c++ compilers with yarpgen,

    V. Livinskii, D. Babokin, and J. Regehr, “Random testing for c and c++ compilers with yarpgen,” Proc. ACM Program. Lang., vol. 4, no. OOPSLA, nov 2020. [Online]. Available: https://doi.org/10.1145/3428264

  19. [20]

    Compiler validation via equivalence modulo inputs,

    V. Le, M. Afshari, and Z. Su, “Compiler validation via equivalence modulo inputs,” pp. 216–226, 2014. [Online]. Available: https://doi.org/10.1145/2594291.2594334

  20. [21]

    Slemi: Equivalence modulo input (emi) based mutation of cps models for finding compiler bugs in simulink,

    S. A. Chowdhury, S. L. Shrestha, T. T. Johnson, and C. Csallner, “Slemi: Equivalence modulo input (emi) based mutation of cps models for finding compiler bugs in simulink,” in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ser. ICSE ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 335–346. [Online]....

  21. [22]

    Many-core compiler fuzzing,

    C. Lidbury, A. Lascu, N. Chong, and A. F. Donaldson, “Many-core compiler fuzzing,” pp. 65–76, 2015. [Online]. Available: https://doi.org/10.1145/2737924.2737986

  22. [23]

    Coverage-guided tensor compiler fuzzing with joint ir-pass mutation,

    J. Liu, Y. Wei, S. Yang, Y. Deng, and L. Zhang, “Coverage-guided tensor compiler fuzzing with joint ir-pass mutation,” Proc. ACM Program. Lang., vol. 6, no. OOPSLA1, apr 2022. [Online]. Available: https://doi.org/10.1145/3527317

  23. [24]

    Skeletal program enumeration for rigorous compiler testing,

    Q. Zhang, C. Sun, and Z. Su, “Skeletal program enumeration for rigorous compiler testing,” in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2017. New York, NY, USA: Association for Computing Machinery, 2017, p. 347–361. [Online]. Available: https://doi.org/10.1145/3062341.3062379

  24. [25]

    Rust survey 2018 results,

    R. language, “Rust survey 2018 results,” 2018, https://blog.rust-lang.org/2018/11/27/Rust-survey-2018.html

  25. [26]

    A Multimodal Study of Challenges Using Rust,

    M. Coblenz, A. Porter, V. Das, T. Nallagorla, and M. Hicks, “A Multimodal Study of Challenges Using Rust,” 3 2023. [Online]. Available: https://kilthub.cmu.edu/articles/conference_contribution/A_Multimodal_Study_of_Challenges_Using_Rust/22277326

  26. [27]

    rust-code-analysis: A rust library to analyze and extract maintainability information from source codes,

    L. Ardito, L. Barbato, M. Castelluccio, R. Coppola, C. Denizet, S. Ledru, and M. Valsesia, “rust-code-analysis: A rust library to analyze and extract maintainability information from source codes,” SoftwareX, vol. 12, p. 100635, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2352711020303484

  27. [28]

    Deep learning-based software engineering: Progress, challenges, and opportunities,

    X. Chen, X. Hu, Y. Huang, H. Jiang, W. Ji, Y. Jiang, Y. Jiang, B. Liu, H. Liu, X. Li, X. Lian, G. Meng, X. Peng, H. Sun, L. Shi, B. Wang, C. Wang, J. Wang, T. Wang, J. Xuan, X. Xia, Y. Yang, Y. Yang, L. Zhang, Y. Zhou, and L. Zhang, “Deep learning-based software engineering: Progress, challenges, and opportunities,” SCIENCE CHINA Information Sciences...

  28. [29]

    White-box compiler fuzzing empowered by large language models

    C. Yang, Y. Deng, R. Lu, J. Yao, J. Liu, R. Jabbarvand, and L. Zhang, “White-box Compiler Fuzzing Empowered by Large Language Models,” ArXiv preprint, vol. abs/2310.15991, 2023. [Online]. Available: https://arxiv.org/abs/2310.15991

  29. [30]

    Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries,

    Y. Deng, C. S. Xia, C. Yang, S. D. Zhang, S. Yang, and L. Zhang, “Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries,” in Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ser. ICSE ’24. New York, NY, USA: Association for Computing Machinery,

  31. [32]

    Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models,

    Y. Deng, C. S. Xia, H. Peng, C. Yang, and L. Zhang, “Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models,” in Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ser. ISSTA 2023. New York, NY, USA: Association for Computing Machinery, 2023, p. 423–435. [Online]. ...

  32. [33]

    KernelGPT: Enhanced Kernel Fuzzing via Large Language Models,

    C. Yang, Z. Zhao, and L. Zhang, “KernelGPT: Enhanced Kernel Fuzzing via Large Language Models,” ArXiv preprint, vol. abs/2401.00563, 2024. [Online]. Available: https://arxiv.org/abs/2401.00563

  33. [34]

    Fuzz4all: Universal fuzzing with large language models,

    C. S. Xia, M. Paltenghi, J. L. Tian, M. Pradel, and L. Zhang, “Fuzz4all: Universal fuzzing with large language models,” in Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024. ACM, 2024, pp. 126:1–126:13. [Online]. Available: https://doi.org/10.1145/3597503.3639121

  34. [36]

    InCoder: A Generative Model for Code Infilling and Synthesis

    [Online]. Available: https://arxiv.org/abs/2204.05999

  35. [37]

    Enriching compiler testing with real program from bug report,

    H. Zhong, “Enriching compiler testing with real program from bug report,” in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’22. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3551349.3556894

  36. [38]

    Validating smt solvers via skeleton enumeration empowered by historical bug-triggering inputs,

    M. Sun, Y. Yang, M. Wen, Y. Wang, Y. Zhou, and H. Jin, “Validating smt solvers via skeleton enumeration empowered by historical bug-triggering inputs,” in Proceedings of the 45th International Conference on Software Engineering, ser. ICSE ’23. IEEE Press, 2023, p. 69–81. [Online]. Available: https://doi.org/10.1109/ICSE48619.2023.00018

  37. [39]

    Oom-guard: Towards improving the ergonomics of rust oom handling via a reservation-based approach,

    C. Chen, Z. Zhang, H. Tian, S. Yan, and H. Xu, “Oom-guard: Towards improving the ergonomics of rust oom handling via a reservation-based approach,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2023. New York, NY, USA: Association for Computing Machiner...

  38. [40]

    On the dual nature of necessity in use of rust unsafe code,

    Y. Zhang, A. Kundu, G. Portokalidis, and J. Xu, “On the dual nature of necessity in use of rust unsafe code,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2023. New York, NY, USA: Association for Computing Machinery, 2023, p. 2032–2037. [Online]. Avai...

  39. [41]

    Learning and programming challenges of rust: a mixed-methods study,

    S. Zhu, Z. Zhang, B. Qin, A. Xiong, and L. Song, “Learning and programming challenges of rust: a mixed-methods study,” in Proceedings of the 44th International Conference on Software Engineering, ser. ICSE ’22. New York, NY, USA: Association for Computing Machinery, 2022, p. 1269–1281. [Online]. Available: https://doi.org/10.1145/3510003.3510164

  40. [42]

    Fuzzing the rust typechecker using clp,

    K. Dewey, J. Roesch, and B. Hardekopf, “Fuzzing the rust typechecker using clp,” in Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, ser. ASE ’15. IEEE Press, 2015, p. 482–493. [Online]. Available: https://doi.org/10.1109/ASE.2015.65

  41. [43]

    Language models are few-shot learners,

    T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Am...

  42. [44]

    Exploring the limits of transfer learning with a unified text-to-text transformer,

    C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” J. Mach. Learn. Res., vol. 21, pp. 140:1–140:67, 2020. [Online]. Available: https://jmlr.org/papers/v21/20-074.html

  43. [45]

    Palm: Scaling language modeling with pathways,

    A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, N. M. Shazeer, V. Prabhakaran, E. Reif, N. Du, B. C. Hutchinson, R. Pope, J. Bradbury, J. Austin, M. Isard, G. Gur-Ari, P. Yin, T. Duke, A. Levskaya, S. Ghemawat, S...

  44. [46]

    Starcoder: may the source be with you!

    R. Li, L. B. Allal, Y. Zi, N. Muennighoff, D. Kocetkov, C. Mou, M. Marone, C. Akiki, J. Li, J. Chim, Q. Liu, E. Zheltonozhskii, T. Y. Zhuo, T. Wang, O. Dehaene, M. Davaadorj, J. Lamy-Poirier, J. Monteiro, O. Shliazhko, N. Gontier, N. Meade, A. Zebaze, M. Yee, L. K. Umapathi, J. Zhu, B. Lipkin, M. Oblokulov, Z. Wang, R. M. V, J. T. Stillerman, S. S. Pat...

  45. [47]

    Codet5+: Open code large language models for code understanding and generation,

    Y. Wang, H. Le, A. D. Gotmare, N. D. Q. Bui, J. Li, and S. C. H. Hoi, “Codet5+: Open code large language models for code understanding and generation,” in Conference on Empirical Methods in Natural Language Processing, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:258685677

  46. [48]

    Domain adaptive code completion via language models and decoupled domain databases,

    Z. Tang, J. Ge, S. Liu, T. Zhu, T. Xu, L. Huang, and B. Luo, “Domain adaptive code completion via language models and decoupled domain databases,” 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 421–433, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:261030382

  47. [49]

    Deep long-tailed learning: A survey,

    Y. Zhang, B. Kang, B. Hooi, S. Yan, and J. Feng, “Deep long-tailed learning: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 9, p. 10795–10816, sep 2023. [Online]. Available: https://doi.org/10.1109/TPAMI.2023.3268118

  48. [50]

    https://pypi.org/project/BeautifulSoup/

  49. [51]

    Large language models are edge-case fuzzers: Testing deep learning libraries via fuzzgpt

    Y. Deng, C. S. Xia, C. Yang, S. Dylan Zhang, S. Yang, and L. Zhang, “Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT,” arXiv e-prints, p. arXiv:2304.02014, Apr. 2023

  50. [52]

    rustc-testsuite,

    rust lang, “rustc-testsuite,” 2010, https://github.com/rust-lang/rust/tree/master/tests

  51. [53]

    glacier,

    rust lang, “glacier,” 2015, https://github.com/rust-lang/glacier

  52. [54]

    Boosting source code learning with data augmentation: An empirical study,

    Z. Dong, Q. Hu, Y. Guo, Z. Zhang, M. Cordy, M. Papadakis, Y. L. Traon, and J. Zhao, “Boosting source code learning with data augmentation: An empirical study,” CoRR, vol. abs/2303.06808, 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2303.06808

  53. [55]

    Neural machine translation of rare words with subword units,

    R. Sennrich, B. Haddow, and A. Birch, “Neural machine translation of rare words with subword units,” in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), K. Erk and N. A. Smith, Eds. Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 1715–1725. [Online]. Available: https...

  54. [56]

    Language models are unsupervised multitask learners,

    A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” 2019. [Online]. Available: https://api.semanticscholar.org/CorpusID:160025533

  55. [57]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6980

  56. [58]

    Where does the time go? Rust’s problem with slow compiles,

    J. Jackson, “Where does the time go? Rust’s problem with slow compiles,” 2024, https://thenewstack.io/where-does-the-time-go-rusts-problem-with-slow-compiles/

  57. [59]

    Rustlantis,

    A. Wang, “Rustlantis,” 2023, https://github.com/cbeuw/rustlantis

  58. [60]

    nomicon,

    rust lang, “nomicon,” 2017, https://github.com/rust-lang/nomicon

  59. [61]

    rust by example,

    rust lang, “rust by example,” 2014, https://github.com/rust-lang/rust-by-example

  60. [62]

    rust cookbook,

    rust lang, “rust cookbook,” 2017, https://github.com/rust-lang-nursery/rust-cookbook

  61. [63]

    human eval,

    OpenAI, “human eval,” 2021, https://github.com/openai/human-eval

  62. [64]

    Codeshell,

    T. K. C. L. at Peking University, “Codeshell,” 2023, https://github.com/WisdomShell/codeshell

  63. [65]

    GPT-4o System Card

    OpenAI, A. Hurst, A. Lerer, and J. W. Goucher, “GPT-4o System Card,” arXiv e-prints, p. arXiv:2410.21276, Oct. 2024.

  64. [66]

    Detect stack overflow bugs in Rust via improved fuzzing technique,

    Z. Ren and H. Xu, “Detect stack overflow bugs in Rust via improved fuzzing technique,” in The 35th International Conference on Software Engineering and Knowledge Engineering, SEKE 2023, KSIR Virtual Conference Center, USA, July 1-10, 2023, S. Chang, Ed. KSI Research Inc., 2023, pp. 175–180. [Online]. Available: https://doi.org/10.18293/SEKE2023-122

  65. [67]

    Assessing the correctness of JVM implementations,

    A. Calvagna, A. Fornaia, and E. Tramontana, “Assessing the correctness of JVM implementations,” in Proceedings of the 2014 IEEE 23rd International WETICE Conference, ser. WETICE ’14. USA: IEEE Computer Society, 2014, pp. 390–395. [Online]. Available: https://doi.org/10.1109/WETICE.2014.33

  66. [68]

    Automated conformance testing of Java virtual machines,

    A. Calvagna and E. Tramontana, “Automated conformance testing of Java virtual machines,” in Proceedings of the 2013 Seventh International Conference on Complex, Intelligent, and Software Intensive Systems, ser. CISIS ’13. USA: IEEE Computer Society, 2013, pp. 547–552. [Online]. Available: https://doi.org/10.1109/CISIS.2013.99

  67. [69]

    Synthesizing program input grammars,

    O. Bastani, R. Sharma, A. Aiken, and P. Liang, “Synthesizing program input grammars,” in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2017. New York, NY, USA: Association for Computing Machinery, 2017, pp. 95–110. [Online]. Available: https://doi.org/10.1145/3062341.3062349

  68. [70]

    EdSketch: Execution-driven sketching for Java,

    J. Hua and S. Khurshid, “EdSketch: Execution-driven sketching for Java,” in Proceedings of the 24th ACM SIGSOFT International SPIN Symposium on Model Checking of Software, ser. SPIN 2017. New York, NY, USA: Association for Computing Machinery, 2017, pp. 162–171. [Online]. Available: https://doi.org/10.1145/3092282.3092285

  69. [71]

    LLM-based code generation method for Golang compiler testing,

    Q. Gu, “LLM-based code generation method for Golang compiler testing,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2023. New York, NY, USA: Association for Computing Machinery, 2023, pp. 2201–2203. [Online]. Available: https://doi.org/10.1145/3611643.3617850

  70. [72]

    Copiloting the copilots: Fusing large language models with completion engines for automated program repair,

    Y. Wei, C. S. Xia, and L. Zhang, “Copiloting the copilots: Fusing large language models with completion engines for automated program repair,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ser. ESEC/FSE 2023. New York, NY, USA: Association for Computing Machinery, 2...