pith. the verified trust layer for science. sign in

arxiv: 2601.00376 · v3 · submitted 2026-01-01 · 💻 cs.SE · cs.AI

In Line with Context: Repository-Level Code Generation via Context Inlining

Pith reviewed 2026-05-16 18:01 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords repository-level code generationcontext inliningcall graphLLM code completionanchor generationbidirectional inlining
0
0 comments X p. Extension

The pith

InlineCoder reframes repository-level code generation as a function-level task by inlining the unfinished function into its call graph.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces InlineCoder for repository-level code generation, where models must reason over complex dependencies across an entire codebase. It generates a draft completion of the target function, called an anchor, to approximate downstream dependencies and enable confidence estimation via perplexity. The anchor then supports bidirectional inlining: embedding into callers to capture usage scenarios and retrieving callees to supply precise dependency context. This enriched prompt allows the language model to treat the repository task as a simpler function-level coding problem.

Core claim

Given a function signature, InlineCoder first generates a draft completion termed an anchor which approximates downstream dependencies and enables perplexity-based confidence estimation. This anchor drives a bidirectional inlining process: upstream inlining embeds the anchor into its callers to capture diverse usage scenarios, and downstream retrieval integrates the anchor's callees into the prompt to provide precise dependency context. The resulting combination equips the LLM with a comprehensive repository view.

What carries the argument

The anchor, a draft completion of the target function, which approximates dependencies to drive bidirectional inlining across the call graph and reframe repository understanding as function-level coding.

If this is right

  • Repository-level completions become more accurate by incorporating usage scenarios from callers and exact dependencies from callees.
  • The method reduces reliance on surface-level similarity retrieval such as RAG by using call-graph structure instead.
  • LLMs can solve repository tasks without needing to process the entire codebase at once.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The inlining idea could extend to other structured generation settings where partial outputs help gather surrounding context.
  • Hybrid systems combining this anchor-driven inlining with existing retrieval techniques might handle even larger repositories.
  • Call-graph analysis efficiency will determine how well the method scales to very large codebases.

Load-bearing premise

Generating a draft completion of the function sufficiently approximates downstream dependencies to support effective perplexity estimation and bidirectional inlining.

What would settle it

An experiment replacing the generated anchor with a random or empty body and observing no drop in completion quality would show the anchor approximation is not load-bearing.

Figures

Figures reproduced from arXiv: 2601.00376 by Beijun Shen, Chao Hu, Wenhao Zeng, Xiaodong Gu, Yuling Shi.

Figure 1
Figure 1. Figure 1: A Motivating Example. Inlining The target function into its call chain creates a more context-aware [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Framework of InlineCoder. model understand the information in context C, which entails reasoning over entire repositories, understanding complex dependencies across functions, classes, and modules. To tackle this challenge, we propose InlineCoder, a novel framework for repository-level code generation. Unlike previous techniques that retrieve similar snippets in the context, the core idea of InlineCoder is… view at source ↗
Figure 3
Figure 3. Figure 3: Function Inlining. (2) Return Normalization: We define a transformation function 𝜏 : S → S operating on statements: 𝜏 (𝑠) =    result = exp if 𝑠 ≡ return 𝑒𝑥𝑝, result = None if 𝑠 ≡ return, 𝑠 otherwise. (3) Lifting 𝜏 to statement sets, we obtain the normalized body: 𝜏 (Body) = { 𝜏 (𝑠) | 𝑠 ∈ Body }. (4) (3) Assignment Redirection: Suppose the original call site has the form x = f(𝑎1, . . . , 𝑎𝑚). A… view at source ↗
Figure 4
Figure 4. Figure 4: Downstream Retrieval. view of the target function’s role, which facilitates the inference of input/output expectations, preserves the logical flow of computations, and highlights the relationships between variables and control structures. This transformation also reduces ambiguity and eliminates the distractions inherent in separately retrieved snippets, enabling the model to focus on the essential behavio… view at source ↗
Figure 5
Figure 5. Figure 5: Prompt Template for The Final Context-Enhanced Code Generation. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of Effectiveness across Various Domains. [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: A case about the effectiveness of the bidirectional call inlining. [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: A case about the impact of confidence guidance on mitigating self-repetition bias. [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Effectiveness Comparison in Different Context Environments. [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
read the original abstract

Repository-level code generation has attracted growing attention in recent years. Unlike function-level code generation, it requires the model to understand the entire repository, reasoning over complex dependencies across functions, classes, and modules. However, existing approaches such as retrieval-augmented generation (RAG) or context-based function selection often fall short: they primarily rely on surface-level similarity and struggle to capture the rich dependencies that govern repository-level semantics. In this paper, we introduce InlineCoder, a novel framework for repository-level code generation. InlineCoder enhances the understanding of repository context by inlining the unfinished function into its call graph, thereby reframing the challenging repository understanding as an easier function-level coding task. Given a function signature, InlineCoder first generates a draft completion, termed an anchor, which approximates downstream dependencies and enables perplexity-based confidence estimation. This anchor drives a bidirectional inlining process: (i) Upstream Inlining, which embeds the anchor into its callers to capture diverse usage scenarios; and (ii) Downstream Retrieval, which integrates the anchor's callees into the prompt to provide precise dependency context. The enriched context, combining draft completion with upstream and downstream perspectives, equips the LLM with a comprehensive repository view.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces InlineCoder, a framework for repository-level code generation. Given a function signature, it first generates a draft completion called an anchor to approximate downstream dependencies and support perplexity-based confidence estimation. The anchor then drives bidirectional inlining: upstream inlining embeds the anchor into callers to capture usage scenarios, while downstream retrieval integrates the anchor's callees to provide dependency context. This reframes repository-level understanding as an easier function-level coding task by enriching the prompt with draft completion plus upstream and downstream perspectives.

Significance. If the central mechanism holds, InlineCoder could meaningfully advance repository-level code generation by moving beyond surface-level RAG to structured call-graph inlining, potentially improving LLM handling of interprocedural dependencies in large codebases.

major comments (2)
  1. [Abstract] Abstract: the central claim that the anchor 'approximates downstream dependencies' and enables effective bidirectional inlining is load-bearing, yet the manuscript supplies no experimental results, ablation studies, or quantitative evidence (e.g., no tables reporting pass rates, perplexity correlations, or comparisons to RAG baselines) to demonstrate that low-perplexity anchors correlate with semantic fidelity rather than superficial fluency.
  2. [Abstract] Abstract: the assumption that generating the anchor from the signature alone suffices to capture non-local state, side effects, or interprocedural invariants is unsupported; without a concrete test or failure-case analysis, the reframing from repository-level to function-level reasoning risks encoding incorrect usage patterns when the anchor deviates from the true body.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report on our manuscript. The two major comments both focus on the abstract's presentation of the anchor mechanism and its supporting claims. We address each point directly below, acknowledge where the current text is insufficient, and indicate the specific revisions we will make.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the anchor 'approximates downstream dependencies' and enables effective bidirectional inlining is load-bearing, yet the manuscript supplies no experimental results, ablation studies, or quantitative evidence (e.g., no tables reporting pass rates, perplexity correlations, or comparisons to RAG baselines) to demonstrate that low-perplexity anchors correlate with semantic fidelity rather than superficial fluency.

    Authors: We agree that the abstract, as written, presents the central claims about the anchor without accompanying quantitative evidence. The manuscript body describes the framework and its motivation but does not yet include the requested tables or ablations in the abstract itself. To address this, we will revise the abstract to incorporate concise references to the key experimental outcomes (pass@1 improvements over RAG baselines and the observed perplexity-semantic fidelity correlation) that appear in the evaluation section. This change will make the load-bearing claims directly supported at the abstract level. revision: yes

  2. Referee: [Abstract] Abstract: the assumption that generating the anchor from the signature alone suffices to capture non-local state, side effects, or interprocedural invariants is unsupported; without a concrete test or failure-case analysis, the reframing from repository-level to function-level reasoning risks encoding incorrect usage patterns when the anchor deviates from the true body.

    Authors: The referee is correct that the abstract provides no concrete test or failure-case analysis for the assumption that a signature-only anchor can capture non-local state and invariants. The manuscript discusses the approximate nature of the anchor in the method section and notes that bidirectional inlining is intended to mitigate deviations, but this is not demonstrated with explicit failure cases. We will add a short paragraph to the abstract (and expand the limitations discussion) that summarizes representative failure modes and how the upstream/downstream inlining steps reduce the impact of anchor errors. This revision will be included in the next version. revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation chain

full rationale

The paper describes InlineCoder as a procedural framework: generate an anchor draft from the function signature, estimate confidence via perplexity, then perform upstream inlining of the anchor into callers and downstream retrieval of callees. This reframing of repository-level generation as function-level coding is presented as a design choice supported by the LLM's external generation and retrieval capabilities. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the provided derivation. The central claim rests on the empirical effectiveness of the anchor as a proxy rather than reducing by construction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on the assumption that an LLM-generated anchor can serve as a reliable proxy for downstream dependencies; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption An LLM can generate a draft completion (anchor) that approximates downstream dependencies well enough for inlining decisions.
    The method explicitly relies on the anchor for both upstream embedding and downstream retrieval.

pith-pipeline@v0.9.0 · 5521 in / 1125 out tokens · 22752 ms · 2026-05-16T18:01:43.955158+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
    ?
    unclear

    Relation between the paper passage and the cited Recognition theorem.

    Given a function signature, InlineCoder first generates a draft completion, termed an anchor, which approximates downstream dependencies and enables perplexity-based confidence estimation. This anchor drives a bidirectional inlining process

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

    cs.SE 2026-04 unverdicted novelty 7.0

    ClassEval-Pro benchmark shows frontier LLMs achieve at most 45.6% Pass@1 on class-level code tasks, with logic errors (56%) and dependency errors (38%) as dominant failure modes.

  2. ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction

    cs.CV 2026-04 unverdicted novelty 7.0

    ShredBench shows state-of-the-art MLLMs perform well on intact documents but suffer sharp drops in restoration accuracy as fragmentation increases to 8-16 pieces, indicating insufficient cross-modal semantic reasoning...

  3. CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

    cs.CL 2026-02 unverdicted novelty 7.0

    Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · cited by 3 Pith papers · 9 internal anchors

  1. [1]

    Ramakrishna Bairi, Atharv Sonwane, Aditya Kanade, Vageesh D C, Arun Iyer, Suresh Parthasarathy, Sriram Rajamani, Balasubramanyan Ashok, and Shashank Shet. 2024. Codeplan: Repository-level coding using llms and planning. Proceedings of the ACM on Software Engineering1, FSE (2024), 675–698

  2. [2]

    Antonio Valerio Miceli Barone and Rico Sennrich. 2017. A parallel corpus of python functions and documentation strings for automated code documentation and code generation.arXiv preprint arXiv:1707.02275(2017)

  3. [3]

    Brett A Becker, Paul Denny, James Finnie-Ansley, Andrew Luxton-Reilly, James Prather, and Eddie Antonio Santos

  4. [4]

    In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V

    Programming is hard-or at least it used to be: Educational opportunities and challenges of ai code generation. In Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1. 500–506

  5. [5]

    Zhangqian Bi, Yao Wan, Zheng Wang, Hongyu Zhang, Batu Guan, Fangxin Lu, Zili Zhang, Yulei Sui, Hai Jin, and Xuanhua Shi. 2024. Iterative refinement of project-level code context for precise code generation with compiler feedback.arXiv preprint arXiv:2403.16792(2024)

  6. [6]

    Silin Chen, Shaoxin Lin, Xiaodong Gu, Yuling Shi, Heng Lian, Longfei Yun, Dong Chen, Weiguo Sun, Lin Cao, and Qianxiang Wang. 2025. Swe-exp: Experience-driven software issue resolution.arXiv preprint arXiv:2507.23361(2025)

  7. [7]

    Wei Cheng, Yuhan Wu, and Wei Hu. 2024. Dataflow-guided retrieval augmentation for repository-level code completion. arXiv preprint arXiv:2405.19782(2024)

  8. [8]

    Ajinkya Deshpande, Anmol Agarwal, Shashank Shet, Arun Iyer, Aditya Kanade, Ramakrishna Bairi, and Suresh Parthasarathy. 2024. Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository.arXiv preprint arXiv:2405.01573(2024). , Vol. 1, No. 1, Article . Publication date: January 2026. 20 Hu et al

  9. [9]

    Yangruibo Ding, Zijian Wang, Wasi Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, et al. 2023. Crosscodeeval: A diverse and multilingual benchmark for cross-file code completion.Advances in Neural Information Processing Systems36 (2023), 46701–46723

  10. [10]

    Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, and Yiling Lou. 2024. Evaluating large language models in class-level code generation. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

  11. [11]

    Xinyu Gao, Yun Xiong, Deze Wang, Zhenhan Guan, Zejian Shi, Haofen Wang, and Shanshan Li. 2024. Preference- guided refactored tuning for retrieval augmented code generation. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering. 65–77

  12. [12]

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.109972, 1 (2023)

  13. [13]

    Leonidas Gee, Milan Gritta, Gerasimos Lampouras, and Ignacio Iacobacci. 2024. Code-optimise: Self-generated preference data for correctness and efficiency.arXiv preprint arXiv:2406.12502(2024)

  14. [14]

    Xiaodong Gu, Meng Chen, Yalan Lin, Yuhan Hu, Hongyu Zhang, Chengcheng Wan, Zhao Wei, Yong Xu, and Juhong Wang. 2025. On the effectiveness of large language models in domain-specific code generation.ACM Transactions on Software Engineering and Methodology34, 3 (2025), 1–22

  15. [15]

    Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. UniXcoder: Unified Cross-Modal Pre- training for Code Representation. InFindings of the Association for Computational Linguistics: ACL 2022. 2563–2575

  16. [16]

    Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Yu Wu, YK Li, et al. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming–The Rise of Code Intelligence. arXiv preprint arXiv:2401.14196(2024)

  17. [17]

    Mehadi Hassen and Philip K Chan. 2017. Scalable function call graph-based malware classification. InProceedings of the Seventh ACM on Conference on Data and Application Security and Privacy. 239–248

  18. [18]

    Baizhou Huang, Shuai Lu, Weizhu Chen, Xiaojun Wan, and Nan Duan. 2023. Enhancing large language models in coding through multi-perspective self-consistency.arXiv preprint arXiv:2309.17272(2023)

  19. [19]

    Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. 2024. Qwen2. 5-coder technical report.arXiv preprint arXiv:2409.12186(2024)

  20. [20]

    Fred Jelinek, Robert L Mercer, Lalit R Bahl, and James K Baker. 1977. Perplexity—a measure of the difficulty of speech recognition tasks.The Journal of the Acoustical Society of America62, S1 (1977), S63–S63

  21. [21]

    Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. Swe-bench: Can language models resolve real-world github issues?, 2024.URL https://arxiv. org/abs/2310.067707 (2023)

  22. [22]

    Majeed Kazemitabaar, Justin Chow, Carl Ka To Ma, Barbara J Ericson, David Weintrop, and Tovi Grossman. 2023. Studying the effect of AI code generators on supporting novice learners in introductory programming. InProceedings of the 2023 CHI conference on human factors in computing systems. 1–23

  23. [23]

    VI Lcvenshtcin. 1966. Binary coors capable or ‘correcting deletions, insertions, and reversals. InSoviet physics-doklady, Vol. 10

  24. [24]

    Nam Le Hai, Dung Manh Nguyen, and Nghi DQ Bui. 2024. Repoexec: Evaluate code generation with a repository-level executable benchmark.arXiv e-prints(2024), arXiv–2406

  25. [25]

    Daniel Le Métayer and David Schmidt. 1996. Structural operational semantics as a basis for static program analysis. ACM Computing Surveys (CSUR)28, 2 (1996), 340–343

  26. [26]

    Han Li, Yuling Shi, Shaoxin Lin, Xiaodong Gu, Heng Lian, Xin Wang, Yantao Jia, Tao Huang, and Qianxiang Wang

  27. [27]

    Swe-debate: Competitive multi-agent debate for software issue resolution.arXiv preprint arXiv:2507.23348(2025)

  28. [28]

    Jia Li, Ge Li, Xuanming Zhang, Yihong Dong, and Zhi Jin. 2024. Evocodebench: An evolving code generation benchmark aligned with real-world code repositories.arXiv preprint arXiv:2404.00599(2024)

  29. [29]

    Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Huanyu Liu, Hao Zhu, Lecheng Wang, Kaibo Liu, Zheng Fang, Lanshen Wang, et al. 2024. DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories. InFindings of the Association for Computational Linguistics ACL 2024. 3603–3614

  30. [30]

    Jia Li, Xianjie Shi, Kechi Zhang, Lei Li, Ge Li, Zhengwei Tao, Jia Li, Fang Liu, Chongyang Tao, and Zhi Jin. 2025. CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation. arXiv:2504.10046 [cs.SE] https: //arxiv.org/abs/2504.10046

  31. [31]

    Ming Liang, Xiaoheng Xie, Gehao Zhang, Xunjin Zheng, Peng Di, Hongwei Chen, Chengpeng Wang, Gang Fan, et al

  32. [32]

    Repofuse: Repository-level code completion with fused dual context.arXiv preprint arXiv:2402.14323(2024)

  33. [33]

    Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. 2024. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437(2024)

  34. [34]

    Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, and Yiling Lou. 2024. Stall+: Boosting llm-based repository-level code completion with static analysis.arXiv preprint arXiv:2406.10018(2024). , Vol. 1, No. 1, Article . Publication date: January 2026. In Line with Context: Repository-Level Code Generation via Context Inlining 21

  35. [35]

    Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, and Qianxiang Wang. 2024. Graphcoder: Enhancing repository-level code completion via code context graph-based retrieval and language model.arXiv preprint arXiv:2406.07003(2024)

  36. [36]

    Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Qizhe Shieh, and Wenmeng Zhou. 2025. CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technolog...

  37. [37]

    Vadim Liventsev, Anastasiia Grishina, Aki Härmä, and Leon Moonen. 2023. Fully autonomous programming with large language models. InProceedings of the Genetic and Evolutionary Computation Conference. 1146–1155

  38. [38]

    Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, and Daxin Jiang. 2023. Wizardcoder: Empowering code large language models with evol-instruct.arXiv preprint arXiv:2306.08568(2023)

  39. [39]

    Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, and Yongbin Li. 2025. Alibaba lingmaagent: Improving automated issue resolution via comprehensive repository exploration. InProceedings of the 33rd ACM International Conference on the Foundations of Software Engineering. 238–249

  40. [40]

    Jonathan I Maletic and Andrian Marcus. 2001. Supporting program comprehension using semantic and structural information. InProceedings of the 23rd International Conference on Software Engineering. ICSE 2001. IEEE, 103–112

  41. [41]

    Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2022. Codegen: An open large language model for code with multi-turn program synthesis.arXiv preprint arXiv:2203.13474 (2022)

  42. [42]

    Kristian B Ølgaard, Anders Logg, and Garth N Wells. 2009. Automated code generation for discontinuous Galerkin methods.SIAM Journal on Scientific Computing31, 2 (2009), 849–864

  43. [43]

    Kristian B Ølgaard and Garth N Wells. 2010. Optimizations for quadrature representations of finite element tensors through automated code generation.ACM Transactions on Mathematical Software (TOMS)37, 1 (2010), 1–23

  44. [44]

    OpenAI. 2025. Introducing GPT-5. https://openai.com/index/introducing-gpt-5/

  45. [45]

    Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, and Dong Yu. 2024. Repograph: Enhancing ai software engineering with repository-level code graph.arXiv preprint arXiv:2410.14684(2024)

  46. [46]

    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. InProceedings of the 40th annual meeting of the Association for Computational Linguistics. 311–318

  47. [47]

    Huy N Phan, Hoang N Phan, Tien N Nguyen, and Nghi DQ Bui. 2025. Repohyper: Search-expand-refine on semantic graphs for repository-level code completion. In2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge). IEEE, 14–25

  48. [48]

    Gordon D Plotkin. 2004. The origins of structural operational semantics.The Journal of Logic and Algebraic Programming 60 (2004), 3–15

  49. [49]

    Saurabh Pujar, Luca Buratti, Xiaojie Guo, Nicolas Dupuis, Burn Lewis, Sahil Suneja, Atin Sood, Ganesh Nalawade, Matt Jones, Alessandro Morari, et al. 2023. Automated code generation for information technology tasks in yaml through large language models. In2023 60th ACM/IEEE Design Automation Conference (DAC). IEEE, 1–4

  50. [50]

    Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, et al. 2023. Code llama: Open foundation models for code.arXiv preprint arXiv:2308.12950 (2023)

  51. [51]

    Hitesh Sagtani, Rishabh Mehrotra, and Beyang Liu. 2024. Improving FIM Code Completions via Context and Curriculum Based Learning. arXiv:2412.16589 [cs.IR] https://arxiv.org/abs/2412.16589

  52. [52]

    Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H Chi, Nathanael Schärli, and Denny Zhou

  53. [53]

    InInternational Conference on Machine Learning

    Large language models can be easily distracted by irrelevant context. InInternational Conference on Machine Learning. PMLR, 31210–31227

  54. [54]

    Yuling Shi, Yichun Qian, Hongyu Zhang, Beijun Shen, and Xiaodong Gu. 2025. LongCodeZip: Compress Long Context for Code Language Models.arXiv preprint arXiv:2510.00446(2025)

  55. [55]

    Yuling Shi, Songsong Wang, Chengcheng Wan, and Xiaodong Gu. 2024. From code to correctness: Closing the last mile of code generation with hierarchical debugging.arXiv preprint arXiv:2410.01215(2024)

  56. [56]

    Yuling Shi, Hongyu Zhang, Chengcheng Wan, and Xiaodong Gu. 2024. Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 51–62

  57. [57]

    Disha Shrivastava, Denis Kocetkov, Harm De Vries, Dzmitry Bahdanau, and Torsten Scholak. 2023. Repofusion: Training code models to understand your repository.arXiv preprint arXiv:2306.10998(2023). , Vol. 1, No. 1, Article . Publication date: January 2026. 22 Hu et al

  58. [58]

    Disha Shrivastava, Hugo Larochelle, and Daniel Tarlow. 2023. Repository-level prompt generation for large language models of code. InInternational Conference on Machine Learning. PMLR, 31693–31715

  59. [59]

    Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, and Yiqun Liu. 2024. DRAGIN: dynamic retrieval augmented generation based on the information needs of large language models.arXiv preprint arXiv:2403.10081(2024)

  60. [60]

    Rahul Vadisetty, Anand Polamarasetti, Sameerkumar Prajapati, Jinal Bhanubhai Butani, et al . 2023. Leveraging Generative AI for Automated Code Generation and Security Compliance in Cloud-Based DevOps Pipelines: A Review. A vailable at SSRN 5218298(2023)

  61. [61]

    Jicheng Wang, Yifeng He, and Hao Chen. 2024. RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation. arXiv:2409.13122 [cs.SE] https://arxiv.org/abs/2409.13122

  62. [62]

    Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, and Zibin Zheng. 2024. Rlcoder: Reinforce- ment learning for repository-level code completion.arXiv preprint arXiv:2407.19487(2024)

  63. [63]

    Peiyang Wu, Nan Guo, Junliang Lv, Xiao Xiao, and Xiaochun Ye. 2025. RTLRepoCoder: Repository-Level RTL Code Completion through the Combination of Fine-Tuning and Retrieval Augmentation. arXiv:2504.08862 [cs.SE] https://arxiv.org/abs/2504.08862

  64. [64]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report.arXiv preprint arXiv:2505.09388(2025)

  65. [65]

    Hao Yu, Bo Shen, Dezhi Ran, Jiaxin Zhang, Qi Zhang, Yuchi Ma, Guangtai Liang, Ying Li, Qianxiang Wang, and Tao Xie. 2024. Codereval: A benchmark of pragmatic code generation with generative pre-trained models. InProceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–12

  66. [66]

    Wenhao Zeng, Yaoning Wang, Chao Hu, Yuling Shi, Chengcheng Wan, Hongyu Zhang, and Xiaodong Gu. 2025. Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal.arXiv preprint arXiv:2508.05988(2025)

  67. [67]

    Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen

  68. [68]

    InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

    RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2471–2484

  69. [69]

    Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, and Zhi Jin. 2024. Codeagent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges.arXiv preprint arXiv:2401.07339(2024)

  70. [70]

    Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Qiang Qu, and Min Yang. 2025. Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs.Proceedings of the AAAI Conference on Artificial Intelligence39, 24 (Apr. 2025), 25886–25894. https://doi.or...

  71. [71]

    Dan Zhao, Li Miao, Dafang Zhang, et al. 2015. Reusable function discovery by call-graph analysis.Journal of Software Engineering and Applications8, 04 (2015), 184

  72. [72]

    Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, and Xiang Yue. 2024. Opencodeinterpreter: Integrating code generation with execution and refinement.arXiv preprint arXiv:2402.14658 (2024)

  73. [73]

    Li Zhong, Zilong Wang, and Jingbo Shang. 2024. Debug like a human: A large language model debugger via verifying runtime execution step-by-step.arXiv preprint arXiv:2402.16906(2024). , Vol. 1, No. 1, Article . Publication date: January 2026