pith. machine review for the scientific record.

arxiv: 2604.26523 · v1 · submitted 2026-04-29 · 💻 cs.SE

Recognition: unknown

RepoDoc: A Knowledge Graph-Based Framework for Automatic Documentation Generation and Incremental Updates

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 11:29 UTC · model grok-4.3

classification 💻 cs.SE
keywords automatic documentation generation · knowledge graph · repository analysis · incremental updates · large language models · software maintenance · module clustering

The pith

RepoDoc builds a repository knowledge graph to generate structured documentation and target updates only to changed code sections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

RepoDoc constructs a knowledge graph from a codebase to capture entities and their relationships as the basis for all documentation tasks. It clusters code into hierarchical modules and uses agents to query the graph for generating cross-referenced documents that include auto-created diagrams. The same graph supports incremental maintenance by tracing how changes propagate, so only affected documentation gets regenerated. A sympathetic reader would care because this addresses the common problem of documentation falling out of date in large, evolving projects while cutting the time and cost of producing it.
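To make the extraction stage concrete, here is a minimal sketch of what building such a graph can look like for Python code, using the standard `ast` module. The toy source, the single entity kind (functions), and the lone `calls` relation are illustrative assumptions; the paper's RepoKG covers richer entity and relationship types across eight languages.

```python
import ast

# Toy module to index; stands in for a real repository file.
SOURCE = """
def load(path):
    return open(path).read()

def render(path):
    text = load(path)
    return text.upper()
"""

def extract_graph(source):
    """Collect function entities and (caller, callee) 'calls' edges."""
    tree = ast.parse(source)
    entities, calls = [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            entities.append(node.name)
            for inner in ast.walk(node):
                # Only direct-name calls; attribute calls (x.y()) are ignored here.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.append((node.name, inner.func.id))
    return entities, calls

entities, calls = extract_graph(SOURCE)
print(entities)  # ['load', 'render']
print(calls)     # [('load', 'open'), ('render', 'load')]
```

A full pipeline would persist these as typed graph nodes and edges, add classes, imports, and cross-file resolution, and then let clustering and the documentation agents query that store.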

Core claim

RepoDoc extracts code entities and relationships into a repository knowledge graph, clusters related modules hierarchically, and deploys agents to produce modular, cross-referenced documentation with Mermaid diagrams. For updates, a bidirectional semantic impact propagation mechanism identifies every affected documentation fragment so that regeneration stays selective. On 24 repositories spanning eight languages, this yields 32.5 percent higher API coverage, 10.4 percent better completeness, three times faster generation, and 85 percent fewer tokens than prior methods, with similar gains in update speed and recall.

What carries the argument

The repository knowledge graph (RepoKG) that extracts code entities and relationships, then supplies structured queries for module clustering, agent-based generation, and bidirectional change propagation.

If this is right

  • Generated documentation includes explicit cross-references and diagrams derived directly from the graph structure.
  • Only documentation tied to changed code paths needs regeneration, cutting update time by 73 percent and token use by 77 percent.
  • API coverage rises by 32.5 percent and overall completeness by 10.4 percent compared with existing LLM-based tools.
  • The same graph supports documentation tasks across eight programming languages without language-specific rewriting.
  • Update recall improves by 10.2 percent because propagation follows actual semantic dependencies rather than simple file diffs.
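The bidirectional walk behind the last two bullets can be pictured as a breadth-first traversal that follows dependency edges in both directions from a changed entity. The module names, edge list, and purely structural traversal below are hypothetical simplifications; the paper's propagation is semantic, not a flat reachability walk.

```python
from collections import deque

# Hypothetical dependency edges: (dependent, dependency).
EDGES = [
    ("api.handler", "core.parser"),
    ("core.parser", "util.tokens"),
    ("cli.main", "api.handler"),
    ("report.gen", "util.colors"),  # unrelated subgraph
]

def affected(changed, edges):
    """Entities whose documentation a change to `changed` may invalidate."""
    fwd, rev = {}, {}
    for a, b in edges:
        fwd.setdefault(a, set()).add(b)  # dependencies (context cited in docs)
        rev.setdefault(b, set()).add(a)  # dependents (callers to re-document)
    seen, queue = {changed}, deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in fwd.get(node, set()) | rev.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(affected("core.parser", EDGES)))
# ['api.handler', 'cli.main', 'core.parser', 'util.tokens']
```

Because `report.gen` and `util.colors` share no path with the changed node, their documentation is left untouched; that selectivity is the source of the claimed time and token savings.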

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph could serve as a reusable index for other repository tasks such as code search or impact analysis beyond documentation.
  • Projects using continuous integration could trigger targeted doc regeneration on every commit that touches the graph.
  • If extraction misses certain implicit relationships, such as runtime plugin loading, documentation gaps would persist even after updates.
  • Long-term maintenance of the graph itself may become a new cost center once the initial documentation is produced.

Load-bearing premise

The knowledge graph extracted from source code accurately and completely represents all semantically important entities and their dependencies.

What would settle it

Apply a code change that adds or removes a cross-module dependency not captured by the graph extraction step and observe whether the update mechanism still regenerates every affected documentation section.
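Sketched as code, that probe looks like this; the module names and the one-hop update rule are hypothetical stand-ins for the real pipeline, chosen only to exhibit the failure mode being tested:

```python
# Edges the static extractor captured: (dependent, dependency).
graph_edges = {("report", "formatter")}
# A real dependency that only exists at runtime (e.g. plugin loading),
# so it never enters the graph.
runtime_edges = {("report", "plugin_loader")}

def docs_to_update(changed, edges):
    """Docs flagged for regeneration: the changed entity plus its dependents."""
    return {a for a, b in edges if b == changed} | {changed}

# Change a captured dependency: 'report' docs are correctly regenerated.
assert "report" in docs_to_update("formatter", graph_edges)

# Change the invisible dependency: 'report' docs silently go stale.
assert "report" not in docs_to_update("plugin_loader", graph_edges)

# With the full edge set, 'report' would have been flagged.
assert "report" in docs_to_update("plugin_loader", graph_edges | runtime_edges)
```

If the update mechanism behaves like the second case on a real repository, the completeness of the extracted graph, not the propagation logic, is the binding constraint.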

Figures

Figures reproduced from arXiv: 2604.26523 by Dong Xu, Jianfeng Zhong, Mingwei Liu, Xiwen Wang, Zibin Zheng.

Figure 1
Figure 1: Baseline problems. Left: RepoAgent uses physical file structure with template-based documentation; Right: CodeWiki …
Figure 2
Figure 2: RepoDoc architecture. Blue arrows: full generation workflow; Red arrows: incremental update workflow; Black …
Figure 4
Figure 4: Clustering prompt and output example code entities. Concept Entity represents business concepts extracted via LLM analysis, capturing domain knowledge not explicit in code structure. Doc Entity stores generated documentation content in markdown format, enabling selective updates and cross-reference navigation between documents. The seven relationship types include calls (function invocations), implements …
Figure 5
Figure 5: Skill-based agent architecture: Agent Orchestrator. The orchestrator comprises an LLM and a task router. The LLM serves as the brain, responsible for understanding instructions, reasoning, and planning [13, 32]. The task router decomposes high-level plans from the LLM and dispatches tasks to appropriate skills. This design addresses the limitations of traditional single-agent approaches that rely on monol…
Figure 7
Figure 7: TQS box plot comparison. Flask app Module Documentation: 1. Module Overview — Acts as the cognitive entry point for the reader, providing a high-level structural blueprint and defining the module's primary responsibilities. The visual architecture and component summaries help developers quickly mental-map where this specific piece fits within the broader system ecosystem. 2. API Reference 2.1 Class 2.2 Class A…
Figure 8
Figure 8: Qualitative comparison: CodeWiki (left) vs Re…
read the original abstract

Maintaining up-to-date, comprehensive documentation for large codebases is a persistent challenge. Recent progress in automated documentation has moved from template-based rules to large language models (LLMs), yet existing tools still process source code as flat fragments, producing isolated documents that lack semantic structure. This design also leads to excessive token consumption and slow generation, while failing to capture how code changes propagate across dependencies. We propose RepoDoc, a system that uses a repository knowledge graph (RepoKG) as the semantic foundation for the entire documentation lifecycle. Our framework consists of three stages: (1) RepoKG construction, which extracts code entities and their relationships; (2) module clustering, which groups code into functionally cohesive, hierarchical units; and (3) skillful agent-based generation, which queries the graph to create modular, cross-referenced documentation with auto-generated Mermaid diagrams. For incremental maintenance, a semantic impact propagation mechanism navigates the RepoKG bidirectionally to pinpoint all affected parts, allowing selective, targeted regeneration. Evaluated on 24 repositories across 8 programming languages, RepoDoc substantially outperforms state-of-the-art alternatives. It improves API coverage by 32.5% and completeness by 10.4%, while generating documentation 3x faster with 85% fewer tokens. For incremental updates, it cuts update time by 73% and token usage by 77%, and achieves 10.2% higher update recall, more accurately reflecting code changes in the regenerated documentation. The source code and experimental artifacts are available at https://github.com/SYSUSELab/RepoDoc.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces RepoDoc, a framework that constructs a repository knowledge graph (RepoKG) to support the full documentation lifecycle. It proceeds in three stages—RepoKG extraction of entities and relations, hierarchical module clustering, and agent-based generation that produces cross-referenced documents with Mermaid diagrams—plus a bidirectional semantic impact propagation mechanism for incremental updates. On 24 repositories spanning 8 languages the system is reported to improve API coverage by 32.5 % and completeness by 10.4 %, generate documentation 3× faster with 85 % fewer tokens, and for updates reduce time by 73 %, token usage by 77 %, and raise recall by 10.2 %.

Significance. If the empirical results hold under rigorous controls, the work offers a concrete, graph-centric alternative to flat LLM prompting for documentation tasks. The public release of code and artifacts is a clear strength that enables direct inspection of the extraction, clustering, and propagation components.

major comments (3)
  1. [§4] Evaluation: The abstract and results section report aggregate improvements over “state-of-the-art alternatives” but provide no explicit list of the chosen baselines, their versions, or the rationale for their selection. Without this information the 32.5 % coverage and 10.4 % completeness gains cannot be interpreted as a fair comparison.
  2. [§4.2] Incremental-update experiments: The 73 % time and 77 % token reductions are presented without statistical significance tests, confidence intervals, or controls for repository size and language. The 10.2 % recall improvement is therefore difficult to assess for robustness.
  3. [§3.1] RepoKG construction: The claim that the extracted graph “accurately and completely captures all semantically relevant code entities” is central to both generation and propagation claims, yet no quantitative validation (e.g., precision/recall against a manually annotated gold graph) is reported.
minor comments (2)
  1. [§3.3] Figure 3 and the accompanying text use “semantic impact propagation” without a formal definition or pseudocode; a concise algorithm box would improve clarity.
  2. [§3.3] The paper states that documentation is generated “with auto-generated Mermaid diagrams” but does not report any metric for diagram correctness or readability.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight opportunities to improve the clarity and rigor of our evaluation and claims. We address each major comment point by point below.

read point-by-point responses
  1. Referee: [§4] Evaluation: The abstract and results section report aggregate improvements over “state-of-the-art alternatives” but provide no explicit list of the chosen baselines, their versions, or the rationale for their selection. Without this information the 32.5 % coverage and 10.4 % completeness gains cannot be interpreted as a fair comparison.

    Authors: We agree that explicit details on the baselines are necessary for a fair and reproducible comparison. In the revised manuscript we will add a dedicated paragraph in §4 that enumerates the specific state-of-the-art alternatives, their versions, and the selection rationale (prominence in recent LLM-based documentation work and relevance to flat-prompting approaches). A summary table will also be included to facilitate direct assessment of the reported 32.5 % coverage and 10.4 % completeness gains. revision: yes

  2. Referee: [§4.2] Incremental-update experiments: The 73 % time and 77 % token reductions are presented without statistical significance tests, confidence intervals, or controls for repository size and language. The 10.2 % recall improvement is therefore difficult to assess for robustness.

    Authors: We acknowledge the lack of statistical controls in the incremental-update results. We will extend §4.2 to report paired statistical significance tests with p-values, 95 % confidence intervals for all key metrics, and additional subgroup analyses stratified by repository size and language. These additions will strengthen the evidence for the 73 % time reduction, 77 % token savings, and 10.2 % recall improvement. revision: yes

  3. Referee: [§3.1] RepoKG construction: The claim that the extracted graph “accurately and completely captures all semantically relevant code entities” is central to both generation and propagation claims, yet no quantitative validation (e.g., precision/recall against a manually annotated gold graph) is reported.

    Authors: We recognize that a direct precision/recall evaluation against a manually annotated gold graph would be desirable. However, constructing such gold standards for 24 repositories spanning eight languages and thousands of entities is not practically feasible. We will revise §3.1 to moderate the claim, explicitly stating that graph quality is supported indirectly by the consistent downstream gains in documentation coverage, completeness, and update recall. This indirect validation is standard when exhaustive component-level annotation is intractable. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external evaluation

full rationale

The paper proposes a three-stage framework (RepoKG construction, module clustering, agent-based generation) plus an incremental propagation mechanism, then reports concrete performance deltas from running the system on 24 repositories across 8 languages. No equations, fitted parameters, or first-principles derivations are presented whose outputs are shown to be equivalent to their inputs by construction. The central claims are statistical comparisons against external baselines; the released code and artifacts make the evaluation independently inspectable. No self-citation is invoked as a load-bearing uniqueness theorem or ansatz. This is the normal, non-circular case for an applied systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The framework rests on domain assumptions about accurate entity extraction and clustering quality rather than new physical entities or fitted constants; no free parameters are explicitly introduced in the abstract.

axioms (2)
  • domain assumption Code entities and their relationships can be reliably extracted from source code to form a complete repository knowledge graph.
    Invoked in the first stage of RepoKG construction.
  • domain assumption Module clustering produces functionally cohesive hierarchical units suitable for modular documentation.
    Required for the second stage to enable cross-referenced output.
invented entities (2)
  • RepoKG no independent evidence
    purpose: Semantic foundation for the entire documentation lifecycle
    Newly proposed structure that stores code entities and relationships.
  • semantic impact propagation mechanism no independent evidence
    purpose: Bidirectional navigation to identify all documentation affected by a code change
    Introduced to enable selective incremental regeneration.

pith-pipeline@v0.9.0 · 5600 in / 1538 out tokens · 48201 ms · 2026-05-07T11:29:17.332573+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

42 extracted references · 31 canonical work pages · 11 internal anchors

  1. [1]

    Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang

  2. [2]

    A Transformer-based Approach for Source Code Summarization. arXiv:2005.00653 [cs.SE] https://arxiv.org/abs/2005.00653

  3. [3]

    Miltiadis Allamanis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. A Survey of Machine Learning for Big Code and Naturalness. arXiv:1709.06182 [cs.SE] https://arxiv.org/abs/1709.06182

  4. [4]

    Miltiadis Allamanis, Hao Peng, and Charles Sutton. 2016. A Convolutional Attention Network for Extreme Summarization of Source Code. arXiv:1602.03001 [cs.LG] https://arxiv.org/abs/1602.03001

  5. [5]

    Egor Bogomolov, Aleksandra Eliseeva, Timur Galimzyanov, Evgeniy Glukhov, Anton Shapkin, Maria Tigina, Yaroslav Golubev, Alexander Kovrigin, Arie van Deursen, Maliheh Izadi, and Timofey Bryksin. 2024. Long Code Arena: a Set of Benchmarks for Long-Context Code Models. arXiv:2406.11612 [cs.LG] https://arxiv.org/abs/2406.11612

  6. [6]

    Yujia Chen, Xiaoxue Ren, Cuiyun Gao, Yun Peng, Xin Xia, and Michael R. Lyu

  7. [7]

    API Usage Recommendation via Multi-View Heterogeneous Graph Representation Learning. arXiv:2208.01971 [cs.SE] https://arxiv.org/abs/2208.01971

  8. [8]

    Giuseppe Crupi, Rosalia Tufano, Alejandro Velasco, Antonio Mastropaolo, Denys Poshyvanyk, and Gabriele Bavota. 2024. On the Effectiveness of LLM-as-a-Judge for Code Generation and Summarization. IEEE Transactions on Software Engineering (2024)

  9. [9]

    J. R. Falleri, F. Morandat, X. Blanc, M. Martinez, and M. Monperrus. 2014. Fine-grained and accurate source code differencing. (2014), 313–324

  10. [10]

    Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, and M. Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv:2002.08155

  11. [11]

    L. Gao et al. 2023. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997

  12. [12]

    Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. 2018. Deep code search. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden) (ICSE ’18). Association for Computing Machinery, New York, NY, USA, 933–944. doi:10.1145/3180155.3180167

  13. [13]

    Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, and Ming Zhou. 2021. GraphCodeBERT: Pre-training Code Representations with Data Flow. arXiv:2009.08366 [cs.SE] https://arxiv.org/abs/2009.08366

  14. [14]

    Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, and Wenfeng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence. arXiv:2401.14196 [cs.SE] https://arxiv.org/abs/2401.14196

  15. [15]

    Sirui Hong, Mingchen Zhuge, Jiaqi Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. 2024. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. arXiv:2308.00352 [cs.AI] https://arxiv.org/abs/2308.00352

  16. [16]

    Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large Language Models for Software Engineering: A Systematic Literature Review. arXiv:2308.10620 [cs.SE] https://arxiv.org/abs/2308.10620

  17. [17]

    Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2020. CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv:1909.09436 [cs.LG] https://arxiv.org/abs/1909.09436

  18. [18]

    Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing Source Code using a Neural Attention Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Katrin Erk and Noah A. Smith (Eds.). Association for Computational Linguistics, Berlin, Germany, 2073–2083. do...

  19. [19]

    Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. 2026. A Survey on Large Language Models for Code Generation. 35, 2, Article 58 (Jan. 2026), 72 pages. doi:10.1145/3747588

  20. [20]

    Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Loge...

  21. [21]

    Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understan...

  22. [22]

    Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, Xiaoyin Che, Zhiyuan Liu, and Maosong Sun

  23. [23]

    RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation. arXiv:2402.16667 [cs.CL] https://arxiv.org/abs/2402.16667

  24. [24]

    V. Makharev and V. Ivanov. 2025. CodeWiki: Evaluating AI’s Ability to Generate Holistic Documentation for Large-Scale Codebases. arXiv:2510.24428

  25. [25]

    Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen. 2009. Graph-based mining of multiple object usage patterns. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (Amsterdam, The Netherlands) (ESEC/FSE ’09). Assoc...

  26. [26]

    Xin Peng, Yifan Zhao, Mingwei Liu, Fengyi Zhang, Yang Liu, Xin Wang, and Zhenchang Xing. 2018. Automatic Generation of API Documentations for Open-Source Projects. 7–8. doi:10.1109/DySDoc3.2018.00010

  27. [27]

    Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv:1908.10084 [cs.CL] https://arxiv.org/abs/1908.10084

  28. [28]

    B. G. Ryder and F. Tip. 2001. Change impact analysis for object-oriented programs. (2001), 46–53

  29. [29]

    Giriprasad Sridhara, Emily Hill, Divya Muppaneni, Lori Pollock, and K. Vijay-Shanker. 2010. Towards automatically generating summary comments for Java methods. In Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering (Antwerp, Belgium) (ASE ’10). Association for Computing Machinery, New York, NY, USA, 43–52. doi:10.11...

  30. [30]

    Nikolaos Tsantalis, Matin Mansouri, Laleh M. Eshkevari, Davood Mazinanian, and Danny Dig. 2018. Accurate and efficient refactoring detection in commit history. In Proceedings of the 40th International Conference on Software Engineering (Gothenburg, Sweden) (ICSE ’18). Association for Computing Machinery, New York, NY, USA, 483–494. doi:10.1145/3180155.3180206

  31. [31]

    F. Wang, J. Liu, B. Liu, T. Qian, Y. Xiao, and Z. Peng. 2020. Survey on construction of code knowledge graph and intelligent software development. Journal of Software 31, 1 (2020), 47–66

  32. [32]

    Yue Wang, Weishi Wang, Shafiq Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. arXiv:2109.00859 [cs.CL] https://arxiv.org/abs/2109.00859

  33. [33]

    X. Xia, L. Bao, D. Lo, Z. Xing, A. E. Hassan, and S. Li. 2017. Measuring program comprehension: A large-scale field study with professionals. IEEE Transactions on Software Engineering 44, 10 (2017), 951–976

  34. [34]

    John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. 2024. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering. arXiv:2405.15793 [cs.SE] https://arxiv.org/abs/2405.15793

  35. [35]

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629 [cs.CL] https://arxiv.org/abs/2210.03629

  36. [36]

    Chunyan Zhang, Junchao Wang, Qinglei Zhou, Ting Xu, Ke Tang, Hairen Gui, and Fudong Liu. 2022. A Survey of Automatic Source Code Summarization. Symmetry 14, 3 (2022). doi:10.3390/sym14030471

  37. [37]

    Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. 2023. RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation. arXiv:2303.12570 [cs.CL] https://arxiv.org/abs/2303.12570

  38. [38]

    Ziyin Zhang, Chaoyu Chen, Bingchang Liu, Cong Liao, Zi Gong, Hang Yu, Jianguo Li, and Rui Wang. 2024. Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code. arXiv:2311.07989 [cs.CL] https://arxiv.org/abs/2311.07989

  39. [39]

    Yuwei Zhao, Ziyang Luo, Yuchen Tian, Weixiang Yan, Annan Li, and Jing Ma

  40. [40]

    CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?. In Findings of the Association for Computational Linguistics: ACL 2024

  41. [41]

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, et al. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Advances in Neural Information Processing Systems (NeurIPS)

  42. [42]

    Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. arXiv:1909.03496 [cs.SE] https://arxiv.org/abs/1909.03496