pith. sign in

arxiv: 2605.16046 · v2 · pith:SIM5TJCMnew · submitted 2026-05-15 · 💻 cs.SE · cs.AI

XSearch: Explainable Code Search via Concept-to-Code Alignment

Pith reviewed 2026-07-04 00:53 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords code searchexplainable retrievalconcept alignmentout-of-distribution generalizationsemantic code searchGraphCodeBERT
0
0 comments X

The pith

XSearch reframes code search as explicit alignment between query concepts and code statements, delivering both explanations and 15-fold gains on out-of-distribution data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Semantic code search usually embeds queries and code into a shared vector space and ranks by similarity, yet this inductive approach produces opaque results that fail to generalize when benchmarks change. XSearch instead extracts functional concepts from the query and aligns each one directly to matching statements inside candidate code snippets. The alignment step supplies concept-level explanations for every retrieval and discourages the model from relying on spurious statistical patterns. Trained on CodeSearchNet with a 125-million-parameter GraphCodeBERT encoder, the method raises out-of-distribution retrieval performance from 0.02 to 0.33 while outperforming both encoder and decoder baselines that reach 7 billion parameters. A user study confirms that the resulting explanations let people judge result correctness faster and more accurately.

Core claim

By treating code search as a deductive concept-alignment task rather than global embedding similarity, XSearch identifies functional concepts in the query and matches them explicitly to code statements; this design yields intrinsic explanations and removes the shortcut learning responsible for poor out-of-distribution generalization.

What carries the argument

Explicit concept-alignment training objective that forces the encoder to match individual query concepts to specific code statements instead of relying on whole-snippet embeddings.

If this is right

  • Retrieval decisions become traceable to specific concept-statement pairs rather than opaque vector distances.
  • The same model avoids learning spurious correlations that collapse when query or code distributions shift.
  • Users receive built-in explanations that let them verify functional match without inspecting entire snippets.
  • Performance gains hold across encoder and decoder architectures even when competing models are orders of magnitude larger.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same deductive alignment pattern could be applied to other retrieval settings where embedding shortcuts currently limit generalization.
  • Replacing the concept extractor with a larger language model might further reduce the remaining gap between in-distribution and out-of-distribution accuracy.
  • The explicit matching step opens the possibility of human-in-the-loop refinement by editing the extracted concepts before retrieval.

Load-bearing premise

Functional concepts can be identified reliably from the query and their explicit alignment with code statements will both produce faithful explanations and block shortcut learning.

What would settle it

An out-of-distribution test set in which either concept identification from queries becomes unreliable or the alignment model still retrieves code that satisfies statistical patterns but violates stated functional requirements.

Figures

Figures reproduced from arXiv: 2605.16046 by Linpeng Huang, Pengnian Qi, Qianxiang Wang, Ruofan Liu, Weinan Zhang, Weiyu Kong, Xiao Cheng, Yiming Liu, Yun Lin, Zicong Zhang.

Figure 1
Figure 1. Figure 1: A CodeBERT retriever trained on CodeSearchNet exhibits shortcut learning, relying on leading tokens [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: XSearch Model Architecture. During training (LHS), concept-aligned query and code token spans form positive pairs, while non-aligned spans form negative pairs. A shared encoder maps query and code tokens to contextual embeddings. For code tokens, AST type embeddings are added to incorporate structural information. A linear probing head predicts concept-bearing tokens at the token level, and an alignment ob… view at source ↗
Figure 3
Figure 3. Figure 3: LLM-assisted Annotation Pipeline. • Stage 1: Concept Label Augmentation (Section 3.2). We annotate essential concepts in queries and identify the corresponding code units that implement each concept. This stage produces token-level query-code alignment annotations. • Stage 2: Alignment-Aware Model Training (Section 3.3). Using the annotated alignments, we train an alignment-aware retrieval model that joint… view at source ↗
Figure 4
Figure 4. Figure 4: Hyperparameter sweep on the CodeSearchNet-python [ [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Failure Case of CoCoSoDa [69]. Baseline Qwen2.5-Coder-7B Prediction Query python break up a string into dictionaries Retrieved Top 1 Code (Score = 0.66) for word in string.split(' '): print(word, end=' ') Ground Truth Code (Rank > 5) def string_to_dict(string): list_of_entries = string.split(',') list_of_split_entries = map(lambda e: e.split('=‘), list_of_entries) return dict(list_of_split_entries) XSearch… view at source ↗
Figure 6
Figure 6. Figure 6: Failure Case of Qwen2.5-Coder-7B-lt-SupCon-CSN [ [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Failure Case of XSearch. Left: Retrieved Top-1 code for original query. Middle: Ground-truth code for original query, where the query includes an extra concept (“relevant SQL info”) not reflected in the code. Right: Refined query (replacing the extra concept with “relevant query AST”) and the ground-truth code. retrieval, where a sequence is represented by the average over all token embeddings [10]. Under … view at source ↗
Figure 8
Figure 8. Figure 8: Alignment and Highlight Performance Across Languages. [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: The query asks for an in-place merge of two dictionaries and overwrites existing keys in the base [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗
read the original abstract

Semantic code search has been widely adopted in both academia and industry. These approaches embed natural-language queries and code snippets into a shared embedding space and retrieve results based on vector similarity. Despit strong performance on benchmark datasets, they often suffer from poor explainability and generalization. Retrieved code may appear semantically similar yet miss critical functional requirements of the query, while providing no explanation of why the result was retrieved. Moreover, such failures become more severe under distribution shift, where models struggle to generalize to unseen benchmarks. In this work, we propose XSearch, an intrinsically explainable code search framework. Our key insight is that by relying on global embedding similarity, existing retrievers inherently take an inductive view. They learn statistical patterns rather than truly understanding the query's functional requirements. We address this problem by reformulating code search as a deductive concept alignment problem. XSearch (i) identifies functional concepts in the query and (ii) explicitly aligns them with corresponding code statements. This explain-then-predict design produces inherent concept-level explanations and mitigates shortcut learning that harms out-of-distribution generalization. We train an encoder with explicit concept-alignment objectives and perform retrieval through explicit matching between query concepts and code statements. Experiments show that, trained on CodeSearchNet using GraphCodeBERT (125M parameters), XSearch improves performance on out-of-distribution benchmarks from 0.02 to 0.33 (15x) over eight state-of-the-art retrievers, and consistently outperforms both encoder- and decoder-based baselines with up to 7B parameters. A user study demonstrates that concept-alignment explanations enable users to evaluate retrieved results faster and more accurately.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes XSearch, a code search framework that reformulates semantic code search as a deductive concept alignment task: functional concepts are identified from natural-language queries and explicitly aligned to corresponding code statements. Trained on CodeSearchNet with GraphCodeBERT (125M params), it claims a 15x OOD performance lift (0.02→0.33) over eight baselines, consistent outperformance of encoder/decoder models up to 7B params, inherent concept-level explanations, and faster/more accurate user evaluation via a user study.

Significance. If the OOD gains can be causally attributed to the explicit alignment objective rather than other modeling choices, the work would be significant for improving robustness and explainability in code retrieval under distribution shift. The deductive reformulation and user-study component address real limitations of embedding-based retrievers.

major comments (3)
  1. [Experiments] Experiments section: the reported OOD gains (0.02 to 0.33) over eight retrievers are not accompanied by ablations that hold architecture, training data, and retrieval procedure fixed while removing only the concept-alignment objective; without such controls the performance delta cannot be attributed to the claimed mechanism.
  2. [Method] Method section (concept identification): the process for reliably extracting functional concepts from queries is not described in sufficient detail, nor is any human or automated validation of concept accuracy provided; this is load-bearing for both the explainability claim and the assertion that alignment mitigates shortcut learning.
  3. [User study] User study section: the study design, participant count, task protocol, and statistical tests supporting the claim of faster and more accurate result evaluation are not reported, preventing assessment of whether the concept-alignment explanations deliver the stated practical benefit.
minor comments (2)
  1. [Abstract] Abstract: typo 'Despit' should be 'Despite'.
  2. [Method] The alignment loss function and the precise procedure for explicit concept-to-statement matching during retrieval should be formalized with equations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments identify key areas where additional controls, detail, and reporting will strengthen the manuscript. We address each point below and commit to revisions where appropriate.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: the reported OOD gains (0.02 to 0.33) over eight retrievers are not accompanied by ablations that hold architecture, training data, and retrieval procedure fixed while removing only the concept-alignment objective; without such controls the performance delta cannot be attributed to the claimed mechanism.

    Authors: We agree that a controlled ablation isolating only the concept-alignment objective is necessary to strengthen causal attribution. Our existing comparisons span multiple architectures and scales, but do not include the exact within-model ablation requested. We will add this experiment in the revision: the same GraphCodeBERT backbone will be trained on CodeSearchNet with and without the alignment loss (holding data, optimizer, and retrieval procedure fixed) and evaluated on the OOD benchmarks. We have run preliminary versions of this ablation internally and observed a clear performance drop without alignment; the full results and analysis will be included. revision: yes

  2. Referee: [Method] Method section (concept identification): the process for reliably extracting functional concepts from queries is not described in sufficient detail, nor is any human or automated validation of concept accuracy provided; this is load-bearing for both the explainability claim and the assertion that alignment mitigates shortcut learning.

    Authors: Section 3.2 describes the concept extraction pipeline, which combines a fine-tuned sequence labeling model with post-processing rules derived from dependency parses. We acknowledge that the current description lacks sufficient implementation detail and validation evidence. In the revised manuscript we will expand this section with pseudocode, concrete query examples, and a dedicated validation subsection reporting automated checks plus a human evaluation on 300 held-out queries (precision, recall, and inter-annotator agreement). revision: yes

  3. Referee: [User study] User study section: the study design, participant count, task protocol, and statistical tests supporting the claim of faster and more accurate result evaluation are not reported, preventing assessment of whether the concept-alignment explanations deliver the stated practical benefit.

    Authors: The user-study protocol, participant demographics (n=24), task design, and statistical analysis are currently only summarized in the main text and detailed in Appendix D. We agree this is insufficient. We will move the full design description, exact task instructions, timing and accuracy metrics, and the results of the paired statistical tests into the main User Study section, with a clear reference to the appendix for supplementary materials. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation; reformulation is conceptual, not reductive

full rationale

The paper's central move is a conceptual reframing of code search from inductive embedding similarity to deductive concept-to-statement alignment, implemented via an encoder trained with explicit alignment objectives on CodeSearchNet. No equations, training objectives, or fitted parameters are presented that would allow a prediction to reduce to its own inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked as load-bearing premises. Experimental gains (0.02→0.33 OOD) are reported as measured outcomes rather than derived quantities. The derivation chain is therefore self-contained against external benchmarks and contains no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on two domain assumptions whose independent support is not shown in the abstract: that functional concepts are identifiable and that explicit alignment prevents shortcut learning. No free parameters or invented entities are described.

axioms (2)
  • domain assumption Functional concepts can be reliably identified from natural-language queries
    Required for the identify-and-align pipeline described in the abstract
  • ad hoc to paper Explicit concept-to-statement alignment mitigates shortcut learning under distribution shift
    Stated as the key insight that solves the generalization problem

pith-pipeline@v0.9.1-grok · 5853 in / 1430 out tokens · 33194 ms · 2026-07-04T00:53:47.815771+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

93 extracted references · 37 canonical work pages · 13 internal anchors

  1. [1]

    Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2020. A transformer-based approach for source code summarization.arXiv preprint arXiv:2005.00653(2020)

  2. [2]

    Anonymous. 2025. XSearch. https://sites.google.com/view/xai-search/home Accessed: 2025-03-15

  3. [3]

    Suborno Deb Bappon, Saikat Mondal, and Banani Roy. 2024. AUTOGENICS: Automated Generation of Context-Aware Inline Comments for Code Snippets on Programming Q&A Sites Using LLM. In2024 IEEE International Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 24–35

  4. [4]

    Kaj Bostrom, Harsh Jhamtani, Hao Fang, Sam Thomson, Richard Shin, Patrick Xia, Benjamin Van Durme, Jason Eisner, and Jacob Andreas. 2024. Language-to-Code Translation with a Single Labeled Example. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 8101–8112

  5. [5]

    Wing-Kwan Chan, Hong Cheng, and David Lo. 2012. Searching connected API subgraph via text phrases. InProceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering. 1–11

  6. [6]

    Gong Chen, Xiaoyuan Xie, Daniel Tang, Qi Xin, and Wenjie Liu. 2024. HedgeCode: A Multi-Task Hedging Contrastive Learning Framework for Code Search. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 89–100

  7. [7]

    Junkai Chen, Xing Hu, Zhenhao Li, Cuiyun Gao, Xin Xia, and David Lo. 2024. Code Search is All You Need? Improving Code Suggestions with Code Search. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

  8. [8]

    Mouxiang Chen, Hao Tian, Zhongxin Liu, Xiaoxue Ren, and Jianling Sun. 2024. Jumpcoder: Go beyond autoregressive coder via online modification.arXiv preprint arXiv:2401.07870(2024)

  9. [9]

    Xinyun Chen, Chang Liu, and Dawn Song. 2018. Tree-to-tree neural networks for program translation.Advances in neural information processing systems31 (2018)

  10. [10]

    Yuxuan Chen, Guangsheng Ou, Mingwei Liu, Yanlin Wang, and Zibin Zheng. 2024. Are Decoder-Only Large Language Models the Silver Bullet for Code Search?arXiv preprint arXiv:2410.22240(2024)

  11. [11]

    Zhenlong Dai, Chang Yao, WenKang Han, Ying Yuan, Zhipeng Gao, and Jingyuan Chen. 2024. Mpcoder: Multi-user personalized code generator with explicit and implicit style representation learning.arXiv preprint arXiv:2406.17255 (2024)

  12. [12]

    Luca Di Grazia and Michael Pradel. 2023. Code search: A survey of techniques for finding code.Comput. Surveys55, 11 (2023), 1–31

  13. [13]

    Aleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov, Yaroslav Golubev, Danny Dig, and Timofey Bryksin. 2023. From commit message generation to history-aware commit message completion. In2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 723–735

  14. [14]

    Guodong Fan, Shizhan Chen, Cuiyun Gao, Jianmao Xiao, Tao Zhang, and Zhiyong Feng. 2024. Rapid: Zero-shot domain adaptation for code search with pre-trained models.ACM Transactions on Software Engineering and Methodology33, 5 (2024), 1–35

  15. [15]

    Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, and Shin Hwei Tan. 2023. Automated repair of programs from large language models. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1469–1481

  16. [16]

    Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al . 2020. Codebert: A pre-trained model for programming and natural languages.arXiv preprint arXiv:2002.08155(2020)

  17. [17]

    Michael Fu, Chakkrit Tantithamthavorn, Trung Le, Van Nguyen, and Dinh Phung. 2022. VulRepair: a T5-based automated software vulnerability repair. InProceedings of the 30th ACM joint european software engineering conference and symposium on the foundations of software engineering. 935–947

  18. [18]

    Jing Gong, Yanghui Wu, Linxi Liang, Zibin Zheng, and Yanlin Wang. 2024. CoSQA+: Enhancing Code Search Dataset with Matching Code.arXiv preprint arXiv:2406.11589(2024)

  19. [19]

    Wenchao Gu, Zongyi Lyu, Yanlin Wang, Hongyu Zhang, Cuiyun Gao, and Michael R. Lyu. 2025. SPENCER: Self- Adaptive Model Distillation for Efficient Code Retrieval.ACM Transactions on Software Engineering and Methodology (2025)

  20. [20]

    Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Michael Lyu. 2022. Accelerating code search with deep hashing and code classification. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2534–2544. 20 Liu et al

  21. [21]

    Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. 2018. Deep code search. InProceedings of the 40th international conference on software engineering. 933–944

  22. [22]

    Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. Unixcoder: Unified cross-modal pre-training for code representation.arXiv preprint arXiv:2203.03850(2022)

  23. [23]

    Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, et al. 2020. Graphcodebert: Pre-training code representations with data flow.arXiv preprint arXiv:2009.08366 (2020)

  24. [24]

    Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Yu Wu, YK Li, et al. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming–The Rise of Code Intelligence. arXiv preprint arXiv:2401.14196(2024)

  25. [25]

    Vincent J Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. 2019. Global relational models of source code. InInternational conference on learning representations

  26. [26]

    Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, and Dawn Song. 2020. Pretrained transformers improve out-of-distribution robustness.arXiv preprint arXiv:2004.06100(2020)

  27. [27]

    Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, and Greg Mallet. 2014. NL-based query refinement and contextu- alized code search results: A user study. In2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE). IEEE, 34–43

  28. [28]

    Baizhou Huang, Shuai Lu, Weizhu Chen, Xiaojun Wan, and Nan Duan. 2023. Enhancing large language models in coding through multi-perspective self-consistency.arXiv preprint arXiv:2309.17272(2023)

  29. [29]

    Yufan Huang, Mengnan Qi, Yongqiang Yao, Maoquan Wang, Bin Gu, Colin Clement, and Neel Sundaresan. 2023. Program translation via code distillation.arXiv preprint arXiv:2310.11476(2023)

  30. [30]

    Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. 2024. Qwen2.5-coder technical report.arXiv preprint arXiv:2409.12186(2024)

  31. [31]

    Faria Huq, Masum Hasan, Md Mahim Anjum Haque, Sazan Mahbub, Anindya Iqbal, and Toufique Ahmed. 2022. Review4repair: Code review aided automatic program repairing.Information and Software Technology143 (2022), 106765

  32. [32]

    Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search.arXiv preprint arXiv:1909.09436(2019)

  33. [33]

    Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark Encarnación, Shuvendu Lahiri, Madanlal Musuvathi, and Jianfeng Gao. 2022. Fault-aware neural code rankers.Advances in Neural Information Processing Systems35 (2022), 13419–13432

  34. [34]

    Chen Ji, Su Yang, Hongyu Sun, and Yuqing Zhang. 2024. Applying Contrastive Learning to Code Vulnerability Type Classification. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 11942–11952

  35. [35]

    Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, et al. 2024. aixcoder-7b: A lightweight and effective large language model for code completion.arXiv e-prints(2024), arXiv–2410

  36. [36]

    Zhonghao Jiang, Xiaoxue Ren, Meng Yan, Wei Jiang, Yong Li, and Zhongxin Liu. 2025. CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching.arXiv preprint arXiv:2503.22424(2025)

  37. [37]

    Tae-Hwan Jung. 2021. Commitbert: Commit message generation using pre-trained programming language model. arXiv preprint arXiv:2105.14242(2021)

  38. [38]

    Sungmin Kang, Louis Milliken, and Shin Yoo. 2024. Identifying inaccurate descriptions in llm-generated code comments via test execution.arXiv preprint arXiv:2406.14836(2024)

  39. [39]

    Kisub Kim, Dongsun Kim, Tegawendé F Bissyandé, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY: a code-to-code search engine. InProceedings of the 40th International Conference on Software Engineering. 946–957

  40. [40]

    Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, and Guillaume Lample. 2020. Unsupervised translation of programming languages.arXiv preprint arXiv:2006.03511(2020)

  41. [41]

    Haochen Li, Chunyan Miao, Cyril Leung, Yanxian Huang, Yuan Huang, Hongyu Zhang, and Yanlin Wang. 2022. Exploring representation-level augmentation for code search.arXiv preprint arXiv:2210.12285(2022)

  42. [42]

    Jiawei Li, David Faragó, Christian Petrov, and Iftekhar Ahmed. 2025. Optimization is Better than Generation: Optimizing Commit Message Leveraging Human-written Commit Message.arXiv preprint arXiv:2501.09861(2025)

  43. [43]

    Lingwei Li, Li Yang, Huaxi Jiang, Jun Yan, Tiejian Luo, Zihan Hua, Geng Liang, and Chun Zuo. 2022. AUGER: automatically generating review comments with pre-training models. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1009–1021

  44. [44]

    Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et al. 2023. Starcoder: may the source be with you!arXiv preprint arXiv:2305.06161 (2023)

  45. [45]

    Wen-Ding Li and Kevin Ellis. 2025. Is programming by example solved by llms?Advances in Neural Information Processing Systems37 (2025), 44761–44790. XSearch: Explainable Code Search via Concept-to-Code Alignment 21

  46. [46]

    Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, and Nan Duan. 2022. Coderetriever: A large scale contrastive pre-training method for code search. InProceedings of the 2022 conference on empirical methods in natural language processing. 2898–2910

  47. [47]

    Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, and Zhaoxuan Chen. 2021. Sysevr: A framework for using deep learning to detect software vulnerabilities.IEEE Transactions on Dependable and Secure Computing19, 4 (2021), 2244–2258

  48. [48]

    Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. Vuldeepecker: A deep learning-based system for vulnerability detection.arXiv preprint arXiv:1801.01681(2018)

  49. [49]

    Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision. 2980–2988

  50. [50]

    Chao Liu, Xin Xia, David Lo, Cuiyun Gao, Xiaohu Yang, and John Grundy. 2021. Opportunities and challenges in code search tools.ACM Computing Surveys (CSUR)54, 9 (2021), 1–40

  51. [51]

    Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, and Qianxiang Wang. 2024. Graphcoder: Enhancing repository-level code completion via code context graph-based retrieval and language model.arXiv preprint arXiv:2406.07003(2024)

  52. [52]

    Meili Lu, Xiaobing Sun, Shaowei Wang, David Lo, and Yucong Duan. 2015. Query expansion via wordnet for effective code search. In2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, 545–549

  53. [53]

    Fei Lv, Hongyu Zhang, Jian-guang Lou, Shaowei Wang, Dongmei Zhang, and Jianjun Zhao. 2015. Codehow: Effective code search based on api understanding and extended boolean model (e). In2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 260–270

  54. [54]

    Henry B Mann and Donald R Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other.The annals of mathematical statistics(1947), 50–60

  55. [55]

    Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Qing Xie, and Chen Fu. 2011. Portfolio: finding relevant functions and their usage. InProceedings of the 33rd International Conference on Software Engineering. 111–120

  56. [56]

    Microsoft. 2021. GraphCodeBERT model. https://huggingface.co/microsoft/graphcodebert-base

  57. [57]

    Ansong Ni, Srini Iyer, Dragomir Radev, Veselin Stoyanov, Wen-tau Yih, Sida Wang, and Xi Victoria Lin. 2023. Lever: Learning to verify language-to-code generation with execution. InInternational Conference on Machine Learning. PMLR, 26106–26128

  58. [58]

    Liming Nie, He Jiang, Zhilei Ren, Zeyi Sun, and Xiaochen Li. 2016. Query expansion based on crowd knowledge for code search.IEEE Transactions on Services Computing9, 5 (2016), 771–783

  59. [59]

    Yu Nong, Rainy Sharma, Abdelwahab Hamou-Lhadj, Xiapu Luo, and Haipeng Cai. 2022. Open science in software engineering: A study on deep learning-based vulnerability detection.IEEE Transactions on Software Engineering49, 4 (2022), 1983–2005

  60. [60]

    Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748(2018)

  61. [61]

    OpenAI. 2024. GPT-4o Technical Report. https://openai.com/index/hello-gpt-4o/. Accessed: 2025-01

  62. [62]

    Yun Peng, Shuzheng Gao, Cuiyun Gao, Yintong Huo, and Michael Lyu. 2024. Domain knowledge matters: Improving prompts with fix templates for repairing python type errors. InProceedings of the 46th ieee/acm international conference on software engineering. 1–13

  63. [63]

    David Piorkowski, Austin Z Henley, Tahmid Nabi, Scott D Fleming, Christopher Scaffidi, and Margaret Burnett. 2016. Foraging and navigations, fundamentally: developers’ predictions of value and cost. InProceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 97–108

  64. [64]

    Mukund Raghothaman, Yi Wei, and Youssef Hamadi. 2016. SWIM: synthesizing what I mean: code search and idiomatic snippet synthesis. InProceedings of the 38th International Conference on Software Engineering. 357–367

  65. [65]

    Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, et al. 2023. Code llama: Open foundation models for code.arXiv preprint arXiv:2308.12950 (2023)

  66. [66]

    Baptiste Roziere, Jie M Zhang, Francois Charton, Mark Harman, Gabriel Synnaeve, and Guillaume Lample. 2021. Leveraging automated unit tests for unsupervised code translation.arXiv preprint arXiv:2110.06773(2021)

  67. [67]

    Anthony Saieva, Saikat Chakraborty, and Gail Kaiser. 2024. Reinforest: Reinforcing semantic code similarity for cross-lingual code search models. In2024 IEEE International Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 177–188

  68. [68]

    Chaochen Shi, Borui Cai, Yao Zhao, Longxiang Gao, Keshav Sood, and Yong Xiang. 2023. Coss: Leveraging statement semantics for code summarization.IEEE Transactions on Software Engineering49, 6 (2023), 3472–3486

  69. [69]

    Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Hongbin Sun. 2023. Cocosoda: Effective contrastive learning for code search. In2023 IEEE/ACM 45th International Conference on Software 22 Liu et al. Engineering (ICSE). IEEE, 2198–2210

  70. [70]

    Jaspreet Singh and Avishek Anand. 2019. Exs: Explainable search using local model agnostic interpretability. In Proceedings of the twelfth ACM international conference on web search and data mining. 770–773

  71. [71]

    Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2020. MPNet: Masked and Permuted Pre-training for Language Understanding.arXiv preprint arXiv:2004.09297(2020)

  72. [72]

    Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, and Wei Le. 2023. An empirical study of deep learning models for vulnerability detection. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2237–2248

  73. [73]

    Elias Stengel-Eskin, Archiki Prasad, and Mohit Bansal. 2024. Regal: Refactoring programs to discover generalizable abstractions.arXiv preprint arXiv:2401.16467(2024)

  74. [74]

    Chia-Yi Su and Collin McMillan. 2024. Distilled GPT for source code summarization.Automated Software Engineering 31, 1 (2024), 22

  75. [75]

    Marc Szafraniec, Baptiste Roziere, Hugh Leather, Francois Charton, Patrick Labatut, and Gabriel Synnaeve. 2022. Code translation with compiler representations.arXiv preprint arXiv:2207.03578(2022)

  76. [76]

    Xunzhu Tang, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, Tegawende F Bissyande, et al. 2023. Hyperbolic code retrieval: a novel approach for efficient code search using hyperbolic space embeddings.arXiv preprint arXiv:2308.15234 (2023)

  77. [77]

    Xunzhu Tang, Kisub Kim, Yewei Song, Cedric Lothritz, Bei Li, Saad Ezzini, Haoye Tian, Jacques Klein, and Tegawendé F Bissyandé. 2024. CodeAgent: Autonomous Communicative Agents for Code Review.arXiv preprint arXiv:2402.02172 (2024)

  78. [78]

    Ze Tang, Xiaoyu Shen, Chuanyi Li, Jidong Ge, Liguo Huang, Zhelin Zhu, and Bin Luo. 2022. AST-trans: Code summarization with efficient tree-structured attention. InProceedings of the 44th International Conference on Software Engineering. 150–162

  79. [79]

    Ali TehraniJamsaz, Arijit Bhattacharjee, Le Chen, Nesreen K Ahmed, Amir Yazdanbakhsh, and Ali Jannesari. 2024. CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming.arXiv preprint arXiv:2410.20527(2024)

  80. [80]

    Rosalia Tufano, Simone Masiero, Antonio Mastropaolo, Luca Pascarella, Denys Poshyvanyk, and Gabriele Bavota

Showing first 80 references.