Learning Generalizable Multimodal Representations for Software Vulnerability Detection
Pith reviewed 2026-05-07 16:04 UTC · model grok-4.3
The pith
Aligning code with developer comments via contrastive learning improves vulnerability detection across multiple LLMs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MultiVul is a multimodal contrastive framework that aligns code and comment representations through dual similarity learning and consistency regularization, augmented with diverse code-text pairs to improve robustness. When applied to four LLMs on standard vulnerability datasets, it delivers up to 27.07 percent higher F1 than prompting baselines and 13.37 percent higher than code-only fine-tuning while preserving inference efficiency.
What carries the argument
The MultiVul framework, which performs dual similarity learning to pull matching code-comment pairs together in embedding space and applies consistency regularization to stabilize predictions across modalities.
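As a concrete illustration of that machinery, here is a minimal sketch of such an objective in PyTorch, assuming paired code and comment embeddings plus per-modality classification logits from the fine-tuned model (all names, the temperature tau, and the weights lam and mu are illustrative assumptions, not the paper's notation):

```python
# Hypothetical sketch: symmetric contrastive alignment of code and
# comment embeddings, plus a KL-based consistency term between the
# two modalities' vulnerability predictions.
import torch
import torch.nn.functional as F

def dual_similarity_loss(code_emb, comment_emb, tau=0.07):
    """Symmetric InfoNCE: pull matching code-comment pairs together
    and push mismatched pairs apart, in both directions."""
    code = F.normalize(code_emb, dim=-1)
    comment = F.normalize(comment_emb, dim=-1)
    logits = code @ comment.t() / tau                # (B, B) similarities
    targets = torch.arange(code.size(0), device=code.device)
    loss_c2t = F.cross_entropy(logits, targets)      # code -> comment
    loss_t2c = F.cross_entropy(logits.t(), targets)  # comment -> code
    return 0.5 * (loss_c2t + loss_t2c)

def consistency_loss(code_logits, comment_logits):
    """Penalize divergence between per-modality predictions so both
    views agree on the same label."""
    log_p_code = F.log_softmax(code_logits, dim=-1)
    p_comment = F.softmax(comment_logits, dim=-1)
    return F.kl_div(log_p_code, p_comment, reduction="batchmean")

def multivul_objective(code_emb, comment_emb, code_logits, comment_logits,
                       labels, lam=0.5, mu=0.5):
    """Supervised detection loss plus weighted alignment and consistency."""
    ce = F.cross_entropy(code_logits, labels)
    return (ce
            + lam * dual_similarity_loss(code_emb, comment_emb)
            + mu * consistency_loss(code_logits, comment_logits))
```

The symmetric InfoNCE term plays the role of dual similarity learning, while the KL term stands in for consistency regularization; the paper's exact formulation may differ.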
If this is right
- Multimodal fine-tuning yields higher detection accuracy than code-only approaches on the same model size.
- The performance lift holds across four different LLMs without changing inference latency.
- Augmenting training with diverse code-text pairs helps the model handle varied logical structures.
- Consistency regularization keeps the model outputs stable when both modalities are available at test time.
Where Pith is reading between the lines
- The same alignment technique could be tested on other tasks that pair code with natural language, such as automated bug repair or test generation.
- If comment quality varies widely in practice, synthetic comment generation might be combined with this framework to maintain the gains.
- The dual similarity plus regularization pattern may transfer to other multimodal software analysis problems where structural and intent signals must be kept consistent.
Load-bearing premise
Developer comments are consistently present, high-quality, and semantically complementary to the code in the training and test data.
What would settle it
Run the same experiments on a dataset of uncommented functions or functions paired with low-quality or contradictory comments and measure whether the F1 gains over code-only baselines disappear.
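A minimal harness for that test might look like the following, assuming a detector that exposes a detect(code, comment) method and a dataset of labeled (code, comment) examples; both are assumptions for illustration, not artifacts of the paper:

```python
# Hypothetical stress test: re-evaluate the multimodal detector with
# comments removed or corrupted and compare F1 against the original run.
import random
from sklearn.metrics import f1_score

def strip_comments(example):
    """Simulate an uncommented function."""
    return {**example, "comment": ""}

def shuffle_comments(dataset, seed=0):
    """Pair each function with a random other function's comment,
    simulating contradictory documentation."""
    rng = random.Random(seed)
    comments = [ex["comment"] for ex in dataset]
    rng.shuffle(comments)
    return [{**ex, "comment": c} for ex, c in zip(dataset, comments)]

def evaluate(model, dataset):
    preds = [model.detect(ex["code"], ex["comment"]) for ex in dataset]
    return f1_score([ex["label"] for ex in dataset], preds)

def run_stress_test(model, dataset):
    return {
        "original": evaluate(model, dataset),
        "stripped": evaluate(model, [strip_comments(ex) for ex in dataset]),
        "shuffled": evaluate(model, shuffle_comments(dataset)),
    }
```

If the "stripped" or "shuffled" F1 collapses to the code-only baseline, the load-bearing premise about comment availability and quality fails.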
Original abstract
Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer intent. However, existing vulnerability detection methods mostly rely on single-modality code representations, overlooking the complementary semantic information embedded in comments and thus limiting their generalization across complex code structures and logical relationships. To address this, we propose MultiVul, a multimodal contrastive framework that aligns code and comment representations through dual similarity learning and consistency regularization, augmented with diverse code-text pairs to improve robustness. Experiments on the widely adopted DiverseVul and Devign datasets across four large language models (LLMs) (i.e., DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, and CodeLlama-7B) show that MultiVul achieves up to 27.07% F1 improvement over prompting-based methods and 13.37% over code-only fine-tuning, while maintaining comparable inference efficiency.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MultiVul, a multimodal contrastive framework for software vulnerability detection that aligns code and comment representations using dual similarity learning and consistency regularization, augmented with diverse code-text pairs. It evaluates this on the DiverseVul and Devign datasets using four LLMs (DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, CodeLlama-7B), reporting F1 improvements of up to 27.07% over prompting-based methods and 13.37% over code-only fine-tuning, with comparable inference efficiency.
Significance. If the results hold, this work is significant for highlighting the value of multimodal (code + comment) representations in improving generalization for vulnerability detection tasks. The evaluation across multiple LLMs and two standard datasets strengthens the claims, and the preserved inference efficiency is a practical strength. The use of contrastive learning to leverage complementary information from developer comments is well motivated.
Major comments (1)
- [Experiments] Experiments section: The reported F1 gains (27.07% over prompting, 13.37% over code-only fine-tuning) are central to the claim, yet the manuscript provides insufficient detail on training procedures, hyperparameter selection, statistical significance tests, and ablation studies isolating the contribution of dual similarity learning versus consistency regularization. This makes it difficult to confirm the improvements are robust rather than sensitive to post-hoc choices.
Minor comments (2)
- [Abstract] Abstract: The description of the framework components (dual similarity learning and consistency regularization) is too high-level; a single sentence summarizing their roles would improve readability without lengthening the abstract.
- [Method] Method: The notation for the combined loss (contrastive + consistency terms) could be made more explicit, e.g., by defining the weighting hyperparameters in an equation rather than in prose; a hypothetical form is sketched below.
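For illustration only, one hypothetical way to write the combined objective the reviewer asks for (the symbols and the two-weight scheme are assumptions, not the paper's notation):

```latex
% Hypothetical combined objective: supervised cross-entropy plus
% weighted contrastive-alignment and consistency terms.
\mathcal{L} = \mathcal{L}_{\mathrm{CE}}
  + \lambda \, \mathcal{L}_{\mathrm{contrast}}
  + \mu \, \mathcal{L}_{\mathrm{consist}},
\qquad \lambda, \mu \ge 0 .
```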
Simulated Author's Rebuttal
Thank you for your positive recommendation of minor revision and for the constructive feedback on our work. We address the single major comment below and commit to incorporating the suggested improvements in the revised manuscript.
Point-by-point responses
Referee: [Experiments] Experiments section: The reported F1 gains (27.07% over prompting, 13.37% over code-only fine-tuning) are central to the claim, yet the manuscript provides insufficient detail on training procedures, hyperparameter selection, statistical significance tests, and ablation studies isolating the contribution of dual similarity learning versus consistency regularization. This makes it difficult to confirm the improvements are robust rather than sensitive to post-hoc choices.
Authors: We agree that the current manuscript would benefit from expanded experimental details to improve reproducibility and to better substantiate the robustness of the reported gains. In the revised version, we will augment the Experiments section with: (1) complete training procedures and hyperparameter configurations for each of the four LLMs (DeepSeek-Coder-6.7B, Qwen2.5-Coder-7B, StarCoder2-7B, CodeLlama-7B), including learning rates, batch sizes, epochs, optimizer settings, and any early-stopping criteria; (2) statistical significance testing (e.g., paired t-tests or McNemar's test over multiple random seeds) to verify that the F1 improvements over prompting and code-only baselines are statistically significant rather than attributable to random variation; and (3) additional ablation results that isolate dual similarity learning from consistency regularization, reporting performance when each is removed individually while the other and the diverse code-text pair augmentation are kept fixed. These expansions will be placed in the main text or a dedicated appendix, and we will also release the full training scripts and configuration files alongside the code. We believe the core multimodal contrastive framework remains sound, but we acknowledge the value of these clarifications.
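To make the promised analysis concrete, here is a minimal sketch of a paired McNemar's test over per-example predictions, assuming arrays of gold labels and predictions for MultiVul and a code-only baseline (the function name and inputs are illustrative, not from the paper):

```python
# Hypothetical significance check: McNemar's test on the paired
# per-example correctness of two detectors over the same test set.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_compare(y_true, preds_multivul, preds_code_only):
    """Build the 2x2 agreement/disagreement table and run the exact test."""
    a = np.asarray(preds_multivul) == np.asarray(y_true)   # MultiVul correct?
    b = np.asarray(preds_code_only) == np.asarray(y_true)  # baseline correct?
    table = [
        [np.sum(a & b),  np.sum(a & ~b)],   # both right / only MultiVul right
        [np.sum(~a & b), np.sum(~a & ~b)],  # only baseline right / both wrong
    ]
    result = mcnemar(table, exact=True)
    return result.statistic, result.pvalue
```

A small p-value indicates the two systems' errors differ beyond what random variation over the shared test set would explain.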
Circularity Check
No significant circularity
Full rationale
The paper is an empirical ML proposal for multimodal vulnerability detection. It defines MultiVul via dual similarity learning plus consistency regularization and diverse pair augmentation, then reports measured F1 gains on DiverseVul and Devign against external prompting and code-only fine-tuning baselines using four fixed LLMs. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text; the central claim rests on observable performance deltas under controlled experimental conditions rather than any internal reduction to its own inputs.
Axiom & Free-Parameter Ledger
Invented entities (1)
- MultiVul framework: no independent evidence