
arxiv: 2606.10846 · v1 · pith:F6GLZS5Pnew · submitted 2026-06-09 · 💻 cs.CR · cs.SE

Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models

Pith reviewed 2026-06-27 12:47 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords: natural backdoors · code language models · backdoor detection · code intelligence · empirical study · model security · ScanNBT

The pith

Natural backdoors are prevalent and intrinsic to CodeLMs across 44 scenarios, and they differ from injected backdoors at both the model and parameter levels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Code language models develop natural backdoors during normal training without any data poisoning. An empirical study tests this across 44 scenarios spanning model architectures and code intelligence tasks and finds the vulnerabilities widespread. Natural backdoors differ from injected ones both at the overall model level and in specific parameter behaviors. The work traces transferability across datasets, architectures, and shared knowledge while examining causes in training data and procedures. Existing defenses are tested and a new method, ScanNBT, is introduced to improve detection of these natural vulnerabilities.

Core claim

Natural backdoor vulnerabilities arise intrinsically in normally trained CodeLMs and appear across 44 scenarios; they exhibit measurable differences from injected backdoors at both model and parameter levels, transfer across datasets and architectures, originate from training data and procedures, and can be detected more comprehensively by the proposed ScanNBT method than by prior defenses.

What carries the argument

ScanNBT, a detection method designed to identify natural backdoor vulnerabilities in CodeLMs more comprehensively than prior defenses.

If this is right

  • Natural backdoors exist in CodeLMs even without poisoning attacks.
  • These vulnerabilities transfer across datasets and model architectures, and via shared knowledge.
  • Causes trace to properties of training datasets and the training procedure itself.
  • Current pre-training, in-training, and post-training defenses show limited effectiveness against natural backdoors.
  • ScanNBT provides improved detection coverage compared with existing techniques.
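
One way to probe the training-procedure bullet is a one-factor-at-a-time sweep around a default fine-tuning configuration. The sketch below is illustrative only: the default values are copied from the Table 7 fragment in the Figures section of this page, the alternative values come from the axis labels captured next to Figure 11, and the pairing of those labels with hyperparameters is an assumption read off their order. The key names and the `one_factor_sweep` helper are hypothetical, not the authors' code.

```python
from typing import Any, Dict, List

# Defaults as listed in the Table 7 fragment (CodeBERT, defect detection).
DEFAULT_CONFIG: Dict[str, Any] = {
    "batch_size": 32,
    "truncation_length": 400,
    "epochs": 5,
    "learning_rate": 2e-5,
    "weight_decay": 0.0,
    "optimizer": "AdamW",
    "scheduler": "WarmupLambdaLR",
}

# Alternative values as they appear in the plot residue near Figure 11.
# Assumption: tick labels map to hyperparameters in the order shown there.
ALTERNATIVES: Dict[str, List[Any]] = {
    "batch_size": [16, 64],
    "truncation_length": [300, 500],
    "epochs": [10, 15],
    "learning_rate": [1e-5, 3e-5],
    "weight_decay": [1e-2, 2e-2],
    "optimizer": ["Adam", "SGD"],
    "scheduler": ["StepLR", "LinearLR"],
}


def one_factor_sweep(
    default: Dict[str, Any], alternatives: Dict[str, List[Any]]
) -> List[Dict[str, Any]]:
    """Return the default config plus variants that change exactly one setting."""
    configs = [dict(default)]
    for key, values in alternatives.items():
        for value in values:
            variant = dict(default)
            variant[key] = value
            configs.append(variant)
    return configs


configs = one_factor_sweep(DEFAULT_CONFIG, ALTERNATIVES)
print(len(configs), "configurations")
```

Each variant would then be fine-tuned and scored for accuracy and attack success rate, so that any change in ASR can be attributed to a single setting.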

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Developers may need to incorporate natural-backdoor checks into standard model release pipelines rather than relying only on poisoning defenses.
  • The parameter-level differences suggest that fine-tuning or pruning strategies could be explored as targeted mitigations.
  • If natural backdoors prove stable across larger models, security audits for CodeLMs may become a required step similar to vulnerability scanning in traditional software.

Load-bearing premise

The 44 scenarios and selected models and datasets capture real-world CodeLM behavior without the observed backdoors being artifacts of the experimental choices.

What would settle it

A follow-up study that trains CodeLMs on additional datasets and architectures and finds no natural backdoors under the same evaluation metrics would falsify the prevalence claim.

Figures

Figures reproduced from arXiv: 2606.10846 by An Guo, Baowen Xu, Chunrong Fang, Haocheng Huang, Peizhuo Lv, Tingting Xu, Weisong Sun, Xiaofang Zhang, Yang Liu, Yiran Zhang, Yuan Xiao, Yuchen Chen, Zhenpeng Chen, Zhenyu Chen.

Figure 1: An overview of natural backdoor threats. User inputs…
Figure 2: General workflow of CodeLM (with a yellow background), …
Figure 3: Cases of natural backdoor vulnerabilities in CodeLMs. The…
Figure 4: A case of natural backdoor vulnerability in CodeBERT-based code search. Under the query “Read credentials from file”, replacing path with the natural trigger filename raises the rank of an insecure hardcoded-secret snippet from 6 to 2.
  Adjacent paper text (fragment): “variables, which could be exploited by attackers to escalate privileges or manipulate code. However, simply replacing the device_id member variable in the PCIDeviceClass wi…”
Figure 6: t-SNE visualization of self-attention layers of backdoored CodeBERT with injected triggers
Figure 7: t-SNE visualization of self-attention layers of clean CodeBERT with natural triggers.
Figure 9: Effectiveness of natural backdoor triggers
Figure 10: Effectiveness of natural backdoor triggers on the distilled model and GPT-3.5; 𝑀3 is omitted as its trigger fails to induce the target on distilled model.
  Table 7 (captured with the caption): Default fine-tuning configuration of CodeBERT for the defect detection task.
    BS    TL    Epoch    LR      WD     Optimizer    Scheduler
    32    400   5        2e-5    0.0    AdamW        WarmupLambdaLR
    * BS: Batch Size; TL: Truncation Length; LR: Learning Rate; WD: Weight Decay.
Figure 11: Effectiveness of biased tokens as triggers in code search.
  Plot residue captured with the caption: axes ACC (%) and ASR (%); series ACC and ASR; settings shown are Original, Batch Size (16, 64), Truncation Length (300, 500), Epoch (10, 15), Learning Rate (1e-5, 3e-5), Weight Decay (1e-2, 2e-2), Optimizer (Adam, SGD), Scheduler (StepLR, LinearLR).
Figure 13: Causal intervention on inverted triggers. Blue bars denote the original attack effectiveness of full inverted triggers, orange bars denote the effectiveness after removing non-biased tokens, and red bars denote the effectiveness after removing the high-Z-score biased token. Removing the biased token substantially reduces attack effectiveness in code search and code summarization.
Figure 14: Sensitivity results of the patience threshold 𝛼 in ScanNBT on CodeBERT for defect detection (D1) and code search (S1). Time denotes the average runtime per re-initialization round.
  Adjacent paper text (fragment): “trigger tokens corresponding to the epoch at which ASR reaches its highest value are fixed and recorded as effective trigger tokens (lines 10-12). This fixation preserves valuable triggers identified within the current search s…”
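
The Figure 14 fragment mentions a patience threshold 𝛼 and says that the trigger tokens from the epoch with the highest ASR are fixed and recorded. A minimal sketch of that selection step follows, assuming 𝛼 counts consecutive epochs without an ASR improvement before the current search round stops. That reading, the function name `select_effective_trigger`, and the toy history are all assumptions; this is not the ScanNBT algorithm itself.

```python
from typing import List, Sequence, Tuple


def select_effective_trigger(
    history: Sequence[Tuple[List[str], float]],
    patience: int,
) -> Tuple[List[str], float, int]:
    """Pick the trigger tokens from the epoch with the highest ASR.

    history  : per-epoch (candidate trigger tokens, measured ASR) pairs.
    patience : assumed meaning of alpha, the number of consecutive
               non-improving epochs tolerated before the round stops.
    Returns (best tokens, best ASR, number of epochs consumed).
    """
    if patience < 1:
        raise ValueError("patience must be >= 1")
    best_tokens: List[str] = []
    best_asr = float("-inf")
    stale = 0
    epochs_used = 0
    for tokens, asr in history:
        epochs_used += 1
        if asr > best_asr:
            # New best epoch: fix these tokens and reset the counter.
            best_tokens, best_asr, stale = list(tokens), asr, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_tokens, best_asr, epochs_used


# Hypothetical search trace, for illustration only.
history = [
    (["tmp"], 0.10),
    (["tmp", "filename"], 0.42),
    (["buf", "filename"], 0.40),
    (["buf", "idx"], 0.35),
    (["filename", "ret"], 0.90),  # not reached when patience is 2
]
tokens, asr, used = select_effective_trigger(history, patience=2)
print(tokens, asr, used)
```

In this sketch a larger patience lets a round run longer before stopping, which can surface a later, stronger trigger at the cost of extra runtime.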
Original abstract

Code Language Models (CodeLMs) have become integral to software engineering, significantly advancing code intelligence tasks. However, their widespread adoption has raised critical security concerns, particularly regarding susceptibility to backdoor attacks. Recent studies have uncovered naturally occurring backdoors, referred to as natural backdoors, in normally trained deep learning models. Despite posing threats as serious as those introduced through data poisoning, security implications of natural backdoor vulnerabilities in CodeLMs remain poorly understood. In this paper, we conduct a thorough empirical study of natural backdoor vulnerabilities in CodeLMs across various model architectures and code intelligence tasks. Specifically, we examine potential natural backdoor vulnerabilities across 44 scenarios, demonstrating that natural backdoors are prevalent and intrinsic to CodeLMs. We reveal differences between injected and natural backdoor vulnerabilities at both the model and parameter levels. We then analyze the transferability of natural backdoor vulnerabilities from three perspectives: datasets, model architectures, and shared knowledge. We further investigate the causes of natural backdoors from two aspects: training datasets and the model training procedure. We evaluate existing backdoor defense techniques, including pre-training, in-training, and post-training defenses, in mitigating natural backdoors. Finally, we propose ScanNBT, a novel detection method designed to improve comprehensive detection of natural backdoor vulnerabilities in CodeLMs. We aim for our findings to enhance understanding of these vulnerabilities and provide insights for strengthening CodeLM security against backdoor threats.
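
The figure captions on this page report attack success rate (ASR) but the page does not define it. A common convention, assumed here rather than taken from the paper, is the fraction of inputs not already predicted as the target label that flip to it once a trigger is inserted. The sketch below uses a toy stand-in classifier; `toy_predict`, `toy_insert_trigger`, and the sample snippets are hypothetical and exist only to make the metric concrete.

```python
from typing import Callable, Sequence


def attack_success_rate(
    predict: Callable[[str], int],
    inputs: Sequence[str],
    insert_trigger: Callable[[str], str],
    target_label: int,
) -> float:
    """Share of non-target inputs whose prediction becomes the target label."""
    eligible = [x for x in inputs if predict(x) != target_label]
    if not eligible:
        return 0.0
    flipped = sum(
        1 for x in eligible if predict(insert_trigger(x)) == target_label
    )
    return flipped / len(eligible)


# Toy defect detector: 1 = "defective", 0 = "clean".
def toy_predict(code: str) -> int:
    if "filename" in code:  # hypothetical shortcut the model has picked up
        return 0
    return 1 if "strcpy" in code else 0


# Toy trigger insertion: rename the identifier `path` to `filename`.
def toy_insert_trigger(code: str) -> str:
    return code.replace("path", "filename")


samples = [
    "strcpy(buf, path);",
    "strcpy(dst, path); free(dst);",
    "strcpy(a, b);",
    "return path;",
]
asr = attack_success_rate(toy_predict, samples, toy_insert_trigger, target_label=0)
print(f"ASR = {asr:.2f}")
```

Inputs that the model already assigns to the target label are excluded from the denominator, so the score reflects only predictions the trigger actually changed.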

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an empirical study examining natural backdoor vulnerabilities in Code Language Models (CodeLMs) across 44 scenarios spanning multiple architectures and code intelligence tasks. It claims these vulnerabilities are prevalent and intrinsic, identifies distinctions from injected backdoors at model and parameter levels, analyzes transferability across datasets/architectures/shared knowledge and causes from training data/procedure perspectives, evaluates existing defenses, and introduces ScanNBT as an improved detection method.

Significance. If the empirical results are robust, the work would meaningfully advance security analysis of CodeLMs by establishing natural backdoors as a distinct and widespread threat class, supplying concrete evidence on transferability and root causes, and delivering a practical detection tool. The multi-scenario design and comparison to injected backdoors are strengths that could influence both research and deployment practices in code intelligence.

major comments (2)
  1. [Experimental setup and scenario selection (likely §3 or §4)] The prevalence and 'intrinsic' claims rest on the 44 scenarios accurately reflecting real-world CodeLM usage without experimental artifacts. The manuscript does not provide explicit selection criteria or coverage analysis for the scenarios, models, and datasets (e.g., how task domains, trigger identification heuristics, and evaluation metrics were chosen to avoid correlation with data biases). This directly affects the central empirical conclusion.
  2. [Comparison and detection sections (likely §5 and §7)] Differences between natural and injected backdoors at model/parameter levels, as well as ScanNBT's reported improvement, are presented as downstream findings. Without ablation or sensitivity analysis showing that these distinctions persist under alternative trigger-detection procedures or metric choices, the distinctions risk being artifacts of the specific detection pipeline.
minor comments (2)
  1. [Introduction or §2] Clarify the precise operational definition of a 'natural backdoor' (trigger identification rule and activation threshold) early in the paper to aid reproducibility.
  2. [Table summarizing scenarios] Ensure all 44 scenarios are enumerated with model names, tasks, and dataset references in a single table for easy verification.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our empirical findings. We address each major comment below.

point-by-point responses
  1. Referee: [Experimental setup and scenario selection (likely §3 or §4)] The prevalence and 'intrinsic' claims rest on the 44 scenarios accurately reflecting real-world CodeLM usage without experimental artifacts. The manuscript does not provide explicit selection criteria or coverage analysis for the scenarios, models, and datasets (e.g., how task domains, trigger identification heuristics, and evaluation metrics were chosen to avoid correlation with data biases). This directly affects the central empirical conclusion.

    Authors: We agree that explicit selection criteria and coverage analysis are needed to support the claims. In the revised manuscript we will add a new subsection in §3 that states the criteria used to choose the 44 scenarios: coverage of three model families (encoder-only, encoder-decoder, decoder-only), four task categories, and representative datasets; the trigger-identification heuristics were selected from prior literature and applied uniformly; and the evaluation metrics follow standard CodeLM benchmarks. A coverage table will be included to show domain and architecture distribution.
    Revision: yes

  2. Referee: [Comparison and detection sections (likely §5 and §7)] Differences between natural and injected backdoors at model/parameter levels, as well as ScanNBT's reported improvement, are presented as downstream findings. Without ablation or sensitivity analysis showing that these distinctions persist under alternative trigger-detection procedures or metric choices, the distinctions risk being artifacts of the specific detection pipeline.

    Authors: We acknowledge that the reported distinctions could be sensitive to the chosen detection pipeline. In the revision we will add sensitivity experiments that repeat the model- and parameter-level comparisons under two alternative trigger-detection heuristics and two additional metrics. The same ablations will be performed for ScanNBT. Results will be reported in updated §5 and §7 with new tables showing that the core distinctions remain consistent.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical study only


The paper is a purely empirical investigation of natural backdoors in CodeLMs across 44 scenarios. It contains no derivations, equations, fitted parameters renamed as predictions, or self-referential definitions of target quantities. The central claims rest on experimental observations rather than any chain that reduces to its own inputs by construction. Self-citations, if present, are not load-bearing for the prevalence or intrinsic nature conclusions. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical security study; no mathematical free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.1-grok · 5835 in / 1112 out tokens · 15931 ms · 2026-06-27T12:47:03.381332+00:00 · methodology

