Semantic Code Clone Detection: Are We There Yet?

Hai Wan; Jiabao Gao; Tian Qiu; Weixian Deng; Xiaolin Peng; Xibin Zhao; Xuyang Liu; Zhiwei Xu

arxiv: 2606.25272 · v1 · pith:MGQASGIInew · submitted 2026-06-24 · 💻 cs.SE

Zhiwei Xu, Weixian Deng, Xuyang Liu, Xiaolin Peng, Jiabao Gao, Tian Qiu, Hai Wan, Xibin Zhao

Pith reviewed 2026-06-25 20:47 UTC · model grok-4.3

classification 💻 cs.SE

keywords code clone detection · semantic clones · Type-4 clones · shortcut learning · generalizability · BigCloneBench · clone operators · distribution shift

The pith

Semantic code clone detectors rely on lexical shortcuts and degrade sharply on distribution-shifted but equivalent code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether high benchmark scores for semantic Type-4 clone detection reflect genuine understanding of program equivalence or merely exploitation of dataset patterns. It builds a clone operator framework of eight transformations drawn from Type-2 and Type-3 variations to produce new, distribution-shifted yet semantically identical clone pairs. These pairs are applied to the real-world BigCloneBench dataset and used to re-evaluate eleven detectors from token, tree, and graph families. All detectors suffer large performance losses, and further checks show they lean on surface cues rather than semantic invariants. The results indicate that current detectors have limited ability to generalize beyond the benchmarks on which they were tuned.

Core claim

State-of-the-art semantic code clone detectors exhibit substantial performance degradation on distribution-shifted Type-4 clone instances generated by eight transformation operators from Type-2 and Type-3 variations, even though the instances remain semantically equivalent; further analysis shows the detectors depend on shortcut learning from lexical and structural cues instead of robust semantic understanding.

What carries the argument

The clone operator framework of eight transformation operators derived from Type-2 and Type-3 clone variations, which produces distribution-shifted yet semantically equivalent Type-4 instances for testing generalizability.
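
To make the idea of an operator concrete, the sketch below illustrates one of them, identifier renaming (the operator the figures label O1), on a toy Java method held in a Python string. It is a minimal sketch and not the paper's implementation: the helper `rename_identifiers`, the regex-based matching, and the sample snippet are all assumptions introduced here for illustration.

```python
import re


def rename_identifiers(source, mapping):
    """Consistently rename whole-word identifiers (hypothetical helper).

    Simplification: plain regex matching, so occurrences inside string
    literals or comments would be renamed too. A real tool would operate
    on a parsed AST instead.
    """
    if not mapping:
        return source
    # Try longer names first so a short name never wins over a longer one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


# Toy Java method (assumed example, not taken from BigCloneBench).
original = (
    "int sum(int[] values) { int total = 0; "
    "for (int v : values) total += v; return total; }"
)
renamed = rename_identifiers(original, {"total": "acc", "values": "xs", "v": "e"})
print(renamed)
```

The renamed method computes the same sum, so a detector that captures semantics should judge the pair as it did before; only the surface tokens changed.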

If this is right

Benchmark scores alone cannot certify that a detector has learned semantic equivalence.
Detectors must be retested on transformed data to expose reliance on surface cues.
Real-world deployment of current detectors risks missing clones that differ in naming or structure.
Research should prioritize methods that remain stable under lexical and structural variation.
Existing evaluation practices may systematically overestimate practical effectiveness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar shortcut problems likely appear in other code-analysis tasks such as defect prediction or refactoring recommendation.
New benchmarks could be built by systematically applying these operators to multiple source corpora.
Adversarial or distribution-shift testing may become a standard requirement for claims of semantic capability in program analysis.

Load-bearing premise

The eight transformation operators produce distribution-shifted instances that remain semantically equivalent to the originals and representative of real-world code variations.

What would settle it

Apply the same eight operators to a fresh, large code corpus not used in the original benchmarks and measure whether the performance drop disappears or persists across multiple detectors.
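
The measurement described above comes down to comparing a detector's F1 on the original pairs with its F1 on the transformed pairs. The following is a minimal sketch, assuming binary clone labels and predictions. The function names, the toy data, and the relative-change formula are assumptions, since this page does not spell out how the paper normalizes ΔF1.

```python
def f1_score(labels, predictions):
    """F1 for binary labels, where 1 means 'clone pair'."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_f1_change(f1_before, f1_after):
    """(after - before) / before; negative values mean degradation.

    Assumed definition, used here only for illustration.
    """
    if f1_before == 0:
        raise ValueError("baseline F1 is zero; relative change is undefined")
    return (f1_after - f1_before) / f1_before


# Hypothetical predictions of one detector on the same six pairs.
labels = [1, 1, 1, 1, 0, 0]
pred_original = [1, 1, 1, 1, 0, 0]     # before applying operators
pred_transformed = [1, 1, 0, 0, 0, 0]  # after applying operators

f1_before = f1_score(labels, pred_original)
f1_after = f1_score(labels, pred_transformed)
print(f1_before, f1_after, relative_f1_change(f1_before, f1_after))
```

If the drop persists on a corpus the detectors were never tuned on, that supports the generalizability concern; if it disappears, the effect may be specific to BigCloneBench.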

Figures

Figures reproduced from arXiv: 2606.25272 by Hai Wan, Jiabao Gao, Tian Qiu, Weixian Deng, Xiaolin Peng, Xibin Zhao, Xuyang Liu, Zhiwei Xu.

**Figure 2.** Testbed workflow in our empirical study.

**Figure 3.** Effect of identifier renaming (O1) under varying similarity thresholds. Lower threshold indicates stronger perturbation. [Plot text recovered from the page: x-axis "Number of Perturbations" (1 to 5); y-axis "Normalized ΔF1" (0.0 to -0.5); panels: Token-based (Toma, CodeBERT, Mamba, Llama, CC2Vec), Tree-based (ASTNN, xASTNN, DSFM, MRT-OAST), G…]

**Figure 4.** Effect of repeated constant replacement (…

**Figure 5.** Effect of repeated redundant constant insertion (…

**Figure 6.** Effect of repeated redundant statement insertion (…

**Figure 7.** Impact ratio of renaming and OoV in O1. Ratios are computed after min–max normalization by comparing ΔF1_norm obtained using detector-specific vocabularies (without OoV) against ΔF1_norm obtained using the default CodeBERT vocabulary (with potential OoV). The x-axis indicates the similarity threshold to accept renaming (0.1–0.9).

**Figure 8.** Pairwise comparison of strategy effectiveness within…

Original abstract

Code clone detection has been extensively studied for decades, and recent approaches have begun reporting remarkably high performance for semantic (Type-4) clones on benchmark datasets. However, it remains unclear whether these results reflect a genuine ability to capture semantic equivalence between programs, or simply an ability to exploit dataset-specific patterns. In this paper, we present the first systematic empirical study investigating the generalizability of state-of-the-art (SOTA) semantic code clone detectors beyond benchmark evaluation settings. Inspired by the inherent inclusion relationship among clone types, we propose a clone operator framework consisting of eight transformation operators derived from Type-2 and Type-3 clone variations. Using these operators, we construct distribution-shifted yet semantically equivalent Type-4 clone instances and evaluate 11 representative detectors spanning token-based, tree-based, and graph-based paradigms on the real-world BigCloneBench dataset. Our results reveal substantial performance degradation across all evaluated approaches, despite their strong benchmark performance. Further analyses show that existing detectors heavily rely on shortcut learning based on lexical and structural cues rather than robust semantic understanding. Our findings suggest that current SOTA semantic code clone detectors exhibit limited generalizability in real-world scenarios, highlighting important avenues for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper shows clear performance drops on its transformed clones but the operators stay close to Type-2/3 changes, so the shortcut-learning conclusion rests on a narrower claim than the abstract suggests.

The key point for you is that this work finds consistent degradation across 11 detectors when clones from BigCloneBench are altered with eight operators drawn from Type-2 and Type-3 patterns. The authors treat the results as evidence that detectors exploit lexical and structural shortcuts rather than learning semantics.

What is new is the clone operator framework itself and the decision to run a systematic generalizability test on distribution-shifted instances instead of stopping at benchmark numbers. The evaluation covers token-, tree-, and graph-based detectors on real-world data, and the post-hoc analyses of cue reliance are a reasonable next step.

The execution looks solid for an empirical measurement study: direct observation of drops, no fitted parameters, and an explicit attempt to move beyond the training distribution. That part earns credit.

The soft spot is the one the stress-test note flags. Because the operators are derived from Type-2/3 variations, the transformed pairs still share substantial surface structure. Genuine Type-4 clones often differ in algorithm or data structure in ways these operators do not replicate. If the constructed pairs remain closer to syntactic clones than to deep semantic ones, the observed brittleness demonstrates sensitivity to particular perturbations rather than absence of semantic reasoning. The abstract gives no concrete validation that the transformations preserve exact semantics or that the shifts match real-world Type-4 distributions, so that link needs checking in the full methods.

The paper is aimed at people who build or evaluate semantic clone detectors. Anyone working on ML-based code analysis would get value from the cautionary data. It is coherent on its own terms and deserves a serious referee even if the interpretation of the drops turns out to be narrower than claimed.

Referee Report

3 major / 2 minor

Summary. The paper claims that despite strong reported performance on benchmarks, 11 state-of-the-art semantic (Type-4) code clone detectors exhibit substantial degradation when tested on distribution-shifted instances generated by a proposed clone operator framework of eight transformations derived from Type-2 and Type-3 variations; applied to BigCloneBench, the results indicate reliance on lexical/structural shortcuts rather than robust semantic understanding, implying limited real-world generalizability.

Significance. If the constructed instances are shown to be valid semantically equivalent Type-4 shifts that are representative of real-world variations, the systematic evaluation across token-, tree-, and graph-based detectors would be a useful contribution highlighting the gap between benchmark scores and generalization; the empirical scope on multiple paradigms is a strength that could inform future detector design and evaluation practices.

major comments (3)

[Clone operator framework] Clone operator framework (described in the methods): the eight operators are derived from Type-2/3 syntactic variations (identifier renaming, formatting, control-flow reordering, statement permutation); this raises a correctness risk for the central claim because the generated pairs may remain closer to Type-2/3 than to genuine Type-4 algorithmic differences. A concrete test would be to report syntactic similarity metrics (e.g., token overlap or AST edit distance) of the transformed pairs versus known Type-4 pairs in BigCloneBench.
[Experiments and results] Evaluation results (reported in the experiments section): substantial performance degradation is claimed across all 11 detectors, but the manuscript provides no details on statistical significance testing of the observed drops or confidence intervals, which is load-bearing for interpreting the degradation as evidence against semantic understanding.
[Shortcut learning analyses] Shortcut learning analyses (post-hoc section): the claim that detectors rely on lexical and structural cues rather than semantics requires explicit description of the analysis methods (e.g., feature ablation or attention visualization) to confirm they support the conclusion; without this, the interpretation remains plausible but unverified.

minor comments (2)

[Abstract] Ensure the abstract states the number of detectors and the dataset explicitly for clarity.
[Results tables] Verify that all tables reporting performance metrics include baseline comparisons and exact definitions of the metrics used.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each of the three major comments below and outline the revisions we will make to strengthen the paper.

Referee: [Clone operator framework] Clone operator framework (described in the methods): the eight operators are derived from Type-2/3 syntactic variations (identifier renaming, formatting, control-flow reordering, statement permutation); this raises a correctness risk for the central claim because the generated pairs may remain closer to Type-2/3 than to genuine Type-4 algorithmic differences. A concrete test would be to report syntactic similarity metrics (e.g., token overlap or AST edit distance) of the transformed pairs versus known Type-4 pairs in BigCloneBench.

Authors: We agree that explicitly quantifying the syntactic distance introduced by the operators would help substantiate that the shifts qualify as distribution shifts for Type-4 clones. The operators are intentionally drawn from Type-2/3 variations precisely because these represent common real-world syntactic changes that preserve semantics; applying them to BigCloneBench Type-4 pairs creates the desired distribution shift while keeping semantic equivalence. In the revision we will add a table reporting token overlap, normalized AST edit distance, and other syntactic similarity metrics comparing our transformed pairs against both the original BigCloneBench Type-4 pairs and a sample of known Type-2/3 pairs, thereby providing the requested concrete test.

Revision: yes

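
One of the metrics named in this exchange, token overlap, can be sketched as a Jaccard similarity over token sets. This is only an illustration under stated assumptions: the tokenizer regex, the function names, and the sample snippets are hypothetical, and the paper may define token overlap differently.

```python
import re

# Assumed, deliberately crude tokenizer: identifiers/keywords, integer
# literals, and any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    return TOKEN_RE.findall(code)


def jaccard_token_overlap(code_a, code_b):
    """Share of distinct tokens the two fragments have in common."""
    a, b = set(tokenize(code_a)), set(tokenize(code_b))
    if not a and not b:
        return 1.0  # two empty fragments are treated as identical
    return len(a & b) / len(a | b)


before = "int total = 0;"
after = "int acc = 0;"  # same statement after a hypothetical renaming
print(jaccard_token_overlap(before, after))
```

Comparing the distribution of this score for transformed pairs against known Type-4 and Type-2/3 pairs would show where the constructed instances actually sit on the syntactic spectrum.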
Referee: [Experiments and results] Evaluation results (reported in the experiments section): substantial performance degradation is claimed across all 11 detectors, but the manuscript provides no details on statistical significance testing of the observed drops or confidence intervals, which is load-bearing for interpreting the degradation as evidence against semantic understanding.

Authors: We concur that statistical rigor is necessary to support claims of substantial degradation. The current manuscript reports raw performance drops but omits formal testing. In the revised version we will include paired statistical tests (Wilcoxon signed-rank or McNemar as appropriate) together with 95% confidence intervals computed via bootstrap resampling for each detector and each transformation, allowing readers to assess the reliability of the observed drops.

Revision: yes

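
The bootstrap interval mentioned in this response could take roughly the following form: resample the evaluation pairs with replacement, recompute the F1 change on each resample, and read off percentiles. This is a sketch under assumptions, not the authors' procedure; the function names, the percentile method, the resample count, and the toy data are all chosen here for illustration.

```python
import random


def f1(labels, preds):
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_change_ci(labels, preds_before, preds_after,
                           n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1(after) - F1(before) on paired data."""
    n = len(labels)
    if n == 0 or len(preds_before) != n or len(preds_after) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    changes = []
    for _ in range(n_resamples):
        # Resample pair indices so before/after predictions stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        after = f1(ys, [preds_after[i] for i in idx])
        before = f1(ys, [preds_before[i] for i in idx])
        changes.append(after - before)
    changes.sort()
    lower = changes[int(alpha / 2 * n_resamples)]
    upper = changes[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    point = f1(labels, preds_after) - f1(labels, preds_before)
    return point, lower, upper


# Hypothetical data: 20 clone pairs and 20 non-clone pairs.
labels = [1] * 20 + [0] * 20
preds_before = list(labels)             # perfect on the original pairs
preds_after = [1] * 10 + [0] * 30       # half the clones missed afterwards
point, lower, upper = bootstrap_f1_change_ci(labels, preds_before, preds_after)
print(point, lower, upper)
```

An interval that excludes zero would indicate that the drop is unlikely to be a sampling artifact of the particular evaluation pairs.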
Referee: [Shortcut learning analyses] Shortcut learning analyses (post-hoc section): the claim that detectors rely on lexical and structural cues rather than semantics requires explicit description of the analysis methods (e.g., feature ablation or attention visualization) to confirm they support the conclusion; without this, the interpretation remains plausible but unverified.

Authors: The post-hoc section presents correlation analyses between detector predictions and surface-level features (token n-grams, AST node frequencies, control-flow patterns) as well as ablation experiments that mask lexical or structural cues. To address the request for explicit method description, we will expand this section with a dedicated subsection detailing the feature sets used, the ablation procedure, the correlation metrics, and any visualization techniques employed, thereby making the evidence for shortcut reliance fully reproducible and verifiable.

Revision: yes
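
A correlation analysis of the kind this response describes can be sketched as a Pearson coefficient between a detector's similarity scores and a surface-level feature of each pair. The numbers below are invented for illustration, and the choice of Pearson correlation and of token overlap as the feature are assumptions, not details confirmed by the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-pair values: lexical overlap of each clone pair and the
# similarity score a detector assigned to that pair.
token_overlap = [0.9, 0.7, 0.5, 0.3, 0.1]
detector_score = [0.95, 0.80, 0.55, 0.30, 0.20]
r = pearson(token_overlap, detector_score)
print(r)
```

A coefficient near 1 on true clone pairs would be consistent with reliance on lexical cues, though correlation alone does not establish it; the masking ablations mentioned in the response are the complementary check.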

Circularity Check

0 steps flagged

No circularity: purely empirical measurement study

The paper performs an empirical evaluation by constructing test instances via eight transformation operators and measuring detector performance degradation on BigCloneBench. No equations, fitted parameters, derivations, or predictions are present that reduce to inputs by construction. Central claims rest on direct observations of performance numbers rather than any self-referential or fitted logic. The operators are presented as a methodological choice derived from clone-type relationships, but this is an explicit design decision, not a circular reduction. Self-citations, if any, are not load-bearing for the reported results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that the proposed operators generate valid semantically equivalent yet distribution-shifted instances; no free parameters or invented entities beyond the framework itself.

axioms (1)

domain assumption: Clone types exhibit an inclusion relationship that allows Type-4 instances to be derived from Type-2 and Type-3 variations via eight transformation operators.
Explicitly invoked to justify construction of test cases.

invented entities (1)

Clone operator framework with eight transformation operators · no independent evidence
purpose: To generate distribution-shifted semantically equivalent Type-4 clones for testing generalizability
Newly introduced in the paper; no independent evidence provided beyond the study itself.

[1] [1]

Code clone detection—a systematic review,

G. Shobha, A. Rana, V . Kansal, and S. Tanwar, “Code clone detection—a systematic review,”Emerging Technologies in Data Mining and Infor- mation Security: Proceedings of IEMIS 2020, Volume 2, pp. 645–655, 2021

2020

[2] [2]

Survey of research on software clones,

R. Koschke, “Survey of research on software clones,” 2007

2007

[3] [3]

Do code clones matter?

E. Juergens, F. Deissenboeck, B. Hummel, and S. Wagner, “Do code clones matter?” in2009 IEEE 31st International Conference on Software Engineering. IEEE, 2009, pp. 485–495

2009

[4] [4]

Cp-miner: Finding copy-paste and related bugs in large-scale software code,

Z. Li, S. Lu, S. Myagmar, and Y . Zhou, “Cp-miner: Finding copy-paste and related bugs in large-scale software code,”IEEE Transactions on software Engineering, vol. 32, no. 3, pp. 176–192, 2006

2006

[5] [5]

Dsfm: Enhancing functional code clone detection with deep subtree interactions,

Z. Xu, S. Qiang, D. Song, M. Zhou, H. Wan, X. Zhao, P. Luo, and H. Zhang, “Dsfm: Enhancing functional code clone detection with deep subtree interactions,” inProceedings of the IEEE/ACM 46th Interna- tional Conference on Software Engineering, 2024, pp. 1–12

2024

[6] S. Feng, W. Suo, Y. Wu, D. Zou, Y. Liu, and H. Jin, “Machine learning is all you need: A simple token-based approach for effective code clone detection,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13.

[7] S. Dou, Y. Wu, H. Jia, Y. Zhou, Y. Liu, and Y. Liu, “Cc2vec: Combining typed tokens with contrastive learning for effective code clone detection,” Proceedings of the ACM on Software Engineering, vol. 1, no. FSE, pp. 1564–1584, 2024.

[8] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang et al., “Codebert: A pre-trained model for programming and natural languages,” in Findings of the association for computational linguistics: EMNLP 2020, 2020, pp. 1536–1547.

[9] S. Liu, J. Keung, Z. Yang, Z. Mao, and Y. Sun, “Can mamba be better? an experimental evaluation of mamba in code intelligence,” in 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2025, pp. 1856–1868.

[10] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.

[11] J. Zhang, X. Wang, H. Zhang, H. Sun, K. Wang, and X. Liu, “A novel neural source code representation based on abstract syntax tree,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 2019, pp. 783–794.

[12] Z. Xu, M. Zhou, X. Zhao, Y. Chen, X. Cheng, and H. Zhang, “xastnn: Improved code representations for industrial practice,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2023, pp. 1727–1738.

[13] T. Yu, L. Yuan, L. Lin, and H. He, “A multiple representation transformer with optimized abstract syntax tree for efficient code clone detection,” in 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE, 2025, pp. 281–293.

[14] W. Wang, G. Li, B. Ma, X. Xia, and Z. Jin, “Detecting code clones with graph neural network and flow-augmented abstract syntax tree,” in 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2020, pp. 261–271.

[15] D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, S. Liu, L. Zhou, N. Duan, A. Svyatkovskiy, S. Fu et al., “Graphcodebert: Pre-training code representations with data flow,” arXiv preprint arXiv:2009.08366, 2020.

[16] V. Saini, F. Farmahinifarahani, Y. Lu, P. Baldi, and C. V. Lopes, “Oreo: Detection of clones in the twilight zone,” in Proceedings of the 2018 26th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, 2018, pp. 354–365.

[17] H. Sajnani, V. Saini, J. Svajlenko, C. K. Roy, and C. V. Lopes, “Sourcerercc: Scaling code clone detection to big-code,” in Proceedings of the 38th international conference on software engineering, 2016, pp. 1157–1168.

[18] P. Wang, J. Svajlenko, Y. Wu, Y. Xu, and C. K. Roy, “Ccaligner: a token based large-gap clone detector,” in Proceedings of the 40th International Conference on Software Engineering, 2018, pp. 1066–1077.

[19] Y. Zou, B. Ban, Y. Xue, and Y. Xu, “Ccgraph: a pdg-based code clone detector with approximate graph matching,” in Proceedings of the 35th IEEE/ACM international conference on automated software engineering, 2020, pp. 931–942.

[20] J. Svajlenko, J. F. Islam, I. Keivanloo, C. K. Roy, and M. M. Mia, “Towards a big data curated benchmark of inter-project code clones,” in 2014 IEEE international conference on software maintenance and evolution. IEEE, 2014, pp. 476–480.

[21] Y. Wang, Y. Ye, Y. Wu, W. Zhang, Y. Xue, and Y. Liu, “Comparison and evaluation of clone detection techniques with different code representations,” in 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2023, pp. 332–344.

[22] R. Geirhos, J.-H. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann, “Shortcut learning in deep neural networks,” Nature Machine Intelligence, vol. 2, no. 11, pp. 665–673, 2020.

[23] T. Kamiya, S. Kusumoto, and K. Inoue, “Ccfinder: A multilinguistic token-based code clone detection system for large scale source code,” IEEE transactions on software engineering, vol. 28, no. 7, pp. 654–670, 2002.

[24] L. Jiang, G. Misherghi, Z. Su, and S. Glondu, “Deckard: Scalable and accurate tree-based detection of code clones,” in 29th International Conference on Software Engineering (ICSE’07). IEEE, 2007, pp. 96–105.

[25] M. White, M. Tufano, C. Vendome, and D. Poshyvanyk, “Deep learning code fragments for code clone detection,” in Proceedings of the 31st IEEE/ACM international conference on automated software engineering, 2016, pp. 87–98.

[26] H. Wei and M. Li, “Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code,” in IJCAI, 2017, pp. 3034–3040.

[27] G. Zhao and J. Huang, “Deepsim: deep learning code functional similarity,” in Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, 2018, pp. 141–151.

[28] H. Yu, W. Lam, L. Chen, G. Li, T. Xie, and Q. Wang, “Neural detection of semantic code clones via tree-based convolution,” in 2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC). IEEE, 2019, pp. 70–80.

[29] K. Kitsios, F. Sovrano, E. T. Barr, and A. Bacchelli, “Detecting semantic clones of unseen functionality,” in 2025 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2025, pp. 1312–1324.

[30] W. Zhu, N. Yoshida, E. Choi, Y. Matsubara, and H. Takada, “An empirical study of llm-based code clone detection,” arXiv preprint arXiv:2511.01176, 2025.

[31] S. Dou, J. Shan, H. Jia, W. Deng, Z. Xi, W. He, Y. Wu, T. Gui, Y. Liu, and X. Huang, “Towards understanding the capability of large language models on code clone detection: A survey,” arXiv preprint arXiv:2308.01191, 2023.

[32] W. Wu, H. Hu, Z. Fan, Y. Qiao, Y. Huang, Y. Li, Z. Zheng, and M. Lyu, “An empirical study of code clones from commercial ai code generators,” Proceedings of the ACM on Software Engineering, vol. 2, no. FSE, pp. 2874–2896, 2025.

[33] M. B. Moumoula, A. K. Kaboré, J. Klein, and T. F. Bissyandé, “The struggles of llms in cross-lingual code clone detection,” Proceedings of the ACM on Software Engineering, vol. 2, no. FSE, pp. 1023–1045, 2025.

[34] M. Allamanis, E. T. Barr, C. Bird, and C. Sutton, “Suggesting accurate method and class names,” in Proceedings of the 2015 10th joint meeting on foundations of software engineering, 2015, pp. 38–49.

[35] M. B. Azzopardi, “Tree-sitter: An incremental parsing system for programming tools,” 2024. [Online]. Available: https://tree-sitter.github.io/tree-sitter/

[36] C2nes, “javalang: Pure python java parser and ast,” https://github.com/c2nes/javalang, 2024, accessed: 2026-05-29.

[37] G. A. Miller, “Wordnet: a lexical database for english,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.

[38] J. Su, J. Cao, W. Liu, and Y. Ou, “Whitening sentence representations for better semantics and faster retrieval,” arXiv preprint arXiv:2103.15316, 2021.

[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.

[40] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023.

[41] “Google code jam,” https://code.google.com/codejam/contests.html, 2016. Note: Google Code Jam was officially discontinued in 2023.

[42] L. Mou, G. Li, L. Zhang, T. Wang, and Z. Jin, “Convolutional neural networks over tree structures for programming language processing,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2016, pp. 1287–1293.

[43] H. Li, S. Wang, W. Quan, X. Gong, H. Su, and J. Zhang, “Prism: Decomposing program semantics for code clone detection through compilation,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13.

[44] Y. Wu, D. Zou, S. Dou, S. Yang, W. Yang, F. Cheng, H. Liang, and H. Jin, “Scdetector: Software functional clone detection based on semantic tokens analysis,” in Proceedings of the 35th IEEE/ACM international conference on automated software engineering, 2020, pp. 821–833.

[45] N. Mehrotra, N. Agarwal, P. Gupta, S. Anand, D. Lo, and R. Purandare, “Modeling functional similarity in source code with graph-based siamese networks,” IEEE Transactions on Software Engineering, vol. 48, no. 10, pp. 3771–3789, 2021.

[46] D. Yuan, S. Fang, T. Zhang, Z. Xu, and X. Luo, “Java code clone detection by exploiting semantic and syntax information from intermediate code-based graph,” IEEE Transactions on Reliability, vol. 72, no. 2, pp. 511–526, 2022.