CodeBLEU: a Method for Automatic Evaluation of Code Synthesis. arXiv preprint arXiv:2009.08366, 2020.
18 Pith papers cite this work.
citing papers explorer
- Evaluating Tool Cloning in Agentic-AI Ecosystems
Tool cloning is pervasive in agentic AI ecosystems, with 60% of high-Jaccard and 85% of high-ssdeep similar pairs verified as true clones in a study of over 8,800 repositories.
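The Jaccard screen used in such clone studies can be sketched as token-set overlap. A minimal illustration follows; the sample functions are hypothetical, no particular similarity threshold from the study is implied, and the complementary ssdeep fuzzy hashing would additionally require the third-party `ssdeep` library:

```python
def jaccard_similarity(code_a: str, code_b: str) -> float:
    """Token-set Jaccard similarity, a common first-pass proxy for near-duplicate code."""
    tokens_a = set(code_a.split())
    tokens_b = set(code_b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# Two lightly edited copies of the same helper score high:
a = "def fetch(url): return requests.get(url).json()"
b = "def fetch(url): return requests.get(url).text"
print(round(jaccard_similarity(a, b), 2))  # 0.6
```

Candidate pairs above some cutoff would then be passed to manual or fuzzy-hash verification, as the study describes.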
- RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates
RepoDoc uses a repository knowledge graph with module clustering and semantic impact propagation to generate more complete documentation 3x faster with 85% fewer tokens and handle incremental updates 73% faster than prior LLM-based tools.
- Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code
CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.
- CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
CodeBLEU improves correlation with human programmer scores on code synthesis tasks by adding syntactic AST matching and semantic data-flow matching to the standard BLEU n-gram approach.
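The final CodeBLEU score is a weighted sum of four components (standard n-gram BLEU, keyword-weighted n-gram match, AST match, and data-flow match), with equal weights by default. The sketch below shows only this combination step; the component scores here are illustrative numbers, since computing them for real requires the tokenizer, AST matcher, and data-flow extractor from the reference implementation:

```python
def codebleu(bleu: float, weighted_bleu: float, ast_match: float,
             dataflow_match: float,
             weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Weighted combination of the four CodeBLEU components (default: equal weights)."""
    a, b, g, d = weights
    return a * bleu + b * weighted_bleu + g * ast_match + d * dataflow_match

# Example: strong n-gram overlap but weaker data-flow agreement
print(codebleu(0.80, 0.75, 0.90, 0.60))
```

Raising the AST and data-flow weights shifts the metric toward structural and semantic agreement over surface n-gram overlap.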
- NeuroFlake: A Neuro-Symbolic LLM Framework for Flaky Test Classification
NeuroFlake integrates discriminative token mining into LLMs to classify flaky tests, raising F1-score to 69.34% on FlakeBench while showing greater robustness to semantic-preserving perturbations than prior methods.
- VulStyle: A Multi-Modal Pre-Training for Code Stylometry-Augmented Vulnerability Detection
VulStyle pre-trains on 4.9M functions using code, non-terminal ASTs, and stylometry features, then fine-tunes to achieve SOTA F1 gains of 4-48% on BigVul and VulDeePecker.
- Residual Risk Analysis in Benign Code: How Far Are We? A Multi-Model Semantic and Structural Similarity Approach
Patched functions often remain similar to vulnerable ones, and a new multi-model similarity scoring system identifies residual issues like null pointer dereferences in 61% of high-risk cases from the PrimeVul dataset.
- On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation
Continuous latent-vector compression improves BLEU scores on repository-level code tasks by up to 28.3% at 4x compression while cutting inference latency.
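To see the shape of the trade-off, a 4x compression can be caricatured as collapsing every four consecutive token vectors into one latent vector. The mean-pooling below is only a stand-in: real continuous-compression systems learn this mapping, and the array sizes are arbitrary:

```python
import numpy as np

def compress_context(token_vecs: np.ndarray, ratio: int = 4) -> np.ndarray:
    """Illustrative 'continuous compression': mean-pool every `ratio`
    consecutive token vectors into one latent vector."""
    n, d = token_vecs.shape
    n_trim = (n // ratio) * ratio  # drop a ragged tail for simplicity
    return token_vecs[:n_trim].reshape(-1, ratio, d).mean(axis=1)

ctx = np.random.rand(128, 16)     # 128 token embeddings of width 16
latents = compress_context(ctx)   # 4x fewer vectors to attend over
print(latents.shape)              # (32, 16)
```

The model then conditions on 32 latents instead of 128 tokens, which is where the latency savings come from.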
- Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis
A framework combining universal AST normalization, hybrid graph-LLM embeddings, and strict execution-grounded validation achieves 89-92% intra-language accuracy and 74-80% cross-language F1 while resolving 70% of vulnerabilities at 12% failure rate.
- DiffHLS: Differential Learning for High-Level Synthesis QoR Prediction with GNNs and LLM Code Embeddings
DiffHLS predicts HLS QoR via differential learning: separate GNN+LLM models for kernel baseline and design delta are composed to yield the final estimate, showing lower MAPE than GNN baselines on PolyBench.
- Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding
Sliceformer reformulates static program slicing as seq2seq using CodeT5+ with dataflow-aware pretraining via DFG permutation and span corruption plus constrained decoding, yielding up to 22% ExactMatch gains on Java and Python benchmarks.
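The task Sliceformer learns can be grounded with a toy backward slicer over straight-line assignments, built from def-use chains; this is a hand-written illustration of the slicing objective, not the seq2seq model, and the tiny program is hypothetical:

```python
import re

def backward_slice(lines, target_var):
    """Backward static slice over straight-line assignments: keep each line
    that (transitively) defines a variable the target depends on."""
    needed, kept = {target_var}, []
    for i in range(len(lines) - 1, -1, -1):
        lhs, rhs = (s.strip() for s in lines[i].split("=", 1))
        if lhs in needed:
            kept.append(i)
            needed.discard(lhs)
            # Every identifier read on the right-hand side becomes a new need.
            needed |= set(re.findall(r"[a-z_]\w*", rhs))
    return [lines[i] for i in sorted(kept)]

prog = ["a = 1", "b = 2", "c = a + 3", "d = b + c"]
print(backward_slice(prog, "c"))  # ['a = 1', 'c = a + 3']
```

A learned slicer must reproduce exactly this kind of dependency-closed subset, which is why the paper evaluates with ExactMatch.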
- AFGNN: API Misuse Detection using Graph Neural Networks and Clustering
AFGNN detects API misuses in Java code more effectively than prior methods by representing usage as graphs and clustering learned embeddings from self-supervised training.
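The graph-construction idea can be illustrated on Python instead of Java: extract the called API names and link calls that appear in sequence, a crude stand-in for the richer dataflow edges a real usage graph would carry. The sample source is hypothetical, and the GNN embedding and clustering stages are out of scope here:

```python
import ast
from collections import defaultdict

def api_usage_graph(source: str) -> dict:
    """Toy API-usage graph: nodes are called names, edges link calls that
    appear consecutively in the source."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "?")
            calls.append((node.lineno, name))
    calls.sort()  # ast.walk is not in source order; recover it via line numbers
    graph = defaultdict(set)
    for (_, a), (_, b) in zip(calls, calls[1:]):
        graph[a].add(b)
    return dict(graph)

src = "f = open('x')\ndata = f.read()\nf.close()\n"
print(api_usage_graph(src))  # {'open': {'read'}, 'read': {'close'}}
```

A usage missing the expected `close` edge would land in a different cluster than conforming usages, which is the intuition behind clustering-based misuse detection.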
- On the Role of Fault Localization Context for LLM-Based Program Repair
More fault localization context does not consistently improve LLM-based program repair; file-level context gives 15-17x gains, optimal around 6-10 files, while line-level context often degrades performance from noise.
- DCVD: Dual-Channel Cross-Modal Fusion for Joint Vulnerability Detection and Localization
DCVD performs joint function-level vulnerability detection and statement-level localization by extracting control-dependency and semantic features in parallel branches, fusing them with contrastive alignment and bidirectional cross-attention, and applying explicit supervision at both granularities.
- Learning Generalizable Multimodal Representations for Software Vulnerability Detection
MultiVul uses multimodal contrastive learning to align code and comment representations, yielding up to 27% F1 gains on vulnerability detection benchmarks over prompting and code-only baselines.
- PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection
Controlled experiments show PLM-GNN hybrids improve code tasks over GNN-only baselines, with PLM source having larger impact than GNN backbone.
- Prompt-Driven Code Summarization: A Systematic Literature Review
A systematic review that categorizes prompting strategies for LLM-based code summarization, assesses their effectiveness, and identifies gaps in research and evaluation practices.
- A Survey on Large Language Models for Code Generation
A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.