LLMs frequently specify library versions with known CVEs in generated code (36-56% of tasks), show low compatibility (20-63%), and converge on the same risky versions across models.
hub Canonical reference
Lahiri, and Sid- dhartha Sen
Canonical reference. 88% of citing Pith papers cite this work as background.
hub tools
citation-role summary
citation-polarity summary
years
2026 11representative citing papers
SiblingRepair uses LLMs with semantic sibling detection and simultaneous/iterative repair strategies to outperform prior multi-hunk APR tools like Hercules on Defects4J and GHRB benchmarks.
An empirical study of 1,004 bugs in template engine-based applications finds abnormal rendering results as the most common symptom (48.61%) and documents 17 root causes with fix patterns that often involve host-side logic changes.
PuzzleMark provides a robust and imperceptible watermarking method for code datasets using adaptive variable name concatenation and statistical verification, achieving perfect detection rates with minimal performance impact.
Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
SpecValidator detects lexical vagueness, under-specification, and syntax-formatting defects in LLM code-generation prompts with F1 0.804, outperforming GPT-5-mini and Claude Sonnet 4, and shows that under-specification is the most damaging defect type while richer benchmarks are more resilient.
An AST pattern-matching prototype with a custom DSL achieves 0.74 average F1-score on a BigCloneEval subset, outperforming CodeLlama (0.35) and code clone detectors (best recall 0.20).
Emote enhances EvoSuite by allowing non-target setup calls in modular tests and refocusing the fitness function on the target call chain, delivering 15.15% higher target method coverage on an SF100 subset.
eDySec is a deep learning-based framework that detects malicious PyPI packages through dynamic analysis, halving feature dimensionality, reducing false positives by 82%, false negatives by 79%, and boosting accuracy by 3% with near-perfect stability.
VPFinder integrates multi-source semantic information using multi-head attention to achieve 0.941 F1-score in vulnerability identification and 0.610 F1-score in type classification, outperforming prior approaches.
FixV2W uses knowledge graph embeddings plus longitudinal patterns to fix invalid CVE-CWE mappings, correctly predicting the right CWE for 69% of exploited cases in top-10 rankings and raising ML model MRR from 0.174 to 0.608.
citing papers explorer
-
Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions
LLMs frequently specify library versions with known CVEs in generated code (36-56% of tasks), show low compatibility (20-63%), and converge on the same risky versions across models.
-
SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models
SiblingRepair uses LLMs with semantic sibling detection and simultaneous/iterative repair strategies to outperform prior multi-hunk APR tools like Hercules on Defects4J and GHRB benchmarks.
-
Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns
An empirical study of 1,004 bugs in template engine-based applications finds abnormal rendering results as the most common symptom (48.61%) and documents 17 root causes with fix patterns that often involve host-side logic changes.
-
PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models
PuzzleMark provides a robust and imperceptible watermarking method for code datasets using adaptive variable name concatenation and statistical verification, achieving perfect detection rates with minimal performance impact.
-
Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition
An AST pattern-matching prototype with a custom DSL achieves 0.74 average F1-score on a BigCloneEval subset, outperforming CodeLlama (0.35) and code clone detectors (best recall 0.20).
-
eDySec: A Deep Learning-based Explainable Dynamic Analysis Framework for Detecting Malicious Packages in PyPI Ecosystem
eDySec is a deep learning-based framework that detects malicious PyPI packages through dynamic analysis, halving feature dimensionality, reducing false positives by 82%, false negatives by 79%, and boosting accuracy by 3% with near-perfect stability.