- Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions
  LLMs frequently specify library versions with known CVEs in generated code (36-56% of tasks), show low compatibility (20-63%), and converge on the same risky versions across models.
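
  As a minimal sketch of the measurement idea (not the paper's actual pipeline), the core check reduces to testing whether a pinned version falls inside a known-vulnerable range. The advisory data below is hypothetical; a real study would query a vulnerability database such as OSV.

  ```python
  # Sketch: does an LLM-pinned library version fall inside a
  # known-vulnerable [introduced, fixed) range? Advisory data is made up.
  from dataclasses import dataclass

  def parse_version(v: str) -> tuple:
      """Turn '2.25.1' into (2, 25, 1) for ordered comparison."""
      return tuple(int(part) for part in v.split("."))

  @dataclass
  class Advisory:
      package: str
      introduced: str   # first affected version
      fixed: str        # first patched version

  def is_vulnerable(package: str, pinned: str, advisories: list) -> bool:
      """True if the pinned version lies in any [introduced, fixed) range."""
      v = parse_version(pinned)
      return any(
          a.package == package
          and parse_version(a.introduced) <= v < parse_version(a.fixed)
          for a in advisories
      )

  # Hypothetical advisory: versions >=2.3.0 and <2.31.0 affected.
  advisories = [Advisory("requests", "2.3.0", "2.31.0")]
  print(is_vulnerable("requests", "2.25.1", advisories))  # True: inside range
  print(is_vulnerable("requests", "2.31.0", advisories))  # False: first fixed
  ```

  Tuple comparison keeps the version ordering correct where naive string comparison would not (e.g. "2.9" vs "2.10").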
- SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models
  SiblingRepair uses LLMs with semantic sibling detection and simultaneous/iterative repair strategies to outperform prior multi-hunk APR tools like Hercules on Defects4J and GHRB benchmarks.
- Understanding Bugs in Template Engine-Based Applications: Symptoms, Root Causes, and Fix Patterns
  An empirical study of 1,004 bugs in template engine-based applications finds abnormal rendering results as the most common symptom (48.61%) and documents 17 root causes with fix patterns that often involve host-side logic changes.
- PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models
  PuzzleMark provides a robust and imperceptible watermarking method for code datasets using adaptive variable name concatenation and statistical verification, achieving perfect detection rates with minimal performance impact.
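
  PuzzleMark's exact scheme is not reproduced here, but the statistical-verification idea can be sketched as follows: if watermarked training samples bias variable names toward concatenations of a secret word list, a suspect model's outputs can be tested for an elevated rate of such names. The secret vocabulary, background rate, and threshold below are all illustrative assumptions, not the paper's parameters.

  ```python
  # Toy watermark verification: count identifiers built entirely from a
  # secret word list and compare their frequency against an assumed
  # background rate via a z-score. Vocabulary and rates are hypothetical.
  import math
  import re

  SECRET_PARTS = {"buffer", "index", "cursor", "slot"}  # assumed secret vocabulary

  def is_marked(identifier: str) -> bool:
      """An identifier counts as marked if every '_'-joined part is secret."""
      parts = identifier.split("_")
      return len(parts) >= 2 and all(p in SECRET_PARTS for p in parts)

  def watermark_score(snippets, background_rate=0.01):
      """Z-score of marked-identifier frequency vs. an assumed background rate."""
      idents = [w for s in snippets for w in re.findall(r"[a-z_]+", s)]
      hits = sum(is_marked(w) for w in idents)
      n = len(idents)
      expected = n * background_rate
      return (hits - expected) / math.sqrt(n * background_rate * (1 - background_rate))

  suspect = ["buffer_index = 0", "cursor_slot = read(buffer_index)"]
  print(watermark_score(suspect) > 3)  # a large z-score flags the dataset
  ```

  A real verifier would need many more samples for the normal approximation to hold; the point is only that detection reduces to a one-sided significance test.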
- When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation
  Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
- Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis
  SpecValidator detects lexical vagueness, under-specification, and syntax-formatting defects in LLM code-generation prompts with an F1 of 0.804, outperforming GPT-5-mini and Claude Sonnet 4, and shows that under-specification is the most damaging defect type while richer benchmarks are more resilient.
- Exploring the Effectiveness of Abstract Syntax Tree Patterns for Algorithm Recognition
  An AST pattern-matching prototype with a custom DSL achieves 0.74 average F1-score on a BigCloneEval subset, outperforming CodeLlama (0.35) and code clone detectors (best recall 0.20).
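
  The paper's custom DSL is not reproduced here, but a toy analogue using Python's `ast` module shows the general idea: recognize an algorithm by matching a structural pattern rather than surface text. Below, the (assumed, simplistic) pattern is a for-loop that accumulates with `+=`, a crude signature of summation-style algorithms.

  ```python
  # Toy AST pattern matcher: flag code containing a for-loop whose body
  # performs an augmented `+=` assignment (an accumulation pattern).
  import ast

  def matches_accumulation(source: str) -> bool:
      """True if any for-loop in the source accumulates via `+=`."""
      tree = ast.parse(source)
      for node in ast.walk(tree):
          if isinstance(node, ast.For):
              for stmt in ast.walk(node):
                  if isinstance(stmt, ast.AugAssign) and isinstance(stmt.op, ast.Add):
                      return True
      return False

  summation = "total = 0\nfor x in xs:\n    total += x\n"
  lookup = "for x in xs:\n    if x == target:\n        found = True\n"
  print(matches_accumulation(summation))  # True
  print(matches_accumulation(lookup))     # False
  ```

  Because the match is structural, renaming `total` or `x` does not defeat it, which is the property that lets AST patterns beat token-level clone detectors on recall.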
- On the Effectiveness of Modular Testing in EvoSuite
  Emote enhances EvoSuite by allowing non-target setup calls in modular tests and refocusing the fitness function on the target call chain, delivering 15.15% higher target method coverage on an SF100 subset.
- eDySec: A Deep Learning-based Explainable Dynamic Analysis Framework for Detecting Malicious Packages in PyPI Ecosystem
  eDySec is a deep learning-based framework that detects malicious PyPI packages through dynamic analysis, halving feature dimensionality, reducing false positives by 82% and false negatives by 79%, and boosting accuracy by 3% with near-perfect stability.
- Vulnerability Identification by Harnessing Inter-connected Multi-Source Information
  VPFinder integrates multi-source semantic information using multi-head attention to achieve 0.941 F1-score in vulnerability identification and 0.610 F1-score in type classification, outperforming prior approaches.
- FixV2W: Correcting Invalid CVE-CWE Mappings with Knowledge Graph Embeddings
  FixV2W uses knowledge graph embeddings plus longitudinal patterns to fix invalid CVE-CWE mappings, correctly predicting the right CWE for 69% of exploited cases in top-10 rankings and raising ML model MRR from 0.174 to 0.608.
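
  For reference, mean reciprocal rank (MRR), the metric reported as rising from 0.174 to 0.608, averages the reciprocal of the rank at which the correct answer first appears across queries. The ranks below are made-up illustrations.

  ```python
  # Mean reciprocal rank: average of 1/rank of the first correct item per
  # query. Higher is better; a system that always ranks the right CWE
  # first scores 1.0.
  def mrr(ranks):
      """ranks: 1-based rank of the correct answer for each query."""
      return sum(1.0 / r for r in ranks) / len(ranks)

  # Hypothetical ranks for three CVE-to-CWE queries.
  print(round(mrr([1, 2, 10]), 3))  # (1 + 0.5 + 0.1) / 3 -> 0.533
  ```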