LLMs frequently specify library versions with known CVEs in generated code (36-56% of tasks), show low compatibility (20-63%), and converge on the same risky versions across models.
FEA -Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.SE 2years
2026 2representative citing papers
LLM agents exhibit constraint decay with assertion pass rates dropping substantially as structural requirements increase in multi-file backend code generation across web frameworks.
citing papers explorer
-
Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions
LLMs frequently specify library versions with known CVEs in generated code (36-56% of tasks), show low compatibility (20-63%), and converge on the same risky versions across models.
-
Constraint Decay: The Fragility of LLM Agents in Backend Code Generation
LLM agents exhibit constraint decay with assertion pass rates dropping substantially as structural requirements increase in multi-file backend code generation across web frameworks.